@booklib/skills 1.2.0 → 1.3.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CONTRIBUTING.md +122 -0
- package/README.md +20 -2
- package/ROADMAP.md +36 -0
- package/animation-at-work/evals/evals.json +44 -0
- package/animation-at-work/examples/after.md +64 -0
- package/animation-at-work/examples/before.md +35 -0
- package/animation-at-work/scripts/audit_animations.py +295 -0
- package/bin/skills.js +552 -42
- package/clean-code-reviewer/SKILL.md +109 -1
- package/clean-code-reviewer/evals/evals.json +121 -3
- package/clean-code-reviewer/examples/after.md +48 -0
- package/clean-code-reviewer/examples/before.md +33 -0
- package/clean-code-reviewer/references/api_reference.md +158 -0
- package/clean-code-reviewer/references/practices-catalog.md +282 -0
- package/clean-code-reviewer/references/review-checklist.md +254 -0
- package/clean-code-reviewer/scripts/pre-review.py +206 -0
- package/data-intensive-patterns/evals/evals.json +43 -0
- package/data-intensive-patterns/examples/after.md +61 -0
- package/data-intensive-patterns/examples/before.md +38 -0
- package/data-intensive-patterns/scripts/adr.py +213 -0
- package/data-pipelines/evals/evals.json +45 -0
- package/data-pipelines/examples/after.md +97 -0
- package/data-pipelines/examples/before.md +37 -0
- package/data-pipelines/scripts/new_pipeline.py +444 -0
- package/design-patterns/evals/evals.json +46 -0
- package/design-patterns/examples/after.md +52 -0
- package/design-patterns/examples/before.md +29 -0
- package/design-patterns/scripts/scaffold.py +807 -0
- package/domain-driven-design/SKILL.md +120 -0
- package/domain-driven-design/evals/evals.json +48 -0
- package/domain-driven-design/examples/after.md +80 -0
- package/domain-driven-design/examples/before.md +43 -0
- package/domain-driven-design/scripts/scaffold.py +421 -0
- package/effective-java/evals/evals.json +46 -0
- package/effective-java/examples/after.md +83 -0
- package/effective-java/examples/before.md +37 -0
- package/effective-java/scripts/checkstyle_setup.py +211 -0
- package/effective-kotlin/evals/evals.json +45 -0
- package/effective-kotlin/examples/after.md +36 -0
- package/effective-kotlin/examples/before.md +38 -0
- package/effective-python/evals/evals.json +44 -0
- package/effective-python/examples/after.md +56 -0
- package/effective-python/examples/before.md +40 -0
- package/effective-python/references/api_reference.md +218 -0
- package/effective-python/references/practices-catalog.md +483 -0
- package/effective-python/references/review-checklist.md +190 -0
- package/effective-python/scripts/lint.py +173 -0
- package/kotlin-in-action/evals/evals.json +43 -0
- package/kotlin-in-action/examples/after.md +53 -0
- package/kotlin-in-action/examples/before.md +39 -0
- package/kotlin-in-action/scripts/setup_detekt.py +224 -0
- package/lean-startup/evals/evals.json +43 -0
- package/lean-startup/examples/after.md +80 -0
- package/lean-startup/examples/before.md +34 -0
- package/lean-startup/scripts/new_experiment.py +286 -0
- package/microservices-patterns/SKILL.md +140 -0
- package/microservices-patterns/evals/evals.json +45 -0
- package/microservices-patterns/examples/after.md +69 -0
- package/microservices-patterns/examples/before.md +40 -0
- package/microservices-patterns/scripts/new_service.py +583 -0
- package/package.json +2 -8
- package/refactoring-ui/evals/evals.json +45 -0
- package/refactoring-ui/examples/after.md +85 -0
- package/refactoring-ui/examples/before.md +58 -0
- package/refactoring-ui/scripts/audit_css.py +250 -0
- package/skill-router/SKILL.md +142 -0
- package/skill-router/evals/evals.json +38 -0
- package/skill-router/examples/after.md +63 -0
- package/skill-router/examples/before.md +39 -0
- package/skill-router/references/api_reference.md +24 -0
- package/skill-router/references/routing-heuristics.md +89 -0
- package/skill-router/references/skill-catalog.md +156 -0
- package/skill-router/scripts/route.py +266 -0
- package/storytelling-with-data/evals/evals.json +47 -0
- package/storytelling-with-data/examples/after.md +50 -0
- package/storytelling-with-data/examples/before.md +33 -0
- package/storytelling-with-data/scripts/chart_review.py +301 -0
- package/system-design-interview/evals/evals.json +45 -0
- package/system-design-interview/examples/after.md +94 -0
- package/system-design-interview/examples/before.md +27 -0
- package/system-design-interview/scripts/new_design.py +421 -0
- package/using-asyncio-python/evals/evals.json +43 -0
- package/using-asyncio-python/examples/after.md +68 -0
- package/using-asyncio-python/examples/before.md +39 -0
- package/using-asyncio-python/scripts/check_blocking.py +270 -0
- package/web-scraping-python/evals/evals.json +46 -0
- package/web-scraping-python/examples/after.md +109 -0
- package/web-scraping-python/examples/before.md +40 -0
- package/web-scraping-python/scripts/new_scraper.py +231 -0
- /package/{effective-python-skill → effective-python}/SKILL.md +0 -0
- /package/{effective-python-skill → effective-python}/ref-01-pythonic-thinking.md +0 -0
- /package/{effective-python-skill → effective-python}/ref-02-lists-and-dicts.md +0 -0
- /package/{effective-python-skill → effective-python}/ref-03-functions.md +0 -0
- /package/{effective-python-skill → effective-python}/ref-04-comprehensions-generators.md +0 -0
- /package/{effective-python-skill → effective-python}/ref-05-classes-interfaces.md +0 -0
- /package/{effective-python-skill → effective-python}/ref-06-metaclasses-attributes.md +0 -0
- /package/{effective-python-skill → effective-python}/ref-07-concurrency.md +0 -0
- /package/{effective-python-skill → effective-python}/ref-08-robustness-performance.md +0 -0
- /package/{effective-python-skill → effective-python}/ref-09-testing-debugging.md +0 -0
- /package/{effective-python-skill → effective-python}/ref-10-collaboration.md +0 -0
@@ -0,0 +1,47 @@
+{
+  "evals": [
+    {
+      "id": "eval-01-pie-chart-comparison",
+      "prompt": "Review this data visualization specification:\n\nChart type: Pie chart (donut variant)\nTitle: \"Website Traffic by Source\"\nData (8 slices):\n- Organic Search: 28%\n- Direct: 22%\n- Social Media: 18%\n- Email: 12%\n- Paid Search: 9%\n- Referral: 5%\n- Display Ads: 4%\n- Other: 2%\n\nDesign choices:\n- Each slice has a distinct color (8 different hues)\n- Legend positioned to the right listing all 8 sources with their percentages\n- No data labels on the slices themselves\n- Title is \"Website Traffic by Source\" (descriptive, not action-oriented)\n- The chart will be used in a monthly marketing review presentation to decide where to increase ad spend",
+      "expectations": [
+        "Identifies pie/donut charts as inappropriate for comparing 8 categories — human perception cannot accurately compare angles or arc lengths, especially for similar-sized slices like 9%, 5%, 4%, 2%",
+        "Recommends replacing the pie chart with a horizontal bar chart ordered by value — this makes comparison trivially easy and is the explicit recommendation from Storytelling with Data Ch 2",
+        "Flags that 8 different colors violate the principle of purposeful color use — color should highlight the data point that matters, not differentiate all 8 categories",
+        "Flags the generic, descriptive title 'Website Traffic by Source' — per Ch 7, the title should state the actionable takeaway (e.g., 'Organic and Direct together drive half of all traffic — paid channels underperform')",
+        "Notes that forcing the audience to cross-reference a legend for 8 items adds unnecessary cognitive load — direct labels on bars would be clearer",
+        "Points out the context: this is for a decision about ad spend — the chart should make it obvious which channels to invest in or cut, not just show proportions",
+        "May suggest greying out all bars except the ones relevant to the decision (e.g., Paid Search and Display Ads highlighted to show underperformance) to focus attention per Ch 4"
+      ]
+    },
+    {
+      "id": "eval-02-chart-junk",
+      "prompt": "Review this data visualization specification for a quarterly sales dashboard:\n\nChart type: 3D clustered column chart\nTitle: \"Q1-Q4 Sales Performance by Region\" (displayed in WordArt-style gradient text)\nData: 4 regions × 4 quarters = 16 bars\nDesign choices:\n- 3D perspective effect with visible depth on bars\n- Heavy gridlines every $50K (dark grey, 1.5px)\n- Chart border: black box outline around entire chart area\n- Background: light blue gradient fill in the plot area\n- Data markers: small diamond shapes at the top of each bar\n- Both X-axis and Y-axis tick marks visible\n- Legend box with border in lower-right corner overlapping some bars\n- Y-axis title 'Sales ($)' rotated 90 degrees\n- All 16 bars use different colors\n- Dollar signs and commas on every data label (\"$125,432.00\")\n- Drop shadow effect on the chart frame",
+      "expectations": [
+        "Identifies the 3D effect as a critical flaw: 3D distorts the visual representation of bar heights — the same bar appears to be a different height depending on the viewing angle, making accurate comparison impossible (Ch 2 and Ch 3)",
+        "Flags the 16-color palette as violating purposeful color use — with 4 regions × 4 quarters, either region or time should use color; the other should use position/grouping alone",
+        "Flags the heavy gridlines as chart junk (Ch 3): dark 1.5px gridlines compete with the data; if gridlines are needed, they should be very light grey (~#e5e7eb) and thin (0.5px)",
+        "Flags the blue gradient background as chart junk — plot area backgrounds add noise without adding information; white or no background is correct",
+        "Flags the chart border/drop shadow as chart junk — borders around charts imply the chart needs to be 'contained' and add visual noise",
+        "Flags the rotated Y-axis title — violates the alignment principle (Ch. 5: Think Like a Designer — left-align text for readability; rotated/vertical text is harder to read and should be made horizontal or removed)",
+        "Notes the data labels show '$125,432.00' — excessive decimal precision is visual clutter with no informational value at this scale; '$125K' is clearer (Ch. 3: Eliminate Clutter; data-ink ratio — maximize the proportion of ink devoted to actual data)",
+        "Notes the overlapping legend is a usability problem — direct labeling of regions on the chart would eliminate the need for a legend entirely",
+        "Recommends: remove 3D, use flat 2D bars; remove gridlines or make them very light; white background; consistent color scheme (one color per region); direct labels; action-oriented title"
+      ]
+    },
+    {
+      "id": "eval-03-clean-effective-visualization",
+      "prompt": "Review this data visualization specification:\n\nContext: Presenting to the VP of Sales — goal is to get approval to hire 2 more sales reps in the APAC region\n\nChart type: Horizontal bar chart\nTitle: \"APAC revenue per rep is half the company average — we need more headcount\"\n\nData: Revenue per sales rep by region (last 12 months)\n- North America: $2.4M per rep (4 reps)\n- Europe: $2.1M per rep (3 reps)\n- APAC: $1.2M per rep (2 reps) ← highlighted in brand blue\n- Latin America: $1.9M per rep (2 reps)\n\nDesign choices:\n- All bars grey (#9ca3af) except APAC which is brand blue (#2563eb)\n- Direct value labels at the end of each bar (\"$2.4M\", \"$2.1M\", etc.)\n- A vertical dashed reference line at $2.0M labeled 'Company average'\n- No legend (regions labeled directly on Y-axis)\n- Light horizontal gridlines removed; bars speak for themselves\n- Clean white background\n- Annotation next to APAC bar: \"Only 2 reps covering 4.5B population\"\n- Footnote: Source: Salesforce CRM, FY2025",
+      "expectations": [
+        "Recognizes this as a well-crafted, purposeful data visualization and says so explicitly",
+        "Praises the action-oriented title: 'APAC revenue per rep is half the company average — we need more headcount' states the insight AND the implication — exactly the Big Idea principle from Ch 1 and horizontal logic from Ch 7",
+        "Praises the strategic color use: all bars grey except APAC in brand blue — the audience's eye is immediately drawn to the data point that matters, implementing the Ch 4 principle of using color to direct attention",
+        "Praises the reference line: the company average line gives context to judge APAC's performance without requiring mental arithmetic",
+        "Praises direct labeling: no legend needed because regions are labeled on the Y-axis and values are labeled directly on bars — eliminating the cognitive cost of legend cross-referencing (Ch 3)",
+        "Praises the annotation: 'Only 2 reps covering 4.5B population' is a storytelling technique that adds human context to the data (Ch 7 — tell the story, don't just show the numbers)",
+        "Praises the source footnote: establishing credibility and data provenance is a design best practice",
+        "Does NOT manufacture fake issues just to have something to say",
+        "May offer optional suggestions (a second chart showing projected revenue with 4 APAC reps to make the business case) but clearly frames them as additions to strengthen the narrative, not corrections"
+      ]
+    }
+  ]
+}
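To illustrate the horizontal-bar recommendation in eval-01, the same traffic data can be rendered as a sorted text bar chart. This is a sketch for illustration only, using the percentages from the eval prompt; it is not part of the package.

```python
# The eval-01 traffic data, rendered as a sorted horizontal bar chart in
# text form. Ordering by value makes the comparison immediate, which is
# exactly what the pie chart with 8 slices fails to do.
traffic = {
    "Organic Search": 28, "Direct": 22, "Social Media": 18, "Email": 12,
    "Paid Search": 9, "Referral": 5, "Display Ads": 4, "Other": 2,
}

lines = []
for source, pct in sorted(traffic.items(), key=lambda kv: kv[1], reverse=True):
    # Pad the label, then draw one block per percentage point.
    lines.append(f"{source:<15} {'#' * pct} {pct}%")

print("\n".join(lines))
```

Sorted bars put the largest and smallest categories at the extremes, so the "which channels underperform" question answers itself at a glance.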
@@ -0,0 +1,50 @@
+# After
+
+A line chart with direct labels, a single accent color highlighting the insight, and an action title that states the finding — replacing three unreadable 3D pie charts.
+
+```
+CHART SPECIFICATION: Support Ticket Trend (Revised)
+
+Chart type: Line chart (2D, no markers except at Q3 for annotation)
+Title (action headline): "Product Bug tickets doubled in Q3 — prioritise QA investment"
+
+Data: Ticket counts by category, Q1–Q3 2024
+Shown as: Single line chart, all three quarters on the x-axis
+
+Visual choices:
+- All category lines: hsl(0, 0%, 75%) [light grey, 1.5px stroke]
+- "Product Bugs" line: hsl(4, 90%, 58%) [red accent, 2.5px stroke]
+  — only this line is coloured; all others recede into context
+- Direct labels at Q3 data points (right side of chart)
+  — no legend required
+- Single annotation on Product Bugs at Q3:
+  "↑ 2× vs Q1" in matching red, placed above the data point
+
+Axes:
+- X: Q1 2024, Q2 2024, Q3 2024 (three points, labelled clearly)
+- Y: Ticket volume (0–1,200), light grey gridlines, no border
+- Y-axis title removed — units are obvious from context
+
+Clutter removed:
+- No 3D effects
+- No pie wedges (angles cannot be compared accurately)
+- No rainbow palette (colour carries no meaning when everything is coloured)
+- No legend (direct labels replace it)
+- No percentage labels on invisible slices
+- Chart border removed
+
+Narrative context (slide title above chart):
+"Our Q3 support data shows one outlier that demands attention."
+
+Call to action (below chart, in body text):
+"Recommendation: allocate 2 additional QA engineers to the mobile team
+before the Q4 release to prevent further escalation."
+```
+
+Key improvements:
+- Line chart replaces pie charts — change over three time periods is exactly what a line chart communicates; pie charts cannot show trends (Ch 2: Choose an effective visual)
+- Grey-out-then-highlight strategy: all lines are grey, only "Product Bugs" is red — the viewer's eye goes directly to the story without instruction (Ch 4: Focus attention with preattentive attributes — color)
+- Direct labels at Q3 replace the legend — eliminates the back-and-forth between legend and chart (Ch 3: Eliminate clutter)
+- Action headline "Product Bug tickets doubled in Q3 — prioritise QA investment" states the takeaway instead of describing the chart (Ch 7: Tell a story — horizontal logic, action titles)
+- Annotation "↑ 2×" with the matching accent color amplifies the key data point without adding clutter (Ch 7: Annotation is storytelling)
+- Explicit call-to-action in body text completes the three-act structure: context → insight → recommendation (Ch 7: Three-act structure)
@@ -0,0 +1,33 @@
+# Before
+
+A pie chart specification used to compare customer support ticket volume across seven categories over three time periods — a chart type that cannot support this data relationship.
+
+```
+CHART SPECIFICATION: Support Ticket Distribution
+
+Chart type: Pie chart (3D, with exploded slices)
+Title: "Support Ticket Breakdown"
+
+Data: Ticket counts by category for Q1, Q2, Q3 2024
+Shown as: Three separate 3D pie charts side by side
+
+Categories (7 slices per pie):
+- Billing Issues
+- Login / Account Access
+- Product Bugs
+- Feature Requests
+- Shipping & Delivery
+- Refunds & Returns
+- Other
+
+Visual choices:
+- Each category gets a distinct color from a rainbow palette
+- All 7 slices are shown even when <2% share
+- Percentages shown inside each slice (8px font)
+- Legend placed below each chart
+- 3D perspective tilt applied for "visual interest"
+- No annotations or callouts
+
+Goal: Show the trend in which ticket types are growing and which are shrinking
+across Q1 → Q2 → Q3, and highlight that Product Bugs doubled in Q3.
+```
@@ -0,0 +1,301 @@
+#!/usr/bin/env python3
+"""
+chart_review.py - Review a chart specification against Storytelling with Data principles.
+
+Usage:
+    python chart_review.py <spec-file.json>
+
+Spec file format (JSON):
+    {
+      "title": "Sales by Region Q4",
+      "chart_type": "pie",
+      "data_points": 8,
+      "colors": ["#ff0000", "#00ff00"],
+      "has_gridlines": true,
+      "has_legend": true,
+      "has_direct_labels": false,
+      "is_3d": false,
+      "y_axis_starts_at_zero": true
+    }
+
+All fields are optional. Unrecognised fields are ignored.
+
+Based on "Storytelling with Data" by Cole Nussbaumer Knaflic.
+Each finding references the relevant chapter.
+"""
+
+import argparse
+import json
+import pathlib
+import sys
+from typing import Any
+
+# Severity ordering for output
+PRIORITY_ORDER = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}
+
+CHART_TYPE_ALIASES: dict[str, str] = {
+    "pie": "pie",
+    "donut": "pie",
+    "doughnut": "pie",
+    "bar": "bar",
+    "column": "bar",
+    "horizontal bar": "bar",
+    "stacked bar": "bar",
+    "line": "line",
+    "area": "line",
+    "scatter": "scatter",
+    "bubble": "scatter",
+    "table": "table",
+    "heatmap": "table",
+}
+
+# Action verbs that indicate a title states a finding rather than just labelling axes.
+INSIGHT_VERBS = {
+    "grew", "declined", "increased", "decreased", "outperformed", "underperformed",
+    "surpassed", "dropped", "rose", "fell", "exceeded", "missed", "reached",
+    "shows", "reveals", "demonstrates", "highlights", "indicates", "confirms",
+    "beats", "lags", "leads", "trails", "spikes", "dips",
+}
+
+
+def load_spec(path: pathlib.Path) -> dict[str, Any]:
+    try:
+        text = path.read_text(encoding="utf-8")
+    except OSError as exc:
+        print(f"ERROR: Cannot read file: {exc}")
+        sys.exit(1)
+    try:
+        return json.loads(text)
+    except json.JSONDecodeError as exc:
+        print(f"ERROR: Invalid JSON in {path}: {exc}")
+        sys.exit(1)
+
+
+def normalize_chart_type(raw: str) -> str:
+    return CHART_TYPE_ALIASES.get(raw.lower().strip(), raw.lower().strip())
+
+
+def title_is_action_oriented(title: str) -> bool:
+    """Return True if the title starts with or contains an insight verb."""
+    first_word = title.strip().split()[0].lower().rstrip(".,;:") if title.strip() else ""
+    if first_word in INSIGHT_VERBS:
+        return True
+    # Also accept titles where an insight verb appears early (within first 4 words)
+    words = [w.lower().rstrip(".,;:") for w in title.strip().split()[:4]]
+    return any(w in INSIGHT_VERBS for w in words)
+
+
+def check_spec(spec: dict[str, Any]) -> list[dict[str, str]]:
+    findings: list[dict[str, str]] = []
+
+    def add(priority: str, chapter: str, principle: str, detail: str, recommendation: str) -> None:
+        findings.append({
+            "priority": priority,
+            "chapter": chapter,
+            "principle": principle,
+            "detail": detail,
+            "recommendation": recommendation,
+        })
+
+    chart_type_raw = spec.get("chart_type", "")
+    chart_type = normalize_chart_type(chart_type_raw) if chart_type_raw else ""
+    data_points = spec.get("data_points")
+    colors = spec.get("colors", [])
+    has_gridlines = spec.get("has_gridlines", False)
+    has_legend = spec.get("has_legend", False)
+    has_direct_labels = spec.get("has_direct_labels", False)
+    is_3d = spec.get("is_3d", False)
+    y_axis_zero = spec.get("y_axis_starts_at_zero", True)
+    title = spec.get("title", "")
+
+    # Check 1: Pie charts with more than 4 slices
+    if chart_type == "pie" and data_points is not None and data_points > 4:
+        add(
+            priority="HIGH",
+            chapter="Chapter 2 - Choosing an Effective Visual",
+            principle="Avoid pie charts with many slices",
+            detail=f"Pie chart has {data_points} slices. Humans cannot accurately compare non-adjacent arc lengths.",
+            recommendation=(
+                "Use a horizontal bar chart sorted by value. "
+                "Bars make magnitude comparison trivial and scale to many categories."
+            ),
+        )
+
+    # Check 2: More than 5 colors
+    if len(colors) > 5:
+        add(
+            priority="HIGH",
+            chapter="Chapter 4 - Focus Your Audience's Attention",
+            principle="Use color strategically, not decoratively",
+            detail=f"{len(colors)} colors used. More than 5 colors overwhelm the eye and dilute emphasis.",
+            recommendation=(
+                "Grey out all categories except the one(s) you want to highlight. "
+                "Use a single accent color to draw the eye to the key insight."
+            ),
+        )
+
+    # Check 3: Gridlines present
+    if has_gridlines:
+        add(
+            priority="MEDIUM",
+            chapter="Chapter 3 - Clutter Is Your Enemy",
+            principle="Remove chart junk and non-data ink",
+            detail="Gridlines are present. They add visual noise without adding information.",
+            recommendation=(
+                "Remove gridlines entirely, or replace with light grey (#e0e0e0) hairlines. "
+                "If reference values matter, use direct annotations on the data instead."
+            ),
+        )
+
+    # Check 4: Legend without direct labels
+    if has_legend and not has_direct_labels:
+        add(
+            priority="MEDIUM",
+            chapter="Chapter 5 - Think Like a Designer",
+            principle="Label data directly to reduce cognitive load",
+            detail=(
+                "A legend forces the reader to look away from the data to decode colors. "
+                "This interrupts the reading flow."
+            ),
+            recommendation=(
+                "Place labels directly next to each data series or bar. "
+                "Remove the legend. Direct labelling reduces eye travel and speeds comprehension."
+            ),
+        )
+
+    # Check 5: 3D charts
+    if is_3d:
+        add(
+            priority="HIGH",
+            chapter="Chapter 2 - Choosing an Effective Visual",
+            principle="Never use 3D visualisations",
+            detail=(
+                "3D perspective distorts relative bar/slice sizes due to foreshortening. "
+                "Viewers cannot accurately read values from a 3D chart."
+            ),
+            recommendation=(
+                "Switch to a flat 2D version of the same chart type. "
+                "If depth is meant to encode a third variable, use facets or bubble size instead."
+            ),
+        )
+
+    # Check 6: Title not action-oriented
+    if title:
+        if not title_is_action_oriented(title):
+            add(
+                priority="LOW",
+                chapter="Chapter 6 - Dissecting Model Visuals",
+                principle="Title should state the insight, not label the axes",
+                detail=(
+                    f"Title '{title}' describes what the chart shows but does not communicate "
+                    "the key takeaway. Readers must infer the insight themselves."
+                ),
+                recommendation=(
+                    "Rewrite the title as a one-sentence finding: e.g., "
+                    "'APAC revenue grew 34% year-over-year, outpacing all other regions.' "
+                    "This tells readers what to think before they look at the data."
+                ),
+            )
+    else:
+        add(
+            priority="MEDIUM",
+            chapter="Chapter 6 - Dissecting Model Visuals",
+            principle="Every chart needs a title",
+            detail="No title field found in the spec. Untitled charts require readers to form their own interpretation.",
+            recommendation=(
+                "Add a descriptive, insight-oriented title that states the key finding directly."
+            ),
+        )
+
+    # Check 7: Bar chart not starting at zero
+    if chart_type == "bar" and y_axis_zero is False:
+        add(
+            priority="HIGH",
+            chapter="Chapter 2 - Choosing an Effective Visual",
+            principle="Bar charts must start at zero",
+            detail=(
+                "The y-axis does not start at zero. Because bar length encodes value, "
+                "a truncated axis makes small differences appear dramatically large."
+            ),
+            recommendation=(
+                "Set the y-axis baseline to zero. "
+                "If the differences are genuinely small, switch to a line chart, "
+                "which does not rely on bar length to encode magnitude."
+            ),
+        )
+
+    # Check 8: Pie chart for proportional data — general advisory
+    if chart_type == "pie":
+        add(
+            priority="LOW",
+            chapter="Chapter 2 - Choosing an Effective Visual",
+            principle="Pie charts are rarely the best choice",
+            detail=(
+                "Even well-formed pie charts are harder to read than bar charts "
+                "because humans are poor at judging angles and arc lengths."
+            ),
+            recommendation=(
+                "Consider a single stacked bar (for part-to-whole) or a simple bar chart. "
+                "Use a pie only when: (a) there are 2-3 slices and (b) the exact proportions matter less than the part-to-whole story."
+            ),
+        )
+
+    return findings
+
+
+def print_findings(findings: list[dict[str, str]]) -> None:
+    sorted_findings = sorted(findings, key=lambda f: PRIORITY_ORDER.get(f["priority"], 99))
+
+    if not sorted_findings:
+        print("No issues found. The chart spec looks good against Storytelling with Data principles.")
+        return
+
+    print(f"Found {len(sorted_findings)} issue(s):\n")
+    for i, finding in enumerate(sorted_findings, start=1):
+        priority = finding["priority"]
+        priority_display = f"[{priority}]"
+        print(f"{i}. {priority_display} {finding['principle']}")
+        print(f"   Chapter     : {finding['chapter']}")
+        print(f"   Detail      : {finding['detail']}")
+        print(f"   Recommended : {finding['recommendation']}")
+        print()
+
+    counts = {"HIGH": 0, "MEDIUM": 0, "LOW": 0}
+    for f in findings:
+        counts[f["priority"]] = counts.get(f["priority"], 0) + 1
+
+    print("=" * 60)
+    print("PRIORITY SUMMARY")
+    print("=" * 60)
+    for level in ("HIGH", "MEDIUM", "LOW"):
+        if counts[level]:
+            print(f"  {counts[level]:2d} {level}")
+    print("  --")
+    print(f"  {sum(counts.values()):2d} Total")
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(
+        description="Review a chart specification against Storytelling with Data principles."
+    )
+    parser.add_argument(
+        "spec_file",
+        help="Path to a JSON chart specification file.",
+    )
+    args = parser.parse_args()
+
+    spec_path = pathlib.Path(args.spec_file)
+    if not spec_path.exists():
+        print(f"ERROR: File not found: {spec_path}")
+        sys.exit(1)
+
+    spec = load_spec(spec_path)
+    findings = check_spec(spec)
+    print_findings(findings)
+
+    has_high = any(f["priority"] == "HIGH" for f in findings)
+    sys.exit(2 if has_high else (1 if findings else 0))
+
+
+if __name__ == "__main__":
+    main()
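A quick way to sanity-check the spec format that chart_review.py documents is to inline a dict and apply the same thresholds its checks use. This is a minimal sketch for illustration (thresholds copied from the script's pie-slice, color-count, and 3D checks; alias normalization and the other checks are omitted), not part of the package.

```python
# A spec that should trip three HIGH findings: a 3D pie chart with
# 8 slices and 6 distinct colors.
spec = {
    "title": "Sales by Region Q4",
    "chart_type": "pie",
    "data_points": 8,
    "colors": ["#f00", "#0f0", "#00f", "#ff0", "#0ff", "#f0f"],
    "is_3d": True,
}

findings = []
# Check 1 analogue: pie chart with more than 4 slices.
if spec.get("chart_type") == "pie" and spec.get("data_points", 0) > 4:
    findings.append("HIGH: pie chart with more than 4 slices")
# Check 2 analogue: more than 5 colors dilutes emphasis.
if len(spec.get("colors", [])) > 5:
    findings.append("HIGH: more than 5 colors")
# Check 5 analogue: 3D distorts value perception.
if spec.get("is_3d"):
    findings.append("HIGH: 3D chart")

for finding in findings:
    print(finding)
```

All three findings fire for this spec, matching what the full script would report at HIGH priority.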
@@ -0,0 +1,45 @@
+{
+  "evals": [
+    {
+      "id": "eval-01-naive-url-shortener",
+      "prompt": "I need to design a URL shortener service like bit.ly. Here's my design:\n\nThe system has a single server with a MySQL database. When a user submits a long URL, we generate a random 6-character short code, store the mapping in a `urls` table with columns (short_code, long_url, created_at), and return the short URL. When someone visits the short URL, we look up the long URL in the database and do a 302 redirect.\n\nFor the database:\n```sql\nCREATE TABLE urls (\n  short_code VARCHAR(6) PRIMARY KEY,\n  long_url TEXT NOT NULL,\n  created_at TIMESTAMP DEFAULT NOW()\n);\n```\n\nThe application is a simple Node.js Express server that handles both write (shorten) and read (redirect) traffic on the same process. I think this will work fine for our use case.",
+      "expectations": [
+        "Calls out the absence of any back-of-envelope estimation before designing (Ch 2 / Ch 3): we don't know the QPS, storage needs, or read/write ratio — the design can't be validated without these numbers",
+        "Identifies the single point of failure: one server, one database — any failure takes down the entire service",
+        "Flags missing caching (Ch 1): URL redirects are extremely read-heavy (estimated 100:1 read/write ratio in the book's example); a Redis cache in front of the DB would handle the vast majority of redirect lookups without hitting MySQL",
+        "Flags no CDN or load balancing (Ch 1): with a single Node.js process, the service cannot scale horizontally",
+        "Questions the random collision strategy: random 6-character codes will have increasing collision probability as the table grows — needs a collision-resistant ID generation strategy (base-62 encoding of auto-increment ID, or a dedicated ID generator per Ch 7/8)",
+        "Notes that 302 redirect means the browser does NOT cache the redirect — every click hits the server again; 301 is cacheable and reduces load for stable URLs (Ch 8 covers this trade-off explicitly)",
+        "Flags the absence of rate limiting on the URL creation endpoint — without it the service is open to spam and abuse (Ch 4: rate limiter patterns protect APIs; a URL shortener creation endpoint is a prime abuse target)",
+        "Recommends at minimum: back-of-envelope estimate first, Redis cache for redirects, read replicas for the DB, load balancer, and a more robust ID generation approach"
+      ]
+    },
+    {
+      "id": "eval-02-premature-optimization-no-estimation",
+      "prompt": "I want to build a notification system (like push notifications for a mobile app). Before designing it, I've decided the system needs:\n\n1. A Kafka cluster with 50 partitions for notification events\n2. A Cassandra cluster with 6 nodes for storing notification history\n3. A Redis cluster in active-active mode across 3 data centers\n4. A Kubernetes cluster with auto-scaling pods for the notification workers\n5. A separate microservice for each notification channel (push, SMS, email, in-app)\n6. A GraphQL subscription API for real-time notification delivery\n7. Consistent hashing for distributing notifications across worker pods\n\nThe app currently has about 500 registered users and we send maybe 200 notifications per day.",
+      "expectations": [
+        "Calls out vanity scaling (Ch 2): 500 users sending 200 notifications per day is roughly 0.002 QPS — this is trivially handled by a single server with a simple database; none of the proposed infrastructure is justified",
+        "References the principle from Ch 2: scaling decisions must be based on back-of-envelope estimation, not intuition or aspiration",
+        "Calculates or estimates the actual load: 200 notifications/day = ~8/hour = ~0.002/second — a SQLite database on a $5 server would handle this with room to spare",
+        "Notes that the proposed stack adds enormous operational complexity: Kafka, Cassandra, Redis cluster, Kubernetes, and multiple microservices all require dedicated expertise to operate, monitor, and debug",
+        "Identifies this as the 'vanity scaling' anti-pattern explicitly called out in the SKILL.md review checklist",
+        "Recommends a design appropriate for the actual scale: a single service, a relational database (Postgres), an email/SMS provider SDK, and a simple job queue (like BullMQ or a DB-backed worker) — then scale when metrics demand it",
+        "Notes that the 4-step framework (Ch 3) requires establishing scope and estimating load BEFORE proposing architecture — skipping estimation leads to over-engineered designs like this one"
+      ]
+    },
+    {
+      "id": "eval-03-good-system-design-proposal",
+      "prompt": "Design proposal for a news feed system (like Twitter's home timeline):\n\n**Step 1 — Requirements & Estimation**\n- Functional: users follow others, posts appear in follower feeds, feeds are reverse-chronological\n- Non-functional: 500ms p99 feed load, eventual consistency acceptable, 10M DAU\n- Estimation: 10M DAU × 2 feed loads/day = 20M reads/day ≈ 230 QPS reads; 10M DAU × 0.1 posts/day = 1M writes/day ≈ 12 QPS writes; read/write ratio ≈ 20:1\n- Storage: 1M posts/day × 200 bytes = 200MB/day ≈ 70GB/year for post content\n\n**Step 2 — High-Level Design**\n- Write path: user posts → Post Service → publishes PostCreated event to message queue\n- Fanout service subscribes to PostCreated → for each follower, prepends post ID to their feed cache in Redis\n- Read path: Feed Service → reads from Redis feed cache (list of post IDs) → fetches post content from Post Cache\n- API: GET /v1/feed?userId=X&cursor=Y (cursor-based pagination)\n\n**Step 3 — Deep Dive: Fanout Strategy**\n- Fanout-on-write for regular users: pre-populate follower feeds in Redis on every post\n- Exception for celebrities (>10K followers): fanout-on-read — too expensive to write to millions of feed caches synchronously\n- Hybrid: celebrity posts fetched at read time and merged with pre-computed feed\n\n**Step 4 — Failure Handling**\n- Message queue provides durability for fanout — if fanout service crashes, it replays from queue\n- Redis feeds are a cache, not source of truth — Post DB is the durable store; stale feeds are acceptable (eventual consistency)\n- Rate limiting on POST /posts to prevent feed spam",
+      "expectations": [
+        "Recognizes this as a well-structured design following the 4-step framework and says so explicitly",
+        "Praises the back-of-envelope estimation: derives concrete QPS numbers (230 reads, 12 writes) and uses the read/write ratio to justify the design decisions",
+        "Praises the fanout-on-write vs fanout-on-read hybrid — this is exactly the celebrity problem discussed in Ch 11 of the book; the design correctly identifies and handles the hot-user case",
+        "Praises cursor-based pagination over offset-based — correct for infinite scroll feeds where new content is constantly inserted",
+        "Praises the explicit consistency model: choosing eventual consistency and documenting why (the 500ms SLA and user experience tolerance)",
+        "Praises the failure handling section: using the message queue as a durable buffer means fanout service failures don't lose posts",
+        "Does NOT manufacture fake issues just to have something to say",
+        "May offer optional improvements (cache warming on login, feed pre-generation for cold start, monitoring mentions) but clearly frames them as enhancements"
+      ]
+    }
+  ]
+}
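The load arithmetic asserted in eval-02 can be verified with a quick back-of-envelope calculation, using only the numbers stated in the eval prompt itself:

```python
# Figures from the eval-02 prompt: 500 registered users, ~200 notifications/day.
notifications_per_day = 200
seconds_per_day = 24 * 60 * 60  # 86,400

qps = notifications_per_day / seconds_per_day
per_hour = notifications_per_day / 24

print(f"{qps:.4f} QPS")        # ≈ 0.0023/sec, matching the eval's ~0.002 figure
print(f"{per_hour:.1f}/hour")  # ≈ 8.3/hour, matching the eval's ~8/hour figure
```

At roughly two-thousandths of a request per second, any single small server with a plain relational database has orders of magnitude of headroom, which is the point the eval expects the reviewer to make.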
@@ -0,0 +1,94 @@
+# After
+
+A structured URL shortener design that follows the 4-step framework: scope clarification, back-of-envelope estimation, high-level component design, and a focused deep dive.
+
+```
+DESIGN: URL Shortener Service
+
+─────────────────────────────────────────────────────────────
+STEP 1 — SCOPE & REQUIREMENTS
+─────────────────────────────────────────────────────────────
+Functional requirements:
+- POST /shorten → create short URL, return 7-char code
+- GET /{code} → redirect to original URL (301 or 302)
+- Custom aliases → optional
+- Expiration → optional TTL per URL
+
+Non-functional requirements:
+- 100M URLs shortened per day
+- Read:Write ratio ~ 100:1 (redirects far outnumber shortens)
+- 99.9% availability for redirects (latency < 50ms p99)
+- URLs persist indefinitely (unless TTL set)
+
+─────────────────────────────────────────────────────────────
+STEP 2 — BACK-OF-ENVELOPE ESTIMATION
+─────────────────────────────────────────────────────────────
+Write QPS: 100M / 86,400 sec ≈ 1,160 writes/sec
+Read QPS:  100:1 ratio ≈ 116,000 reads/sec
+
+Storage (10-year horizon):
+  100M writes/day × 365 × 10 = 365B URLs
+  Each record ≈ 500 bytes
+  Total ≈ 365B × 500 bytes ≈ 182 TB
+
+Cache:
+  80/20 rule: 20% of URLs serve 80% of traffic
+  Hot URLs: 116,000 reads/sec × 0.2 × 500B ≈ 12 MB/sec cache traffic
+  In-memory cache size: 1M hot entries × 500B ≈ 500 MB (fits on one Redis node)
+
+─────────────────────────────────────────────────────────────
+STEP 3 — HIGH-LEVEL DESIGN
+─────────────────────────────────────────────────────────────
+
+[Client]
+   │
+   ▼
+[CDN / Load Balancer]
+   │
+   ├──> [Redirect Service] ──cache hit──> [Redis Cache (hot URLs)]
+   │          │                                 │ cache miss
+   │          └────────────────────────────> [URL DB read replica]
+   │
+   └──> [Shortening Service]
+              │
+              ├──> [ID Generator Service]  (Twitter Snowflake: 64-bit IDs
+              │                             → Base62 encodes to 7 chars)
+              │
+              └──> [URL DB primary - MySQL] ─replication─> [Read Replicas]
+
+API contract:
+  POST /api/v1/shorten
+    Body:     { "longUrl": "https://...", "ttl": 86400 }
+    Response: { "shortUrl": "https://short.ly/aB3xY2k" }
+
+  GET /aB3xY2k
+    Response: HTTP 301 (permanent, browser caches) or 302 (temporary, server tracks clicks)
+
+─────────────────────────────────────────────────────────────
+STEP 4 — DEEP DIVE: ID Generation
+─────────────────────────────────────────────────────────────
+Problem: UUID first-8-chars has high collision probability at scale.
+
+Solution: Twitter Snowflake-style 64-bit ID:
+  [41-bit timestamp ms] + [10-bit machine ID] + [12-bit sequence]
+  → Globally unique, monotonically increasing, no coordination needed
+  → Base62 encode (a-z, A-Z, 0-9): 7 chars covers 62^7 ≈ 3.5 trillion URLs
+
+Collision handling: none needed — IDs are guaranteed unique by construction.
+
+─────────────────────────────────────────────────────────────
+OPERATIONAL CONCERNS
+─────────────────────────────────────────────────────────────
+- Cache eviction: LRU, 24-hour TTL for hot entries
+- DB sharding: shard by short_code hash when single primary exceeds 10TB
+- Rate limiting: 100 shortens/hour per IP via token bucket
+- Monitoring: p99 redirect latency, cache hit rate, DB replication lag
+```
+
+Key improvements:
+- Back-of-envelope estimation (116K reads/sec, 182TB, 500MB cache) validates that a single Redis node suffices and that DB sharding is a future concern, not a day-one requirement (Ch 2: Estimation before design)
+- The 4-step framework structures the proposal — scope before design, estimation before components, component overview before deep dive (Ch 3: 4-step framework)
+- API contract is defined explicitly before implementation — POST/GET endpoints with request/response shapes (Ch 3: High-level design)
+- Snowflake ID generation replaces UUID truncation with a correct solution: no collisions at 100M/day scale (Ch 7: Unique ID generation)
+- Read replicas serve the 100:1 read-heavy workload; Redis caches the hot 20% to keep redirect latency under 50ms (Ch 1: Caching and replication)
+- 301 vs 302 redirect choice is a conscious trade-off: 301 reduces server load, 302 enables analytics — stated explicitly (Ch 8: URL Shortener)
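The Base62 step in the deep dive can be sketched as below. One wrinkle worth noting: a full 64-bit Snowflake value can need up to 11 Base62 characters, so fixed 7-character codes implicitly assume the ID space stays below 62^7 ≈ 3.5 trillion (for example, a counter-backed ID). The alphabet ordering here is an illustrative choice; any fixed permutation of the 62 characters works as long as encode and decode agree.

```python
import string

# 0-9, a-z, A-Z: 62 symbols. The ordering is arbitrary but must be fixed.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))


def base62_decode(s: str) -> int:
    """Invert base62_encode."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

A 7-character code space holds 62**7 = 3,521,614,606,208 distinct values, comfortably above the design's 10-year estimate of 365B URLs.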
@@ -0,0 +1,27 @@
+# Before
+
+A system design proposal for a URL shortener that jumps straight to implementation details without any capacity estimation, API definition, or structured component breakdown.
+
+```
+DESIGN PROPOSAL: URL Shortener Service
+
+Let's just use a database with two columns: short_code and original_url.
+When someone hits our endpoint, we look up the short_code and redirect.
+
+We can use MySQL. The table will look like:
+
+CREATE TABLE urls (
+  short_code VARCHAR(8) PRIMARY KEY,
+  original_url TEXT NOT NULL
+);
+
+For generating short codes, we'll use UUID and take the first 8 characters.
+If there's a collision, just retry.
+
+For the web layer, we'll run a Flask app on a single server. If it gets slow
+we can add more Flask instances behind a load balancer later.
+
+Caching: we could add Redis if needed.
+
+That should work fine for our use case. Let's start coding.
+```
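The "UUID first 8 characters, retry on collision" scheme in the Before example degrades quickly, and a birthday-problem sketch shows why. It assumes the first 8 characters of a UUID string are hexadecimal (as in the standard representation), giving a 16^8 key space:

```python
import math

KEYSPACE = 16 ** 8  # 8 hex chars ≈ 4.3 billion possible codes


def collision_probability(n: int, space: int = KEYSPACE) -> float:
    """Birthday approximation: P(at least one collision) ≈ 1 - exp(-n^2 / 2N)."""
    return 1 - math.exp(-(n * n) / (2 * space))


# Collisions become routine long before the key space fills up:
print(collision_probability(100_000))    # ~0.69 after just 100K URLs
print(collision_probability(1_000_000))  # effectively 1.0 after 1M URLs
```

At the After example's scale of 100M URLs per day, retries would dominate almost immediately, which is why the reviewed design moves to IDs that are unique by construction.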