@pencil-agent/nano-pencil 2.0.1 → 2.0.3
- package/README.md +267 -267
- package/dist/build-meta.json +3 -3
- package/dist/core/export-html/AGENT.md +11 -11
- package/dist/core/export-html/template.css +971 -971
- package/dist/core/export-html/template.html +54 -54
- package/dist/core/model/custom-providers.js +1 -1
- package/dist/core/model-registry.js +5 -5
- package/dist/extensions/builtin/AGENT.md +115 -115
- package/dist/extensions/builtin/browser/AGENT.md +17 -17
- package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
- package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
- package/dist/extensions/builtin/browser/browser.md +73 -73
- package/dist/extensions/builtin/browser/install.md +142 -142
- package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
- package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
- package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
- package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
- package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
- package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
- package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
- package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
- package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
- package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
- package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
- package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
- package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
- package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
- package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
- package/dist/extensions/builtin/debug/index.js +9 -9
- package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
- package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
- package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
- package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
- package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
- package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
- package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
- package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
- package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
- package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
- package/dist/extensions/builtin/goal/README.md +67 -67
- package/dist/extensions/builtin/goal/index.js +6 -6
- package/dist/extensions/builtin/grub/README.md +112 -112
- package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
- package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
- package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
- package/dist/extensions/builtin/link-world/linkworld.md +313 -313
- package/dist/extensions/builtin/link-world/network-routing/network-routing.md +67 -67
- package/dist/extensions/builtin/loop/README.md +92 -92
- package/dist/extensions/builtin/mcp/figma-design.md +68 -68
- package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
- package/dist/extensions/builtin/recap/AGENT.md +15 -15
- package/dist/extensions/builtin/sal/README.md +72 -72
- package/dist/extensions/builtin/security-audit/README.md +289 -289
- package/dist/extensions/builtin/team/AGENT.md +112 -112
- package/dist/extensions/builtin/team/TESTING.md +299 -299
- package/dist/extensions/builtin/token-save/README.md +56 -56
- package/dist/extensions/optional/AGENT.md +10 -10
- package/dist/modes/interactive/controllers/input-submit-controller.js +2 -2
- package/dist/modes/interactive/controllers/stream-render-controller.js +2 -2
- package/dist/modes/interactive/interactive-mode.js +19 -19
- package/dist/modes/interactive/theme/dark.json +85 -85
- package/dist/modes/interactive/theme/light.json +84 -84
- package/dist/modes/interactive/theme/theme-schema.json +335 -335
- package/dist/modes/interactive/theme/warm.json +81 -81
- package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
- package/dist/node_modules/@pencil-agent/ai/dist/models.generated.js +1 -1
- package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +851 -0
- package/docs/SDK-TESTING.md +364 -0
- package/docs/codex-goal-command-impl.md +1055 -1055
- package/docs/codex-goal-vs-grub.md +500 -500
- package/docs/custom-provider.md +27 -27
- package/docs/extensions.md +27 -27
- package/docs/keybindings.md +27 -27
- package/docs/loop 重构完成总结.md +250 -250
- package/docs/loop 重构完成报告.md +122 -122
- package/docs/loop 重构方案.md +1222 -1222
- package/docs/loop 重构方案实现报告.md +158 -158
- package/docs/loop 重构方案对比分析.md +128 -128
- package/docs/loop 重构计划.md +320 -320
- package/docs/loop-usage-examples.md +214 -214
- package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +593 -0
- package/docs/models.md +27 -27
- package/docs/packages.md +27 -27
- package/docs/pi-design-philosophy.md +457 -457
- package/docs/planmode.md +1987 -1987
- package/docs/prompt-templates.md +27 -27
- package/docs/providers.md +27 -27
- package/docs/sdk.md +27 -27
- package/docs/skills.md +27 -27
- package/docs/startup-performance-optimization.md +301 -0
- package/docs/themes.md +27 -27
- package/docs/tui.md +27 -27
- package/docs/认知地图.md +47 -0
- package/package.json +190 -190
- package/docs/cc-agent-design.md +0 -1297
- package/docs/cc-tui-design.md +0 -1333
- package/docs/nanoPencil-学习计划.md +0 -170
- package/docs/scan-report.md +0 -3820
- package/docs//345/257/271/346/240/207Claude-Code.md +0 -1775
- package/docs//351/230/277/351/207/214/345/267/264/345/267/264/350/264/242/346/212/245/345/210/206/346/236/220/344/271/246.md +0 -261
@@ -1,352 +1,352 @@
-# RAWG — Scraping & Data Extraction
-
-Field-tested against rawg.io on 2026-04-18.
-`https://rawg.io` — world's largest video game database with 500K+ games.
-
----
-
-## API status — key required, no workaround
-
-`https://api.rawg.io/api/` requires a valid API key on every request.
-Empty key, dummy key, and header spoofing all return **HTTP 401**. Confirmed:
-
-```
-api.rawg.io/api/games?page_size=5 -> 401
-api.rawg.io/api/games?page_size=5&key= -> 401
-api.rawg.io/api/games?page_size=5&key=DEMO -> 401
-rawg.io/api/games?page_size=5 -> 401
-# Referer/Origin headers make no difference
-```
-
-Free API keys are available at `https://rawg.io/apidocs` after signing up at
-`https://rawg.io/signup` (no credit card, ~1 minute). Free tier: **20,000 requests/month**.
-Set the key in `.env` as `RAWG_API_KEY=<your_key>`.
-
----
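
The key handling described above can be sketched as follows. This is a minimal illustration and not part of the package: `rawg_url` is a hypothetical helper, and it assumes `RAWG_API_KEY` has already been exported into the process environment (for example by whatever loads `.env`).

```python
import os
from urllib.parse import urlencode

API_BASE = "https://api.rawg.io/api/"  # base URL taken from the notes above


def rawg_url(endpoint, key=None, **params):
    """Build an API URL with the key appended (hypothetical helper)."""
    key = key or os.environ.get("RAWG_API_KEY")
    if not key:
        # The notes report HTTP 401 for missing or empty keys, so fail early.
        raise RuntimeError("RAWG_API_KEY is not set")
    query = urlencode({**params, "key": key})
    return f"{API_BASE}{endpoint.strip('/')}?{query}"
```

Failing before the request is sent avoids spending a call from the monthly quota on a response that is known to be a 401.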
-
-## Approach 1 (Fastest, no key): HTML scraping via `window.CLIENT_PARAMS`
-
-The website server-renders all game data into `window.CLIENT_PARAMS` in the page HTML.
-One `http_get` call, pure JSON parse, no browser required.
-Confirmed working on all tested game pages.
-
-### Single game page
-
-```python
-import json
-from helpers import http_get
-
-def extract_game(slug):
-    """
-    Fetch full game data from rawg.io/games/{slug}.
-    Handles canonical-slug redirects (e.g. 'disco-elysium-the-final-cut'
-    transparently becomes 'disco-elysium-final-cut').
-    Returns game dict or None.
-    """
-    resp = http_get(f"https://rawg.io/games/{slug}")
-    idx = resp.find('window.CLIENT_PARAMS = {')
-    if idx < 0:
-        return None
-    chunk = resp[idx + len('window.CLIENT_PARAMS = '):]
-    # Extract JSON by counting braces
-    depth, end = 0, 0
-    for i, c in enumerate(chunk):
-        if c == '{': depth += 1
-        elif c == '}':
-            depth -= 1
-            if depth == 0:
-                end = i + 1
-                break
-    params = json.loads(chunk[:end])
-    initial_state = params['initialState']
-    entities = initial_state['entities']
-    games = entities.get('games', {})
-    # game.slug has 'g-' prefix and reflects the canonical slug after any redirect
-    canonical_key = initial_state.get('game', {}).get('slug', '')
-    game = games.get(canonical_key)
-    if not game:
-        game = games.get(f'g-{slug}')
-    if not game:
-        for g in games.values():
-            if isinstance(g, dict) and g.get('slug') == slug:
-                return g
-    return game
-
-game = extract_game('the-witcher-3-wild-hunt')
-# All fields confirmed present:
-# game['name'] -> 'The Witcher 3: Wild Hunt'
-# game['id'] -> 3328
-# game['slug'] -> 'the-witcher-3-wild-hunt'
-# game['rating'] -> 4.64 (RAWG community score, 0-5)
-# game['rating_top'] -> 5
-# game['ratings_count'] -> 7184
-# game['metacritic'] -> 92 (None if no score)
-# game['released'] -> '2015-05-18'
-# game['updated'] -> '2026-04-17T23:18:04'
-# game['playtime'] -> 43 (average hours)
-# game['website'] -> 'https://thewitcher.com/en/witcher3'
-# game['background_image'] -> 'https://media.rawg.io/media/games/618/618c2031a07bbff6b4f611f10b6bcdbc.jpg'
-# game['added'] -> 22198 (count of users who added to library)
-# game['esrb_rating'] -> {'id': 4, 'name': 'Mature', 'slug': 'mature'}
-# game['genres'] -> [{'id': 4, 'name': 'Action', 'slug': 'action'}, ...]
-# game['platforms'] -> ['playstation5', 'xbox-series-x', 'pc', ...] (slugs, cross-ref entities)
-# game['parent_platforms'] -> ['pc', 'playstation', 'xbox', 'mac', 'nintendo']
-# game['developers'] -> [{'id': 9023, 'name': 'CD PROJEKT RED', 'slug': '...'}]
-# game['publishers'] -> [{'id': 7411, 'name': 'CD PROJEKT RED', 'slug': '...'}]
-# game['tags'] -> [{'id': 31, 'name': 'Singleplayer', ...}, ...]
-# game['description_raw'] -> plain-text description (detail page only)
-# game['description'] -> HTML description
-# game['ratings'] -> [{'title': 'exceptional', 'percent': 76.53}, ...]
-# game['metacritic_platforms'] -> [{'metascore': 93, 'platform': {...}}, ...]
-```
-
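
The brace-counting loop above does not track string literals, so a `{` or `}` inside a JSON string value (a description, say) could end the slice too early or too late. One alternative, offered as a sketch rather than as what the package does, is to let the standard library find the end of the object with `json.JSONDecoder.raw_decode`. `extract_client_params` is a hypothetical name; the marker string is the one used in the snippet above.

```python
import json

MARKER = "window.CLIENT_PARAMS = "


def extract_client_params(html):
    """Return the CLIENT_PARAMS object embedded in a page, or None."""
    idx = html.find(MARKER)
    if idx < 0:
        return None
    start = idx + len(MARKER)
    try:
        # raw_decode parses one JSON value starting at `start` and ignores
        # whatever follows it, so braces inside strings are handled.
        obj, _end = json.JSONDecoder().raw_decode(html, start)
    except json.JSONDecodeError:
        return None
    return obj
```

Because the parser decides where the object ends, the trailing `;` and the rest of the script tag never need to be sliced off by hand.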
-### Extract specific fields
-
-```python
-def game_summary(slug):
-    g = extract_game(slug)
-    if not g:
-        return None
-    return {
-        'id': g['id'],
-        'name': g['name'],
-        'slug': g['slug'],
-        'rating': g['rating'],
-        'metacritic': g['metacritic'],
-        'released': g['released'],
-        'playtime_hrs': g['playtime'],
-        'website': g.get('website'),
-        'esrb': (g.get('esrb_rating') or {}).get('name'),
-        'genres': [ge['name'] for ge in g.get('genres', []) if isinstance(ge, dict)],
-        'platforms': g.get('parent_platforms', []),
-        'developers': [d['name'] for d in g.get('developers', []) if isinstance(d, dict)],
-        'publishers': [p['name'] for p in g.get('publishers', []) if isinstance(p, dict)],
-        'tags': [t['name'] for t in g.get('tags', []) if isinstance(t, dict)][:10],
-        'image': g.get('background_image'),
-    }
-
-# Confirmed results:
-print(game_summary('red-dead-redemption-2'))
-# {'id': 28, 'name': 'Red Dead Redemption 2', 'rating': 4.59, 'metacritic': 96,
-# 'released': '2018-10-26', 'playtime_hrs': 21,
-# 'esrb': 'Mature',
-# 'genres': ['Action'],
-# 'platforms': ['pc', 'playstation', 'xbox'],
-# 'developers': ['Rockstar Games'], 'publishers': ['Rockstar Games'],
-# 'tags': ['Singleplayer', 'Multiplayer', 'Atmospheric', 'Great Soundtrack', 'Co-op', ...]}
-```
-
-### Top 40 games from the listing page
-
-The listing page always returns the same ~40 popular games regardless of URL params
-(ordering/search/genres params are client-side only — the server returns the same SSR payload).
-
-```python
-def top_games():
-    """Returns list of 40 game dicts from rawg.io/games listing page."""
-    resp = http_get("https://rawg.io/games")
-    idx = resp.find('window.CLIENT_PARAMS = {')
-    if idx < 0:
-        return []
-    chunk = resp[idx + len('window.CLIENT_PARAMS = '):]
-    depth, end = 0, 0
-    for i, c in enumerate(chunk):
-        if c == '{': depth += 1
-        elif c == '}':
-            depth -= 1
-            if depth == 0:
-                end = i + 1
-                break
-    params = json.loads(chunk[:end])
-    return list(params['initialState']['entities'].get('games', {}).values())
-
-games = top_games()
-# 40 games, each with: id, slug, name, released, rating, rating_top, ratings_count,
-# metacritic, playtime, added, genres (full objects), parent_platforms (slugs),
-# platforms (slugs), tags (full objects), esrb_rating, background_image, short_screenshots
-# NOTE: listing omits description, website, developers, publishers vs detail pages
-
-for g in games[:5]:
-    print(f"{g['name']} | rating={g['rating']} | metacritic={g['metacritic']}")
-# Grand Theft Auto V | rating=4.47 | metacritic=92
-# The Witcher 3: Wild Hunt | rating=4.64 | metacritic=92
-# Portal 2 | rating=4.58 | metacritic=95
-# Counter-Strike: Global Offensive | rating=3.57 | metacritic=81
-# Tomb Raider (2013) | rating=4.06 | metacritic=86
-```
-
-### Bulk / concurrent fetching
-
-```python
-from concurrent.futures import ThreadPoolExecutor
-
-slugs = ['portal-2', 'dark-souls-iii', 'minecraft', 'hades', 'celeste']
-with ThreadPoolExecutor(max_workers=3) as ex:
-    results = list(ex.map(extract_game, slugs))
-# Tested: 4 games in ~2.8s at max_workers=4
-# Occasional timeout at high concurrency — keep max_workers<=3 to stay reliable
-```
-
----
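
The reliability note in the snippet above (keep `max_workers` at 3 or below, expect occasional timeouts) suggests wrapping each fetch in a small retry. A minimal sketch, assuming the fetch function raises an exception on a timeout or a 502; `fetch_all` and its parameters are illustrative names, not part of the package.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetch, slugs, max_workers=3, retries=2, delay=0.5):
    """Map `fetch` over slugs with bounded concurrency and simple retries."""

    def attempt(slug):
        for n in range(retries + 1):
            try:
                return fetch(slug)
            except Exception:
                if n == retries:
                    return None  # give up on this slug, keep the others
                time.sleep(delay)

    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        # ex.map preserves input order, so results line up with slugs.
        return dict(zip(slugs, ex.map(attempt, slugs)))
```

Passing `extract_game` as `fetch` would give a slug-to-game mapping in which failed slugs map to `None` instead of aborting the whole batch.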
-
-## Approach 2: REST API (requires free key)
-
-All endpoints live at `https://api.rawg.io/api/`. Append `&key=YOUR_API_KEY` to every request.
-
-### Get a free key
-
-1. Sign up at `https://rawg.io/signup`
-2. Visit `https://rawg.io/apidocs` — click "Get API key"
-3. The key is a 40-char hex string
-4. Store as `RAWG_API_KEY` in `.env`
-
-### Games list / search
-
-```python
-import json, os
-from helpers import http_get
-
-KEY = os.environ['RAWG_API_KEY']
-
-# Search
-results = json.loads(http_get(
-    f"https://api.rawg.io/api/games?search=witcher&page_size=5&key={KEY}"
-))
-# results['count'] -> total matching games
-# results['next'] -> next page URL (pagination)
-# results['results'] -> list of game objects
-
-# Top-rated
-top = json.loads(http_get(
-    f"https://api.rawg.io/api/games?ordering=-metacritic&page_size=10&key={KEY}"
-))
-
-# By date range
-recent = json.loads(http_get(
-    f"https://api.rawg.io/api/games?dates=2024-01-01,2024-12-31&ordering=-added&page_size=20&key={KEY}"
-))
-
-# By platform (PC=4, PS4=18, Xbox One=1, Switch=7)
-pc_games = json.loads(http_get(
-    f"https://api.rawg.io/api/games?platforms=4&ordering=-rating&page_size=10&key={KEY}"
-))
-```
-
-### Game detail
-
-```python
-# By ID (faster if you have it)
-game = json.loads(http_get(f"https://api.rawg.io/api/games/3328?key={KEY}"))
-# game['name'], game['rating'], game['metacritic'], game['description_raw'], ...
-
-# By slug
-game = json.loads(http_get(
-    f"https://api.rawg.io/api/games/the-witcher-3-wild-hunt?key={KEY}"
-))
-```
-
-### API response fields (same as HTML scraping)
-
-```
-id, slug, name, released, tba, background_image,
-rating (0-5 RAWG community), rating_top, ratings, ratings_count,
-metacritic, playtime, added, added_by_status,
-platforms (list of {platform:{id,name,slug}, released_at}),
-parent_platforms (list of {platform:{id,name,slug}}),
-genres (list of {id,name,slug}),
-tags (list of {id,name,slug,language,games_count}),
-developers (list of {id,name,slug}),
-publishers (list of {id,name,slug}),
-stores (list of {id,store:{id,name,slug}}),
-esrb_rating ({id,name,slug}),
-website, description_raw, description, screenshots_count,
-movies_count, creators_count, achievements_count,
-metacritic_url, metacritic_platforms
-```
-
-### Platforms and Genres lists
-
-```python
-# All platforms
-platforms = json.loads(http_get(f"https://api.rawg.io/api/platforms?key={KEY}"))
-# results: [{id, name, slug, games_count, year_start, year_end, ...}]
-
-# Parent platforms only
-parents = json.loads(http_get(f"https://api.rawg.io/api/platforms/lists/parents?key={KEY}"))
-
-# Genres
-genres = json.loads(http_get(f"https://api.rawg.io/api/genres?key={KEY}"))
-# results: [{id, name, slug, games_count, image_background}]
-```
-
-### Pagination
-
-```python
-def get_all_pages(url_template, max_pages=5):
-    """Paginate through API results."""
-    results = []
-    url = url_template + "&page=1"
-    for _ in range(max_pages):
-        data = json.loads(http_get(url))
-        results.extend(data.get('results', []))
-        if not data.get('next'):
-            break
-        url = data['next']
-    return results
-```
-
----
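
The loop above follows the `next` URL until it is empty or `max_pages` is reached. The same idea can be written as a generator that yields items lazily. In this sketch `fetch_json` is a hypothetical callable standing in for `json.loads(http_get(url))`, so the logic can be exercised without network access.

```python
def iter_results(first_url, fetch_json, max_pages=5):
    """Yield items page by page, following `next` links."""
    url = first_url
    for _ in range(max_pages):
        data = fetch_json(url)
        yield from data.get("results", [])
        url = data.get("next")
        if not url:
            break
```

A caller that only needs the first few matches can stop iterating early and avoid spending requests from the monthly quota on pages it never reads.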
-
-## Gotchas
-
-- **API is fully blocked without a key** — `401` for every endpoint, including empty key and
-`rawg.io/api/` (non-`api.rawg.io` subdomain). No auth bypass exists.
-
-- **URL params are client-side on listing pages** — `rawg.io/games?ordering=-rating` and
-`rawg.io/games?search=witcher` return identical 40-game SSR payloads. Params only affect
-the React client after hydration. Use the API for real filtering, or scrape individual game
-pages by slug.
-
-- **Slug canonical redirects** — Some slugs redirect internally:
-`disco-elysium-the-final-cut` → `disco-elysium-final-cut`. The URL you fetch returns HTTP 200
-but the routing state inside `CLIENT_PARAMS` reflects the canonical path. Always use
-`initial_state['game']['slug']` as the lookup key (it already has the `g-` prefix),
-not a constructed `'g-' + url_slug`.
-
-- **`g-` prefix on entity keys** — Game entities in `CLIENT_PARAMS.initialState.entities.games`
-are keyed as `g-{slug}` (e.g. `g-the-witcher-3-wild-hunt`), not bare slugs.
-The game state slug field also carries this prefix: `{'slug': 'g-the-witcher-3-wild-hunt'}`.
-
-- **Listing page gives 40 games, detail pages give full fields** — `description`, `website`,
-`developers`, `publishers` are absent from the listing page payload. Only present on
-individual game pages.
-
-- **Concurrent requests: keep max_workers ≤ 3** — At `max_workers=5` with 10 requests,
-some pages timed out (20s default) or returned 502. Sequential or 3-worker parallel is
-reliable. A brief `time.sleep(0.5)` between sequential requests avoids 502 spikes.
-
-- **`platforms` field in game entities uses slugs, `platform_entities` has full objects** —
-In the HTML payload, `game['platforms']` is a list of slug strings
-(`['playstation5', 'pc', ...]`). Full platform details live in
-`entities.platforms[slug]` as `{'platform': {id, name, slug, ...}, 'released_at': '...'}`.
-`game['parent_platforms']` is also a list of slug strings (`['pc', 'playstation', ...]`).
-
-- **`metacritic` is `None` for games without a score** — Always check `if game['metacritic']`
-before using. Many indie/older games have no Metacritic score.
-
-- **`esrb_rating` is `None` for non-US-rated games** — Common for Japanese games and
-anything outside the ESRB's jurisdiction.
-
-- **`god-of-war` slug resolves to God of War I (PS2, 2005)** — The PS4 2018 title uses
-`god-of-war-4` or has its own entity key. Always verify the game name in the response.
-
-- **Free API tier: 20,000 requests/month** — Roughly 650/day. Listing endpoint returns
-20 results per page by default (max `page_size=40`). For bulk data collection, the HTML
-scraping approach has no documented rate limit but times out under heavy parallel load.
-
-- **`description_raw` vs `description`** — `description` is HTML with escaped unicode
-(`\u003C` = `<`). `description_raw` is plain text, easier to work with. Both present
-on detail pages only.
-
-- **`updated` field reflects last RAWG edit, not release date** — Use `released` for
-the release date. `updated` changes frequently as the community edits entries.
|
# RAWG: Scraping & Data Extraction

Field-tested against rawg.io on 2026-04-18.
`https://rawg.io`: world's largest video game database with 500K+ games.

---

## API status: key required, no workaround

`https://api.rawg.io/api/` requires a valid API key on every request.
Empty key, dummy key, and header spoofing all return **HTTP 401**. Confirmed:

```
api.rawg.io/api/games?page_size=5           -> 401
api.rawg.io/api/games?page_size=5&key=      -> 401
api.rawg.io/api/games?page_size=5&key=DEMO  -> 401
rawg.io/api/games?page_size=5               -> 401
# Referer/Origin headers make no difference
```

Free API keys are available at `https://rawg.io/apidocs` after signing up at
`https://rawg.io/signup` (no credit card, ~1 minute). Free tier: **20,000 requests/month**.
Set the key in `.env` as `RAWG_API_KEY=<your_key>`.

---

## Approach 1 (Fastest, no key): HTML scraping via `window.CLIENT_PARAMS`

The website server-renders all game data into `window.CLIENT_PARAMS` in the page HTML.
One `http_get` call, pure JSON parse, no browser required.
Confirmed working on all tested game pages.

### Single game page

```python
import json
from helpers import http_get

def extract_game(slug):
    """
    Fetch full game data from rawg.io/games/{slug}.
    Handles canonical-slug redirects (e.g. 'disco-elysium-the-final-cut'
    transparently becomes 'disco-elysium-final-cut').
    Returns game dict or None.
    """
    resp = http_get(f"https://rawg.io/games/{slug}")
    idx = resp.find('window.CLIENT_PARAMS = {')
    if idx < 0:
        return None
    chunk = resp[idx + len('window.CLIENT_PARAMS = '):]
    # Extract JSON by counting braces (assumes any braces inside string values are balanced)
    depth, end = 0, 0
    for i, c in enumerate(chunk):
        if c == '{':
            depth += 1
        elif c == '}':
            depth -= 1
            if depth == 0:
                end = i + 1
                break
    params = json.loads(chunk[:end])
    initial_state = params['initialState']
    entities = initial_state['entities']
    games = entities.get('games', {})
    # game.slug has 'g-' prefix and reflects the canonical slug after any redirect
    canonical_key = initial_state.get('game', {}).get('slug', '')
    game = games.get(canonical_key)
    if not game:
        game = games.get(f'g-{slug}')
    if not game:
        for g in games.values():
            if isinstance(g, dict) and g.get('slug') == slug:
                return g
    return game

game = extract_game('the-witcher-3-wild-hunt')
# All fields confirmed present:
# game['name'] -> 'The Witcher 3: Wild Hunt'
# game['id'] -> 3328
# game['slug'] -> 'the-witcher-3-wild-hunt'
# game['rating'] -> 4.64 (RAWG community score, 0-5)
# game['rating_top'] -> 5
# game['ratings_count'] -> 7184
# game['metacritic'] -> 92 (None if no score)
# game['released'] -> '2015-05-18'
# game['updated'] -> '2026-04-17T23:18:04'
# game['playtime'] -> 43 (average hours)
# game['website'] -> 'https://thewitcher.com/en/witcher3'
# game['background_image'] -> 'https://media.rawg.io/media/games/618/618c2031a07bbff6b4f611f10b6bcdbc.jpg'
# game['added'] -> 22198 (count of users who added to library)
# game['esrb_rating'] -> {'id': 4, 'name': 'Mature', 'slug': 'mature'}
# game['genres'] -> [{'id': 4, 'name': 'Action', 'slug': 'action'}, ...]
# game['platforms'] -> ['playstation5', 'xbox-series-x', 'pc', ...] (slugs, cross-ref entities)
# game['parent_platforms'] -> ['pc', 'playstation', 'xbox', 'mac', 'nintendo']
# game['developers'] -> [{'id': 9023, 'name': 'CD PROJEKT RED', 'slug': '...'}]
# game['publishers'] -> [{'id': 7411, 'name': 'CD PROJEKT RED', 'slug': '...'}]
# game['tags'] -> [{'id': 31, 'name': 'Singleplayer', ...}, ...]
# game['description_raw'] -> plain-text description (detail page only)
# game['description'] -> HTML description
# game['ratings'] -> [{'title': 'exceptional', 'percent': 76.53}, ...]
# game['metacritic_platforms'] -> [{'metascore': 93, 'platform': {...}}, ...]
```
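
Brace counting can miscount when a string value in the payload contains an unbalanced `{` or `}`. A minimal alternative sketch, assuming the payload is valid JSON (as the `json.loads` call above implies), lets the standard library's `json.JSONDecoder.raw_decode` find the end of the object. The helper name `parse_client_params` and the sample HTML are hypothetical and exist only for illustration.

```python
import json

MARKER = 'window.CLIENT_PARAMS = '

def parse_client_params(html):
    """Return the CLIENT_PARAMS object embedded in page HTML, or None.

    Hypothetical helper: raw_decode parses one JSON value starting at the
    given index and ignores whatever follows it, so braces inside string
    values cannot confuse the extraction.
    """
    idx = html.find(MARKER + '{')
    if idx < 0:
        return None
    try:
        params, _end = json.JSONDecoder().raw_decode(html, idx + len(MARKER))
    except json.JSONDecodeError:
        return None
    return params

# Made-up page fragment with a stray '}' inside a string value
sample = '<script>window.CLIENT_PARAMS = {"initialState": {"note": "a } b"}};</script>'
print(parse_client_params(sample))  # {'initialState': {'note': 'a } b'}}
```

The returned dict can be used in place of `params` in `extract_game`; the entity lookup logic stays the same.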

### Extract specific fields

```python
def game_summary(slug):
    g = extract_game(slug)
    if not g:
        return None
    return {
        'id': g['id'],
        'name': g['name'],
        'slug': g['slug'],
        'rating': g['rating'],
        'metacritic': g['metacritic'],
        'released': g['released'],
        'playtime_hrs': g['playtime'],
        'website': g.get('website'),
        'esrb': (g.get('esrb_rating') or {}).get('name'),
        'genres': [ge['name'] for ge in g.get('genres', []) if isinstance(ge, dict)],
        'platforms': g.get('parent_platforms', []),
        'developers': [d['name'] for d in g.get('developers', []) if isinstance(d, dict)],
        'publishers': [p['name'] for p in g.get('publishers', []) if isinstance(p, dict)],
        'tags': [t['name'] for t in g.get('tags', []) if isinstance(t, dict)][:10],
        'image': g.get('background_image'),
    }

# Confirmed results (abridged):
print(game_summary('red-dead-redemption-2'))
# {'id': 28, 'name': 'Red Dead Redemption 2', 'rating': 4.59, 'metacritic': 96,
#  'released': '2018-10-26', 'playtime_hrs': 21,
#  'esrb': 'Mature',
#  'genres': ['Action'],
#  'platforms': ['pc', 'playstation', 'xbox'],
#  'developers': ['Rockstar Games'], 'publishers': ['Rockstar Games'],
#  'tags': ['Singleplayer', 'Multiplayer', 'Atmospheric', 'Great Soundtrack', 'Co-op', ...]}
```

### Top 40 games from the listing page

The listing page always returns the same ~40 popular games regardless of URL params
(ordering/search/genres params are client-side only; the server returns the same SSR payload).

```python
def top_games():
    """Returns list of 40 game dicts from rawg.io/games listing page."""
    resp = http_get("https://rawg.io/games")
    idx = resp.find('window.CLIENT_PARAMS = {')
    if idx < 0:
        return []
    chunk = resp[idx + len('window.CLIENT_PARAMS = '):]
    depth, end = 0, 0
    for i, c in enumerate(chunk):
        if c == '{':
            depth += 1
        elif c == '}':
            depth -= 1
            if depth == 0:
                end = i + 1
                break
    params = json.loads(chunk[:end])
    return list(params['initialState']['entities'].get('games', {}).values())

games = top_games()
# 40 games, each with: id, slug, name, released, rating, rating_top, ratings_count,
# metacritic, playtime, added, genres (full objects), parent_platforms (slugs),
# platforms (slugs), tags (full objects), esrb_rating, background_image, short_screenshots
# NOTE: listing omits description, website, developers, publishers vs detail pages

for g in games[:5]:
    print(f"{g['name']} | rating={g['rating']} | metacritic={g['metacritic']}")
# Grand Theft Auto V | rating=4.47 | metacritic=92
# The Witcher 3: Wild Hunt | rating=4.64 | metacritic=92
# Portal 2 | rating=4.58 | metacritic=95
# Counter-Strike: Global Offensive | rating=3.57 | metacritic=81
# Tomb Raider (2013) | rating=4.06 | metacritic=86
```
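
Because the listing payload cannot be filtered server-side, any ordering or genre filter has to happen after parsing. A small sketch, assuming game dicts shaped like the ones `top_games()` returns; the helper name `rank_by_metacritic` and the sample records are made up for illustration. Games whose `metacritic` is `None` are dropped.

```python
def rank_by_metacritic(games, genre_slug=None, limit=10):
    """Sort listing-page game dicts by Metacritic score, highest first.

    Hypothetical helper. Skips games without a score and, if genre_slug is
    given, keeps only games that have a genre with that slug.
    """
    scored = []
    for g in games:
        if g.get('metacritic') is None:
            continue
        if genre_slug is not None:
            slugs = {ge.get('slug') for ge in g.get('genres', []) if isinstance(ge, dict)}
            if genre_slug not in slugs:
                continue
        scored.append(g)
    scored.sort(key=lambda g: g['metacritic'], reverse=True)
    return scored[:limit]

# Made-up records in the same shape as the listing payload
sample_games = [
    {'name': 'A', 'metacritic': 81, 'genres': [{'slug': 'action'}]},
    {'name': 'B', 'metacritic': None, 'genres': [{'slug': 'action'}]},
    {'name': 'C', 'metacritic': 95, 'genres': [{'slug': 'puzzle'}]},
]
print([g['name'] for g in rank_by_metacritic(sample_games)])  # ['C', 'A']
```

This only reorders the ~40 games the page already contains; filtering across the whole catalogue still needs the API.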

### Bulk / concurrent fetching

```python
from concurrent.futures import ThreadPoolExecutor

slugs = ['portal-2', 'dark-souls-iii', 'minecraft', 'hades', 'celeste']
with ThreadPoolExecutor(max_workers=3) as ex:
    results = list(ex.map(extract_game, slugs))
# Tested: 4 games in ~2.8s at max_workers=4
# Occasional timeout at high concurrency; keep max_workers<=3 to stay reliable
```
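
For longer slug lists, a sequential loop with a short pause and a retry is one way to stay under the load that produced timeouts in testing. The sketch below is an illustration under assumptions: `fetch_all`, its parameters, and the retry policy are hypothetical, and `fetch` stands in for `extract_game` or any callable that takes a slug and may raise. Which exceptions `http_get` raises is not documented here, so the sketch catches `Exception` broadly.

```python
import time

def fetch_all(slugs, fetch, delay=0.5, retries=2, sleep=time.sleep):
    """Fetch slugs one at a time, pausing between requests.

    Hypothetical helper. A failed call is retried up to `retries` times
    with a growing pause; slugs that still fail map to None.
    """
    results = {}
    for n, slug in enumerate(slugs):
        if n:
            sleep(delay)  # brief pause between sequential requests
        results[slug] = None
        for attempt in range(retries + 1):
            try:
                results[slug] = fetch(slug)
                break
            except Exception:
                if attempt < retries:
                    sleep(delay * (attempt + 2))  # back off before retrying
    return results
```

A call such as `fetch_all(slugs, extract_game)` would return a dict keyed by slug, with `None` for anything that never succeeded.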

---

## Approach 2: REST API (requires free key)

All endpoints live at `https://api.rawg.io/api/`. Pass `key=YOUR_API_KEY` as a query parameter on every request
(`?key=...` when it is the only parameter, `&key=...` after other parameters).

### Get a free key

1. Sign up at `https://rawg.io/signup`
2. Visit `https://rawg.io/apidocs` and click "Get API key"
3. The key is a 40-char hex string
4. Store as `RAWG_API_KEY` in `.env`

### Games list / search

```python
import json, os
from helpers import http_get

# Raises KeyError if RAWG_API_KEY is not set; load .env into the environment first
KEY = os.environ['RAWG_API_KEY']

# Search
results = json.loads(http_get(
    f"https://api.rawg.io/api/games?search=witcher&page_size=5&key={KEY}"
))
# results['count'] -> total matching games
# results['next'] -> next page URL (pagination)
# results['results'] -> list of game objects

# Top-rated
top = json.loads(http_get(
    f"https://api.rawg.io/api/games?ordering=-metacritic&page_size=10&key={KEY}"
))

# By date range
recent = json.loads(http_get(
    f"https://api.rawg.io/api/games?dates=2024-01-01,2024-12-31&ordering=-added&page_size=20&key={KEY}"
))

# By platform (PC=4, PS4=18, Xbox One=1, Switch=7)
pc_games = json.loads(http_get(
    f"https://api.rawg.io/api/games?platforms=4&ordering=-rating&page_size=10&key={KEY}"
))
```
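
Hand-built query strings make it easy to misplace the `key` parameter or to forget to escape a search term. A hedged sketch using the standard library's `urllib.parse.urlencode`; the helper name `rawg_url` is hypothetical, and the parameter names are the ones used in the examples above.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.rawg.io/api/"

def rawg_url(endpoint, key, **params):
    """Build an API URL with the key and any filters as query parameters.

    Hypothetical helper; parameters whose value is None are left out.
    """
    query = {k: v for k, v in params.items() if v is not None}
    query['key'] = key
    return f"{API_ROOT}{endpoint.lstrip('/')}?{urlencode(query)}"

print(rawg_url('games', 'YOUR_API_KEY', search='the witcher', page_size=5))
# https://api.rawg.io/api/games?search=the+witcher&page_size=5&key=YOUR_API_KEY
```

The resulting string can be passed straight to `http_get`, and because it always contains a query string it also suits helpers that append further `&...` parameters.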

### Game detail

```python
# By ID (faster if you have it)
game = json.loads(http_get(f"https://api.rawg.io/api/games/3328?key={KEY}"))
# game['name'], game['rating'], game['metacritic'], game['description_raw'], ...

# By slug
game = json.loads(http_get(
    f"https://api.rawg.io/api/games/the-witcher-3-wild-hunt?key={KEY}"
))
```

### API response fields (same as HTML scraping)

```
id, slug, name, released, tba, background_image,
rating (0-5 RAWG community), rating_top, ratings, ratings_count,
metacritic, playtime, added, added_by_status,
platforms (list of {platform:{id,name,slug}, released_at}),
parent_platforms (list of {platform:{id,name,slug}}),
genres (list of {id,name,slug}),
tags (list of {id,name,slug,language,games_count}),
developers (list of {id,name,slug}),
publishers (list of {id,name,slug}),
stores (list of {id,store:{id,name,slug}}),
esrb_rating ({id,name,slug}),
website, description_raw, description, screenshots_count,
movies_count, creators_count, achievements_count,
metacritic_url, metacritic_platforms
```

### Platforms and Genres lists

```python
# All platforms
platforms = json.loads(http_get(f"https://api.rawg.io/api/platforms?key={KEY}"))
# results: [{id, name, slug, games_count, year_start, year_end, ...}]

# Parent platforms only
parents = json.loads(http_get(f"https://api.rawg.io/api/platforms/lists/parents?key={KEY}"))

# Genres
genres = json.loads(http_get(f"https://api.rawg.io/api/genres?key={KEY}"))
# results: [{id, name, slug, games_count, image_background}]
```

### Pagination

```python
def get_all_pages(url_template, max_pages=5):
    """Paginate through API results.

    url_template must already contain a query string (e.g. '...?key=...'),
    because '&page=1' is appended to it.
    """
    results = []
    url = url_template + "&page=1"
    for _ in range(max_pages):
        data = json.loads(http_get(url))
        results.extend(data.get('results', []))
        if not data.get('next'):
            break
        url = data['next']
    return results
```

---

## Gotchas

- **API is fully blocked without a key.** `401` for every endpoint, including empty key and
  `rawg.io/api/` (non-`api.rawg.io` subdomain). No auth bypass exists.

- **URL params are client-side on listing pages.** `rawg.io/games?ordering=-rating` and
  `rawg.io/games?search=witcher` return identical 40-game SSR payloads. Params only affect
  the React client after hydration. Use the API for real filtering, or scrape individual game
  pages by slug.

- **Slug canonical redirects.** Some slugs redirect internally:
  `disco-elysium-the-final-cut` → `disco-elysium-final-cut`. The URL you fetch returns HTTP 200
  but the routing state inside `CLIENT_PARAMS` reflects the canonical path. Always use
  `initial_state['game']['slug']` as the lookup key (it already has the `g-` prefix),
  not a constructed `'g-' + url_slug`.

- **`g-` prefix on entity keys.** Game entities in `CLIENT_PARAMS.initialState.entities.games`
  are keyed as `g-{slug}` (e.g. `g-the-witcher-3-wild-hunt`), not bare slugs.
  The game state slug field also carries this prefix: `{'slug': 'g-the-witcher-3-wild-hunt'}`.

- **Listing page gives 40 games, detail pages give full fields.** `description`, `website`,
  `developers`, `publishers` are absent from the listing page payload. They are only present on
  individual game pages.

- **Concurrent requests: keep max_workers ≤ 3.** At `max_workers=5` with 10 requests,
  some pages timed out (20s default) or returned 502. Sequential or 3-worker parallel is
  reliable. A brief `time.sleep(0.5)` between sequential requests avoids 502 spikes.

- **`platforms` field in game entities uses slugs, `entities.platforms` has full objects.**
  In the HTML payload, `game['platforms']` is a list of slug strings
  (`['playstation5', 'pc', ...]`). Full platform details live in
  `entities.platforms[slug]` as `{'platform': {id, name, slug, ...}, 'released_at': '...'}`.
  `game['parent_platforms']` is also a list of slug strings (`['pc', 'playstation', ...]`).

- **`metacritic` is `None` for games without a score.** Always check `if game['metacritic']`
  before using. Many indie/older games have no Metacritic score.

- **`esrb_rating` is `None` for non-US-rated games.** Common for Japanese games and
  anything outside the ESRB's jurisdiction.

- **`god-of-war` slug resolves to God of War I (PS2, 2005).** The PS4 2018 title uses
  `god-of-war-4` or has its own entity key. Always verify the game name in the response.

- **Free API tier: 20,000 requests/month.** Roughly 650/day. Listing endpoint returns
  20 results per page by default (max `page_size=40`). For bulk data collection, the HTML
  scraping approach has no documented rate limit but times out under heavy parallel load.

- **`description_raw` vs `description`.** `description` is HTML with escaped unicode
  (`\u003C` = `<`). `description_raw` is plain text, easier to work with. Both are present
  on detail pages only.

- **`updated` field reflects last RAWG edit, not release date.** Use `released` for
  the release date. `updated` changes frequently as the community edits entries.
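
Several of the gotchas above come down to defensive lookups. The sketch below assumes the HTML payload layout described in this list (`game['platforms']` holding slugs, `entities['platforms'][slug]` holding `{'platform': {...}, 'released_at': ...}`); the helper names `platform_names` and `safe_fields`, the fallback strings, and the sample data are invented for illustration.

```python
def platform_names(game, entities):
    """Resolve a game's platform slugs to display names.

    Hypothetical helper. Falls back to the slug itself when the platform
    entity is missing.
    """
    lookup = entities.get('platforms', {})
    names = []
    for slug in game.get('platforms', []):
        entry = lookup.get(slug) or {}
        names.append((entry.get('platform') or {}).get('name', slug))
    return names

def safe_fields(game):
    """Read nullable fields without raising when they are None."""
    score = game.get('metacritic')
    return {
        'metacritic': score if score is not None else 'n/a',
        'esrb': (game.get('esrb_rating') or {}).get('name', 'unrated'),
    }

# Invented sample in the described shape
entities = {'platforms': {'pc': {'platform': {'id': 4, 'name': 'PC', 'slug': 'pc'},
                                 'released_at': '2015-05-18'}}}
game = {'platforms': ['pc', 'playstation5'], 'metacritic': None, 'esrb_rating': None}
print(platform_names(game, entities))  # ['PC', 'playstation5']
print(safe_fields(game))               # {'metacritic': 'n/a', 'esrb': 'unrated'}
```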