@pencil-agent/nano-pencil 2.0.0 → 2.0.1

Files changed (195)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/mcp/mcp-client.d.ts +3 -1
  7. package/dist/core/mcp/mcp-client.js +6 -6
  8. package/dist/core/mcp/mcp-config.d.ts +3 -3
  9. package/dist/core/mcp/mcp-config.js +1 -1
  10. package/dist/core/mcp/mcp-manager.d.ts +5 -1
  11. package/dist/core/mcp/mcp-manager.js +1 -1
  12. package/dist/core/platform/config/resource-loader.d.ts +2 -0
  13. package/dist/core/platform/config/resource-loader.js +2 -2
  14. package/dist/core/runtime/agent-session.d.ts +12 -0
  15. package/dist/core/runtime/agent-session.js +8 -8
  16. package/dist/core/runtime/sdk.d.ts +8 -0
  17. package/dist/core/runtime/sdk.js +1 -1
  18. package/dist/extensions/builtin/AGENT.md +115 -115
  19. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  20. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  91. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  92. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  93. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  94. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  95. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  96. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  97. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  98. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  99. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  100. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  101. package/dist/extensions/builtin/browser/browser.md +73 -73
  102. package/dist/extensions/builtin/browser/install.md +142 -142
  103. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  104. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  105. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  107. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  108. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  109. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  110. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  111. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  112. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  113. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  114. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  115. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  116. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  117. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  118. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  119. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  120. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  121. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  122. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  123. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  124. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  125. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  126. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  127. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  128. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  129. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  130. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  131. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  132. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  133. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  134. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  135. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  136. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  137. package/dist/extensions/builtin/goal/README.md +67 -67
  138. package/dist/extensions/builtin/grub/README.md +112 -112
  139. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  140. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  141. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  142. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  143. package/dist/extensions/builtin/link-world/network-routing/network-routing.md +67 -67
  144. package/dist/extensions/builtin/loop/README.md +92 -92
  145. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  146. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  147. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  148. package/dist/extensions/builtin/sal/README.md +72 -72
  149. package/dist/extensions/builtin/security-audit/README.md +289 -289
  150. package/dist/extensions/builtin/team/AGENT.md +112 -112
  151. package/dist/extensions/builtin/team/TESTING.md +299 -299
  152. package/dist/extensions/builtin/token-save/README.md +56 -56
  153. package/dist/extensions/optional/AGENT.md +10 -10
  154. package/dist/modes/interactive/interactive-mode.js +36 -36
  155. package/dist/modes/interactive/theme/dark.json +85 -85
  156. package/dist/modes/interactive/theme/light.json +84 -84
  157. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  158. package/dist/modes/interactive/theme/warm.json +81 -81
  159. package/dist/node_modules/@pencil-agent/agent-core/dist/agent-loop.js +3 -2
  160. package/dist/node_modules/@pencil-agent/agent-core/dist/structured-adaptive-agent-loop.js +2 -1
  161. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  162. package/docs/cc-agent-design.md +1297 -0
  163. package/docs/cc-tui-design.md +1333 -0
  164. package/docs/codex-goal-command-impl.md +1055 -1055
  165. package/docs/codex-goal-vs-grub.md +500 -500
  166. package/docs/custom-provider.md +27 -27
  167. package/docs/extensions.md +27 -27
  168. package/docs/keybindings.md +27 -27
  169. package/docs/loop 重构完成总结.md +250 -250
  170. package/docs/loop 重构完成报告.md +122 -122
  171. package/docs/loop 重构方案.md +1222 -1222
  172. package/docs/loop 重构方案实现报告.md +158 -158
  173. package/docs/loop 重构方案对比分析.md +128 -128
  174. package/docs/loop 重构计划.md +320 -320
  175. package/docs/loop-usage-examples.md +214 -214
  176. package/docs/models.md +27 -27
  177. package/docs/nanoPencil-学习计划.md +170 -0
  178. package/docs/packages.md +27 -27
  179. package/docs/pi-design-philosophy.md +457 -457
  180. package/docs/planmode.md +1987 -1987
  181. package/docs/prompt-templates.md +27 -27
  182. package/docs/providers.md +27 -27
  183. package/docs/scan-report.md +3820 -0
  184. package/docs/sdk.md +27 -27
  185. package/docs/skills.md +27 -27
  186. package/docs/themes.md +27 -27
  187. package/docs/tui.md +27 -27
  188. package/docs//345/257/271/346/240/207Claude-Code.md +1775 -0
  189. package/docs//351/230/277/351/207/214/345/267/264/345/267/264/350/264/242/346/212/245/345/210/206/346/236/220/344/271/246.md +261 -0
  190. package/package.json +190 -190
  191. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +0 -851
  192. package/docs/SDK-TESTING.md +0 -364
  193. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +0 -593
  194. package/docs/startup-performance-optimization.md +0 -301
  195. package/docs/认知地图.md +0 -47
@@ -1,473 +1,473 @@
1
- # HowLongToBeat — Scraping & Data Extraction
2
-
3
- Field-tested against howlongtobeat.com on 2026-04-18. All code blocks validated with live requests.
4
-
5
- ## Do this first
6
-
7
- **Use the search API — it returns structured JSON with all completion times in one POST call.**
8
-
9
- HLTB runs a token-gated POST endpoint at `/api/find`. You must first fetch a session token from `/api/find/init`, then include it in the search request. Both steps are plain HTTP — no browser required.
10
-
11
- ```python
12
- import json, re, urllib.request, time
13
- from helpers import http_get
14
-
15
- UA = "Mozilla/5.0"
16
-
17
- def get_token():
18
- """Fetch a fresh session token. Token encodes IP+UA+timestamp, reusable for ~15 min."""
19
- url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
20
- data = http_get(url, headers={"Referer": "https://howlongtobeat.com/"})
21
- return json.loads(data) # {token, hpKey, hpVal}
22
-
23
- def search_hltb(title, size=20, page=1, token_data=None):
24
- """
25
- Search HLTB for games. Returns raw API dict:
26
- {count, pageCurrent, pageTotal, pageSize, data: [...]}
27
- token_data can be reused across searches (fetch once, use many times).
28
- """
29
- if token_data is None:
30
- token_data = get_token()
31
- hp_key, hp_val = token_data['hpKey'], token_data['hpVal']
32
- payload = {
33
- "searchType": "games",
34
- "searchTerms": title.split(),
35
- "searchPage": page,
36
- "size": size,
37
- "searchOptions": {
38
- "games": {
39
- "userId": 0, "platform": "", "sortCategory": "popular",
40
- "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
41
- "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
42
- "rangeYear": {"min": "", "max": ""}, "modifier": ""
43
- },
44
- "users": {"sortCategory": "postcount"},
45
- "lists": {"sortCategory": "follows"},
46
- "filter": "", "sort": 0, "randomizer": 0
47
- },
48
- "useCache": True,
49
- hp_key: hp_val # honeypot field — key and value vary per token
50
- }
51
- req = urllib.request.Request(
52
- "https://howlongtobeat.com/api/find",
53
- data=json.dumps(payload).encode(),
54
- headers={
55
- "User-Agent": UA,
56
- "Content-Type": "application/json",
57
- "Origin": "https://howlongtobeat.com",
58
- "Referer": "https://howlongtobeat.com/",
59
- "x-auth-token": token_data['token'],
60
- "x-hp-key": hp_key,
61
- "x-hp-val": hp_val,
62
- },
63
- method="POST"
64
- )
65
- with urllib.request.urlopen(req, timeout=20) as r:
66
- return json.loads(r.read().decode())
67
-
68
- # Usage
69
- tok = get_token()
70
-
71
- result = search_hltb("elden ring", token_data=tok, size=3)
72
- for g in result['data']:
73
- print(g['game_id'], g['game_name'], g['release_world'])
74
- print(f" Main: {g['comp_main']/3600:.1f}h +Extras: {g['comp_plus']/3600:.1f}h 100%: {g['comp_100']/3600:.1f}h")
75
-
76
- # Confirmed output (2026-04-18):
77
- # 68151 Elden Ring 2022
78
- # Main: 60.0h +Extras: 101.2h 100%: 135.5h
79
- # 160589 Elden Ring: Nightreign 2025
80
- # Main: 28.1h +Extras: 40.1h 100%: 66.9h
81
- # 139385 Elden Ring: Shadow of the Erdtree 2024
82
- # Main: 25.7h +Extras: 39.0h 100%: 51.1h
83
- ```
84
-
85
- Token is reusable — fetch it once and pass it to multiple `search_hltb()` calls. No need to re-fetch per search.
86
-
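The `get_token()` docstring above says a token stays usable for roughly 15 minutes. A minimal sketch of a cache that refetches only after that window, assuming the ~15 minute figure holds. `TokenCache`, its `ttl_seconds` default, and the injected `fetch` and `clock` callables are illustrative names, not part of the skill file or of HLTB's API:

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Reuse one token until an assumed TTL elapses, then fetch a new one.

    `fetch` is any zero-argument callable returning the token dict
    (for example the get_token() helper above). The 14-minute default
    is a guess that sits just under the ~15 min noted above.
    """

    def __init__(self, fetch: Callable[[], dict], ttl_seconds: float = 14 * 60,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._token: Optional[dict] = None
        self._fetched_at = 0.0

    def get(self) -> dict:
        """Return the cached token, refetching if missing or past the TTL."""
        now = self._clock()
        expired = self._token is None or (now - self._fetched_at) >= self._ttl
        if expired:
            self._token = self._fetch()
            self._fetched_at = now
        return self._token

    def invalidate(self) -> None:
        """Drop the cached token, e.g. after the API rejects a request."""
        self._token = None
```

With this in place, a caller could write `cache = TokenCache(get_token)` once and pass `token_data=cache.get()` to each `search_hltb()` call.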
87
- ---
88
-
89
- ## Fastest approach: search + parse in one helper
90
-
91
- ```python
92
- import json, re, urllib.request, time
93
- from helpers import http_get
94
-
95
- UA = "Mozilla/5.0"
96
-
97
- def hltb_search(title, size=5):
98
- """One-shot: get token + search, return list of dicts with hours."""
99
- url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
100
- tok = json.loads(http_get(url, headers={"Referer": "https://howlongtobeat.com/"}))
101
- hp_key, hp_val = tok['hpKey'], tok['hpVal']
102
- payload = {
103
- "searchType": "games", "searchTerms": title.split(), "searchPage": 1, "size": size,
104
- "searchOptions": {
105
- "games": {"userId": 0, "platform": "", "sortCategory": "popular",
106
- "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
107
- "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
108
- "rangeYear": {"min": "", "max": ""}, "modifier": ""},
109
- "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"},
110
- "filter": "", "sort": 0, "randomizer": 0
111
- },
112
- "useCache": True, hp_key: hp_val
113
- }
114
- req = urllib.request.Request(
115
- "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(),
116
- headers={"User-Agent": UA, "Content-Type": "application/json",
117
- "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/",
118
- "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val},
119
- method="POST"
120
- )
121
- with urllib.request.urlopen(req, timeout=20) as r:
122
- data = json.loads(r.read().decode())
123
-
124
- def h(secs):
125
- return round(secs / 3600, 1) if secs else None
126
-
127
- return [
128
- {
129
- "game_id": g["game_id"],
130
- "name": g["game_name"],
131
- "type": g["game_type"], # "game" | "dlc" | "expansion" | "hack"
132
- "year": g["release_world"],
133
- "platforms": g["profile_platform"],
134
- "main": h(g["comp_main"]), # Main Story hours (polled average)
135
- "main_plus": h(g["comp_plus"]), # Main + Extras hours
136
- "completionist":h(g["comp_100"]), # Completionist hours
137
- "all_styles": h(g["comp_all"]), # All playstyles combined
138
- "main_count": g["comp_main_count"], # Number of submissions
139
- "plus_count": g["comp_plus_count"],
140
- "comp_count": g["comp_100_count"],
141
- "review_score": g["review_score"], # 0–100
142
- "image_url": f"https://howlongtobeat.com/games/{g['game_image']}",
143
- "page_url": f"https://howlongtobeat.com/game/{g['game_id']}",
144
- }
145
- for g in data["data"]
146
- ]
147
-
148
- # Verified results (2026-04-18):
149
- print(hltb_search("the witcher 3")[0])
150
- # {'game_id': 10270, 'name': 'The Witcher 3: Wild Hunt', 'type': 'game', 'year': 2015,
151
- # 'main': 51.6, 'main_plus': 103.8, 'completionist': 174.4, 'all_styles': 103.8,
152
- # 'main_count': 2681, 'plus_count': 6708, 'comp_count': 2327, 'review_score': 93, ...}
153
-
154
- print(hltb_search("gone home")[0])
155
- # {'game_id': 4010, 'name': 'Gone Home', 'main': 2.0, 'main_plus': 2.5, 'completionist': 3.1, ...}
156
- ```
157
-
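The request body is the same in every helper above apart from the search terms, page, size, and honeypot pair. One way to factor it out is sketched below. The field names and defaults are copied from those helpers; the `build_payload` name and signature are hypothetical, and the honeypot values in the usage line are placeholders:

```python
import json


def build_payload(title, hp_key, hp_val, size=20, page=1):
    """Build the /api/find request body used by the helpers above.

    Field names and defaults are copied from those helpers; only the
    function name and signature are new.
    """
    payload = {
        "searchType": "games",
        "searchTerms": title.split(),
        "searchPage": page,
        "size": size,
        "searchOptions": {
            "games": {
                "userId": 0, "platform": "", "sortCategory": "popular",
                "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
                "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
                "rangeYear": {"min": "", "max": ""}, "modifier": "",
            },
            "users": {"sortCategory": "postcount"},
            "lists": {"sortCategory": "follows"},
            "filter": "", "sort": 0, "randomizer": 0,
        },
        "useCache": True,
    }
    payload[hp_key] = hp_val  # honeypot field: key and value come from the token
    return payload


# Placeholder honeypot values; real ones come from /api/find/init.
body = json.dumps(build_payload("elden ring", "hpExample", "abc123", size=3))
```

Each helper would then only differ in how it calls `build_payload()` and what it does with the response.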
158
- ---
159
-
160
- ## Game detail page (full stat breakdown, speedrun data, per-platform times)
161
-
162
- When you have a `game_id`, fetch the game page and extract `__NEXT_DATA__` for the complete dataset — includes median/avg/low/high times, speedrun data, co-op/multiplayer times, and per-platform breakdowns.
163
-
164
- ```python
165
- import json, re
166
- from helpers import http_get
167
-
168
- def get_game_detail(game_id):
169
- """
170
- Fetch complete game data from the HLTB game page.
171
- Returns pageProps['game']['data'] with keys: 'game', 'individuality', 'relationships'.
172
- """
173
- html = http_get(f"https://howlongtobeat.com/game/{game_id}")
174
- nd = json.loads(re.search(
175
- r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL
176
- ).group(1))
177
- return nd['props']['pageProps']['game']['data']
178
-
179
- data = get_game_detail(10270) # Witcher 3
180
- g = data['game'][0]
181
-
182
- # Core completion times (all in seconds — divide by 3600 for hours)
183
- print(g['comp_main'] / 3600) # 51.6 — Main Story (polled avg)
184
- print(g['comp_main_med'] / 3600) # 50.0 — Main Story median
185
- print(g['comp_main_l'] / 3600) # 32.7 — Main Story low
186
- print(g['comp_main_h'] / 3600) # 85.8 — Main Story high
187
- print(g['comp_main_count']) # 2681 — submission count
188
-
189
- print(g['comp_plus'] / 3600) # 103.8 — Main + Extras
190
- print(g['comp_100'] / 3600) # 174.4 — Completionist
191
- print(g['comp_all'] / 3600) # 103.8 — All Styles
192
-
193
- # Speedrun times
194
- print(g['comp_lvl_spd']) # 1 if speedrun data exists, 0 if not
195
- print(g['comp_speed'] / 3600) # 19.2 — any% (polled avg)
196
- print(g['comp_speed_min'] / 3600) # 3.2 — fastest submission
197
- print(g['comp_speed_max'] / 3600) # 30.0 — slowest speedrun
198
- print(g['comp_speed_count']) # 15 — speedrun submissions
199
-
200
- print(g['comp_speed100'] / 3600) # 59.4 — 100% speedrun
201
- print(g['comp_speed100_count']) # 4
202
-
203
- # Multiplayer / co-op invested time
204
- print(g['comp_lvl_co']) # 1 if co-op data exists
205
- print(g['comp_lvl_mp']) # 1 if multiplayer data exists
206
- print(g['invested_co'] / 3600) # hours in co-op mode
207
- print(g['invested_mp'] / 3600) # hours in competitive multiplayer
208
- print(g['invested_co_count']) # submission count
209
-
210
- # Metadata
211
- print(g['profile_dev']) # "CD Projekt RED"
212
- print(g['profile_pub']) # "CD Projekt, Warner Bros..."
213
- print(g['profile_platform']) # "Nintendo Switch, PC, PlayStation 4, ..."
214
- print(g['profile_genre']) # "Third-Person, Action, Open World, Role-Playing"
215
- print(g['profile_steam']) # 292030 — Steam App ID (0 if not on Steam)
216
- print(g['release_world']) # "2015-05-19"
217
- print(g['rating_esrb']) # "M"
218
- print(g['review_score']) # 93 (0–100)
219
- print(g['count_comp']) # 26007 — times completed
220
- print(g['count_backlog']) # 31083
221
-
222
- # Per-platform breakdown (individuality)
223
- for plat in data['individuality']:
224
- print(plat['platform'],
225
- int(plat['comp_main'])/3600, # main hours
226
- int(plat['comp_plus'])/3600, # +extras hours
227
- int(plat['comp_100'])/3600, # 100% hours
228
- plat['count_comp']) # completions on this platform
229
- # Example:
230
- # Nintendo Switch 57.0h 112.3h 194.9h 236
231
- # PC, PS4, Xbox One 52.9h 110.0h 179.4h 11136
232
- # PS5, Xbox Series X/S 52.1h 92.5h 168.8h 343
233
-
234
- # DLC / expansion completion times
235
- for rel in data['relationships'][:3]:
236
- print(rel['game_id'], rel['game_name'], rel['game_type'],
237
- rel['comp_main']/3600 if rel['comp_main'] else None)
238
- ```
239
-
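`get_game_detail()` above calls `.group(1)` directly on the regex result, so it raises `AttributeError` if the page has no `__NEXT_DATA__` script tag. A defensive variant might look like the sketch below, which reuses the same regex and key path. The `extract_game_data` name is hypothetical and the sample HTML is synthetic, not a captured HLTB response:

```python
import json
import re

NEXT_DATA_RE = re.compile(
    r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL
)


def extract_game_data(html):
    """Return pageProps['game']['data'] from page HTML, or None if absent."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None  # no __NEXT_DATA__ script tag in this HTML
    try:
        next_data = json.loads(match.group(1))
        return next_data["props"]["pageProps"]["game"]["data"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return None  # tag present but not in the expected shape


# Synthetic HTML for illustration only; not a captured HLTB response.
sample = (
    '<html><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"game": {"data": {"game": [{"game_id": 1, "comp_main": 7200}]}}}}}'
    "</script></html>"
)
data = extract_game_data(sample)
print(data["game"][0]["comp_main"] / 3600)  # 2.0
```

Returning `None` lets a batch job skip a page whose layout has changed instead of crashing partway through.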
240
- ---
241
-
242
- ## Common workflows
243
-
244
- ### Quick lookup: name → completion times
245
-
246
- ```python
247
- import json, re, urllib.request, time
248
- from helpers import http_get
249
-
250
- UA = "Mozilla/5.0"
251
-
252
- def get_times(title):
253
- """Return Main/+Extras/100% hours for the top search match."""
254
- tok_url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
255
- tok = json.loads(http_get(tok_url, headers={"Referer": "https://howlongtobeat.com/"}))
256
- hp_key, hp_val = tok['hpKey'], tok['hpVal']
257
- payload = {
258
- "searchType": "games", "searchTerms": title.split(), "searchPage": 1, "size": 1,
259
- "searchOptions": {
260
- "games": {"userId": 0, "platform": "", "sortCategory": "popular",
261
- "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
262
- "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
263
- "rangeYear": {"min": "", "max": ""}, "modifier": ""},
264
- "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"},
265
- "filter": "", "sort": 0, "randomizer": 0
266
- },
267
- "useCache": True, hp_key: hp_val
268
- }
269
- req = urllib.request.Request(
270
- "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(),
271
- headers={"User-Agent": UA, "Content-Type": "application/json",
272
- "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/",
273
- "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val},
274
- method="POST"
275
- )
276
- with urllib.request.urlopen(req, timeout=20) as r:
277
- data = json.loads(r.read().decode())
278
- if not data['data']:
279
- return None
280
- g = data['data'][0]
281
- h = lambda s: round(s/3600, 1) if s else None
282
- return {
283
- "id": g['game_id'], "name": g['game_name'],
284
- "main": h(g['comp_main']), "main_plus": h(g['comp_plus']),
285
- "completionist": h(g['comp_100'])
286
- }
287
-
288
- # Verified:
289
- print(get_times("celeste"))
290
- # {'id': 42818, 'name': 'Celeste', 'main': 8.3, 'main_plus': 14.6, 'completionist': 39.2}
291
- print(get_times("stardew valley"))
292
- # {'id': 34716, 'name': 'Stardew Valley', 'main': 53.4, 'main_plus': 94.6, 'completionist': 171.5}
293
- print(get_times("hades"))
294
- # {'id': 62941, 'name': 'Hades', 'main': 23.4, 'main_plus': 48.5, 'completionist': 95.0}
295
- ```
296
-
297
- ### Paginated search (all results for a query)
298
-
299
- `count` = total matches, `pageTotal` = total pages with current `size`. The same token works across all pages.
300
-
301
- ```python
302
- def search_all_pages(title, size=20):
303
- """Yield every search result for a query across all pages."""
304
- tok_url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
305
- tok = json.loads(http_get(tok_url, headers={"Referer": "https://howlongtobeat.com/"}))
306
- hp_key, hp_val = tok['hpKey'], tok['hpVal']
307
-
308
- page = 1
309
- while True:
310
- payload = {
311
- "searchType": "games", "searchTerms": title.split(),
312
- "searchPage": page, "size": size,
313
- "searchOptions": {
314
- "games": {"userId": 0, "platform": "", "sortCategory": "popular",
315
- "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
316
- "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
317
- "rangeYear": {"min": "", "max": ""}, "modifier": ""},
318
- "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"},
319
- "filter": "", "sort": 0, "randomizer": 0
320
- },
321
- "useCache": True, hp_key: hp_val
322
- }
323
- req = urllib.request.Request(
324
- "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(),
325
- headers={"User-Agent": UA, "Content-Type": "application/json",
326
- "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/",
327
- "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val},
328
- method="POST"
329
- )
330
- with urllib.request.urlopen(req, timeout=20) as r:
331
- data = json.loads(r.read().decode())
332
- yield from data['data']
333
- if page >= data['pageTotal']:
334
- break
335
- page += 1
336
-
337
- # "mario" returns 308 results across 16 pages (size=20)
338
- mario_games = list(search_all_pages("mario", size=20))
339
- print(len(mario_games)) # 308
340
- ```
341
-
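The "308 results across 16 pages" figure above is consistent with `pageTotal` being `count` divided by `size`, rounded up. That rule is inferred from this single example rather than from any documented API behaviour, so treat the sketch below as a sanity check. `expected_pages` is a hypothetical helper name:

```python
import math


def expected_pages(count, size):
    """Pages needed for `count` results at `size` per page (assumed ceil rule)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return math.ceil(count / size)


print(expected_pages(308, 20))  # 16, matching the "mario" example above
```

Comparing this value against the `pageTotal` in a response is a cheap way to notice if the server caps or changes the page size.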
342
- ### Batch lookup by game ID (parallel)
343
-
344
- ```python
345
- import json, re, urllib.request
346
- from concurrent.futures import ThreadPoolExecutor
347
- from helpers import http_get
348
-
349
- def fetch_game(game_id):
350
- html = http_get(f"https://howlongtobeat.com/game/{game_id}")
351
- nd = json.loads(re.search(
352
- r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL
353
- ).group(1))
354
- g = nd['props']['pageProps']['game']['data']['game'][0]
355
- return {
356
- "id": g['game_id'], "name": g['game_name'],
357
- "main": round(g['comp_main']/3600, 1) if g['comp_main'] else None,
358
- "main_plus": round(g['comp_plus']/3600, 1) if g['comp_plus'] else None,
359
- "completionist": round(g['comp_100']/3600, 1) if g['comp_100'] else None,
360
- }
361
-
362
- ids = [10270, 68151, 42818, 26803, 34716] # Witcher3, Elden Ring, Celeste, DS3, Stardew
363
- with ThreadPoolExecutor(max_workers=5) as ex:
364
- results = list(ex.map(fetch_game, ids))
365
-
366
- for r in results:
367
- print(f"[{r['id']}] {r['name']}: {r['main']}h / {r['main_plus']}h / {r['completionist']}h")
368
-
369
- # Confirmed output:
370
- # [10270] The Witcher 3: Wild Hunt: 51.6h / 103.8h / 174.4h
371
- # [68151] Elden Ring: 60.0h / 101.2h / 135.5h
372
- # [42818] Celeste: 8.3h / 14.6h / 39.2h
373
- # [26803] Dark Souls III: 31.2h / 48.4h / 100.5h
374
- # [34716] Stardew Valley: 53.4h / 94.6h / 171.5h
375
- ```
376
-
377
- ---
378
-
379
- ## Search response field reference
380
-
381
- Every item in `data[]` from `/api/find`:
382
-
383
- | Field | Type | Description |
384
- |-------|------|-------------|
385
- | `game_id` | int | HLTB internal game ID |
386
- | `game_name` | str | Full game title |
387
- | `game_alias` | str | Alternate title / edition name |
388
- | `game_type` | str | `"game"` \| `"dlc"` \| `"expansion"` \| `"hack"` |
389
- | `game_image` | str | Image filename → `https://howlongtobeat.com/games/{game_image}` |
390
- | `release_world` | int | Release year (just the year integer, not a date) |
391
- | `profile_platform` | str | Comma-separated platform list |
392
- | `comp_main` | int | Main Story seconds (polled average), 0 if no data |
393
- | `comp_plus` | int | Main + Extras seconds |
394
- | `comp_100` | int | Completionist seconds |
395
- | `comp_all` | int | All Styles combined seconds |
396
- | `comp_main_count` | int | Submission count for Main Story |
397
- | `comp_plus_count` | int | Submission count for Main + Extras |
398
- | `comp_100_count` | int | Submission count for Completionist |
399
- | `comp_all_count` | int | Total submissions across all categories |
400
- | `comp_lvl_sp` | int | 1 if single-player data exists |
401
- | `comp_lvl_co` | int | 1 if co-op data exists |
402
- | `comp_lvl_mp` | int | 1 if multiplayer data exists |
403
- | `invested_co` | int | Average co-op time in seconds |
404
- | `invested_mp` | int | Average multiplayer time in seconds |
405
- | `count_comp` | int | Total completions logged |
406
- | `count_backlog` | int | Users with game in backlog |
407
- | `count_playing` | int | Currently playing |
408
- | `count_speedrun` | int | Speedrun entries |
409
- | `count_review` | int | Review count |
410
- | `review_score` | int | Community review score 0–100 |
411
- | `profile_popular` | int | Popularity rank |
412
-
413
- Additional fields in `__NEXT_DATA__` game page only:
414
-
415
- | Field | Description |
416
- |-------|-------------|
417
- | `comp_main_med/avg/l/h` | Median / average / low / high for main time |
418
- | `comp_plus_med/avg/l/h` | Same for Main + Extras |
419
- | `comp_100_med/avg/l/h` | Same for Completionist |
420
- | `comp_speed` | Speedrun any% average seconds |
421
- | `comp_speed_min/max/med` | Speedrun spread |
422
- | `comp_speed100` | 100% speedrun average |
423
- | `comp_speed_count` | Speedrun submission count |
424
- | `comp_lvl_spd` | 1 if speedrun data exists |
425
- | `profile_dev` | Developer name |
426
- | `profile_pub` | Publisher name |
427
- | `profile_genre` | Comma-separated genres |
428
- | `profile_steam` | Steam App ID (0 if not on Steam) |
429
- | `release_world` | Full release date `"YYYY-MM-DD"` |
430
- | `rating_esrb` | ESRB rating string (may be empty) |
431
- | `count_replay` | Times replayed |
432
- | `count_total` | Total user entries |
433
-
434
- ---
435
-
436
- ## Anti-bot measures
437
-
438
- - **Cloudflare** is present (confirmed by `CF-Ray` response header), but does not block plain HTTP with a browser UA.
439
- - **Token system**: Every search requires a fresh token from `/api/find/init`. Token encodes `timestamp::IP|UA|hpKey|hmacHash`. The server validates that the UA used to fetch the token matches the UA used in the search POST.
440
- - **Honeypot field**: `hpKey` and `hpVal` from the init response must appear as a top-level field in the POST body (e.g., `{"ign_7671546b": "a6679ea54598d502", ...}`). The key name rotates per request.
441
- - **Required headers on search POST**: `Origin: https://howlongtobeat.com` AND `Referer: https://howlongtobeat.com/` — missing either causes HTTP 403 or 404. `x-auth-token`, `x-hp-key`, `x-hp-val` are also required.
442
- - **Required header on init GET**: `Referer: https://howlongtobeat.com/` — missing causes HTTP 403.
443
- - **Token reuse**: A single token works for multiple searches and multiple pages. No per-request token fetch needed.
444
- - **No CAPTCHA** observed during testing with standard UA strings.
445
- - **Rate limits**: Not triggered during testing (token fetches + 10+ searches sequentially). Fetching many game pages in parallel (5 workers) worked without 429s.
446
-
447
- ---
448
-
449
- ## Gotchas
450
-
451
- - **Completion times are in seconds** — all `comp_*` fields are integer seconds. Divide by 3600 for hours. `0` means no data (not 0 hours).
452
-
453
- - **`release_world` is a year int in search, a full date in game page** — in the `/api/find` response, `release_world` is an integer year (e.g., `2015`). In `__NEXT_DATA__` on the game page, it's `"2015-05-19"`.
454
-
455
- - **UA fingerprinting** — the token from `/api/find/init` encodes the User-Agent. The search POST must use the identical UA that fetched the token, or you'll get HTTP 403. Since `http_get` sends `Mozilla/5.0`, use that same string for the search POST.
456
-
457
- - **Honeypot key name rotates** — `hpKey` is something like `ign_7671546b` (changes each token fetch). Always read it from the init response and use it dynamically. Never hardcode it.
458
-
459
- - **Both `x-hp-key`/`x-hp-val` headers AND the body field are required** — the server checks the request headers (`x-hp-key`, `x-hp-val`) against the dynamic key in the POST body. If either is wrong or missing, you get HTTP 404 (wrong body value) or HTTP 403 (missing/wrong header).
460
-
461
- - **`game_type` in search results** — can be `"game"`, `"dlc"`, `"expansion"`, or `"hack"`. Search results mix these by default. Filter with `if g['game_type'] == 'game'` if you only want base games.
462
-
463
- - **Games with no submission data** — `comp_main`, `comp_plus`, `comp_100` are `0` (not `None`) when no users have submitted times. Always check `if g['comp_main']:` before dividing.
464
-
465
- - **`individuality` (per-platform) data** — available only in `__NEXT_DATA__` on the game page, not in search results. `comp_main` etc. are strings, not ints, in this sub-object — cast with `int(plat['comp_main'])`.
466
-
467
- - **`profile_platform` in search** — a comma-separated string that HLTB displays. Not structured. Use game page `individuality` for per-platform time breakdowns.
468
-
469
- - **Token expiry** — if a long-running loop gets HTTP 403 with `{"error":"Session expired or invalid fingerprint"}`, call `get_token()` again and retry. Token lifetime appears to be ~15 minutes based on the timestamp embedded in the decoded value.
470
-
471
- - **No slug-based URLs** — HLTB uses integer `game_id` for all game pages, not slugs. There is no `title-to-slug` mapping; use search to find the `game_id` first.
472
-
473
- - **`sortCategory` options** — `"popular"` ranks by community engagement (best for "top result = intended game"). `"name"` sorts alphabetically. Other values (`"madnessTime"`, `"mainThenExtras"`) exist but return same results as `"name"` in testing.
1
+ # HowLongToBeat — Scraping & Data Extraction
2
+
3
+ Field-tested against howlongtobeat.com on 2026-04-18. All code blocks validated with live requests.
4
+
5
+ ## Do this first
6
+
7
+ **Use the search API — it returns structured JSON with all completion times in one POST call.**
8
+
9
+ HLTB runs a token-gated POST endpoint at `/api/find`. You must first fetch a session token from `/api/find/init`, then include it in the search request. Both steps are plain HTTP — no browser required.
10
+
11
+ ```python
12
+ import json, re, urllib.request, time
13
+ from helpers import http_get
14
+
15
+ UA = "Mozilla/5.0"
16
+
17
+ def get_token():
18
+     """Fetch a fresh session token. Token encodes IP+UA+timestamp, reusable for ~15 min."""
19
+     url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
20
+     data = http_get(url, headers={"Referer": "https://howlongtobeat.com/"})
21
+     return json.loads(data)  # {token, hpKey, hpVal}
22
+
23
+ def search_hltb(title, size=20, page=1, token_data=None):
24
+     """
25
+     Search HLTB for games. Returns raw API dict:
26
+       {count, pageCurrent, pageTotal, pageSize, data: [...]}
27
+     token_data can be reused across searches (fetch once, use many times).
28
+     """
29
+     if token_data is None:
30
+         token_data = get_token()
31
+     hp_key, hp_val = token_data['hpKey'], token_data['hpVal']
32
+     payload = {
33
+         "searchType": "games",
34
+         "searchTerms": title.split(),
35
+         "searchPage": page,
36
+         "size": size,
37
+         "searchOptions": {
38
+             "games": {
39
+                 "userId": 0, "platform": "", "sortCategory": "popular",
40
+                 "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
41
+                 "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
42
+                 "rangeYear": {"min": "", "max": ""}, "modifier": ""
43
+             },
44
+             "users": {"sortCategory": "postcount"},
45
+             "lists": {"sortCategory": "follows"},
46
+             "filter": "", "sort": 0, "randomizer": 0
47
+         },
48
+         "useCache": True,
49
+         hp_key: hp_val  # honeypot field: key and value vary per token
50
+     }
51
+     req = urllib.request.Request(
52
+         "https://howlongtobeat.com/api/find",
53
+         data=json.dumps(payload).encode(),
54
+         headers={
55
+             "User-Agent": UA,
56
+             "Content-Type": "application/json",
57
+             "Origin": "https://howlongtobeat.com",
58
+             "Referer": "https://howlongtobeat.com/",
59
+             "x-auth-token": token_data['token'],
60
+             "x-hp-key": hp_key,
61
+             "x-hp-val": hp_val,
62
+         },
63
+         method="POST"
64
+     )
65
+     with urllib.request.urlopen(req, timeout=20) as r:
66
+         return json.loads(r.read().decode())
67
+
68
+ # Usage
69
+ tok = get_token()
70
+
71
+ result = search_hltb("elden ring", token_data=tok, size=3)
72
+ for g in result['data']:
73
+     print(g['game_id'], g['game_name'], g['release_world'])
74
+     print(f"  Main: {g['comp_main']/3600:.1f}h  +Extras: {g['comp_plus']/3600:.1f}h  100%: {g['comp_100']/3600:.1f}h")
75
+
76
+ # Confirmed output (2026-04-18):
77
+ # 68151 Elden Ring 2022
78
+ # Main: 60.0h +Extras: 101.2h 100%: 135.5h
79
+ # 160589 Elden Ring: Nightreign 2025
80
+ # Main: 28.1h +Extras: 40.1h 100%: 66.9h
81
+ # 139385 Elden Ring: Shadow of the Erdtree 2024
82
+ # Main: 25.7h +Extras: 39.0h 100%: 51.1h
83
+ ```
84
+
85
+ The token is reusable for roughly 15 minutes: fetch it once and pass it to multiple `search_hltb()` calls instead of re-fetching it per search.
86
+
87
+ ---
88
+
89
+ ## Fastest approach: search + parse in one helper
90
+
91
+ ```python
92
+ import json, re, urllib.request, time
93
+ from helpers import http_get
94
+
95
+ UA = "Mozilla/5.0"
96
+
97
+ def hltb_search(title, size=5):
98
+     """One-shot: get token + search, return list of dicts with hours."""
99
+     url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
100
+     tok = json.loads(http_get(url, headers={"Referer": "https://howlongtobeat.com/"}))
101
+     hp_key, hp_val = tok['hpKey'], tok['hpVal']
102
+     payload = {
103
+         "searchType": "games", "searchTerms": title.split(), "searchPage": 1, "size": size,
104
+         "searchOptions": {
105
+             "games": {"userId": 0, "platform": "", "sortCategory": "popular",
106
+                       "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
107
+                       "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
108
+                       "rangeYear": {"min": "", "max": ""}, "modifier": ""},
109
+             "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"},
110
+             "filter": "", "sort": 0, "randomizer": 0
111
+         },
112
+         "useCache": True, hp_key: hp_val
113
+     }
114
+     req = urllib.request.Request(
115
+         "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(),
116
+         headers={"User-Agent": UA, "Content-Type": "application/json",
117
+                  "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/",
118
+                  "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val},
119
+         method="POST"
120
+     )
121
+     with urllib.request.urlopen(req, timeout=20) as r:
122
+         data = json.loads(r.read().decode())
123
+
124
+     def h(secs):
125
+         return round(secs / 3600, 1) if secs else None
126
+
127
+     return [
128
+         {
129
+             "game_id": g["game_id"],
130
+             "name": g["game_name"],
131
+             "type": g["game_type"],  # "game" | "dlc" | "expansion" | "hack"
132
+             "year": g["release_world"],
133
+             "platforms": g["profile_platform"],
134
+             "main": h(g["comp_main"]),  # Main Story hours (polled average)
135
+             "main_plus": h(g["comp_plus"]),  # Main + Extras hours
136
+             "completionist": h(g["comp_100"]),  # Completionist hours
137
+             "all_styles": h(g["comp_all"]),  # All playstyles combined
138
+             "main_count": g["comp_main_count"],  # Number of submissions
139
+             "plus_count": g["comp_plus_count"],
140
+             "comp_count": g["comp_100_count"],
141
+             "review_score": g["review_score"],  # 0–100
142
+             "image_url": f"https://howlongtobeat.com/games/{g['game_image']}",
143
+             "page_url": f"https://howlongtobeat.com/game/{g['game_id']}",
144
+         }
145
+         for g in data["data"]
146
+     ]
147
+
148
+ # Verified results (2026-04-18):
149
+ print(hltb_search("the witcher 3")[0])
150
+ # {'game_id': 10270, 'name': 'The Witcher 3: Wild Hunt', 'type': 'game', 'year': 2015,
151
+ # 'main': 51.6, 'main_plus': 103.8, 'completionist': 174.4, 'all_styles': 103.8,
152
+ # 'main_count': 2681, 'plus_count': 6708, 'comp_count': 2327, 'review_score': 93, ...}
153
+
154
+ print(hltb_search("gone home")[0])
155
+ # {'game_id': 4010, 'name': 'Gone Home', 'main': 2.0, 'main_plus': 2.5, 'completionist': 3.1, ...}
156
+ ```
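Search results mix base games with DLC, expansions, and hacks, so the first hit is not always the game you meant. The following is a minimal sketch of one way to pick a base game from the list that `hltb_search()` returns. `pick_base_game` is a hypothetical helper (not part of HLTB or this package), and the sample records are made up for illustration.

```python
def pick_base_game(results):
    """Return the first result whose type is "game", else the first result, else None.

    `results` is a list of dicts shaped like the output of hltb_search()
    (each dict carries a "type" key). Hypothetical helper for illustration.
    """
    for item in results:
        if item.get("type") == "game":
            return item
    # Fall back to the top hit when no base game is present
    return results[0] if results else None


# Made-up sample records (not real HLTB data)
sample = [
    {"game_id": 1, "name": "Example Game: Bonus Chapter", "type": "dlc"},
    {"game_id": 2, "name": "Example Game", "type": "game"},
]
print(pick_base_game(sample)["name"])  # Example Game
```

Falling back to the first result keeps the helper usable for queries that only match DLC; callers that need strictly base games can check the returned `type` themselves.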
157
+
158
+ ---
159
+
160
+ ## Game detail page (full stat breakdown, speedrun data, per-platform times)
161
+
162
+ When you have a `game_id`, fetch the game page and extract `__NEXT_DATA__` for the complete dataset — includes median/avg/low/high times, speedrun data, co-op/multiplayer times, and per-platform breakdowns.
163
+
164
+ ```python
165
+ import json, re
166
+ from helpers import http_get
167
+
168
+ def get_game_detail(game_id):
169
+     """
170
+     Fetch complete game data from the HLTB game page.
171
+     Returns pageProps['game']['data'] with keys: 'game', 'individuality', 'relationships'.
172
+     """
173
+     html = http_get(f"https://howlongtobeat.com/game/{game_id}")
174
+     nd = json.loads(re.search(
175
+         r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL
176
+     ).group(1))
177
+     return nd['props']['pageProps']['game']['data']
178
+
179
+ data = get_game_detail(10270) # Witcher 3
180
+ g = data['game'][0]
181
+
182
+ # Core completion times (all in seconds — divide by 3600 for hours)
183
+ print(g['comp_main'] / 3600) # 51.6 — Main Story (polled avg)
184
+ print(g['comp_main_med'] / 3600) # 50.0 — Main Story median
185
+ print(g['comp_main_l'] / 3600) # 32.7 — Main Story low
186
+ print(g['comp_main_h'] / 3600) # 85.8 — Main Story high
187
+ print(g['comp_main_count']) # 2681 — submission count
188
+
189
+ print(g['comp_plus'] / 3600) # 103.8 — Main + Extras
190
+ print(g['comp_100'] / 3600) # 174.4 — Completionist
191
+ print(g['comp_all'] / 3600) # 103.8 — All Styles
192
+
193
+ # Speedrun times
194
+ print(g['comp_lvl_spd']) # 1 if speedrun data exists, 0 if not
195
+ print(g['comp_speed'] / 3600) # 19.2 — any% (polled avg)
196
+ print(g['comp_speed_min'] / 3600) # 3.2 — fastest submission
197
+ print(g['comp_speed_max'] / 3600) # 30.0 — slowest speedrun
198
+ print(g['comp_speed_count']) # 15 — speedrun submissions
199
+
200
+ print(g['comp_speed100'] / 3600) # 59.4 — 100% speedrun
201
+ print(g['comp_speed100_count']) # 4
202
+
203
+ # Multiplayer / co-op invested time
204
+ print(g['comp_lvl_co']) # 1 if co-op data exists
205
+ print(g['comp_lvl_mp']) # 1 if multiplayer data exists
206
+ print(g['invested_co'] / 3600) # hours in co-op mode
207
+ print(g['invested_mp'] / 3600) # hours in competitive multiplayer
208
+ print(g['invested_co_count']) # submission count
209
+
210
+ # Metadata
211
+ print(g['profile_dev']) # "CD Projekt RED"
212
+ print(g['profile_pub']) # "CD Projekt, Warner Bros..."
213
+ print(g['profile_platform']) # "Nintendo Switch, PC, PlayStation 4, ..."
214
+ print(g['profile_genre']) # "Third-Person, Action, Open World, Role-Playing"
215
+ print(g['profile_steam']) # 292030 — Steam App ID (0 if not on Steam)
216
+ print(g['release_world']) # "2015-05-19"
217
+ print(g['rating_esrb']) # "M"
218
+ print(g['review_score']) # 93 (0–100)
219
+ print(g['count_comp']) # 26007 — times completed
220
+ print(g['count_backlog']) # 31083
221
+
222
+ # Per-platform breakdown (individuality)
223
+ for plat in data['individuality']:
224
+     print(plat['platform'],
225
+           int(plat['comp_main'])/3600,  # main hours
226
+           int(plat['comp_plus'])/3600,  # +extras hours
227
+           int(plat['comp_100'])/3600,  # 100% hours
228
+           plat['count_comp'])  # completions on this platform
229
+ # Example:
230
+ # Nintendo Switch 57.0h 112.3h 194.9h 236
231
+ # PC, PS4, Xbox One 52.9h 110.0h 179.4h 11136
232
+ # PS5, Xbox Series X/S 52.1h 92.5h 168.8h 343
233
+
234
+ # DLC / expansion completion times
235
+ for rel in data['relationships'][:3]:
236
+     print(rel['game_id'], rel['game_name'], rel['game_type'],
237
+           rel['comp_main']/3600 if rel['comp_main'] else None)
238
+ ```
239
+
240
+ ---
241
+
242
+ ## Common workflows
243
+
244
+ ### Quick lookup: name → completion times
245
+
246
+ ```python
247
+ import json, re, urllib.request, time
248
+ from helpers import http_get
249
+
250
+ UA = "Mozilla/5.0"
251
+
252
+ def get_times(title):
253
+     """Return Main/+Extras/100% hours for the top search match."""
254
+     tok_url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
255
+     tok = json.loads(http_get(tok_url, headers={"Referer": "https://howlongtobeat.com/"}))
256
+     hp_key, hp_val = tok['hpKey'], tok['hpVal']
257
+     payload = {
258
+         "searchType": "games", "searchTerms": title.split(), "searchPage": 1, "size": 1,
259
+         "searchOptions": {
260
+             "games": {"userId": 0, "platform": "", "sortCategory": "popular",
261
+                       "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
262
+                       "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
263
+                       "rangeYear": {"min": "", "max": ""}, "modifier": ""},
264
+             "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"},
265
+             "filter": "", "sort": 0, "randomizer": 0
266
+         },
267
+         "useCache": True, hp_key: hp_val
268
+     }
269
+     req = urllib.request.Request(
270
+         "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(),
271
+         headers={"User-Agent": UA, "Content-Type": "application/json",
272
+                  "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/",
273
+                  "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val},
274
+         method="POST"
275
+     )
276
+     with urllib.request.urlopen(req, timeout=20) as r:
277
+         data = json.loads(r.read().decode())
278
+     if not data['data']:
279
+         return None
280
+     g = data['data'][0]
281
+     h = lambda s: round(s/3600, 1) if s else None
282
+     return {
283
+         "id": g['game_id'], "name": g['game_name'],
284
+         "main": h(g['comp_main']), "main_plus": h(g['comp_plus']),
285
+         "completionist": h(g['comp_100'])
286
+     }
287
+
288
+ # Verified:
289
+ print(get_times("celeste"))
290
+ # {'id': 42818, 'name': 'Celeste', 'main': 8.3, 'main_plus': 14.6, 'completionist': 39.2}
291
+ print(get_times("stardew valley"))
292
+ # {'id': 34716, 'name': 'Stardew Valley', 'main': 53.4, 'main_plus': 94.6, 'completionist': 171.5}
293
+ print(get_times("hades"))
294
+ # {'id': 62941, 'name': 'Hades', 'main': 23.4, 'main_plus': 48.5, 'completionist': 95.0}
295
+ ```
296
+
297
+ ### Paginated search (all results for a query)
298
+
299
+ `count` = total matches, `pageTotal` = total pages with current `size`. The same token works across all pages.
300
+
301
+ ```python
302
+ def search_all_pages(title, size=20):
303
+     """Yield every result for a query across all pages (reuses the imports and UA from the quick-lookup block)."""
304
+     tok_url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
305
+     tok = json.loads(http_get(tok_url, headers={"Referer": "https://howlongtobeat.com/"}))
306
+     hp_key, hp_val = tok['hpKey'], tok['hpVal']
307
+
308
+     page = 1
309
+     while True:
310
+         payload = {
311
+             "searchType": "games", "searchTerms": title.split(),
312
+             "searchPage": page, "size": size,
313
+             "searchOptions": {
314
+                 "games": {"userId": 0, "platform": "", "sortCategory": "popular",
315
+                           "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
316
+                           "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
317
+                           "rangeYear": {"min": "", "max": ""}, "modifier": ""},
318
+                 "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"},
319
+                 "filter": "", "sort": 0, "randomizer": 0
320
+             },
321
+             "useCache": True, hp_key: hp_val
322
+         }
323
+         req = urllib.request.Request(
324
+             "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(),
325
+             headers={"User-Agent": UA, "Content-Type": "application/json",
326
+                      "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/",
327
+                      "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val},
328
+             method="POST"
329
+         )
330
+         with urllib.request.urlopen(req, timeout=20) as r:
331
+             data = json.loads(r.read().decode())
332
+         yield from data['data']
333
+         if page >= data['pageTotal']:
334
+             break
335
+         page += 1
336
+
337
+ # "mario" returns 308 results across 16 pages (size=20)
338
+ mario_games = list(search_all_pages("mario", size=20))
339
+ print(len(mario_games)) # 308
340
+ ```
341
+
342
+ ### Batch lookup by game ID (parallel)
343
+
344
+ ```python
345
+ import json, re, urllib.request
346
+ from concurrent.futures import ThreadPoolExecutor
347
+ from helpers import http_get
348
+
349
+ def fetch_game(game_id):
350
+     html = http_get(f"https://howlongtobeat.com/game/{game_id}")
351
+     nd = json.loads(re.search(
352
+         r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL
353
+     ).group(1))
354
+     g = nd['props']['pageProps']['game']['data']['game'][0]
355
+     return {
356
+         "id": g['game_id'], "name": g['game_name'],
357
+         "main": round(g['comp_main']/3600, 1) if g['comp_main'] else None,
358
+         "main_plus": round(g['comp_plus']/3600, 1) if g['comp_plus'] else None,
359
+         "completionist": round(g['comp_100']/3600, 1) if g['comp_100'] else None,
360
+     }
361
+
362
+ ids = [10270, 68151, 42818, 26803, 34716] # Witcher3, Elden Ring, Celeste, DS3, Stardew
363
+ with ThreadPoolExecutor(max_workers=5) as ex:
364
+     results = list(ex.map(fetch_game, ids))
365
+
366
+ for r in results:
367
+     print(f"[{r['id']}] {r['name']}: {r['main']}h / {r['main_plus']}h / {r['completionist']}h")
368
+
369
+ # Confirmed output:
370
+ # [10270] The Witcher 3: Wild Hunt: 51.6h / 103.8h / 174.4h
371
+ # [68151] Elden Ring: 60.0h / 101.2h / 135.5h
372
+ # [42818] Celeste: 8.3h / 14.6h / 39.2h
373
+ # [26803] Dark Souls III: 31.2h / 48.4h / 100.5h
374
+ # [34716] Stardew Valley: 53.4h / 94.6h / 171.5h
375
+ ```
376
+
377
+ ---
378
+
379
+ ## Search response field reference
380
+
381
+ Every item in `data[]` from `/api/find`:
382
+
383
+ | Field | Type | Description |
384
+ |-------|------|-------------|
385
+ | `game_id` | int | HLTB internal game ID |
386
+ | `game_name` | str | Full game title |
387
+ | `game_alias` | str | Alternate title / edition name |
388
+ | `game_type` | str | `"game"` \| `"dlc"` \| `"expansion"` \| `"hack"` |
389
+ | `game_image` | str | Image filename → `https://howlongtobeat.com/games/{game_image}` |
390
+ | `release_world` | int | Release year (just the year integer, not a date) |
391
+ | `profile_platform` | str | Comma-separated platform list |
392
+ | `comp_main` | int | Main Story seconds (polled average), 0 if no data |
393
+ | `comp_plus` | int | Main + Extras seconds |
394
+ | `comp_100` | int | Completionist seconds |
395
+ | `comp_all` | int | All Styles combined seconds |
396
+ | `comp_main_count` | int | Submission count for Main Story |
397
+ | `comp_plus_count` | int | Submission count for Main + Extras |
398
+ | `comp_100_count` | int | Submission count for Completionist |
399
+ | `comp_all_count` | int | Total submissions across all categories |
400
+ | `comp_lvl_sp` | int | 1 if single-player data exists |
401
+ | `comp_lvl_co` | int | 1 if co-op data exists |
402
+ | `comp_lvl_mp` | int | 1 if multiplayer data exists |
403
+ | `invested_co` | int | Average co-op time in seconds |
404
+ | `invested_mp` | int | Average multiplayer time in seconds |
405
+ | `count_comp` | int | Total completions logged |
406
+ | `count_backlog` | int | Users with game in backlog |
407
+ | `count_playing` | int | Currently playing |
408
+ | `count_speedrun` | int | Speedrun entries |
409
+ | `count_review` | int | Review count |
410
+ | `review_score` | int | Community review score 0–100 |
411
+ | `profile_popular` | int | Popularity rank |
412
+
413
+ Additional fields in `__NEXT_DATA__` game page only:
414
+
415
+ | Field | Description |
416
+ |-------|-------------|
417
+ | `comp_main_med/avg/l/h` | Median / average / low / high for main time |
418
+ | `comp_plus_med/avg/l/h` | Same for Main + Extras |
419
+ | `comp_100_med/avg/l/h` | Same for Completionist |
420
+ | `comp_speed` | Speedrun any% average seconds |
421
+ | `comp_speed_min/max/med` | Speedrun spread |
422
+ | `comp_speed100` | 100% speedrun average |
423
+ | `comp_speed_count` | Speedrun submission count |
424
+ | `comp_lvl_spd` | 1 if speedrun data exists |
425
+ | `profile_dev` | Developer name |
426
+ | `profile_pub` | Publisher name |
427
+ | `profile_genre` | Comma-separated genres |
428
+ | `profile_steam` | Steam App ID (0 if not on Steam) |
429
+ | `release_world` | Full release date `"YYYY-MM-DD"` |
430
+ | `rating_esrb` | ESRB rating string (may be empty) |
431
+ | `count_replay` | Times replayed |
432
+ | `count_total` | Total user entries |
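Several fields change type between the search response and the game page, so a small normalization layer can help when merging both sources. The sketch below is one possible approach, not the package's own code. `to_hours` and `release_year` are hypothetical helper names, and the parsing assumes only the formats described in this document (integer year, `"YYYY-MM-DD"` date, integer or numeric-string seconds).

```python
def to_hours(seconds):
    """Convert HLTB seconds (int or numeric string) to hours; 0 or empty means no data."""
    secs = int(seconds) if seconds not in (None, "") else 0
    return round(secs / 3600, 1) if secs else None


def release_year(value):
    """Return the year from a search result (int) or a game page value ("YYYY-MM-DD")."""
    if not value:
        return None
    if isinstance(value, int):
        return value
    # Assumes the string starts with a four-digit year
    return int(str(value)[:4])


print(to_hours(185760))            # 51.6
print(to_hours("7200"))            # 2.0 (string input, as in `individuality`)
print(to_hours(0))                 # None
print(release_year(2015))          # 2015
print(release_year("2015-05-19"))  # 2015
```

Returning `None` for a zero value keeps "no submissions" distinct from a real duration when the results are later averaged or sorted.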
433
+
434
+ ---
435
+
436
+ ## Anti-bot measures
437
+
438
+ - **Cloudflare** is present (confirmed by `CF-Ray` response header), but does not block plain HTTP with a browser UA.
439
+ - **Token system**: Every search requires a valid token from `/api/find/init`; it does not have to be fetched per request (see Token reuse below). The token encodes `timestamp::IP|UA|hpKey|hmacHash`, and the server validates that the UA used to fetch the token matches the UA used in the search POST.
440
+ - **Honeypot field**: `hpKey` and `hpVal` from the init response must appear as a top-level field in the POST body (e.g., `{"ign_7671546b": "a6679ea54598d502", ...}`). The key name rotates per request.
441
+ - **Required headers on search POST**: `Origin: https://howlongtobeat.com` AND `Referer: https://howlongtobeat.com/` — missing either causes HTTP 403 or 404. `x-auth-token`, `x-hp-key`, `x-hp-val` are also required.
442
+ - **Required header on init GET**: `Referer: https://howlongtobeat.com/` — missing causes HTTP 403.
443
+ - **Token reuse**: A single token works for multiple searches and multiple pages. No per-request token fetch needed.
444
+ - **No CAPTCHA** observed during testing with standard UA strings.
445
+ - **Rate limits**: Not triggered during testing (token fetches + 10+ searches sequentially). Fetching many game pages in parallel (5 workers) worked without 429s.
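The header and body requirements listed above can be gathered in one place so that every search call sends the same shape of request. This is a hedged sketch rather than the package's implementation: `build_search_request` is a hypothetical helper, it only constructs the request without sending it, and the token values in the usage lines are placeholders taken from the example in this document.

```python
import json
import urllib.request

BASE = "https://howlongtobeat.com"


def build_search_request(tok, title, page=1, size=20, ua="Mozilla/5.0"):
    """Build (but do not send) the /api/find POST from an init response dict."""
    hp_key, hp_val = tok["hpKey"], tok["hpVal"]
    payload = {
        "searchType": "games",
        "searchTerms": title.split(),
        "searchPage": page,
        "size": size,
        "searchOptions": {
            "games": {
                "userId": 0, "platform": "", "sortCategory": "popular",
                "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
                "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
                "rangeYear": {"min": "", "max": ""}, "modifier": "",
            },
            "users": {"sortCategory": "postcount"},
            "lists": {"sortCategory": "follows"},
            "filter": "", "sort": 0, "randomizer": 0,
        },
        "useCache": True,
        hp_key: hp_val,  # honeypot field: key name comes from the init response
    }
    headers = {
        "User-Agent": ua,  # must match the UA that fetched the token
        "Content-Type": "application/json",
        "Origin": BASE,
        "Referer": f"{BASE}/",
        "x-auth-token": tok["token"],
        "x-hp-key": hp_key,
        "x-hp-val": hp_val,
    }
    return urllib.request.Request(
        f"{BASE}/api/find", data=json.dumps(payload).encode(),
        headers=headers, method="POST",
    )


# Placeholder token values (the key/value pair is the example shown above)
req = build_search_request(
    {"token": "example-token", "hpKey": "ign_7671546b", "hpVal": "a6679ea54598d502"},
    "elden ring",
)
print(req.get_method(), req.full_url)  # POST https://howlongtobeat.com/api/find
```

Passing the result to `urllib.request.urlopen(req, timeout=20)` would send it; keeping construction separate makes the honeypot and header handling easy to inspect without touching the network.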
446
+
447
+ ---
448
+
449
+ ## Gotchas
450
+
451
+ - **Completion times are in seconds** — all `comp_*` fields are integer seconds. Divide by 3600 for hours. `0` means no data (not 0 hours).
452
+
453
+ - **`release_world` is a year int in search, a full date in game page** — in the `/api/find` response, `release_world` is an integer year (e.g., `2015`). In `__NEXT_DATA__` on the game page, it's `"2015-05-19"`.
454
+
455
+ - **UA fingerprinting** — the token from `/api/find/init` encodes the User-Agent. The search POST must use the identical UA that fetched the token, or you'll get HTTP 403. Since `http_get` sends `Mozilla/5.0`, use that same string for the search POST.
456
+
457
+ - **Honeypot key name rotates** — `hpKey` is something like `ign_7671546b` (changes each token fetch). Always read it from the init response and use it dynamically. Never hardcode it.
458
+
459
+ - **Both `x-hp-key`/`x-hp-val` headers AND the body field are required** — the server checks the request headers (`x-hp-key`, `x-hp-val`) against the dynamic key in the POST body. If either is wrong or missing, you get HTTP 404 (wrong body value) or HTTP 403 (missing/wrong header).
460
+
461
+ - **`game_type` in search results** — can be `"game"`, `"dlc"`, `"expansion"`, or `"hack"`. Search results mix these by default. Filter with `if g['game_type'] == 'game'` if you only want base games.
462
+
463
+ - **Games with no submission data** — `comp_main`, `comp_plus`, `comp_100` are `0` (not `None`) when no users have submitted times. Always check `if g['comp_main']:` before dividing.
464
+
465
+ - **`individuality` (per-platform) data** — available only in `__NEXT_DATA__` on the game page, not in search results. `comp_main` etc. are strings, not ints, in this sub-object — cast with `int(plat['comp_main'])`.
466
+
467
+ - **`profile_platform` in search** — a comma-separated string that HLTB displays. Not structured. Use game page `individuality` for per-platform time breakdowns.
468
+
469
+ - **Token expiry** — if a long-running loop gets HTTP 403 with `{"error":"Session expired or invalid fingerprint"}`, call `get_token()` again and retry. Token lifetime appears to be ~15 minutes based on the timestamp embedded in the decoded value.
470
+
471
+ - **No slug-based URLs** — HLTB uses integer `game_id` for all game pages, not slugs. There is no `title-to-slug` mapping; use search to find the `game_id` first.
472
+
473
+ - **`sortCategory` options** — `"popular"` ranks by community engagement (best for "top result = intended game"). `"name"` sorts alphabetically. Other values (`"madnessTime"`, `"mainThenExtras"`) exist but returned the same results as `"name"` in testing.
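The token-expiry gotcha above suggests re-fetching the token and retrying on HTTP 403. A minimal sketch of that pattern follows, assuming the search call raises `urllib.error.HTTPError` the way `urllib.request.urlopen` does. `search_with_refresh` and its parameters are hypothetical names introduced for illustration.

```python
import urllib.error


def search_with_refresh(do_search, fetch_token, token=None, max_refreshes=1):
    """Run do_search(token); on HTTP 403, fetch a new token and retry.

    Returns (result, token) so the caller can keep reusing the latest token.
    Errors other than 403, or a 403 after max_refreshes, are re-raised.
    """
    if token is None:
        token = fetch_token()
    for attempt in range(max_refreshes + 1):
        try:
            return do_search(token), token
        except urllib.error.HTTPError as err:
            if err.code != 403 or attempt == max_refreshes:
                raise
            token = fetch_token()  # expired or invalid fingerprint: refresh once
```

With the functions defined earlier in this document, a call might look like `result, tok = search_with_refresh(lambda t: search_hltb("hades", token_data=t), get_token, token=tok)`. Capping the number of refreshes avoids looping forever when the 403 comes from a missing header rather than an expired token.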