@pencil-agent/nano-pencil 2.0.0-beta.9 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (207)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/extensions-host/index.d.ts +1 -1
  7. package/dist/core/extensions-host/types.d.ts +5 -8
  8. package/dist/extensions/builtin/AGENT.md +115 -115
  9. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  10. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  11. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  12. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  13. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  14. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  15. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  16. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  17. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  18. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  19. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  20. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  91. package/dist/extensions/builtin/browser/browser.md +73 -73
  92. package/dist/extensions/builtin/browser/install.md +142 -142
  93. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  94. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  95. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  96. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  97. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  98. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  99. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  100. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  101. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  102. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  103. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  104. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  105. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  107. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  108. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  109. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  110. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  111. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  112. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  113. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  114. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  115. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  116. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  117. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  118. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  119. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  120. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  121. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  122. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  123. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  124. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  125. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  126. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  127. package/dist/extensions/builtin/goal/README.md +67 -67
  128. package/dist/extensions/builtin/goal/goal-controller.js +1 -1
  129. package/dist/extensions/builtin/goal/goal-prompts.js +4 -4
  130. package/dist/extensions/builtin/grub/README.md +112 -112
  131. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  132. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  133. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  134. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  135. package/dist/extensions/builtin/link-world/network-routing/network-routing.md +67 -67
  136. package/dist/extensions/builtin/loop/README.md +92 -92
  137. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  138. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  139. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  140. package/dist/extensions/builtin/sal/README.md +72 -72
  141. package/dist/extensions/builtin/security-audit/README.md +289 -289
  142. package/dist/extensions/builtin/team/AGENT.md +112 -112
  143. package/dist/extensions/builtin/team/TESTING.md +299 -299
  144. package/dist/extensions/builtin/token-save/README.md +56 -56
  145. package/dist/extensions/optional/AGENT.md +10 -10
  146. package/dist/index.d.ts +5 -30
  147. package/dist/index.js +1 -1
  148. package/dist/models.d.ts +7 -0
  149. package/dist/models.js +1 -0
  150. package/dist/modes/interactive/theme/dark.json +85 -85
  151. package/dist/modes/interactive/theme/light.json +84 -84
  152. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  153. package/dist/modes/interactive/theme/warm.json +81 -81
  154. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  155. package/dist/packages/protocol/src/flags.d.ts +20 -0
  156. package/dist/packages/protocol/src/flags.js +0 -0
  157. package/dist/packages/protocol/src/hooks.d.ts +17 -0
  158. package/dist/packages/protocol/src/hooks.js +0 -0
  159. package/dist/packages/protocol/src/index.d.ts +4 -2
  160. package/dist/packages/protocol/src/index.js +1 -1
  161. package/dist/packages/protocol/src/lifecycle.d.ts +11 -21
  162. package/dist/public-config.d.ts +12 -0
  163. package/dist/public-config.js +1 -0
  164. package/dist/runtime.d.ts +9 -0
  165. package/dist/runtime.js +1 -0
  166. package/dist/session-compaction.d.ts +7 -0
  167. package/dist/session-compaction.js +1 -0
  168. package/dist/session.d.ts +7 -0
  169. package/dist/session.js +1 -0
  170. package/dist/skills.d.ts +7 -0
  171. package/dist/skills.js +1 -0
  172. package/dist/tools.d.ts +7 -0
  173. package/dist/tools.js +1 -0
  174. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +851 -0
  175. package/docs/SDK-TESTING.md +364 -0
  176. package/docs/codex-goal-command-impl.md +1055 -1055
  177. package/docs/codex-goal-vs-grub.md +500 -500
  178. package/docs/custom-provider.md +27 -27
  179. package/docs/extensions.md +27 -27
  180. package/docs/keybindings.md +27 -27
  181. package/docs/loop 重构完成总结.md +250 -250 (loop refactor completion summary)
  182. package/docs/loop 重构完成报告.md +122 -122 (loop refactor completion report)
  183. package/docs/loop 重构方案.md +1222 -1222 (loop refactor proposal)
  184. package/docs/loop 重构方案实现报告.md +158 -158 (loop refactor proposal: implementation report)
  185. package/docs/loop 重构方案对比分析.md +128 -128 (loop refactor proposal: comparative analysis)
  186. package/docs/loop 重构计划.md +320 -320 (loop refactor plan)
  187. package/docs/loop-usage-examples.md +214 -214
  188. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +593 -0
  189. package/docs/models.md +27 -27
  190. package/docs/packages.md +27 -27
  191. package/docs/pi-design-philosophy.md +457 -457
  192. package/docs/planmode.md +1987 -1987
  193. package/docs/prompt-templates.md +27 -27
  194. package/docs/providers.md +27 -27
  195. package/docs/sdk.md +27 -27
  196. package/docs/skills.md +27 -27
  197. package/docs/startup-performance-optimization.md +301 -0
  198. package/docs/themes.md +27 -27
  199. package/docs/tui.md +27 -27
  200. package/docs/认知地图.md +47 -0 (cognitive map)
  201. package/package.json +190 -162
  202. package/docs/cc-agent-design.md +0 -1297
  203. package/docs/cc-tui-design.md +0 -1333
  204. package/docs/nanoPencil-学习计划.md +0 -170 (nanoPencil learning plan)
  205. package/docs/scan-report.md +0 -3820
  206. package/docs//345/257/271/346/240/207Claude-Code.md +0 -1775
  207. package/docs//351/230/277/351/207/214/345/267/264/345/267/264/350/264/242/346/212/245/345/210/206/346/236/220/344/271/246.md +0 -261
@@ -1,473 +1,473 @@
# HowLongToBeat: Scraping & Data Extraction

Field-tested against howlongtobeat.com on 2026-04-18. All code blocks were validated with live requests.

## Do this first

**Use the search API: it returns structured JSON with all completion times in a single POST call.**

HLTB runs a token-gated POST endpoint at `/api/find`. First fetch a session token from `/api/find/init`, then include it in the search request. Both steps are plain HTTP; no browser is required.

```python
import json, time, urllib.request
from helpers import http_get

UA = "Mozilla/5.0"

def get_token():
    """Fetch a fresh session token. Token encodes IP+UA+timestamp, reusable for ~15 min."""
    url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
    data = http_get(url, headers={"Referer": "https://howlongtobeat.com/"})
    return json.loads(data)  # {token, hpKey, hpVal}

def search_hltb(title, size=20, page=1, token_data=None):
    """
    Search HLTB for games. Returns the raw API dict:
      {count, pageCurrent, pageTotal, pageSize, data: [...]}
    token_data can be reused across searches (fetch once, use many times).
    """
    if token_data is None:
        token_data = get_token()
    hp_key, hp_val = token_data['hpKey'], token_data['hpVal']
    payload = {
        "searchType": "games",
        "searchTerms": title.split(),
        "searchPage": page,
        "size": size,
        "searchOptions": {
            "games": {
                "userId": 0, "platform": "", "sortCategory": "popular",
                "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
                "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
                "rangeYear": {"min": "", "max": ""}, "modifier": ""
            },
            "users": {"sortCategory": "postcount"},
            "lists": {"sortCategory": "follows"},
            "filter": "", "sort": 0, "randomizer": 0
        },
        "useCache": True,
        hp_key: hp_val,  # honeypot field: key and value vary per token
    }
    req = urllib.request.Request(
        "https://howlongtobeat.com/api/find",
        data=json.dumps(payload).encode(),
        headers={
            "User-Agent": UA,
            "Content-Type": "application/json",
            "Origin": "https://howlongtobeat.com",
            "Referer": "https://howlongtobeat.com/",
            "x-auth-token": token_data['token'],
            "x-hp-key": hp_key,
            "x-hp-val": hp_val,
        },
        method="POST"
    )
    with urllib.request.urlopen(req, timeout=20) as r:
        return json.loads(r.read().decode())

# Usage
tok = get_token()

result = search_hltb("elden ring", token_data=tok, size=3)
for g in result['data']:
    print(g['game_id'], g['game_name'], g['release_world'])
    print(f" Main: {g['comp_main']/3600:.1f}h +Extras: {g['comp_plus']/3600:.1f}h 100%: {g['comp_100']/3600:.1f}h")

# Confirmed output (2026-04-18):
# 68151 Elden Ring 2022
#  Main: 60.0h +Extras: 101.2h 100%: 135.5h
# 160589 Elden Ring: Nightreign 2025
#  Main: 28.1h +Extras: 40.1h 100%: 66.9h
# 139385 Elden Ring: Shadow of the Erdtree 2024
#  Main: 25.7h +Extras: 39.0h 100%: 51.1h
```
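`search_hltb()` accepts a pre-fetched `token_data`. The `get_token()` docstring says a token stays reusable for roughly 15 minutes, so for long-running jobs one option is a small time-based cache that refreshes it automatically. A minimal sketch, assuming that 15-minute lifetime holds (it comes from the docstring, not from any HLTB documentation); `TokenCache` and its `ttl`, `margin`, and `clock` parameters are hypothetical names, and the fetch function is injected so you can pass `get_token`:

```python
import time
from typing import Any, Callable, Dict, Optional

class TokenCache:
    """Hypothetical helper: caches an HLTB init token for an assumed lifetime."""

    def __init__(self, fetch: Callable[[], Dict[str, Any]],
                 ttl: float = 15 * 60,   # assumed ~15 min token lifetime
                 margin: float = 60,     # refresh this many seconds before expiry
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._margin = margin
        self._clock = clock
        self._token: Optional[Dict[str, Any]] = None
        self._fetched_at = 0.0

    def get(self) -> Dict[str, Any]:
        """Return the cached token, fetching a new one if missing or near expiry."""
        now = self._clock()
        expired = now - self._fetched_at >= self._ttl - self._margin
        if self._token is None or expired:
            self._token = self._fetch()
            self._fetched_at = now
        return self._token

    def invalidate(self) -> None:
        """Drop the cached token, e.g. after the API rejects it."""
        self._token = None

# Usage with the helpers above (requires network access):
# cache = TokenCache(get_token)
# result = search_hltb("elden ring", token_data=cache.get(), size=3)
```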
Token is reusable: fetch it once and pass it to multiple `search_hltb()` calls. There is no need to re-fetch it for each search.

---

## Fastest approach: search + parse in one helper

```python
import json, time, urllib.request
from helpers import http_get

UA = "Mozilla/5.0"

def hltb_search(title, size=5):
    """One-shot: get token + search, return list of dicts with hours."""
    url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
    tok = json.loads(http_get(url, headers={"Referer": "https://howlongtobeat.com/"}))
    hp_key, hp_val = tok['hpKey'], tok['hpVal']
    payload = {
        "searchType": "games", "searchTerms": title.split(), "searchPage": 1, "size": size,
        "searchOptions": {
            "games": {"userId": 0, "platform": "", "sortCategory": "popular",
                      "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
                      "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
                      "rangeYear": {"min": "", "max": ""}, "modifier": ""},
            "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"},
            "filter": "", "sort": 0, "randomizer": 0
        },
        "useCache": True, hp_key: hp_val
    }
    req = urllib.request.Request(
        "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(),
        headers={"User-Agent": UA, "Content-Type": "application/json",
                 "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/",
                 "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val},
        method="POST"
    )
    with urllib.request.urlopen(req, timeout=20) as r:
        data = json.loads(r.read().decode())

    def h(secs):
        return round(secs / 3600, 1) if secs else None

    return [
        {
            "game_id": g["game_id"],
            "name": g["game_name"],
            "type": g["game_type"],              # "game" | "dlc" | "expansion" | "hack"
            "year": g["release_world"],
            "platforms": g["profile_platform"],
            "main": h(g["comp_main"]),           # Main Story hours (polled average)
            "main_plus": h(g["comp_plus"]),      # Main + Extras hours
            "completionist": h(g["comp_100"]),   # Completionist hours
            "all_styles": h(g["comp_all"]),      # All playstyles combined
            "main_count": g["comp_main_count"],  # Number of submissions
            "plus_count": g["comp_plus_count"],
            "comp_count": g["comp_100_count"],
            "review_score": g["review_score"],   # 0–100
            "image_url": f"https://howlongtobeat.com/games/{g['game_image']}",
            "page_url": f"https://howlongtobeat.com/game/{g['game_id']}",
        }
        for g in data["data"]
    ]

# Verified results (2026-04-18):
print(hltb_search("the witcher 3")[0])
# {'game_id': 10270, 'name': 'The Witcher 3: Wild Hunt', 'type': 'game', 'year': 2015,
#  'main': 51.6, 'main_plus': 103.8, 'completionist': 174.4, 'all_styles': 103.8,
#  'main_count': 2681, 'plus_count': 6708, 'comp_count': 2327, 'review_score': 93, ...}

print(hltb_search("gone home")[0])
# {'game_id': 4010, 'name': 'Gone Home', 'main': 2.0, 'main_plus': 2.5, 'completionist': 3.1, ...}
```
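Every helper in this file rebuilds the same request body and headers inline. If you keep several of them, one option is to build both in one place so a change to the payload shape needs only one edit. A sketch that simply reproduces the payload and headers shown above (it assumes they are still what `/api/find` expects); `build_search_payload` and `search_headers` are hypothetical names, not part of any HLTB API:

```python
from typing import Any, Dict

def build_search_payload(title: str, hp_key: str, hp_val: str,
                         page: int = 1, size: int = 20) -> Dict[str, Any]:
    """Hypothetical helper: the /api/find request body used throughout this file."""
    payload: Dict[str, Any] = {
        "searchType": "games",
        "searchTerms": title.split(),  # whitespace-split search words
        "searchPage": page,
        "size": size,
        "searchOptions": {
            "games": {
                "userId": 0, "platform": "", "sortCategory": "popular",
                "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
                "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
                "rangeYear": {"min": "", "max": ""}, "modifier": "",
            },
            "users": {"sortCategory": "postcount"},
            "lists": {"sortCategory": "follows"},
            "filter": "", "sort": 0, "randomizer": 0,
        },
        "useCache": True,
    }
    payload[hp_key] = hp_val  # honeypot field: key and value come from the init token
    return payload

def search_headers(tok: Dict[str, str], ua: str = "Mozilla/5.0") -> Dict[str, str]:
    """Hypothetical helper: the headers sent alongside the payload."""
    return {
        "User-Agent": ua,
        "Content-Type": "application/json",
        "Origin": "https://howlongtobeat.com",
        "Referer": "https://howlongtobeat.com/",
        "x-auth-token": tok["token"],
        "x-hp-key": tok["hpKey"],
        "x-hp-val": tok["hpVal"],
    }
```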
---

## Game detail page (full stat breakdown, speedrun data, per-platform times)

When you have a `game_id`, fetch the game page and extract `__NEXT_DATA__` for the complete dataset, which includes median/average/low/high times, speedrun data, co-op/multiplayer times, and per-platform breakdowns.

```python
import json, re
from helpers import http_get

def get_game_detail(game_id):
    """
    Fetch complete game data from the HLTB game page.
    Returns pageProps['game']['data'] with keys: 'game', 'individuality', 'relationships'.
    """
    html = http_get(f"https://howlongtobeat.com/game/{game_id}")
    m = re.search(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL)
    if m is None:
        raise ValueError(f"__NEXT_DATA__ not found on game page {game_id}")
    nd = json.loads(m.group(1))
    return nd['props']['pageProps']['game']['data']

def hrs(secs):
    """Seconds to hours, rounded to one decimal."""
    return round(secs / 3600, 1)

data = get_game_detail(10270)  # Witcher 3
g = data['game'][0]

# Core completion times (all stored in seconds; hrs() converts to hours)
print(hrs(g['comp_main']))       # 51.6   Main Story (polled avg)
print(hrs(g['comp_main_med']))   # 50.0   Main Story median
print(hrs(g['comp_main_l']))     # 32.7   Main Story low
print(hrs(g['comp_main_h']))     # 85.8   Main Story high
print(g['comp_main_count'])      # 2681   submission count

print(hrs(g['comp_plus']))       # 103.8  Main + Extras
print(hrs(g['comp_100']))        # 174.4  Completionist
print(hrs(g['comp_all']))        # 103.8  All Styles

# Speedrun times
print(g['comp_lvl_spd'])         # 1 if speedrun data exists, 0 if not
print(hrs(g['comp_speed']))      # 19.2   any% (polled avg)
print(hrs(g['comp_speed_min']))  # 3.2    fastest submission
print(hrs(g['comp_speed_max']))  # 30.0   slowest speedrun
print(g['comp_speed_count'])     # 15     speedrun submissions

print(hrs(g['comp_speed100']))   # 59.4   100% speedrun
print(g['comp_speed100_count'])  # 4

# Multiplayer / co-op invested time
print(g['comp_lvl_co'])          # 1 if co-op data exists
print(g['comp_lvl_mp'])          # 1 if multiplayer data exists
print(hrs(g['invested_co']))     # hours in co-op mode
print(hrs(g['invested_mp']))     # hours in competitive multiplayer
print(g['invested_co_count'])    # submission count

# Metadata
print(g['profile_dev'])          # "CD Projekt RED"
print(g['profile_pub'])          # "CD Projekt, Warner Bros..."
print(g['profile_platform'])     # "Nintendo Switch, PC, PlayStation 4, ..."
print(g['profile_genre'])        # "Third-Person, Action, Open World, Role-Playing"
print(g['profile_steam'])        # 292030 (Steam App ID; 0 if not on Steam)
print(g['release_world'])        # "2015-05-19"
print(g['rating_esrb'])          # "M"
print(g['review_score'])         # 93 (0–100)
print(g['count_comp'])           # 26007 (times completed)
print(g['count_backlog'])        # 31083

# Per-platform breakdown (individuality)
for plat in data['individuality']:
    print(plat['platform'],
          f"{int(plat['comp_main'])/3600:.1f}h",  # main hours
          f"{int(plat['comp_plus'])/3600:.1f}h",  # +extras hours
          f"{int(plat['comp_100'])/3600:.1f}h",   # 100% hours
          plat['count_comp'])                     # completions on this platform
# Example:
# Nintendo Switch 57.0h 112.3h 194.9h 236
# PC, PS4, Xbox One 52.9h 110.0h 179.4h 11136
# PS5, Xbox Series X/S 52.1h 92.5h 168.8h 343

# DLC / expansion completion times
for rel in data['relationships'][:3]:
    print(rel['game_id'], rel['game_name'], rel['game_type'],
          hrs(rel['comp_main']) if rel['comp_main'] else None)
```
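`get_game_detail()` depends on the page embedding a `<script id="__NEXT_DATA__">` tag. Splitting the parsing from the HTTP call lets you test it against saved HTML, and a tolerant converter avoids crashes on missing per-platform values. A sketch under those assumptions; `parse_next_data`, `to_hours`, and `platform_rows` are hypothetical names, and treating `individuality` values as possibly-string seconds is inferred from the `int()` calls above rather than from any documentation:

```python
import json
import re
from typing import Any, Dict, List, Optional

NEXT_DATA_RE = re.compile(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL)

def parse_next_data(html: str) -> Dict[str, Any]:
    """Return the parsed __NEXT_DATA__ JSON embedded in a Next.js page."""
    m = NEXT_DATA_RE.search(html)
    if m is None:
        raise ValueError("no __NEXT_DATA__ script tag found")
    return json.loads(m.group(1))

def to_hours(value: Any) -> Optional[float]:
    """Seconds (number or numeric string) to hours; None when missing, invalid, or zero."""
    try:
        secs = float(value)
    except (TypeError, ValueError):
        return None
    return round(secs / 3600, 1) if secs else None

def platform_rows(detail: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten detail['individuality'] (as returned by get_game_detail) into hours."""
    return [
        {
            "platform": p.get("platform"),
            "main": to_hours(p.get("comp_main")),
            "main_plus": to_hours(p.get("comp_plus")),
            "completionist": to_hours(p.get("comp_100")),
            "completions": p.get("count_comp"),
        }
        for p in detail.get("individuality", [])
    ]
```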
---

## Common workflows

### Quick lookup: name → completion times

```python
import json, time, urllib.request
from helpers import http_get

UA = "Mozilla/5.0"

def get_times(title):
    """Return Main/+Extras/100% hours for the top search match."""
    tok_url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
    tok = json.loads(http_get(tok_url, headers={"Referer": "https://howlongtobeat.com/"}))
    hp_key, hp_val = tok['hpKey'], tok['hpVal']
    payload = {
        "searchType": "games", "searchTerms": title.split(), "searchPage": 1, "size": 1,
        "searchOptions": {
            "games": {"userId": 0, "platform": "", "sortCategory": "popular",
                      "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
                      "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
                      "rangeYear": {"min": "", "max": ""}, "modifier": ""},
            "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"},
            "filter": "", "sort": 0, "randomizer": 0
        },
        "useCache": True, hp_key: hp_val
    }
    req = urllib.request.Request(
        "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(),
        headers={"User-Agent": UA, "Content-Type": "application/json",
                 "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/",
                 "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val},
        method="POST"
    )
    with urllib.request.urlopen(req, timeout=20) as r:
        data = json.loads(r.read().decode())
    if not data['data']:
        return None
    g = data['data'][0]

    def h(secs):
        return round(secs / 3600, 1) if secs else None

    return {
        "id": g['game_id'], "name": g['game_name'],
        "main": h(g['comp_main']), "main_plus": h(g['comp_plus']),
        "completionist": h(g['comp_100'])
    }

# Verified:
print(get_times("celeste"))
# {'id': 42818, 'name': 'Celeste', 'main': 8.3, 'main_plus': 14.6, 'completionist': 39.2}
print(get_times("stardew valley"))
# {'id': 34716, 'name': 'Stardew Valley', 'main': 53.4, 'main_plus': 94.6, 'completionist': 171.5}
print(get_times("hades"))
# {'id': 62941, 'name': 'Hades', 'main': 23.4, 'main_plus': 48.5, 'completionist': 95.0}
```
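`get_times()` trusts the first result of a popularity-sorted search (`"size": 1`), which is not guaranteed to be the exact title you typed. If precision matters, one option is to request a handful of results and pick the closest name. A sketch using the standard library's `difflib.SequenceMatcher`; `best_match` is a hypothetical helper, and string similarity is an assumed heuristic, not a ranking HLTB provides:

```python
import difflib
import re
from typing import Any, Dict, List, Optional

def _normalize(name: str) -> str:
    """Lowercase and drop non-alphanumeric ASCII characters before comparing."""
    return re.sub(r"[^a-z0-9 ]+", "", name.lower()).strip()

def best_match(title: str, results: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: the result whose game_name is most similar to title."""
    if not results:
        return None
    target = _normalize(title)
    return max(
        results,
        key=lambda g: difflib.SequenceMatcher(None, target, _normalize(g["game_name"])).ratio(),
    )

# Usage with search_hltb() from the first section (requires network access):
# top = best_match("hades", search_hltb("hades", size=5)["data"])
```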
### Paginated search (all results for a query)

`count` is the total number of matches and `pageTotal` is the number of pages at the current `size`. The same token works for every page.

```python
import json, time, urllib.request
from helpers import http_get

UA = "Mozilla/5.0"

def search_all_pages(title, size=20):
    """Yield every search result for a query across all pages."""
    tok_url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
    tok = json.loads(http_get(tok_url, headers={"Referer": "https://howlongtobeat.com/"}))
    hp_key, hp_val = tok['hpKey'], tok['hpVal']

    page = 1
    while True:
        payload = {
            "searchType": "games", "searchTerms": title.split(),
            "searchPage": page, "size": size,
            "searchOptions": {
                "games": {"userId": 0, "platform": "", "sortCategory": "popular",
                          "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
                          "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
                          "rangeYear": {"min": "", "max": ""}, "modifier": ""},
                "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"},
                "filter": "", "sort": 0, "randomizer": 0
            },
            "useCache": True, hp_key: hp_val
        }
        req = urllib.request.Request(
            "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(),
            headers={"User-Agent": UA, "Content-Type": "application/json",
                     "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/",
                     "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val},
            method="POST"
        )
        with urllib.request.urlopen(req, timeout=20) as r:
            data = json.loads(r.read().decode())
        yield from data['data']
        if page >= data['pageTotal']:
            break
        page += 1

# "mario" returns 308 results across 16 pages (size=20)
mario_games = list(search_all_pages("mario", size=20))
print(len(mario_games))  # 308
```
341
-
342
- ### Batch lookup by game ID (parallel)
343
-
344
- ```python
345
- import json, re, urllib.request
346
- from concurrent.futures import ThreadPoolExecutor
347
- from helpers import http_get
348
-
349
- def fetch_game(game_id):
350
- html = http_get(f"https://howlongtobeat.com/game/{game_id}")
351
- nd = json.loads(re.search(
352
- r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL
353
- ).group(1))
354
- g = nd['props']['pageProps']['game']['data']['game'][0]
355
- return {
356
- "id": g['game_id'], "name": g['game_name'],
357
- "main": round(g['comp_main']/3600, 1) if g['comp_main'] else None,
358
- "main_plus": round(g['comp_plus']/3600, 1) if g['comp_plus'] else None,
359
- "completionist": round(g['comp_100']/3600, 1) if g['comp_100'] else None,
360
- }
361
-
362
- ids = [10270, 68151, 42818, 26803, 34716] # Witcher3, Elden Ring, Celeste, DS3, Stardew
363
- with ThreadPoolExecutor(max_workers=5) as ex:
364
- results = list(ex.map(fetch_game, ids))
365
-
366
- for r in results:
367
- print(f"[{r['id']}] {r['name']}: {r['main']}h / {r['main_plus']}h / {r['completionist']}h")
368
-
369
- # Confirmed output:
370
- # [10270] The Witcher 3: Wild Hunt: 51.6h / 103.8h / 174.4h
371
- # [68151] Elden Ring: 60.0h / 101.2h / 135.5h
372
- # [42818] Celeste: 8.3h / 14.6h / 39.2h
373
- # [26803] Dark Souls III: 31.2h / 48.4h / 100.5h
374
- # [34716] Stardew Valley: 53.4h / 94.6h / 171.5h
375
- ```
376
-
377
- ---
378
-
379
- ## Search response field reference
380
-
381
- Every item in `data[]` from `/api/find`:
382
-
383
- | Field | Type | Description |
384
- |-------|------|-------------|
385
- | `game_id` | int | HLTB internal game ID |
386
- | `game_name` | str | Full game title |
387
- | `game_alias` | str | Alternate title / edition name |
388
- | `game_type` | str | `"game"` \| `"dlc"` \| `"expansion"` \| `"hack"` |
389
- | `game_image` | str | Image filename → `https://howlongtobeat.com/games/{game_image}` |
390
- | `release_world` | int | Release year (just the year integer, not a date) |
391
- | `profile_platform` | str | Comma-separated platform list |
392
- | `comp_main` | int | Main Story seconds (polled average), 0 if no data |
393
- | `comp_plus` | int | Main + Extras seconds |
394
- | `comp_100` | int | Completionist seconds |
395
- | `comp_all` | int | All Styles combined seconds |
396
- | `comp_main_count` | int | Submission count for Main Story |
397
- | `comp_plus_count` | int | Submission count for Main + Extras |
398
- | `comp_100_count` | int | Submission count for Completionist |
399
- | `comp_all_count` | int | Total submissions across all categories |
400
- | `comp_lvl_sp` | int | 1 if single-player data exists |
401
- | `comp_lvl_co` | int | 1 if co-op data exists |
402
- | `comp_lvl_mp` | int | 1 if multiplayer data exists |
403
- | `invested_co` | int | Average co-op time in seconds |
404
- | `invested_mp` | int | Average multiplayer time in seconds |
405
- | `count_comp` | int | Total completions logged |
406
- | `count_backlog` | int | Users with game in backlog |
407
- | `count_playing` | int | Currently playing |
408
- | `count_speedrun` | int | Speedrun entries |
409
- | `count_review` | int | Review count |
410
- | `review_score` | int | Community review score 0–100 |
411
- | `profile_popular` | int | Popularity rank |
412
-
413
- Additional fields in `__NEXT_DATA__` game page only:
414
-
415
- | Field | Description |
416
- |-------|-------------|
417
- | `comp_main_med/avg/l/h` | Median / average / low / high for main time |
418
- | `comp_plus_med/avg/l/h` | Same for Main + Extras |
419
- | `comp_100_med/avg/l/h` | Same for Completionist |
420
- | `comp_speed` | Speedrun any% average seconds |
421
- | `comp_speed_min/max/med` | Speedrun spread |
422
- | `comp_speed100` | 100% speedrun average |
423
- | `comp_speed_count` | Speedrun submission count |
424
- | `comp_lvl_spd` | 1 if speedrun data exists |
425
- | `profile_dev` | Developer name |
426
- | `profile_pub` | Publisher name |
427
- | `profile_genre` | Comma-separated genres |
428
- | `profile_steam` | Steam App ID (0 if not on Steam) |
429
- | `release_world` | Full release date `"YYYY-MM-DD"` |
430
- | `rating_esrb` | ESRB rating string (may be empty) |
431
- | `count_replay` | Times replayed |
432
- | `count_total` | Total user entries |
433
-
434
- ---
435
-
436
- ## Anti-bot measures
437
-
438
- - **Cloudflare** is present (confirmed by `CF-Ray` response header), but does not block plain HTTP with a browser UA.
439
- - **Token system**: Every search requires a fresh token from `/api/find/init`. Token encodes `timestamp::IP|UA|hpKey|hmacHash`. The server validates that the UA used to fetch the token matches the UA used in the search POST.
440
- - **Honeypot field**: `hpKey` and `hpVal` from the init response must appear as a top-level field in the POST body (e.g., `{"ign_7671546b": "a6679ea54598d502", ...}`). The key name rotates per request.
441
- - **Required headers on search POST**: `Origin: https://howlongtobeat.com` AND `Referer: https://howlongtobeat.com/` — missing either causes HTTP 403 or 404. `x-auth-token`, `x-hp-key`, `x-hp-val` are also required.
442
- - **Required header on init GET**: `Referer: https://howlongtobeat.com/` — missing causes HTTP 403.
443
- - **Token reuse**: A single token works for multiple searches and multiple pages. No per-request token fetch needed.
444
- - **No CAPTCHA** observed during testing with standard UA strings.
445
- - **Rate limits**: Not triggered during testing (token fetches + 10+ searches sequentially). Fetching many game pages in parallel (5 workers) worked without 429s.
446
-
447
- ---
448
-
449
- ## Gotchas
450
-
451
- - **Completion times are in seconds** — all `comp_*` fields are integer seconds. Divide by 3600 for hours. `0` means no data (not 0 hours).
452
-
453
- - **`release_world` is a year int in search, a full date in game page** — in the `/api/find` response, `release_world` is an integer year (e.g., `2015`). In `__NEXT_DATA__` on the game page, it's `"2015-05-19"`.
454
-
455
- - **UA fingerprinting** — the token from `/api/find/init` encodes the User-Agent. The search POST must use the identical UA that fetched the token, or you'll get HTTP 403. Since `http_get` sends `Mozilla/5.0`, use that same string for the search POST.
456
-
457
- - **Honeypot key name rotates** — `hpKey` is something like `ign_7671546b` (changes each token fetch). Always read it from the init response and use it dynamically. Never hardcode it.
458
-
459
- - **Both `x-hp-key`/`x-hp-val` headers AND the body field are required** — the server checks the request headers (`x-hp-key`, `x-hp-val`) against the dynamic key in the POST body. If either is wrong or missing, you get HTTP 404 (wrong body value) or HTTP 403 (missing/wrong header).
460
-
461
- - **`game_type` in search results** — can be `"game"`, `"dlc"`, `"expansion"`, or `"hack"`. Search results mix these by default. Filter with `if g['game_type'] == 'game'` if you only want base games.
462
-
463
- - **Games with no submission data** — `comp_main`, `comp_plus`, `comp_100` are `0` (not `None`) when no users have submitted times. Always check `if g['comp_main']:` before dividing.
464
-
465
- - **`individuality` (per-platform) data** — available only in `__NEXT_DATA__` on the game page, not in search results. `comp_main` etc. are strings, not ints, in this sub-object — cast with `int(plat['comp_main'])`.
466
-
467
- - **`profile_platform` in search** — a comma-separated string that HLTB displays. Not structured. Use game page `individuality` for per-platform time breakdowns.
468
-
469
- - **Token expiry** — if a long-running loop gets HTTP 403 with `{"error":"Session expired or invalid fingerprint"}`, call `get_token()` again and retry. Token lifetime appears to be ~15 minutes based on the timestamp embedded in the decoded value.
470
-
471
- - **No slug-based URLs** — HLTB uses integer `game_id` for all game pages, not slugs. There is no `title-to-slug` mapping; use search to find the `game_id` first.
472
-
473
- - **`sortCategory` options** — `"popular"` ranks by community engagement (best for "top result = intended game"). `"name"` sorts alphabetically. Other values (`"madnessTime"`, `"mainThenExtras"`) exist but return same results as `"name"` in testing.
# HowLongToBeat — Scraping & Data Extraction

Field-tested against howlongtobeat.com on 2026-04-18. All code blocks validated with live requests.

## Do this first

**Use the search API — it returns structured JSON with all completion times in one POST call.**

HLTB runs a token-gated POST endpoint at `/api/find`. You must first fetch a session token from `/api/find/init`, then include it in the search request. Both steps are plain HTTP — no browser required.

```python
import json, re, urllib.request, time
from helpers import http_get

UA = "Mozilla/5.0"

def get_token():
    """Fetch a fresh session token. Token encodes IP+UA+timestamp, reusable for ~15 min."""
    url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
    data = http_get(url, headers={"Referer": "https://howlongtobeat.com/"})
    return json.loads(data)  # {token, hpKey, hpVal}

def search_hltb(title, size=20, page=1, token_data=None):
    """
    Search HLTB for games. Returns raw API dict:
      {count, pageCurrent, pageTotal, pageSize, data: [...]}
    token_data can be reused across searches (fetch once, use many times).
    """
    if token_data is None:
        token_data = get_token()
    hp_key, hp_val = token_data['hpKey'], token_data['hpVal']
    payload = {
        "searchType": "games",
        "searchTerms": title.split(),
        "searchPage": page,
        "size": size,
        "searchOptions": {
            "games": {
                "userId": 0, "platform": "", "sortCategory": "popular",
                "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
                "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
                "rangeYear": {"min": "", "max": ""}, "modifier": ""
            },
            "users": {"sortCategory": "postcount"},
            "lists": {"sortCategory": "follows"},
            "filter": "", "sort": 0, "randomizer": 0
        },
        "useCache": True,
        hp_key: hp_val  # honeypot field: key and value vary per token
    }
    req = urllib.request.Request(
        "https://howlongtobeat.com/api/find",
        data=json.dumps(payload).encode(),
        headers={
            "User-Agent": UA,
            "Content-Type": "application/json",
            "Origin": "https://howlongtobeat.com",
            "Referer": "https://howlongtobeat.com/",
            "x-auth-token": token_data['token'],
            "x-hp-key": hp_key,
            "x-hp-val": hp_val,
        },
        method="POST"
    )
    with urllib.request.urlopen(req, timeout=20) as r:
        return json.loads(r.read().decode())

# Usage
tok = get_token()

result = search_hltb("elden ring", token_data=tok, size=3)
for g in result['data']:
    print(g['game_id'], g['game_name'], g['release_world'])
    print(f"  Main: {g['comp_main']/3600:.1f}h  +Extras: {g['comp_plus']/3600:.1f}h  100%: {g['comp_100']/3600:.1f}h")

# Confirmed output (2026-04-18):
# 68151 Elden Ring 2022
#   Main: 60.0h  +Extras: 101.2h  100%: 135.5h
# 160589 Elden Ring: Nightreign 2025
#   Main: 28.1h  +Extras: 40.1h  100%: 66.9h
# 139385 Elden Ring: Shadow of the Erdtree 2024
#   Main: 25.7h  +Extras: 39.0h  100%: 51.1h
```

The token is reusable: fetch it once and pass it to multiple `search_hltb()` calls instead of re-fetching it for every search.
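
Every search helper in this guide rebuilds the same payload dict inline. Here is a minimal sketch of factoring it into one function. It assumes the payload shape shown above stays stable. `build_payload` is a hypothetical name and is not part of `helpers`.

```python
import json


def build_payload(title, hp_key, hp_val, page=1, size=20, sort_category="popular"):
    """Build the /api/find POST body used throughout this guide (hypothetical helper).

    hp_key/hp_val come from the /api/find/init response and are added as a
    top-level honeypot field; the key name changes with each token.
    """
    payload = {
        "searchType": "games",
        "searchTerms": title.split(),
        "searchPage": page,
        "size": size,
        "searchOptions": {
            "games": {
                "userId": 0, "platform": "", "sortCategory": sort_category,
                "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
                "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
                "rangeYear": {"min": "", "max": ""}, "modifier": "",
            },
            "users": {"sortCategory": "postcount"},
            "lists": {"sortCategory": "follows"},
            "filter": "", "sort": 0, "randomizer": 0,
        },
        "useCache": True,
    }
    payload[hp_key] = hp_val  # dynamic honeypot key, read from the token, never hardcoded
    return payload


# Request body as bytes, ready for urllib.request.Request(data=...)
body = json.dumps(build_payload("elden ring", "ign_example", "abc123", size=3)).encode()
```

Each helper could then pass `json.dumps(build_payload(title, tok['hpKey'], tok['hpVal'], page, size)).encode()` as the request body.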

---

## Fastest approach: search + parse in one helper

```python
import json, re, urllib.request, time
from helpers import http_get

UA = "Mozilla/5.0"

def hltb_search(title, size=5):
    """One-shot: get token + search, return list of dicts with hours."""
    url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
    tok = json.loads(http_get(url, headers={"Referer": "https://howlongtobeat.com/"}))
    hp_key, hp_val = tok['hpKey'], tok['hpVal']
    payload = {
        "searchType": "games", "searchTerms": title.split(), "searchPage": 1, "size": size,
        "searchOptions": {
            "games": {"userId": 0, "platform": "", "sortCategory": "popular",
                      "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
                      "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
                      "rangeYear": {"min": "", "max": ""}, "modifier": ""},
            "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"},
            "filter": "", "sort": 0, "randomizer": 0
        },
        "useCache": True, hp_key: hp_val
    }
    req = urllib.request.Request(
        "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(),
        headers={"User-Agent": UA, "Content-Type": "application/json",
                 "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/",
                 "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val},
        method="POST"
    )
    with urllib.request.urlopen(req, timeout=20) as r:
        data = json.loads(r.read().decode())

    def h(secs):
        return round(secs / 3600, 1) if secs else None

    return [
        {
            "game_id":       g["game_id"],
            "name":          g["game_name"],
            "type":          g["game_type"],          # "game" | "dlc" | "expansion" | "hack"
            "year":          g["release_world"],
            "platforms":     g["profile_platform"],
            "main":          h(g["comp_main"]),       # Main Story hours (polled average)
            "main_plus":     h(g["comp_plus"]),       # Main + Extras hours
            "completionist": h(g["comp_100"]),        # Completionist hours
            "all_styles":    h(g["comp_all"]),        # All playstyles combined
            "main_count":    g["comp_main_count"],    # Number of submissions
            "plus_count":    g["comp_plus_count"],
            "comp_count":    g["comp_100_count"],
            "review_score":  g["review_score"],       # 0–100
            "image_url":     f"https://howlongtobeat.com/games/{g['game_image']}",
            "page_url":      f"https://howlongtobeat.com/game/{g['game_id']}",
        }
        for g in data["data"]
    ]

# Verified results (2026-04-18):
print(hltb_search("the witcher 3")[0])
# {'game_id': 10270, 'name': 'The Witcher 3: Wild Hunt', 'type': 'game', 'year': 2015,
#  'main': 51.6, 'main_plus': 103.8, 'completionist': 174.4, 'all_styles': 103.8,
#  'main_count': 2681, 'plus_count': 6708, 'comp_count': 2327, 'review_score': 93, ...}

print(hltb_search("gone home")[0])
# {'game_id': 4010, 'name': 'Gone Home', 'main': 2.0, 'main_plus': 2.5, 'completionist': 3.1, ...}
```
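
With `sortCategory: "popular"`, the first result is usually the intended game. However, a query can also return DLC, expansions, and related titles. For example, the "elden ring" search in the first section lists Shadow of the Erdtree alongside the base game. The following hedged sketch is a stricter picker over the dicts returned by `hltb_search`. It prefers an exact, case-insensitive title match among base games and otherwise falls back to the first base game. `pick_best_match` and its preference order are assumptions, not HLTB behaviour.

```python
def pick_best_match(results, title):
    """Choose one entry from hltb_search() output (hypothetical helper).

    Preference order (an assumption):
      1. exact case-insensitive name match with type "game"
      2. first result with type "game"
      3. first result of any type
    Returns None for an empty list.
    """
    if not results:
        return None
    wanted = title.strip().casefold()
    base_games = [r for r in results if r.get("type") == "game"]
    for r in base_games:
        if r["name"].strip().casefold() == wanted:
            return r
    if base_games:
        return base_games[0]
    return results[0]


# Made-up sample shaped like hltb_search() output
sample = [
    {"game_id": 1, "name": "Example Game: DLC Pack", "type": "dlc"},
    {"game_id": 2, "name": "Example Game", "type": "game"},
]
print(pick_best_match(sample, "example game")["game_id"])  # 2
```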

---

## Game detail page (full stat breakdown, speedrun data, per-platform times)

When you have a `game_id`, fetch the game page and parse its `__NEXT_DATA__` JSON for the complete dataset: median/average/low/high times, speedrun data, co-op/multiplayer times, and per-platform breakdowns.

```python
import json, re
from helpers import http_get

def get_game_detail(game_id):
    """
    Fetch complete game data from the HLTB game page.
    Returns pageProps['game']['data'] with keys: 'game', 'individuality', 'relationships'.
    """
    html = http_get(f"https://howlongtobeat.com/game/{game_id}")
    nd = json.loads(re.search(
        r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL
    ).group(1))
    return nd['props']['pageProps']['game']['data']

data = get_game_detail(10270)  # Witcher 3
g = data['game'][0]

def hrs(secs):
    """Seconds -> hours, rounded to 0.1 h to match the values in the comments."""
    return round(secs / 3600, 1)

# Core completion times (raw values are seconds; hrs() converts to hours)
print(hrs(g['comp_main']))        # 51.6 — Main Story (polled avg)
print(hrs(g['comp_main_med']))    # 50.0 — Main Story median
print(hrs(g['comp_main_l']))      # 32.7 — Main Story low
print(hrs(g['comp_main_h']))      # 85.8 — Main Story high
print(g['comp_main_count'])       # 2681 — submission count

print(hrs(g['comp_plus']))        # 103.8 — Main + Extras
print(hrs(g['comp_100']))         # 174.4 — Completionist
print(hrs(g['comp_all']))         # 103.8 — All Styles

# Speedrun times
print(g['comp_lvl_spd'])          # 1 if speedrun data exists, 0 if not
print(hrs(g['comp_speed']))       # 19.2 — any% (polled avg)
print(hrs(g['comp_speed_min']))   # 3.2 — fastest submission
print(hrs(g['comp_speed_max']))   # 30.0 — slowest speedrun
print(g['comp_speed_count'])      # 15 — speedrun submissions

print(hrs(g['comp_speed100']))    # 59.4 — 100% speedrun
print(g['comp_speed100_count'])   # 4

# Multiplayer / co-op invested time
print(g['comp_lvl_co'])           # 1 if co-op data exists
print(g['comp_lvl_mp'])           # 1 if multiplayer data exists
print(hrs(g['invested_co']))      # hours in co-op mode
print(hrs(g['invested_mp']))      # hours in competitive multiplayer
print(g['invested_co_count'])     # submission count

# Metadata
print(g['profile_dev'])           # "CD Projekt RED"
print(g['profile_pub'])           # "CD Projekt, Warner Bros..."
print(g['profile_platform'])      # "Nintendo Switch, PC, PlayStation 4, ..."
print(g['profile_genre'])         # "Third-Person, Action, Open World, Role-Playing"
print(g['profile_steam'])         # 292030 — Steam App ID (0 if not on Steam)
print(g['release_world'])         # "2015-05-19"
print(g['rating_esrb'])           # "M"
print(g['review_score'])          # 93 (0–100)
print(g['count_comp'])            # 26007 — times completed
print(g['count_backlog'])         # 31083

# Per-platform breakdown (individuality); time fields here are strings, so cast with int()
for plat in data['individuality']:
    print(plat['platform'],
          f"{int(plat['comp_main'])/3600:.1f}h",   # main hours
          f"{int(plat['comp_plus'])/3600:.1f}h",   # +extras hours
          f"{int(plat['comp_100'])/3600:.1f}h",    # 100% hours
          plat['count_comp'])                      # completions on this platform
# Example:
# Nintendo Switch 57.0h 112.3h 194.9h 236
# PC, PS4, Xbox One 52.9h 110.0h 179.4h 11136
# PS5, Xbox Series X/S 52.1h 92.5h 168.8h 343

# DLC / expansion completion times
for rel in data['relationships'][:3]:
    print(rel['game_id'], rel['game_name'], rel['game_type'],
          hrs(rel['comp_main']) if rel['comp_main'] else None)
```
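
If the response does not contain a `__NEXT_DATA__` script, `re.search(...).group(1)` raises an unhelpful `AttributeError`. This can happen, for example, when an error page is served instead of the game page. Below is a defensive variant of the extraction step. It assumes the script tag keeps the `id="__NEXT_DATA__"` attribute matched above. `parse_next_data` and `game_data_from_html` are hypothetical names.

```python
import json
import re

NEXT_DATA_RE = re.compile(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL)


def parse_next_data(html):
    """Return the parsed __NEXT_DATA__ JSON, or raise ValueError with some context."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        raise ValueError(f"__NEXT_DATA__ script not found (page starts: {html[:80]!r})")
    return json.loads(match.group(1))


def game_data_from_html(html):
    """Same path as get_game_detail(): props -> pageProps -> game -> data."""
    return parse_next_data(html)["props"]["pageProps"]["game"]["data"]


# Synthetic page, not a real HLTB response
html = ('<html><script id="__NEXT_DATA__" type="application/json">'
        '{"props": {"pageProps": {"game": {"data": {"game": [{"game_id": 1}]}}}}}'
        '</script></html>')
print(game_data_from_html(html)["game"][0]["game_id"])  # 1
```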

---

## Common workflows

### Quick lookup: name → completion times

```python
import json, re, urllib.request, time
from helpers import http_get

UA = "Mozilla/5.0"

def get_times(title):
    """Return Main/+Extras/100% hours for the top search match."""
    tok_url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
    tok = json.loads(http_get(tok_url, headers={"Referer": "https://howlongtobeat.com/"}))
    hp_key, hp_val = tok['hpKey'], tok['hpVal']
    payload = {
        "searchType": "games", "searchTerms": title.split(), "searchPage": 1, "size": 1,
        "searchOptions": {
            "games": {"userId": 0, "platform": "", "sortCategory": "popular",
                      "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
                      "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
                      "rangeYear": {"min": "", "max": ""}, "modifier": ""},
            "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"},
            "filter": "", "sort": 0, "randomizer": 0
        },
        "useCache": True, hp_key: hp_val
    }
    req = urllib.request.Request(
        "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(),
        headers={"User-Agent": UA, "Content-Type": "application/json",
                 "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/",
                 "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val},
        method="POST"
    )
    with urllib.request.urlopen(req, timeout=20) as r:
        data = json.loads(r.read().decode())
    if not data['data']:
        return None
    g = data['data'][0]
    h = lambda s: round(s/3600, 1) if s else None
    return {
        "id": g['game_id'], "name": g['game_name'],
        "main": h(g['comp_main']), "main_plus": h(g['comp_plus']),
        "completionist": h(g['comp_100'])
    }

# Verified:
print(get_times("celeste"))
# {'id': 42818, 'name': 'Celeste', 'main': 8.3, 'main_plus': 14.6, 'completionist': 39.2}
print(get_times("stardew valley"))
# {'id': 34716, 'name': 'Stardew Valley', 'main': 53.4, 'main_plus': 94.6, 'completionist': 171.5}
print(get_times("hades"))
# {'id': 62941, 'name': 'Hades', 'main': 23.4, 'main_plus': 48.5, 'completionist': 95.0}
```

### Paginated search (all results for a query)

`count` is the total number of matches, and `pageTotal` is the number of pages at the current `size`. One token works for every page.

```python
import json, time, urllib.request
from helpers import http_get

UA = "Mozilla/5.0"  # must match the UA that http_get uses for the token request

def search_all_pages(title, size=20):
    """Yield every search result for a query across all pages."""
    tok_url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
    tok = json.loads(http_get(tok_url, headers={"Referer": "https://howlongtobeat.com/"}))
    hp_key, hp_val = tok['hpKey'], tok['hpVal']

    page = 1
    while True:
        payload = {
            "searchType": "games", "searchTerms": title.split(),
            "searchPage": page, "size": size,
            "searchOptions": {
                "games": {"userId": 0, "platform": "", "sortCategory": "popular",
                          "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
                          "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
                          "rangeYear": {"min": "", "max": ""}, "modifier": ""},
                "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"},
                "filter": "", "sort": 0, "randomizer": 0
            },
            "useCache": True, hp_key: hp_val
        }
        req = urllib.request.Request(
            "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(),
            headers={"User-Agent": UA, "Content-Type": "application/json",
                     "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/",
                     "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val},
            method="POST"
        )
        with urllib.request.urlopen(req, timeout=20) as r:
            data = json.loads(r.read().decode())
        yield from data['data']
        if page >= data['pageTotal']:
            break
        page += 1

# "mario" returns 308 results across 16 pages (size=20)
mario_games = list(search_all_pages("mario", size=20))
print(len(mario_games))  # 308
```
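
The relationship between `count`, `size`, and `pageTotal` appears to be ceiling division. This matches the "mario" example, where 308 results at `size=20` span 16 pages. The following quick check assumes that `pageTotal` is computed this way. Returning 0 pages for 0 results is also an assumption.

```python
import math


def expected_pages(count, size):
    """Pages needed to hold `count` results at `size` per page (ceiling division)."""
    return math.ceil(count / size) if count else 0


print(expected_pages(308, 20))  # 16  (15 full pages of 20, plus 8 results on page 16)
```

Comparing `expected_pages(data['count'], size)` with `data['pageTotal']` on the first page is a cheap sanity check before looping.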

### Batch lookup by game ID (parallel)

```python
import json, re
from concurrent.futures import ThreadPoolExecutor
from helpers import http_get

def fetch_game(game_id):
    html = http_get(f"https://howlongtobeat.com/game/{game_id}")
    nd = json.loads(re.search(
        r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL
    ).group(1))
    g = nd['props']['pageProps']['game']['data']['game'][0]
    return {
        "id": g['game_id'], "name": g['game_name'],
        "main": round(g['comp_main']/3600, 1) if g['comp_main'] else None,
        "main_plus": round(g['comp_plus']/3600, 1) if g['comp_plus'] else None,
        "completionist": round(g['comp_100']/3600, 1) if g['comp_100'] else None,
    }

ids = [10270, 68151, 42818, 26803, 34716]  # Witcher3, Elden Ring, Celeste, DS3, Stardew
with ThreadPoolExecutor(max_workers=5) as ex:
    results = list(ex.map(fetch_game, ids))

for r in results:
    print(f"[{r['id']}] {r['name']}: {r['main']}h / {r['main_plus']}h / {r['completionist']}h")

# Confirmed output:
# [10270] The Witcher 3: Wild Hunt: 51.6h / 103.8h / 174.4h
# [68151] Elden Ring: 60.0h / 101.2h / 135.5h
# [42818] Celeste: 8.3h / 14.6h / 39.2h
# [26803] Dark Souls III: 31.2h / 48.4h / 100.5h
# [34716] Stardew Valley: 53.4h / 94.6h / 171.5h
```
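
`list(ex.map(...))` re-raises as soon as it reaches a call that failed. As a result, one bad ID can abort the whole batch, whether from a missing page or a timeout. The following sketch keeps partial results instead and maps each failed ID to `None`. `fetch_many` is a hypothetical helper. `fetch` stands in for `fetch_game` above, and a fake fetcher is used here so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_many(fetch, ids, max_workers=5):
    """Run fetch(id) in a thread pool.

    Returns ({id: result or None}, {id: exception}) instead of failing the whole batch.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        futures = {ex.submit(fetch, game_id): game_id for game_id in ids}
        for fut in as_completed(futures):
            game_id = futures[fut]
            try:
                results[game_id] = fut.result()
            except Exception as exc:  # record the failure and keep going
                results[game_id] = None
                errors[game_id] = exc
    return results, errors


def fake_fetch(game_id):
    # Stand-in for fetch_game(); fails for ID 0 to show the error path.
    if game_id == 0:
        raise ValueError("no such game")
    return {"id": game_id}


ok, failed = fetch_many(fake_fetch, [10270, 0, 68151])
print(sorted(k for k, v in ok.items() if v))  # [10270, 68151]
print(list(failed))                           # [0]
```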

---

## Search response field reference

Fields on each item in `data[]` returned by `/api/find`:

| Field | Type | Description |
|-------|------|-------------|
| `game_id` | int | HLTB internal game ID |
| `game_name` | str | Full game title |
| `game_alias` | str | Alternate title / edition name |
| `game_type` | str | `"game"` \| `"dlc"` \| `"expansion"` \| `"hack"` |
| `game_image` | str | Image filename → `https://howlongtobeat.com/games/{game_image}` |
| `release_world` | int | Release year (just the year integer, not a date) |
| `profile_platform` | str | Comma-separated platform list |
| `comp_main` | int | Main Story seconds (polled average), 0 if no data |
| `comp_plus` | int | Main + Extras seconds |
| `comp_100` | int | Completionist seconds |
| `comp_all` | int | All Styles combined seconds |
| `comp_main_count` | int | Submission count for Main Story |
| `comp_plus_count` | int | Submission count for Main + Extras |
| `comp_100_count` | int | Submission count for Completionist |
| `comp_all_count` | int | Total submissions across all categories |
| `comp_lvl_sp` | int | 1 if single-player data exists |
| `comp_lvl_co` | int | 1 if co-op data exists |
| `comp_lvl_mp` | int | 1 if multiplayer data exists |
| `invested_co` | int | Average co-op time in seconds |
| `invested_mp` | int | Average multiplayer time in seconds |
| `count_comp` | int | Total completions logged |
| `count_backlog` | int | Users with the game in their backlog |
| `count_playing` | int | Users currently playing |
| `count_speedrun` | int | Speedrun entries |
| `count_review` | int | Review count |
| `review_score` | int | Community review score 0–100 |
| `profile_popular` | int | Popularity rank |

Additional fields available only in the game page's `__NEXT_DATA__` (times in seconds):

| Field | Description |
|-------|-------------|
| `comp_main_med/avg/l/h` | Median / average / low / high Main Story time |
| `comp_plus_med/avg/l/h` | Same for Main + Extras |
| `comp_100_med/avg/l/h` | Same for Completionist |
| `comp_speed` | Any% speedrun average |
| `comp_speed_min/max/med` | Fastest / slowest / median any% speedrun |
| `comp_speed100` | 100% speedrun average |
| `comp_speed_count` | Speedrun submission count |
| `comp_lvl_spd` | 1 if speedrun data exists |
| `profile_dev` | Developer name |
| `profile_pub` | Publisher name |
| `profile_genre` | Comma-separated genres |
| `profile_steam` | Steam App ID (0 if not on Steam) |
| `release_world` | Full release date `"YYYY-MM-DD"` |
| `rating_esrb` | ESRB rating string (may be empty) |
| `count_replay` | Times replayed |
| `count_total` | Total user entries |
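
Because `release_world` is an integer year in search results but a `"YYYY-MM-DD"` string on the game page, code that mixes both sources may need a single normalized value. A small sketch follows. `release_year` is a hypothetical helper. Treating `0` or an empty string as "unknown" is an assumption, not documented HLTB behaviour.

```python
def release_year(value):
    """Return the release year from either release_world format, or None.

    Search results: int year (e.g. 2015). Game page: "YYYY-MM-DD" string.
    """
    if isinstance(value, int):
        return value if value > 0 else None  # assumption: 0 means unknown
    if isinstance(value, str) and value[:4].isdigit():
        year = int(value[:4])  # only the year part is parsed, so odd day/month values don't matter
        return year if year > 0 else None
    return None


print(release_year(2015))          # 2015
print(release_year("2015-05-19"))  # 2015
```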

---

## Anti-bot measures

- **Cloudflare** is present (confirmed by the `CF-Ray` response header), but it does not block plain HTTP with a browser UA.
- **Token system**: Every search must carry a valid token obtained from `/api/find/init`. The token encodes `timestamp::IP|UA|hpKey|hmacHash`, and the server checks that the UA used to fetch it matches the UA on the search POST.
- **Honeypot field**: `hpKey` and `hpVal` from the init response must appear as a top-level field in the POST body (e.g., `{"ign_7671546b": "a6679ea54598d502", ...}`). The key name changes with each token fetch.
- **Required headers on search POST**: `Origin: https://howlongtobeat.com` AND `Referer: https://howlongtobeat.com/`; missing either causes HTTP 403 or 404. `x-auth-token`, `x-hp-key`, and `x-hp-val` are also required.
- **Required header on init GET**: `Referer: https://howlongtobeat.com/`; without it the request fails with HTTP 403.
- **Token reuse**: A single token works for multiple searches and multiple pages, so there is no need to fetch one per request.
- **No CAPTCHA** observed during testing with standard UA strings.
- **Rate limits**: Not triggered during testing (several token fetches plus 10+ sequential searches). Fetching game pages in parallel with 5 workers also produced no HTTP 429 responses.
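
The header requirements above can be collected in one place. The sketch below assumes the token dict has the `token`/`hpKey`/`hpVal` keys that `/api/find/init` returns in this guide. `search_headers` and `INIT_HEADERS` are hypothetical names. The `ua` argument must be the same string that was used for the init request.

```python
INIT_HEADERS = {"Referer": "https://howlongtobeat.com/"}  # required on the init GET


def search_headers(tok, ua="Mozilla/5.0"):
    """Headers for the /api/find POST, per the anti-bot notes above.

    `ua` must match the User-Agent that fetched `tok`, or the server returns 403.
    """
    return {
        "User-Agent": ua,
        "Content-Type": "application/json",
        "Origin": "https://howlongtobeat.com",    # required
        "Referer": "https://howlongtobeat.com/",  # required
        "x-auth-token": tok["token"],
        "x-hp-key": tok["hpKey"],                 # must match the honeypot key in the body
        "x-hp-val": tok["hpVal"],
    }


h = search_headers({"token": "t", "hpKey": "ign_x", "hpVal": "v"})
```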

---

## Gotchas

- **Completion times are in seconds** — the time fields (`comp_main`, `comp_plus`, `comp_100`, `comp_all` and their `_med/_avg/_l/_h` variants) are integer seconds, while `*_count` and `comp_lvl_*` are counts and flags. Divide times by 3600 for hours. `0` means no data (not 0 hours).

- **`release_world` is a year int in search, a full date in game page** — in the `/api/find` response, `release_world` is an integer year (e.g., `2015`). In `__NEXT_DATA__` on the game page, it's `"2015-05-19"`.

- **UA fingerprinting** — the token from `/api/find/init` encodes the User-Agent. The search POST must use the identical UA that fetched the token, or you'll get HTTP 403. Since `http_get` sends `Mozilla/5.0`, use that same string for the search POST.

- **Honeypot key name rotates** — `hpKey` is something like `ign_7671546b` and changes with each token fetch. Always read it from the init response and use it dynamically; never hardcode it.

- **Both `x-hp-key`/`x-hp-val` headers AND the body field are required** — the server checks the request headers (`x-hp-key`, `x-hp-val`) against the dynamic key in the POST body. If either is wrong or missing, you get HTTP 404 (wrong body value) or HTTP 403 (missing/wrong header).

- **`game_type` in search results** — can be `"game"`, `"dlc"`, `"expansion"`, or `"hack"`. Search results mix these by default. Filter with `if g['game_type'] == 'game'` if you only want base games.

- **Games with no submission data** — `comp_main`, `comp_plus`, `comp_100` are `0` (not `None`) when no users have submitted times. Always check `if g['comp_main']:` before dividing.

- **`individuality` (per-platform) data** — available only in `__NEXT_DATA__` on the game page, not in search results. In this sub-object, `comp_main` and the other time fields are strings, not ints, so cast with `int(plat['comp_main'])`.

- **`profile_platform` in search** — a comma-separated display string, not a structured list. Use the game page's `individuality` for per-platform time breakdowns.

- **Token expiry** — if a long-running loop gets HTTP 403 with `{"error":"Session expired or invalid fingerprint"}`, call `get_token()` again and retry. Token lifetime appears to be ~15 minutes, based on the timestamp embedded in the decoded value.

- **No slug-based URLs** — HLTB uses integer `game_id` for all game pages, not slugs. There is no title-to-slug mapping; use search to find the `game_id` first.

- **`sortCategory` options** — `"popular"` ranks by community engagement (best when you want the top result to be the intended game). `"name"` sorts alphabetically. Other values (`"madnessTime"`, `"mainThenExtras"`) exist but returned the same ordering as `"name"` in testing.
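
The token-expiry gotcha can be handled with a single refresh-and-retry. Below is a minimal sketch that takes the token fetcher and the search function as parameters, such as `get_token` and `search_hltb` from the first section. Fakes are used here so it runs offline. `search_with_refresh` is a hypothetical name. Treating every HTTP 403 as an expired token is a simplification, because a UA mismatch or a missing header also returns 403.

```python
import urllib.error


def search_with_refresh(search, get_token, title, tok=None, **kwargs):
    """Call search(title, token_data=tok, ...); on HTTP 403, fetch a new token once and retry.

    Returns (result, tok) so the caller can keep reusing the possibly refreshed token.
    """
    if tok is None:
        tok = get_token()
    try:
        return search(title, token_data=tok, **kwargs), tok
    except urllib.error.HTTPError as err:
        if err.code != 403:
            raise
        tok = get_token()  # session expired or fingerprint rejected: refresh once
        return search(title, token_data=tok, **kwargs), tok


# Offline fakes standing in for get_token() / search_hltb()
calls = {"tokens": 0, "searches": 0}


def fake_get_token():
    calls["tokens"] += 1
    return {"token": f"t{calls['tokens']}", "hpKey": "k", "hpVal": "v"}


def fake_search(title, token_data, size=20):
    calls["searches"] += 1
    if token_data["token"] == "t1":  # pretend the first token has expired
        raise urllib.error.HTTPError("https://howlongtobeat.com/api/find", 403, "Forbidden", None, None)
    return {"data": [{"game_name": title}]}


result, tok = search_with_refresh(fake_search, fake_get_token, "celeste", size=1)
print(tok["token"], calls)  # t2 {'tokens': 2, 'searches': 2}
```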