@pencil-agent/nano-pencil 2.0.0-beta.8 → 2.0.0

Files changed (241)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/extensions-host/index.d.ts +1 -1
  7. package/dist/core/extensions-host/loader.js +1 -1
  8. package/dist/core/extensions-host/runner.d.ts +1 -0
  9. package/dist/core/extensions-host/runner.js +2 -2
  10. package/dist/core/extensions-host/types.d.ts +17 -22
  11. package/dist/core/lib/ai/src/types.d.ts +12 -2
  12. package/dist/core/persona/persona-manager.js +5 -2
  13. package/dist/core/runtime/agent-session.js +3 -3
  14. package/dist/core/runtime/extension-core-bindings.d.ts +1 -0
  15. package/dist/core/runtime/extension-core-bindings.js +2 -2
  16. package/dist/extensions/builtin/AGENT.md +115 -115
  17. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  18. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  19. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  20. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  91. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  92. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  93. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  94. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  95. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  96. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  97. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  98. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  99. package/dist/extensions/builtin/browser/browser.md +73 -73
  100. package/dist/extensions/builtin/browser/install.md +142 -142
  101. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  102. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  103. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  104. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  105. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  107. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  108. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  109. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  110. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  111. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  112. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  113. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  114. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  115. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  116. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  117. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  118. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  119. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  120. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  121. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  122. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  123. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  124. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  125. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  126. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  127. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  128. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  129. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  130. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  131. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  132. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  133. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  134. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  135. package/dist/extensions/builtin/goal/README.md +67 -67
  136. package/dist/extensions/builtin/goal/goal-controller.d.ts +39 -10
  137. package/dist/extensions/builtin/goal/goal-controller.js +1 -1
  138. package/dist/extensions/builtin/goal/goal-format.js +1 -1
  139. package/dist/extensions/builtin/goal/goal-prompts.d.ts +2 -0
  140. package/dist/extensions/builtin/goal/goal-prompts.js +5 -4
  141. package/dist/extensions/builtin/goal/goal-store.js +1 -1
  142. package/dist/extensions/builtin/goal/index.d.ts +1 -1
  143. package/dist/extensions/builtin/goal/index.js +10 -7
  144. package/dist/extensions/builtin/grub/README.md +112 -112
  145. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  146. package/dist/extensions/builtin/link-world/index.js +6 -6
  147. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  148. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  149. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  150. package/dist/extensions/builtin/link-world/{network-routing.md → network-routing/network-routing.md} +67 -67
  151. package/dist/extensions/builtin/loop/README.md +92 -92
  152. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  153. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  154. package/dist/extensions/builtin/plan/index.js +1 -1
  155. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  156. package/dist/extensions/builtin/sal/README.md +72 -72
  157. package/dist/extensions/builtin/security-audit/README.md +289 -289
  158. package/dist/extensions/builtin/task/task-store.d.ts +4 -0
  159. package/dist/extensions/builtin/task/task-store.js +1 -1
  160. package/dist/extensions/builtin/team/AGENT.md +112 -112
  161. package/dist/extensions/builtin/team/TESTING.md +299 -299
  162. package/dist/extensions/builtin/token-save/README.md +56 -56
  163. package/dist/extensions/optional/AGENT.md +10 -10
  164. package/dist/index.d.ts +5 -30
  165. package/dist/index.js +1 -1
  166. package/dist/models.d.ts +7 -0
  167. package/dist/models.js +1 -0
  168. package/dist/modes/interactive/components/footer.js +1 -1
  169. package/dist/modes/interactive/components/task-status-panel.d.ts +36 -0
  170. package/dist/modes/interactive/components/task-status-panel.js +1 -0
  171. package/dist/modes/interactive/controllers/stream-render-controller.d.ts +7 -0
  172. package/dist/modes/interactive/controllers/stream-render-controller.js +2 -2
  173. package/dist/modes/interactive/interactive-mode.js +40 -40
  174. package/dist/modes/interactive/state/interactive-state.d.ts +2 -0
  175. package/dist/modes/interactive/state/interactive-state.js +1 -1
  176. package/dist/modes/interactive/theme/dark.json +85 -85
  177. package/dist/modes/interactive/theme/light.json +84 -84
  178. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  179. package/dist/modes/interactive/theme/warm.json +81 -81
  180. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  181. package/dist/node_modules/@pencil-agent/ai/dist/models.generated.js +1 -1
  182. package/dist/node_modules/@pencil-agent/ai/dist/providers/anthropic.js +2 -2
  183. package/dist/node_modules/@pencil-agent/ai/dist/providers/openai-completions.js +5 -5
  184. package/dist/node_modules/@pencil-agent/ai/dist/providers/openai-responses.js +1 -1
  185. package/dist/node_modules/@pencil-agent/ai/dist/stream.js +1 -1
  186. package/dist/packages/protocol/src/commands.d.ts +33 -0
  187. package/dist/packages/protocol/src/flags.d.ts +20 -0
  188. package/dist/packages/protocol/src/hooks.d.ts +17 -0
  189. package/dist/packages/protocol/src/hooks.js +0 -0
  190. package/dist/packages/{extension-sdk → protocol}/src/index.d.ts +7 -4
  191. package/dist/packages/protocol/src/index.js +1 -0
  192. package/dist/packages/{extension-sdk → protocol}/src/lifecycle.d.ts +15 -27
  193. package/dist/packages/protocol/src/lifecycle.js +0 -0
  194. package/dist/packages/{extension-sdk → protocol}/src/tools.d.ts +1 -1
  195. package/dist/packages/protocol/src/tools.js +0 -0
  196. package/dist/public-config.d.ts +12 -0
  197. package/dist/public-config.js +1 -0
  198. package/dist/runtime.d.ts +9 -0
  199. package/dist/runtime.js +1 -0
  200. package/dist/session-compaction.d.ts +7 -0
  201. package/dist/session-compaction.js +1 -0
  202. package/dist/session.d.ts +7 -0
  203. package/dist/session.js +1 -0
  204. package/dist/skills.d.ts +7 -0
  205. package/dist/skills.js +1 -0
  206. package/dist/tools.d.ts +7 -0
  207. package/dist/tools.js +1 -0
  208. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +851 -0
  209. package/docs/SDK-TESTING.md +364 -0
  210. package/docs/codex-goal-command-impl.md +1055 -1055
  211. package/docs/codex-goal-vs-grub.md +500 -500
  212. package/docs/custom-provider.md +27 -27
  213. package/docs/extensions.md +27 -27
  214. package/docs/keybindings.md +27 -27
  215. package/docs/loop 重构完成总结.md +250 -250
  216. package/docs/loop 重构完成报告.md +122 -122
  217. package/docs/loop 重构方案.md +1222 -1222
  218. package/docs/loop 重构方案实现报告.md +158 -158
  219. package/docs/loop 重构方案对比分析.md +128 -128
  220. package/docs/loop 重构计划.md +320 -320
  221. package/docs/loop-usage-examples.md +214 -214
  222. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +593 -0
  223. package/docs/models.md +27 -27
  224. package/docs/packages.md +27 -27
  225. package/docs/pi-design-philosophy.md +457 -457
  226. package/docs/planmode.md +1987 -1987
  227. package/docs/prompt-templates.md +27 -27
  228. package/docs/providers.md +27 -27
  229. package/docs/sdk.md +27 -27
  230. package/docs/skills.md +27 -27
  231. package/docs/startup-performance-optimization.md +301 -0
  232. package/docs/themes.md +27 -27
  233. package/docs/tui.md +27 -27
  234. package/docs/认知地图.md +47 -0
  235. package/package.json +190 -162
  236. package/dist/packages/extension-sdk/src/index.js +0 -1
  237. package/docs/cc-agent-design.md +0 -1297
  238. package/docs/cc-tui-design.md +0 -1333
  239. package/docs//345/257/271/346/240/207Claude-Code.md +0 -1775
  240. package/dist/packages/{extension-sdk/src/lifecycle.js → protocol/src/commands.js} +0 -0
  241. package/dist/packages/{extension-sdk/src/tools.js → protocol/src/flags.js} +0 -0
@@ -1,473 +1,473 @@
- # HowLongToBeat: Scraping & Data Extraction
-
- Field-tested against howlongtobeat.com on 2026-04-18. All code blocks validated with live requests.
-
- ## Do this first
-
- **Use the search API: it returns structured JSON with all completion times in one POST call.**
-
- HLTB runs a token-gated POST endpoint at `/api/find`. You must first fetch a session token from `/api/find/init`, then include it in the search request. Both steps are plain HTTP; no browser is required.
-
- ```python
- import json, re, urllib.request, time
- from helpers import http_get
-
- UA = "Mozilla/5.0"
-
- def get_token():
-     """Fetch a fresh session token. Token encodes IP+UA+timestamp, reusable for ~15 min."""
-     url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
-     data = http_get(url, headers={"Referer": "https://howlongtobeat.com/"})
-     return json.loads(data)  # {token, hpKey, hpVal}
-
- def search_hltb(title, size=20, page=1, token_data=None):
-     """
-     Search HLTB for games. Returns raw API dict:
-     {count, pageCurrent, pageTotal, pageSize, data: [...]}
-     token_data can be reused across searches (fetch once, use many times).
-     """
-     if token_data is None:
-         token_data = get_token()
-     hp_key, hp_val = token_data['hpKey'], token_data['hpVal']
-     payload = {
-         "searchType": "games",
-         "searchTerms": title.split(),
-         "searchPage": page,
-         "size": size,
-         "searchOptions": {
-             "games": {
-                 "userId": 0, "platform": "", "sortCategory": "popular",
-                 "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
-                 "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
-                 "rangeYear": {"min": "", "max": ""}, "modifier": ""
-             },
-             "users": {"sortCategory": "postcount"},
-             "lists": {"sortCategory": "follows"},
-             "filter": "", "sort": 0, "randomizer": 0
-         },
-         "useCache": True,
-         hp_key: hp_val  # honeypot field; key and value vary per token
-     }
-     req = urllib.request.Request(
-         "https://howlongtobeat.com/api/find",
-         data=json.dumps(payload).encode(),
-         headers={
-             "User-Agent": UA,
-             "Content-Type": "application/json",
-             "Origin": "https://howlongtobeat.com",
-             "Referer": "https://howlongtobeat.com/",
-             "x-auth-token": token_data['token'],
-             "x-hp-key": hp_key,
-             "x-hp-val": hp_val,
-         },
-         method="POST"
-     )
-     with urllib.request.urlopen(req, timeout=20) as r:
-         return json.loads(r.read().decode())
-
- # Usage
- tok = get_token()
-
- result = search_hltb("elden ring", token_data=tok, size=3)
- for g in result['data']:
-     print(g['game_id'], g['game_name'], g['release_world'])
-     print(f" Main: {g['comp_main']/3600:.1f}h +Extras: {g['comp_plus']/3600:.1f}h 100%: {g['comp_100']/3600:.1f}h")
-
- # Confirmed output (2026-04-18):
- # 68151 Elden Ring 2022
- # Main: 60.0h +Extras: 101.2h 100%: 135.5h
- # 160589 Elden Ring: Nightreign 2025
- # Main: 28.1h +Extras: 40.1h 100%: 66.9h
- # 139385 Elden Ring: Shadow of the Erdtree 2024
- # Main: 25.7h +Extras: 39.0h 100%: 51.1h
- ```
-
- Token is reusable: fetch it once and pass it to multiple `search_hltb()` calls. No need to re-fetch per search.
-
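The reuse note above could be backed by a small time-based cache. This is a minimal sketch, assuming the roughly 15-minute lifetime mentioned in the `get_token` docstring holds. `TokenCache`, its `fetch` callable, and the `ttl_seconds` default are hypothetical names chosen for illustration and are not part of the package.

```python
import time

class TokenCache:
    """Cache a token dict and refetch it once it is older than ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=14 * 60, clock=time.monotonic):
        self._fetch = fetch        # callable returning a token dict, e.g. get_token
        self._ttl = ttl_seconds    # assumed lifetime, kept a little under 15 minutes
        self._clock = clock        # injectable clock so the cache can be tested offline
        self._token = None
        self._fetched_at = None

    def get(self):
        """Return the cached token, fetching a new one if none exists or it expired."""
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if expired:
            self._token = self._fetch()
            self._fetched_at = now
        return self._token
```

Under these assumptions, a caller would create `cache = TokenCache(get_token)` once and pass `token_data=cache.get()` to each `search_hltb()` call.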
- ---
-
- ## Fastest approach: search + parse in one helper
-
- ```python
- import json, re, urllib.request, time
- from helpers import http_get
-
- UA = "Mozilla/5.0"
-
- def hltb_search(title, size=5):
-     """One-shot: get token + search, return list of dicts with hours."""
-     url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
-     tok = json.loads(http_get(url, headers={"Referer": "https://howlongtobeat.com/"}))
-     hp_key, hp_val = tok['hpKey'], tok['hpVal']
-     payload = {
-         "searchType": "games", "searchTerms": title.split(), "searchPage": 1, "size": size,
-         "searchOptions": {
-             "games": {"userId": 0, "platform": "", "sortCategory": "popular",
-                       "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
-                       "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
-                       "rangeYear": {"min": "", "max": ""}, "modifier": ""},
-             "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"},
-             "filter": "", "sort": 0, "randomizer": 0
-         },
-         "useCache": True, hp_key: hp_val
-     }
-     req = urllib.request.Request(
-         "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(),
-         headers={"User-Agent": UA, "Content-Type": "application/json",
-                  "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/",
-                  "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val},
-         method="POST"
-     )
-     with urllib.request.urlopen(req, timeout=20) as r:
-         data = json.loads(r.read().decode())
-
-     def h(secs):
-         return round(secs / 3600, 1) if secs else None
-
-     return [
-         {
-             "game_id": g["game_id"],
-             "name": g["game_name"],
-             "type": g["game_type"],              # "game" | "dlc" | "expansion" | "hack"
-             "year": g["release_world"],
-             "platforms": g["profile_platform"],
-             "main": h(g["comp_main"]),           # Main Story hours (polled average)
-             "main_plus": h(g["comp_plus"]),      # Main + Extras hours
-             "completionist": h(g["comp_100"]),   # Completionist hours
-             "all_styles": h(g["comp_all"]),      # All playstyles combined
-             "main_count": g["comp_main_count"],  # Number of submissions
-             "plus_count": g["comp_plus_count"],
-             "comp_count": g["comp_100_count"],
-             "review_score": g["review_score"],   # 0–100
-             "image_url": f"https://howlongtobeat.com/games/{g['game_image']}",
-             "page_url": f"https://howlongtobeat.com/game/{g['game_id']}",
-         }
-         for g in data["data"]
-     ]
-
- # Verified results (2026-04-18):
- print(hltb_search("the witcher 3")[0])
- # {'game_id': 10270, 'name': 'The Witcher 3: Wild Hunt', 'type': 'game', 'year': 2015,
- # 'main': 51.6, 'main_plus': 103.8, 'completionist': 174.4, 'all_styles': 103.8,
- # 'main_count': 2681, 'plus_count': 6708, 'comp_count': 2327, 'review_score': 93, ...}
-
- print(hltb_search("gone home")[0])
- # {'game_id': 4010, 'name': 'Gone Home', 'main': 2.0, 'main_plus': 2.5, 'completionist': 3.1, ...}
- ```
-
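The same request body is repeated in each helper in this file. One possible refactor, sketched here with hypothetical names (`build_search_payload` and `seconds_to_hours` are not part of the package), builds the body in one place and keeps the seconds-to-hours conversion next to it. The field values are copied from the examples above; nothing else about the API is assumed.

```python
def build_search_payload(title, hp_key, hp_val, page=1, size=20):
    """Return the JSON body for POST /api/find, mirroring the examples above."""
    return {
        "searchType": "games",
        "searchTerms": title.split(),
        "searchPage": page,
        "size": size,
        "searchOptions": {
            "games": {
                "userId": 0, "platform": "", "sortCategory": "popular",
                "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
                "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
                "rangeYear": {"min": "", "max": ""}, "modifier": "",
            },
            "users": {"sortCategory": "postcount"},
            "lists": {"sortCategory": "follows"},
            "filter": "", "sort": 0, "randomizer": 0,
        },
        "useCache": True,
        hp_key: hp_val,  # honeypot field; key and value come from the token response
    }

def seconds_to_hours(secs):
    """Convert seconds to hours rounded to 0.1; return None for 0 or missing values."""
    return round(secs / 3600, 1) if secs else None
```

Keeping the body in one function means a change to the endpoint's expected fields would need to be made only once.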
- ---
-
- ## Game detail page (full stat breakdown, speedrun data, per-platform times)
-
- When you have a `game_id`, fetch the game page and extract `__NEXT_DATA__` for the complete dataset, which includes median/avg/low/high times, speedrun data, co-op/multiplayer times, and per-platform breakdowns.
-
- ```python
- import json, re
- from helpers import http_get
-
- def get_game_detail(game_id):
-     """
-     Fetch complete game data from the HLTB game page.
-     Returns pageProps['game']['data'] with keys: 'game', 'individuality', 'relationships'.
-     """
-     html = http_get(f"https://howlongtobeat.com/game/{game_id}")
-     nd = json.loads(re.search(
-         r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL
-     ).group(1))
-     return nd['props']['pageProps']['game']['data']
-
- data = get_game_detail(10270)  # Witcher 3
- g = data['game'][0]
-
- # Core completion times (all in seconds; divide by 3600 for hours)
- print(g['comp_main'] / 3600)       # 51.6 - Main Story (polled avg)
- print(g['comp_main_med'] / 3600)   # 50.0 - Main Story median
- print(g['comp_main_l'] / 3600)     # 32.7 - Main Story low
- print(g['comp_main_h'] / 3600)     # 85.8 - Main Story high
- print(g['comp_main_count'])        # 2681 - submission count
-
- print(g['comp_plus'] / 3600)       # 103.8 - Main + Extras
- print(g['comp_100'] / 3600)        # 174.4 - Completionist
- print(g['comp_all'] / 3600)        # 103.8 - All Styles
-
- # Speedrun times
- print(g['comp_lvl_spd'])           # 1 if speedrun data exists, 0 if not
- print(g['comp_speed'] / 3600)      # 19.2 - any% (polled avg)
- print(g['comp_speed_min'] / 3600)  # 3.2 - fastest submission
- print(g['comp_speed_max'] / 3600)  # 30.0 - slowest speedrun
- print(g['comp_speed_count'])       # 15 - speedrun submissions
-
- print(g['comp_speed100'] / 3600)   # 59.4 - 100% speedrun
- print(g['comp_speed100_count'])    # 4
-
- # Multiplayer / co-op invested time
- print(g['comp_lvl_co'])            # 1 if co-op data exists
- print(g['comp_lvl_mp'])            # 1 if multiplayer data exists
- print(g['invested_co'] / 3600)     # hours in co-op mode
- print(g['invested_mp'] / 3600)     # hours in competitive multiplayer
- print(g['invested_co_count'])      # submission count
-
- # Metadata
- print(g['profile_dev'])            # "CD Projekt RED"
- print(g['profile_pub'])            # "CD Projekt, Warner Bros..."
- print(g['profile_platform'])       # "Nintendo Switch, PC, PlayStation 4, ..."
- print(g['profile_genre'])          # "Third-Person, Action, Open World, Role-Playing"
- print(g['profile_steam'])          # 292030 - Steam App ID (0 if not on Steam)
- print(g['release_world'])          # "2015-05-19"
- print(g['rating_esrb'])            # "M"
- print(g['review_score'])           # 93 (0–100)
- print(g['count_comp'])             # 26007 - times completed
- print(g['count_backlog'])          # 31083
-
- # Per-platform breakdown (individuality)
- for plat in data['individuality']:
-     print(plat['platform'],
-           int(plat['comp_main'])/3600,   # main hours
-           int(plat['comp_plus'])/3600,   # +extras hours
-           int(plat['comp_100'])/3600,    # 100% hours
-           plat['count_comp'])            # completions on this platform
- # Example:
- # Nintendo Switch 57.0h 112.3h 194.9h 236
- # PC, PS4, Xbox One 52.9h 110.0h 179.4h 11136
- # PS5, Xbox Series X/S 52.1h 92.5h 168.8h 343
-
- # DLC / expansion completion times
- for rel in data['relationships'][:3]:
-     print(rel['game_id'], rel['game_name'], rel['game_type'],
-           rel['comp_main']/3600 if rel['comp_main'] else None)
- ```
-
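The `__NEXT_DATA__` extraction above calls `.group(1)` directly on the `re.search` result, which raises `AttributeError` if the script tag is absent (for example on an error page). A more defensive variant might look like the sketch below. `extract_next_data` and `extract_game_data` are hypothetical helper names, and the key path is the one used in `get_game_detail`; whether the page layout stays stable is an assumption.

```python
import json
import re

NEXT_DATA_RE = re.compile(
    r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL
)

def extract_next_data(html):
    """Return the parsed __NEXT_DATA__ JSON, or raise ValueError if the tag is missing."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        raise ValueError("no __NEXT_DATA__ script tag found in page")
    return json.loads(match.group(1))

def extract_game_data(html):
    """Follow the props -> pageProps -> game -> data path used in get_game_detail."""
    try:
        return extract_next_data(html)["props"]["pageProps"]["game"]["data"]
    except (KeyError, TypeError) as exc:
        raise ValueError("unexpected __NEXT_DATA__ layout") from exc
```

Because both helpers take the HTML as a string, they can be exercised against a saved page without any network access.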
- ---
-
- ## Common workflows
-
- ### Quick lookup: name → completion times
-
- ```python
- import json, re, urllib.request, time
- from helpers import http_get
-
- UA = "Mozilla/5.0"
-
- def get_times(title):
-     """Return Main/+Extras/100% hours for the top search match."""
-     tok_url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
-     tok = json.loads(http_get(tok_url, headers={"Referer": "https://howlongtobeat.com/"}))
-     hp_key, hp_val = tok['hpKey'], tok['hpVal']
-     payload = {
-         "searchType": "games", "searchTerms": title.split(), "searchPage": 1, "size": 1,
-         "searchOptions": {
-             "games": {"userId": 0, "platform": "", "sortCategory": "popular",
-                       "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
-                       "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
-                       "rangeYear": {"min": "", "max": ""}, "modifier": ""},
-             "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"},
-             "filter": "", "sort": 0, "randomizer": 0
-         },
-         "useCache": True, hp_key: hp_val
-     }
-     req = urllib.request.Request(
-         "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(),
-         headers={"User-Agent": UA, "Content-Type": "application/json",
-                  "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/",
-                  "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val},
-         method="POST"
-     )
-     with urllib.request.urlopen(req, timeout=20) as r:
-         data = json.loads(r.read().decode())
-     if not data['data']:
-         return None
-     g = data['data'][0]
-     h = lambda s: round(s/3600, 1) if s else None
-     return {
-         "id": g['game_id'], "name": g['game_name'],
-         "main": h(g['comp_main']), "main_plus": h(g['comp_plus']),
-         "completionist": h(g['comp_100'])
-     }
-
- # Verified:
- print(get_times("celeste"))
- # {'id': 42818, 'name': 'Celeste', 'main': 8.3, 'main_plus': 14.6, 'completionist': 39.2}
- print(get_times("stardew valley"))
- # {'id': 34716, 'name': 'Stardew Valley', 'main': 53.4, 'main_plus': 94.6, 'completionist': 171.5}
- print(get_times("hades"))
- # {'id': 62941, 'name': 'Hades', 'main': 23.4, 'main_plus': 48.5, 'completionist': 95.0}
- ```
-
- ### Paginated search (all results for a query)
-
- `count` = total matches, `pageTotal` = total pages with current `size`. The same token works across all pages.
-
301
- ```python
302
- def search_all_pages(title, size=20):
303
- """Yield every search result for a query across all pages."""
304
- tok_url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
305
- tok = json.loads(http_get(tok_url, headers={"Referer": "https://howlongtobeat.com/"}))
306
- hp_key, hp_val = tok['hpKey'], tok['hpVal']
307
-
308
- page = 1
309
- while True:
310
- payload = {
311
- "searchType": "games", "searchTerms": title.split(),
312
- "searchPage": page, "size": size,
313
- "searchOptions": {
314
- "games": {"userId": 0, "platform": "", "sortCategory": "popular",
315
- "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
316
- "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
317
- "rangeYear": {"min": "", "max": ""}, "modifier": ""},
318
- "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"},
319
- "filter": "", "sort": 0, "randomizer": 0
320
- },
321
- "useCache": True, hp_key: hp_val
322
- }
323
- req = urllib.request.Request(
324
- "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(),
325
- headers={"User-Agent": UA, "Content-Type": "application/json",
326
- "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/",
327
- "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val},
328
- method="POST"
329
- )
330
- with urllib.request.urlopen(req, timeout=20) as r:
331
- data = json.loads(r.read().decode())
332
- yield from data['data']
333
- if page >= data['pageTotal']:
334
- break
335
- page += 1
336
-
337
- # "mario" returns 308 results across 16 pages (size=20)
338
- mario_games = list(search_all_pages("mario", size=20))
339
- print(len(mario_games)) # 308
340
- ```
341
-
342
- ### Batch lookup by game ID (parallel)
343
-
344
- ```python
345
- import json, re, urllib.request
346
- from concurrent.futures import ThreadPoolExecutor
347
- from helpers import http_get
348
-
349
- def fetch_game(game_id):
350
- html = http_get(f"https://howlongtobeat.com/game/{game_id}")
351
- nd = json.loads(re.search(
352
- r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL
353
- ).group(1))
354
- g = nd['props']['pageProps']['game']['data']['game'][0]
355
- return {
356
- "id": g['game_id'], "name": g['game_name'],
357
- "main": round(g['comp_main']/3600, 1) if g['comp_main'] else None,
358
- "main_plus": round(g['comp_plus']/3600, 1) if g['comp_plus'] else None,
359
- "completionist": round(g['comp_100']/3600, 1) if g['comp_100'] else None,
360
- }
361
-
362
- ids = [10270, 68151, 42818, 26803, 34716] # Witcher3, Elden Ring, Celeste, DS3, Stardew
363
- with ThreadPoolExecutor(max_workers=5) as ex:
364
- results = list(ex.map(fetch_game, ids))
365
-
366
- for r in results:
367
- print(f"[{r['id']}] {r['name']}: {r['main']}h / {r['main_plus']}h / {r['completionist']}h")
368
-
369
- # Confirmed output:
370
- # [10270] The Witcher 3: Wild Hunt: 51.6h / 103.8h / 174.4h
371
- # [68151] Elden Ring: 60.0h / 101.2h / 135.5h
372
- # [42818] Celeste: 8.3h / 14.6h / 39.2h
373
- # [26803] Dark Souls III: 31.2h / 48.4h / 100.5h
374
- # [34716] Stardew Valley: 53.4h / 94.6h / 171.5h
375
- ```
376
-
377
- ---
378
-
379
- ## Search response field reference
380
-
381
- Every item in `data[]` from `/api/find`:
382
-
383
- | Field | Type | Description |
384
- |-------|------|-------------|
385
- | `game_id` | int | HLTB internal game ID |
386
- | `game_name` | str | Full game title |
387
- | `game_alias` | str | Alternate title / edition name |
388
- | `game_type` | str | `"game"` \| `"dlc"` \| `"expansion"` \| `"hack"` |
389
- | `game_image` | str | Image filename → `https://howlongtobeat.com/games/{game_image}` |
390
- | `release_world` | int | Release year (just the year integer, not a date) |
391
- | `profile_platform` | str | Comma-separated platform list |
392
- | `comp_main` | int | Main Story seconds (polled average), 0 if no data |
393
- | `comp_plus` | int | Main + Extras seconds |
394
- | `comp_100` | int | Completionist seconds |
395
- | `comp_all` | int | All Styles combined seconds |
396
- | `comp_main_count` | int | Submission count for Main Story |
397
- | `comp_plus_count` | int | Submission count for Main + Extras |
398
- | `comp_100_count` | int | Submission count for Completionist |
399
- | `comp_all_count` | int | Total submissions across all categories |
400
- | `comp_lvl_sp` | int | 1 if single-player data exists |
401
- | `comp_lvl_co` | int | 1 if co-op data exists |
402
- | `comp_lvl_mp` | int | 1 if multiplayer data exists |
403
- | `invested_co` | int | Average co-op time in seconds |
404
- | `invested_mp` | int | Average multiplayer time in seconds |
405
- | `count_comp` | int | Total completions logged |
406
- | `count_backlog` | int | Users with game in backlog |
407
- | `count_playing` | int | Currently playing |
408
- | `count_speedrun` | int | Speedrun entries |
409
- | `count_review` | int | Review count |
410
- | `review_score` | int | Community review score 0–100 |
411
- | `profile_popular` | int | Popularity rank |
412
-
413
- Additional fields in `__NEXT_DATA__` game page only:
414
-
415
- | Field | Description |
416
- |-------|-------------|
417
- | `comp_main_med/avg/l/h` | Median / average / low / high for main time |
418
- | `comp_plus_med/avg/l/h` | Same for Main + Extras |
419
- | `comp_100_med/avg/l/h` | Same for Completionist |
420
- | `comp_speed` | Speedrun any% average seconds |
421
- | `comp_speed_min/max/med` | Speedrun spread |
422
- | `comp_speed100` | 100% speedrun average |
423
- | `comp_speed_count` | Speedrun submission count |
424
- | `comp_lvl_spd` | 1 if speedrun data exists |
425
- | `profile_dev` | Developer name |
426
- | `profile_pub` | Publisher name |
427
- | `profile_genre` | Comma-separated genres |
428
- | `profile_steam` | Steam App ID (0 if not on Steam) |
429
- | `release_world` | Full release date `"YYYY-MM-DD"` |
430
- | `rating_esrb` | ESRB rating string (may be empty) |
431
- | `count_replay` | Times replayed |
432
- | `count_total` | Total user entries |
433
-
434
- ---
435
-
436
- ## Anti-bot measures
437
-
438
- - **Cloudflare** is present (confirmed by `CF-Ray` response header), but does not block plain HTTP with a browser UA.
439
- - **Token system**: Every search requires a valid token from `/api/find/init`; one token can serve many searches until it expires. The token encodes `timestamp::IP|UA|hpKey|hmacHash`, and the server validates that the UA used to fetch the token matches the UA used in the search POST.
440
- - **Honeypot field**: `hpKey` and `hpVal` from the init response must appear together as a top-level key/value pair in the POST body (e.g., `{"ign_7671546b": "a6679ea54598d502", ...}`). The key name rotates with each token fetch.
441
- - **Required headers on search POST**: `Origin: https://howlongtobeat.com` AND `Referer: https://howlongtobeat.com/` — missing either causes HTTP 403 or 404. `x-auth-token`, `x-hp-key`, `x-hp-val` are also required.
442
- - **Required header on init GET**: `Referer: https://howlongtobeat.com/` — missing causes HTTP 403.
443
- - **Token reuse**: A single token works for multiple searches and multiple pages. No per-request token fetch needed.
444
- - **No CAPTCHA** observed during testing with standard UA strings.
445
- - **Rate limits**: Not triggered during testing (token fetches + 10+ searches sequentially). Fetching many game pages in parallel (5 workers) worked without 429s.
446
-
447
- ---
448
-
449
- ## Gotchas
450
-
451
- - **Completion times are in seconds** — all `comp_*` fields are integer seconds. Divide by 3600 for hours. `0` means no data (not 0 hours).
452
-
453
- - **`release_world` is a year int in search, a full date in game page** — in the `/api/find` response, `release_world` is an integer year (e.g., `2015`). In `__NEXT_DATA__` on the game page, it's `"2015-05-19"`.
454
-
455
- - **UA fingerprinting** — the token from `/api/find/init` encodes the User-Agent. The search POST must use the identical UA that fetched the token, or you'll get HTTP 403. Since `http_get` sends `Mozilla/5.0`, use that same string for the search POST.
456
-
457
- - **Honeypot key name rotates** — `hpKey` is something like `ign_7671546b` (changes each token fetch). Always read it from the init response and use it dynamically. Never hardcode it.
458
-
459
- - **Both `x-hp-key`/`x-hp-val` headers AND the body field are required** — the server checks the request headers (`x-hp-key`, `x-hp-val`) against the dynamic key in the POST body. If either is wrong or missing, you get HTTP 404 (wrong body value) or HTTP 403 (missing/wrong header).
460
-
461
- - **`game_type` in search results** — can be `"game"`, `"dlc"`, `"expansion"`, or `"hack"`. Search results mix these by default. Filter with `if g['game_type'] == 'game'` if you only want base games.
462
-
463
- - **Games with no submission data** — `comp_main`, `comp_plus`, `comp_100` are `0` (not `None`) when no users have submitted times. Always check `if g['comp_main']:` before dividing.
464
-
465
- - **`individuality` (per-platform) data** — available only in `__NEXT_DATA__` on the game page, not in search results. `comp_main` etc. are strings, not ints, in this sub-object — cast with `int(plat['comp_main'])`.
466
-
467
- - **`profile_platform` in search** — a comma-separated string that HLTB displays. Not structured. Use game page `individuality` for per-platform time breakdowns.
468
-
469
- - **Token expiry** — if a long-running loop gets HTTP 403 with `{"error":"Session expired or invalid fingerprint"}`, call `get_token()` again and retry. Token lifetime appears to be ~15 minutes based on the timestamp embedded in the decoded value.
470
-
471
- - **No slug-based URLs** — HLTB uses integer `game_id` for all game pages, not slugs. There is no `title-to-slug` mapping; use search to find the `game_id` first.
472
-
473
- - **`sortCategory` options**: `"popular"` ranks by community engagement (best for "top result = intended game"). `"name"` sorts alphabetically. Other values (`"madnessTime"`, `"mainThenExtras"`) exist but returned the same results as `"name"` in testing.
1
+ # HowLongToBeat — Scraping & Data Extraction
2
+
3
+ Field-tested against howlongtobeat.com on 2026-04-18. All code blocks validated with live requests.
4
+
5
+ ## Do this first
6
+
7
+ **Use the search API — it returns structured JSON with all completion times in one POST call.**
8
+
9
+ HLTB runs a token-gated POST endpoint at `/api/find`. You must first fetch a session token from `/api/find/init`, then include it in the search request. Both steps are plain HTTP — no browser required.
10
+
11
+ ```python
12
+ import json, re, urllib.request, time
13
+ from helpers import http_get
14
+
15
+ UA = "Mozilla/5.0"
16
+
17
+ def get_token():
18
+ """Fetch a fresh session token. Token encodes IP+UA+timestamp, reusable for ~15 min."""
19
+ url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
20
+ data = http_get(url, headers={"Referer": "https://howlongtobeat.com/"})
21
+ return json.loads(data) # {token, hpKey, hpVal}
22
+
23
+ def search_hltb(title, size=20, page=1, token_data=None):
24
+ """
25
+ Search HLTB for games. Returns raw API dict:
26
+ {count, pageCurrent, pageTotal, pageSize, data: [...]}
27
+ token_data can be reused across searches (fetch once, use many times).
28
+ """
29
+ if token_data is None:
30
+ token_data = get_token()
31
+ hp_key, hp_val = token_data['hpKey'], token_data['hpVal']
32
+ payload = {
33
+ "searchType": "games",
34
+ "searchTerms": title.split(),
35
+ "searchPage": page,
36
+ "size": size,
37
+ "searchOptions": {
38
+ "games": {
39
+ "userId": 0, "platform": "", "sortCategory": "popular",
40
+ "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
41
+ "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
42
+ "rangeYear": {"min": "", "max": ""}, "modifier": ""
43
+ },
44
+ "users": {"sortCategory": "postcount"},
45
+ "lists": {"sortCategory": "follows"},
46
+ "filter": "", "sort": 0, "randomizer": 0
47
+ },
48
+ "useCache": True,
49
+ hp_key: hp_val # honeypot field — key and value vary per token
50
+ }
51
+ req = urllib.request.Request(
52
+ "https://howlongtobeat.com/api/find",
53
+ data=json.dumps(payload).encode(),
54
+ headers={
55
+ "User-Agent": UA,
56
+ "Content-Type": "application/json",
57
+ "Origin": "https://howlongtobeat.com",
58
+ "Referer": "https://howlongtobeat.com/",
59
+ "x-auth-token": token_data['token'],
60
+ "x-hp-key": hp_key,
61
+ "x-hp-val": hp_val,
62
+ },
63
+ method="POST"
64
+ )
65
+ with urllib.request.urlopen(req, timeout=20) as r:
66
+ return json.loads(r.read().decode())
67
+
68
+ # Usage
69
+ tok = get_token()
70
+
71
+ result = search_hltb("elden ring", token_data=tok, size=3)
72
+ for g in result['data']:
73
+ print(g['game_id'], g['game_name'], g['release_world'])
74
+ print(f" Main: {g['comp_main']/3600:.1f}h +Extras: {g['comp_plus']/3600:.1f}h 100%: {g['comp_100']/3600:.1f}h")
75
+
76
+ # Confirmed output (2026-04-18):
77
+ # 68151 Elden Ring 2022
78
+ # Main: 60.0h +Extras: 101.2h 100%: 135.5h
79
+ # 160589 Elden Ring: Nightreign 2025
80
+ # Main: 28.1h +Extras: 40.1h 100%: 66.9h
81
+ # 139385 Elden Ring: Shadow of the Erdtree 2024
82
+ # Main: 25.7h +Extras: 39.0h 100%: 51.1h
83
+ ```
84
+
85
+ Token is reusable — fetch it once and pass it to multiple `search_hltb()` calls. No need to re-fetch per search.
86
+
87
+ ---
88
+
89
+ ## Fastest approach: search + parse in one helper
90
+
91
+ ```python
92
+ import json, re, urllib.request, time
93
+ from helpers import http_get
94
+
95
+ UA = "Mozilla/5.0"
96
+
97
+ def hltb_search(title, size=5):
98
+ """One-shot: get token + search, return list of dicts with hours."""
99
+ url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
100
+ tok = json.loads(http_get(url, headers={"Referer": "https://howlongtobeat.com/"}))
101
+ hp_key, hp_val = tok['hpKey'], tok['hpVal']
102
+ payload = {
103
+ "searchType": "games", "searchTerms": title.split(), "searchPage": 1, "size": size,
104
+ "searchOptions": {
105
+ "games": {"userId": 0, "platform": "", "sortCategory": "popular",
106
+ "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
107
+ "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
108
+ "rangeYear": {"min": "", "max": ""}, "modifier": ""},
109
+ "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"},
110
+ "filter": "", "sort": 0, "randomizer": 0
111
+ },
112
+ "useCache": True, hp_key: hp_val
113
+ }
114
+ req = urllib.request.Request(
115
+ "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(),
116
+ headers={"User-Agent": UA, "Content-Type": "application/json",
117
+ "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/",
118
+ "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val},
119
+ method="POST"
120
+ )
121
+ with urllib.request.urlopen(req, timeout=20) as r:
122
+ data = json.loads(r.read().decode())
123
+
124
+ def h(secs):
125
+ return round(secs / 3600, 1) if secs else None
126
+
127
+ return [
128
+ {
129
+ "game_id": g["game_id"],
130
+ "name": g["game_name"],
131
+ "type": g["game_type"], # "game" | "dlc" | "expansion" | "hack"
132
+ "year": g["release_world"],
133
+ "platforms": g["profile_platform"],
134
+ "main": h(g["comp_main"]), # Main Story hours (polled average)
135
+ "main_plus": h(g["comp_plus"]), # Main + Extras hours
136
+ "completionist":h(g["comp_100"]), # Completionist hours
137
+ "all_styles": h(g["comp_all"]), # All playstyles combined
138
+ "main_count": g["comp_main_count"], # Number of submissions
139
+ "plus_count": g["comp_plus_count"],
140
+ "comp_count": g["comp_100_count"],
141
+ "review_score": g["review_score"], # 0–100
142
+ "image_url": f"https://howlongtobeat.com/games/{g['game_image']}",
143
+ "page_url": f"https://howlongtobeat.com/game/{g['game_id']}",
144
+ }
145
+ for g in data["data"]
146
+ ]
147
+
148
+ # Verified results (2026-04-18):
149
+ print(hltb_search("the witcher 3")[0])
150
+ # {'game_id': 10270, 'name': 'The Witcher 3: Wild Hunt', 'type': 'game', 'year': 2015,
151
+ # 'main': 51.6, 'main_plus': 103.8, 'completionist': 174.4, 'all_styles': 103.8,
152
+ # 'main_count': 2681, 'plus_count': 6708, 'comp_count': 2327, 'review_score': 93, ...}
153
+
154
+ print(hltb_search("gone home")[0])
155
+ # {'game_id': 4010, 'name': 'Gone Home', 'main': 2.0, 'main_plus': 2.5, 'completionist': 3.1, ...}
156
+ ```
157
+
158
+ ---
159
+
160
+ ## Game detail page (full stat breakdown, speedrun data, per-platform times)
161
+
162
+ When you have a `game_id`, fetch the game page and extract `__NEXT_DATA__` for the complete dataset — includes median/avg/low/high times, speedrun data, co-op/multiplayer times, and per-platform breakdowns.
163
+
164
+ ```python
165
+ import json, re
166
+ from helpers import http_get
167
+
168
+ def get_game_detail(game_id):
169
+ """
170
+ Fetch complete game data from the HLTB game page.
171
+ Returns pageProps['game']['data'] with keys: 'game', 'individuality', 'relationships'.
172
+ """
173
+ html = http_get(f"https://howlongtobeat.com/game/{game_id}")
174
+ nd = json.loads(re.search(
175
+ r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL
176
+ ).group(1))
177
+ return nd['props']['pageProps']['game']['data']
178
+
179
+ data = get_game_detail(10270) # Witcher 3
180
+ g = data['game'][0]
181
+
182
+ # Core completion times (all in seconds — divide by 3600 for hours)
183
+ print(g['comp_main'] / 3600) # 51.6 — Main Story (polled avg)
184
+ print(g['comp_main_med'] / 3600) # 50.0 — Main Story median
185
+ print(g['comp_main_l'] / 3600) # 32.7 — Main Story low
186
+ print(g['comp_main_h'] / 3600) # 85.8 — Main Story high
187
+ print(g['comp_main_count']) # 2681 — submission count
188
+
189
+ print(g['comp_plus'] / 3600) # 103.8 — Main + Extras
190
+ print(g['comp_100'] / 3600) # 174.4 — Completionist
191
+ print(g['comp_all'] / 3600) # 103.8 — All Styles
192
+
193
+ # Speedrun times
194
+ print(g['comp_lvl_spd']) # 1 if speedrun data exists, 0 if not
195
+ print(g['comp_speed'] / 3600) # 19.2 — any% (polled avg)
196
+ print(g['comp_speed_min'] / 3600) # 3.2 — fastest submission
197
+ print(g['comp_speed_max'] / 3600) # 30.0 — slowest speedrun
198
+ print(g['comp_speed_count']) # 15 — speedrun submissions
199
+
200
+ print(g['comp_speed100'] / 3600) # 59.4 — 100% speedrun
201
+ print(g['comp_speed100_count']) # 4
202
+
203
+ # Multiplayer / co-op invested time
204
+ print(g['comp_lvl_co']) # 1 if co-op data exists
205
+ print(g['comp_lvl_mp']) # 1 if multiplayer data exists
206
+ print(g['invested_co'] / 3600) # hours in co-op mode
207
+ print(g['invested_mp'] / 3600) # hours in competitive multiplayer
208
+ print(g['invested_co_count']) # submission count
209
+
210
+ # Metadata
211
+ print(g['profile_dev']) # "CD Projekt RED"
212
+ print(g['profile_pub']) # "CD Projekt, Warner Bros..."
213
+ print(g['profile_platform']) # "Nintendo Switch, PC, PlayStation 4, ..."
214
+ print(g['profile_genre']) # "Third-Person, Action, Open World, Role-Playing"
215
+ print(g['profile_steam']) # 292030 — Steam App ID (0 if not on Steam)
216
+ print(g['release_world']) # "2015-05-19"
217
+ print(g['rating_esrb']) # "M"
218
+ print(g['review_score']) # 93 (0–100)
219
+ print(g['count_comp']) # 26007 — times completed
220
+ print(g['count_backlog']) # 31083
221
+
222
+ # Per-platform breakdown (individuality)
223
+ for plat in data['individuality']:
224
+ print(plat['platform'],
225
+ int(plat['comp_main'])/3600, # main hours
226
+ int(plat['comp_plus'])/3600, # +extras hours
227
+ int(plat['comp_100'])/3600, # 100% hours
228
+ plat['count_comp']) # completions on this platform
229
+ # Example:
230
+ # Nintendo Switch 57.0h 112.3h 194.9h 236
231
+ # PC, PS4, Xbox One 52.9h 110.0h 179.4h 11136
232
+ # PS5, Xbox Series X/S 52.1h 92.5h 168.8h 343
233
+
234
+ # DLC / expansion completion times
235
+ for rel in data['relationships'][:3]:
236
+ print(rel['game_id'], rel['game_name'], rel['game_type'],
237
+ rel['comp_main']/3600 if rel['comp_main'] else None)
238
+ ```
239
+
240
+ ---
241
+
242
+ ## Common workflows
243
+
244
+ ### Quick lookup: name → completion times
245
+
246
+ ```python
247
+ import json, re, urllib.request, time
248
+ from helpers import http_get
249
+
250
+ UA = "Mozilla/5.0"
251
+
252
+ def get_times(title):
253
+ """Return Main/+Extras/100% hours for the top search match."""
254
+ tok_url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
255
+ tok = json.loads(http_get(tok_url, headers={"Referer": "https://howlongtobeat.com/"}))
256
+ hp_key, hp_val = tok['hpKey'], tok['hpVal']
257
+ payload = {
258
+ "searchType": "games", "searchTerms": title.split(), "searchPage": 1, "size": 1,
259
+ "searchOptions": {
260
+ "games": {"userId": 0, "platform": "", "sortCategory": "popular",
261
+ "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
262
+ "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
263
+ "rangeYear": {"min": "", "max": ""}, "modifier": ""},
264
+ "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"},
265
+ "filter": "", "sort": 0, "randomizer": 0
266
+ },
267
+ "useCache": True, hp_key: hp_val
268
+ }
269
+ req = urllib.request.Request(
270
+ "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(),
271
+ headers={"User-Agent": UA, "Content-Type": "application/json",
272
+ "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/",
273
+ "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val},
274
+ method="POST"
275
+ )
276
+ with urllib.request.urlopen(req, timeout=20) as r:
277
+ data = json.loads(r.read().decode())
278
+ if not data['data']:
279
+ return None
280
+ g = data['data'][0]
281
+ h = lambda s: round(s/3600, 1) if s else None
282
+ return {
283
+ "id": g['game_id'], "name": g['game_name'],
284
+ "main": h(g['comp_main']), "main_plus": h(g['comp_plus']),
285
+ "completionist": h(g['comp_100'])
286
+ }
287
+
288
+ # Verified:
289
+ print(get_times("celeste"))
290
+ # {'id': 42818, 'name': 'Celeste', 'main': 8.3, 'main_plus': 14.6, 'completionist': 39.2}
291
+ print(get_times("stardew valley"))
292
+ # {'id': 34716, 'name': 'Stardew Valley', 'main': 53.4, 'main_plus': 94.6, 'completionist': 171.5}
293
+ print(get_times("hades"))
294
+ # {'id': 62941, 'name': 'Hades', 'main': 23.4, 'main_plus': 48.5, 'completionist': 95.0}
295
+ ```
296
+
297
+ ### Paginated search (all results for a query)
298
+
299
+ `count` = total matches, `pageTotal` = total pages with current `size`. The same token works across all pages.
300
+
301
+ ```python
302
+ def search_all_pages(title, size=20):
303
+ """Yield every search result for a query across all pages."""
304
+ tok_url = f"https://howlongtobeat.com/api/find/init?t={int(time.time()*1000)}"
305
+ tok = json.loads(http_get(tok_url, headers={"Referer": "https://howlongtobeat.com/"}))
306
+ hp_key, hp_val = tok['hpKey'], tok['hpVal']
307
+
308
+ page = 1
309
+ while True:
310
+ payload = {
311
+ "searchType": "games", "searchTerms": title.split(),
312
+ "searchPage": page, "size": size,
313
+ "searchOptions": {
314
+ "games": {"userId": 0, "platform": "", "sortCategory": "popular",
315
+ "rangeCategory": "main", "rangeTime": {"min": None, "max": None},
316
+ "gameplay": {"perspective": "", "flow": "", "genre": "", "difficulty": ""},
317
+ "rangeYear": {"min": "", "max": ""}, "modifier": ""},
318
+ "users": {"sortCategory": "postcount"}, "lists": {"sortCategory": "follows"},
319
+ "filter": "", "sort": 0, "randomizer": 0
320
+ },
321
+ "useCache": True, hp_key: hp_val
322
+ }
323
+ req = urllib.request.Request(
324
+ "https://howlongtobeat.com/api/find", data=json.dumps(payload).encode(),
325
+ headers={"User-Agent": UA, "Content-Type": "application/json",
326
+ "Origin": "https://howlongtobeat.com", "Referer": "https://howlongtobeat.com/",
327
+ "x-auth-token": tok['token'], "x-hp-key": hp_key, "x-hp-val": hp_val},
328
+ method="POST"
329
+ )
330
+ with urllib.request.urlopen(req, timeout=20) as r:
331
+ data = json.loads(r.read().decode())
332
+ yield from data['data']
333
+ if page >= data['pageTotal']:
334
+ break
335
+ page += 1
336
+
337
+ # "mario" returns 308 results across 16 pages (size=20)
338
+ mario_games = list(search_all_pages("mario", size=20))
339
+ print(len(mario_games)) # 308
340
+ ```
341
+
342
+ ### Batch lookup by game ID (parallel)
343
+
344
+ ```python
345
+ import json, re, urllib.request
346
+ from concurrent.futures import ThreadPoolExecutor
347
+ from helpers import http_get
348
+
349
+ def fetch_game(game_id):
350
+ html = http_get(f"https://howlongtobeat.com/game/{game_id}")
351
+ nd = json.loads(re.search(
352
+ r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL
353
+ ).group(1))
354
+ g = nd['props']['pageProps']['game']['data']['game'][0]
355
+ return {
356
+ "id": g['game_id'], "name": g['game_name'],
357
+ "main": round(g['comp_main']/3600, 1) if g['comp_main'] else None,
358
+ "main_plus": round(g['comp_plus']/3600, 1) if g['comp_plus'] else None,
359
+ "completionist": round(g['comp_100']/3600, 1) if g['comp_100'] else None,
360
+ }
361
+
362
+ ids = [10270, 68151, 42818, 26803, 34716] # Witcher3, Elden Ring, Celeste, DS3, Stardew
363
+ with ThreadPoolExecutor(max_workers=5) as ex:
364
+ results = list(ex.map(fetch_game, ids))
365
+
366
+ for r in results:
367
+ print(f"[{r['id']}] {r['name']}: {r['main']}h / {r['main_plus']}h / {r['completionist']}h")
368
+
369
+ # Confirmed output:
370
+ # [10270] The Witcher 3: Wild Hunt: 51.6h / 103.8h / 174.4h
371
+ # [68151] Elden Ring: 60.0h / 101.2h / 135.5h
372
+ # [42818] Celeste: 8.3h / 14.6h / 39.2h
373
+ # [26803] Dark Souls III: 31.2h / 48.4h / 100.5h
374
+ # [34716] Stardew Valley: 53.4h / 94.6h / 171.5h
375
+ ```
376
+
377
+ ---
378
+
379
+ ## Search response field reference
380
+
381
+ Every item in `data[]` from `/api/find`:
382
+
383
+ | Field | Type | Description |
384
+ |-------|------|-------------|
385
+ | `game_id` | int | HLTB internal game ID |
386
+ | `game_name` | str | Full game title |
387
+ | `game_alias` | str | Alternate title / edition name |
388
+ | `game_type` | str | `"game"` \| `"dlc"` \| `"expansion"` \| `"hack"` |
389
+ | `game_image` | str | Image filename → `https://howlongtobeat.com/games/{game_image}` |
390
+ | `release_world` | int | Release year (just the year integer, not a date) |
391
+ | `profile_platform` | str | Comma-separated platform list |
392
+ | `comp_main` | int | Main Story seconds (polled average), 0 if no data |
393
+ | `comp_plus` | int | Main + Extras seconds |
394
+ | `comp_100` | int | Completionist seconds |
395
+ | `comp_all` | int | All Styles combined seconds |
396
+ | `comp_main_count` | int | Submission count for Main Story |
397
+ | `comp_plus_count` | int | Submission count for Main + Extras |
398
+ | `comp_100_count` | int | Submission count for Completionist |
399
+ | `comp_all_count` | int | Total submissions across all categories |
400
+ | `comp_lvl_sp` | int | 1 if single-player data exists |
401
+ | `comp_lvl_co` | int | 1 if co-op data exists |
402
+ | `comp_lvl_mp` | int | 1 if multiplayer data exists |
403
+ | `invested_co` | int | Average co-op time in seconds |
404
+ | `invested_mp` | int | Average multiplayer time in seconds |
405
+ | `count_comp` | int | Total completions logged |
406
+ | `count_backlog` | int | Users with game in backlog |
407
+ | `count_playing` | int | Currently playing |
408
+ | `count_speedrun` | int | Speedrun entries |
409
+ | `count_review` | int | Review count |
410
+ | `review_score` | int | Community review score 0–100 |
411
+ | `profile_popular` | int | Popularity rank |
412
+
413
+ Additional fields in `__NEXT_DATA__` game page only:
414
+
415
+ | Field | Description |
416
+ |-------|-------------|
417
+ | `comp_main_med/avg/l/h` | Median / average / low / high for main time |
418
+ | `comp_plus_med/avg/l/h` | Same for Main + Extras |
419
+ | `comp_100_med/avg/l/h` | Same for Completionist |
420
+ | `comp_speed` | Speedrun any% average seconds |
421
+ | `comp_speed_min/max/med` | Speedrun spread |
422
+ | `comp_speed100` | 100% speedrun average |
423
+ | `comp_speed_count` | Speedrun submission count |
424
+ | `comp_lvl_spd` | 1 if speedrun data exists |
425
+ | `profile_dev` | Developer name |
426
+ | `profile_pub` | Publisher name |
427
+ | `profile_genre` | Comma-separated genres |
428
+ | `profile_steam` | Steam App ID (0 if not on Steam) |
429
+ | `release_world` | Full release date `"YYYY-MM-DD"` |
430
+ | `rating_esrb` | ESRB rating string (may be empty) |
431
+ | `count_replay` | Times replayed |
432
+ | `count_total` | Total user entries |
433
+
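The two tables above imply a small normalization step before search items and game-page records can be compared: `comp_*` values are seconds with `0` meaning no data, and `release_world` is an integer year in search results but a `"YYYY-MM-DD"` string on the game page. The following is a minimal sketch under those assumptions; the helper names `to_hours`, `release_year`, and `normalize` are hypothetical and are not part of HLTB or `helpers`.

```python
def to_hours(seconds):
    """Convert HLTB seconds to hours; 0 or a missing value means no data."""
    secs = int(seconds or 0)  # per-platform values may arrive as strings
    return round(secs / 3600, 1) if secs else None


def release_year(release_world):
    """Return the year from either form of release_world (int year or date string)."""
    if isinstance(release_world, int):
        return release_world or None
    text = str(release_world or "")
    year = int(text[:4]) if text[:4].isdigit() else 0
    return year or None


def normalize(record):
    """Reduce a search item or game-page record to a common shape (hypothetical)."""
    return {
        "id": record["game_id"],
        "name": record["game_name"],
        "main": to_hours(record.get("comp_main")),
        "main_plus": to_hours(record.get("comp_plus")),
        "completionist": to_hours(record.get("comp_100")),
        "year": release_year(record.get("release_world")),
    }


# Made-up sample records, one in each shape
search_item = {"game_id": 1, "game_name": "Sample", "comp_main": 7200,
               "comp_plus": 0, "comp_100": 10800, "release_world": 2015}
page_record = dict(search_item, release_world="2015-05-19")
print(normalize(search_item) == normalize(page_record))
```

Returning `None` rather than `0.0` for missing times keeps "no submissions" distinguishable from a genuinely short game in downstream code.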
434
+ ---
435
+
436
+ ## Anti-bot measures
437
+
438
+ - **Cloudflare** is present (confirmed by `CF-Ray` response header), but does not block plain HTTP with a browser UA.
439
+ - **Token system**: Every search requires a valid token from `/api/find/init`; one token can serve many searches until it expires. The token encodes `timestamp::IP|UA|hpKey|hmacHash`, and the server validates that the UA used to fetch the token matches the UA used in the search POST.
440
+ - **Honeypot field**: `hpKey` and `hpVal` from the init response must appear together as a top-level key/value pair in the POST body (e.g., `{"ign_7671546b": "a6679ea54598d502", ...}`). The key name rotates with each token fetch.
441
+ - **Required headers on search POST**: `Origin: https://howlongtobeat.com` AND `Referer: https://howlongtobeat.com/` — missing either causes HTTP 403 or 404. `x-auth-token`, `x-hp-key`, `x-hp-val` are also required.
442
+ - **Required header on init GET**: `Referer: https://howlongtobeat.com/` — missing causes HTTP 403.
443
+ - **Token reuse**: A single token works for multiple searches and multiple pages. No per-request token fetch needed.
444
+ - **No CAPTCHA** observed during testing with standard UA strings.
445
+ - **Rate limits**: Not triggered during testing (token fetches + 10+ searches sequentially). Fetching many game pages in parallel (5 workers) worked without 429s.
446
+
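The header and body requirements listed above can be collected in one place so that nothing is forgotten when building a search POST. This is a sketch only: `build_search_request` is a hypothetical helper, the token string is a placeholder, and the honeypot values are the example ones quoted above. It performs no network access; it only assembles what the list says the server expects.

```python
import json

BASE = "https://howlongtobeat.com"


def build_search_request(token_data, payload, user_agent="Mozilla/5.0"):
    """Return (headers, body_bytes) for a POST to /api/find (hypothetical helper)."""
    hp_key, hp_val = token_data["hpKey"], token_data["hpVal"]
    body = dict(payload)      # copy so the caller's payload is not mutated
    body[hp_key] = hp_val     # honeypot pair as a top-level body field
    headers = {
        "User-Agent": user_agent,            # must match the UA that fetched the token
        "Content-Type": "application/json",
        "Origin": BASE,                      # no trailing slash
        "Referer": BASE + "/",               # trailing slash
        "x-auth-token": token_data["token"],
        "x-hp-key": hp_key,                  # same pair again, as headers
        "x-hp-val": hp_val,
    }
    return headers, json.dumps(body).encode()


# Placeholder token data shaped like the init response {token, hpKey, hpVal}
sample_token = {"token": "placeholder-token",
                "hpKey": "ign_7671546b", "hpVal": "a6679ea54598d502"}
headers, body = build_search_request(sample_token, {"searchType": "games"})
print(sorted(headers))
print(json.loads(body))
```

Because the key name changes with every token fetch, the helper reads it from `token_data` each time instead of caching it.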
447
+ ---
448
+
449
+ ## Gotchas
450
+
451
+ - **Completion times are in seconds** — all `comp_*` fields are integer seconds. Divide by 3600 for hours. `0` means no data (not 0 hours).
452
+
453
+ - **`release_world` is a year int in search, a full date in game page** — in the `/api/find` response, `release_world` is an integer year (e.g., `2015`). In `__NEXT_DATA__` on the game page, it's `"2015-05-19"`.
454
+
455
+ - **UA fingerprinting** — the token from `/api/find/init` encodes the User-Agent. The search POST must use the identical UA that fetched the token, or you'll get HTTP 403. Since `http_get` sends `Mozilla/5.0`, use that same string for the search POST.
456
+
457
+ - **Honeypot key name rotates** — `hpKey` is something like `ign_7671546b` (changes each token fetch). Always read it from the init response and use it dynamically. Never hardcode it.
458
+
459
+ - **Both `x-hp-key`/`x-hp-val` headers AND the body field are required** — the server checks the request headers (`x-hp-key`, `x-hp-val`) against the dynamic key in the POST body. If either is wrong or missing, you get HTTP 404 (wrong body value) or HTTP 403 (missing/wrong header).
460
+
461
+ - **`game_type` in search results** — can be `"game"`, `"dlc"`, `"expansion"`, or `"hack"`. Search results mix these by default. Filter with `if g['game_type'] == 'game'` if you only want base games.
462
+
463
+ - **Games with no submission data** — `comp_main`, `comp_plus`, `comp_100` are `0` (not `None`) when no users have submitted times. Always check `if g['comp_main']:` before dividing.
464
+
465
+ - **`individuality` (per-platform) data** — available only in `__NEXT_DATA__` on the game page, not in search results. `comp_main` etc. are strings, not ints, in this sub-object — cast with `int(plat['comp_main'])`.
466
+
467
+ - **`profile_platform` in search** — a comma-separated string that HLTB displays. Not structured. Use game page `individuality` for per-platform time breakdowns.
468
+
469
+ - **Token expiry** — if a long-running loop gets HTTP 403 with `{"error":"Session expired or invalid fingerprint"}`, call `get_token()` again and retry. Token lifetime appears to be ~15 minutes based on the timestamp embedded in the decoded value.
470
+
471
+ - **No slug-based URLs** — HLTB uses integer `game_id` for all game pages, not slugs. There is no `title-to-slug` mapping; use search to find the `game_id` first.
472
+
473
+ - **`sortCategory` options**: `"popular"` ranks by community engagement (best for "top result = intended game"). `"name"` sorts alphabetically. Other values (`"madnessTime"`, `"mainThenExtras"`) exist but returned the same results as `"name"` in testing.
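The token-expiry gotcha above suggests a retry-once pattern for long-running loops. Below is a minimal sketch, assuming the search function raises `urllib.error.HTTPError` on a 403 (as `urllib.request.urlopen` does) and accepts a `token_data` keyword like `search_hltb()` in this guide. The wrapper name `search_with_token_refresh` and its `max_refreshes` parameter are hypothetical.

```python
import urllib.error


def search_with_token_refresh(search_fn, get_token_fn, title,
                              token_data=None, max_refreshes=1, **kwargs):
    """Call search_fn; on HTTP 403, fetch a new token and retry.

    Returns (result, token_data) so the caller can keep reusing the
    refreshed token for later searches.
    """
    if token_data is None:
        token_data = get_token_fn()
    refreshes = 0
    while True:
        try:
            return search_fn(title, token_data=token_data, **kwargs), token_data
        except urllib.error.HTTPError as err:
            # Only a 403 is treated as an expired or invalid token.
            if err.code != 403 or refreshes >= max_refreshes:
                raise
            token_data = get_token_fn()
            refreshes += 1


# Intended use with the functions defined earlier in this guide:
#   result, tok = search_with_token_refresh(search_hltb, get_token, "hades", token_data=tok)
```

Capping the number of refreshes matters because a 403 can also come from a missing header or a mismatched UA, in which case a new token will not help and the error should surface.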