@pencil-agent/nano-pencil 2.0.0-beta.8 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (241)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/extensions-host/index.d.ts +1 -1
  7. package/dist/core/extensions-host/loader.js +1 -1
  8. package/dist/core/extensions-host/runner.d.ts +1 -0
  9. package/dist/core/extensions-host/runner.js +2 -2
  10. package/dist/core/extensions-host/types.d.ts +17 -22
  11. package/dist/core/lib/ai/src/types.d.ts +12 -2
  12. package/dist/core/persona/persona-manager.js +5 -2
  13. package/dist/core/runtime/agent-session.js +3 -3
  14. package/dist/core/runtime/extension-core-bindings.d.ts +1 -0
  15. package/dist/core/runtime/extension-core-bindings.js +2 -2
  16. package/dist/extensions/builtin/AGENT.md +115 -115
  17. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  18. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  19. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  20. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  91. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  92. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  93. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  94. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  95. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  96. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  97. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  98. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  99. package/dist/extensions/builtin/browser/browser.md +73 -73
  100. package/dist/extensions/builtin/browser/install.md +142 -142
  101. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  102. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  103. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  104. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  105. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  107. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  108. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  109. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  110. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  111. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  112. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  113. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  114. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  115. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  116. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  117. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  118. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  119. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  120. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  121. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  122. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  123. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  124. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  125. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  126. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  127. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  128. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  129. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  130. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  131. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  132. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  133. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  134. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  135. package/dist/extensions/builtin/goal/README.md +67 -67
  136. package/dist/extensions/builtin/goal/goal-controller.d.ts +39 -10
  137. package/dist/extensions/builtin/goal/goal-controller.js +1 -1
  138. package/dist/extensions/builtin/goal/goal-format.js +1 -1
  139. package/dist/extensions/builtin/goal/goal-prompts.d.ts +2 -0
  140. package/dist/extensions/builtin/goal/goal-prompts.js +5 -4
  141. package/dist/extensions/builtin/goal/goal-store.js +1 -1
  142. package/dist/extensions/builtin/goal/index.d.ts +1 -1
  143. package/dist/extensions/builtin/goal/index.js +10 -7
  144. package/dist/extensions/builtin/grub/README.md +112 -112
  145. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  146. package/dist/extensions/builtin/link-world/index.js +6 -6
  147. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  148. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  149. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  150. package/dist/extensions/builtin/link-world/{network-routing.md → network-routing/network-routing.md} +67 -67
  151. package/dist/extensions/builtin/loop/README.md +92 -92
  152. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  153. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  154. package/dist/extensions/builtin/plan/index.js +1 -1
  155. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  156. package/dist/extensions/builtin/sal/README.md +72 -72
  157. package/dist/extensions/builtin/security-audit/README.md +289 -289
  158. package/dist/extensions/builtin/task/task-store.d.ts +4 -0
  159. package/dist/extensions/builtin/task/task-store.js +1 -1
  160. package/dist/extensions/builtin/team/AGENT.md +112 -112
  161. package/dist/extensions/builtin/team/TESTING.md +299 -299
  162. package/dist/extensions/builtin/token-save/README.md +56 -56
  163. package/dist/extensions/optional/AGENT.md +10 -10
  164. package/dist/index.d.ts +5 -30
  165. package/dist/index.js +1 -1
  166. package/dist/models.d.ts +7 -0
  167. package/dist/models.js +1 -0
  168. package/dist/modes/interactive/components/footer.js +1 -1
  169. package/dist/modes/interactive/components/task-status-panel.d.ts +36 -0
  170. package/dist/modes/interactive/components/task-status-panel.js +1 -0
  171. package/dist/modes/interactive/controllers/stream-render-controller.d.ts +7 -0
  172. package/dist/modes/interactive/controllers/stream-render-controller.js +2 -2
  173. package/dist/modes/interactive/interactive-mode.js +40 -40
  174. package/dist/modes/interactive/state/interactive-state.d.ts +2 -0
  175. package/dist/modes/interactive/state/interactive-state.js +1 -1
  176. package/dist/modes/interactive/theme/dark.json +85 -85
  177. package/dist/modes/interactive/theme/light.json +84 -84
  178. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  179. package/dist/modes/interactive/theme/warm.json +81 -81
  180. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  181. package/dist/node_modules/@pencil-agent/ai/dist/models.generated.js +1 -1
  182. package/dist/node_modules/@pencil-agent/ai/dist/providers/anthropic.js +2 -2
  183. package/dist/node_modules/@pencil-agent/ai/dist/providers/openai-completions.js +5 -5
  184. package/dist/node_modules/@pencil-agent/ai/dist/providers/openai-responses.js +1 -1
  185. package/dist/node_modules/@pencil-agent/ai/dist/stream.js +1 -1
  186. package/dist/packages/protocol/src/commands.d.ts +33 -0
  187. package/dist/packages/protocol/src/flags.d.ts +20 -0
  188. package/dist/packages/protocol/src/hooks.d.ts +17 -0
  189. package/dist/packages/protocol/src/hooks.js +0 -0
  190. package/dist/packages/{extension-sdk → protocol}/src/index.d.ts +7 -4
  191. package/dist/packages/protocol/src/index.js +1 -0
  192. package/dist/packages/{extension-sdk → protocol}/src/lifecycle.d.ts +15 -27
  193. package/dist/packages/protocol/src/lifecycle.js +0 -0
  194. package/dist/packages/{extension-sdk → protocol}/src/tools.d.ts +1 -1
  195. package/dist/packages/protocol/src/tools.js +0 -0
  196. package/dist/public-config.d.ts +12 -0
  197. package/dist/public-config.js +1 -0
  198. package/dist/runtime.d.ts +9 -0
  199. package/dist/runtime.js +1 -0
  200. package/dist/session-compaction.d.ts +7 -0
  201. package/dist/session-compaction.js +1 -0
  202. package/dist/session.d.ts +7 -0
  203. package/dist/session.js +1 -0
  204. package/dist/skills.d.ts +7 -0
  205. package/dist/skills.js +1 -0
  206. package/dist/tools.d.ts +7 -0
  207. package/dist/tools.js +1 -0
  208. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +851 -0
  209. package/docs/SDK-TESTING.md +364 -0
  210. package/docs/codex-goal-command-impl.md +1055 -1055
  211. package/docs/codex-goal-vs-grub.md +500 -500
  212. package/docs/custom-provider.md +27 -27
  213. package/docs/extensions.md +27 -27
  214. package/docs/keybindings.md +27 -27
  215. package/docs/loop 重构完成总结.md +250 -250
  216. package/docs/loop 重构完成报告.md +122 -122
  217. package/docs/loop 重构方案.md +1222 -1222
  218. package/docs/loop 重构方案实现报告.md +158 -158
  219. package/docs/loop 重构方案对比分析.md +128 -128
  220. package/docs/loop 重构计划.md +320 -320
  221. package/docs/loop-usage-examples.md +214 -214
  222. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +593 -0
  223. package/docs/models.md +27 -27
  224. package/docs/packages.md +27 -27
  225. package/docs/pi-design-philosophy.md +457 -457
  226. package/docs/planmode.md +1987 -1987
  227. package/docs/prompt-templates.md +27 -27
  228. package/docs/providers.md +27 -27
  229. package/docs/sdk.md +27 -27
  230. package/docs/skills.md +27 -27
  231. package/docs/startup-performance-optimization.md +301 -0
  232. package/docs/themes.md +27 -27
  233. package/docs/tui.md +27 -27
  234. package/docs/认知地图.md +47 -0
  235. package/package.json +190 -162
  236. package/dist/packages/extension-sdk/src/index.js +0 -1
  237. package/docs/cc-agent-design.md +0 -1297
  238. package/docs/cc-tui-design.md +0 -1333
  239. package/docs//345/257/271/346/240/207Claude-Code.md +0 -1775
  240. /package/dist/packages/{extension-sdk/src/lifecycle.js → protocol/src/commands.js} +0 -0
  241. /package/dist/packages/{extension-sdk/src/tools.js → protocol/src/flags.js} +0 -0
@@ -1,472 +1,472 @@
# Open Library: Book Data Extraction

`https://openlibrary.org` is the Internet Archive's free book catalog. All endpoints below are public JSON APIs. They need no auth, no browser, and no scraping.

## Do this first

**Every task is a direct HTTP call; never open the browser.**

```python
import json
from helpers import http_get

# Search by title
results = json.loads(http_get("https://openlibrary.org/search.json?q=dune&limit=5"))
# results['numFound'] == 49090   (live count; changes over time)
# results['docs']     == list of work objects
# results['start']    == 0       (offset for pagination)
```

The search API is the entry point for everything. It returns work-level records, with all editions of a title grouped together. For edition details, follow the doc's `key` to the Works API (including its `/editions.json` list), or pass an edition ID such as `cover_edition_key` to the Books API.
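
A minimal sketch of that hand-off follows. It assumes a search doc shaped like the examples in this guide: a `key` such as `/works/OL893415W` and an optional `cover_edition_key`. The helper name `doc_to_api_urls` is hypothetical. It only builds URLs and fetches nothing.

```python
def doc_to_api_urls(doc: dict) -> dict:
    """Map one search.json doc to follow-up Works/Books API URLs (hypothetical helper)."""
    base = "https://openlibrary.org"
    work_key = doc["key"]  # e.g. '/works/OL893415W'; the prefix is already included
    urls = {
        "work": f"{base}{work_key}.json",
        "editions": f"{base}{work_key}/editions.json",
    }
    edition_id = doc.get("cover_edition_key")  # e.g. 'OL7353617M'; not every doc has one
    if edition_id:
        urls["edition"] = f"{base}/books/{edition_id}.json"
    return urls
```

Pass each URL to `http_get` as needed. Checking for `edition` avoids a KeyError on docs without a cover edition.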

---

## Common workflows

### Search by query, author, title, or ISBN

```python
import json
from helpers import http_get

# Free-text search
r = json.loads(http_get("https://openlibrary.org/search.json?q=dune+frank+herbert&limit=5"))

# Author search
r = json.loads(http_get(
    "https://openlibrary.org/search.json?author=tolkien&limit=5"
    "&fields=title,author_name,first_publish_year,isbn"
))
# fields=* returns all available fields; the default returns ~15

# Title + author combined
r = json.loads(http_get(
    "https://openlibrary.org/search.json?title=dune&author=frank+herbert&limit=3"
    "&fields=title,author_name,edition_count,first_publish_year"
))
# r['docs'][0]['title']              == 'Dune'
# r['docs'][0]['author_name']        == ['Frank Herbert']
# r['docs'][0]['first_publish_year'] == 1965
# r['docs'][0]['edition_count']      == 120

# ISBN lookup (0, 1, or occasionally several docs, which may be duplicate records of the same work)
r = json.loads(http_get("https://openlibrary.org/search.json?isbn=9780743273565"))
# r['numFound']         == 2
# r['docs'][0]['title'] == 'The Great Gatsby'
# r['docs'][0]['key']   == '/works/OL468431W'
```

**Sort options** (`&sort=`): `new` (most recent first-publication year first), `old` (oldest first), `random`, `editions` (most editions), `scans` (most scans). The default is relevance.

**Language filter**: `&language=fre` (ISO 639-2/B codes: `eng`, `fre`, `ger`, `spa`, `ita`, etc.)

**Pagination**: `&limit=N&offset=N`. The maximum `limit` is not strictly enforced, but keep it under 100 for reliability.
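
These parameters can be combined into a paged loop. The sketch below assumes `fetch` is any callable that takes a URL and returns the response body as a string, such as `helpers.http_get`. The names `build_search_url` and `iter_search` are hypothetical. The loop stops on an empty page, once `numFound` is reached, or at `max_docs`.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode

SEARCH_URL = "https://openlibrary.org/search.json"

def build_search_url(limit: int = 50, offset: int = 0, **params: str) -> str:
    """Build a search.json URL; params are e.g. q=, author=, title=, sort=, language=, fields=."""
    query = {k: v for k, v in params.items() if v is not None}
    query.update(limit=limit, offset=offset)
    return f"{SEARCH_URL}?{urlencode(query)}"

def iter_search(fetch: Callable[[str], str], max_docs: int = 200,
                page_size: int = 50, **params: str) -> Iterator[dict]:
    """Yield search docs page by page using limit/offset (hypothetical helper)."""
    offset = 0
    while offset < max_docs:
        page = json.loads(fetch(build_search_url(limit=page_size, offset=offset, **params)))
        docs = page.get("docs", [])
        if not docs:
            break
        yield from docs[: max_docs - offset]  # never exceed max_docs in total
        offset += len(docs)
        if offset >= page.get("numFound", 0):
            break
```

Usage: `docs = list(iter_search(http_get, q="dune", sort="editions", max_docs=100))`. Keeping `page_size` modest follows the "under 100" advice above.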

#### Search doc fields (default response, ~15 keys; optional ones such as `cover_i` or `ia` can be missing from a doc)

| Field | Type | Notes |
|---|---|---|
| `key` | str | `/works/OL893415W`; use with the Works API |
| `title` | str | Work title |
| `author_name` | list[str] | e.g. `['Frank Herbert']` |
| `author_key` | list[str] | e.g. `['OL79034A']` |
| `first_publish_year` | int | |
| `edition_count` | int | Number of editions across all languages |
| `cover_i` | int | Cover image ID; use with the Covers API |
| `cover_edition_key` | str | e.g. `OL7353617M` |
| `language` | list[str] | ISO codes of all editions |
| `ia` | list[str] | Internet Archive identifiers (when `has_fulltext` is true) |
| `ebook_access` | str | e.g. `'public'`, `'borrowable'`, `'printdisabled'`, `'no_ebook'` |
| `has_fulltext` | bool | |
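
Optional keys can be absent from any doc, so read them with `.get()` and a default. The sketch below is minimal. The helper name `summarize_doc` and its output keys are hypothetical. The cover URL uses the `b/id/{cover_id}-{size}.jpg` pattern from the Cover images section.

```python
def summarize_doc(doc: dict) -> dict:
    """Flatten one search doc, tolerating missing optional fields (hypothetical helper)."""
    cover_id = doc.get("cover_i")
    return {
        "work_id": doc["key"].rsplit("/", 1)[-1],  # '/works/OL893415W' -> 'OL893415W'
        "title": doc.get("title", ""),
        "authors": ", ".join(doc.get("author_name", [])),
        "year": doc.get("first_publish_year"),  # None when unknown
        "editions": doc.get("edition_count", 0),
        "cover_url": (f"https://covers.openlibrary.org/b/id/{cover_id}-M.jpg"
                      if cover_id else None),
        "public_ebook": doc.get("ebook_access") == "public",
    }
```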
81
-
82
- #### Extra fields with `&fields=*`
83
-
84
- ```python
85
- # With fields=* you also get:
86
- # 'isbn' list[str] All ISBNs across editions
87
- # 'publisher' list[str] All publishers ever
88
- # 'publish_date' list[str] All publish dates (strings, inconsistent formats)
89
- # 'publish_year' list[int] Parsed years
90
- # 'subject' list[str] Subject headings
91
- # 'person' list[str] Subject persons (e.g. 'Big Brother')
92
- # 'place' list[str] Subject places
93
- # 'time' list[str] Subject times
94
- # 'number_of_pages_median' int Median page count across editions
95
- # 'ratings_average' float e.g. 4.29
96
- # 'ratings_count' int
97
- # 'want_to_read_count' int
98
- # 'already_read_count' int
99
- # 'currently_reading_count'int
100
- # 'readinglog_count' int Total of all reading log entries
101
- # 'first_sentence' list[str]
102
- # 'id_goodreads' list[str]
103
- # 'id_librarything' list[str]
104
- # 'id_wikidata' list[str]
105
- # 'ddc' list[str] Dewey Decimal Classification
106
- # 'lcc' list[str] Library of Congress Classification
107
- # 'lccn' list[str]
108
- ```
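
`publish_date` values are free-form strings. Examples in this guide include `'October 1, 1988'`, `'19/08/2017'`, and `'2021'`. A common approach is to extract only the four-digit year. The sketch below assumes a year between 1000 and 2999 appears somewhere in the string. `extract_year` and `earliest_year` are hypothetical helpers. Full dates would need per-format parsing.

```python
import re
from typing import Iterable, Optional

_YEAR = re.compile(r"\b(1\d{3}|2\d{3})\b")  # a plausible 4-digit year, 1000-2999

def extract_year(publish_date: Optional[str]) -> Optional[int]:
    """Return the first plausible year in a free-form date string (hypothetical helper)."""
    if not publish_date:
        return None
    match = _YEAR.search(publish_date)
    return int(match.group(1)) if match else None

def earliest_year(publish_dates: Iterable[str]) -> Optional[int]:
    """Earliest parseable year from a search doc's 'publish_date' list."""
    years = [y for y in map(extract_year, publish_dates) if y is not None]
    return min(years, default=None)
```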

---

### Bulk ISBN lookups (parallel)

```python
import json
from concurrent.futures import ThreadPoolExecutor
from helpers import http_get

isbns = ['9780743273565', '9780451524935', '9780618346257']

def lookup_isbn(isbn):
    url = f"https://openlibrary.org/search.json?isbn={isbn}&fields=title,author_name,first_publish_year,key"
    r = json.loads(http_get(url))
    if r['docs']:
        d = r['docs'][0]
        return {'isbn': isbn, 'found': True, 'title': d.get('title'),
                # `or [None]` also guards against an empty author_name list
                'author': (d.get('author_name') or [None])[0],
                'year': d.get('first_publish_year'), 'key': d.get('key')}
    return {'isbn': isbn, 'found': False}

# A small pool keeps the request rate polite
with ThreadPoolExecutor(max_workers=5) as ex:
    books = list(ex.map(lookup_isbn, isbns))

# [{'isbn': '9780743273565', 'found': True, 'title': 'The Great Gatsby', 'author': 'F. Scott Fitzgerald', 'year': 1920, ...},
#  {'isbn': '9780451524935', 'found': True, 'title': 'Nineteen Eighty-Four', 'author': 'George Orwell', 'year': 1949, ...},
#  {'isbn': '9780618346257', 'found': True, 'title': 'The Fellowship of the Ring', 'author': 'J.R.R. Tolkien', 'year': 1954, ...}]
```
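
Normalizing the input list before a bulk run avoids fetching the same book twice. The normalization strips hyphens and spaces and converts ISBN-10s to ISBN-13: prefix `978`, drop the old check digit, and recompute a new one with alternating weights 1 and 3. It then drops duplicates. The sketch below is minimal. `to_isbn13` and `dedupe_isbns` are hypothetical helpers, and they do not validate the input's own check digit.

```python
import re
from typing import Optional

def to_isbn13(isbn: str) -> Optional[str]:
    """Normalize an ISBN-10 or ISBN-13 string to 13 digits (hypothetical helper)."""
    digits = re.sub(r"[\s-]", "", isbn).upper()
    if re.fullmatch(r"[0-9]{13}", digits):
        return digits
    if re.fullmatch(r"[0-9]{9}[0-9X]", digits):
        core = "978" + digits[:9]  # drop the ISBN-10 check digit
        # ISBN-13 check digit: weights alternate 1, 3 over the first 12 digits
        total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    return None  # not a recognizable ISBN

def dedupe_isbns(raw: list[str]) -> list[str]:
    """Normalize, drop unrecognized entries, and keep first occurrences in order."""
    return list(dict.fromkeys(n for n in map(to_isbn13, raw) if n))
```

For example, the `isbn_10` `'0140328726'` shown in the Books API section normalizes to its listed `isbn_13`, `'9780140328721'`. Feed `dedupe_isbns(isbns)` to `ex.map` in place of the raw list.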
137
-
138
- ---
139
-
140
- ### Works API (editions grouped by title)
141
-
142
- Returns all metadata for a work (all editions combined). Get the work ID from `key` in search results.
143
-
144
- ```python
145
- import json
146
- from helpers import http_get
147
-
148
- work_id = 'OL893415W' # from search doc['key'] = '/works/OL893415W'
149
- work = json.loads(http_get(f"https://openlibrary.org/works/{work_id}.json"))
150
-
151
- # work['title'] == 'Dune'
152
- # work['key'] == '/works/OL893415W'
153
- # work['covers'] == [11481354, 12375564, 11157826] ← cover IDs for covers API
154
- # work['subjects'] == ['Dune (Imaginary place)', 'Fiction', ...]
155
- # work['subject_places'] == [...] ← geographic subjects (may be absent)
156
- # work['subject_people'] == [...] ← person subjects (may be absent)
157
- # work['subject_times'] == [...] ← time subjects (may be absent)
158
- # work['authors'] == [{'author': {'key': '/authors/OL79034A'}, 'type': {...}}]
159
- # work['description'] → either str OR {'type': '/type/text', 'value': str} ← see gotchas
160
- # work['created'] == {'type': '/type/datetime', 'value': '2009-10-15T11:34:21.437031'}
161
- # work['last_modified'] same shape as created
162
- ```
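
The nested `authors` and `created` values usually need unwrapping before use. The sketch below is minimal and assumes the shapes shown above. `work_author_ids` and `parse_ol_datetime` are hypothetical helpers. The standard-library `datetime.fromisoformat` handles the `'2009-10-15T11:34:21.437031'` form.

```python
from datetime import datetime
from typing import Optional

AUTHOR_PREFIX = "/authors/"

def work_author_ids(work: dict) -> list[str]:
    """[{'author': {'key': '/authors/OL79034A'}, ...}] -> ['OL79034A', ...]."""
    ids = []
    for entry in work.get("authors", []):
        key = entry.get("author", {}).get("key", "")
        if key.startswith(AUTHOR_PREFIX):  # skip malformed entries
            ids.append(key[len(AUTHOR_PREFIX):])
    return ids

def parse_ol_datetime(value: Optional[dict]) -> Optional[datetime]:
    """Unwrap {'type': '/type/datetime', 'value': '...'} into a datetime (None if absent)."""
    if not value or "value" not in value:
        return None
    return datetime.fromisoformat(value["value"])
```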

Helper for the `description` field (which has two possible shapes):

```python
def get_description(work: dict) -> str:
    desc = work.get('description', '')
    if isinstance(desc, dict):
        return desc.get('value', '')
    return desc or ''
```
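
The same str-or-dict shape also appears in other text fields covered below: an edition's `notes` and `first_sentence`, and an author's `bio`. A generalized version of the helper is sketched below. The name `get_text` is hypothetical.

```python
def get_text(record: dict, field: str) -> str:
    """Return a text field stored either as str or as {'type': '/type/text', 'value': str}."""
    value = record.get(field)
    if isinstance(value, dict):
        return value.get("value", "") or ""
    if isinstance(value, str):
        return value
    return ""  # missing, None, or an unexpected type
```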

#### Works editions (paginated list of all editions)

```python
# continues from the Works API example above (json, http_get, work_id)
editions_resp = json.loads(http_get(
    f"https://openlibrary.org/works/{work_id}/editions.json?limit=10&offset=0"
))
# editions_resp['size']    == 120   (total edition count)
# editions_resp['entries'] == [...]  (up to `limit` items)
# editions_resp['links']   == {'self': '...', 'work': '...', 'next': '...', 'prev': '...'}
#   ← 'prev'/'next' may be absent on the first/last page; follow links['next'] while offset + limit < size

e = editions_resp['entries'][0]
# e['title']           == 'Duna'
# e['publishers']      == ['Editora Aleph']
# e['publish_date']    == '19/08/2017'   ← string, inconsistent format
# e['isbn_13']         == ['9788576573135']
# e['isbn_10']         == ['857657313X']
# e['covers']          == [10368109]
# e['number_of_pages'] == 680
# e['languages']       == [{'key': '/languages/por'}]
# e['key']             == '/books/OL28969075M'
# e['physical_format'] == 'Paperback'   (often missing)
# e['notes']           → str or {'value': str}   (often missing)
```
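
To collect every edition instead of only the first page, step `offset` by the page length until `size` is reached. The sketch below assumes `fetch` is a URL-to-string callable such as `helpers.http_get`, and that `size` and `entries` behave as described above. `iter_editions` is a hypothetical helper.

```python
import json
from typing import Callable, Iterator

def iter_editions(fetch: Callable[[str], str], work_id: str,
                  page_size: int = 50) -> Iterator[dict]:
    """Yield all editions of a work by walking limit/offset pages (hypothetical helper)."""
    offset = 0
    while True:
        url = (f"https://openlibrary.org/works/{work_id}/editions.json"
               f"?limit={page_size}&offset={offset}")
        page = json.loads(fetch(url))
        entries = page.get("entries", [])
        yield from entries
        offset += len(entries)
        # Stop on an empty page or once the reported total has been reached
        if not entries or offset >= page.get("size", 0):
            break
```

Usage: `isbns = [i for e in iter_editions(http_get, "OL893415W") for i in e.get("isbn_13", [])]`.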

---

### Books API (specific edition)

Two options: the direct edition JSON for raw data, or `api/books` for enriched data.

#### Direct edition JSON

```python
import json
from helpers import http_get

edition_id = 'OL7353617M'  # from an editions-list e['key'] (strip '/books/') or cover_edition_key in search
edition = json.loads(http_get(f"https://openlibrary.org/books/{edition_id}.json"))

# edition['title']           == 'Fantastic Mr. Fox'
# edition['publishers']      == ['Puffin']
# edition['publish_date']    == 'October 1, 1988'
# edition['isbn_13']         == ['9780140328721']
# edition['isbn_10']         == ['0140328726']
# edition['number_of_pages'] == 96
# edition['covers']          == [...]   ← cover IDs
# edition['languages']       == [{'key': '/languages/eng'}]
# edition['works']           == [{'key': '/works/OL45804W'}]
# edition['authors']         == [{'key': '/authors/OL34184A'}]
# edition['identifiers']     == {'goodreads': [...], 'librarything': [...]}
# edition['first_sentence']  == {'value': '...'} or str   (often missing)
# edition['ocaid']           == 'fantast00dahl'   ← Internet Archive ID (if available)
```
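
The reference-style fields on an edition (`key`, `works`, `languages`, `ocaid`) can be flattened into plain IDs and links. The sketch below is minimal and assumes the shapes shown above. `edition_refs` is a hypothetical helper. The `https://archive.org/details/{ocaid}` link is the standard Internet Archive item URL; this API does not return it.

```python
def edition_refs(edition: dict) -> dict:
    """Flatten an edition's key references into plain IDs and URLs (hypothetical helper)."""
    def last_segment(ref: dict) -> str:
        return ref.get("key", "").rsplit("/", 1)[-1]  # '/languages/eng' -> 'eng'

    ocaid = edition.get("ocaid")
    return {
        "edition_id": last_segment(edition),
        "work_ids": [last_segment(w) for w in edition.get("works", [])],
        "languages": [last_segment(lang) for lang in edition.get("languages", [])],
        "archive_url": f"https://archive.org/details/{ocaid}" if ocaid else None,
    }
```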

#### Bibkeys API (enriched, multiple books at once)

```python
# continues from the previous example (json, http_get)

# jscmd=data: cleaned-up dict with cover URLs pre-built
r = json.loads(http_get(
    "https://openlibrary.org/api/books"
    "?bibkeys=ISBN:9780743273565,ISBN:9780451524935"
    "&format=json&jscmd=data"
))
# r == {'ISBN:9780743273565': {...}, 'ISBN:9780451524935': {...}}
#   ← bibkeys with no match are simply absent from r

book = r['ISBN:9780743273565']
# book['title']           == 'The Great Gatsby'
# book['authors']         == [{'url': '...', 'name': 'F. Scott Fitzgerald'}]
# book['publish_date']    == '2021'
# book['publishers']      == [{'name': 'Independently Published'}]
# book['number_of_pages'] == 208
# book['url']             == 'http://openlibrary.org/books/OL46773254M/The_Great_Gatsby'
# book['key']             == '/books/OL46773254M'
# book['cover']           == {'small': '...S.jpg', 'medium': '...M.jpg', 'large': '...L.jpg'}
# book['identifiers']     == {'isbn_13': [...], 'openlibrary': [...]}
# book['subjects']        == [{'name': 'Modern fiction', 'url': '...'}, ...]
# book.get('subject_places') is None   ← often missing even with jscmd=data

# jscmd=details: raw edition JSON + extra fields
r2 = json.loads(http_get(
    "https://openlibrary.org/api/books"
    "?bibkeys=ISBN:9780743273565&format=json&jscmd=details"
))
item = r2['ISBN:9780743273565']
# item['bib_key']       == 'ISBN:9780743273565'
# item['info_url']      == 'http://openlibrary.org/books/OL...'
# item['preview']       == 'noview' | 'restricted' | 'full'
# item['preview_url']   == URL to read on OL or IA
# item['thumbnail_url'] == 'https://covers.openlibrary.org/b/id/14314120-S.jpg'
# item['details']       → raw edition JSON (same as /books/OL...M.json)

# Supported bibkey prefixes: ISBN:, OCLC:, LCCN:, OLID: (e.g. OLID:OL46773254M)
```
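
For long ISBN lists, the bibkeys can be batched into a few requests. Any ISBN missing from a response can then be treated as not found. The sketch below is minimal. The batch size of 50 is an arbitrary assumption, not a documented limit. `bibkeys_urls` and `missing_isbns` are hypothetical helpers.

```python
from urllib.parse import urlencode

def bibkeys_urls(isbns: list[str], batch_size: int = 50, jscmd: str = "data") -> list[str]:
    """Split ISBNs into batched api/books URLs (hypothetical helper)."""
    urls = []
    for start in range(0, len(isbns), batch_size):
        batch = isbns[start:start + batch_size]
        query = urlencode({
            "bibkeys": ",".join(f"ISBN:{i}" for i in batch),
            "format": "json",
            "jscmd": jscmd,
        })
        urls.append(f"https://openlibrary.org/api/books?{query}")
    return urls

def missing_isbns(isbns: list[str], response: dict) -> list[str]:
    """ISBNs whose 'ISBN:...' key is absent from an api/books response."""
    return [i for i in isbns if f"ISBN:{i}" not in response]
```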

---

### Authors API

```python
import json
from helpers import http_get

# Lookup by known author key
author = json.loads(http_get("https://openlibrary.org/authors/OL26320A.json"))
# OL26320A is J.R.R. Tolkien (Frank Herbert, from the search examples, is OL79034A)

# author['name']            == 'J.R.R. Tolkien'
# author['fuller_name']     == 'John Ronald Reuel Tolkien'
# author['personal_name']   == 'J. R. R. Tolkien'
# author['birth_date']      == '3 January 1892'   ← string, not parsed
# author['death_date']      == '2 September 1973'
# author['bio']             → str or {'type': '/type/text', 'value': str}
# author['photos']          == [6155606, 6433524, ...]   ← photo IDs for the Covers API
# author['links']           == [{'title': '...', 'url': '...', 'type': {...}}, ...]
# author['remote_ids']      == {'wikidata': 'Q892', 'viaf': '95218067', ...}
# author['alternate_names'] == ['J. R. R. Tolkien', 'TOLKIEN', ...]
# author['key']             == '/authors/OL26320A'
# author['wikipedia']       → URL string or None

# Author works (paginated; follow links['next'])
works = json.loads(http_get("https://openlibrary.org/authors/OL26320A/works.json?limit=5"))
# works['size']    == 415
# works['entries'] == [{title, key, covers, authors, created, ...}, ...]
# works['links']   == {'self': '...', 'next': '...'}
```
300
-
301
- #### Author search
302
-
303
- ```python
304
- r = json.loads(http_get("https://openlibrary.org/search/authors.json?q=tolkien"))
305
- # r['numFound'] == 40
306
- # r['docs'][0]:
307
- # 'name' == 'Christopher Tolkien'
308
- # 'key' == 'OL2623360A' ← NOTE: no /authors/ prefix here
309
- # 'birth_date' == '21 November 1924'
310
- # 'death_date' == '16 January 2020'
311
- # 'top_work' == 'The War of the Ring'
312
- # 'work_count' == 43
313
- # 'top_subjects' == [...]
314
- # 'alternate_names' == [...]
315
- # 'ratings_average' float
316
- # 'want_to_read_count' int
317
- # 'already_read_count' int
318
- # 'currently_reading_count'int
319
- ```
320
-
321
- ---
322
-
323
- ### Cover images
324
-
325
- Covers are served directly as JPEG — redirect to Internet Archive CDN. No auth needed.
326
-
327
- ```
328
- # Book covers — three key types:
329
- https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg # by cover ID (most reliable)
330
- https://covers.openlibrary.org/b/isbn/{isbn}-{size}.jpg # by ISBN
331
- https://covers.openlibrary.org/b/olid/{edition_id}-{size}.jpg # by edition OLID (unreliable — see gotchas)
332
-
333
- # Author photos:
334
- https://covers.openlibrary.org/a/id/{photo_id}-{size}.jpg
335
-
336
- # Sizes: S (small), M (medium), L (large)
337
- ```
338
-
339
- ```python
340
- import urllib.request
341
-
342
- def get_cover_bytes(cover_id: int, size: str = 'M') -> bytes | None:
343
- """Fetch cover image bytes. Returns None if no cover (43-byte GIF placeholder)."""
344
- url = f"https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg"
345
- req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
346
- with urllib.request.urlopen(req, timeout=15) as resp:
347
- data = resp.read()
348
- return None if len(data) == 43 else data # 43-byte GIF = no cover placeholder
349
-
350
- # Or just get the URL for embedding:
351
- def cover_url(cover_id: int, size: str = 'M') -> str:
352
- return f"https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg"
353
-
354
- # Usage:
355
- from helpers import http_get
356
- import json
357
- work = json.loads(http_get("https://openlibrary.org/works/OL893415W.json"))
358
- if work.get('covers'):
359
- img = get_cover_bytes(work['covers'][0], 'L') # first cover, large
360
- # img is ~20–80KB JPEG bytes, redirected from ia*.archive.org
361
- ```
362
-
363
- To get cover by ISBN directly (e.g. for UI without a full book lookup):
364
- ```python
365
- # Medium-size cover by ISBN:
366
- url = f"https://covers.openlibrary.org/b/isbn/{isbn}-M.jpg"
367
- # Redirects to Internet Archive CDN, content-type: image/jpeg
368
- # Use ?default=false to get 404 instead of 1×1 GIF placeholder for missing covers
369
- url_safe = f"https://covers.openlibrary.org/b/isbn/{isbn}-M.jpg?default=false"
370
- ```
371
-
372
- ---
373
-
374
- ### Subjects API
375
-
376
- ```python
377
- import json
378
- from helpers import http_get
379
-
380
- # Subject slugs: lowercase, underscores for spaces
381
- r = json.loads(http_get("https://openlibrary.org/subjects/science_fiction.json?limit=5"))
382
- # r['name'] == 'science fiction'
383
- # r['subject_type'] == 'subject' ← also: 'person', 'place', 'time'
384
- # r['work_count'] == 20973
385
- # r['works'] == [{title, key, cover_id, authors, edition_count, ...}, ...]
386
-
387
- w = r['works'][0]
388
- # w['title'] == 'Alice\'s Adventures in Wonderland'
389
- # w['key'] == '/works/OL138052W'
390
- # w['cover_id'] == 10527843
391
- # w['cover_edition_key']== 'OL...'
392
- # w['authors'] == [{'key': '/authors/OL22098A', 'name': 'Lewis Carroll'}]
393
- # w['edition_count'] == 3546
394
- # w['first_publish_year']== ...
395
- # w['has_fulltext'] == True | False
396
- # w['ia'] == 'identifier' (Internet Archive ID when available)
397
-
398
- # Pagination: &offset=N
399
- # Place subject:
400
- r2 = json.loads(http_get("https://openlibrary.org/subjects/place:london.json?limit=5"))
401
- # r2['subject_type'] == 'place', r2['work_count'] == 23927
402
-
403
- # Person subject:
404
- # https://openlibrary.org/subjects/person:napoleon.json?limit=5
405
- # Time subject:
406
- # https://openlibrary.org/subjects/time:middle_ages.json?limit=5
407
-
408
- # Combine with ebooks=true to filter to only freely readable books:
409
- r3 = json.loads(http_get("https://openlibrary.org/subjects/science_fiction.json?limit=5&ebooks=true"))
410
- # r3['works'][i]['has_fulltext'] == True for all results
411
- ```
412
-
413
- ---
414
-
415
- ### Trending books
416
-
417
- ```python
418
- import json
419
- from helpers import http_get
420
-
421
- for period in ['daily', 'weekly', 'monthly']:
422
- r = json.loads(http_get(f"https://openlibrary.org/trending/{period}.json?limit=10"))
423
- # r['works'] == list of search-doc-style objects
424
- # r['days'] == int (time window)
425
- # r['hours'] == int
426
- # Same fields as search docs (title, author_name, cover_i, key, ...)
427
- print(period, r['works'][0]['title']) # e.g. 'Atomic Habits'
428
- ```
429
-
430
- ---
431
-
432
- ## Rate limits
433
-
434
- No authentication required. No API key. No explicit rate limit published.
435
-
436
- Observed in testing: 5 requests completed in ~1 second with no throttling, no 429s. The API is served from CDN/Solr — in practice you can make 10–20 parallel requests without issue. For bulk operations (hundreds of ISBNs), use `ThreadPoolExecutor(max_workers=5)` to be a good citizen.
437
-
438
- **No `User-Agent` override needed** — the default `Mozilla/5.0` from `http_get` is accepted by all Open Library endpoints (unlike Nominatim which blocks it).
439
-
440
- ---
441
-
442
- ## Gotchas
443
-
444
- **`description` field has two shapes.** Both are real — check at runtime:
445
- ```python
446
- desc = work.get('description', '')
447
- text = desc.get('value', '') if isinstance(desc, dict) else (desc or '')
448
- ```
449
-
450
- **`/works/OL45804W` is Fantastic Mr. Fox, not Dune.** The OL IDs in the original prompt were placeholders. Always resolve real IDs via the search API rather than hardcoding them.
451
-
452
- **Author search `key` has no prefix.** `/search/authors.json` returns `key: 'OL26320A'`, but the Authors API and all other APIs use `/authors/OL26320A`. Add the prefix manually when constructing follow-up URLs.
453
-
454
- **Missing cover → 43-byte GIF placeholder, not 404.** Without `?default=false`, the covers API returns a 1×1 transparent GIF instead of HTTP 404 for unknown IDs. Check `len(data) == 43` to detect missing covers.
455
-
456
- **`covers.openlibrary.org/b/olid/{work_id}` is unreliable.** OLID-based cover URLs for work IDs (OL...W) return the placeholder even when covers exist. Always use `b/id/{cover_id}` (from `work['covers'][0]`) or `b/isbn/{isbn}` instead.
457
-
458
- **Bibkeys API picks one edition per ISBN.** When the same ISBN appears on multiple editions (reprint, reissue), `api/books?bibkeys=ISBN:...` returns one — and it may not be the most common edition.
459
-
460
- **`publish_date` is a raw string.** Values like `'October 1, 1988'`, `'19/08/2017'`, `'2021'`, and `'1965-01-01'` all appear. Don't parse without normalization.
461
-
462
- **`/works/.../editions.json` pagination uses `links.next`.** Unlike search (which uses `offset=`), check `links['next']` in the response to know if more pages exist:
463
- ```python
464
- resp = json.loads(http_get("https://openlibrary.org/works/OL893415W/editions.json?limit=50"))
465
- while 'next' in resp.get('links', {}):
466
- resp = json.loads(http_get("https://openlibrary.org" + resp['links']['next']))
467
- # process resp['entries']
468
- ```
469
-
470
- **404 for non-existent IDs.** `/works/OL99999999W.json`, `/books/OL99999999M.json`, and `/authors/OL99999999A.json` all raise `HTTPError: HTTP Error 404: Not Found`. Wrap in try/except.
471
-
472
- **Search `docs` default fields are minimal.** The default response includes ~15 fields. Add `&fields=*` to get all 100+ Solr fields (ratings, ISBNs, publishers, subjects, Goodreads IDs, etc.). Alternatively specify exactly what you need: `&fields=title,isbn,ratings_average`.
1
+ # Open Library — Book Data Extraction
2
+
3
+ `https://openlibrary.org` — Internet Archive's free book catalog. All endpoints are public JSON APIs — no auth, no browser, no scraping required.
4
+
5
+ ## Do this first
6
+
7
+ **Every task is a direct HTTP call — never open the browser.**
8
+
9
+ ```python
10
+ import json
11
+ from helpers import http_get
12
+
13
+ # Search by title
14
+ results = json.loads(http_get("https://openlibrary.org/search.json?q=dune&limit=5"))
15
+ # results['numFound'] == 49090
16
+ # results['docs'] == list of work objects
17
+ # results['start'] == 0 (offset for pagination)
18
+ ```
19
+
20
+ The search API is your entry point for everything. It returns work-level records (all editions grouped). For more detail, follow `key` to the Works API, or follow `cover_edition_key` to the Books API for a specific edition.
21
+
22
+ ---
23
+
24
+ ## Common workflows
25
+
26
+ ### Search by query, author, title, or ISBN
27
+
28
+ ```python
29
+ import json
30
+ from helpers import http_get
31
+
32
+ # Free-text search
33
+ r = json.loads(http_get("https://openlibrary.org/search.json?q=dune+frank+herbert&limit=5"))
34
+
35
+ # Author search
36
+ r = json.loads(http_get(
37
+ "https://openlibrary.org/search.json?author=tolkien&limit=5"
38
+ "&fields=title,author_name,first_publish_year,isbn"
39
+ ))
40
+ # fields=* returns all available fields; default returns ~15
41
+
42
+ # Title + author combined
43
+ r = json.loads(http_get(
44
+ "https://openlibrary.org/search.json?title=dune&author=frank+herbert&limit=3"
45
+ "&fields=title,author_name,edition_count,first_publish_year"
46
+ ))
47
+ # r['docs'][0]['title'] == 'Dune'
48
+ # r['docs'][0]['author_name'] == ['Frank Herbert']
49
+ # r['docs'][0]['first_publish_year'] == 1965
50
+ # r['docs'][0]['edition_count'] == 120
51
+
52
+ # ISBN lookup (returns 0–2 results for the same work)
53
+ r = json.loads(http_get("https://openlibrary.org/search.json?isbn=9780743273565"))
54
+ # r['numFound'] == 2
55
+ # r['docs'][0]['title'] == 'The Great Gatsby'
56
+ # r['docs'][0]['key'] == '/works/OL468431W'
57
+ ```
58
+
59
+ **Sort options** (`&sort=`): `new` (recently added), `old`, `random`, `editions` (most editions), `scans` (most scans). Default is relevance.
60
+
61
+ **Language filter**: `&language=fre` (ISO 639-2/B codes: `eng`, `fre`, `ger`, `spa`, `ita`, etc.)
62
+
63
+ **Pagination**: `&limit=N&offset=N`. Max limit not enforced but keep under 100 for reliability.
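Here is a minimal sketch of offset-based paging over `search.json`. It assumes `fetch` is any function that takes a URL and returns the response body as text, such as `http_get` from `helpers` as used above. `search_url` and `iter_search_docs` are hypothetical helper names. The `max_docs` cap is an arbitrary safety limit, not an API rule.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode

SEARCH_URL = "https://openlibrary.org/search.json"


def search_url(limit: int = 20, offset: int = 0, fields=None, **params) -> str:
    """Build a search.json URL. Extra params (q, title, author, isbn, sort, language) pass through."""
    query = dict(params, limit=limit, offset=offset)
    if fields:
        # Accept either "title,key" or ["title", "key"]
        query["fields"] = fields if isinstance(fields, str) else ",".join(fields)
    return f"{SEARCH_URL}?{urlencode(query)}"


def iter_search_docs(fetch: Callable[[str], str], page_size: int = 50,
                     max_docs: int = 500, **params) -> Iterator[dict]:
    """Yield search docs page by page via limit/offset until numFound or max_docs is reached."""
    offset = 0
    while offset < max_docs:
        page = json.loads(fetch(search_url(limit=page_size, offset=offset, **params)))
        docs = page.get("docs", [])
        yield from docs[: max_docs - offset]
        offset += len(docs)
        # Stop on an empty page as well as on numFound, in case the two ever disagree
        if not docs or offset >= page.get("numFound", 0):
            break
```

For example, `for doc in iter_search_docs(http_get, author="tolkien", fields=["title", "key"]): ...` pages through the results 50 at a time.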
64
+
65
+ #### Search doc fields (default response, ~15 keys; some, like `ia`, appear only when applicable)
66
+
67
+ | Field | Type | Notes |
68
+ |---|---|---|
69
+ | `key` | str | `/works/OL893415W` — use for Works API |
70
+ | `title` | str | Work title |
71
+ | `author_name` | list[str] | e.g. `['Frank Herbert']` |
72
+ | `author_key` | list[str] | e.g. `['OL79034A']` |
73
+ | `first_publish_year` | int | |
74
+ | `edition_count` | int | Number of editions across all languages |
75
+ | `cover_i` | int | Cover image ID — use with covers API |
76
+ | `cover_edition_key` | str | e.g. `OL7353617M` |
77
+ | `language` | list[str] | ISO codes of all editions |
78
+ | `ia` | list[str] | Internet Archive identifiers (when has_fulltext=true) |
79
+ | `ebook_access` | str | `'public'`, `'borrowable'`, `'no_ebook'` |
80
+ | `has_fulltext` | bool | |
81
+
82
+ #### Extra fields with `&fields=*`
83
+
84
+ ```python
85
+ # With fields=* you also get:
86
+ # 'isbn' list[str] All ISBNs across editions
87
+ # 'publisher' list[str] All publishers ever
88
+ # 'publish_date' list[str] All publish dates (strings, inconsistent formats)
89
+ # 'publish_year' list[int] Parsed years
90
+ # 'subject' list[str] Subject headings
91
+ # 'person' list[str] Subject persons (e.g. 'Big Brother')
92
+ # 'place' list[str] Subject places
93
+ # 'time' list[str] Subject times
94
+ # 'number_of_pages_median' int Median page count across editions
95
+ # 'ratings_average' float e.g. 4.29
96
+ # 'ratings_count' int
97
+ # 'want_to_read_count' int
98
+ # 'already_read_count' int
99
+ # 'currently_reading_count'int
100
+ # 'readinglog_count' int Total of all reading log entries
101
+ # 'first_sentence' list[str]
102
+ # 'id_goodreads' list[str]
103
+ # 'id_librarything' list[str]
104
+ # 'id_wikidata' list[str]
105
+ # 'ddc' list[str] Dewey Decimal Classification
106
+ # 'lcc' list[str] Library of Congress Classification
107
+ # 'lccn' list[str]
108
+ ```
109
+
110
+ ---
111
+
112
+ ### Bulk ISBN lookups (parallel)
113
+
114
+ ```python
115
+ import json
116
+ from helpers import http_get
117
+ from concurrent.futures import ThreadPoolExecutor
118
+
119
+ isbns = ['9780743273565', '9780451524935', '9780618346257']
120
+
121
+ def lookup_isbn(isbn):
122
+     url = f"https://openlibrary.org/search.json?isbn={isbn}&fields=title,author_name,first_publish_year,key"
123
+     r = json.loads(http_get(url))
124
+     if r['docs']:
125
+         d = r['docs'][0]
126
+         return {'isbn': isbn, 'title': d.get('title'), 'author': d.get('author_name', [None])[0],
127
+                 'year': d.get('first_publish_year'), 'key': d.get('key')}
128
+     return {'isbn': isbn, 'found': False}
129
+
130
+ with ThreadPoolExecutor(max_workers=5) as ex:
131
+     books = list(ex.map(lookup_isbn, isbns))
132
+
133
+ # [{'isbn': '9780743273565', 'title': 'The Great Gatsby', 'author': 'F. Scott Fitzgerald', 'year': 1920, ...},
134
+ # {'isbn': '9780451524935', 'title': 'Nineteen Eighty-Four', 'author': 'George Orwell', 'year': 1949, ...},
135
+ # {'isbn': '9780618346257', 'title': 'The Fellowship of the Ring', 'author': 'J.R.R. Tolkien', 'year': 1954, ...}]
136
+ ```
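In the lookup above, a single network error propagates out of `list(ex.map(...))` and aborts the whole batch. The hedged variant below assumes the same injected `fetch` callable (pass `http_get`). It catches errors per ISBN, strips hyphens and spaces from the input, and returns records in input order. `lookup_isbn_safe` and `lookup_isbns` are hypothetical names.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable
from urllib.error import HTTPError, URLError

FIELDS = "title,author_name,first_publish_year,key"


def lookup_isbn_safe(isbn: str, fetch: Callable[[str], str]) -> dict:
    """Return one flat record per ISBN and never raise, so one failure can't sink the batch."""
    clean = isbn.replace("-", "").replace(" ", "")
    url = f"https://openlibrary.org/search.json?isbn={clean}&fields={FIELDS}"
    try:
        docs = json.loads(fetch(url)).get("docs", [])
    except (HTTPError, URLError, TimeoutError, ValueError) as exc:  # ValueError covers bad JSON
        return {"isbn": isbn, "found": False, "error": str(exc)}
    if not docs:
        return {"isbn": isbn, "found": False}
    d = docs[0]
    authors = d.get("author_name") or [None]  # guards against a missing or empty list
    return {"isbn": isbn, "found": True, "title": d.get("title"), "author": authors[0],
            "year": d.get("first_publish_year"), "key": d.get("key")}


def lookup_isbns(isbns: list, fetch: Callable[[str], str], workers: int = 5) -> list:
    # ex.map yields results in input order, so output[i] matches isbns[i]
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(lambda i: lookup_isbn_safe(i, fetch), isbns))
```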
137
+
138
+ ---
139
+
140
+ ### Works API (editions grouped by title)
141
+
142
+ Returns all metadata for a work (all editions combined). Get the work ID from `key` in search results.
143
+
144
+ ```python
145
+ import json
146
+ from helpers import http_get
147
+
148
+ work_id = 'OL893415W' # from search doc['key'] = '/works/OL893415W'
149
+ work = json.loads(http_get(f"https://openlibrary.org/works/{work_id}.json"))
150
+
151
+ # work['title'] == 'Dune'
152
+ # work['key'] == '/works/OL893415W'
153
+ # work['covers'] == [11481354, 12375564, 11157826] ← cover IDs for covers API
154
+ # work['subjects'] == ['Dune (Imaginary place)', 'Fiction', ...]
155
+ # work['subject_places'] == [...] ← geographic subjects (may be absent)
156
+ # work['subject_people'] == [...] ← person subjects (may be absent)
157
+ # work['subject_times'] == [...] ← time subjects (may be absent)
158
+ # work['authors'] == [{'author': {'key': '/authors/OL79034A'}, 'type': {...}}]
159
+ # work['description'] → either str OR {'type': '/type/text', 'value': str} ← see gotchas
160
+ # work['created'] == {'type': '/type/datetime', 'value': '2009-10-15T11:34:21.437031'}
161
+ # work['last_modified'] same shape as created
162
+ ```
163
+
164
+ Helper for the description field (which has two possible shapes):
165
+
166
+ ```python
167
+ def get_description(work: dict) -> str:
168
+     desc = work.get('description', '')
169
+     if isinstance(desc, dict):
170
+         return desc.get('value', '')
171
+     return desc or ''
172
+ ```
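The `authors`, `created` and `last_modified` fields shown above are also wrapped in small objects. The sketch below unwraps them. It assumes the shapes in the sample output: an `author` dict holding an `/authors/...` key, and a `/type/datetime` wrapper whose `value` is an ISO 8601 string. `author_ids` and `ol_datetime` are hypothetical names.

```python
from datetime import datetime
from typing import Optional


def author_ids(work: dict) -> list:
    """[{'author': {'key': '/authors/OL79034A'}, ...}] -> ['OL79034A']"""
    ids = []
    for entry in work.get("authors", []):
        key = (entry.get("author") or {}).get("key", "")
        if key:
            ids.append(key.rsplit("/", 1)[-1])
    return ids


def ol_datetime(record: dict, field: str = "created") -> Optional[datetime]:
    """Parse {'type': '/type/datetime', 'value': '2009-10-15T11:34:21.437031'}; None if absent."""
    value = (record.get(field) or {}).get("value")
    return datetime.fromisoformat(value) if value else None
```

`ol_datetime(work, "last_modified")` reads the other timestamp, which has the same shape.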
173
+
174
+ #### Works editions (paginated list of all editions)
175
+
176
+ ```python
177
+ editions_resp = json.loads(http_get(
178
+ f"https://openlibrary.org/works/{work_id}/editions.json?limit=10&offset=0"
179
+ ))
180
+ # editions_resp['size'] == 120 (total edition count)
181
+ # editions_resp['entries'] == [...] (up to limit items)
182
+ # editions_resp['links'] == {'self': '...', 'work': '...', 'next': '...', 'prev': '...'}
183
+ # ← use links['next'] for pagination when offset+limit < size
184
+
185
+ e = editions_resp['entries'][0]
186
+ # e['title'] == 'Duna'
187
+ # e['publishers'] == ['Editora Aleph']
188
+ # e['publish_date'] == '19/08/2017' ← inconsistent format, string
189
+ # e['isbn_13'] == ['9788576573135']
190
+ # e['isbn_10'] == ['857657313X']
191
+ # e['covers'] == [10368109]
192
+ # e['number_of_pages'] == 680
193
+ # e['languages'] == [{'key': '/languages/por'}]
194
+ # e['key'] == '/books/OL28969075M'
195
+ # e['physical_format'] == 'Paperback' (often missing)
196
+ # e['notes'] → str or {'value': str} (often missing)
197
+ ```
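Because `isbn_13`, `isbn_10`, `physical_format` and `notes` can each be missing, a flattening helper keeps downstream code simple. This sketch assumes the shapes shown above. `edition_summary` is a hypothetical name, and preferring ISBN-13 over ISBN-10 is a design choice, not an API rule.

```python
def text_value(field) -> str:
    """Open Library text fields arrive as a plain str or as {'value': str}."""
    if isinstance(field, dict):
        return field.get("value", "")
    return field or ""


def edition_summary(e: dict) -> dict:
    isbns = e.get("isbn_13") or e.get("isbn_10") or []  # prefer ISBN-13
    return {
        "key": e.get("key"),
        "title": e.get("title"),
        "isbn": isbns[0] if isbns else None,
        # [{'key': '/languages/por'}] -> ['por']
        "languages": [lang["key"].rsplit("/", 1)[-1]
                      for lang in e.get("languages", []) if lang.get("key")],
        "publishers": e.get("publishers", []),
        "publish_date": e.get("publish_date"),  # raw string, see gotchas
        "pages": e.get("number_of_pages"),
        "format": e.get("physical_format"),
        "notes": text_value(e.get("notes")),
    }
```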
198
+
199
+ ---
200
+
201
+ ### Books API (specific edition)
202
+
203
+ Two sub-APIs: direct JSON for raw data, or `api/books` for enriched data.
204
+
205
+ #### Direct edition JSON
206
+
207
+ ```python
208
+ import json
209
+ from helpers import http_get
210
+
211
+ edition_id = 'OL7353617M' # from editions list e['key'] or cover_edition_key in search
212
+ edition = json.loads(http_get(f"https://openlibrary.org/books/{edition_id}.json"))
213
+
214
+ # edition['title'] == 'Fantastic Mr. Fox'
215
+ # edition['publishers'] == ['Puffin']
216
+ # edition['publish_date'] == 'October 1, 1988'
217
+ # edition['isbn_13'] == ['9780140328721']
218
+ # edition['isbn_10'] == ['0140328726']
219
+ # edition['number_of_pages'] == 96
220
+ # edition['covers'] == [...] ← cover IDs
221
+ # edition['languages'] == [{'key': '/languages/eng'}]
222
+ # edition['works'] == [{'key': '/works/OL45804W'}]
223
+ # edition['authors'] == [{'key': '/authors/OL34184A'}]
224
+ # edition['identifiers'] == {'goodreads': [...], 'librarything': [...]}
225
+ # edition['first_sentence'] == {'value': '...'} or str (often missing)
226
+ # edition['ocaid'] == 'fantast00dahl' ← Internet Archive ID (if available)
227
+ ```
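Keys come back in two forms. Some are prefixed (`/works/OL45804W`, `/books/OL7353617M`); others are bare (`OL26320A` from author search). The IDs in this guide end in `W` for works, `M` for editions and `A` for authors, so a small helper can turn either form into the matching JSON URL. This is a sketch: `olid` and `json_url` are hypothetical names, and the suffix mapping is inferred from the examples here.

```python
KINDS = {"W": "works", "M": "books", "A": "authors"}


def olid(key: str) -> str:
    """'/works/OL45804W' -> 'OL45804W'; a bare 'OL26320A' passes through unchanged."""
    return key.rstrip("/").rsplit("/", 1)[-1]


def json_url(key: str) -> str:
    """JSON URL for a work (OL...W), edition (OL...M) or author (OL...A) key, prefixed or bare."""
    ident = olid(key)
    kind = KINDS.get(ident[-1:])
    if kind is None or not ident.startswith("OL"):
        raise ValueError(f"not an Open Library work/edition/author key: {key!r}")
    return f"https://openlibrary.org/{kind}/{ident}.json"
```

For example, `json.loads(http_get(json_url(edition['works'][0]['key'])))` climbs from an edition to its work.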
228
+
229
+ #### Bibkeys API (enriched, multiple books at once)
230
+
231
+ ```python
232
+ # jscmd=data: cleaned up dict with cover URLs pre-built
233
+ r = json.loads(http_get(
234
+ "https://openlibrary.org/api/books"
235
+ "?bibkeys=ISBN:9780743273565,ISBN:9780451524935"
236
+ "&format=json&jscmd=data"
237
+ ))
238
+ # r == {'ISBN:9780743273565': {...}, 'ISBN:9780451524935': {...}}
239
+
240
+ book = r['ISBN:9780743273565']
241
+ # book['title'] == 'The Great Gatsby'
242
+ # book['authors'] == [{'url': '...', 'name': 'F. Scott Fitzgerald'}]
243
+ # book['publish_date'] == '2021'
244
+ # book['publishers'] == [{'name': 'Independently Published'}]
245
+ # book['number_of_pages'] == 208
246
+ # book['url'] == 'http://openlibrary.org/books/OL46773254M/The_Great_Gatsby'
247
+ # book['key'] == '/books/OL46773254M'
248
+ # book['cover'] == {'small': '...S.jpg', 'medium': '...M.jpg', 'large': '...L.jpg'}
249
+ # book['identifiers'] == {'isbn_13': [...], 'openlibrary': [...]}
250
+ # book['subjects'] == [{'name': 'Modern fiction', 'url': '...'}, ...]
251
+ # book['subject_places'] == None ← often null even with jscmd=data
252
+
253
+ # jscmd=details: raw edition JSON + extra fields
254
+ r2 = json.loads(http_get(
255
+ "https://openlibrary.org/api/books"
256
+ "?bibkeys=ISBN:9780743273565&format=json&jscmd=details"
257
+ ))
258
+ item = r2['ISBN:9780743273565']
259
+ # item['bib_key'] == 'ISBN:9780743273565'
260
+ # item['info_url'] == 'http://openlibrary.org/books/OL...'
261
+ # item['preview'] == 'noview' | 'restricted' | 'full'
262
+ # item['preview_url'] == URL to read on OL or IA
263
+ # item['thumbnail_url']== 'https://covers.openlibrary.org/b/id/14314120-S.jpg'
264
+ # item['details'] → raw edition JSON (same as /books/OL...M.json)
265
+
266
+ # Supported bibkey prefixes: ISBN:, OCLC:, LCCN:, OLID: (e.g. OLID:OL46773254M)
267
+ ```
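When you have many ISBNs, one `api/books` call can cover several bibkeys. The sketch below builds the URL and batches requests, assuming the same injected `fetch` callable. The batch size of 50 is an arbitrary choice, since no maximum is documented here. ISBNs absent from the response are treated as not found. `bibkeys_url` and `fetch_books` are hypothetical names.

```python
import json
from typing import Callable
from urllib.parse import urlencode


def bibkeys_url(isbns: list, jscmd: str = "data") -> str:
    keys = ",".join(f"ISBN:{i}" for i in isbns)
    # safe=":," keeps "ISBN:..." and the comma separators readable in the URL
    return "https://openlibrary.org/api/books?" + urlencode(
        {"bibkeys": keys, "format": "json", "jscmd": jscmd}, safe=":,")


def fetch_books(isbns: list, fetch: Callable[[str], str], batch: int = 50) -> dict:
    """Map each ISBN to its jscmd=data record; unknown ISBNs are simply left out."""
    out = {}
    for start in range(0, len(isbns), batch):
        chunk = isbns[start:start + batch]
        resp = json.loads(fetch(bibkeys_url(chunk)))
        for isbn in chunk:
            if f"ISBN:{isbn}" in resp:
                out[isbn] = resp[f"ISBN:{isbn}"]
    return out
```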
268
+
269
+ ---
270
+
271
+ ### Authors API
272
+
273
+ ```python
274
+ import json
275
+ from helpers import http_get
276
+
277
+ # Lookup by known author key
278
+ author = json.loads(http_get("https://openlibrary.org/authors/OL26320A.json"))
279
+ # OL26320A is J.R.R. Tolkien; Frank Herbert is OL79034A. Resolve author keys via search rather than hardcoding them.
280
+
281
+ # author['name'] == 'J.R.R. Tolkien'
282
+ # author['fuller_name'] == 'John Ronald Reuel Tolkien'
283
+ # author['personal_name'] == 'J. R. R. Tolkien'
284
+ # author['birth_date'] == '3 January 1892' ← string, not parsed
285
+ # author['death_date'] == '2 September 1973'
286
+ # author['bio'] → str or {'type': '/type/text', 'value': str}
287
+ # author['photos'] == [6155606, 6433524, ...] ← photo IDs for covers API
288
+ # author['links'] == [{'title': '...', 'url': '...', 'type': {...}}, ...]
289
+ # author['remote_ids'] == {'wikidata': 'Q892', 'viaf': '95218067', ...}
290
+ # author['alternate_names']== ['J. R. R. Tolkien', 'TOLKIEN', ...]
291
+ # author['key'] == '/authors/OL26320A'
292
+ # author['wikipedia'] → URL string or None
293
+
294
+ # Author works (paginated)
295
+ works = json.loads(http_get("https://openlibrary.org/authors/OL26320A/works.json?limit=5"))
296
+ # works['size'] == 415
297
+ # works['entries'] == [{title, key, covers, authors, created, ...}, ...]
298
+ # works['links'] == {'self': '...', 'next': '...'}
299
+ ```
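An author's `bio` has the same two shapes as a work's `description`, and `photos` holds IDs for the `a/id/` endpoint listed under Cover images. This is a sketch: the helper names are hypothetical, and skipping non-positive photo IDs is a defensive assumption rather than documented behavior.

```python
def author_bio(author: dict) -> str:
    """bio is either a plain str or {'type': '/type/text', 'value': str}."""
    bio = author.get("bio", "")
    if isinstance(bio, dict):
        return bio.get("value", "")
    return bio or ""


def author_photo_urls(author: dict, size: str = "M") -> list:
    """Map photo IDs to covers-API URLs of the form a/id/{photo_id}-{size}.jpg."""
    # Defensive: keep only positive integer IDs (an assumption, not documented behavior)
    return [f"https://covers.openlibrary.org/a/id/{pid}-{size}.jpg"
            for pid in author.get("photos", []) if isinstance(pid, int) and pid > 0]
```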
300
+
301
+ #### Author search
302
+
303
+ ```python
304
+ r = json.loads(http_get("https://openlibrary.org/search/authors.json?q=tolkien"))
305
+ # r['numFound'] == 40
306
+ # r['docs'][0]:
307
+ # 'name' == 'Christopher Tolkien'
308
+ # 'key' == 'OL2623360A' ← NOTE: no /authors/ prefix here
309
+ # 'birth_date' == '21 November 1924'
310
+ # 'death_date' == '16 January 2020'
311
+ # 'top_work' == 'The War of the Ring'
312
+ # 'work_count' == 43
313
+ # 'top_subjects' == [...]
314
+ # 'alternate_names' == [...]
315
+ # 'ratings_average' float
316
+ # 'want_to_read_count' int
317
+ # 'already_read_count' int
318
+ # 'currently_reading_count'int
319
+ ```
320
+
321
+ ---
322
+
323
+ ### Cover images
324
+
325
+ Covers are served as JPEG via a redirect to the Internet Archive CDN. No auth is needed.
326
+
327
+ ```
328
+ # Book covers — three key types:
329
+ https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg # by cover ID (most reliable)
330
+ https://covers.openlibrary.org/b/isbn/{isbn}-{size}.jpg # by ISBN
331
+ https://covers.openlibrary.org/b/olid/{edition_id}-{size}.jpg # by edition OLID (OL...M); work OLIDs don't work here, see gotchas
332
+
333
+ # Author photos:
334
+ https://covers.openlibrary.org/a/id/{photo_id}-{size}.jpg
335
+
336
+ # Sizes: S (small), M (medium), L (large)
337
+ ```
338
+
339
+ ```python
340
+ import urllib.request
341
+
342
+ def get_cover_bytes(cover_id: int, size: str = 'M') -> bytes | None:
343
+     """Fetch cover image bytes. Returns None if no cover (43-byte GIF placeholder)."""
344
+     url = f"https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg"
345
+     req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
346
+     with urllib.request.urlopen(req, timeout=15) as resp:
347
+         data = resp.read()
348
+     return None if len(data) == 43 else data  # 43-byte GIF = no cover placeholder
349
+
350
+ # Or just get the URL for embedding:
351
+ def cover_url(cover_id: int, size: str = 'M') -> str:
352
+     return f"https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg"
353
+
354
+ # Usage:
355
+ from helpers import http_get
356
+ import json
357
+ work = json.loads(http_get("https://openlibrary.org/works/OL893415W.json"))
358
+ if work.get('covers'):
359
+     img = get_cover_bytes(work['covers'][0], 'L')  # first cover, large
360
+     # img is ~20–80KB JPEG bytes, redirected from ia*.archive.org
361
+ ```
362
+
363
+ To get a cover by ISBN directly (e.g. for a UI that doesn't need a full book lookup):
364
+ ```python
365
+ isbn = '9780743273565'  # example ISBN (The Great Gatsby); medium-size cover by ISBN:
366
+ url = f"https://covers.openlibrary.org/b/isbn/{isbn}-M.jpg"
367
+ # Redirects to Internet Archive CDN, content-type: image/jpeg
368
+ # Use ?default=false to get 404 instead of 1×1 GIF placeholder for missing covers
369
+ url_safe = f"https://covers.openlibrary.org/b/isbn/{isbn}-M.jpg?default=false"
370
+ ```
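The three book-cover URL forms differ only in key type and size, so one builder can produce all of them. This is a sketch: `book_cover_url` is a hypothetical name, and it validates only the key types and sizes listed above. Passing `default=False` appends the `?default=false` flag, so a missing cover yields HTTP 404 instead of the placeholder GIF.

```python
from urllib.parse import quote

KEY_TYPES = {"id", "isbn", "olid"}
SIZES = {"S", "M", "L"}


def book_cover_url(key_type: str, value, size: str = "M", default: bool = True) -> str:
    if key_type not in KEY_TYPES:
        raise ValueError(f"key_type must be one of {sorted(KEY_TYPES)}, got {key_type!r}")
    size = size.upper()
    if size not in SIZES:
        raise ValueError(f"size must be S, M or L, got {size!r}")
    url = f"https://covers.openlibrary.org/b/{key_type}/{quote(str(value))}-{size}.jpg"
    # ?default=false: 404 for missing covers instead of the 43-byte placeholder
    return url if default else url + "?default=false"
```

With `default=False`, wrap the download in `try`/`except urllib.error.HTTPError` and treat a 404 as "no cover" rather than checking the byte length.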
371
+
372
+ ---
373
+
374
+ ### Subjects API
375
+
376
+ ```python
377
+ import json
378
+ from helpers import http_get
379
+
380
+ # Subject slugs: lowercase, underscores for spaces
381
+ r = json.loads(http_get("https://openlibrary.org/subjects/science_fiction.json?limit=5"))
382
+ # r['name'] == 'science fiction'
383
+ # r['subject_type'] == 'subject' ← also: 'person', 'place', 'time'
384
+ # r['work_count'] == 20973
385
+ # r['works'] == [{title, key, cover_id, authors, edition_count, ...}, ...]
386
+
387
+ w = r['works'][0]
388
+ # w['title'] == 'Alice\'s Adventures in Wonderland'
389
+ # w['key'] == '/works/OL138052W'
390
+ # w['cover_id'] == 10527843
391
+ # w['cover_edition_key']== 'OL...'
392
+ # w['authors'] == [{'key': '/authors/OL22098A', 'name': 'Lewis Carroll'}]
393
+ # w['edition_count'] == 3546
394
+ # w['first_publish_year']== ...
395
+ # w['has_fulltext'] == True | False
396
+ # w['ia'] == 'identifier' (Internet Archive ID when available)
397
+
398
+ # Pagination: &offset=N
399
+ # Place subject:
400
+ r2 = json.loads(http_get("https://openlibrary.org/subjects/place:london.json?limit=5"))
401
+ # r2['subject_type'] == 'place', r2['work_count'] == 23927
402
+
403
+ # Person subject:
404
+ # https://openlibrary.org/subjects/person:napoleon.json?limit=5
405
+ # Time subject:
406
+ # https://openlibrary.org/subjects/time:middle_ages.json?limit=5
407
+
408
+ # Combine with ebooks=true to filter to only freely readable books:
409
+ r3 = json.loads(http_get("https://openlibrary.org/subjects/science_fiction.json?limit=5&ebooks=true"))
410
+ # r3['works'][i]['has_fulltext'] == True for all results
411
+ ```
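Subject URLs follow the slug rules noted above: lowercase, underscores for spaces, and a `place:`, `person:` or `time:` prefix for non-topical subjects. This is a sketch: `subject_slug` and `subject_url` are hypothetical names, and the slugger applies only those stated rules. It does not handle punctuation.

```python
from urllib.parse import urlencode


def subject_slug(name: str, kind: str = "subject") -> str:
    """'Science Fiction' -> 'science_fiction'; kind 'place'/'person'/'time' adds a prefix."""
    if kind not in {"subject", "place", "person", "time"}:
        raise ValueError(f"unknown subject kind: {kind!r}")
    slug = "_".join(name.lower().split())
    return slug if kind == "subject" else f"{kind}:{slug}"


def subject_url(name: str, kind: str = "subject", limit: int = 5, offset: int = 0,
                ebooks: bool = False) -> str:
    params = {"limit": limit}
    if offset:
        params["offset"] = offset
    if ebooks:
        params["ebooks"] = "true"  # only works with has_fulltext == True
    return f"https://openlibrary.org/subjects/{subject_slug(name, kind)}.json?{urlencode(params)}"
```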
412
+
413
+ ---
414
+
415
+ ### Trending books
416
+
417
+ ```python
418
+ import json
419
+ from helpers import http_get
420
+
421
+ for period in ['daily', 'weekly', 'monthly']:
422
+     r = json.loads(http_get(f"https://openlibrary.org/trending/{period}.json?limit=10"))
423
+     # r['works'] == list of search-doc-style objects
424
+     # r['days'] == int (time window)
425
+     # r['hours'] == int
426
+     # Same fields as search docs (title, author_name, cover_i, key, ...)
427
+     print(period, r['works'][0]['title'])  # e.g. 'Atomic Habits'
428
+ ```
429
+
430
+ ---
431
+
432
+ ## Rate limits
433
+
434
+ No authentication required. No API key. No explicit rate limit published.
435
+
436
+ Observed in testing: 5 requests completed in about 1 second with no throttling and no 429s. The API is served from CDN/Solr, and in practice 10–20 parallel requests have gone through without issue, but that is anecdotal, not a documented ceiling. For bulk operations (hundreds of ISBNs), use `ThreadPoolExecutor(max_workers=5)` to be a good citizen.
437
+
438
+ **No `User-Agent` override needed.** The default `Mozilla/5.0` sent by `http_get` was accepted by every Open Library endpoint used in this guide (unlike Nominatim, which blocks it).
439
+
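No published limit does not mean no limit. Below is a hedged sketch of exponential backoff around any `fetch(url) -> str` callable, such as `http_get`. It assumes the callable raises `urllib.error.HTTPError` on error statuses, as described in the gotchas below. Treating 429 and 5xx as retryable is a conventional choice, not Open Library policy. `with_retries` is a hypothetical name.

```python
import time
from typing import Callable
from urllib.error import HTTPError

RETRYABLE = {429, 500, 502, 503, 504}


def with_retries(fetch: Callable[[str], str], attempts: int = 4, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> Callable[[str], str]:
    """Wrap fetch so 429/5xx responses are retried with exponential backoff (1s, 2s, 4s, ...)."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")

    def wrapped(url: str) -> str:
        delay = base_delay
        for _ in range(attempts - 1):
            try:
                return fetch(url)
            except HTTPError as exc:
                if exc.code not in RETRYABLE:
                    raise  # e.g. 404 is permanent: fail fast
            sleep(delay)
            delay *= 2
        return fetch(url)  # final attempt: any error propagates to the caller

    return wrapped
```

`safe_get = with_retries(http_get)` can then stand in for `http_get` in the `ThreadPoolExecutor` example. The `sleep` parameter exists so tests can skip real waiting.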
440
+ ---
441
+
442
+ ## Gotchas
443
+
444
+ **`description` field has two shapes.** Both are real — check at runtime:
445
+ ```python
446
+ desc = work.get('description', '')
447
+ text = desc.get('value', '') if isinstance(desc, dict) else (desc or '')
448
+ ```
449
+
450
+ **`/works/OL45804W` is Fantastic Mr. Fox, not Dune.** Dune is `/works/OL893415W`. Open Library IDs are opaque, so resolve them via the search API rather than hardcoding them.
451
+
452
+ **Author search `key` has no prefix.** `/search/authors.json` returns `key: 'OL26320A'`, but the Authors API and all other APIs use `/authors/OL26320A`. Add the prefix manually when constructing follow-up URLs.
453
+
454
+ **Missing cover → 43-byte GIF placeholder, not 404.** Without `?default=false`, the covers API returns a 1×1 transparent GIF instead of HTTP 404 for unknown IDs. Check `len(data) == 43` to detect missing covers.
455
+
456
+ **`covers.openlibrary.org/b/olid/{work_id}` is unreliable.** OLID-based cover URLs for work IDs (OL...W) return the placeholder even when covers exist. Always use `b/id/{cover_id}` (from `work['covers'][0]`) or `b/isbn/{isbn}` instead.
457
+
458
+ **Bibkeys API picks one edition per ISBN.** When the same ISBN appears on multiple editions (reprint, reissue), `api/books?bibkeys=ISBN:...` returns one — and it may not be the most common edition.
459
+
460
+ **`publish_date` is a raw string.** Values like `'October 1, 1988'`, `'19/08/2017'`, `'2021'`, and `'1965-01-01'` all appear. Don't parse without normalization.
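When you only need a year, pulling the first plausible four-digit year out of the string handles all four formats above. This is a sketch: `publish_year` is a hypothetical name, and the 1500 to 2099 range is an assumption meant to avoid matching other numbers. Anything without a match returns `None`. Search docs requested with `fields=*` already carry parsed `publish_year` values, so this is mainly for edition records.

```python
import re
from typing import Optional

# Assumed plausible range: 1500-2099
_YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def publish_year(raw) -> Optional[int]:
    """Return the first plausible year in a raw publish_date string, or None."""
    match = _YEAR.search(raw or "")
    return int(match.group(1)) if match else None
```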
461
+
462
+ **`/works/.../editions.json` pagination uses `links.next`.** Unlike search (which uses `offset=`), check `links['next']` in the response to know if more pages exist:
463
+ ```python
464
+ entries, resp = [], {'links': {'next': '/works/OL893415W/editions.json?limit=50'}}  # seed: first page path
465
+ while 'next' in resp.get('links', {}):
466
+     resp = json.loads(http_get("https://openlibrary.org" + resp['links']['next']))
467
+     entries.extend(resp['entries'])  # runs for every page, including the first
468
+ ```
469
+
470
+ **404 for non-existent IDs.** `/works/OL99999999W.json`, `/books/OL99999999M.json`, and `/authors/OL99999999A.json` all raise `HTTPError: HTTP Error 404: Not Found`. Wrap in try/except.
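A small wrapper turns the 404 case into `None` while letting other failures surface. This sketch assumes the injected `fetch` raises `HTTPError` as described above. `get_json_or_none` is a hypothetical name.

```python
import json
from typing import Callable, Optional
from urllib.error import HTTPError


def get_json_or_none(url: str, fetch: Callable[[str], str]) -> Optional[dict]:
    """Return parsed JSON, or None when Open Library answers 404; other errors propagate."""
    try:
        return json.loads(fetch(url))
    except HTTPError as exc:
        if exc.code == 404:
            return None
        raise
```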
471
+
472
+ **Search `docs` default fields are minimal.** The default response includes ~15 fields. Add `&fields=*` to get all 100+ Solr fields (ratings, ISBNs, publishers, subjects, Goodreads IDs, etc.). Alternatively specify exactly what you need: `&fields=title,isbn,ratings_average`.