@pencil-agent/nano-pencil 2.0.0-beta.9 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (207)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/extensions-host/index.d.ts +1 -1
  7. package/dist/core/extensions-host/types.d.ts +5 -8
  8. package/dist/extensions/builtin/AGENT.md +115 -115
  9. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  10. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  11. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  12. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  13. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  14. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  15. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  16. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  17. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  18. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  19. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  20. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  91. package/dist/extensions/builtin/browser/browser.md +73 -73
  92. package/dist/extensions/builtin/browser/install.md +142 -142
  93. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  94. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  95. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  96. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  97. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  98. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  99. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  100. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  101. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  102. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  103. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  104. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  105. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  107. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  108. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  109. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  110. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  111. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  112. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  113. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  114. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  115. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  116. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  117. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  118. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  119. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  120. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  121. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  122. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  123. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  124. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  125. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  126. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  127. package/dist/extensions/builtin/goal/README.md +67 -67
  128. package/dist/extensions/builtin/goal/goal-controller.js +1 -1
  129. package/dist/extensions/builtin/goal/goal-prompts.js +4 -4
  130. package/dist/extensions/builtin/grub/README.md +112 -112
  131. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  132. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  133. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  134. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  135. package/dist/extensions/builtin/link-world/network-routing/network-routing.md +67 -67
  136. package/dist/extensions/builtin/loop/README.md +92 -92
  137. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  138. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  139. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  140. package/dist/extensions/builtin/sal/README.md +72 -72
  141. package/dist/extensions/builtin/security-audit/README.md +289 -289
  142. package/dist/extensions/builtin/team/AGENT.md +112 -112
  143. package/dist/extensions/builtin/team/TESTING.md +299 -299
  144. package/dist/extensions/builtin/token-save/README.md +56 -56
  145. package/dist/extensions/optional/AGENT.md +10 -10
  146. package/dist/index.d.ts +5 -30
  147. package/dist/index.js +1 -1
  148. package/dist/models.d.ts +7 -0
  149. package/dist/models.js +1 -0
  150. package/dist/modes/interactive/theme/dark.json +85 -85
  151. package/dist/modes/interactive/theme/light.json +84 -84
  152. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  153. package/dist/modes/interactive/theme/warm.json +81 -81
  154. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  155. package/dist/packages/protocol/src/flags.d.ts +20 -0
  156. package/dist/packages/protocol/src/flags.js +0 -0
  157. package/dist/packages/protocol/src/hooks.d.ts +17 -0
  158. package/dist/packages/protocol/src/hooks.js +0 -0
  159. package/dist/packages/protocol/src/index.d.ts +4 -2
  160. package/dist/packages/protocol/src/index.js +1 -1
  161. package/dist/packages/protocol/src/lifecycle.d.ts +11 -21
  162. package/dist/public-config.d.ts +12 -0
  163. package/dist/public-config.js +1 -0
  164. package/dist/runtime.d.ts +9 -0
  165. package/dist/runtime.js +1 -0
  166. package/dist/session-compaction.d.ts +7 -0
  167. package/dist/session-compaction.js +1 -0
  168. package/dist/session.d.ts +7 -0
  169. package/dist/session.js +1 -0
  170. package/dist/skills.d.ts +7 -0
  171. package/dist/skills.js +1 -0
  172. package/dist/tools.d.ts +7 -0
  173. package/dist/tools.js +1 -0
  174. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +851 -0
  175. package/docs/SDK-TESTING.md +364 -0
  176. package/docs/codex-goal-command-impl.md +1055 -1055
  177. package/docs/codex-goal-vs-grub.md +500 -500
  178. package/docs/custom-provider.md +27 -27
  179. package/docs/extensions.md +27 -27
  180. package/docs/keybindings.md +27 -27
  181. package/docs/loop 重构完成总结.md +250 -250 (loop refactor completion summary)
  182. package/docs/loop 重构完成报告.md +122 -122 (loop refactor completion report)
  183. package/docs/loop 重构方案.md +1222 -1222 (loop refactor proposal)
  184. package/docs/loop 重构方案实现报告.md +158 -158 (loop refactor proposal implementation report)
  185. package/docs/loop 重构方案对比分析.md +128 -128 (loop refactor proposal comparative analysis)
  186. package/docs/loop 重构计划.md +320 -320 (loop refactor plan)
  187. package/docs/loop-usage-examples.md +214 -214
  188. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +593 -0
  189. package/docs/models.md +27 -27
  190. package/docs/packages.md +27 -27
  191. package/docs/pi-design-philosophy.md +457 -457
  192. package/docs/planmode.md +1987 -1987
  193. package/docs/prompt-templates.md +27 -27
  194. package/docs/providers.md +27 -27
  195. package/docs/sdk.md +27 -27
  196. package/docs/skills.md +27 -27
  197. package/docs/startup-performance-optimization.md +301 -0
  198. package/docs/themes.md +27 -27
  199. package/docs/tui.md +27 -27
  200. package/docs/认知地图.md +47 -0 (cognitive map)
  201. package/package.json +190 -162
  202. package/docs/cc-agent-design.md +0 -1297
  203. package/docs/cc-tui-design.md +0 -1333
  204. package/docs/nanoPencil-学习计划.md +0 -170 (nanoPencil study plan)
  205. package/docs/scan-report.md +0 -3820
  206. package/docs//345/257/271/346/240/207Claude-Code.md +0 -1775
  207. package/docs//351/230/277/351/207/214/345/267/264/345/267/264/350/264/242/346/212/245/345/210/206/346/236/220/344/271/246.md +0 -261
@@ -1,472 +1,472 @@
# Open Library: Book Data Extraction

`https://openlibrary.org` is the Internet Archive's free book catalog. All endpoints are public JSON APIs (covers are plain image URLs): no auth, no browser, no scraping required.

## Do this first

**Every task is a direct HTTP call. Never open the browser.**

```python
import json
from helpers import http_get

# Search by title
results = json.loads(http_get("https://openlibrary.org/search.json?q=dune&limit=5"))
# results['numFound'] == 49090  (counts in these examples are snapshots and will drift)
# results['docs']     == list of work objects
# results['start']    == 0  (offset of the first doc, for pagination)
```

The search API is the entry point for everything. It returns work-level records (all editions of a title grouped together). For work metadata, follow the doc's `key` to the Works API; for a specific edition, pass `cover_edition_key` (or an entry from the work's editions list) to the Books API.

---

## Common workflows

### Search by query, author, title, or ISBN

```python
import json
from helpers import http_get

# Free-text search
r = json.loads(http_get("https://openlibrary.org/search.json?q=dune+frank+herbert&limit=5"))

# Author search
r = json.loads(http_get(
    "https://openlibrary.org/search.json?author=tolkien&limit=5"
    "&fields=title,author_name,first_publish_year,isbn"
))
# fields=* returns all available fields; without `fields`, ~15 default keys are returned

# Title + author combined
r = json.loads(http_get(
    "https://openlibrary.org/search.json?title=dune&author=frank+herbert&limit=3"
    "&fields=title,author_name,edition_count,first_publish_year"
))
# r['docs'][0]['title']              == 'Dune'
# r['docs'][0]['author_name']        == ['Frank Herbert']
# r['docs'][0]['first_publish_year'] == 1965
# r['docs'][0]['edition_count']      == 120

# ISBN lookup (returns 0–2 results, all for the same book)
r = json.loads(http_get("https://openlibrary.org/search.json?isbn=9780743273565"))
# r['numFound']         == 2
# r['docs'][0]['title'] == 'The Great Gatsby'
# r['docs'][0]['key']   == '/works/OL468431W'
```

**Sort options** (`&sort=`): `new` (recently added), `old`, `random`, `editions` (most editions), `scans` (most scans). Default is relevance.

**Language filter**: `&language=fre` (ISO 639-2/B codes: `eng`, `fre`, `ger`, `spa`, `ita`, etc.)

**Pagination**: `&limit=N&offset=N`. The maximum `limit` is not enforced, but keep it under 100 for reliability.

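The query parameters above (`q`, `title`, `author`, `isbn`, `fields`, `sort`, `language`, `limit`, `offset`) can be assembled programmatically instead of by string concatenation. A minimal sketch, assuming a hypothetical `build_search_url` helper that only builds the URL (pass the result to `http_get` as in the examples). `urlencode` handles escaping, so spaces become `+`, while `safe=',*'` keeps `fields=title,author_name` and `fields=*` readable. The default `limit=10` here is an arbitrary choice, not Open Library's default.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://openlibrary.org/search.json"


def build_search_url(q=None, *, title=None, author=None, isbn=None,
                     fields=None, sort=None, language=None,
                     limit=10, offset=0):
    """Assemble a search.json URL (hypothetical helper; makes no request)."""
    params = {}
    # Include only the filters the caller actually set, in a stable order.
    for name, value in (("q", q), ("title", title), ("author", author),
                        ("isbn", isbn), ("sort", sort), ("language", language)):
        if value is not None:
            params[name] = value
    if fields:
        # Accept a ready-made string ("*" or "a,b") or a list of field names.
        params["fields"] = fields if isinstance(fields, str) else ",".join(fields)
    params["limit"] = limit
    params["offset"] = offset
    # Keep "," and "*" literal in the URL; spaces are encoded as "+".
    return f"{SEARCH_URL}?{urlencode(params, safe=',*')}"


print(build_search_url(title="dune", author="frank herbert",
                       fields=["title", "author_name"], limit=3))
# → https://openlibrary.org/search.json?title=dune&author=frank+herbert&fields=title,author_name&limit=3&offset=0
```
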
#### Search doc fields (default set of ~15 keys; some, such as `ia`, appear only when applicable)

| Field | Type | Notes |
|---|---|---|
| `key` | str | `/works/OL893415W`; use for the Works API |
| `title` | str | Work title |
| `author_name` | list[str] | e.g. `['Frank Herbert']` |
| `author_key` | list[str] | e.g. `['OL79034A']` |
| `first_publish_year` | int | |
| `edition_count` | int | Number of editions across all languages |
| `cover_i` | int | Cover image ID; use with the Covers API |
| `cover_edition_key` | str | Edition OLID, e.g. `OL7353617M`; use with the Books API |
| `language` | list[str] | ISO codes of all editions |
| `ia` | list[str] | Internet Archive identifiers (present when `has_fulltext` is true) |
| `ebook_access` | str | e.g. `'public'`, `'borrowable'`, `'no_ebook'` |
| `has_fulltext` | bool | |

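The `key`, `cover_edition_key`, and `cover_i` fields are the links from a search doc into the other APIs described in this file. A small sketch, assuming a hypothetical `follow_up_urls` helper, that turns one doc into the follow-up URLs and simply skips any field the doc lacks. The URL patterns are the ones used in the Works, Books, and Covers sections.

```python
BASE = "https://openlibrary.org"
COVERS = "https://covers.openlibrary.org"


def follow_up_urls(doc: dict, cover_size: str = "M") -> dict:
    """Map one search doc to Works, Books and Covers API URLs; absent fields are skipped."""
    urls = {}
    if doc.get("key"):  # work key, already prefixed: '/works/OL893415W'
        urls["work"] = f"{BASE}{doc['key']}.json"
        urls["editions"] = f"{BASE}{doc['key']}/editions.json"
    if doc.get("cover_edition_key"):  # bare edition OLID: 'OL7353617M'
        urls["edition"] = f"{BASE}/books/{doc['cover_edition_key']}.json"
    if doc.get("cover_i"):  # numeric cover ID
        urls["cover"] = f"{COVERS}/b/id/{doc['cover_i']}-{cover_size}.jpg"
    return urls


# Illustrative doc (hand-written, not a captured API response)
doc = {"key": "/works/OL893415W", "title": "Dune", "cover_i": 11481354}
print(follow_up_urls(doc)["cover"])  # → https://covers.openlibrary.org/b/id/11481354-M.jpg
```
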
#### Extra fields with `&fields=*`

```python
# With fields=* you also get:
# 'isbn'                    list[str]  All ISBNs across editions
# 'publisher'               list[str]  All publishers ever
# 'publish_date'            list[str]  All publish dates (strings, inconsistent formats)
# 'publish_year'            list[int]  Parsed years
# 'subject'                 list[str]  Subject headings
# 'person'                  list[str]  Subject persons (e.g. 'Big Brother')
# 'place'                   list[str]  Subject places
# 'time'                    list[str]  Subject times
# 'number_of_pages_median'  int        Median page count across editions
# 'ratings_average'         float      e.g. 4.29
# 'ratings_count'           int
# 'want_to_read_count'      int
# 'already_read_count'      int
# 'currently_reading_count' int
# 'readinglog_count'        int        Total of all reading-log entries
# 'first_sentence'          list[str]
# 'id_goodreads'            list[str]
# 'id_librarything'         list[str]
# 'id_wikidata'             list[str]
# 'ddc'                     list[str]  Dewey Decimal Classification
# 'lcc'                     list[str]  Library of Congress Classification
# 'lccn'                    list[str]  Library of Congress Control Numbers
```

---

### Bulk ISBN lookups (parallel)

```python
import json
from concurrent.futures import ThreadPoolExecutor
from helpers import http_get

isbns = ['9780743273565', '9780451524935', '9780618346257']

def lookup_isbn(isbn):
    url = f"https://openlibrary.org/search.json?isbn={isbn}&fields=title,author_name,first_publish_year,key"
    r = json.loads(http_get(url))
    if r['docs']:
        d = r['docs'][0]
        return {'isbn': isbn, 'title': d.get('title'),
                # `or [None]` also guards against an empty author_name list
                'author': (d.get('author_name') or [None])[0],
                'year': d.get('first_publish_year'), 'key': d.get('key')}
    return {'isbn': isbn, 'found': False}

# A small pool limits concurrent requests
with ThreadPoolExecutor(max_workers=5) as ex:
    books = list(ex.map(lookup_isbn, isbns))

# [{'isbn': '9780743273565', 'title': 'The Great Gatsby', 'author': 'F. Scott Fitzgerald', 'year': 1920, ...},
#  {'isbn': '9780451524935', 'title': 'Nineteen Eighty-Four', 'author': 'George Orwell', 'year': 1949, ...},
#  {'isbn': '9780618346257', 'title': 'The Fellowship of the Ring', 'author': 'J.R.R. Tolkien', 'year': 1954, ...}]
# first_publish_year comes from Open Library's edition data and can be wrong:
# The Great Gatsby was first published in 1925, not 1920.
```

---

### Works API (editions grouped by title)

Returns work-level metadata shared by all editions (title, subjects, covers, authors, description). Get the work ID from `key` in search results.

```python
import json
from helpers import http_get

work_id = 'OL893415W'  # from search doc['key'] == '/works/OL893415W'
work = json.loads(http_get(f"https://openlibrary.org/works/{work_id}.json"))

# work['title']          == 'Dune'
# work['key']            == '/works/OL893415W'
# work['covers']         == [11481354, 12375564, 11157826]  ← cover IDs for the Covers API
# work['subjects']       == ['Dune (Imaginary place)', 'Fiction', ...]
# work['subject_places'] == [...]  ← geographic subjects (may be absent)
# work['subject_people'] == [...]  ← person subjects (may be absent)
# work['subject_times']  == [...]  ← time subjects (may be absent)
# work['authors']        == [{'author': {'key': '/authors/OL79034A'}, 'type': {...}}]
# work['description']    → either str OR {'type': '/type/text', 'value': str}  ← see gotchas
# work['created']        == {'type': '/type/datetime', 'value': '2009-10-15T11:34:21.437031'}
# work['last_modified']  same shape as created
```

Helper for the `description` field, which has two possible shapes:

```python
def get_description(work: dict) -> str:
    desc = work.get('description', '')
    if isinstance(desc, dict):
        return desc.get('value', '')
    return desc or ''
```

#### Works editions (paginated list of all editions)

```python
editions_resp = json.loads(http_get(
    f"https://openlibrary.org/works/{work_id}/editions.json?limit=10&offset=0"
))
# editions_resp['size']    == 120  (total edition count)
# editions_resp['entries'] == [...]  (up to `limit` items)
# editions_resp['links']   == {'self': '...', 'work': '...', 'next': '...', 'prev': '...'}
#   ← use links['next'] for pagination while offset + limit < size

e = editions_resp['entries'][0]
# e['title']           == 'Duna'
# e['publishers']      == ['Editora Aleph']
# e['publish_date']    == '19/08/2017'  ← string, inconsistent format
# e['isbn_13']         == ['9788576573135']
# e['isbn_10']         == ['857657313X']
# e['covers']          == [10368109]
# e['number_of_pages'] == 680
# e['languages']       == [{'key': '/languages/por'}]
# e['key']             == '/books/OL28969075M'
# e['physical_format'] == 'Paperback'  (often missing)
# e['notes']           → str or {'value': str}  (often missing)
```

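The `size` / `limit` / `offset` scheme above can be wrapped in a generator that yields every edition. A sketch under stated assumptions: `iter_entries` is a hypothetical name, `fetch_json` is any callable returning parsed JSON for a URL (for example `lambda u: json.loads(http_get(u))`), and it steps the offset itself rather than following `links['next']`. The page size of 50 is an arbitrary choice. Since the author works listing also returns `size` and `entries`, the same loop should apply to `/authors/{id}/works.json`.

```python
from typing import Callable, Iterator


def iter_entries(url: str, fetch_json: Callable[[str], dict],
                 page_size: int = 50) -> Iterator[dict]:
    """Yield every entry from a paginated listing that reports `size` and `entries`.

    `url` must not already carry a query string; `fetch_json` is injected so the
    loop can be tested without network access.
    """
    offset = 0
    while True:
        page = fetch_json(f"{url}?limit={page_size}&offset={offset}")
        entries = page.get("entries", [])
        yield from entries
        offset += page_size
        # Stop once the reported total is covered, or on an empty page as a safety net.
        if not entries or offset >= page.get("size", 0):
            break


# Usage with the http_get helper used throughout:
# editions = list(iter_entries(
#     f"https://openlibrary.org/works/{work_id}/editions.json",
#     lambda u: json.loads(http_get(u)),
# ))
```
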
---

### Books API (specific edition)

Two sub-APIs: direct JSON for raw data, or `api/books` for enriched data.

#### Direct edition JSON

```python
import json
from helpers import http_get

edition_id = 'OL7353617M'  # cover_edition_key from search, or e['key'] from the editions list minus '/books/'
edition = json.loads(http_get(f"https://openlibrary.org/books/{edition_id}.json"))

# edition['title']           == 'Fantastic Mr. Fox'
# edition['publishers']      == ['Puffin']
# edition['publish_date']    == 'October 1, 1988'
# edition['isbn_13']         == ['9780140328721']
# edition['isbn_10']         == ['0140328726']
# edition['number_of_pages'] == 96
# edition['covers']          == [...]  ← cover IDs
# edition['languages']       == [{'key': '/languages/eng'}]
# edition['works']           == [{'key': '/works/OL45804W'}]
# edition['authors']         == [{'key': '/authors/OL34184A'}]
# edition['identifiers']     == {'goodreads': [...], 'librarything': [...]}
# edition['first_sentence']  == {'value': '...'} or str  (often missing)
# edition['ocaid']           == 'fantast00dahl'  ← Internet Archive ID (if available)
```

#### Bibkeys API (enriched, multiple books at once)

```python
# jscmd=data: cleaned-up dict with cover URLs pre-built
r = json.loads(http_get(
    "https://openlibrary.org/api/books"
    "?bibkeys=ISBN:9780743273565,ISBN:9780451524935"
    "&format=json&jscmd=data"
))
# r == {'ISBN:9780743273565': {...}, 'ISBN:9780451524935': {...}}

book = r['ISBN:9780743273565']
# book['title']           == 'The Great Gatsby'
# book['authors']         == [{'url': '...', 'name': 'F. Scott Fitzgerald'}]
# book['publish_date']    == '2021'
# book['publishers']      == [{'name': 'Independently Published'}]
# book['number_of_pages'] == 208
# book['url']             == 'http://openlibrary.org/books/OL46773254M/The_Great_Gatsby'
# book['key']             == '/books/OL46773254M'
# book['cover']           == {'small': '...S.jpg', 'medium': '...M.jpg', 'large': '...L.jpg'}
# book['identifiers']     == {'isbn_13': [...], 'openlibrary': [...]}
# book['subjects']        == [{'name': 'Modern fiction', 'url': '...'}, ...]
# book['subject_places']  == None  ← often null or missing even with jscmd=data; read it with .get()

# jscmd=details: raw edition JSON + extra fields
r2 = json.loads(http_get(
    "https://openlibrary.org/api/books"
    "?bibkeys=ISBN:9780743273565&format=json&jscmd=details"
))
item = r2['ISBN:9780743273565']
# item['bib_key']       == 'ISBN:9780743273565'
# item['info_url']      == 'http://openlibrary.org/books/OL...'
# item['preview']       == 'noview' | 'restricted' | 'full'
# item['preview_url']   == URL to read on OL or IA
# item['thumbnail_url'] == 'https://covers.openlibrary.org/b/id/14314120-S.jpg'
# item['details']       → raw edition JSON (same as /books/OL...M.json)

# Supported bibkey prefixes: ISBN:, OCLC:, LCCN:, OLID: (e.g. OLID:OL46773254M)
```

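Because `bibkeys` accepts a comma-separated list and the response is keyed by bibkey, bulk lookups can be batched into a few requests instead of one per ISBN, and the per-batch dicts merged with `dict.update`. A sketch with a hypothetical `bibkeys_urls` helper; the default batch size of 50 is an assumption made only to keep URLs short, not a documented limit.

```python
from urllib.parse import urlencode


def bibkeys_urls(isbns: list[str], jscmd: str = "data", batch: int = 50) -> list[str]:
    """Build one /api/books URL per batch of ISBNs (hypothetical helper; no requests made)."""
    urls = []
    for i in range(0, len(isbns), batch):
        keys = ",".join(f"ISBN:{isbn}" for isbn in isbns[i:i + batch])
        # Keep ':' and ',' literal so the bibkeys read like the examples above.
        query = urlencode({"bibkeys": keys, "format": "json", "jscmd": jscmd}, safe=",:")
        urls.append(f"https://openlibrary.org/api/books?{query}")
    return urls


for url in bibkeys_urls(["9780743273565", "9780451524935", "9780618346257"], batch=2):
    print(url)
# → https://openlibrary.org/api/books?bibkeys=ISBN:9780743273565,ISBN:9780451524935&format=json&jscmd=data
# → https://openlibrary.org/api/books?bibkeys=ISBN:9780618346257&format=json&jscmd=data
```
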
---

### Authors API

```python
import json
from helpers import http_get

# Lookup by known author key
author = json.loads(http_get("https://openlibrary.org/authors/OL26320A.json"))
# OL26320A is J.R.R. Tolkien (verify unfamiliar keys with the author search)

# author['name']            == 'J.R.R. Tolkien'
# author['fuller_name']     == 'John Ronald Reuel Tolkien'
# author['personal_name']   == 'J. R. R. Tolkien'
# author['birth_date']      == '3 January 1892'  ← string, not parsed
# author['death_date']      == '2 September 1973'
# author['bio']             → str or {'type': '/type/text', 'value': str}  ← same two shapes as work['description']
# author['photos']          == [6155606, 6433524, ...]  ← photo IDs for the Covers API
# author['links']           == [{'title': '...', 'url': '...', 'type': {...}}, ...]
# author['remote_ids']      == {'wikidata': 'Q892', 'viaf': '95218067', ...}
# author['alternate_names'] == ['J. R. R. Tolkien', 'TOLKIEN', ...]
# author['key']             == '/authors/OL26320A'
# author['wikipedia']       → URL string or None

# Author works (paginated)
works = json.loads(http_get("https://openlibrary.org/authors/OL26320A/works.json?limit=5"))
# works['size']    == 415
# works['entries'] == [{title, key, covers, authors, created, ...}, ...]
# works['links']   == {'self': '...', 'next': '...'}
```

#### Author search

```python
r = json.loads(http_get("https://openlibrary.org/search/authors.json?q=tolkien"))
# r['numFound'] == 40
# r['docs'][0]:
#   'name'                    == 'Christopher Tolkien'
#   'key'                     == 'OL2623360A'  ← NOTE: no /authors/ prefix here
#   'birth_date'              == '21 November 1924'
#   'death_date'              == '16 January 2020'
#   'top_work'                == 'The War of the Ring'
#   'work_count'              == 43
#   'top_subjects'            == [...]
#   'alternate_names'         == [...]
#   'ratings_average'         float
#   'want_to_read_count'      int
#   'already_read_count'      int
#   'currently_reading_count' int
```

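Keys come back in two shapes: prefixed paths such as `/works/OL893415W`, `/books/OL28969075M`, `/authors/OL26320A`, or `/languages/eng`, and bare IDs such as the author-search `'OL2623360A'`. A small normalizer lets either shape be fed back into the JSON endpoints. `bare_id` and `api_url` are hypothetical names, and mapping the final letter (W = work, M = edition, A = author) to an endpoint follows the ID patterns seen in this file rather than a documented rule.

```python
# Suffix letter → endpoint, as in OL893415W, OL7353617M, OL26320A (assumed convention)
KIND_BY_SUFFIX = {"W": "works", "M": "books", "A": "authors"}


def bare_id(key: str) -> str:
    """'/works/OL893415W' -> 'OL893415W'; bare IDs pass through unchanged."""
    return key.rstrip("/").rsplit("/", 1)[-1]


def api_url(key: str) -> str:
    """Build the JSON URL for a work, edition or author key given in either shape."""
    olid = bare_id(key)
    kind = KIND_BY_SUFFIX.get(olid[-1:])
    if not olid.startswith("OL") or kind is None:
        raise ValueError(f"not a work/edition/author key: {key!r}")
    return f"https://openlibrary.org/{kind}/{olid}.json"


print(bare_id("/languages/por"))  # → por
print(api_url("OL2623360A"))      # → https://openlibrary.org/authors/OL2623360A.json
```
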
---

### Cover images

Covers are served as JPEG images via a redirect to the Internet Archive CDN. No auth needed.

```
# Book covers, by one of three key types:
https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg      # by cover ID (most reliable)
https://covers.openlibrary.org/b/isbn/{isbn}-{size}.jpg        # by ISBN
https://covers.openlibrary.org/b/olid/{edition_id}-{size}.jpg  # by edition OLID (unreliable; see gotchas)

# Author photos:
https://covers.openlibrary.org/a/id/{photo_id}-{size}.jpg

# Sizes: S (small), M (medium), L (large)
```

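The URL patterns above differ only in object type (`b` for book covers, `a` for author photos), key type, value, and size. A sketch of one builder that validates those parts before formatting; `cover_image_url` is a hypothetical name, it allows only the key types listed above, and `default=False` appends Open Library's `?default=false` option (covered later in this section) so a missing cover returns a 404 instead of the placeholder image.

```python
from urllib.parse import quote

COVER_KEYS = {"b": {"id", "isbn", "olid"}, "a": {"id"}}  # only the key types listed above
SIZES = {"S", "M", "L"}


def cover_image_url(value, key: str = "id", size: str = "M",
                    kind: str = "b", default: bool = True) -> str:
    """Return a covers.openlibrary.org URL for a book cover or author photo."""
    if key not in COVER_KEYS.get(kind, set()):
        raise ValueError(f"unsupported key type {key!r} for kind {kind!r}")
    if size not in SIZES:
        raise ValueError(f"size must be one of S, M, L, not {size!r}")
    url = f"https://covers.openlibrary.org/{kind}/{key}/{quote(str(value))}-{size}.jpg"
    return url if default else url + "?default=false"


print(cover_image_url(11481354, size="L"))
# → https://covers.openlibrary.org/b/id/11481354-L.jpg
print(cover_image_url("9780743273565", key="isbn", default=False))
# → https://covers.openlibrary.org/b/isbn/9780743273565-M.jpg?default=false
```
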
```python
import json
import urllib.request

from helpers import http_get

def get_cover_bytes(cover_id: int, size: str = 'M') -> bytes | None:
    """Fetch cover image bytes. Returns None if there is no cover (43-byte GIF placeholder)."""
    url = f"https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg"
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    # urlopen follows the redirect to the Internet Archive CDN automatically
    with urllib.request.urlopen(req, timeout=15) as resp:
        data = resp.read()
    return None if len(data) == 43 else data  # 43-byte GIF = "no cover" placeholder

# Or just build the URL for embedding:
def cover_url(cover_id: int, size: str = 'M') -> str:
    return f"https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg"

# Usage:
work = json.loads(http_get("https://openlibrary.org/works/OL893415W.json"))
if work.get('covers'):
    img = get_cover_bytes(work['covers'][0], 'L')  # first cover, large
    # img is ~20–80 KB of JPEG bytes, served from ia*.archive.org after the redirect
```

To get a cover by ISBN directly (e.g. for a UI, without a full book lookup):
```python
isbn = '9780743273565'  # example ISBN from the search section
# Medium-size cover by ISBN:
url = f"https://covers.openlibrary.org/b/isbn/{isbn}-M.jpg"
# Redirects to the Internet Archive CDN; content-type: image/jpeg
# Add ?default=false to get a 404 instead of the 1×1 GIF placeholder for missing covers
url_safe = f"https://covers.openlibrary.org/b/isbn/{isbn}-M.jpg?default=false"
```

---

### Subjects API

```python
import json
from helpers import http_get

# Subject slugs: lowercase, underscores for spaces
r = json.loads(http_get("https://openlibrary.org/subjects/science_fiction.json?limit=5"))
# r['name']         == 'science fiction'
# r['subject_type'] == 'subject'  ← also: 'person', 'place', 'time'
# r['work_count']   == 20973
# r['works']        == [{title, key, cover_id, authors, edition_count, ...}, ...]

w = r['works'][0]
# w['title']              == "Alice's Adventures in Wonderland"
# w['key']                == '/works/OL138052W'
# w['cover_id']           == 10527843
# w['cover_edition_key']  == 'OL...'
# w['authors']            == [{'key': '/authors/OL22098A', 'name': 'Lewis Carroll'}]
# w['edition_count']      == 3546
# w['first_publish_year'] == ...
# w['has_fulltext']       == True | False
# w['ia']                 == 'identifier'  (Internet Archive ID when available)
397
-
398
- # Pagination: &offset=N
399
- # Place subject:
400
- r2 = json.loads(http_get("https://openlibrary.org/subjects/place:london.json?limit=5"))
401
- # r2['subject_type'] == 'place', r2['work_count'] == 23927
402
-
403
- # Person subject:
404
- # https://openlibrary.org/subjects/person:napoleon.json?limit=5
405
- # Time subject:
406
- # https://openlibrary.org/subjects/time:middle_ages.json?limit=5
407
-
408
- # Combine with ebooks=true to filter to only freely readable books:
409
- r3 = json.loads(http_get("https://openlibrary.org/subjects/science_fiction.json?limit=5&ebooks=true"))
410
- # r3['works'][i]['has_fulltext'] == True for all results
411
- ```
412
-
413
- ---
414
-
415
- ### Trending books
416
-
417
- ```python
418
- import json
419
- from helpers import http_get
420
-
421
- for period in ['daily', 'weekly', 'monthly']:
422
- r = json.loads(http_get(f"https://openlibrary.org/trending/{period}.json?limit=10"))
423
- # r['works'] == list of search-doc-style objects
424
- # r['days'] == int (time window)
425
- # r['hours'] == int
426
- # Same fields as search docs (title, author_name, cover_i, key, ...)
427
- print(period, r['works'][0]['title']) # e.g. 'Atomic Habits'
428
- ```
429
-
430
- ---
431
-
432
- ## Rate limits
433
-
434
- No authentication required. No API key. No explicit rate limit published.
435
-
436
- Observed in testing: 5 requests completed in ~1 second with no throttling, no 429s. The API is served from CDN/Solr — in practice you can make 10–20 parallel requests without issue. For bulk operations (hundreds of ISBNs), use `ThreadPoolExecutor(max_workers=5)` to be a good citizen.
437
-
438
- **No `User-Agent` override needed** — the default `Mozilla/5.0` from `http_get` is accepted by all Open Library endpoints (unlike Nominatim which blocks it).
439
-
440
- ---
441
-
442
- ## Gotchas
443
-
444
- **`description` field has two shapes.** Both are real — check at runtime:
445
- ```python
446
- desc = work.get('description', '')
447
- text = desc.get('value', '') if isinstance(desc, dict) else (desc or '')
448
- ```
449
-
450
- **`/works/OL45804W` is Fantastic Mr. Fox, not Dune.** The OL IDs in the original prompt were placeholders. Always resolve real IDs via the search API rather than hardcoding them.
451
-
452
- **Author search `key` has no prefix.** `/search/authors.json` returns `key: 'OL26320A'`, but the Authors API and all other APIs use `/authors/OL26320A`. Add the prefix manually when constructing follow-up URLs.
453
-
454
- **Missing cover → 43-byte GIF placeholder, not 404.** Without `?default=false`, the covers API returns a 1×1 transparent GIF instead of HTTP 404 for unknown IDs. Check `len(data) == 43` to detect missing covers.
455
-
456
- **`covers.openlibrary.org/b/olid/{work_id}` is unreliable.** OLID-based cover URLs for work IDs (OL...W) return the placeholder even when covers exist. Always use `b/id/{cover_id}` (from `work['covers'][0]`) or `b/isbn/{isbn}` instead.
457
-
458
- **Bibkeys API picks one edition per ISBN.** When the same ISBN appears on multiple editions (reprint, reissue), `api/books?bibkeys=ISBN:...` returns one — and it may not be the most common edition.
459
-
460
- **`publish_date` is a raw string.** Values like `'October 1, 1988'`, `'19/08/2017'`, `'2021'`, and `'1965-01-01'` all appear. Don't parse without normalization.
461
-
462
- **`/works/.../editions.json` pagination uses `links.next`.** Unlike search (which uses `offset=`), check `links['next']` in the response to know if more pages exist:
463
- ```python
464
- resp = json.loads(http_get("https://openlibrary.org/works/OL893415W/editions.json?limit=50"))
465
- while 'next' in resp.get('links', {}):
466
- resp = json.loads(http_get("https://openlibrary.org" + resp['links']['next']))
467
- # process resp['entries']
468
- ```
469
-
470
- **404 for non-existent IDs.** `/works/OL99999999W.json`, `/books/OL99999999M.json`, and `/authors/OL99999999A.json` all raise `HTTPError: HTTP Error 404: Not Found`. Wrap in try/except.
471
-
472
- **Search `docs` default fields are minimal.** The default response includes ~15 fields. Add `&fields=*` to get all 100+ Solr fields (ratings, ISBNs, publishers, subjects, Goodreads IDs, etc.). Alternatively specify exactly what you need: `&fields=title,isbn,ratings_average`.
1
+ # Open Library — Book Data Extraction
2
+
3
+ `https://openlibrary.org` — Internet Archive's free book catalog. All endpoints are public JSON APIs — no auth, no browser, no scraping required.
4
+
5
+ ## Do this first
6
+
7
+ **Every task is a direct HTTP call — never open the browser.**
8
+
9
+ ```python
10
+ import json
11
+ from helpers import http_get
12
+
13
+ # Search by title
14
+ results = json.loads(http_get("https://openlibrary.org/search.json?q=dune&limit=5"))
15
+ # results['numFound'] == 49090
16
+ # results['docs'] == list of work objects
17
+ # results['start'] == 0 (offset for pagination)
18
+ ```
19
+
20
+ The search API is your entry point for everything. It returns work-level records (all editions grouped under one work). The `key` field leads to the Works API; for a specific edition, use `cover_edition_key` with the Books API or list the work's editions.
21
+
22
+ ---
23
+
24
+ ## Common workflows
25
+
26
+ ### Search by query, author, title, or ISBN
27
+
28
+ ```python
29
+ import json
30
+ from helpers import http_get
31
+
32
+ # Free-text search
33
+ r = json.loads(http_get("https://openlibrary.org/search.json?q=dune+frank+herbert&limit=5"))
34
+
35
+ # Author search
36
+ r = json.loads(http_get(
37
+ "https://openlibrary.org/search.json?author=tolkien&limit=5"
38
+ "&fields=title,author_name,first_publish_year,isbn"
39
+ ))
40
+ # fields=* returns all available fields; default returns ~15
41
+
42
+ # Title + author combined
43
+ r = json.loads(http_get(
44
+ "https://openlibrary.org/search.json?title=dune&author=frank+herbert&limit=3"
45
+ "&fields=title,author_name,edition_count,first_publish_year"
46
+ ))
47
+ # r['docs'][0]['title'] == 'Dune'
48
+ # r['docs'][0]['author_name'] == ['Frank Herbert']
49
+ # r['docs'][0]['first_publish_year'] == 1965
50
+ # r['docs'][0]['edition_count'] == 120
51
+
52
+ # ISBN lookup (0–2 docs; duplicate records of the same work can both match)
53
+ r = json.loads(http_get("https://openlibrary.org/search.json?isbn=9780743273565"))
54
+ # r['numFound'] == 2
55
+ # r['docs'][0]['title'] == 'The Great Gatsby'
56
+ # r['docs'][0]['key'] == '/works/OL468431W'
57
+ ```
58
+
59
+ **Sort options** (`&sort=`): `new` (recently added), `old`, `random`, `editions` (most editions), `scans` (most scans). Default is relevance.
60
+
61
+ **Language filter**: `&language=fre` (ISO 639-2/B codes: `eng`, `fre`, `ger`, `spa`, `ita`, etc.)
62
+
63
+ **Pagination**: `&limit=N&offset=N`. Max limit not enforced but keep under 100 for reliability.
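Hand-assembled query strings break on spaces and punctuation. The following is a minimal sketch that builds `/search.json` URLs with `urllib.parse.urlencode` and walks pages with `limit`/`offset` until it reaches `numFound` or a cap. It assumes you pass in `http_get` (or any function that takes a URL and returns the response body as a string). `search_url` and `iter_search_docs` are hypothetical helper names, not part of Open Library or `helpers`.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode

SEARCH = "https://openlibrary.org/search.json"

def search_url(limit: int = 20, offset: int = 0, **params: str) -> str:
    """Build a /search.json URL; urlencode escapes spaces (as +) and commas."""
    query = dict(params, limit=limit, offset=offset)
    return f"{SEARCH}?{urlencode(query)}"

def iter_search_docs(fetch: Callable[[str], str], page_size: int = 50,
                     max_docs: int = 200, **params: str) -> Iterator[dict]:
    """Yield docs across pages using limit/offset, stopping at numFound or max_docs."""
    offset = 0
    while offset < max_docs:
        page = json.loads(fetch(search_url(limit=page_size, offset=offset, **params)))
        docs = page.get('docs', [])
        yield from docs[: max_docs - offset]  # never yield more than max_docs in total
        offset += len(docs)
        if not docs or offset >= page.get('numFound', 0):
            break
```

Usage: `for doc in iter_search_docs(http_get, q='dune', fields='title,key', max_docs=100): ...`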
64
+
65
+ #### Search doc fields (default response, about 15 keys; optional ones such as `cover_i` and `ia` can be absent)
66
+
67
+ | Field | Type | Notes |
68
+ |---|---|---|
69
+ | `key` | str | `/works/OL893415W` — use for Works API |
70
+ | `title` | str | Work title |
71
+ | `author_name` | list[str] | e.g. `['Frank Herbert']` |
72
+ | `author_key` | list[str] | e.g. `['OL79034A']` |
73
+ | `first_publish_year` | int | |
74
+ | `edition_count` | int | Number of editions across all languages |
75
+ | `cover_i` | int | Cover image ID — use with covers API |
76
+ | `cover_edition_key` | str | e.g. `OL7353617M` |
77
+ | `language` | list[str] | ISO codes of all editions |
78
+ | `ia` | list[str] | Internet Archive identifiers (when has_fulltext=true) |
79
+ | `ebook_access` | str | `'public'`, `'borrowable'`, `'no_ebook'` |
80
+ | `has_fulltext` | bool | |
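Optional keys such as `cover_i` or `ia` only appear when the data exists, so index search docs with `.get()`. This is a minimal sketch that flattens one doc into a fixed-shape record, assuming the field names in the table above. `summarize_doc` is a hypothetical helper, and treating `'public'`/`'borrowable'` as "readable" is an illustrative choice.

```python
def summarize_doc(doc: dict) -> dict:
    """Flatten a search doc into a fixed-shape dict, tolerating missing optional keys."""
    cover_id = doc.get('cover_i')
    return {
        'work_id': doc['key'].rsplit('/', 1)[-1],  # '/works/OL893415W' -> 'OL893415W'
        'title': doc.get('title', ''),
        'authors': doc.get('author_name', []),
        'year': doc.get('first_publish_year'),
        'editions': doc.get('edition_count', 0),
        'cover_url': f"https://covers.openlibrary.org/b/id/{cover_id}-M.jpg" if cover_id else None,
        'readable': doc.get('ebook_access') in ('public', 'borrowable'),
    }
```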
81
+
82
+ #### Extra fields with `&fields=*`
83
+
84
+ ```python
85
+ # With fields=* you also get:
86
+ # 'isbn' list[str] All ISBNs across editions
87
+ # 'publisher' list[str] All publishers ever
88
+ # 'publish_date' list[str] All publish dates (strings, inconsistent formats)
89
+ # 'publish_year' list[int] Parsed years
90
+ # 'subject' list[str] Subject headings
91
+ # 'person' list[str] Subject persons (e.g. 'Big Brother')
92
+ # 'place' list[str] Subject places
93
+ # 'time' list[str] Subject times
94
+ # 'number_of_pages_median' int Median page count across editions
95
+ # 'ratings_average' float e.g. 4.29
96
+ # 'ratings_count' int
97
+ # 'want_to_read_count' int
98
+ # 'already_read_count' int
99
+ # 'currently_reading_count' int
100
+ # 'readinglog_count' int Total of all reading log entries
101
+ # 'first_sentence' list[str]
102
+ # 'id_goodreads' list[str]
103
+ # 'id_librarything' list[str]
104
+ # 'id_wikidata' list[str]
105
+ # 'ddc' list[str] Dewey Decimal Classification
106
+ # 'lcc' list[str] Library of Congress Classification
107
+ # 'lccn' list[str]
108
+ ```
109
+
110
+ ---
111
+
112
+ ### Bulk ISBN lookups (parallel)
113
+
114
+ ```python
115
+ import json
116
+ from helpers import http_get
117
+ from concurrent.futures import ThreadPoolExecutor
118
+
119
+ isbns = ['9780743273565', '9780451524935', '9780618346257']
120
+
121
+ def lookup_isbn(isbn):
122
+     url = f"https://openlibrary.org/search.json?isbn={isbn}&fields=title,author_name,first_publish_year,key"
123
+     r = json.loads(http_get(url))
124
+     if r['docs']:
125
+         d = r['docs'][0]
126
+         return {'isbn': isbn, 'title': d.get('title'), 'author': d.get('author_name', [None])[0],
127
+                 'year': d.get('first_publish_year'), 'key': d.get('key')}
128
+     return {'isbn': isbn, 'found': False}
129
+
130
+ with ThreadPoolExecutor(max_workers=5) as ex:
131
+     books = list(ex.map(lookup_isbn, isbns))
132
+
133
+ # [{'isbn': '9780743273565', 'title': 'The Great Gatsby', 'author': 'F. Scott Fitzgerald', 'year': 1920, ...},
134
+ # {'isbn': '9780451524935', 'title': 'Nineteen Eighty-Four', 'author': 'George Orwell', 'year': 1949, ...},
135
+ # {'isbn': '9780618346257', 'title': 'The Fellowship of the Ring', 'author': 'J.R.R. Tolkien', 'year': 1954, ...}]
136
+ ```
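User-supplied ISBNs often contain hyphens or spaces, and a typo silently returns `numFound == 0`. This is a minimal sketch that strips separators and validates the standard ISBN-10 (mod 11) and ISBN-13 (mod 10) check digits before you spend a request. `normalize_isbn` is a hypothetical helper, not part of `helpers`.

```python
import re
from typing import Optional

def normalize_isbn(raw: str) -> Optional[str]:
    """Return the ISBN without separators if its check digit is valid, else None."""
    s = re.sub(r'[\s-]', '', raw).upper()
    if re.fullmatch(r'\d{13}', s):
        # ISBN-13: weights alternate 1, 3, 1, 3, ...; weighted sum must be divisible by 10
        total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s))
        return s if total % 10 == 0 else None
    if re.fullmatch(r'\d{9}[\dX]', s):
        # ISBN-10: weights 10 down to 1, 'X' = 10 in the check position; sum divisible by 11
        total = sum((10 - i) * (10 if c == 'X' else int(c)) for i, c in enumerate(s))
        return s if total % 11 == 0 else None
    return None

# Clean and de-duplicate before handing the list to the ThreadPoolExecutor above
raw_isbns = ['978-0-7432-7356-5', '0-451-52493-4', 'not an isbn']
clean_isbns = sorted({n for n in map(normalize_isbn, raw_isbns) if n})
```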
137
+
138
+ ---
139
+
140
+ ### Works API (editions grouped by title)
141
+
142
+ Returns the work-level record shared by all editions (title, subjects, description, cover IDs, authors). Get the work ID from `key` in search results.
143
+
144
+ ```python
145
+ import json
146
+ from helpers import http_get
147
+
148
+ work_id = 'OL893415W' # from search doc['key'] = '/works/OL893415W'
149
+ work = json.loads(http_get(f"https://openlibrary.org/works/{work_id}.json"))
150
+
151
+ # work['title'] == 'Dune'
152
+ # work['key'] == '/works/OL893415W'
153
+ # work['covers'] == [11481354, 12375564, 11157826] ← cover IDs for covers API
154
+ # work['subjects'] == ['Dune (Imaginary place)', 'Fiction', ...]
155
+ # work['subject_places'] == [...] ← geographic subjects (may be absent)
156
+ # work['subject_people'] == [...] ← person subjects (may be absent)
157
+ # work['subject_times'] == [...] ← time subjects (may be absent)
158
+ # work['authors'] == [{'author': {'key': '/authors/OL79034A'}, 'type': {...}}]
159
+ # work['description'] → either str OR {'type': '/type/text', 'value': str} ← see gotchas
160
+ # work['created'] == {'type': '/type/datetime', 'value': '2009-10-15T11:34:21.437031'}
161
+ # work['last_modified'] same shape as created
162
+ ```
163
+
164
+ Helper for the description field (which has two possible shapes):
165
+
166
+ ```python
167
+ def get_description(work: dict) -> str:
168
+     desc = work.get('description', '')
169
+     if isinstance(desc, dict):
170
+         return desc.get('value', '')
171
+     return desc or ''
172
+ ```
173
+
174
+ #### Works editions (paginated list of all editions)
175
+
176
+ ```python
177
+ editions_resp = json.loads(http_get(
178
+ f"https://openlibrary.org/works/{work_id}/editions.json?limit=10&offset=0"
179
+ ))
180
+ # editions_resp['size'] == 120 (total edition count)
181
+ # editions_resp['entries'] == [...] (up to limit items)
182
+ # editions_resp['links'] == {'self': '...', 'work': '...', 'next': '...', 'prev': '...'}
183
+ # ← use links['next'] for pagination when offset+limit < size
184
+
185
+ e = editions_resp['entries'][0]
186
+ # e['title'] == 'Duna'
187
+ # e['publishers'] == ['Editora Aleph']
188
+ # e['publish_date'] == '19/08/2017' ← inconsistent format, string
189
+ # e['isbn_13'] == ['9788576573135']
190
+ # e['isbn_10'] == ['857657313X']
191
+ # e['covers'] == [10368109]
192
+ # e['number_of_pages'] == 680
193
+ # e['languages'] == [{'key': '/languages/por'}]
194
+ # e['key'] == '/books/OL28969075M'
195
+ # e['physical_format'] == 'Paperback' (often missing)
196
+ # e['notes'] → str or {'value': str} (often missing)
197
+ ```
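Edition entries are sparse (`physical_format`, `notes` and others are often missing), and they nest language codes inside `{'key': '/languages/...'}` objects. This is a minimal sketch that flattens one entry from `editions_resp['entries']`. Preferring ISBN-13 over ISBN-10 and taking the first cover ID are illustrative choices, and `flatten_edition` is a hypothetical helper.

```python
def flatten_edition(e: dict) -> dict:
    """Flatten one edition entry; every key except 'key' is treated as optional."""
    notes = e.get('notes', '')
    return {
        'edition_id': e['key'].rsplit('/', 1)[-1],                   # '/books/OL28969075M' -> 'OL28969075M'
        'title': e.get('title', ''),
        'isbn': (e.get('isbn_13') or e.get('isbn_10') or [None])[0],  # prefer ISBN-13
        'publishers': e.get('publishers', []),
        'publish_date': e.get('publish_date'),                        # raw string, see gotchas
        'pages': e.get('number_of_pages'),
        'languages': [lang['key'].rsplit('/', 1)[-1] for lang in e.get('languages', [])],
        'notes': notes.get('value', '') if isinstance(notes, dict) else notes,
        'cover_id': (e.get('covers') or [None])[0],
    }
```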
198
+
199
+ ---
200
+
201
+ ### Books API (specific edition)
202
+
203
+ Two sub-APIs: direct JSON for raw data, or `api/books` for enriched data.
204
+
205
+ #### Direct edition JSON
206
+
207
+ ```python
208
+ import json
209
+ from helpers import http_get
210
+
211
+ edition_id = 'OL7353617M' # from editions list e['key'] or cover_edition_key in search
212
+ edition = json.loads(http_get(f"https://openlibrary.org/books/{edition_id}.json"))
213
+
214
+ # edition['title'] == 'Fantastic Mr. Fox'
215
+ # edition['publishers'] == ['Puffin']
216
+ # edition['publish_date'] == 'October 1, 1988'
217
+ # edition['isbn_13'] == ['9780140328721']
218
+ # edition['isbn_10'] == ['0140328726']
219
+ # edition['number_of_pages'] == 96
220
+ # edition['covers'] == [...] ← cover IDs
221
+ # edition['languages'] == [{'key': '/languages/eng'}]
222
+ # edition['works'] == [{'key': '/works/OL45804W'}]
223
+ # edition['authors'] == [{'key': '/authors/OL34184A'}]
224
+ # edition['identifiers'] == {'goodreads': [...], 'librarything': [...]}
225
+ # edition['first_sentence'] == {'value': '...'} or str (often missing)
226
+ # edition['ocaid'] == 'fantast00dahl' ← Internet Archive ID (if available)
227
+ ```
228
+
229
+ #### Bibkeys API (enriched, multiple books at once)
230
+
231
+ ```python
232
+ # jscmd=data: cleaned up dict with cover URLs pre-built
233
+ r = json.loads(http_get(
234
+ "https://openlibrary.org/api/books"
235
+ "?bibkeys=ISBN:9780743273565,ISBN:9780451524935"
236
+ "&format=json&jscmd=data"
237
+ ))
238
+ # r == {'ISBN:9780743273565': {...}, 'ISBN:9780451524935': {...}}
239
+
240
+ book = r['ISBN:9780743273565']
241
+ # book['title'] == 'The Great Gatsby'
242
+ # book['authors'] == [{'url': '...', 'name': 'F. Scott Fitzgerald'}]
243
+ # book['publish_date'] == '2021'
244
+ # book['publishers'] == [{'name': 'Independently Published'}]
245
+ # book['number_of_pages'] == 208
246
+ # book['url'] == 'http://openlibrary.org/books/OL46773254M/The_Great_Gatsby'
247
+ # book['key'] == '/books/OL46773254M'
248
+ # book['cover'] == {'small': '...S.jpg', 'medium': '...M.jpg', 'large': '...L.jpg'}
249
+ # book['identifiers'] == {'isbn_13': [...], 'openlibrary': [...]}
250
+ # book['subjects'] == [{'name': 'Modern fiction', 'url': '...'}, ...]
251
+ # book['subject_places'] == None ← often null even with jscmd=data
252
+
253
+ # jscmd=details: raw edition JSON + extra fields
254
+ r2 = json.loads(http_get(
255
+ "https://openlibrary.org/api/books"
256
+ "?bibkeys=ISBN:9780743273565&format=json&jscmd=details"
257
+ ))
258
+ item = r2['ISBN:9780743273565']
259
+ # item['bib_key'] == 'ISBN:9780743273565'
260
+ # item['info_url'] == 'http://openlibrary.org/books/OL...'
261
+ # item['preview'] == 'noview' | 'restricted' | 'full'
262
+ # item['preview_url'] == URL to read on OL or IA
263
+ # item['thumbnail_url'] == 'https://covers.openlibrary.org/b/id/14314120-S.jpg'
264
+ # item['details'] → raw edition JSON (same as /books/OL...M.json)
265
+
266
+ # Supported bibkey prefixes: ISBN:, OCLC:, LCCN:, OLID: (e.g. OLID:OL46773254M)
267
+ ```
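`bibkeys` takes a comma-separated list, so many ISBNs can share one request. This sketch batches ISBNs into `api/books` URLs and reads results back by key. The batch size of 50 is an assumption because these notes give no documented maximum. `bibkeys_urls` and `lookup_in` are hypothetical helpers.

```python
from typing import Optional
from urllib.parse import urlencode

def bibkeys_urls(isbns: list, jscmd: str = 'data', chunk_size: int = 50) -> list:
    """One api/books URL per batch of ISBNs (chunk_size is an assumed safe batch size)."""
    urls = []
    for i in range(0, len(isbns), chunk_size):
        keys = ','.join(f'ISBN:{isbn}' for isbn in isbns[i:i + chunk_size])
        # keep ':' and ',' literal so the URL matches the form shown above
        query = urlencode({'bibkeys': keys, 'format': 'json', 'jscmd': jscmd}, safe=':,')
        urls.append(f"https://openlibrary.org/api/books?{query}")
    return urls

def lookup_in(response: dict, isbn: str) -> Optional[dict]:
    """Results are keyed 'ISBN:<isbn>'; returns None when that key is absent."""
    return response.get(f'ISBN:{isbn}')
```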
268
+
269
+ ---
270
+
271
+ ### Authors API
272
+
273
+ ```python
274
+ import json
275
+ from helpers import http_get
276
+
277
+ # Lookup by known author key
278
+ author = json.loads(http_get("https://openlibrary.org/authors/OL26320A.json"))
279
+ # OL26320A is J.R.R. Tolkien (not Frank Herbert); resolve other author keys via search rather than guessing
280
+
281
+ # author['name'] == 'J.R.R. Tolkien'
282
+ # author['fuller_name'] == 'John Ronald Reuel Tolkien'
283
+ # author['personal_name'] == 'J. R. R. Tolkien'
284
+ # author['birth_date'] == '3 January 1892' ← string, not parsed
285
+ # author['death_date'] == '2 September 1973'
286
+ # author['bio'] → str or {'type': '/type/text', 'value': str}
287
+ # author['photos'] == [6155606, 6433524, ...] ← photo IDs for covers API
288
+ # author['links'] == [{'title': '...', 'url': '...', 'type': {...}}, ...]
289
+ # author['remote_ids'] == {'wikidata': 'Q892', 'viaf': '95218067', ...}
290
+ # author['alternate_names'] == ['J. R. R. Tolkien', 'TOLKIEN', ...]
291
+ # author['key'] == '/authors/OL26320A'
292
+ # author['wikipedia'] → URL string or None
293
+
294
+ # Author works (paginated)
295
+ works = json.loads(http_get("https://openlibrary.org/authors/OL26320A/works.json?limit=5"))
296
+ # works['size'] == 415
297
+ # works['entries'] == [{title, key, covers, authors, created, ...}, ...]
298
+ # works['links'] == {'self': '...', 'next': '...'}
299
+ ```
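Both `/authors/{id}/works.json` and `/works/{id}/editions.json` return `entries` plus a `links` dict. This is a minimal generator that follows `links['next']` until the key disappears. It assumes that `next` is a path relative to `https://openlibrary.org`, which is also what the editions pagination gotcha assumes, and it passes absolute URLs through unchanged just in case. `fetch` is any URL-to-string function such as `http_get`. `iter_entries` is a hypothetical helper.

```python
import json
from typing import Callable, Iterator

BASE = "https://openlibrary.org"

def iter_entries(fetch: Callable[[str], str], path: str) -> Iterator[dict]:
    """Yield every entry of a paginated list endpoint, following links['next']."""
    url = BASE + path
    while url:
        page = json.loads(fetch(url))
        yield from page.get('entries', [])
        nxt = page.get('links', {}).get('next')
        if not nxt:
            break
        url = nxt if nxt.startswith('http') else BASE + nxt

# Usage: all of Tolkien's works, 50 per request
# works = list(iter_entries(http_get, "/authors/OL26320A/works.json?limit=50"))
```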
300
+
301
+ #### Author search
302
+
303
+ ```python
304
+ r = json.loads(http_get("https://openlibrary.org/search/authors.json?q=tolkien"))
305
+ # r['numFound'] == 40
306
+ # r['docs'][0]:
307
+ # 'name' == 'Christopher Tolkien'
308
+ # 'key' == 'OL2623360A' ← NOTE: no /authors/ prefix here
309
+ # 'birth_date' == '21 November 1924'
310
+ # 'death_date' == '16 January 2020'
311
+ # 'top_work' == 'The War of the Ring'
312
+ # 'work_count' == 43
313
+ # 'top_subjects' == [...]
314
+ # 'alternate_names' == [...]
315
+ # 'ratings_average' float
316
+ # 'want_to_read_count' int
317
+ # 'already_read_count' int
318
+ # 'currently_reading_count' int
319
+ ```
320
+
321
+ ---
322
+
323
+ ### Cover images
324
+
325
+ Covers are served as JPEG via a redirect to the Internet Archive CDN. No auth needed.
326
+
327
+ ```
328
+ # Book covers — three key types:
329
+ https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg # by cover ID (most reliable)
330
+ https://covers.openlibrary.org/b/isbn/{isbn}-{size}.jpg # by ISBN
331
+ https://covers.openlibrary.org/b/olid/{edition_id}-{size}.jpg # by edition OLID (OL...M); work OLIDs (OL...W) return the placeholder, see gotchas
332
+
333
+ # Author photos:
334
+ https://covers.openlibrary.org/a/id/{photo_id}-{size}.jpg
335
+
336
+ # Sizes: S (small), M (medium), L (large)
337
+ ```
338
+
339
+ ```python
340
+ import urllib.request
341
+
342
+ def get_cover_bytes(cover_id: int, size: str = 'M') -> bytes | None:
343
+     """Fetch cover image bytes. Returns None if no cover (43-byte GIF placeholder)."""
344
+     url = f"https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg"
345
+     req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
346
+     with urllib.request.urlopen(req, timeout=15) as resp:
347
+         data = resp.read()
348
+     return None if len(data) == 43 else data  # 43-byte GIF = no cover placeholder
349
+
350
+ # Or just get the URL for embedding:
351
+ def cover_url(cover_id: int, size: str = 'M') -> str:
352
+     return f"https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg"
353
+
354
+ # Usage:
355
+ from helpers import http_get
356
+ import json
357
+ work = json.loads(http_get("https://openlibrary.org/works/OL893415W.json"))
358
+ if work.get('covers'):
359
+     img = get_cover_bytes(work['covers'][0], 'L')  # first cover, large
360
+     # img is ~20–80KB JPEG bytes, redirected from ia*.archive.org
361
+ ```
362
+
363
+ To get a cover by ISBN directly (e.g. for a UI that skips the full book lookup):
364
+ ```python
365
+ isbn = '9780743273565'  # example ISBN (The Great Gatsby, used in the search examples above)
+ # Medium-size cover by ISBN:
366
+ url = f"https://covers.openlibrary.org/b/isbn/{isbn}-M.jpg"
367
+ # Redirects to Internet Archive CDN, content-type: image/jpeg
368
+ # Use ?default=false to get 404 instead of 1×1 GIF placeholder for missing covers
369
+ url_safe = f"https://covers.openlibrary.org/b/isbn/{isbn}-M.jpg?default=false"
370
+ ```
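This sketch combines the cover rules into one lookup. It prefers a cover ID (from `cover_i` or `covers[0]`) and falls back to the ISBN form with `?default=false`, so a missing cover becomes a 404 instead of the 43-byte placeholder. `best_cover_url` and `is_placeholder` are hypothetical helpers.

```python
from typing import Optional

COVERS = "https://covers.openlibrary.org/b"
PLACEHOLDER_BYTES = 43  # size of the 1x1 GIF returned for missing covers

def best_cover_url(cover_id: Optional[int] = None, isbn: Optional[str] = None,
                   size: str = 'M') -> Optional[str]:
    """Cover-ID URL if available, else an ISBN URL that 404s when no cover exists."""
    if size not in ('S', 'M', 'L'):
        raise ValueError("size must be 'S', 'M' or 'L'")
    if cover_id:
        return f"{COVERS}/id/{cover_id}-{size}.jpg"
    if isbn:
        return f"{COVERS}/isbn/{isbn}-{size}.jpg?default=false"
    return None

def is_placeholder(data: bytes) -> bool:
    """True for the placeholder GIF served when default=false is not set."""
    return len(data) == PLACEHOLDER_BYTES
```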
371
+
372
+ ---
373
+
374
+ ### Subjects API
375
+
376
+ ```python
377
+ import json
378
+ from helpers import http_get
379
+
380
+ # Subject slugs: lowercase, underscores for spaces
381
+ r = json.loads(http_get("https://openlibrary.org/subjects/science_fiction.json?limit=5"))
382
+ # r['name'] == 'science fiction'
383
+ # r['subject_type'] == 'subject' ← also: 'person', 'place', 'time'
384
+ # r['work_count'] == 20973
385
+ # r['works'] == [{title, key, cover_id, authors, edition_count, ...}, ...]
386
+
387
+ w = r['works'][0]
388
+ # w['title'] == "Alice's Adventures in Wonderland"
389
+ # w['key'] == '/works/OL138052W'
390
+ # w['cover_id'] == 10527843
391
+ # w['cover_edition_key'] == 'OL...'
392
+ # w['authors'] == [{'key': '/authors/OL22098A', 'name': 'Lewis Carroll'}]
393
+ # w['edition_count'] == 3546
394
+ # w['first_publish_year'] == ...
395
+ # w['has_fulltext'] == True | False
396
+ # w['ia'] == 'identifier' (Internet Archive ID when available)
397
+
398
+ # Pagination: &offset=N
399
+ # Place subject:
400
+ r2 = json.loads(http_get("https://openlibrary.org/subjects/place:london.json?limit=5"))
401
+ # r2['subject_type'] == 'place', r2['work_count'] == 23927
402
+
403
+ # Person subject:
404
+ # https://openlibrary.org/subjects/person:napoleon.json?limit=5
405
+ # Time subject:
406
+ # https://openlibrary.org/subjects/time:middle_ages.json?limit=5
407
+
408
+ # Add ebooks=true to keep only works that have an ebook (public or borrowable):
409
+ r3 = json.loads(http_get("https://openlibrary.org/subjects/science_fiction.json?limit=5&ebooks=true"))
410
+ # r3['works'][i]['has_fulltext'] == True for all results
411
+ ```
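The slug rules above (lowercase, underscores for spaces, optional `person:`/`place:`/`time:` prefix) can be wrapped in a small builder. This sketch only lowercases and replaces whitespace, because these notes do not say how punctuation such as apostrophes should be slugged. `subject_url` is a hypothetical helper.

```python
import re
from typing import Optional

def subject_url(name: str, kind: Optional[str] = None, limit: int = 5) -> str:
    """Build a Subjects API URL: lowercase, whitespace -> underscores, optional type prefix."""
    slug = re.sub(r'\s+', '_', name.strip().lower())
    if kind is not None:
        if kind not in ('person', 'place', 'time'):
            raise ValueError(f"unknown subject type: {kind}")
        slug = f"{kind}:{slug}"
    return f"https://openlibrary.org/subjects/{slug}.json?limit={limit}"
```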
412
+
413
+ ---
414
+
415
+ ### Trending books
416
+
417
+ ```python
418
+ import json
419
+ from helpers import http_get
420
+
421
+ for period in ['daily', 'weekly', 'monthly']:
422
+     r = json.loads(http_get(f"https://openlibrary.org/trending/{period}.json?limit=10"))
423
+     # r['works'] == list of search-doc-style objects
424
+     # r['days'] == int (time window)
425
+     # r['hours'] == int
426
+     # Same fields as search docs (title, author_name, cover_i, key, ...)
427
+     print(period, r['works'][0]['title'])  # e.g. 'Atomic Habits'
428
+ ```
429
+
430
+ ---
431
+
432
+ ## Rate limits
433
+
434
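+ No authentication required. No API key. The JSON APIs publish no explicit rate limit; the Covers API documentation does limit lookups by ISBN and other non-ID keys to 100 requests per IP every 5 minutes (cover-ID and OLID lookups are exempt), so prefer `b/id/{cover_id}` URLs for bulk cover work.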
435
+
436
+ Observed in testing: 5 requests completed in about 1 second with no throttling and no 429s. The API is served from CDN/Solr, and 10–20 parallel requests appeared to work without issue, but for bulk operations (hundreds of ISBNs) use `ThreadPoolExecutor(max_workers=5)` to be a good citizen.
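The JSON APIs publish no rate limit, so a defensive bulk pattern is worth having. This minimal sketch retries 429 and 5xx responses with exponential backoff and fans out with `ThreadPoolExecutor(max_workers=5)`. It assumes `fetch` (e.g. `http_get`) raises `urllib.error.HTTPError` on error statuses, as the 404 gotcha describes. The retry policy, the status set and the helper names (`fetch_with_retry`, `fetch_all`) are assumptions, not Open Library guidance.

```python
import time
import urllib.error
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

RETRYABLE = {429, 500, 502, 503, 504}  # assumed transient statuses

def fetch_with_retry(fetch: Callable[[str], str], url: str,
                     attempts: int = 3, backoff: float = 1.0) -> str:
    """Call fetch(url), retrying transient HTTP errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE or attempt == attempts - 1:
                raise  # permanent error (e.g. 404) or out of retries
            time.sleep(backoff * 2 ** attempt)  # 1s, 2s, 4s, ... with the default backoff
    raise ValueError("attempts must be >= 1")

def fetch_all(fetch: Callable[[str], str], urls: Iterable[str],
              max_workers: int = 5, **retry_kw) -> list:
    """Fetch URLs concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        return list(ex.map(lambda u: fetch_with_retry(fetch, u, **retry_kw), urls))
```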
437
+
438
+ **No `User-Agent` override needed** — the default `Mozilla/5.0` from `http_get` is accepted by all Open Library endpoints (unlike Nominatim which blocks it).
439
+
440
+ ---
441
+
442
+ ## Gotchas
443
+
444
+ **`description` field has two shapes.** Both are real — check at runtime:
445
+ ```python
446
+ desc = work.get('description', '')
447
+ text = desc.get('value', '') if isinstance(desc, dict) else (desc or '')
448
+ ```
449
+
450
+ **`/works/OL45804W` is Fantastic Mr. Fox, not Dune.** OL IDs are opaque, so a guessed or remembered ID can resolve to an unrelated record (Dune is `/works/OL893415W`). Always resolve real IDs via the search API rather than hardcoding them.
451
+
452
+ **Author search `key` has no prefix.** `/search/authors.json` returns `key: 'OL26320A'`, but the Authors API and all other APIs use `/authors/OL26320A`. Add the prefix manually when constructing follow-up URLs.
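The suffix of an OL ID already indicates its type (`A` author, `W` work, `M` edition, matching the IDs used throughout these notes). This means the prefix can be restored mechanically. This sketch rejects anything that does not look like an OL ID. `full_key` is a hypothetical helper.

```python
import re

_OL_KEY = re.compile(r'(?:/(?:authors|works|books)/)?(OL\d+[AWM])')
_TYPE_BY_SUFFIX = {'A': 'authors', 'W': 'works', 'M': 'books'}

def full_key(key: str) -> str:
    """'OL26320A' or '/authors/OL26320A' -> '/authors/OL26320A' (type taken from the suffix)."""
    m = _OL_KEY.fullmatch(key.strip())
    if not m:
        raise ValueError(f"not an Open Library key: {key!r}")
    olid = m.group(1)
    return f"/{_TYPE_BY_SUFFIX[olid[-1]]}/{olid}"

# Usage with an author-search doc:
# author = json.loads(http_get(f"https://openlibrary.org{full_key(doc['key'])}.json"))
```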
453
+
454
+ **Missing cover → 43-byte GIF placeholder, not 404.** Without `?default=false`, the covers API returns a 1×1 transparent GIF instead of HTTP 404 for unknown IDs. Check `len(data) == 43` to detect missing covers.
455
+
456
+ **`covers.openlibrary.org/b/olid/{work_id}` is unreliable.** OLID-based cover URLs for work IDs (OL...W) return the placeholder even when covers exist. Always use `b/id/{cover_id}` (from `work['covers'][0]`) or `b/isbn/{isbn}` instead.
457
+
458
+ **Bibkeys API picks one edition per ISBN.** When the same ISBN appears on multiple editions (reprint, reissue), `api/books?bibkeys=ISBN:...` returns one — and it may not be the most common edition.
459
+
460
+ **`publish_date` is a raw string.** Values like `'October 1, 1988'`, `'19/08/2017'`, `'2021'`, and `'1965-01-01'` all appear. Don't parse without normalization.
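When only the year matters, extracting the first standalone four-digit number handles all four formats listed above. This sketch does not attempt day or month parsing, because their order varies between records. `publish_year` is a hypothetical helper.

```python
import re
from typing import Optional

def publish_year(publish_date: Optional[str]) -> Optional[int]:
    """Pull the first standalone 4-digit year out of a free-form publish_date string."""
    if not publish_date:
        return None
    m = re.search(r'(?<!\d)(\d{4})(?!\d)', publish_date)
    return int(m.group(1)) if m else None
```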
461
+
462
+ **`/works/.../editions.json` pagination uses `links.next`.** Rather than computing `offset=` yourself as with search, follow `links['next']` until it disappears, and process every page, including the first:
463
+ ```python
464
+ url = "https://openlibrary.org/works/OL893415W/editions.json?limit=50"
+ entries = []
+ while url:
+     resp = json.loads(http_get(url))
+     entries.extend(resp['entries'])  # process every page, including the first
+     nxt = resp.get('links', {}).get('next')
+     url = "https://openlibrary.org" + nxt if nxt else None
468
+ ```
469
+
470
+ **404 for non-existent IDs.** `/works/OL99999999W.json`, `/books/OL99999999M.json`, and `/authors/OL99999999A.json` all raise `HTTPError: HTTP Error 404: Not Found`. Wrap in try/except.
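This minimal wrapper turns those 404s into `None` and lets other errors propagate. It assumes, as the note says, that the fetch function raises `urllib.error.HTTPError`. `get_json_or_none` is a hypothetical helper; pass `http_get` as `fetch`.

```python
import json
import urllib.error
from typing import Callable, Optional

def get_json_or_none(fetch: Callable[[str], str], url: str) -> Optional[dict]:
    """Return parsed JSON, or None when the record does not exist (HTTP 404)."""
    try:
        return json.loads(fetch(url))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise  # 5xx and other errors are not "missing record"
```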
471
+
472
+ **Search `docs` default fields are minimal.** The default response includes ~15 fields. Add `&fields=*` to get all 100+ Solr fields (ratings, ISBNs, publishers, subjects, Goodreads IDs, etc.). Alternatively specify exactly what you need: `&fields=title,isbn,ratings_average`.