@pencil-agent/nano-pencil 2.0.0 → 2.0.1

Files changed (195)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/mcp/mcp-client.d.ts +3 -1
  7. package/dist/core/mcp/mcp-client.js +6 -6
  8. package/dist/core/mcp/mcp-config.d.ts +3 -3
  9. package/dist/core/mcp/mcp-config.js +1 -1
  10. package/dist/core/mcp/mcp-manager.d.ts +5 -1
  11. package/dist/core/mcp/mcp-manager.js +1 -1
  12. package/dist/core/platform/config/resource-loader.d.ts +2 -0
  13. package/dist/core/platform/config/resource-loader.js +2 -2
  14. package/dist/core/runtime/agent-session.d.ts +12 -0
  15. package/dist/core/runtime/agent-session.js +8 -8
  16. package/dist/core/runtime/sdk.d.ts +8 -0
  17. package/dist/core/runtime/sdk.js +1 -1
  18. package/dist/extensions/builtin/AGENT.md +115 -115
  19. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  20. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  91. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  92. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  93. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  94. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  95. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  96. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  97. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  98. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  99. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  100. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  101. package/dist/extensions/builtin/browser/browser.md +73 -73
  102. package/dist/extensions/builtin/browser/install.md +142 -142
  103. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  104. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  105. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  107. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  108. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  109. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  110. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  111. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  112. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  113. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  114. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  115. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  116. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  117. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  118. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  119. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  120. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  121. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  122. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  123. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  124. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  125. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  126. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  127. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  128. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  129. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  130. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  131. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  132. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  133. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  134. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  135. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  136. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  137. package/dist/extensions/builtin/goal/README.md +67 -67
  138. package/dist/extensions/builtin/grub/README.md +112 -112
  139. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  140. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  141. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  142. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  143. package/dist/extensions/builtin/link-world/network-routing/network-routing.md +67 -67
  144. package/dist/extensions/builtin/loop/README.md +92 -92
  145. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  146. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  147. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  148. package/dist/extensions/builtin/sal/README.md +72 -72
  149. package/dist/extensions/builtin/security-audit/README.md +289 -289
  150. package/dist/extensions/builtin/team/AGENT.md +112 -112
  151. package/dist/extensions/builtin/team/TESTING.md +299 -299
  152. package/dist/extensions/builtin/token-save/README.md +56 -56
  153. package/dist/extensions/optional/AGENT.md +10 -10
  154. package/dist/modes/interactive/interactive-mode.js +36 -36
  155. package/dist/modes/interactive/theme/dark.json +85 -85
  156. package/dist/modes/interactive/theme/light.json +84 -84
  157. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  158. package/dist/modes/interactive/theme/warm.json +81 -81
  159. package/dist/node_modules/@pencil-agent/agent-core/dist/agent-loop.js +3 -2
  160. package/dist/node_modules/@pencil-agent/agent-core/dist/structured-adaptive-agent-loop.js +2 -1
  161. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  162. package/docs/cc-agent-design.md +1297 -0
  163. package/docs/cc-tui-design.md +1333 -0
  164. package/docs/codex-goal-command-impl.md +1055 -1055
  165. package/docs/codex-goal-vs-grub.md +500 -500
  166. package/docs/custom-provider.md +27 -27
  167. package/docs/extensions.md +27 -27
  168. package/docs/keybindings.md +27 -27
  169. package/docs/loop /351/207/215/346/236/204/345/256/214/346/210/220/346/200/273/347/273/223.md" +250 -250
  170. package/docs/loop /351/207/215/346/236/204/345/256/214/346/210/220/346/212/245/345/221/212.md" +122 -122
  171. package/docs/loop /351/207/215/346/236/204/346/226/271/346/241/210.md" +1222 -1222
  172. package/docs/loop /351/207/215/346/236/204/346/226/271/346/241/210/345/256/236/347/216/260/346/212/245/345/221/212.md" +158 -158
  173. package/docs/loop /351/207/215/346/236/204/346/226/271/346/241/210/345/257/271/346/257/224/345/210/206/346/236/220.md" +128 -128
  174. package/docs/loop /351/207/215/346/236/204/350/256/241/345/210/222.md" +320 -320
  175. package/docs/loop-usage-examples.md +214 -214
  176. package/docs/models.md +27 -27
  177. package/docs/nanoPencil-/345/255/246/344/271/240/350/256/241/345/210/222.md +170 -0
  178. package/docs/packages.md +27 -27
  179. package/docs/pi-design-philosophy.md +457 -457
  180. package/docs/planmode.md +1987 -1987
  181. package/docs/prompt-templates.md +27 -27
  182. package/docs/providers.md +27 -27
  183. package/docs/scan-report.md +3820 -0
  184. package/docs/sdk.md +27 -27
  185. package/docs/skills.md +27 -27
  186. package/docs/themes.md +27 -27
  187. package/docs/tui.md +27 -27
  188. package/docs//345/257/271/346/240/207Claude-Code.md +1775 -0
  189. package/docs//351/230/277/351/207/214/345/267/264/345/267/264/350/264/242/346/212/245/345/210/206/346/236/220/344/271/246.md +261 -0
  190. package/package.json +190 -190
  191. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +0 -851
  192. package/docs/SDK-TESTING.md +0 -364
  193. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +0 -593
  194. package/docs/startup-performance-optimization.md +0 -301
  195. package/docs//350/256/244/347/237/245/345/234/260/345/233/276.md +0 -47
@@ -1,472 +1,472 @@
1
- # Open Library — Book Data Extraction
2
-
3
- `https://openlibrary.org` — Internet Archive's free book catalog. All endpoints are public JSON APIs — no auth, no browser, no scraping required.
4
-
5
- ## Do this first
6
-
7
- **Every task is a direct HTTP call — never open the browser.**
8
-
9
- ```python
10
- import json
11
- from helpers import http_get
12
-
13
- # Search by title
14
- results = json.loads(http_get("https://openlibrary.org/search.json?q=dune&limit=5"))
15
- # results['numFound'] == 49090
16
- # results['docs'] == list of work objects
17
- # results['start'] == 0 (offset for pagination)
18
- ```
19
-
20
- The search API is your entry point for everything. It returns work-level records (all editions grouped). To get edition details, follow the `key` to the Works or Books API.
21
-
22
- ---
23
-
24
- ## Common workflows
25
-
26
- ### Search by query, author, title, or ISBN
27
-
28
- ```python
29
- import json
30
- from helpers import http_get
31
-
32
- # Free-text search
33
- r = json.loads(http_get("https://openlibrary.org/search.json?q=dune+frank+herbert&limit=5"))
34
-
35
- # Author search
36
- r = json.loads(http_get(
37
-     "https://openlibrary.org/search.json?author=tolkien&limit=5"
38
-     "&fields=title,author_name,first_publish_year,isbn"
39
- ))
40
- # fields=* returns all available fields; default returns ~15
41
-
42
- # Title + author combined
43
- r = json.loads(http_get(
44
-     "https://openlibrary.org/search.json?title=dune&author=frank+herbert&limit=3"
45
-     "&fields=title,author_name,edition_count,first_publish_year"
46
- ))
47
- # r['docs'][0]['title'] == 'Dune'
48
- # r['docs'][0]['author_name'] == ['Frank Herbert']
49
- # r['docs'][0]['first_publish_year'] == 1965
50
- # r['docs'][0]['edition_count'] == 120
51
-
52
- # ISBN lookup (returns 0–2 results for the same work)
53
- r = json.loads(http_get("https://openlibrary.org/search.json?isbn=9780743273565"))
54
- # r['numFound'] == 2
55
- # r['docs'][0]['title'] == 'The Great Gatsby'
56
- # r['docs'][0]['key'] == '/works/OL468431W'
57
- ```
58
-
59
- **Sort options** (`&sort=`): `new` (recently added), `old`, `random`, `editions` (most editions), `scans` (most scans). Default is relevance.
60
-
61
- **Language filter**: `&language=fre` (ISO 639-2/B codes: `eng`, `fre`, `ger`, `spa`, `ita`, etc.)
62
-
63
- **Pagination**: `&limit=N&offset=N`. Max limit not enforced but keep under 100 for reliability.
64
-
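The query parameters above (`q`, `author`, `title`, `isbn`, `fields`, `limit`, `offset`) can be assembled programmatically. The following is a minimal sketch, not part of the original skill file: `search_url` and `page_offsets` are hypothetical helper names, and the default page size of 50 is an assumption chosen to stay under the suggested limit of 100.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://openlibrary.org/search.json"

def search_url(limit=50, offset=0, fields=None, **params):
    """Build a search.json URL (hypothetical helper).

    `params` are query keys described above, e.g. q, author, title, isbn, sort.
    """
    query = dict(params, limit=limit, offset=offset)
    if fields:
        query["fields"] = ",".join(fields)
    # safe="," keeps the comma-separated fields list readable in the URL
    return f"{SEARCH_URL}?{urlencode(query, safe=',')}"

def page_offsets(num_found, limit=50):
    """Offsets needed to walk `num_found` results in pages of `limit`."""
    return list(range(0, num_found, limit))
```

A caller would read `numFound` from the first response and then request each offset returned by `page_offsets`.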
65
- #### Search doc fields (default — ~15 keys always present)
66
-
67
- | Field | Type | Notes |
68
- |---|---|---|
69
- | `key` | str | `/works/OL893415W` — use for Works API |
70
- | `title` | str | Work title |
71
- | `author_name` | list[str] | e.g. `['Frank Herbert']` |
72
- | `author_key` | list[str] | e.g. `['OL79034A']` |
73
- | `first_publish_year` | int | |
74
- | `edition_count` | int | Number of editions across all languages |
75
- | `cover_i` | int | Cover image ID — use with covers API |
76
- | `cover_edition_key` | str | e.g. `OL7353617M` |
77
- | `language` | list[str] | ISO codes of all editions |
78
- | `ia` | list[str] | Internet Archive identifiers (when has_fulltext=true) |
79
- | `ebook_access` | str | `'public'`, `'borrowable'`, `'no_ebook'` |
80
- | `has_fulltext` | bool | |
81
-
82
- #### Extra fields with `&fields=*`
83
-
84
- ```python
85
- # With fields=* you also get:
86
- # 'isbn' list[str] All ISBNs across editions
87
- # 'publisher' list[str] All publishers ever
88
- # 'publish_date' list[str] All publish dates (strings, inconsistent formats)
89
- # 'publish_year' list[int] Parsed years
90
- # 'subject' list[str] Subject headings
91
- # 'person' list[str] Subject persons (e.g. 'Big Brother')
92
- # 'place' list[str] Subject places
93
- # 'time' list[str] Subject times
94
- # 'number_of_pages_median' int Median page count across editions
95
- # 'ratings_average' float e.g. 4.29
96
- # 'ratings_count' int
97
- # 'want_to_read_count' int
98
- # 'already_read_count' int
99
- # 'currently_reading_count' int
100
- # 'readinglog_count' int Total of all reading log entries
101
- # 'first_sentence' list[str]
102
- # 'id_goodreads' list[str]
103
- # 'id_librarything' list[str]
104
- # 'id_wikidata' list[str]
105
- # 'ddc' list[str] Dewey Decimal Classification
106
- # 'lcc' list[str] Library of Congress Classification
107
- # 'lccn' list[str]
108
- ```
109
-
110
- ---
111
-
112
- ### Bulk ISBN lookups (parallel)
113
-
114
- ```python
115
- import json
116
- from helpers import http_get
117
- from concurrent.futures import ThreadPoolExecutor
118
-
119
- isbns = ['9780743273565', '9780451524935', '9780618346257']
120
-
121
- def lookup_isbn(isbn):
122
-     url = f"https://openlibrary.org/search.json?isbn={isbn}&fields=title,author_name,first_publish_year,key"
123
-     r = json.loads(http_get(url))
124
-     if r['docs']:
125
-         d = r['docs'][0]
126
-         return {'isbn': isbn, 'title': d.get('title'), 'author': d.get('author_name', [None])[0],
127
-                 'year': d.get('first_publish_year'), 'key': d.get('key')}
128
-     return {'isbn': isbn, 'found': False}
129
-
130
- with ThreadPoolExecutor(max_workers=5) as ex:
131
-     books = list(ex.map(lookup_isbn, isbns))
132
-
133
- # [{'isbn': '9780743273565', 'title': 'The Great Gatsby', 'author': 'F. Scott Fitzgerald', 'year': 1920, ...},
134
- # {'isbn': '9780451524935', 'title': 'Nineteen Eighty-Four', 'author': 'George Orwell', 'year': 1949, ...},
135
- # {'isbn': '9780618346257', 'title': 'The Fellowship of the Ring', 'author': 'J.R.R. Tolkien', 'year': 1954, ...}]
136
- ```
137
-
138
- ---
139
-
140
- ### Works API (editions grouped by title)
141
-
142
- Returns all metadata for a work (all editions combined). Get the work ID from `key` in search results.
143
-
144
- ```python
145
- import json
146
- from helpers import http_get
147
-
148
- work_id = 'OL893415W' # from search doc['key'] = '/works/OL893415W'
149
- work = json.loads(http_get(f"https://openlibrary.org/works/{work_id}.json"))
150
-
151
- # work['title'] == 'Dune'
152
- # work['key'] == '/works/OL893415W'
153
- # work['covers'] == [11481354, 12375564, 11157826] ← cover IDs for covers API
154
- # work['subjects'] == ['Dune (Imaginary place)', 'Fiction', ...]
155
- # work['subject_places'] == [...] ← geographic subjects (may be absent)
156
- # work['subject_people'] == [...] ← person subjects (may be absent)
157
- # work['subject_times'] == [...] ← time subjects (may be absent)
158
- # work['authors'] == [{'author': {'key': '/authors/OL79034A'}, 'type': {...}}]
159
- # work['description'] → either str OR {'type': '/type/text', 'value': str} ← see gotchas
160
- # work['created'] == {'type': '/type/datetime', 'value': '2009-10-15T11:34:21.437031'}
161
- # work['last_modified'] same shape as created
162
- ```
163
-
164
- Helper for the description field (which has two possible shapes):
165
-
166
- ```python
167
- def get_description(work: dict) -> str:
168
-     desc = work.get('description', '')
169
-     if isinstance(desc, dict):
170
-         return desc.get('value', '')
171
-     return desc or ''
172
- ```
173
-
174
- #### Works editions (paginated list of all editions)
175
-
176
- ```python
177
- editions_resp = json.loads(http_get(
178
-     f"https://openlibrary.org/works/{work_id}/editions.json?limit=10&offset=0"
179
- ))
180
- # editions_resp['size'] == 120 (total edition count)
181
- # editions_resp['entries'] == [...] (up to limit items)
182
- # editions_resp['links'] == {'self': '...', 'work': '...', 'next': '...', 'prev': '...'}
183
- # ← use links['next'] for pagination when offset+limit < size
184
-
185
- e = editions_resp['entries'][0]
186
- # e['title'] == 'Duna'
187
- # e['publishers'] == ['Editora Aleph']
188
- # e['publish_date'] == '19/08/2017' ← inconsistent format, string
189
- # e['isbn_13'] == ['9788576573135']
190
- # e['isbn_10'] == ['857657313X']
191
- # e['covers'] == [10368109]
192
- # e['number_of_pages'] == 680
193
- # e['languages'] == [{'key': '/languages/por'}]
194
- # e['key'] == '/books/OL28969075M'
195
- # e['physical_format'] == 'Paperback' (often missing)
196
- # e['notes'] → str or {'value': str} (often missing)
197
- ```
198
-
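Walking every page of the editions list can be sketched as below. This is an illustration rather than code from the skill file: `iter_editions` is a hypothetical name, and it advances by `offset` arithmetic against the reported `size` instead of following `links['next']`. The `fetch` argument is any callable that takes a URL and returns the response body as text, which keeps the sketch testable without network access.

```python
import json

def iter_editions(work_id, fetch, limit=50):
    """Yield every edition of a work, one page at a time (hypothetical helper).

    `fetch` is a callable: URL in, response text out (e.g. an http_get-style helper).
    """
    offset = 0
    while True:
        url = (f"https://openlibrary.org/works/{work_id}/editions.json"
               f"?limit={limit}&offset={offset}")
        page = json.loads(fetch(url))
        entries = page.get("entries", [])
        yield from entries
        offset += limit
        # Stop once the reported total is covered or a page comes back empty.
        if not entries or offset >= page.get("size", 0):
            break
```

Passing the fetcher in as a parameter also makes it easy to add throttling or retries around the real HTTP call.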
199
- ---
200
-
201
- ### Books API (specific edition)
202
-
203
- Two sub-APIs: direct JSON for raw data, or `api/books` for enriched data.
204
-
205
- #### Direct edition JSON
206
-
207
- ```python
208
- import json
209
- from helpers import http_get
210
-
211
- edition_id = 'OL7353617M' # from editions list e['key'] or cover_edition_key in search
212
- edition = json.loads(http_get(f"https://openlibrary.org/books/{edition_id}.json"))
213
-
214
- # edition['title'] == 'Fantastic Mr. Fox'
215
- # edition['publishers'] == ['Puffin']
216
- # edition['publish_date'] == 'October 1, 1988'
217
- # edition['isbn_13'] == ['9780140328721']
218
- # edition['isbn_10'] == ['0140328726']
219
- # edition['number_of_pages'] == 96
220
- # edition['covers'] == [...] ← cover IDs
221
- # edition['languages'] == [{'key': '/languages/eng'}]
222
- # edition['works'] == [{'key': '/works/OL45804W'}]
223
- # edition['authors'] == [{'key': '/authors/OL34184A'}]
224
- # edition['identifiers'] == {'goodreads': [...], 'librarything': [...]}
225
- # edition['first_sentence'] == {'value': '...'} or str (often missing)
226
- # edition['ocaid'] == 'fantast00dahl' ← Internet Archive ID (if available)
227
- ```
228
-
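Because `publish_date` is a free-form string (`'19/08/2017'` and `'October 1, 1988'` both appear above), a tolerant year extractor is often enough for sorting or filtering. The sketch below is an assumption-laden illustration: `publish_year` is a hypothetical helper, and the 1500 to 2099 range is an arbitrary choice, not something the API defines.

```python
import re

# First standalone 4-digit number that looks like a year between 1500 and 2099.
_YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")

def publish_year(publish_date):
    """Return the first plausible year in a publish_date string, else None."""
    if not publish_date:
        return None
    match = _YEAR.search(publish_date)
    return int(match.group(1)) if match else None
```

For work-level search results, the parsed `publish_year` field from `fields=*` avoids this step entirely.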
229
- #### Bibkeys API (enriched, multiple books at once)
230
-
231
- ```python
232
- # jscmd=data: cleaned up dict with cover URLs pre-built
233
- r = json.loads(http_get(
234
-     "https://openlibrary.org/api/books"
235
-     "?bibkeys=ISBN:9780743273565,ISBN:9780451524935"
236
-     "&format=json&jscmd=data"
237
- ))
238
- # r == {'ISBN:9780743273565': {...}, 'ISBN:9780451524935': {...}}
239
-
240
- book = r['ISBN:9780743273565']
241
- # book['title'] == 'The Great Gatsby'
242
- # book['authors'] == [{'url': '...', 'name': 'F. Scott Fitzgerald'}]
243
- # book['publish_date'] == '2021'
244
- # book['publishers'] == [{'name': 'Independently Published'}]
245
- # book['number_of_pages'] == 208
246
- # book['url'] == 'http://openlibrary.org/books/OL46773254M/The_Great_Gatsby'
247
- # book['key'] == '/books/OL46773254M'
248
- # book['cover'] == {'small': '...S.jpg', 'medium': '...M.jpg', 'large': '...L.jpg'}
249
- # book['identifiers'] == {'isbn_13': [...], 'openlibrary': [...]}
250
- # book['subjects'] == [{'name': 'Modern fiction', 'url': '...'}, ...]
251
- # book['subject_places'] == None ← often null even with jscmd=data
252
-
253
- # jscmd=details: raw edition JSON + extra fields
254
- r2 = json.loads(http_get(
255
-     "https://openlibrary.org/api/books"
256
-     "?bibkeys=ISBN:9780743273565&format=json&jscmd=details"
257
- ))
258
- item = r2['ISBN:9780743273565']
259
- # item['bib_key'] == 'ISBN:9780743273565'
260
- # item['info_url'] == 'http://openlibrary.org/books/OL...'
261
- # item['preview'] == 'noview' | 'restricted' | 'full'
262
- # item['preview_url'] == URL to read on OL or IA
263
- # item['thumbnail_url']== 'https://covers.openlibrary.org/b/id/14314120-S.jpg'
264
- # item['details'] → raw edition JSON (same as /books/OL...M.json)
265
-
266
- # Supported bibkey prefixes: ISBN:, OCLC:, LCCN:, OLID: (e.g. OLID:OL46773254M)
267
- ```
268
-
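For longer ISBN lists, the `bibkeys` URL can be built in batches. This is a sketch under stated assumptions: `bibkeys_url` and `chunked` are hypothetical helpers, and the batch size is a caller's choice because the source does not state a limit on keys per request.

```python
def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def bibkeys_url(isbns, jscmd="data"):
    """Build an api/books URL for several ISBNs (jscmd is 'data' or 'details')."""
    keys = ",".join(f"ISBN:{isbn}" for isbn in isbns)
    return (
        "https://openlibrary.org/api/books"
        f"?bibkeys={keys}&format=json&jscmd={jscmd}"
    )
```

Each batch from `chunked(all_isbns, n)` yields one URL, and the response dict is keyed by the same `ISBN:...` strings that were sent.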
269
- ---
270
-
271
- ### Authors API
272
-
273
- ```python
274
- import json
275
- from helpers import http_get
276
-
277
- # Lookup by known author key
278
- author = json.loads(http_get("https://openlibrary.org/authors/OL26320A.json"))
279
- # OL26320A is J.R.R. Tolkien (confirm any author key with a search before relying on it)
280
-
281
- # author['name'] == 'J.R.R. Tolkien'
282
- # author['fuller_name'] == 'John Ronald Reuel Tolkien'
283
- # author['personal_name'] == 'J. R. R. Tolkien'
284
- # author['birth_date'] == '3 January 1892' ← string, not parsed
285
- # author['death_date'] == '2 September 1973'
286
- # author['bio'] → str or {'type': '/type/text', 'value': str}
287
- # author['photos'] == [6155606, 6433524, ...] ← photo IDs for covers API
288
- # author['links'] == [{'title': '...', 'url': '...', 'type': {...}}, ...]
289
- # author['remote_ids'] == {'wikidata': 'Q892', 'viaf': '95218067', ...}
290
- # author['alternate_names']== ['J. R. R. Tolkien', 'TOLKIEN', ...]
291
- # author['key'] == '/authors/OL26320A'
292
- # author['wikipedia'] → URL string or None
293
-
294
- # Author works (paginated)
295
- works = json.loads(http_get("https://openlibrary.org/authors/OL26320A/works.json?limit=5"))
296
- # works['size'] == 415
297
- # works['entries'] == [{title, key, covers, authors, created, ...}, ...]
298
- # works['links'] == {'self': '...', 'next': '...'}
299
- ```
300
-
301
- #### Author search
302
-
303
- ```python
304
- r = json.loads(http_get("https://openlibrary.org/search/authors.json?q=tolkien"))
305
- # r['numFound'] == 40
306
- # r['docs'][0]:
307
- # 'name' == 'Christopher Tolkien'
308
- # 'key' == 'OL2623360A' ← NOTE: no /authors/ prefix here
309
- # 'birth_date' == '21 November 1924'
310
- # 'death_date' == '16 January 2020'
311
- # 'top_work' == 'The War of the Ring'
312
- # 'work_count' == 43
313
- # 'top_subjects' == [...]
314
- # 'alternate_names' == [...]
315
- # 'ratings_average' float
316
- # 'want_to_read_count' int
317
- # 'already_read_count' int
318
- # 'currently_reading_count' int
319
- ```
320
-
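Author search returns a bare key (`'OL2623360A'`), while work and edition records use the prefixed form (`'/authors/OL26320A'`). A small normalizer avoids building malformed URLs; the names `author_id` and `author_json_url` below are hypothetical, offered only as a sketch.

```python
def author_id(key: str) -> str:
    """Accept '/authors/OL26320A' or 'OL26320A' and return the bare ID."""
    return key.rsplit("/", 1)[-1]

def author_json_url(key: str) -> str:
    """URL of the author record for either key form."""
    return f"https://openlibrary.org/authors/{author_id(key)}.json"
```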
321
- ---
322
-
323
- ### Cover images
324
-
325
- Covers are served as JPEG images via a redirect to the Internet Archive CDN. No auth needed.
326
-
327
- ```
328
- # Book covers — three key types:
329
- https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg # by cover ID (most reliable)
330
- https://covers.openlibrary.org/b/isbn/{isbn}-{size}.jpg # by ISBN
331
- https://covers.openlibrary.org/b/olid/{edition_id}-{size}.jpg # by edition OLID (unreliable — see gotchas)
332
-
333
- # Author photos:
334
- https://covers.openlibrary.org/a/id/{photo_id}-{size}.jpg
335
-
336
- # Sizes: S (small), M (medium), L (large)
337
- ```
338
-
339
- ```python
340
- import urllib.request
341
-
342
- def get_cover_bytes(cover_id: int, size: str = 'M') -> bytes | None:
343
-     """Fetch cover image bytes. Returns None if no cover (43-byte GIF placeholder)."""
344
-     url = f"https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg"
345
-     req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
346
-     with urllib.request.urlopen(req, timeout=15) as resp:
347
-         data = resp.read()
348
-     return None if len(data) == 43 else data # 43-byte GIF = no cover placeholder
349
-
350
- # Or just get the URL for embedding:
351
- def cover_url(cover_id: int, size: str = 'M') -> str:
352
-     return f"https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg"
353
-
354
- # Usage:
355
- from helpers import http_get
356
- import json
357
- work = json.loads(http_get("https://openlibrary.org/works/OL893415W.json"))
358
- if work.get('covers'):
359
- img = get_cover_bytes(work['covers'][0], 'L') # first cover, large
360
- # img is ~20–80KB JPEG bytes, redirected from ia*.archive.org
361
- ```
362
-
363
- To get cover by ISBN directly (e.g. for UI without a full book lookup):
364
- ```python
365
- # Medium-size cover by ISBN:
366
- url = f"https://covers.openlibrary.org/b/isbn/{isbn}-M.jpg"
367
- # Redirects to Internet Archive CDN, content-type: image/jpeg
368
- # Use ?default=false to get 404 instead of 1×1 GIF placeholder for missing covers
369
- url_safe = f"https://covers.openlibrary.org/b/isbn/{isbn}-M.jpg?default=false"
370
- ```
371
-
372
- ---
373
-
374
- ### Subjects API
375
-
376
- ```python
377
- import json
378
- from helpers import http_get
379
-
380
- # Subject slugs: lowercase, underscores for spaces
381
- r = json.loads(http_get("https://openlibrary.org/subjects/science_fiction.json?limit=5"))
382
- # r['name'] == 'science fiction'
383
- # r['subject_type'] == 'subject' ← also: 'person', 'place', 'time'
384
- # r['work_count'] == 20973
385
- # r['works'] == [{title, key, cover_id, authors, edition_count, ...}, ...]
386
-
387
- w = r['works'][0]
388
- # w['title'] == 'Alice\'s Adventures in Wonderland'
389
- # w['key'] == '/works/OL138052W'
390
- # w['cover_id'] == 10527843
391
- # w['cover_edition_key']== 'OL...'
392
- # w['authors'] == [{'key': '/authors/OL22098A', 'name': 'Lewis Carroll'}]
393
- # w['edition_count'] == 3546
394
- # w['first_publish_year']== ...
395
- # w['has_fulltext'] == True | False
396
- # w['ia'] == 'identifier' (Internet Archive ID when available)
397
-
398
- # Pagination: &offset=N
399
- # Place subject:
400
- r2 = json.loads(http_get("https://openlibrary.org/subjects/place:london.json?limit=5"))
401
- # r2['subject_type'] == 'place', r2['work_count'] == 23927
402
-
403
- # Person subject:
404
- # https://openlibrary.org/subjects/person:napoleon.json?limit=5
405
- # Time subject:
406
- # https://openlibrary.org/subjects/time:middle_ages.json?limit=5
407
-
408
- # Combine with ebooks=true to filter to only freely readable books:
409
- r3 = json.loads(http_get("https://openlibrary.org/subjects/science_fiction.json?limit=5&ebooks=true"))
410
- # r3['works'][i]['has_fulltext'] == True for all results
411
- ```
412
-
413
- ---
414
-
415
- ### Trending books
416
-
417
- ```python
418
- import json
419
- from helpers import http_get
420
-
421
- for period in ['daily', 'weekly', 'monthly']:
422
- r = json.loads(http_get(f"https://openlibrary.org/trending/{period}.json?limit=10"))
423
- # r['works'] == list of search-doc-style objects
424
- # r['days'] == int (time window)
425
- # r['hours'] == int
426
- # Same fields as search docs (title, author_name, cover_i, key, ...)
427
- print(period, r['works'][0]['title']) # e.g. 'Atomic Habits'
428
- ```
429
-
430
- ---
431
-
432
- ## Rate limits
433
-
434
- No authentication required. No API key. No explicit rate limit published.
435
-
436
- Observed in testing: 5 requests completed in ~1 second with no throttling, no 429s. The API is served from CDN/Solr — in practice you can make 10–20 parallel requests without issue. For bulk operations (hundreds of ISBNs), use `ThreadPoolExecutor(max_workers=5)` to be a good citizen.
437
-
438
- **No `User-Agent` override needed** — the default `Mozilla/5.0` from `http_get` is accepted by all Open Library endpoints (unlike Nominatim which blocks it).
439
-
440
- ---
441
-
442
- ## Gotchas
443
-
444
- **`description` field has two shapes.** Both are real — check at runtime:
445
- ```python
446
- desc = work.get('description', '')
447
- text = desc.get('value', '') if isinstance(desc, dict) else (desc or '')
448
- ```
449
-
450
- **`/works/OL45804W` is Fantastic Mr. Fox, not Dune.** The OL IDs in the original prompt were placeholders. Always resolve real IDs via the search API rather than hardcoding them.
451
-
452
- **Author search `key` has no prefix.** `/search/authors.json` returns `key: 'OL26320A'`, but the Authors API and all other APIs use `/authors/OL26320A`. Add the prefix manually when constructing follow-up URLs.
453
-
454
- **Missing cover → 43-byte GIF placeholder, not 404.** Without `?default=false`, the covers API returns a 1×1 transparent GIF instead of HTTP 404 for unknown IDs. Check `len(data) == 43` to detect missing covers.
455
-
456
- **`covers.openlibrary.org/b/olid/{work_id}` is unreliable.** OLID-based cover URLs for work IDs (OL...W) return the placeholder even when covers exist. Always use `b/id/{cover_id}` (from `work['covers'][0]`) or `b/isbn/{isbn}` instead.
457
-
458
- **Bibkeys API picks one edition per ISBN.** When the same ISBN appears on multiple editions (reprint, reissue), `api/books?bibkeys=ISBN:...` returns one — and it may not be the most common edition.
459
-
460
- **`publish_date` is a raw string.** Values like `'October 1, 1988'`, `'19/08/2017'`, `'2021'`, and `'1965-01-01'` all appear. Don't parse without normalization.
461
-
462
- **`/works/.../editions.json` pagination uses `links.next`.** Unlike search (which uses `offset=`), check `links['next']` in the response to know if more pages exist:
463
- ```python
464
- resp = json.loads(http_get("https://openlibrary.org/works/OL893415W/editions.json?limit=50"))
465
- while 'next' in resp.get('links', {}):
466
- resp = json.loads(http_get("https://openlibrary.org" + resp['links']['next']))
467
- # process resp['entries']
468
- ```
469
-
470
- **404 for non-existent IDs.** `/works/OL99999999W.json`, `/books/OL99999999M.json`, and `/authors/OL99999999A.json` all raise `HTTPError: HTTP Error 404: Not Found`. Wrap in try/except.
471
-
472
- **Search `docs` default fields are minimal.** The default response includes ~15 fields. Add `&fields=*` to get all 100+ Solr fields (ratings, ISBNs, publishers, subjects, Goodreads IDs, etc.). Alternatively specify exactly what you need: `&fields=title,isbn,ratings_average`.
1
+ # Open Library — Book Data Extraction
2
+
3
+ `https://openlibrary.org` — Internet Archive's free book catalog. All endpoints are public JSON APIs — no auth, no browser, no scraping required.
4
+
5
+ ## Do this first
6
+
7
+ **Every task is a direct HTTP call — never open the browser.**
8
+
9
+ ```python
10
+ import json
11
+ from helpers import http_get
12
+
13
+ # Search by title
14
+ results = json.loads(http_get("https://openlibrary.org/search.json?q=dune&limit=5"))
15
+ # results['numFound'] == 49090
16
+ # results['docs'] == list of work objects
17
+ # results['start'] == 0 (offset for pagination)
18
+ ```
19
+
20
+ The search API is your entry point for everything. It returns work-level records (all editions grouped). To get edition details, follow the `key` to the Works or Books API.
21
+
22
+ ---
23
+
24
+ ## Common workflows
25
+
26
+ ### Search by query, author, title, or ISBN
27
+
28
+ ```python
29
+ import json
30
+ from helpers import http_get
31
+
32
+ # Free-text search
33
+ r = json.loads(http_get("https://openlibrary.org/search.json?q=dune+frank+herbert&limit=5"))
34
+
35
+ # Author search
36
+ r = json.loads(http_get(
37
+ "https://openlibrary.org/search.json?author=tolkien&limit=5"
38
+ "&fields=title,author_name,first_publish_year,isbn"
39
+ ))
40
+ # fields=* returns all available fields; default returns ~15
41
+
42
+ # Title + author combined
43
+ r = json.loads(http_get(
44
+ "https://openlibrary.org/search.json?title=dune&author=frank+herbert&limit=3"
45
+ "&fields=title,author_name,edition_count,first_publish_year"
46
+ ))
47
+ # r['docs'][0]['title'] == 'Dune'
48
+ # r['docs'][0]['author_name'] == ['Frank Herbert']
49
+ # r['docs'][0]['first_publish_year'] == 1965
50
+ # r['docs'][0]['edition_count'] == 120
51
+
52
+ # ISBN lookup (returns 0–2 results for the same work)
53
+ r = json.loads(http_get("https://openlibrary.org/search.json?isbn=9780743273565"))
54
+ # r['numFound'] == 2
55
+ # r['docs'][0]['title'] == 'The Great Gatsby'
56
+ # r['docs'][0]['key'] == '/works/OL468431W'
57
+ ```
58
+
59
+ **Sort options** (`&sort=`): `new` (recently added), `old`, `random`, `editions` (most editions), `scans` (most scans). Default is relevance.
60
+
61
+ **Language filter**: `&language=fre` (ISO 639-2/B codes: `eng`, `fre`, `ger`, `spa`, `ita`, etc.)
62
+
63
+ **Pagination**: `&limit=N&offset=N`. No maximum limit is enforced, but keep it under 100 for reliability.
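As a rough sketch of how the `limit`/`offset` pair can drive paging, the generator below builds one search URL per page. The function name `search_page_urls` and its parameters are hypothetical (not part of Open Library or `helpers`); `total` is assumed to come from `numFound` in a first response.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://openlibrary.org/search.json"

def search_page_urls(query, total, limit=50):
    """Yield one search URL per page until `total` results are covered.

    `total` is assumed to be the `numFound` value of an earlier response.
    """
    for offset in range(0, total, limit):
        params = {"q": query, "limit": limit, "offset": offset}
        yield f"{SEARCH_URL}?{urlencode(params)}"

# Build (but do not fetch) the URLs for 120 results, 50 per page.
urls = list(search_page_urls("dune frank herbert", total=120, limit=50))
```

Each URL can then be passed to `http_get`; keeping `limit` below 100 follows the advice above.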
64
+
65
+ #### Search doc fields (default — ~15 keys always present)
66
+
67
+ | Field | Type | Notes |
68
+ |---|---|---|
69
+ | `key` | str | `/works/OL893415W` — use for Works API |
70
+ | `title` | str | Work title |
71
+ | `author_name` | list[str] | e.g. `['Frank Herbert']` |
72
+ | `author_key` | list[str] | e.g. `['OL79034A']` |
73
+ | `first_publish_year` | int | |
74
+ | `edition_count` | int | Number of editions across all languages |
75
+ | `cover_i` | int | Cover image ID — use with covers API |
76
+ | `cover_edition_key` | str | e.g. `OL7353617M` |
77
+ | `language` | list[str] | ISO codes of all editions |
78
+ | `ia` | list[str] | Internet Archive identifiers (when has_fulltext=true) |
79
+ | `ebook_access` | str | `'public'`, `'borrowable'`, `'no_ebook'` |
80
+ | `has_fulltext` | bool | |
81
+
82
+ #### Extra fields with `&fields=*`
83
+
84
+ ```python
85
+ # With fields=* you also get:
86
+ # 'isbn' list[str] All ISBNs across editions
87
+ # 'publisher' list[str] All publishers ever
88
+ # 'publish_date' list[str] All publish dates (strings, inconsistent formats)
89
+ # 'publish_year' list[int] Parsed years
90
+ # 'subject' list[str] Subject headings
91
+ # 'person' list[str] Subject persons (e.g. 'Big Brother')
92
+ # 'place' list[str] Subject places
93
+ # 'time' list[str] Subject times
94
+ # 'number_of_pages_median' int Median page count across editions
95
+ # 'ratings_average' float e.g. 4.29
96
+ # 'ratings_count' int
97
+ # 'want_to_read_count' int
98
+ # 'already_read_count' int
99
+ # 'currently_reading_count' int
100
+ # 'readinglog_count' int Total of all reading log entries
101
+ # 'first_sentence' list[str]
102
+ # 'id_goodreads' list[str]
103
+ # 'id_librarything' list[str]
104
+ # 'id_wikidata' list[str]
105
+ # 'ddc' list[str] Dewey Decimal Classification
106
+ # 'lcc' list[str] Library of Congress Classification
107
+ # 'lccn' list[str]
108
+ ```
109
+
110
+ ---
111
+
112
+ ### Bulk ISBN lookups (parallel)
113
+
114
+ ```python
115
+ import json
116
+ from helpers import http_get
117
+ from concurrent.futures import ThreadPoolExecutor
118
+
119
+ isbns = ['9780743273565', '9780451524935', '9780618346257']
120
+
121
+ def lookup_isbn(isbn):
122
+     url = f"https://openlibrary.org/search.json?isbn={isbn}&fields=title,author_name,first_publish_year,key"
122
+     r = json.loads(http_get(url))
123
+     if r['docs']:
124
+         d = r['docs'][0]
125
+         return {'isbn': isbn, 'title': d.get('title'), 'author': d.get('author_name', [None])[0],
126
+                 'year': d.get('first_publish_year'), 'key': d.get('key')}
127
+     return {'isbn': isbn, 'found': False}
129
+
130
+ with ThreadPoolExecutor(max_workers=5) as ex:
131
+     books = list(ex.map(lookup_isbn, isbns))
132
+
133
+ # [{'isbn': '9780743273565', 'title': 'The Great Gatsby', 'author': 'F. Scott Fitzgerald', 'year': 1920, ...},
134
+ # {'isbn': '9780451524935', 'title': 'Nineteen Eighty-Four', 'author': 'George Orwell', 'year': 1949, ...},
135
+ # {'isbn': '9780618346257', 'title': 'The Fellowship of the Ring', 'author': 'J.R.R. Tolkien', 'year': 1954, ...}]
136
+ ```
137
+
138
+ ---
139
+
140
+ ### Works API (editions grouped by title)
141
+
142
+ Returns all metadata for a work (all editions combined). Get the work ID from `key` in search results.
143
+
144
+ ```python
145
+ import json
146
+ from helpers import http_get
147
+
148
+ work_id = 'OL893415W' # from search doc['key'] = '/works/OL893415W'
149
+ work = json.loads(http_get(f"https://openlibrary.org/works/{work_id}.json"))
150
+
151
+ # work['title'] == 'Dune'
152
+ # work['key'] == '/works/OL893415W'
153
+ # work['covers'] == [11481354, 12375564, 11157826] ← cover IDs for covers API
154
+ # work['subjects'] == ['Dune (Imaginary place)', 'Fiction', ...]
155
+ # work['subject_places'] == [...] ← geographic subjects (may be absent)
156
+ # work['subject_people'] == [...] ← person subjects (may be absent)
157
+ # work['subject_times'] == [...] ← time subjects (may be absent)
158
+ # work['authors'] == [{'author': {'key': '/authors/OL79034A'}, 'type': {...}}]
159
+ # work['description'] → either str OR {'type': '/type/text', 'value': str} ← see gotchas
160
+ # work['created'] == {'type': '/type/datetime', 'value': '2009-10-15T11:34:21.437031'}
161
+ # work['last_modified'] same shape as created
162
+ ```
163
+
164
+ Helper for the description field (which has two possible shapes):
165
+
166
+ ```python
167
+ def get_description(work: dict) -> str:
168
+     desc = work.get('description', '')
168
+     if isinstance(desc, dict):
169
+         return desc.get('value', '')
170
+     return desc or ''
172
+ ```
173
+
174
+ #### Works editions (paginated list of all editions)
175
+
176
+ ```python
177
+ editions_resp = json.loads(http_get(
178
+ f"https://openlibrary.org/works/{work_id}/editions.json?limit=10&offset=0"
179
+ ))
180
+ # editions_resp['size'] == 120 (total edition count)
181
+ # editions_resp['entries'] == [...] (up to limit items)
182
+ # editions_resp['links'] == {'self': '...', 'work': '...', 'next': '...', 'prev': '...'}
183
+ # ← use links['next'] for pagination when offset+limit < size
184
+
185
+ e = editions_resp['entries'][0]
186
+ # e['title'] == 'Duna'
187
+ # e['publishers'] == ['Editora Aleph']
188
+ # e['publish_date'] == '19/08/2017' ← inconsistent format, string
189
+ # e['isbn_13'] == ['9788576573135']
190
+ # e['isbn_10'] == ['857657313X']
191
+ # e['covers'] == [10368109]
192
+ # e['number_of_pages'] == 680
193
+ # e['languages'] == [{'key': '/languages/por'}]
194
+ # e['key'] == '/books/OL28969075M'
195
+ # e['physical_format'] == 'Paperback' (often missing)
196
+ # e['notes'] → str or {'value': str} (often missing)
197
+ ```
198
+
199
+ ---
200
+
201
+ ### Books API (specific edition)
202
+
203
+ Two sub-APIs: direct JSON for raw data, or `api/books` for enriched data.
204
+
205
+ #### Direct edition JSON
206
+
207
+ ```python
208
+ import json
209
+ from helpers import http_get
210
+
211
+ edition_id = 'OL7353617M' # from editions list e['key'] or cover_edition_key in search
212
+ edition = json.loads(http_get(f"https://openlibrary.org/books/{edition_id}.json"))
213
+
214
+ # edition['title'] == 'Fantastic Mr. Fox'
215
+ # edition['publishers'] == ['Puffin']
216
+ # edition['publish_date'] == 'October 1, 1988'
217
+ # edition['isbn_13'] == ['9780140328721']
218
+ # edition['isbn_10'] == ['0140328726']
219
+ # edition['number_of_pages'] == 96
220
+ # edition['covers'] == [...] ← cover IDs
221
+ # edition['languages'] == [{'key': '/languages/eng'}]
222
+ # edition['works'] == [{'key': '/works/OL45804W'}]
223
+ # edition['authors'] == [{'key': '/authors/OL34184A'}]
224
+ # edition['identifiers'] == {'goodreads': [...], 'librarything': [...]}
225
+ # edition['first_sentence'] == {'value': '...'} or str (often missing)
226
+ # edition['ocaid'] == 'fantast00dahl' ← Internet Archive ID (if available)
227
+ ```
228
+
229
+ #### Bibkeys API (enriched, multiple books at once)
230
+
231
+ ```python
232
+ # jscmd=data: cleaned up dict with cover URLs pre-built
233
+ r = json.loads(http_get(
234
+ "https://openlibrary.org/api/books"
235
+ "?bibkeys=ISBN:9780743273565,ISBN:9780451524935"
236
+ "&format=json&jscmd=data"
237
+ ))
238
+ # r == {'ISBN:9780743273565': {...}, 'ISBN:9780451524935': {...}}
239
+
240
+ book = r['ISBN:9780743273565']
241
+ # book['title'] == 'The Great Gatsby'
242
+ # book['authors'] == [{'url': '...', 'name': 'F. Scott Fitzgerald'}]
243
+ # book['publish_date'] == '2021'
244
+ # book['publishers'] == [{'name': 'Independently Published'}]
245
+ # book['number_of_pages'] == 208
246
+ # book['url'] == 'http://openlibrary.org/books/OL46773254M/The_Great_Gatsby'
247
+ # book['key'] == '/books/OL46773254M'
248
+ # book['cover'] == {'small': '...S.jpg', 'medium': '...M.jpg', 'large': '...L.jpg'}
249
+ # book['identifiers'] == {'isbn_13': [...], 'openlibrary': [...]}
250
+ # book['subjects'] == [{'name': 'Modern fiction', 'url': '...'}, ...]
251
+ # book['subject_places'] == None ← often null even with jscmd=data
252
+
253
+ # jscmd=details: raw edition JSON + extra fields
254
+ r2 = json.loads(http_get(
255
+ "https://openlibrary.org/api/books"
256
+ "?bibkeys=ISBN:9780743273565&format=json&jscmd=details"
257
+ ))
258
+ item = r2['ISBN:9780743273565']
259
+ # item['bib_key'] == 'ISBN:9780743273565'
260
+ # item['info_url'] == 'http://openlibrary.org/books/OL...'
261
+ # item['preview'] == 'noview' | 'restricted' | 'full'
262
+ # item['preview_url'] == URL to read on OL or IA
263
+ # item['thumbnail_url']== 'https://covers.openlibrary.org/b/id/14314120-S.jpg'
264
+ # item['details'] → raw edition JSON (same as /books/OL...M.json)
265
+
266
+ # Supported bibkey prefixes: ISBN:, OCLC:, LCCN:, OLID: (e.g. OLID:OL46773254M)
267
+ ```
268
+
269
+ ---
270
+
271
+ ### Authors API
272
+
273
+ ```python
274
+ import json
275
+ from helpers import http_get
276
+
277
+ # Lookup by known author key
278
+ author = json.loads(http_get("https://openlibrary.org/authors/OL26320A.json"))
279
+ # OL26320A is J.R.R. Tolkien (author keys are opaque; confirm one via search before relying on it)
280
+
281
+ # author['name'] == 'J.R.R. Tolkien'
282
+ # author['fuller_name'] == 'John Ronald Reuel Tolkien'
283
+ # author['personal_name'] == 'J. R. R. Tolkien'
284
+ # author['birth_date'] == '3 January 1892' ← string, not parsed
285
+ # author['death_date'] == '2 September 1973'
286
+ # author['bio'] → str or {'type': '/type/text', 'value': str}
287
+ # author['photos'] == [6155606, 6433524, ...] ← photo IDs for covers API
288
+ # author['links'] == [{'title': '...', 'url': '...', 'type': {...}}, ...]
289
+ # author['remote_ids'] == {'wikidata': 'Q892', 'viaf': '95218067', ...}
290
+ # author['alternate_names']== ['J. R. R. Tolkien', 'TOLKIEN', ...]
291
+ # author['key'] == '/authors/OL26320A'
292
+ # author['wikipedia'] → URL string or None
293
+
294
+ # Author works (paginated)
295
+ works = json.loads(http_get("https://openlibrary.org/authors/OL26320A/works.json?limit=5"))
296
+ # works['size'] == 415
297
+ # works['entries'] == [{title, key, covers, authors, created, ...}, ...]
298
+ # works['links'] == {'self': '...', 'next': '...'}
299
+ ```
300
+
301
+ #### Author search
302
+
303
+ ```python
304
+ r = json.loads(http_get("https://openlibrary.org/search/authors.json?q=tolkien"))
305
+ # r['numFound'] == 40
306
+ # r['docs'][0]:
307
+ # 'name' == 'Christopher Tolkien'
308
+ # 'key' == 'OL2623360A' ← NOTE: no /authors/ prefix here
309
+ # 'birth_date' == '21 November 1924'
310
+ # 'death_date' == '16 January 2020'
311
+ # 'top_work' == 'The War of the Ring'
312
+ # 'work_count' == 43
313
+ # 'top_subjects' == [...]
314
+ # 'alternate_names' == [...]
315
+ # 'ratings_average' float
316
+ # 'want_to_read_count' int
317
+ # 'already_read_count' int
318
+ # 'currently_reading_count' int
319
+ ```
320
+
321
+ ---
322
+
323
+ ### Cover images
324
+
325
+ Covers are served as JPEG images via a redirect to the Internet Archive CDN. No auth needed.
326
+
327
+ ```
328
+ # Book covers — three key types:
329
+ https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg # by cover ID (most reliable)
330
+ https://covers.openlibrary.org/b/isbn/{isbn}-{size}.jpg # by ISBN
331
+ https://covers.openlibrary.org/b/olid/{edition_id}-{size}.jpg # by edition OLID (unreliable — see gotchas)
332
+
333
+ # Author photos:
334
+ https://covers.openlibrary.org/a/id/{photo_id}-{size}.jpg
335
+
336
+ # Sizes: S (small), M (medium), L (large)
337
+ ```
338
+
339
+ ```python
340
+ import urllib.request
341
+
342
+ def get_cover_bytes(cover_id: int, size: str = 'M') -> bytes | None:
343
+     """Fetch cover image bytes. Returns None if no cover (43-byte GIF placeholder)."""
344
+     url = f"https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg"
345
+     req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
346
+     with urllib.request.urlopen(req, timeout=15) as resp:
347
+         data = resp.read()
348
+     return None if len(data) == 43 else data  # 43-byte GIF = no cover placeholder
349
+
350
+ # Or just get the URL for embedding:
351
+ def cover_url(cover_id: int, size: str = 'M') -> str:
352
+     return f"https://covers.openlibrary.org/b/id/{cover_id}-{size}.jpg"
353
+
354
+ # Usage:
355
+ from helpers import http_get
356
+ import json
357
+ work = json.loads(http_get("https://openlibrary.org/works/OL893415W.json"))
358
+ if work.get('covers'):
359
+     img = get_cover_bytes(work['covers'][0], 'L')  # first cover, large
360
+     # img is ~20–80KB JPEG bytes, redirected from ia*.archive.org
361
+ ```
362
+
363
+ To get cover by ISBN directly (e.g. for UI without a full book lookup):
364
+ ```python
365
+ # Medium-size cover by ISBN:
366
+ url = f"https://covers.openlibrary.org/b/isbn/{isbn}-M.jpg"
367
+ # Redirects to Internet Archive CDN, content-type: image/jpeg
368
+ # Use ?default=false to get 404 instead of 1×1 GIF placeholder for missing covers
369
+ url_safe = f"https://covers.openlibrary.org/b/isbn/{isbn}-M.jpg?default=false"
370
+ ```
371
+
372
+ ---
373
+
374
+ ### Subjects API
375
+
376
+ ```python
377
+ import json
378
+ from helpers import http_get
379
+
380
+ # Subject slugs: lowercase, underscores for spaces
381
+ r = json.loads(http_get("https://openlibrary.org/subjects/science_fiction.json?limit=5"))
382
+ # r['name'] == 'science fiction'
383
+ # r['subject_type'] == 'subject' ← also: 'person', 'place', 'time'
384
+ # r['work_count'] == 20973
385
+ # r['works'] == [{title, key, cover_id, authors, edition_count, ...}, ...]
386
+
387
+ w = r['works'][0]
388
+ # w['title'] == 'Alice\'s Adventures in Wonderland'
389
+ # w['key'] == '/works/OL138052W'
390
+ # w['cover_id'] == 10527843
391
+ # w['cover_edition_key']== 'OL...'
392
+ # w['authors'] == [{'key': '/authors/OL22098A', 'name': 'Lewis Carroll'}]
393
+ # w['edition_count'] == 3546
394
+ # w['first_publish_year']== ...
395
+ # w['has_fulltext'] == True | False
396
+ # w['ia'] == 'identifier' (Internet Archive ID when available)
397
+
398
+ # Pagination: &offset=N
399
+ # Place subject:
400
+ r2 = json.loads(http_get("https://openlibrary.org/subjects/place:london.json?limit=5"))
401
+ # r2['subject_type'] == 'place', r2['work_count'] == 23927
402
+
403
+ # Person subject:
404
+ # https://openlibrary.org/subjects/person:napoleon.json?limit=5
405
+ # Time subject:
406
+ # https://openlibrary.org/subjects/time:middle_ages.json?limit=5
407
+
408
+ # Combine with ebooks=true to filter to only freely readable books:
409
+ r3 = json.loads(http_get("https://openlibrary.org/subjects/science_fiction.json?limit=5&ebooks=true"))
410
+ # r3['works'][i]['has_fulltext'] == True for all results
411
+ ```
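The slug rules used above (lowercase, underscores for spaces, an optional `person:`, `place:`, or `time:` prefix) can be sketched as a small URL builder. The names `subject_slug` and `subject_url` are hypothetical helpers, and the sketch assumes no normalization beyond what this section describes.

```python
from urllib.parse import quote

SUBJECT_TYPES = ("person", "place", "time")  # prefixes shown in this section

def subject_slug(name):
    """Lowercase the name and join its words with underscores."""
    return "_".join(name.lower().split())

def subject_url(name, subject_type=None, limit=5):
    """Build a Subjects API URL; plain subjects take no prefix."""
    if subject_type is not None and subject_type not in SUBJECT_TYPES:
        raise ValueError(f"unknown subject type: {subject_type!r}")
    prefix = f"{subject_type}:" if subject_type else ""
    slug = quote(subject_slug(name))
    return f"https://openlibrary.org/subjects/{prefix}{slug}.json?limit={limit}"

london = subject_url("London", "place")
```

Subjects with punctuation or non-ASCII characters may need different handling; that case is not covered here.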
412
+
413
+ ---
414
+
415
+ ### Trending books
416
+
417
+ ```python
418
+ import json
419
+ from helpers import http_get
420
+
421
+ for period in ['daily', 'weekly', 'monthly']:
422
+     r = json.loads(http_get(f"https://openlibrary.org/trending/{period}.json?limit=10"))
422
+     # r['works'] == list of search-doc-style objects
423
+     # r['days'] == int (time window)
424
+     # r['hours'] == int
425
+     # Same fields as search docs (title, author_name, cover_i, key, ...)
426
+     print(period, r['works'][0]['title'])  # e.g. 'Atomic Habits'
428
+ ```
429
+
430
+ ---
431
+
432
+ ## Rate limits
433
+
434
+ No authentication required. No API key. No explicit rate limit published.
435
+
436
+ Observed in testing: 5 requests completed in ~1 second with no throttling, no 429s. The API is served from CDN/Solr — in practice you can make 10–20 parallel requests without issue. For bulk operations (hundreds of ISBNs), use `ThreadPoolExecutor(max_workers=5)` to be a good citizen.
437
+
438
+ **No `User-Agent` override needed** — the default `Mozilla/5.0` from `http_get` is accepted by all Open Library endpoints (unlike Nominatim which blocks it).
439
+
440
+ ---
441
+
442
+ ## Gotchas
443
+
444
+ **`description` field has two shapes.** Both are real — check at runtime:
445
+ ```python
446
+ desc = work.get('description', '')
447
+ text = desc.get('value', '') if isinstance(desc, dict) else (desc or '')
448
+ ```
449
+
450
+ **`/works/OL45804W` is Fantastic Mr. Fox, not Dune.** OL IDs are opaque, so an ID copied from an example or guessed from a title may point at a different book. Always resolve real IDs via the search API rather than hardcoding them.
451
+
452
+ **Author search `key` has no prefix.** `/search/authors.json` returns a bare `key: 'OL26320A'` (as does `author_key` in search docs), but the Authors, Works, Books, and Subjects responses use `/authors/OL26320A`. Add the prefix manually when constructing follow-up URLs.
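One way to absorb the prefix difference is to normalize every author key before building a URL. The helpers below are a minimal sketch; `author_path` and `author_url` are hypothetical names, not part of `helpers` or Open Library.

```python
def author_path(key):
    """Return the key in '/authors/OL...A' form, whether or not it had the prefix."""
    key = key.strip()
    if key.startswith("/authors/"):
        return key
    return f"/authors/{key.lstrip('/')}"

def author_url(key):
    """Build the Authors API URL for a bare or prefixed author key."""
    return f"https://openlibrary.org{author_path(key)}.json"

url = author_url("OL26320A")
```

Both `author_url("OL26320A")` and `author_url("/authors/OL26320A")` produce the same URL, so results from either search endpoint can be passed through unchanged.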
453
+
454
+ **Missing cover → 43-byte GIF placeholder, not 404.** Without `?default=false`, the covers API returns a 1×1 transparent GIF instead of HTTP 404 for unknown IDs. Check `len(data) == 43` to detect missing covers.
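A small predicate keeps the placeholder check in one place. This is a sketch only: the 43-byte size comes from the note above, while the extra GIF signature test is an added assumption (real covers are described as JPEG) and `is_missing_cover` is a hypothetical name.

```python
PLACEHOLDER_SIZE = 43                   # placeholder size reported above
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")  # assumption: real covers are never GIFs

def is_missing_cover(data):
    """Return True when the bytes look like the 'no cover' placeholder."""
    if not data:
        return True
    return len(data) == PLACEHOLDER_SIZE or data[:6] in GIF_SIGNATURES

# A JPEG starts with the bytes FF D8 FF, so this is treated as a real cover.
jpeg_like = b"\xff\xd8\xff\xe0" + bytes(2000)
has_cover = not is_missing_cover(jpeg_like)
```

Requesting the URL with `?default=false` avoids the check entirely, at the cost of handling a 404.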
455
+
456
+ **`covers.openlibrary.org/b/olid/{work_id}` is unreliable.** OLID-based cover URLs for work IDs (OL...W) return the placeholder even when covers exist. Always use `b/id/{cover_id}` (from `work['covers'][0]`) or `b/isbn/{isbn}` instead.
457
+
458
+ **Bibkeys API picks one edition per ISBN.** When the same ISBN appears on multiple editions (reprint, reissue), `api/books?bibkeys=ISBN:...` returns one — and it may not be the most common edition.
459
+
460
+ **`publish_date` is a raw string.** Values like `'October 1, 1988'`, `'19/08/2017'`, `'2021'`, and `'1965-01-01'` all appear. Don't parse without normalization.
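When only the year matters, a regular expression is often enough. The sketch below pulls the first four-digit year out of a raw string; `publish_year` is a hypothetical helper, and it assumes years between 1000 and 2099 with no attempt to parse day or month.

```python
import re

YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")

def publish_year(raw):
    """Return the first 4-digit year in a raw publish_date string, or None."""
    if not raw:
        return None
    match = YEAR_PATTERN.search(raw)
    return int(match.group(1)) if match else None

samples = ["October 1, 1988", "19/08/2017", "2021", "1965-01-01"]
years = [publish_year(s) for s in samples]
```

For search docs, the `publish_year` field returned with `&fields=*` already holds parsed years, so this is mainly useful for edition records.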
461
+
462
+ **`/works/.../editions.json` pagination uses `links.next`.** Unlike search (which uses `offset=`), check `links['next']` in the response to know if more pages exist:
463
+ ```python
464
+ resp = json.loads(http_get("https://openlibrary.org/works/OL893415W/editions.json?limit=50"))  # first page: process resp['entries'] here too
464
+ while 'next' in resp.get('links', {}):
465
+     resp = json.loads(http_get("https://openlibrary.org" + resp['links']['next']))
466
+     # process resp['entries'] for each following page
468
+ ```
469
+
470
+ **404 for non-existent IDs.** `/works/OL99999999W.json`, `/books/OL99999999M.json`, and `/authors/OL99999999A.json` all raise `HTTPError: HTTP Error 404: Not Found`. Wrap in try/except.
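A thin wrapper can turn the 404 into `None` while letting other failures surface. This sketch assumes the fetch function raises `urllib.error.HTTPError`, which matches the error text quoted above but is not confirmed for `http_get`; `get_json_or_none` and its `fetch` parameter are hypothetical.

```python
import json
import urllib.error

def get_json_or_none(url, fetch):
    """Fetch and decode JSON; return None on HTTP 404, re-raise anything else.

    `fetch` is any callable taking a URL and returning the response body,
    for example `http_get` from `helpers` (assumed to raise HTTPError).
    """
    try:
        return json.loads(fetch(url))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise

# Stand-in fetcher so the sketch runs without network access.
def fake_fetch(url):
    raise urllib.error.HTTPError(url, 404, "Not Found", None, None)

missing = get_json_or_none("https://openlibrary.org/works/OL99999999W.json", fake_fetch)
```

Passing the fetcher in as an argument keeps the wrapper testable offline; in real use it would be called as `get_json_or_none(url, http_get)`.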
471
+
472
+ **Search `docs` default fields are minimal.** The default response includes ~15 fields. Add `&fields=*` to get all 100+ Solr fields (ratings, ISBNs, publishers, subjects, Goodreads IDs, etc.). Alternatively specify exactly what you need: `&fields=title,isbn,ratings_average`.