@pencil-agent/nano-pencil 2.0.0 → 2.0.1

Files changed (195)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/mcp/mcp-client.d.ts +3 -1
  7. package/dist/core/mcp/mcp-client.js +6 -6
  8. package/dist/core/mcp/mcp-config.d.ts +3 -3
  9. package/dist/core/mcp/mcp-config.js +1 -1
  10. package/dist/core/mcp/mcp-manager.d.ts +5 -1
  11. package/dist/core/mcp/mcp-manager.js +1 -1
  12. package/dist/core/platform/config/resource-loader.d.ts +2 -0
  13. package/dist/core/platform/config/resource-loader.js +2 -2
  14. package/dist/core/runtime/agent-session.d.ts +12 -0
  15. package/dist/core/runtime/agent-session.js +8 -8
  16. package/dist/core/runtime/sdk.d.ts +8 -0
  17. package/dist/core/runtime/sdk.js +1 -1
  18. package/dist/extensions/builtin/AGENT.md +115 -115
  19. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  20. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  91. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  92. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  93. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  94. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  95. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  96. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  97. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  98. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  99. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  100. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  101. package/dist/extensions/builtin/browser/browser.md +73 -73
  102. package/dist/extensions/builtin/browser/install.md +142 -142
  103. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  104. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  105. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  107. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  108. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  109. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  110. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  111. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  112. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  113. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  114. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  115. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  116. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  117. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  118. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  119. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  120. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  121. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  122. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  123. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  124. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  125. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  126. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  127. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  128. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  129. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  130. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  131. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  132. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  133. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  134. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  135. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  136. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  137. package/dist/extensions/builtin/goal/README.md +67 -67
  138. package/dist/extensions/builtin/grub/README.md +112 -112
  139. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  140. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  141. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  142. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  143. package/dist/extensions/builtin/link-world/network-routing/network-routing.md +67 -67
  144. package/dist/extensions/builtin/loop/README.md +92 -92
  145. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  146. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  147. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  148. package/dist/extensions/builtin/sal/README.md +72 -72
  149. package/dist/extensions/builtin/security-audit/README.md +289 -289
  150. package/dist/extensions/builtin/team/AGENT.md +112 -112
  151. package/dist/extensions/builtin/team/TESTING.md +299 -299
  152. package/dist/extensions/builtin/token-save/README.md +56 -56
  153. package/dist/extensions/optional/AGENT.md +10 -10
  154. package/dist/modes/interactive/interactive-mode.js +36 -36
  155. package/dist/modes/interactive/theme/dark.json +85 -85
  156. package/dist/modes/interactive/theme/light.json +84 -84
  157. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  158. package/dist/modes/interactive/theme/warm.json +81 -81
  159. package/dist/node_modules/@pencil-agent/agent-core/dist/agent-loop.js +3 -2
  160. package/dist/node_modules/@pencil-agent/agent-core/dist/structured-adaptive-agent-loop.js +2 -1
  161. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  162. package/docs/cc-agent-design.md +1297 -0
  163. package/docs/cc-tui-design.md +1333 -0
  164. package/docs/codex-goal-command-impl.md +1055 -1055
  165. package/docs/codex-goal-vs-grub.md +500 -500
  166. package/docs/custom-provider.md +27 -27
  167. package/docs/extensions.md +27 -27
  168. package/docs/keybindings.md +27 -27
  169. package/docs/loop 重构完成总结.md +250 -250
  170. package/docs/loop 重构完成报告.md +122 -122
  171. package/docs/loop 重构方案.md +1222 -1222
  172. package/docs/loop 重构方案实现报告.md +158 -158
  173. package/docs/loop 重构方案对比分析.md +128 -128
  174. package/docs/loop 重构计划.md +320 -320
  175. package/docs/loop-usage-examples.md +214 -214
  176. package/docs/models.md +27 -27
  177. package/docs/nanoPencil-学习计划.md +170 -0
  178. package/docs/packages.md +27 -27
  179. package/docs/pi-design-philosophy.md +457 -457
  180. package/docs/planmode.md +1987 -1987
  181. package/docs/prompt-templates.md +27 -27
  182. package/docs/providers.md +27 -27
  183. package/docs/scan-report.md +3820 -0
  184. package/docs/sdk.md +27 -27
  185. package/docs/skills.md +27 -27
  186. package/docs/themes.md +27 -27
  187. package/docs/tui.md +27 -27
  188. package/docs//345/257/271/346/240/207Claude-Code.md +1775 -0
  189. package/docs//351/230/277/351/207/214/345/267/264/345/267/264/350/264/242/346/212/245/345/210/206/346/236/220/344/271/246.md +261 -0
  190. package/package.json +190 -190
  191. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +0 -851
  192. package/docs/SDK-TESTING.md +0 -364
  193. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +0 -593
  194. package/docs/startup-performance-optimization.md +0 -301
  195. package/docs/认知地图.md +0 -47
@@ -1,360 +1,360 @@
1
- # Coursera — Course & Catalog Data Extraction
2
-
3
- Field-tested against coursera.org and api.coursera.org on 2026-04-18.
4
- No authentication required for the public catalog API.
5
-
6
- ## TL;DR — Fastest Approach
7
-
8
- Use `http_get` against `api.coursera.org`. The public REST API returns clean JSON with no
9
- auth, no bot-detection, and sub-600ms latency. Use `q=search` with a keyword
10
- only when you need full-text search (requires a browser POST workaround — see below).
11
- For bulk enumeration, iterate the catalog list with `start` pagination.
12
-
13
- ---
14
-
15
- ## 1. Catalog List (http_get — always works)
16
-
17
- The default list query (`q=list` implied) returns ALL courses in Coursera's catalog —
18
- 20,659 as of the test date.
19
-
20
- ```python
21
- from helpers import http_get
22
- import json
23
-
24
- resp = http_get(
25
- "https://api.coursera.org/api/courses.v1"
26
- "?fields=name,slug,description,primaryLanguages,workload,"
27
- "partnerIds,courseType,instructorIds,domainTypes,photoUrl,certificates"
28
- "&limit=100&start=0"
29
- )
30
- data = json.loads(resp)
31
- courses = data["elements"] # list of dicts
32
- next_start = data["paging"].get("next") # e.g. "100", None when exhausted
33
- total = data["paging"].get("total") # 20659
34
- ```
35
-
36
- ### Response structure (confirmed field names)
37
-
38
- ```json
39
- {
40
- "courseType": "v2.ondemand",
41
- "description": "Gamification is the application of game elements...",
42
- "domainTypes": [
43
- {"domainId": "computer-science", "subdomainId": "design-and-product"},
44
- {"domainId": "business", "subdomainId": "marketing"}
45
- ],
46
- "photoUrl": "https://d3njjcbhbojbot.cloudfront.net/api/utilities/v1/imageproxy/https://coursera-course-photos.s3.amazonaws.com/...",
47
- "id": "69Bku0KoEeWZtA4u62x6lQ",
48
- "slug": "gamification",
49
- "instructorIds": ["226710"],
50
- "specializations": [],
51
- "workload": "4-8 hours/week",
52
- "primaryLanguages": ["en"],
53
- "partnerIds": ["6"],
54
- "certificates": ["VerifiedCert"],
55
- "name": "Gamification"
56
- }
57
- ```
58
-
59
- Field notes:
60
- - `id` — opaque base64-ish string, stable identifier. Use for batch lookups and linking.
61
- - `slug` — URL-safe identifier. Course page: `https://www.coursera.org/learn/{slug}`
62
- - `courseType` — always `"v2.ondemand"` for self-paced courses in practice.
63
- - `workload` — free-text string, e.g. `"4-8 hours/week"`, `"1 hour 30 minutes"`, `"4 weeks of study, 1-2 hours/week"`. Not normalized.
64
- - `primaryLanguages` — ISO 639-1 list, e.g. `["en"]`, `["fr"]`.
65
- - `partnerIds` — list of partner (university/org) IDs. Join to `partners.v1` by id.
66
- - `instructorIds` — list of instructor IDs. Join to `instructors.v1` by id.
67
- - `domainTypes` — list of `{domainId, subdomainId}` objects. Domain IDs include `"data-science"`, `"computer-science"`, `"business"`, `"information-technology"`.
68
- - `certificates` — list of cert types, typically `["VerifiedCert"]`.
69
- - `photoUrl` — direct CDN URL to course image. Works without auth.
70
- - `specializations` — list of specialization IDs this course belongs to (often empty; not always populated here — use `onDemandSpecializations.v1` instead).
71
- - `previewLink` — field exists but was empty in all tested records; skip it.
72
- - `avgRating` — field does NOT appear in public API responses; not available.
73
-
74
- ### Pagination
75
-
76
- ```python
77
- def iter_all_courses(fields=None, page_size=100):
78
- base_fields = "name,slug,description,primaryLanguages,workload,partnerIds,courseType,domainTypes,photoUrl"
79
- if fields:
80
- base_fields = fields
81
- start = 0
82
- while True:
83
- url = (
84
- f"https://api.coursera.org/api/courses.v1"
85
- f"?fields={base_fields}&limit={page_size}&start={start}"
86
- )
87
- data = json.loads(http_get(url))
88
- yield from data["elements"]
89
- nxt = data["paging"].get("next")
90
- if nxt is None:
91
- break
92
- start = int(nxt)
93
- ```
94
-
95
- - `paging.next` is a string offset (e.g. `"100"`), or absent when exhausted.
96
- - `paging.total` is present on the first page (e.g. `20659`) but absent on subsequent pages.
97
- - `limit` up to at least 1000 works (tested: 1000 returned 1000 items). Use 100–500 for safe batches.
98
-
99
- ---
100
-
101
- ## 2. Partners API (http_get — works)
102
-
103
- 422 partners (universities, companies) as of test date.
104
-
105
- ```python
106
- resp = http_get(
107
- "https://api.coursera.org/api/partners.v1"
108
- "?fields=name,squareLogo,description,shortName&limit=50&start=0"
109
- )
110
- data = json.loads(resp)
111
- partners = data["elements"]
112
- # paging.next and paging.total follow same structure as courses
113
- ```
114
-
115
- ### Partner record structure
116
-
117
- ```json
118
- {
119
- "id": "6",
120
- "name": "University of Pennsylvania",
121
- "shortName": "penn",
122
- "description": "The University of Pennsylvania (commonly referred to as Penn)...",
123
- "squareLogo": "http://coursera-university-assets.s3.amazonaws.com/.../logo.png"
124
- }
125
- ```
126
-
127
- ### Partner by ID (with courseIds)
128
-
129
- ```python
130
- resp = http_get(
131
- "https://api.coursera.org/api/partners.v1"
132
- "?ids=6&fields=name,squareLogo,description,shortName,courseIds"
133
- )
134
- data = json.loads(resp)
135
- partner = data["elements"][0]
136
- # partner["courseIds"] is a list of course ID strings (150+ for large universities)
137
- ```
138
-
139
- ---
140
-
141
- ## 3. Specializations API (http_get — works)
142
-
143
- ```python
144
- resp = http_get(
145
- "https://api.coursera.org/api/onDemandSpecializations.v1"
146
- "?fields=name,slug,description,partnerIds,courseIds,tagline&limit=100&start=0"
147
- )
148
- data = json.loads(resp)
149
- specs = data["elements"]
150
- ```
151
-
152
- ### Specialization record structure
153
-
154
- ```json
155
- {
156
- "id": "AbCdEfGhIjKl",
157
- "name": "SIEM Splunk",
158
- "slug": "siem-splunk",
159
- "tagline": "Learn SIEM fundamentals with Splunk",
160
- "description": "Course Overview:\n\nIn the \"SIEM Splunk\" specialization course...",
161
- "partnerIds": ["1441"],
162
- "courseIds": ["pu2XQCuEEe6qTBJCf71DPw", "Xc46mVFkEe6a4wrvTcwXPw", "YH1ok1FXEe62cBI5JZME2w"]
163
- }
164
- ```
165
-
166
- Note: Specializations paging does NOT include `paging.total` — iterate until `paging.next` is absent.
167
-
168
- ---
169
-
170
- ## 4. Instructors API (http_get — works)
171
-
172
- Only useful for lookups by ID (from course `instructorIds`). The plain list endpoint
173
- returns many empty records (empty name/bio).
174
-
175
- ```python
176
- # Lookup specific instructors by ID
177
- resp = http_get(
178
- "https://api.coursera.org/api/instructors.v1"
179
- "?ids=226710&fields=fullName,bio,department,title,photo"
180
- )
181
- data = json.loads(resp)
182
- instructor = data["elements"][0]
183
- ```
184
-
185
- ### Instructor record structure
186
-
187
- ```json
188
- {
189
- "id": "226710",
190
- "fullName": "Kevin Werbach",
191
- "title": "Professor of Legal Studies and Business Ethics",
192
- "department": "Legal Studies and Business Ethics",
193
- "bio": "Kevin Werbach is professor of Legal Studies...",
194
- "photo": "https://d3njjcbhbojbot.cloudfront.net/api/utilities/v1/imageproxy/..."
195
- }
196
- ```
197
-
198
- ---
199
-
200
- ## 5. Batch ID Lookup
201
-
202
- Fetch multiple courses (or partners/instructors) in one request by passing a comma-separated `ids` list:
203
-
204
- ```python
205
- ids = ",".join(["69Bku0KoEeWZtA4u62x6lQ", "hOzhxVNuEfCW8Q55q1kSNQ", "0HiU7Oe4EeWTAQ4yevf_oQ"])
206
- resp = http_get(
207
- f"https://api.coursera.org/api/courses.v1"
208
- f"?ids={ids}&fields=name,slug,description,primaryLanguages,workload,partnerIds"
209
- )
210
- data = json.loads(resp)
211
- # data["elements"] has exactly the courses you asked for
212
- ```
213
-
214
- No observed limit on the number of IDs per request in testing (tried up to 3).
215
-
216
- ---
217
-
218
- ## 6. Keyword Search — BLOCKED for GET (405)
219
-
220
- `q=search&query=...` returns **HTTP 405 Method Not Allowed** on GET.
221
- This applies to all three resource types:
222
- - `courses.v1?q=search&query=python` → 405
223
- - `onDemandSpecializations.v1?q=search&query=data+science` → 405
224
- - `partners.v1?q=search&query=stanford` → 405
225
-
226
- The search endpoint requires a POST request (Coursera's public Autocomplete/Search
227
- service). For keyword-based discovery without a browser, use the catalog list and filter
228
- client-side, or use the browser approach below.
229
-
230
- ### Browser fallback for keyword search
231
-
232
- ```python
233
- new_tab("https://www.coursera.org/search?query=machine+learning")
234
- wait_for_load()
235
- wait(3) # Results load asynchronously via React
236
- capture_screenshot()
237
- ```
238
-
239
- Note: The search results page (`/search?query=...`) is a client-rendered React app. The
240
- HTML returned by `http_get` does NOT contain course cards — it's a bare shell with no
241
- `__NEXT_DATA__` or embedded JSON. A live browser is required to see rendered results.
242
-
243
- ---
244
-
245
- ## 7. Course Detail HTML Page (http_get — works, limited data)
246
-
247
- ```python
248
- html = http_get("https://www.coursera.org/learn/machine-learning")
249
- # html is ~980KB of server-rendered HTML (no NEXT_DATA, no Apollo state)
250
- ```
251
-
252
- The course detail page IS served as full HTML (no JS-gate), but contains very
253
- little machine-readable course data. What you can extract:
254
-
255
- ```python
256
- import re, json
257
-
258
- # Page title (includes course name)
259
- title = re.search(r'<title[^>]*>(.*?)</title>', html).group(1)
260
- # "Supervised Machine Learning: Regression and Classification | Coursera"
261
-
262
- # JSON-LD blocks (2 present)
263
- jsonld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
264
- # Block 0: FAQPage schema (common Q&A about how courses work)
265
- # Block 1: BreadcrumbList (category path, e.g. Browse > Data Science > Machine Learning)
266
- faq = json.loads(jsonld_blocks[0]) # {"@type": "FAQPage", "mainEntity": [...]}
267
- crumb = json.loads(jsonld_blocks[1]) # {"@type": "BreadcrumbList", "itemListElement": [...]}
268
-
269
- # Extract breadcrumb categories
270
- categories = [item["item"]["name"] for item in crumb["@graph"][0]["itemListElement"]]
271
- # e.g. ["Browse", "Data Science", "Machine Learning"]
272
- ```
273
-
274
- The HTML does NOT embed: description, rating, instructor names, enrollment count,
275
- price, or any course-specific metadata as machine-readable fields.
276
- Use the API (`courses.v1?ids=...`) to get those from the slug.
277
-
278
- ### Slug-to-ID lookup pattern
279
-
280
- ```python
281
- # Get course data from slug (need ID first — get it from catalog or search)
282
- # Pattern: enumerate catalog, match by slug
283
- resp = http_get("https://api.coursera.org/api/courses.v1?fields=name,slug,description&limit=100&start=0")
284
- data = json.loads(resp)
285
- by_slug = {el["slug"]: el for el in data["elements"]}
286
- course = by_slug.get("machine-learning")
287
- ```
288
-
289
- ---
290
-
291
- ## Endpoints Summary
292
-
293
- | Endpoint | Method | Result |
294
- |---|---|---|
295
- | `courses.v1` (list) | GET | 200 OK — full catalog, 20,659 courses |
296
- | `courses.v1?ids=...` | GET | 200 OK — batch lookup by ID |
297
- | `courses.v1?q=search&query=...` | GET | **405 Method Not Allowed** |
298
- | `partners.v1` (list) | GET | 200 OK — 422 partners |
299
- | `partners.v1?ids=...` | GET | 200 OK — with courseIds |
300
- | `partners.v1?q=search&query=...` | GET | **405 Method Not Allowed** |
301
- | `onDemandSpecializations.v1` (list) | GET | 200 OK — paginated (no total) |
302
- | `onDemandSpecializations.v1?q=search&query=...` | GET | **405 Method Not Allowed** |
303
- | `instructors.v1?ids=...` | GET | 200 OK — rich records by ID |
304
- | `instructors.v1` (list) | GET | 200 OK — mostly empty records |
305
- | `degrees.v1` | GET | 403 Forbidden |
306
- | `/search?query=...` page HTML | GET | 200 OK — React shell only, no data |
307
- | `/learn/{slug}` page HTML | GET | 200 OK — HTML with JSON-LD breadcrumb only |
308
-
309
- ---
310
-
311
- ## Rate Limits
312
-
313
- No rate limiting observed in testing:
314
- - 5 consecutive requests with no delay: all succeeded, avg 0.55s each.
315
- - No `X-RateLimit-*` or `Retry-After` headers in responses.
316
- - No auth headers needed for any working endpoint.
317
-
318
- Response headers that are present: `X-Coursera-Request-Id`, `X-Coursera-Trace-Id-Hex`,
319
- `x-envoy-upstream-service-time`. No rate-limit indicators.
320
-
321
- Use a small delay (0.5s) between requests if doing bulk enumeration of the full 20K+
322
- catalog as a courtesy, but no hard cap was observed.
323
-
324
- ---
325
-
326
- ## Gotchas
327
-
328
- - **`q=search` is POST-only**: All three resource types (courses, specializations,
329
- partners) return 405 on GET when `q=search` is added. There is no documented public
330
- POST endpoint. For keyword filtering, enumerate the catalog and filter client-side.
331
-
332
- - **`paging.total` absent after page 1**: Only the first page response includes
333
- `paging.total`. Subsequent pages have only `paging.next`. Check for the `"next"` key
334
- being absent to detect end-of-list.
335
-
336
- - **Specializations never include `paging.total`**: The `onDemandSpecializations.v1`
337
- endpoint never returns `paging.total` in any page. Iterate until `"next"` is absent.
338
-
339
- - **`workload` is free-text, unnormalized**: Values include `"4-8 hours/week"`,
340
- `"1 hour 30 minutes"`, `"4 weeks of study, 1-2 hours/week"`. Do not parse as a number
341
- without normalization logic.
342
-
343
- - **`instructors.v1` list returns empty records**: The plain list endpoint returns many
344
- instructors with empty `fullName`, `bio`, `title`. Always look up by `ids=` using
345
- IDs from course records.
346
-
347
- - **`degrees.v1` is 403**: Degree programs are not accessible via the public API.
348
-
349
- - **HTML pages contain no embedded course data**: Both the search page and the course
350
- detail page are React-rendered. `http_get` on `/search?query=...` returns an HTML
351
- shell with no course listings. `http_get` on `/learn/{slug}` returns HTML with only
352
- a FAQ JSON-LD and a breadcrumb JSON-LD — no course description, rating, price, or
353
- enrollment data as machine-readable fields.
354
-
355
- - **`linked` resources don't populate**: Passing `includes=partners.v1` to the courses
356
- endpoint returns an empty `linked: {}` object. Cross-resource joins require separate
357
- requests by IDs.
358
-
359
- - **`previewLink` and `avgRating` fields**: These field names are accepted without error
360
- but return no data in the response objects. Do not request them.
1
+ # Coursera — Course & Catalog Data Extraction
2
+
3
+ Field-tested against coursera.org and api.coursera.org on 2026-04-18.
4
+ No authentication required for the public catalog API.
5
+
6
+ ## TL;DR — Fastest Approach
7
+
8
+ Use `http_get` against `api.coursera.org`. The public REST API returns clean JSON with no
9
+ auth, no bot-detection, and sub-600ms latency. Keyword search via `q=search` returns 405
10
+ on GET; for full-text search, use the browser fallback (section 6) or filter client-side.
11
+ For bulk enumeration, iterate the catalog list with `start` pagination.
12
+
13
+ ---
14
+
15
+ ## 1. Catalog List (http_get — always works)
16
+
17
+ The default list query (`q=list` implied) returns ALL courses in Coursera's catalog —
18
+ 20,659 as of the test date.
19
+
20
+ ```python
21
+ from helpers import http_get
22
+ import json
23
+
24
+ resp = http_get(
25
+ "https://api.coursera.org/api/courses.v1"
26
+ "?fields=name,slug,description,primaryLanguages,workload,"
27
+ "partnerIds,courseType,instructorIds,domainTypes,photoUrl,certificates"
28
+ "&limit=100&start=0"
29
+ )
30
+ data = json.loads(resp)
31
+ courses = data["elements"] # list of dicts
32
+ next_start = data["paging"].get("next") # e.g. "100", None when exhausted
33
+ total = data["paging"].get("total") # 20659
34
+ ```
35
+
36
+ ### Response structure (confirmed field names)
37
+
38
+ ```json
39
+ {
40
+ "courseType": "v2.ondemand",
41
+ "description": "Gamification is the application of game elements...",
42
+ "domainTypes": [
43
+ {"domainId": "computer-science", "subdomainId": "design-and-product"},
44
+ {"domainId": "business", "subdomainId": "marketing"}
45
+ ],
46
+ "photoUrl": "https://d3njjcbhbojbot.cloudfront.net/api/utilities/v1/imageproxy/https://coursera-course-photos.s3.amazonaws.com/...",
47
+ "id": "69Bku0KoEeWZtA4u62x6lQ",
48
+ "slug": "gamification",
49
+ "instructorIds": ["226710"],
50
+ "specializations": [],
51
+ "workload": "4-8 hours/week",
52
+ "primaryLanguages": ["en"],
53
+ "partnerIds": ["6"],
54
+ "certificates": ["VerifiedCert"],
55
+ "name": "Gamification"
56
+ }
57
+ ```
58
+
59
+ Field notes:
60
+ - `id` — opaque base64-ish string, stable identifier. Use for batch lookups and linking.
61
+ - `slug` — URL-safe identifier. Course page: `https://www.coursera.org/learn/{slug}`
62
+ - `courseType` — always `"v2.ondemand"` for self-paced courses in practice.
63
+ - `workload` — free-text string, e.g. `"4-8 hours/week"`, `"1 hour 30 minutes"`, `"4 weeks of study, 1-2 hours/week"`. Not normalized.
64
+ - `primaryLanguages` — ISO 639-1 list, e.g. `["en"]`, `["fr"]`.
65
+ - `partnerIds` — list of partner (university/org) IDs. Join to `partners.v1` by id.
66
+ - `instructorIds` — list of instructor IDs. Join to `instructors.v1` by id.
67
+ - `domainTypes` — list of `{domainId, subdomainId}` objects. Domain IDs include `"data-science"`, `"computer-science"`, `"business"`, `"information-technology"`.
68
+ - `certificates` — list of cert types, typically `["VerifiedCert"]`.
69
+ - `photoUrl` — direct CDN URL to course image. Works without auth.
70
+ - `specializations` — list of specialization IDs this course belongs to (often empty; not always populated here — use `onDemandSpecializations.v1` instead).
71
+ - `previewLink` — field exists but was empty in all tested records; skip it.
72
+ - `avgRating` — field does NOT appear in public API responses; not available.
73
+
74
+ ### Pagination
75
+
76
+ ```python
77
+ def iter_all_courses(fields=None, page_size=100):
78
+     base_fields = "name,slug,description,primaryLanguages,workload,partnerIds,courseType,domainTypes,photoUrl"
79
+     if fields:
80
+         base_fields = fields
81
+     start = 0
82
+     while True:
83
+         url = (
84
+             f"https://api.coursera.org/api/courses.v1"
85
+             f"?fields={base_fields}&limit={page_size}&start={start}"
86
+         )
87
+         data = json.loads(http_get(url))
88
+         yield from data["elements"]
89
+         nxt = data["paging"].get("next")
90
+         if nxt is None:
91
+             break
92
+         start = int(nxt)
93
+ ```
94
+
95
+ - `paging.next` is a string offset (e.g. `"100"`), or absent when exhausted.
96
+ - `paging.total` is present on the first page (e.g. `20659`) but absent on subsequent pages.
97
+ - `limit` up to at least 1000 works (tested: 1000 returned 1000 items). Use 100–500 for safe batches.
98
+
99
+ ---
100
+
101
+ ## 2. Partners API (http_get — works)
102
+
103
+ 422 partners (universities, companies) as of test date.
104
+
105
+ ```python
106
+ resp = http_get(
107
+ "https://api.coursera.org/api/partners.v1"
108
+ "?fields=name,squareLogo,description,shortName&limit=50&start=0"
109
+ )
110
+ data = json.loads(resp)
111
+ partners = data["elements"]
112
+ # paging.next and paging.total follow same structure as courses
113
+ ```
114
+
115
+ ### Partner record structure
116
+
117
+ ```json
118
+ {
119
+ "id": "6",
120
+ "name": "University of Pennsylvania",
121
+ "shortName": "penn",
122
+ "description": "The University of Pennsylvania (commonly referred to as Penn)...",
123
+ "squareLogo": "http://coursera-university-assets.s3.amazonaws.com/.../logo.png"
124
+ }
125
+ ```
126
+
127
+ ### Partner by ID (with courseIds)
128
+
129
+ ```python
130
+ resp = http_get(
131
+ "https://api.coursera.org/api/partners.v1"
132
+ "?ids=6&fields=name,squareLogo,description,shortName,courseIds"
133
+ )
134
+ data = json.loads(resp)
135
+ partner = data["elements"][0]
136
+ # partner["courseIds"] is a list of course ID strings (150+ for large universities)
137
+ ```
138
+
139
+ ---
140
+
141
+ ## 3. Specializations API (http_get — works)
142
+
143
+ ```python
144
+ resp = http_get(
145
+ "https://api.coursera.org/api/onDemandSpecializations.v1"
146
+ "?fields=name,slug,description,partnerIds,courseIds,tagline&limit=100&start=0"
147
+ )
148
+ data = json.loads(resp)
149
+ specs = data["elements"]
150
+ ```
151
+
152
+ ### Specialization record structure
153
+
154
+ ```json
155
+ {
156
+ "id": "AbCdEfGhIjKl",
157
+ "name": "SIEM Splunk",
158
+ "slug": "siem-splunk",
159
+ "tagline": "Learn SIEM fundamentals with Splunk",
160
+ "description": "Course Overview:\n\nIn the \"SIEM Splunk\" specialization course...",
161
+ "partnerIds": ["1441"],
162
+ "courseIds": ["pu2XQCuEEe6qTBJCf71DPw", "Xc46mVFkEe6a4wrvTcwXPw", "YH1ok1FXEe62cBI5JZME2w"]
163
+ }
164
+ ```
165
+
166
+ Note: Specializations paging does NOT include `paging.total` — iterate until `paging.next` is absent.
167
+
168
+ ---
169
+
170
+ ## 4. Instructors API (http_get — works)
171
+
172
+ Only useful for lookups by ID (from course `instructorIds`). The plain list endpoint
173
+ returns many empty records (empty name/bio).
174
+
175
+ ```python
176
+ # Lookup specific instructors by ID
177
+ resp = http_get(
178
+ "https://api.coursera.org/api/instructors.v1"
179
+ "?ids=226710&fields=fullName,bio,department,title,photo"
180
+ )
181
+ data = json.loads(resp)
182
+ instructor = data["elements"][0]
183
+ ```
184
+
185
+ ### Instructor record structure
186
+
187
+ ```json
188
+ {
189
+ "id": "226710",
190
+ "fullName": "Kevin Werbach",
191
+ "title": "Professor of Legal Studies and Business Ethics",
192
+ "department": "Legal Studies and Business Ethics",
193
+ "bio": "Kevin Werbach is professor of Legal Studies...",
194
+ "photo": "https://d3njjcbhbojbot.cloudfront.net/api/utilities/v1/imageproxy/..."
195
+ }
196
+ ```
197
+
198
+ ---
199
+
200
+ ## 5. Batch ID Lookup
201
+
202
+ Fetch multiple courses (or partners/instructors) in one request by passing a comma-separated `ids` list:
203
+
204
+ ```python
205
+ ids = ",".join(["69Bku0KoEeWZtA4u62x6lQ", "hOzhxVNuEfCW8Q55q1kSNQ", "0HiU7Oe4EeWTAQ4yevf_oQ"])
206
+ resp = http_get(
207
+ f"https://api.coursera.org/api/courses.v1"
208
+ f"?ids={ids}&fields=name,slug,description,primaryLanguages,workload,partnerIds"
209
+ )
210
+ data = json.loads(resp)
211
+ # data["elements"] has exactly the courses you asked for
212
+ ```
213
+
214
+ No limit on the number of IDs per request was hit in testing, but only batches of up to 3 IDs were tried; larger batches are unverified.
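Because batches larger than 3 IDs were not tested, one cautious option is to split a long ID list into small chunks and issue one request per chunk. The sketch below assumes a `fetch` callable that behaves like `http_get` (URL in, JSON text out). `chunked`, `lookup_by_ids`, and the default `chunk_size` are illustrative names and values, not part of the tested helpers.

```python
import json

API = "https://api.coursera.org/api"

def chunked(ids, size):
    """Yield successive lists of at most `size` IDs."""
    batch = []
    for item in ids:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def lookup_by_ids(resource, ids, fields, fetch, chunk_size=3):
    """Fetch records for `ids` from `resource` (e.g. "courses.v1"), one request per chunk.

    `fetch` is assumed to take a URL and return the JSON response body as text.
    """
    records = []
    for batch in chunked(ids, chunk_size):
        url = f"{API}/{resource}?ids={','.join(batch)}&fields={fields}"
        records.extend(json.loads(fetch(url))["elements"])
    return records
```

The same helper could serve cross-resource joins, for example passing a course's `partnerIds` with `resource="partners.v1"`.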
215
+
216
+ ---
217
+
218
+ ## 6. Keyword Search — BLOCKED for GET (405)
219
+
220
+ `q=search&query=...` returns **HTTP 405 Method Not Allowed** on GET.
221
+ This applies to all three resource types:
222
+ - `courses.v1?q=search&query=python` → 405
223
+ - `onDemandSpecializations.v1?q=search&query=data+science` → 405
224
+ - `partners.v1?q=search&query=stanford` → 405
225
+
226
+ The search endpoint requires a POST request (Coursera's public Autocomplete/Search
227
+ service). For keyword-based discovery without a browser, use the catalog list and filter
228
+ client-side, or use the browser approach below.
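Client-side filtering over the catalog list can be sketched as a case-insensitive substring match on `name` and `description`. `matches_keywords` and `filter_courses` are hypothetical helper names, and plain substring matching is an assumption that only approximates whatever ranking Coursera's own search applies.

```python
def matches_keywords(course, keywords):
    """True if every keyword occurs (case-insensitively) in the name or description."""
    haystack = f"{course.get('name') or ''} {course.get('description') or ''}".lower()
    return all(kw.lower() in haystack for kw in keywords)

def filter_courses(courses, keywords):
    """Keep only the course dicts that match all keywords."""
    return [c for c in courses if matches_keywords(c, keywords)]
```

`courses` can be any iterable of course dicts, such as the `elements` list of one page or the output of a full catalog iteration.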
229
+
230
+ ### Browser fallback for keyword search
231
+
232
+ ```python
233
+ new_tab("https://www.coursera.org/search?query=machine+learning")
234
+ wait_for_load()
235
+ wait(3) # Results load asynchronously via React
236
+ capture_screenshot()
237
+ ```
238
+
239
+ Note: The search results page (`/search?query=...`) is a client-rendered React app. The
240
+ HTML returned by `http_get` does NOT contain course cards — it's a bare shell with no
241
+ `__NEXT_DATA__` or embedded JSON. A live browser is required to see rendered results.
242
+
243
+ ---
244
+
245
+ ## 7. Course Detail HTML Page (http_get — works, limited data)
246
+
247
+ ```python
248
+ html = http_get("https://www.coursera.org/learn/machine-learning")
249
+ # html is ~980KB of server-rendered HTML (no NEXT_DATA, no Apollo state)
250
+ ```
251
+
252
+ The course detail page IS served as full HTML (no JS-gate), but contains very
253
+ little machine-readable course data. What you can extract:
254
+
255
+ ```python
256
+ import re, json
257
+
258
+ # Page title (includes course name)
259
+ title = re.search(r'<title[^>]*>(.*?)</title>', html).group(1)
260
+ # "Supervised Machine Learning: Regression and Classification | Coursera"
261
+
262
+ # JSON-LD blocks (2 present)
263
+ jsonld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
264
+ # Block 0: FAQPage schema (common Q&A about how courses work)
265
+ # Block 1: BreadcrumbList (category path, e.g. Browse > Data Science > Machine Learning)
266
+ faq = json.loads(jsonld_blocks[0]) # {"@type": "FAQPage", "mainEntity": [...]}
267
+ crumb = json.loads(jsonld_blocks[1]) # {"@type": "BreadcrumbList", "itemListElement": [...]}
268
+
269
+ # Extract breadcrumb categories
270
+ categories = [item["item"]["name"] for item in crumb.get("@graph", [crumb])[0]["itemListElement"]]
271
+ # e.g. ["Browse", "Data Science", "Machine Learning"]
272
+ ```
273
+
274
+ The HTML does NOT embed: description, rating, instructor names, enrollment count,
275
+ price, or any course-specific metadata as machine-readable fields.
276
+ Use the API (`courses.v1?ids=...`) to get those from the slug.
277
+
278
+ ### Slug-to-ID lookup pattern
279
+
280
+ ```python
281
+ # Get course data from slug (need ID first — get it from catalog or search)
282
+ # Pattern: enumerate the full catalog with iter_all_courses (section 1); one page holds only `limit` records
283
+ course = next(
284
+     (c for c in iter_all_courses(fields="name,slug,description") if c["slug"] == "machine-learning"),
285
+     None,  # None if the slug is not in the catalog
286
+ )
287
+ ```
288
+
289
+ ---
290
+
291
+ ## Endpoints Summary
292
+
293
+ | Endpoint | Method | Result |
294
+ |---|---|---|
295
+ | `courses.v1` (list) | GET | 200 OK — full catalog, 20,659 courses |
296
+ | `courses.v1?ids=...` | GET | 200 OK — batch lookup by ID |
297
+ | `courses.v1?q=search&query=...` | GET | **405 Method Not Allowed** |
298
+ | `partners.v1` (list) | GET | 200 OK — 422 partners |
299
+ | `partners.v1?ids=...` | GET | 200 OK — with courseIds |
300
+ | `partners.v1?q=search&query=...` | GET | **405 Method Not Allowed** |
301
+ | `onDemandSpecializations.v1` (list) | GET | 200 OK — paginated (no total) |
302
+ | `onDemandSpecializations.v1?q=search&query=...` | GET | **405 Method Not Allowed** |
303
+ | `instructors.v1?ids=...` | GET | 200 OK — rich records by ID |
304
+ | `instructors.v1` (list) | GET | 200 OK — mostly empty records |
305
+ | `degrees.v1` | GET | 403 Forbidden |
306
+ | `/search?query=...` page HTML | GET | 200 OK — React shell only, no data |
307
+ | `/learn/{slug}` page HTML | GET | 200 OK — HTML with FAQ and breadcrumb JSON-LD only |
308
+
309
+ ---
310
+
311
+ ## Rate Limits
312
+
313
+ No rate limiting observed in testing:
314
+ - 5 consecutive requests with no delay: all succeeded, avg 0.55s each.
315
+ - No `X-RateLimit-*` or `Retry-After` headers in responses.
316
+ - No auth headers needed for any working endpoint.
317
+
318
+ Response headers that are present: `X-Coursera-Request-Id`, `X-Coursera-Trace-Id-Hex`,
319
+ `x-envoy-upstream-service-time`. No rate-limit indicators.
320
+
321
+ Use a small delay (0.5s) between requests if doing bulk enumeration of the full 20K+
322
+ catalog as a courtesy, but no hard cap was observed.
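The courtesy delay can be applied once, at the fetch layer, instead of scattering sleeps through the enumeration code. This is a minimal sketch: `throttled` is a hypothetical wrapper, `fetch` stands in for `http_get`, and the injectable `sleep` and `clock` parameters exist only so the behavior can be checked without real waiting.

```python
import time

def throttled(fetch, delay=0.5, sleep=time.sleep, clock=time.monotonic):
    """Wrap `fetch` so consecutive calls start at least `delay` seconds apart."""
    last_call = [None]  # start time of the previous call, None before the first

    def wrapper(url):
        if last_call[0] is not None:
            remaining = delay - (clock() - last_call[0])
            if remaining > 0:
                sleep(remaining)
        last_call[0] = clock()
        return fetch(url)

    return wrapper
```

Wrapping the fetch function once, e.g. `polite_get = throttled(http_get)`, keeps the pagination loop unchanged apart from the function it calls.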
323
+
324
+ ---
325
+
326
+ ## Gotchas
327
+
328
+ - **`q=search` is POST-only**: All three resource types (courses, specializations,
329
+ partners) return 405 on GET when `q=search` is added. There is no documented public
330
+ POST endpoint. For keyword filtering, enumerate the catalog and filter client-side.
331
+
332
+ - **`paging.total` absent after page 1**: Only the first page response includes
333
+ `paging.total`. Subsequent pages have only `paging.next`. Check for the `"next"` key
334
+ being absent to detect end-of-list.
335
+
336
+ - **Specializations never include `paging.total`**: The `onDemandSpecializations.v1`
337
+ endpoint never returns `paging.total` in any page. Iterate until `"next"` is absent.
338
+
339
+ - **`workload` is free-text, unnormalized**: Values include `"4-8 hours/week"`,
340
+ `"1 hour 30 minutes"`, `"4 weeks of study, 1-2 hours/week"`. Do not parse as a number
341
+ without normalization logic.
342
+
343
+ - **`instructors.v1` list returns empty records**: The plain list endpoint returns many
344
+ instructors with empty `fullName`, `bio`, `title`. Always look up by `ids=` using
345
+ IDs from course records.
346
+
347
+ - **`degrees.v1` is 403**: Degree programs are not accessible via the public API.
348
+
349
+ - **HTML pages contain no embedded course data**: The search page is a client-rendered
350
+ React app, so `http_get` on `/search?query=...` returns an HTML shell with no course
351
+ listings. `http_get` on `/learn/{slug}` returns full server-rendered HTML, but its only
352
+ machine-readable blocks are a FAQ JSON-LD and a breadcrumb JSON-LD: no course
353
+ description, rating, price, or enrollment data.
354
+
355
+ - **`linked` resources don't populate**: Passing `includes=partners.v1` to the courses
356
+ endpoint returns an empty `linked: {}` object. Cross-resource joins require separate
357
+ requests by IDs.
358
+
359
+ - **`previewLink` and `avgRating` fields**: These field names are accepted without error
360
+ but return no data in the response objects. Do not request them.
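For the free-text `workload` field noted in the gotchas above, one possible normalization is to pull out an hours-per-week range when the string contains one and return nothing otherwise. The sketch below is an assumption about what "normalization logic" might look like. `weekly_hours` is a hypothetical helper, and it deliberately ignores total-duration strings such as `"1 hour 30 minutes"` rather than guessing a weekly figure.

```python
import re

# Matches "4-8 hours/week", "3 hours/week", "1-2 hours / week", etc.
_WEEKLY = re.compile(
    r"(\d+(?:\.\d+)?)\s*(?:-\s*(\d+(?:\.\d+)?))?\s*hours?\s*/\s*week",
    re.IGNORECASE,
)

def weekly_hours(workload):
    """Return (min_hours, max_hours) per week, or None if no 'N-M hours/week' pattern is found."""
    match = _WEEKLY.search(workload or "")
    if not match:
        return None
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) else low
    return (low, high)
```

Returning `None` for unrecognized strings keeps the caller aware that the value was not parsed, instead of silently treating it as zero.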