@pencil-agent/nano-pencil 2.0.1 → 2.0.2

Files changed (186)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/model/custom-providers.js +1 -1
  7. package/dist/core/model-registry.js +5 -5
  8. package/dist/extensions/builtin/AGENT.md +115 -115
  9. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  10. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  11. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  12. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  13. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  14. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  15. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  16. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  17. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  18. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  19. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  20. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  91. package/dist/extensions/builtin/browser/browser.md +73 -73
  92. package/dist/extensions/builtin/browser/install.md +142 -142
  93. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  94. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  95. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  96. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  97. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  98. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  99. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  100. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  101. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  102. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  103. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  104. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  105. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  107. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  108. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  109. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  110. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  111. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  112. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  113. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  114. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  115. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  116. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  117. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  118. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  119. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  120. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  121. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  122. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  123. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  124. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  125. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  126. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  127. package/dist/extensions/builtin/goal/README.md +67 -67
  128. package/dist/extensions/builtin/grub/README.md +112 -112
  129. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  130. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  131. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  132. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  133. package/dist/extensions/builtin/link-world/network-routing/network-routing.md +67 -67
  134. package/dist/extensions/builtin/loop/README.md +92 -92
  135. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  136. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  137. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  138. package/dist/extensions/builtin/sal/README.md +72 -72
  139. package/dist/extensions/builtin/security-audit/README.md +289 -289
  140. package/dist/extensions/builtin/team/AGENT.md +112 -112
  141. package/dist/extensions/builtin/team/TESTING.md +299 -299
  142. package/dist/extensions/builtin/token-save/README.md +56 -56
  143. package/dist/extensions/optional/AGENT.md +10 -10
  144. package/dist/modes/interactive/controllers/input-submit-controller.js +2 -2
  145. package/dist/modes/interactive/controllers/stream-render-controller.js +2 -2
  146. package/dist/modes/interactive/interactive-mode.js +19 -19
  147. package/dist/modes/interactive/theme/dark.json +85 -85
  148. package/dist/modes/interactive/theme/light.json +84 -84
  149. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  150. package/dist/modes/interactive/theme/warm.json +81 -81
  151. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  152. package/dist/node_modules/@pencil-agent/ai/dist/models.generated.js +1 -1
  153. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +851 -0
  154. package/docs/SDK-TESTING.md +364 -0
  155. package/docs/codex-goal-command-impl.md +1055 -1055
  156. package/docs/codex-goal-vs-grub.md +500 -500
  157. package/docs/custom-provider.md +27 -27
  158. package/docs/extensions.md +27 -27
  159. package/docs/keybindings.md +27 -27
  160. package/docs/loop 重构完成总结.md +250 -250
  161. package/docs/loop 重构完成报告.md +122 -122
  162. package/docs/loop 重构方案.md +1222 -1222
  163. package/docs/loop 重构方案实现报告.md +158 -158
  164. package/docs/loop 重构方案对比分析.md +128 -128
  165. package/docs/loop 重构计划.md +320 -320
  166. package/docs/loop-usage-examples.md +214 -214
  167. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +593 -0
  168. package/docs/models.md +27 -27
  169. package/docs/packages.md +27 -27
  170. package/docs/pi-design-philosophy.md +457 -457
  171. package/docs/planmode.md +1987 -1987
  172. package/docs/prompt-templates.md +27 -27
  173. package/docs/providers.md +27 -27
  174. package/docs/sdk.md +27 -27
  175. package/docs/skills.md +27 -27
  176. package/docs/startup-performance-optimization.md +301 -0
  177. package/docs/themes.md +27 -27
  178. package/docs/tui.md +27 -27
  179. package/docs/认知地图.md +47 -0
  180. package/package.json +190 -190
  181. package/docs/cc-agent-design.md +0 -1297
  182. package/docs/cc-tui-design.md +0 -1333
  183. package/docs/nanoPencil-/345/255/246/344/271/240/350/256/241/345/210/222.md +0 -170
  184. package/docs/scan-report.md +0 -3820
  185. package/docs//345/257/271/346/240/207Claude-Code.md +0 -1775
  186. package/docs//351/230/277/351/207/214/345/267/264/345/267/264/350/264/242/346/212/245/345/210/206/346/236/220/344/271/246.md +0 -261
package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md
@@ -1,360 +1,360 @@
1
- # Coursera — Course & Catalog Data Extraction
2
-
3
- Field-tested against coursera.org and api.coursera.org on 2026-04-18.
4
- No authentication required for the public catalog API.
5
-
6
- ## TL;DR — Fastest Approach
7
-
8
- Use `http_get` against `api.coursera.org`. The public REST API returns clean JSON with no
9
- auth, no bot-detection, and sub-600ms latency. Use `q=search` with a keyword
10
- only when you need full-text search (requires a browser POST workaround — see below).
11
- For bulk enumeration, iterate the catalog list with `start` pagination.
12
-
13
- ---
14
-
15
- ## 1. Catalog List (http_get — always works)
16
-
17
- The default list query (`q=list` implied) returns ALL courses in Coursera's catalog —
18
- 20,659 as of the test date.
19
-
20
- ```python
21
- from helpers import http_get
22
- import json
23
-
24
- resp = http_get(
25
- "https://api.coursera.org/api/courses.v1"
26
- "?fields=name,slug,description,primaryLanguages,workload,"
27
- "partnerIds,courseType,instructorIds,domainTypes,photoUrl,certificates"
28
- "&limit=100&start=0"
29
- )
30
- data = json.loads(resp)
31
- courses = data["elements"] # list of dicts
32
- next_start = data["paging"].get("next") # e.g. "100", None when exhausted
33
- total = data["paging"].get("total") # 20659
34
- ```
35
-
36
- ### Response structure (confirmed field names)
37
-
38
- ```json
39
- {
40
- "courseType": "v2.ondemand",
41
- "description": "Gamification is the application of game elements...",
42
- "domainTypes": [
43
- {"domainId": "computer-science", "subdomainId": "design-and-product"},
44
- {"domainId": "business", "subdomainId": "marketing"}
45
- ],
46
- "photoUrl": "https://d3njjcbhbojbot.cloudfront.net/api/utilities/v1/imageproxy/https://coursera-course-photos.s3.amazonaws.com/...",
47
- "id": "69Bku0KoEeWZtA4u62x6lQ",
48
- "slug": "gamification",
49
- "instructorIds": ["226710"],
50
- "specializations": [],
51
- "workload": "4-8 hours/week",
52
- "primaryLanguages": ["en"],
53
- "partnerIds": ["6"],
54
- "certificates": ["VerifiedCert"],
55
- "name": "Gamification"
56
- }
57
- ```
58
-
59
- Field notes:
60
- - `id` — opaque base64-ish string, stable identifier. Use for batch lookups and linking.
61
- - `slug` — URL-safe identifier. Course page: `https://www.coursera.org/learn/{slug}`
62
- - `courseType` — always `"v2.ondemand"` for self-paced courses in practice.
63
- - `workload` — free-text string, e.g. `"4-8 hours/week"`, `"1 hour 30 minutes"`, `"4 weeks of study, 1-2 hours/week"`. Not normalized.
64
- - `primaryLanguages` — ISO 639-1 list, e.g. `["en"]`, `["fr"]`.
65
- - `partnerIds` — list of partner (university/org) IDs. Join to `partners.v1` by id.
66
- - `instructorIds` — list of instructor IDs. Join to `instructors.v1` by id.
67
- - `domainTypes` — list of `{domainId, subdomainId}` objects. Domain IDs include `"data-science"`, `"computer-science"`, `"business"`, `"information-technology"`.
68
- - `certificates` — list of cert types, typically `["VerifiedCert"]`.
69
- - `photoUrl` — direct CDN URL to course image. Works without auth.
70
- - `specializations` — list of specialization IDs this course belongs to (often empty; not always populated here — use `onDemandSpecializations.v1` instead).
71
- - `previewLink` — field exists but was empty in all tested records; skip it.
72
- - `avgRating` — field does NOT appear in public API responses; not available.
73
-
74
- ### Pagination
75
-
76
- ```python
77
- def iter_all_courses(fields=None, page_size=100):
78
- base_fields = "name,slug,description,primaryLanguages,workload,partnerIds,courseType,domainTypes,photoUrl"
79
- if fields:
80
- base_fields = fields
81
- start = 0
82
- while True:
83
- url = (
84
- f"https://api.coursera.org/api/courses.v1"
85
- f"?fields={base_fields}&limit={page_size}&start={start}"
86
- )
87
- data = json.loads(http_get(url))
88
- yield from data["elements"]
89
- nxt = data["paging"].get("next")
90
- if nxt is None:
91
- break
92
- start = int(nxt)
93
- ```
94
-
95
- - `paging.next` is a string offset (e.g. `"100"`), or absent when exhausted.
96
- - `paging.total` is present on the first page (e.g. `20659`) but absent on subsequent pages.
97
- - `limit` up to at least 1000 works (tested: 1000 returned 1000 items). Use 100–500 for safe batches.
98
-
99
- ---
100
-
101
- ## 2. Partners API (http_get — works)
102
-
103
- 422 partners (universities, companies) as of test date.
104
-
105
- ```python
106
- resp = http_get(
107
- "https://api.coursera.org/api/partners.v1"
108
- "?fields=name,squareLogo,description,shortName&limit=50&start=0"
109
- )
110
- data = json.loads(resp)
111
- partners = data["elements"]
112
- # paging.next and paging.total follow same structure as courses
113
- ```
114
-
115
- ### Partner record structure
116
-
117
- ```json
118
- {
119
- "id": "6",
120
- "name": "University of Pennsylvania",
121
- "shortName": "penn",
122
- "description": "The University of Pennsylvania (commonly referred to as Penn)...",
123
- "squareLogo": "http://coursera-university-assets.s3.amazonaws.com/.../logo.png"
124
- }
125
- ```
126
-
127
- ### Partner by ID (with courseIds)
128
-
129
- ```python
130
- resp = http_get(
131
- "https://api.coursera.org/api/partners.v1"
132
- "?ids=6&fields=name,squareLogo,description,shortName,courseIds"
133
- )
134
- data = json.loads(resp)
135
- partner = data["elements"][0]
136
- # partner["courseIds"] is a list of course ID strings (150+ for large universities)
137
- ```
138
-
139
- ---
140
-
141
- ## 3. Specializations API (http_get — works)
142
-
143
- ```python
144
- resp = http_get(
145
- "https://api.coursera.org/api/onDemandSpecializations.v1"
146
- "?fields=name,slug,description,partnerIds,courseIds,tagline&limit=100&start=0"
147
- )
148
- data = json.loads(resp)
149
- specs = data["elements"]
150
- ```
151
-
152
- ### Specialization record structure
153
-
154
- ```json
155
- {
156
- "id": "AbCdEfGhIjKl",
157
- "name": "SIEM Splunk",
158
- "slug": "siem-splunk",
159
- "tagline": "Learn SIEM fundamentals with Splunk",
160
- "description": "Course Overview:\n\nIn the \"SIEM Splunk\" specialization course...",
161
- "partnerIds": ["1441"],
162
- "courseIds": ["pu2XQCuEEe6qTBJCf71DPw", "Xc46mVFkEe6a4wrvTcwXPw", "YH1ok1FXEe62cBI5JZME2w"]
163
- }
164
- ```
165
-
166
- Note: Specializations paging does NOT include `paging.total` — iterate until `paging.next` is absent.
167
-
168
- ---
169
-
170
- ## 4. Instructors API (http_get — works)
171
-
172
- Only useful for lookups by ID (from course `instructorIds`). The plain list endpoint
173
- returns many empty records (empty name/bio).
174
-
175
- ```python
176
- # Lookup specific instructors by ID
177
- resp = http_get(
178
- "https://api.coursera.org/api/instructors.v1"
179
- "?ids=226710&fields=fullName,bio,department,title,photo"
180
- )
181
- data = json.loads(resp)
182
- instructor = data["elements"][0]
183
- ```
184
-
185
- ### Instructor record structure
186
-
187
- ```json
188
- {
189
- "id": "226710",
190
- "fullName": "Kevin Werbach",
191
- "title": "Professor of Legal Studies and Business Ethics",
192
- "department": "Legal Studies and Business Ethics",
193
- "bio": "Kevin Werbach is professor of Legal Studies...",
194
- "photo": "https://d3njjcbhbojbot.cloudfront.net/api/utilities/v1/imageproxy/..."
195
- }
196
- ```
197
-
198
- ---
199
-
200
- ## 5. Batch ID Lookup
201
-
202
- Fetch multiple courses (or partners/instructors) in one request by passing a comma-separated `ids` list:
203
-
204
- ```python
205
- ids = ",".join(["69Bku0KoEeWZtA4u62x6lQ", "hOzhxVNuEfCW8Q55q1kSNQ", "0HiU7Oe4EeWTAQ4yevf_oQ"])
206
- resp = http_get(
207
- f"https://api.coursera.org/api/courses.v1"
208
- f"?ids={ids}&fields=name,slug,description,primaryLanguages,workload,partnerIds"
209
- )
210
- data = json.loads(resp)
211
- # data["elements"] has exactly the courses you asked for
212
- ```
213
-
214
- No observed limit on the number of IDs per request in testing (tried up to 3).
215
-
216
- ---
217
-
218
- ## 6. Keyword Search — BLOCKED for GET (405)
219
-
220
- `q=search&query=...` returns **HTTP 405 Method Not Allowed** on GET.
221
- This applies to all three resource types:
222
- - `courses.v1?q=search&query=python` → 405
223
- - `onDemandSpecializations.v1?q=search&query=data+science` → 405
224
- - `partners.v1?q=search&query=stanford` → 405
225
-
226
- The search endpoint requires a POST request (Coursera's public Autocomplete/Search
227
- service). For keyword-based discovery without a browser, use the catalog list and filter
228
- client-side, or use the browser approach below.
229
-
230
- ### Browser fallback for keyword search
231
-
232
- ```python
233
- new_tab("https://www.coursera.org/search?query=machine+learning")
234
- wait_for_load()
235
- wait(3) # Results load asynchronously via React
236
- capture_screenshot()
237
- ```
238
-
239
- Note: The search results page (`/search?query=...`) is a client-rendered React app. The
240
- HTML returned by `http_get` does NOT contain course cards — it's a bare shell with no
241
- `__NEXT_DATA__` or embedded JSON. A live browser is required to see rendered results.
242
-
243
- ---
244
-
245
- ## 7. Course Detail HTML Page (http_get — works, limited data)
246
-
247
- ```python
248
- html = http_get("https://www.coursera.org/learn/machine-learning")
249
- # html is ~980KB of server-rendered HTML (no NEXT_DATA, no Apollo state)
250
- ```
251
-
252
- The course detail page IS served as full HTML (no JS-gate), but contains very
253
- little machine-readable course data. What you can extract:
254
-
255
- ```python
256
- import re, json
257
-
258
- # Page title (includes course name)
259
- title = re.search(r'<title[^>]*>(.*?)</title>', html).group(1)
260
- # "Supervised Machine Learning: Regression and Classification | Coursera"
261
-
262
- # JSON-LD blocks (2 present)
263
- jsonld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
264
- # Block 0: FAQPage schema (common Q&A about how courses work)
265
- # Block 1: BreadcrumbList (category path, e.g. Browse > Data Science > Machine Learning)
266
- faq = json.loads(jsonld_blocks[0]) # {"@type": "FAQPage", "mainEntity": [...]}
267
- crumb = json.loads(jsonld_blocks[1]) # {"@type": "BreadcrumbList", "itemListElement": [...]}
268
-
269
- # Extract breadcrumb categories
270
- categories = [item["item"]["name"] for item in crumb["@graph"][0]["itemListElement"]]
271
- # e.g. ["Browse", "Data Science", "Machine Learning"]
272
- ```
273
-
274
- The HTML does NOT embed: description, rating, instructor names, enrollment count,
275
- price, or any course-specific metadata as machine-readable fields.
276
- Use the API (`courses.v1?ids=...`) to get those from the slug.
277
-
278
- ### Slug-to-ID lookup pattern
279
-
280
- ```python
281
- # Get course data from slug (need ID first — get it from catalog or search)
282
- # Pattern: enumerate catalog, match by slug
283
- resp = http_get("https://api.coursera.org/api/courses.v1?fields=name,slug,description&limit=100&start=0")
284
- data = json.loads(resp)
285
- by_slug = {el["slug"]: el for el in data["elements"]}
286
- course = by_slug.get("machine-learning")
287
- ```
288
-
289
- ---
290
-
291
- ## Endpoints Summary
292
-
293
- | Endpoint | Method | Result |
294
- |---|---|---|
295
- | `courses.v1` (list) | GET | 200 OK — full catalog, 20,659 courses |
296
- | `courses.v1?ids=...` | GET | 200 OK — batch lookup by ID |
297
- | `courses.v1?q=search&query=...` | GET | **405 Method Not Allowed** |
298
- | `partners.v1` (list) | GET | 200 OK — 422 partners |
299
- | `partners.v1?ids=...` | GET | 200 OK — with courseIds |
300
- | `partners.v1?q=search&query=...` | GET | **405 Method Not Allowed** |
301
- | `onDemandSpecializations.v1` (list) | GET | 200 OK — paginated (no total) |
302
- | `onDemandSpecializations.v1?q=search&query=...` | GET | **405 Method Not Allowed** |
303
- | `instructors.v1?ids=...` | GET | 200 OK — rich records by ID |
304
- | `instructors.v1` (list) | GET | 200 OK — mostly empty records |
305
- | `degrees.v1` | GET | 403 Forbidden |
306
- | `/search?query=...` page HTML | GET | 200 OK — React shell only, no data |
307
- | `/learn/{slug}` page HTML | GET | 200 OK — HTML with JSON-LD breadcrumb only |
308
-
309
- ---
310
-
311
- ## Rate Limits
312
-
313
- No rate limiting observed in testing:
314
- - 5 consecutive requests with no delay: all succeeded, avg 0.55s each.
315
- - No `X-RateLimit-*` or `Retry-After` headers in responses.
316
- - No auth headers needed for any working endpoint.
317
-
318
- Response headers that are present: `X-Coursera-Request-Id`, `X-Coursera-Trace-Id-Hex`,
319
- `x-envoy-upstream-service-time`. No rate-limit indicators.
320
-
321
- Use a small delay (0.5s) between requests if doing bulk enumeration of the full 20K+
322
- catalog as a courtesy, but no hard cap was observed.
323
-
324
- ---
325
-
326
- ## Gotchas
327
-
328
- - **`q=search` is POST-only**: All three resource types (courses, specializations,
329
- partners) return 405 on GET when `q=search` is added. There is no documented public
330
- POST endpoint. For keyword filtering, enumerate the catalog and filter client-side.
331
-
332
- - **`paging.total` absent after page 1**: Only the first page response includes
333
- `paging.total`. Subsequent pages have only `paging.next`. Check for the `"next"` key
334
- being absent to detect end-of-list.
335
-
336
- - **Specializations never include `paging.total`**: The `onDemandSpecializations.v1`
337
- endpoint never returns `paging.total` in any page. Iterate until `"next"` is absent.
338
-
339
- - **`workload` is free-text, unnormalized**: Values include `"4-8 hours/week"`,
340
- `"1 hour 30 minutes"`, `"4 weeks of study, 1-2 hours/week"`. Do not parse as a number
341
- without normalization logic.
342
-
343
- - **`instructors.v1` list returns empty records**: The plain list endpoint returns many
344
- instructors with empty `fullName`, `bio`, `title`. Always look up by `ids=` using
345
- IDs from course records.
346
-
347
- - **`degrees.v1` is 403**: Degree programs are not accessible via the public API.
348
-
349
- - **HTML pages contain no embedded course data**: Both the search page and the course
350
- detail page are React-rendered. `http_get` on `/search?query=...` returns an HTML
351
- shell with no course listings. `http_get` on `/learn/{slug}` returns HTML with only
352
- a FAQ JSON-LD and a breadcrumb JSON-LD — no course description, rating, price, or
353
- enrollment data as machine-readable fields.
354
-
355
- - **`linked` resources don't populate**: Passing `includes=partners.v1` to the courses
356
- endpoint returns an empty `linked: {}` object. Cross-resource joins require separate
357
- requests by IDs.
358
-
359
- - **`previewLink` and `avgRating` fields**: These field names are accepted without error
360
- but return no data in the response objects. Do not request them.
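The guide in this hunk recommends enumerating the catalog and filtering client-side, because `q=search` returns 405 on GET. A minimal sketch of such a filter follows, assuming course records are plain dicts with `name` and `description` keys as in the response structure shown in the guide. The function name `filter_courses` and the sample records are hypothetical and not part of the package.

```python
from typing import Iterable, Iterator


def filter_courses(courses: Iterable[dict], keyword: str) -> Iterator[dict]:
    """Yield course records whose name or description contains the keyword."""
    needle = keyword.casefold()
    for course in courses:
        # Missing or null fields are treated as empty strings.
        haystack = " ".join(
            str(course.get(field) or "") for field in ("name", "description")
        ).casefold()
        if needle in haystack:
            yield course


# Hypothetical sample records shaped like `courses.v1` elements.
sample = [
    {"name": "Gamification", "slug": "gamification", "description": "Game elements..."},
    {"name": "Intro to Python", "slug": "intro-python", "description": None},
]
matches = [course["slug"] for course in filter_courses(sample, "python")]
```

In practice the `courses` argument could be the generator returned by the guide's `iter_all_courses()`, so that pages are fetched lazily while matching records are yielded.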
1
+ # Coursera — Course & Catalog Data Extraction
2
+
3
+ Field-tested against coursera.org and api.coursera.org on 2026-04-18.
4
+ No authentication required for the public catalog API.
5
+
6
+ ## TL;DR — Fastest Approach
7
+
8
+ Use `http_get` against `api.coursera.org`. The public REST API returns clean JSON with no
9
+ auth, no bot-detection, and sub-600ms latency. `q=search` keyword queries return 405
9
+ on GET, so full-text search needs the browser fallback (see below).
11
+ For bulk enumeration, iterate the catalog list with `start` pagination.
12
+
13
+ ---
14
+
15
+ ## 1. Catalog List (http_get — always works)
16
+
17
+ The default list query (`q=list` implied) returns ALL courses in Coursera's catalog —
18
+ 20,659 as of the test date.
19
+
20
+ ```python
21
+ from helpers import http_get
22
+ import json
23
+
24
+ resp = http_get(
25
+ "https://api.coursera.org/api/courses.v1"
26
+ "?fields=name,slug,description,primaryLanguages,workload,"
27
+ "partnerIds,courseType,instructorIds,domainTypes,photoUrl,certificates"
28
+ "&limit=100&start=0"
29
+ )
30
+ data = json.loads(resp)
31
+ courses = data["elements"] # list of dicts
32
+ next_start = data["paging"].get("next") # e.g. "100", None when exhausted
33
+ total = data["paging"].get("total") # 20659
34
+ ```
35
+
36
+ ### Response structure (confirmed field names)
37
+
38
+ ```json
39
+ {
40
+ "courseType": "v2.ondemand",
41
+ "description": "Gamification is the application of game elements...",
42
+ "domainTypes": [
43
+ {"domainId": "computer-science", "subdomainId": "design-and-product"},
44
+ {"domainId": "business", "subdomainId": "marketing"}
45
+ ],
46
+ "photoUrl": "https://d3njjcbhbojbot.cloudfront.net/api/utilities/v1/imageproxy/https://coursera-course-photos.s3.amazonaws.com/...",
47
+ "id": "69Bku0KoEeWZtA4u62x6lQ",
48
+ "slug": "gamification",
49
+ "instructorIds": ["226710"],
50
+ "specializations": [],
51
+ "workload": "4-8 hours/week",
52
+ "primaryLanguages": ["en"],
53
+ "partnerIds": ["6"],
54
+ "certificates": ["VerifiedCert"],
55
+ "name": "Gamification"
56
+ }
57
+ ```
58
+
59
+ Field notes:
60
+ - `id` — opaque base64-ish string, stable identifier. Use for batch lookups and linking.
61
+ - `slug` — URL-safe identifier. Course page: `https://www.coursera.org/learn/{slug}`
62
+ - `courseType` — always `"v2.ondemand"` for self-paced courses in practice.
63
+ - `workload` — free-text string, e.g. `"4-8 hours/week"`, `"1 hour 30 minutes"`, `"4 weeks of study, 1-2 hours/week"`. Not normalized.
64
+ - `primaryLanguages` — ISO 639-1 list, e.g. `["en"]`, `["fr"]`.
65
+ - `partnerIds` — list of partner (university/org) IDs. Join to `partners.v1` by id.
66
+ - `instructorIds` — list of instructor IDs. Join to `instructors.v1` by id.
67
+ - `domainTypes` — list of `{domainId, subdomainId}` objects. Domain IDs include `"data-science"`, `"computer-science"`, `"business"`, `"information-technology"`.
68
+ - `certificates` — list of cert types, typically `["VerifiedCert"]`.
69
+ - `photoUrl` — direct CDN URL to course image. Works without auth.
70
+ - `specializations` — list of specialization IDs this course belongs to (often empty; not always populated here — use `onDemandSpecializations.v1` instead).
71
+ - `previewLink` — field exists but was empty in all tested records; skip it.
72
+ - `avgRating` — field does NOT appear in public API responses; not available.
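Because `workload` is free text, any numeric use needs a parsing step first. The sketch below is one possible approach, not something the API provides: a hypothetical `parse_weekly_hours` helper that extracts an hours-per-week range when the string contains one and returns `None` for everything else (for example `"1 hour 30 minutes"`).

```python
import re

# Hypothetical helper (not part of the Coursera API or `helpers`).
# Matches "4-8 hours/week" or "2 hours/week"; anything else yields None.
_WEEKLY = re.compile(
    r"(\d+(?:\.\d+)?)(?:\s*-\s*(\d+(?:\.\d+)?))?\s*hours?\s*/\s*week",
    re.IGNORECASE,
)

def parse_weekly_hours(workload):
    """Return (min_hours, max_hours) per week, or None if no weekly rate is found."""
    if not workload:
        return None
    m = _WEEKLY.search(workload)
    if m is None:
        return None
    low = float(m.group(1))
    high = float(m.group(2)) if m.group(2) else low
    return (low, high)
```

Returning `None` rather than guessing keeps total-duration strings from being mistaken for weekly rates.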
73
+
74
+ ### Pagination
75
+
76
+ ```python
77
+ def iter_all_courses(fields=None, page_size=100):
78
+ base_fields = "name,slug,description,primaryLanguages,workload,partnerIds,courseType,domainTypes,photoUrl"
79
+ if fields:
80
+ base_fields = fields
81
+ start = 0
82
+ while True:
83
+ url = (
84
+ f"https://api.coursera.org/api/courses.v1"
85
+ f"?fields={base_fields}&limit={page_size}&start={start}"
86
+ )
87
+ data = json.loads(http_get(url))
88
+ yield from data["elements"]
89
+ nxt = data["paging"].get("next")
90
+ if nxt is None:
91
+ break
92
+ start = int(nxt)
93
+ ```
94
+
95
+ - `paging.next` is a string offset (e.g. `"100"`), or absent when exhausted.
96
+ - `paging.total` is present on the first page (e.g. `20659`) but absent on subsequent pages.
97
+ - `limit` up to at least 1000 works (tested: 1000 returned 1000 items). Use 100–500 for safe batches.
98
+
99
+ ---
100
+
101
+ ## 2. Partners API (http_get — works)
102
+
103
+ 422 partners (universities, companies) as of test date.
104
+
105
+ ```python
106
+ resp = http_get(
107
+ "https://api.coursera.org/api/partners.v1"
108
+ "?fields=name,squareLogo,description,shortName&limit=50&start=0"
109
+ )
110
+ data = json.loads(resp)
111
+ partners = data["elements"]
112
+ # paging.next and paging.total follow same structure as courses
113
+ ```
114
+
115
+ ### Partner record structure
116
+
117
+ ```json
118
+ {
119
+ "id": "6",
120
+ "name": "University of Pennsylvania",
121
+ "shortName": "penn",
122
+ "description": "The University of Pennsylvania (commonly referred to as Penn)...",
123
+ "squareLogo": "http://coursera-university-assets.s3.amazonaws.com/.../logo.png"
124
+ }
125
+ ```
126
+
127
+ ### Partner by ID (with courseIds)
128
+
129
+ ```python
130
+ resp = http_get(
131
+ "https://api.coursera.org/api/partners.v1"
132
+ "?ids=6&fields=name,squareLogo,description,shortName,courseIds"
133
+ )
134
+ data = json.loads(resp)
135
+ partner = data["elements"][0]
136
+ # partner["courseIds"] is a list of course ID strings (150+ for large universities)
137
+ ```
138
+
139
+ ---
140
+
141
+ ## 3. Specializations API (http_get — works)
142
+
143
+ ```python
144
+ resp = http_get(
145
+ "https://api.coursera.org/api/onDemandSpecializations.v1"
146
+ "?fields=name,slug,description,partnerIds,courseIds,tagline&limit=100&start=0"
147
+ )
148
+ data = json.loads(resp)
149
+ specs = data["elements"]
150
+ ```
151
+
152
+ ### Specialization record structure
153
+
154
+ ```json
155
+ {
156
+ "id": "AbCdEfGhIjKl",
157
+ "name": "SIEM Splunk",
158
+ "slug": "siem-splunk",
159
+ "tagline": "Learn SIEM fundamentals with Splunk",
160
+ "description": "Course Overview:\n\nIn the \"SIEM Splunk\" specialization course...",
161
+ "partnerIds": ["1441"],
162
+ "courseIds": ["pu2XQCuEEe6qTBJCf71DPw", "Xc46mVFkEe6a4wrvTcwXPw", "YH1ok1FXEe62cBI5JZME2w"]
163
+ }
164
+ ```
165
+
166
+ Note: Specializations paging does NOT include `paging.total` — iterate until `paging.next` is absent.
167
+
168
+ ---
169
+
170
+ ## 4. Instructors API (http_get — works)
171
+
172
+ Only useful for lookups by ID (from course `instructorIds`). The plain list endpoint
173
+ returns many empty records (empty name/bio).
174
+
175
+ ```python
176
+ # Lookup specific instructors by ID
177
+ resp = http_get(
178
+ "https://api.coursera.org/api/instructors.v1"
179
+ "?ids=226710&fields=fullName,bio,department,title,photo"
180
+ )
181
+ data = json.loads(resp)
182
+ instructor = data["elements"][0]
183
+ ```
184
+
185
+ ### Instructor record structure
186
+
187
+ ```json
188
+ {
189
+ "id": "226710",
190
+ "fullName": "Kevin Werbach",
191
+ "title": "Professor of Legal Studies and Business Ethics",
192
+ "department": "Legal Studies and Business Ethics",
193
+ "bio": "Kevin Werbach is professor of Legal Studies...",
194
+ "photo": "https://d3njjcbhbojbot.cloudfront.net/api/utilities/v1/imageproxy/..."
195
+ }
196
+ ```
197
+
198
+ ---
199
+
200
+ ## 5. Batch ID Lookup
201
+
202
+ Fetch multiple courses (or partners/instructors) in one request by passing a comma-separated `ids` list:
203
+
204
+ ```python
205
+ ids = ",".join(["69Bku0KoEeWZtA4u62x6lQ", "hOzhxVNuEfCW8Q55q1kSNQ", "0HiU7Oe4EeWTAQ4yevf_oQ"])
206
+ resp = http_get(
207
+ f"https://api.coursera.org/api/courses.v1"
208
+ f"?ids={ids}&fields=name,slug,description,primaryLanguages,workload,partnerIds"
209
+ )
210
+ data = json.loads(resp)
211
+ # data["elements"] has exactly the courses you asked for
212
+ ```
213
+
214
+ Only batches of up to 3 IDs were tested; no limit on IDs per request was observed at that size.
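Since the upper bound on `ids` per request was not established, a cautious option is to split long ID lists into small chunks. A minimal sketch, assuming a hypothetical `batch_urls` helper and an arbitrary default chunk size of 50 that was not tested against the API:

```python
def batch_urls(ids, fields, chunk_size=50, resource="courses.v1"):
    """Yield one lookup URL per chunk of IDs.

    chunk_size=50 is an untested assumption, not a documented API limit.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    ids = list(ids)
    for i in range(0, len(ids), chunk_size):
        chunk = ",".join(ids[i:i + chunk_size])
        yield f"https://api.coursera.org/api/{resource}?ids={chunk}&fields={fields}"
```

Each yielded URL can be passed to `http_get` in turn, and the `elements` lists concatenated.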
215
+
216
+ ---
217
+
218
+ ## 6. Keyword Search — BLOCKED for GET (405)
219
+
220
+ `q=search&query=...` returns **HTTP 405 Method Not Allowed** on GET.
221
+ This applies to all three resource types:
222
+ - `courses.v1?q=search&query=python` → 405
223
+ - `onDemandSpecializations.v1?q=search&query=data+science` → 405
224
+ - `partners.v1?q=search&query=stanford` → 405
225
+
226
+ The search endpoint requires a POST request (Coursera's public Autocomplete/Search
227
+ service). For keyword-based discovery without a browser, use the catalog list and filter
228
+ client-side, or use the browser approach below.
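Client-side filtering can be as simple as a case-insensitive substring match over the text fields already fetched from the catalog list. The sketch below assumes a hypothetical `filter_courses` helper; it is a plain substring test, not a reproduction of Coursera's search ranking.

```python
def filter_courses(courses, keyword, fields=("name", "description")):
    """Yield course dicts whose selected text fields contain keyword (case-insensitive)."""
    needle = keyword.casefold()
    for course in courses:
        # Missing or null fields are treated as empty strings.
        haystack = " ".join(str(course.get(f) or "") for f in fields)
        if needle in haystack.casefold():
            yield course
```

Because it accepts any iterable of dicts, it can consume a paginated generator lazily without holding the whole catalog in memory.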
229
+
230
+ ### Browser fallback for keyword search
231
+
232
+ ```python
233
+ new_tab("https://www.coursera.org/search?query=machine+learning")
234
+ wait_for_load()
235
+ wait(3) # Results load asynchronously via React
236
+ capture_screenshot()
237
+ ```
238
+
239
+ Note: The search results page (`/search?query=...`) is a client-rendered React app. The
240
+ HTML returned by `http_get` does NOT contain course cards — it's a bare shell with no
241
+ `__NEXT_DATA__` or embedded JSON. A live browser is required to see rendered results.
242
+
243
+ ---
244
+
245
+ ## 7. Course Detail HTML Page (http_get — works, limited data)
246
+
247
+ ```python
248
+ html = http_get("https://www.coursera.org/learn/machine-learning")
249
+ # html is ~980KB of server-rendered HTML (no NEXT_DATA, no Apollo state)
250
+ ```
251
+
252
+ The course detail page IS served as full HTML (no JS-gate), but contains very
253
+ little machine-readable course data. What you can extract:
254
+
255
+ ```python
256
+ import re, json
257
+
258
+ # Page title (includes course name)
259
+ title = re.search(r'<title[^>]*>(.*?)</title>', html).group(1)
260
+ # "Supervised Machine Learning: Regression and Classification | Coursera"
261
+
262
+ # JSON-LD blocks (2 present)
263
+ jsonld_blocks = re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL)
264
+ # Block 0: FAQPage schema (common Q&A about how courses work)
265
+ # Block 1: BreadcrumbList (category path, e.g. Browse > Data Science > Machine Learning)
266
+ faq = json.loads(jsonld_blocks[0]) # {"@type": "FAQPage", "mainEntity": [...]}
267
+ crumb = json.loads(jsonld_blocks[1]) # {"@type": "BreadcrumbList", "itemListElement": [...]}
268
+
269
+ # Extract breadcrumb categories (works whether or not the list is wrapped in "@graph")
270
+ categories = [item["item"]["name"] for item in crumb.get("@graph", [crumb])[0]["itemListElement"]]
271
+ # e.g. ["Browse", "Data Science", "Machine Learning"]
272
+ ```
273
+
274
+ The HTML does NOT embed: description, rating, instructor names, enrollment count,
275
+ price, or any course-specific metadata as machine-readable fields.
276
+ Use the API (`courses.v1?ids=...`) to get those from the slug.
277
+
278
+ ### Slug-to-ID lookup pattern
279
+
280
+ ```python
281
+ # Get course data from slug (need ID first — get it from catalog or search)
282
+ # Pattern: enumerate catalog, match by slug (this example scans only the first 100 courses)
283
+ resp = http_get("https://api.coursera.org/api/courses.v1?fields=name,slug,description&limit=100&start=0")
284
+ data = json.loads(resp)
285
+ by_slug = {el["slug"]: el for el in data["elements"]}
286
+ course = by_slug.get("machine-learning")
287
+ ```
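A single page covers only 100 of the catalog's courses, so a given slug may not be in it. The sketch below keeps paging until the slug is found or the catalog is exhausted. `find_course_by_slug` and its `fetch_page` parameter are hypothetical names: `fetch_page(start)` is assumed to return the decoded JSON for one page, for example by wrapping `http_get` and `json.loads` as in the snippet above.

```python
def find_course_by_slug(slug, fetch_page):
    """Scan catalog pages until a course with this slug appears; None if exhausted."""
    start = 0
    while True:
        data = fetch_page(start)
        for course in data["elements"]:
            if course.get("slug") == slug:
                return course
        # paging.next is a string offset, absent on the last page.
        nxt = data.get("paging", {}).get("next")
        if nxt is None:
            return None
        start = int(nxt)
```

In the worst case this walks the entire catalog, so caching a slug-to-ID map after one full pass is likely cheaper for repeated lookups.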
288
+
289
+ ---
290
+
291
+ ## Endpoints Summary
292
+
293
+ | Endpoint | Method | Result |
294
+ |---|---|---|
295
+ | `courses.v1` (list) | GET | 200 OK — full catalog, 20,659 courses |
296
+ | `courses.v1?ids=...` | GET | 200 OK — batch lookup by ID |
297
+ | `courses.v1?q=search&query=...` | GET | **405 Method Not Allowed** |
298
+ | `partners.v1` (list) | GET | 200 OK — 422 partners |
299
+ | `partners.v1?ids=...` | GET | 200 OK — with courseIds |
300
+ | `partners.v1?q=search&query=...` | GET | **405 Method Not Allowed** |
301
+ | `onDemandSpecializations.v1` (list) | GET | 200 OK — paginated (no total) |
302
+ | `onDemandSpecializations.v1?q=search&query=...` | GET | **405 Method Not Allowed** |
303
+ | `instructors.v1?ids=...` | GET | 200 OK — rich records by ID |
304
+ | `instructors.v1` (list) | GET | 200 OK — mostly empty records |
305
+ | `degrees.v1` | GET | 403 Forbidden |
306
+ | `/search?query=...` page HTML | GET | 200 OK — React shell only, no data |
307
+ | `/learn/{slug}` page HTML | GET | 200 OK — HTML with FAQ and breadcrumb JSON-LD only |
308
+
309
+ ---
310
+
311
+ ## Rate Limits
312
+
313
+ No rate limiting observed in testing:
314
+ - 5 consecutive requests with no delay: all succeeded, avg 0.55s each.
315
+ - No `X-RateLimit-*` or `Retry-After` headers in responses.
316
+ - No auth headers needed for any working endpoint.
317
+
318
+ Response headers that are present: `X-Coursera-Request-Id`, `X-Coursera-Trace-Id-Hex`,
319
+ `x-envoy-upstream-service-time`. No rate-limit indicators.
320
+
321
+ Use a small delay (0.5s) between requests if doing bulk enumeration of the full 20K+
322
+ catalog as a courtesy, but no hard cap was observed.
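One way to apply the courtesy delay without scattering `sleep` calls through the code is a small generator wrapper. This is a sketch under assumptions: `throttled` is a hypothetical helper, and the 0.5s default simply mirrors the suggestion above rather than any documented limit.

```python
import time

def throttled(items, delay=0.5, sleep=time.sleep):
    """Yield items, sleeping `delay` seconds between consecutive items.

    No sleep happens before the first item. `sleep` is injectable for testing.
    """
    for i, item in enumerate(items):
        if i:
            sleep(delay)
        yield item
```

For example, `for url in throttled(urls): http_get(url)` spaces out requests over any iterable of URLs.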
323
+
324
+ ---
325
+
326
+ ## Gotchas
327
+
328
+ - **`q=search` is POST-only**: All three resource types (courses, specializations,
329
+ partners) return 405 on GET when `q=search` is added. There is no documented public
330
+ POST endpoint. For keyword filtering, enumerate the catalog and filter client-side.
331
+
332
+ - **`paging.total` absent after page 1**: Only the first page response includes
333
+ `paging.total`. Subsequent pages have only `paging.next`. Check for the `"next"` key
334
+ being absent to detect end-of-list.
335
+
336
+ - **Specializations never include `paging.total`**: The `onDemandSpecializations.v1`
337
+ endpoint never returns `paging.total` in any page. Iterate until `"next"` is absent.
338
+
339
+ - **`workload` is free-text, unnormalized**: Values include `"4-8 hours/week"`,
340
+ `"1 hour 30 minutes"`, `"4 weeks of study, 1-2 hours/week"`. Do not parse as a number
341
+ without normalization logic.
342
+
343
+ - **`instructors.v1` list returns empty records**: The plain list endpoint returns many
344
+ instructors with empty `fullName`, `bio`, `title`. Always look up by `ids=` using
345
+ IDs from course records.
346
+
347
+ - **`degrees.v1` is 403**: Degree programs are not accessible via the public API.
348
+
349
+ - **HTML pages contain no embedded course data**: The search page is a client-rendered
350
+ shell and the course detail page is server-rendered HTML. `http_get` on `/search?query=...` returns an HTML
351
+ shell with no course listings. `http_get` on `/learn/{slug}` returns HTML with only
352
+ a FAQ JSON-LD and a breadcrumb JSON-LD — no course description, rating, price, or
353
+ enrollment data as machine-readable fields.
354
+
355
+ - **`linked` resources don't populate**: Passing `includes=partners.v1` to the courses
356
+ endpoint returns an empty `linked: {}` object. Cross-resource joins require separate
357
+ requests by IDs.
358
+
359
+ - **`previewLink` and `avgRating` fields**: These field names are accepted without error
360
+ but return no data in the response objects. Do not request them.