@pencil-agent/nano-pencil 2.0.0-beta.8 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (241)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/extensions-host/index.d.ts +1 -1
  7. package/dist/core/extensions-host/loader.js +1 -1
  8. package/dist/core/extensions-host/runner.d.ts +1 -0
  9. package/dist/core/extensions-host/runner.js +2 -2
  10. package/dist/core/extensions-host/types.d.ts +17 -22
  11. package/dist/core/lib/ai/src/types.d.ts +12 -2
  12. package/dist/core/persona/persona-manager.js +5 -2
  13. package/dist/core/runtime/agent-session.js +3 -3
  14. package/dist/core/runtime/extension-core-bindings.d.ts +1 -0
  15. package/dist/core/runtime/extension-core-bindings.js +2 -2
  16. package/dist/extensions/builtin/AGENT.md +115 -115
  17. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  18. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  19. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  20. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  91. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  92. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  93. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  94. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  95. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  96. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  97. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  98. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  99. package/dist/extensions/builtin/browser/browser.md +73 -73
  100. package/dist/extensions/builtin/browser/install.md +142 -142
  101. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  102. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  103. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  104. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  105. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  107. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  108. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  109. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  110. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  111. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  112. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  113. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  114. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  115. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  116. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  117. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  118. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  119. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  120. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  121. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  122. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  123. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  124. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  125. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  126. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  127. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  128. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  129. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  130. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  131. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  132. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  133. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  134. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  135. package/dist/extensions/builtin/goal/README.md +67 -67
  136. package/dist/extensions/builtin/goal/goal-controller.d.ts +39 -10
  137. package/dist/extensions/builtin/goal/goal-controller.js +1 -1
  138. package/dist/extensions/builtin/goal/goal-format.js +1 -1
  139. package/dist/extensions/builtin/goal/goal-prompts.d.ts +2 -0
  140. package/dist/extensions/builtin/goal/goal-prompts.js +5 -4
  141. package/dist/extensions/builtin/goal/goal-store.js +1 -1
  142. package/dist/extensions/builtin/goal/index.d.ts +1 -1
  143. package/dist/extensions/builtin/goal/index.js +10 -7
  144. package/dist/extensions/builtin/grub/README.md +112 -112
  145. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  146. package/dist/extensions/builtin/link-world/index.js +6 -6
  147. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  148. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  149. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  150. package/dist/extensions/builtin/link-world/{network-routing.md → network-routing/network-routing.md} +67 -67
  151. package/dist/extensions/builtin/loop/README.md +92 -92
  152. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  153. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  154. package/dist/extensions/builtin/plan/index.js +1 -1
  155. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  156. package/dist/extensions/builtin/sal/README.md +72 -72
  157. package/dist/extensions/builtin/security-audit/README.md +289 -289
  158. package/dist/extensions/builtin/task/task-store.d.ts +4 -0
  159. package/dist/extensions/builtin/task/task-store.js +1 -1
  160. package/dist/extensions/builtin/team/AGENT.md +112 -112
  161. package/dist/extensions/builtin/team/TESTING.md +299 -299
  162. package/dist/extensions/builtin/token-save/README.md +56 -56
  163. package/dist/extensions/optional/AGENT.md +10 -10
  164. package/dist/index.d.ts +5 -30
  165. package/dist/index.js +1 -1
  166. package/dist/models.d.ts +7 -0
  167. package/dist/models.js +1 -0
  168. package/dist/modes/interactive/components/footer.js +1 -1
  169. package/dist/modes/interactive/components/task-status-panel.d.ts +36 -0
  170. package/dist/modes/interactive/components/task-status-panel.js +1 -0
  171. package/dist/modes/interactive/controllers/stream-render-controller.d.ts +7 -0
  172. package/dist/modes/interactive/controllers/stream-render-controller.js +2 -2
  173. package/dist/modes/interactive/interactive-mode.js +40 -40
  174. package/dist/modes/interactive/state/interactive-state.d.ts +2 -0
  175. package/dist/modes/interactive/state/interactive-state.js +1 -1
  176. package/dist/modes/interactive/theme/dark.json +85 -85
  177. package/dist/modes/interactive/theme/light.json +84 -84
  178. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  179. package/dist/modes/interactive/theme/warm.json +81 -81
  180. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  181. package/dist/node_modules/@pencil-agent/ai/dist/models.generated.js +1 -1
  182. package/dist/node_modules/@pencil-agent/ai/dist/providers/anthropic.js +2 -2
  183. package/dist/node_modules/@pencil-agent/ai/dist/providers/openai-completions.js +5 -5
  184. package/dist/node_modules/@pencil-agent/ai/dist/providers/openai-responses.js +1 -1
  185. package/dist/node_modules/@pencil-agent/ai/dist/stream.js +1 -1
  186. package/dist/packages/protocol/src/commands.d.ts +33 -0
  187. package/dist/packages/protocol/src/flags.d.ts +20 -0
  188. package/dist/packages/protocol/src/hooks.d.ts +17 -0
  189. package/dist/packages/protocol/src/hooks.js +0 -0
  190. package/dist/packages/{extension-sdk → protocol}/src/index.d.ts +7 -4
  191. package/dist/packages/protocol/src/index.js +1 -0
  192. package/dist/packages/{extension-sdk → protocol}/src/lifecycle.d.ts +15 -27
  193. package/dist/packages/protocol/src/lifecycle.js +0 -0
  194. package/dist/packages/{extension-sdk → protocol}/src/tools.d.ts +1 -1
  195. package/dist/packages/protocol/src/tools.js +0 -0
  196. package/dist/public-config.d.ts +12 -0
  197. package/dist/public-config.js +1 -0
  198. package/dist/runtime.d.ts +9 -0
  199. package/dist/runtime.js +1 -0
  200. package/dist/session-compaction.d.ts +7 -0
  201. package/dist/session-compaction.js +1 -0
  202. package/dist/session.d.ts +7 -0
  203. package/dist/session.js +1 -0
  204. package/dist/skills.d.ts +7 -0
  205. package/dist/skills.js +1 -0
  206. package/dist/tools.d.ts +7 -0
  207. package/dist/tools.js +1 -0
  208. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +851 -0
  209. package/docs/SDK-TESTING.md +364 -0
  210. package/docs/codex-goal-command-impl.md +1055 -1055
  211. package/docs/codex-goal-vs-grub.md +500 -500
  212. package/docs/custom-provider.md +27 -27
  213. package/docs/extensions.md +27 -27
  214. package/docs/keybindings.md +27 -27
  215. package/docs/loop 重构完成总结.md +250 -250
  216. package/docs/loop 重构完成报告.md +122 -122
  217. package/docs/loop 重构方案.md +1222 -1222
  218. package/docs/loop 重构方案实现报告.md +158 -158
  219. package/docs/loop 重构方案对比分析.md +128 -128
  220. package/docs/loop 重构计划.md +320 -320
  221. package/docs/loop-usage-examples.md +214 -214
  222. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +593 -0
  223. package/docs/models.md +27 -27
  224. package/docs/packages.md +27 -27
  225. package/docs/pi-design-philosophy.md +457 -457
  226. package/docs/planmode.md +1987 -1987
  227. package/docs/prompt-templates.md +27 -27
  228. package/docs/providers.md +27 -27
  229. package/docs/sdk.md +27 -27
  230. package/docs/skills.md +27 -27
  231. package/docs/startup-performance-optimization.md +301 -0
  232. package/docs/themes.md +27 -27
  233. package/docs/tui.md +27 -27
  234. package/docs/认知地图.md +47 -0
  235. package/package.json +190 -162
  236. package/dist/packages/extension-sdk/src/index.js +0 -1
  237. package/docs/cc-agent-design.md +0 -1297
  238. package/docs/cc-tui-design.md +0 -1333
  239. package/docs//345/257/271/346/240/207Claude-Code.md +0 -1775
  240. /package/dist/packages/{extension-sdk/src/lifecycle.js → protocol/src/commands.js} +0 -0
  241. /package/dist/packages/{extension-sdk/src/tools.js → protocol/src/flags.js} +0 -0
@@ -1,364 +1,364 @@
- # Quora: Data Extraction
-
- `https://www.quora.com` is a Q&A platform. It has one reliable access path: `http_get` with a Chrome UA against question, answer, topic, and profile pages. Quora server-side renders (SSR) all public data into `window.ansFrontendGlobals.data.inlineQueryResults` via `.push()` calls, so no browser is needed for read-only tasks.
-
- ## Do this first: pick your access path
-
- | Goal | Best approach | Latency |
- |------|--------------|---------|
- | Question metadata + first ~3 ranked answers | `http_get` question page + parse push payloads | ~600ms |
- | Single answer (full text + upvotes + views) | `http_get` answer permalink | ~400ms |
- | Answer count for a question | question page, payload with `answerCount` | same request as the question page |
- | Topic metadata (id, name, follower count) | `http_get` topic page + parse push payloads | ~400ms |
- | User profile (name, follower/following, credential) | `http_get` profile page + parse push payloads | ~500ms |
- | Keyword search results | NOT available via `http_get`: the server returns no result data | N/A |
-
- **Never use a browser for read-only Quora tasks.** All question, answer, topic, and profile data is server-rendered. A browser is needed only for authenticated actions (posting, upvoting, following) or to get more than the first ~3 answers on a question page (the rest load via XHR pagination).
-
- ---
-
- ## UA requirement: Chrome or Firefox, NOT bare Mozilla/5.0
-
- ```
- bare "Mozilla/5.0" -> HTTP 403
- Googlebot UA       -> HTTP 403
- Chrome UA          -> HTTP 200 (confirmed working)
- Firefox UA         -> HTTP 200 (confirmed working)
- ```
-
- Use this header bundle for all requests:
-
- ```python
- import urllib.request, gzip, json, re
-
- CHROME_UA = (
-     "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
-     "AppleWebKit/537.36 (KHTML, like Gecko) "
-     "Chrome/123.0.0.0 Safari/537.36"
- )
-
- def quora_get(url):
-     """Fetch any public Quora page. Returns HTML string.
-     Requires Chrome/Firefox UA; bare Mozilla/5.0 returns 403.
-     """
-     req = urllib.request.Request(url, headers={
-         "User-Agent": CHROME_UA,
-         "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
-         "Accept-Encoding": "gzip",
-         "Accept-Language": "en-US,en;q=0.9",
-     })
-     with urllib.request.urlopen(req, timeout=20) as r:
-         data = r.read()
-         if r.headers.get("Content-Encoding") == "gzip":
-             data = gzip.decompress(data)
-         return data.decode()
- ```
-
- ---
-
- ## The data format: `ansFrontendGlobals.data.inlineQueryResults`
-
- Quora's SSR embeds all page data as a series of `.push("...")` calls inside `<script>` blocks. Each call pushes a JSON-encoded string (with escaped quotes) into `inlineQueryResults`. There are no JSON-LD blocks, no `__NEXT_DATA__`, and no React hydration state, only these push calls.
-
- ```python
- def extract_quora_payloads(html):
-     """Extract and parse all push() payloads from a Quora page.
-     Returns list of dicts (already decoded from double JSON encoding).
-     """
-     raw_payloads = re.findall(r'\.push\("((?:[^"\\]|\\.)*)"\)', html)
-     results = []
-     for raw in raw_payloads:
-         try:
-             # Two levels of encoding: outer JS string escape, inner JSON
-             inner = json.loads('"' + raw + '"')  # decode JS string escaping
-             results.append(json.loads(inner))    # decode actual JSON
-         except Exception:
-             pass  # skip push() calls that are not double-encoded JSON
-     return results
- ```
-
- A question page returns **16 payloads**. A profile or topic page returns **3 payloads**. Identify the payloads that matter by their `data` keys, not by position: positions are stable across requests for the same page type, but keying on content is more robust.
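
The double decoding and key-based selection described above can be tried offline. This is a minimal sketch, assuming a synthetic one-payload page built to match the `.push("...")` shape; `decode_push_payloads` and `find_payload` are illustrative names, not part of the package.

```python
import json
import re

# Synthetic stand-in for a Quora page: one push() call whose argument is a
# JS string literal wrapping a JSON object (assumed shape, built locally).
inner_json = json.dumps({"data": {"answerCount": 413}})
sample_html = (
    "<script>window.ansFrontendGlobals.data.inlineQueryResults"
    ".push(" + json.dumps(inner_json) + ");</script>"
)

def decode_push_payloads(html):
    """Return every push() argument that decodes to a JSON value."""
    decoded = []
    for raw in re.findall(r'\.push\("((?:[^"\\]|\\.)*)"\)', html):
        try:
            inner = json.loads('"' + raw + '"')  # undo JS string escaping
            decoded.append(json.loads(inner))    # parse the JSON itself
        except ValueError:
            continue  # not double-encoded JSON; ignore it
    return decoded

def find_payload(payloads, key):
    """Return the `data` dict of the first payload containing `key`, else None."""
    for payload in payloads:
        data = payload.get("data") if isinstance(payload, dict) else None
        if isinstance(data, dict) and key in data:
            return data
    return None

payloads = decode_push_payloads(sample_html)
print(find_payload(payloads, "answerCount"))
```

Selecting by key rather than by index keeps the lookup working if the number or order of payloads changes.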
-
- ---
-
- ## Path 1: Question page: metadata + first answers (fastest)
-
- ```python
- def quora_question(url):
-     """
-     Scrape a Quora question page.
-     Returns:
-       question: {qid, id, title, url, slug, topics}
-       answers: list of answer dicts (first ~3 ranked answers only)
-       answer_count: total answer count (all answers, not just loaded)
-       related_questions: list of question title strings
-     Only the first ~3 highest-ranked answers are SSR'd.
-     The rest require XHR pagination (browser or session cookies needed).
-     """
-     html = quora_get(url)
-     payloads = extract_quora_payloads(html)
-
-     def spans_to_text(json_str):
-         """Quora stores all text as serialized span objects."""
-         try:
-             doc = json.loads(json_str)
-             parts = []
-             for sec in doc.get('sections', []):
-                 for span in sec.get('spans', []):
-                     if span.get('text'):
-                         parts.append(span['text'])
-                 parts.append('\n')
-             return ''.join(parts).strip()
-         except Exception:
-             return json_str
-
-     def author_display_name(author_dict):
-         names = author_dict.get('names', [])
-         if names:
-             n = names[0]
-             return f"{n.get('givenName', '')} {n.get('familyName', '')}".strip()
-         return None
-
-     result = {'question': {}, 'answers': [], 'answer_count': None, 'related_questions': []}
-
-     for payload in payloads:
-         data = payload.get('data', payload)
-
-         # Question metadata: keyed by presence of 'qid' inside 'question'
-         if 'question' in data and isinstance(data['question'], dict):
-             q = data['question']
-             if q.get('qid') and not result['question']:
-                 result['question'] = {
-                     'qid': q.get('qid'),
-                     'id': q.get('id'),
-                     'title': spans_to_text(q.get('title', '')),
-                     'url': q.get('url'),
-                     'slug': q.get('slug'),
-                     'topics': [t['name'] for t in q.get('navigationTopics', [])],
-                 }
-
-         # Total answer count: keyed by 'answerCount'
-         if 'answerCount' in data:
-             result['answer_count'] = data['answerCount']
-             rq = (data.get('bottomRelatedQuestionsInfo') or {}).get('relatedQuestions', [])
-             result['related_questions'] = [spans_to_text(r['title']) for r in rq]
-
-         # Answer nodes: keyed by node.__typename == 'QuestionAnswerItem2'
-         node = data.get('node', {})
-         if isinstance(node, dict) and node.get('__typename') == 'QuestionAnswerItem2':
-             answer = node.get('answer') or {}  # tolerate a null answer
-             if answer.get('aid'):
-                 a_author = answer.get('author') or {}
-                 cred = answer.get('authorCredential') or {}
-                 result['answers'].append({
-                     'aid': answer.get('aid'),
-                     'index': node.get('index'),
-                     'author_name': author_display_name(a_author),
-                     'author_profile': a_author.get('profileUrl'),
-                     'author_uid': a_author.get('uid'),
-                     'author_credential': cred.get('translatedString'),
-                     'num_upvotes': answer.get('numUpvotes'),
-                     'num_views': answer.get('numViews'),
-                     'num_shares': answer.get('numShares'),
-                     'num_comments': answer.get('numDisplayComments'),
-                     'creation_time_us': answer.get('creationTime'),  # microseconds since epoch
-                     'viewer_has_access': answer.get('viewerHasAccess'),
-                     'perma_url': answer.get('permaUrl'),
-                     'text': spans_to_text(answer.get('content', '{}')),
-                 })
-
-     return result
- ```
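
The span format that `spans_to_text` unpacks can be exercised without a network request. This is a minimal sketch, assuming a hand-built document with the `sections` / `spans` / `text` shape used above; `spans_document_to_text` and the sample values are illustrative, not taken from a real Quora response.

```python
import json

def spans_document_to_text(json_str):
    """Flatten a {"sections": [{"spans": [{"text": ...}]}]} JSON string.

    Falls back to returning the input unchanged when it is not such a document.
    """
    try:
        doc = json.loads(json_str)
    except (TypeError, ValueError):
        return json_str
    if not isinstance(doc, dict):
        return json_str
    lines = []
    for section in doc.get("sections", []):
        # Concatenate the text of every span in the section.
        lines.append("".join(span.get("text") or "" for span in section.get("spans", [])))
    # One line per section, mirroring the newline-per-section behaviour above.
    return "\n".join(lines).strip()

sample_title = json.dumps({
    "sections": [
        {"spans": [{"text": "What is the "}, {"text": "meaning of life?"}]},
    ]
})
print(spans_document_to_text(sample_title))
```

Narrowing the `except` clause to `TypeError` and `ValueError` is a design choice of this sketch: it keeps the plain-text fallback while letting unrelated errors surface.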
-
- ### Example output
-
- ```python
- result = quora_question("https://www.quora.com/What-is-the-meaning-of-life")
-
- # result['question']:
- # {
- #   'qid': 2861,
- #   'id': 'UXVlc3Rpb25AMDoyODYx',
- #   'title': 'What is the meaning of life?',
- #   'url': '/What-is-the-meaning-of-life',
- #   'slug': 'What-is-the-meaning-of-life',
- #   'topics': ['Philosophy', 'The Big Unanswered Questions', 'Meaning of Life', ...]
- # }
-
- # result['answer_count']: 413
-
- # result['answers'][0]:
- # {
- #   'aid': 2779675,
- #   'index': 1,
- #   'author_name': 'Shubhankar Srivastava',
- #   'author_profile': '/profile/Shubhankar-Srivastava',
- #   'author_uid': 5381038,
- #   'author_credential': 'works at D. E. Shaw',
- #   'num_upvotes': 589,
- #   'num_views': 24085,
- #   'num_shares': 0,
- #   'num_comments': 8,
- #   'creation_time_us': 1373364681312036,  # divide by 1e6 for seconds
- #   'viewer_has_access': True,
- #   'perma_url': '/What-is-the-meaning-of-life/answer/Shubhankar-Srivastava',
- #   'text': 'Every morning in Africa, a deer wakes up...'
- # }
- ```
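
The `url`, `author_profile`, and `perma_url` values in the example output are site-relative paths. A minimal sketch of turning them into absolute URLs with the standard library follows; `QUORA_BASE` and `absolutize` are illustrative names, not part of the original helpers.

```python
from urllib.parse import urljoin

QUORA_BASE = "https://www.quora.com"  # assumed base, taken from the URLs used above

def absolutize(path):
    """Join a site-relative path onto the Quora base; pass None through."""
    if path is None:
        return None
    return urljoin(QUORA_BASE, path)

print(absolutize("/What-is-the-meaning-of-life/answer/Shubhankar-Srivastava"))
```

The absolute permalink produced this way has the same form as the answer URLs accepted by the single-answer path.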
-
- ### Convert creation_time_us to datetime
-
- ```python
- from datetime import datetime, timezone
- ts_sec = result['answers'][0]['creation_time_us'] / 1_000_000
- dt = datetime.fromtimestamp(ts_sec, tz=timezone.utc)
- # 2013-07-09 10:11:21.312036+00:00 for the example answer above
- ```
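
Because `creation_time_us` can be missing, a small wrapper may be convenient. This is a sketch only: `creation_time_to_datetime` is a hypothetical helper, and it uses integer `timedelta` arithmetic instead of float division so the microseconds are preserved exactly.

```python
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)

def creation_time_to_datetime(creation_time_us):
    """Convert microseconds since the Unix epoch to an aware UTC datetime."""
    if creation_time_us is None:
        return None  # field absent in the payload
    return EPOCH_UTC + timedelta(microseconds=creation_time_us)

# Sample value copied from the example output above.
print(creation_time_to_datetime(1373364681312036).isoformat())
```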
-
- ---
-
- ## Path 2: Single answer permalink
-
- Fetching `quora.com/{question-slug}/answer/{author-slug}` directly returns only that one answer's full data in 3 payloads instead of 16. Use this when you already know the answer URL.
-
- ```python
- def _spans_to_text(json_str):
-     """Flatten Quora's serialized span document to plain text
-     (module-level copy of the `spans_to_text` helper in Path 1)."""
-     try:
-         doc = json.loads(json_str)
-         parts = []
-         for sec in doc.get('sections', []):
-             for span in sec.get('spans', []):
-                 if span.get('text'):
-                     parts.append(span['text'])
-             parts.append('\n')
-         return ''.join(parts).strip()
-     except Exception:
-         return json_str
-
- def quora_answer(answer_url):
-     """
-     Fetch a single answer by its permalink.
-     URL format: https://www.quora.com/{question-slug}/answer/{author-profile-slug}
-     Returns answer dict with: aid, num_upvotes, num_views, text, author info.
-     """
-     html = quora_get(answer_url)
-     payloads = extract_quora_payloads(html)
-
-     for payload in payloads:
-         data = payload.get('data', {})
-         if 'answer' in data and isinstance(data['answer'], dict):
-             a = data['answer']
-             author = a.get('author') or {}
-             names = author.get('names', [{}])
-             n = names[0] if names else {}
-             return {
-                 'aid': a.get('aid'),
-                 'num_upvotes': a.get('numUpvotes'),
-                 'num_views': a.get('numViews'),
-                 'author_name': f"{n.get('givenName','')} {n.get('familyName','')}".strip(),
-                 'author_uid': author.get('uid'),
-                 'text': _spans_to_text(a.get('content', '{}')),
-             }
-     return {}
-
- # Example:
- # quora_answer("https://www.quora.com/What-is-the-meaning-of-life/answer/Pararth-Shah")
- # -> {'aid': 4734237, 'num_upvotes': 234, 'num_views': 100643, 'author_name': 'Pararth Shah', ...}
- ```
-
- ---
-
- ## Path 3: Topic page
-
- ```python
- def quora_topic(topic_url):
-     """
-     Fetch topic metadata from a Quora topic page.
-     URL format: https://www.quora.com/topic/{topic-slug}
-     Returns: tid, name, num_followers, url, is_following, has_leaderboard.
-     NOTE: The topic page itself only renders topic metadata, NOT the question feed.
-     Question feed requires browser (XHR-loaded via React).
-     """
-     html = quora_get(topic_url)
-     payloads = extract_quora_payloads(html)
-
-     for payload in payloads:
-         data = payload.get('data', {})
-         if 'topic' in data and isinstance(data['topic'], dict):
-             t = data['topic']
-             return {
-                 'tid': t.get('tid'),
-                 'id': t.get('id'),
-                 'name': t.get('name'),
-                 'url': t.get('url'),
-                 'num_followers': t.get('numFollowers'),
-                 'is_following': t.get('isFollowing'),
-                 'has_leaderboard': t.get('hasLeaderboard'),
-                 'photo_url': t.get('photoUrl'),
-                 'is_locked': t.get('isLocked'),
-             }
-     return {}
-
- # Example:
- # quora_topic("https://www.quora.com/topic/Python-programming-language")
- # -> {'tid': 13292, 'name': 'Python Programming Language', 'num_followers': 10, ...}
- ```
-
- ---
-
- ## Path 4: User profile page
-
- ```python
- def quora_profile(profile_url):
-     """
-     Fetch user profile data from https://www.quora.com/profile/{username}
-     Returns: uid, name, credential, follower_count, following_count, profile_image_url.
-     """
-     html = quora_get(profile_url)
-     payloads = extract_quora_payloads(html)
-
-     for payload in payloads:
-         data = payload.get('data', {})
-         if 'user' in data and isinstance(data['user'], dict):
-             u = data['user']
-             names = u.get('names', [{}])
-             n = names[0] if names else {}
-             cred = u.get('profileCredential') or {}
-             return {
-                 'uid': u.get('uid'),
-                 'id': u.get('id'),
-                 'name': f"{n.get('givenName','')} {n.get('familyName','')}".strip(),
-                 'profile_url': u.get('profileUrl'),
-                 'follower_count': u.get('followerCount'),
-                 'following_count': u.get('followingCount'),
-                 'profile_image': u.get('profileImageUrl'),
-                 'credential': cred.get('experience'),
-                 'is_verified': u.get('isVerified'),
-                 'is_anon': u.get('isAnon'),
-                 'is_ai_account': u.get('isAiAccount'),
-                 'deactivated': u.get('deactivated'),
-             }
-     return {}
-
- # Example:
- # quora_profile("https://www.quora.com/profile/Pararth-Shah")
- # -> {'uid': 4683832, 'name': 'Pararth Shah', 'follower_count': 5154,
- #     'following_count': 83, 'credential': 'Unfinished symphony.', ...}
- ```
-
- ---
337
-
338
- ## Gotchas
339
-
340
- - **Bare Mozilla/5.0 UA returns HTTP 403** — Always use a full Chrome or Firefox UA string. The default `http_get` helper's `"User-Agent": "Mozilla/5.0"` will be blocked. Do not use `http_get` directly; use the `quora_get` wrapper above.
341
-
342
- - **Googlebot UA returns HTTP 403** — Quora blocks crawler UAs. Only real browser UAs work.
343
-
344
- - **Double JSON encoding** — Each `.push()` argument is a JavaScript string literal containing JSON. To parse: first `json.loads('"' + raw + '"')` to decode the JS string escaping (converts `\\"` to `"`), then `json.loads(inner)` to parse the actual JSON object. Skipping either step produces parse errors.
345
-
346
- - **All text fields are serialized span objects** — `question.title`, `answer.content`, `user.descriptionQtextDocument.legacyJson`, etc. are all JSON strings containing a `{"sections": [{"spans": [...]}]}` document, not plain text. Always parse through `spans_to_text()`.
347
-
348
- - **Question page only SSR's the first ~3 answers** — The `answers` list in the result will contain at most 3 entries (the top-ranked answers). The `answer_count` field shows the true total (e.g. 413). To get more answers you need browser-based XHR pagination (Quora sends additional answers via GraphQL calls that require session auth in practice).
349
-
350
- - **`viewer_has_access: false` still includes full content** — Even when `viewerHasAccess` is `False` (answers from Quora+ Spaces / tribe-only content), the `content` field is still present in the SSR payload and the full text is readable. The flag only controls client-side gating in the browser.
351
-
352
- - **`creation_time_us` is microseconds, not milliseconds** — Divide by `1_000_000` (not `1_000`) to get a Unix timestamp in seconds. Confirmed: `1373364681312036 / 1e6 = 1373364681.3` (July 2013).
353
-
354
- - **`numFollowers` on topic pages may be 0 even for major topics** — The field reflects the logged-in user's follow state for some topics and appears to undercount. Treat as approximate.
355
-
356
- - **Search pages do not yield result data** — `https://www.quora.com/search?q=...` returns 3 payloads with viewer/network info only — no search results in the SSR payload. Search results are loaded client-side and are not accessible via http_get.
357
-
358
- - **Profile pages do not include the user's answer list** — The profile page SSR payload returns user metadata only. The list of a user's answers is loaded via XHR pagination. To get answers for a specific question, use the question URL directly.
359
-
360
- - **IDs are base64-encoded Relay global IDs** — `id: "UXVlc3Rpb25AMDoyODYx"` decodes to `"Question@0:2861"`. The numeric `qid`/`uid`/`aid`/`tid` fields are more useful for constructing URLs and deduplication. Use `qid` and `aid` as stable identifiers.
361
-
362
- - **`permaUrl` may be an absolute URL for Spaces answers** — Most answers have `permaUrl: "/Question-slug/answer/Author-Name"` (relative). Answers posted in a Quora Space have a full absolute URL like `"https://spacename.quora.com/Question-slug"`. Handle both forms.
363
-
364
- - **No public REST or GraphQL API** — Quora's internal `graphql/gql_para_POST` endpoint requires a valid `quora-formkey` header derived from the session, making it inaccessible without a real authenticated session. The SSR push-payload approach is the only reliable unauthenticated path.
1
+ # Quora — Data Extraction
2
+
3
+ `https://www.quora.com` — Q&A platform. One reliable access path: `http_get` with a Chrome UA against question, answer, topic, and profile pages. Quora SSR-renders all public data into `window.ansFrontendGlobals.data.inlineQueryResults` via `.push()` calls. No browser needed for read-only tasks.
4
+
5
+ ## Do this first: pick your access path
6
+
7
+ | Goal | Best approach | Latency |
8
+ |------|--------------|---------|
9
+ | Question metadata + first ~3 ranked answers | `http_get` question page + parse push payloads | ~600ms |
10
+ | Single answer (full text + upvotes + views) | `http_get` answer permalink | ~400ms |
11
+ | Answer count for a question | question page, payload with `answerCount` | same request as above |
12
+ | Topic metadata (id, name, follower count) | `http_get` topic page + parse push payloads | ~400ms |
13
+ | User profile (name, follower/following, credential) | `http_get` profile page + parse push payloads | ~500ms |
14
+ | Keyword search results | NOT available via http_get — server returns no result data | N/A |
15
+
16
+ **Do not use a browser for the read-only tasks in the table above.** All question, answer, topic, and profile data is server-rendered. A browser is only needed for authenticated actions (posting, upvoting, following) or for reading beyond the first ~3 answers on a question page (the rest load via XHR pagination).
17
+
18
+ ---
19
+
20
+ ## UA requirement: Chrome or Firefox — NOT bare Mozilla/5.0
21
+
22
+ ```
23
+ bare "Mozilla/5.0" -> HTTP 403
24
+ Googlebot UA -> HTTP 403
25
+ Chrome UA -> HTTP 200 (confirmed working)
26
+ Firefox UA -> HTTP 200 (confirmed working)
27
+ ```
28
+
29
+ Use this header bundle for all requests:
30
+
31
+ ```python
32
+ import urllib.request, gzip, json, re
33
+
34
+ CHROME_UA = (
35
+ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
36
+ "AppleWebKit/537.36 (KHTML, like Gecko) "
37
+ "Chrome/123.0.0.0 Safari/537.36"
38
+ )
39
+
40
+ def quora_get(url):
41
+ """Fetch any public Quora page. Returns HTML string.
42
+ Requires Chrome/Firefox UA — bare Mozilla/5.0 returns 403.
43
+ """
44
+ req = urllib.request.Request(url, headers={
45
+ "User-Agent": CHROME_UA,
46
+ "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
47
+ "Accept-Encoding": "gzip",
48
+ "Accept-Language": "en-US,en;q=0.9",
49
+ })
50
+ with urllib.request.urlopen(req, timeout=20) as r:
51
+ data = r.read()
52
+ if r.headers.get("Content-Encoding") == "gzip":
53
+ data = gzip.decompress(data)
54
+ return data.decode()
55
+ ```
56
+
57
+ ---
58
+
59
+ ## The data format: `ansFrontendGlobals.data.inlineQueryResults`
60
+
61
+ Quora SSR embeds all page data as a series of `.push("...")` calls inside `<script>` blocks. Each call pushes a JSON-encoded string (with escaped quotes) into `inlineQueryResults`. There are no JSON-LD blocks, no `__NEXT_DATA__`, no React hydration state — only these push calls.
62
+
63
+ ```python
64
+ def extract_quora_payloads(html):
65
+ """Extract and parse all push() payloads from a Quora page.
66
+ Returns list of dicts (already decoded from double JSON encoding).
67
+ """
68
+ raw_payloads = re.findall(r'\.push\("((?:[^"\\]|\\.)*)"\)', html)
69
+ results = []
70
+ for raw in raw_payloads:
71
+ try:
72
+ # Two levels of encoding: outer JS string escape, inner JSON
73
+ inner = json.loads('"' + raw + '"') # decode JS string escaping
74
+ results.append(json.loads(inner)) # decode actual JSON
75
+ except Exception:
76
+ pass
77
+ return results
78
+ ```
79
+
80
+ A question page returns **16 payloads**. A profile or topic page returns **3 payloads**. The payloads that matter are identified by their `data` keys, not by position (positions are stable across requests for the same page type, but best to key on content).
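To make the double decoding concrete, here is a minimal sketch that runs the same regex and the two `json.loads` passes on a made-up HTML fragment. The fragment and the `decode_push_payloads` name are illustrative assumptions, not captured Quora output.

```python
import json
import re

# Hypothetical fragment shaped like one Quora push() call; not real page output.
SAMPLE_HTML = r'<script>window.x.push("{\"data\":{\"answerCount\":413}}");</script>'

def decode_push_payloads(html):
    """Return every push("...") argument decoded into a Python object."""
    decoded = []
    for raw in re.findall(r'\.push\("((?:[^"\\]|\\.)*)"\)', html):
        try:
            inner = json.loads('"' + raw + '"')  # pass 1: undo JS string escaping
            decoded.append(json.loads(inner))    # pass 2: parse the JSON text
        except ValueError:
            continue  # skip push() calls whose argument is not a JSON payload
    return decoded

payloads = decode_push_payloads(SAMPLE_HTML)
print(payloads)
```

Catching `ValueError` (the parent of `json.JSONDecodeError`) rather than a bare `Exception` keeps unrelated bugs visible while still skipping non-JSON `push()` calls.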
81
+
82
+ ---
83
+
84
+ ## Path 1: Question page — metadata + first answers (fastest)
85
+
86
+ ```python
87
+ def quora_question(url):
88
+ """
89
+ Scrape a Quora question page.
90
+ Returns:
91
+ question: {qid, id, title, url, slug, topics}
92
+ answers: list of answer dicts (first ~3 ranked answers only)
93
+ answer_count: total answer count (all answers, not just loaded)
94
+ related_questions: list of question title strings
95
+ Only the first ~3 highest-ranked answers are SSR'd.
96
+ The rest require XHR pagination (browser or session cookies needed).
97
+ """
98
+ html = quora_get(url)
99
+ payloads = extract_quora_payloads(html)
100
+
101
+ def spans_to_text(json_str):
102
+ """Quora stores all text as serialized span objects."""
103
+ try:
104
+ doc = json.loads(json_str)
105
+ parts = []
106
+ for sec in doc.get('sections', []):
107
+ for span in sec.get('spans', []):
108
+ if span.get('text'):
109
+ parts.append(span['text'])
110
+ parts.append('\n')
111
+ return ''.join(parts).strip()
112
+ except Exception:
113
+ return json_str
114
+
115
+ def author_display_name(author_dict):
116
+ names = author_dict.get('names', [])
117
+ if names:
118
+ n = names[0]
119
+ return f"{n.get('givenName', '')} {n.get('familyName', '')}".strip()
120
+ return None
121
+
122
+ result = {'question': {}, 'answers': [], 'answer_count': None, 'related_questions': []}
123
+
124
+ for payload in payloads:
125
+ data = payload.get('data', payload)
126
+
127
+ # Question metadata — keyed by presence of 'qid' inside 'question'
128
+ if 'question' in data and isinstance(data['question'], dict):
129
+ q = data['question']
130
+ if q.get('qid') and not result['question']:
131
+ result['question'] = {
132
+ 'qid': q.get('qid'),
133
+ 'id': q.get('id'),
134
+ 'title': spans_to_text(q.get('title', '')),
135
+ 'url': q.get('url'),
136
+ 'slug': q.get('slug'),
137
+ 'topics': [t['name'] for t in q.get('navigationTopics', [])],
138
+ }
139
+
140
+ # Total answer count — keyed by 'answerCount'
141
+ if 'answerCount' in data:
142
+ result['answer_count'] = data['answerCount']
143
+ rq = (data.get('bottomRelatedQuestionsInfo') or {}).get('relatedQuestions', [])
144
+ result['related_questions'] = [spans_to_text(r['title']) for r in rq]
145
+
146
+ # Answer nodes — keyed by node.__typename == 'QuestionAnswerItem2'
147
+ node = data.get('node', {})
148
+ if isinstance(node, dict) and node.get('__typename') == 'QuestionAnswerItem2':
149
+ answer = node.get('answer', {})
150
+ if answer.get('aid'):
151
+ a_author = answer.get('author') or {}
152
+ cred = answer.get('authorCredential') or {}
153
+ result['answers'].append({
154
+ 'aid': answer.get('aid'),
155
+ 'index': node.get('index'),
156
+ 'author_name': author_display_name(a_author),
157
+ 'author_profile': a_author.get('profileUrl'),
158
+ 'author_uid': a_author.get('uid'),
159
+ 'author_credential': cred.get('translatedString'),
160
+ 'num_upvotes': answer.get('numUpvotes'),
161
+ 'num_views': answer.get('numViews'),
162
+ 'num_shares': answer.get('numShares'),
163
+ 'num_comments': answer.get('numDisplayComments'),
164
+ 'creation_time_us': answer.get('creationTime'), # microseconds since epoch
165
+ 'viewer_has_access': answer.get('viewerHasAccess'),
166
+ 'perma_url': answer.get('permaUrl'),
167
+ 'text': spans_to_text(answer.get('content', '{}')),
168
+ })
169
+
170
+ return result
171
+ ```
172
+
173
+ ### Example output
174
+
175
+ ```python
176
+ result = quora_question("https://www.quora.com/What-is-the-meaning-of-life")
177
+
178
+ # result['question']:
179
+ # {
180
+ # 'qid': 2861,
181
+ # 'id': 'UXVlc3Rpb25AMDoyODYx',
182
+ # 'title': 'What is the meaning of life?',
183
+ # 'url': '/What-is-the-meaning-of-life',
184
+ # 'slug': 'What-is-the-meaning-of-life',
185
+ # 'topics': ['Philosophy', 'The Big Unanswered Questions', 'Meaning of Life', ...]
186
+ # }
187
+
188
+ # result['answer_count']: 413
189
+
190
+ # result['answers'][0]:
191
+ # {
192
+ # 'aid': 2779675,
193
+ # 'index': 1,
194
+ # 'author_name': 'Shubhankar Srivastava',
195
+ # 'author_profile': '/profile/Shubhankar-Srivastava',
196
+ # 'author_uid': 5381038,
197
+ # 'author_credential': 'works at D. E. Shaw',
198
+ # 'num_upvotes': 589,
199
+ # 'num_views': 24085,
200
+ # 'num_shares': 0,
201
+ # 'num_comments': 8,
202
+ # 'creation_time_us': 1373364681312036, # divide by 1e6 for seconds
203
+ # 'viewer_has_access': True,
204
+ # 'perma_url': '/What-is-the-meaning-of-life/answer/Shubhankar-Srivastava',
205
+ # 'text': 'Every morning in Africa, a deer wakes up...'
206
+ # }
207
+ ```
208
+
209
+ ### Convert creation_time_us to datetime
210
+
211
+ ```python
212
+ from datetime import datetime, timezone
213
+ ts_sec = result['answers'][0]['creation_time_us'] / 1_000_000
214
+ dt = datetime.fromtimestamp(ts_sec, tz=timezone.utc)
215
+ # datetime(2013, 7, 9, 10, 11, 21, 312036, tzinfo=timezone.utc)
216
+ ```
217
+
218
+ ---
219
+
220
+ ## Path 2: Single answer permalink
221
+
222
+ Fetching `quora.com/{question-slug}/answer/{author-slug}` directly returns only that one answer's full data in 3 payloads instead of 16. Use this when you already know the answer URL.
223
+
224
+ ```python
225
+ def quora_answer(answer_url):
226
+ """
227
+ Fetch a single answer by its permalink.
228
+ URL format: https://www.quora.com/{question-slug}/answer/{author-profile-slug}
229
+ Returns answer dict with: aid, num_upvotes, num_views, text, author info.
230
+ """
231
+ html = quora_get(answer_url)
232
+ payloads = extract_quora_payloads(html)
233
+
234
+ for payload in payloads:
235
+ data = payload.get('data', {})
236
+ if 'answer' in data and isinstance(data['answer'], dict):
237
+ a = data['answer']
238
+ author = a.get('author') or {}
239
+ names = author.get('names', [{}])
240
+ n = names[0] if names else {}
241
+ return {
242
+ 'aid': a.get('aid'),
243
+ 'num_upvotes': a.get('numUpvotes'),
244
+ 'num_views': a.get('numViews'),
245
+ 'author_name': f"{n.get('givenName','')} {n.get('familyName','')}".strip(),
246
+ 'author_uid': author.get('uid'),
247
+ 'text': _spans_to_text(a.get('content', '{}')),
248
+ }
249
+ return {}
250
+
251
+ # Example:
252
+ # quora_answer("https://www.quora.com/What-is-the-meaning-of-life/answer/Pararth-Shah")
253
+ # -> {'aid': 4734237, 'num_upvotes': 234, 'num_views': 100643, 'author_name': 'Pararth Shah', ...}
254
+ ```
255
+
256
+ ---
257
+
258
+ ## Path 3: Topic page
259
+
260
+ ```python
261
+ def quora_topic(topic_url):
262
+ """
263
+ Fetch topic metadata from a Quora topic page.
264
+ URL format: https://www.quora.com/topic/{topic-slug}
265
+ Returns: tid, name, num_followers, url, is_following, has_leaderboard.
266
+ NOTE: The topic page itself only renders topic metadata, NOT the question feed.
267
+ Question feed requires browser (XHR-loaded via React).
268
+ """
269
+ html = quora_get(topic_url)
270
+ payloads = extract_quora_payloads(html)
271
+
272
+ for payload in payloads:
273
+ data = payload.get('data', {})
274
+ if 'topic' in data and isinstance(data['topic'], dict):
275
+ t = data['topic']
276
+ return {
277
+ 'tid': t.get('tid'),
278
+ 'id': t.get('id'),
279
+ 'name': t.get('name'),
280
+ 'url': t.get('url'),
281
+ 'num_followers': t.get('numFollowers'),
282
+ 'is_following': t.get('isFollowing'),
283
+ 'has_leaderboard': t.get('hasLeaderboard'),
284
+ 'photo_url': t.get('photoUrl'),
285
+ 'is_locked': t.get('isLocked'),
286
+ }
287
+ return {}
288
+
289
+ # Example:
290
+ # quora_topic("https://www.quora.com/topic/Python-programming-language")
291
+ # -> {'tid': 13292, 'name': 'Python Programming Language', 'num_followers': 10, ...}
292
+ ```
293
+
294
+ ---
295
+
296
+ ## Path 4: User profile page
297
+
298
+ ```python
299
+ def quora_profile(profile_url):
300
+ """
301
+ Fetch user profile data from https://www.quora.com/profile/{username}
302
+ Returns: uid, name, credential, follower_count, following_count, profile_image.
303
+ """
304
+ html = quora_get(profile_url)
305
+ payloads = extract_quora_payloads(html)
306
+
307
+ for payload in payloads:
308
+ data = payload.get('data', {})
309
+ if 'user' in data and isinstance(data['user'], dict):
310
+ u = data['user']
311
+ names = u.get('names', [{}])
312
+ n = names[0] if names else {}
313
+ cred = u.get('profileCredential') or {}
314
+ return {
315
+ 'uid': u.get('uid'),
316
+ 'id': u.get('id'),
317
+ 'name': f"{n.get('givenName','')} {n.get('familyName','')}".strip(),
318
+ 'profile_url': u.get('profileUrl'),
319
+ 'follower_count': u.get('followerCount'),
320
+ 'following_count': u.get('followingCount'),
321
+ 'profile_image': u.get('profileImageUrl'),
322
+ 'credential': cred.get('experience'),
323
+ 'is_verified': u.get('isVerified'),
324
+ 'is_anon': u.get('isAnon'),
325
+ 'is_ai_account': u.get('isAiAccount'),
326
+ 'deactivated': u.get('deactivated'),
327
+ }
328
+ return {}
329
+
330
+ # Example:
331
+ # quora_profile("https://www.quora.com/profile/Pararth-Shah")
332
+ # -> {'uid': 4683832, 'name': 'Pararth Shah', 'follower_count': 5154,
333
+ # 'following_count': 83, 'credential': 'Unfinished symphony.', ...}
334
+ ```
335
+
336
+ ---
337
+
338
+ ## Gotchas
339
+
340
+ - **Bare Mozilla/5.0 UA returns HTTP 403** — Always use a full Chrome or Firefox UA string. The default `http_get` helper's `"User-Agent": "Mozilla/5.0"` will be blocked. Do not use `http_get` directly; use the `quora_get` wrapper above.
341
+
342
+ - **Googlebot UA returns HTTP 403** — Quora blocks crawler UAs. Only real browser UAs work.
343
+
344
+ - **Double JSON encoding** — Each `.push()` argument is a JavaScript string literal containing JSON. To parse: first `json.loads('"' + raw + '"')` to decode the JS string escaping (converts `\"` to `"`), then `json.loads(inner)` to parse the actual JSON object. Skipping either step produces parse errors.
345
+
346
+ - **All text fields are serialized span objects** — `question.title`, `answer.content`, `user.descriptionQtextDocument.legacyJson`, etc. are all JSON strings containing a `{"sections": [{"spans": [...]}]}` document, not plain text. Always parse through `spans_to_text()`.
347
+
348
+ - **Question page only server-renders the first ~3 answers** — The `answers` list in the result will contain only the top-ranked answers (about 3). The `answer_count` field shows the true total (e.g. 413). To get more answers you need browser-based XHR pagination (Quora sends additional answers via GraphQL calls that require session auth in practice).
349
+
350
+ - **`viewer_has_access: false` still includes full content** — Even when `viewerHasAccess` is `False` (answers from Quora+ Spaces / tribe-only content), the `content` field is still present in the SSR payload and the full text is readable. The flag only controls client-side gating in the browser.
351
+
352
+ - **`creation_time_us` is microseconds, not milliseconds** — Divide by `1_000_000` (not `1_000`) to get a Unix timestamp in seconds. Confirmed: `1373364681312036 / 1e6 = 1373364681.3` (July 2013).
353
+
354
+ - **`numFollowers` on topic pages may be 0 even for major topics** — The field reflects the logged-in user's follow state for some topics and appears to undercount. Treat as approximate.
355
+
356
+ - **Search pages do not yield result data** — `https://www.quora.com/search?q=...` returns 3 payloads with viewer/network info only — no search results in the SSR payload. Search results are loaded client-side and are not accessible via http_get.
357
+
358
+ - **Profile pages do not include the user's answer list** — The profile page SSR payload returns user metadata only. The list of a user's answers is loaded via XHR pagination. To get answers for a specific question, use the question URL directly.
359
+
360
+ - **IDs are base64-encoded Relay global IDs** — `id: "UXVlc3Rpb25AMDoyODYx"` decodes to `"Question@0:2861"`. The numeric `qid`/`uid`/`aid`/`tid` fields are more useful for constructing URLs and deduplication. Use `qid` and `aid` as stable identifiers.
361
+
362
+ - **`permaUrl` may be an absolute URL for Spaces answers** — Most answers have `permaUrl: "/Question-slug/answer/Author-Name"` (relative). Answers posted in a Quora Space have a full absolute URL like `"https://spacename.quora.com/Question-slug"`. Handle both forms.
363
+
364
+ - **No public REST or GraphQL API** — Quora's internal `graphql/gql_para_POST` endpoint requires a valid `quora-formkey` header derived from the session, making it inaccessible without a real authenticated session. The SSR push-payload approach is the only reliable unauthenticated path.
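Two of the gotchas above (Relay global IDs and mixed relative/absolute `permaUrl` values) can be handled with the standard library alone. The sketch below is illustrative: `decode_relay_id` and `absolute_perma_url` are hypothetical helper names, and the `Type@N:id` layout is inferred only from the single example ID shown above.

```python
import base64
from urllib.parse import urljoin

QUORA_BASE = "https://www.quora.com"

def decode_relay_id(global_id):
    """Decode a base64 global ID such as 'UXVlc3Rpb25AMDoyODYx'.

    Returns (type_name, numeric_id). Assumes the 'Type@N:id' layout seen
    in the one documented example; other layouts raise ValueError.
    """
    decoded = base64.b64decode(global_id).decode("utf-8")
    type_name, _, rest = decoded.partition("@")
    _, _, number = rest.partition(":")
    return type_name, int(number)

def absolute_perma_url(perma_url):
    """Resolve a relative permaUrl against quora.com; keep absolute URLs as-is."""
    return urljoin(QUORA_BASE, perma_url)

print(decode_relay_id("UXVlc3Rpb25AMDoyODYx"))
print(absolute_perma_url("/Question-slug/answer/Author-Name"))
print(absolute_perma_url("https://spacename.quora.com/Question-slug"))
```

`urljoin` returns an absolute second argument unchanged, so Spaces answers keep their own subdomain while ordinary answers are resolved against `www.quora.com`.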