@pencil-agent/nano-pencil 2.0.0 → 2.0.1

Files changed (195)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/mcp/mcp-client.d.ts +3 -1
  7. package/dist/core/mcp/mcp-client.js +6 -6
  8. package/dist/core/mcp/mcp-config.d.ts +3 -3
  9. package/dist/core/mcp/mcp-config.js +1 -1
  10. package/dist/core/mcp/mcp-manager.d.ts +5 -1
  11. package/dist/core/mcp/mcp-manager.js +1 -1
  12. package/dist/core/platform/config/resource-loader.d.ts +2 -0
  13. package/dist/core/platform/config/resource-loader.js +2 -2
  14. package/dist/core/runtime/agent-session.d.ts +12 -0
  15. package/dist/core/runtime/agent-session.js +8 -8
  16. package/dist/core/runtime/sdk.d.ts +8 -0
  17. package/dist/core/runtime/sdk.js +1 -1
  18. package/dist/extensions/builtin/AGENT.md +115 -115
  19. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  20. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  91. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  92. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  93. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  94. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  95. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  96. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  97. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  98. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  99. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  100. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  101. package/dist/extensions/builtin/browser/browser.md +73 -73
  102. package/dist/extensions/builtin/browser/install.md +142 -142
  103. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  104. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  105. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  107. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  108. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  109. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  110. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  111. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  112. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  113. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  114. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  115. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  116. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  117. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  118. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  119. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  120. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  121. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  122. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  123. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  124. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  125. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  126. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  127. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  128. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  129. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  130. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  131. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  132. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  133. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  134. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  135. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  136. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  137. package/dist/extensions/builtin/goal/README.md +67 -67
  138. package/dist/extensions/builtin/grub/README.md +112 -112
  139. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  140. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  141. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  142. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  143. package/dist/extensions/builtin/link-world/network-routing/network-routing.md +67 -67
  144. package/dist/extensions/builtin/loop/README.md +92 -92
  145. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  146. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  147. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  148. package/dist/extensions/builtin/sal/README.md +72 -72
  149. package/dist/extensions/builtin/security-audit/README.md +289 -289
  150. package/dist/extensions/builtin/team/AGENT.md +112 -112
  151. package/dist/extensions/builtin/team/TESTING.md +299 -299
  152. package/dist/extensions/builtin/token-save/README.md +56 -56
  153. package/dist/extensions/optional/AGENT.md +10 -10
  154. package/dist/modes/interactive/interactive-mode.js +36 -36
  155. package/dist/modes/interactive/theme/dark.json +85 -85
  156. package/dist/modes/interactive/theme/light.json +84 -84
  157. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  158. package/dist/modes/interactive/theme/warm.json +81 -81
  159. package/dist/node_modules/@pencil-agent/agent-core/dist/agent-loop.js +3 -2
  160. package/dist/node_modules/@pencil-agent/agent-core/dist/structured-adaptive-agent-loop.js +2 -1
  161. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  162. package/docs/cc-agent-design.md +1297 -0
  163. package/docs/cc-tui-design.md +1333 -0
  164. package/docs/codex-goal-command-impl.md +1055 -1055
  165. package/docs/codex-goal-vs-grub.md +500 -500
  166. package/docs/custom-provider.md +27 -27
  167. package/docs/extensions.md +27 -27
  168. package/docs/keybindings.md +27 -27
  169. package/docs/loop 重构完成总结.md +250 -250
  170. package/docs/loop 重构完成报告.md +122 -122
  171. package/docs/loop 重构方案.md +1222 -1222
  172. package/docs/loop 重构方案实现报告.md +158 -158
  173. package/docs/loop 重构方案对比分析.md +128 -128
  174. package/docs/loop 重构计划.md +320 -320
  175. package/docs/loop-usage-examples.md +214 -214
  176. package/docs/models.md +27 -27
  177. package/docs/nanoPencil-学习计划.md +170 -0
  178. package/docs/packages.md +27 -27
  179. package/docs/pi-design-philosophy.md +457 -457
  180. package/docs/planmode.md +1987 -1987
  181. package/docs/prompt-templates.md +27 -27
  182. package/docs/providers.md +27 -27
  183. package/docs/scan-report.md +3820 -0
  184. package/docs/sdk.md +27 -27
  185. package/docs/skills.md +27 -27
  186. package/docs/themes.md +27 -27
  187. package/docs/tui.md +27 -27
  188. package/docs//345/257/271/346/240/207Claude-Code.md +1775 -0
  189. package/docs//351/230/277/351/207/214/345/267/264/345/267/264/350/264/242/346/212/245/345/210/206/346/236/220/344/271/246.md +261 -0
  190. package/package.json +190 -190
  191. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +0 -851
  192. package/docs/SDK-TESTING.md +0 -364
  193. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +0 -593
  194. package/docs/startup-performance-optimization.md +0 -301
  195. package/docs/认知地图.md +0 -47
@@ -1,478 +1,478 @@
1
- # npm & PyPI — Package Registry Data Extraction
2
-
3
- `https://registry.npmjs.org` · `https://api.npmjs.org` · `https://pypi.org` · `https://pypistats.org`
4
-
5
- Both registries expose full JSON APIs with no auth required. Never use a browser — every data point is available over HTTP.
6
-
7
- Tested 2026-04-18 with `uv run python` + `http_get`.
8
-
9
- ---
10
-
11
- ## Latency reference (measured)
12
-
13
- | Endpoint | Latency |
14
- |----------|---------|
15
- | PyPI package JSON | ~80ms |
16
- | npm downloads point | ~110ms |
17
- | npm registry full doc (react = 6.3MB) | ~280ms |
18
- | npm registry search | ~330ms |
19
- | pypistats.org recent | ~480ms |
20
-
21
- ---
22
-
23
- ## npm Registry
24
-
25
- ### Package metadata
26
-
27
- Two endpoints — pick based on what you need:
28
-
29
- **Full registry document** — includes all version history, time map, author, bugs, homepage, keywords, README (when present). Large for popular packages (react = 6.3MB).
30
-
31
- ```python
32
- import json
33
- data = json.loads(http_get("https://registry.npmjs.org/react"))
34
-
35
- # Top-level keys: _id, name, dist-tags, versions, time, bugs, author,
36
- # license, homepage, keywords, repository, description,
37
- # contributors, maintainers, readme, readmeFilename, users
38
- print(data['name']) # 'react'
39
- print(data['dist-tags']['latest']) # '19.2.5'
40
- print(data['time']['created']) # '2011-10-26T17:46:21.942Z'
41
- print(data['time']['modified']) # '2026-04-18T00:57:09.913Z'
42
-
43
- latest = data['dist-tags']['latest']
44
- v = data['versions'][latest]
45
- # Version object keys: name, version, description, license, keywords,
46
- # homepage, bugs, repository, engines, exports, main, scripts,
47
- # dependencies, devDependencies, peerDependencies, dist, maintainers,
48
- # _npmUser, _nodeVersion, _npmVersion
49
- print(v['description']) # 'React is a JavaScript library...'
50
- print(v['license']) # 'MIT'
51
- print(list(v.get('dependencies', {}).keys())) # [] (react 19 has no runtime deps)
52
- print(v.get('homepage')) # 'https://react.dev/'
53
- print(len(data['versions'])) # 2785 — all published versions
54
- ```
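The `time` map shown above also carries one timestamp per published version, which makes it easy to see why the most recently published version and `dist-tags.latest` can differ. The following is only a sketch: `newest_published` is a hypothetical helper, and the sample values are invented to mimic the shape of `data['time']`, not taken from the registry.

```python
from datetime import datetime


def newest_published(time_map: dict) -> str:
    """Return the version key with the most recent publish timestamp."""
    stamps = {
        version: datetime.fromisoformat(ts.replace("Z", "+00:00"))
        for version, ts in time_map.items()
        if version not in ("created", "modified")  # bookkeeping keys, not versions
    }
    return max(stamps, key=stamps.get)


# Invented sample shaped like data['time'] from the full registry document.
sample_time = {
    "created": "2011-10-26T17:46:21.942Z",
    "modified": "2026-04-18T00:57:09.913Z",
    "19.2.5": "2026-04-08T12:00:00.000Z",
    "0.0.0-experimental-abc123": "2026-04-17T09:30:00.000Z",
}
print(newest_published(sample_time))  # 0.0.0-experimental-abc123
```

In this invented sample the newest publish is an experimental build, so a caller that wants the stable release would still read `data['dist-tags']['latest']`.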
55
-
56
- **Single version endpoint** — 1–2KB instead of megabytes. Use when you only need one version's data.
57
-
58
- ```python
59
- import json
60
- # Fetch a specific version
61
- v = json.loads(http_get("https://registry.npmjs.org/react/19.2.5"))
62
- print(v['name'], v['version'], v['description'])
63
-
64
- # Fetch latest directly (no need to resolve dist-tags first)
65
- v = json.loads(http_get("https://registry.npmjs.org/react/latest"))
66
- print(v['version']) # '19.2.5'
67
- ```
68
-
69
- **Abbreviated document** — skips time map and (in theory) README; versions dict still present. Use `Accept` header.
70
-
71
- ```python
72
- import json, urllib.request, gzip
73
-
74
- req = urllib.request.Request(
75
-     "https://registry.npmjs.org/react",
76
-     headers={
77
-         "Accept": "application/vnd.npm.install-v1+json",
78
-         "Accept-Encoding": "gzip"
79
-     }
80
- )
81
- with urllib.request.urlopen(req, timeout=20) as r:
82
-     raw = r.read()
83
-     if r.headers.get("Content-Encoding") == "gzip":
84
-         raw = gzip.decompress(raw)
85
- data = json.loads(raw)
86
- # Keys: name, dist-tags, versions, modified (no time map, no readme)
87
- print(data['dist-tags']['latest']) # e.g. '19.2.5' for react, the package requested above
88
- ```
89
-
90
- Note: abbreviated is still large (react: 2.7MB) — use single-version endpoint when possible.
91
-
92
- ### Scoped packages
93
-
94
- Scoped packages (`@scope/name`) work with a direct path — no encoding needed:
95
-
96
- ```python
97
- import json
98
- data = json.loads(http_get("https://registry.npmjs.org/@playwright/test"))
99
- print(data['name']) # '@playwright/test'
100
- print(data['dist-tags']['latest']) # '1.59.1'
101
- print(len(data['versions'])) # 3148
102
- ```
103
-
104
- If constructing URLs dynamically, either form works:
105
- ```python
106
- # Direct path (preferred)
107
- url = f"https://registry.npmjs.org/{pkg}" # '@playwright/test'
108
- # URL-encoded slash
109
- url = f"https://registry.npmjs.org/{pkg.replace('/', '%2F')}"
110
- ```
111
-
112
- ### Download statistics
113
-
114
- The npm downloads API is separate from the registry and very fast (~110ms).
115
-
116
- **Point query** — single number for a period:
117
-
118
- ```python
119
- import json
120
-
121
- # Supported periods: last-day, last-week, last-month, last-year
122
- # Also accepts ISO date ranges: YYYY-MM-DD:YYYY-MM-DD
123
-
124
- stats = json.loads(http_get("https://api.npmjs.org/downloads/point/last-week/react"))
125
- print(stats['downloads']) # 123302510
126
- print(stats['start']) # '2026-04-11'
127
- print(stats['end']) # '2026-04-17'
128
- print(stats['package']) # 'react'
129
-
130
- # Confirmed values (2026-04-18):
131
- # last-day: 19,411,762
132
- # last-week: 123,302,510
133
- # last-month: 502,719,511
134
- # last-year: 3,000,644,845
135
- ```
136
-
137
- **Bulk point query** — up to ~128 packages in one call, comma-separated:
138
-
139
- ```python
140
- import json
141
-
142
- bulk = json.loads(http_get(
143
- "https://api.npmjs.org/downloads/point/last-week/"
144
- "react,vue,angular,webpack,typescript,eslint,jest,prettier,rollup,babel"
145
- ))
146
- # Returns dict keyed by package name
147
- for pkg, info in bulk.items():
148
-     print(f"{pkg}: {info['downloads']:,}")
149
- # react: 123,302,510
150
- # vue: 11,042,359
151
- # angular: 524,366
152
- # webpack: 44,425,549
153
- # typescript: 180,054,359
154
- # eslint: 126,113,686
155
- # jest: 43,394,412
156
- # prettier: 87,551,734
157
- # rollup: 103,431,439
158
- # babel: 139,207
159
- ```
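When a package list is longer than the bulk endpoint accepts, it has to be split into several calls. The sketch below only builds the URLs; `bulk_download_urls` and `BULK_LIMIT` are hypothetical names, and the cap of 128 is an assumption taken from the "~128 packages" note above rather than a verified limit.

```python
from typing import Iterator, List

BULK_LIMIT = 128  # assumed cap, based on the "~128 packages" note


def bulk_download_urls(pkgs: List[str], period: str = "last-week",
                       limit: int = BULK_LIMIT) -> Iterator[str]:
    """Yield one bulk point-query URL per chunk of at most `limit` names."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    base = "https://api.npmjs.org/downloads/point"
    for start in range(0, len(pkgs), limit):
        chunk = pkgs[start:start + limit]
        yield f"{base}/{period}/{','.join(chunk)}"


# 300 placeholder names split into chunks of 128, 128 and 44.
urls = list(bulk_download_urls([f"pkg-{i}" for i in range(300)]))
print(len(urls))  # 3
```

Each yielded URL can then be passed to `http_get` and the resulting dicts merged, since every response is keyed by package name.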
160
-
161
- **Range query** — downloads per day over a period:
162
-
163
- ```python
164
- import json
165
-
166
- resp = json.loads(http_get(
167
- "https://api.npmjs.org/downloads/range/2025-01-01:2025-01-07/react"
168
- ))
169
- # resp['downloads'] is a list of {downloads, day} objects
170
- for entry in resp['downloads']:
171
-     print(entry['day'], entry['downloads'])
172
- # 2025-01-01 1336801
173
- # 2025-01-02 3288088
174
- # 2025-01-03 3381680
175
- # ...
176
- ```
177
-
178
- ### Search
179
-
180
- ```python
181
- import json
182
-
183
- # Fields: text, size (max ~250), from (offset), quality, popularity, maintenance weights
184
- data = json.loads(http_get(
185
- "https://registry.npmjs.org/-/v1/search?text=browser+automation&size=5"
186
- ))
187
- print(data['total']) # total results matching the query
188
-
189
- for obj in data['objects']:
190
-     p = obj['package']
191
-     s = obj['score']
192
-     # p keys: name, version, description, keywords, date, links, publisher, maintainers
193
-     # s keys: final, detail.quality, detail.popularity, detail.maintenance
194
-     print(
195
-         p['name'],
196
-         p['version'],
197
-         f"{s['final']:.2f}",
198
-         p.get('description', '')[:60]
199
-     )
200
- # agent-browser 0.26.0 462.28 Browser automation CLI for AI agents
201
- # nightmare 3.0.2 306.64 A high-level browser automation library.
202
- ```
203
-
204
- Score breakdown (all three are 0–1 floats):
205
- - `quality` — code quality signals (tests, lint, TypeScript types)
206
- - `popularity` — download counts normalized
207
- - `maintenance` — release frequency, open issues
208
-
209
- `final` is a weighted combination and can exceed 1.0 for extremely popular packages.
210
-
211
- ### Error handling
212
-
213
- ```python
214
- import json, urllib.error
215
-
216
- try:
217
-     data = json.loads(http_get("https://registry.npmjs.org/nonexistent-pkg-xyz"))
218
- except urllib.error.HTTPError as e:
219
-     # 404 for missing packages
220
-     print(e.code) # 404
221
-     print(json.loads(e.read())) # {'error': 'Not found'}
222
- ```
223
-
224
- ---
225
-
226
- ## PyPI
227
-
228
- ### Package metadata
229
-
230
- ```python
231
- import json
232
-
233
- # Latest version metadata
234
- data = json.loads(http_get("https://pypi.org/pypi/requests/json"))
235
- info = data['info']
236
-
237
- # info keys (selected):
238
- print(info['name']) # 'requests'
239
- print(info['version']) # '2.33.1'
240
- print(info['summary']) # 'Python HTTP for Humans.'
241
- print(info['license']) # 'Apache-2.0'
242
- print(info['author']) # None (sometimes empty — check author_email)
243
- print(info['author_email']) # '"Kenneth Reitz" <me@kennethreitz.org>'
244
- print(info['requires_python']) # '>=3.10'
245
- print(info['home_page']) # None (may be empty — check project_urls)
246
- print(info['project_urls'])
247
- # {'Documentation': 'https://requests.readthedocs.io',
248
- # 'Source': 'https://github.com/psf/requests'}
249
-
250
- requires = info.get('requires_dist') or []
251
- print(requires[:5])
252
- # ['charset_normalizer<4,>=2', 'idna<4,>=2.5', 'urllib3<3,>=1.26',
253
- # 'certifi>=2023.5.7', 'PySocks!=1.5.7,>=1.5.6; extra == "socks"']
254
-
255
- print(info.get('classifiers', [])[:3])
256
- # ['Development Status :: 5 - Production/Stable',
257
- # 'Intended Audience :: Developers',
258
- # 'License :: OSI Approved :: Apache Software License']
259
-
260
- # data['urls'] — list of dist files for the latest version
261
- for f in data['urls']:
262
-     # keys: filename, packagetype, python_version, size, digests, url,
263
-     # upload_time, requires_python, yanked, yanked_reason
264
-     print(f['packagetype'], f['python_version'], f['filename'], f['size'])
265
- # bdist_wheel py3 requests-2.33.1-py3-none-any.whl 64947
266
- # sdist source requests-2.33.1.tar.gz 134120
267
- ```
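The comments above note that `author` is sometimes empty and that `author_email` may hold a `"Name" <email>` string. One way to turn that into a display name is sketched here; `display_author` is a hypothetical helper, and it assumes `author_email` holds a single address that the standard library's `email.utils.parseaddr` can split.

```python
from email.utils import parseaddr
from typing import Optional


def display_author(info: dict) -> Optional[str]:
    """Return a readable author name from a PyPI `info` dict, if any."""
    author = (info.get("author") or "").strip()
    if author:
        return author
    # author_email often looks like '"Name" <email>'; parseaddr splits it.
    name, email = parseaddr(info.get("author_email") or "")
    return name or email or None


print(display_author({
    "author": None,
    "author_email": '"Kenneth Reitz" <me@kennethreitz.org>',
}))  # Kenneth Reitz
```

If both fields are missing or empty the helper returns `None`, so callers can decide how to label an unknown author.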
268
-
269
- ### Specific version
270
-
271
- ```python
272
- import json
273
-
274
- # Fetch a pinned version (not just latest)
275
- data = json.loads(http_get("https://pypi.org/pypi/requests/2.32.3/json"))
276
- print(data['info']['version']) # '2.32.3'
277
- # Same structure as the latest endpoint
278
- ```
279
-
280
- ### Version history and yanked releases
281
-
282
- ```python
283
- import json
284
-
285
- data = json.loads(http_get("https://pypi.org/pypi/requests/json"))
286
-
287
- # data['releases'] is a dict: version_string -> list of file objects
288
- versions = list(data['releases'].keys())
289
- print("Total versions:", len(versions)) # 159
290
- # Versions are insertion-ordered (chronological, oldest first)
291
- # dict key order is stable
292
-
293
- # Find yanked versions
294
- yanked = [
295
-     (ver, files[0]['yanked_reason'])
296
-     for ver, files in data['releases'].items()
297
-     if files and files[0].get('yanked')
298
- ]
299
- print(yanked[:2])
300
- # [('2.32.0', 'Yanked due to conflicts with CVE-2024-35195 mitigation'),
301
- # ('2.32.1', 'Yanked due to conflicts with CVE-2024-35195 mitigation ')]
302
-
303
- # info.yanked is True only if the LATEST version is yanked
304
- print(data['info']['yanked']) # False
305
- print(data['info']['yanked_reason']) # None
306
- ```
307
-
308
- ### Download statistics (pypistats.org)
309
-
310
- PyPI does not expose download counts in its own JSON API. Use pypistats.org.
311
-
312
- ```python
313
- import json
314
-
315
- # Recent (last day/week/month) — fastest, single call
316
- stats = json.loads(http_get("https://pypistats.org/api/packages/requests/recent"))
317
- d = stats['data']
318
- print(d['last_day']) # 52969887
319
- print(d['last_week']) # 356556988
320
- print(d['last_month']) # 1385411770
321
-
322
- # Historical daily totals (overall, going back ~6 months)
323
- overall = json.loads(http_get("https://pypistats.org/api/packages/requests/overall"))
324
- # overall['data'] is list of {category, date, downloads}
325
- # category is 'with_mirrors' or 'without_mirrors'
326
- for row in overall['data'][:3]:
327
-     print(row['date'], row['category'], row['downloads'])
328
- # 2025-10-19 with_mirrors 21916634
329
- # 2025-10-19 without_mirrors 21882953
330
-
331
- # Without mirrors (pip installs only, more accurate for real usage):
332
- clean = json.loads(http_get(
333
- "https://pypistats.org/api/packages/requests/overall?mirrors=false"
334
- ))
335
-
336
- # By Python major version
337
- by_python = json.loads(http_get(
338
- "https://pypistats.org/api/packages/requests/python_major"
339
- ))
340
- # data rows: {category: '3', date: '...', downloads: N}
341
-
342
- # By OS
343
- by_sys = json.loads(http_get(
344
- "https://pypistats.org/api/packages/requests/system"
345
- ))
346
- # data rows: {category: 'Darwin'|'Linux'|'Windows'|'other'|'null', date, downloads}
347
-
348
- # By Python minor version
349
- by_minor = json.loads(http_get(
350
- "https://pypistats.org/api/packages/requests/python_minor"
351
- ))
352
- ```
353
-
354
- ### Parallel fetch for multiple packages
355
-
356
- ```python
357
- import json
358
- from concurrent.futures import ThreadPoolExecutor
359
-
360
- packages = ['numpy', 'pandas', 'scikit-learn', 'torch', 'tensorflow']
361
-
362
- def get_pypi_info(pkg):
363
-     d = json.loads(http_get(f"https://pypi.org/pypi/{pkg}/json"))
364
-     return {
365
-         'name': pkg,
366
-         'version': d['info']['version'],
367
-         'summary': d['info']['summary'],
368
-         'requires_python': d['info']['requires_python'],
369
-     }
370
-
371
- with ThreadPoolExecutor(max_workers=5) as ex:
372
-     results = list(ex.map(get_pypi_info, packages))
373
-
374
- for r in results:
375
-     print(r['name'], r['version'], r['summary'][:50])
376
- # numpy 2.4.4 Fundamental package for array computing in Python
377
- # pandas 3.0.2 Powerful data structures for data analysis, time s
378
- # scikit-learn 1.8.0 A set of python modules for machine learning and d
379
- # torch 2.11.0 Tensors and Dynamic neural networks in Python with
380
- # tensorflow 2.21.0 TensorFlow is an open source machine learning fram
381
- ```
382
-
383
- ### Error handling
384
-
385
- ```python
386
- import json, urllib.error
387
-
388
- try:
389
-     data = json.loads(http_get("https://pypi.org/pypi/nonexistent-xyz-abc/json"))
390
- except urllib.error.HTTPError as e:
391
-     print(e.code) # 404
392
-     # Body is HTML, not JSON; don't try to parse it
393
- ```
394
-
395
- ---
396
-
397
- ## Parallel fetch patterns
398
-
399
- ### Mixed registry + stats in one shot
400
-
401
- ```python
402
- import json
403
- from concurrent.futures import ThreadPoolExecutor
404
-
405
- def npm_info(pkg):
406
-     # Use single-version endpoint (1-2KB) not full registry doc (MB)
407
-     v = json.loads(http_get(f"https://registry.npmjs.org/{pkg}/latest"))
408
-     s = json.loads(http_get(f"https://api.npmjs.org/downloads/point/last-month/{pkg}"))
409
-     return {'name': pkg, 'version': v['version'], 'downloads': s['downloads']}
410
-
411
- pkgs = ['react', 'vue', 'svelte', 'solid-js', 'preact']
412
- with ThreadPoolExecutor(max_workers=5) as ex:
413
-     results = list(ex.map(npm_info, pkgs))
414
- for r in results:
415
-     print(r['name'], r['version'], f"{r['downloads']:,}")
416
- ```
417
-
418
- ### npm bulk downloads (most efficient for many packages)
419
-
420
- ```python
421
- import json
422
-
423
- # Up to ~128 packages in one HTTP call
424
- pkgs = ['react', 'vue', 'angular', 'svelte']
425
- bulk = json.loads(http_get(
426
- f"https://api.npmjs.org/downloads/point/last-week/{','.join(pkgs)}"
427
- ))
428
- # Returns: {pkg_name: {'downloads': N, 'start': '...', 'end': '...', 'package': '...'}, ...}
429
- sorted_pkgs = sorted(bulk.items(), key=lambda x: x[1]['downloads'], reverse=True)
430
- for name, info in sorted_pkgs:
431
-     print(f"{name}: {info['downloads']:,}")
432
- ```
433
-
434
- ---
435
-
436
- ## Rate limits
437
-
438
- No rate limits encountered across rapid bursts of 10 sequential calls per endpoint (2026-04-18 testing):
439
-
440
- | API | Observed limit |
441
- |-----|----------------|
442
- | npm registry (`registry.npmjs.org`) | None observed |
443
- | npm downloads (`api.npmjs.org`) | None observed |
444
- | npm search | None observed |
445
- | PyPI JSON (`pypi.org`) | None observed |
446
- | pypistats.org | None observed |
447
-
448
- npm's official documentation mentions soft rate limits at very high volumes, but normal task-level usage (dozens of calls) is unaffected. If building a large scraper, add a short sleep between batches as a precaution.
449
-
450
- ---
451
-
452
- ## Gotchas
453
-
454
- - **Full npm registry doc is huge** — `registry.npmjs.org/react` is 6.3MB (2785 versions). When you only need the latest version metadata, fetch `registry.npmjs.org/react/latest` (~1.8KB) instead. Similarly for any specific version.
455
-
456
- - **npm `versions` dict keys are ordered oldest-first** — The last key is NOT necessarily the latest release; it may be a canary/experimental build. Always use `dist-tags.latest` to identify the stable latest version.
457
-
458
- - **PyPI `author` field is often `None`** — Many packages set `author_email` instead (often in `"Name" <email>` format). Fall back: `info['author'] or info['author_email']`.
459
-
460
- - **PyPI `home_page` is frequently empty** — Check `info['project_urls']` for `Homepage`, `Source`, `Documentation` links instead.
461
-
462
- - **PyPI `requires_dist` can be `None`** — Not an empty list — `None`. Always guard: `info.get('requires_dist') or []`.
463
-
464
- - **PyPI XML-RPC API is dead** — `https://pypi.org/pypi` (XML-RPC) returns a fault for most methods including `package_releases`. Use JSON API only.
465
-
466
- - **pypistats.org `total` field is `None`** — The `total` key in response JSON is null; compute sums from `data` list yourself.
467
-
468
- - **pypistats.org data goes back ~6 months** — The `overall` endpoint returns daily rows for roughly the past 180 days, not full history.
469
-
470
- - **PyPI yanked versions** — `data['releases'][ver][0]['yanked']` is `True` for yanked versions. `data['info']['yanked']` is only `True` if the latest version itself is yanked. Both `yanked` and `yanked_reason` fields exist on each file object.
471
-
472
- - **npm scoped packages** — Both `registry.npmjs.org/@scope/name` (direct path) and `registry.npmjs.org/@scope%2Fname` (URL-encoded) work. Use the direct path form.
473
-
474
- - **npm downloads bulk response is a dict** — When you request multiple packages, the response is `{pkg_name: {...}}`, not a list. Single-package response is a flat object with `downloads`, `start`, `end`, `package` directly.
475
-
476
- - **`http_get` handles gzip transparently** — The helper already decompresses gzip responses. No manual decompression needed.
477
-
478
- - **Never use a browser for either registry** — All data is JSON over HTTP. `http_get` calls take 80–480ms; a browser navigation would take 3–8 seconds with no benefit.
1
+ # npm & PyPI — Package Registry Data Extraction
2
+
3
+ `https://registry.npmjs.org` · `https://api.npmjs.org` · `https://pypi.org` · `https://pypistats.org`
4
+
5
+ Both registries expose full JSON APIs with no auth required. Never use a browser — every data point is available over HTTP.
6
+
7
+ Tested 2026-04-18 with `uv run python` + `http_get`.
8
+
9
+ ---
10
+
11
+ ## Latency reference (measured)
12
+
13
+ | Endpoint | Latency |
14
+ |----------|---------|
15
+ | PyPI package JSON | ~80ms |
16
+ | npm downloads point | ~110ms |
17
+ | npm registry full doc (react = 6.3MB) | ~280ms |
18
+ | npm registry search | ~330ms |
19
+ | pypistats.org recent | ~480ms |
20
+
21
+ ---
22
+
23
+ ## npm Registry
24
+
25
+ ### Package metadata
26
+
27
+ Two endpoints — pick based on what you need:
28
+
29
+ **Full registry document** — includes all version history, time map, author, bugs, homepage, keywords, README (when present). Large for popular packages (react = 6.3MB).
30
+
31
+ ```python
32
+ import json
33
+ data = json.loads(http_get("https://registry.npmjs.org/react"))
34
+
35
+ # Top-level keys: _id, name, dist-tags, versions, time, bugs, author,
36
+ # license, homepage, keywords, repository, description,
37
+ # contributors, maintainers, readme, readmeFilename, users
38
+ print(data['name']) # 'react'
39
+ print(data['dist-tags']['latest']) # '19.2.5'
40
+ print(data['time']['created']) # '2011-10-26T17:46:21.942Z'
41
+ print(data['time']['modified']) # '2026-04-18T00:57:09.913Z'
42
+
43
+ latest = data['dist-tags']['latest']
44
+ v = data['versions'][latest]
45
+ # Version object keys: name, version, description, license, keywords,
46
+ # homepage, bugs, repository, engines, exports, main, scripts,
47
+ # dependencies, devDependencies, peerDependencies, dist, maintainers,
48
+ # _npmUser, _nodeVersion, _npmVersion
49
+ print(v['description']) # 'React is a JavaScript library...'
50
+ print(v['license']) # 'MIT'
51
+ print(list(v.get('dependencies', {}).keys())) # [] (react 19 has no runtime deps)
52
+ print(v.get('homepage')) # 'https://react.dev/'
53
+ print(len(data['versions'])) # 2785 — all published versions
54
+ ```
55
+
56
+ **Single version endpoint** — 1–2KB instead of megabytes. Use when you only need one version's data.
57
+
58
+ ```python
59
+ import json
60
+ # Fetch a specific version
61
+ v = json.loads(http_get("https://registry.npmjs.org/react/19.2.5"))
62
+ print(v['name'], v['version'], v['description'])
63
+
64
+ # Fetch latest directly (no need to resolve dist-tags first)
65
+ v = json.loads(http_get("https://registry.npmjs.org/react/latest"))
66
+ print(v['version']) # '19.2.5'
67
+ ```
68
+
69
+ **Abbreviated document** — skips time map and (in theory) README; versions dict still present. Use `Accept` header.
70
+
71
+ ```python
72
+ import json, urllib.request, gzip
73
+
74
+ req = urllib.request.Request(
75
+ "https://registry.npmjs.org/react",
76
+ headers={
77
+ "Accept": "application/vnd.npm.install-v1+json",
78
+ "Accept-Encoding": "gzip"
79
+ }
80
+ )
81
+ with urllib.request.urlopen(req, timeout=20) as r:
82
+     raw = r.read()
83
+     if r.headers.get("Content-Encoding") == "gzip":
84
+         raw = gzip.decompress(raw)
85
+ data = json.loads(raw)
86
+ # Keys: name, dist-tags, versions, modified (no time map, no readme)
87
+ print(data['dist-tags']['latest']) # '19.2.5' (react, matching the request URL above)
88
+ ```
89
+
90
+ Note: abbreviated is still large (react: 2.7MB) — use single-version endpoint when possible.
91
+
92
+ ### Scoped packages
93
+
94
+ Scoped packages (`@scope/name`) work with a direct path — no encoding needed:
95
+
96
+ ```python
97
+ import json
98
+ data = json.loads(http_get("https://registry.npmjs.org/@playwright/test"))
99
+ print(data['name']) # '@playwright/test'
100
+ print(data['dist-tags']['latest']) # '1.59.1'
101
+ print(len(data['versions'])) # 3148
102
+ ```
103
+
104
+ If constructing URLs dynamically, either form works:
105
+ ```python
106
+ # Direct path (preferred)
107
+ url = f"https://registry.npmjs.org/{pkg}" # '@playwright/test'
108
+ # URL-encoded slash
109
+ url = f"https://registry.npmjs.org/{pkg.replace('/', '%2F')}"
110
+ ```
111
+
112
+ ### Download statistics
113
+
114
+ The npm downloads API is separate from the registry and very fast (~110ms).
115
+
116
+ **Point query** — single number for a period:
117
+
118
+ ```python
119
+ import json
120
+
121
+ # Supported periods: last-day, last-week, last-month, last-year
122
+ # Also accepts ISO date ranges: YYYY-MM-DD:YYYY-MM-DD
123
+
124
+ stats = json.loads(http_get("https://api.npmjs.org/downloads/point/last-week/react"))
125
+ print(stats['downloads']) # 123302510
126
+ print(stats['start']) # '2026-04-11'
127
+ print(stats['end']) # '2026-04-17'
128
+ print(stats['package']) # 'react'
129
+
130
+ # Confirmed values (2026-04-18):
131
+ # last-day: 19,411,762
132
+ # last-week: 123,302,510
133
+ # last-month: 502,719,511
134
+ # last-year: 3,000,644,845
135
+ ```
136
+
137
+ **Bulk point query** — up to ~128 packages in one call, comma-separated:
138
+
139
+ ```python
140
+ import json
141
+
142
+ bulk = json.loads(http_get(
143
+ "https://api.npmjs.org/downloads/point/last-week/"
144
+ "react,vue,angular,webpack,typescript,eslint,jest,prettier,rollup,babel"
145
+ ))
146
+ # Returns dict keyed by package name
147
+ for pkg, info in bulk.items():
148
+     print(f"{pkg}: {info['downloads']:,}")
149
+ # react: 123,302,510
150
+ # vue: 11,042,359
151
+ # angular: 524,366
152
+ # webpack: 44,425,549
153
+ # typescript: 180,054,359
154
+ # eslint: 126,113,686
155
+ # jest: 43,394,412
156
+ # prettier: 87,551,734
157
+ # rollup: 103,431,439
158
+ # babel: 139,207
159
+ ```
160
+
161
+ **Range query** — downloads per day over a period:
162
+
163
+ ```python
164
+ import json
165
+
166
+ resp = json.loads(http_get(
167
+ "https://api.npmjs.org/downloads/range/2025-01-01:2025-01-07/react"
168
+ ))
169
+ # resp['downloads'] is a list of {downloads, day} objects
170
+ for entry in resp['downloads']:
171
+     print(entry['day'], entry['downloads'])
172
+ # 2025-01-01 1336801
173
+ # 2025-01-02 3288088
174
+ # 2025-01-03 3381680
175
+ # ...
176
+ ```
177
+
178
+ ### Search
179
+
180
+ ```python
181
+ import json
182
+
183
+ # Query parameters: text, size (max ~250), from (offset), plus quality, popularity, maintenance weights
184
+ data = json.loads(http_get(
185
+ "https://registry.npmjs.org/-/v1/search?text=browser+automation&size=5"
186
+ ))
187
+ print(data['total']) # total results matching the query
188
+
189
+ for obj in data['objects']:
190
+     p = obj['package']
191
+     s = obj['score']
192
+     # p keys: name, version, description, keywords, date, links, publisher, maintainers
193
+     # s keys: final, detail.quality, detail.popularity, detail.maintenance
194
+     print(
195
+         p['name'],
196
+         p['version'],
197
+         f"{s['final']:.2f}",
198
+         p.get('description', '')[:60]
199
+     )
200
+ # agent-browser 0.26.0 462.28 Browser automation CLI for AI agents
201
+ # nightmare 3.0.2 306.64 A high-level browser automation library.
202
+ ```
203
+
204
+ Score breakdown (all three are 0–1 floats):
205
+ - `quality` — code quality signals (tests, lint, TypeScript types)
206
+ - `popularity` — download counts normalized
207
+ - `maintenance` — release frequency, open issues
208
+
209
+ `final` is a weighted combination and can exceed 1.0 for extremely popular packages.
210
+
211
+ ### Error handling
212
+
213
+ ```python
214
+ import json, urllib.error
215
+
216
+ try:
217
+     data = json.loads(http_get("https://registry.npmjs.org/nonexistent-pkg-xyz"))
218
+ except urllib.error.HTTPError as e:
219
+     # 404 for missing packages
220
+     print(e.code) # 404
221
+     print(json.loads(e.read())) # {'error': 'Not found'}
222
+ ```
223
+
224
+ ---
225
+
226
+ ## PyPI
227
+
228
+ ### Package metadata
229
+
230
+ ```python
231
+ import json
232
+
233
+ # Latest version metadata
234
+ data = json.loads(http_get("https://pypi.org/pypi/requests/json"))
235
+ info = data['info']
236
+
237
+ # info keys (selected):
238
+ print(info['name']) # 'requests'
239
+ print(info['version']) # '2.33.1'
240
+ print(info['summary']) # 'Python HTTP for Humans.'
241
+ print(info['license']) # 'Apache-2.0'
242
+ print(info['author']) # None (sometimes empty — check author_email)
243
+ print(info['author_email']) # '"Kenneth Reitz" <me@kennethreitz.org>'
244
+ print(info['requires_python']) # '>=3.10'
245
+ print(info['home_page']) # None (may be empty — check project_urls)
246
+ print(info['project_urls'])
247
+ # {'Documentation': 'https://requests.readthedocs.io',
248
+ # 'Source': 'https://github.com/psf/requests'}
249
+
250
+ requires = info.get('requires_dist') or []
251
+ print(requires[:5])
252
+ # ['charset_normalizer<4,>=2', 'idna<4,>=2.5', 'urllib3<3,>=1.26',
253
+ # 'certifi>=2023.5.7', 'PySocks!=1.5.7,>=1.5.6; extra == "socks"']
254
+
255
+ print(info.get('classifiers', [])[:3])
256
+ # ['Development Status :: 5 - Production/Stable',
257
+ # 'Intended Audience :: Developers',
258
+ # 'License :: OSI Approved :: Apache Software License']
259
+
260
+ # data['urls'] — list of dist files for the latest version
261
+ for f in data['urls']:
262
+     # keys: filename, packagetype, python_version, size, digests, url,
263
+     # upload_time, requires_python, yanked, yanked_reason
264
+     print(f['packagetype'], f['python_version'], f['filename'], f['size'])
265
+ # bdist_wheel py3 requests-2.33.1-py3-none-any.whl 64947
266
+ # sdist source requests-2.33.1.tar.gz 134120
267
+ ```
268
+
269
+ ### Specific version
270
+
271
+ ```python
272
+ import json
273
+
274
+ # Fetch a pinned version (not just latest)
275
+ data = json.loads(http_get("https://pypi.org/pypi/requests/2.32.3/json"))
276
+ print(data['info']['version']) # '2.32.3'
277
+ # Same structure as the latest endpoint
278
+ ```
279
+
280
+ ### Version history and yanked releases
281
+
282
+ ```python
283
+ import json
284
+
285
+ data = json.loads(http_get("https://pypi.org/pypi/requests/json"))
286
+
287
+ # data['releases'] is a dict: version_string -> list of file objects
288
+ versions = list(data['releases'].keys())
289
+ print("Total versions:", len(versions)) # 159
290
+ # Versions are insertion-ordered (chronological, oldest first)
291
+ # dict key order is stable
292
+
293
+ # Find yanked versions
294
+ yanked = [
295
+ (ver, files[0]['yanked_reason'])
296
+ for ver, files in data['releases'].items()
297
+ if files and files[0].get('yanked')
298
+ ]
299
+ print(yanked[:2])
300
+ # [('2.32.0', 'Yanked due to conflicts with CVE-2024-35195 mitigation'),
301
+ # ('2.32.1', 'Yanked due to conflicts with CVE-2024-35195 mitigation ')]
302
+
303
+ # info.yanked is True only if the LATEST version is yanked
304
+ print(data['info']['yanked']) # False
305
+ print(data['info']['yanked_reason']) # None
306
+ ```
307
+
308
+ ### Download statistics (pypistats.org)
309
+
310
+ PyPI does not expose download counts in its own JSON API. Use pypistats.org.
311
+
312
+ ```python
313
+ import json
314
+
315
+ # Recent (last day/week/month) — fastest, single call
316
+ stats = json.loads(http_get("https://pypistats.org/api/packages/requests/recent"))
317
+ d = stats['data']
318
+ print(d['last_day']) # 52969887
319
+ print(d['last_week']) # 356556988
320
+ print(d['last_month']) # 1385411770
321
+
322
+ # Historical daily totals (overall, going back ~6 months)
323
+ overall = json.loads(http_get("https://pypistats.org/api/packages/requests/overall"))
324
+ # overall['data'] is list of {category, date, downloads}
325
+ # category is 'with_mirrors' or 'without_mirrors'
326
+ for row in overall['data'][:3]:
327
+     print(row['date'], row['category'], row['downloads'])
328
+ # 2025-10-19 with_mirrors 21916634
329
+ # 2025-10-19 without_mirrors 21882953
330
+
331
+ # Without mirrors (pip installs only, more accurate for real usage):
332
+ clean = json.loads(http_get(
333
+ "https://pypistats.org/api/packages/requests/overall?mirrors=false"
334
+ ))
335
+
336
+ # By Python major version
337
+ by_python = json.loads(http_get(
338
+ "https://pypistats.org/api/packages/requests/python_major"
339
+ ))
340
+ # data rows: {category: '3', date: '...', downloads: N}
341
+
342
+ # By OS
343
+ by_sys = json.loads(http_get(
344
+ "https://pypistats.org/api/packages/requests/system"
345
+ ))
346
+ # data rows: {category: 'Darwin'|'Linux'|'Windows'|'other'|'null', date, downloads}
347
+
348
+ # By Python minor version
349
+ by_minor = json.loads(http_get(
350
+ "https://pypistats.org/api/packages/requests/python_minor"
351
+ ))
352
+ ```
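Because the pypistats `total` field comes back null (see Gotchas), totals have to be computed from the `data` rows. A minimal sketch, assuming rows have the `{category, date, downloads}` shape shown above; `sum_by_category` is a hypothetical helper name, and the last two sample rows are made-up values for illustration only.

```python
from collections import defaultdict

def sum_by_category(rows):
    """Total the `downloads` of pypistats-style rows, grouped by `category`."""
    totals = defaultdict(int)
    for row in rows:
        # Defensive: treat a missing or null download count as 0 (assumption)
        totals[row["category"]] += row.get("downloads") or 0
    return dict(totals)

sample_rows = [
    # First two rows are the values quoted in the example above
    {"category": "with_mirrors", "date": "2025-10-19", "downloads": 21916634},
    {"category": "without_mirrors", "date": "2025-10-19", "downloads": 21882953},
    # Illustrative rows with made-up values
    {"category": "with_mirrors", "date": "2025-10-20", "downloads": 100},
    {"category": "without_mirrors", "date": "2025-10-20", "downloads": 90},
]

totals = sum_by_category(sample_rows)
print(totals["with_mirrors"], totals["without_mirrors"])
```

With live data, the call would look like `sum_by_category(overall['data'])`.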
353
+
354
+ ### Parallel fetch for multiple packages
355
+
356
+ ```python
357
+ import json
358
+ from concurrent.futures import ThreadPoolExecutor
359
+
360
+ packages = ['numpy', 'pandas', 'scikit-learn', 'torch', 'tensorflow']
361
+
362
+ def get_pypi_info(pkg):
363
+     d = json.loads(http_get(f"https://pypi.org/pypi/{pkg}/json"))
364
+     return {
365
+         'name': pkg,
366
+         'version': d['info']['version'],
367
+         'summary': d['info']['summary'],
368
+         'requires_python': d['info']['requires_python'],
369
+     }
370
+
371
+ with ThreadPoolExecutor(max_workers=5) as ex:
372
+     results = list(ex.map(get_pypi_info, packages))
373
+
374
+ for r in results:
375
+     print(r['name'], r['version'], r['summary'][:50])
376
+ # numpy 2.4.4 Fundamental package for array computing in Python
377
+ # pandas 3.0.2 Powerful data structures for data analysis, time s
378
+ # scikit-learn 1.8.0 A set of python modules for machine learning and d
379
+ # torch 2.11.0 Tensors and Dynamic neural networks in Python with
380
+ # tensorflow 2.21.0 TensorFlow is an open source machine learning fram
381
+ ```
382
+
383
+ ### Error handling
384
+
385
+ ```python
386
+ import json, urllib.error
387
+
388
+ try:
389
+     data = json.loads(http_get("https://pypi.org/pypi/nonexistent-xyz-abc/json"))
390
+ except urllib.error.HTTPError as e:
391
+     print(e.code) # 404
392
+     # Body is HTML, not JSON; don't try to parse it
393
+ ```
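When checking many package names, it can be convenient to turn a 404 into `None` rather than wrapping every call in `try`/`except`. The sketch below assumes the fetch function raises `urllib.error.HTTPError` on non-2xx responses, as the example above shows for `http_get`. `fetch_json_or_none` is a hypothetical helper, and the fetch function is passed in so the snippet does not depend on the helper module.

```python
import json
import urllib.error

def fetch_json_or_none(url, fetch):
    """Return parsed JSON from `fetch(url)`, or None if the server answers 404."""
    try:
        return json.loads(fetch(url))
    except urllib.error.HTTPError as e:
        if e.code == 404:
            # Missing package. PyPI's 404 body is HTML, so it is not parsed.
            return None
        raise  # any other HTTP error stays visible to the caller
```

Usage would be along the lines of `fetch_json_or_none("https://pypi.org/pypi/requests/json", http_get)`.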
394
+
395
+ ---
396
+
397
+ ## Parallel fetch patterns
398
+
399
+ ### Mixed registry + stats in one shot
400
+
401
+ ```python
402
+ import json
403
+ from concurrent.futures import ThreadPoolExecutor
404
+
405
+ def npm_info(pkg):
406
+     # Use single-version endpoint (1-2KB) not full registry doc (MB)
407
+     v = json.loads(http_get(f"https://registry.npmjs.org/{pkg}/latest"))
408
+     s = json.loads(http_get(f"https://api.npmjs.org/downloads/point/last-month/{pkg}"))
409
+     return {'name': pkg, 'version': v['version'], 'downloads': s['downloads']}
410
+
411
+ pkgs = ['react', 'vue', 'svelte', 'solid-js', 'preact']
412
+ with ThreadPoolExecutor(max_workers=5) as ex:
413
+     results = list(ex.map(npm_info, pkgs))
414
+ for r in results:
415
+     print(r['name'], r['version'], f"{r['downloads']:,}")
416
+ ```
417
+
418
+ ### npm bulk downloads (most efficient for many packages)
419
+
420
+ ```python
421
+ import json
422
+
423
+ # Up to ~128 packages in one HTTP call
424
+ pkgs = ['react', 'vue', 'angular', 'svelte']
425
+ bulk = json.loads(http_get(
426
+ f"https://api.npmjs.org/downloads/point/last-week/{','.join(pkgs)}"
427
+ ))
428
+ # Returns: {pkg_name: {'downloads': N, 'start': '...', 'end': '...', 'package': '...'}, ...}
429
+ sorted_pkgs = sorted(bulk.items(), key=lambda x: x[1]['downloads'], reverse=True)
430
+ for name, info in sorted_pkgs:
431
+     print(f"{name}: {info['downloads']:,}")
432
+ ```
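The point endpoint returns a flat object for one package and a dict keyed by name for several, so code that accepts either needs to normalize. A minimal sketch of one way to do that; `normalize_downloads` is a hypothetical helper, the sample values are the ones quoted earlier in this document, and skipping null entries is a defensive assumption rather than documented behavior.

```python
def normalize_downloads(resp):
    """Map package name -> download count for either response shape."""
    if "downloads" in resp and "package" in resp:
        # Single-package shape: flat object with downloads/start/end/package
        return {resp["package"]: resp["downloads"]}
    # Bulk shape: {pkg_name: {...}}; skip entries that are null (assumption)
    return {name: info["downloads"] for name, info in resp.items() if info}

single = {"downloads": 123302510, "start": "2026-04-11",
          "end": "2026-04-17", "package": "react"}
bulk = {
    "react": {"downloads": 123302510, "start": "2026-04-11",
              "end": "2026-04-17", "package": "react"},
    "vue": {"downloads": 11042359, "start": "2026-04-11",
            "end": "2026-04-17", "package": "vue"},
}

print(normalize_downloads(single))
print(normalize_downloads(bulk))
```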
433
+
434
+ ---
435
+
436
+ ## Rate limits
437
+
438
+ No rate limits encountered across rapid bursts of 10 sequential calls per endpoint (2026-04-18 testing):
439
+
440
+ | API | Observed limit |
441
+ |-----|----------------|
442
+ | npm registry (`registry.npmjs.org`) | None observed |
443
+ | npm downloads (`api.npmjs.org`) | None observed |
444
+ | npm search | None observed |
445
+ | PyPI JSON (`pypi.org`) | None observed |
446
+ | pypistats.org | None observed |
447
+
448
+ npm's official documentation mentions soft rate limits at very high volumes, but normal task-level usage (dozens of calls) is unaffected. If building a large scraper, add a short sleep between batches as a precaution.
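The "short sleep between batches" precaution could be sketched as follows. `fetch_in_batches` is a hypothetical helper, and the batch size and pause are arbitrary illustrative defaults, not values derived from any documented limit. The fetch function is passed in, so any callable (for example one wrapping `http_get`) can be used.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_in_batches(items, fetch, batch_size=5, pause=0.5):
    """Call `fetch` on every item, `batch_size` at a time, sleeping between batches."""
    results = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        with ThreadPoolExecutor(max_workers=batch_size) as ex:
            # ex.map preserves input order, so results line up with items
            results.extend(ex.map(fetch, batch))
        if start + batch_size < len(items):
            time.sleep(pause)  # no sleep after the final batch
    return results

# Stand-in fetch function so the example runs without network access
print(fetch_in_batches(["a", "b", "c"], str.upper, batch_size=2, pause=0))
# ['A', 'B', 'C']
```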
449
+
450
+ ---
451
+
452
+ ## Gotchas
453
+
454
+ - **Full npm registry doc is huge** — `registry.npmjs.org/react` is 6.3MB (2785 versions). When you only need the latest version metadata, fetch `registry.npmjs.org/react/latest` (~1.8KB) instead. Similarly for any specific version.
455
+
456
+ - **npm `versions` dict keys are ordered oldest-first** — The last key is NOT necessarily the latest release; it may be a canary/experimental build. Always use `dist-tags.latest` to identify the stable latest version.
457
+
458
+ - **PyPI `author` field is often `None`** — Many packages set `author_email` instead (often in `"Name" <email>` format). Fall back: `info['author'] or info['author_email']`.
459
+
460
+ - **PyPI `home_page` is frequently empty** — Check `info['project_urls']` for `Homepage`, `Source`, `Documentation` links instead.
461
+
462
+ - **PyPI `requires_dist` can be `None`** — Not an empty list — `None`. Always guard: `info.get('requires_dist') or []`.
463
+
464
+ - **PyPI XML-RPC API is dead** — `https://pypi.org/pypi` (XML-RPC) returns a fault for most methods including `package_releases`. Use JSON API only.
465
+
466
+ - **pypistats.org `total` field is `None`** — The `total` key in response JSON is null; compute sums from `data` list yourself.
467
+
468
+ - **pypistats.org data goes back ~6 months** — The `overall` endpoint returns daily rows for roughly the past 180 days, not full history.
469
+
470
+ - **PyPI yanked versions** — `data['releases'][ver][0]['yanked']` is `True` for yanked versions. `data['info']['yanked']` is only `True` if the latest version itself is yanked. Both `yanked` and `yanked_reason` fields exist on each file object.
471
+
472
+ - **npm scoped packages** — Both `registry.npmjs.org/@scope/name` (direct path) and `registry.npmjs.org/@scope%2Fname` (URL-encoded) work. Use the direct path form.
473
+
474
+ - **npm downloads bulk response is a dict** — When you request multiple packages, the response is `{pkg_name: {...}}`, not a list. Single-package response is a flat object with `downloads`, `start`, `end`, `package` directly.
475
+
476
+ - **`http_get` handles gzip transparently** — The helper already decompresses gzip responses. No manual decompression needed.
477
+
478
+ - **Never use a browser for either registry** — All data is JSON over HTTP. `http_get` calls take 80–480ms; a browser navigation would take 3–8 seconds with no benefit.
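The PyPI field fallbacks listed above can be collected into one place. This is a minimal sketch: `normalize_pypi_info` is a hypothetical helper, the sample dict reuses the `requests` values shown earlier, and the `Homepage`, `Source`, `Documentation` preference order is an arbitrary choice.

```python
def normalize_pypi_info(info):
    """Apply fallbacks for PyPI `info` fields that are often None or empty."""
    urls = info.get("project_urls") or {}
    homepage = (
        info.get("home_page")
        or urls.get("Homepage")
        or urls.get("Source")
        or urls.get("Documentation")
    )
    return {
        "name": info.get("name"),
        "author": info.get("author") or info.get("author_email"),
        "homepage": homepage,
        "requires_dist": info.get("requires_dist") or [],
    }

sample_info = {
    "name": "requests",
    "author": None,
    "author_email": '"Kenneth Reitz" <me@kennethreitz.org>',
    "home_page": None,
    "project_urls": {
        "Documentation": "https://requests.readthedocs.io",
        "Source": "https://github.com/psf/requests",
    },
    "requires_dist": None,
}

print(normalize_pypi_info(sample_info))
```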