@apitap/core 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (236)
  1. package/LICENSE +60 -0
  2. package/README.md +362 -0
  3. package/SKILL.md +270 -0
  4. package/dist/auth/crypto.d.ts +31 -0
  5. package/dist/auth/crypto.js +66 -0
  6. package/dist/auth/crypto.js.map +1 -0
  7. package/dist/auth/handoff.d.ts +29 -0
  8. package/dist/auth/handoff.js +180 -0
  9. package/dist/auth/handoff.js.map +1 -0
  10. package/dist/auth/manager.d.ts +46 -0
  11. package/dist/auth/manager.js +127 -0
  12. package/dist/auth/manager.js.map +1 -0
  13. package/dist/auth/oauth-refresh.d.ts +16 -0
  14. package/dist/auth/oauth-refresh.js +91 -0
  15. package/dist/auth/oauth-refresh.js.map +1 -0
  16. package/dist/auth/refresh.d.ts +43 -0
  17. package/dist/auth/refresh.js +217 -0
  18. package/dist/auth/refresh.js.map +1 -0
  19. package/dist/capture/anti-bot.d.ts +15 -0
  20. package/dist/capture/anti-bot.js +43 -0
  21. package/dist/capture/anti-bot.js.map +1 -0
  22. package/dist/capture/blocklist.d.ts +6 -0
  23. package/dist/capture/blocklist.js +70 -0
  24. package/dist/capture/blocklist.js.map +1 -0
  25. package/dist/capture/body-diff.d.ts +8 -0
  26. package/dist/capture/body-diff.js +102 -0
  27. package/dist/capture/body-diff.js.map +1 -0
  28. package/dist/capture/body-variables.d.ts +13 -0
  29. package/dist/capture/body-variables.js +142 -0
  30. package/dist/capture/body-variables.js.map +1 -0
  31. package/dist/capture/domain.d.ts +8 -0
  32. package/dist/capture/domain.js +34 -0
  33. package/dist/capture/domain.js.map +1 -0
  34. package/dist/capture/entropy.d.ts +33 -0
  35. package/dist/capture/entropy.js +100 -0
  36. package/dist/capture/entropy.js.map +1 -0
  37. package/dist/capture/filter.d.ts +11 -0
  38. package/dist/capture/filter.js +49 -0
  39. package/dist/capture/filter.js.map +1 -0
  40. package/dist/capture/graphql.d.ts +21 -0
  41. package/dist/capture/graphql.js +99 -0
  42. package/dist/capture/graphql.js.map +1 -0
  43. package/dist/capture/idle.d.ts +23 -0
  44. package/dist/capture/idle.js +44 -0
  45. package/dist/capture/idle.js.map +1 -0
  46. package/dist/capture/monitor.d.ts +26 -0
  47. package/dist/capture/monitor.js +183 -0
  48. package/dist/capture/monitor.js.map +1 -0
  49. package/dist/capture/oauth-detector.d.ts +18 -0
  50. package/dist/capture/oauth-detector.js +96 -0
  51. package/dist/capture/oauth-detector.js.map +1 -0
  52. package/dist/capture/pagination.d.ts +9 -0
  53. package/dist/capture/pagination.js +40 -0
  54. package/dist/capture/pagination.js.map +1 -0
  55. package/dist/capture/parameterize.d.ts +17 -0
  56. package/dist/capture/parameterize.js +63 -0
  57. package/dist/capture/parameterize.js.map +1 -0
  58. package/dist/capture/scrubber.d.ts +5 -0
  59. package/dist/capture/scrubber.js +38 -0
  60. package/dist/capture/scrubber.js.map +1 -0
  61. package/dist/capture/session.d.ts +46 -0
  62. package/dist/capture/session.js +445 -0
  63. package/dist/capture/session.js.map +1 -0
  64. package/dist/capture/token-detector.d.ts +16 -0
  65. package/dist/capture/token-detector.js +62 -0
  66. package/dist/capture/token-detector.js.map +1 -0
  67. package/dist/capture/verifier.d.ts +17 -0
  68. package/dist/capture/verifier.js +147 -0
  69. package/dist/capture/verifier.js.map +1 -0
  70. package/dist/cli.d.ts +2 -0
  71. package/dist/cli.js +930 -0
  72. package/dist/cli.js.map +1 -0
  73. package/dist/discovery/auth.d.ts +17 -0
  74. package/dist/discovery/auth.js +81 -0
  75. package/dist/discovery/auth.js.map +1 -0
  76. package/dist/discovery/fetch.d.ts +17 -0
  77. package/dist/discovery/fetch.js +59 -0
  78. package/dist/discovery/fetch.js.map +1 -0
  79. package/dist/discovery/frameworks.d.ts +11 -0
  80. package/dist/discovery/frameworks.js +249 -0
  81. package/dist/discovery/frameworks.js.map +1 -0
  82. package/dist/discovery/index.d.ts +21 -0
  83. package/dist/discovery/index.js +219 -0
  84. package/dist/discovery/index.js.map +1 -0
  85. package/dist/discovery/openapi.d.ts +13 -0
  86. package/dist/discovery/openapi.js +175 -0
  87. package/dist/discovery/openapi.js.map +1 -0
  88. package/dist/discovery/probes.d.ts +9 -0
  89. package/dist/discovery/probes.js +70 -0
  90. package/dist/discovery/probes.js.map +1 -0
  91. package/dist/index.d.ts +25 -0
  92. package/dist/index.js +25 -0
  93. package/dist/index.js.map +1 -0
  94. package/dist/inspect/report.d.ts +52 -0
  95. package/dist/inspect/report.js +191 -0
  96. package/dist/inspect/report.js.map +1 -0
  97. package/dist/mcp.d.ts +8 -0
  98. package/dist/mcp.js +526 -0
  99. package/dist/mcp.js.map +1 -0
  100. package/dist/orchestration/browse.d.ts +38 -0
  101. package/dist/orchestration/browse.js +198 -0
  102. package/dist/orchestration/browse.js.map +1 -0
  103. package/dist/orchestration/cache.d.ts +15 -0
  104. package/dist/orchestration/cache.js +24 -0
  105. package/dist/orchestration/cache.js.map +1 -0
  106. package/dist/plugin.d.ts +17 -0
  107. package/dist/plugin.js +158 -0
  108. package/dist/plugin.js.map +1 -0
  109. package/dist/read/decoders/deepwiki.d.ts +2 -0
  110. package/dist/read/decoders/deepwiki.js +148 -0
  111. package/dist/read/decoders/deepwiki.js.map +1 -0
  112. package/dist/read/decoders/grokipedia.d.ts +2 -0
  113. package/dist/read/decoders/grokipedia.js +210 -0
  114. package/dist/read/decoders/grokipedia.js.map +1 -0
  115. package/dist/read/decoders/hackernews.d.ts +2 -0
  116. package/dist/read/decoders/hackernews.js +168 -0
  117. package/dist/read/decoders/hackernews.js.map +1 -0
  118. package/dist/read/decoders/index.d.ts +2 -0
  119. package/dist/read/decoders/index.js +12 -0
  120. package/dist/read/decoders/index.js.map +1 -0
  121. package/dist/read/decoders/reddit.d.ts +2 -0
  122. package/dist/read/decoders/reddit.js +142 -0
  123. package/dist/read/decoders/reddit.js.map +1 -0
  124. package/dist/read/decoders/twitter.d.ts +12 -0
  125. package/dist/read/decoders/twitter.js +187 -0
  126. package/dist/read/decoders/twitter.js.map +1 -0
  127. package/dist/read/decoders/wikipedia.d.ts +2 -0
  128. package/dist/read/decoders/wikipedia.js +66 -0
  129. package/dist/read/decoders/wikipedia.js.map +1 -0
  130. package/dist/read/decoders/youtube.d.ts +2 -0
  131. package/dist/read/decoders/youtube.js +69 -0
  132. package/dist/read/decoders/youtube.js.map +1 -0
  133. package/dist/read/extract.d.ts +25 -0
  134. package/dist/read/extract.js +320 -0
  135. package/dist/read/extract.js.map +1 -0
  136. package/dist/read/index.d.ts +14 -0
  137. package/dist/read/index.js +66 -0
  138. package/dist/read/index.js.map +1 -0
  139. package/dist/read/peek.d.ts +9 -0
  140. package/dist/read/peek.js +137 -0
  141. package/dist/read/peek.js.map +1 -0
  142. package/dist/read/types.d.ts +44 -0
  143. package/dist/read/types.js +3 -0
  144. package/dist/read/types.js.map +1 -0
  145. package/dist/replay/engine.d.ts +53 -0
  146. package/dist/replay/engine.js +441 -0
  147. package/dist/replay/engine.js.map +1 -0
  148. package/dist/replay/truncate.d.ts +16 -0
  149. package/dist/replay/truncate.js +92 -0
  150. package/dist/replay/truncate.js.map +1 -0
  151. package/dist/serve.d.ts +31 -0
  152. package/dist/serve.js +149 -0
  153. package/dist/serve.js.map +1 -0
  154. package/dist/skill/generator.d.ts +44 -0
  155. package/dist/skill/generator.js +419 -0
  156. package/dist/skill/generator.js.map +1 -0
  157. package/dist/skill/importer.d.ts +26 -0
  158. package/dist/skill/importer.js +80 -0
  159. package/dist/skill/importer.js.map +1 -0
  160. package/dist/skill/search.d.ts +19 -0
  161. package/dist/skill/search.js +51 -0
  162. package/dist/skill/search.js.map +1 -0
  163. package/dist/skill/signing.d.ts +16 -0
  164. package/dist/skill/signing.js +34 -0
  165. package/dist/skill/signing.js.map +1 -0
  166. package/dist/skill/ssrf.d.ts +27 -0
  167. package/dist/skill/ssrf.js +210 -0
  168. package/dist/skill/ssrf.js.map +1 -0
  169. package/dist/skill/store.d.ts +7 -0
  170. package/dist/skill/store.js +93 -0
  171. package/dist/skill/store.js.map +1 -0
  172. package/dist/stats/report.d.ts +26 -0
  173. package/dist/stats/report.js +157 -0
  174. package/dist/stats/report.js.map +1 -0
  175. package/dist/types.d.ts +214 -0
  176. package/dist/types.js +3 -0
  177. package/dist/types.js.map +1 -0
  178. package/package.json +58 -0
  179. package/src/auth/crypto.ts +92 -0
  180. package/src/auth/handoff.ts +229 -0
  181. package/src/auth/manager.ts +140 -0
  182. package/src/auth/oauth-refresh.ts +120 -0
  183. package/src/auth/refresh.ts +300 -0
  184. package/src/capture/anti-bot.ts +63 -0
  185. package/src/capture/blocklist.ts +75 -0
  186. package/src/capture/body-diff.ts +109 -0
  187. package/src/capture/body-variables.ts +156 -0
  188. package/src/capture/domain.ts +34 -0
  189. package/src/capture/entropy.ts +121 -0
  190. package/src/capture/filter.ts +56 -0
  191. package/src/capture/graphql.ts +124 -0
  192. package/src/capture/idle.ts +45 -0
  193. package/src/capture/monitor.ts +224 -0
  194. package/src/capture/oauth-detector.ts +106 -0
  195. package/src/capture/pagination.ts +49 -0
  196. package/src/capture/parameterize.ts +68 -0
  197. package/src/capture/scrubber.ts +49 -0
  198. package/src/capture/session.ts +502 -0
  199. package/src/capture/token-detector.ts +76 -0
  200. package/src/capture/verifier.ts +171 -0
  201. package/src/cli.ts +1031 -0
  202. package/src/discovery/auth.ts +99 -0
  203. package/src/discovery/fetch.ts +85 -0
  204. package/src/discovery/frameworks.ts +231 -0
  205. package/src/discovery/index.ts +256 -0
  206. package/src/discovery/openapi.ts +230 -0
  207. package/src/discovery/probes.ts +76 -0
  208. package/src/index.ts +26 -0
  209. package/src/inspect/report.ts +247 -0
  210. package/src/mcp.ts +618 -0
  211. package/src/orchestration/browse.ts +250 -0
  212. package/src/orchestration/cache.ts +37 -0
  213. package/src/plugin.ts +188 -0
  214. package/src/read/decoders/deepwiki.ts +180 -0
  215. package/src/read/decoders/grokipedia.ts +246 -0
  216. package/src/read/decoders/hackernews.ts +198 -0
  217. package/src/read/decoders/index.ts +15 -0
  218. package/src/read/decoders/reddit.ts +158 -0
  219. package/src/read/decoders/twitter.ts +211 -0
  220. package/src/read/decoders/wikipedia.ts +75 -0
  221. package/src/read/decoders/youtube.ts +75 -0
  222. package/src/read/extract.ts +396 -0
  223. package/src/read/index.ts +78 -0
  224. package/src/read/peek.ts +175 -0
  225. package/src/read/types.ts +37 -0
  226. package/src/replay/engine.ts +559 -0
  227. package/src/replay/truncate.ts +116 -0
  228. package/src/serve.ts +189 -0
  229. package/src/skill/generator.ts +473 -0
  230. package/src/skill/importer.ts +107 -0
  231. package/src/skill/search.ts +76 -0
  232. package/src/skill/signing.ts +36 -0
  233. package/src/skill/ssrf.ts +238 -0
  234. package/src/skill/store.ts +107 -0
  235. package/src/stats/report.ts +208 -0
  236. package/src/types.ts +233 -0
package/LICENSE ADDED
@@ -0,0 +1,60 @@
Business Source License 1.1

Parameters
Licensor: ApiTap Contributors
Licensed Work: ApiTap
Change Date: February 7, 2029
Change License: Apache License 2.0

Notice

Business Source License 1.1

This Business Source License (this "License") is not an Open Source license.
However, the Licensed Work will eventually be made available under an Open Source License, as stated in this License.

License text Copyright (c) 2023 Hashicorp, Inc. "Business Source License" is a trademark of Hashicorp, Inc.

---

Terms

The Licensor hereby grants you the right to copy, modify, create derivative works, and distribute the Licensed Work. However, if you receive the Licensed Work from Licensor and you do not have a license agreement with Licensor for the Licensed Work, then your rights under this License will end on the earlier of: (i) the date such proprietary rights notice is first received by you, or (ii) the Change Date.

Violation of Licensor's intellectual property rights (including patent, trademark, and/or trade secret) is prohibited.

You are granted a personal, non-exclusive, non-transferable license to use the Licensed Work in a non-competing manner.

Restrictions:
1. **No Competing Commercial Use:** You may not offer the Licensed Work as a commercial service that competes with a hosting or software-as-a-service offering by Licensor or any of its affiliates.

Non-competing uses include:
- Self-hosted deployment for your own use
- Internal company deployment
- Open source forks and contributions
- Academic or non-profit use
- Educational use
- Research use

Competing uses include (prohibited until Change Date):
- Offering hosted/cloud ApiTap as a service
- Rebranding ApiTap and selling it
- Providing ApiTap services to third parties for commercial gain

2. **Patent Rights:** Licensor grants you a license to any patent rights controlled by Licensor that are necessarily infringed by the Licensed Work as provided in source code form.

3. **Open Source Exceptions:** The restrictions in Section 1 do not apply to any fork or modification that is properly made available under an Open Source License as defined by the Open Source Initiative (www.opensource.org), provided that you make the source code of any such fork publicly available.

4. **No Other Rights:** Except as expressly stated herein, Licensor retains all right, title, and interest in the Licensed Work.

---

Change Date

On the Change Date (February 7, 2029), or if earlier, upon the occurrence of an event specified by Licensor, the Licensed Work automatically converts to the Change License, which is the Apache License 2.0. At that time, the restrictions in Section 1 will no longer apply, and the Licensed Work will be available under Apache License 2.0.

---

Disclaimer of Warranties

The Licensed Work is provided "AS-IS" without warranty of any kind, express, implied, or statutory, including but not limited to warranties of merchantability, fitness for a particular purpose, and non-infringement. In no event shall Licensor be liable for any indirect, incidental, special, exemplary, or consequential damages.
package/README.md ADDED
@@ -0,0 +1,362 @@
# ApiTap

[![npm version](https://badge.fury.io/js/apitap.svg)](https://www.npmjs.com/package/apitap)
[![tests](https://img.shields.io/badge/tests-721%20passing-brightgreen)](https://github.com/n1byn1kt/apitap)
[![license](https://img.shields.io/badge/license-BSL--1.1-blue)](./LICENSE)

**The MCP server that turns any website into an API — no docs, no SDK, no browser.**

ApiTap is an MCP server that lets AI agents browse the web through APIs instead of browsers. When an agent needs data from a website, ApiTap automatically detects the site's framework (WordPress, Next.js, Shopify, etc.), discovers its internal API endpoints, and calls them directly — returning clean JSON instead of forcing the agent to render and parse HTML. For sites that need authentication, it opens a browser window for a human to log in, captures the session tokens, and hands control back to the agent. Every site visited generates a reusable "skill file" that maps the site's APIs, so the first visit is a discovery step and every subsequent visit is a direct, instant API call. It works with any MCP-compatible LLM client and reduces token costs by 20-100x compared to browser automation.

The web was built for human eyes; ApiTap makes it native to machines.

```bash
# One tool call: discover the API + replay it
apitap browse https://techcrunch.com
✓ Discovery: WordPress detected (medium confidence)
✓ Replay: GET /wp-json/wp/v2/posts → 200 (10 articles)

# Or read content directly — no browser needed
apitap read https://en.wikipedia.org/wiki/Node.js
✓ Wikipedia decoder: ~127 tokens (vs ~4,900 raw HTML)

# Or step by step:
apitap capture https://polymarket.com # Watch API traffic
apitap show gamma-api.polymarket.com # See what was captured
apitap replay gamma-api.polymarket.com get-events # Call the API directly
```

No scraping. No browser. Just the API.

---

## How It Works

1. **Capture** — Launch a Playwright browser, visit a site, browse normally. ApiTap intercepts all network traffic via CDP.
2. **Filter** — Scoring engine separates signal from noise. Analytics, tracking pixels, and framework internals are filtered out. Only real API endpoints survive.
3. **Generate** — Captured endpoints are grouped by domain, URLs are parameterized (`/users/123` → `/users/:id`), and a JSON skill file is written to `~/.apitap/skills/`.
4. **Replay** — Read the skill file, substitute parameters, call the API with `fetch()`. Zero dependencies in the replay path.

```
Capture: Browser → Playwright listener → Filter → Skill Generator → skill.json
Replay:  Agent → Replay Engine (skill.json) → fetch() → API → JSON response
```
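
To make the **Generate** step concrete, here is a minimal sketch of ID-style parameterization. It is illustrative only — the real heuristics live in `src/capture/parameterize.ts` and `src/capture/entropy.ts` and are more involved; the regexes and placeholder names below are assumptions.

```ts
// Illustrative only — not ApiTap's actual generator code.
// Collapses ID-like path segments into named parameters.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function parameterizePath(path: string): string {
  return path
    .split("/")
    .map((seg) => {
      if (/^\d+$/.test(seg)) return ":id";               // purely numeric → :id
      if (UUID_RE.test(seg)) return ":uuid";              // UUID → :uuid
      if (/^[0-9a-f]{16,}$/i.test(seg)) return ":hash";   // long hex token → :hash
      return seg;                                         // keep static segments
    })
    .join("/");
}

console.log(parameterizePath("/users/123"));           // "/users/:id"
console.log(parameterizePath("/wp-json/wp/v2/posts")); // unchanged
```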

## Install

```bash
npm install -g apitap
```

Requires Node.js 20+. Playwright browsers are installed automatically on first capture.

## Quick Start

### Capture API traffic

```bash
# Capture from a single domain (default)
apitap capture https://polymarket.com

# Capture all domains (CDN, API subdomains, etc.)
apitap capture https://polymarket.com --all-domains

# Include response previews in the skill file
apitap capture https://polymarket.com --preview

# Stop after 30 seconds
apitap capture https://polymarket.com --duration 30
```

ApiTap opens a browser window. Browse the site normally — click around, scroll, search. Every API call is captured. Press Ctrl+C when done.

### List and explore captured APIs

```bash
# List all skill files
apitap list
✓ gamma-api.polymarket.com 3 endpoints 2m ago
✓ www.reddit.com 2 endpoints 1h ago

# Show endpoints for a domain
apitap show gamma-api.polymarket.com
[green] ✓ GET /events object (3 fields)
[green] ✓ GET /teams array (12 fields)

# Search across all skill files
apitap search polymarket
```

### Replay an endpoint

```bash
# Replay with captured defaults
apitap replay gamma-api.polymarket.com get-events

# Override parameters
apitap replay gamma-api.polymarket.com get-events limit=5 offset=10

# Machine-readable JSON output
apitap replay gamma-api.polymarket.com get-events --json
```
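
Under the hood, a replay is close to a plain `fetch()` against the stored skill file. The sketch below assumes the skill-file shape shown in the Skill Files section; it omits the auth injection, SSRF checks, and truncation that ApiTap's replay engine performs.

```ts
import { readFile } from "node:fs/promises";
import { homedir } from "node:os";
import { join } from "node:path";

// Minimal sketch of what a replay amounts to — not the replay engine itself.
async function replay(domain: string, endpointId: string, params: Record<string, string> = {}) {
  const raw = await readFile(join(homedir(), ".apitap", "skills", `${domain}.json`), "utf8");
  const skill = JSON.parse(raw);
  const endpoint = skill.endpoints.find((e: { id: string }) => e.id === endpointId);
  if (!endpoint) throw new Error(`no endpoint ${endpointId} for ${domain}`);

  // Build the URL from the captured base URL, path, and caller-supplied params.
  const url = new URL(endpoint.path, skill.baseUrl);
  for (const [key, value] of Object.entries(params)) url.searchParams.set(key, value);

  const res = await fetch(url, { method: endpoint.method, headers: endpoint.headers });
  return { status: res.status, data: await res.json() };
}

// e.g. replay("gamma-api.polymarket.com", "get-events", { limit: "5" })
```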

## Text-Mode Browsing

ApiTap includes a text-mode browsing pipeline — `peek` and `read` — that lets agents consume web content without launching a browser. Seven built-in decoders extract structured content from popular sites at a fraction of the token cost:

| Site | Decoder | Typical Tokens | vs Raw HTML |
|------|---------|----------------|-------------|
| Reddit | `reddit` | ~500 | 95% smaller |
| YouTube | `youtube` | ~36 | 99% smaller |
| Wikipedia | `wikipedia` | ~127 | 97% smaller |
| Hacker News | `hackernews` | ~200 | 90% smaller |
| Grokipedia | `grokipedia` | ~150 | 90% smaller |
| Twitter/X | `twitter` | ~80 | 95% smaller |
| Any other site | `generic` | varies | ~74% avg |

**Average token savings: 74% across 83 tested domains.**

```bash
# Triage first — zero-cost HEAD request
apitap peek https://reddit.com/r/programming
✓ accessible, recommendation: read

# Extract content — no browser needed
apitap read https://reddit.com/r/programming
✓ Reddit decoder: 12 posts, ~500 tokens

# Works for any URL — falls back to generic HTML extraction
apitap read https://example.com/blog/post
```

For MCP agents, `apitap_peek` and `apitap_read` are the fastest way to consume web content — use them before reaching for `apitap_browse` or `apitap_capture`.
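
For sites without a dedicated decoder, the generic path leans on Open Graph metadata plus HTML extraction. A bare-bones sketch of the og: part (not the shipped extractor in `src/read/extract.ts`, which does far more):

```ts
// Sketch of a minimal Open Graph fallback — illustrative, not the shipped extractor.
// Only handles <meta property="og:..." content="..."> in that attribute order.
async function readOpenGraph(url: string) {
  const html = await (await fetch(url, { headers: { accept: "text/html" } })).text();
  const meta = (property: string) =>
    html.match(
      new RegExp(`<meta[^>]+property=["']og:${property}["'][^>]+content=["']([^"']*)["']`, "i")
    )?.[1];
  return { title: meta("title"), description: meta("description"), image: meta("image") };
}
```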

## Tested Sites

ApiTap has been tested against real-world sites:

| Site | Endpoints | Tier | Replay |
|------|-----------|------|--------|
| Polymarket | 3 | Green | 200 |
| Reddit | 2 | Green | 200 |
| Discord | 4 | Green | 200 |
| GitHub | 1 | Green | 200 |
| HN (Algolia) | 1 | Yellow | 200 |
| dev.to | 2 | Green | 200 |
| CoinGecko | 6 | Green | 200 |

78% overall replay success rate across 9 tested sites (green tier: 100%).

## Why ApiTap?

**Why not just use the public API?** Most sites don't have one, or it's heavily rate-limited. The internal API that powers the SPA is often richer, faster, and already handles auth.

**Why not just use Playwright/Puppeteer?** Browser automation costs 50-200K tokens per page for an AI agent. ApiTap captures the API once, then your agent calls it directly at 1-5K tokens. No DOM, no selectors, no flaky waits.

**Why not reverse-engineer the API manually?** You could open DevTools and copy headers by hand. ApiTap does it in 30 seconds and gives you a portable file any agent can use.

**Isn't this just a MITM proxy?** No. ApiTap is read-only — it uses Chrome DevTools Protocol to observe responses. No certificate setup, no request modification, no code injection.

## Replayability Tiers

Every captured endpoint is classified by replay difficulty:

| Tier | Meaning | Replay |
|------|---------|--------|
| **Green** | Public, permissive CORS, no signing | Works with `fetch()` |
| **Yellow** | Needs auth, no signing/anti-bot | Works with stored credentials |
| **Orange** | CSRF tokens, session binding | Fragile — may need browser refresh |
| **Red** | Request signing, anti-bot (Cloudflare) | Needs full browser |

GET endpoints are auto-verified during capture by comparing Playwright responses with raw `fetch()` responses.
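
Conceptually, that verification boils down to re-issuing the captured GET outside the browser and comparing the result. A rough sketch — the tier values returned here are illustrative, and the real verifier in `src/capture/verifier.ts` applies more detailed rules:

```ts
// Illustrative verification sketch: does a raw fetch reproduce what the browser saw?
async function verifyGet(url: string, capturedStatus: number, capturedBody: unknown) {
  const res = await fetch(url, { headers: { accept: "application/json" } });
  if (res.status !== capturedStatus) {
    return { tier: "yellow", reason: "status differs without browser context" };
  }
  const body = await res.json().catch(() => null);
  const sameTopLevelShape =
    body !== null &&
    typeof body === typeof capturedBody &&
    Array.isArray(body) === Array.isArray(capturedBody);
  return sameTopLevelShape ? { tier: "green" } : { tier: "orange", reason: "response shape changed" };
}
```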

## MCP Server

ApiTap includes an MCP server with 12 tools for Claude Desktop, Cursor, Windsurf, and other MCP-compatible clients.

```bash
# Start the MCP server
apitap-mcp
```

Add to your MCP config (e.g. `claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "apitap": {
      "command": "npx",
      "args": ["apitap-mcp"]
    }
  }
}
```

### MCP Tools

| Tool | Description |
|------|-------------|
| `apitap_browse` | High-level "just get me the data" (discover + replay in one call) |
| `apitap_peek` | Zero-cost URL triage (HEAD only) |
| `apitap_read` | Extract content without a browser (7 decoders) |
| `apitap_discover` | Detect a site's APIs without launching a browser |
| `apitap_search` | Search available skill files |
| `apitap_replay` | Replay a captured API endpoint |
| `apitap_replay_batch` | Replay multiple endpoints in parallel across domains |
| `apitap_capture` | Capture API traffic via instrumented browser |
| `apitap_capture_start` | Start an interactive capture session |
| `apitap_capture_interact` | Interact with a live capture session (click, type, scroll) |
| `apitap_capture_finish` | Finish or abort a capture session |
| `apitap_auth_request` | Request human authentication for a site |

You can also serve a single skill file as a dedicated MCP server with `apitap serve <domain>` — each endpoint becomes its own tool.

## Auth Management

ApiTap automatically detects and stores auth credentials (Bearer tokens, API keys, cookies) during capture. Credentials are encrypted at rest with AES-256-GCM.

```bash
# View auth status
apitap auth api.example.com

# List all domains with stored auth
apitap auth --list

# Refresh expired tokens via browser
apitap refresh api.example.com

# Force fresh token before replay
apitap replay api.example.com get-data --fresh

# Clear stored auth
apitap auth api.example.com --clear
```
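
For reference, the AES-256-GCM + PBKDF2 scheme described above looks roughly like this with Node's built-in `node:crypto`. Parameter choices (salt/IV sizes, iteration count) and how the machine-bound secret is obtained are assumptions, not ApiTap's actual values:

```ts
import { randomBytes, pbkdf2Sync, createCipheriv, createDecipheriv } from "node:crypto";

// Sketch of at-rest credential encryption; see src/auth/crypto.ts for the real code.
function encryptCredential(plaintext: string, machineSecret: string) {
  const salt = randomBytes(16);
  const key = pbkdf2Sync(machineSecret, salt, 600_000, 32, "sha256"); // derive a 256-bit key
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { salt, iv, ciphertext, tag: cipher.getAuthTag() };
}

function decryptCredential(box: ReturnType<typeof encryptCredential>, machineSecret: string) {
  const key = pbkdf2Sync(machineSecret, box.salt, 600_000, 32, "sha256");
  const decipher = createDecipheriv("aes-256-gcm", key, box.iv);
  decipher.setAuthTag(box.tag); // GCM tag check makes tampering detectable
  return Buffer.concat([decipher.update(box.ciphertext), decipher.final()]).toString("utf8");
}
```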

## Skill Files

Skill files are JSON documents stored at `~/.apitap/skills/<domain>.json`. They contain everything needed to replay an API — endpoints, headers, query params, request bodies, pagination patterns, and response shapes.

```json
{
  "version": "1.1",
  "domain": "gamma-api.polymarket.com",
  "baseUrl": "https://gamma-api.polymarket.com",
  "endpoints": [
    {
      "id": "get-events",
      "method": "GET",
      "path": "/events",
      "queryParams": { "limit": { "type": "string", "example": "10" } },
      "headers": {},
      "responseShape": { "type": "object", "fields": ["id", "title", "slug"] }
    }
  ]
}
```
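
In TypeScript terms, the example above corresponds roughly to the following interface. Treat it as a sketch inferred from the example — the authoritative definitions are in `src/types.ts`:

```ts
// Inferred from the JSON example above; fields not shown there are not implied.
interface SkillFile {
  version: string;
  domain: string;
  baseUrl: string;
  endpoints: Array<{
    id: string;
    method: "GET" | "POST" | "PUT" | "PATCH" | "DELETE";
    path: string;
    queryParams?: Record<string, { type: string; example?: string }>;
    headers?: Record<string, string>;
    responseShape?: { type: string; fields?: string[] };
  }>;
}
```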

Skill files are portable and shareable. Auth credentials are stored separately in encrypted storage — never in the skill file itself.

### Import / Export

```bash
# Import a skill file from someone else
apitap import ./reddit-skills.json

# Import validates: signature check → SSRF scan → confirmation
```

Imported files are re-signed with your local key and marked with `imported` provenance.

## Security

ApiTap handles untrusted skill files from the internet and replays HTTP requests on your behalf. That's a high-trust position, and we treat it seriously.

### Defense in Depth

- **Auth encryption** — AES-256-GCM with PBKDF2 key derivation, keyed to your machine
- **PII scrubbing** — Emails, phones, IPs, credit cards, SSNs detected and redacted during capture
- **SSRF protection** — Multi-layer URL validation blocks access to internal networks (see below)
- **Header injection protection** — Allowlist prevents skill files from injecting dangerous HTTP headers (`Host`, `X-Forwarded-For`, `Cookie`, `Authorization`)
- **Redirect validation** — Manual redirect handling with SSRF re-check prevents redirect-to-internal-IP attacks
- **DNS rebinding prevention** — Resolved IPs are pinned to prevent TOCTOU attacks where DNS returns different IPs on second lookup
- **Skill signing** — HMAC-SHA256 signatures detect tampering; three-state provenance tracking (self/imported/unsigned) — see the sketch after this list
- **No phone-home** — Everything runs locally. No external services, no telemetry
- **Read-only capture** — Playwright intercepts responses only. No request modification or code injection
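
As a concrete illustration of the skill-signing bullet, HMAC-SHA256 signing and verification with `node:crypto` looks roughly like this. The canonicalization of the skill JSON and where the local key lives are assumptions — see `src/skill/signing.ts` for the real implementation:

```ts
import { createHmac, timingSafeEqual } from "node:crypto";

// Illustrative signing sketch, not ApiTap's exact scheme.
function signSkill(skillJson: string, localKey: Buffer): string {
  return createHmac("sha256", localKey).update(skillJson).digest("hex");
}

function verifySkill(skillJson: string, signature: string, localKey: Buffer): boolean {
  const expected = Buffer.from(signSkill(skillJson, localKey), "hex");
  const given = Buffer.from(signature, "hex");
  // Constant-time compare; length mismatch already means the signature is invalid.
  return expected.length === given.length && timingSafeEqual(expected, given);
}
```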

### Why SSRF Protection Matters

Since skill files can come from anywhere — shared by colleagues, downloaded from GitHub, or imported from untrusted sources — a malicious skill file is the primary threat vector. Here's what ApiTap defends against:

**The attack:** An attacker crafts a skill file with `baseUrl: "http://169.254.169.254"` (the AWS/cloud metadata endpoint) or `baseUrl: "http://localhost:8080"` (your internal services). When you replay an endpoint, your machine makes the request, potentially leaking cloud credentials or hitting internal APIs.

**The defense:** ApiTap validates every URL at multiple points:

```
Skill file imported
  → validateUrl(): block private IPs, internal hostnames, non-HTTP schemes
  → validateSkillFileUrls(): scan baseUrl + all endpoint example URLs

Endpoint replayed
  → resolveAndValidateUrl(): DNS lookup + verify resolved IP isn't private
  → IP pinning: fetch uses resolved IP directly (prevents DNS rebinding)
  → Header filtering: strip dangerous headers from skill file
  → Redirect check: if server redirects, validate new target before following
```

**Blocked ranges:** `127.0.0.0/8`, `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`, `169.254.0.0/16` (cloud metadata), `0.0.0.0`, IPv6 equivalents (`::1`, `fe80::/10`, `fc00::/7`, `::ffff:` mapped addresses), `localhost`, `.local`, `.internal`, `file://`, `javascript:` schemes.
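
A simplified sketch of the address check behind these rules — deliberately partial (it covers the IPv4 ranges above and a few IPv6 cases), whereas the shipped validator in `src/skill/ssrf.ts` handles many more edge cases:

```ts
import { isIP } from "node:net";

// Illustrative private/blocked-address check, run against resolved IPs.
function isBlockedAddress(ip: string): boolean {
  if (isIP(ip) === 4) {
    const [a, b] = ip.split(".").map(Number);
    return (
      a === 0 ||                           // 0.0.0.0/8
      a === 10 ||                          // 10.0.0.0/8
      a === 127 ||                         // loopback
      (a === 169 && b === 254) ||          // link-local / cloud metadata
      (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
      (a === 192 && b === 168)             // 192.168.0.0/16
    );
  }
  const v6 = ip.toLowerCase();
  if (v6 === "::1" || v6 === "::") return true;                       // loopback / unspecified
  if (v6.startsWith("::ffff:")) return isBlockedAddress(v6.slice(7)); // IPv4-mapped (dotted form)
  return v6.startsWith("fe80:") || v6.startsWith("fc") || v6.startsWith("fd"); // fe80::/10, fc00::/7
}
```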

This is especially relevant now that [MCP servers are being used as attack vectors in the wild](https://cloud.google.com/blog/topics/threat-intelligence/distillation-experimentation-integration-ai-adversarial-use) — Google's Threat Intelligence Group recently documented underground toolkits built on compromised MCP servers. ApiTap is designed to be safe even when processing untrusted inputs.

See [docs/security-audit-v1.md](./docs/security-audit-v1.md) for the full security audit (19 findings, current posture 9/10).

## CLI Reference

All commands support `--json` for machine-readable output.

| Command | Description |
|---------|-------------|
| `apitap browse <url>` | Discover + replay in one step |
| `apitap peek <url>` | Zero-cost URL triage (HEAD only) |
| `apitap read <url>` | Extract content without a browser |
| `apitap discover <url>` | Detect APIs without launching a browser |
| `apitap capture <url>` | Capture API traffic from a website |
| `apitap list` | List available skill files |
| `apitap show <domain>` | Show endpoints for a domain |
| `apitap search <query>` | Search skill files by domain or endpoint |
| `apitap replay <domain> <id> [key=val...]` | Replay an API endpoint |
| `apitap import <file>` | Import a skill file with safety validation |
| `apitap refresh <domain>` | Refresh auth tokens via browser |
| `apitap auth [domain]` | View or manage stored auth |
| `apitap serve <domain>` | Serve a skill file as an MCP server |
| `apitap inspect <url>` | Discover APIs without saving |
| `apitap stats` | Show token savings report |
| `apitap --version` | Print version |

### Capture flags

| Flag | Description |
|------|-------------|
| `--all-domains` | Capture traffic from all domains (default: target domain only) |
| `--preview` | Include response data previews |
| `--duration <sec>` | Stop capture after N seconds |
| `--port <port>` | Connect to specific CDP port |
| `--launch` | Always launch a new browser |
| `--attach` | Only attach to existing browser |
| `--no-scrub` | Disable PII scrubbing |
| `--no-verify` | Skip auto-verification of GET endpoints |

## Development

```bash
git clone https://github.com/n1byn1kt/apitap.git
cd apitap
npm install
npm test # 721 tests, Node built-in test runner
npm run typecheck # Type checking
npm run build # Compile to dist/
npx tsx src/cli.ts capture <url> # Run from source
```

## License

[Business Source License 1.1](./LICENSE) — **free for all non-competing use** (personal, internal, educational, research, open source). Cannot be rebranded and sold as a competing service. Converts to Apache 2.0 on February 7, 2029.
package/SKILL.md ADDED
@@ -0,0 +1,270 @@
# ApiTap — The MCP Server That Turns Any Website Into an API

> No docs, no SDK, no browser. Just data.

## What It Does

ApiTap gives AI agents cheap access to web data through three layers:

1. **Read** — Decode any URL into structured text without a browser (side-channel APIs, og: tags, HTML extraction). 0-10K tokens vs 50-200K for browser automation.
2. **Replay** — Call captured API endpoints directly. 1-5K tokens per call.
3. **Capture** — Record API traffic from a headless browser session, generating reusable skill files.

## MCP Tools (12)

### Tier 0: Triage (free)

#### `apitap_peek`
Zero-cost URL triage. HTTP HEAD only — checks accessibility, bot protection, framework detection.
```
apitap_peek(url: string) → PeekResult
```
**Use when:** You want to know if a site is accessible before spending tokens. Check bot protection, detect frameworks.

**Returns:** `{ status, accessible, server, framework, botProtection, signals[], recommendation }`

`recommendation` is one of: `read` | `capture` | `auth_required` | `blocked`

**Example:**
```
apitap_peek("https://www.zillow.com") → { status: 200, recommendation: "read" }
apitap_peek("https://www.doordash.com") → { status: 403, botProtection: "cloudflare", recommendation: "blocked" }
```
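
For comparison, a peek-style triage can be approximated outside ApiTap with a single HEAD request. The header names and decision rules below are illustrative assumptions, not the exact signals `apitap_peek` inspects:

```ts
// Rough triage sketch: one HEAD request, then a recommendation from status + headers.
async function peek(url: string) {
  const res = await fetch(url, { method: "HEAD", redirect: "follow" });
  const server = res.headers.get("server") ?? undefined;
  const botProtection = res.headers.get("cf-mitigated") ? "cloudflare" : undefined;
  const recommendation =
    res.status === 401 || res.status === 403
      ? (botProtection ? "blocked" : "auth_required")
      : res.ok
        ? "read"
        : "capture";
  return { status: res.status, accessible: res.ok, server, botProtection, recommendation };
}
```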

### Tier 1: Read (0-10K tokens, no browser)

#### `apitap_read`
Extract content from any URL without a browser. Uses side-channel APIs for known sites and HTML extraction for everything else.
```
apitap_read(url: string, maxBytes?: number) → ReadResult
```
**Use when:** You need page content, article text, post data, or listing info. Always try this before capture.

**Returns:** `{ title, author, description, content (markdown), links[], images[], metadata: { source, type, publishedAt }, cost: { tokens } }`

**Site-specific decoders (free, structured):**
| Site | Side Channel | What You Get |
|------|-------------|-------------|
| Reddit | `.json` suffix | Posts, scores, comments, authors — full structured data |
| YouTube | oembed API | Title, author, channel, thumbnail |
| Wikipedia | REST API | Article summary, structured, with edit dates |
| Hacker News | Firebase API | Stories, scores, comments, real-time |
| Grokipedia | xAI public API | Full articles with citations, search, 6M+ articles |
| Twitter/X | fxtwitter API | Full tweets, articles, engagement, quotes, media |
| Everything else | og: tags + HTML extraction | Title, content as markdown, links, images |

**Examples:**
```
# Reddit — full subreddit listing, ~500 tokens
apitap_read("https://www.reddit.com/r/technology")

# Reddit post with comments
apitap_read("https://www.reddit.com/r/wallstreetbets/comments/abc123/some-post")

# YouTube — 36 tokens
apitap_read("https://www.youtube.com/watch?v=dQw4w9WgXcQ")

# Wikipedia — 116 tokens
apitap_read("https://en.wikipedia.org/wiki/Artificial_intelligence")

# Grokipedia — full article with citations, 6M+ articles
apitap_read("https://grokipedia.com/wiki/SpaceX")

# Grokipedia — search across 6M articles
apitap_read("https://grokipedia.com/search?q=artificial+intelligence")

# Grokipedia — site stats and recent activity
apitap_read("https://grokipedia.com/")

# Twitter/X — full tweet with engagement, articles, quotes
apitap_read("https://x.com/elonmusk/status/123456789")

# Twitter/X article (long-form post) — full text extracted
apitap_read("https://twitter.com/writer/status/987654321")

# Any article/blog/news — generic extraction
apitap_read("https://example.com/blog/some-article")

# Zillow listing (bypasses PerimeterX via og: tags)
apitap_read("https://www.zillow.com/homedetails/123-Main-St/12345_zpid/")
```
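
The Reddit decoder's side channel is simply the `.json` suffix on listing URLs. A standalone sketch of that channel (the trimmed fields are illustrative, not the decoder's exact output):

```ts
// Fetch a subreddit listing via Reddit's public .json side channel and trim it.
async function readSubreddit(sub: string) {
  const res = await fetch(`https://www.reddit.com/r/${sub}.json?limit=10`, {
    headers: { "user-agent": "apitap-example/0.1" },
  });
  const listing = await res.json();
  return listing.data.children.map((child: any) => ({
    title: child.data.title,
    score: child.data.score,
    comments: child.data.num_comments,
    url: `https://www.reddit.com${child.data.permalink}`,
  }));
}
```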

### Tier 2: Replay (1-5K tokens, needs skill file)

#### `apitap_search`
Find available skill files by domain or keyword.
```
apitap_search(query: string) → { found, results[] }
```
**Use when:** Looking for captured API endpoints. Search by domain name or topic.

#### `apitap_replay`
Call a captured API endpoint directly — no browser needed.
```
apitap_replay(domain: string, endpointId: string, endpointParams?: object, maxBytes?: number) → ReplayResult
```
**Use when:** A skill file exists for this domain. This is the cheapest way to get structured API data.

**Returns:** `{ status, data (JSON), domain, endpointId, tier, fromCache }`

**Example:**
```
# Get live stock quote (Robinhood, no auth needed)
apitap_replay("api.robinhood.com", "get-marketdata-quotes", { symbols: "TSLA,MSFT" })

# Get NBA scores (ESPN)
apitap_replay("site.api.espn.com", "get-apis-personalized-v2-scoreboard-header")

# Get crypto trending (CoinMarketCap)
apitap_replay("api.coinmarketcap.com", "get-data-api-v3-unified-trending-top-boost-listing")
```

#### `apitap_replay_batch`
Replay multiple endpoints in one call.
```
apitap_replay_batch(requests: Array<{ domain, endpointId, endpointParams? }>, maxBytes?: number)
```

### Tier 3: Capture (15-20K tokens, uses browser)

#### `apitap_capture`
Launch a headless browser to capture API traffic from a website.
```
apitap_capture(url: string, duration?: number) → { sessionId }
```
**Use when:** No skill file exists and `apitap_read` doesn't give you the data you need. This is expensive but creates a skill file for future free replays.

#### `apitap_capture_interact`
Send browser commands during an active capture session.
```
apitap_capture_interact(sessionId: string, action: string, ...) → result
```
Actions: `click`, `type`, `navigate`, `snapshot`, `scroll`, `wait`

#### `apitap_capture_finish`
End capture session, generate skill file, verify endpoints.
```
apitap_capture_finish(sessionId: string) → { skillFile, endpoints[] }
```

### Auto-Router

#### `apitap_browse`
Automatic escalation: cache → skill file → discover → **read** → capture_needed.
```
apitap_browse(url: string, query?: string, maxBytes?: number) → result
```
**Use when:** You don't know which tier to use. This tries the cheapest option first and escalates automatically.

### Inspection

#### `apitap_inspect`
Get details about a skill file's endpoints.
```
apitap_inspect(domain: string) → { endpoints[], metadata }
```

#### `apitap_stats`
Usage statistics across all skill files.
```
apitap_stats() → { domains, endpoints, tiers }
```

## Decision Tree

```
Need web data?

├─ Know the domain? → apitap_search → found? → apitap_replay (cheapest)

├─ Unknown URL → apitap_peek first (free)
│ ├─ recommendation: "blocked" → STOP, tell user
│ ├─ recommendation: "read" → apitap_read (no browser)
│ ├─ recommendation: "capture" → apitap_capture (browser)
│ └─ recommendation: "auth_required" → needs human login

├─ Just need article/post content → apitap_read directly

└─ Need structured API data → apitap_capture → creates skill file → future replays free
```

## Key Patterns

### Instagram profile data (login wall bypass)
Instagram blocks all normal scraping (Googlebot UA, oembed, noembed). But the **mobile API** works:
```bash
curl -s 'https://i.instagram.com/api/v1/users/web_profile_info/?params={"user_name":"TARGET_USERNAME"}' \
  -H 'User-Agent: Instagram 275.0.0.27.98 Android (33/13; 420dpi; 1080x2400; samsung; SM-G991B; o1s; exynos2100)' \
  -H 'X-IG-App-ID: 936619743392459'
```
**Returns:** Full profile JSON — bio, follower/following counts, post count, contact info (email, phone), category, highlights, recent posts with captions/engagement.

**When to use:** Need Instagram profile data, follower counts, contact info, or recent post summaries. Works without auth.

**Limitations:** Only public profiles. Rate-limited if abused. Does NOT return full post feeds — just recent edge.

### Morning news scan
```
# Scan multiple subreddits
for sub in ["technology", "wallstreetbets", "privacy"]:
    apitap_read(f"https://www.reddit.com/r/{sub}")
```

### Stock research
```
# Live quote via captured API
apitap_replay("api.robinhood.com", "get-marketdata-quotes", { symbols: "TSLA" })

# Company fundamentals
apitap_replay("api.robinhood.com", "get-fundamentals", { symbol: "TSLA" })
```

### Research any topic (dual knowledge base)
```
# 1. Read Wikipedia summary (established knowledge)
apitap_read("https://en.wikipedia.org/wiki/Topic")

# 2. Read Grokipedia article (AI-curated, with citations)
apitap_read("https://grokipedia.com/wiki/Topic")

# 3. Check Reddit discussion (community sentiment)
apitap_read("https://www.reddit.com/r/relevant_sub")

# 4. Read a linked article
apitap_read("https://news-site.com/article")
```

### Check before committing
```
# Peek first — is it worth reading?
result = apitap_peek("https://some-site.com")
if result.recommendation == "read":
    apitap_read("https://some-site.com")
elif result.recommendation == "blocked":
    # Don't waste tokens
    pass
```

## Token Economics

| Method | Cost per page | Notes |
|--------|-------------|-------|
| Browser automation | 50-200K tokens | Full DOM serialization |
| apitap_read | 0-10K tokens | No browser, side channels |
| apitap_replay | 1-5K tokens | Direct API call, needs skill file |
| apitap_peek | ~0 tokens | HEAD request only |

## CLI Usage

All MCP tools are also available as CLI commands:
```bash
apitap peek <url> [--json]
apitap read <url> [--json] [--max-bytes <n>]
apitap search <query> [--json]
apitap replay <domain> <endpointId> [--params '{}'] [--json]
apitap capture <url> [--duration <sec>] [--json]
apitap inspect <domain> [--json]
apitap stats [--json]
```

Every command supports `--json` for machine-readable output.