@crawlkit-sh/sdk 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md ADDED

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.0.0] - 2024-01-25

### Added

- Initial release of @crawlkit-sh/sdk
- Core scraping functionality (`scrape()`)
- AI-powered data extraction (`extract()`)
- Web search via DuckDuckGo (`search()`)
- Full-page screenshots (`screenshot()`)
- LinkedIn scraping (`linkedin.company()`, `linkedin.person()`)
- Instagram scraping (`instagram.profile()`, `instagram.content()`)
- Google Play Store data (`appstore.playstoreReviews()`, `appstore.playstoreDetail()`)
- Apple App Store data (`appstore.appstoreReviews()`)
- Comprehensive TypeScript types
- Custom error classes for better error handling
- ESM and CommonJS support
- Zero runtime dependencies
package/LICENSE ADDED

MIT License

Copyright (c) 2024 CrawlKit

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
package/README.md ADDED

# CrawlKit SDK

[![npm version](https://img.shields.io/npm/v/@crawlkit-sh/sdk.svg)](https://www.npmjs.com/package/@crawlkit-sh/sdk)
[![npm downloads](https://img.shields.io/npm/dm/@crawlkit-sh/sdk.svg)](https://www.npmjs.com/package/@crawlkit-sh/sdk)
[![TypeScript](https://img.shields.io/badge/TypeScript-Ready-blue.svg)](https://www.typescriptlang.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

> Official TypeScript/JavaScript SDK for [CrawlKit](https://crawlkit.sh) - the modern web scraping API

Turn any website into structured data with a single API call. CrawlKit handles proxies, JavaScript rendering, anti-bot detection, and data extraction so you can focus on building.

## Features

- **Web Scraping** - Convert any webpage to clean Markdown, HTML, or raw content
- **AI Data Extraction** - Extract structured data using JSON Schema and an LLM
- **Web Search** - Search the web via the DuckDuckGo API
- **Screenshots** - Capture full-page screenshots
- **LinkedIn Scraping** - Scrape company profiles and person profiles
- **Instagram Scraping** - Scrape profiles and posts/reels
- **App Store Data** - Fetch reviews and details from Google Play & the Apple App Store
- **Browser Automation** - Click, type, scroll, and execute JavaScript
- **TypeScript First** - Full type safety with comprehensive type definitions
- **Zero Dependencies** - Uses native fetch; works in Node.js 18+ and browsers

## Installation

```bash
npm install @crawlkit-sh/sdk
```

```bash
yarn add @crawlkit-sh/sdk
```

```bash
pnpm add @crawlkit-sh/sdk
```

## Quick Start

```typescript
import { CrawlKit } from '@crawlkit-sh/sdk';

const crawlkit = new CrawlKit({ apiKey: 'ck_your_api_key' });

// Scrape a webpage
const page = await crawlkit.scrape({ url: 'https://example.com' });
console.log(page.markdown);
console.log(page.metadata.title);
```

Get your API key at [crawlkit.sh](https://crawlkit.sh).
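
Hard-coding the key is fine for a quick test, but in real projects you will usually read it from the environment. A minimal sketch — the `resolveApiKey` helper and the `CRAWLKIT_API_KEY` variable name are conventions assumed here, not something the SDK reads automatically:

```typescript
// Hypothetical helper: resolve the API key from an explicit value or an
// environment variable. CRAWLKIT_API_KEY is an assumed name, not part of the SDK.
declare const process: { env: Record<string, string | undefined> };

function resolveApiKey(explicit?: string): string {
  const key = explicit ?? process.env.CRAWLKIT_API_KEY;
  if (!key) {
    throw new Error('Missing API key: pass apiKey or set CRAWLKIT_API_KEY');
  }
  return key;
}

// const crawlkit = new CrawlKit({ apiKey: resolveApiKey() });
```

This keeps the key out of source control while still allowing an explicit override in tests.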

## Examples

### Web Scraping

Scrape any webpage and get clean, structured content:

```typescript
const result = await crawlkit.scrape({
  url: 'https://example.com/blog/article',
  options: {
    onlyMainContent: true,  // Remove navigation, footers, etc.
    waitFor: '#content',    // Wait for element before scraping
  }
});

console.log(result.markdown);         // Clean markdown content
console.log(result.html);             // Cleaned HTML
console.log(result.metadata.title);   // Page title
console.log(result.metadata.author);  // Author if available
console.log(result.links.internal);   // Internal links found
console.log(result.links.external);   // External links found
```

### AI-Powered Data Extraction

Extract structured data from any page using JSON Schema:

```typescript
interface Product {
  name: string;
  price: number;
  currency: string;
  description: string;
  inStock: boolean;
  reviews: { rating: number; count: number };
}

const result = await crawlkit.extract<Product>({
  url: 'https://example.com/product/123',
  schema: {
    type: 'object',
    properties: {
      name: { type: 'string' },
      price: { type: 'number' },
      currency: { type: 'string' },
      description: { type: 'string' },
      inStock: { type: 'boolean' },
      reviews: {
        type: 'object',
        properties: {
          rating: { type: 'number' },
          count: { type: 'number' }
        }
      }
    }
  },
  options: {
    prompt: 'Extract product information from this e-commerce page'
  }
});

// TypeScript knows result.json is Product
console.log(`${result.json.name}: $${result.json.price}`);
console.log(`In stock: ${result.json.inStock}`);
```

### Browser Automation

Handle SPAs, dynamic content, and interactive pages:

```typescript
const result = await crawlkit.scrape({
  url: 'https://example.com/spa',
  options: {
    waitFor: '.content-loaded',
    actions: [
      { type: 'click', selector: '#accept-cookies' },
      { type: 'wait', milliseconds: 1000 },
      { type: 'click', selector: '#load-more' },
      { type: 'scroll', direction: 'down' },
      { type: 'type', selector: '#search', text: 'query' },
      { type: 'press', key: 'Enter' },
      { type: 'wait', milliseconds: 2000 },
    ]
  }
});
```

### Web Search

Search the web and get structured results:

```typescript
const result = await crawlkit.search({
  query: 'typescript best practices 2024',
  options: {
    maxResults: 20,
    timeRange: 'm',  // Past month: 'd', 'w', 'm', 'y'
    region: 'us-en'
  }
});

for (const item of result.results) {
  console.log(`${item.position}. ${item.title}`);
  console.log(`   ${item.url}`);
  console.log(`   ${item.snippet}\n`);
}
```

### Screenshots

Capture full-page screenshots:

```typescript
const result = await crawlkit.screenshot({
  url: 'https://example.com',
  options: {
    width: 1920,
    height: 1080,
    waitForSelector: '#main-content'
  }
});

console.log('Screenshot URL:', result.url);
```

### LinkedIn Scraping

Scrape LinkedIn company and person profiles:

```typescript
// Company profile
const company = await crawlkit.linkedin.company({
  url: 'https://www.linkedin.com/company/openai',
  options: { includeJobs: true }
});

console.log(company.company.name);
console.log(company.company.industry);
console.log(company.company.followers);
console.log(company.company.description);
console.log(company.company.employees);
console.log(company.company.jobs);

// Person profiles (batch up to 10)
const people = await crawlkit.linkedin.person({
  url: [
    'https://www.linkedin.com/in/user1',
    'https://www.linkedin.com/in/user2'
  ]
});

console.log(`Success: ${people.successCount}, Failed: ${people.failedCount}`);
people.persons.forEach(p => console.log(p.person));
```

### Instagram Scraping

Scrape Instagram profiles and content:

```typescript
// Profile
const profile = await crawlkit.instagram.profile({
  username: 'instagram'
});

console.log(profile.profile.full_name);
console.log(profile.profile.follower_count);
console.log(profile.profile.following_count);
console.log(profile.profile.biography);
console.log(profile.profile.posts);  // Recent posts

// Post/Reel content
const post = await crawlkit.instagram.content({
  shortcode: 'CxIIgCCq8mg'  // or full URL
});

console.log(post.post.like_count);
console.log(post.post.comment_count);
console.log(post.post.video_url);
console.log(post.post.caption);
```

### App Store Data

Fetch app reviews and details:

```typescript
// Google Play Store reviews with pagination
let cursor: string | null = null;
do {
  const reviews = await crawlkit.appstore.playstoreReviews({
    appId: 'com.example.app',
    cursor,
    options: { lang: 'en' }
  });

  reviews.reviews.forEach(r => {
    console.log(`${r.rating}/5: ${r.text}`);
    if (r.developerReply) {
      console.log(`  Reply: ${r.developerReply.text}`);
    }
  });

  cursor = reviews.pagination.nextCursor;
} while (cursor);

// Google Play Store app details
const details = await crawlkit.appstore.playstoreDetail({
  appId: 'com.example.app'
});

console.log(details.appName);
console.log(details.rating);
console.log(details.installs);
console.log(details.description);

// Apple App Store reviews
const iosReviews = await crawlkit.appstore.appstoreReviews({
  appId: '123456789'
});
```

## Error Handling

The SDK provides typed error classes for different scenarios:

```typescript
import {
  CrawlKit,
  CrawlKitError,
  AuthenticationError,
  InsufficientCreditsError,
  ValidationError,
  RateLimitError,
  TimeoutError,
  NotFoundError,
  NetworkError
} from '@crawlkit-sh/sdk';

try {
  const result = await crawlkit.scrape({ url: 'https://example.com' });
} catch (error) {
  if (error instanceof AuthenticationError) {
    console.log('Invalid API key');
  } else if (error instanceof InsufficientCreditsError) {
    console.log(`Not enough credits. Available: ${error.creditsRemaining}`);
  } else if (error instanceof RateLimitError) {
    console.log('Rate limit exceeded, please slow down');
  } else if (error instanceof TimeoutError) {
    console.log('Request timed out');
  } else if (error instanceof ValidationError) {
    console.log(`Invalid request: ${error.message}`);
  } else if (error instanceof NetworkError) {
    console.log(`Network error [${error.code}]: ${error.message}`);
  } else if (error instanceof CrawlKitError) {
    console.log(`API Error [${error.code}]: ${error.message}`);
    console.log(`Status: ${error.statusCode}`);
    if (error.creditsRefunded) {
      console.log(`Credits refunded: ${error.creditsRefunded}`);
    }
  }
}
```
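
Rate-limit and timeout errors are often transient, so a common pattern is to wrap calls in a retry with exponential backoff. A minimal sketch — `withRetry` is not part of the SDK, and the retry predicate is shown as a pluggable assumption (e.g. `err instanceof RateLimitError` using the classes above):

```typescript
// Sketch: retry an async operation with exponential backoff.
// `isRetryable` decides which errors are worth retrying.
async function withRetry<T>(
  fn: () => Promise<T>,
  opts: {
    retries?: number;
    baseDelayMs?: number;
    isRetryable?: (err: unknown) => boolean;
  } = {}
): Promise<T> {
  const { retries = 3, baseDelayMs = 500, isRetryable = () => true } = opts;
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === retries || !isRetryable(err)) throw err;
      // Backoff doubles each attempt: 500ms, 1s, 2s, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}

// Usage (predicate shown as an assumption):
// const page = await withRetry(
//   () => crawlkit.scrape({ url: 'https://example.com' }),
//   { isRetryable: (err) => err instanceof RateLimitError || err instanceof TimeoutError }
// );
```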

## Configuration

```typescript
const crawlkit = new CrawlKit({
  // Required: Your API key (get it at crawlkit.sh)
  apiKey: 'ck_your_api_key',

  // Optional: Custom base URL (default: https://api.crawlkit.sh)
  baseUrl: 'https://api.crawlkit.sh',

  // Optional: Default timeout in ms (default: 30000)
  timeout: 60000,

  // Optional: Custom fetch implementation
  fetch: customFetch
});
```
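
The `fetch` option is a hook for cross-cutting concerns such as logging, metrics, or proxying. As an illustration (this wrapper is not part of the SDK), a fetch that logs each request's method, URL, status, and latency:

```typescript
// Hypothetical logging wrapper around the global fetch (Node 18+ / browsers).
const loggingFetch = async (input: string | URL, init?: RequestInit): Promise<Response> => {
  const start = Date.now();
  const response = await fetch(input, init);
  // Log method, URL, status code, and elapsed time for each request.
  console.log(
    `[crawlkit] ${init?.method ?? 'GET'} ${input} -> ${response.status} (${Date.now() - start}ms)`
  );
  return response;
};

// const crawlkit = new CrawlKit({ apiKey: 'ck_your_api_key', fetch: loggingFetch });
```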

## Credit Costs

| Operation | Credits |
|-----------|---------|
| `scrape()` | 1 |
| `extract()` | 5 |
| `search()` | 1 per page (~10 results) |
| `screenshot()` | 1 |
| `linkedin.company()` | 1 |
| `linkedin.person()` | 3 per URL |
| `instagram.profile()` | 1 |
| `instagram.content()` | 1 |
| `appstore.playstoreReviews()` | 1 per page |
| `appstore.playstoreDetail()` | 1 |
| `appstore.appstoreReviews()` | 1 per page |

## TypeScript Support

This SDK is written in TypeScript and provides comprehensive type definitions for all methods and responses. Enable strict mode in your `tsconfig.json` for the best experience:

```json
{
  "compilerOptions": {
    "strict": true
  }
}
```

## Requirements

- Node.js 18.0.0 or higher (for native fetch support)
- Or any modern browser with fetch support

## Documentation

For detailed API documentation and guides, visit [docs.crawlkit.sh](https://docs.crawlkit.sh).

## Support

- [GitHub Issues](https://github.com/crawlkit/sdk/issues)
- [Documentation](https://docs.crawlkit.sh)
- Email: support@crawlkit.sh

## License

MIT License - see [LICENSE](LICENSE) for details.

---

Built with love by [CrawlKit](https://crawlkit.sh)