@sharpapi/sharpapi-node-web-scraping 1.0.1 → 1.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -2,12 +2,12 @@
 
 # Web Scraping API for Node.js
 
- ## 🌐 Extract web content and data with ease — powered by SharpAPI.
+ ## 🌐 Scrape web content with ease — powered by SharpAPI.
 
 [![npm version](https://img.shields.io/npm/v/@sharpapi/sharpapi-node-web-scraping.svg)](https://www.npmjs.com/package/@sharpapi/sharpapi-node-web-scraping)
 [![License](https://img.shields.io/npm/l/@sharpapi/sharpapi-node-web-scraping.svg)](https://github.com/sharpapi/sharpapi-node-client/blob/master/LICENSE.md)
 
- **SharpAPI Web Scraping** provides powerful web scraping capabilities including HTML extraction, text content extraction, structured data parsing, link extraction, and screenshot capture. Perfect for data aggregation, monitoring, and content extraction applications.
+ **SharpAPI Web Scraping** fetches and extracts content from web pages, providing structured data including page metadata, content, links, and more in a machine-readable JSON format. Perfect for data collection, content aggregation, SEO analysis, and research.
 
 ---
 
@@ -18,7 +18,12 @@
 3. [Usage](#usage)
 4. [API Documentation](#api-documentation)
 5. [Examples](#examples)
- 6. [License](#license)
+ 6. [Use Cases](#use-cases)
+ 7. [Response Format](#response-format)
+ 8. [AI Integration](#ai-integration)
+ 9. [API Endpoint](#api-endpoint)
+ 10. [Related Packages](#related-packages)
+ 11. [License](#license)
 
 ---
 
@@ -33,9 +38,9 @@
 
 ### Step 1. Install the package via npm:
 
- ```bash
+ \`\`\`bash
 npm install @sharpapi/sharpapi-node-web-scraping
- ```
+ \`\`\`
 
 ### Step 2. Get your API key
 
@@ -45,248 +50,357 @@ Visit [SharpAPI.com](https://sharpapi.com/) to get your API key.
 
 ## Usage
 
- ```javascript
+ \`\`\`javascript
 const { SharpApiWebScrapingService } = require('@sharpapi/sharpapi-node-web-scraping');
 
- const apiKey = process.env.SHARP_API_KEY;
+ const apiKey = process.env.SHARP_API_KEY; // Store your API key in environment variables
 const service = new SharpApiWebScrapingService(apiKey);
 
 async function scrapeWebsite() {
   try {
-     // Extract content from a URL
-     const content = await service.scrapeUrl('https://example.com');
-     console.log('Title:', content.title);
-     console.log('Content:', content.content);
-
-     // Extract text content
-     const text = await service.extractText('https://example.com');
-     console.log('Text:', text);
-
-     // Extract all links
-     const links = await service.extractLinks('https://example.com');
-     console.log(`Found ${links.length} links`);
+     // Scrape a webpage
+     const data = await service.scrapeUrl('https://sharpapi.com');
+
+     console.log('Title:', data.title);
+     console.log('Description:', data.meta_description);
+     console.log('Keywords:', data.meta_keywords);
+     console.log('Content length:', data.content.length);
+     console.log('Links found:', data.links.length);
   } catch (error) {
     console.error('Error:', error.message);
   }
 }
 
 scrapeWebsite();
- ```
+ \`\`\`
 
 ---
 
 ## API Documentation
 
- ### Methods
+ This endpoint is **synchronous** and returns data immediately (no polling required).
 
- #### `scrapeUrl(url: string): Promise<object>`
+ ### Method
 
- Scrape a webpage and extract its content (synchronous).
+ #### `scrapeUrl(url)`
 
- **Parameters:**
- - `url` (string, required): The URL to scrape
-
- **Returns:**
- - Scraped content including title, text, and metadata
-
- #### `scrapeHtml(url: string, options?: object): Promise<object>`
+ Scrape a webpage and extract its content in structured format.
 
- Extract raw HTML content from a webpage.
-
- **Options:**
- - `javascript` (boolean): Execute JavaScript (default: false)
- - `timeout` (number): Request timeout in ms (default: 30000)
- - `userAgent` (string): Custom user agent
- - `headers` (object): Custom HTTP headers
- - `proxy` (string): Proxy server URL
-
- #### `extractText(url: string, options?: object): Promise<object>`
+ **Parameters:**
+ - `url` (string, required): The URL to scrape (e.g., 'https://example.com' or 'example.com')
+
+ **Returns:** Promise<object> - Structured page data with metadata, content, and links
+
+ **Extracts:**
+ - **Page metadata**: Title, description, keywords, author
+ - **Open Graph tags**: OG:title, OG:description, OG:image, OG:type
+ - **Twitter Card tags**: Twitter:card, Twitter:title, Twitter:description
+ - **Content structure**: Headings (H1-H6), paragraphs, main content
+ - **Links**: Internal and external links with anchors
+ - **Meta information**: Language, charset, viewport, canonical URL
+ - **Timestamps**: Extraction date and time
+
+ **Example:**
+ \`\`\`javascript
+ const data = await service.scrapeUrl('https://example.com');
+ console.log('Page Title:', data.title);
+ console.log('Main Content:', data.content);
+ console.log('All Links:', data.links);
+ \`\`\`
 
- Extract clean text content from a webpage.
+ ---
 
- #### `extractLinks(url: string, options?: object): Promise<object>`
+ ## Examples
 
- Extract all links from a webpage.
+ ### Basic Web Scraping
 
- #### `extractStructuredData(url: string, options?: object): Promise<object>`
+ \`\`\`javascript
+ const { SharpApiWebScrapingService } = require('@sharpapi/sharpapi-node-web-scraping');
 
- Extract structured data (JSON-LD, microdata, etc.) from a webpage.
+ const service = new SharpApiWebScrapingService(process.env.SHARP_API_KEY);
 
- #### `takeScreenshot(url: string, options?: object): Promise<object>`
+ async function scrapeExample() {
+   const result = await service.scrapeUrl('https://sharpapi.com');
 
- Capture a screenshot of a webpage.
+   console.log('=== Page Information ===');
+   console.log('Title:', result.title);
+   console.log('Description:', result.meta_description);
+   console.log('Language:', result.language);
+   console.log('Canonical URL:', result.canonical_url);
 
- **Options:**
- - `fullPage` (boolean): Capture full page (default: false)
- - `width` (number): Viewport width (default: 1280)
- - `height` (number): Viewport height (default: 800)
- - `javascript` (boolean): Execute JavaScript (default: true)
+   console.log('\\n=== Content ===');
+   console.log('Characters:', result.content.length);
+   console.log('Preview:', result.content.substring(0, 200) + '...');
 
- ---
+   console.log('\\n=== Links Found ===');
+   console.log('Total links:', result.links.length);
+   result.links.slice(0, 5).forEach(link => {
+     console.log(\`- \${link.text || 'No text'}: \${link.url}\`);
+   });
+ }
 
- ## Examples
+ scrapeExample();
+ \`\`\`
 
- ### Price Monitoring
+ ### Extract Social Media Metadata
 
- ```javascript
+ \`\`\`javascript
 const service = new SharpApiWebScrapingService(process.env.SHARP_API_KEY);
 
- async function monitorPrice(productUrl) {
-   const content = await service.scrapeUrl(productUrl);
+ async function getSocialMetadata(url) {
+   const data = await service.scrapeUrl(url);
 
-   // Extract price from content
-   const priceMatch = content.text.match(/\$(\d+\.\d{2})/);
-   const price = priceMatch ? parseFloat(priceMatch[1]) : null;
+   console.log('=== Open Graph Tags ===');
+   console.log('OG:Title:', data.og_title);
+   console.log('OG:Description:', data.og_description);
+   console.log('OG:Image:', data.og_image);
+   console.log('OG:Type:', data.og_type);
 
-   return {
-     url: productUrl,
-     price: price,
-     title: content.title,
-     timestamp: new Date().toISOString()
-   };
+   console.log('\\n=== Twitter Card ===');
+   console.log('Card Type:', data.twitter_card);
+   console.log('Title:', data.twitter_title);
+   console.log('Description:', data.twitter_description);
+   console.log('Image:', data.twitter_image);
 }
 
- const priceData = await monitorPrice('https://example.com/product');
- console.log('Price:', priceData.price);
- ```
+ getSocialMetadata('https://example.com/article');
+ \`\`\`
 
- ### Content Aggregation
+ ### SEO Analysis
 
- ```javascript
+ \`\`\`javascript
 const service = new SharpApiWebScrapingService(process.env.SHARP_API_KEY);
 
- async function aggregateNews(urls) {
-   const articles = await Promise.all(
-     urls.map(async (url) => {
-       const content = await service.scrapeUrl(url);
-       return {
-         title: content.title,
-         text: content.text.substring(0, 200) + '...',
-         url: url,
-         scrapedAt: new Date()
-       };
-     })
-   );
-
-   return articles;
- }
+ async function analyzeSEO(url) {
+   const data = await service.scrapeUrl(url);
+
+   console.log('=== SEO Analysis ===');
+   console.log('Title:', data.title, \`(\${data.title.length} chars)\`);
+   console.log('Meta Description:', data.meta_description);
+   console.log('Keywords:', data.meta_keywords);
+   console.log('Canonical URL:', data.canonical_url);
+   console.log('Language:', data.language);
+
+   console.log('\\n=== Headings Structure ===');
+   if (data.headings) {
+     data.headings.forEach(heading => {
+       console.log(\`\${heading.level}: \${heading.text}\`);
+     });
+   }
 
- const newsUrls = [
-   'https://news-site.com/article-1',
-   'https://news-site.com/article-2'
- ];
+   console.log('\\n=== Link Analysis ===');
+   const internalLinks = data.links.filter(l => l.type === 'internal');
+   const externalLinks = data.links.filter(l => l.type === 'external');
+   console.log(\`Internal links: \${internalLinks.length}\`);
+   console.log(\`External links: \${externalLinks.length}\`);
+ }
 
- const articles = await aggregateNews(newsUrls);
- articles.forEach(article => {
-   console.log(`\n${article.title}`);
-   console.log(article.text);
- });
- ```
+ analyzeSEO('https://your-website.com');
+ \`\`\`
 
- ### SEO Analysis
+ ### Content Extraction for AI Processing
 
- ```javascript
+ \`\`\`javascript
 const service = new SharpApiWebScrapingService(process.env.SHARP_API_KEY);
 
- async function analyzeSEO(url) {
-   const [html, links, structured] = await Promise.all([
-     service.scrapeHtml(url),
-     service.extractLinks(url),
-     service.extractStructuredData(url)
-   ]);
-
-   const analysis = {
-     url: url,
-     title: html.title,
-     meta: html.meta,
-     wordCount: html.text.split(/\s+/).length,
-     internalLinks: links.filter(l => l.internal).length,
-     externalLinks: links.filter(l => !l.internal).length,
-     hasStructuredData: Object.keys(structured).length > 0,
-     structuredDataTypes: Object.keys(structured)
+ async function extractForAI(url) {
+   const data = await service.scrapeUrl(url);
+
+   // Extract clean content for AI processing
+   const cleanContent = {
+     title: data.title,
+     description: data.meta_description,
+     mainContent: data.content,
+     language: data.language,
+     author: data.author,
+     publishedDate: data.published_date,
+     modifiedDate: data.modified_date
   };
 
-   return analysis;
+   console.log('Extracted content ready for AI processing:');
+   console.log(JSON.stringify(cleanContent, null, 2));
+
+   // Now you can pass this to SharpAPI AI endpoints:
+   // - Summarization: @sharpapi/sharpapi-node-summarize-text
+   // - Translation: @sharpapi/sharpapi-node-translate
+   // - Keywords: @sharpapi/sharpapi-node-generate-keywords
+   // - SEO Tags: @sharpapi/sharpapi-node-seo-tags
 }
 
- const seoReport = await analyzeSEO('https://example.com');
- console.log('SEO Analysis:', seoReport);
- ```
+ extractForAI('https://blog.example.com/article');
+ \`\`\`
 
- ### Website Screenshots
+ ### Competitor Analysis
 
- ```javascript
+ \`\`\`javascript
 const service = new SharpApiWebScrapingService(process.env.SHARP_API_KEY);
- const fs = require('fs');
 
- async function captureWebsite(url, outputPath) {
-   const screenshot = await service.takeScreenshot(url, {
-     fullPage: true,
-     width: 1920,
-     height: 1080
-   });
+ async function analyzeCompetitor(url) {
+   const data = await service.scrapeUrl(url);
 
-   // Save screenshot (base64 encoded)
-   const buffer = Buffer.from(screenshot.data, 'base64');
-   fs.writeFileSync(outputPath, buffer);
+   console.log('=== Competitor Analysis ===');
+   console.log('Domain:', new URL(url).hostname);
+   console.log('Title Strategy:', data.title);
+   console.log('Description:', data.meta_description);
+   console.log('Keywords Focus:', data.meta_keywords);
 
-   console.log(`Screenshot saved to ${outputPath}`);
+   console.log('\\n=== Content Strategy ===');
+   console.log('Content Length:', data.content.length, 'characters');
+   console.log('Word Count (approx):', Math.round(data.content.split(' ').length));
+
+   console.log('\\n=== Link Building ===');
+   const externalLinks = data.links.filter(l => l.type === 'external');
+   console.log('External Links:', externalLinks.length);
+   externalLinks.slice(0, 10).forEach(link => {
+     console.log(\` - \${link.url}\`);
+   });
 }
 
- await captureWebsite('https://example.com', './screenshot.png');
- ```
+ analyzeCompetitor('https://competitor-website.com');
+ \`\`\`
 
 ---
 
 ## Use Cases
 
- - **Price Monitoring**: Track product prices across e-commerce sites
- - **Content Aggregation**: Collect articles and news from multiple sources
- - **SEO Analysis**: Analyze website structure and metadata
- - **Lead Generation**: Extract contact information from websites
- - **Competitive Intelligence**: Monitor competitor websites
- - **Data Collection**: Gather research data from web sources
- - **Website Monitoring**: Track website changes and updates
- - **Screenshot Services**: Generate website previews
+ - **Content Aggregation**: Collect content from multiple sources
+ - **Price Monitoring**: Track competitor pricing and availability
+ - **Research**: Gather data for analysis and insights
+ - **Lead Generation**: Extract business information from websites
+ - **Market Intelligence**: Monitor industry trends and news
+ - **SEO Analysis**: Analyze competitor websites and content
+ - **Content Curation**: Extract articles for content platforms
+ - **Social Media Monitoring**: Track mentions and brand presence
+ - **Data Enrichment**: Enhance existing data with web-sourced information
+ - **Competitive Intelligence**: Analyze competitor strategies
 
 ---
 
- ## Features
-
- - **Synchronous Processing**: Instant results, no polling
- - **JavaScript Execution**: Handle dynamic content
- - **Custom Headers**: Full control over requests
- - **Proxy Support**: Route requests through proxies
- - **Screenshot Capture**: Visual website representation
- - **Structured Data**: Extract JSON-LD and microdata
- - **Link Extraction**: Discover internal and external links
- - **Clean Text**: Remove HTML and extract readable content
+ ## Response Format
+
+ The API returns a comprehensive JSON object with the following structure:
+
+ \`\`\`json
+ {
+   "url": "https://sharpapi.com/",
+   "title": "SharpAPI - AI-Powered Workflow Automation API",
+   "meta_description": "Automate workflows with AI-powered API...",
+   "meta_keywords": "AI API, automation, workflow",
+   "author": "SharpAPI Team",
+   "language": "en",
+   "charset": "UTF-8",
+   "canonical_url": "https://sharpapi.com/",
+   "viewport": "width=device-width, initial-scale=1",
+
+   "og_title": "SharpAPI - AI-Powered API",
+   "og_description": "Automate your workflows...",
+   "og_image": "https://sharpapi.com/og-image.jpg",
+   "og_type": "website",
+   "og_url": "https://sharpapi.com/",
+
+   "twitter_card": "summary_large_image",
+   "twitter_title": "SharpAPI",
+   "twitter_description": "AI-Powered API",
+   "twitter_image": "https://sharpapi.com/twitter-card.jpg",
+
+   "content": "Full page content as text...",
+   "text_content": "Clean text without HTML...",
+
+   "headings": [
+     { "level": "h1", "text": "Main Heading" },
+     { "level": "h2", "text": "Subheading" }
+   ],
+
+   "links": [
+     {
+       "url": "https://sharpapi.com/about",
+       "text": "About Us",
+       "type": "internal",
+       "rel": null
+     },
+     {
+       "url": "https://example.com",
+       "text": "External Link",
+       "type": "external",
+       "rel": "nofollow"
+     }
+   ],
+
+   "images": [
+     {
+       "src": "https://sharpapi.com/image.jpg",
+       "alt": "Image description"
+     }
+   ],
+
+   "extracted_at": "2026-01-10T15:30:00Z",
+   "processing_time_ms": 1250
+ }
+ \`\`\`
 
 ---
 
- ## Best Practices
+ ## AI Integration
+
+ The extracted data can be seamlessly integrated with **SharpAPI's AI-powered endpoints** for further analysis:
+
+ ### Text Processing
+ - **[@sharpapi/sharpapi-node-summarize-text](https://www.npmjs.com/package/@sharpapi/sharpapi-node-summarize-text)** - Summarize extracted content
+ - **[@sharpapi/sharpapi-node-paraphrase](https://www.npmjs.com/package/@sharpapi/sharpapi-node-paraphrase)** - Rewrite content
+ - **[@sharpapi/sharpapi-node-translate](https://www.npmjs.com/package/@sharpapi/sharpapi-node-translate)** - Translate to other languages
+
+ ### SEO & Keywords
+ - **[@sharpapi/sharpapi-node-generate-keywords](https://www.npmjs.com/package/@sharpapi/sharpapi-node-generate-keywords)** - Extract keywords
+ - **[@sharpapi/sharpapi-node-seo-tags](https://www.npmjs.com/package/@sharpapi/sharpapi-node-seo-tags)** - Generate SEO tags
 
- 1. **Respect robots.txt**: Check website policies before scraping
- 2. **Rate Limiting**: Don't overwhelm target servers
- 3. **Error Handling**: Implement robust error handling
- 4. **Data Validation**: Validate extracted data
- 5. **Legal Compliance**: Ensure scraping is legal for your use case
+ ### Content Analysis
+ - **[@sharpapi/sharpapi-node-detect-spam](https://www.npmjs.com/package/@sharpapi/sharpapi-node-detect-spam)** - Detect spam content
+ - **[@sharpapi/sharpapi-node-product-review-sentiment](https://www.npmjs.com/package/@sharpapi/sharpapi-node-product-review-sentiment)** - Analyze sentiment
+
+ ### Example Integration
+
+ \`\`\`javascript
+ const { SharpApiWebScrapingService } = require('@sharpapi/sharpapi-node-web-scraping');
+ const { SharpApiSummarizeService } = require('@sharpapi/sharpapi-node-summarize-text');
+
+ const scrapingService = new SharpApiWebScrapingService(process.env.SHARP_API_KEY);
+ const summarizeService = new SharpApiSummarizeService(process.env.SHARP_API_KEY);
+
+ async function scrapeAndSummarize(url) {
+   // 1. Scrape the webpage
+   const scraped = await scrapingService.scrapeUrl(url);
+
+   // 2. Summarize the content
+   const statusUrl = await summarizeService.summarize(scraped.content);
+   const summary = await summarizeService.fetchResults(statusUrl);
+
+   console.log('Original length:', scraped.content.length);
+   console.log('Summary:', summary.getResultJson());
+ }
+
+ scrapeAndSummarize('https://blog.example.com/long-article');
+ \`\`\`
 
 ---
 
 ## API Endpoint
 
- **GET** `/utilities/scrape_url`
+ **GET** `/utilities/scrape_url?url={url}`
+
+ This endpoint is **synchronous** and returns 200 OK immediately.
 
 For detailed API specifications, refer to:
- - [Postman Documentation](https://documenter.getpostman.com/view/31106842/2sBXVeGsW6)
- - [Product Page](https://sharpapi.com/en/catalog/utility/web-scraping)
+ - [Postman Documentation](https://documenter.getpostman.com/view/31106842/2s9Ye8faUp)
+ - [Product Page](https://sharpapi.com/en/catalog/utility/web-scraping-api)
 
 ---
 
 ## Related Packages
 
- - [@sharpapi/sharpapi-node-detect-urls](https://www.npmjs.com/package/@sharpapi/sharpapi-node-detect-urls) - URL detection
+ - [@sharpapi/sharpapi-node-detect-urls](https://www.npmjs.com/package/@sharpapi/sharpapi-node-detect-urls) - Extract URLs from text
+ - [@sharpapi/sharpapi-node-detect-emails](https://www.npmjs.com/package/@sharpapi/sharpapi-node-detect-emails) - Extract emails from text
+ - [@sharpapi/sharpapi-node-summarize-text](https://www.npmjs.com/package/@sharpapi/sharpapi-node-summarize-text) - Summarize content
+ - [@sharpapi/sharpapi-node-seo-tags](https://www.npmjs.com/package/@sharpapi/sharpapi-node-seo-tags) - Generate SEO tags
 - [@sharpapi/sharpapi-node-client](https://www.npmjs.com/package/@sharpapi/sharpapi-node-client) - Full SharpAPI SDK
 
 ---
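
The `links` entries documented in the 1.0.2 README's Response Format section carry a `type` field (`internal` or `external`). As a minimal standalone sketch of consuming that shape (the `{ url, text, type, rel }` structure is taken from the README shown in this diff; no SDK call is made, and the sample data is hypothetical):

```javascript
// Split a scraped `links` array into internal and external groups.
// Assumes the { url, text, type, rel } link shape from the README's
// Response Format section; this helper is illustrative, not part of the SDK.
function splitLinks(links) {
  return {
    internal: links.filter((link) => link.type === 'internal'),
    external: links.filter((link) => link.type === 'external'),
  };
}

// Sample data mirroring the documented response format
const sample = [
  { url: 'https://sharpapi.com/about', text: 'About Us', type: 'internal', rel: null },
  { url: 'https://example.com', text: 'External Link', type: 'external', rel: 'nofollow' },
];

const groups = splitLinks(sample);
console.log('Internal:', groups.internal.length); // Internal: 1
console.log('External:', groups.external.length); // External: 1
```
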
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@sharpapi/sharpapi-node-web-scraping",
-   "version": "1.0.1",
+   "version": "1.0.2",
   "description": "SharpAPI.com Node.js SDK for Web Scraping API",
   "main": "src/index.js",
   "scripts": {
@@ -0,0 +1,7 @@
+ <claude-mem-context>
+ # Recent Activity
+
+ <!-- This section is auto-generated by claude-mem. Edit content outside the tags. -->
+
+ *No recent activity*
+ </claude-mem-context>
package/src/CLAUDE.md ADDED
@@ -0,0 +1,7 @@
+ <claude-mem-context>
+ # Recent Activity
+
+ <!-- This section is auto-generated by claude-mem. Edit content outside the tags. -->
+
+ *No recent activity*
+ </claude-mem-context>
@@ -2,6 +2,9 @@ const { SharpApiCoreService } = require('@sharpapi/sharpapi-node-core');
 
 /**
  * Service for accessing Web Scraping API using SharpAPI.com
+ *
+ * Fetches and extracts content from publicly accessible URLs, providing structured data
+ * including page metadata, content, links, and more in a machine-readable JSON format.
 */
 class SharpApiWebScrapingService extends SharpApiCoreService {
   /**
@@ -10,112 +13,36 @@ class SharpApiWebScrapingService extends SharpApiCoreService {
    * @param {string} [apiBaseUrl='https://sharpapi.com/api/v1'] - API base URL
    */
   constructor(apiKey, apiBaseUrl = 'https://sharpapi.com/api/v1') {
-     super(apiKey, apiBaseUrl, '@sharpapi/sharpapi-node-web-scraping/1.0.1');
+     super(apiKey, apiBaseUrl, '@sharpapi/sharpapi-node-web-scraping/1.0.2');
   }
 
   /**
    * Scrape a webpage URL and extract its content (synchronous endpoint)
    *
-    * @param {string} url - The URL of the webpage to scrape
-    * @returns {Promise<object>} - The scraped content
-    */
-   async scrapeUrl(url) {
-     const response = await this.makeRequest('GET', '/utilities/scrape_url', { url });
-     return response.data;
-   }
-
-   /**
-    * Scrape a webpage and extract its HTML content
+    * Fetches and extracts structured data from any publicly accessible URL, including:
+    * - Page title and metadata (description, keywords, author)
+    * - Open Graph and Twitter card details
+    * - Headers and meta tags (content type, viewport, canonical URL, charset)
+    * - Structured content extraction (headings, paragraphs, key text elements)
+    * - Internal and external links for site structure analysis
+    * - Language detection for localization
+    * - Timestamped results for tracking
    *
-    * @param {string} url - The URL of the webpage to scrape
-    * @param {object} [options] - Additional options for scraping
-    * @param {boolean} [options.javascript=false] - Whether to execute JavaScript on the page
-    * @param {number} [options.timeout=30000] - Timeout in milliseconds
-    * @param {string} [options.userAgent] - Custom User-Agent string
-    * @param {object} [options.headers] - Custom headers to send with the request
-    * @param {string} [options.proxy] - Proxy to use for the request
-    * @returns {Promise<object>} - The scraped HTML content and metadata
-    */
-   async scrapeHtml(url, options = {}) {
-     const data = { url, ...options };
-     const response = await this.makeRequest('POST', '/utility/web-scraping/html', data);
-     return response.data;
-   }
-
-   /**
-    * Extract structured data from a webpage
+    * @param {string} url - The URL of the webpage to scrape (e.g., 'https://example.com' or 'example.com')
+    * @returns {Promise<object>} - The scraped content with metadata, structured data, and links
    *
-    * @param {string} url - The URL of the webpage to scrape
-    * @param {object} [options] - Additional options for scraping
-    * @param {boolean} [options.javascript=false] - Whether to execute JavaScript on the page
-    * @param {number} [options.timeout=30000] - Timeout in milliseconds
-    * @param {string} [options.userAgent] - Custom User-Agent string
-    * @param {object} [options.headers] - Custom headers to send with the request
-    * @param {string} [options.proxy] - Proxy to use for the request
-    * @returns {Promise<object>} - The extracted structured data
+    * @example
+    * // Scrape a webpage
+    * const data = await service.scrapeUrl('https://sharpapi.com');
+    * console.log(data.title); // Page title
+    * console.log(data.description); // Meta description
+    * console.log(data.content); // Main content
+    * console.log(data.links); // Array of links found
    */
-   async extractStructuredData(url, options = {}) {
-     const data = { url, ...options };
-     const response = await this.makeRequest('POST', '/utility/web-scraping/structured-data', data);
-     return response.data;
-   }
-
-   /**
-    * Take a screenshot of a webpage
-    *
-    * @param {string} url - The URL of the webpage to screenshot
-    * @param {object} [options] - Additional options for screenshot
-    * @param {boolean} [options.fullPage=false] - Whether to capture the full page or just the viewport
-    * @param {number} [options.width=1280] - Viewport width
-    * @param {number} [options.height=800] - Viewport height
-    * @param {boolean} [options.javascript=true] - Whether to execute JavaScript on the page
-    * @param {number} [options.timeout=30000] - Timeout in milliseconds
-    * @param {string} [options.userAgent] - Custom User-Agent string
-    * @param {object} [options.headers] - Custom headers to send with the request
-    * @param {string} [options.proxy] - Proxy to use for the request
-    * @returns {Promise<object>} - The screenshot data (base64 encoded)
-    */
-   async takeScreenshot(url, options = {}) {
-     const data = { url, ...options };
-     const response = await this.makeRequest('POST', '/utility/web-scraping/screenshot', data);
-     return response.data;
-   }
-
-   /**
-    * Extract text content from a webpage
-    *
-    * @param {string} url - The URL of the webpage to scrape
-    * @param {object} [options] - Additional options for scraping
-    * @param {boolean} [options.javascript=false] - Whether to execute JavaScript on the page
-    * @param {number} [options.timeout=30000] - Timeout in milliseconds
-    * @param {string} [options.userAgent] - Custom User-Agent string
-    * @param {object} [options.headers] - Custom headers to send with the request
-    * @param {string} [options.proxy] - Proxy to use for the request
-    * @returns {Promise<object>} - The extracted text content
-    */
-   async extractText(url, options = {}) {
-     const data = { url, ...options };
-     const response = await this.makeRequest('POST', '/utility/web-scraping/text', data);
-     return response.data;
-   }
-
-   /**
-    * Extract links from a webpage
-    *
-    * @param {string} url - The URL of the webpage to scrape
-    * @param {object} [options] - Additional options for scraping
-    * @param {boolean} [options.javascript=false] - Whether to execute JavaScript on the page
-    * @param {number} [options.timeout=30000] - Timeout in milliseconds
-    * @param {string} [options.userAgent] - Custom User-Agent string
-    * @param {object} [options.headers] - Custom headers to send with the request
-    * @param {string} [options.proxy] - Proxy to use for the request
-    * @returns {Promise<object>} - The extracted links
-    */
-   async extractLinks(url, options = {}) {
-     const data = { url, ...options };
-     const response = await this.makeRequest('POST', '/utility/web-scraping/links', data);
-     return response.data;
+   async scrapeUrl(url) {
+     const response = await this.makeRequest('GET', '/utilities/scrape_url', { url });
+     return response;
   }
 }
 
- module.exports = { SharpApiWebScrapingService };
+ module.exports = { SharpApiWebScrapingService };
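
Note that in the hunk above, `scrapeUrl()` returned `response.data` in 1.0.1 but returns `response` as-is in 1.0.2. A hedged sketch of a caller-side shim (this helper is hypothetical, not part of the SDK, and assumes the 1.0.1 value arrived wrapped in a `{ data: ... }` envelope):

```javascript
// Normalize a scrapeUrl() result across 1.0.1 and 1.0.2 return shapes.
// If the value looks like an envelope with a `data` property, unwrap it;
// otherwise return it unchanged. Purely illustrative.
function unwrapScrapeResult(result) {
  if (result && typeof result === 'object' && 'data' in result) {
    return result.data; // 1.0.1-style envelope
  }
  return result; // 1.0.2-style plain object
}

console.log(unwrapScrapeResult({ data: { title: 'SharpAPI' } }).title); // SharpAPI
console.log(unwrapScrapeResult({ title: 'SharpAPI' }).title); // SharpAPI
```
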