ai-scraper-fallback 0.0.2 → 0.0.3
- package/README.md +47 -0
- package/package.json +11 -4
package/README.md ADDED
@@ -0,0 +1,47 @@
+# AI Scraper Fallback 🤖
+
+A robust HTML-to-JSON scraper fallback powered by Google Gemini AI. Never let a website layout change break your scraper again!
+
+---
+
+## 🌏 Multilingual Introduction
+
+### 🇺🇸 English
+This package provides an intelligent fallback mechanism for web scrapers. When traditional CSS selectors fail due to website updates, this tool uses Gemini AI to "read" the HTML and extract structured data automatically.
+
+### 🇹🇼 Traditional Chinese
+This is an intelligent web-scraper fallback tool built on Gemini AI. When a site redesign breaks traditional CSS selectors, the tool automatically switches to AI mode, "reads" the page HTML, and accurately extracts structured data, giving your scraper the ability to repair itself.
+
+### 🇮🇩 Indonesian (Susi, this one is for you!)
+This is an advanced tool for retrieving data from websites automatically. If a website changes its appearance and the regular code stops working, this tool uses artificial intelligence (Gemini AI) to "read" the website and extract the information we need. Very helpful for keeping programs from breaking easily!
+
+---
+
+## 🚀 Installation
+
+```bash
+npm install ai-scraper-fallback
+```
+
+## 💻 Quick Start
+
+```javascript
+const { parseHousesWithAI } = require('ai-scraper-fallback');
+
+async function start() {
+  const html = "<html>...your web content...</html>";
+  const apiKey = "your-gemini-api-key";
+
+  // Magic happens here!
+  const results = await parseHousesWithAI(html, 'Real Estate Web', apiKey);
+  console.log(results);
+}
+```
+
+## 🛠 Features
+- **Auto-healing**: Automatically handles website structural changes.
+- **Structured Data**: Always returns clean JSON based on your requirements.
+- **Powered by Gemini**: Uses the latest `gemini-2.5-flash` for high speed and accuracy.
+
+## 📄 License
+MIT
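The README describes the package as a fallback for when CSS selectors stop working, but its Quick Start only shows the AI call on its own. The sketch below illustrates one way the two paths might be wired together. `scrapeWithFallback`, `primaryExtract`, and `aiFallback` are hypothetical names introduced here for illustration and are not part of `ai-scraper-fallback`. The sketch also assumes both extractors resolve to an array of records, which the README does not confirm.

```javascript
// Hypothetical helper (not part of ai-scraper-fallback): try the cheap,
// selector-based extractor first and call the AI fallback only when the
// primary extractor throws or returns no rows.
async function scrapeWithFallback(html, primaryExtract, aiFallback) {
  try {
    const rows = await primaryExtract(html);
    if (Array.isArray(rows) && rows.length > 0) {
      return { source: 'selectors', rows };
    }
  } catch (err) {
    // A selector failure is expected after a layout change; fall through.
  }
  // Assumption: the fallback resolves to an array of extracted records.
  const rows = await aiFallback(html);
  return { source: 'ai', rows };
}
```

With the package installed, `aiFallback` could plausibly be `(html) => parseHousesWithAI(html, 'Real Estate Web', apiKey)`, following the call shape shown in the Quick Start. Returning a `source` field makes it easy to log how often the AI path is taken, which is a useful signal that the selectors need updating.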
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "ai-scraper-fallback",
-  "version": "0.0.2",
+  "version": "0.0.3",
   "description": "A robust HTML-to-JSON scraper for real estate websites, using Google's Gemini API to extract structured data from complex web pages.",
   "main": "index.js",
   "scripts": {
@@ -10,9 +10,16 @@
     "type": "git",
     "url": "git+https://github.com/Sunpochin-Inc/ai-scraper-fallback.git"
   },
-  "keywords": [
-
-
+  "keywords": [
+    "scraper",
+    "ai",
+    "gemini",
+    "fallback",
+    "house",
+    "realestate"
+  ],
+  "author": "Sunpochin",
+  "license": "MIT",
   "bugs": {
     "url": "https://github.com/Sunpochin-Inc/ai-scraper-fallback/issues"
   },