@galihvsx/gmr-scraper 1.0.0
- package/CHANGELOG.md +51 -0
- package/LICENSE +21 -0
- package/README.md +335 -0
- package/dist/cli.d.ts +2 -0
- package/dist/cli.js +681 -0
- package/dist/cli.js.map +1 -0
- package/dist/index.cjs +724 -0
- package/dist/index.cjs.map +1 -0
- package/dist/index.d.cts +150 -0
- package/dist/index.d.ts +150 -0
- package/dist/index.js +707 -0
- package/dist/index.js.map +1 -0
- package/package.json +74 -0
package/CHANGELOG.md
ADDED

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.0.0] - 2024-12-24

### Added

- Initial release of gmr-scraper
- ✨ Modern TypeScript rewrite with full type safety
- Bun runtime optimization with Node.js compatibility
- Retry logic with exponential backoff
- 💾 Optional caching layer with TTL
- ⚡ Rate limiting to prevent API blocks
- Progress callbacks for real-time updates
- 🔧 CLI tool with multiple output formats (JSON, CSV, table)
- 📦 Batch processing for multiple locations
- Streaming API for memory-efficient processing
- Analytics and insights functions
- Advanced filtering capabilities
- Language support for internationalization
- 🎯 Custom error classes for better debugging
- Comprehensive documentation and examples
- ✅ Full TypeScript declarations
- Production-ready build configuration

### Features

- Scrape Google Maps reviews with ease
- Support for all sort types (relevant, newest, highest_rating, lowest_rating)
- Pagination with configurable page limits
- Search query filtering
- Clean parsed output or raw data
- Concurrent batch scraping
- Review analytics (average rating, distribution, etc.)
- Filter by rating, text, images, responses
- ESM and CJS module support

### Technical

- Built with TypeScript 5.7
- Uses tsup for bundling
- Supports Bun and Node.js (>=18)
- Zero runtime dependencies (except CLI tools)
- Comprehensive error handling
- Configurable timeouts and retries

[1.0.0]: https://github.com/galihvsx/gmr-scraper/releases/tag/v1.0.0

package/LICENSE
ADDED

MIT License

Copyright (c) 2025+ Galih Putro Aji

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

package/README.md
ADDED

# 🗺️ GMR-Scraper

<div align="center">

**Modern, production-ready Google Maps review scraper**

Built with TypeScript • Powered by Bun • Feature-rich CLI

[Features](#-features) • [Installation](#-installation) • [Quick Start](#quick-start) • [Documentation](#documentation) • [Examples](#-examples)

</div>

---

## ✨ Features

### Core Capabilities

- 🎯 **Scrape Google Maps Reviews** - Extract reviews from any Google Maps location
- **Smart Pagination** - Automatic pagination with configurable limits
- **Multi-Language Support** - Scrape reviews in any language
- **Advanced Filtering** - Filter by rating, text, images, keywords
- **Built-in Analytics** - Calculate ratings, distributions, and insights

### Production-Ready

- ⚡ **Retry Logic** - Exponential backoff with configurable attempts
- 💾 **Caching Layer** - Optional in-memory cache with TTL
- 🚦 **Rate Limiting** - Token bucket algorithm to prevent API blocks
- ⏱️ **Timeout Control** - Configurable request timeouts
- **Progress Callbacks** - Real-time scraping progress updates
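
The **Rate Limiting** bullet names the token bucket algorithm. As a rough illustration of that general technique (not the package's own code), here is a minimal sketch. The class name `TokenBucket`, the `capacity` (burst size) parameter, and the injectable clock are assumptions made for this example; only the idea of a per-second refill rate corresponds to the documented `rateLimit.requestsPerSecond` option.

```typescript
// Hypothetical token bucket limiter: illustrates the algorithm, not gmr-scraper's internals.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly ratePerSecond: number;
  private readonly capacity: number;
  private readonly now: () => number;

  constructor(ratePerSecond: number, capacity: number, now: () => number = () => Date.now()) {
    this.ratePerSecond = ratePerSecond; // tokens added per second
    this.capacity = capacity; // maximum burst size
    this.now = now; // injectable clock in ms (handy for tests)
    this.tokens = capacity; // start full, allowing an initial burst
    this.lastRefill = now();
  }

  // Add tokens in proportion to elapsed time, never exceeding capacity.
  private refill(): void {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.ratePerSecond);
    this.lastRefill = current;
  }

  // Take one token if available; false means the caller should wait.
  tryRemove(): boolean {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }

  // Milliseconds until the next whole token is available (0 if one is ready now).
  msUntilNextToken(): number {
    this.refill();
    if (this.tokens >= 1) return 0;
    return Math.ceil(((1 - this.tokens) / this.ratePerSecond) * 1000);
  }

  // Resolves once a token has been taken, sleeping until one should be available.
  async acquire(): Promise<void> {
    while (!this.tryRemove()) {
      const wait = this.msUntilNextToken();
      await new Promise<void>((resolve) => setTimeout(resolve, wait));
    }
  }
}
```

With `new TokenBucket(2, 2)`, two requests can go out immediately, after which `await bucket.acquire()` spaces further calls roughly 500 ms apart. Starting the bucket full is a design choice that permits a short initial burst; starting it empty would enforce the rate from the very first request.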

### Advanced Features

- 📦 **Batch Processing** - Scrape multiple locations concurrently
- **Streaming API** - Memory-efficient processing for large datasets
- 🖥️ **CLI Tool** - Command-line interface with multiple output formats
- 🔧 **Full TypeScript** - Complete type safety and IntelliSense support
- 🎯 **Custom Errors** - Detailed error classes for better debugging
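
The README does not list the names of the custom error classes, so the following is only a hypothetical sketch of the pattern: a base error class, subclasses that carry extra context, and an `instanceof`-based check a caller might use to decide whether a failure is worth retrying. `ScraperError`, `HttpStatusError`, `TimeoutError`, and `isRetryable` are illustrative names, not exports of `@galihvsx/gmr-scraper`, and treating HTTP 429 and 5xx responses as retryable is likewise an assumption.

```typescript
// Hypothetical error hierarchy: these are NOT the package's actual exports.
class ScraperError extends Error {
  constructor(message: string) {
    super(message);
    this.name = new.target.name; // e.g. "TimeoutError" for subclasses
    // Keep `instanceof` working even when compiled down to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Carries the HTTP status and URL so callers can branch on them.
class HttpStatusError extends ScraperError {
  readonly status: number;
  readonly url: string;

  constructor(status: number, url: string) {
    super(`Request to ${url} failed with HTTP ${status}`);
    this.status = status;
    this.url = url;
  }
}

// Assumed to be thrown when a request exceeds its configured timeout.
class TimeoutError extends ScraperError {}

// Assumption: timeouts, 429 (Too Many Requests) and 5xx responses are transient.
function isRetryable(err: unknown): boolean {
  if (err instanceof TimeoutError) return true;
  if (err instanceof HttpStatusError) return err.status === 429 || err.status >= 500;
  return false;
}
```

Exposing `status` as a typed field, rather than parsing it out of the message string, is what makes a check like `isRetryable` reliable.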

---

## 📦 Installation

### Using npm

```bash
npm install @galihvsx/gmr-scraper
```

### Using Bun

```bash
bun add @galihvsx/gmr-scraper
```

### Using Yarn

```bash
yarn add @galihvsx/gmr-scraper
```

### CLI (Global)

```bash
npm install -g @galihvsx/gmr-scraper
```

---

## Quick Start

### Basic Usage

```typescript
import { scraper } from "@galihvsx/gmr-scraper";

const reviews = await scraper("https://www.google.com/maps/place/...", {
  sort_type: "newest",
  pages: 5,
  clean: true,
  lang: "en",
});

console.log(`Found ${reviews.length} reviews`);
```

### With Advanced Features

```typescript
import { scraper } from "@galihvsx/gmr-scraper";

const reviews = await scraper(url, {
  sort_type: "newest",
  pages: 10,
  clean: true,

  cache: {
    enabled: true,
    ttl: 300000,
  },

  retry: {
    maxAttempts: 5,
    initialDelay: 2000,
  },

  rateLimit: {
    requestsPerSecond: 2,
  },

  onProgress: (current, total) => {
    console.log(`Scraping page ${current}/${total}`);
  },
});
```
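
The `retry` block above configures exponential backoff. The README does not say exactly how `maxAttempts` and `initialDelay` are applied, so the sketch below assumes a common interpretation: `maxAttempts` counts every attempt including the first, and the wait doubles after each failure, optionally capped by a `maxDelay`. `backoffDelays`, `withRetry`, `factor`, and `maxDelay` are hypothetical names used for illustration only.

```typescript
// Hypothetical retry helper: NOT the package's implementation, just the general technique.
interface BackoffOptions {
  maxAttempts: number; // total attempts, including the first call
  initialDelay: number; // wait in ms before the first retry
  factor?: number; // growth per retry (assumed default: 2)
  maxDelay?: number; // optional cap on any single wait, in ms (assumed)
}

// Waits (in ms) before each retry; there are maxAttempts - 1 of them.
function backoffDelays(opts: BackoffOptions): number[] {
  const factor = opts.factor ?? 2;
  const maxDelay = opts.maxDelay ?? Number.POSITIVE_INFINITY;
  const delays: number[] = [];
  for (let retry = 0; retry < opts.maxAttempts - 1; retry++) {
    delays.push(Math.min(opts.initialDelay * Math.pow(factor, retry), maxDelay));
  }
  return delays;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `task` until it resolves or attempts run out, sleeping between failures.
async function withRetry<T>(task: () => Promise<T>, opts: BackoffOptions): Promise<T> {
  if (opts.maxAttempts < 1) throw new RangeError("maxAttempts must be at least 1");
  const delays = backoffDelays(opts);
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // No wait after the final failed attempt.
      if (attempt < delays.length) await sleep(delays[attempt]);
    }
  }
  throw lastError;
}
```

Under these assumptions, the settings in the example above (`maxAttempts: 5`, `initialDelay: 2000`) would wait 2, 4, 8 and 16 seconds between the five attempts, i.e. `backoffDelays({ maxAttempts: 5, initialDelay: 2000 })` returns `[2000, 4000, 8000, 16000]`. Retry helpers often add random jitter on top of this so that many clients do not retry in lockstep.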

### CLI Usage

```bash
gmr-scraper scrape "https://www.google.com/maps/place/..." \
  --sort newest \
  --pages 5 \
  --clean \
  --output table
```

---

## Documentation

### API Options

| Option         | Type                                                            | Default      | Description                            |
| -------------- | --------------------------------------------------------------- | ------------ | -------------------------------------- |
| `sort_type`    | `'relevant' \| 'newest' \| 'highest_rating' \| 'lowest_rating'` | `'relevant'` | Sort order for reviews                 |
| `pages`        | `number \| 'max'`                                               | `'max'`      | Number of pages to scrape              |
| `search_query` | `string`                                                        | `''`         | Filter reviews by text                 |
| `clean`        | `boolean`                                                       | `false`      | Return parsed objects vs raw data      |
| `lang`         | `string`                                                        | `'en'`       | Language code (e.g., 'en', 'id', 'es') |
| `cache`        | `CacheOptions`                                                  | `undefined`  | Enable caching with TTL                |
| `retry`        | `RetryOptions`                                                  | `undefined`  | Configure retry logic                  |
| `rateLimit`    | `RateLimitOptions`                                              | `undefined`  | Configure rate limiting                |
| `timeout`      | `number`                                                        | `30000`      | Request timeout in ms                  |
| `onProgress`   | `function`                                                      | `undefined`  | Progress callback                      |
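
The `cache` option is described only as "caching with TTL", and `CacheOptions` is not documented beyond the `enabled` and `ttl` fields used in the advanced example. As an illustration of the general idea, assuming `ttl` is in milliseconds like `timeout`, a minimal in-memory TTL cache stores an expiry timestamp next to each value and evicts lazily on read. `TtlCache` and `cacheKey` are hypothetical names, not package exports.

```typescript
// Hypothetical in-memory TTL cache; not the package's CacheOptions implementation.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs; // lifetime of each entry in ms (assumed unit)
    this.now = now; // injectable clock, useful in tests
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // lazy eviction: expired entries are removed on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Hypothetical cache key: the same URL and options map to the same entry regardless of
// top-level property order. Function-valued options (e.g. callbacks) serialize as null.
function cacheKey(url: string, options: Record<string, unknown>): string {
  const sortedEntries = Object.keys(options)
    .sort()
    .map((key) => [key, options[key]]);
  return JSON.stringify([url, sortedEntries]);
}
```

Lazy eviction keeps the code small but lets expired entries linger in memory until they are read again; a long-running process might also sweep entries periodically or cap the map size.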

### Complete Guides

- [API Documentation](./docs/API.md)
- [Migration Guide](./docs/MIGRATION.md) - From google-maps-review-scraper
- 🛠️ [Troubleshooting](./docs/TROUBLESHOOTING.md)
- 💻 [CLI Usage](./examples/cli-usage.md)

---

## 💡 Examples

### Batch Processing

```typescript
import { batchScraper } from "@galihvsx/gmr-scraper";

const results = await batchScraper([url1, url2, url3], {
  concurrency: 3,
  includeAnalytics: true,
  onProgress: (completed, total, url) => {
    console.log(`${completed}/${total}: ${url}`);
  },
});
```
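
`concurrency: 3` caps how many locations are scraped at the same time. How `batchScraper` enforces that limit internally is not documented here; a common approach, sketched below under that assumption, is a small pool of workers that each pull the next pending item until the list is exhausted, writing results back by index so the output order matches the input. `mapWithConcurrency` is a hypothetical helper, not a package export.

```typescript
// Hypothetical helper: at most `limit` tasks run at once, and results keep input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
  onProgress?: (completed: number, total: number, item: T) => void,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let nextIndex = 0;
  let completed = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  // Claiming is synchronous, so two runners never take the same index.
  async function runner(): Promise<void> {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await worker(items[index], index);
      completed++;
      onProgress?.(completed, items.length, items[index]);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

For example, `mapWithConcurrency(urls, 3, (u) => scraper(u, { clean: true }))` would keep at most three `scraper` calls in flight. In this sketch the first rejected task rejects the whole call; a batch tool that reports per-location failures would instead catch errors inside `worker` and store them alongside the results.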

### Streaming API

```typescript
import { scrapeStream } from "@galihvsx/gmr-scraper";

for await (const review of scrapeStream(url, { clean: true })) {
  console.log(`${review.author.name}: ${review.review.rating}⭐`);
}
```
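
`scrapeStream` is consumed with `for await`, which suggests an async generator. Its internals are not shown in this README, so the following is a generic sketch, assuming a hypothetical `fetchPage` function that returns one page of reviews plus an optional `nextPageToken`. Only one page is held at a time, and the next page is requested only when the consumer asks for more items, which is what makes this style memory-efficient.

```typescript
// Hypothetical page shape; the real wire format is not documented in this README.
interface Page<T> {
  reviews: T[];
  nextPageToken?: string; // absent on the last page
}

// Yields items one at a time and fetches the next page only after the consumer
// has drained the current one, so at most one page is buffered.
async function* streamPages<T>(
  fetchPage: (pageToken: string | undefined) => Promise<Page<T>>,
  maxPages: number = Number.POSITIVE_INFINITY,
): AsyncGenerator<T, void, undefined> {
  let token: string | undefined;
  for (let page = 0; page < maxPages; page++) {
    const { reviews, nextPageToken } = await fetchPage(token);
    yield* reviews;
    if (nextPageToken === undefined) return;
    token = nextPageToken;
  }
}
```

Breaking out of a `for await` loop over such a generator finishes it, so no further pages are requested.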

### Analytics & Filtering

```typescript
import { calculateAnalytics, filterReviews, scraper } from "@galihvsx/gmr-scraper";

const reviews = await scraper(url, { clean: true });

const analytics = calculateAnalytics(reviews);
console.log(`Average rating: ${analytics.averageRating}`);

const highRated = filterReviews(reviews, {
  minRating: 4,
  hasText: true,
  keywords: ["excellent", "great"],
});
```
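
To make the analytics and filter options above concrete, here is a self-contained sketch of what such helpers typically compute. It is not the package's `calculateAnalytics` or `filterReviews`: the `RatedReview` shape is assumed from the `review.review.rating` field used in the streaming example, and treating `keywords` as a case-insensitive "match any" list is an assumption, since the README does not say whether every keyword must match.

```typescript
// Minimal review shape, assumed from the `review.review.rating` field in the streaming example.
interface RatedReview {
  review: { rating: number; text?: string };
}

type Star = 1 | 2 | 3 | 4 | 5;

interface RatingSummary {
  count: number;
  averageRating: number; // 0 when there are no reviews
  distribution: Record<Star, number>; // number of reviews per star value
}

// Hypothetical analytics helper.
function summarizeRatings(reviews: readonly RatedReview[]): RatingSummary {
  const distribution: Record<Star, number> = { 1: 0, 2: 0, 3: 0, 4: 0, 5: 0 };
  let total = 0;
  for (const { review } of reviews) {
    // Clamp and round so the distribution is always keyed by 1-5.
    const star = Math.min(5, Math.max(1, Math.round(review.rating))) as Star;
    distribution[star]++;
    total += review.rating;
  }
  const count = reviews.length;
  return { count, averageRating: count === 0 ? 0 : total / count, distribution };
}

// Hypothetical filter mirroring the option names in the example above.
interface ReviewFilter {
  minRating?: number;
  hasText?: boolean;
  keywords?: string[]; // assumed: keep the review if ANY keyword appears (case-insensitive)
}

function matchesFilter(r: RatedReview, f: ReviewFilter): boolean {
  if (f.minRating !== undefined && r.review.rating < f.minRating) return false;
  const text = r.review.text ?? "";
  if (f.hasText && text.trim() === "") return false;
  if (f.keywords && f.keywords.length > 0) {
    const lower = text.toLowerCase();
    if (!f.keywords.some((k) => lower.includes(k.toLowerCase()))) return false;
  }
  return true;
}
```

Usage follows the same shape as the package example: `reviews.filter((r) => matchesFilter(r, { minRating: 4, hasText: true, keywords: ["excellent", "great"] }))`.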

### More Examples

- [Basic Usage](./examples/basic.ts)
- [Advanced Features](./examples/advanced.ts)
- [Batch Processing](./examples/batch.ts)
- [Streaming API](./examples/streaming.ts)

---

## Comparison with google-maps-review-scraper

| Feature                | google-maps-review-scraper | @galihvsx/gmr-scraper   |
| ---------------------- | -------------------------- | ----------------------- |
| **Language**           | JavaScript                 | ✅ TypeScript           |
| **Runtime**            | Node.js only               | ✅ Bun + Node.js        |
| **Type Safety**        | ❌ No types                | ✅ Full TypeScript      |
| **Retry Logic**        | ❌ No                      | ✅ Exponential backoff  |
| **Rate Limiting**      | ❌ No                      | ✅ Token bucket         |
| **Caching**            | ❌ No                      | ✅ In-memory cache      |
| **Progress Callbacks** | ❌ No                      | ✅ Real-time updates    |
| **CLI Tool**           | ❌ No                      | ✅ Full-featured CLI    |
| **Batch Processing**   | ❌ No                      | ✅ Concurrent scraping  |
| **Streaming API**      | ❌ No                      | ✅ Memory-efficient     |
| **Analytics**          | ❌ No                      | ✅ Built-in insights    |
| **Filtering**          | ❌ No                      | ✅ Advanced filters     |
| **Error Handling**     | Basic                      | ✅ Custom error classes |
| **Documentation**      | Basic                      | ✅ Comprehensive        |
| **Examples**           | Limited                    | ✅ Extensive            |
| **Build System**       | ❌ No                      | ✅ tsup bundling        |
| **Dependencies**       | 1                          | ✅ 0 (runtime)          |

---

## 🎯 Why Choose GMR-Scraper?

### 1. **Production-Ready**

Built with enterprise features like retry logic, rate limiting, and caching out of the box.

### 2. **Developer Experience**

Full TypeScript support, comprehensive documentation, and extensive examples.

### 3. **Performance**

Optimized for the Bun runtime, with a streaming API for memory-efficient processing.

### 4. **Flexibility**

Use as a library or CLI tool. Batch processing or streaming. Your choice.

### 5. **Modern Stack**

Latest TypeScript, modern async/await patterns, and best practices.

---

## 🛠️ Development

### Setup

```bash
git clone https://github.com/galihvsx/gmr-scraper.git
cd gmr-scraper
bun install
```

### Build

```bash
bun run build
```

### Test

```bash
bun test
bun run test:coverage
```

### Lint

```bash
bun run lint
```

---

## 🤝 Contributing

Contributions are welcome! Please read our [Contributing Guide](./CONTRIBUTING.md) for details.

### Contributors

Thanks to all contributors who have helped make this project better!

---

## License

MIT License - see [LICENSE](./LICENSE) for details.

---

## Acknowledgments

- Inspired by [google-maps-review-scraper](https://github.com/YasogaN/google-maps-review-scraper) by [@YasogaN](https://github.com/YasogaN)
- Special thanks to [@marin-m](https://github.com/marin-m) for [pbtk](https://github.com/marin-m/pbtk) research

---

## ⚖️ Legal Disclaimer

This project is not affiliated with, endorsed by, or associated with Google LLC. All product and company names are trademarks of their respective holders.

**Educational Purpose:** This project was created for educational purposes and as a proof of concept. It demonstrates technical approaches for API integration and data processing.

**Non-Commercial:** This is a non-commercial, open-source project shared with the community to foster learning and collaboration.

**Responsible Use:** Users are responsible for complying with Google's Terms of Service and applicable laws when using this tool.

---

<div align="center">

**Made with ❤️ by [Galih NH](https://github.com/galihvsx)**

⭐ Star this repo if you find it useful!

[Report Bug](https://github.com/galihvsx/gmr-scraper/issues) • [Request Feature](https://github.com/galihvsx/gmr-scraper/issues) • [Documentation](./docs/API.md)

</div>

package/dist/cli.d.ts
ADDED