google-search-scraper-api 0.0.1

Files changed (4)
  1. package/LICENSE +21 -0
  2. package/README.md +351 -0
  3. package/index.js +51 -0
  4. package/package.json +23 -0
package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 wordstotech
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,351 @@
+ # google-search-scraper-api
+
+ A Node.js client for scraping Google search results without owning the infrastructure. Wraps the ScrapingBee Google Search API behind a small, opinionated interface so you can ship a working SERP pipeline in an afternoon instead of a quarter.
+
+ ```bash
+ npm install google-search-scraper-api
+ ```
+
+ ```js
+ const { GoogleSearchScraper } = require('google-search-scraper-api');
+
+ const scraper = new GoogleSearchScraper({ apiKey: 'YOUR-API-KEY' });
+
+ (async () => {
+   const { organic_results } = await scraper.search({ query: 'best running shoes' });
+   console.log(organic_results[0]);
+   // { position: 1, title: '...', url: '...', description: '...' }
+ })();
+ ```
+
+ Get an API key (1,000 free credits, no card required) at [scrapingbee.com](https://www.scrapingbee.com/).
+
+
+ ## Why this package exists
+
+ I've shipped two SERP scrapers from scratch. Both worked for about six weeks.
+
+ The first ran on a pool of cheap datacenter IPs. It hit ~300 successful requests before Google started serving the consent wall, then the sorry page, then nothing. The second swapped in residential proxies and added a headless Chromium fleet behind a Redis queue, and it survived longer — until Google rotated through three SERP layout variants in two months and every CSS selector in the parsing layer broke at once.
+
+ By the third project I stopped pretending the maintenance cost was incidental. A managed Google search scraper API is one HTTP call, one parser, no proxy budget, no headless browser fleet. This package is the thin wrapper I wish I'd had on day one: friendly options, real defaults, and a few helpers for the patterns that come up every time you build SERP tooling.
+
+
+ ## What you get
+
+ - **Structured JSON for every result type** — organic results, ads, featured snippets, People Also Ask, related searches, knowledge panels, top stories, image packs. No HTML parsing on your side.
+ - **Geo-targeted SERPs** — country, language, device. Pass `country: 'de'` and you get Google.de from a German residential IP.
+ - **News, images, shopping, video, maps verticals** via a single `searchType` flag.
+ - **Pagination** that doesn't require remembering `start=10` increments.
+ - **Sensible camelCase options** — the underlying API uses `country_code`, `nb_results`, `search_type`; you don't have to.
+ - **Async-first** — every method returns a Promise, plays well with `Promise.all`, `p-limit`, queues, workers.
+
+
+ ## How it works
+
+ Every `.search()` call sends a GET request to:
+
+ ```
+ https://app.scrapingbee.com/api/v1/store/google?api_key=…&search=…
+ ```
+
+ On ScrapingBee's side that triggers:
+
+ 1. Routing the request through a residential or stealth proxy (configurable)
+ 2. Loading the SERP in a real headless browser
+ 3. Parsing the rendered page into a JSON SERP schema
+ 4. Returning the structured result
+
+ You pay credits per successful response. Failed requests (HTTP 500) aren't charged, so it's safe to retry. There's no caching layer — every call returns the live SERP, which is what you want for rank tracking, and what you'll need to add yourself if you're polling the same query repeatedly.
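+
+ For reference, here is roughly the same call made with plain `axios`, i.e. what the wrapper does for you minus the option mapping. A sketch only; `api_key`, `search`, and `country_code` are the raw parameter names this package maps its camelCase options onto.
+
+ ```js
+ const axios = require('axios');
+
+ (async () => {
+   const { data } = await axios.get('https://app.scrapingbee.com/api/v1/store/google', {
+     params: {
+       api_key: process.env.SB_KEY,  // your ScrapingBee key
+       search: 'best running shoes', // the query
+       country_code: 'us',           // raw snake_case parameter, not the wrapper's camelCase
+     },
+     timeout: 60000,
+   });
+   console.log(data.organic_results[0]);
+ })();
+ ```
+
+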
+ ## API reference
+
+ ### `new GoogleSearchScraper({ apiKey, timeout })`
+
+ | Option    | Type   | Default  | Description                     |
+ | --------- | ------ | -------- | ------------------------------- |
+ | `apiKey`  | string | required | Your ScrapingBee API key        |
+ | `timeout` | number | `60000`  | Request timeout in milliseconds |
+
+ ### `.search(options)`
+
+ | Option       | Type    | Description                                                                                  |
+ | ------------ | ------- | -------------------------------------------------------------------------------------------- |
+ | `query`      | string  | The search query (required)                                                                  |
+ | `country`    | string  | ISO-2 country code: `us`, `gb`, `de`, `fr`, `jp`, etc. Determines proxy geo + Google domain. |
+ | `language`   | string  | UI / results language: `en`, `de`, `es`, `fr`, etc.                                          |
+ | `device`     | string  | `desktop` or `mobile`                                                                        |
+ | `page`       | number  | Page number, 1-based                                                                         |
+ | `nbResults`  | number  | Results per page (10–100)                                                                    |
+ | `searchType` | string  | `classic` (default), `news`, `images`, `videos`, `shopping`, `maps`                          |
+ | `addHtml`    | boolean | Include raw HTML in the response                                                             |
+ | `extra`      | object  | Any additional ScrapingBee parameter, passed through verbatim                                |
+
+ Returns a Promise resolving to a parsed JSON object — see the response shape below.
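+
+ A fully-specified call, to show how the options compose (values are illustrative; run inside an async function):
+
+ ```js
+ const serp = await scraper.search({
+   query: 'wireless earbuds',
+   country: 'de',          // German proxy + google.de
+   language: 'de',
+   device: 'mobile',
+   page: 2,                // 1-based; no start=10 arithmetic
+   nbResults: 20,
+   searchType: 'classic',
+   extra: { some_raw_param: 'value' }, // placeholder key; anything here is passed through verbatim
+ });
+ ```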
+
+
+ ## Response shape
+
+ A successful call returns an object that looks like this (trimmed for clarity):
+
+ ```json
+ {
+   "search_metadata": {
+     "query": "best running shoes",
+     "url": "https://www.google.com/search?q=best+running+shoes",
+     "number_of_results": 412000000
+   },
+   "organic_results": [
+     {
+       "position": 1,
+       "title": "The 12 Best Running Shoes of 2026 - Runner's World",
+       "url": "https://www.runnersworld.com/...",
+       "description": "We tested hundreds of pairs..."
+     }
+   ],
+   "featured_snippet": {
+     "title": "...",
+     "description": "...",
+     "url": "..."
+   },
+   "people_also_ask": [
+     { "question": "...", "answer": "...", "url": "..." }
+   ],
+   "related_searches": ["best running shoes for flat feet", "..."],
+   "top_stories": [],
+   "ads": [],
+   "knowledge_graph": {}
+ }
+ ```
+
+ Fields are present only when Google actually rendered them for that query. Always null-check.
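+
+ For example, guarding the optional fields (inside an async function):
+
+ ```js
+ const serp = await scraper.search({ query: 'how tall is everest' });
+
+ // featured_snippet and people_also_ask may simply be absent for a given query
+ const answer = serp.featured_snippet?.description ?? null;
+ const questions = (serp.people_also_ask ?? []).map(p => p.question);
+ ```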
+
+
+ ## Use cases (with working code)
+
+ These are the use cases I keep coming back to. Each one is a runnable snippet (the later ones reuse the `scraper` from use case 1) — drop your API key in and run with `node example.js`.
+
+ ### 1. Rank tracking for an SEO dashboard
+
+ You have a list of keywords, and you want to know where a domain ranks for each, refreshed daily.
+
+ ```js
+ const { GoogleSearchScraper } = require('google-search-scraper-api');
+
+ const scraper = new GoogleSearchScraper({ apiKey: process.env.SB_KEY });
+ const targetDomain = 'runnersworld.com';
+ const keywords = ['best running shoes', 'best trail running shoes', 'best marathon shoes'];
+
+ async function rankFor(query) {
+   const { organic_results } = await scraper.search({ query, country: 'us', nbResults: 100 });
+   const hit = organic_results.find(r => new URL(r.url).hostname.endsWith(targetDomain));
+   return hit ? hit.position : null;
+ }
+
+ (async () => {
+   for (const kw of keywords) {
+     console.log(kw, '→', await rankFor(kw));
+   }
+ })();
+ ```
+
+ Two things worth noting from production:
+
+ - `nbResults: 100` matters. If you only fetch the first page, anything ranking past position 10 looks like "not ranking" — that same bug ate a week of my dashboard's accuracy before I noticed.
+ - Polling daily? Add a 1–2 second jitter between calls (see the sketch below). ScrapingBee handles concurrency on their side, but jittering also makes your own logs easier to debug.
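+
+ A minimal jitter helper, reusing `rankFor` from the snippet above: just a randomized sleep between sequential calls.
+
+ ```js
+ const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
+
+ (async () => {
+   for (const kw of keywords) {
+     console.log(kw, '→', await rankFor(kw));
+     await sleep(1000 + Math.random() * 1000); // 1–2s pause between calls
+   }
+ })();
+ ```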
+
+ ### 2. People Also Ask mining for content briefs
+
+ When I build a content brief I want every PAA question Google fires for the head term plus a couple of variants — they're the cleanest signal for what the SERP audience is actually asking.
+
+ ```js
+ const seeds = ['how to scrape google', 'google search scraping', 'scrape google search results'];
+ const questions = new Set();
+
+ (async () => {
+   for (const query of seeds) {
+     const { people_also_ask = [] } = await scraper.search({ query, country: 'us' });
+     for (const paa of people_also_ask) questions.add(paa.question);
+   }
+
+   console.log([...questions]);
+ })();
+ ```
+
+ The PAA box re-expands when you click each question on the live SERP; the API returns the first wave. For deeper trees, feed the unique questions you got back in as new seeds and dedupe — sketched below.
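+
+ A bounded breadth-first version (the depth cap keeps credit usage predictable):
+
+ ```js
+ // Each wave's new questions become the next wave's seeds.
+ async function minePaa(seeds, depth = 2) {
+   const seen = new Set();
+   let frontier = seeds;
+   for (let d = 0; d < depth && frontier.length > 0; d++) {
+     const next = [];
+     for (const query of frontier) {
+       const { people_also_ask = [] } = await scraper.search({ query, country: 'us' });
+       for (const { question } of people_also_ask) {
+         if (!seen.has(question)) {
+           seen.add(question);
+           next.push(question);
+         }
+       }
+     }
+     frontier = next;
+   }
+   return [...seen];
+ }
+ ```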
+
+ ### 3. Local SERP monitoring across regions
+
+ A client launches a product in five countries. They want to see what Google shows their users in each market on day one, day seven, and day thirty.
+
+ ```js
+ const markets = [
+   { country: 'us', language: 'en' },
+   { country: 'gb', language: 'en' },
+   { country: 'de', language: 'de' },
+   { country: 'fr', language: 'fr' },
+   { country: 'jp', language: 'ja' },
+ ];
+
+ const query = 'noise cancelling headphones';
+
+ (async () => {
+   const snapshots = await Promise.all(
+     markets.map(m =>
+       scraper
+         .search({ query, ...m, device: 'mobile' })
+         .then(r => ({ ...m, top3: r.organic_results.slice(0, 3) }))
+     )
+   );
+
+   for (const s of snapshots) console.log(s.country, s.top3.map(r => r.url));
+ })();
+ ```
+
+ Two production notes:
+
+ - `device: 'mobile'` is the right default for most international markets — global mobile share crossed desktop years ago, and Google's mobile SERP differs in feature set.
+ - Don't fan out to 200 parallel requests on day one. Start with `Promise.all` for ~10–20, then move to a concurrency-limited queue (see use case 4) once you exceed your plan's concurrency cap.
+
+ ### 4. Bulk keyword scraping with controlled concurrency
+
+ You're feeding 5,000 keywords into a database. You don't want to fire them all at once, and you want retries on transient failures.
+
+ ```js
+ const pLimit = require('p-limit'); // pin p-limit@3 — v4 and later are ESM-only, so require() needs v3
+ const fs = require('fs/promises');
+
+ const limit = pLimit(10); // tune to match your ScrapingBee plan's concurrency cap
+
+ async function withRetry(fn, tries = 3) {
+   for (let i = 0; i < tries; i++) {
+     try {
+       return await fn();
+     } catch (err) {
+       if (i === tries - 1) throw err;
+       await new Promise(r => setTimeout(r, 1000 * (i + 1))); // linear backoff: 1s, then 2s
+     }
+   }
+ }
+
+ async function scrapeAll(keywords) {
+   const tasks = keywords.map(kw => limit(() => withRetry(() => scraper.search({ query: kw, country: 'us' }))));
+   return Promise.all(tasks);
+ }
+
+ (async () => {
+   const keywords = (await fs.readFile('./keywords.txt', 'utf8')).split('\n').filter(Boolean);
+   const results = await scrapeAll(keywords);
+
+   await fs.writeFile('./results.json', JSON.stringify(results, null, 2));
+ })();
+ ```
+
+ ScrapingBee's plan tiers cap concurrency at 10 / 50 / 100 / 200 depending on the plan — exceed it and the API returns a clear error. The `p-limit` value should match (or sit just below) your tier cap.
+
+ ### 5. News monitoring
+
+ ```js
+ (async () => {
+   const { news_results = [] } = await scraper.search({
+     query: '"product launch" "your brand"',
+     searchType: 'news',
+     country: 'us',
+   });
+
+   for (const item of news_results) {
+     console.log(item.date, item.source, item.title, item.url);
+   }
+ })();
+ ```
+
+ The `searchType: 'news'` flag returns Google News results with timestamps and publication metadata, which is what you actually want for a brand-monitoring pipeline. The classic SERP's Top Stories box is shallower.
+
+ ### 6. SERPs as RAG retrieval
+
+ If you're building an LLM workflow that needs fresh web context — answering questions about events the model wasn't trained on, building a research agent, grounding a customer-support bot — a structured SERP is a cheap retrieval layer.
+
+ ```js
+ async function searchContext(query) {
+   const { organic_results, featured_snippet, people_also_ask } = await scraper.search({ query, country: 'us', nbResults: 20 });
+   return {
+     answer: featured_snippet?.description,
+     sources: organic_results.slice(0, 5).map(r => ({ title: r.title, url: r.url, snippet: r.description })),
+     related_questions: people_also_ask?.map(p => p.question) ?? [],
+   };
+ }
+
+ // Pass `searchContext` output into your prompt as a tool-call result
+ ```
+
+
+ ## Patterns you'll need eventually
+
+ ### Retry on transient failures
+
+ ScrapingBee doesn't charge credits on HTTP 500 — retrying is genuinely safe. Three tries with linear backoff (1s after the first failure, 2s after the second) cover almost every transient blip in practice.
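+
+ The `withRetry` helper in use case 4 retries every error. In production you usually want to retry only what's worth retrying: 500s, 429s, and network-level failures. A status-aware sketch (axios exposes the HTTP status on `err.response`):
+
+ ```js
+ async function withRetry(fn, tries = 3) {
+   for (let i = 0; i < tries; i++) {
+     try {
+       return await fn();
+     } catch (err) {
+       const status = err.response?.status;
+       // undefined status means no response at all (timeout, DNS failure, connection reset)
+       const retryable = status === 500 || status === 429 || status === undefined;
+       if (!retryable || i === tries - 1) throw err;
+       await new Promise(r => setTimeout(r, 1000 * (i + 1)));
+     }
+   }
+ }
+ ```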
+
+ ### Caching
+
+ There's no server-side cache. If you're polling the same query inside a tight window, cache responses client-side with a key like `${query}|${country}|${device}|${page}`. Redis with a 6–24 hour TTL works well for rank tracking; an in-memory `Map` is fine for short scripts.
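+
+ A minimal in-memory sketch of that key scheme (swap the `Map` for Redis in anything long-running):
+
+ ```js
+ const cache = new Map(); // key → { data, expires }
+ const TTL = 6 * 60 * 60 * 1000; // 6 hours, in milliseconds
+
+ async function cachedSearch(opts) {
+   const key = `${opts.query}|${opts.country}|${opts.device}|${opts.page}`;
+   const hit = cache.get(key);
+   if (hit && hit.expires > Date.now()) return hit.data;
+
+   const data = await scraper.search(opts);
+   cache.set(key, { data, expires: Date.now() + TTL });
+   return data;
+ }
+ ```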
+
+ ### Saving to CSV
+
+ ```js
+ const { stringify } = require('csv-stringify/sync');
+
+ // `results` is the array returned by scrapeAll() in use case 4
+ const rows = results.flatMap(r => r.organic_results.map(o => ({
+   query: r.search_metadata.query,
+   position: o.position,
+   title: o.title,
+   url: o.url,
+ })));
+ const csv = stringify(rows, { header: true });
+ ```
+
+ ### TypeScript
+
+ The package ships plain JavaScript with no `.d.ts` bundled (yet). You can type the responses by writing a minimal `SearchResponse` interface in your own project — for most apps, only `organic_results`, `featured_snippet`, and `people_also_ask` are worth typing.
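+
+ A starting point that matches the response shape above (everything optional, since Google only renders what the query earns):
+
+ ```ts
+ interface OrganicResult {
+   position: number;
+   title: string;
+   url: string;
+   description: string;
+ }
+
+ interface SearchResponse {
+   search_metadata?: { query: string; url: string; number_of_results: number };
+   organic_results?: OrganicResult[];
+   featured_snippet?: { title: string; description: string; url: string };
+   people_also_ask?: { question: string; answer: string; url: string }[];
+ }
+ ```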
+
+
+ ## When this package is the wrong choice
+
+ An honest answer matters more than a feature list. Skip ScrapingBee (and this wrapper) if:
+
+ - **You need login-walled content.** ScrapingBee's terms prohibit scraping behind logins. LinkedIn private profiles, gated content, intranet pages — wrong tool.
+ - **You're scraping 50 queries a month.** A cheap proxy and `cheerio` will get you there; you don't need a managed API.
+ - **You need millisecond latency.** SERP scraping involves a headless browser render. Expect 3–8 seconds per request. Fine for batch jobs, wrong for interactive search-as-you-type.
+ - **You want to avoid all SaaS dependencies.** This is a wrapper around an external API; the API going down means your scraper goes down.
+
+
+ ## Cost expectations
+
+ You pay ScrapingBee per successful result, not per minute of CPU. The Google Search API costs 10 credits per "light" call and 15 credits per "normal" call as of 2026, so a plan giving you 250,000 credits a month covers roughly 16,000–25,000 SERP requests. Run the math against your keyword volume before scaling up — for most SEO tools the unit economics work; for very high-frequency tracking (every keyword, every hour) they don't.
+
+ Pricing details and the latest credit costs live on [ScrapingBee's pricing page](https://www.scrapingbee.com/pricing/).
+
+
+ ## FAQ
+
+ ### Is it legal to scrape Google search results?
+
+ Public SERP data is generally legal to collect in most jurisdictions, particularly for SEO research, brand monitoring, and competitive analysis. Personal data, copyrighted content, and login-walled material are different categories — check your local regulations and Google's terms before scaling a project. ScrapingBee specifically prohibits post-login scraping in its terms of service.
+
+ ### Why not run my own headless browser?
+
+ You can. I have. The hidden cost isn't the browser: it's the proxy rotation, the CAPTCHA solving, the layout-change parsing patches, the bot-detection arms race, and the engineer-hours that go into all of it. If your scraping volume is under a few thousand requests a month, DIY is cheaper. Above that, a managed Google search scraper API is cheaper than the labour to maintain a homegrown one.
+
+ ### Does this work in serverless runtimes (Lambda, Vercel, Cloudflare Workers)?
+
+ Yes — the only dependency is `axios`, and it works in any Node.js 14+ environment. Cloudflare Workers needs Node compatibility enabled.
+
+ ### How do I scrape Google search results for a specific location, not just a country?
+
+ Pass `country` and `language`, then for finer geo-targeting use the `extra` option to pass ScrapingBee's `geo` parameter (postal code or city). City-level targeting requires premium proxies, which are billed at a higher credit cost.
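+
+ A sketch, assuming the `geo` parameter named above (verify the exact name and the premium-proxy billing in ScrapingBee's docs before relying on it):
+
+ ```js
+ const serp = await scraper.search({
+   query: 'coffee shop',
+   country: 'us',
+   extra: { geo: '10001' }, // assumed parameter name; example NYC postal code
+ });
+ ```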
+
+ ### What about Google's other surfaces — Maps, Shopping, Images?
+
+ `searchType: 'maps'`, `searchType: 'shopping'`, `searchType: 'images'`. Same response object, different result arrays (`local_results`, `shopping_results`, `image_results`).
+
+ ### How do I handle rate limits?
+
+ ScrapingBee returns a 429 if you exceed your plan's concurrency cap. The fix is to lower your concurrency, not to slow down requests inside the cap — use `p-limit` or any queue library and set the limit to your plan's concurrency number.
+
+ ### Can I use this for Google Scholar, Google Trends, or Google Flights?
+
+ This package targets the standard Google search verticals (classic, news, images, shopping, videos, maps). For Scholar and Trends, ScrapingBee has separate endpoints not exposed here yet — open an issue if you'd like them added.
+
+
+ ## Documentation
+
+ - [ScrapingBee Google Search API documentation](https://www.scrapingbee.com/documentation/google-api/)
+ - [ScrapingBee pricing](https://www.scrapingbee.com/pricing/)
+
+
+ ## License
+
+ MIT. See [LICENSE](LICENSE).
+
+
+ ## Disclaimer
+
+ This is an unofficial Node.js wrapper around ScrapingBee's Google Search API. It's not affiliated with ScrapingBee or Google. Compliance with Google's terms of service and applicable data-protection law is your responsibility as the operator of the scraper.
package/index.js ADDED
@@ -0,0 +1,51 @@
+ const axios = require('axios');
+
+ const ENDPOINT = 'https://app.scrapingbee.com/api/v1/store/google';
+
+ class GoogleSearchScraper {
+   constructor({ apiKey, timeout = 60000 } = {}) {
+     if (!apiKey) {
+       throw new Error('GoogleSearchScraper requires an apiKey');
+     }
+     this.apiKey = apiKey;
+     this.timeout = timeout;
+   }
+
+   async search({
+     query,
+     country,
+     language,
+     device,
+     page,
+     nbResults,
+     searchType,
+     addHtml,
+     extra = {},
+   } = {}) {
+     if (!query) {
+       throw new Error('search() requires a "query" parameter');
+     }
+
+     // Map camelCase options onto ScrapingBee's snake_case query parameters.
+     // `extra` is spread first, so the named options below win on key collisions.
+     const params = {
+       api_key: this.apiKey,
+       search: query,
+       ...extra,
+     };
+
+     if (country) params.country_code = country;
+     if (language) params.language = language;
+     if (device) params.device = device;
+     if (page !== undefined) params.page = page;
+     if (nbResults !== undefined) params.nb_results = nbResults;
+     if (searchType) params.search_type = searchType;
+     if (addHtml !== undefined) params.add_html = addHtml ? 'true' : 'false';
+
+     // The endpoint returns parsed SERP JSON directly; no HTML handling needed here.
+     const response = await axios.get(ENDPOINT, {
+       params,
+       timeout: this.timeout,
+     });
+     return response.data;
+   }
+ }
+
+ module.exports = { GoogleSearchScraper, ENDPOINT };
package/package.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "name": "google-search-scraper-api",
+   "version": "0.0.1",
+   "description": "Node.js client for scraping Google Search results using the ScrapingBee Google Search API.",
+   "main": "index.js",
+   "scripts": {
+     "test": "echo \"Error: no test specified\" && exit 1"
+   },
+   "author": "wordstotech",
+   "license": "MIT",
+   "homepage": "https://www.scrapingbee.com/documentation/google-api/",
+   "dependencies": {
+     "axios": "^1.7.0"
+   },
+   "engines": {
+     "node": ">=14"
+   },
+   "files": [
+     "index.js",
+     "README.md",
+     "LICENSE"
+   ]
+ }