npm - opencode-skills-collection - Versions diffs - 3.0.36 → 3.0.38 - Mend

opencode-skills-collection 3.0.36 → 3.0.38

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (73) hide show

package/bundled-skills/hasdata/references/web-scraping.md ADDED Viewed

@@ -0,0 +1,159 @@
+# Web Scraping API — `POST /scrape/web`
+One endpoint to fetch any URL, optionally with JS rendering, proxies, AI extraction, and screenshots. Synchronous.
+> Reach for this only when the user gave you a specific URL, or when no Scraper API covers the field. Otherwise the platform-specific APIs return pre-extracted JSON without direct page access. Use only for public pages or content the user is authorized to access.
+## Minimal request
+```python
+import requests
+# Multiple outputs (or include "json") → response is a JSON object
+resp = requests.post(
+    "https://api.hasdata.com/scrape/web",
+    headers={"x-api-key": API_KEY},
+    json={"url": "https://example.com", "outputFormat": ["markdown", "json"]},
+    timeout=300,
+)
+data = resp.json()
+assert data["requestMetadata"]["status"] == "ok"
+print(data["markdown"])
+# Single non-JSON output → response IS the raw content (markdown/html/text bytes)
+resp = requests.post(
+    "https://api.hasdata.com/scrape/web",
+    headers={"x-api-key": API_KEY},
+    json={"url": "https://example.com", "outputFormat": ["markdown"]},
+    timeout=300,
+)
+print(resp.text)            # raw markdown — no JSON parsing
+```
+## Body parameters
+| Parameter | Type | Notes |
+|---|---|---|
+| `url` | string | **Required.** Absolute URL. |
+| `outputFormat` | string[] | `html`, `text`, `markdown`, `json`. **Single non-JSON format → raw content as the body** (not JSON-wrapped); multiple formats → JSON object with one key per format. Always include `"json"` (or another format) when you also need `requestMetadata`. |
+| `proxyType` | enum | `datacenter` (default) or `residential` — use residential only for authorized geo/availability testing where terms and access controls permit it. |
+| `proxyCountry` | string | ISO 3166-1 alpha-2 — `US`, `UK`, `DE`, `FR`, `IT`, `SE`, `BR`, `CA`, `JP`, `SG`, `IN`, `ID`, `IE`. |
+| `jsRendering` | bool | Headless browser — required for SPAs and dynamically-injected content. |
+| `wait` / `waitFor` | int (ms) / CSS string | Fixed delay vs. wait-until-selector. Prefer `waitFor`. |
+| `jsScenario` | array | Sequence of click/fill/wait/scroll/evaluate. Requires `jsRendering`. |
+| `headers` | object | Custom headers. **Cookies go here too — no separate `cookies` parameter.** |
+| `screenshot` | bool | Returns a CDN URL in the response. |
+| `extractRules` | object | CSS selectors → field text. `@attr` for attributes. **First match only**, missing → `null`. |
+| `aiExtractRules` | object | Typed LLM extraction. Types: `string`, `number`, `boolean`, `list`, `item`. |
+| `extractEmails` / `extractLinks` | bool | Quick helpers. |
+| `blockResources` / `blockAds` | bool | Skip images/CSS/ads — speeds text-only scrapes. |
+| `blockUrls` | string[] | Glob patterns to block subresources. |
+| `removeBase64Images` | bool | Strip inline base64 from response. |
+| `includeOnlyTags` / `excludeTags` | string[] | Trim DOM before serialization. |
+## CSS extraction (`extractRules`)
+```python
+"extractRules": {
+    "title": "h1",
+    "links": "a @href",   # @attr extracts attribute
+    "price": ".price-now",
+}
+```
+First match per selector. For lists of records, use `aiExtractRules` with `type: "list"`.
+## AI extraction (`aiExtractRules`)
+```python
+"aiExtractRules": {
+    "title":    {"type": "string"},
+    "price":    {"type": "number"},
+    "in_stock": {"type": "boolean"},
+    "tags":     {"type": "list", "description": "category tags"},
+    "author":   {"type": "item", "output": {
+        "name":     {"type": "string"},
+        "verified": {"type": "boolean"},
+    }},
+    "reviews":  {"type": "list", "output": {
+        "rating": {"type": "number"},
+        "text":   {"type": "string"},
+    }},
+}
+```
+Use when layout varies across pages; otherwise prefer `extractRules` for determinism and predictability.
+## JS scenarios
+```python
+"jsScenario": [
+    {"fill": ["#email", "user@example.com"]},
+    {"fill": ["#password", PASSWORD]},
+    {"click": "#login"},
+    {"waitFor": ".dashboard"},
+    {"scrollY": 2000},
+    {"waitForAndClick": ".load-more"},
+    {"evaluate": "window.__APP_STATE__"},
+]
+```
+Actions: `click`, `fill: [sel, val]`, `wait: ms`, `waitFor: sel`, `waitForAndClick: sel`, `scrollX/scrollY: px`, `evaluate: "JS"`. Sequential. Missing element on `click`/`fill` fails the request — wrap with `waitFor` first.
+## Auth via cookies
+```python
+"headers": {
+    "User-Agent": "Mozilla/5.0 ...",
+    "Cookie": "session=abc; csrf=xyz",
+    "Accept-Language": "en-US,en;q=0.9",
+}
+```
+Capture cookies once in a real browser (devtools → Storage → Cookies), forward via the `Cookie` header. Only with explicit user permission and authority to access that account/content; never use cookies to bypass someone else's access controls.
+## Slim response & speed
+```python
+{
+    "blockResources": True,                       # skip images/CSS/fonts
+    "blockAds": True,                             # skip ad/tracking
+    "blockUrls": ["**.googletagmanager.com/**", "**.doubleclick.net/**"],
+    "removeBase64Images": True,
+    "excludeTags": ["script", "style", "nav", "footer"],
+}
+```
+Reduces response size 60–90% on noisy pages.
+## Response shape
+The wrapper is JSON **only when** the response is JSON-wrapped — i.e. multiple `outputFormat` values, or a single value that includes `"json"`. With a single non-JSON format the response body is the raw content (`text/markdown`, `text/html`, `text/plain`).
+```json
+{
+  "requestMetadata": { "id": "uuid", "status": "ok", "url": "..." },
+  "headers": { "content-type": "text/html" },
+  "screenshot": "https://...jpeg",
+  "content": "<!DOCTYPE html>...",     // outputFormat: html
+  "markdown": "# Title\n...",           // outputFormat: markdown
+  "text":     "Title\n...",
+  "extractRules":    { ... },           // present iff sent
+  "aiExtractRules":  { ... },           // present iff sent
+  "extractedEmails": [ ... ],           // iff extractEmails: true
+  "extractedLinks":  [ ... ]            // iff extractLinks: true
+}
+```
+## Batch (`POST /scrape/batch/web`)
+Async wrapper for >1k URLs running the same extraction. Returns `jobId`; poll status, page `/results`. Per-batch cap **10,000 URLs**. For small workloads loop the sync endpoint at concurrency = plan limit.
+## Gotchas
+- **Disable `jsRendering` first**, enable only when the page needs it — most static pages parse fine without a headless browser.
+- **`waitFor` > `wait`.** Selector-based waits adapt to network speed.
+- **Cookies via `headers["Cookie"]` only.**
+- **`extractRules` returns first match** — for arrays use `aiExtractRules` `type: "list"`.
+- **Set client timeout ≥ 300 s** to match the server deadline.
+- **`requestMetadata.status === "ok"` is the only success signal.**

package/bundled-skills/hasdata/references/youtube.md ADDED Viewed

@@ -0,0 +1,186 @@
+# YouTube APIs
+| Endpoint | Returns |
+|---|---|
+| `/scrape/youtube/search` | Search results — videos, shorts, channels, playlists |
+| `/scrape/youtube/video` | Single video metadata (stats, captions, related) |
+| `/scrape/youtube/channel` | Channel home / videos / shorts / playlists / community |
+| `/scrape/youtube/transcript` | Full transcript with millisecond offsets |
+All synchronous `GET`. 10 credits each.
+## YouTube Search
+```python
+import requests
+resp = requests.get(
+    "https://api.hasdata.com/scrape/youtube/search",
+    headers={"x-api-key": API_KEY},
+    params={"q": "anthropic claude", "sortBy": "views", "date": "month"},
+    timeout=300,
+)
+for v in resp.json().get("videoResults", []):
+    print(v["title"], v.get("extractedViews"), v["link"])
+```
+### Query parameters
+| Param | Notes |
+|---|---|
+| `q` | **Required.** Free-text query. |
+| `sortBy` | `relevance` (default), `date`, `views`, `rating`, `popularity`. |
+| `date` | Upload window: `hour`, `today`, `week`, `month`, `year`. |
+| `length` | Duration bucket: `under4`, `between420`, `plus20`. |
+| `videoType` | `video`, `shorts`, `channel`, `playlist`, `movie`. |
+| `filters[]` | Feature flags ANDed: `hd`, `k4`, `hdr`, `subtitles`, `cc`, `d3`, `d360`, `vr180`, `live`, `bought`, `location`. |
+| `gl` / `hl` | Two-letter country / language codes. |
+| `deviceType` | `desktop`, `mobile`. |
+| `paginationToken` | Opaque cursor from the previous `pagination.nextPageToken`. |
+| `sp` | Raw YouTube `sp=` token (overrides `sortBy`, `date`, `videoType`, `length`, `filters[]`). |
+Response: `videoResults`, `shortsResults`, `channelResults`, `playlistResults`, `adsResults`, `sponsoredResults`, `searchInformation`, `pagination`.
+Per-video result keys (verified live): `videoId`, `title`, `link`, `channel`, `description`, `length`, `views`, `viewsOriginal`, `publishedDate`, `thumbnail`, `positionOnPage`. `channel` is an object — read `.channel.name` and `.channel.link`.
+## YouTube Video
+```python
+resp = requests.get(
+    "https://api.hasdata.com/scrape/youtube/video",
+    headers={"x-api-key": API_KEY},
+    params={"v": "dQw4w9WgXcQ"},
+    timeout=300,
+)
+```
+| Param | Notes |
+|---|---|
+| `v` | **Required.** 11-character YouTube video ID — the `v=` query value. |
+| `gl` / `hl` | Country / language. |
+| `deviceType` | `desktop` / `mobile`. |
+Top-level keys: `videoId`, `title`, `description`, `channel`, `views`, `extractedViews`, `likes`, `extractedLikes`, `lengthSeconds`, `publishedDate`, `keywords`, `captions`, `socialLinks`, `music`, `category`, `thumbnail`, `isFamilySafe`, `isUnlisted`, `relatedVideos`, `relatedShorts`, `endScreenVideos`, `requestMetadata`.
+Use `extractedViews` / `extractedLikes` (integers) for math; `views` / `likes` are the formatted strings.
+## YouTube Channel
+```python
+resp = requests.get(
+    "https://api.hasdata.com/scrape/youtube/channel",
+    headers={"x-api-key": API_KEY},
+    params={"channelId": "@MrBeast", "tab": "videos"},
+    timeout=300,
+)
+```
+| Param | Notes |
+|---|---|
+| `channelId` | **Required.** `@handle`, canonical `UC…` ID, or legacy `/c/<custom>` / `/user/<name>` slug. |
+| `tab` | `featured` (default), `videos`, `shorts`, `streams`, `playlists`, `posts` / `community`, `podcasts`, `releases`, `about`, `store`. |
+| `paginationToken` | Cursor for tabs that paginate. |
+| `gl` / `hl` / `deviceType` | Standard. |
+Response: `channelInfo`, `featuredVideo`, `sections[]`.
+`channelInfo` (verified live): `name`, `handle`, `channelId`, `channelUrl`, `avatar`, `banner`, `description`, `subscribers`, `extractedSubscribers`, `videosCount`, `extractedVideosCount`, `keywords[]`, `availableTabs[]`, `verified`, `websiteUrl`, `rssUrl`, `isFamilySafe`.
+`sections[]` each have `title` + `items[]` — shape depends on the tab. Iterate generically:
+```python
+for sec in resp.json().get("sections", []):
+    for item in sec.get("items", []):
+        ...
+```
+## YouTube Transcript
+```python
+resp = requests.get(
+    "https://api.hasdata.com/scrape/youtube/transcript",
+    headers={"x-api-key": API_KEY},
+    params={"v": "dQw4w9WgXcQ", "languageCode": "en"},
+    timeout=300,
+)
+text = " ".join(seg["snippet"] for seg in resp.json().get("transcript", []))
+```
+| Param | Notes |
+|---|---|
+| `v` | **Required.** 11-character video ID. |
+| `languageCode` | BCP-47 / YouTube code (`en`, `de`, `en-US`, `pt-BR`). Must exist on the video. |
+| `type` | `asr` to fetch the auto-generated speech-recognition track when no human captions exist. |
+Response: `transcript[]` and `availableTranscripts[]`.
+Each `transcript[]` entry: `startMs`, `endMs`, `snippet`, `startTimeText` (e.g. `"0:18"`).
+## Patterns
+### Search → video → transcript fan-out
+```python
+def topic_corpus(query, k=5):
+    search = requests.get(
+        "https://api.hasdata.com/scrape/youtube/search",
+        headers={"x-api-key": API_KEY},
+        params={"q": query, "sortBy": "views"}, timeout=300,
+    ).json()
+    docs = []
+    for v in search.get("videoResults", [])[:k]:
+        tr = requests.get(
+            "https://api.hasdata.com/scrape/youtube/transcript",
+            headers={"x-api-key": API_KEY},
+            params={"v": v["videoId"]}, timeout=300,
+        ).json()
+        docs.append({
+            "videoId": v["videoId"],
+            "title":   v["title"],
+            "url":     v["link"],
+            "text":    " ".join(s["snippet"] for s in tr.get("transcript", [])),
+        })
+    return docs
+```
+### Channel velocity
+```python
+def channel_velocity(handle):
+    page = requests.get(
+        "https://api.hasdata.com/scrape/youtube/channel",
+        headers={"x-api-key": API_KEY},
+        params={"channelId": handle, "tab": "videos"}, timeout=300,
+    ).json()
+    return [
+        {"date": it.get("publishedDate"),
+         "views": it.get("extractedViews"),
+         "title": it.get("title")}
+        for sec in page.get("sections", [])
+        for it in sec.get("items", [])
+    ]
+```
+### Timestamp search inside a transcript
+```python
+def mentions(video_id, needle):
+    tr = requests.get(
+        "https://api.hasdata.com/scrape/youtube/transcript",
+        headers={"x-api-key": API_KEY},
+        params={"v": video_id}, timeout=300,
+    ).json()
+    return [(s["startTimeText"], s["snippet"])
+            for s in tr.get("transcript", [])
+            if needle.lower() in s["snippet"].lower()]
+```
+## Gotchas
+- **`v` is the 11-char ID, not the URL.** Strip the `v=` value first.
+- **`languageCode` must exist on the video.** Inspect `availableTranscripts[]` if a fetch fails, then retry.
+- **`type=asr` is required when no human-authored caption track exists.** Otherwise the API errors on videos with auto-only captions.
+- **`@handle` resolves to the same channel as the canonical `UC…` ID.** Prefer handles for readability.
+- **Pagination tokens are opaque** — pass them back verbatim via `paginationToken`.
+- **`extractedViews` / `extractedLikes`** are integers; `views` / `likes` are formatted strings. Use the integer fields for arithmetic.
+- **`channelInfo.rssUrl`** is the canonical RSS feed for the channel — use it to subscribe in podcast clients without scraping.

package/bundled-skills/hasdata-cli/SKILL.md ADDED Viewed

@@ -0,0 +1,169 @@
+---
+name: hasdata-cli
+description: Command-line access to search, scraping, and structured web data.
+risk: safe
+source: official
+source_type: official
+source_repo: HasData/hasdata-cli
+license: MIT
+license_source: "https://github.com/HasData/hasdata-cli/blob/main/LICENSE"
+date_added: "2026-06-04"
+---
+# hasdata
+Use the `hasdata` CLI for real-time web data. One subcommand per API — flags, enums, defaults are derived from the live schema at `api.hasdata.com/apis`.
+## When to Use
+Use this skill when:
+- The user wants to use the HasData CLI.
+- The user needs current web data from the command line.
+- The user wants to automate data collection in scripts.
+- The user wants to retrieve search, ecommerce, travel, or local business data.
+- The user needs web-page scraping through the CLI.
+## Prerequisites
+- `command -v hasdata` — if missing, download the installer from `https://raw.githubusercontent.com/HasData/hasdata-cli/main/install.sh`, inspect it, then run it locally with `sh install.sh`.
+- One-time setup: the user runs `hasdata configure`, pastes their API key, and it's saved to `~/.hasdata/config.yaml` (mode 0600). Every future call picks it up automatically.
+- If a call fails with `no API key configured`, the user hasn't run `hasdata configure` yet — tell them to. **Never invent a key.**
+## Quick start
+```bash
+hasdata <api> --flag value [--flag value ...] --raw | jq .
+```
+Always pass `--raw` when piping to `jq` (skips pretty-print and TTY detection). Use `--pretty` only for human-readable terminal output.
+## Picking the right subcommand
+| User intent | Subcommand |
+| --- | --- |
+| Web search ("what does Google say about…") | `google-serp` (full features) or `google-serp-light` (cheap, single page) |
+| Latest news | `google-news` |
+| AI Mode SERP | `google-ai-mode` |
+| Shopping / product prices | `google-shopping` (broad), `amazon-search` / `amazon-product` (Amazon), `shopify-products` (Shopify) |
+| Immersive product page | `google-immersive-product` |
+| Maps / places / reviews | `google-maps`, `google-maps-place`, `google-maps-reviews`, `google-maps-photos`, `google-maps-posts` |
+| Yelp / YellowPages local data | `yelp-search`, `yelp-place`, `yellowpages-search`, `yellowpages-place` |
+| Real-estate listings (homes for sale/rent/sold) | `zillow-listing`, `redfin-listing` |
+| Real-estate single property deep dive | `zillow-property`, `redfin-property` |
+| Travel — short-term rentals | `airbnb-listing`, `airbnb-property` |
+| Travel — hotels / lodging | `booking-search`, `booking-place` |
+| Travel — flights | `google-flights` |
+| Jobs | `indeed-listing`, `indeed-job`, `glassdoor-listing`, `glassdoor-job` |
+| Bing search | `bing-serp` |
+| Trends | `google-trends` |
+| Images | `google-images` |
+| Short videos | `google-short-videos` |
+| Events | `google-events` |
+| YouTube search / video / channel / transcript | `youtube-search-api`, `youtube-video-api`, `youtube-channel-api`, `youtube-transcript-api` |
+| Instagram profile | `instagram-profile` |
+| Amazon seller | `amazon-seller`, `amazon-seller-products` |
+| **Scrape a specific URL** | `web-scraping` — supports JS rendering, proxies, markdown output, AI extraction, screenshots |
+For exact flags of a subcommand, run `hasdata <api> --help` or read the matching file in `references/`.
+## Non-obvious triggers (when to reach for hasdata even if the user doesn't say "scrape")
+The user often won't ask for a SERP API or a scraper directly. Map these intents to the skill:
+- **"Is this still true?" / "What's the latest on X?" / "Has Y happened yet?"** — LLM training data is stale. Run `google-serp` or `google-news` to ground the answer.
+- **"Summarize this article" / "TL;DR this URL"** — Use `web-scraping --output-format markdown` and feed the markdown into the summary prompt. Beats copy-paste because it strips ads, nav, scripts.
+- **"Verify this link" / "Is this site real?"** — `web-scraping --url X --no-block-resources` returns status + screenshot. Or `google-serp --q "site:example.com"`.
+- **"What does X say about itself?"** — Pull the company's own homepage with `web-scraping --output-format markdown`, then summarize.
+- **"Find me alternatives to X"** — `google-serp --q "X alternatives"` or `google-shopping --q "X competitors"`.
+- **"What's the going rate for X?"** — `google-shopping` (broad) or `amazon-search` (Amazon-specific) with `jq` to extract the price distribution.
+- **"Phone number / address for X"** — `google-maps-place` or `yelp-place`. Don't guess from training data.
+- **"Are people happy with X service?" / "Is X reputable?"** — `google-maps-reviews --place-id ... --sort lowest` for negative samples; `glassdoor-job` for employer rep.
+- **"What's the salary range for Y role?"** — `indeed-listing` filtered by role + location, then `jq` over `.jobs[].salary`.
+- **"Find me homes/apartments matching X criteria"** — `zillow-listing` / `redfin-listing` / `airbnb-listing` with the corresponding filters.
+- **"Recent sold comps near X"** — `zillow-listing --type sold --keyword "X" --days-on-zillow 12m`.
+- **"Track this product's price"** — Loop `amazon-product --asin X` on a schedule; persist `.price` to a file.
+- **"Summarize / cite this YouTube video"** — `youtube-transcript-api --v-param VID --raw | jq -r '.transcript[].snippet'` → feed to the summary prompt. Beats title/thumbnail-based guesses.
+- **"Find a hotel in $CITY for $DATES under $BUDGET"** — `booking-search --keyword $CITY --check-in-date X --check-out-date Y --adults 2 --children 0 --rooms 1 --price-max $BUDGET --sort priceLowestFirst`. For one specific property, `booking-place --url ...` returns the full room/rate matrix.
+- **"What's this channel pushing lately?"** — `youtube-channel-api --channel-id @handle --tab videos --raw | jq '.sections[].items[] | {title, publishedDate, views: .extractedViews}'`.
+- **"Does this business have an active offer / event?"** — `google-maps-posts --place-id X --raw | jq '.posts[] | {postedAt, description, cta}'`. Surfaces current promotions Google indexed.
+- **"What's trending around X?"** — `google-trends --q "X"` for relative interest; `google-news --q "X"` for headlines.
+- **"Find businesses near me that do X"** — `google-maps --q "X" --ll "@LAT,LNG,12z"` then fan out `google-maps-place` for contacts.
+- **"How does this look in country Y?"** — `--gl Y` on SERP commands, `--proxy-country Y` on `web-scraping`. Useful for geo-targeted SEO checks, geo-blocked content.
+- **"Pull structured data from this page"** — `web-scraping --ai-extract-rules-json '{"price": {"type": "number"}, ...}'`. Works on arbitrary pages without writing CSS selectors.
+- **"List of items → per-item details"** — Pattern: search command produces IDs/URLs, pipe through `xargs` into the matching `*-property` / `*-product` / `*-place` deep-dive command.
+- **"Find this person's role / employer / LinkedIn / followers"** — `google-serp --q '"Person Name" linkedin'` first. The organic-result title is typically `Name — Role at Company | LinkedIn` and the snippet carries location, headline, connection count. SERP often answers the whole question without ever opening the profile page.
+- **"What is company X doing? Where's their HQ? Who works there?"** — `google-serp --q "$COMPANY"` returns a `.knowledge_graph` block with founder, HQ, founded year, parent, employee range — pre-extracted. `google-news --q "$COMPANY"` for recent activity. Specific facts via targeted SERP: `--q '"$COMPANY" headquarters'`, `--q '"$COMPANY" funding'`, `--q 'site:linkedin.com/company "$COMPANY"'`.
+- **"Find public contact channels for company X"** — start with SERP: `--q '"@example.com"'` often surfaces publicly indexed business addresses. For personal emails or phone numbers, require a legitimate purpose, user authorization, and privacy-law/terms compliance; disclose unverified guesses.
+- **"Enrich this CSV of leads"** — per row: `google-serp` for LinkedIn, role, employer; another SERP to verify email or pattern. Stay in SERP unless a specific field is missing.
+- **Reverse-lookup (email / phone / domain → identity)** — `google-serp` with the literal value in quotes (`--q '"jane@x.com"'`, `--q '"+1 555 123 4567"'`, `--q '"acme corp" site:example.com'`) almost always surfaces the matching person or business.
+**SERP-first principle**: for any data-enrichment intent (people, companies, emails, products, places), reach for `google-serp` / `google-news` / `google-shopping` / `google-maps` first. They return Google's already-extracted structured fields (`.knowledge_graph`, `.organic_results[].snippet`, `.local_results[]`, etc.) without direct access to the target site. Only escalate to `web-scraping` when SERP doesn't surface the specific field you need, the data is public or authorized, and the target's terms/access controls allow it. See `references/enrichment.md`.
+If a user request matches one of the above and you don't invoke hasdata, you're probably hallucinating a stale answer.
+## Universal flag patterns
+- **Kebab-case** flag names. The CLI maps them back to the original camelCase before sending to the API.
+- **Booleans defaulting to `true`** have a paired negation: `--no-block-ads`, `--no-screenshot`, `--no-js-rendering`, `--no-extract-emails`, `--no-block-resources`. Setting both `--block-ads` and `--no-block-ads` errors.
+- **Anything ending in `-json`** accepts:
+    - inline JSON: `--extract-rules-json '{"title":"h1"}'`
+    - file: `--extract-rules-json @rules.json`
+    - stdin: `cat rules.json | hasdata web-scraping ... --extract-rules-json -`
+- **Repeatable key=value** flags split on the first `=` (so values containing `=` survive): `--headers User-Agent=foo --headers Cookie=session=abc`. Pair with `--headers-json` for a JSON base; kv items override per key.
+- **List flags** accept either repeats or comma-joined: `--lr lang_en --lr lang_fr` or `--lr lang_en,lang_fr`. Serialized as `key[]=value` for GET endpoints.
+- **Enum flags** validate client-side. If you guess wrong, the error lists the allowed values — read the message and retry.
+## Global flags (apply to every subcommand)
+| Flag | Effect |
+| --- | --- |
+| `--raw` | Write response bytes as-is (use this when piping to `jq`) |
+| `--pretty` | Pretty-print JSON (default when stdout is a TTY) |
+| `-o, --output FILE` | Write response to file instead of stdout (works for binary like screenshots) |
+| `--verbose` | Log outgoing URL and `X-RateLimit-*` headers to stderr |
+| `--api-key KEY` | Override env var (rarely needed) |
+| `--timeout DURATION` | Per-request timeout (default 2m) |
+| `--retries N` | Max retries on 429/5xx (default 2) |
+## Output contract
+Responses are JSON. Pipe through `jq` for extraction:
+```bash
+hasdata google-serp --q "espresso machine" --num 10 --raw \
+  | jq -c '.organic_results[] | {title, link, snippet}'
+```
+For real-estate / e-commerce results, the array shape is API-specific — read a single response with `--pretty` first to learn the schema, then write the `jq` filter.
+## Exit codes (script-safe)
+| Code | Meaning |
+| --- | --- |
+| 0 | success |
+| 1 | user / CLI-input error (missing required flag, bad enum value, missing API key) |
+| 2 | network error |
+| 3 | API returned 4xx (auth, quota, validation) |
+| 4 | API returned 5xx |
+## References
+- [`references/enrichment.md`](references/enrichment.md) — **person and company enrichment** (LinkedIn lookup, emails, HQ/funding/news, CSV-row enrichment, reverse-lookup) — the highest-leverage cross-API workflows
+- [`references/search.md`](references/search.md) — Google SERP / Bing / News / Trends flag catalog
+- [`references/web-scraping.md`](references/web-scraping.md) — `web-scraping` flags, JS scenarios, AI extraction
+- [`references/real-estate.md`](references/real-estate.md) — Zillow / Redfin filters and bracketed params
+- [`references/travel.md`](references/travel.md) — Airbnb / Booking / Google Flights (lodging + transport)
+- [`references/ecommerce.md`](references/ecommerce.md) — Amazon / Shopify
+- [`references/local-business.md`](references/local-business.md) — Maps (search/place/reviews/photos/posts) / Yelp / YellowPages
+- [`references/jobs.md`](references/jobs.md) — Indeed / Glassdoor
+- [`references/youtube.md`](references/youtube.md) — search / video / channel / transcript
+- [`references/all-commands.md`](references/all-commands.md) — full subcommand index with credit costs
+## Limitations
+* Requires access to HasData services and valid credentials.
+* Data quality and available fields depend on the target website and extraction method used.
+* Website changes can impact extraction results and may require adjustments to extraction logic.
+* Rate limits, quotas, and account restrictions may apply depending on the endpoint and subscription plan.

package/bundled-skills/hasdata-cli/references/all-commands.md ADDED Viewed

@@ -0,0 +1,107 @@
+# All commands
+Authoritative source: `hasdata --help`. This file is a snapshot — when in doubt, run `hasdata <api> --help` directly.
+## Search
+| Command | Cost | Notes |
+| --- | --- | --- |
+| `google-serp` | 10 | Full Google SERP — organic, ads, knowledge graph, PAA, AI overview |
+| `google-serp-light` | 5 | Cheap single-page SERP |
+| `google-ai-mode` | 5 | Google AI Mode answer |
+| `google-news` | 10 | Google News results |
+| `google-shopping` | 10 | Google Shopping results |
+| `google-immersive-product` | 5 | Immersive product page details |
+| `google-events` | 5 | Google Events |
+| `google-short-videos` | 10 | Short videos panel |
+| `google-trends` | 5 | Search trends data |
+| `google-images` | 5 | Image search |
+| `bing-serp` | 10 | Bing SERP |
+## Maps & local
+| Command | Cost | Notes |
+| --- | --- | --- |
+| `google-maps` | 5 | Maps search |
+| `google-maps-place` | 5 | Single place by place_id |
+| `google-maps-reviews` | 5 | Place reviews |
+| `google-maps-contributor-reviews` | 5 | Reviews by contributor |
+| `google-maps-photos` | 5 | Place photos |
+| `google-maps-posts` | 10 | Business-owner posts (offers, events, announcements) |
+| `yelp-search` | 5 | Yelp business search |
+| `yelp-place` | 5 | Single Yelp business |
+| `yellowpages-search` | 5 | YellowPages search |
+| `yellowpages-place` | 5 | Single YellowPages listing |
+## E-commerce
+| Command | Cost | Notes |
+| --- | --- | --- |
+| `amazon-search` | 5 | Amazon search results |
+| `amazon-product` | 5 | Amazon product by ASIN |
+| `amazon-seller` | 5 | Amazon seller profile |
+| `amazon-seller-products` | 5 | Seller's product catalog |
+| `shopify-products` | 5 | Any Shopify store's products |
+| `shopify-collections` | 5 | Shopify collections |
+## Real estate
+| Command | Cost | Notes |
+| --- | --- | --- |
+| `zillow-listing` | 5 | Zillow filtered search |
+| `zillow-property` | 5 | Zillow single property |
+| `redfin-listing` | 5 | Redfin filtered search |
+| `redfin-property` | 5 | Redfin single property |
+## Travel
+| Command | Cost | Notes |
+| --- | --- | --- |
+| `airbnb-listing` | 5 | Airbnb filtered search |
+| `airbnb-property` | 5 | Airbnb single listing |
+| `booking-search` | 10 | Booking.com search (hotels, apartments) |
+| `booking-place` | 10 | Booking.com property + room/rate list |
+| `google-flights` | 15 | Flight search via Google Flights |
+## Jobs
+| Command | Cost | Notes |
+| --- | --- | --- |
+| `indeed-listing` | 5 | Indeed search |
+| `indeed-job` | 5 | Single Indeed posting |
+| `glassdoor-listing` | 10 | Glassdoor search (with ratings) |
+| `glassdoor-job` | 10 | Single Glassdoor posting |
+## Web
+| Command | Cost | Notes |
+| --- | --- | --- |
+| `web-scraping` | 10 | Arbitrary URL — JS rendering, AI extraction, markdown output, screenshots |
+## Social
+| Command | Cost | Notes |
+| --- | --- | --- |
+| `instagram-profile` | 5 | Instagram profile by handle |
+## YouTube
+| Command | Cost | Notes |
+| --- | --- | --- |
+| `youtube-search-api` | 10 | YouTube search (videos, shorts, channels, playlists) |
+| `youtube-video-api` | 10 | Single video metadata + stats + related |
+| `youtube-channel-api` | 10 | Channel home / videos / shorts / playlists / community |
+| `youtube-transcript-api` | 10 | Full transcript with millisecond offsets |
+## Utility (no cost)
+| Command | Notes |
+| --- | --- |
+| `configure` | Interactive setup; writes ~/.hasdata/config.yaml |
+| `version` | Print version |
+| `update` | Self-update from GitHub Releases |
+| `completion {bash\|zsh\|fish\|powershell}` | Generate shell completion |
+## Deprecated
+`amazon-reviews` is deprecated — Amazon now requires login to access reviews. The subcommand still exists for backwards compatibility but currently returns no data; don't suggest it.