opencode-skills-collection 3.0.36 → 3.0.38

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (73) hide show
  1. package/bundled-skills/.antigravity-install-manifest.json +13 -1
  2. package/bundled-skills/2slides-ppt-generator/SKILL.md +786 -0
  3. package/bundled-skills/2slides-ppt-generator/references/api-reference.md +499 -0
  4. package/bundled-skills/2slides-ppt-generator/references/mcp-integration.md +282 -0
  5. package/bundled-skills/2slides-ppt-generator/references/pricing.md +195 -0
  6. package/bundled-skills/2slides-ppt-generator/scripts/api_constants.py +87 -0
  7. package/bundled-skills/2slides-ppt-generator/scripts/create_pdf_slides.py +159 -0
  8. package/bundled-skills/2slides-ppt-generator/scripts/download_slides_pages_voices.py +157 -0
  9. package/bundled-skills/2slides-ppt-generator/scripts/generate_narration.py +197 -0
  10. package/bundled-skills/2slides-ppt-generator/scripts/generate_slides.py +247 -0
  11. package/bundled-skills/2slides-ppt-generator/scripts/get_job_status.py +106 -0
  12. package/bundled-skills/2slides-ppt-generator/scripts/search_themes.py +137 -0
  13. package/bundled-skills/anti-sycophancy/README.md +86 -0
  14. package/bundled-skills/anti-sycophancy/SKILL.md +40 -0
  15. package/bundled-skills/antigravity-agent-manager/SKILL.md +112 -0
  16. package/bundled-skills/docs/integrations/jetski-cortex.md +3 -3
  17. package/bundled-skills/docs/integrations/jetski-gemini-loader/README.md +1 -1
  18. package/bundled-skills/docs/maintainers/repo-growth-seo.md +3 -3
  19. package/bundled-skills/docs/maintainers/skills-update-guide.md +1 -1
  20. package/bundled-skills/docs/sources/sources.md +1 -0
  21. package/bundled-skills/docs/users/bundles.md +1 -1
  22. package/bundled-skills/docs/users/claude-code-skills.md +1 -1
  23. package/bundled-skills/docs/users/gemini-cli-skills.md +1 -1
  24. package/bundled-skills/docs/users/getting-started.md +1 -1
  25. package/bundled-skills/docs/users/kiro-integration.md +1 -1
  26. package/bundled-skills/docs/users/usage.md +4 -4
  27. package/bundled-skills/docs/users/visual-guide.md +4 -4
  28. package/bundled-skills/event-staffing-compliance/SKILL.md +91 -0
  29. package/bundled-skills/event-staffing-ordering/SKILL.md +119 -0
  30. package/bundled-skills/examprep-ai/SKILL.md +446 -0
  31. package/bundled-skills/hasdata/SKILL.md +107 -0
  32. package/bundled-skills/hasdata/references/code-recipes.md +150 -0
  33. package/bundled-skills/hasdata/references/ecommerce.md +116 -0
  34. package/bundled-skills/hasdata/references/jobs.md +111 -0
  35. package/bundled-skills/hasdata/references/local-business.md +145 -0
  36. package/bundled-skills/hasdata/references/real-estate.md +84 -0
  37. package/bundled-skills/hasdata/references/scraper-jobs.md +252 -0
  38. package/bundled-skills/hasdata/references/search.md +154 -0
  39. package/bundled-skills/hasdata/references/travel.md +202 -0
  40. package/bundled-skills/hasdata/references/web-scraping.md +159 -0
  41. package/bundled-skills/hasdata/references/youtube.md +186 -0
  42. package/bundled-skills/hasdata-cli/SKILL.md +169 -0
  43. package/bundled-skills/hasdata-cli/references/all-commands.md +107 -0
  44. package/bundled-skills/hasdata-cli/references/ecommerce.md +106 -0
  45. package/bundled-skills/hasdata-cli/references/enrichment.md +227 -0
  46. package/bundled-skills/hasdata-cli/references/jobs.md +84 -0
  47. package/bundled-skills/hasdata-cli/references/local-business.md +123 -0
  48. package/bundled-skills/hasdata-cli/references/real-estate.md +126 -0
  49. package/bundled-skills/hasdata-cli/references/search.md +122 -0
  50. package/bundled-skills/hasdata-cli/references/travel.md +102 -0
  51. package/bundled-skills/hasdata-cli/references/web-scraping.md +181 -0
  52. package/bundled-skills/hasdata-cli/references/youtube.md +145 -0
  53. package/bundled-skills/linkedin-content-generator/SKILL.md +492 -0
  54. package/bundled-skills/linkedin-content-generator/scripts/generate_calendar.py +82 -0
  55. package/bundled-skills/linkedin-content-generator/scripts/generate_carousel.py +69 -0
  56. package/bundled-skills/linkedin-content-generator/scripts/generate_newsletter.py +64 -0
  57. package/bundled-skills/linkedin-content-generator/scripts/generate_post.py +77 -0
  58. package/bundled-skills/linkedin-content-generator/scripts/memory.md +49 -0
  59. package/bundled-skills/linkedin-content-generator/scripts/memory_manager.py +134 -0
  60. package/bundled-skills/linkedin-content-generator/scripts/utils.py +96 -0
  61. package/bundled-skills/permission-manager/README.md +22 -0
  62. package/bundled-skills/permission-manager/SKILL.md +54 -0
  63. package/bundled-skills/skill-suggester/README.md +14 -0
  64. package/bundled-skills/skill-suggester/SKILL.md +69 -0
  65. package/bundled-skills/smart-git-automation/README.md +31 -0
  66. package/bundled-skills/smart-git-automation/SKILL.md +96 -0
  67. package/bundled-skills/vercel-optimize/lib/cost-coverage.mjs +3 -1
  68. package/bundled-skills/vercel-optimize/lib/render-report.mjs +2 -2
  69. package/bundled-skills/vercel-optimize/lib/util.mjs +7 -0
  70. package/bundled-skills/vercel-optimize/lib/verify-claim.mjs +2 -7
  71. package/bundled-skills/vercel-optimize/lib/workspace-resolver.mjs +2 -1
  72. package/package.json +2 -2
  73. package/skills_index.json +268 -0
@@ -0,0 +1,122 @@
1
+ # Search references
2
+
3
+ Subcommands: `google-serp`, `google-serp-light`, `google-ai-mode`, `google-news`, `google-shopping`, `bing-serp`, `google-trends`, `google-images`, `google-events`, `google-short-videos`, `google-immersive-product`.
4
+
5
+ For `google-flights`, see `travel.md`.
6
+
7
+ Run `hasdata <api> --help` for the live, authoritative flag set. Below are the commonly-used flags and example invocations.
8
+
9
+ ---
10
+
11
+ ## google-serp (10 credits)
12
+
13
+ ```bash
14
+ hasdata google-serp --q "QUERY" [--gl COUNTRY] [--hl LANG] [--num 10] [--start 0] --raw | jq .
15
+ ```
16
+
17
+ Common flags:
18
+ - `--q TEXT` (required) — search query
19
+ - `--gl us|gb|ca|de|fr|...` — country code; affects results
20
+ - `--hl en|es|fr|de|...` — UI language
21
+ - `--num 10..100` — results per page
22
+ - `--start 0|10|20...` — pagination offset (multiples of 10)
23
+ - `--location "Austin,Texas,United States"` — Google canonical location
24
+ - `--device-type desktop|mobile|tablet`
25
+ - `--tbm isch|vid|nws|shop|lcl` — search type
26
+ - `--safe active|off`
27
+ - `--lr lang_en --lr lang_fr` — restrict to language(s)
28
+ - `--domain google.com|google.co.uk|...`
29
+ - `--tbs cdr:1,cd_min:10/17/2018,cd_max:3/8/2021` — advanced search filters
30
+
31
+ Useful response fields (via `jq`):
32
+ - `.organic_results[] | {title, link, snippet}` — main results
33
+ - `.ai_overview` — AI Overview block (when present)
34
+ - `.answer_box`, `.knowledge_graph`, `.related_searches`, `.people_also_ask`
35
+ - `.local_results`, `.shopping_results`, `.news_results`
36
+
37
+ Example — top-10 organic for prompt grounding:
38
+ ```bash
39
+ hasdata google-serp --q "$Q" --num 10 --raw \
40
+ | jq -r '.organic_results[] | "- \(.title): \(.snippet)"'
41
+ ```
42
+
43
+ ## google-serp-light (5 credits)
44
+
45
+ Same flags as `google-serp` but cheaper and returns a single page. Use when the user wants quick results and doesn't need PAA/AI Overview/local sections.
46
+
47
+ ## google-ai-mode (5 credits)
48
+
49
+ Returns Google's AI Mode answer for a query. Same `--q` / `--gl` / `--hl` semantics.
50
+
51
+ ## google-news (10 credits)
52
+
53
+ ```bash
54
+ hasdata google-news --q "QUERY" [--gl us] [--hl en] --raw | jq '.news_results[]'
55
+ ```
56
+
57
+ Per-article fields: `title`, `link`, `source.name`, `date`, `snippet`, `thumbnail`.
58
+
59
+ ## google-shopping (10 credits)
60
+
61
+ ```bash
62
+ hasdata google-shopping --q "PRODUCT" [--gl us] --raw | jq '.shopping_results[]'
63
+ ```
64
+
65
+ Per-result: `title`, `link`, `price`, `extracted_price`, `source`, `rating`, `reviews`, `delivery`.
66
+
67
+ ## bing-serp (10 credits)
68
+
69
+ ```bash
70
+ hasdata bing-serp --q "QUERY" [--cc us] [--setlang en] --raw | jq '.organic_results[]'
71
+ ```
72
+
73
+ Use when the user explicitly asks for Bing or wants a non-Google second opinion.
74
+
75
+ ## google-trends (5 credits)
76
+
77
+ ```bash
78
+ hasdata google-trends --q "TERM" [--geo US] [--cat 0] [--time "today 12-m"] --raw | jq .
79
+ ```
80
+
81
+ Multiple terms for comparison: `--q "term1,term2,term3"` (comma-separated, max 5).
82
+
83
+ ## google-images (5 credits)
84
+
85
+ ```bash
86
+ hasdata google-images --q "QUERY" --raw | jq '.images_results[] | {title, original, source}'
87
+ ```
88
+
89
+ ## google-events (5 credits)
90
+
91
+ ```bash
92
+ hasdata google-events --q "concerts in austin" [--gl us] --raw | jq '.events_results[]'
93
+ ```
94
+
95
+ ## google-short-videos (10 credits) / google-immersive-product (5 credits)
96
+
97
+ Less common — run `--help` to see flags when needed.
98
+
99
+ ---
100
+
101
+ ## Non-obvious use cases
102
+
103
+ - **Fact-check a claim before answering** — instead of relying on training data, run `google-serp --q "EXACT CLAIM"` and check whether top results corroborate or contradict.
104
+ - **Resolve "what is the URL for X?"** — agents often hallucinate URLs. `google-serp --q "X official site" --num 3` and pick the result with the matching domain.
105
+ - **Source-find a quote** — `google-serp --q "\"the exact quoted text here\""` (escape the inner quotes). Returns the page that originated it.
106
+ - **Compare same query across regions** — same `--q`, different `--gl us|gb|de|fr` to see how SERP differs by geo. Useful for international SEO and "what do users in country Y see".
107
+ - **Time-bound search** — `--tbs qdr:d` (past day), `qdr:w` (week), `qdr:m` (month), `qdr:y` (year). Combine with `google-news` for fresh-only news. Or `--tbs cdr:1,cd_min:M/D/YYYY,cd_max:M/D/YYYY` for an explicit window.
108
+ - **Site-restricted search** — `--q "site:example.com TOPIC"` to search within one domain (better than scraping the site's own search box).
109
+ - **"What's been written about X recently?"** — `google-news --q "X" --gl us` then `jq` over `.news_results[] | select(.date | test("hours? ago|day ago"))`.
110
+ - **Discover competitors** — `google-serp --q "best alternatives to PRODUCT"` followed by extracting brand names from the top results' titles.
111
+ - **Find documentation links** — `google-serp --q "LIBRARY official docs"` instead of guessing a URL pattern.
112
+ - **Translate search intent** — `--hl de --gl de --q "the English query"` shows German-language ranking; useful for multi-locale SEO checks.
113
+ - **Trends over time** — `google-trends --q "term1,term2"` (comma-separated, ≤5 terms) for relative interest curves. Better than guessing "is X popular now" from training data.
114
+ - **Image-driven research** — `google-images --q "X"` to grab source URLs; pipe top results into `web-scraping` to read context.
115
+ - **Map-distance trick** — for "things near X", use `google-maps --ll "@LAT,LNG,Zz"` not a SERP query — much higher signal for proximity intent.
116
+ - **Fact-check pricing** — when training data has stale prices for SaaS or subscription services, search and read the pricing page rather than answering from memory.
117
+
118
+ ## Picking the right SERP variant
119
+
120
+ - Use `google-serp-light` when only top organic results are needed (skips PAA / AI overview / local sections).
121
+ - Use `google-serp` when you need the full SERP feature set (PAA, AI overview, knowledge graph, local pack).
122
+ - Cache results client-side when running the same query repeatedly — the CLI does not cache.
@@ -0,0 +1,102 @@
1
+ # Travel reference
2
+
3
+ Subcommands:
4
+ - `airbnb-listing`, `airbnb-property` (5 credits each) — short-term rentals.
5
+ - `booking-search`, `booking-place` (10 credits each) — hotels and other lodging on Booking.com.
6
+ - `google-flights` (15 credits) — flight prices and itineraries via Google Flights.
7
+
8
+ `*-listing` / `*-search` is the filtered search; `*-property` / `*-place` is a single-property deep dive.
9
+
10
+ For activities at the destination see `google-events` (in `search.md`); for ground transport scrape the operator's site with `web-scraping`.
11
+
12
+ ---
13
+
14
+ ## airbnb-listing
15
+
16
+ ```bash
17
+ hasdata airbnb-listing --location "Lisbon, Portugal" \
18
+ --check-in 2026-06-15 --check-out 2026-06-22 \
19
+ --adults 2 --price-max 200 --raw
20
+ ```
21
+
22
+ Run `--help` for the full filter set (room type, amenities, instant book, etc.).
23
+
24
+ ## airbnb-property
25
+
26
+ ```bash
27
+ hasdata airbnb-property --url "https://www.airbnb.com/rooms/12345678" --raw
28
+ ```
29
+
30
+ ## booking-search (10 credits)
31
+
32
+ ```bash
33
+ hasdata booking-search \
34
+ --keyword "Lisbon" \
35
+ --check-in-date 2026-07-10 --check-out-date 2026-07-13 \
36
+ --adults 2 --children 0 --rooms 1 \
37
+ [--price-min 50 --price-max 250] [--rating 4 --rating 5] \
38
+ [--review-score reviewScoreVeryGood --review-score reviewScoreSuperb] \
39
+ [--property-type hotels --property-type apartments] \
40
+ [--meals breakfastIncluded] [--facilities freeParking --facilities pool] \
41
+ [--sort priceLowestFirst|ratingHighToLow|topReviewed|...] \
42
+ [--page 2] [--currency USD] [--language en-us] \
43
+ --raw | jq '.results[]'
44
+ ```
45
+
46
+ Required (no defaults work in production, even though `--help` shows them): `--keyword`, `--check-in-date`, `--check-out-date`, `--adults`, `--children`, `--rooms`. Pass `--children 0` explicitly when none.
47
+
48
+ When `--children > 0`, **also pass `--children-ages-json '[5,7]'`** with one age per child (0–17). Booking rejects the request otherwise.
49
+
50
+ Bracketed price filters (`--price-min` / `--price-max`) require `>= 10` / `>= 20` respectively; one of the two is required when filtering on price.
51
+
52
+ Top-level response: `results`, `searchInformation`, `pagination`, `requestMetadata`. Per-result keys (verified live): `hotelId`, `roomId`, `title`, `url`, `location`, `rating`, `reviews`, `price`, `room`, `beds`, `bedTypes`, `policies`, `photo`.
53
+
54
+ ```bash
55
+ # Cheap-first filtered search
56
+ hasdata booking-search --keyword "Paris" \
57
+ --check-in-date 2026-08-01 --check-out-date 2026-08-04 \
58
+ --adults 2 --children 0 --rooms 1 \
59
+ --review-score reviewScoreVeryGood \
60
+ --sort priceLowestFirst --raw \
61
+ | jq -c '.results[] | {title, price: .price.total, rating, url}'
62
+ ```
63
+
64
+ ## booking-place (10 credits)
65
+
66
+ ```bash
67
+ hasdata booking-place \
68
+ --url "https://www.booking.com/hotel/fr/le-bristol-paris.html" \
69
+ --check-in-date 2026-07-10 --check-out-date 2026-07-13 \
70
+ --adults 2 --children 0 --rooms 1 \
71
+ [--currency USD] [--language en-us] \
72
+ --raw | jq .
73
+ ```
74
+
75
+ Required: `--url` (must be on `booking.com` / `www.booking.com`), stay dates, `--adults`, `--children`, `--rooms`.
76
+
77
+ Response: `overview`, `bookingDetails`, `rooms[]`, `facilities`, `houseRules`, `ratings`, `reviews`, `restaurants`, `breadcrumbs`, `questionsAndAnswers`. `overview` carries `id`, `title`, `address`, `description`, `propertyType`, `photos`, `highlights`, `mostPopularFacilities`. Each `rooms[i]` has `roomId`, `name`, `bedTypes`, `beds`, `facilities`, `otherFacilities`, `variants[]` (pricing/availability per package).
78
+
79
+ ## google-flights (15 credits)
80
+
81
+ ```bash
82
+ hasdata google-flights \
83
+ --departure-id "JFK" --arrival-id "LAX" \
84
+ --outbound-date 2026-06-15 --return-date 2026-06-22 \
85
+ --currency USD --raw | jq .
86
+ ```
87
+
88
+ Round-trip vs. one-way controlled by presence/absence of `--return-date`. Run `--help` for the full flag set (cabin class, max stops, preferred airlines, etc.).
89
+
90
+ ---
91
+
92
+ ## Non-obvious use cases
93
+
94
+ - **Hotel-vs-rental arbitrage** — same dates and party size via `booking-search` and `airbnb-listing`; compare nightly cost percentile-for-percentile. The cheaper platform isn't always the same one across cities.
95
+ - **Conference-room pricing audit** — `booking-search --keyword "$CITY" --check-in-date $START --check-out-date $END --sort priceHighestFirst` during a known conference window vs an idle week; the delta is the conference premium.
96
+ - **Family-friendly filter** — `--children-ages-json '[5,9]' --children 2 --rooms 1 --travel-group family` and Booking returns only properties that accept the party size with appropriate beds.
97
+ - **Loyalty-program portfolio** — `booking-search --keyword "$CITY" --raw | jq '.results[] | select(.title | test("Marriott|Hilton|Hyatt"))'` filters to chains you collect points with.
98
+ - **Airbnb price-arbitrage check** — same dates, same area, two `airbnb-listing` calls with different `--adults` counts to surface listings that don't scale per-person price. Sometimes the difference is the deal.
99
+ - **STR-vs-LTR feasibility** — pair `airbnb-listing` for nightly rates with `zillow-listing --type forSale` (see `real-estate.md`) for purchase price in the same area; compute gross yield client-side.
100
+ - **Flights without a travel API** — `google-flights` with `--departure-id`, `--arrival-id`, dates, and `--currency` for ad-hoc fare checks against a vendor like Skyscanner.
101
+ - **Multi-leg cost planning** — chain `google-flights` calls for each leg and sum `.best_flights[].price`; cheaper than the round-trip price-bot SaaS for one-off itineraries.
102
+ - **Trip-cost preview** — combine `google-flights` (transport) + `booking-search` / `airbnb-listing` (lodging) + `google-events` (activities, see `search.md`) into one cost estimate before pitching a destination.
@@ -0,0 +1,181 @@
1
+ # web-scraping reference
2
+
3
+ Subcommand: `web-scraping` (10 credits/call). Single endpoint for arbitrary URL scraping with JS rendering, proxies, AI extraction, screenshots, markdown conversion.
4
+
5
+ > **`web-scraping` is the last resort, not the default.** Before reaching for it, ask: would `google-serp` (or `google-news` / `google-shopping` / `google-maps`) already have the field I need? Google's `.knowledge_graph`, `.organic_results[].snippet`, `.local_results[]` carry pre-extracted public facts without direct page access. Only invoke `web-scraping` when:
6
+ >
7
+ > - the user gave you a specific URL to read, OR
8
+ > - SERP came up short for a specific field, OR
9
+ > - the target page renders content that doesn't show up in any SERP snippet.
10
+ >
11
+ > Use it only for public pages or content the user is authorized to access, and respect site terms, robots/access controls, privacy law, and rate limits.
12
+ >
13
+ > See `references/enrichment.md` for the SERP-first patterns.
14
+
15
+ ```bash
16
+ hasdata web-scraping --url "URL" [flags] --raw | jq .
17
+ ```
18
+
19
+ ## Required
20
+
21
+ - `--url URL` — target page
22
+
23
+ ## Output formats
24
+
25
+ - `--output-format html|text|markdown|json` (repeatable)
26
+ - One format → that format directly (e.g. `--output-format markdown` returns markdown text under `.markdown`)
27
+ - Multiple → JSON response with one key per format
28
+ - `--output-format json` combined with others → wraps everything in JSON
29
+
30
+ ```bash
31
+ # LLM-friendly markdown for prompt context
32
+ hasdata web-scraping --url "$URL" --output-format markdown --raw | jq -r .markdown
33
+ ```
34
+
35
+ ## Proxy & rendering
36
+
37
+ - `--proxy-type datacenter|residential` (default datacenter)
38
+ - `--proxy-country US|UK|DE|IE|FR|IT|SE|BR|CA|JP|SG|IN|ID` (default US)
39
+ - `--js-rendering` / `--no-js-rendering` (default on) — full headless browser
40
+ - `--block-ads` / `--no-block-ads` (default on)
41
+ - `--block-resources` / `--no-block-resources` (default on) — blocks images/CSS for speed
42
+ - `--screenshot` / `--no-screenshot` (default on; the result includes a screenshot URL)
43
+ - `--remove-base64-images` — strip inline base64 images from response
44
+ - `--extract-emails` / `--no-extract-emails` (default on)
45
+ - `--extract-links` (default off)
46
+
47
+ ## Wait controls
48
+
49
+ - `--wait MS` — fixed wait after page load
50
+ - `--wait-for "CSS_SELECTOR"` — wait until selector appears
51
+
52
+ ## Custom JS scenario (complex array — JSON only)
53
+
54
+ ```bash
55
+ hasdata web-scraping --url "$URL" \
56
+ --js-scenario-json '[
57
+ {"wait": 2000},
58
+ {"click": ".load-more"},
59
+ {"waitFor": ".item"},
60
+ {"scrollY": 1000},
61
+ {"fill": ["input#q", "espresso"]}
62
+ ]' --raw
63
+ ```
64
+
65
+ Supported actions: `evaluate`, `click`, `wait`, `waitFor`, `waitForAndClick`, `scrollX`, `scrollY`, `fill`. Executed sequentially.
66
+
67
+ Accepts raw JSON, `@file.json`, or `-` (stdin).
68
+
69
+ ## Headers (kvSlice + JSON escape)
70
+
71
+ ```bash
72
+ # Repeatable kv form (splits on first `=`)
73
+ hasdata web-scraping --url "$URL" \
74
+ --headers "User-Agent=hasdata-cli" \
75
+ --headers "Accept-Language=en-US,en;q=0.9" \
76
+ --headers "Cookie=session=abc=def" \
77
+ --raw
78
+
79
+ # JSON base + kv overrides
80
+ hasdata web-scraping --url "$URL" \
81
+ --headers-json '{"User-Agent":"base","X-Common":"shared"}' \
82
+ --headers "User-Agent=override" \
83
+ --raw
84
+ ```
85
+
86
+ ## CSS-selector data extraction (kvSlice or JSON)
87
+
88
+ ```bash
89
+ # Lightweight kv form: --extract-rules KEY=SELECTOR
90
+ hasdata web-scraping --url "https://quotes.toscrape.com" \
91
+ --extract-rules "quote=.quote .text" \
92
+ --extract-rules "author=.quote .author" \
93
+ --raw | jq .
94
+
95
+ # JSON form for complex selectors / attributes
96
+ hasdata web-scraping --url "$URL" \
97
+ --extract-rules-json '{"title":"h1","links":"a @href","price":".price-now"}' \
98
+ --raw
99
+ ```
100
+
101
+ `@href`, `@src`, etc. extract attributes. Without `@`, extracts text content.
102
+
103
+ ## AI extraction (LLM-driven)
104
+
105
+ ```bash
106
+ hasdata web-scraping --url "$URL" \
107
+ --ai-extract-rules-json '{
108
+ "headline": {"type": "string", "description": "the main story headline"},
109
+ "comments_count": {"type": "number"},
110
+ "is_paid_content": {"type": "boolean"},
111
+ "tags": {"type": "list", "description": "topic tags"},
112
+ "author": {"type": "item", "output": {
113
+ "name": {"type": "string"},
114
+ "verified": {"type": "boolean"}
115
+ }}
116
+ }' --raw | jq .
117
+ ```
118
+
119
+ Supported types: `string`, `number`, `boolean`, `list`, `item` (nested object — defines its shape under `output`).
120
+
121
+ ## Tag filtering
122
+
123
+ - `--include-only-tags "main,article"` (comma-joined CSS selectors) — keep only matching elements
124
+ - `--exclude-tags script --exclude-tags style` (repeatable) — remove elements
125
+
126
+ ```bash
127
+ hasdata web-scraping --url "$URL" \
128
+ --output-format markdown \
129
+ --include-only-tags "article,main" \
130
+ --exclude-tags script --exclude-tags style --exclude-tags nav \
131
+ --raw | jq -r .markdown
132
+ ```
133
+
134
+ ## URL blocklist
135
+
136
+ ```bash
137
+ --block-urls-json '["**.googletagmanager.com/**","**.doubleclick.net/**"]'
138
+ ```
139
+
140
+ Glob patterns block specific subresource URLs from loading.
141
+
142
+ ## Saving binary output
143
+
144
+ The `web-scraping` response is JSON, but if `--output-format` is set to a single non-JSON format, the wrapped result is still JSON. Use `jq -r .markdown > file.md` to extract text. For screenshots specifically, the response contains a screenshot URL — fetch it separately with `curl`.
145
+
146
+ ## Non-obvious use cases
147
+
148
+ - **Page-to-prompt grounding** — `--output-format markdown` produces clean LLM-ready text from any URL. Strip nav/ads with `--exclude-tags script --exclude-tags style --exclude-tags nav`. Beats fetch + regex.
149
+ - **JavaScript-rendered SPAs that `curl` can't read** — default `--js-rendering` uses a real browser, so React/Vue/Angular pages return their hydrated DOM, not the empty shell.
150
+ - **Geo/availability testing where allowed** — `--proxy-type residential` can model residential network availability; use only for authorized tests where the target's terms and access controls permit it.
151
+ - **Geo-targeted content** — `--proxy-country DE` to see what users in Germany see (different prices, currencies, A/B variants, or geo-blocked content).
152
+ - **Quick "is this page real" check** — `--screenshot` (default on) returns a screenshot URL in the response; verify visually without manually opening the URL.
153
+ - **Universal price extractor** — `--ai-extract-rules-json '{"price":{"type":"number"},"currency":{"type":"string"},"in_stock":{"type":"boolean"}}'` works on arbitrary retailer pages without writing a CSS selector. Cheaper than maintaining per-site selectors when the user only needs occasional spot-checks.
154
+ - **Authenticated content with user authority** — `--headers Cookie=session=...` injects auth cookies if the user has them. Use only with explicit permission and authority to access that account/content; never use cookies to bypass someone else's access controls.
155
+ - **Convert paginated lists to a clean record set** — combine `--js-scenario-json` (click "Load more" 5×) with `--ai-extract-rules-json` (pull the list shape). Lets you scrape paginated SPAs with one CLI call instead of N.
156
+ - **Headless screenshot of a layout** — set `--js-rendering`, `--no-block-resources` (so CSS loads), and capture the screenshot URL from the response. Useful for "render this URL and show me what it looks like".
157
+ - **Markdown for RAG ingestion** — pipe `.markdown` from many URLs into a JSONL corpus; embed and store. The CLI handles JS, ads, images so you don't need a custom pipeline.
158
+ - **Fallback for any other API** — when no purpose-built API exists for a vertical (e.g. niche directories, government pages, less-popular real-estate sites), `web-scraping` is the catch-all.
159
+ - **Detect content changes** — schedule `web-scraping --url X --output-format markdown` and diff the output across runs to flag pricing-page or terms-of-service changes.
160
+ - **Read PDFs / non-HTML resources** — `--output-format text` works on text-extractable PDFs accessible via URL (the underlying renderer handles them).
161
+ - **AI extraction for forms or tables** — pages with structured data in HTML tables are easy: `--ai-extract-rules-json '{"rows":{"type":"list","output":{"name":{"type":"string"},"value":{"type":"number"}}}}'`. The model fills in nested rows.
162
+
163
+ ## Common patterns
164
+
165
+ ```bash
166
+ # Full-page markdown for RAG
167
+ hasdata web-scraping --url "$URL" \
168
+ --output-format markdown --no-screenshot --no-block-resources \
169
+ --raw | jq -r .markdown >> corpus.md
170
+
171
+ # JS-heavy SPA: wait + scroll
172
+ hasdata web-scraping --url "$URL" \
173
+ --js-scenario-json '[{"wait":2000},{"scrollY":2000},{"wait":1500}]' \
174
+ --wait-for ".item" \
175
+ --output-format html --raw | jq -r .html
176
+
177
+ # Extract structured data from an arbitrary page
178
+ hasdata web-scraping --url "$URL" \
179
+ --ai-extract-rules-json '{"price":{"type":"number"},"in_stock":{"type":"boolean"}}' \
180
+ --raw | jq .
181
+ ```
@@ -0,0 +1,145 @@
1
+ # YouTube reference
2
+
3
+ Subcommands: `youtube-search-api`, `youtube-video-api`, `youtube-channel-api`, `youtube-transcript-api`. 10 credits each.
4
+
5
+ `*-search-api` is keyword-driven; the others target a specific video, channel, or transcript. Most workflows chain search → video → transcript.
6
+
7
+ ---
8
+
9
+ ## youtube-search-api
10
+
11
+ ```bash
12
+ hasdata youtube-search-api --q "QUERY" [--sort-by relevance|date|views|rating|popularity] \
13
+ [--length under4|between420|plus20] [--date hour|today|week|month|year] \
14
+ [--video-type video|shorts|channel|playlist|movie] \
15
+ [--gl us] [--hl en] [--device-type desktop|mobile] \
16
+ [--pagination-token TOKEN] --raw | jq .
17
+ ```
18
+
19
+ Common flags:
20
+ - `--q TEXT` (required)
21
+ - `--sort-by relevance|date|views|rating|popularity`
22
+ - `--date hour|today|week|month|year` — upload-recency window
23
+ - `--length under4|between420|plus20` — duration bucket (`<4m`, `4–20m`, `>20m`)
24
+ - `--video-type video|shorts|channel|playlist|movie`
25
+ - `--filters hd,k4,hdr,subtitles,cc,d3,d360,vr180,live,bought,location` — feature flags ANDed
26
+ - `--gl`, `--hl` — country / language
27
+ - `--pagination-token` — copy from previous response's `pagination.nextPageToken`
28
+ - `--sp` — raw YouTube `sp=` filter token (overrides `sort-by` / `date` / `video-type` / `length` / `filters`)
29
+
30
+ Top-level response keys: `videoResults`, `shortsResults`, `channelResults`, `playlistResults`, `adsResults`, `sponsoredResults`, `searchInformation`, `pagination`.
31
+
32
+ Per-video result: `videoId`, `title`, `link`, `channel`, `description`, `length`, `views`, `viewsOriginal`, `publishedDate`, `thumbnail`, `positionOnPage`.
33
+
34
+ ```bash
35
+ # Latest videos for a topic
36
+ hasdata youtube-search-api --q "$Q" --sort-by date --date week --raw \
37
+ | jq -c '.videoResults[] | {title, channel: .channel.name, views, publishedDate, link}'
38
+ ```
39
+
40
+ ## youtube-video-api
41
+
42
+ ```bash
43
+ hasdata youtube-video-api --v-param VIDEO_ID [--gl us] [--hl en] --raw | jq .
44
+ ```
45
+
46
+ - `--v-param` (required) — 11-char video ID (the `v=` part of the watch URL)
47
+ - `--device-type desktop|mobile`
48
+ - `--gl`, `--hl`
49
+
50
+ Top-level fields: `videoId`, `title`, `description`, `channel`, `views`, `extractedViews`, `likes`, `extractedLikes`, `lengthSeconds`, `publishedDate`, `keywords[]`, `captions`, `socialLinks`, `music`, `category`, `thumbnail`, `isFamilySafe`, `isUnlisted`, `relatedVideos[]`, `relatedShorts[]`, `endScreenVideos[]`.
51
+
52
+ ```bash
53
+ # Quick stats
54
+ hasdata youtube-video-api --v-param "$VID" --raw \
55
+ | jq '{title, views: .extractedViews, likes: .extractedLikes, length: .lengthSeconds, published: .publishedDate, channel: .channel.name}'
56
+ ```
57
+
58
+ ## youtube-channel-api
59
+
60
+ ```bash
61
+ hasdata youtube-channel-api --channel-id "@HANDLE_OR_UCID" \
62
+ [--tab featured|videos|shorts|streams|playlists|posts|community|podcasts|releases|about|store] \
63
+ [--gl us] [--hl en] [--pagination-token TOKEN] --raw | jq .
64
+ ```
65
+
66
+ - `--channel-id` (required) — `@handle`, `UC…` canonical ID, or legacy `/c/<custom>` / `/user/<name>` URL slug
67
+ - `--tab` — which tab to scrape; default `featured` (Home page)
68
+ - `--pagination-token` — for tabs that paginate (`videos`, `shorts`, etc.)
69
+
70
+ Top-level response: `channelInfo`, `featuredVideo`, `sections[]`.
71
+
72
+ `channelInfo` carries: `name`, `handle`, `channelId`, `channelUrl`, `avatar`, `banner`, `description`, `subscribers`, `extractedSubscribers`, `videosCount`, `extractedVideosCount`, `keywords[]`, `availableTabs[]`, `verified`, `websiteUrl`, `rssUrl`, `isFamilySafe`.
73
+
74
+ ```bash
75
+ # All uploads (paginated)
76
+ hasdata youtube-channel-api --channel-id "@MrBeast" --tab videos --raw \
77
+ | jq -c '.sections[].items[]? | {title, videoId, views, publishedDate}'
78
+ ```
79
+
80
+ ## youtube-transcript-api
81
+
82
+ ```bash
83
+ hasdata youtube-transcript-api --v-param VIDEO_ID [--language-code en] [--type asr] --raw | jq .
84
+ ```
85
+
86
+ - `--v-param` (required) — 11-char video ID
87
+ - `--language-code` — BCP-47 / YouTube code (`en`, `de`, `en-US`, `pt-BR`); must match a track the video actually has
88
+ - `--type asr` — fetch the auto-generated speech-recognition track (omit for human-authored)
89
+
90
+ Response: `transcript[]` and `availableTranscripts[]`.
91
+
92
+ Each `transcript[]` entry: `startMs`, `endMs`, `snippet`, `startTimeText`. Join `snippet`s to reconstruct the full text.
93
+
94
+ ```bash
95
+ # Flatten to plain text
96
+ hasdata youtube-transcript-api --v-param "$VID" --raw \
97
+ | jq -r '.transcript[].snippet' | tr '\n' ' '
98
+ ```
99
+
100
+ ---
101
+
102
+ ## Non-obvious use cases
103
+
104
+ - **"What is this video actually about?"** — `youtube-transcript-api --v-param X --raw | jq -r '.transcript[].snippet'` → feed to an LLM for a real summary, not a thumbnail / title guess.
105
+ - **Cite YouTube content in a brief** — transcript + timestamps lets you quote with `(02:14)` accuracy. `--type asr` fills the gap when the creator never uploaded captions.
106
+ - **Channel growth audit** — `youtube-channel-api --channel-id @X --tab videos` paginated; `jq '.sections[].items[] | {publishedDate, views: .extractedViews}'` gives a velocity-over-time table.
107
+ - **Trend discovery** — `youtube-search-api --q "TOPIC" --sort-by views --date month` returns the highest-viewed videos in the last month — a leading indicator for cultural trends well before they hit Google News.
108
+ - **Competitor content map** — `youtube-channel-api --channel-id @competitor --tab videos --raw | jq -r '.sections[].items[].title'` enumerates every published title, useful for content-gap analysis.
109
+ - **Brand-safety scan** — `youtube-search-api --q "BRAND" --sort-by date --date week --raw | jq '.videoResults[] | select(.title | test("BRAND"; "i"))'` catches new mentions before they go viral.
110
+ - **Influencer due diligence** — combine `youtube-channel-api` for subscriber/video counts with `youtube-video-api` on their top videos (engagement rate ≈ `likes / views`).
111
+ - **"Find the moment they said X"** — `youtube-transcript-api` then `jq -r '.transcript[] | select(.snippet | test("X"; "i")) | "\(.startTimeText): \(.snippet)"'` returns timestamps of every mention.
112
+ - **Music-licensing lookup** — `youtube-video-api --v-param X --raw | jq .music` returns identified tracks (artist, title) when YouTube's Content ID matched.
113
+ - **Translate / re-localize a video** — pull the English transcript with `youtube-transcript-api`, translate via LLM, regenerate subtitles. Cheaper than re-transcribing audio.
114
+ - **Build a podcast / RSS feed for a channel** — `youtube-channel-api --raw | jq -r .channelInfo.rssUrl` returns the official RSS URL; subscribe in any podcast app.
115
+ - **Detect deleted videos** — `youtube-video-api --v-param X` returns an error for removed videos; useful for catching takedowns in archival pipelines.
116
+ - **Bulk research → transcript chain** — `youtube-search-api --q "$Q" --raw | jq -r '.videoResults[].videoId' | xargs -I{} hasdata youtube-transcript-api --v-param {} --raw` builds a research corpus from a topic search.
117
+ - **Shorts-only / long-form-only feeds** — `--video-type shorts` vs `--length plus20` to bias toward one format.
118
+ - **Live-stream discovery** — `--filters live` returns only currently live broadcasts; pair with `--sort-by date` for fresh streams.
119
+
120
+ ## Pipelines
121
+
122
+ ```bash
123
+ # Search → top 5 video transcripts as a corpus
124
+ hasdata youtube-search-api --q "$Q" --sort-by views --raw \
125
+ | jq -r '.videoResults[:5][].videoId' \
126
+ | while read -r vid; do
127
+ echo "=== $vid ==="
128
+ hasdata youtube-transcript-api --v-param "$vid" --raw \
129
+ | jq -r '.transcript[].snippet'
130
+ done > corpus.txt
131
+
132
+ # Channel → CSV of all videos
133
+ hasdata youtube-channel-api --channel-id "@$HANDLE" --tab videos --raw \
134
+ | jq -r '.sections[].items[]? | [.videoId, .title, (.extractedViews // 0), .publishedDate] | @csv' \
135
+ > channel_videos.csv
136
+ ```
137
+
138
+ ## Gotchas
139
+
140
+ - **`--v-param` is exactly 11 chars** — not the full watch URL. Strip the `v=` value or use `jq -r '.videoResults[0].videoId'` from a prior search.
141
+ - **`--language-code` must exist on the video.** Pass one of the codes listed in `availableTranscripts[]`, or omit for the default track.
142
+ - **`--type asr` is needed when no human-authored captions exist.** Without it the API returns the default human track and errors if there isn't one.
143
+ - **`@handle` is preferred over `UC…`** — easier to read, same result. Legacy `/c/` and `/user/` slugs also resolve.
144
+ - **`pagination.nextPageToken` is opaque** — pass it back via `--pagination-token` verbatim; don't try to decode.
145
+ - **`views` vs `extractedViews`** — `views` is the formatted string (`"1.2M views"`), `extractedViews` is the integer. Use the integer for math.