opencode-skills-collection 3.0.36 → 3.0.38
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bundled-skills/.antigravity-install-manifest.json +13 -1
- package/bundled-skills/2slides-ppt-generator/SKILL.md +786 -0
- package/bundled-skills/2slides-ppt-generator/references/api-reference.md +499 -0
- package/bundled-skills/2slides-ppt-generator/references/mcp-integration.md +282 -0
- package/bundled-skills/2slides-ppt-generator/references/pricing.md +195 -0
- package/bundled-skills/2slides-ppt-generator/scripts/api_constants.py +87 -0
- package/bundled-skills/2slides-ppt-generator/scripts/create_pdf_slides.py +159 -0
- package/bundled-skills/2slides-ppt-generator/scripts/download_slides_pages_voices.py +157 -0
- package/bundled-skills/2slides-ppt-generator/scripts/generate_narration.py +197 -0
- package/bundled-skills/2slides-ppt-generator/scripts/generate_slides.py +247 -0
- package/bundled-skills/2slides-ppt-generator/scripts/get_job_status.py +106 -0
- package/bundled-skills/2slides-ppt-generator/scripts/search_themes.py +137 -0
- package/bundled-skills/anti-sycophancy/README.md +86 -0
- package/bundled-skills/anti-sycophancy/SKILL.md +40 -0
- package/bundled-skills/antigravity-agent-manager/SKILL.md +112 -0
- package/bundled-skills/docs/integrations/jetski-cortex.md +3 -3
- package/bundled-skills/docs/integrations/jetski-gemini-loader/README.md +1 -1
- package/bundled-skills/docs/maintainers/repo-growth-seo.md +3 -3
- package/bundled-skills/docs/maintainers/skills-update-guide.md +1 -1
- package/bundled-skills/docs/sources/sources.md +1 -0
- package/bundled-skills/docs/users/bundles.md +1 -1
- package/bundled-skills/docs/users/claude-code-skills.md +1 -1
- package/bundled-skills/docs/users/gemini-cli-skills.md +1 -1
- package/bundled-skills/docs/users/getting-started.md +1 -1
- package/bundled-skills/docs/users/kiro-integration.md +1 -1
- package/bundled-skills/docs/users/usage.md +4 -4
- package/bundled-skills/docs/users/visual-guide.md +4 -4
- package/bundled-skills/event-staffing-compliance/SKILL.md +91 -0
- package/bundled-skills/event-staffing-ordering/SKILL.md +119 -0
- package/bundled-skills/examprep-ai/SKILL.md +446 -0
- package/bundled-skills/hasdata/SKILL.md +107 -0
- package/bundled-skills/hasdata/references/code-recipes.md +150 -0
- package/bundled-skills/hasdata/references/ecommerce.md +116 -0
- package/bundled-skills/hasdata/references/jobs.md +111 -0
- package/bundled-skills/hasdata/references/local-business.md +145 -0
- package/bundled-skills/hasdata/references/real-estate.md +84 -0
- package/bundled-skills/hasdata/references/scraper-jobs.md +252 -0
- package/bundled-skills/hasdata/references/search.md +154 -0
- package/bundled-skills/hasdata/references/travel.md +202 -0
- package/bundled-skills/hasdata/references/web-scraping.md +159 -0
- package/bundled-skills/hasdata/references/youtube.md +186 -0
- package/bundled-skills/hasdata-cli/SKILL.md +169 -0
- package/bundled-skills/hasdata-cli/references/all-commands.md +107 -0
- package/bundled-skills/hasdata-cli/references/ecommerce.md +106 -0
- package/bundled-skills/hasdata-cli/references/enrichment.md +227 -0
- package/bundled-skills/hasdata-cli/references/jobs.md +84 -0
- package/bundled-skills/hasdata-cli/references/local-business.md +123 -0
- package/bundled-skills/hasdata-cli/references/real-estate.md +126 -0
- package/bundled-skills/hasdata-cli/references/search.md +122 -0
- package/bundled-skills/hasdata-cli/references/travel.md +102 -0
- package/bundled-skills/hasdata-cli/references/web-scraping.md +181 -0
- package/bundled-skills/hasdata-cli/references/youtube.md +145 -0
- package/bundled-skills/linkedin-content-generator/SKILL.md +492 -0
- package/bundled-skills/linkedin-content-generator/scripts/generate_calendar.py +82 -0
- package/bundled-skills/linkedin-content-generator/scripts/generate_carousel.py +69 -0
- package/bundled-skills/linkedin-content-generator/scripts/generate_newsletter.py +64 -0
- package/bundled-skills/linkedin-content-generator/scripts/generate_post.py +77 -0
- package/bundled-skills/linkedin-content-generator/scripts/memory.md +49 -0
- package/bundled-skills/linkedin-content-generator/scripts/memory_manager.py +134 -0
- package/bundled-skills/linkedin-content-generator/scripts/utils.py +96 -0
- package/bundled-skills/permission-manager/README.md +22 -0
- package/bundled-skills/permission-manager/SKILL.md +54 -0
- package/bundled-skills/skill-suggester/README.md +14 -0
- package/bundled-skills/skill-suggester/SKILL.md +69 -0
- package/bundled-skills/smart-git-automation/README.md +31 -0
- package/bundled-skills/smart-git-automation/SKILL.md +96 -0
- package/bundled-skills/vercel-optimize/lib/cost-coverage.mjs +3 -1
- package/bundled-skills/vercel-optimize/lib/render-report.mjs +2 -2
- package/bundled-skills/vercel-optimize/lib/util.mjs +7 -0
- package/bundled-skills/vercel-optimize/lib/verify-claim.mjs +2 -7
- package/bundled-skills/vercel-optimize/lib/workspace-resolver.mjs +2 -1
- package/package.json +2 -2
- package/skills_index.json +268 -0
|
@@ -0,0 +1,122 @@
|
|
|
1
|
+
# Search references
|
|
2
|
+
|
|
3
|
+
Subcommands: `google-serp`, `google-serp-light`, `google-ai-mode`, `google-news`, `google-shopping`, `bing-serp`, `google-trends`, `google-images`, `google-events`, `google-short-videos`, `google-immersive-product`.
|
|
4
|
+
|
|
5
|
+
For `google-flights`, see `travel.md`.
|
|
6
|
+
|
|
7
|
+
Run `hasdata <api> --help` for the live, authoritative flag set. Below are the commonly-used flags and example invocations.
|
|
8
|
+
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
## google-serp (10 credits)
|
|
12
|
+
|
|
13
|
+
```bash
|
|
14
|
+
hasdata google-serp --q "QUERY" [--gl COUNTRY] [--hl LANG] [--num 10] [--start 0] --raw | jq .
|
|
15
|
+
```
|
|
16
|
+
|
|
17
|
+
Common flags:
|
|
18
|
+
- `--q TEXT` (required) — search query
|
|
19
|
+
- `--gl us|gb|ca|de|fr|...` — country code; affects results
|
|
20
|
+
- `--hl en|es|fr|de|...` — UI language
|
|
21
|
+
- `--num 10..100` — results per page
|
|
22
|
+
- `--start 0|10|20...` — pagination offset (multiples of 10)
|
|
23
|
+
- `--location "Austin,Texas,United States"` — Google canonical location
|
|
24
|
+
- `--device-type desktop|mobile|tablet`
|
|
25
|
+
- `--tbm isch|vid|nws|shop|lcl` — search type
|
|
26
|
+
- `--safe active|off`
|
|
27
|
+
- `--lr lang_en --lr lang_fr` — restrict to language(s)
|
|
28
|
+
- `--domain google.com|google.co.uk|...`
|
|
29
|
+
- `--tbs cdr:1,cd_min:10/17/2018,cd_max:3/8/2021` — advanced search filters
|
|
30
|
+
|
|
31
|
+
Useful response fields (via `jq`):
|
|
32
|
+
- `.organic_results[] | {title, link, snippet}` — main results
|
|
33
|
+
- `.ai_overview` — AI Overview block (when present)
|
|
34
|
+
- `.answer_box`, `.knowledge_graph`, `.related_searches`, `.people_also_ask`
|
|
35
|
+
- `.local_results`, `.shopping_results`, `.news_results`
|
|
36
|
+
|
|
37
|
+
Example — top-10 organic for prompt grounding:
|
|
38
|
+
```bash
|
|
39
|
+
hasdata google-serp --q "$Q" --num 10 --raw \
|
|
40
|
+
| jq -r '.organic_results[] | "- \(.title): \(.snippet)"'
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
## google-serp-light (5 credits)
|
|
44
|
+
|
|
45
|
+
Same flags as `google-serp` but cheaper and returns a single page. Use when the user wants quick results and doesn't need PAA/AI Overview/local sections.
|
|
46
|
+
|
|
47
|
+
## google-ai-mode (5 credits)
|
|
48
|
+
|
|
49
|
+
Returns Google's AI Mode answer for a query. Same `--q` / `--gl` / `--hl` semantics.
|
|
50
|
+
|
|
51
|
+
## google-news (10 credits)
|
|
52
|
+
|
|
53
|
+
```bash
|
|
54
|
+
hasdata google-news --q "QUERY" [--gl us] [--hl en] --raw | jq '.news_results[]'
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
Per-article fields: `title`, `link`, `source.name`, `date`, `snippet`, `thumbnail`.
|
|
58
|
+
|
|
59
|
+
## google-shopping (10 credits)
|
|
60
|
+
|
|
61
|
+
```bash
|
|
62
|
+
hasdata google-shopping --q "PRODUCT" [--gl us] --raw | jq '.shopping_results[]'
|
|
63
|
+
```
|
|
64
|
+
|
|
65
|
+
Per-result: `title`, `link`, `price`, `extracted_price`, `source`, `rating`, `reviews`, `delivery`.
|
|
66
|
+
|
|
67
|
+
## bing-serp (10 credits)
|
|
68
|
+
|
|
69
|
+
```bash
|
|
70
|
+
hasdata bing-serp --q "QUERY" [--cc us] [--setlang en] --raw | jq '.organic_results[]'
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
Use when the user explicitly asks for Bing or wants a non-Google second opinion.
|
|
74
|
+
|
|
75
|
+
## google-trends (5 credits)
|
|
76
|
+
|
|
77
|
+
```bash
|
|
78
|
+
hasdata google-trends --q "TERM" [--geo US] [--cat 0] [--time "today 12-m"] --raw | jq .
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
Multiple terms for comparison: `--q "term1,term2,term3"` (comma-separated, max 5).
|
|
82
|
+
|
|
83
|
+
## google-images (5 credits)
|
|
84
|
+
|
|
85
|
+
```bash
|
|
86
|
+
hasdata google-images --q "QUERY" --raw | jq '.images_results[] | {title, original, source}'
|
|
87
|
+
```
|
|
88
|
+
|
|
89
|
+
## google-events (5 credits)
|
|
90
|
+
|
|
91
|
+
```bash
|
|
92
|
+
hasdata google-events --q "concerts in austin" [--gl us] --raw | jq '.events_results[]'
|
|
93
|
+
```
|
|
94
|
+
|
|
95
|
+
## google-short-videos (10 credits) / google-immersive-product (5 credits)
|
|
96
|
+
|
|
97
|
+
Less common — run `--help` to see flags when needed.
|
|
98
|
+
|
|
99
|
+
---
|
|
100
|
+
|
|
101
|
+
## Non-obvious use cases
|
|
102
|
+
|
|
103
|
+
- **Fact-check a claim before answering** — instead of relying on training data, run `google-serp --q "EXACT CLAIM"` and check whether top results corroborate or contradict.
|
|
104
|
+
- **Resolve "what is the URL for X?"** — agents often hallucinate URLs. `google-serp --q "X official site" --num 3` and pick the result with the matching domain.
|
|
105
|
+
- **Source-find a quote** — `google-serp --q "\"the exact quoted text here\""` (escape the inner quotes). Returns the page that originated it.
|
|
106
|
+
- **Compare same query across regions** — same `--q`, different `--gl us|gb|de|fr` to see how SERP differs by geo. Useful for international SEO and "what do users in country Y see".
|
|
107
|
+
- **Time-bound search** — `--tbs qdr:d` (past day), `qdr:w` (week), `qdr:m` (month), `qdr:y` (year). Combine with `google-news` for fresh-only news. Or `--tbs cdr:1,cd_min:M/D/YYYY,cd_max:M/D/YYYY` for an explicit window.
|
|
108
|
+
- **Site-restricted search** — `--q "site:example.com TOPIC"` to search within one domain (better than scraping the site's own search box).
|
|
109
|
+
- **"What's been written about X recently?"** — `google-news --q "X" --gl us` then `jq` over `.news_results[] | select(.date | test("hours? ago|day ago"))`.
|
|
110
|
+
- **Discover competitors** — `google-serp --q "best alternatives to PRODUCT"` followed by extracting brand names from the top results' titles.
|
|
111
|
+
- **Find documentation links** — `google-serp --q "LIBRARY official docs"` instead of guessing a URL pattern.
|
|
112
|
+
- **Translate search intent** — `--hl de --gl de --q "the English query"` shows German-language ranking; useful for multi-locale SEO checks.
|
|
113
|
+
- **Trends over time** — `google-trends --q "term1,term2"` (comma-separated, ≤5 terms) for relative interest curves. Better than guessing "is X popular now" from training data.
|
|
114
|
+
- **Image-driven research** — `google-images --q "X"` to grab source URLs; pipe top results into `web-scraping` to read context.
|
|
115
|
+
- **Map-distance trick** — for "things near X", use `google-maps --ll "@LAT,LNG,Zz"` not a SERP query — much higher signal for proximity intent.
|
|
116
|
+
- **Fact-check pricing** — when training data has stale prices for SaaS or subscription services, search and read the pricing page rather than answering from memory.
|
|
117
|
+
|
|
118
|
+
## Picking the right SERP variant
|
|
119
|
+
|
|
120
|
+
- Use `google-serp-light` when only top organic results are needed (skips PAA / AI overview / local sections).
|
|
121
|
+
- Use `google-serp` when you need the full SERP feature set (PAA, AI overview, knowledge graph, local pack).
|
|
122
|
+
- Cache results client-side when running the same query repeatedly — the CLI does not cache.
|
|
@@ -0,0 +1,102 @@
|
|
|
1
|
+
# Travel reference
|
|
2
|
+
|
|
3
|
+
Subcommands:
|
|
4
|
+
- `airbnb-listing`, `airbnb-property` (5 credits each) — short-term rentals.
|
|
5
|
+
- `booking-search`, `booking-place` (10 credits each) — hotels and other lodging on Booking.com.
|
|
6
|
+
- `google-flights` (15 credits) — flight prices and itineraries via Google Flights.
|
|
7
|
+
|
|
8
|
+
`*-listing` / `*-search` is the filtered search; `*-property` / `*-place` is a single-property deep dive.
|
|
9
|
+
|
|
10
|
+
For activities at the destination see `google-events` (in `search.md`); for ground transport scrape the operator's site with `web-scraping`.
|
|
11
|
+
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
## airbnb-listing
|
|
15
|
+
|
|
16
|
+
```bash
|
|
17
|
+
hasdata airbnb-listing --location "Lisbon, Portugal" \
|
|
18
|
+
--check-in 2026-06-15 --check-out 2026-06-22 \
|
|
19
|
+
--adults 2 --price-max 200 --raw
|
|
20
|
+
```
|
|
21
|
+
|
|
22
|
+
Run `--help` for the full filter set (room type, amenities, instant book, etc.).
|
|
23
|
+
|
|
24
|
+
## airbnb-property
|
|
25
|
+
|
|
26
|
+
```bash
|
|
27
|
+
hasdata airbnb-property --url "https://www.airbnb.com/rooms/12345678" --raw
|
|
28
|
+
```
|
|
29
|
+
|
|
30
|
+
## booking-search (10 credits)
|
|
31
|
+
|
|
32
|
+
```bash
|
|
33
|
+
hasdata booking-search \
|
|
34
|
+
--keyword "Lisbon" \
|
|
35
|
+
--check-in-date 2026-07-10 --check-out-date 2026-07-13 \
|
|
36
|
+
--adults 2 --children 0 --rooms 1 \
|
|
37
|
+
[--price-min 50 --price-max 250] [--rating 4 --rating 5] \
|
|
38
|
+
[--review-score reviewScoreVeryGood --review-score reviewScoreSuperb] \
|
|
39
|
+
[--property-type hotels --property-type apartments] \
|
|
40
|
+
[--meals breakfastIncluded] [--facilities freeParking --facilities pool] \
|
|
41
|
+
[--sort priceLowestFirst|ratingHighToLow|topReviewed|...] \
|
|
42
|
+
[--page 2] [--currency USD] [--language en-us] \
|
|
43
|
+
--raw | jq '.results[]'
|
|
44
|
+
```
|
|
45
|
+
|
|
46
|
+
Required (no defaults work in production, even though `--help` shows them): `--keyword`, `--check-in-date`, `--check-out-date`, `--adults`, `--children`, `--rooms`. Pass `--children 0` explicitly when none.
|
|
47
|
+
|
|
48
|
+
When `--children > 0`, **also pass `--children-ages-json '[5,7]'`** with one age per child (0–17). Booking rejects the request otherwise.
|
|
49
|
+
|
|
50
|
+
Bracketed price filters (`--price-min` / `--price-max`) require `>= 10` / `>= 20` respectively; one of the two is required when filtering on price.
|
|
51
|
+
|
|
52
|
+
Top-level response: `results`, `searchInformation`, `pagination`, `requestMetadata`. Per-result keys (verified live): `hotelId`, `roomId`, `title`, `url`, `location`, `rating`, `reviews`, `price`, `room`, `beds`, `bedTypes`, `policies`, `photo`.
|
|
53
|
+
|
|
54
|
+
```bash
|
|
55
|
+
# Cheap-first filtered search
|
|
56
|
+
hasdata booking-search --keyword "Paris" \
|
|
57
|
+
--check-in-date 2026-08-01 --check-out-date 2026-08-04 \
|
|
58
|
+
--adults 2 --children 0 --rooms 1 \
|
|
59
|
+
--review-score reviewScoreVeryGood \
|
|
60
|
+
--sort priceLowestFirst --raw \
|
|
61
|
+
| jq -c '.results[] | {title, price: .price.total, rating, url}'
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
## booking-place (10 credits)
|
|
65
|
+
|
|
66
|
+
```bash
|
|
67
|
+
hasdata booking-place \
|
|
68
|
+
--url "https://www.booking.com/hotel/fr/le-bristol-paris.html" \
|
|
69
|
+
--check-in-date 2026-07-10 --check-out-date 2026-07-13 \
|
|
70
|
+
--adults 2 --children 0 --rooms 1 \
|
|
71
|
+
[--currency USD] [--language en-us] \
|
|
72
|
+
--raw | jq .
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
Required: `--url` (must be on `booking.com` / `www.booking.com`), stay dates, `--adults`, `--children`, `--rooms`.
|
|
76
|
+
|
|
77
|
+
Response: `overview`, `bookingDetails`, `rooms[]`, `facilities`, `houseRules`, `ratings`, `reviews`, `restaurants`, `breadcrumbs`, `questionsAndAnswers`. `overview` carries `id`, `title`, `address`, `description`, `propertyType`, `photos`, `highlights`, `mostPopularFacilities`. Each `rooms[i]` has `roomId`, `name`, `bedTypes`, `beds`, `facilities`, `otherFacilities`, `variants[]` (pricing/availability per package).
|
|
78
|
+
|
|
79
|
+
## google-flights (15 credits)
|
|
80
|
+
|
|
81
|
+
```bash
|
|
82
|
+
hasdata google-flights \
|
|
83
|
+
--departure-id "JFK" --arrival-id "LAX" \
|
|
84
|
+
--outbound-date 2026-06-15 --return-date 2026-06-22 \
|
|
85
|
+
--currency USD --raw | jq .
|
|
86
|
+
```
|
|
87
|
+
|
|
88
|
+
Round-trip vs. one-way controlled by presence/absence of `--return-date`. Run `--help` for the full flag set (cabin class, max stops, preferred airlines, etc.).
|
|
89
|
+
|
|
90
|
+
---
|
|
91
|
+
|
|
92
|
+
## Non-obvious use cases
|
|
93
|
+
|
|
94
|
+
- **Hotel-vs-rental arbitrage** — same dates and party size via `booking-search` and `airbnb-listing`; compare nightly cost percentile-for-percentile. The cheaper platform isn't always the same one across cities.
|
|
95
|
+
- **Conference-room pricing audit** — `booking-search --keyword "$CITY" --check-in-date $START --check-out-date $END --sort priceHighestFirst` during a known conference window vs an idle week; the delta is the conference premium.
|
|
96
|
+
- **Family-friendly filter** — `--children-ages-json '[5,9]' --children 2 --rooms 1 --travel-group family` and Booking returns only properties that accept the party size with appropriate beds.
|
|
97
|
+
- **Loyalty-program portfolio** — `booking-search --keyword "$CITY" --raw | jq '.results[] | select(.title | test("Marriott|Hilton|Hyatt"))'` filters to chains you collect points with.
|
|
98
|
+
- **Airbnb price-arbitrage check** — same dates, same area, two `airbnb-listing` calls with different `--adults` counts to surface listings that don't scale per-person price. Sometimes the difference is the deal.
|
|
99
|
+
- **STR-vs-LTR feasibility** — pair `airbnb-listing` for nightly rates with `zillow-listing --type forSale` (see `real-estate.md`) for purchase price in the same area; compute gross yield client-side.
|
|
100
|
+
- **Flights without a travel API** — `google-flights` with `--departure-id`, `--arrival-id`, dates, and `--currency` for ad-hoc fare checks against a vendor like Skyscanner.
|
|
101
|
+
- **Multi-leg cost planning** — chain `google-flights` calls for each leg and sum `.best_flights[].price`; cheaper than the round-trip price-bot SaaS for one-off itineraries.
|
|
102
|
+
- **Trip-cost preview** — combine `google-flights` (transport) + `booking-search` / `airbnb-listing` (lodging) + `google-events` (activities, see `search.md`) into one cost estimate before pitching a destination.
|
|
@@ -0,0 +1,181 @@
|
|
|
1
|
+
# web-scraping reference
|
|
2
|
+
|
|
3
|
+
Subcommand: `web-scraping` (10 credits/call). Single endpoint for arbitrary URL scraping with JS rendering, proxies, AI extraction, screenshots, markdown conversion.
|
|
4
|
+
|
|
5
|
+
> **`web-scraping` is the last resort, not the default.** Before reaching for it, ask: would `google-serp` (or `google-news` / `google-shopping` / `google-maps`) already have the field I need? Google's `.knowledge_graph`, `.organic_results[].snippet`, `.local_results[]` carry pre-extracted public facts without direct page access. Only invoke `web-scraping` when:
|
|
6
|
+
>
|
|
7
|
+
> - the user gave you a specific URL to read, OR
|
|
8
|
+
> - SERP came up short for a specific field, OR
|
|
9
|
+
> - the target page renders content that doesn't show up in any SERP snippet.
|
|
10
|
+
>
|
|
11
|
+
> Use it only for public pages or content the user is authorized to access, and respect site terms, robots/access controls, privacy law, and rate limits.
|
|
12
|
+
>
|
|
13
|
+
> See `references/enrichment.md` for the SERP-first patterns.
|
|
14
|
+
|
|
15
|
+
```bash
|
|
16
|
+
hasdata web-scraping --url "URL" [flags] --raw | jq .
|
|
17
|
+
```
|
|
18
|
+
|
|
19
|
+
## Required
|
|
20
|
+
|
|
21
|
+
- `--url URL` — target page
|
|
22
|
+
|
|
23
|
+
## Output formats
|
|
24
|
+
|
|
25
|
+
- `--output-format html|text|markdown|json` (repeatable)
|
|
26
|
+
- One format → that format directly (e.g. `--output-format markdown` returns markdown text under `.markdown`)
|
|
27
|
+
- Multiple → JSON response with one key per format
|
|
28
|
+
- `--output-format json` combined with others → wraps everything in JSON
|
|
29
|
+
|
|
30
|
+
```bash
|
|
31
|
+
# LLM-friendly markdown for prompt context
|
|
32
|
+
hasdata web-scraping --url "$URL" --output-format markdown --raw | jq -r .markdown
|
|
33
|
+
```
|
|
34
|
+
|
|
35
|
+
## Proxy & rendering
|
|
36
|
+
|
|
37
|
+
- `--proxy-type datacenter|residential` (default datacenter)
|
|
38
|
+
- `--proxy-country US|UK|DE|IE|FR|IT|SE|BR|CA|JP|SG|IN|ID` (default US)
|
|
39
|
+
- `--js-rendering` / `--no-js-rendering` (default on) — full headless browser
|
|
40
|
+
- `--block-ads` / `--no-block-ads` (default on)
|
|
41
|
+
- `--block-resources` / `--no-block-resources` (default on) — blocks images/CSS for speed
|
|
42
|
+
- `--screenshot` / `--no-screenshot` (default on; the result includes a screenshot URL)
|
|
43
|
+
- `--remove-base64-images` — strip inline base64 images from response
|
|
44
|
+
- `--extract-emails` / `--no-extract-emails` (default on)
|
|
45
|
+
- `--extract-links` (default off)
|
|
46
|
+
|
|
47
|
+
## Wait controls
|
|
48
|
+
|
|
49
|
+
- `--wait MS` — fixed wait after page load
|
|
50
|
+
- `--wait-for "CSS_SELECTOR"` — wait until selector appears
|
|
51
|
+
|
|
52
|
+
## Custom JS scenario (complex array — JSON only)
|
|
53
|
+
|
|
54
|
+
```bash
|
|
55
|
+
hasdata web-scraping --url "$URL" \
|
|
56
|
+
--js-scenario-json '[
|
|
57
|
+
{"wait": 2000},
|
|
58
|
+
{"click": ".load-more"},
|
|
59
|
+
{"waitFor": ".item"},
|
|
60
|
+
{"scrollY": 1000},
|
|
61
|
+
{"fill": ["input#q", "espresso"]}
|
|
62
|
+
]' --raw
|
|
63
|
+
```
|
|
64
|
+
|
|
65
|
+
Supported actions: `evaluate`, `click`, `wait`, `waitFor`, `waitForAndClick`, `scrollX`, `scrollY`, `fill`. Executed sequentially.
|
|
66
|
+
|
|
67
|
+
Accepts raw JSON, `@file.json`, or `-` (stdin).
|
|
68
|
+
|
|
69
|
+
## Headers (kvSlice + JSON escape)
|
|
70
|
+
|
|
71
|
+
```bash
|
|
72
|
+
# Repeatable kv form (splits on first `=`)
|
|
73
|
+
hasdata web-scraping --url "$URL" \
|
|
74
|
+
--headers "User-Agent=hasdata-cli" \
|
|
75
|
+
--headers "Accept-Language=en-US,en;q=0.9" \
|
|
76
|
+
--headers "Cookie=session=abc=def" \
|
|
77
|
+
--raw
|
|
78
|
+
|
|
79
|
+
# JSON base + kv overrides
|
|
80
|
+
hasdata web-scraping --url "$URL" \
|
|
81
|
+
--headers-json '{"User-Agent":"base","X-Common":"shared"}' \
|
|
82
|
+
--headers "User-Agent=override" \
|
|
83
|
+
--raw
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
## CSS-selector data extraction (kvSlice or JSON)
|
|
87
|
+
|
|
88
|
+
```bash
|
|
89
|
+
# Lightweight kv form: --extract-rules KEY=SELECTOR
|
|
90
|
+
hasdata web-scraping --url "https://quotes.toscrape.com" \
|
|
91
|
+
--extract-rules "quote=.quote .text" \
|
|
92
|
+
--extract-rules "author=.quote .author" \
|
|
93
|
+
--raw | jq .
|
|
94
|
+
|
|
95
|
+
# JSON form for complex selectors / attributes
|
|
96
|
+
hasdata web-scraping --url "$URL" \
|
|
97
|
+
--extract-rules-json '{"title":"h1","links":"a @href","price":".price-now"}' \
|
|
98
|
+
--raw
|
|
99
|
+
```
|
|
100
|
+
|
|
101
|
+
`@href`, `@src`, etc. extract attributes. Without `@`, extracts text content.
|
|
102
|
+
|
|
103
|
+
## AI extraction (LLM-driven)
|
|
104
|
+
|
|
105
|
+
```bash
|
|
106
|
+
hasdata web-scraping --url "$URL" \
|
|
107
|
+
--ai-extract-rules-json '{
|
|
108
|
+
"headline": {"type": "string", "description": "the main story headline"},
|
|
109
|
+
"comments_count": {"type": "number"},
|
|
110
|
+
"is_paid_content": {"type": "boolean"},
|
|
111
|
+
"tags": {"type": "list", "description": "topic tags"},
|
|
112
|
+
"author": {"type": "item", "output": {
|
|
113
|
+
"name": {"type": "string"},
|
|
114
|
+
"verified": {"type": "boolean"}
|
|
115
|
+
}}
|
|
116
|
+
}' --raw | jq .
|
|
117
|
+
```
|
|
118
|
+
|
|
119
|
+
Supported types: `string`, `number`, `boolean`, `list`, `item` (nested object — defines its shape under `output`).
|
|
120
|
+
|
|
121
|
+
## Tag filtering
|
|
122
|
+
|
|
123
|
+
- `--include-only-tags "main,article"` (comma-joined CSS selectors) — keep only matching elements
|
|
124
|
+
- `--exclude-tags script --exclude-tags style` (repeatable) — remove elements
|
|
125
|
+
|
|
126
|
+
```bash
|
|
127
|
+
hasdata web-scraping --url "$URL" \
|
|
128
|
+
--output-format markdown \
|
|
129
|
+
--include-only-tags "article,main" \
|
|
130
|
+
--exclude-tags script --exclude-tags style --exclude-tags nav \
|
|
131
|
+
--raw | jq -r .markdown
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
## URL blocklist
|
|
135
|
+
|
|
136
|
+
```bash
|
|
137
|
+
--block-urls-json '["**.googletagmanager.com/**","**.doubleclick.net/**"]'
|
|
138
|
+
```
|
|
139
|
+
|
|
140
|
+
Glob patterns block specific subresource URLs from loading.
|
|
141
|
+
|
|
142
|
+
## Saving binary output
|
|
143
|
+
|
|
144
|
+
The `web-scraping` response is JSON, but if `--output-format` is set to a single non-JSON format, the wrapped result is still JSON. Use `jq -r .markdown > file.md` to extract text. For screenshots specifically, the response contains a screenshot URL — fetch it separately with `curl`.
|
|
145
|
+
|
|
146
|
+
## Non-obvious use cases
|
|
147
|
+
|
|
148
|
+
- **Page-to-prompt grounding** — `--output-format markdown` produces clean LLM-ready text from any URL. Strip nav/ads with `--exclude-tags script --exclude-tags style --exclude-tags nav`. Beats fetch + regex.
|
|
149
|
+
- **JavaScript-rendered SPAs that `curl` can't read** — default `--js-rendering` uses a real browser, so React/Vue/Angular pages return their hydrated DOM, not the empty shell.
|
|
150
|
+
- **Geo/availability testing where allowed** — `--proxy-type residential` can model residential network availability; use only for authorized tests where the target's terms and access controls permit it.
|
|
151
|
+
- **Geo-targeted content** — `--proxy-country DE` to see what users in Germany see (different prices, currencies, A/B variants, or geo-blocked content).
|
|
152
|
+
- **Quick "is this page real" check** — `--screenshot` (default on) returns a screenshot URL in the response; verify visually without manually opening the URL.
|
|
153
|
+
- **Universal price extractor** — `--ai-extract-rules-json '{"price":{"type":"number"},"currency":{"type":"string"},"in_stock":{"type":"boolean"}}'` works on arbitrary retailer pages without writing a CSS selector. Cheaper than maintaining per-site selectors when the user only needs occasional spot-checks.
|
|
154
|
+
- **Authenticated content with user authority** — `--headers Cookie=session=...` injects auth cookies if the user has them. Use only with explicit permission and authority to access that account/content; never use cookies to bypass someone else's access controls.
|
|
155
|
+
- **Convert paginated lists to a clean record set** — combine `--js-scenario-json` (click "Load more" 5×) with `--ai-extract-rules-json` (pull the list shape). Lets you scrape paginated SPAs with one CLI call instead of N.
|
|
156
|
+
- **Headless screenshot of a layout** — set `--js-rendering`, `--no-block-resources` (so CSS loads), and capture the screenshot URL from the response. Useful for "render this URL and show me what it looks like".
|
|
157
|
+
- **Markdown for RAG ingestion** — pipe `.markdown` from many URLs into a JSONL corpus; embed and store. The CLI handles JS, ads, images so you don't need a custom pipeline.
|
|
158
|
+
- **Fallback for any other API** — when no purpose-built API exists for a vertical (e.g. niche directories, government pages, less-popular real-estate sites), `web-scraping` is the catch-all.
|
|
159
|
+
- **Detect content changes** — schedule `web-scraping --url X --output-format markdown` and diff the output across runs to flag pricing-page or terms-of-service changes.
|
|
160
|
+
- **Read PDFs / non-HTML resources** — `--output-format text` works on text-extractable PDFs accessible via URL (the underlying renderer handles them).
|
|
161
|
+
- **AI extraction for forms or tables** — pages with structured data in HTML tables are easy: `--ai-extract-rules-json '{"rows":{"type":"list","output":{"name":{"type":"string"},"value":{"type":"number"}}}}'`. The model fills in nested rows.
|
|
162
|
+
|
|
163
|
+
## Common patterns
|
|
164
|
+
|
|
165
|
+
```bash
|
|
166
|
+
# Full-page markdown for RAG
|
|
167
|
+
hasdata web-scraping --url "$URL" \
|
|
168
|
+
--output-format markdown --no-screenshot --no-block-resources \
|
|
169
|
+
--raw | jq -r .markdown >> corpus.md
|
|
170
|
+
|
|
171
|
+
# JS-heavy SPA: wait + scroll
|
|
172
|
+
hasdata web-scraping --url "$URL" \
|
|
173
|
+
--js-scenario-json '[{"wait":2000},{"scrollY":2000},{"wait":1500}]' \
|
|
174
|
+
--wait-for ".item" \
|
|
175
|
+
--output-format html --raw | jq -r .html
|
|
176
|
+
|
|
177
|
+
# Extract structured data from an arbitrary page
|
|
178
|
+
hasdata web-scraping --url "$URL" \
|
|
179
|
+
--ai-extract-rules-json '{"price":{"type":"number"},"in_stock":{"type":"boolean"}}' \
|
|
180
|
+
--raw | jq .
|
|
181
|
+
```
|
|
@@ -0,0 +1,145 @@
|
|
|
1
|
+
# YouTube reference
|
|
2
|
+
|
|
3
|
+
Subcommands: `youtube-search-api`, `youtube-video-api`, `youtube-channel-api`, `youtube-transcript-api`. 10 credits each.
|
|
4
|
+
|
|
5
|
+
`*-search-api` is keyword-driven; the others target a specific video, channel, or transcript. Most workflows chain search → video → transcript.
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## youtube-search-api
|
|
10
|
+
|
|
11
|
+
```bash
|
|
12
|
+
hasdata youtube-search-api --q "QUERY" [--sort-by relevance|date|views|rating|popularity] \
|
|
13
|
+
[--length under4|between420|plus20] [--date hour|today|week|month|year] \
|
|
14
|
+
[--video-type video|shorts|channel|playlist|movie] \
|
|
15
|
+
[--gl us] [--hl en] [--device-type desktop|mobile] \
|
|
16
|
+
[--pagination-token TOKEN] --raw | jq .
|
|
17
|
+
```
|
|
18
|
+
|
|
19
|
+
Common flags:
|
|
20
|
+
- `--q TEXT` (required)
|
|
21
|
+
- `--sort-by relevance|date|views|rating|popularity`
|
|
22
|
+
- `--date hour|today|week|month|year` — upload-recency window
|
|
23
|
+
- `--length under4|between420|plus20` — duration bucket (`<4m`, `4–20m`, `>20m`)
|
|
24
|
+
- `--video-type video|shorts|channel|playlist|movie`
|
|
25
|
+
- `--filters hd,k4,hdr,subtitles,cc,d3,d360,vr180,live,bought,location` — feature flags ANDed
|
|
26
|
+
- `--gl`, `--hl` — country / language
|
|
27
|
+
- `--pagination-token` — copy from previous response's `pagination.nextPageToken`
|
|
28
|
+
- `--sp` — raw YouTube `sp=` filter token (overrides `sort-by` / `date` / `video-type` / `length` / `filters`)
|
|
29
|
+
|
|
30
|
+
Top-level response keys: `videoResults`, `shortsResults`, `channelResults`, `playlistResults`, `adsResults`, `sponsoredResults`, `searchInformation`, `pagination`.
|
|
31
|
+
|
|
32
|
+
Per-video result: `videoId`, `title`, `link`, `channel`, `description`, `length`, `views`, `viewsOriginal`, `publishedDate`, `thumbnail`, `positionOnPage`.
|
|
33
|
+
|
|
34
|
+
```bash
|
|
35
|
+
# Latest videos for a topic
|
|
36
|
+
hasdata youtube-search-api --q "$Q" --sort-by date --date week --raw \
|
|
37
|
+
| jq -c '.videoResults[] | {title, channel: .channel.name, views, publishedDate, link}'
|
|
38
|
+
```
|
|
39
|
+
|
|
40
|
+
## youtube-video-api
|
|
41
|
+
|
|
42
|
+
```bash
|
|
43
|
+
hasdata youtube-video-api --v-param VIDEO_ID [--gl us] [--hl en] --raw | jq .
|
|
44
|
+
```
|
|
45
|
+
|
|
46
|
+
- `--v-param` (required) — 11-char video ID (the `v=` part of the watch URL)
|
|
47
|
+
- `--device-type desktop|mobile`
|
|
48
|
+
- `--gl`, `--hl`
|
|
49
|
+
|
|
50
|
+
Top-level fields: `videoId`, `title`, `description`, `channel`, `views`, `extractedViews`, `likes`, `extractedLikes`, `lengthSeconds`, `publishedDate`, `keywords[]`, `captions`, `socialLinks`, `music`, `category`, `thumbnail`, `isFamilySafe`, `isUnlisted`, `relatedVideos[]`, `relatedShorts[]`, `endScreenVideos[]`.
|
|
51
|
+
|
|
52
|
+
```bash
|
|
53
|
+
# Quick stats
|
|
54
|
+
hasdata youtube-video-api --v-param "$VID" --raw \
|
|
55
|
+
| jq '{title, views: .extractedViews, likes: .extractedLikes, length: .lengthSeconds, published: .publishedDate, channel: .channel.name}'
|
|
56
|
+
```
|
|
57
|
+
|
|
58
|
+
## youtube-channel-api
|
|
59
|
+
|
|
60
|
+
```bash
|
|
61
|
+
hasdata youtube-channel-api --channel-id "@HANDLE_OR_UCID" \
|
|
62
|
+
[--tab featured|videos|shorts|streams|playlists|posts|community|podcasts|releases|about|store] \
|
|
63
|
+
[--gl us] [--hl en] [--pagination-token TOKEN] --raw | jq .
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
- `--channel-id` (required) — `@handle`, `UC…` canonical ID, or legacy `/c/<custom>` / `/user/<name>` URL slug
|
|
67
|
+
- `--tab` — which tab to scrape; default `featured` (Home page)
|
|
68
|
+
- `--pagination-token` — for tabs that paginate (`videos`, `shorts`, etc.)
|
|
69
|
+
|
|
70
|
+
Top-level response: `channelInfo`, `featuredVideo`, `sections[]`.
|
|
71
|
+
|
|
72
|
+
`channelInfo` carries: `name`, `handle`, `channelId`, `channelUrl`, `avatar`, `banner`, `description`, `subscribers`, `extractedSubscribers`, `videosCount`, `extractedVideosCount`, `keywords[]`, `availableTabs[]`, `verified`, `websiteUrl`, `rssUrl`, `isFamilySafe`.
|
|
73
|
+
|
|
74
|
+
```bash
|
|
75
|
+
# All uploads (paginated)
|
|
76
|
+
hasdata youtube-channel-api --channel-id "@MrBeast" --tab videos --raw \
|
|
77
|
+
| jq -c '.sections[].items[]? | {title, videoId, views, publishedDate}'
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
## youtube-transcript-api
|
|
81
|
+
|
|
82
|
+
```bash
|
|
83
|
+
hasdata youtube-transcript-api --v-param VIDEO_ID [--language-code en] [--type asr] --raw | jq .
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
- `--v-param` (required) — 11-char video ID
|
|
87
|
+
- `--language-code` — BCP-47 / YouTube code (`en`, `de`, `en-US`, `pt-BR`); must match a track the video actually has
|
|
88
|
+
- `--type asr` — fetch the auto-generated speech-recognition track (omit for human-authored)
|
|
89
|
+
|
|
90
|
+
Response: `transcript[]` and `availableTranscripts[]`.
|
|
91
|
+
|
|
92
|
+
Each `transcript[]` entry: `startMs`, `endMs`, `snippet`, `startTimeText`. Join `snippet`s to reconstruct the full text.
|
|
93
|
+
|
|
94
|
+
```bash
|
|
95
|
+
# Flatten to plain text
|
|
96
|
+
hasdata youtube-transcript-api --v-param "$VID" --raw \
|
|
97
|
+
| jq -r '.transcript[].snippet' | tr '\n' ' '
|
|
98
|
+
```
|
|
99
|
+
|
|
100
|
+
---
|
|
101
|
+
|
|
102
|
+
## Non-obvious use cases
|
|
103
|
+
|
|
104
|
+
- **"What is this video actually about?"** — `youtube-transcript-api --v-param X --raw | jq -r '.transcript[].snippet'` → feed to an LLM for a real summary, not a thumbnail / title guess.
|
|
105
|
+
- **Cite YouTube content in a brief** — transcript + timestamps lets you quote with `(02:14)` accuracy. `--type asr` fills the gap when the creator never uploaded captions.
|
|
106
|
+
- **Channel growth audit** — `youtube-channel-api --channel-id @X --tab videos` paginated; `jq '.sections[].items[] | {publishedDate, views: .extractedViews}'` gives a velocity-over-time table.
|
|
107
|
+
- **Trend discovery** — `youtube-search-api --q "TOPIC" --sort-by views --date month` returns the highest-viewed videos in the last month — a leading indicator for cultural trends well before they hit Google News.
|
|
108
|
+
- **Competitor content map** — `youtube-channel-api --channel-id @competitor --tab videos --raw | jq -r '.sections[].items[].title'` enumerates every published title, useful for content-gap analysis.
|
|
109
|
+
- **Brand-safety scan** — `youtube-search-api --q "BRAND" --sort-by date --date week --raw | jq '.videoResults[] | select(.title | test("BRAND"; "i"))'` catches new mentions before they go viral.
|
|
110
|
+
- **Influencer due diligence** — combine `youtube-channel-api` for subscriber/video counts with `youtube-video-api` on their top videos (engagement rate ≈ `likes / views`).
|
|
111
|
+
- **"Find the moment they said X"** — `youtube-transcript-api` then `jq -r '.transcript[] | select(.snippet | test("X"; "i")) | "\(.startTimeText): \(.snippet)"'` returns timestamps of every mention.
|
|
112
|
+
- **Music-licensing lookup** — `youtube-video-api --v-param X --raw | jq .music` returns identified tracks (artist, title) when YouTube's Content ID matched.
|
|
113
|
+
- **Translate / re-localize a video** — pull the English transcript with `youtube-transcript-api`, translate via LLM, regenerate subtitles. Cheaper than re-transcribing audio.
|
|
114
|
+
- **Build a podcast / RSS feed for a channel** — `youtube-channel-api --raw | jq -r .channelInfo.rssUrl` returns the official RSS URL; subscribe in any podcast app.
|
|
115
|
+
- **Detect deleted videos** — `youtube-video-api --v-param X` returns an error for removed videos; useful for catching takedowns in archival pipelines.
|
|
116
|
+
- **Bulk research → transcript chain** — `youtube-search-api --q "$Q" --raw | jq -r '.videoResults[].videoId' | xargs -I{} hasdata youtube-transcript-api --v-param {} --raw` builds a research corpus from a topic search.
|
|
117
|
+
- **Shorts-only / long-form-only feeds** — `--video-type shorts` vs `--length plus20` to bias toward one format.
|
|
118
|
+
- **Live-stream discovery** — `--filters live` returns only currently live broadcasts; pair with `--sort-by date` for fresh streams.
|
|
119
|
+
|
|
120
|
+
## Pipelines
|
|
121
|
+
|
|
122
|
+
```bash
|
|
123
|
+
# Search → top 5 video transcripts as a corpus
|
|
124
|
+
hasdata youtube-search-api --q "$Q" --sort-by views --raw \
|
|
125
|
+
| jq -r '.videoResults[:5][].videoId' \
|
|
126
|
+
| while read -r vid; do
|
|
127
|
+
echo "=== $vid ==="
|
|
128
|
+
hasdata youtube-transcript-api --v-param "$vid" --raw \
|
|
129
|
+
| jq -r '.transcript[].snippet'
|
|
130
|
+
done > corpus.txt
|
|
131
|
+
|
|
132
|
+
# Channel → CSV of all videos
|
|
133
|
+
hasdata youtube-channel-api --channel-id "@$HANDLE" --tab videos --raw \
|
|
134
|
+
| jq -r '.sections[].items[]? | [.videoId, .title, (.extractedViews // 0), .publishedDate] | @csv' \
|
|
135
|
+
> channel_videos.csv
|
|
136
|
+
```
|
|
137
|
+
|
|
138
|
+
## Gotchas
|
|
139
|
+
|
|
140
|
+
- **`--v-param` is exactly 11 chars** — not the full watch URL. Strip the `v=` value or use `jq -r '.videoResults[0].videoId'` from a prior search.
|
|
141
|
+
- **`--language-code` must exist on the video.** Pass one of the codes listed in `availableTranscripts[]`, or omit for the default track.
|
|
142
|
+
- **`--type asr` is needed when no human-authored captions exist.** Without it the API returns the default human track and errors if there isn't one.
|
|
143
|
+
- **`@handle` is preferred over `UC…`** — easier to read, same result. Legacy `/c/` and `/user/` slugs also resolve.
|
|
144
|
+
- **`pagination.nextPageToken` is opaque** — pass it back via `--pagination-token` verbatim; don't try to decode.
|
|
145
|
+
- **`views` vs `extractedViews`** — `views` is the formatted string (`"1.2M views"`), `extractedViews` is the integer. Use the integer for math.
|