crawlora 1.5.0.pre.sdk.2 → 1.6.0.pre.sdk.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: d5df185ab17cbdb8ee244385b900049f2209efe980e1a672f39a7f61d18bfd35
- data.tar.gz: af839794521c83c882affd859e4929ec76225144e45bb3f29141795b0fbb9b69
+ metadata.gz: '06503911970d596c0e191164b365247bde58a78ce2cd840c0ff18492b9d1cc6b'
+ data.tar.gz: bf80854532f824cf8ce73881ed7c978fcd81073f178505f91e3723179ff6b7aa
  SHA512:
- metadata.gz: ffd5f36fde004299e22e4f07d4c19e9904032d0ff78ee7c9a57b9d4d49964a2fba2fc0e7e14f412e9814b1c64e3d599c63259b69a6173dfbf50a71414354bd87
- data.tar.gz: 2d529c261a355e69021267c41f520db47c0a8a399d107108ba08d3fe839158ef6929ef88986c6205592a144980a2f977af4eceeb42fa20ee2a2eb74568232874
+ metadata.gz: 14ee5fee236698951fa031eb8d86469fc481b27f18394d9dbffb6ba40301229f3a9a1806f2323c0576945e906cd961cb1a9576333b064354f497d7a5e52fc3c5
+ data.tar.gz: 9cad2508981712b0146393781dcbad62a28ab93ef17fae77f0159f97f4fef8c96ccc93aa2adcfa8bc8ed3ca9f00861ac3acf41b9bb1c3e8bcabd51e2d8ed755c
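`checksums.yaml` records SHA256 and SHA512 digests of the two archives packed inside the `.gem` file, which is a plain tar archive. The following is a minimal sketch for checking a downloaded copy against the SHA256 values above. It assumes the gem has already been unpacked (for example with `tar -xf crawlora-1.6.0.pre.sdk.1.gem`) so that `metadata.gz` and `data.tar.gz` sit in one directory. `mismatched_archives` is a hypothetical helper, not part of RubyGems or the crawlora gem.

```ruby
require "digest"

# Hypothetical helper: compare each unpacked archive's SHA256 digest with the
# value recorded under "SHA256:" in checksums.yaml. Returns the names of the
# archives that are missing or whose digest differs (empty array = all match).
def mismatched_archives(dir, expected)
  expected.reject do |name, hex|
    path = File.join(dir, name)
    File.file?(path) && Digest::SHA256.file(path).hexdigest == hex
  end.keys
end

# Usage: paste the new (+) SHA256 values from the hunk above.
# mismatched_archives("crawlora-unpacked", {
#   "metadata.gz" => "<sha256 from checksums.yaml>",
#   "data.tar.gz" => "<sha256 from checksums.yaml>"
# })
```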
data/CHANGELOG.md CHANGED
@@ -1,5 +1,20 @@
  # Changelog
 
+ ## 1.6.0-sdk.1
+
+ - Added the **Reddit** platform (`reddit.search`, `reddit.post`,
+ `reddit.comments`, `reddit.subreddit_posts`) and the **Brand** platform
+ (`brand.retrieve`), plus Yahoo Finance `yahoo_finance.lookup`. Regenerated from
+ the public API contract.
+
+ ## 1.5.0-sdk.3
+
+ - Richer RBS: generated `sig/crawlora.rbs` now declares typed keyword parameters
+ per operation (Steep/Sorbet users get real signatures instead of `**untyped`).
+ - Internal cleanups: split the request and pagination methods into focused
+ private helpers, enabled tuned rubocop metric budgets, and hardened multipart
+ `Content-Disposition` field/filename escaping. No public API changes.
+
  ## 1.5.0-sdk.2
 
  - Packaging: point the gem homepage at https://crawlora.net/, expand the gem
data/README.md CHANGED
@@ -11,13 +11,17 @@ plus retries, pagination, middleware hooks, and client-side rate limiting.
 
  ## Install
 
+ Published on [RubyGems](https://rubygems.org/gems/crawlora). The current release
+ is a prerelease (`1.5.0.pre.sdk.3`), so install it with `--pre` or pin the
+ version:
+
  ```ruby
  # Gemfile
- gem "crawlora"
+ gem "crawlora", "1.5.0.pre.sdk.3"
  ```
 
  ```sh
- gem install crawlora
+ gem install crawlora --pre
  ```
 
27
  ## Quick start
data/docs/operations.md CHANGED
@@ -2,7 +2,7 @@
 
  Generated from `openapi/public.json`. Deprecated, admin, and internal operations are excluded from this SDK contract.
 
- Total operations: `330`
+ Total operations: `336`
 
  | Group | SDK method | Operation ID | HTTP | Params | Auth | Response | Notes |
  | --- | --- | --- | --- | --- | --- | --- | --- |
@@ -41,6 +41,7 @@ Total operations: `330`
  | bing | `bing.search` | `bing-search` | `GET /bing/search` | `q` (query String required)<br>`page` (query Integer)<br>`count` (query Integer)<br>`country` (query String)<br>`lang` (query String) | `ApiKeyAuth` | `BingSearchResponse` | |
  | bing | `bing.suggest` | `bing-suggest` | `GET /bing/suggest` | `q` (query String required)<br>`count` (query Integer)<br>`country` (query String)<br>`lang` (query String) | `ApiKeyAuth` | `BingSuggestResponse` | |
  | bing | `bing.videos` | `bing-videos` | `GET /bing/videos` | `q` (query String required)<br>`page` (query Integer)<br>`count` (query Integer)<br>`country` (query String)<br>`lang` (query String) | `ApiKeyAuth` | `BingVideosResponse` | |
+ | brand | `brand.retrieve` | `brand-retrieve` | `GET /brand/retrieve` | `domain` (query String required)<br>`force_language` (query String)<br>`maxSpeed` (query bool)<br>`maxAgeMs` (query Integer)<br>`timeoutMS` (query Integer) | `ApiKeyAuth` | `BrandRetrieveResponse` | |
  | brave | `brave.images` | `brave-images` | `GET /brave/images` | `q` (query String required)<br>`offset` (query Integer)<br>`count` (query Integer)<br>`country` (query "all" \| "ar" \| "at" \| "au" \| "be" \| "br" \| "ca" \| "ch" \| "cl" \| "cn" \| "de" \| "dk" \| "es" \| "fi" \| "fr" \| "gb" \| "gr" \| "hk" \| "id" \| "in" \| "it" \| "jp" \| "kr" \| "mx" \| "my" \| "nl" \| "no" \| "nz" \| "ph" \| "pl" \| "pt" \| "ru" \| "sa" \| "se" \| "sg" \| "tr" \| "tw" \| "us" \| "za")<br>`lang` (query "de-de" \| "en-ca" \| "en-gb" \| "en-in" \| "en-us" \| "fi-fi" \| "fr-ca" \| "fr-fr" \| "ja-jp" \| "pt-br" \| "sq-al" \| "sw-ke" \| "zh-tw") | `ApiKeyAuth` | `BraveImagesResponse` | |
  | brave | `brave.news` | `brave-news` | `GET /brave/news` | `q` (query String required)<br>`offset` (query Integer)<br>`count` (query Integer)<br>`country` (query "all" \| "ar" \| "at" \| "au" \| "be" \| "br" \| "ca" \| "ch" \| "cl" \| "cn" \| "de" \| "dk" \| "es" \| "fi" \| "fr" \| "gb" \| "gr" \| "hk" \| "id" \| "in" \| "it" \| "jp" \| "kr" \| "mx" \| "my" \| "nl" \| "no" \| "nz" \| "ph" \| "pl" \| "pt" \| "ru" \| "sa" \| "se" \| "sg" \| "tr" \| "tw" \| "us" \| "za")<br>`lang` (query "de-de" \| "en-ca" \| "en-gb" \| "en-in" \| "en-us" \| "fi-fi" \| "fr-ca" \| "fr-fr" \| "ja-jp" \| "pt-br" \| "sq-al" \| "sw-ke" \| "zh-tw")<br>`time_range` (query "any" \| "day" \| "week" \| "month" \| "year" \| "custom")<br>`date_from` (query String)<br>`date_to` (query String) | `ApiKeyAuth` | `BraveNewsResponse` | |
  | brave | `brave.search` | `brave-search` | `GET /brave/search` | `q` (query String required)<br>`offset` (query Integer)<br>`country` (query "all" \| "ar" \| "at" \| "au" \| "be" \| "br" \| "ca" \| "ch" \| "cl" \| "cn" \| "de" \| "dk" \| "es" \| "fi" \| "fr" \| "gb" \| "gr" \| "hk" \| "id" \| "in" \| "it" \| "jp" \| "kr" \| "mx" \| "my" \| "nl" \| "no" \| "nz" \| "ph" \| "pl" \| "pt" \| "ru" \| "sa" \| "se" \| "sg" \| "tr" \| "tw" \| "us" \| "za")<br>`lang` (query "de-de" \| "en-ca" \| "en-gb" \| "en-in" \| "en-us" \| "fi-fi" \| "fr-ca" \| "fr-fr" \| "ja-jp" \| "pt-br" \| "sq-al" \| "sw-ke" \| "zh-tw")<br>`time_range` (query "any" \| "day" \| "week" \| "month" \| "year" \| "custom")<br>`date_from` (query String)<br>`date_to` (query String) | `ApiKeyAuth` | `BraveSearchResponse` | |
@@ -167,6 +168,10 @@ Total operations: `330`
  | product_hunt | `product_hunt.makers` | `producthunt-makers` | `GET /producthunt/product/{id}/makers` | `id` (path String required)<br>`cursor` (query String) | `ApiKeyAuth` | `ProductHuntMakersResponse` | |
  | product_hunt | `product_hunt.reviews` | `producthunt-reviews` | `GET /producthunt/product/{id}/reviews` | `id` (path String required) | `ApiKeyAuth` | `ProductHuntReviewsResponse` | |
  | product_hunt | `product_hunt.search` | `producthunt-search` | `GET /producthunt/search` | `query` (query String required)<br>`type` (query "product" \| "user" \| "launch")<br>`page` (query Integer)<br>`featured` (query bool)<br>`topics` (query String) | `ApiKeyAuth` | `ProductHuntSearchResponse` | |
+ | reddit | `reddit.comments` | `reddit-comments` | `GET /reddit/comments/{id}` | `id` (path String required)<br>`sort` (query "confidence" \| "top" \| "new" \| "controversial" \| "old" \| "qa")<br>`limit` (query Integer)<br>`depth` (query Integer) | `ApiKeyAuth` | `RedditCommentsResponse` | |
+ | reddit | `reddit.post` | `reddit-post` | `GET /reddit/post/{id}` | `id` (path String required) | `ApiKeyAuth` | `RedditPostResponse` | |
+ | reddit | `reddit.search` | `reddit-search` | `GET /reddit/search` | `q` (query String required)<br>`subreddit` (query String)<br>`sort` (query "relevance" \| "hot" \| "new" \| "top" \| "comments")<br>`time` (query "hour" \| "day" \| "week" \| "month" \| "year" \| "all")<br>`limit` (query Integer)<br>`after` (query String) | `ApiKeyAuth` | `RedditSearchResponse` | |
+ | reddit | `reddit.subreddit_posts` | `reddit-subreddit-posts` | `GET /reddit/subreddit/{subreddit}/posts` | `subreddit` (path String required)<br>`sort` (query "hot" \| "new" \| "top" \| "rising")<br>`time` (query "hour" \| "day" \| "week" \| "month" \| "year" \| "all")<br>`limit` (query Integer)<br>`after` (query String) | `ApiKeyAuth` | `RedditSubredditPostsResponse` | |
  | referrals | `referrals.click` | `referrals-click` | `POST /referrals/click` | `request` (body String required) | none | `ReferralsClickResponse` | |
  | referrals | `referrals.me` | `referrals-me` | `GET /referrals/me` | none | `JWTAuth` | `ReferralsMeResponse` | |
  | referrals | `referrals.me_events` | `referrals-me-events` | `GET /referrals/me/events` | `limit` (query Integer) | `JWTAuth` | `ReferralsMeEventsResponse` | |
@@ -274,10 +279,10 @@ Total operations: `330`
  | trustpilot | `trustpilot.categories` | `trustpilot-categories` | `GET /trustpilot/categories` | none | `ApiKeyAuth` | `TrustpilotCategoriesResponse` | |
  | trustpilot | `trustpilot.category_search` | `trustpilot-category-search` | `GET /trustpilot/categories/search` | `q` (query String required)<br>`country` (query String)<br>`locale` (query String)<br>`size` (query Integer) | `ApiKeyAuth` | `TrustpilotCategorySearchResponse` | |
  | trustpilot | `trustpilot.category` | `trustpilot-category` | `GET /trustpilot/category/{slug}` | `slug` (path String required)<br>`page` (query Integer) | `ApiKeyAuth` | `TrustpilotCategoryResponse` | |
- | usage | `usage.me_endpoints` | `usage-me-endpoints` | `GET /usage/me/endpoints` | `range` (query "period" \| "day" \| "week" \| "month" \| "custom")<br>`limit` (query Integer)<br>`from` (query String)<br>`to` (query String) | `JWTAuth` | `UsageMeEndpointsResponse` | |
- | usage | `usage.me_overview` | `usage-me-overview` | `GET /usage/me/overview` | `range` (query "period" \| "day" \| "week" \| "month" \| "custom")<br>`from` (query String)<br>`to` (query String) | `JWTAuth` | `UsageMeOverviewResponse` | |
- | usage | `usage.me_recent_ips` | `usage-me-recent-ips` | `GET /usage/me/recent-ips` | `range` (query "period" \| "day" \| "week" \| "month" \| "custom")<br>`limit` (query Integer)<br>`from` (query String)<br>`to` (query String) | `JWTAuth` | `UsageMeRecentIpsResponse` | |
- | usage | `usage.me_timeseries` | `usage-me-timeseries` | `GET /usage/me/timeseries` | `range` (query "period" \| "day" \| "week" \| "month" \| "custom")<br>`bucket` (query "hour" \| "day")<br>`endpoint` (query String)<br>`from` (query String)<br>`to` (query String) | `JWTAuth` | `UsageMeTimeseriesResponse` | |
+ | usage | `usage.me_endpoints` | `usage-me-endpoints` | `GET /usage/me/endpoints` | `range` (query "period" \| "day" \| "week" \| "month" \| "custom")<br>`limit` (query Integer)<br>`from` (query String)<br>`to` (query String) | `ApiKeyAuth` | `UsageMeEndpointsResponse` | |
+ | usage | `usage.me_overview` | `usage-me-overview` | `GET /usage/me/overview` | `range` (query "period" \| "day" \| "week" \| "month" \| "custom")<br>`from` (query String)<br>`to` (query String) | `ApiKeyAuth` | `UsageMeOverviewResponse` | |
+ | usage | `usage.me_recent_ips` | `usage-me-recent-ips` | `GET /usage/me/recent-ips` | `range` (query "period" \| "day" \| "week" \| "month" \| "custom")<br>`limit` (query Integer)<br>`from` (query String)<br>`to` (query String) | `ApiKeyAuth` | `UsageMeRecentIpsResponse` | |
+ | usage | `usage.me_timeseries` | `usage-me-timeseries` | `GET /usage/me/timeseries` | `range` (query "period" \| "day" \| "week" \| "month" \| "custom")<br>`bucket` (query "hour" \| "day")<br>`endpoint` (query String)<br>`from` (query String)<br>`to` (query String) | `ApiKeyAuth` | `UsageMeTimeseriesResponse` | |
  | user | `user.me` | `user-me` | `GET /user/me` | none | `JWTAuth` | `UserMeResponse` | |
  | user | `user.me_api_keys` | `user-me-api-keys` | `GET /user/me/api-keys` | none | `JWTAuth` | `UserMeApiKeysResponse` | |
  | user | `user.me_api_keys_rotate` | `user-me-api-keys-rotate` | `POST /user/me/api-keys/rotate` | none | `JWTAuth` | `UserMeApiKeysRotateResponse` | |
@@ -287,6 +292,7 @@ Total operations: `330`
  | yahoo_finance | `yahoo_finance.download` | `yahoo-finance-download` | `POST /yahoo-finance/download` | `request` (body String required) | `ApiKeyAuth` | `YahooFinanceDownloadResponse` | |
  | yahoo_finance | `yahoo_finance.industries` | `yahoo-finance-industries` | `GET /yahoo-finance/industries` | none | `ApiKeyAuth` | `YahooFinanceIndustriesResponse` | |
  | yahoo_finance | `yahoo_finance.industry` | `yahoo-finance-industry` | `GET /yahoo-finance/industries/{key}` | `key` (path String required) | `ApiKeyAuth` | `YahooFinanceIndustryResponse` | |
+ | yahoo_finance | `yahoo_finance.lookup` | `yahoo-finance-lookup` | `GET /yahoo-finance/lookup` | `query` (query String required)<br>`type` (query "all" \| "equity" \| "etf" \| "mutualfund" \| "index" \| "future" \| "currency" \| "cryptocurrency")<br>`count` (query Integer)<br>`start` (query Integer) | `ApiKeyAuth` | `YahooFinanceLookupResponse` | |
  | yahoo_finance | `yahoo_finance.market_status` | `yahoo-finance-market-status` | `GET /yahoo-finance/market/{market}/status` | `market` (path String required) | `ApiKeyAuth` | `YahooFinanceMarketStatusResponse` | |
  | yahoo_finance | `yahoo_finance.market_summary` | `yahoo-finance-market-summary` | `GET /yahoo-finance/market/{market}/summary` | `market` (path String required) | `ApiKeyAuth` | `YahooFinanceMarketSummaryResponse` | |
  | yahoo_finance | `yahoo_finance.screener_custom` | `yahoo-finance-screener-custom` | `POST /yahoo-finance/screener` | `request` (body String required) | `ApiKeyAuth` | `YahooFinanceScreenerCustomResponse` | |
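Each new row follows the table's existing convention: `path` parameters are substituted into the route template, and `query` parameters are sent as a query string. The following is a minimal sketch of that mapping for the new Reddit routes, assuming simple `{name}` placeholders. `expand_operation_path` is hypothetical and not necessarily how the gem's internal `build_request` works.

```ruby
require "erb"
require "uri"

# Hypothetical helper (not the gem's build_request): substitute {name}
# placeholders with percent-encoded path parameters and send every remaining
# parameter as a query string, following the table's Params column.
def expand_operation_path(template, params)
  query = params.transform_keys(&:to_s)
  path = template.gsub(/\{(\w+)\}/) do
    name = Regexp.last_match(1)
    value = query.delete(name) { raise ArgumentError, "missing path parameter: #{name}" }
    ERB::Util.url_encode(value.to_s)
  end
  query.empty? ? path : "#{path}?#{URI.encode_www_form(query)}"
end

puts expand_operation_path("/reddit/subreddit/{subreddit}/posts", subreddit: "ruby", sort: "top", limit: 10)
# prints /reddit/subreddit/ruby/posts?sort=top&limit=10
```

This sketch does not validate parameter names or enum values, such as the `sort` choices listed in the table.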
data/docs/recipes.md CHANGED
@@ -17,6 +17,15 @@ Crawlora.client(jwt_token: "eyJ…")
  Both fall back to environment variables: `CRAWLORA_API_KEY` and
  `CRAWLORA_BASE_URL`.
 
+ ## Reddit and Brand
+
+ Newer platforms are grouped like every other endpoint:
+
+ ```ruby
+ posts = client.reddit.search(q: "ruby", subreddit: "programming")
+ brand = client.brand.retrieve(domain: "stripe.com")
+ ```
+
  ## Retries and Retry-After
 
  ```ruby
@@ -167,7 +167,7 @@ module Crawlora
  @on_retry = on_retry
  @request_id = request_id
  @idempotency_keys = idempotency_keys
- @rate_limiter = rate_limit || max_concurrency ? RateLimiter.new(rate_limit, max_concurrency) : nil
+ @rate_limiter = (rate_limit || max_concurrency) ? RateLimiter.new(rate_limit, max_concurrency) : nil
  @logger = logger
  @before_request = as_hook_list(before_request)
  @after_response = as_hook_list(after_response)
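The `@rate_limiter` change only adds parentheses. In Ruby, `||` binds more tightly than the `?:` operator, and so does `&&` (as in the idempotency-key hunk below). The old and new lines therefore parse identically, which fits the changelog's "No public API changes". The following standalone comparison demonstrates this; `limiter_choice` is a hypothetical wrapper.

```ruby
# `||` binds more tightly than `?:`, so both spellings of the @rate_limiter
# assignment build the limiter under exactly the same conditions.
# `limiter_choice` is a hypothetical wrapper used only for this comparison.
def limiter_choice(rate_limit, max_concurrency)
  old_style = rate_limit || max_concurrency ? :limiter : nil
  new_style = (rate_limit || max_concurrency) ? :limiter : nil
  [old_style, new_style]
end

[[nil, nil], [10, nil], [nil, 4], [10, 4]].each do |args|
  old_style, new_style = limiter_choice(*args)
  puts "#{args.inspect}: old=#{old_style.inspect} new=#{new_style.inspect}"
end
```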
@@ -200,7 +200,7 @@ module Crawlora
  log(event: "request", operation: operation_id)
  max_retries = retries.nil? ? @retries : [0, retries.to_i].max
  idempotency_key =
- @idempotency_keys && %w[POST PATCH].include?(operation["method"]) ? SecureRandom.hex(16) : nil
+ (@idempotency_keys && %w[POST PATCH].include?(operation["method"])) ? SecureRandom.hex(16) : nil
 
  attempt = 0
  loop do
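The idempotency key is generated only when the option is enabled and the HTTP method is POST or PATCH. It is assigned before `attempt = 0` and the retry `loop do`, so presumably every retry of one logical call reuses the same key; the loop body is not shown in this diff. The following sketch isolates the rule. `idempotency_key_for` is a hypothetical stand-in.

```ruby
require "securerandom"

# Hypothetical stand-in for the rule in the hunk above: generate a fresh key
# only when idempotency keys are enabled and the HTTP method is POST or PATCH.
def idempotency_key_for(http_method, enabled)
  (enabled && %w[POST PATCH].include?(http_method)) ? SecureRandom.hex(16) : nil
end

puts idempotency_key_for("POST", true).length # prints 32 (16 random bytes as hex)
p idempotency_key_for("GET", true)            # prints nil
```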
@@ -225,8 +225,8 @@ module Crawlora
  # +next_cursor+ extractor) sends the cursor parameter and stops when
  # +next_cursor+ returns a falsy value.
  def paginate(operation_id, params = {}, page_param: nil, cursor_param: nil, next_cursor: nil,
- start: nil, step: 1, max_pages: nil, response_type: "auto", timeout: nil, headers: nil)
- unless block_given?
+ start: nil, step: 1, max_pages: nil, response_type: "auto", timeout: nil, headers: nil, &block)
+ unless block
  return enum_for(:paginate, operation_id, params, page_param: page_param, cursor_param: cursor_param,
  next_cursor: next_cursor, start: start, step: step, max_pages: max_pages,
  response_type: response_type, timeout: timeout, headers: headers)
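Replacing `block_given?` with an explicit `&block` parameter lets `paginate` pass its block on to the new private helpers (`paginate_cursor(..., &block)` and `paginate_numeric(..., &block)`). Meanwhile, `enum_for` still returns an Enumerator when no block is given. The following shows the same pattern in isolation, using a made-up `PageWalker` class.

```ruby
# Made-up class showing the pattern from the paginate change: capture the block
# as &block, return an Enumerator when it is absent, otherwise forward it.
class PageWalker
  def initialize(pages)
    @pages = pages
  end

  def each_page(&block)
    return enum_for(:each_page) unless block

    walk(&block) # forward the captured block, as paginate does
  end

  private

  # Receives the forwarded block and yields to it, like paginate_cursor.
  def walk
    @pages.each { |page| yield page }
  end
end

walker = PageWalker.new(%w[page1 page2 page3])
walker.each_page { |page| puts page }
p walker.each_page.first(2) # prints ["page1", "page2"]
```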
@@ -236,29 +236,56 @@ module Crawlora
  raise ArgumentError, "unknown Crawlora operation: #{operation_id}" if operation.nil?
 
  base_params = stringify_keys(params)
+ opts = { response_type: response_type, timeout: timeout, headers: headers }
 
  if cursor_param || next_cursor
- raise ArgumentError, "cursor pagination requires both cursor_param and next_cursor" unless cursor_param && next_cursor
+ paginate_cursor(operation_id, operation, base_params, cursor_param: cursor_param, next_cursor: next_cursor,
+ start: start, max_pages: max_pages, opts: opts, &block)
+ else
+ paginate_numeric(operation_id, operation, base_params, page_param: page_param, start: start, step: step,
+ max_pages: max_pages, opts: opts, &block)
+ end
+ end
 
- query_names = (operation["queryParams"] || []).map { |p| p["name"] }
- unless query_names.include?(cursor_param)
- raise ArgumentError, "cursor_param #{cursor_param.inspect} is not a query parameter of operation #{operation_id}"
- end
+ # Yield individual items across pages. +items+ extracts the list from a page
+ # (default: the Crawlora +data+ array).
+ def paginate_items(operation_id, params = {}, items: nil, **options, &block)
+ return enum_for(:paginate_items, operation_id, params, items: items, **options) unless block_given?
 
- cursor = start
- fetched = 0
- while max_pages.nil? || fetched < max_pages
- page_params = base_params.dup
- page_params[cursor_param] = cursor unless cursor.nil?
- response = request(operation_id, page_params, response_type: response_type, timeout: timeout, headers: headers)
- yield response
- fetched += 1
- cursor = next_cursor.call(response)
- break unless cursor && !(cursor.respond_to?(:empty?) && cursor.empty?)
- end
- return
+ extract = items || Pagination.method(:default_items)
+ paginate(operation_id, params, **options) do |page|
+ extract.call(page).each(&block)
+ end
+ end
+
+ private
+
+ # Yield successive pages by advancing a cursor query parameter until
+ # +next_cursor+ returns a blank value.
+ def paginate_cursor(operation_id, operation, base_params, cursor_param:, next_cursor:, start:, max_pages:, opts:)
+ raise ArgumentError, "cursor pagination requires both cursor_param and next_cursor" unless cursor_param && next_cursor
+
+ query_names = (operation["queryParams"] || []).map { |p| p["name"] }
+ unless query_names.include?(cursor_param)
+ raise ArgumentError, "cursor_param #{cursor_param.inspect} is not a query parameter of operation #{operation_id}"
+ end
+
+ cursor = start
+ fetched = 0
+ while max_pages.nil? || fetched < max_pages
+ page_params = base_params.dup
+ page_params[cursor_param] = cursor unless cursor.nil?
+ response = request(operation_id, page_params, **opts)
+ yield response
+ fetched += 1
+ cursor = next_cursor.call(response)
+ break unless cursor && !(cursor.respond_to?(:empty?) && cursor.empty?)
  end
+ end
 
+ # Yield successive pages by advancing the page/offset query parameter until
+ # a page comes back empty.
+ def paginate_numeric(operation_id, operation, base_params, page_param:, start:, step:, max_pages:, opts:)
  page_param ||= Pagination.detect_page_param(operation)
  raise ArgumentError, "operation #{operation_id} has no page or offset query parameter to paginate" unless page_param
 
@@ -266,7 +293,7 @@ module Crawlora
  fetched = 0
  while max_pages.nil? || fetched < max_pages
  page_params = base_params.merge(page_param => page_value)
- response = request(operation_id, page_params, response_type: response_type, timeout: timeout, headers: headers)
+ response = request(operation_id, page_params, **opts)
  yield response
  fetched += 1
  break if Pagination.page_empty?(response)
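`paginate_cursor` (earlier in this diff) and the numeric loop shown here differ mainly in their stop condition. The cursor variant ends when `next_cursor` returns `nil`, `false`, or an empty value, while the numeric variant ends when `Pagination.page_empty?` reports an empty page. The following is a standalone sketch of the cursor loop with the HTTP request replaced by a lambda. `each_cursor_page`, `fetch_page`, and the fake page shape are assumptions. The new `reddit.search` and `reddit.subreddit_posts` operations do take an `after` query parameter, but this diff does not document the response field that carries the next cursor.

```ruby
# Standalone sketch of the cursor loop in paginate_cursor. The HTTP request is
# replaced by a caller-supplied `fetch_page` lambda (an assumption for
# illustration); the stop test is the same expression as in the SDK code.
def each_cursor_page(fetch_page, next_cursor:, start: nil, max_pages: nil)
  cursor = start
  fetched = 0
  while max_pages.nil? || fetched < max_pages
    response = fetch_page.call(cursor)
    yield response
    fetched += 1
    cursor = next_cursor.call(response)
    # Stop on nil/false or on a blank value such as "" or [].
    break unless cursor && !(cursor.respond_to?(:empty?) && cursor.empty?)
  end
end

# Invented two-page result set keyed by cursor (nil = first page). The "after"
# field only mimics the `after` parameter of reddit.search.
fake_pages = {
  nil => { "data" => %w[a b], "after" => "t3_next" },
  "t3_next" => { "data" => %w[c], "after" => "" }
}
items = []
each_cursor_page(->(cursor) { fake_pages.fetch(cursor) }, next_cursor: ->(page) { page["after"] }) do |page|
  items.concat(page["data"])
end
p items # prints ["a", "b", "c"]
```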
@@ -275,21 +302,30 @@ module Crawlora
  end
  end
 
- # Yield individual items across pages. +items+ extracts the list from a page
- # (default: the Crawlora +data+ array).
- def paginate_items(operation_id, params = {}, items: nil, **options, &block)
- return enum_for(:paginate_items, operation_id, params, items: items, **options) unless block_given?
+ def send_request(operation, params, response_type:, timeout:, headers:, idempotency_key: nil)
+ url, body, body_headers = build_request(@base_url, operation, params)
+ request_headers, req_id = prepare_request(operation, body_headers, headers, idempotency_key)
+ unless @before_request.empty?
+ ctx = { operation: operation["id"], method: operation["method"], url: url, headers: request_headers }
+ @before_request.each { |hook| hook.call(ctx) }
+ url = ctx[:url]
+ request_headers = ctx[:headers]
+ end
 
- extract = items || Pagination.method(:default_items)
- paginate(operation_id, params, **options) do |page|
- extract.call(page).each(&block)
+ request_timeout = timeout.nil? ? @timeout : timeout
+ begin
+ response = call_transport(method: operation["method"], url: url, headers: request_headers, body: body, timeout: request_timeout)
+ rescue StandardError => e
+ message = timeout_error?(e) ? "Crawlora request timed out" : "Crawlora transport error"
+ raise NetworkError.new(message, request_id: req_id, cause: e)
  end
- end
 
- private
+ handle_response(operation, response, response_type, req_id)
+ end
 
- def send_request(operation, params, response_type:, timeout:, headers:, idempotency_key: nil)
- url, body, body_headers = build_request(@base_url, operation, params)
+ # Build the merged request headers and resolve the request id, attaching an
+ # idempotency key when one was generated.
+ def prepare_request(operation, body_headers, headers, idempotency_key)
  request_headers = merge_headers(
  @headers,
  auth_headers(operation["security"] || [], @api_key, @jwt_token),
@@ -301,53 +337,47 @@ module Crawlora
  if @request_id
  ensure_request_id(request_headers)
  else
- v = header_value(request_headers, "x-request-id")
- v.empty? ? nil : v
+ existing = header_value(request_headers, "x-request-id")
+ existing.empty? ? nil : existing
  end
  request_headers["Idempotency-Key"] = idempotency_key if idempotency_key && header_value(request_headers, "idempotency-key").empty?
- unless @before_request.empty?
- ctx = { operation: operation["id"], method: operation["method"], url: url, headers: request_headers }
- @before_request.each { |hook| hook.call(ctx) }
- url = ctx[:url]
- request_headers = ctx[:headers]
- end
+ [request_headers, req_id]
+ end
 
- request_timeout = timeout.nil? ? @timeout : timeout
- begin
- response =
- if @rate_limiter
- @rate_limiter.run do
- @transport.call(method: operation["method"], url: url, headers: request_headers, body: body, timeout: request_timeout)
- end
- else
- @transport.call(method: operation["method"], url: url, headers: request_headers, body: body, timeout: request_timeout)
- end
- rescue StandardError => e
- message = timeout_error?(e) ? "Crawlora request timed out" : "Crawlora transport error"
- raise NetworkError.new(message, request_id: req_id, cause: e)
- end
+ def call_transport(method:, url:, headers:, body:, timeout:)
+ call = -> { @transport.call(method: method, url: url, headers: headers, body: body, timeout: timeout) }
+ @rate_limiter ? @rate_limiter.run(&call) : call.call
+ end
 
+ # Parse the response, raise the typed API error on non-2xx, and run the
+ # after_response hooks on success.
+ def handle_response(operation, response, response_type, req_id)
  raw_body = response.body.to_s
  is_error = response.status < 200 || response.status >= 300
- return StringIO.new(response.body.to_s) if response_type == "stream" && !is_error
+ return StringIO.new(raw_body) if response_type == "stream" && !is_error
 
- parse_mode = response_type == "stream" ? "auto" : response_type
+ parse_mode = (response_type == "stream") ? "auto" : response_type
  begin
- parsed = parse_response(response.body.to_s, header_value(response.headers, "content-type"), parse_mode)
+ parsed = parse_response(raw_body, header_value(response.headers, "content-type"), parse_mode)
  rescue JSON::ParserError => e
  raise Error.new("Crawlora JSON parse error", status: response.status, raw_body: raw_body,
  headers: response.headers, request_id: req_id, cause: e)
  end
 
- if is_error
- code = parsed.is_a?(Hash) ? parsed["code"] : nil
- message = parsed.is_a?(Hash) && parsed["msg"] && !parsed["msg"].to_s.empty? ? parsed["msg"] : "HTTP #{response.status}"
- raise Crawlora.error_class_for(response.status).new(
- message, status: response.status, code: code, body: parsed,
- raw_body: raw_body, headers: response.headers, request_id: req_id
- )
- end
+ raise_api_error(response, parsed, raw_body, req_id) if is_error
+ run_after_response(operation, response, parsed)
+ end
+
+ def raise_api_error(response, parsed, raw_body, req_id)
+ code = parsed.is_a?(Hash) ? parsed["code"] : nil
+ message = (parsed.is_a?(Hash) && parsed["msg"] && !parsed["msg"].to_s.empty?) ? parsed["msg"] : "HTTP #{response.status}"
+ raise Crawlora.error_class_for(response.status).new(
+ message, status: response.status, code: code, body: parsed,
+ raw_body: raw_body, headers: response.headers, request_id: req_id
+ )
+ end
 
+ def run_after_response(operation, response, parsed)
  @after_response.each do |hook|
  result = hook.call(operation["id"], response.status, response.headers, parsed)
  parsed = result unless result.nil?
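`call_transport` collapses the old `if @rate_limiter ... else ... end` into a single lambda. That lambda is either passed to `@rate_limiter.run` as a block or called directly. The following self-contained sketch shows that dispatch. `ConcurrencyLimiter` is a hypothetical stand-in for the gem's `RateLimiter`, whose internals are not shown in this diff, and it enforces only a concurrency cap. `dispatch` is likewise illustrative.

```ruby
# Hypothetical concurrency limiter: at most `max` blocks run at once. It only
# mirrors the `run(&block)` shape used by call_transport; it is not the gem's
# RateLimiter (which also accepts a rate limit).
class ConcurrencyLimiter
  def initialize(max)
    @max = max
    @active = 0
    @mutex = Mutex.new
    @slot_freed = ConditionVariable.new
  end

  def run
    @mutex.synchronize do
      @slot_freed.wait(@mutex) while @active >= @max
      @active += 1
    end
    begin
      yield # the block's value is returned; ensure does not change it
    ensure
      @mutex.synchronize do
        @active -= 1
        @slot_freed.signal
      end
    end
  end
end

# Same dispatch as call_transport: run the wrapped call through the limiter
# when one is configured, otherwise invoke it directly.
def dispatch(limiter, &call)
  limiter ? limiter.run(&call) : call.call
end
```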
@@ -476,18 +506,23 @@ module Crawlora
  chunks << "--#{boundary}\r\n"
  if parameter["type"] == "file"
  filename, data = read_file_value(value)
- chunks << %(Content-Disposition: form-data; name="#{name}"; filename="#{filename}"\r\n)
+ chunks << %(Content-Disposition: form-data; name="#{quote_escape(name)}"; filename="#{quote_escape(filename)}"\r\n)
  chunks << "Content-Type: application/octet-stream\r\n\r\n"
  chunks << data
  chunks << "\r\n"
  else
- chunks << %(Content-Disposition: form-data; name="#{name}"\r\n\r\n#{value}\r\n)
+ chunks << %(Content-Disposition: form-data; name="#{quote_escape(name)}"\r\n\r\n#{value}\r\n)
  end
  end
  chunks << "--#{boundary}--\r\n"
  [chunks, { "content-type" => "multipart/form-data; boundary=#{boundary}" }]
  end
 
+ # Escape characters that would break a multipart Content-Disposition header.
+ def quote_escape(value)
+ value.to_s.gsub("\\", "\\\\\\\\").gsub('"', '\\"').gsub(/[\r\n]/, " ")
+ end
+
  def read_file_value(value)
  return ["upload.bin", value] if value.is_a?(String) && !File.exist?(value)
  return [File.basename(value), File.binread(value)] if value.is_a?(String)
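The new `quote_escape` helper works in three steps. It doubles backslashes first, then backslash-escapes double quotes, then replaces CR and LF with spaces. As a result, a field name or filename can neither close the quoted string early nor start a new header line. The function below is copied from the hunk above, with only the comments added, so that it can be run standalone. The sample filename is invented.

```ruby
# Copy of the quote_escape helper added in this release (comments are new).
# Order matters: backslashes are doubled before quotes gain their own
# backslash, and CR/LF become spaces so no new header line can start.
def quote_escape(value)
  value.to_s.gsub("\\", "\\\\\\\\").gsub('"', '\\"').gsub(/[\r\n]/, " ")
end

filename = "report \"final\".csv\r\nX-Injected: yes"
puts %(Content-Disposition: form-data; name="file"; filename="#{quote_escape(filename)}")
```

In the printed header, the embedded quotes appear as `\"` and the CR/LF pair becomes two spaces, so the header stays on a single line.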