webscraping_ai 2.0.1 → 3.1.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: a9962503617721ab3362a7ba4c45f3bdf346d8c2762f25618e4daffd438f2d3f
4
- data.tar.gz: 0b576d0c56ee0cbaa13690dea3a8740887ab3d76a7eb46f8f48ee7deb1e7a784
3
+ metadata.gz: 4867f8d8db8fd67de9d08cbccff1743f171c7494b45b2732f0a02b0895f2f90f
4
+ data.tar.gz: 15627c6dbe09d5319cc3c078fd00b5f7124c2f103c752b288f8763ec1f166f2f
5
5
  SHA512:
6
- metadata.gz: b396ca118c8f0ee9a95f270b3931522a58725982859722e34ffa4039c671d2eb41961b645a1295fec08493351c1495a912484faaf47a73e3f3ad8ccb48763c0a
7
- data.tar.gz: 4106699b17e5f19ac8d772ce93de5961d1020b02531f4fc28aff954332cb301035090a4b7e37a978be88d2127ac348eda4d3c27d7470bdae0b427237004643dc
6
+ metadata.gz: 84f320a205fce47ff07cf166f4745a7f3a82f70216ada66e1806937b2dc484a827cbb37e535fd04b3cd3fcf91f0517bc6ec01a4f8040dabcbfff0500466636a3
7
+ data.tar.gz: 489addbc3805ffe1ec9a9c60f5a42d7633d56512b2cb8a54dfec81bf98b2a28c5876f4e4c7ece8ecf4477a5fa744fa36d355b53500f557e702fa0426a3e45fcf
data/README.md CHANGED
@@ -2,12 +2,12 @@
2
2
 
3
3
  WebScrapingAI - the Ruby gem for the WebScraping.AI
4
4
 
5
- A client for https://webscraping.ai API. It provides a web scaping automation API with Chrome JS rendering, rotating proxies and builtin HTML parsing.
5
+ WebScraping.AI scraping API provides GPT-powered tools with Chromium JavaScript rendering, rotating proxies, and built-in HTML parsing.
6
6
 
7
7
  This SDK is automatically generated by the [OpenAPI Generator](https://openapi-generator.tech) project:
8
8
 
9
- - API version: 2.0.1
10
- - Package version: 2.0.1
9
+ - API version: 3.1.2
10
+ - Package version: 3.1.2
11
11
  - Build package: org.openapitools.codegen.languages.RubyClientCodegen
12
12
  For more information, please visit [https://webscraping.ai](https://webscraping.ai)
13
13
 
@@ -24,16 +24,16 @@ gem build webscraping_ai.gemspec
24
24
  Then either install the gem locally:
25
25
 
26
26
  ```shell
27
- gem install ./webscraping_ai-2.0.1.gem
27
+ gem install ./webscraping_ai-3.1.2.gem
28
28
  ```
29
29
 
30
- (for development, run `gem install --dev ./webscraping_ai-2.0.1.gem` to install the development dependencies)
30
+ (for development, run `gem install --dev ./webscraping_ai-3.1.2.gem` to install the development dependencies)
31
31
 
32
32
  or publish the gem to a gem hosting service, e.g. [RubyGems](https://rubygems.org/).
33
33
 
34
34
  Finally add this to the Gemfile:
35
35
 
36
- gem 'webscraping_ai', '~> 2.0.1'
36
+ gem 'webscraping_ai', '~> 3.1.2'
37
37
 
38
38
  ### Install from Git
39
39
 
@@ -62,23 +62,34 @@ WebScrapingAI.configure do |config|
62
62
  # Configure API key authorization: api_key
63
63
  config.api_key['api_key'] = 'YOUR API KEY'
64
64
  # Uncomment the following line to set a prefix for the API key, e.g. 'Bearer' (defaults to nil)
65
- #config.api_key_prefix['api_key'] = 'Bearer'
65
+ # config.api_key_prefix['api_key'] = 'Bearer'
66
66
  end
67
67
 
68
- api_instance = WebScrapingAI::HTMLApi.new
69
- url = 'https://example.com' # String | URL of the target page
68
+ api_instance = WebScrapingAI::AIApi.new
69
+ url = 'https://example.com' # String | URL of the target page.
70
70
  opts = {
71
- headers: {'key' => '{\"Cookie\":\"session=some_id\"}'}, # Hash<String, String> | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers=[Another]=value2) or as a JSON encoded object (...&headers={\"One\": \"value1\", \"Another\": \"value2\"})
72
- timeout: 5000, # Integer | Maximum processing time in ms. Increase it in case of timeout errors (5000 by default, maximum is 30000)
73
- js: true, # Boolean | Execute on-page JavaScript using a headless browser (true by default), costs 2 requests
74
- proxy: 'datacenter' # String | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default)
71
+ question: 'What is the summary of this page content?', # String | Question or instructions to ask the LLM model about the target page.
72
+ context_limit: 4000, # Integer | Maximum number of tokens to use as context for the LLM model (4000 by default).
73
+ response_tokens: 100, # Integer | Maximum number of tokens to return in the LLM model response. The total context size (context_limit) includes the question, the target page content and the response, so this parameter reserves tokens for the response (see also on_context_limit).
74
+ on_context_limit: 'truncate', # String | What to do if the context_limit parameter is exceeded (truncate by default). The context is exceeded when the target page content is too long.
75
+ headers: { 'Cookie' => 'session=some_id' }, # Hash<String, String> | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers=[Another]=value2) or as a JSON encoded object (...&headers={\"One\": \"value1\", \"Another\": \"value2\"}).
76
+ timeout: 10000, # Integer | Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000).
77
+ js: true, # Boolean | Execute on-page JavaScript using a headless browser (true by default).
78
+ js_timeout: 2000, # Integer | Maximum JavaScript rendering time in ms. Increase it if you see a loading indicator instead of data on the target page.
79
+ proxy: 'datacenter', # String | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default). Note that residential proxy requests are more expensive than datacenter, see the pricing page for details.
80
+ country: 'us', # String | Country of the proxy to use (US by default). Only available on Startup and Custom plans.
81
+ device: 'desktop', # String | Type of device emulation.
82
+ error_on_404: false, # Boolean | Return error on 404 HTTP status on the target page (false by default).
83
+ error_on_redirect: false, # Boolean | Return error on redirect on the target page (false by default).
84
+ js_script: "document.querySelector('button').click();" # String | Custom JavaScript code to execute on the target page.
75
85
  }
76
86
 
77
87
  begin
78
- #Page HTML by URL
79
- api_instance.get_html(url, opts)
88
+ #Get an answer to a question about a given web page
89
+ result = api_instance.get_question(url, opts)
90
+ p result
80
91
  rescue WebScrapingAI::ApiError => e
81
- puts "Exception when calling HTMLApi->get_html: #{e}"
92
+ puts "Exception when calling AIApi->get_question: #{e}"
82
93
  end
83
94
 
84
95
  ```
@@ -89,23 +100,24 @@ All URIs are relative to *https://api.webscraping.ai*
89
100
 
90
101
  Class | Method | HTTP request | Description
91
102
  ------------ | ------------- | ------------- | -------------
103
+ *WebScrapingAI::AIApi* | [**get_question**](docs/AIApi.md#get_question) | **GET** /ai/question | Get an answer to a question about a given web page
104
+ *WebScrapingAI::AccountApi* | [**account**](docs/AccountApi.md#account) | **GET** /account | Information about your account calls quota
92
105
  *WebScrapingAI::HTMLApi* | [**get_html**](docs/HTMLApi.md#get_html) | **GET** /html | Page HTML by URL
93
- *WebScrapingAI::HTMLApi* | [**post_html**](docs/HTMLApi.md#post_html) | **POST** /html | Page HTML by URL with POST request to the target page
94
106
  *WebScrapingAI::SelectedHTMLApi* | [**get_selected**](docs/SelectedHTMLApi.md#get_selected) | **GET** /selected | HTML of a selected page area by URL and CSS selector
95
107
  *WebScrapingAI::SelectedHTMLApi* | [**get_selected_multiple**](docs/SelectedHTMLApi.md#get_selected_multiple) | **GET** /selected-multiple | HTML of multiple page areas by URL and CSS selectors
96
- *WebScrapingAI::SelectedHTMLApi* | [**post_selected**](docs/SelectedHTMLApi.md#post_selected) | **POST** /selected | HTML of a selected page areas by URL and CSS selector, with POST request to the target page
97
- *WebScrapingAI::SelectedHTMLApi* | [**post_selected_multiple**](docs/SelectedHTMLApi.md#post_selected_multiple) | **POST** /selected-multiple | HTML of multiple page areas by URL and CSS selectors, with POST request to the target page
108
+ *WebScrapingAI::TextApi* | [**get_text**](docs/TextApi.md#get_text) | **GET** /text | Page text by URL
98
109
 
99
110
 
100
111
  ## Documentation for Models
101
112
 
113
+ - [WebScrapingAI::Account](docs/Account.md)
102
114
  - [WebScrapingAI::Error](docs/Error.md)
103
- - [WebScrapingAI::PageError](docs/PageError.md)
104
115
 
105
116
 
106
117
  ## Documentation for Authorization
107
118
 
108
119
 
120
+ Authentication schemes defined for the API:
109
121
  ### api_key
110
122
 
111
123
 
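The `headers` option in the README example is described as reaching the API in one of two forms: nested query parameters, or a single JSON-encoded object. A minimal sketch of both encodings, using only the Ruby standard library. The helper names and sample header values are illustrative assumptions and are not part of the gem. The second key in the description's own example (`headers=[Another]`) is presumably meant as `headers[Another]`, which is what this sketch produces.

```ruby
require 'json'
require 'uri'

# Hypothetical helpers (not part of the webscraping_ai gem).

# Form 1: nested query parameters, e.g. headers[One]=value1&headers[Another]=value2
def nested_headers_query(headers)
  URI.encode_www_form(headers.map { |name, value| ["headers[#{name}]", value] })
end

# Form 2: a single JSON-encoded object, e.g. headers={"One":"value1"}
def json_headers_query(headers)
  URI.encode_www_form('headers' => JSON.generate(headers))
end

# Sample values are invented for illustration.
sample = { 'Cookie' => 'session=some_id', 'Accept-Language' => 'en' }
puts nested_headers_query(sample)
puts json_headers_query(sample)
```

Both helpers percent-encode brackets, quotes, and `=` signs, so the result can be appended to a request URL as-is. When calling through the generated client, passing a plain Ruby hash in `opts[:headers]` should be enough.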
data/docs/AIApi.md ADDED
@@ -0,0 +1,109 @@
1
+ # WebScrapingAI::AIApi
2
+
3
+ All URIs are relative to *https://api.webscraping.ai*
4
+
5
+ | Method | HTTP request | Description |
6
+ | ------ | ------------ | ----------- |
7
+ | [**get_question**](AIApi.md#get_question) | **GET** /ai/question | Get an answer to a question about a given web page |
8
+
9
+
10
+ ## get_question
11
+
12
+ > String get_question(url, opts)
13
+
14
+ Get an answer to a question about a given web page
15
+
16
+ Returns the answer in plain text. Proxies and Chromium JavaScript rendering are used for page retrieval and processing, then the answer is extracted using an LLM model.
17
+
18
+ ### Examples
19
+
20
+ ```ruby
21
+ require 'time'
22
+ require 'webscraping_ai'
23
+ # setup authorization
24
+ WebScrapingAI.configure do |config|
25
+ # Configure API key authorization: api_key
26
+ config.api_key['api_key'] = 'YOUR API KEY'
27
+ # Uncomment the following line to set a prefix for the API key, e.g. 'Bearer' (defaults to nil)
28
+ # config.api_key_prefix['api_key'] = 'Bearer'
29
+ end
30
+
31
+ api_instance = WebScrapingAI::AIApi.new
32
+ url = 'https://example.com' # String | URL of the target page.
33
+ opts = {
34
+ question: 'What is the summary of this page content?', # String | Question or instructions to ask the LLM model about the target page.
35
+ context_limit: 4000, # Integer | Maximum number of tokens to use as context for the LLM model (4000 by default).
36
+ response_tokens: 100, # Integer | Maximum number of tokens to return in the LLM model response. The total context size (context_limit) includes the question, the target page content and the response, so this parameter reserves tokens for the response (see also on_context_limit).
37
+ on_context_limit: 'truncate', # String | What to do if the context_limit parameter is exceeded (truncate by default). The context is exceeded when the target page content is too long.
38
+ headers: { 'Cookie' => 'session=some_id' }, # Hash<String, String> | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers=[Another]=value2) or as a JSON encoded object (...&headers={\"One\": \"value1\", \"Another\": \"value2\"}).
39
+ timeout: 10000, # Integer | Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000).
40
+ js: true, # Boolean | Execute on-page JavaScript using a headless browser (true by default).
41
+ js_timeout: 2000, # Integer | Maximum JavaScript rendering time in ms. Increase it if you see a loading indicator instead of data on the target page.
42
+ proxy: 'datacenter', # String | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default). Note that residential proxy requests are more expensive than datacenter, see the pricing page for details.
43
+ country: 'us', # String | Country of the proxy to use (US by default). Only available on Startup and Custom plans.
44
+ device: 'desktop', # String | Type of device emulation.
45
+ error_on_404: false, # Boolean | Return error on 404 HTTP status on the target page (false by default).
46
+ error_on_redirect: false, # Boolean | Return error on redirect on the target page (false by default).
47
+ js_script: "document.querySelector('button').click();" # String | Custom JavaScript code to execute on the target page.
48
+ }
49
+
50
+ begin
51
+ # Get an answer to a question about a given web page
52
+ result = api_instance.get_question(url, opts)
53
+ p result
54
+ rescue WebScrapingAI::ApiError => e
55
+ puts "Error when calling AIApi->get_question: #{e}"
56
+ end
57
+ ```
58
+
59
+ #### Using the get_question_with_http_info variant
60
+
61
+ This returns an Array which contains the response data, status code and headers.
62
+
63
+ > <Array(String, Integer, Hash)> get_question_with_http_info(url, opts)
64
+
65
+ ```ruby
66
+ begin
67
+ # Get an answer to a question about a given web page
68
+ data, status_code, headers = api_instance.get_question_with_http_info(url, opts)
69
+ p status_code # => 2xx
70
+ p headers # => { ... }
71
+ p data # => String
72
+ rescue WebScrapingAI::ApiError => e
73
+ puts "Error when calling AIApi->get_question_with_http_info: #{e}"
74
+ end
75
+ ```
76
+
77
+ ### Parameters
78
+
79
+ | Name | Type | Description | Notes |
80
+ | ---- | ---- | ----------- | ----- |
81
+ | **url** | **String** | URL of the target page. | |
82
+ | **question** | **String** | Question or instructions to ask the LLM model about the target page. | [optional] |
83
+ | **context_limit** | **Integer** | Maximum number of tokens to use as context for the LLM model (4000 by default). | [optional][default to 8000] |
84
+ | **response_tokens** | **Integer** | Maximum number of tokens to return in the LLM model response. The total context size (context_limit) includes the question, the target page content and the response, so this parameter reserves tokens for the response (see also on_context_limit). | [optional][default to 100] |
85
+ | **on_context_limit** | **String** | What to do if the context_limit parameter is exceeded (truncate by default). The context is exceeded when the target page content is too long. | [optional][default to &#39;truncate&#39;] |
86
+ | **headers** | [**Hash&lt;String, String&gt;**](String.md) | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&amp;headers[One]&#x3D;value1&amp;headers&#x3D;[Another]&#x3D;value2) or as a JSON encoded object (...&amp;headers&#x3D;{\&quot;One\&quot;: \&quot;value1\&quot;, \&quot;Another\&quot;: \&quot;value2\&quot;}). | [optional] |
87
+ | **timeout** | **Integer** | Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000). | [optional][default to 10000] |
88
+ | **js** | **Boolean** | Execute on-page JavaScript using a headless browser (true by default). | [optional][default to true] |
89
+ | **js_timeout** | **Integer** | Maximum JavaScript rendering time in ms. Increase it if you see a loading indicator instead of data on the target page. | [optional][default to 2000] |
90
+ | **proxy** | **String** | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default). Note that residential proxy requests are more expensive than datacenter, see the pricing page for details. | [optional][default to &#39;datacenter&#39;] |
91
+ | **country** | **String** | Country of the proxy to use (US by default). Only available on Startup and Custom plans. | [optional][default to &#39;us&#39;] |
92
+ | **device** | **String** | Type of device emulation. | [optional][default to &#39;desktop&#39;] |
93
+ | **error_on_404** | **Boolean** | Return error on 404 HTTP status on the target page (false by default). | [optional][default to false] |
94
+ | **error_on_redirect** | **Boolean** | Return error on redirect on the target page (false by default). | [optional][default to false] |
95
+ | **js_script** | **String** | Custom JavaScript code to execute on the target page. | [optional] |
96
+
97
+ ### Return type
98
+
99
+ **String**
100
+
101
+ ### Authorization
102
+
103
+ [api_key](../README.md#api_key)
104
+
105
+ ### HTTP request headers
106
+
107
+ - **Content-Type**: Not defined
108
+ - **Accept**: application/json, text/html
109
+
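The `response_tokens` description implies a simple budget: `context_limit` covers the question, the target page content, and the response together. A sketch of that arithmetic, assuming the question's token count is known or estimated separately. The diff shows no tokenizer in the gem, and `page_token_budget` is a hypothetical helper, not part of the client.

```ruby
# Hypothetical helper: tokens left for page content under the documented budget
#   context_limit = question + page content + response
def page_token_budget(context_limit:, response_tokens:, question_tokens:)
  remaining = context_limit - response_tokens - question_tokens
  # A question and reserved response larger than the limit leave no room for the page.
  remaining.negative? ? 0 : remaining
end

puts page_token_budget(context_limit: 4000, response_tokens: 100, question_tokens: 12) # prints 3888
```

With `on_context_limit: 'truncate'`, page content beyond this budget is presumably what gets cut, so raising `response_tokens` trades page coverage for a longer answer.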
data/docs/Account.md ADDED
@@ -0,0 +1,22 @@
1
+ # WebScrapingAI::Account
2
+
3
+ ## Properties
4
+
5
+ | Name | Type | Description | Notes |
6
+ | ---- | ---- | ----------- | ----- |
7
+ | **remaining_api_calls** | **Integer** | Remaining API credits quota | [optional] |
8
+ | **resets_at** | **Integer** | Next billing cycle start time (UNIX timestamp) | [optional] |
9
+ | **remaining_concurrency** | **Integer** | Remaining concurrent requests | [optional] |
10
+
11
+ ## Example
12
+
13
+ ```ruby
14
+ require 'webscraping_ai'
15
+
16
+ instance = WebScrapingAI::Account.new(
17
+ remaining_api_calls: nil,
18
+ resets_at: nil,
19
+ remaining_concurrency: nil
20
+ )
21
+ ```
22
+
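Since `resets_at` is documented as a UNIX timestamp, it usually needs converting before display. A small sketch of that conversion. `AccountSnapshot` is a stand-in Struct (an assumption, so the example runs without the gem or an API key), and the sample values are invented.

```ruby
require 'time'

# Stand-in for WebScrapingAI::Account with the three documented properties.
AccountSnapshot = Struct.new(:remaining_api_calls, :resets_at, :remaining_concurrency,
                             keyword_init: true)

# Convert the UNIX timestamp in resets_at to a UTC Time.
def reset_time(account)
  Time.at(account.resets_at).utc
end

snapshot = AccountSnapshot.new(remaining_api_calls: 200,
                               resets_at: 1_700_000_000,
                               remaining_concurrency: 5)
puts "Quota resets at #{reset_time(snapshot).iso8601}"
```

All three properties are marked optional, so real code would likely guard against a `nil` `resets_at` before converting.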
data/docs/AccountApi.md ADDED
@@ -0,0 +1,76 @@
1
+ # WebScrapingAI::AccountApi
2
+
3
+ All URIs are relative to *https://api.webscraping.ai*
4
+
5
+ | Method | HTTP request | Description |
6
+ | ------ | ------------ | ----------- |
7
+ | [**account**](AccountApi.md#account) | **GET** /account | Information about your account calls quota |
8
+
9
+
10
+ ## account
11
+
12
+ > <Account> account
13
+
14
+ Information about your account calls quota
15
+
16
+ Returns information about your account, including the remaining API credits quota, the next billing cycle start time, and the remaining concurrent requests. The response is in JSON format.
17
+
18
+ ### Examples
19
+
20
+ ```ruby
21
+ require 'time'
22
+ require 'webscraping_ai'
23
+ # setup authorization
24
+ WebScrapingAI.configure do |config|
25
+ # Configure API key authorization: api_key
26
+ config.api_key['api_key'] = 'YOUR API KEY'
27
+ # Uncomment the following line to set a prefix for the API key, e.g. 'Bearer' (defaults to nil)
28
+ # config.api_key_prefix['api_key'] = 'Bearer'
29
+ end
30
+
31
+ api_instance = WebScrapingAI::AccountApi.new
32
+
33
+ begin
34
+ # Information about your account calls quota
35
+ result = api_instance.account
36
+ p result
37
+ rescue WebScrapingAI::ApiError => e
38
+ puts "Error when calling AccountApi->account: #{e}"
39
+ end
40
+ ```
41
+
42
+ #### Using the account_with_http_info variant
43
+
44
+ This returns an Array which contains the response data, status code and headers.
45
+
46
+ > <Array(<Account>, Integer, Hash)> account_with_http_info
47
+
48
+ ```ruby
49
+ begin
50
+ # Information about your account calls quota
51
+ data, status_code, headers = api_instance.account_with_http_info
52
+ p status_code # => 2xx
53
+ p headers # => { ... }
54
+ p data # => <Account>
55
+ rescue WebScrapingAI::ApiError => e
56
+ puts "Error when calling AccountApi->account_with_http_info: #{e}"
57
+ end
58
+ ```
59
+
60
+ ### Parameters
61
+
62
+ This endpoint does not need any parameter.
63
+
64
+ ### Return type
65
+
66
+ [**Account**](Account.md)
67
+
68
+ ### Authorization
69
+
70
+ [api_key](../README.md#api_key)
71
+
72
+ ### HTTP request headers
73
+
74
+ - **Content-Type**: Not defined
75
+ - **Accept**: application/json
76
+
data/docs/Error.md CHANGED
@@ -2,16 +2,23 @@
2
2
 
3
3
  ## Properties
4
4
 
5
- Name | Type | Description | Notes
6
- ------------ | ------------- | ------------- | -------------
7
- **message** | **String** | Error description | [optional]
5
+ | Name | Type | Description | Notes |
6
+ | ---- | ---- | ----------- | ----- |
7
+ | **message** | **String** | Error description | [optional] |
8
+ | **status_code** | **Integer** | Target page response HTTP status code (403, 500, etc) | [optional] |
9
+ | **status_message** | **String** | Target page response HTTP status message | [optional] |
10
+ | **body** | **String** | Target page response body | [optional] |
8
11
 
9
- ## Code Sample
12
+ ## Example
10
13
 
11
14
  ```ruby
12
- require 'WebScrapingAI'
15
+ require 'webscraping_ai'
13
16
 
14
- instance = WebScrapingAI::Error.new(message: null)
17
+ instance = WebScrapingAI::Error.new(
18
+ message: nil,
19
+ status_code: nil,
20
+ status_message: nil,
21
+ body: nil
22
+ )
15
23
  ```
16
24
 
17
-
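The three new `Error` properties carry the target page's own response, which helps tell an API failure apart from a failure of the scraped site. A sketch of summarizing such a payload, assuming the error arrives as a JSON object with these property names. The sample payload is invented, and `describe_error` is a hypothetical helper. Whether the payload is reachable through the rescued `ApiError` (for example via a response body accessor) is an assumption this diff does not confirm.

```ruby
require 'json'

# Invented payload using the four properties listed for WebScrapingAI::Error.
payload = '{"message":"Unexpected HTTP code","status_code":403,' \
          '"status_message":"Forbidden","body":"<html>denied</html>"}'

# Summarize an error payload; every field is optional, so missing keys are tolerated.
def describe_error(json)
  error = JSON.parse(json)
  parts = [error['message'] || 'Unknown error']
  if error['status_code']
    parts << "target responded #{error['status_code']} #{error['status_message']}".strip
  end
  parts.join(': ')
end

puts describe_error(payload) # prints "Unexpected HTTP code: target responded 403 Forbidden"
```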
data/docs/HTMLApi.md CHANGED
@@ -2,130 +2,95 @@
2
2
 
3
3
  All URIs are relative to *https://api.webscraping.ai*
4
4
 
5
- Method | HTTP request | Description
6
- ------------- | ------------- | -------------
7
- [**get_html**](HTMLApi.md#get_html) | **GET** /html | Page HTML by URL
8
- [**post_html**](HTMLApi.md#post_html) | **POST** /html | Page HTML by URL with POST request to the target page
9
-
5
+ | Method | HTTP request | Description |
6
+ | ------ | ------------ | ----------- |
7
+ | [**get_html**](HTMLApi.md#get_html) | **GET** /html | Page HTML by URL |
10
8
 
11
9
 
12
10
  ## get_html
13
11
 
14
- > get_html(url, opts)
12
+ > String get_html(url, opts)
15
13
 
16
14
  Page HTML by URL
17
15
 
18
- Returns just HTML on success, JSON on error
16
+ Returns the full HTML content of a webpage specified by the URL. The response is in plain text. Proxies and Chromium JavaScript rendering are used for page retrieval and processing.
19
17
 
20
- ### Example
18
+ ### Examples
21
19
 
22
20
  ```ruby
23
- # load the gem
21
+ require 'time'
24
22
  require 'webscraping_ai'
25
23
  # setup authorization
26
24
  WebScrapingAI.configure do |config|
27
25
  # Configure API key authorization: api_key
28
26
  config.api_key['api_key'] = 'YOUR API KEY'
29
27
  # Uncomment the following line to set a prefix for the API key, e.g. 'Bearer' (defaults to nil)
30
- #config.api_key_prefix['api_key'] = 'Bearer'
28
+ # config.api_key_prefix['api_key'] = 'Bearer'
31
29
  end
32
30
 
33
31
  api_instance = WebScrapingAI::HTMLApi.new
34
- url = 'https://example.com' # String | URL of the target page
32
+ url = 'https://example.com' # String | URL of the target page.
35
33
  opts = {
36
- headers: {'key' => '{\"Cookie\":\"session=some_id\"}'}, # Hash<String, String> | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers=[Another]=value2) or as a JSON encoded object (...&headers={\"One\": \"value1\", \"Another\": \"value2\"})
37
- timeout: 5000, # Integer | Maximum processing time in ms. Increase it in case of timeout errors (5000 by default, maximum is 30000)
38
- js: true, # Boolean | Execute on-page JavaScript using a headless browser (true by default), costs 2 requests
39
- proxy: 'datacenter' # String | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default)
34
+ headers: { 'Cookie' => 'session=some_id' }, # Hash<String, String> | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers=[Another]=value2) or as a JSON encoded object (...&headers={\"One\": \"value1\", \"Another\": \"value2\"}).
35
+ timeout: 10000, # Integer | Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000).
36
+ js: true, # Boolean | Execute on-page JavaScript using a headless browser (true by default).
37
+ js_timeout: 2000, # Integer | Maximum JavaScript rendering time in ms. Increase it if you see a loading indicator instead of data on the target page.
38
+ proxy: 'datacenter', # String | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default). Note that residential proxy requests are more expensive than datacenter, see the pricing page for details.
39
+ country: 'us', # String | Country of the proxy to use (US by default). Only available on Startup and Custom plans.
40
+ device: 'desktop', # String | Type of device emulation.
41
+ error_on_404: false, # Boolean | Return error on 404 HTTP status on the target page (false by default).
42
+ error_on_redirect: false, # Boolean | Return error on redirect on the target page (false by default).
43
+ js_script: "document.querySelector('button').click();", # String | Custom JavaScript code to execute on the target page.
44
+ return_script_result: false # Boolean | Return result of the custom JavaScript code (js_script parameter) execution on the target page (false by default, page HTML will be returned).
40
45
  }
41
46
 
42
47
  begin
43
- #Page HTML by URL
44
- api_instance.get_html(url, opts)
48
+ # Page HTML by URL
49
+ result = api_instance.get_html(url, opts)
50
+ p result
45
51
  rescue WebScrapingAI::ApiError => e
46
- puts "Exception when calling HTMLApi->get_html: #{e}"
52
+ puts "Error when calling HTMLApi->get_html: #{e}"
47
53
  end
48
54
  ```
49
55
 
50
- ### Parameters
51
-
52
-
53
- Name | Type | Description | Notes
54
- ------------- | ------------- | ------------- | -------------
55
- **url** | **String**| URL of the target page |
56
- **headers** | [**Hash&lt;String, String&gt;**](String.md)| HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&amp;headers[One]&#x3D;value1&amp;headers&#x3D;[Another]&#x3D;value2) or as a JSON encoded object (...&amp;headers&#x3D;{\&quot;One\&quot;: \&quot;value1\&quot;, \&quot;Another\&quot;: \&quot;value2\&quot;}) | [optional]
57
- **timeout** | **Integer**| Maximum processing time in ms. Increase it in case of timeout errors (5000 by default, maximum is 30000) | [optional] [default to 5000]
58
- **js** | **Boolean**| Execute on-page JavaScript using a headless browser (true by default), costs 2 requests | [optional] [default to true]
59
- **proxy** | **String**| Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default) | [optional] [default to &#39;datacenter&#39;]
60
-
61
- ### Return type
62
-
63
- nil (empty response body)
64
-
65
- ### Authorization
66
-
67
- [api_key](../README.md#api_key)
68
-
69
- ### HTTP request headers
70
-
71
- - **Content-Type**: Not defined
72
- - **Accept**: application/json, text/html
73
-
56
+ #### Using the get_html_with_http_info variant
74
57
 
75
- ## post_html
58
+ This returns an Array which contains the response data, status code and headers.
76
59
 
77
- > post_html(url, opts)
78
-
79
- Page HTML by URL with POST request to the target page
80
-
81
- Returns just HTML on success, JSON on error. Request body will be passed to the target page.
82
-
83
- ### Example
60
+ > <Array(String, Integer, Hash)> get_html_with_http_info(url, opts)
84
61
 
85
62
  ```ruby
86
- # load the gem
87
- require 'webscraping_ai'
88
- # setup authorization
89
- WebScrapingAI.configure do |config|
90
- # Configure API key authorization: api_key
91
- config.api_key['api_key'] = 'YOUR API KEY'
92
- # Uncomment the following line to set a prefix for the API key, e.g. 'Bearer' (defaults to nil)
93
- #config.api_key_prefix['api_key'] = 'Bearer'
94
- end
95
-
96
- api_instance = WebScrapingAI::HTMLApi.new
97
- url = 'https://httpbin.org/post' # String | URL of the target page
98
- opts = {
99
- headers: {'key' => '{\"Cookie\":\"session=some_id\"}'}, # Hash<String, String> | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers=[Another]=value2) or as a JSON encoded object (...&headers={\"One\": \"value1\", \"Another\": \"value2\"})
100
- timeout: 5000, # Integer | Maximum processing time in ms. Increase it in case of timeout errors (5000 by default, maximum is 30000)
101
- js: true, # Boolean | Execute on-page JavaScript using a headless browser (true by default), costs 2 requests
102
- proxy: 'datacenter', # String | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default)
103
- request_body: nil # Hash<String, Object> | Request body to pass to the target page
104
- }
105
-
106
63
  begin
107
- #Page HTML by URL with POST request to the target page
108
- api_instance.post_html(url, opts)
64
+ # Page HTML by URL
65
+ data, status_code, headers = api_instance.get_html_with_http_info(url, opts)
66
+ p status_code # => 2xx
67
+ p headers # => { ... }
68
+ p data # => String
109
69
  rescue WebScrapingAI::ApiError => e
110
- puts "Exception when calling HTMLApi->post_html: #{e}"
70
+ puts "Error when calling HTMLApi->get_html_with_http_info: #{e}"
111
71
  end
112
72
  ```
113
73
 
114
74
  ### Parameters
115
75
 
116
-
117
- Name | Type | Description | Notes
118
- ------------- | ------------- | ------------- | -------------
119
- **url** | **String**| URL of the target page |
120
- **headers** | [**Hash&lt;String, String&gt;**](String.md)| HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&amp;headers[One]&#x3D;value1&amp;headers&#x3D;[Another]&#x3D;value2) or as a JSON encoded object (...&amp;headers&#x3D;{\&quot;One\&quot;: \&quot;value1\&quot;, \&quot;Another\&quot;: \&quot;value2\&quot;}) | [optional]
121
- **timeout** | **Integer**| Maximum processing time in ms. Increase it in case of timeout errors (5000 by default, maximum is 30000) | [optional] [default to 5000]
122
- **js** | **Boolean**| Execute on-page JavaScript using a headless browser (true by default), costs 2 requests | [optional] [default to true]
123
- **proxy** | **String**| Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default) | [optional] [default to &#39;datacenter&#39;]
124
- **request_body** | [**Hash&lt;String, Object&gt;**](Object.md)| Request body to pass to the target page | [optional]
76
+ | Name | Type | Description | Notes |
77
+ | ---- | ---- | ----------- | ----- |
78
+ | **url** | **String** | URL of the target page. | |
79
+ | **headers** | [**Hash&lt;String, String&gt;**](String.md) | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&amp;headers[One]&#x3D;value1&amp;headers&#x3D;[Another]&#x3D;value2) or as a JSON encoded object (...&amp;headers&#x3D;{\&quot;One\&quot;: \&quot;value1\&quot;, \&quot;Another\&quot;: \&quot;value2\&quot;}). | [optional] |
80
+ | **timeout** | **Integer** | Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000). | [optional][default to 10000] |
81
+ | **js** | **Boolean** | Execute on-page JavaScript using a headless browser (true by default). | [optional][default to true] |
82
+ | **js_timeout** | **Integer** | Maximum JavaScript rendering time in ms. Increase it if you see a loading indicator instead of data on the target page. | [optional][default to 2000] |
83
+ | **proxy** | **String** | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default). Note that residential proxy requests are more expensive than datacenter, see the pricing page for details. | [optional][default to &#39;datacenter&#39;] |
84
+ | **country** | **String** | Country of the proxy to use (US by default). Only available on Startup and Custom plans. | [optional][default to &#39;us&#39;] |
85
+ | **device** | **String** | Type of device emulation. | [optional][default to &#39;desktop&#39;] |
86
+ | **error_on_404** | **Boolean** | Return error on 404 HTTP status on the target page (false by default). | [optional][default to false] |
87
+ | **error_on_redirect** | **Boolean** | Return error on redirect on the target page (false by default). | [optional][default to false] |
88
+ | **js_script** | **String** | Custom JavaScript code to execute on the target page. | [optional] |
89
+ | **return_script_result** | **Boolean** | Return result of the custom JavaScript code (js_script parameter) execution on the target page (false by default, page HTML will be returned). | [optional][default to false] |
125
90
 
126
91
  ### Return type
127
92
 
128
- nil (empty response body)
93
+ **String**
129
94
 
130
95
  ### Authorization
131
96
 
@@ -133,6 +98,6 @@ nil (empty response body)
133
98
 
134
99
  ### HTTP request headers
135
100
 
136
- - **Content-Type**: application/json, application/x-www-form-urlencoded, application/xml, text/plain
101
+ - **Content-Type**: Not defined
137
102
  - **Accept**: application/json, text/html
138
103