webscraping_ai 2.0.2 → 3.1.3

@@ -2,13 +2,10 @@
2
2
 
3
3
  All URIs are relative to *https://api.webscraping.ai*
4
4
 
5
- Method | HTTP request | Description
6
- ------------- | ------------- | -------------
7
- [**get_selected**](SelectedHTMLApi.md#get_selected) | **GET** /selected | HTML of a selected page area by URL and CSS selector
8
- [**get_selected_multiple**](SelectedHTMLApi.md#get_selected_multiple) | **GET** /selected-multiple | HTML of multiple page areas by URL and CSS selectors
9
- [**post_selected**](SelectedHTMLApi.md#post_selected) | **POST** /selected | HTML of a selected page area by URL and CSS selector, with POST request to the target page
10
- [**post_selected_multiple**](SelectedHTMLApi.md#post_selected_multiple) | **POST** /selected-multiple | HTML of multiple page areas by URL and CSS selectors, with POST request to the target page
11
-
5
+ | Method | HTTP request | Description |
6
+ | ------ | ------------ | ----------- |
7
+ | [**get_selected**](SelectedHTMLApi.md#get_selected) | **GET** /selected | HTML of a selected page area by URL and CSS selector |
8
+ | [**get_selected_multiple**](SelectedHTMLApi.md#get_selected_multiple) | **GET** /selected-multiple | HTML of multiple page areas by URL and CSS selectors |
12
9
 
13
10
 
14
11
  ## get_selected
@@ -17,121 +14,84 @@ Method | HTTP request | Description
17
14
 
18
15
  HTML of a selected page area by URL and CSS selector
19
16
 
20
- Returns just HTML on success, JSON on error
17
+ Returns HTML of a selected page area by URL and CSS selector. Useful if you don't want to do the HTML parsing on your side.
21
18
 
22
- ### Example
19
+ ### Examples
23
20
 
24
21
  ```ruby
25
- # load the gem
22
+ require 'time'
26
23
  require 'webscraping_ai'
27
24
  # setup authorization
28
25
  WebScrapingAI.configure do |config|
29
26
  # Configure API key authorization: api_key
30
27
  config.api_key['api_key'] = 'YOUR API KEY'
31
28
  # Uncomment the following line to set a prefix for the API key, e.g. 'Bearer' (defaults to nil)
32
- #config.api_key_prefix['api_key'] = 'Bearer'
29
+ # config.api_key_prefix['api_key'] = 'Bearer'
33
30
  end
34
31
 
35
32
  api_instance = WebScrapingAI::SelectedHTMLApi.new
36
- url = 'https://example.com' # String | URL of the target page
33
+ url = 'https://example.com' # String | URL of the target page.
37
34
  opts = {
38
35
  selector: 'h1', # String | CSS selector (null by default, returns whole page HTML)
39
- headers: {'key' => '{\"Cookie\":\"session=some_id\"}'}, # Hash<String, String> | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers=[Another]=value2) or as a JSON encoded object (...&headers={\"One\": \"value1\", \"Another\": \"value2\"})
40
- timeout: 5000, # Integer | Maximum processing time in ms. Increase it in case of timeout errors (5000 by default, maximum is 30000)
41
- js: true, # Boolean | Execute on-page JavaScript using a headless browser (true by default), costs 2 requests
42
- proxy: 'datacenter' # String | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default)
36
+ headers: { 'Cookie' => 'session=some_id' }, # Hash<String, String> | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers[Another]=value2) or as a JSON encoded object (...&headers={"One": "value1", "Another": "value2"}).
37
+ timeout: 10000, # Integer | Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000).
38
+ js: true, # Boolean | Execute on-page JavaScript using a headless browser (true by default).
39
+ js_timeout: 2000, # Integer | Maximum JavaScript rendering time in ms. Increase it if you see a loading indicator instead of data on the target page.
40
+ proxy: 'datacenter', # String | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default). Note that residential proxy requests are more expensive than datacenter, see the pricing page for details.
41
+ country: 'us', # String | Country of the proxy to use (US by default). Only available on Startup and Custom plans.
42
+ device: 'desktop', # String | Type of device emulation.
43
+ error_on_404: false, # Boolean | Return error on 404 HTTP status on the target page (false by default).
44
+ error_on_redirect: false, # Boolean | Return error on redirect on the target page (false by default).
45
+ js_script: "document.querySelector('button').click();" # String | Custom JavaScript code to execute on the target page.
43
46
  }
44
47
 
45
48
  begin
46
- #HTML of a selected page area by URL and CSS selector
49
+ # HTML of a selected page area by URL and CSS selector
47
50
  result = api_instance.get_selected(url, opts)
48
51
  p result
49
52
  rescue WebScrapingAI::ApiError => e
50
- puts "Exception when calling SelectedHTMLApi->get_selected: #{e}"
53
+ puts "Error when calling SelectedHTMLApi->get_selected: #{e}"
51
54
  end
52
55
  ```
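
For orientation, the request that `get_selected` stands for can be sketched with the standard library alone. This is a minimal sketch, not the gem's implementation: `selected_uri` is a hypothetical helper, and passing the key as an `api_key` query parameter is an assumption based on the configuration key name shown above. Nothing is sent over the network.

```ruby
require 'uri'

# Hypothetical helper (not part of the gem): builds the GET /selected URL
# without sending it.
# Assumption: the API key travels as an `api_key` query parameter.
def selected_uri(target_url, selector:, api_key:, timeout: 10_000, js: true)
  query = URI.encode_www_form(
    url: target_url,
    selector: selector,
    timeout: timeout,
    js: js,
    api_key: api_key
  )
  URI::HTTPS.build(host: 'api.webscraping.ai', path: '/selected', query: query)
end

uri = selected_uri('https://example.com', selector: 'h1', api_key: 'YOUR API KEY')
puts uri
```

Building the URL separately from sending it makes it easy to log or inspect exactly which parameters a call would carry.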
53
56
 
54
- ### Parameters
55
-
57
+ #### Using the get_selected_with_http_info variant
56
58
 
57
- Name | Type | Description | Notes
58
- ------------- | ------------- | ------------- | -------------
59
- **url** | **String**| URL of the target page |
60
- **selector** | **String**| CSS selector (null by default, returns whole page HTML) | [optional]
61
- **headers** | [**Hash&lt;String, String&gt;**](String.md)| HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&amp;headers[One]&#x3D;value1&amp;headers&#x3D;[Another]&#x3D;value2) or as a JSON encoded object (...&amp;headers&#x3D;{\&quot;One\&quot;: \&quot;value1\&quot;, \&quot;Another\&quot;: \&quot;value2\&quot;}) | [optional]
62
- **timeout** | **Integer**| Maximum processing time in ms. Increase it in case of timeout errors (5000 by default, maximum is 30000) | [optional] [default to 5000]
63
- **js** | **Boolean**| Execute on-page JavaScript using a headless browser (true by default), costs 2 requests | [optional] [default to true]
64
- **proxy** | **String**| Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default) | [optional] [default to &#39;datacenter&#39;]
59
+ This returns an Array which contains the response data, status code and headers.
65
60
 
66
- ### Return type
67
-
68
- **String**
69
-
70
- ### Authorization
71
-
72
- [api_key](../README.md#api_key)
73
-
74
- ### HTTP request headers
75
-
76
- - **Content-Type**: Not defined
77
- - **Accept**: application/json, text/html
78
-
79
-
80
- ## get_selected_multiple
81
-
82
- > Array&lt;String&gt; get_selected_multiple(url, opts)
83
-
84
- HTML of multiple page areas by URL and CSS selectors
85
-
86
- Always returns JSON
87
-
88
- ### Example
61
+ > <Array(String, Integer, Hash)> get_selected_with_http_info(url, opts)
89
62
 
90
63
  ```ruby
91
- # load the gem
92
- require 'webscraping_ai'
93
- # setup authorization
94
- WebScrapingAI.configure do |config|
95
- # Configure API key authorization: api_key
96
- config.api_key['api_key'] = 'YOUR API KEY'
97
- # Uncomment the following line to set a prefix for the API key, e.g. 'Bearer' (defaults to nil)
98
- #config.api_key_prefix['api_key'] = 'Bearer'
99
- end
100
-
101
- api_instance = WebScrapingAI::SelectedHTMLApi.new
102
- url = 'https://example.com' # String | URL of the target page
103
- opts = {
104
- selectors: ['[\"h1\"]'], # Array<String> | Multiple CSS selectors (null by default, returns whole page HTML)
105
- headers: {'key' => '{\"Cookie\":\"session=some_id\"}'}, # Hash<String, String> | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers=[Another]=value2) or as a JSON encoded object (...&headers={\"One\": \"value1\", \"Another\": \"value2\"})
106
- timeout: 5000, # Integer | Maximum processing time in ms. Increase it in case of timeout errors (5000 by default, maximum is 30000)
107
- js: true, # Boolean | Execute on-page JavaScript using a headless browser (true by default), costs 2 requests
108
- proxy: 'datacenter' # String | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default)
109
- }
110
-
111
64
  begin
112
- #HTML of multiple page areas by URL and CSS selectors
113
- result = api_instance.get_selected_multiple(url, opts)
114
- p result
65
+ # HTML of a selected page area by URL and CSS selector
66
+ data, status_code, headers = api_instance.get_selected_with_http_info(url, opts)
67
+ p status_code # => 2xx
68
+ p headers # => { ... }
69
+ p data # => String
115
70
  rescue WebScrapingAI::ApiError => e
116
- puts "Exception when calling SelectedHTMLApi->get_selected_multiple: #{e}"
71
+ puts "Error when calling SelectedHTMLApi->get_selected_with_http_info: #{e}"
117
72
  end
118
73
  ```
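
One way to consume the `(data, status_code, headers)` triple is sketched below. `describe_response` is a hypothetical helper, and the values fed to it are stand-ins shaped like the documented return value rather than a captured API response.

```ruby
# Hypothetical helper (not part of the gem): summarizes the triple returned
# by a *_with_http_info call.
def describe_response(data, status_code, headers)
  # Header names may differ in case, so compare case-insensitively.
  content_type = headers.find { |name, _| name.to_s.casecmp?('content-type') }&.last
  {
    ok: (200..299).cover?(status_code),
    content_type: content_type,
    bytes: data.to_s.bytesize
  }
end

# Stand-in values, shaped like Array(String, Integer, Hash).
data, status_code, headers = ['<h1>Example Domain</h1>', 200, { 'Content-Type' => 'text/html' }]
summary = describe_response(data, status_code, headers)
p summary
```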
119
74
 
120
75
  ### Parameters
121
76
 
122
-
123
- Name | Type | Description | Notes
124
- ------------- | ------------- | ------------- | -------------
125
- **url** | **String**| URL of the target page |
126
- **selectors** | [**Array&lt;String&gt;**](String.md)| Multiple CSS selectors (null by default, returns whole page HTML) | [optional]
127
- **headers** | [**Hash&lt;String, String&gt;**](String.md)| HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&amp;headers[One]&#x3D;value1&amp;headers&#x3D;[Another]&#x3D;value2) or as a JSON encoded object (...&amp;headers&#x3D;{\&quot;One\&quot;: \&quot;value1\&quot;, \&quot;Another\&quot;: \&quot;value2\&quot;}) | [optional]
128
- **timeout** | **Integer**| Maximum processing time in ms. Increase it in case of timeout errors (5000 by default, maximum is 30000) | [optional] [default to 5000]
129
- **js** | **Boolean**| Execute on-page JavaScript using a headless browser (true by default), costs 2 requests | [optional] [default to true]
130
- **proxy** | **String**| Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default) | [optional] [default to &#39;datacenter&#39;]
77
+ | Name | Type | Description | Notes |
78
+ | ---- | ---- | ----------- | ----- |
79
+ | **url** | **String** | URL of the target page. | |
80
+ | **selector** | **String** | CSS selector (null by default, returns whole page HTML) | [optional] |
81
+ | **headers** | [**Hash&lt;String, String&gt;**](String.md) | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&amp;headers[One]&#x3D;value1&amp;headers[Another]&#x3D;value2) or as a JSON encoded object (...&amp;headers&#x3D;{\&quot;One\&quot;: \&quot;value1\&quot;, \&quot;Another\&quot;: \&quot;value2\&quot;}). | [optional] |
82
+ | **timeout** | **Integer** | Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000). | [optional][default to 10000] |
83
+ | **js** | **Boolean** | Execute on-page JavaScript using a headless browser (true by default). | [optional][default to true] |
84
+ | **js_timeout** | **Integer** | Maximum JavaScript rendering time in ms. Increase it if you see a loading indicator instead of data on the target page. | [optional][default to 2000] |
85
+ | **proxy** | **String** | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default). Note that residential proxy requests are more expensive than datacenter, see the pricing page for details. | [optional][default to &#39;datacenter&#39;] |
86
+ | **country** | **String** | Country of the proxy to use (US by default). Only available on Startup and Custom plans. | [optional][default to &#39;us&#39;] |
87
+ | **device** | **String** | Type of device emulation. | [optional][default to &#39;desktop&#39;] |
88
+ | **error_on_404** | **Boolean** | Return error on 404 HTTP status on the target page (false by default). | [optional][default to false] |
89
+ | **error_on_redirect** | **Boolean** | Return error on redirect on the target page (false by default). | [optional][default to false] |
90
+ | **js_script** | **String** | Custom JavaScript code to execute on the target page. | [optional] |
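
The two encodings described for the `headers` parameter can be illustrated with the standard library. This is a sketch of the query-string shapes only; the header names and values are made up for illustration, and how the gem itself serializes the hash is not shown in this diff.

```ruby
require 'json'
require 'uri'

# Illustrative headers to forward to the target page.
headers = { 'Cookie' => 'session=some_id', 'Accept-Language' => 'en' }

# Form 1: nested query parameters, one headers[Name]=value pair per header.
nested = URI.encode_www_form(headers.map { |name, value| ["headers[#{name}]", value] })

# Form 2: a single `headers` parameter holding a JSON-encoded object.
as_json = URI.encode_www_form(headers: JSON.generate(headers))

puts nested
puts as_json
```

Both strings are percent-encoded, so brackets, braces and quotes survive transport as part of a URL.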
131
91
 
132
92
  ### Return type
133
93
 
134
- **Array&lt;String&gt;**
94
+ **String**
135
95
 
136
96
  ### Authorization
137
97
 
@@ -140,130 +100,89 @@ Name | Type | Description | Notes
140
100
  ### HTTP request headers
141
101
 
142
102
  - **Content-Type**: Not defined
143
- - **Accept**: application/json
103
+ - **Accept**: application/json, text/html
144
104
 
145
105
 
146
- ## post_selected
106
+ ## get_selected_multiple
147
107
 
148
- > String post_selected(url, opts)
108
+ > Array&lt;String&gt; get_selected_multiple(url, opts)
149
109
 
150
- HTML of a selected page area by URL and CSS selector, with POST request to the target page
110
+ HTML of multiple page areas by URL and CSS selectors
151
111
 
152
- Returns just HTML on success, JSON on error. Request body will be passed to the target page.
112
+ Returns HTML of multiple page areas by URL and CSS selectors. Useful if you don't want to do the HTML parsing on your side.
153
113
 
154
- ### Example
114
+ ### Examples
155
115
 
156
116
  ```ruby
157
- # load the gem
117
+ require 'time'
158
118
  require 'webscraping_ai'
159
119
  # setup authorization
160
120
  WebScrapingAI.configure do |config|
161
121
  # Configure API key authorization: api_key
162
122
  config.api_key['api_key'] = 'YOUR API KEY'
163
123
  # Uncomment the following line to set a prefix for the API key, e.g. 'Bearer' (defaults to nil)
164
- #config.api_key_prefix['api_key'] = 'Bearer'
124
+ # config.api_key_prefix['api_key'] = 'Bearer'
165
125
  end
166
126
 
167
127
  api_instance = WebScrapingAI::SelectedHTMLApi.new
168
- url = 'https://httpbin.org/post' # String | URL of the target page
128
+ url = 'https://example.com' # String | URL of the target page.
169
129
  opts = {
170
- selector: 'h1', # String | CSS selector (null by default, returns whole page HTML)
171
- headers: {'key' => '{\"Cookie\":\"session=some_id\"}'}, # Hash<String, String> | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers=[Another]=value2) or as a JSON encoded object (...&headers={\"One\": \"value1\", \"Another\": \"value2\"})
172
- timeout: 5000, # Integer | Maximum processing time in ms. Increase it in case of timeout errors (5000 by default, maximum is 30000)
173
- js: true, # Boolean | Execute on-page JavaScript using a headless browser (true by default), costs 2 requests
174
- proxy: 'datacenter', # String | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default)
175
- request_body: nil # Hash<String, Object> | Request body to pass to the target page
130
+ selectors: ['inner_example'], # Array<String> | Multiple CSS selectors (null by default, returns whole page HTML)
131
+ headers: { 'Cookie' => 'session=some_id' }, # Hash<String, String> | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers[Another]=value2) or as a JSON encoded object (...&headers={"One": "value1", "Another": "value2"}).
132
+ timeout: 10000, # Integer | Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000).
133
+ js: true, # Boolean | Execute on-page JavaScript using a headless browser (true by default).
134
+ js_timeout: 2000, # Integer | Maximum JavaScript rendering time in ms. Increase it if you see a loading indicator instead of data on the target page.
135
+ proxy: 'datacenter', # String | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default). Note that residential proxy requests are more expensive than datacenter, see the pricing page for details.
136
+ country: 'us', # String | Country of the proxy to use (US by default). Only available on Startup and Custom plans.
137
+ device: 'desktop', # String | Type of device emulation.
138
+ error_on_404: false, # Boolean | Return error on 404 HTTP status on the target page (false by default).
139
+ error_on_redirect: false, # Boolean | Return error on redirect on the target page (false by default).
140
+ js_script: "document.querySelector('button').click();" # String | Custom JavaScript code to execute on the target page.
176
141
  }
177
142
 
178
143
  begin
179
- #HTML of a selected page areas by URL and CSS selector, with POST request to the target page
180
- result = api_instance.post_selected(url, opts)
144
+ # HTML of multiple page areas by URL and CSS selectors
145
+ result = api_instance.get_selected_multiple(url, opts)
181
146
  p result
182
147
  rescue WebScrapingAI::ApiError => e
183
- puts "Exception when calling SelectedHTMLApi->post_selected: #{e}"
148
+ puts "Error when calling SelectedHTMLApi->get_selected_multiple: #{e}"
184
149
  end
185
150
  ```
186
151
 
187
- ### Parameters
188
-
152
+ #### Using the get_selected_multiple_with_http_info variant
189
153
 
190
- Name | Type | Description | Notes
191
- ------------- | ------------- | ------------- | -------------
192
- **url** | **String**| URL of the target page |
193
- **selector** | **String**| CSS selector (null by default, returns whole page HTML) | [optional]
194
- **headers** | [**Hash&lt;String, String&gt;**](String.md)| HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&amp;headers[One]&#x3D;value1&amp;headers&#x3D;[Another]&#x3D;value2) or as a JSON encoded object (...&amp;headers&#x3D;{\&quot;One\&quot;: \&quot;value1\&quot;, \&quot;Another\&quot;: \&quot;value2\&quot;}) | [optional]
195
- **timeout** | **Integer**| Maximum processing time in ms. Increase it in case of timeout errors (5000 by default, maximum is 30000) | [optional] [default to 5000]
196
- **js** | **Boolean**| Execute on-page JavaScript using a headless browser (true by default), costs 2 requests | [optional] [default to true]
197
- **proxy** | **String**| Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default) | [optional] [default to &#39;datacenter&#39;]
198
- **request_body** | [**Hash&lt;String, Object&gt;**](Object.md)| Request body to pass to the target page | [optional]
199
-
200
- ### Return type
201
-
202
- **String**
203
-
204
- ### Authorization
205
-
206
- [api_key](../README.md#api_key)
207
-
208
- ### HTTP request headers
209
-
210
- - **Content-Type**: application/json, application/x-www-form-urlencoded, application/xml, text/plain
211
- - **Accept**: application/json, text/html
154
+ This returns an Array which contains the response data, status code and headers.
212
155
 
213
-
214
- ## post_selected_multiple
215
-
216
- > Array&lt;String&gt; post_selected_multiple(url, opts)
217
-
218
- HTML of multiple page areas by URL and CSS selectors, with POST request to the target page
219
-
220
- Always returns JSON. Request body will be passed to the target page.
221
-
222
- ### Example
156
+ > <Array(Array&lt;String&gt;, Integer, Hash)> get_selected_multiple_with_http_info(url, opts)
223
157
 
224
158
  ```ruby
225
- # load the gem
226
- require 'webscraping_ai'
227
- # setup authorization
228
- WebScrapingAI.configure do |config|
229
- # Configure API key authorization: api_key
230
- config.api_key['api_key'] = 'YOUR API KEY'
231
- # Uncomment the following line to set a prefix for the API key, e.g. 'Bearer' (defaults to nil)
232
- #config.api_key_prefix['api_key'] = 'Bearer'
233
- end
234
-
235
- api_instance = WebScrapingAI::SelectedHTMLApi.new
236
- url = 'https://httpbin.org/post' # String | URL of the target page
237
- opts = {
238
- selectors: ['[\"h1\"]'], # Array<String> | Multiple CSS selectors (null by default, returns whole page HTML)
239
- headers: {'key' => '{\"Cookie\":\"session=some_id\"}'}, # Hash<String, String> | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers=[Another]=value2) or as a JSON encoded object (...&headers={\"One\": \"value1\", \"Another\": \"value2\"})
240
- timeout: 5000, # Integer | Maximum processing time in ms. Increase it in case of timeout errors (5000 by default, maximum is 30000)
241
- js: true, # Boolean | Execute on-page JavaScript using a headless browser (true by default), costs 2 requests
242
- proxy: 'datacenter', # String | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default)
243
- request_body: nil # Hash<String, Object> | Request body to pass to the target page
244
- }
245
-
246
159
  begin
247
- #HTML of multiple page areas by URL and CSS selectors, with POST request to the target page
248
- result = api_instance.post_selected_multiple(url, opts)
249
- p result
160
+ # HTML of multiple page areas by URL and CSS selectors
161
+ data, status_code, headers = api_instance.get_selected_multiple_with_http_info(url, opts)
162
+ p status_code # => 2xx
163
+ p headers # => { ... }
164
+ p data # => Array<String>
250
165
  rescue WebScrapingAI::ApiError => e
251
- puts "Exception when calling SelectedHTMLApi->post_selected_multiple: #{e}"
166
+ puts "Error when calling SelectedHTMLApi->get_selected_multiple_with_http_info: #{e}"
252
167
  end
253
168
  ```
254
169
 
255
170
  ### Parameters
256
171
 
257
-
258
- Name | Type | Description | Notes
259
- ------------- | ------------- | ------------- | -------------
260
- **url** | **String**| URL of the target page |
261
- **selectors** | [**Array&lt;String&gt;**](String.md)| Multiple CSS selectors (null by default, returns whole page HTML) | [optional]
262
- **headers** | [**Hash&lt;String, String&gt;**](String.md)| HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&amp;headers[One]&#x3D;value1&amp;headers&#x3D;[Another]&#x3D;value2) or as a JSON encoded object (...&amp;headers&#x3D;{\&quot;One\&quot;: \&quot;value1\&quot;, \&quot;Another\&quot;: \&quot;value2\&quot;}) | [optional]
263
- **timeout** | **Integer**| Maximum processing time in ms. Increase it in case of timeout errors (5000 by default, maximum is 30000) | [optional] [default to 5000]
264
- **js** | **Boolean**| Execute on-page JavaScript using a headless browser (true by default), costs 2 requests | [optional] [default to true]
265
- **proxy** | **String**| Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default) | [optional] [default to &#39;datacenter&#39;]
266
- **request_body** | [**Hash&lt;String, Object&gt;**](Object.md)| Request body to pass to the target page | [optional]
172
+ | Name | Type | Description | Notes |
173
+ | ---- | ---- | ----------- | ----- |
174
+ | **url** | **String** | URL of the target page. | |
175
+ | **selectors** | [**Array&lt;String&gt;**](String.md) | Multiple CSS selectors (null by default, returns whole page HTML) | [optional] |
176
+ | **headers** | [**Hash&lt;String, String&gt;**](String.md) | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&amp;headers[One]&#x3D;value1&amp;headers[Another]&#x3D;value2) or as a JSON encoded object (...&amp;headers&#x3D;{\&quot;One\&quot;: \&quot;value1\&quot;, \&quot;Another\&quot;: \&quot;value2\&quot;}). | [optional] |
177
+ | **timeout** | **Integer** | Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000). | [optional][default to 10000] |
178
+ | **js** | **Boolean** | Execute on-page JavaScript using a headless browser (true by default). | [optional][default to true] |
179
+ | **js_timeout** | **Integer** | Maximum JavaScript rendering time in ms. Increase it if you see a loading indicator instead of data on the target page. | [optional][default to 2000] |
180
+ | **proxy** | **String** | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default). Note that residential proxy requests are more expensive than datacenter, see the pricing page for details. | [optional][default to &#39;datacenter&#39;] |
181
+ | **country** | **String** | Country of the proxy to use (US by default). Only available on Startup and Custom plans. | [optional][default to &#39;us&#39;] |
182
+ | **device** | **String** | Type of device emulation. | [optional][default to &#39;desktop&#39;] |
183
+ | **error_on_404** | **Boolean** | Return error on 404 HTTP status on the target page (false by default). | [optional][default to false] |
184
+ | **error_on_redirect** | **Boolean** | Return error on redirect on the target page (false by default). | [optional][default to false] |
185
+ | **js_script** | **String** | Custom JavaScript code to execute on the target page. | [optional] |
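
The defaults and the timeout ceiling listed in the table can be applied on the caller's side before a request is made. The sketch below is a hypothetical pre-flight check, not something the gem is known to do; the lower bound of 1 ms is an assumption, since the table only states the default (10000) and the maximum (30000).

```ruby
# Defaults as documented in the parameter table above.
DEFAULTS = {
  timeout: 10_000,
  js: true,
  js_timeout: 2_000,
  proxy: 'datacenter',
  country: 'us',
  device: 'desktop',
  error_on_404: false,
  error_on_redirect: false
}.freeze

MAX_TIMEOUT = 30_000 # documented maximum, in ms

# Hypothetical helper: fills in documented defaults and rejects a timeout
# outside 1..MAX_TIMEOUT (the lower bound is an assumption).
def normalized_opts(opts = {})
  merged = DEFAULTS.merge(opts)
  unless merged[:timeout].between?(1, MAX_TIMEOUT)
    raise ArgumentError, "timeout must be between 1 and #{MAX_TIMEOUT} ms"
  end
  merged
end

p normalized_opts(selectors: ['h1'], timeout: 15_000)
```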
267
186
 
268
187
  ### Return type
269
188
 
@@ -275,6 +194,6 @@ Name | Type | Description | Notes
275
194
 
276
195
  ### HTTP request headers
277
196
 
278
- - **Content-Type**: application/json, application/x-www-form-urlencoded, application/xml, text/plain
197
+ - **Content-Type**: Not defined
279
198
  - **Accept**: application/json
280
199
 
data/docs/TextApi.md ADDED
@@ -0,0 +1,105 @@
1
+ # WebScrapingAI::TextApi
2
+
3
+ All URIs are relative to *https://api.webscraping.ai*
4
+
5
+ | Method | HTTP request | Description |
6
+ | ------ | ------------ | ----------- |
7
+ | [**get_text**](TextApi.md#get_text) | **GET** /text | Page text by URL |
8
+
9
+
10
+ ## get_text
11
+
12
+ > String get_text(url, opts)
13
+
14
+ Page text by URL
15
+
16
+ Returns the visible text content of a webpage specified by the URL. Can be used to feed data to GPT or other LLMs. The response can be in plain text, JSON, or XML format based on the text_format parameter. Proxies and Chromium JavaScript rendering are used for page retrieval and processing. Returns JSON on error.
17
+
18
+ ### Examples
19
+
20
+ ```ruby
21
+ require 'time'
22
+ require 'webscraping_ai'
23
+ # setup authorization
24
+ WebScrapingAI.configure do |config|
25
+ # Configure API key authorization: api_key
26
+ config.api_key['api_key'] = 'YOUR API KEY'
27
+ # Uncomment the following line to set a prefix for the API key, e.g. 'Bearer' (defaults to nil)
28
+ # config.api_key_prefix['api_key'] = 'Bearer'
29
+ end
30
+
31
+ api_instance = WebScrapingAI::TextApi.new
32
+ url = 'https://example.com' # String | URL of the target page.
33
+ opts = {
34
+ text_format: 'plain', # String | Format of the text response (plain by default). \"plain\" will return only the page body text. \"json\" and \"xml\" will return a json/xml with \"title\", \"description\" and \"content\" keys.
35
+ return_links: false, # Boolean | [Works only with text_format=json] Return links from the page body text (false by default). Useful for building web crawlers.
36
+ headers: { 'Cookie' => 'session=some_id' }, # Hash<String, String> | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers[Another]=value2) or as a JSON encoded object (...&headers={"One": "value1", "Another": "value2"}).
37
+ timeout: 10000, # Integer | Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000).
38
+ js: true, # Boolean | Execute on-page JavaScript using a headless browser (true by default).
39
+ js_timeout: 2000, # Integer | Maximum JavaScript rendering time in ms. Increase it if you see a loading indicator instead of data on the target page.
40
+ proxy: 'datacenter', # String | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default). Note that residential proxy requests are more expensive than datacenter, see the pricing page for details.
41
+ country: 'us', # String | Country of the proxy to use (US by default). Only available on Startup and Custom plans.
42
+ device: 'desktop', # String | Type of device emulation.
43
+ error_on_404: false, # Boolean | Return error on 404 HTTP status on the target page (false by default).
44
+ error_on_redirect: false, # Boolean | Return error on redirect on the target page (false by default).
45
+ js_script: "document.querySelector('button').click();" # String | Custom JavaScript code to execute on the target page.
46
+ }
47
+
48
+ begin
49
+ # Page text by URL
50
+ result = api_instance.get_text(url, opts)
51
+ p result
52
+ rescue WebScrapingAI::ApiError => e
53
+ puts "Error when calling TextApi->get_text: #{e}"
54
+ end
55
+ ```
56
+
57
+ #### Using the get_text_with_http_info variant
58
+
59
+ This returns an Array which contains the response data, status code and headers.
60
+
61
+ > <Array(String, Integer, Hash)> get_text_with_http_info(url, opts)
62
+
63
+ ```ruby
64
+ begin
65
+ # Page text by URL
66
+ data, status_code, headers = api_instance.get_text_with_http_info(url, opts)
67
+ p status_code # => 2xx
68
+ p headers # => { ... }
69
+ p data # => String
70
+ rescue WebScrapingAI::ApiError => e
71
+ puts "Error when calling TextApi->get_text_with_http_info: #{e}"
72
+ end
73
+ ```
74
+
75
+ ### Parameters
76
+
77
+ | Name | Type | Description | Notes |
78
+ | ---- | ---- | ----------- | ----- |
79
+ | **url** | **String** | URL of the target page. | |
80
+ | **text_format** | **String** | Format of the text response (plain by default). \&quot;plain\&quot; will return only the page body text. \&quot;json\&quot; and \&quot;xml\&quot; will return a json/xml with \&quot;title\&quot;, \&quot;description\&quot; and \&quot;content\&quot; keys. | [optional][default to &#39;plain&#39;] |
81
+ | **return_links** | **Boolean** | [Works only with text_format&#x3D;json] Return links from the page body text (false by default). Useful for building web crawlers. | [optional][default to false] |
82
+ | **headers** | [**Hash&lt;String, String&gt;**](String.md) | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&amp;headers[One]&#x3D;value1&amp;headers[Another]&#x3D;value2) or as a JSON encoded object (...&amp;headers&#x3D;{\&quot;One\&quot;: \&quot;value1\&quot;, \&quot;Another\&quot;: \&quot;value2\&quot;}). | [optional] |
83
+ | **timeout** | **Integer** | Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000). | [optional][default to 10000] |
84
+ | **js** | **Boolean** | Execute on-page JavaScript using a headless browser (true by default). | [optional][default to true] |
85
+ | **js_timeout** | **Integer** | Maximum JavaScript rendering time in ms. Increase it if you see a loading indicator instead of data on the target page. | [optional][default to 2000] |
86
+ | **proxy** | **String** | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default). Note that residential proxy requests are more expensive than datacenter, see the pricing page for details. | [optional][default to &#39;datacenter&#39;] |
87
+ | **country** | **String** | Country of the proxy to use (US by default). Only available on Startup and Custom plans. | [optional][default to &#39;us&#39;] |
88
+ | **device** | **String** | Type of device emulation. | [optional][default to &#39;desktop&#39;] |
89
+ | **error_on_404** | **Boolean** | Return error on 404 HTTP status on the target page (false by default). | [optional][default to false] |
90
+ | **error_on_redirect** | **Boolean** | Return error on redirect on the target page (false by default). | [optional][default to false] |
91
+ | **js_script** | **String** | Custom JavaScript code to execute on the target page. | [optional] |
92
+
93
+ ### Return type
94
+
95
+ **String**
96
+
97
+ ### Authorization
98
+
99
+ [api_key](../README.md#api_key)
100
+
101
+ ### HTTP request headers
102
+
103
+ - **Content-Type**: Not defined
104
+ - **Accept**: application/json, text/html, text/xml
105
+
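The `headers` parameter above accepts two encodings: nested query keys of the form `headers[Name]=value`, or a single JSON-encoded object. The sketch below shows one way to build both query strings with the Ruby standard library. The helper names `nested_headers_query` and `json_headers_query` are hypothetical and not part of the gem, and the generated client may serialize the hash differently.

```ruby
require 'json'
require 'uri'

# Hypothetical helper (not part of the gem): one query key per header,
# in the nested form headers[Name]=value.
def nested_headers_query(headers)
  URI.encode_www_form(headers.map { |name, value| ["headers[#{name}]", value] })
end

# Hypothetical helper (not part of the gem): a single `headers` key
# whose value is the JSON-encoded hash.
def json_headers_query(headers)
  URI.encode_www_form('headers' => JSON.generate(headers))
end

headers = { 'One' => 'value1', 'Another' => 'value2' }

nested = nested_headers_query(headers)
json = json_headers_query(headers)

puts nested
puts json
```

Both strings are percent-encoded, so brackets, braces, and quotes are safe to append to a request URL. Decoding them with `URI.decode_www_form` recovers the original keys and values.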
data/git_push.sh CHANGED
@@ -1,7 +1,7 @@
1
1
  #!/bin/sh
2
2
  # ref: https://help.github.com/articles/adding-an-existing-project-to-github-using-the-command-line/
3
3
  #
4
- # Usage example: /bin/sh ./git_push.sh wing328 openapi-pestore-perl "minor update" "gitlab.com"
4
+ # Usage example: /bin/sh ./git_push.sh wing328 openapi-petstore-perl "minor update" "gitlab.com"
5
5
 
6
6
  git_user_id=$1
7
7
  git_repo_id=$2
@@ -38,14 +38,14 @@ git add .
38
38
  git commit -m "$release_note"
39
39
 
40
40
  # Sets the new remote
41
- git_remote=`git remote`
41
+ git_remote=$(git remote)
42
42
  if [ "$git_remote" = "" ]; then # git remote not defined
43
43
 
44
44
  if [ "$GIT_TOKEN" = "" ]; then
45
45
  echo "[INFO] \$GIT_TOKEN (environment variable) is not set. Using the git credential in your environment."
46
46
  git remote add origin https://${git_host}/${git_user_id}/${git_repo_id}.git
47
47
  else
48
- git remote add origin https://${git_user_id}:${GIT_TOKEN}@${git_host}/${git_user_id}/${git_repo_id}.git
48
+ git remote add origin https://${git_user_id}:"${GIT_TOKEN}"@${git_host}/${git_user_id}/${git_repo_id}.git
49
49
  fi
50
50
 
51
51
  fi
@@ -55,4 +55,3 @@ git pull origin master
55
55
  # Pushes (Forces) the changes in the local repository up to the remote repository
56
56
  echo "Git pushing to https://${git_host}/${git_user_id}/${git_repo_id}.git"
57
57
  git push origin master 2>&1 | grep -v 'To https'
58
-
@@ -0,0 +1,79 @@
1
+ =begin
2
+ #WebScraping.AI
3
+
4
+ #WebScraping.AI scraping API provides GPT-powered tools with Chromium JavaScript rendering, rotating proxies, and built-in HTML parsing.
5
+
6
+ The version of the OpenAPI document: 3.1.3
7
+ Contact: support@webscraping.ai
8
+ Generated by: https://openapi-generator.tech
9
+ OpenAPI Generator version: 7.2.0
10
+
11
+ =end
12
+
13
+ require 'cgi'
14
+
15
+ module WebScrapingAI
16
+ class AccountApi
17
+ attr_accessor :api_client
18
+
19
+ def initialize(api_client = ApiClient.default)
20
+ @api_client = api_client
21
+ end
22
+ # Information about your account calls quota
23
+ # Returns information about your account, including the remaining API credits quota, the next billing cycle start time, and the remaining concurrent requests. The response is in JSON format.
24
+ # @param [Hash] opts the optional parameters
25
+ # @return [Account]
26
+ def account(opts = {})
27
+ data, _status_code, _headers = account_with_http_info(opts)
28
+ data
29
+ end
30
+
31
+ # Information about your account calls quota
32
+ # Returns information about your account, including the remaining API credits quota, the next billing cycle start time, and the remaining concurrent requests. The response is in JSON format.
33
+ # @param [Hash] opts the optional parameters
34
+ # @return [Array<(Account, Integer, Hash)>] Account data, response status code and response headers
35
+ def account_with_http_info(opts = {})
36
+ if @api_client.config.debugging
37
+ @api_client.config.logger.debug 'Calling API: AccountApi.account ...'
38
+ end
39
+ # resource path
40
+ local_var_path = '/account'
41
+
42
+ # query parameters
43
+ query_params = opts[:query_params] || {}
44
+
45
+ # header parameters
46
+ header_params = opts[:header_params] || {}
47
+ # HTTP header 'Accept' (if needed)
48
+ header_params['Accept'] = @api_client.select_header_accept(['application/json'])
49
+
50
+ # form parameters
51
+ form_params = opts[:form_params] || {}
52
+
53
+ # http body (model)
54
+ post_body = opts[:debug_body]
55
+
56
+ # return_type
57
+ return_type = opts[:debug_return_type] || 'Account'
58
+
59
+ # auth_names
60
+ auth_names = opts[:debug_auth_names] || ['api_key']
61
+
62
+ new_options = opts.merge(
63
+ :operation => :"AccountApi.account",
64
+ :header_params => header_params,
65
+ :query_params => query_params,
66
+ :form_params => form_params,
67
+ :body => post_body,
68
+ :auth_names => auth_names,
69
+ :return_type => return_type
70
+ )
71
+
72
+ data, status_code, headers = @api_client.call_api(:GET, local_var_path, new_options)
73
+ if @api_client.config.debugging
74
+ @api_client.config.logger.debug "API called: AccountApi#account\nData: #{data.inspect}\nStatus code: #{status_code}\nHeaders: #{headers}"
75
+ end
76
+ return data, status_code, headers
77
+ end
78
+ end
79
+ end
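The two methods in `AccountApi` follow a common generated-client pattern: `account_with_http_info` returns the data, status code, and headers, while `account` keeps only the data. The sketch below reproduces that relationship with stand-in classes so it runs without the gem. `FakeApiClient`, `SketchAccountApi`, and the `example_field` payload are assumptions made for illustration. The real `ApiClient#call_api` performs an HTTP request and returns an `Account` model.

```ruby
# Stand-in for the generated ApiClient (hypothetical). It returns a canned
# triple instead of performing an HTTP request.
class FakeApiClient
  def call_api(_http_method, _path, _opts = {})
    # The payload is made up for illustration; it is not the real Account schema.
    [{ 'example_field' => 1 }, 200, { 'Content-Type' => 'application/json' }]
  end
end

# Simplified mirror of the two-method pattern used by AccountApi.
class SketchAccountApi
  def initialize(api_client)
    @api_client = api_client
  end

  # Convenience wrapper: discards the status code and headers.
  def account(opts = {})
    data, _status_code, _headers = account_with_http_info(opts)
    data
  end

  # Full variant: returns data, status code, and response headers.
  def account_with_http_info(opts = {})
    @api_client.call_api(:GET, '/account', opts.merge(return_type: 'Account'))
  end
end

api = SketchAccountApi.new(FakeApiClient.new)
data = api.account
full_data, status_code, response_headers = api.account_with_http_info

puts data.inspect
puts status_code
```

Use the `_with_http_info` variant when the status code or response headers matter, and the short form when only the parsed body is needed.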