proxy_fetcher 0.4.1 → 0.5.0
- checksums.yaml +4 -4
- data/.gitignore +1 -0
- data/.rubocop.yml +3 -0
- data/CHANGELOG.md +16 -0
- data/Gemfile +1 -0
- data/README.md +55 -8
- data/lib/proxy_fetcher.rb +6 -1
- data/lib/proxy_fetcher/client/client.rb +71 -0
- data/lib/proxy_fetcher/client/proxies_registry.rb +32 -0
- data/lib/proxy_fetcher/client/request.rb +88 -0
- data/lib/proxy_fetcher/configuration.rb +7 -5
- data/lib/proxy_fetcher/configuration/providers_registry.rb +3 -6
- data/lib/proxy_fetcher/exceptions.rb +36 -0
- data/lib/proxy_fetcher/providers/gather_proxy.rb +36 -0
- data/lib/proxy_fetcher/providers/http_tunnel.rb +48 -0
- data/lib/proxy_fetcher/providers/proxy_docker.rb +1 -1
- data/lib/proxy_fetcher/proxy.rb +3 -1
- data/lib/proxy_fetcher/utils/http_client.rb +2 -1
- data/lib/proxy_fetcher/utils/proxy_validator.rb +3 -3
- data/lib/proxy_fetcher/version.rb +2 -2
- data/proxy_fetcher.gemspec +1 -1
- data/spec/proxy_fetcher/client_spec.rb +125 -0
- data/spec/proxy_fetcher/configuration_spec.rb +4 -4
- data/spec/proxy_fetcher/providers/gather_proxy_spec.rb +9 -0
- data/spec/proxy_fetcher/providers/http_tunnel_spec.rb +9 -0
- data/spec/spec_helper.rb +3 -1
- data/spec/support/evil_proxy_patch.rb +26 -0
- metadata +12 -4
- data/lib/proxy_fetcher/providers/hide_my_name.rb +0 -35
- data/spec/proxy_fetcher/providers/hide_my_name_spec.rb +0 -10
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 287db7b55e3f0798e263fe7268f8a709e4d8e8c0
+  data.tar.gz: cde6d4dc22e60aa012c02b1f679fcc72b23c6114
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a54e2d725338bc5c859d415cae5b0397b4aa2a75a0f00917edf4eaa2f845e893c1d73f8e56b86ddfce85c6016fcf473ed4ad2e29fbf78a2c65495df050a320d7
+  data.tar.gz: 61b2fc3dfd20c75045c8b435d0855e0596071facc8bef9e73d68e46996faaad8a9e1ccbb3b9b818f1449138e7d52ee583dffaa813cc5d3696a24ad6ba835475c
data/.gitignore
CHANGED
data/.rubocop.yml
CHANGED
data/CHANGELOG.md
CHANGED
@@ -2,6 +2,22 @@
 
 Reverse Chronological Order:
 
+## `0.5.0` (2017-09-06)
+
+* Remove HideMyName provider (not works anymore)
+* Fix ProxyDocker provider
+* Add `ProxyFetcher::Client` to make interacting with proxies easier
+* Add new providers (Gather Proxy & HTTP Tunnel Genius)
+* Simplify `connection_timeout` config option to `timeout`
+* Make User-Agent configurable
+* Move all the gem exceptions under `ProxyFetcher::Error` base class
+* Small improvements
+
+## `0.4.1` (2017-09-04)
+
+* Use all registered providers by default
+* Disable HideMyName provider (now it uses CloudFlare)
+
 ## `0.4.0` (2017-08-26)
 
 * Support operations with multiple providers
data/Gemfile
CHANGED
data/README.md
CHANGED
@@ -6,7 +6,7 @@
 [](#license)
 
 This gem can help your Ruby application to make HTTP(S) requests from proxy by fetching and validating actual
-proxy lists from multiple providers
+proxy lists from multiple providers.
 
 It gives you a `Manager` class that can load proxy lists, validate them and return random or specific proxies. Take a look
 at the documentation below to find all the gem features.
@@ -20,6 +20,7 @@ validating proxy lists from the different providers. [Checkout examples](#standa
 - [Example of usage](#example-of-usage)
 - [In Ruby application](#in-ruby-application)
 - [Standalone](#standalone)
+- [Client](#client)
 - [Configuration](#configuration)
 - [Proxy validation speed](#proxy-validation-speed)
 - [Proxy object](#proxy-object)
@@ -32,7 +33,7 @@ validating proxy lists from the different providers. [Checkout examples](#standa
 If using bundler, first add 'proxy_fetcher' to your Gemfile:
 
 ```ruby
-gem 'proxy_fetcher', '~> 0.
+gem 'proxy_fetcher', '~> 0.5'
 ```
 
 or if you want to use the latest version (from `master` branch), then:
@@ -50,7 +51,7 @@ bundle install
 Otherwise simply install the gem:
 
 ```sh
-gem install proxy_fetcher -v '0.
+gem install proxy_fetcher -v '0.5'
 ```
 
 ## Example of usage
@@ -123,7 +124,7 @@ If you need to filter proxy list, for example, by country or response time and s
 then you can just pass your filters like a simple Ruby hash to the Manager instance:
 
 ```ruby
-ProxyFetcher.config.providers = :
+ProxyFetcher.config.providers = :proxy_docker
 
 manager = ProxyFetcher::Manager.new(filters: { country: 'PL', maxtime: '500' })
 manager.proxies
@@ -134,7 +135,7 @@ manager.proxies
 If you are using multiple providers, then you can split your filters by proxy provider names:
 
 ```ruby
-ProxyFetcher.config.providers = [:
+ProxyFetcher.config.providers = [:proxy_docker, :xroxy]
 
 manager = ProxyFetcher::Manager.new(filters: {
   hide_my_name: {
@@ -194,19 +195,64 @@ To get all the possible options run:
 proxy_fetcher --help
 ```
 
+## Client
+
+ProxyFetcher gem provides you a ready-to-use HTTP client that made requesting with proxies easy. It does all the work
+with the proxy lists for you (load, validate, refresh, find proxy by type, follow redirects, etc). All you need it to
+make HTTP(S) requests:
+
+```ruby
+require 'proxy-fetcher'
+
+ProxyFetcher::Client.get 'https://example.com/resource'
+
+ProxyFetcher::Client.post 'https://example.com/resource', { param: 'value' }
+
+ProxyFetcher::Client.post 'https://example.com/resource', 'Any data'
+
+ProxyFetcher::Client.post 'https://example.com/resource', { param: 'value'}.to_json , headers: { 'Content-Type': 'application/json' }
+
+ProxyFetcher::Client.put 'https://example.com/resource', { param: 'value' }
+
+ProxyFetcher::Client.patch 'https://example.com/resource', { param: 'value' }
+
+ProxyFetcher::Client.delete 'https://example.com/resource'
+```
+
+By default, `ProxyFetcher::Client` makes 1000 attempts to send a HTTP request in case if proxy is out of order or the
+remote server returns an error. You can increase or decrease this number for your case or set it to `nil` if you want to
+make infinite number of requests (or before your Ruby process will die :skull:):
+
+```ruby
+require 'proxy-fetcher'
+
+ProxyFetcher::Client.get 'https://example.com/resource', options: { max_retries: 10_000 }
+```
+
+Btw, if you need support of JavaScript or some other features, you need to implement your own client using, for example,
+`selenium-webdriver`.
+
 ## Configuration
 
 To change open/read timeout for `cleanup!` and `connectable?` methods you need to change `ProxyFetcher.config`:
 
 ```ruby
 ProxyFetcher.configure do |config|
-  config.
+  config.timeout = 1 # default is 3
 end
 
 manager = ProxyFetcher::Manager.new
 manager.cleanup!
 ```
 
+Also you can set your custom User-Agent:
+
+```ruby
+ProxyFetcher.configure do |config|
+  config.user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
+end
+```
+
 ProxyFetcher uses simple Ruby solution for dealing with HTTP(S) requests - `net/http` library from the stdlib. If you wanna add, for example, your custom provider that
 was developed as a Single Page Application (SPA) with some JavaScript, then you will need something like [selenium-webdriver](https://github.com/SeleniumHQ/selenium/tree/master/rb)
 to properly load the content of the website. For those and other cases you can write your own class for fetching HTML content by the URL and setup it
@@ -269,7 +315,7 @@ ProxyFetcher.config.pool_size = 50
 You can experiment with the threads pool size to find an optimal number of maximum threads count for you PC and OS.
 This will definitely give you some performance improvements.
 
-Moreover, the common proxy validation speed depends on `ProxyFetcher.config.
+Moreover, the common proxy validation speed depends on `ProxyFetcher.config.timeout` option that is equal
 to `3` by default. It means that gem will wait 3 seconds for the server answer to check if particular proxy is connectable.
 You can decrease this option to `1`, for example, and it will heavily increase proxy validation speed (**but remember**
 that some proxies could be connectable, but slow, so with this option you will clear proxy list from the proxies that
@@ -300,10 +346,11 @@ Also you can call next instance methods for every Proxy object:
 
 Currently ProxyFetcher can deal with next proxy providers (services):
 
-* Hide My Name (**currently does not work**)
 * Free Proxy List
 * Free SSL Proxies
 * Proxy Docker
+* Gather Proxy
+* HTTP Tunnel Genius
 * Proxy List
 * XRoxy
 
data/lib/proxy_fetcher.rb
CHANGED
@@ -3,6 +3,7 @@ require 'net/https'
 require 'nokogiri'
 require 'thread'
 
+require File.dirname(__FILE__) + '/proxy_fetcher/exceptions'
 require File.dirname(__FILE__) + '/proxy_fetcher/configuration'
 require File.dirname(__FILE__) + '/proxy_fetcher/configuration/providers_registry'
 require File.dirname(__FILE__) + '/proxy_fetcher/proxy'
@@ -11,13 +12,17 @@ require File.dirname(__FILE__) + '/proxy_fetcher/manager'
 require File.dirname(__FILE__) + '/proxy_fetcher/utils/http_client'
 require File.dirname(__FILE__) + '/proxy_fetcher/utils/html'
 require File.dirname(__FILE__) + '/proxy_fetcher/utils/proxy_validator'
+require File.dirname(__FILE__) + '/proxy_fetcher/client/client'
+require File.dirname(__FILE__) + '/proxy_fetcher/client/request'
+require File.dirname(__FILE__) + '/proxy_fetcher/client/proxies_registry'
 
 module ProxyFetcher
   module Providers
     require File.dirname(__FILE__) + '/proxy_fetcher/providers/base'
     require File.dirname(__FILE__) + '/proxy_fetcher/providers/free_proxy_list'
     require File.dirname(__FILE__) + '/proxy_fetcher/providers/free_proxy_list_ssl'
-    require File.dirname(__FILE__) + '/proxy_fetcher/providers/
+    require File.dirname(__FILE__) + '/proxy_fetcher/providers/gather_proxy'
+    require File.dirname(__FILE__) + '/proxy_fetcher/providers/http_tunnel'
     require File.dirname(__FILE__) + '/proxy_fetcher/providers/proxy_docker'
     require File.dirname(__FILE__) + '/proxy_fetcher/providers/proxy_list'
     require File.dirname(__FILE__) + '/proxy_fetcher/providers/xroxy'
data/lib/proxy_fetcher/client/client.rb
ADDED
@@ -0,0 +1,71 @@
+module ProxyFetcher
+  module Client
+    class << self
+      def get(url, headers: {}, options: {})
+        request_without_payload(:get, url, headers, options)
+      end
+
+      def head(url, headers: {}, options: {})
+        request_without_payload(:head, url, headers, options)
+      end
+
+      def post(url, payload, headers: {}, options: {})
+        request_with_payload(:post, url, payload, headers, options)
+      end
+
+      def delete(url, headers: {}, options: {})
+        request_without_payload(:delete, url, headers, options)
+      end
+
+      def put(url, payload, headers: {}, options: {})
+        request_with_payload(:put, url, payload, headers, options)
+      end
+
+      def patch(url, payload, headers: {}, options: {})
+        request_with_payload(:patch, url, payload, headers, options)
+      end
+
+      private
+
+      def request_with_payload(method, url, payload, headers, options)
+        safe_request_to(url, options.fetch(:max_retries, 1000)) do |proxy|
+          opts = options.merge(url: url, payload: payload, proxy: proxy, headers: default_headers.merge(headers))
+
+          Request.execute(method: method, **opts)
+        end
+      end
+
+      def request_without_payload(method, url, headers, options)
+        safe_request_to(url, options.fetch(:max_retries, 1000)) do |proxy|
+          opts = options.merge(url: url, proxy: proxy, headers: default_headers.merge(headers))
+
+          Request.execute(method: method, **opts)
+        end
+      end
+
+      def default_headers
+        {
+          'User-Agent' => ProxyFetcher.config.user_agent
+        }
+      end
+
+      def safe_request_to(url, max_retries = 1000)
+        tries = 0
+
+        begin
+          proxy = ProxiesRegistry.find_proxy_for(url)
+          yield(proxy)
+        rescue ProxyFetcher::Error
+          raise
+        rescue StandardError
+          raise ProxyFetcher::Exceptions::MaximumRetriesReached if max_retries && tries >= max_retries
+
+          ProxiesRegistry.invalidate_proxy!(proxy)
+          tries += 1
+
+          retry
+        end
+      end
+    end
+  end
+end
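The retry loop in `safe_request_to` can be exercised on its own. The following is a minimal sketch, not the gem's code: `with_retries`, `FlakyError` and `MaxRetriesReached` are hypothetical names, and proxy lookup and invalidation are left out. It suggests that, with this structure, `max_retries` counts retries after the first attempt, so a block that always fails runs `max_retries + 1` times before the error is raised.

```ruby
# Hypothetical stand-ins; not part of proxy_fetcher.
FlakyError = Class.new(StandardError)
MaxRetriesReached = Class.new(StandardError)

# Same shape as safe_request_to: `tries` lives outside the begin block,
# so `retry` re-runs the block without resetting the counter.
def with_retries(max_retries = 1000)
  tries = 0

  begin
    yield(tries)
  rescue MaxRetriesReached
    raise
  rescue StandardError
    # A nil limit means "retry forever".
    raise MaxRetriesReached if max_retries && tries >= max_retries

    tries += 1
    retry
  end
end

attempts = 0
begin
  with_retries(3) do
    attempts += 1
    raise FlakyError, 'proxy is down'
  end
rescue MaxRetriesReached
  puts "gave up after #{attempts} attempts"
end

# Fails twice, then succeeds on the third attempt.
puts with_retries(5) { |tries| tries < 2 ? raise(FlakyError) : "ok after #{tries} retries" }
```

With a limit of 3 the block above runs four times, which is worth keeping in mind when reading the README's "1000 attempts" wording.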
data/lib/proxy_fetcher/client/proxies_registry.rb
ADDED
@@ -0,0 +1,32 @@
+module ProxyFetcher
+  module Client
+    class ProxiesRegistry
+      class << self
+        def invalidate_proxy!(proxy)
+          manager.proxies.delete(proxy)
+          manager.refresh_list! if manager.proxies.empty?
+        end
+
+        def find_proxy_for(url)
+          proxy = if URI.parse(url).is_a?(URI::HTTPS)
+                    manager.proxies.detect(&:ssl?)
+                  else
+                    manager.get
+                  end
+
+          return proxy unless proxy.nil?
+
+          manager.refresh_list!
+          find_proxy_for(url)
+        end
+
+        def manager
+          manager = Thread.current[:proxy_fetcher_manager]
+          return manager unless manager.nil?
+
+          Thread.current[:proxy_fetcher_manager] = ProxyFetcher::Manager.new
+        end
+      end
+    end
+  end
+end
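`ProxiesRegistry.manager` memoizes its manager in `Thread.current`, which implies one proxy list per thread rather than one shared list. A small sketch of that memoization pattern, assuming a hypothetical `FakeManager` and `Registry` in place of the gem's classes (no network access involved):

```ruby
# Hypothetical stand-in for ProxyFetcher::Manager; it does no I/O.
class FakeManager; end

module Registry
  # Same memoization shape as ProxiesRegistry.manager.
  def self.manager
    manager = Thread.current[:fake_manager]
    return manager unless manager.nil?

    Thread.current[:fake_manager] = FakeManager.new
  end
end

main_first  = Registry.manager
main_second = Registry.manager
other       = Thread.new { Registry.manager }.value

puts main_first.equal?(main_second) # same thread: the memoized object is reused
puts main_first.equal?(other)       # another thread builds its own object
```

Note that `Thread#[]` storage is fiber-local in Ruby, so code running in a separate fiber would also see its own value.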
data/lib/proxy_fetcher/client/request.rb
ADDED
@@ -0,0 +1,88 @@
+module ProxyFetcher
+  module Client
+    class Request
+      URL_ENCODED = {
+        'Content-Type' => 'application/x-www-form-urlencoded'
+      }.freeze
+
+      DEFAULT_SSL_OPTIONS = {
+        verify_mode: OpenSSL::SSL::VERIFY_NONE
+      }.freeze
+
+      attr_reader :http, :method, :uri, :headers, :timeout,
+                  :payload, :proxy, :max_redirects, :ssl_options
+
+      def self.execute(args)
+        new(args).execute
+      end
+
+      def initialize(args)
+        raise ArgumentError, 'args must be a Hash!' unless args.is_a?(Hash)
+
+        @uri = URI.parse(args.fetch(:url))
+        @method = args.fetch(:method).to_s.capitalize
+        @headers = (args[:headers] || {}).dup
+        @payload = preprocess_payload(args[:payload])
+        @timeout = args.fetch(:timeout, ProxyFetcher.config.timeout)
+        @ssl_options = args.fetch(:ssl_options, DEFAULT_SSL_OPTIONS)
+
+        @proxy = args.fetch(:proxy)
+        @max_redirects = args.fetch(:max_redirects, 10)
+
+        build_http_client
+      end
+
+      def execute
+        request = request_class_for(method).new(uri, headers)
+
+        http.start do |connection|
+          process_response!(connection.request(request, payload))
+        end
+      end
+
+      private
+
+      def preprocess_payload(payload)
+        return if payload.nil?
+
+        if payload.is_a?(Hash)
+          headers.merge(URL_ENCODED)
+          URI.encode_www_form(payload)
+        else
+          payload
+        end
+      end
+
+      def build_http_client
+        @http = Net::HTTP.new(uri.host, uri.port, proxy.addr, proxy.port)
+
+        @http.use_ssl = uri.is_a?(URI::HTTPS)
+        @http.verify_mode = ssl_options.fetch(:verify_mode)
+        @http.open_timeout = timeout
+        @http.read_timeout = timeout
+      end
+
+      def process_response!(http_response)
+        case http_response
+        when Net::HTTPSuccess then http_response.read_body
+        when Net::HTTPRedirection then follow_redirection(http_response)
+        else
+          http_response.error!
+        end
+      end
+
+      def follow_redirection(http_response)
+        raise ProxyFetcher::Exceptions::MaximumRedirectsReached if max_redirects <= 0
+
+        url = http_response.fetch('location')
+        url = uri.merge(url).to_s unless url.downcase.start_with?('http')
+
+        Request.execute(method: :get, url: url, proxy: proxy, headers: headers, timeout: timeout, max_redirects: max_redirects - 1)
+      end
+
+      def request_class_for(method)
+        Net::HTTP.const_get(method, false)
+      end
+    end
+  end
+end
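In `preprocess_payload`, the return value of `headers.merge(URL_ENCODED)` is not used. `Hash#merge` returns a new hash and leaves the receiver untouched, so as written that line would not add `Content-Type` to `headers`; `merge!` is the mutating variant. The sketch below illustrates the difference with plain hashes (the `User-Agent` value is made up). Whether the missing header matters in practice depends on how `Net::HTTP` fills in a default content type, which is not verified here.

```ruby
require 'uri'

URL_ENCODED = { 'Content-Type' => 'application/x-www-form-urlencoded' }.freeze

headers = { 'User-Agent' => 'example-agent' } # hypothetical header value

headers.merge(URL_ENCODED)        # returns a new hash; the result is discarded
puts headers.key?('Content-Type') # false

headers.merge!(URL_ENCODED)       # mutates the receiver
puts headers.key?('Content-Type') # true

# A Hash payload is serialised as a form body, as in preprocess_payload.
puts URI.encode_www_form({ param: 'value', page: 2 })
```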
data/lib/proxy_fetcher/configuration.rb
CHANGED
@@ -1,10 +1,11 @@
 module ProxyFetcher
   class Configuration
-
-
-    attr_accessor :providers, :connection_timeout, :pool_size
+    attr_accessor :providers, :timeout, :pool_size, :user_agent
     attr_accessor :http_client, :proxy_validator
 
+    # rubocop:disable Metrics/LineLength
+    DEFAULT_USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112 Safari/537.36'.freeze
+
     class << self
       def providers_registry
         @registry ||= ProvidersRegistry.new
@@ -25,8 +26,9 @@ module ProxyFetcher
 
     # Sets default configuration options
     def reset!
+      @user_agent = DEFAULT_USER_AGENT
       @pool_size = 10
-      @
+      @timeout = 3
       @http_client = HTTPClient
       @proxy_validator = ProxyValidator
 
@@ -53,7 +55,7 @@ module ProxyFetcher
     # Checks if custom class has some required class methods
     def setup_custom_class(klass, required_methods: [])
       unless klass.respond_to?(*required_methods)
-        raise WrongCustomClass
+        raise ProxyFetcher::Exceptions::WrongCustomClass.new(klass, required_methods)
       end
 
       klass
data/lib/proxy_fetcher/configuration/providers_registry.rb
CHANGED
@@ -1,17 +1,14 @@
 module ProxyFetcher
   class ProvidersRegistry
-    UnknownProvider = Class.new(StandardError)
-    RegisteredProvider = Class.new(StandardError)
-
     def providers
       @providers ||= {}
     end
 
     # Add custom provider to common registry.
-    # Requires proxy provider name ('
+    # Requires proxy provider name ('proxy_docker' for example) and a class
     # that implements the parsing logic.
     def register(name, klass)
-      raise RegisteredProvider,
+      raise ProxyFetcher::Exceptions::RegisteredProvider, name if providers.key?(name.to_sym)
 
       providers[name.to_sym] = klass
     end
@@ -23,7 +20,7 @@ module ProxyFetcher
 
       providers.fetch(provider_name)
     rescue KeyError
-      raise UnknownProvider,
+      raise ProxyFetcher::Exceptions::UnknownProvider, provider_name
     end
   end
 end
data/lib/proxy_fetcher/exceptions.rb
ADDED
@@ -0,0 +1,36 @@
+module ProxyFetcher
+  Error = Class.new(StandardError)
+
+  module Exceptions
+    class WrongCustomClass < Error
+      def initialize(klass, methods)
+        required_methods = Array(methods).join(', ')
+        super("#{klass} must respond to [#{required_methods}] class methods!")
+      end
+    end
+
+    class UnknownProvider < Error
+      def initialize(provider_name)
+        super("unregistered proxy provider `#{provider_name}`")
+      end
+    end
+
+    class RegisteredProvider < Error
+      def initialize(name)
+        super("`#{name}` provider already registered!")
+      end
+    end
+
+    class MaximumRedirectsReached < Error
+      def initialize(*)
+        super('maximum redirects reached')
+      end
+    end
+
+    class MaximumRetriesReached < Error
+      def initialize(*)
+        super('reached the maximum number of retries')
+      end
+    end
+  end
+end
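Because every exception now inherits from `ProxyFetcher::Error`, a caller can rescue the base class instead of listing each error. The sketch below copies a reduced version of the hierarchy so that it runs without the gem installed; `describe_failure` is a hypothetical helper, not part of the gem.

```ruby
# Reduced copy of the hierarchy from exceptions.rb, for illustration only.
module ProxyFetcher
  Error = Class.new(StandardError)

  module Exceptions
    class UnknownProvider < Error
      def initialize(provider_name)
        super("unregistered proxy provider `#{provider_name}`")
      end
    end

    class MaximumRetriesReached < Error
      def initialize(*)
        super('reached the maximum number of retries')
      end
    end
  end
end

# Hypothetical helper: runs a block and reports any gem-level error.
def describe_failure
  yield
  'no error'
rescue ProxyFetcher::Error => e
  "#{e.class.name.split('::').last}: #{e.message}"
end

puts describe_failure { raise ProxyFetcher::Exceptions::UnknownProvider, 'nope' }
puts describe_failure { raise ProxyFetcher::Exceptions::MaximumRetriesReached }
puts describe_failure { :fine }
```

The custom `initialize` methods build the message, so `raise SomeError, name` passes only the name and the class supplies the wording.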
data/lib/proxy_fetcher/providers/gather_proxy.rb
ADDED
@@ -0,0 +1,36 @@
+require 'json'
+
+module ProxyFetcher
+  module Providers
+    class GatherProxy < Base
+      PROVIDER_URL = 'http://www.gatherproxy.com/'.freeze
+
+      def load_proxy_list(*)
+        doc = load_document(PROVIDER_URL)
+        doc.xpath('//div[@class="proxy-list"]/table/script')
+      end
+
+      def to_proxy(html_element)
+        json = parse_json(html_element)
+
+        ProxyFetcher::Proxy.new.tap do |proxy|
+          proxy.addr = json['PROXY_IP']
+          proxy.port = json['PROXY_PORT'].to_i(16)
+          proxy.anonymity = json['PROXY_TYPE']
+          proxy.country = json['PROXY_COUNTRY']
+          proxy.response_time = json['PROXY_TIME'].to_i
+          proxy.type = ProxyFetcher::Proxy::HTTP
+        end
+      end
+
+      private
+
+      def parse_json(element)
+        javascript = clear(element.content)[/{.+}/im]
+        JSON.parse(javascript)
+      end
+    end
+
+    ProxyFetcher::Configuration.register_provider(:gather_proxy, GatherProxy)
+  end
+end
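The `GatherProxy` provider pulls a JSON object out of an inline script and reads the port with `to_i(16)`, which suggests the site publishes ports in hexadecimal. A minimal sketch of those two steps follows; the script text in `SAMPLE_SCRIPT` is a made-up example (the real page markup is not part of this diff), and the `clear` helper from the base class is omitted.

```ruby
require 'json'

# Hypothetical script body; invented for illustration.
SAMPLE_SCRIPT = <<~JS
  gp.insertPrx({"PROXY_IP":"203.0.113.7","PROXY_PORT":"1F90","PROXY_TIME":"42"});
JS

# Take everything from the first "{" to the last "}" and parse it as JSON.
json = JSON.parse(SAMPLE_SCRIPT[/\{.+\}/m])

addr = json['PROXY_IP']
port = json['PROXY_PORT'].to_i(16) # hexadecimal "1F90" is 8080 in decimal
time = json['PROXY_TIME'].to_i

puts "#{addr}:#{port} response_time=#{time}"
```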
data/lib/proxy_fetcher/providers/http_tunnel.rb
ADDED
@@ -0,0 +1,48 @@
+module ProxyFetcher
+  module Providers
+    class HTTPTunnel < Base
+      PROVIDER_URL = 'http://www.httptunnel.ge/ProxyListForFree.aspx'.freeze
+
+      def load_proxy_list(*)
+        doc = load_document(PROVIDER_URL)
+        doc.xpath('//table[contains(@id, "GridView")]/tr[(count(td)>2)]')
+      end
+
+      def to_proxy(html_element)
+        ProxyFetcher::Proxy.new.tap do |proxy|
+          uri = parse_proxy_uri(html_element)
+          proxy.addr = uri.host
+          proxy.port = uri.port
+
+          proxy.country = parse_country(html_element)
+          proxy.anonymity = parse_anonymity(html_element)
+          proxy.type = ProxyFetcher::Proxy::HTTP
+        end
+      end
+
+      private
+
+      def parse_proxy_uri(element)
+        full_addr = parse_element(element, 'td[1]')
+        URI.parse("http://#{full_addr}")
+      end
+
+      def parse_country(element)
+        element.at('img').attr('title')
+      end
+
+      def parse_anonymity(element)
+        transparency = parse_element(element, 'td[5]').to_sym
+
+        {
+          A: 'Anonimous',
+          E: 'Elite',
+          T: 'Transparent',
+          U: 'Unknown'
+        }.fetch(transparency, 'Unknown')
+      end
+    end
+
+    ProxyFetcher::Configuration.register_provider(:http_tunnel, HTTPTunnel)
+  end
+end
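`HTTPTunnel` leans on two small standard-library idioms: `URI.parse` to split a `host:port` cell, and `Hash#fetch` with a default to map a one-letter code to a label. A rough sketch of both, assuming a made-up cell value and a hypothetical `anonymity_for` helper (the label spelling is copied from the diff as-is):

```ruby
require 'uri'

# Mapping copied from parse_anonymity in the diff.
ANONYMITY = { A: 'Anonimous', E: 'Elite', T: 'Transparent', U: 'Unknown' }.freeze

# Hypothetical helper: unmapped codes fall back to 'Unknown'.
def anonymity_for(code)
  ANONYMITY.fetch(code.to_sym, 'Unknown')
end

# Prefixing a scheme lets URI.parse split "host:port" into its parts.
uri = URI.parse('http://203.0.113.7:3128') # hypothetical table cell value
puts uri.host
puts uri.port

puts anonymity_for('E')
puts anonymity_for('X')
```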
data/lib/proxy_fetcher/proxy.rb
CHANGED
data/lib/proxy_fetcher/utils/http_client.rb
CHANGED
@@ -14,12 +14,13 @@ module ProxyFetcher
     def fetch
       request = Net::HTTP::Get.new(@uri.to_s)
       request['Connection'] = 'keep-alive'
+      request['User-Agent'] = ProxyFetcher.config.user_agent
       response = @http.request(request)
       response.body
     end
 
     def https?
-      @uri.
+      @uri.is_a?(URI::HTTPS)
     end
 
     class << self
data/lib/proxy_fetcher/utils/proxy_validator.rb
CHANGED
@@ -6,15 +6,15 @@ module ProxyFetcher
       uri = URI.parse(URL_TO_CHECK)
       @http = Net::HTTP.new(uri.host, uri.port, proxy_addr, proxy_port.to_i)
 
-      return unless uri.
+      return unless uri.is_a?(URI::HTTPS)
 
       @http.use_ssl = true
       @http.verify_mode = OpenSSL::SSL::VERIFY_NONE
     end
 
     def connectable?
-      @http.open_timeout = ProxyFetcher.config.
-      @http.read_timeout = ProxyFetcher.config.
+      @http.open_timeout = ProxyFetcher.config.timeout
+      @http.read_timeout = ProxyFetcher.config.timeout
 
       @http.start { |connection| return true if connection.request_head('/') }
 
data/proxy_fetcher.gemspec
CHANGED
@@ -5,7 +5,7 @@ require 'proxy_fetcher/version'
 Gem::Specification.new do |gem|
   gem.name = 'proxy_fetcher'
   gem.version = ProxyFetcher.gem_version
-  gem.date = '2017-09-
+  gem.date = '2017-09-06'
   gem.summary = 'Ruby gem for dealing with proxy lists from different providers'
   gem.description = 'This gem can help your Ruby application to make HTTP(S) requests ' \
     'from proxy server by fetching and validating proxy lists from the different providers.'
data/spec/proxy_fetcher/client_spec.rb
ADDED
@@ -0,0 +1,125 @@
+require 'spec_helper'
+require 'json'
+
+require 'evil-proxy'
+require 'evil-proxy/async'
+
+describe ProxyFetcher::Client do
+  before :all do
+    ProxyFetcher.configure do |config|
+      config.provider = :xroxy
+      config.timeout = 5
+    end
+
+    @server = EvilProxy::MITMProxyServer.new Port: 3128, Quiet: true
+    @server.start
+  end
+
+  after :all do
+    @server.shutdown
+  end
+
+  # Use local proxy server in order to avoid side effects, non-working proxies, etc
+  before :each do
+    proxy = ProxyFetcher::Proxy.new(addr: '127.0.0.1', port: 3128, type: 'HTTP, HTTPS')
+    ProxyFetcher::Client::ProxiesRegistry.manager.instance_variable_set(:'@proxies', [proxy])
+    allow_any_instance_of(ProxyFetcher::Providers::Base).to receive(:fetch_proxies!).and_return([proxy])
+  end
+
+  context 'GET request with the valid proxy' do
+    it 'successfully returns page content for HTTP' do
+      content = ProxyFetcher::Client.get('http://httpbin.org')
+
+      expect(content).not_to be_nil
+      expect(content).not_to be_empty
+    end
+
+    it 'successfully returns page content for HTTPS' do
+      content = ProxyFetcher::Client.get('https://httpbin.org')
+
+      expect(content).not_to be_nil
+      expect(content).not_to be_empty
+    end
+  end
+
+  context 'POST request with the valid proxy' do
+    it 'successfully returns page content for HTTP' do
+      headers = {
+        'X-Proxy-Fetcher-Version' => ProxyFetcher::VERSION::STRING
+      }
+      content = ProxyFetcher::Client.post('http://httpbin.org/post', { param: 'value'} , headers: headers)
+
+      expect(content).not_to be_nil
+      expect(content).not_to be_empty
+
+      json = JSON.parse(content)
+
+      expect(json['headers']['X-Proxy-Fetcher-Version']).to eq(ProxyFetcher::VERSION::STRING)
+      expect(json['headers']['User-Agent']).to eq(ProxyFetcher.config.user_agent)
+    end
+  end
+
+  context 'PUT request with the valid proxy' do
+    it 'successfully returns page content for HTTP' do
+      content = ProxyFetcher::Client.put('http://httpbin.org/put', 'param=PutValue')
+
+      expect(content).not_to be_nil
+      expect(content).not_to be_empty
+
+      json = JSON.parse(content)
+
+      expect(json['form']['param']).to eq('PutValue')
+    end
+  end
+
+  context 'PATCH request with the valid proxy' do
+    it 'successfully returns page content for HTTP' do
+      content = ProxyFetcher::Client.patch('http://httpbin.org/patch', param: 'value')
+
+      expect(content).not_to be_nil
+      expect(content).not_to be_empty
+
+      json = JSON.parse(content)
+
+      expect(json['form']['param']).to eq('value')
+    end
+  end
+
+  context 'DELETE request with the valid proxy' do
+    it 'successfully returns page content for HTTP' do
+      content = ProxyFetcher::Client.delete('http://httpbin.org/delete')
+
+      expect(content).not_to be_nil
+      expect(content).not_to be_empty
+    end
+  end
+
+  context 'HEAD request with the valid proxy' do
+    it 'successfully works' do
+      content = ProxyFetcher::Client.head('http://httpbin.org')
+
+      expect(content).to be_nil
+    end
+  end
+
+  context 'retries' do
+    it 'raises an error when reaches max retries limit' do
+      allow(ProxyFetcher::Client::Request).to receive(:execute).and_raise(StandardError)
+
+      expect { ProxyFetcher::Client.get('http://httpbin.org') }.to raise_error(ProxyFetcher::Exceptions::MaximumRetriesReached)
+    end
+  end
+
+  context 'redirects' do
+    it 'follows redirect when present' do
+      content = ProxyFetcher::Client.get('http://httpbin.org/absolute-redirect/2')
+
+      expect(content).not_to be_nil
+      expect(content).not_to be_empty
+    end
+
+    it 'raises an error when reaches max redirects limit' do
+      expect { ProxyFetcher::Client.get('http://httpbin.org/absolute-redirect/11') }.to raise_error(ProxyFetcher::Exceptions::MaximumRedirectsReached)
+    end
+  end
+end
data/spec/proxy_fetcher/configuration_spec.rb
CHANGED
@@ -19,7 +19,7 @@ describe ProxyFetcher::Configuration do
       MyWrongHTTPClient = Class.new
 
       expect { ProxyFetcher.config.http_client = MyWrongHTTPClient }
-        .to raise_error(ProxyFetcher::
+        .to raise_error(ProxyFetcher::Exceptions::WrongCustomClass)
     end
   end
 
@@ -38,21 +38,21 @@ describe ProxyFetcher::Configuration do
       MyWrongProxyValidator = Class.new
 
       expect { ProxyFetcher.config.proxy_validator = MyWrongProxyValidator }
-        .to raise_error(ProxyFetcher::
+        .to raise_error(ProxyFetcher::Exceptions::WrongCustomClass)
     end
   end
 
   context 'custom provider' do
     it 'failed on registration if provider class already registered' do
       expect { ProxyFetcher::Configuration.register_provider(:xroxy, Class.new) }
-        .to raise_error(ProxyFetcher::
+        .to raise_error(ProxyFetcher::Exceptions::RegisteredProvider)
     end
 
     it "failed on proxy list fetching if provider doesn't registered" do
       ProxyFetcher.config.provider = :not_existing_provider
 
       expect { ProxyFetcher::Manager.new }
-        .to raise_error(ProxyFetcher::
+        .to raise_error(ProxyFetcher::Exceptions::UnknownProvider)
     end
   end
 end
data/spec/spec_helper.rb
CHANGED
data/spec/support/evil_proxy_patch.rb
ADDED
@@ -0,0 +1,26 @@
+require 'evil-proxy'
+
+EvilProxy::HTTPProxyServer.class_eval do
+  def do_PUT(req, res)
+    perform_proxy_request(req, res) do |http, path, header|
+      http.put(path, req.body || '', header)
+    end
+  end
+
+  def do_DELETE(req, res)
+    perform_proxy_request(req, res) do |http, path, header|
+      http.delete(path, header)
+    end
+  end
+
+  def do_PATCH(req, res)
+    perform_proxy_request(req, res) do |http, path, header|
+      http.patch(path, req.body || '', header)
+    end
+  end
+
+  # This method is not needed for PUT but I added for completeness
+  def do_OPTIONS(_req, res)
+    res['allow'] = 'GET,HEAD,POST,OPTIONS,CONNECT,PUT,PATCH,DELETE'
+  end
+end
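The support file reopens `EvilProxy::HTTPProxyServer` with `class_eval` to add `do_PUT`, `do_DELETE`, `do_PATCH` and `do_OPTIONS` handlers. The sketch below illustrates that general pattern on a hypothetical `TinyServer` class that dispatches to `do_<VERB>` by name. `TinyServer` and its `service` method are made up for illustration and are not part of evil-proxy; the assumption is only that the real server looks up handlers by a similar naming scheme.

```ruby
# Hypothetical stand-in for a server that routes requests by HTTP verb.
# It is not evil-proxy or WEBrick; it only mimics "call do_<VERB> if defined".
class TinyServer
  def service(verb, path, body = nil)
    handler = "do_#{verb}"
    return "405 #{verb} not allowed" unless respond_to?(handler)

    public_send(handler, path, body)
  end

  def do_GET(path, _body)
    "200 GET #{path}"
  end
end

server = TinyServer.new
before_patch = server.service('PUT', '/items/1', 'payload')

# Reopen the class, as the spec support file does, to add the missing verbs.
TinyServer.class_eval do
  def do_PUT(path, body)
    # Fall back to an empty body, mirroring `req.body || ''` in the patch.
    "200 PUT #{path} (#{(body || '').bytesize} bytes)"
  end

  def do_DELETE(path, _body)
    "200 DELETE #{path}"
  end
end

# Existing instances pick up the new handlers immediately.
after_patch = server.service('PUT', '/items/1', 'payload')
deleted     = server.service('DELETE', '/items/1')
```

Because the methods are added to the class itself, the patch only needs to be loaded once before the specs run.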
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: proxy_fetcher
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.5.0
 platform: ruby
 authors:
 - Nikita Bulai
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-09-
+date: 2017-09-06 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri
@@ -63,13 +63,18 @@ files:
 - Rakefile
 - bin/proxy_fetcher
 - lib/proxy_fetcher.rb
+- lib/proxy_fetcher/client/client.rb
+- lib/proxy_fetcher/client/proxies_registry.rb
+- lib/proxy_fetcher/client/request.rb
 - lib/proxy_fetcher/configuration.rb
 - lib/proxy_fetcher/configuration/providers_registry.rb
+- lib/proxy_fetcher/exceptions.rb
 - lib/proxy_fetcher/manager.rb
 - lib/proxy_fetcher/providers/base.rb
 - lib/proxy_fetcher/providers/free_proxy_list.rb
 - lib/proxy_fetcher/providers/free_proxy_list_ssl.rb
-- lib/proxy_fetcher/providers/
+- lib/proxy_fetcher/providers/gather_proxy.rb
+- lib/proxy_fetcher/providers/http_tunnel.rb
 - lib/proxy_fetcher/providers/proxy_docker.rb
 - lib/proxy_fetcher/providers/proxy_list.rb
 - lib/proxy_fetcher/providers/xroxy.rb
@@ -79,17 +84,20 @@ files:
 - lib/proxy_fetcher/utils/proxy_validator.rb
 - lib/proxy_fetcher/version.rb
 - proxy_fetcher.gemspec
+- spec/proxy_fetcher/client_spec.rb
 - spec/proxy_fetcher/configuration_spec.rb
 - spec/proxy_fetcher/providers/base_spec.rb
 - spec/proxy_fetcher/providers/free_proxy_list_spec.rb
 - spec/proxy_fetcher/providers/free_proxy_list_ssl_spec.rb
-- spec/proxy_fetcher/providers/
+- spec/proxy_fetcher/providers/gather_proxy_spec.rb
+- spec/proxy_fetcher/providers/http_tunnel_spec.rb
 - spec/proxy_fetcher/providers/multiple_providers_spec.rb
 - spec/proxy_fetcher/providers/proxy_docker_spec.rb
 - spec/proxy_fetcher/providers/proxy_list_spec.rb
 - spec/proxy_fetcher/providers/xroxy_spec.rb
 - spec/proxy_fetcher/proxy_spec.rb
 - spec/spec_helper.rb
+- spec/support/evil_proxy_patch.rb
 - spec/support/manager_examples.rb
 homepage: http://github.com/nbulaj/proxy_fetcher
 licenses:
data/lib/proxy_fetcher/providers/hide_my_name.rb
DELETED
@@ -1,35 +0,0 @@
-module ProxyFetcher
-  module Providers
-    class HideMyName < Base
-      PROVIDER_URL = 'https://hidemy.name/en/proxy-list/'.freeze
-
-      def load_proxy_list(filters = { type: 'hs' })
-        doc = load_document(PROVIDER_URL, filters)
-        doc.xpath('//table[@class="proxy__t"]/tbody/tr')
-      end
-
-      def to_proxy(html_element)
-        ProxyFetcher::Proxy.new.tap do |proxy|
-          proxy.addr = parse_element(html_element, 'td[1]')
-          proxy.port = convert_to_int(parse_element(html_element, 'td[2]'))
-          proxy.anonymity = parse_element(html_element, 'td[6]')
-          proxy.country = parse_country(html_element)
-          proxy.type = parse_element(html_element, 'td[5]')
-          proxy.response_time = parse_response_time(html_element)
-        end
-      end
-
-      private
-
-      def parse_country(element)
-        clear(element.at_xpath('*//span[1]/following-sibling::text()[1]').content)
-      end
-
-      def parse_response_time(element)
-        convert_to_int(element.at_xpath('td[4]').content.strip[/\d+/])
-      end
-    end
-
-    ProxyFetcher::Configuration.register_provider(:hide_my_name, HideMyName)
-  end
-end