proxy_fetcher 0.4.1 → 0.5.0
- checksums.yaml +4 -4
- data/.gitignore +1 -0
- data/.rubocop.yml +3 -0
- data/CHANGELOG.md +16 -0
- data/Gemfile +1 -0
- data/README.md +55 -8
- data/lib/proxy_fetcher.rb +6 -1
- data/lib/proxy_fetcher/client/client.rb +71 -0
- data/lib/proxy_fetcher/client/proxies_registry.rb +32 -0
- data/lib/proxy_fetcher/client/request.rb +88 -0
- data/lib/proxy_fetcher/configuration.rb +7 -5
- data/lib/proxy_fetcher/configuration/providers_registry.rb +3 -6
- data/lib/proxy_fetcher/exceptions.rb +36 -0
- data/lib/proxy_fetcher/providers/gather_proxy.rb +36 -0
- data/lib/proxy_fetcher/providers/http_tunnel.rb +48 -0
- data/lib/proxy_fetcher/providers/proxy_docker.rb +1 -1
- data/lib/proxy_fetcher/proxy.rb +3 -1
- data/lib/proxy_fetcher/utils/http_client.rb +2 -1
- data/lib/proxy_fetcher/utils/proxy_validator.rb +3 -3
- data/lib/proxy_fetcher/version.rb +2 -2
- data/proxy_fetcher.gemspec +1 -1
- data/spec/proxy_fetcher/client_spec.rb +125 -0
- data/spec/proxy_fetcher/configuration_spec.rb +4 -4
- data/spec/proxy_fetcher/providers/gather_proxy_spec.rb +9 -0
- data/spec/proxy_fetcher/providers/http_tunnel_spec.rb +9 -0
- data/spec/spec_helper.rb +3 -1
- data/spec/support/evil_proxy_patch.rb +26 -0
- metadata +12 -4
- data/lib/proxy_fetcher/providers/hide_my_name.rb +0 -35
- data/spec/proxy_fetcher/providers/hide_my_name_spec.rb +0 -10
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 287db7b55e3f0798e263fe7268f8a709e4d8e8c0
|
4
|
+
data.tar.gz: cde6d4dc22e60aa012c02b1f679fcc72b23c6114
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: a54e2d725338bc5c859d415cae5b0397b4aa2a75a0f00917edf4eaa2f845e893c1d73f8e56b86ddfce85c6016fcf473ed4ad2e29fbf78a2c65495df050a320d7
|
7
|
+
data.tar.gz: 61b2fc3dfd20c75045c8b435d0855e0596071facc8bef9e73d68e46996faaad8a9e1ccbb3b9b818f1449138e7d52ee583dffaa813cc5d3696a24ad6ba835475c
|
data/.gitignore
CHANGED
data/.rubocop.yml
CHANGED
data/CHANGELOG.md
CHANGED
@@ -2,6 +2,22 @@
|
|
2
2
|
|
3
3
|
Reverse Chronological Order:
|
4
4
|
|
5
|
+
## `0.5.0` (2017-09-06)
|
6
|
+
|
7
|
+
* Remove HideMyName provider (no longer works)
|
8
|
+
* Fix ProxyDocker provider
|
9
|
+
* Add `ProxyFetcher::Client` to make interacting with proxies easier
|
10
|
+
* Add new providers (Gather Proxy & HTTP Tunnel Genius)
|
11
|
+
* Simplify `connection_timeout` config option to `timeout`
|
12
|
+
* Make User-Agent configurable
|
13
|
+
* Move all the gem exceptions under `ProxyFetcher::Error` base class
|
14
|
+
* Small improvements
|
15
|
+
|
16
|
+
## `0.4.1` (2017-09-04)
|
17
|
+
|
18
|
+
* Use all registered providers by default
|
19
|
+
* Disable HideMyName provider (now it uses CloudFlare)
|
20
|
+
|
5
21
|
## `0.4.0` (2017-08-26)
|
6
22
|
|
7
23
|
* Support operations with multiple providers
|
data/Gemfile
CHANGED
data/README.md
CHANGED
@@ -6,7 +6,7 @@
|
|
6
6
|
[![License](http://img.shields.io/badge/license-MIT-brightgreen.svg)](#license)
|
7
7
|
|
8
8
|
This gem can help your Ruby application to make HTTP(S) requests from proxy by fetching and validating actual
|
9
|
-
proxy lists from multiple providers
|
9
|
+
proxy lists from multiple providers.
|
10
10
|
|
11
11
|
It gives you a `Manager` class that can load proxy lists, validate them and return random or specific proxies. Take a look
|
12
12
|
at the documentation below to find all the gem features.
|
@@ -20,6 +20,7 @@ validating proxy lists from the different providers. [Checkout examples](#standa
|
|
20
20
|
- [Example of usage](#example-of-usage)
|
21
21
|
- [In Ruby application](#in-ruby-application)
|
22
22
|
- [Standalone](#standalone)
|
23
|
+
- [Client](#client)
|
23
24
|
- [Configuration](#configuration)
|
24
25
|
- [Proxy validation speed](#proxy-validation-speed)
|
25
26
|
- [Proxy object](#proxy-object)
|
@@ -32,7 +33,7 @@ validating proxy lists from the different providers. [Checkout examples](#standa
|
|
32
33
|
If using bundler, first add 'proxy_fetcher' to your Gemfile:
|
33
34
|
|
34
35
|
```ruby
|
35
|
-
gem 'proxy_fetcher', '~> 0.
|
36
|
+
gem 'proxy_fetcher', '~> 0.5'
|
36
37
|
```
|
37
38
|
|
38
39
|
or if you want to use the latest version (from `master` branch), then:
|
@@ -50,7 +51,7 @@ bundle install
|
|
50
51
|
Otherwise simply install the gem:
|
51
52
|
|
52
53
|
```sh
|
53
|
-
gem install proxy_fetcher -v '0.
|
54
|
+
gem install proxy_fetcher -v '0.5'
|
54
55
|
```
|
55
56
|
|
56
57
|
## Example of usage
|
@@ -123,7 +124,7 @@ If you need to filter proxy list, for example, by country or response time and s
|
|
123
124
|
then you can just pass your filters like a simple Ruby hash to the Manager instance:
|
124
125
|
|
125
126
|
```ruby
|
126
|
-
ProxyFetcher.config.providers = :
|
127
|
+
ProxyFetcher.config.providers = :proxy_docker
|
127
128
|
|
128
129
|
manager = ProxyFetcher::Manager.new(filters: { country: 'PL', maxtime: '500' })
|
129
130
|
manager.proxies
|
@@ -134,7 +135,7 @@ manager.proxies
|
|
134
135
|
If you are using multiple providers, then you can split your filters by proxy provider names:
|
135
136
|
|
136
137
|
```ruby
|
137
|
-
ProxyFetcher.config.providers = [:
|
138
|
+
ProxyFetcher.config.providers = [:proxy_docker, :xroxy]
|
138
139
|
|
139
140
|
manager = ProxyFetcher::Manager.new(filters: {
|
140
141
|
hide_my_name: {
|
@@ -194,19 +195,64 @@ To get all the possible options run:
|
|
194
195
|
proxy_fetcher --help
|
195
196
|
```
|
196
197
|
|
198
|
+
## Client
|
199
|
+
|
200
|
+
The ProxyFetcher gem provides a ready-to-use HTTP client that makes requesting through proxies easy. It does all the work
|
201
|
+
with the proxy lists for you (load, validate, refresh, find proxy by type, follow redirects, etc). All you need to do is
|
202
|
+
make HTTP(S) requests:
|
203
|
+
|
204
|
+
```ruby
|
205
|
+
require 'proxy_fetcher'
|
206
|
+
|
207
|
+
ProxyFetcher::Client.get 'https://example.com/resource'
|
208
|
+
|
209
|
+
ProxyFetcher::Client.post 'https://example.com/resource', { param: 'value' }
|
210
|
+
|
211
|
+
ProxyFetcher::Client.post 'https://example.com/resource', 'Any data'
|
212
|
+
|
213
|
+
ProxyFetcher::Client.post 'https://example.com/resource', { param: 'value' }.to_json, headers: { 'Content-Type': 'application/json' }
|
214
|
+
|
215
|
+
ProxyFetcher::Client.put 'https://example.com/resource', { param: 'value' }
|
216
|
+
|
217
|
+
ProxyFetcher::Client.patch 'https://example.com/resource', { param: 'value' }
|
218
|
+
|
219
|
+
ProxyFetcher::Client.delete 'https://example.com/resource'
|
220
|
+
```
|
221
|
+
|
222
|
+
By default, `ProxyFetcher::Client` retries an HTTP request up to 1000 times if the proxy is out of order or the
|
223
|
+
remote server returns an error. You can increase or decrease this number for your case or set it to `nil` if you want to
|
224
|
+
retry indefinitely (or until your Ruby process dies :skull:):
|
225
|
+
|
226
|
+
```ruby
|
227
|
+
require 'proxy_fetcher'
|
228
|
+
|
229
|
+
ProxyFetcher::Client.get 'https://example.com/resource', options: { max_retries: 10_000 }
|
230
|
+
```
|
231
|
+
|
232
|
+
Btw, if you need support of JavaScript or some other features, you need to implement your own client using, for example,
|
233
|
+
`selenium-webdriver`.
|
234
|
+
|
197
235
|
## Configuration
|
198
236
|
|
199
237
|
To change open/read timeout for `cleanup!` and `connectable?` methods you need to change `ProxyFetcher.config`:
|
200
238
|
|
201
239
|
```ruby
|
202
240
|
ProxyFetcher.configure do |config|
|
203
|
-
config.
|
241
|
+
config.timeout = 1 # default is 3
|
204
242
|
end
|
205
243
|
|
206
244
|
manager = ProxyFetcher::Manager.new
|
207
245
|
manager.cleanup!
|
208
246
|
```
|
209
247
|
|
248
|
+
You can also set a custom User-Agent:
|
249
|
+
|
250
|
+
```ruby
|
251
|
+
ProxyFetcher.configure do |config|
|
252
|
+
config.user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
|
253
|
+
end
|
254
|
+
```
|
255
|
+
|
210
256
|
ProxyFetcher uses simple Ruby solution for dealing with HTTP(S) requests - `net/http` library from the stdlib. If you wanna add, for example, your custom provider that
|
211
257
|
was developed as a Single Page Application (SPA) with some JavaScript, then you will need something like [selenium-webdriver](https://github.com/SeleniumHQ/selenium/tree/master/rb)
|
212
258
|
to properly load the content of the website. For those and other cases you can write your own class for fetching HTML content by the URL and setup it
|
@@ -269,7 +315,7 @@ ProxyFetcher.config.pool_size = 50
|
|
269
315
|
You can experiment with the thread pool size to find the optimal maximum thread count for your PC and OS.
|
270
316
|
This will definitely give you some performance improvements.
|
271
317
|
|
272
|
-
Moreover, the common proxy validation speed depends on `ProxyFetcher.config.
|
318
|
+
Moreover, the common proxy validation speed depends on `ProxyFetcher.config.timeout` option that is equal
|
273
319
|
to `3` by default. It means that gem will wait 3 seconds for the server answer to check if particular proxy is connectable.
|
274
320
|
You can decrease this option to `1`, for example, and it will heavily increase proxy validation speed (**but remember**
|
275
321
|
that some proxies could be connectable, but slow, so with this option you will clear proxy list from the proxies that
|
@@ -300,10 +346,11 @@ Also you can call next instance methods for every Proxy object:
|
|
300
346
|
|
301
347
|
Currently ProxyFetcher can deal with next proxy providers (services):
|
302
348
|
|
303
|
-
* Hide My Name (**currently does not work**)
|
304
349
|
* Free Proxy List
|
305
350
|
* Free SSL Proxies
|
306
351
|
* Proxy Docker
|
352
|
+
* Gather Proxy
|
353
|
+
* HTTP Tunnel Genius
|
307
354
|
* Proxy List
|
308
355
|
* XRoxy
|
309
356
|
|
data/lib/proxy_fetcher.rb
CHANGED
@@ -3,6 +3,7 @@ require 'net/https'
|
|
3
3
|
require 'nokogiri'
|
4
4
|
require 'thread'
|
5
5
|
|
6
|
+
require File.dirname(__FILE__) + '/proxy_fetcher/exceptions'
|
6
7
|
require File.dirname(__FILE__) + '/proxy_fetcher/configuration'
|
7
8
|
require File.dirname(__FILE__) + '/proxy_fetcher/configuration/providers_registry'
|
8
9
|
require File.dirname(__FILE__) + '/proxy_fetcher/proxy'
|
@@ -11,13 +12,17 @@ require File.dirname(__FILE__) + '/proxy_fetcher/manager'
|
|
11
12
|
require File.dirname(__FILE__) + '/proxy_fetcher/utils/http_client'
|
12
13
|
require File.dirname(__FILE__) + '/proxy_fetcher/utils/html'
|
13
14
|
require File.dirname(__FILE__) + '/proxy_fetcher/utils/proxy_validator'
|
15
|
+
require File.dirname(__FILE__) + '/proxy_fetcher/client/client'
|
16
|
+
require File.dirname(__FILE__) + '/proxy_fetcher/client/request'
|
17
|
+
require File.dirname(__FILE__) + '/proxy_fetcher/client/proxies_registry'
|
14
18
|
|
15
19
|
module ProxyFetcher
|
16
20
|
module Providers
|
17
21
|
require File.dirname(__FILE__) + '/proxy_fetcher/providers/base'
|
18
22
|
require File.dirname(__FILE__) + '/proxy_fetcher/providers/free_proxy_list'
|
19
23
|
require File.dirname(__FILE__) + '/proxy_fetcher/providers/free_proxy_list_ssl'
|
20
|
-
require File.dirname(__FILE__) + '/proxy_fetcher/providers/
|
24
|
+
require File.dirname(__FILE__) + '/proxy_fetcher/providers/gather_proxy'
|
25
|
+
require File.dirname(__FILE__) + '/proxy_fetcher/providers/http_tunnel'
|
21
26
|
require File.dirname(__FILE__) + '/proxy_fetcher/providers/proxy_docker'
|
22
27
|
require File.dirname(__FILE__) + '/proxy_fetcher/providers/proxy_list'
|
23
28
|
require File.dirname(__FILE__) + '/proxy_fetcher/providers/xroxy'
|
@@ -0,0 +1,71 @@
|
|
1
|
+
module ProxyFetcher
|
2
|
+
module Client
|
3
|
+
class << self
|
4
|
+
def get(url, headers: {}, options: {})
|
5
|
+
request_without_payload(:get, url, headers, options)
|
6
|
+
end
|
7
|
+
|
8
|
+
def head(url, headers: {}, options: {})
|
9
|
+
request_without_payload(:head, url, headers, options)
|
10
|
+
end
|
11
|
+
|
12
|
+
def post(url, payload, headers: {}, options: {})
|
13
|
+
request_with_payload(:post, url, payload, headers, options)
|
14
|
+
end
|
15
|
+
|
16
|
+
def delete(url, headers: {}, options: {})
|
17
|
+
request_without_payload(:delete, url, headers, options)
|
18
|
+
end
|
19
|
+
|
20
|
+
def put(url, payload, headers: {}, options: {})
|
21
|
+
request_with_payload(:put, url, payload, headers, options)
|
22
|
+
end
|
23
|
+
|
24
|
+
def patch(url, payload, headers: {}, options: {})
|
25
|
+
request_with_payload(:patch, url, payload, headers, options)
|
26
|
+
end
|
27
|
+
|
28
|
+
private
|
29
|
+
|
30
|
+
def request_with_payload(method, url, payload, headers, options)
|
31
|
+
safe_request_to(url, options.fetch(:max_retries, 1000)) do |proxy|
|
32
|
+
opts = options.merge(url: url, payload: payload, proxy: proxy, headers: default_headers.merge(headers))
|
33
|
+
|
34
|
+
Request.execute(method: method, **opts)
|
35
|
+
end
|
36
|
+
end
|
37
|
+
|
38
|
+
def request_without_payload(method, url, headers, options)
|
39
|
+
safe_request_to(url, options.fetch(:max_retries, 1000)) do |proxy|
|
40
|
+
opts = options.merge(url: url, proxy: proxy, headers: default_headers.merge(headers))
|
41
|
+
|
42
|
+
Request.execute(method: method, **opts)
|
43
|
+
end
|
44
|
+
end
|
45
|
+
|
46
|
+
def default_headers
|
47
|
+
{
|
48
|
+
'User-Agent' => ProxyFetcher.config.user_agent
|
49
|
+
}
|
50
|
+
end
|
51
|
+
|
52
|
+
def safe_request_to(url, max_retries = 1000)
|
53
|
+
tries = 0
|
54
|
+
|
55
|
+
begin
|
56
|
+
proxy = ProxiesRegistry.find_proxy_for(url)
|
57
|
+
yield(proxy)
|
58
|
+
rescue ProxyFetcher::Error
|
59
|
+
raise
|
60
|
+
rescue StandardError
|
61
|
+
raise ProxyFetcher::Exceptions::MaximumRetriesReached if max_retries && tries >= max_retries
|
62
|
+
|
63
|
+
ProxiesRegistry.invalidate_proxy!(proxy)
|
64
|
+
tries += 1
|
65
|
+
|
66
|
+
retry
|
67
|
+
end
|
68
|
+
end
|
69
|
+
end
|
70
|
+
end
|
71
|
+
end
|
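Note on `safe_request_to`: the retry loop in the hunk above relies on Ruby's `begin`/`rescue`/`retry`. The following is a minimal, self-contained sketch of that pattern, not the gem's code. `SketchError`, `RetriesExhausted`, and `with_retries` are hypothetical names introduced only for illustration.

```ruby
# Hypothetical error classes, standing in for a library's own hierarchy.
class SketchError < StandardError; end
class RetriesExhausted < SketchError; end

# Calls the block until it succeeds. Any StandardError outside the
# SketchError hierarchy counts as a failed attempt; once `max_retries`
# retries have been used up, RetriesExhausted is raised.
# Passing nil for max_retries retries without limit.
def with_retries(max_retries = 3)
  tries = 0

  begin
    yield(tries)
  rescue SketchError
    raise # library-level errors are passed through, never retried
  rescue StandardError
    raise RetriesExhausted, 'reached the maximum number of retries' if max_retries && tries >= max_retries

    tries += 1
    retry # re-runs the begin block from the top
  end
end

attempts = []
result = with_retries(5) do |try|
  attempts << try
  raise IOError, 'flaky proxy' if try < 2

  'response body'
end
```

With this shape the block runs at most `max_retries + 1` times, because the limit is checked only after a failure; the loop in the diff appears to behave the same way.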
@@ -0,0 +1,32 @@
|
|
1
|
+
module ProxyFetcher
|
2
|
+
module Client
|
3
|
+
class ProxiesRegistry
|
4
|
+
class << self
|
5
|
+
def invalidate_proxy!(proxy)
|
6
|
+
manager.proxies.delete(proxy)
|
7
|
+
manager.refresh_list! if manager.proxies.empty?
|
8
|
+
end
|
9
|
+
|
10
|
+
def find_proxy_for(url)
|
11
|
+
proxy = if URI.parse(url).is_a?(URI::HTTPS)
|
12
|
+
manager.proxies.detect(&:ssl?)
|
13
|
+
else
|
14
|
+
manager.get
|
15
|
+
end
|
16
|
+
|
17
|
+
return proxy unless proxy.nil?
|
18
|
+
|
19
|
+
manager.refresh_list!
|
20
|
+
find_proxy_for(url)
|
21
|
+
end
|
22
|
+
|
23
|
+
def manager
|
24
|
+
manager = Thread.current[:proxy_fetcher_manager]
|
25
|
+
return manager unless manager.nil?
|
26
|
+
|
27
|
+
Thread.current[:proxy_fetcher_manager] = ProxyFetcher::Manager.new
|
28
|
+
end
|
29
|
+
end
|
30
|
+
end
|
31
|
+
end
|
32
|
+
end
|
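Note on `find_proxy_for`: the registry above picks a proxy based on whether the target URL parses as `URI::HTTPS`. A small sketch of that check using only the standard library follows. `SketchProxy` and `pick_proxy_for` are hypothetical, and taking the first proxy for plain HTTP is an assumption made here for determinism (the diff does not show what `manager.get` does).

```ruby
require 'uri'

# Hypothetical proxy record; `ssl` marks proxies that can carry HTTPS.
SketchProxy = Struct.new(:addr, :port, :ssl) do
  def ssl?
    ssl
  end
end

# Returns the first SSL-capable proxy for https:// URLs and the first
# proxy of any kind otherwise; nil when nothing matches.
def pick_proxy_for(url, proxies)
  if URI.parse(url).is_a?(URI::HTTPS)
    proxies.detect(&:ssl?)
  else
    proxies.first
  end
end

proxies = [
  SketchProxy.new('10.0.0.1', 8080, false),
  SketchProxy.new('10.0.0.2', 3128, true)
]

https_proxy = pick_proxy_for('https://example.com/resource', proxies)
http_proxy  = pick_proxy_for('http://example.com/resource', proxies)
```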
@@ -0,0 +1,88 @@
|
|
1
|
+
module ProxyFetcher
|
2
|
+
module Client
|
3
|
+
class Request
|
4
|
+
URL_ENCODED = {
|
5
|
+
'Content-Type' => 'application/x-www-form-urlencoded'
|
6
|
+
}.freeze
|
7
|
+
|
8
|
+
DEFAULT_SSL_OPTIONS = {
|
9
|
+
verify_mode: OpenSSL::SSL::VERIFY_NONE
|
10
|
+
}.freeze
|
11
|
+
|
12
|
+
attr_reader :http, :method, :uri, :headers, :timeout,
|
13
|
+
:payload, :proxy, :max_redirects, :ssl_options
|
14
|
+
|
15
|
+
def self.execute(args)
|
16
|
+
new(args).execute
|
17
|
+
end
|
18
|
+
|
19
|
+
def initialize(args)
|
20
|
+
raise ArgumentError, 'args must be a Hash!' unless args.is_a?(Hash)
|
21
|
+
|
22
|
+
@uri = URI.parse(args.fetch(:url))
|
23
|
+
@method = args.fetch(:method).to_s.capitalize
|
24
|
+
@headers = (args[:headers] || {}).dup
|
25
|
+
@payload = preprocess_payload(args[:payload])
|
26
|
+
@timeout = args.fetch(:timeout, ProxyFetcher.config.timeout)
|
27
|
+
@ssl_options = args.fetch(:ssl_options, DEFAULT_SSL_OPTIONS)
|
28
|
+
|
29
|
+
@proxy = args.fetch(:proxy)
|
30
|
+
@max_redirects = args.fetch(:max_redirects, 10)
|
31
|
+
|
32
|
+
build_http_client
|
33
|
+
end
|
34
|
+
|
35
|
+
def execute
|
36
|
+
request = request_class_for(method).new(uri, headers)
|
37
|
+
|
38
|
+
http.start do |connection|
|
39
|
+
process_response!(connection.request(request, payload))
|
40
|
+
end
|
41
|
+
end
|
42
|
+
|
43
|
+
private
|
44
|
+
|
45
|
+
def preprocess_payload(payload)
|
46
|
+
return if payload.nil?
|
47
|
+
|
48
|
+
if payload.is_a?(Hash)
|
49
|
+
headers.merge(URL_ENCODED)
|
50
|
+
URI.encode_www_form(payload)
|
51
|
+
else
|
52
|
+
payload
|
53
|
+
end
|
54
|
+
end
|
55
|
+
|
56
|
+
def build_http_client
|
57
|
+
@http = Net::HTTP.new(uri.host, uri.port, proxy.addr, proxy.port)
|
58
|
+
|
59
|
+
@http.use_ssl = uri.is_a?(URI::HTTPS)
|
60
|
+
@http.verify_mode = ssl_options.fetch(:verify_mode)
|
61
|
+
@http.open_timeout = timeout
|
62
|
+
@http.read_timeout = timeout
|
63
|
+
end
|
64
|
+
|
65
|
+
def process_response!(http_response)
|
66
|
+
case http_response
|
67
|
+
when Net::HTTPSuccess then http_response.read_body
|
68
|
+
when Net::HTTPRedirection then follow_redirection(http_response)
|
69
|
+
else
|
70
|
+
http_response.error!
|
71
|
+
end
|
72
|
+
end
|
73
|
+
|
74
|
+
def follow_redirection(http_response)
|
75
|
+
raise ProxyFetcher::Exceptions::MaximumRedirectsReached if max_redirects <= 0
|
76
|
+
|
77
|
+
url = http_response.fetch('location')
|
78
|
+
url = uri.merge(url).to_s unless url.downcase.start_with?('http')
|
79
|
+
|
80
|
+
Request.execute(method: :get, url: url, proxy: proxy, headers: headers, timeout: timeout, max_redirects: max_redirects - 1)
|
81
|
+
end
|
82
|
+
|
83
|
+
def request_class_for(method)
|
84
|
+
Net::HTTP.const_get(method, false)
|
85
|
+
end
|
86
|
+
end
|
87
|
+
end
|
88
|
+
end
|
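Note on `preprocess_payload`: the hunk above calls `headers.merge(URL_ENCODED)` and does not use the return value. In Ruby, `Hash#merge` returns a new hash and leaves the receiver unchanged, so that call on its own does not add the `Content-Type` entry to `headers`. The sketch below illustrates the difference from `Hash#merge!`; the `headers` value is a made-up example.

```ruby
# Stand-in for the constant defined in the Request class above.
URL_ENCODED = { 'Content-Type' => 'application/x-www-form-urlencoded' }.freeze

headers = { 'User-Agent' => 'example-agent' }

# Hash#merge builds a new hash; the receiver is left as it was.
merged = headers.merge(URL_ENCODED)
after_merge = headers.dup

# Hash#merge! (alias: update) modifies the receiver in place.
headers.merge!(URL_ENCODED)
```

If the intent is to let caller-supplied headers win, `URL_ENCODED.merge(headers)` assigned back to the instance variable would be one option; which behavior the author intended is not stated in the diff.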
@@ -1,10 +1,11 @@
|
|
1
1
|
module ProxyFetcher
|
2
2
|
class Configuration
|
3
|
-
|
4
|
-
|
5
|
-
attr_accessor :providers, :connection_timeout, :pool_size
|
3
|
+
attr_accessor :providers, :timeout, :pool_size, :user_agent
|
6
4
|
attr_accessor :http_client, :proxy_validator
|
7
5
|
|
6
|
+
# rubocop:disable Metrics/LineLength
|
7
|
+
DEFAULT_USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112 Safari/537.36'.freeze
|
8
|
+
|
8
9
|
class << self
|
9
10
|
def providers_registry
|
10
11
|
@registry ||= ProvidersRegistry.new
|
@@ -25,8 +26,9 @@ module ProxyFetcher
|
|
25
26
|
|
26
27
|
# Sets default configuration options
|
27
28
|
def reset!
|
29
|
+
@user_agent = DEFAULT_USER_AGENT
|
28
30
|
@pool_size = 10
|
29
|
-
@
|
31
|
+
@timeout = 3
|
30
32
|
@http_client = HTTPClient
|
31
33
|
@proxy_validator = ProxyValidator
|
32
34
|
|
@@ -53,7 +55,7 @@ module ProxyFetcher
|
|
53
55
|
# Checks if custom class has some required class methods
|
54
56
|
def setup_custom_class(klass, required_methods: [])
|
55
57
|
unless klass.respond_to?(*required_methods)
|
56
|
-
raise WrongCustomClass
|
58
|
+
raise ProxyFetcher::Exceptions::WrongCustomClass.new(klass, required_methods)
|
57
59
|
end
|
58
60
|
|
59
61
|
klass
|
@@ -1,17 +1,14 @@
|
|
1
1
|
module ProxyFetcher
|
2
2
|
class ProvidersRegistry
|
3
|
-
UnknownProvider = Class.new(StandardError)
|
4
|
-
RegisteredProvider = Class.new(StandardError)
|
5
|
-
|
6
3
|
def providers
|
7
4
|
@providers ||= {}
|
8
5
|
end
|
9
6
|
|
10
7
|
# Add custom provider to common registry.
|
11
|
-
# Requires proxy provider name ('
|
8
|
+
# Requires proxy provider name ('proxy_docker' for example) and a class
|
12
9
|
# that implements the parsing logic.
|
13
10
|
def register(name, klass)
|
14
|
-
raise RegisteredProvider,
|
11
|
+
raise ProxyFetcher::Exceptions::RegisteredProvider, name if providers.key?(name.to_sym)
|
15
12
|
|
16
13
|
providers[name.to_sym] = klass
|
17
14
|
end
|
@@ -23,7 +20,7 @@ module ProxyFetcher
|
|
23
20
|
|
24
21
|
providers.fetch(provider_name)
|
25
22
|
rescue KeyError
|
26
|
-
raise UnknownProvider,
|
23
|
+
raise ProxyFetcher::Exceptions::UnknownProvider, provider_name
|
27
24
|
end
|
28
25
|
end
|
29
26
|
end
|
@@ -0,0 +1,36 @@
|
|
1
|
+
module ProxyFetcher
|
2
|
+
Error = Class.new(StandardError)
|
3
|
+
|
4
|
+
module Exceptions
|
5
|
+
class WrongCustomClass < Error
|
6
|
+
def initialize(klass, methods)
|
7
|
+
required_methods = Array(methods).join(', ')
|
8
|
+
super("#{klass} must respond to [#{required_methods}] class methods!")
|
9
|
+
end
|
10
|
+
end
|
11
|
+
|
12
|
+
class UnknownProvider < Error
|
13
|
+
def initialize(provider_name)
|
14
|
+
super("unregistered proxy provider `#{provider_name}`")
|
15
|
+
end
|
16
|
+
end
|
17
|
+
|
18
|
+
class RegisteredProvider < Error
|
19
|
+
def initialize(name)
|
20
|
+
super("`#{name}` provider already registered!")
|
21
|
+
end
|
22
|
+
end
|
23
|
+
|
24
|
+
class MaximumRedirectsReached < Error
|
25
|
+
def initialize(*)
|
26
|
+
super('maximum redirects reached')
|
27
|
+
end
|
28
|
+
end
|
29
|
+
|
30
|
+
class MaximumRetriesReached < Error
|
31
|
+
def initialize(*)
|
32
|
+
super('reached the maximum number of retries')
|
33
|
+
end
|
34
|
+
end
|
35
|
+
end
|
36
|
+
end
|
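Note on the exception hierarchy: because every exception above inherits from one `Error` base class, callers can rescue the base class to catch any of them. A self-contained sketch of the same layout is shown below; the `SketchFetcher` module name is hypothetical and only mirrors the structure in the diff.

```ruby
module SketchFetcher
  # Single base class for every error the library raises.
  Error = Class.new(StandardError)

  module Exceptions
    class UnknownProvider < Error
      def initialize(provider_name)
        super("unregistered proxy provider `#{provider_name}`")
      end
    end
  end
end

# `raise Klass, arg` passes arg to the constructor, which builds the message.
message = begin
  raise SketchFetcher::Exceptions::UnknownProvider, :not_existing_provider
rescue SketchFetcher::Error => e
  e.message
end
```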
@@ -0,0 +1,36 @@
|
|
1
|
+
require 'json'
|
2
|
+
|
3
|
+
module ProxyFetcher
|
4
|
+
module Providers
|
5
|
+
class GatherProxy < Base
|
6
|
+
PROVIDER_URL = 'http://www.gatherproxy.com/'.freeze
|
7
|
+
|
8
|
+
def load_proxy_list(*)
|
9
|
+
doc = load_document(PROVIDER_URL)
|
10
|
+
doc.xpath('//div[@class="proxy-list"]/table/script')
|
11
|
+
end
|
12
|
+
|
13
|
+
def to_proxy(html_element)
|
14
|
+
json = parse_json(html_element)
|
15
|
+
|
16
|
+
ProxyFetcher::Proxy.new.tap do |proxy|
|
17
|
+
proxy.addr = json['PROXY_IP']
|
18
|
+
proxy.port = json['PROXY_PORT'].to_i(16)
|
19
|
+
proxy.anonymity = json['PROXY_TYPE']
|
20
|
+
proxy.country = json['PROXY_COUNTRY']
|
21
|
+
proxy.response_time = json['PROXY_TIME'].to_i
|
22
|
+
proxy.type = ProxyFetcher::Proxy::HTTP
|
23
|
+
end
|
24
|
+
end
|
25
|
+
|
26
|
+
private
|
27
|
+
|
28
|
+
def parse_json(element)
|
29
|
+
javascript = clear(element.content)[/{.+}/im]
|
30
|
+
JSON.parse(javascript)
|
31
|
+
end
|
32
|
+
end
|
33
|
+
|
34
|
+
ProxyFetcher::Configuration.register_provider(:gather_proxy, GatherProxy)
|
35
|
+
end
|
36
|
+
end
|
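Note on `GatherProxy#to_proxy`: the provider above extracts a JSON object from a `<script>` element and reads the port with `to_i(16)`, which implies the port arrives as a hexadecimal string. A minimal sketch of both steps follows. The script content and the `insertProxy` function name are made up for illustration, since the diff does not show the real page markup.

```ruby
require 'json'

# Made-up script body; only the JSON keys are taken from the diff.
script = <<~JS
  insertProxy({"PROXY_IP":"203.0.113.7","PROXY_PORT":"1F90","PROXY_TIME":"42"});
JS

# Take everything from the first "{" to the last "}" and parse it as JSON.
json = JSON.parse(script[/\{.+\}/m])

addr = json['PROXY_IP']
port = json['PROXY_PORT'].to_i(16) # hexadecimal string to Integer
response_time = json['PROXY_TIME'].to_i
```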
@@ -0,0 +1,48 @@
|
|
1
|
+
module ProxyFetcher
|
2
|
+
module Providers
|
3
|
+
class HTTPTunnel < Base
|
4
|
+
PROVIDER_URL = 'http://www.httptunnel.ge/ProxyListForFree.aspx'.freeze
|
5
|
+
|
6
|
+
def load_proxy_list(*)
|
7
|
+
doc = load_document(PROVIDER_URL)
|
8
|
+
doc.xpath('//table[contains(@id, "GridView")]/tr[(count(td)>2)]')
|
9
|
+
end
|
10
|
+
|
11
|
+
def to_proxy(html_element)
|
12
|
+
ProxyFetcher::Proxy.new.tap do |proxy|
|
13
|
+
uri = parse_proxy_uri(html_element)
|
14
|
+
proxy.addr = uri.host
|
15
|
+
proxy.port = uri.port
|
16
|
+
|
17
|
+
proxy.country = parse_country(html_element)
|
18
|
+
proxy.anonymity = parse_anonymity(html_element)
|
19
|
+
proxy.type = ProxyFetcher::Proxy::HTTP
|
20
|
+
end
|
21
|
+
end
|
22
|
+
|
23
|
+
private
|
24
|
+
|
25
|
+
def parse_proxy_uri(element)
|
26
|
+
full_addr = parse_element(element, 'td[1]')
|
27
|
+
URI.parse("http://#{full_addr}")
|
28
|
+
end
|
29
|
+
|
30
|
+
def parse_country(element)
|
31
|
+
element.at('img').attr('title')
|
32
|
+
end
|
33
|
+
|
34
|
+
def parse_anonymity(element)
|
35
|
+
transparency = parse_element(element, 'td[5]').to_sym
|
36
|
+
|
37
|
+
{
|
38
|
+
A: 'Anonimous',
|
39
|
+
E: 'Elite',
|
40
|
+
T: 'Transparent',
|
41
|
+
U: 'Unknown'
|
42
|
+
}.fetch(transparency, 'Unknown')
|
43
|
+
end
|
44
|
+
end
|
45
|
+
|
46
|
+
ProxyFetcher::Configuration.register_provider(:http_tunnel, HTTPTunnel)
|
47
|
+
end
|
48
|
+
end
|
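Note on `HTTPTunnel`: the provider above splits a combined `host:port` cell by prefixing it with `http://` and letting `URI.parse` do the work, then maps a one-letter code to an anonymity label with a default. A standalone sketch of both helpers is shown below. `split_address`, `anonymity_for`, and the label strings are hypothetical names chosen for this example.

```ruby
require 'uri'

# Hypothetical label table keyed by the one-letter code.
ANONYMITY_LEVELS = {
  A: 'Anonymous',
  E: 'Elite',
  T: 'Transparent',
  U: 'Unknown'
}.freeze

# Splits "host:port" into [host, port] by parsing it as an HTTP URL.
def split_address(full_addr)
  uri = URI.parse("http://#{full_addr}")
  [uri.host, uri.port]
end

# Looks up the label for a code, falling back to 'Unknown'.
def anonymity_for(code)
  ANONYMITY_LEVELS.fetch(code.to_sym, 'Unknown')
end

host, port = split_address('203.0.113.7:3128')
label = anonymity_for('E')
```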
data/lib/proxy_fetcher/proxy.rb
CHANGED
@@ -14,12 +14,13 @@ module ProxyFetcher
|
|
14
14
|
def fetch
|
15
15
|
request = Net::HTTP::Get.new(@uri.to_s)
|
16
16
|
request['Connection'] = 'keep-alive'
|
17
|
+
request['User-Agent'] = ProxyFetcher.config.user_agent
|
17
18
|
response = @http.request(request)
|
18
19
|
response.body
|
19
20
|
end
|
20
21
|
|
21
22
|
def https?
|
22
|
-
@uri.
|
23
|
+
@uri.is_a?(URI::HTTPS)
|
23
24
|
end
|
24
25
|
|
25
26
|
class << self
|
@@ -6,15 +6,15 @@ module ProxyFetcher
|
|
6
6
|
uri = URI.parse(URL_TO_CHECK)
|
7
7
|
@http = Net::HTTP.new(uri.host, uri.port, proxy_addr, proxy_port.to_i)
|
8
8
|
|
9
|
-
return unless uri.
|
9
|
+
return unless uri.is_a?(URI::HTTPS)
|
10
10
|
|
11
11
|
@http.use_ssl = true
|
12
12
|
@http.verify_mode = OpenSSL::SSL::VERIFY_NONE
|
13
13
|
end
|
14
14
|
|
15
15
|
def connectable?
|
16
|
-
@http.open_timeout = ProxyFetcher.config.
|
17
|
-
@http.read_timeout = ProxyFetcher.config.
|
16
|
+
@http.open_timeout = ProxyFetcher.config.timeout
|
17
|
+
@http.read_timeout = ProxyFetcher.config.timeout
|
18
18
|
|
19
19
|
@http.start { |connection| return true if connection.request_head('/') }
|
20
20
|
|
data/proxy_fetcher.gemspec
CHANGED
@@ -5,7 +5,7 @@ require 'proxy_fetcher/version'
|
|
5
5
|
Gem::Specification.new do |gem|
|
6
6
|
gem.name = 'proxy_fetcher'
|
7
7
|
gem.version = ProxyFetcher.gem_version
|
8
|
-
gem.date = '2017-09-
|
8
|
+
gem.date = '2017-09-06'
|
9
9
|
gem.summary = 'Ruby gem for dealing with proxy lists from different providers'
|
10
10
|
gem.description = 'This gem can help your Ruby application to make HTTP(S) requests ' \
|
11
11
|
'from proxy server by fetching and validating proxy lists from the different providers.'
|
@@ -0,0 +1,125 @@
|
|
1
|
+
require 'spec_helper'
|
2
|
+
require 'json'
|
3
|
+
|
4
|
+
require 'evil-proxy'
|
5
|
+
require 'evil-proxy/async'
|
6
|
+
|
7
|
+
describe ProxyFetcher::Client do
|
8
|
+
before :all do
|
9
|
+
ProxyFetcher.configure do |config|
|
10
|
+
config.provider = :xroxy
|
11
|
+
config.timeout = 5
|
12
|
+
end
|
13
|
+
|
14
|
+
@server = EvilProxy::MITMProxyServer.new Port: 3128, Quiet: true
|
15
|
+
@server.start
|
16
|
+
end
|
17
|
+
|
18
|
+
after :all do
|
19
|
+
@server.shutdown
|
20
|
+
end
|
21
|
+
|
22
|
+
# Use local proxy server in order to avoid side effects, non-working proxies, etc
|
23
|
+
before :each do
|
24
|
+
proxy = ProxyFetcher::Proxy.new(addr: '127.0.0.1', port: 3128, type: 'HTTP, HTTPS')
|
25
|
+
ProxyFetcher::Client::ProxiesRegistry.manager.instance_variable_set(:'@proxies', [proxy])
|
26
|
+
allow_any_instance_of(ProxyFetcher::Providers::Base).to receive(:fetch_proxies!).and_return([proxy])
|
27
|
+
end
|
28
|
+
|
29
|
+
context 'GET request with the valid proxy' do
|
30
|
+
it 'successfully returns page content for HTTP' do
|
31
|
+
content = ProxyFetcher::Client.get('http://httpbin.org')
|
32
|
+
|
33
|
+
expect(content).not_to be_nil
|
34
|
+
expect(content).not_to be_empty
|
35
|
+
end
|
36
|
+
|
37
|
+
it 'successfully returns page content for HTTPS' do
|
38
|
+
content = ProxyFetcher::Client.get('https://httpbin.org')
|
39
|
+
|
40
|
+
expect(content).not_to be_nil
|
41
|
+
expect(content).not_to be_empty
|
42
|
+
end
|
43
|
+
end
|
44
|
+
|
45
|
+
context 'POST request with the valid proxy' do
|
46
|
+
it 'successfully returns page content for HTTP' do
|
47
|
+
headers = {
|
48
|
+
'X-Proxy-Fetcher-Version' => ProxyFetcher::VERSION::STRING
|
49
|
+
}
|
50
|
+
content = ProxyFetcher::Client.post('http://httpbin.org/post', { param: 'value'} , headers: headers)
|
51
|
+
|
52
|
+
expect(content).not_to be_nil
|
53
|
+
expect(content).not_to be_empty
|
54
|
+
|
55
|
+
json = JSON.parse(content)
|
56
|
+
|
57
|
+
expect(json['headers']['X-Proxy-Fetcher-Version']).to eq(ProxyFetcher::VERSION::STRING)
|
58
|
+
expect(json['headers']['User-Agent']).to eq(ProxyFetcher.config.user_agent)
|
59
|
+
end
|
60
|
+
end
|
61
|
+
|
62
|
+
context 'PUT request with the valid proxy' do
|
63
|
+
it 'successfully returns page content for HTTP' do
|
64
|
+
content = ProxyFetcher::Client.put('http://httpbin.org/put', 'param=PutValue')
|
65
|
+
|
66
|
+
expect(content).not_to be_nil
|
67
|
+
expect(content).not_to be_empty
|
68
|
+
|
69
|
+
json = JSON.parse(content)
|
70
|
+
|
71
|
+
expect(json['form']['param']).to eq('PutValue')
|
72
|
+
end
|
73
|
+
end
|
74
|
+
|
75
|
+
context 'PATCH request with the valid proxy' do
|
76
|
+
it 'successfully returns page content for HTTP' do
|
77
|
+
content = ProxyFetcher::Client.patch('http://httpbin.org/patch', param: 'value')
|
78
|
+
|
79
|
+
expect(content).not_to be_nil
|
80
|
+
expect(content).not_to be_empty
|
81
|
+
|
82
|
+
json = JSON.parse(content)
|
83
|
+
|
84
|
+
expect(json['form']['param']).to eq('value')
|
85
|
+
end
|
86
|
+
end
|
87
|
+
|
88
|
+
context 'DELETE request with the valid proxy' do
|
89
|
+
it 'successfully returns page content for HTTP' do
|
90
|
+
content = ProxyFetcher::Client.delete('http://httpbin.org/delete')
|
91
|
+
|
92
|
+
expect(content).not_to be_nil
|
93
|
+
expect(content).not_to be_empty
|
94
|
+
end
|
95
|
+
end
|
96
|
+
|
97
|
+
context 'HEAD request with the valid proxy' do
|
98
|
+
it 'successfully works' do
|
99
|
+
content = ProxyFetcher::Client.head('http://httpbin.org')
|
100
|
+
|
101
|
+
expect(content).to be_nil
|
102
|
+
end
|
103
|
+
end
|
104
|
+
|
105
|
+
context 'retries' do
|
106
|
+
it 'raises an error when reaches max retries limit' do
|
107
|
+
allow(ProxyFetcher::Client::Request).to receive(:execute).and_raise(StandardError)
|
108
|
+
|
109
|
+
expect { ProxyFetcher::Client.get('http://httpbin.org') }.to raise_error(ProxyFetcher::Exceptions::MaximumRetriesReached)
|
110
|
+
end
|
111
|
+
end
|
112
|
+
|
113
|
+
context 'redirects' do
|
114
|
+
it 'follows redirect when present' do
|
115
|
+
content = ProxyFetcher::Client.get('http://httpbin.org/absolute-redirect/2')
|
116
|
+
|
117
|
+
expect(content).not_to be_nil
|
118
|
+
expect(content).not_to be_empty
|
119
|
+
end
|
120
|
+
|
121
|
+
it 'raises an error when reaches max redirects limit' do
|
122
|
+
expect { ProxyFetcher::Client.get('http://httpbin.org/absolute-redirect/11') }.to raise_error(ProxyFetcher::Exceptions::MaximumRedirectsReached)
|
123
|
+
end
|
124
|
+
end
|
125
|
+
end
|
@@ -19,7 +19,7 @@ describe ProxyFetcher::Configuration do
|
|
19
19
|
MyWrongHTTPClient = Class.new
|
20
20
|
|
21
21
|
expect { ProxyFetcher.config.http_client = MyWrongHTTPClient }
|
22
|
-
.to raise_error(ProxyFetcher::
|
22
|
+
.to raise_error(ProxyFetcher::Exceptions::WrongCustomClass)
|
23
23
|
end
|
24
24
|
end
|
25
25
|
|
@@ -38,21 +38,21 @@ describe ProxyFetcher::Configuration do
|
|
38
38
|
MyWrongProxyValidator = Class.new
|
39
39
|
|
40
40
|
expect { ProxyFetcher.config.proxy_validator = MyWrongProxyValidator }
|
41
|
-
.to raise_error(ProxyFetcher::
|
41
|
+
.to raise_error(ProxyFetcher::Exceptions::WrongCustomClass)
|
42
42
|
end
|
43
43
|
end
|
44
44
|
|
45
45
|
context 'custom provider' do
|
46
46
|
it 'failed on registration if provider class already registered' do
|
47
47
|
expect { ProxyFetcher::Configuration.register_provider(:xroxy, Class.new) }
|
48
|
-
.to raise_error(ProxyFetcher::
|
48
|
+
.to raise_error(ProxyFetcher::Exceptions::RegisteredProvider)
|
49
49
|
end
|
50
50
|
|
51
51
|
it "failed on proxy list fetching if provider doesn't registered" do
|
52
52
|
ProxyFetcher.config.provider = :not_existing_provider
|
53
53
|
|
54
54
|
expect { ProxyFetcher::Manager.new }
|
55
|
-
.to raise_error(ProxyFetcher::
|
55
|
+
.to raise_error(ProxyFetcher::Exceptions::UnknownProvider)
|
56
56
|
end
|
57
57
|
end
|
58
58
|
end
|
data/spec/spec_helper.rb
CHANGED
@@ -0,0 +1,26 @@
|
|
1
|
+
require 'evil-proxy'
|
2
|
+
|
3
|
+
EvilProxy::HTTPProxyServer.class_eval do
|
4
|
+
def do_PUT(req, res)
|
5
|
+
perform_proxy_request(req, res) do |http, path, header|
|
6
|
+
http.put(path, req.body || '', header)
|
7
|
+
end
|
8
|
+
end
|
9
|
+
|
10
|
+
def do_DELETE(req, res)
|
11
|
+
perform_proxy_request(req, res) do |http, path, header|
|
12
|
+
http.delete(path, header)
|
13
|
+
end
|
14
|
+
end
|
15
|
+
|
16
|
+
def do_PATCH(req, res)
|
17
|
+
perform_proxy_request(req, res) do |http, path, header|
|
18
|
+
http.patch(path, req.body || '', header)
|
19
|
+
end
|
20
|
+
end
|
21
|
+
|
22
|
+
# This method is not needed for PUT but I added for completeness
|
23
|
+
def do_OPTIONS(_req, res)
|
24
|
+
res['allow'] = 'GET,HEAD,POST,OPTIONS,CONNECT,PUT,PATCH,DELETE'
|
25
|
+
end
|
26
|
+
end
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: proxy_fetcher
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 0.5.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Nikita Bulai
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2017-09-
|
11
|
+
date: 2017-09-06 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: nokogiri
|
@@ -63,13 +63,18 @@ files:
|
|
63
63
|
- Rakefile
|
64
64
|
- bin/proxy_fetcher
|
65
65
|
- lib/proxy_fetcher.rb
|
66
|
+
- lib/proxy_fetcher/client/client.rb
|
67
|
+
- lib/proxy_fetcher/client/proxies_registry.rb
|
68
|
+
- lib/proxy_fetcher/client/request.rb
|
66
69
|
- lib/proxy_fetcher/configuration.rb
|
67
70
|
- lib/proxy_fetcher/configuration/providers_registry.rb
|
71
|
+
- lib/proxy_fetcher/exceptions.rb
|
68
72
|
- lib/proxy_fetcher/manager.rb
|
69
73
|
- lib/proxy_fetcher/providers/base.rb
|
70
74
|
- lib/proxy_fetcher/providers/free_proxy_list.rb
|
71
75
|
- lib/proxy_fetcher/providers/free_proxy_list_ssl.rb
|
72
|
-
- lib/proxy_fetcher/providers/
|
76
|
+
- lib/proxy_fetcher/providers/gather_proxy.rb
|
77
|
+
- lib/proxy_fetcher/providers/http_tunnel.rb
|
73
78
|
- lib/proxy_fetcher/providers/proxy_docker.rb
|
74
79
|
- lib/proxy_fetcher/providers/proxy_list.rb
|
75
80
|
- lib/proxy_fetcher/providers/xroxy.rb
|
@@ -79,17 +84,20 @@ files:
|
|
79
84
|
- lib/proxy_fetcher/utils/proxy_validator.rb
|
80
85
|
- lib/proxy_fetcher/version.rb
|
81
86
|
- proxy_fetcher.gemspec
|
87
|
+
- spec/proxy_fetcher/client_spec.rb
|
82
88
|
- spec/proxy_fetcher/configuration_spec.rb
|
83
89
|
- spec/proxy_fetcher/providers/base_spec.rb
|
84
90
|
- spec/proxy_fetcher/providers/free_proxy_list_spec.rb
|
85
91
|
- spec/proxy_fetcher/providers/free_proxy_list_ssl_spec.rb
|
86
|
-
- spec/proxy_fetcher/providers/
|
92
|
+
- spec/proxy_fetcher/providers/gather_proxy_spec.rb
|
93
|
+
- spec/proxy_fetcher/providers/http_tunnel_spec.rb
|
87
94
|
- spec/proxy_fetcher/providers/multiple_providers_spec.rb
|
88
95
|
- spec/proxy_fetcher/providers/proxy_docker_spec.rb
|
89
96
|
- spec/proxy_fetcher/providers/proxy_list_spec.rb
|
90
97
|
- spec/proxy_fetcher/providers/xroxy_spec.rb
|
91
98
|
- spec/proxy_fetcher/proxy_spec.rb
|
92
99
|
- spec/spec_helper.rb
|
100
|
+
- spec/support/evil_proxy_patch.rb
|
93
101
|
- spec/support/manager_examples.rb
|
94
102
|
homepage: http://github.com/nbulaj/proxy_fetcher
|
95
103
|
licenses:
|
@@ -1,35 +0,0 @@
|
|
1
|
-
module ProxyFetcher
|
2
|
-
module Providers
|
3
|
-
class HideMyName < Base
|
4
|
-
PROVIDER_URL = 'https://hidemy.name/en/proxy-list/'.freeze
|
5
|
-
|
6
|
-
def load_proxy_list(filters = { type: 'hs' })
|
7
|
-
doc = load_document(PROVIDER_URL, filters)
|
8
|
-
doc.xpath('//table[@class="proxy__t"]/tbody/tr')
|
9
|
-
end
|
10
|
-
|
11
|
-
def to_proxy(html_element)
|
12
|
-
ProxyFetcher::Proxy.new.tap do |proxy|
|
13
|
-
proxy.addr = parse_element(html_element, 'td[1]')
|
14
|
-
proxy.port = convert_to_int(parse_element(html_element, 'td[2]'))
|
15
|
-
proxy.anonymity = parse_element(html_element, 'td[6]')
|
16
|
-
proxy.country = parse_country(html_element)
|
17
|
-
proxy.type = parse_element(html_element, 'td[5]')
|
18
|
-
proxy.response_time = parse_response_time(html_element)
|
19
|
-
end
|
20
|
-
end
|
21
|
-
|
22
|
-
private
|
23
|
-
|
24
|
-
def parse_country(element)
|
25
|
-
clear(element.at_xpath('*//span[1]/following-sibling::text()[1]').content)
|
26
|
-
end
|
27
|
-
|
28
|
-
def parse_response_time(element)
|
29
|
-
convert_to_int(element.at_xpath('td[4]').content.strip[/\d+/])
|
30
|
-
end
|
31
|
-
end
|
32
|
-
|
33
|
-
ProxyFetcher::Configuration.register_provider(:hide_my_name, HideMyName)
|
34
|
-
end
|
35
|
-
end
|