proxy_fetcher 0.4.1 → 0.5.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 40029164cd3a21183e8f22191e2cd8c01ba05ae3
4
- data.tar.gz: ff557370ec0e68c0817fb009f9e483f936eaecd4
3
+ metadata.gz: 287db7b55e3f0798e263fe7268f8a709e4d8e8c0
4
+ data.tar.gz: cde6d4dc22e60aa012c02b1f679fcc72b23c6114
5
5
  SHA512:
6
- metadata.gz: 9a1461d4c42c3478675a4ad20e0d5e41a674d3486fe1026fdf4d95a1e484c4048f8f01476aaac14ccfa0c2a9e4e78ad2049e6b284828eeeb7c6670b1d0923630
7
- data.tar.gz: 98f74fc50f15dc7f21f0f308494e4e874ad184cad512b401173da7ecc648c4fd0672aca45f2f3a4f0ec55aa6c01850d7979b579b8609a013d11c37a59443c10b
6
+ metadata.gz: a54e2d725338bc5c859d415cae5b0397b4aa2a75a0f00917edf4eaa2f845e893c1d73f8e56b86ddfce85c6016fcf473ed4ad2e29fbf78a2c65495df050a320d7
7
+ data.tar.gz: 61b2fc3dfd20c75045c8b435d0855e0596071facc8bef9e73d68e46996faaad8a9e1ccbb3b9b818f1449138e7d52ee583dffaa813cc5d3696a24ad6ba835475c
data/.gitignore CHANGED
@@ -14,6 +14,7 @@ pickle-email-*.html
14
14
  .idea
15
15
  Gemfile.lock
16
16
  *.gem
17
+ certs
17
18
 
18
19
  # TODO Comment out this rule if you are OK with secrets being uploaded to the repo
19
20
  config/initializers/secret_token.rb
@@ -1,6 +1,7 @@
1
1
  LineLength:
2
2
  Max: 120
3
3
  AllCops:
4
+ TargetRubyVersion: 2.4
4
5
  Exclude:
5
6
  - 'spec/**/*'
6
7
  - 'bin/*'
@@ -9,3 +10,5 @@ Rails:
9
10
  Enabled: false
10
11
  Documentation:
11
12
  Enabled: false
13
+ FrozenStringLiteralComment:
14
+ Enabled: false
@@ -2,6 +2,22 @@
2
2
 
3
3
  Reverse Chronological Order:
4
4
 
5
+ ## `0.5.0` (2017-09-06)
6
+
7
+ * Remove HideMyName provider (no longer works)
8
+ * Fix ProxyDocker provider
9
+ * Add `ProxyFetcher::Client` to make interacting with proxies easier
10
+ * Add new providers (Gather Proxy & HTTP Tunnel Genius)
11
+ * Simplify `connection_timeout` config option to `timeout`
12
+ * Make User-Agent configurable
13
+ * Move all the gem exceptions under `ProxyFetcher::Error` base class
14
+ * Small improvements
15
+
16
+ ## `0.4.1` (2017-09-04)
17
+
18
+ * Use all registered providers by default
19
+ * Disable HideMyName provider (now it uses CloudFlare)
20
+
5
21
  ## `0.4.0` (2017-08-26)
6
22
 
7
23
  * Support operations with multiple providers
data/Gemfile CHANGED
@@ -4,4 +4,5 @@ gemspec
4
4
 
5
5
  group :test do
6
6
  gem 'coveralls', require: false
7
+ gem 'evil-proxy'
7
8
  end
data/README.md CHANGED
@@ -6,7 +6,7 @@
6
6
  [![License](http://img.shields.io/badge/license-MIT-brightgreen.svg)](#license)
7
7
 
8
8
  This gem can help your Ruby application to make HTTP(S) requests from proxy by fetching and validating actual
9
- proxy lists from multiple providers like [HideMyName](https://hidemy.name/en/).
9
+ proxy lists from multiple providers.
10
10
 
11
11
  It gives you a `Manager` class that can load proxy lists, validate them and return random or specific proxies. Take a look
12
12
  at the documentation below to find all the gem features.
@@ -20,6 +20,7 @@ validating proxy lists from the different providers. [Checkout examples](#standa
20
20
  - [Example of usage](#example-of-usage)
21
21
  - [In Ruby application](#in-ruby-application)
22
22
  - [Standalone](#standalone)
23
+ - [Client](#client)
23
24
  - [Configuration](#configuration)
24
25
  - [Proxy validation speed](#proxy-validation-speed)
25
26
  - [Proxy object](#proxy-object)
@@ -32,7 +33,7 @@ validating proxy lists from the different providers. [Checkout examples](#standa
32
33
  If using bundler, first add 'proxy_fetcher' to your Gemfile:
33
34
 
34
35
  ```ruby
35
- gem 'proxy_fetcher', '~> 0.4'
36
+ gem 'proxy_fetcher', '~> 0.5'
36
37
  ```
37
38
 
38
39
  or if you want to use the latest version (from `master` branch), then:
@@ -50,7 +51,7 @@ bundle install
50
51
  Otherwise simply install the gem:
51
52
 
52
53
  ```sh
53
- gem install proxy_fetcher -v '0.4'
54
+ gem install proxy_fetcher -v '0.5'
54
55
  ```
55
56
 
56
57
  ## Example of usage
@@ -123,7 +124,7 @@ If you need to filter proxy list, for example, by country or response time and s
123
124
  then you can just pass your filters like a simple Ruby hash to the Manager instance:
124
125
 
125
126
  ```ruby
126
- ProxyFetcher.config.providers = :hide_my_name
127
+ ProxyFetcher.config.providers = :proxy_docker
127
128
 
128
129
  manager = ProxyFetcher::Manager.new(filters: { country: 'PL', maxtime: '500' })
129
130
  manager.proxies
@@ -134,7 +135,7 @@ manager.proxies
134
135
  If you are using multiple providers, then you can split your filters by proxy provider names:
135
136
 
136
137
  ```ruby
137
- ProxyFetcher.config.providers = [:hide_my_name, :xroxy]
138
+ ProxyFetcher.config.providers = [:proxy_docker, :xroxy]
138
139
 
139
140
  manager = ProxyFetcher::Manager.new(filters: {
140
141
  hide_my_name: {
@@ -194,19 +195,64 @@ To get all the possible options run:
194
195
  proxy_fetcher --help
195
196
  ```
196
197
 
198
+ ## Client
199
+
200
+ The ProxyFetcher gem provides a ready-to-use HTTP client that makes requests through proxies easy. It does all the work
201
+ with the proxy lists for you (load, validate, refresh, find proxy by type, follow redirects, etc). All you need is to
202
+ make HTTP(S) requests:
203
+
204
+ ```ruby
205
+ require 'proxy_fetcher'
206
+
207
+ ProxyFetcher::Client.get 'https://example.com/resource'
208
+
209
+ ProxyFetcher::Client.post 'https://example.com/resource', { param: 'value' }
210
+
211
+ ProxyFetcher::Client.post 'https://example.com/resource', 'Any data'
212
+
213
+ ProxyFetcher::Client.post 'https://example.com/resource', { param: 'value' }.to_json, headers: { 'Content-Type' => 'application/json' }
214
+
215
+ ProxyFetcher::Client.put 'https://example.com/resource', { param: 'value' }
216
+
217
+ ProxyFetcher::Client.patch 'https://example.com/resource', { param: 'value' }
218
+
219
+ ProxyFetcher::Client.delete 'https://example.com/resource'
220
+ ```
221
+
222
+ By default, `ProxyFetcher::Client` retries a failed HTTP request up to 1000 times, picking another proxy each time, if the proxy is out of order or the
223
+ remote server returns an error. You can increase or decrease this number for your case or set it to `nil` if you want to
224
+ retry indefinitely (or until your Ruby process dies :skull:):
225
+
226
+ ```ruby
227
+ require 'proxy_fetcher'
228
+
229
+ ProxyFetcher::Client.get 'https://example.com/resource', options: { max_retries: 10_000 }
230
+ ```
231
+
232
+ If you need JavaScript support or some other features, you will need to implement your own client using, for example,
233
+ `selenium-webdriver`.
234
+
197
235
  ## Configuration
198
236
 
199
237
  To change the open/read timeout for the `cleanup!` and `connectable?` methods (and for `ProxyFetcher::Client` requests), change `ProxyFetcher.config`:
200
238
 
201
239
  ```ruby
202
240
  ProxyFetcher.configure do |config|
203
- config.connection_timeout = 1 # default is 3
241
+ config.timeout = 1 # default is 3
204
242
  end
205
243
 
206
244
  manager = ProxyFetcher::Manager.new
207
245
  manager.cleanup!
208
246
  ```
209
247
 
248
+ You can also set a custom User-Agent (used when loading provider pages and by `ProxyFetcher::Client`):
249
+
250
+ ```ruby
251
+ ProxyFetcher.configure do |config|
252
+ config.user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
253
+ end
254
+ ```
255
+
210
256
  ProxyFetcher uses a simple Ruby solution for dealing with HTTP(S) requests: the `net/http` library from the stdlib. If you want to add, for example, your custom provider that
211
257
  was developed as a Single Page Application (SPA) with some JavaScript, then you will need something like [selenium-webdriver](https://github.com/SeleniumHQ/selenium/tree/master/rb)
212
258
  to properly load the content of the website. For those and other cases you can write your own class for fetching HTML content by the URL and setup it
@@ -269,7 +315,7 @@ ProxyFetcher.config.pool_size = 50
269
315
  You can experiment with the thread pool size to find the optimal maximum number of threads for your PC and OS.
270
316
  This will definitely give you some performance improvements.
271
317
 
272
- Moreover, the common proxy validation speed depends on `ProxyFetcher.config.connection_timeout` option that is equal
318
+ Moreover, the overall proxy validation speed depends on the `ProxyFetcher.config.timeout` option, which is equal
273
319
  to `3` by default. This means the gem waits up to 3 seconds for a server response when checking whether a particular proxy is connectable.
274
320
  You can decrease this option to `1`, for example, and it will heavily increase proxy validation speed (**but remember**
275
321
  that some proxies could be connectable but slow, so with this option you will clear the proxy list of proxies that
@@ -300,10 +346,11 @@ Also you can call next instance methods for every Proxy object:
300
346
 
301
347
  Currently ProxyFetcher can deal with the following proxy providers (services):
302
348
 
303
- * Hide My Name (**currently does not work**)
304
349
  * Free Proxy List
305
350
  * Free SSL Proxies
306
351
  * Proxy Docker
352
+ * Gather Proxy
353
+ * HTTP Tunnel Genius
307
354
  * Proxy List
308
355
  * XRoxy
309
356
 
@@ -3,6 +3,7 @@ require 'net/https'
3
3
  require 'nokogiri'
4
4
  require 'thread'
5
5
 
6
+ require File.dirname(__FILE__) + '/proxy_fetcher/exceptions'
6
7
  require File.dirname(__FILE__) + '/proxy_fetcher/configuration'
7
8
  require File.dirname(__FILE__) + '/proxy_fetcher/configuration/providers_registry'
8
9
  require File.dirname(__FILE__) + '/proxy_fetcher/proxy'
@@ -11,13 +12,17 @@ require File.dirname(__FILE__) + '/proxy_fetcher/manager'
11
12
  require File.dirname(__FILE__) + '/proxy_fetcher/utils/http_client'
12
13
  require File.dirname(__FILE__) + '/proxy_fetcher/utils/html'
13
14
  require File.dirname(__FILE__) + '/proxy_fetcher/utils/proxy_validator'
15
+ require File.dirname(__FILE__) + '/proxy_fetcher/client/client'
16
+ require File.dirname(__FILE__) + '/proxy_fetcher/client/request'
17
+ require File.dirname(__FILE__) + '/proxy_fetcher/client/proxies_registry'
14
18
 
15
19
  module ProxyFetcher
16
20
  module Providers
17
21
  require File.dirname(__FILE__) + '/proxy_fetcher/providers/base'
18
22
  require File.dirname(__FILE__) + '/proxy_fetcher/providers/free_proxy_list'
19
23
  require File.dirname(__FILE__) + '/proxy_fetcher/providers/free_proxy_list_ssl'
20
- require File.dirname(__FILE__) + '/proxy_fetcher/providers/hide_my_name'
24
+ require File.dirname(__FILE__) + '/proxy_fetcher/providers/gather_proxy'
25
+ require File.dirname(__FILE__) + '/proxy_fetcher/providers/http_tunnel'
21
26
  require File.dirname(__FILE__) + '/proxy_fetcher/providers/proxy_docker'
22
27
  require File.dirname(__FILE__) + '/proxy_fetcher/providers/proxy_list'
23
28
  require File.dirname(__FILE__) + '/proxy_fetcher/providers/xroxy'
@@ -0,0 +1,71 @@
1
+ module ProxyFetcher
2
+ module Client
3
+ class << self
4
+ def get(url, headers: {}, options: {})
5
+ request_without_payload(:get, url, headers, options)
6
+ end
7
+
8
+ def head(url, headers: {}, options: {})
9
+ request_without_payload(:head, url, headers, options)
10
+ end
11
+
12
+ def post(url, payload, headers: {}, options: {})
13
+ request_with_payload(:post, url, payload, headers, options)
14
+ end
15
+
16
+ def delete(url, headers: {}, options: {})
17
+ request_without_payload(:delete, url, headers, options)
18
+ end
19
+
20
+ def put(url, payload, headers: {}, options: {})
21
+ request_with_payload(:put, url, payload, headers, options)
22
+ end
23
+
24
+ def patch(url, payload, headers: {}, options: {})
25
+ request_with_payload(:patch, url, payload, headers, options)
26
+ end
27
+
28
+ private
29
+
30
+ def request_with_payload(method, url, payload, headers, options)
31
+ safe_request_to(url, options.fetch(:max_retries, 1000)) do |proxy|
32
+ opts = options.merge(url: url, payload: payload, proxy: proxy, headers: default_headers.merge(headers))
33
+
34
+ Request.execute(method: method, **opts)
35
+ end
36
+ end
37
+
38
+ def request_without_payload(method, url, headers, options)
39
+ safe_request_to(url, options.fetch(:max_retries, 1000)) do |proxy|
40
+ opts = options.merge(url: url, proxy: proxy, headers: default_headers.merge(headers))
41
+
42
+ Request.execute(method: method, **opts)
43
+ end
44
+ end
45
+
46
+ def default_headers
47
+ {
48
+ 'User-Agent' => ProxyFetcher.config.user_agent
49
+ }
50
+ end
51
+
52
+ def safe_request_to(url, max_retries = 1000)
53
+ tries = 0
54
+
55
+ begin
56
+ proxy = ProxiesRegistry.find_proxy_for(url)
57
+ yield(proxy)
58
+ rescue ProxyFetcher::Error
59
+ raise
60
+ rescue StandardError
61
+ raise ProxyFetcher::Exceptions::MaximumRetriesReached if max_retries && tries >= max_retries
62
+
63
+ ProxiesRegistry.invalidate_proxy!(proxy)
64
+ tries += 1
65
+
66
+ retry
67
+ end
68
+ end
69
+ end
70
+ end
71
+ end
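The retry loop in `safe_request_to` above can be exercised in isolation. Below is a minimal standalone sketch. It assumes a plain array stands in for `ProxiesRegistry`; `with_retries`, `Error` and `MaximumRetriesReached` are local stand-ins, not the gem's classes. The sketch keeps the same ordering as the original: the limit is checked before the counter is incremented, and errors derived from the base error class are re-raised instead of retried.

```ruby
# Standalone sketch of the retry loop used by ProxyFetcher::Client.
# A plain array stands in for ProxiesRegistry (hypothetical simplification).
Error = Class.new(StandardError)
MaximumRetriesReached = Class.new(Error)

# Yields the first available proxy; on failure drops it and retries,
# giving up after max_retries retries (nil means no limit).
def with_retries(proxies, max_retries: 1000)
  tries = 0

  begin
    proxy = proxies.first
    yield(proxy)
  rescue Error
    raise # library errors (e.g. too many redirects) are not retried
  rescue StandardError
    raise MaximumRetriesReached if max_retries && tries >= max_retries

    proxies.delete(proxy) # "invalidate" the failed proxy
    tries += 1
    retry
  end
end

attempts = 0
result = with_retries(%w[p1 p2 p3 p4], max_retries: 5) do |proxy|
  attempts += 1
  raise IOError, "#{proxy} is dead" unless proxy == 'p3'

  "fetched via #{proxy}"
end

puts result   # => fetched via p3
puts attempts # => 3
```

The limit is checked before `tries` is incremented, so `max_retries: N` allows up to N + 1 attempts in total. Passing `nil` removes the limit because `max_retries && ...` short-circuits.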
@@ -0,0 +1,32 @@
1
+ module ProxyFetcher
2
+ module Client
3
+ class ProxiesRegistry
4
+ class << self
5
+ def invalidate_proxy!(proxy)
6
+ manager.proxies.delete(proxy)
7
+ manager.refresh_list! if manager.proxies.empty?
8
+ end
9
+
10
+ def find_proxy_for(url)
11
+ proxy = if URI.parse(url).is_a?(URI::HTTPS)
12
+ manager.proxies.detect(&:ssl?)
13
+ else
14
+ manager.get
15
+ end
16
+
17
+ return proxy unless proxy.nil?
18
+
19
+ manager.refresh_list!
20
+ find_proxy_for(url)
21
+ end
22
+
23
+ def manager
24
+ manager = Thread.current[:proxy_fetcher_manager]
25
+ return manager unless manager.nil?
26
+
27
+ Thread.current[:proxy_fetcher_manager] = ProxyFetcher::Manager.new
28
+ end
29
+ end
30
+ end
31
+ end
32
+ end
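`ProxiesRegistry.manager` caches one `Manager` per thread in `Thread.current`. The minimal sketch below reproduces that memoization. It uses the `||=` idiom, which is equivalent to the explicit nil check above, and a hypothetical `FakeManager` struct in place of `ProxyFetcher::Manager`. It shows that each thread gets its own instance, so a proxy invalidated in one thread is not removed from another thread's list.

```ruby
# Per-thread memoization in the style of ProxiesRegistry.manager.
# FakeManager is a hypothetical stand-in for ProxyFetcher::Manager.
FakeManager = Struct.new(:proxies)

def thread_manager
  # Thread.current[] storage is fiber-local, so each thread (and fiber)
  # lazily builds and then reuses its own manager.
  Thread.current[:demo_proxy_manager] ||= FakeManager.new(%w[p1 p2])
end

main_a = thread_manager
main_b = thread_manager
other  = Thread.new { thread_manager }.value

main_a.proxies.delete('p1') # invalidate a proxy in the main thread only

puts main_a.equal?(main_b)  # => true
puts other.equal?(main_a)   # => false
puts other.proxies.inspect  # => ["p1", "p2"]
```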
@@ -0,0 +1,88 @@
1
+ module ProxyFetcher
2
+ module Client
3
+ class Request
4
+ URL_ENCODED = {
5
+ 'Content-Type' => 'application/x-www-form-urlencoded'
6
+ }.freeze
7
+
8
+ DEFAULT_SSL_OPTIONS = {
9
+ verify_mode: OpenSSL::SSL::VERIFY_NONE
10
+ }.freeze
11
+
12
+ attr_reader :http, :method, :uri, :headers, :timeout,
13
+ :payload, :proxy, :max_redirects, :ssl_options
14
+
15
+ def self.execute(args)
16
+ new(args).execute
17
+ end
18
+
19
+ def initialize(args)
20
+ raise ArgumentError, 'args must be a Hash!' unless args.is_a?(Hash)
21
+
22
+ @uri = URI.parse(args.fetch(:url))
23
+ @method = args.fetch(:method).to_s.capitalize
24
+ @headers = (args[:headers] || {}).dup
25
+ @payload = preprocess_payload(args[:payload])
26
+ @timeout = args.fetch(:timeout, ProxyFetcher.config.timeout)
27
+ @ssl_options = args.fetch(:ssl_options, DEFAULT_SSL_OPTIONS)
28
+
29
+ @proxy = args.fetch(:proxy)
30
+ @max_redirects = args.fetch(:max_redirects, 10)
31
+
32
+ build_http_client
33
+ end
34
+
35
+ def execute
36
+ request = request_class_for(method).new(uri, headers)
37
+
38
+ http.start do |connection|
39
+ process_response!(connection.request(request, payload))
40
+ end
41
+ end
42
+
43
+ private
44
+
45
+ def preprocess_payload(payload)
46
+ return if payload.nil?
47
+
48
+ if payload.is_a?(Hash)
49
+ @headers = URL_ENCODED.merge(headers) # add form Content-Type unless the caller set one
50
+ URI.encode_www_form(payload)
51
+ else
52
+ payload
53
+ end
54
+ end
55
+
56
+ def build_http_client
57
+ @http = Net::HTTP.new(uri.host, uri.port, proxy.addr, proxy.port)
58
+
59
+ @http.use_ssl = uri.is_a?(URI::HTTPS)
60
+ @http.verify_mode = ssl_options.fetch(:verify_mode)
61
+ @http.open_timeout = timeout
62
+ @http.read_timeout = timeout
63
+ end
64
+
65
+ def process_response!(http_response)
66
+ case http_response
67
+ when Net::HTTPSuccess then http_response.read_body
68
+ when Net::HTTPRedirection then follow_redirection(http_response)
69
+ else
70
+ http_response.error!
71
+ end
72
+ end
73
+
74
+ def follow_redirection(http_response)
75
+ raise ProxyFetcher::Exceptions::MaximumRedirectsReached if max_redirects <= 0
76
+
77
+ url = http_response.fetch('location')
78
+ url = uri.merge(url).to_s unless url.downcase.start_with?('http')
79
+
80
+ Request.execute(method: :get, url: url, proxy: proxy, headers: headers, timeout: timeout, max_redirects: max_redirects - 1)
81
+ end
82
+
83
+ def request_class_for(method)
84
+ Net::HTTP.const_get(method, false)
85
+ end
86
+ end
87
+ end
88
+ end
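Two details of `Request` are easy to check in isolation. The first is how a Hash payload becomes a form-encoded body; the second is how a `Location` header is resolved against the current URL. The sketch below reimplements both using only `uri` from the stdlib. `encode_payload` and `resolve_location` are hypothetical helper names, not part of the gem. Because `Hash#merge` returns a new hash rather than modifying the receiver, the sketch returns the merged headers explicitly.

```ruby
require 'uri'

URL_ENCODED = { 'Content-Type' => 'application/x-www-form-urlencoded' }.freeze

# Hash payloads are form-encoded; other payloads (e.g. JSON strings) pass through.
# Caller-supplied headers take precedence over the default Content-Type.
def encode_payload(payload, headers)
  return [payload, headers] unless payload.is_a?(Hash)

  [URI.encode_www_form(payload), URL_ENCODED.merge(headers)]
end

# Relative Location values are resolved against the URL that was requested;
# URI#merge returns absolute locations unchanged.
def resolve_location(current_url, location)
  URI.parse(current_url).merge(location).to_s
end

body, headers = encode_payload({ param: 'value', q: 'a b' }, 'User-Agent' => 'demo')
puts body                    # => param=value&q=a+b
puts headers['Content-Type'] # => application/x-www-form-urlencoded
puts resolve_location('http://httpbin.org/absolute-redirect/2', '/get') # => http://httpbin.org/get
```

Because `URI#merge` leaves absolute URLs as they are, it also covers the case that the `start_with?('http')` check handles in `follow_redirection`.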
@@ -1,10 +1,11 @@
1
1
  module ProxyFetcher
2
2
  class Configuration
3
- WrongCustomClass = Class.new(StandardError)
4
-
5
- attr_accessor :providers, :connection_timeout, :pool_size
3
+ attr_accessor :providers, :timeout, :pool_size, :user_agent
6
4
  attr_accessor :http_client, :proxy_validator
7
5
 
6
+ # rubocop:disable Metrics/LineLength
7
+ DEFAULT_USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112 Safari/537.36'.freeze
8
+
8
9
  class << self
9
10
  def providers_registry
10
11
  @registry ||= ProvidersRegistry.new
@@ -25,8 +26,9 @@ module ProxyFetcher
25
26
 
26
27
  # Sets default configuration options
27
28
  def reset!
29
+ @user_agent = DEFAULT_USER_AGENT
28
30
  @pool_size = 10
29
- @connection_timeout = 3
31
+ @timeout = 3
30
32
  @http_client = HTTPClient
31
33
  @proxy_validator = ProxyValidator
32
34
 
@@ -53,7 +55,7 @@ module ProxyFetcher
53
55
  # Checks if custom class has some required class methods
54
56
  def setup_custom_class(klass, required_methods: [])
55
57
  unless klass.respond_to?(*required_methods)
56
- raise WrongCustomClass, "#{klass} must respond to [#{Array(required_methods).join(', ')}] class methods!"
58
+ raise ProxyFetcher::Exceptions::WrongCustomClass.new(klass, required_methods)
57
59
  end
58
60
 
59
61
  klass
@@ -1,17 +1,14 @@
1
1
  module ProxyFetcher
2
2
  class ProvidersRegistry
3
- UnknownProvider = Class.new(StandardError)
4
- RegisteredProvider = Class.new(StandardError)
5
-
6
3
  def providers
7
4
  @providers ||= {}
8
5
  end
9
6
 
10
7
  # Add custom provider to common registry.
11
- # Requires proxy provider name ('hide_my_name' for example) and a class
8
+ # Requires proxy provider name ('proxy_docker' for example) and a class
12
9
  # that implements the parsing logic.
13
10
  def register(name, klass)
14
- raise RegisteredProvider, "`#{name}` provider already registered!" if providers.key?(name.to_sym)
11
+ raise ProxyFetcher::Exceptions::RegisteredProvider, name if providers.key?(name.to_sym)
15
12
 
16
13
  providers[name.to_sym] = klass
17
14
  end
@@ -23,7 +20,7 @@ module ProxyFetcher
23
20
 
24
21
  providers.fetch(provider_name)
25
22
  rescue KeyError
26
- raise UnknownProvider, "unregistered proxy provider `#{provider_name}`"
23
+ raise ProxyFetcher::Exceptions::UnknownProvider, provider_name
27
24
  end
28
25
  end
29
26
  end
@@ -0,0 +1,36 @@
1
+ module ProxyFetcher
2
+ Error = Class.new(StandardError)
3
+
4
+ module Exceptions
5
+ class WrongCustomClass < Error
6
+ def initialize(klass, methods)
7
+ required_methods = Array(methods).join(', ')
8
+ super("#{klass} must respond to [#{required_methods}] class methods!")
9
+ end
10
+ end
11
+
12
+ class UnknownProvider < Error
13
+ def initialize(provider_name)
14
+ super("unregistered proxy provider `#{provider_name}`")
15
+ end
16
+ end
17
+
18
+ class RegisteredProvider < Error
19
+ def initialize(name)
20
+ super("`#{name}` provider already registered!")
21
+ end
22
+ end
23
+
24
+ class MaximumRedirectsReached < Error
25
+ def initialize(*)
26
+ super('maximum redirects reached')
27
+ end
28
+ end
29
+
30
+ class MaximumRetriesReached < Error
31
+ def initialize(*)
32
+ super('reached the maximum number of retries')
33
+ end
34
+ end
35
+ end
36
+ end
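Every gem exception now inherits from `ProxyFetcher::Error`. Callers can therefore rescue library failures with a single clause while letting unrelated errors propagate. The sketch below copies two of the exception classes above so it runs without the gem installed. `describe_failure` is a hypothetical helper for illustration.

```ruby
# Trimmed copy of the exception hierarchy above, so the sketch runs standalone.
module ProxyFetcher
  Error = Class.new(StandardError)

  module Exceptions
    class UnknownProvider < Error
      def initialize(provider_name)
        super("unregistered proxy provider `#{provider_name}`")
      end
    end

    class MaximumRetriesReached < Error
      def initialize(*)
        super('reached the maximum number of retries')
      end
    end
  end
end

# Hypothetical helper: one rescue clause covers every gem-specific error,
# while other exceptions (ArgumentError, IOError, ...) still propagate.
def describe_failure
  yield
  'ok'
rescue ProxyFetcher::Error => e
  "proxy_fetcher error: #{e.message}"
end

puts describe_failure { raise ProxyFetcher::Exceptions::UnknownProvider, :foo }
# => proxy_fetcher error: unregistered proxy provider `foo`
puts describe_failure { raise ProxyFetcher::Exceptions::MaximumRetriesReached }
# => proxy_fetcher error: reached the maximum number of retries
```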
@@ -0,0 +1,36 @@
1
+ require 'json'
2
+
3
+ module ProxyFetcher
4
+ module Providers
5
+ class GatherProxy < Base
6
+ PROVIDER_URL = 'http://www.gatherproxy.com/'.freeze
7
+
8
+ def load_proxy_list(*)
9
+ doc = load_document(PROVIDER_URL)
10
+ doc.xpath('//div[@class="proxy-list"]/table/script')
11
+ end
12
+
13
+ def to_proxy(html_element)
14
+ json = parse_json(html_element)
15
+
16
+ ProxyFetcher::Proxy.new.tap do |proxy|
17
+ proxy.addr = json['PROXY_IP']
18
+ proxy.port = json['PROXY_PORT'].to_i(16)
19
+ proxy.anonymity = json['PROXY_TYPE']
20
+ proxy.country = json['PROXY_COUNTRY']
21
+ proxy.response_time = json['PROXY_TIME'].to_i
22
+ proxy.type = ProxyFetcher::Proxy::HTTP
23
+ end
24
+ end
25
+
26
+ private
27
+
28
+ def parse_json(element)
29
+ javascript = clear(element.content)[/{.+}/im]
30
+ JSON.parse(javascript)
31
+ end
32
+ end
33
+
34
+ ProxyFetcher::Configuration.register_provider(:gather_proxy, GatherProxy)
35
+ end
36
+ end
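`GatherProxy#to_proxy` pulls a JSON object out of an inline `<script>` and decodes the port from hexadecimal. A standalone sketch of that parsing step follows. The script text is a hypothetical sample, not real provider output: it only mirrors the keys the provider code reads (`PROXY_IP`, `PROXY_PORT`, `PROXY_TIME`) and uses a documentation-range address.

```ruby
require 'json'

# Hypothetical script content; only the keys mirror what GatherProxy reads.
script = 'gp.insertPrx({"PROXY_IP":"203.0.113.7","PROXY_PORT":"1F90","PROXY_TIME":"120"});'

# Grab everything from the first "{" to the last "}" and parse it as JSON.
json = JSON.parse(script[/\{.+\}/m])

addr = json['PROXY_IP']
port = json['PROXY_PORT'].to_i(16) # parsed as hex, as GatherProxy does: "1F90" -> 8080
response_time = json['PROXY_TIME'].to_i

puts "#{addr}:#{port} (#{response_time})" # => 203.0.113.7:8080 (120)
```

The greedy `.+` assumes a single object per script element. If several objects appeared in one script, the match would span all of them and `JSON.parse` would fail.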
@@ -0,0 +1,48 @@
1
+ module ProxyFetcher
2
+ module Providers
3
+ class HTTPTunnel < Base
4
+ PROVIDER_URL = 'http://www.httptunnel.ge/ProxyListForFree.aspx'.freeze
5
+
6
+ def load_proxy_list(*)
7
+ doc = load_document(PROVIDER_URL)
8
+ doc.xpath('//table[contains(@id, "GridView")]/tr[(count(td)>2)]')
9
+ end
10
+
11
+ def to_proxy(html_element)
12
+ ProxyFetcher::Proxy.new.tap do |proxy|
13
+ uri = parse_proxy_uri(html_element)
14
+ proxy.addr = uri.host
15
+ proxy.port = uri.port
16
+
17
+ proxy.country = parse_country(html_element)
18
+ proxy.anonymity = parse_anonymity(html_element)
19
+ proxy.type = ProxyFetcher::Proxy::HTTP
20
+ end
21
+ end
22
+
23
+ private
24
+
25
+ def parse_proxy_uri(element)
26
+ full_addr = parse_element(element, 'td[1]')
27
+ URI.parse("http://#{full_addr}")
28
+ end
29
+
30
+ def parse_country(element)
31
+ element.at('img').attr('title')
32
+ end
33
+
34
+ def parse_anonymity(element)
35
+ transparency = parse_element(element, 'td[5]').to_sym
36
+
37
+ {
38
+ A: 'Anonymous',
39
+ E: 'Elite',
40
+ T: 'Transparent',
41
+ U: 'Unknown'
42
+ }.fetch(transparency, 'Unknown')
43
+ end
44
+ end
45
+
46
+ ProxyFetcher::Configuration.register_provider(:http_tunnel, HTTPTunnel)
47
+ end
48
+ end
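`HTTPTunnel` splits a `host:port` cell by prefixing a scheme and letting `URI.parse` do the work. It maps single-letter anonymity codes with `Hash#fetch` and a default. The standalone sketch below mirrors both steps. `split_addr` and `anonymity_for` are hypothetical helper names, and the sample address is from a documentation range.

```ruby
require 'uri'

ANONYMITY = { A: 'Anonymous', E: 'Elite', T: 'Transparent', U: 'Unknown' }.freeze

# "198.51.100.4:3128" -> ["198.51.100.4", 3128]; without an explicit port,
# URI falls back to the http default of 80.
def split_addr(full_addr)
  uri = URI.parse("http://#{full_addr}")
  [uri.host, uri.port]
end

# Unknown or empty codes map to 'Unknown' instead of raising KeyError.
def anonymity_for(code)
  ANONYMITY.fetch(code.strip.to_sym, 'Unknown')
end

p split_addr('198.51.100.4:3128') # => ["198.51.100.4", 3128]
p anonymity_for('E')              # => "Elite"
p anonymity_for('X')              # => "Unknown"
```

Because of the port fallback, a cell missing its port silently yields 80 rather than an error.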
@@ -1,7 +1,7 @@
1
1
  module ProxyFetcher
2
2
  module Providers
3
3
  class ProxyDocker < Base
4
- PROVIDER_URL = 'https://www.proxydocker.com/en'.freeze
4
+ PROVIDER_URL = 'https://www.proxydocker.com/'.freeze
5
5
 
6
6
  # [NOTE] Doesn't support direct filters
7
7
  def load_proxy_list(*)
@@ -15,7 +15,9 @@ module ProxyFetcher
15
15
  end
16
16
  end
17
17
 
18
- alias ssl? https?
18
+ def ssl?
19
+ https? || socks4? || socks5?
20
+ end
19
21
 
20
22
  def initialize(attributes = {})
21
23
  attributes.each do |attr, value|
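With this change, `ssl?` is true for HTTPS, SOCKS4 and SOCKS5 proxies, and `ProxiesRegistry.find_proxy_for` only hands those to HTTPS URLs. A simplified sketch of that selection follows. `DemoProxy` is a hypothetical struct. Plain-HTTP URLs take the first proxy here, whereas the gem calls `manager.get`. Note that `Net::HTTP`, which `Client::Request` uses, has no built-in SOCKS support, so a SOCKS proxy picked this way would not work with the client as written.

```ruby
require 'uri'

# Hypothetical stand-in for ProxyFetcher::Proxy with the same predicates.
DemoProxy = Struct.new(:addr, :types) do
  def https?
    types.include?(:https)
  end

  def socks4?
    types.include?(:socks4)
  end

  def socks5?
    types.include?(:socks5)
  end

  # Mirrors the new Proxy#ssl? definition.
  def ssl?
    https? || socks4? || socks5?
  end
end

# HTTPS URLs get the first ssl? proxy; anything else gets the first proxy
# (a simplification of manager.get). Returns nil when nothing matches.
def pick_proxy(proxies, url)
  return proxies.detect(&:ssl?) if URI.parse(url).is_a?(URI::HTTPS)

  proxies.first
end

proxies = [
  DemoProxy.new('192.0.2.1', [:http]),
  DemoProxy.new('192.0.2.2', [:socks5]),
  DemoProxy.new('192.0.2.3', [:http, :https])
]

puts pick_proxy(proxies, 'https://example.com/').addr # => 192.0.2.2
puts pick_proxy(proxies, 'http://example.com/').addr  # => 192.0.2.1
```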
@@ -14,12 +14,13 @@ module ProxyFetcher
14
14
  def fetch
15
15
  request = Net::HTTP::Get.new(@uri.to_s)
16
16
  request['Connection'] = 'keep-alive'
17
+ request['User-Agent'] = ProxyFetcher.config.user_agent
17
18
  response = @http.request(request)
18
19
  response.body
19
20
  end
20
21
 
21
22
  def https?
22
- @uri.scheme.casecmp('https').zero?
23
+ @uri.is_a?(URI::HTTPS)
23
24
  end
24
25
 
25
26
  class << self
@@ -6,15 +6,15 @@ module ProxyFetcher
6
6
  uri = URI.parse(URL_TO_CHECK)
7
7
  @http = Net::HTTP.new(uri.host, uri.port, proxy_addr, proxy_port.to_i)
8
8
 
9
- return unless uri.scheme.casecmp('https').zero?
9
+ return unless uri.is_a?(URI::HTTPS)
10
10
 
11
11
  @http.use_ssl = true
12
12
  @http.verify_mode = OpenSSL::SSL::VERIFY_NONE
13
13
  end
14
14
 
15
15
  def connectable?
16
- @http.open_timeout = ProxyFetcher.config.connection_timeout
17
- @http.read_timeout = ProxyFetcher.config.connection_timeout
16
+ @http.open_timeout = ProxyFetcher.config.timeout
17
+ @http.read_timeout = ProxyFetcher.config.timeout
18
18
 
19
19
  @http.start { |connection| return true if connection.request_head('/') }
20
20
 
@@ -7,9 +7,9 @@ module ProxyFetcher
7
7
  # Major version number
8
8
  MAJOR = 0
9
9
  # Minor version number
10
- MINOR = 4
10
+ MINOR = 5
11
11
  # Smallest version number
12
- TINY = 1
12
+ TINY = 0
13
13
 
14
14
  # Full version number
15
15
  STRING = [MAJOR, MINOR, TINY].compact.join('.')
@@ -5,7 +5,7 @@ require 'proxy_fetcher/version'
5
5
  Gem::Specification.new do |gem|
6
6
  gem.name = 'proxy_fetcher'
7
7
  gem.version = ProxyFetcher.gem_version
8
- gem.date = '2017-09-04'
8
+ gem.date = '2017-09-06'
9
9
  gem.summary = 'Ruby gem for dealing with proxy lists from different providers'
10
10
  gem.description = 'This gem can help your Ruby application to make HTTP(S) requests ' \
11
11
  'from proxy server by fetching and validating proxy lists from the different providers.'
@@ -0,0 +1,125 @@
1
+ require 'spec_helper'
2
+ require 'json'
3
+
4
+ require 'evil-proxy'
5
+ require 'evil-proxy/async'
6
+
7
+ describe ProxyFetcher::Client do
8
+ before :all do
9
+ ProxyFetcher.configure do |config|
10
+ config.provider = :xroxy
11
+ config.timeout = 5
12
+ end
13
+
14
+ @server = EvilProxy::MITMProxyServer.new Port: 3128, Quiet: true
15
+ @server.start
16
+ end
17
+
18
+ after :all do
19
+ @server.shutdown
20
+ end
21
+
22
+ # Use local proxy server in order to avoid side effects, non-working proxies, etc
23
+ before :each do
24
+ proxy = ProxyFetcher::Proxy.new(addr: '127.0.0.1', port: 3128, type: 'HTTP, HTTPS')
25
+ ProxyFetcher::Client::ProxiesRegistry.manager.instance_variable_set(:'@proxies', [proxy])
26
+ allow_any_instance_of(ProxyFetcher::Providers::Base).to receive(:fetch_proxies!).and_return([proxy])
27
+ end
28
+
29
+ context 'GET request with the valid proxy' do
30
+ it 'successfully returns page content for HTTP' do
31
+ content = ProxyFetcher::Client.get('http://httpbin.org')
32
+
33
+ expect(content).not_to be_nil
34
+ expect(content).not_to be_empty
35
+ end
36
+
37
+ it 'successfully returns page content for HTTPS' do
38
+ content = ProxyFetcher::Client.get('https://httpbin.org')
39
+
40
+ expect(content).not_to be_nil
41
+ expect(content).not_to be_empty
42
+ end
43
+ end
44
+
45
+ context 'POST request with the valid proxy' do
46
+ it 'successfully returns page content for HTTP' do
47
+ headers = {
48
+ 'X-Proxy-Fetcher-Version' => ProxyFetcher::VERSION::STRING
49
+ }
50
+ content = ProxyFetcher::Client.post('http://httpbin.org/post', { param: 'value' }, headers: headers)
51
+
52
+ expect(content).not_to be_nil
53
+ expect(content).not_to be_empty
54
+
55
+ json = JSON.parse(content)
56
+
57
+ expect(json['headers']['X-Proxy-Fetcher-Version']).to eq(ProxyFetcher::VERSION::STRING)
58
+ expect(json['headers']['User-Agent']).to eq(ProxyFetcher.config.user_agent)
59
+ end
60
+ end
61
+
62
+ context 'PUT request with the valid proxy' do
63
+ it 'successfully returns page content for HTTP' do
64
+ content = ProxyFetcher::Client.put('http://httpbin.org/put', 'param=PutValue')
65
+
66
+ expect(content).not_to be_nil
67
+ expect(content).not_to be_empty
68
+
69
+ json = JSON.parse(content)
70
+
71
+ expect(json['form']['param']).to eq('PutValue')
72
+ end
73
+ end
74
+
75
+ context 'PATCH request with the valid proxy' do
76
+ it 'successfully returns page content for HTTP' do
77
+ content = ProxyFetcher::Client.patch('http://httpbin.org/patch', param: 'value')
78
+
79
+ expect(content).not_to be_nil
80
+ expect(content).not_to be_empty
81
+
82
+ json = JSON.parse(content)
83
+
84
+ expect(json['form']['param']).to eq('value')
85
+ end
86
+ end
87
+
88
+ context 'DELETE request with the valid proxy' do
89
+ it 'successfully returns page content for HTTP' do
90
+ content = ProxyFetcher::Client.delete('http://httpbin.org/delete')
91
+
92
+ expect(content).not_to be_nil
93
+ expect(content).not_to be_empty
94
+ end
95
+ end
96
+
97
+ context 'HEAD request with the valid proxy' do
98
+ it 'successfully works' do
99
+ content = ProxyFetcher::Client.head('http://httpbin.org')
100
+
101
+ expect(content).to be_nil
102
+ end
103
+ end
104
+
105
+ context 'retries' do
106
+ it 'raises an error when reaches max retries limit' do
107
+ allow(ProxyFetcher::Client::Request).to receive(:execute).and_raise(StandardError)
108
+
109
+ expect { ProxyFetcher::Client.get('http://httpbin.org') }.to raise_error(ProxyFetcher::Exceptions::MaximumRetriesReached)
110
+ end
111
+ end
112
+
113
+ context 'redirects' do
114
+ it 'follows redirect when present' do
115
+ content = ProxyFetcher::Client.get('http://httpbin.org/absolute-redirect/2')
116
+
117
+ expect(content).not_to be_nil
118
+ expect(content).not_to be_empty
119
+ end
120
+
121
+ it 'raises an error when reaches max redirects limit' do
122
+ expect { ProxyFetcher::Client.get('http://httpbin.org/absolute-redirect/11') }.to raise_error(ProxyFetcher::Exceptions::MaximumRedirectsReached)
123
+ end
124
+ end
125
+ end
@@ -19,7 +19,7 @@ describe ProxyFetcher::Configuration do
19
19
  MyWrongHTTPClient = Class.new
20
20
 
21
21
  expect { ProxyFetcher.config.http_client = MyWrongHTTPClient }
22
- .to raise_error(ProxyFetcher::Configuration::WrongCustomClass)
22
+ .to raise_error(ProxyFetcher::Exceptions::WrongCustomClass)
23
23
  end
24
24
  end
25
25
 
@@ -38,21 +38,21 @@ describe ProxyFetcher::Configuration do
38
38
  MyWrongProxyValidator = Class.new
39
39
 
40
40
  expect { ProxyFetcher.config.proxy_validator = MyWrongProxyValidator }
41
- .to raise_error(ProxyFetcher::Configuration::WrongCustomClass)
41
+ .to raise_error(ProxyFetcher::Exceptions::WrongCustomClass)
42
42
  end
43
43
  end
44
44
 
45
45
  context 'custom provider' do
46
46
  it 'failed on registration if provider class already registered' do
47
47
  expect { ProxyFetcher::Configuration.register_provider(:xroxy, Class.new) }
48
- .to raise_error(ProxyFetcher::ProvidersRegistry::RegisteredProvider)
48
+ .to raise_error(ProxyFetcher::Exceptions::RegisteredProvider)
49
49
  end
50
50
 
51
51
  it "failed on proxy list fetching if provider doesn't registered" do
52
52
  ProxyFetcher.config.provider = :not_existing_provider
53
53
 
54
54
  expect { ProxyFetcher::Manager.new }
55
- .to raise_error(ProxyFetcher::ProvidersRegistry::UnknownProvider)
55
+ .to raise_error(ProxyFetcher::Exceptions::UnknownProvider)
56
56
  end
57
57
  end
58
58
  end
@@ -0,0 +1,9 @@
1
+ require 'spec_helper'
2
+
3
+ describe ProxyFetcher::Providers::GatherProxy do
4
+ before :all do
5
+ ProxyFetcher.config.provider = :gather_proxy
6
+ end
7
+
8
+ it_behaves_like 'a manager'
9
+ end
@@ -0,0 +1,9 @@
1
+ require 'spec_helper'
2
+
3
+ describe ProxyFetcher::Providers::HTTPTunnel do
4
+ before :all do
5
+ ProxyFetcher.config.provider = :http_tunnel
6
+ end
7
+
8
+ it_behaves_like 'a manager'
9
+ end
@@ -1,8 +1,10 @@
1
+ require 'simplecov'
2
+ SimpleCov.add_filter 'spec'
3
+
1
4
  if ENV['CI'] || ENV['TRAVIS'] || ENV['COVERALLS'] || ENV['JENKINS_URL']
2
5
  require 'coveralls'
3
6
  Coveralls.wear!
4
7
  else
5
- require 'simplecov'
6
8
  SimpleCov.start
7
9
  end
8
10
 
@@ -0,0 +1,26 @@
1
+ require 'evil-proxy'
2
+
3
+ EvilProxy::HTTPProxyServer.class_eval do
4
+ def do_PUT(req, res)
5
+ perform_proxy_request(req, res) do |http, path, header|
6
+ http.put(path, req.body || '', header)
7
+ end
8
+ end
9
+
10
+ def do_DELETE(req, res)
11
+ perform_proxy_request(req, res) do |http, path, header|
12
+ http.delete(path, header)
13
+ end
14
+ end
15
+
16
+ def do_PATCH(req, res)
17
+ perform_proxy_request(req, res) do |http, path, header|
18
+ http.patch(path, req.body || '', header)
19
+ end
20
+ end
21
+
22
+ # This method is not needed for PUT, but is added for completeness
23
+ def do_OPTIONS(_req, res)
24
+ res['allow'] = 'GET,HEAD,POST,OPTIONS,CONNECT,PUT,PATCH,DELETE'
25
+ end
26
+ end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: proxy_fetcher
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.4.1
4
+ version: 0.5.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Nikita Bulai
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2017-09-04 00:00:00.000000000 Z
11
+ date: 2017-09-06 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: nokogiri
@@ -63,13 +63,18 @@ files:
63
63
  - Rakefile
64
64
  - bin/proxy_fetcher
65
65
  - lib/proxy_fetcher.rb
66
+ - lib/proxy_fetcher/client/client.rb
67
+ - lib/proxy_fetcher/client/proxies_registry.rb
68
+ - lib/proxy_fetcher/client/request.rb
66
69
  - lib/proxy_fetcher/configuration.rb
67
70
  - lib/proxy_fetcher/configuration/providers_registry.rb
71
+ - lib/proxy_fetcher/exceptions.rb
68
72
  - lib/proxy_fetcher/manager.rb
69
73
  - lib/proxy_fetcher/providers/base.rb
70
74
  - lib/proxy_fetcher/providers/free_proxy_list.rb
71
75
  - lib/proxy_fetcher/providers/free_proxy_list_ssl.rb
72
- - lib/proxy_fetcher/providers/hide_my_name.rb
76
+ - lib/proxy_fetcher/providers/gather_proxy.rb
77
+ - lib/proxy_fetcher/providers/http_tunnel.rb
73
78
  - lib/proxy_fetcher/providers/proxy_docker.rb
74
79
  - lib/proxy_fetcher/providers/proxy_list.rb
75
80
  - lib/proxy_fetcher/providers/xroxy.rb
@@ -79,17 +84,20 @@ files:
79
84
  - lib/proxy_fetcher/utils/proxy_validator.rb
80
85
  - lib/proxy_fetcher/version.rb
81
86
  - proxy_fetcher.gemspec
87
+ - spec/proxy_fetcher/client_spec.rb
82
88
  - spec/proxy_fetcher/configuration_spec.rb
83
89
  - spec/proxy_fetcher/providers/base_spec.rb
84
90
  - spec/proxy_fetcher/providers/free_proxy_list_spec.rb
85
91
  - spec/proxy_fetcher/providers/free_proxy_list_ssl_spec.rb
86
- - spec/proxy_fetcher/providers/hide_my_name_spec.rb
92
+ - spec/proxy_fetcher/providers/gather_proxy_spec.rb
93
+ - spec/proxy_fetcher/providers/http_tunnel_spec.rb
87
94
  - spec/proxy_fetcher/providers/multiple_providers_spec.rb
88
95
  - spec/proxy_fetcher/providers/proxy_docker_spec.rb
89
96
  - spec/proxy_fetcher/providers/proxy_list_spec.rb
90
97
  - spec/proxy_fetcher/providers/xroxy_spec.rb
91
98
  - spec/proxy_fetcher/proxy_spec.rb
92
99
  - spec/spec_helper.rb
100
+ - spec/support/evil_proxy_patch.rb
93
101
  - spec/support/manager_examples.rb
94
102
  homepage: http://github.com/nbulaj/proxy_fetcher
95
103
  licenses:
@@ -1,35 +0,0 @@
1
- module ProxyFetcher
2
- module Providers
3
- class HideMyName < Base
4
- PROVIDER_URL = 'https://hidemy.name/en/proxy-list/'.freeze
5
-
6
- def load_proxy_list(filters = { type: 'hs' })
7
- doc = load_document(PROVIDER_URL, filters)
8
- doc.xpath('//table[@class="proxy__t"]/tbody/tr')
9
- end
10
-
11
- def to_proxy(html_element)
12
- ProxyFetcher::Proxy.new.tap do |proxy|
13
- proxy.addr = parse_element(html_element, 'td[1]')
14
- proxy.port = convert_to_int(parse_element(html_element, 'td[2]'))
15
- proxy.anonymity = parse_element(html_element, 'td[6]')
16
- proxy.country = parse_country(html_element)
17
- proxy.type = parse_element(html_element, 'td[5]')
18
- proxy.response_time = parse_response_time(html_element)
19
- end
20
- end
21
-
22
- private
23
-
24
- def parse_country(element)
25
- clear(element.at_xpath('*//span[1]/following-sibling::text()[1]').content)
26
- end
27
-
28
- def parse_response_time(element)
29
- convert_to_int(element.at_xpath('td[4]').content.strip[/\d+/])
30
- end
31
- end
32
-
33
- ProxyFetcher::Configuration.register_provider(:hide_my_name, HideMyName)
34
- end
35
- end
@@ -1,10 +0,0 @@
1
- require 'spec_helper'
2
-
3
- describe ProxyFetcher::Providers::HideMyName do
4
- before :all do
5
- ProxyFetcher.config.provider = :hide_my_name
6
- end
7
-
8
- # TODO: fix provider
9
- # it_behaves_like 'a manager'
10
- end