proxy_fetcher 0.2.5 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: a0c631e566e3330cda9a53d1ecbaa38cc0c4e055
4
- data.tar.gz: e60250b46b1db6cc1f1d77efa8351b8b422c1f7e
3
+ metadata.gz: 2082cc216a388f9014cbdec5daa8a54ac1d93016
4
+ data.tar.gz: 6d0d817f9b1fecdbc3512440803a4444bcaf3c67
5
5
  SHA512:
6
- metadata.gz: a0a8e44ed225617a7110933054111824bfacbc7f32698bb465a51d85b1870a23ac0cf792f9d1af0128f4bba1eb489ee68288317807b8b631cef2900a4e8ae1e6
7
- data.tar.gz: 525adfd9031c16d5f709405c1eb70e0dac13e285709e59afb1d6c20755fe3c580dd011a5c773cc35defce5c3dbf565261b4aa23e3a74ffe987afa60733e985d5
6
+ metadata.gz: f3a33d56dd95b4c3a7755a8f76f780b702af03de5a546b78cf3bfa56fad96b877169d02ca4c4f83dcfdcee8ee7a1efd67983ead21580b0ea4bb94c78db92da3a
7
+ data.tar.gz: b0019c53bed440256de36853a9b9973f58bd2fd019e90ca1b8178c727ad64572a3a0b3fc4ded49c64d6f4134935b72f124ba4c8780f9ed01b3412bb91c953499
data/README.md CHANGED
@@ -2,6 +2,7 @@
2
2
  [![Gem Version](https://badge.fury.io/rb/proxy_fetcher.svg)](http://badge.fury.io/rb/proxy_fetcher)
3
3
  [![Build Status](https://travis-ci.org/nbulaj/proxy_fetcher.svg?branch=master)](https://travis-ci.org/nbulaj/proxy_fetcher)
4
4
  [![Coverage Status](https://coveralls.io/repos/github/nbulaj/proxy_fetcher/badge.svg)](https://coveralls.io/github/nbulaj/proxy_fetcher)
5
+ [![Code Climate](https://codeclimate.com/github/nbulaj/proxy_fetcher/badges/gpa.svg)](https://codeclimate.com/github/nbulaj/proxy_fetcher)
5
6
  [![License](http://img.shields.io/badge/license-MIT-brightgreen.svg)](#license)
6
7
 
7
8
  This gem can help your Ruby application to make HTTP(S) requests from proxy by fetching and validating actual
@@ -10,12 +11,15 @@ proxy lists from the different providers like [HideMyName](https://hidemy.name/e
10
11
  It gives you a `Manager` class that can load proxy list, validate it and return random or specific proxy entry. Take a look
11
12
  at the documentation below to find all the gem features.
12
13
 
14
+ This gem can also be used as a standalone solution for downloading and validating proxy lists from different providers.
15
+ Check out the usage examples below.
16
+
13
17
  ## Installation
14
18
 
15
19
  If using bundler, first add 'proxy_fetcher' to your Gemfile:
16
20
 
17
21
  ```ruby
18
- gem 'proxy_fetcher', '~> 0.2'
22
+ gem 'proxy_fetcher', '~> 0.3'
19
23
  ```
20
24
 
21
25
  or if you want to use the latest version (from `master` branch), then:
@@ -33,12 +37,14 @@ bundle install
33
37
  Otherwise simply install the gem:
34
38
 
35
39
  ```sh
36
- gem install proxy_fetcher -v '0.2'
40
+ gem install proxy_fetcher -v '0.3'
37
41
  ```
38
42
 
39
43
  ## Example of usage
40
44
 
41
- Get current proxy list:
45
+ ### In Ruby application
46
+
47
+ Get current proxy list without validation:
42
48
 
43
49
  ```ruby
44
50
  manager = ProxyFetcher::Manager.new # will immediately load proxy list from the server
@@ -48,7 +54,7 @@ manager.proxies
48
54
  # @response_time=5217, @speed=48, @type="HTTP", @anonymity="High">, ... ]
49
55
  ```
50
56
 
51
- You can initialize proxy manager without loading proxy list from the remote server by passing `refresh: false` on initialization:
57
+ You can initialize the proxy manager without immediately loading the proxy list from the remote server by passing `refresh: false` on initialization:
52
58
 
53
59
  ```ruby
54
60
  manager = ProxyFetcher::Manager.new(refresh: false) # just initialize class instance
@@ -57,7 +63,13 @@ manager.proxies
57
63
  #=> []
58
64
  ```
59
65
 
60
- Get raw proxy URLs:
66
+ If you want to remove dead servers that do not respond to requests from the current proxy list, just call the `cleanup!` method:
67
+
68
+ ```ruby
69
+ manager.cleanup! # or manager.validate!
70
+ ```
71
+
72
+ Get raw proxy URLs as Strings:
61
73
 
62
74
  ```ruby
63
75
  manager = ProxyFetcher::Manager.new
@@ -76,6 +88,58 @@ manager.refresh_list! # or manager.fetch!
76
88
  # @response_time=5217, @speed=48, @type="HTTP", @anonymity="High">, ... ]
77
89
  ```
78
90
 
91
+ If you need to filter the proxy list, for example by country or response time, and the selected provider supports filtering by GET params, then you
92
+ can pass your filters to the Manager instance like this:
93
+
94
+ ```ruby
95
+ ProxyFetcher.config.provider = :hide_my_name
96
+
97
+ manager = ProxyFetcher::Manager.new(filters: { country: 'AO', maxtime: '500' })
98
+ manager.proxies
99
+
100
+ # => [...]
101
+ ```
102
+
103
+ *NOTE*: not all the providers support filtering. Take a look at the provider class to see if it supports custom filters.
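As a rough illustration of how such filters could end up in the provider URL as GET params, the sketch below builds a query string with Ruby's standard `URI.encode_www_form`. The helper name `build_filtered_url` is hypothetical and not part of the gem; the base URL is the HideMyName list page used by the default provider.

```ruby
require 'uri'

# Hypothetical helper (not part of proxy_fetcher): appends a filters hash
# to a provider URL as GET params.
def build_filtered_url(base_url, filters = {})
  uri = URI.parse(base_url)
  uri.query = URI.encode_www_form(filters) unless filters.empty?
  uri.to_s
end

puts build_filtered_url('https://hidemy.name/en/proxy-list/', { country: 'AO', maxtime: '500' })
# prints "https://hidemy.name/en/proxy-list/?country=AO&maxtime=500"
```

Which filter keys a provider accepts depends on the remote website, so the keys above are only the ones shown in the README example.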
104
+
105
+ You can use two methods to get the first proxy from the list:
106
+
107
+ * `get` or aliased `pop` (will return first proxy and move it to the end of the list)
108
+ * `get!` or aliased `pop!` (will return first **connectable** proxy and move it to the end of the list; all the proxies till the working one will be removed)
109
+
110
+ Or you can get a random proxy by calling `manager.random_proxy` or its alias `manager.random`.
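The difference between `get` and `get!` can be sketched without the gem. Everything below is a hypothetical stand-in written for illustration: `FakeProxy` replaces `ProxyFetcher::Proxy`, its `alive` flag fakes the connectivity check, and `RotatingList` only imitates the documented rotation behaviour. What the real manager does when no proxy is connectable is not stated in the README, so returning `nil` here is an assumption.

```ruby
# Stand-in for ProxyFetcher::Proxy; `alive` fakes the connectivity check.
FakeProxy = Struct.new(:url, :alive) do
  def connectable?
    alive
  end
end

# Hypothetical re-implementation of the documented behaviour.
class RotatingList
  attr_reader :proxies

  def initialize(proxies)
    @proxies = proxies
  end

  # Return the first proxy and move it to the end of the list.
  def get
    return if @proxies.empty?

    first = @proxies.shift
    @proxies << first
    first
  end

  # Remove every proxy in front of the first connectable one,
  # then return that proxy and move it to the end of the list.
  def get!
    index = @proxies.find_index(&:connectable?)
    if index.nil?
      @proxies.clear # assumption: nothing usable is left
      return
    end

    @proxies = @proxies.drop(index)
    get
  end
end

list = RotatingList.new([
  FakeProxy.new('http://10.0.0.1:8080', false),
  FakeProxy.new('http://10.0.0.2:8080', true),
  FakeProxy.new('http://10.0.0.3:8080', true)
])

puts list.get!.url            # prints "http://10.0.0.2:8080"
puts list.proxies.map(&:url)  # 10.0.0.3 first, then 10.0.0.2; the dead 10.0.0.1 is gone
```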
111
+
112
+ ### Standalone
113
+
114
+ All you need to use this gem is Ruby >= 2.2.2 (2.3 is recommended). You can install it in different ways. If you are using Ubuntu Xenial (16.04LTS)
115
+ then you already have Ruby 2.3 installed. In other cases you can install it with [RVM](https://rvm.io/) or [rbenv](https://github.com/rbenv/rbenv).
116
+
117
+ Just install the gem by running `gem install proxy_fetcher` in your terminal and run it:
118
+
119
+ ```bash
120
+ proxy_fetcher >> proxies.txt # Will download proxies, validate them and write to file
121
+ ```
122
+
123
+ If you need a list of proxies in JSON, then pass the `--json` argument to the command:
124
+
125
+ ```bash
126
+ proxy_fetcher --json
127
+
128
+ # Will print:
129
+ # {"proxies":["https://120.26.206.178:8888","https://119.61.13.242:1080","https://117.40.213.26:1080","https://92.62.72.242:1080",
130
+ # "https://58.20.41.172:1080","https://204.116.192.151:35923","https://190.5.96.58:1080","https://170.250.109.97:35923",
131
+ # "https://121.41.82.99:1080","https://77.53.105.155:35923"]}
132
+
133
+ ```
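A consumer of that JSON output might look roughly like the sketch below. It assumes the payload has the single `proxies` key shown in the sample output; the `output` string is a shortened copy of that sample rather than a live result.

```ruby
require 'json'
require 'uri'

# Shortened copy of the sample `proxy_fetcher --json` output.
output = '{"proxies":["https://120.26.206.178:8888","https://119.61.13.242:1080"]}'

proxies = JSON.parse(output).fetch('proxies')

# Split every proxy URL into host and port.
endpoints = proxies.map do |url|
  uri = URI.parse(url)
  [uri.host, uri.port]
end

endpoints.each { |host, port| puts "#{host} #{port}" }
```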
134
+
135
+ To get all the possible options run:
136
+
137
+ ```bash
138
+ proxy_fetcher --help
139
+ ```
140
+
141
+ ## Proxy object
142
+
79
143
  Every proxy is a `ProxyFetcher::Proxy` object that has next readers (instance variables):
80
144
 
81
145
  * `addr` (IP address)
@@ -84,7 +148,7 @@ Every proxy is a `ProxyFetcher::Proxy` object that has next readers (instance va
84
148
  * `response_time` (5217 for example)
85
149
  * `speed` (`:slow`, `:medium` or `:fast`. **Note:** depends on the proxy provider and can be `nil`)
86
150
  * `type` (URI schema, HTTP or HTTPS)
87
- * `anonimity` (Low or High +KA for example)
151
+ * `anonymity` (`Low`, `Elite proxy` or `High +KA` for example)
88
152
 
89
153
  Also you can call next instance methods for every Proxy object:
90
154
 
@@ -94,18 +158,7 @@ Also you can call next instance methods for every Proxy object:
94
158
  * `uri` (returns `URI::Generic` object)
95
159
  * `url` (returns a formatted URL like "_http://IP:PORT_" )
96
160
 
97
- You can use two methods to get the first proxy from the list:
98
-
99
- * `get` or aliased `pop` (will return first proxy and move it to the end of the list)
100
- * `get!` or aliased `pop!` (will return first **connectable** proxy and move it to the end of the list; all the proxies till the working one will be removed)
101
-
102
- If you wanna clear current proxy manager list from dead servers, you can just call `cleanup!` method:
103
-
104
- ```ruby
105
- manager.cleanup! # or manager.validate!
106
- ```
107
-
108
- You can sort or find any proxy by speed using next 3 instance methods:
161
+ You can sort or find any proxy by speed using the following 3 instance methods (if speed is available for the specific provider):
109
162
 
110
163
  * `fast?`
111
164
  * `medium?`
@@ -117,26 +170,27 @@ To change open/read timeout for `cleanup!` and `connectable?` methods you need t
117
170
 
118
171
  ```ruby
119
172
  ProxyFetcher.configure do |config|
120
- config.read_timeout = 1 # default is 3
121
- config.open_timeout = 1 # default is 3
173
+ config.connection_timeout = 1 # default is 3
122
174
  end
123
175
 
124
176
  manager = ProxyFetcher::Manager.new
125
177
  manager.cleanup!
126
178
  ```
127
179
 
128
- ProxyFetcher uses simple Ruby solution for dealing with HTTP requests - `net/http` library. If you wanna add, for example, your custom provider that
129
- was developed as a Single Page Application (SPA) with some JavaScript, then you will need something like []selenium-webdriver](https://github.com/SeleniumHQ/selenium/tree/master/rb)
180
+ ProxyFetcher uses a simple Ruby solution for dealing with HTTP(S) requests: the `net/http` library from the stdlib. If you want to add, for example, a custom provider for a website that
181
+ was developed as a Single Page Application (SPA) with some JavaScript, then you will need something like [selenium-webdriver](https://github.com/SeleniumHQ/selenium/tree/master/rb)
130
182
  to properly load the content of the website. For those and other cases you can write your own class for fetching HTML content by the URL and setup it
131
183
  in the ProxyFetcher config:
132
184
 
133
185
  ```ruby
134
186
  class MyHTTPClient
135
- class << self
136
- # [IMPORTANT]: self.fetch method is required!
137
- def fetch(url)
138
- # ... some magic to return proper HTML ...
139
- end
187
+ # [IMPORTANT]: the methods below are required!
188
+ def self.fetch(url)
189
+ # ... some magic to return proper HTML ...
190
+ end
191
+
192
+ def self.connectable?(url)
193
+ # ... some magic to check if url is connectable ...
140
194
  end
141
195
  end
142
196
 
@@ -149,14 +203,17 @@ manager.proxies
149
203
  # @response_time=5217, @speed=48, @type="HTTP", @anonymity="High">, ... ]
150
204
  ```
151
205
 
206
+ You can take a look at the [lib/proxy_fetcher/utils/http_client.rb](lib/proxy_fetcher/utils/http_client.rb) for an example.
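For orientation, a minimal custom client built only on the stdlib could look like the sketch below. The class name `SketchHTTPClient` and the `TIMEOUT` constant are hypothetical; the only contract taken from the README is that the class responds to `fetch(url)` and `connectable?(url)`.

```ruby
require 'net/http'
require 'uri'

# Hypothetical custom client; only the two class methods are required by the gem.
class SketchHTTPClient
  TIMEOUT = 3 # seconds; an arbitrary value chosen for this sketch

  # Return the response body for the given URL.
  def self.fetch(url)
    Net::HTTP.get(URI.parse(url))
  end

  # Return true if a HEAD request to the host succeeds within the timeout,
  # false on any error (invalid URL, refused connection, timeout, ...).
  def self.connectable?(url)
    uri = URI.parse(url)
    Net::HTTP.start(uri.host, uri.port,
                    use_ssl: uri.scheme == 'https',
                    open_timeout: TIMEOUT,
                    read_timeout: TIMEOUT) do |http|
      http.request_head('/')
    end
    true
  rescue StandardError
    false
  end
end
```

It would then be registered with `ProxyFetcher.config.http_client = SketchHTTPClient`. A client for JavaScript-heavy pages would replace the body of `fetch` with a browser-driven implementation.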
207
+
152
208
  ## Providers
153
209
 
154
210
  Currently ProxyFetcher can deal with next proxy providers (services):
155
211
 
156
212
  * Hide My Name (default one)
157
213
  * Free Proxy List
158
- * SSL Proxies
214
+ * Free SSL Proxies
159
215
  * Proxy Docker
216
+ * Proxy List
160
217
  * XRoxy
161
218
 
162
219
  If you wanna use one of them just setup required in the config:
@@ -176,14 +233,8 @@ Also you can write your own provider. All you need is to create a class, that wo
176
233
  ProxyFetcher::Configuration.register_provider(:your_provider, YourProviderClass)
177
234
  ```
178
235
 
179
- Provider class must implement `self.load_proxy_list` and `#parse!(html_entry)` methods that will load and parse
180
- provider HTML page with proxy list. Take a look at the samples in the `proxy_fetcher/providers` directory.
181
-
182
- ## TODO
183
-
184
- * Add proxy filters
185
- * Code refactoring
186
- * Rewrite specs
236
+ Provider class must implement `#load_proxy_list` and `#to_proxy(html_element)` instance methods that will load and parse
237
+ provider HTML page with proxy list. Take a look at the existing providers in the [lib/proxy_fetcher/providers](lib/proxy_fetcher/providers) directory.
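The shape of such a provider can be sketched without the gem or Nokogiri. In the example below, `SketchBase` is a hypothetical stand-in for `ProxyFetcher::Providers::Base` that keeps only its `fetch_proxies!` template method, `SketchProxy` stands in for `ProxyFetcher::Proxy`, and `PlainTextProvider` reads hard-coded "host:port" lines instead of a real HTML page.

```ruby
# Stand-in for ProxyFetcher::Proxy.
SketchProxy = Struct.new(:addr, :port, :type)

# Stand-in for ProxyFetcher::Providers::Base: loads raw entries and
# converts each of them into a proxy object.
class SketchBase
  def fetch_proxies!(filters = {})
    load_proxy_list(filters).map { |entry| to_proxy(entry) }
  end
end

# Hypothetical provider that parses "host:port" lines.
class PlainTextProvider < SketchBase
  SAMPLE = "10.0.0.1:8080\n10.0.0.2:3128\n".freeze

  # Return the raw entries; filters are ignored in this sketch.
  def load_proxy_list(*)
    SAMPLE.lines.map(&:strip).reject(&:empty?)
  end

  # Convert one raw entry into a proxy object.
  def to_proxy(entry)
    host, port = entry.split(':')
    SketchProxy.new(host, Integer(port), 'HTTP')
  end
end

proxies = PlainTextProvider.new.fetch_proxies!
proxies.each { |proxy| puts "#{proxy.type} #{proxy.addr}:#{proxy.port}" }
```

A real provider would inherit from `ProxyFetcher::Providers::Base`, return HTML nodes from `load_proxy_list` and be registered with `ProxyFetcher::Configuration.register_provider`.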
187
238
 
188
239
  ## Contributing
189
240
 
@@ -206,8 +257,6 @@ Thanks.
206
257
 
207
258
  ## License
208
259
 
209
- proxy_fetcher gem is released under the [MIT License](http://www.opensource.org/licenses/MIT).
260
+ `proxy_fetcher` gem is released under the [MIT License](http://www.opensource.org/licenses/MIT).
210
261
 
211
262
  Copyright (c) 2017 Nikita Bulai (bulajnikita@gmail.com).
212
-
213
- Some parser code (c) [pifleo](https://gist.github.com/pifleo/3889803)
data/bin/proxy_fetcher ADDED
@@ -0,0 +1,57 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'optparse'
4
+ require 'proxy_fetcher'
5
+
6
+ options = {
7
+ validate: true,
8
+ json: false
9
+ }
10
+
11
+ OptionParser.new do |opts|
12
+ opts.banner = 'Usage: proxy_fetcher [OPTIONS]'
13
+
14
+ opts.on('-h', '--help', '# Show this help message and quit') do
15
+ puts opts
16
+ exit(0)
17
+ end
18
+
19
+ opts.on('-p', '--provider=NAME', '# Use specific proxy provider') do |value|
20
+ provider_name = value.downcase
21
+
22
+ unless ProxyFetcher::Configuration.providers.include?(provider_name.to_sym)
23
+ possible_providers = ProxyFetcher::Configuration.providers.keys
24
+
25
+ puts "Unknown provider - '#{value}'.\nUse one of the following: #{possible_providers.join(', ')}."
26
+ exit(0)
27
+ end
28
+
29
+ options[:provider] = provider_name
30
+ end
31
+
32
+ opts.on('-n', '--no-validate', '# Dump all the proxies without validation') do
33
+ options[:validate] = false
34
+ end
35
+
36
+ opts.on('-t', '--timeout=SECONDS', Integer, '# Connection timeout in seconds') do |value|
37
+ options[:timeout] = value
38
+ end
39
+
40
+ opts.on('-j', '--json', '# Dump proxies to the JSON format') do
41
+ options[:json] = true
42
+ end
43
+ end.parse!
44
+
45
+ ProxyFetcher.config.provider = options[:provider] if options[:provider]
46
+ ProxyFetcher.config.connection_timeout = options[:timeout] if options[:timeout]
47
+
48
+ manager = ProxyFetcher::Manager.new
49
+ manager.validate! if options[:validate]
50
+
51
+ if options[:json]
52
+ require 'json'
53
+
54
+ puts JSON.generate(proxies: manager.raw_proxies)
55
+ else
56
+ puts manager.raw_proxies
57
+ end
data/lib/proxy_fetcher.rb CHANGED
@@ -1,16 +1,22 @@
1
1
  require 'uri'
2
2
  require 'net/http'
3
+ require 'openssl'
3
4
  require 'nokogiri'
5
+ require 'ostruct'
4
6
 
5
7
  require 'proxy_fetcher/configuration'
6
8
  require 'proxy_fetcher/proxy'
7
9
  require 'proxy_fetcher/manager'
8
- require 'proxy_fetcher/utils/http_fetcher'
10
+
11
+ require 'proxy_fetcher/utils/http_client'
12
+ require 'proxy_fetcher/utils/html'
13
+
9
14
  require 'proxy_fetcher/providers/base'
10
15
  require 'proxy_fetcher/providers/free_proxy_list'
11
16
  require 'proxy_fetcher/providers/free_proxy_list_ssl'
12
17
  require 'proxy_fetcher/providers/hide_my_name'
13
18
  require 'proxy_fetcher/providers/proxy_docker'
19
+ require 'proxy_fetcher/providers/proxy_list'
14
20
  require 'proxy_fetcher/providers/xroxy'
15
21
 
16
22
  module ProxyFetcher
@@ -2,9 +2,10 @@ module ProxyFetcher
2
2
  class Configuration
3
3
  UnknownProvider = Class.new(StandardError)
4
4
  RegisteredProvider = Class.new(StandardError)
5
+ WrongHttpClient = Class.new(StandardError)
5
6
 
6
- attr_accessor :open_timeout, :read_timeout, :provider
7
- attr_accessor :http_client
7
+ attr_accessor :http_client, :connection_timeout
8
+ attr_accessor :provider
8
9
 
9
10
  class << self
10
11
  def providers
@@ -12,15 +13,18 @@ module ProxyFetcher
12
13
  end
13
14
 
14
15
  def register_provider(name, klass)
15
- raise RegisteredProvider, "#{name} provider already registered!" if providers.key?(name.to_sym)
16
+ raise RegisteredProvider, "`#{name}` provider already registered!" if providers.key?(name.to_sym)
16
17
 
17
18
  providers[name.to_sym] = klass
18
19
  end
19
20
  end
20
21
 
21
22
  def initialize
22
- @open_timeout = 3
23
- @read_timeout = 3
23
+ reset!
24
+ end
25
+
26
+ def reset!
27
+ @connection_timeout = 3
24
28
  @http_client = HTTPClient
25
29
 
26
30
  self.provider = :hide_my_name # currently default one
@@ -29,7 +33,15 @@ module ProxyFetcher
29
33
  def provider=(name)
30
34
  @provider = self.class.providers[name.to_sym]
31
35
 
32
- raise UnknownProvider, "unregistered proxy provider (#{name})!" if @provider.nil?
36
+ raise UnknownProvider, "unregistered proxy provider `#{name}`!" if @provider.nil?
37
+ end
38
+
39
+ def http_client=(klass)
40
+ unless klass.respond_to?(:fetch) && klass.respond_to?(:connectable?)
41
+ raise WrongHttpClient, "#{klass} must respond to #fetch and #connectable? class methods!"
42
+ end
43
+
44
+ @http_client = klass
33
45
  end
34
46
  end
35
47
  end
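The `http_client=` setter requires both `fetch` and `connectable?`. One detail worth keeping in mind when writing such a check: Ruby's `respond_to?` takes a single method name plus an optional `include_all` flag, so a second symbol is not treated as another method name. The sketch below shows the difference; `valid_http_client?`, `OnlyFetch` and `FullClient` are hypothetical names used only for this illustration.

```ruby
REQUIRED_CLIENT_METHODS = %i[fetch connectable?].freeze

# Hypothetical validator: true only if the class responds to every required method.
def valid_http_client?(klass)
  REQUIRED_CLIENT_METHODS.all? { |name| klass.respond_to?(name) }
end

# Implements only one of the two required class methods.
class OnlyFetch
  def self.fetch(url)
    url
  end
end

# Implements both required class methods.
class FullClient
  def self.fetch(url)
    url
  end

  def self.connectable?(_url)
    true
  end
end

# The second argument is interpreted as the include_all flag, not as a method name.
puts OnlyFetch.respond_to?(:fetch, :connectable?) # prints "true"
puts valid_http_client?(OnlyFetch)                # prints "false"
puts valid_http_client?(FullClient)               # prints "true"
```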
@@ -1,10 +1,12 @@
1
1
  module ProxyFetcher
2
2
  class Manager
3
- attr_reader :proxies
3
+ attr_reader :proxies, :filters
4
4
 
5
5
  # refresh: true - load proxy list from the remote server on initialization
6
6
  # refresh: false - just initialize the class, proxy list will be empty ([])
7
- def initialize(refresh: true)
7
+ def initialize(refresh: true, filters: {})
8
+ @filters = filters
9
+
8
10
  if refresh
9
11
  refresh_list!
10
12
  else
@@ -14,8 +16,7 @@ module ProxyFetcher
14
16
 
15
17
  # Update current proxy list from the provider
16
18
  def refresh_list!
17
- rows = ProxyFetcher.config.provider.load_proxy_list
18
- @proxies = rows.map { |row| Proxy.new(row) }
19
+ @proxies = ProxyFetcher.config.provider.fetch_proxies!(filters)
19
20
  end
20
21
 
21
22
  alias fetch! refresh_list!
@@ -56,10 +57,12 @@ module ProxyFetcher
56
57
  alias validate! cleanup!
57
58
 
58
59
  # Return random proxy
59
- def random
60
+ def random_proxy
60
61
  proxies.sample
61
62
  end
62
63
 
64
+ alias random random_proxy
65
+
63
66
  # Returns array of proxy URLs (just schema + host + port)
64
67
  def raw_proxies
65
68
  proxies.map(&:url)
@@ -1,25 +1,52 @@
1
+ require 'forwardable'
2
+
1
3
  module ProxyFetcher
2
4
  module Providers
3
5
  class Base
4
- attr_reader :proxy
6
+ extend Forwardable
5
7
 
6
- def initialize(proxy_instance)
7
- @proxy = proxy_instance
8
- end
8
+ def_delegators ProxyFetcher::HTML, :clear, :convert_to_int
9
+
10
+ PROXY_TYPES = [
11
+ HTTP = 'HTTP'.freeze,
12
+ HTTPS = 'HTTPS'.freeze
13
+ ].freeze
9
14
 
10
- def set!(name, value)
11
- @proxy.instance_variable_set(:"@#{name}", value)
15
+ attr_reader :proxy
16
+
17
+ def fetch_proxies!(filters = {})
18
+ load_proxy_list(filters).map { |html| to_proxy(html) }
12
19
  end
13
20
 
14
21
  class << self
15
- def parse_entry(entry, proxy_instance)
16
- new(proxy_instance).parse!(entry)
22
+ def fetch_proxies!(filters = {})
23
+ new.fetch_proxies!(filters)
17
24
  end
25
+ end
18
26
 
19
- # Get HTML from the requested URL
20
- def load_html(url)
21
- ProxyFetcher.config.http_client.fetch(url)
22
- end
27
+ protected
28
+
29
+ # Loads HTML document with Nokogiri by the URL combined with custom filters
30
+ def load_document(url, filters = {})
31
+ uri = URI.parse(url)
32
+ uri.query = URI.encode_www_form(filters) if filters.any?
33
+
34
+ Nokogiri::HTML(ProxyFetcher.config.http_client.fetch(uri.to_s))
35
+ end
36
+
37
+ # Get HTML elements with proxy info
38
+ def load_proxy_list(*)
39
+ raise NotImplementedError, "#{__method__} must be implemented in a descendant class!"
40
+ end
41
+
42
+ # Convert HTML element with proxy info to ProxyFetcher::Proxy instance
43
+ def to_proxy(*)
44
+ raise NotImplementedError, "#{__method__} must be implemented in a descendant class!"
45
+ end
46
+
47
+ # Return normalized HTML element content by selector
48
+ def parse_element(element, selector, method = :at_xpath)
49
+ clear(element.public_send(method, selector).content)
23
50
  end
24
51
  end
25
52
  end
@@ -3,42 +3,27 @@ module ProxyFetcher
3
3
  class FreeProxyList < Base
4
4
  PROVIDER_URL = 'https://free-proxy-list.net/'.freeze
5
5
 
6
- class << self
7
- def load_proxy_list
8
- doc = Nokogiri::HTML(load_html(PROVIDER_URL))
9
- doc.xpath('//table[@id="proxylisttable"]/tbody/tr')
10
- end
6
+ # [NOTE] Doesn't support filtering
7
+ def load_proxy_list(*)
8
+ doc = load_document(PROVIDER_URL, {})
9
+ doc.xpath('//table[@id="proxylisttable"]/tbody/tr')
11
10
  end
12
11
 
13
- def parse!(html_entry)
14
- html_entry.xpath('td').each_with_index do |td, index|
15
- case index
16
- when 0
17
- set!(:addr, td.content.strip)
18
- when 1 then
19
- set!(:port, Integer(td.content.strip))
20
- when 3 then
21
- set!(:country, td.content.strip)
22
- when 4
23
- set!(:anonymity, td.content.strip)
24
- when 6
25
- set!(:type, parse_type(td))
26
- else
27
- # nothing
28
- end
12
+ def to_proxy(html_element)
13
+ ProxyFetcher::Proxy.new.tap do |proxy|
14
+ proxy.addr = parse_element(html_element, 'td[1]')
15
+ proxy.port = convert_to_int(parse_element(html_element, 'td[2]'))
16
+ proxy.country = parse_element(html_element, 'td[4]')
17
+ proxy.anonymity = parse_element(html_element, 'td[5]')
18
+ proxy.type = parse_type(html_element)
29
19
  end
30
20
  end
31
21
 
32
22
  private
33
23
 
34
- def parse_type(td)
35
- type = td.content.strip
36
-
37
- if type && type.downcase.include?('yes')
38
- 'HTTPS'
39
- else
40
- 'HTTP'
41
- end
24
+ def parse_type(element)
25
+ type = parse_element(element, 'td[6]')
26
+ type && type.casecmp('yes').zero? ? HTTPS : HTTP
42
27
  end
43
28
  end
44
29
 
@@ -3,29 +3,19 @@ module ProxyFetcher
3
3
  class FreeProxyListSSL < Base
4
4
  PROVIDER_URL = 'https://www.sslproxies.org/'.freeze
5
5
 
6
- class << self
7
- def load_proxy_list
8
- doc = Nokogiri::HTML(load_html(PROVIDER_URL))
9
- doc.xpath('//table[@id="proxylisttable"]/tbody/tr')
10
- end
6
+ # [NOTE] Doesn't support filtering
7
+ def load_proxy_list(*)
8
+ doc = load_document(PROVIDER_URL, {})
9
+ doc.xpath('//table[@id="proxylisttable"]/tbody/tr')
11
10
  end
12
11
 
13
- def parse!(html_entry)
14
- html_entry.xpath('td').each_with_index do |td, index|
15
- case index
16
- when 0
17
- set!(:addr, td.content.strip)
18
- when 1 then
19
- set!(:port, Integer(td.content.strip))
20
- when 3 then
21
- set!(:country, td.content.strip)
22
- when 4
23
- set!(:anonymity, td.content.strip)
24
- when 6
25
- set!(:type, 'HTTPS')
26
- else
27
- # nothing
28
- end
12
+ def to_proxy(html_element)
13
+ ProxyFetcher::Proxy.new.tap do |proxy|
14
+ proxy.addr = parse_element(html_element, 'td[1]')
15
+ proxy.port = convert_to_int(parse_element(html_element, 'td[2]'))
16
+ proxy.country = parse_element(html_element, 'td[4]')
17
+ proxy.anonymity = parse_element(html_element, 'td[5]')
18
+ proxy.type = HTTPS
29
19
  end
30
20
  end
31
21
  end
@@ -1,51 +1,49 @@
1
1
  module ProxyFetcher
2
2
  module Providers
3
3
  class HideMyName < Base
4
- PROVIDER_URL = 'https://hidemy.name/en/proxy-list/?type=hs'.freeze
4
+ PROVIDER_URL = 'https://hidemy.name/en/proxy-list/'.freeze
5
5
 
6
- class << self
7
- def load_proxy_list
8
- doc = Nokogiri::HTML(load_html(PROVIDER_URL))
9
- doc.xpath('//table[@class="proxy__t"]/tbody/tr')
10
- end
6
+ def load_proxy_list(filters = { type: 'hs' })
7
+ doc = load_document(PROVIDER_URL, filters)
8
+ doc.xpath('//table[@class="proxy__t"]/tbody/tr')
11
9
  end
12
10
 
13
- def parse!(html_entry)
14
- html_entry.xpath('td').each_with_index do |td, index|
15
- case index
16
- when 0
17
- set!(:addr, td.content.strip)
18
- when 1 then
19
- set!(:port, Integer(td.content.strip))
20
- when 2 then
21
- set!(:country, td.at_xpath('*//span[1]/following-sibling::text()[1]').content.strip)
22
- when 3
23
- response_time = Integer(td.at('p').content.strip[/\d+/])
24
-
25
- set!(:response_time, response_time)
26
- set!(:speed, speed_from_response_time(response_time))
27
- when 4
28
- set!(:type, parse_type(td))
29
- when 5
30
- set!(:anonymity, td.content.strip)
31
- else
32
- # nothing
33
- end
11
+ def to_proxy(html_element)
12
+ ProxyFetcher::Proxy.new.tap do |proxy|
13
+ proxy.addr = parse_element(html_element, 'td[1]')
14
+ proxy.port = convert_to_int(parse_element(html_element, 'td[2]'))
15
+ proxy.anonymity = parse_element(html_element, 'td[6]')
16
+
17
+ proxy.country = parse_country(html_element)
18
+ proxy.type = parse_type(html_element)
19
+
20
+ response_time = parse_response_time(html_element)
21
+
22
+ proxy.response_time = response_time
23
+ proxy.speed = speed_from_response_time(response_time)
34
24
  end
35
25
  end
36
26
 
37
27
  private
38
28
 
39
- def parse_type(td)
40
- schemas = td.content.strip
29
+ def parse_country(element)
30
+ clear(element.at_xpath('*//span[1]/following-sibling::text()[1]').content)
31
+ end
32
+
33
+ def parse_type(element)
34
+ schemas = parse_element(element, 'td[5]')
41
35
 
42
36
  if schemas && schemas.downcase.include?('https')
43
- 'HTTPS'
37
+ HTTPS
44
38
  else
45
- 'HTTP'
39
+ HTTP
46
40
  end
47
41
  end
48
42
 
43
+ def parse_response_time(element)
44
+ convert_to_int(element.at_xpath('td[4]').content.strip[/\d+/])
45
+ end
46
+
49
47
  def speed_from_response_time(response_time)
50
48
  if response_time < 1500
51
49
  :fast
@@ -3,30 +3,21 @@ module ProxyFetcher
3
3
  class ProxyDocker < Base
4
4
  PROVIDER_URL = 'https://www.proxydocker.com/en'.freeze
5
5
 
6
- class << self
7
- def load_proxy_list
8
- doc = Nokogiri::HTML(load_html(PROVIDER_URL))
9
- doc.xpath('//table[contains(@class, "table")]/tr[(not(@id="proxy-table-header")) and (count(td)>2)]')
10
- end
6
+ # [NOTE] Doesn't support direct filters
7
+ def load_proxy_list(*)
8
+ doc = load_document(PROVIDER_URL, {})
9
+ doc.xpath('//table[contains(@class, "table")]/tr[(not(@id="proxy-table-header")) and (count(td)>2)]')
11
10
  end
12
11
 
13
- def parse!(html_entry)
14
- html_entry.xpath('td').each_with_index do |td, index|
15
- case index
16
- when 0
17
- uri = URI("//#{td.content.strip}")
12
+ def to_proxy(html_element)
13
+ ProxyFetcher::Proxy.new.tap do |proxy|
14
+ uri = URI("//#{parse_element(html_element, 'td[1]')}")
15
+ proxy.addr = uri.host
16
+ proxy.port = uri.port
18
17
 
19
- set!(:addr, uri.host)
20
- set!(:port, uri.port)
21
- when 1
22
- set!(:type, td.content.strip)
23
- when 2
24
- set!(:anonymity, td.content.strip)
25
- when 4 then
26
- set!(:country, td.content.strip)
27
- else
28
- # nothing
29
- end
18
+ proxy.type = parse_element(html_element, 'td[2]')
19
+ proxy.anonymity = parse_element(html_element, 'td[3]')
20
+ proxy.country = parse_element(html_element, 'td[5]')
30
21
  end
31
22
  end
32
23
  end
@@ -0,0 +1,35 @@
1
+ require 'base64'
2
+
3
+ module ProxyFetcher
4
+ module Providers
5
+ class ProxyList < Base
6
+ PROVIDER_URL = 'https://proxy-list.org/english/index.php'.freeze
7
+
8
+ def load_proxy_list(filters = {})
9
+ doc = load_document(PROVIDER_URL, filters)
10
+ doc.css('.table-wrap .table ul')
11
+ end
12
+
13
+ def to_proxy(html_element)
14
+ ProxyFetcher::Proxy.new.tap do |proxy|
15
+ uri = parse_proxy_uri(html_element)
16
+ proxy.addr = uri.host
17
+ proxy.port = uri.port
18
+
19
+ proxy.type = parse_element(html_element, 'li[2]')
20
+ proxy.anonymity = parse_element(html_element, 'li[4]')
21
+ proxy.country = clear(html_element.at_xpath("li[5]//span[@class='country']").attr('title'))
22
+ end
23
+ end
24
+
25
+ private
26
+
27
+ def parse_proxy_uri(element)
28
+ full_addr = ::Base64.decode64(element.at('li script').inner_html.match(/'(.+)'/)[1])
29
+ URI.parse("http://#{full_addr}")
30
+ end
31
+ end
32
+
33
+ ProxyFetcher::Configuration.register_provider(:proxy_list, ProxyList)
34
+ end
35
+ end
@@ -1,34 +1,21 @@
1
1
  module ProxyFetcher
2
2
  module Providers
3
3
  class XRoxy < Base
4
- PROVIDER_URL = 'http://www.xroxy.com/proxylist.php?port=&type=All_http'.freeze
4
+ PROVIDER_URL = 'http://www.xroxy.com/proxylist.php'.freeze
5
5
 
6
- class << self
7
- def load_proxy_list
8
- doc = Nokogiri::HTML(load_html(PROVIDER_URL))
9
- doc.xpath('//div[@id="content"]/table[1]/tr[contains(@class, "row")]')
10
- end
6
+ def load_proxy_list(filters = { type: 'All_http' })
7
+ doc = load_document(PROVIDER_URL, filters)
8
+ doc.xpath('//div[@id="content"]/table[1]/tr[contains(@class, "row")]')
11
9
  end
12
10
 
13
- def parse!(html_entry)
14
- html_entry.xpath('td').each_with_index do |td, index|
15
- case index
16
- when 1
17
- set!(:addr, td.content.strip)
18
- when 2
19
- set!(:port, Integer(td.content.strip))
20
- when 3
21
- set!(:anonymity, td.content.strip)
22
- when 4
23
- ssl = td.content.strip.downcase
24
- set!(:type, ssl.include?('true') ? 'HTTPS' : 'HTTP')
25
- when 5 then
26
- set!(:country, td.content.strip)
27
- when 6
28
- set!(:response_time, Integer(td.content.strip))
29
- else
30
- # nothing
31
- end
11
+ def to_proxy(html_element)
12
+ ProxyFetcher::Proxy.new.tap do |proxy|
13
+ proxy.addr = parse_element(html_element, 'td[2]')
14
+ proxy.port = convert_to_int(parse_element(html_element, 'td[3]'))
15
+ proxy.anonymity = parse_element(html_element, 'td[4]')
16
+ proxy.type = parse_element(html_element, 'td[5]').casecmp('true').zero? ? HTTPS : HTTP
17
+ proxy.country = parse_element(html_element, 'td[6]')
18
+ proxy.response_time = convert_to_int(parse_element(html_element, 'td[7]'))
32
19
  end
33
20
  end
34
21
  end
@@ -1,24 +1,7 @@
1
1
  module ProxyFetcher
2
- class Proxy
3
- attr_reader :addr, :port, :country, :response_time, :speed, :type, :anonymity
4
-
5
- def initialize(html_row)
6
- ProxyFetcher.config.provider.parse_entry(html_row, self)
7
-
8
- self
9
- end
10
-
2
+ class Proxy < OpenStruct
11
3
  def connectable?
12
- connection = Net::HTTP.new(addr, port)
13
- connection.use_ssl = true if https?
14
- connection.open_timeout = ProxyFetcher.config.open_timeout
15
- connection.read_timeout = ProxyFetcher.config.read_timeout
16
-
17
- connection.start { |http| return true if http.request_head('/') }
18
-
19
- false
20
- rescue Timeout::Error, Errno::ECONNREFUSED, Errno::ECONNRESET, Errno::ECONNABORTED
21
- false
4
+ ProxyFetcher.config.http_client.connectable?(url)
22
5
  end
23
6
 
24
7
  alias valid? connectable?
@@ -0,0 +1,15 @@
1
+ module ProxyFetcher
2
+ class HTML
3
+ class << self
4
+ def clear(text)
5
+ return if text.nil? || text.empty?
6
+
7
+ text.strip.gsub(/[ \t]/i, '')
8
+ end
9
+
10
+ def convert_to_int(text)
11
+ Integer(clear(text))
12
+ end
13
+ end
14
+ end
15
+ end
@@ -0,0 +1,46 @@
1
+ module ProxyFetcher
2
+ class HTTPClient
3
+ attr_reader :uri, :http
4
+
5
+ def initialize(url)
6
+ @uri = URI.parse(url)
7
+ @http = Net::HTTP.new(@uri.host, @uri.port)
8
+ return unless https?
9
+
10
+ @http.use_ssl = true
11
+ @http.verify_mode = OpenSSL::SSL::VERIFY_NONE
12
+ end
13
+
14
+ def fetch
15
+ request = Net::HTTP::Get.new(@uri.to_s)
16
+ request['Connection'] = 'keep-alive'
17
+ response = @http.request(request)
18
+ response.body
19
+ end
20
+
21
+ def connectable?
22
+ @http.open_timeout = ProxyFetcher.config.connection_timeout
23
+ @http.read_timeout = ProxyFetcher.config.connection_timeout
24
+
25
+ @http.start { |connection| return true if connection.request_head('/') }
26
+
27
+ false
28
+ rescue StandardError
29
+ false
30
+ end
31
+
32
+ def https?
33
+ @uri.scheme.casecmp('https').zero?
34
+ end
35
+
36
+ class << self
37
+ def fetch(url)
38
+ new(url).fetch
39
+ end
40
+
41
+ def connectable?(url)
42
+ new(url).connectable?
43
+ end
44
+ end
45
+ end
46
+ end
@@ -7,9 +7,9 @@ module ProxyFetcher
7
7
  # Major version number
8
8
  MAJOR = 0
9
9
  # Minor version number
10
- MINOR = 2
10
+ MINOR = 3
11
11
  # Smallest version number
12
- TINY = 5
12
+ TINY = 0
13
13
 
14
14
  # Full version number
15
15
  STRING = [MAJOR, MINOR, TINY].compact.join('.')
@@ -5,14 +5,16 @@ require 'proxy_fetcher/version'
5
5
  Gem::Specification.new do |gem|
6
6
  gem.name = 'proxy_fetcher'
7
7
  gem.version = ProxyFetcher.gem_version
8
- gem.date = '2017-08-17'
9
- gem.summary = 'Ruby gem for dealing with proxy lists '
8
+ gem.date = '2017-08-21'
9
+ gem.summary = 'Ruby gem for dealing with proxy lists from different providers'
10
10
  gem.description = 'This gem can help your Ruby application to make HTTP(S) requests ' \
11
11
  'from proxy server by fetching and validating proxy lists from the different providers.'
12
12
  gem.authors = ['Nikita Bulai']
13
13
  gem.email = 'bulajnikita@gmail.com'
14
14
  gem.require_paths = ['lib']
15
+ gem.bindir = 'bin'
15
16
  gem.files = `git ls-files`.split($RS)
17
+ gem.executables = `git ls-files -- bin/*`.split("\n").map { |f| File.basename(f) }
16
18
  gem.homepage = 'http://github.com/nbulaj/proxy_fetcher'
17
19
  gem.license = 'MIT'
18
20
  gem.required_ruby_version = '>= 2.2.2'
@@ -0,0 +1,48 @@
1
+ require 'spec_helper'
2
+
3
+ describe ProxyFetcher::Configuration do
4
+ before { ProxyFetcher.config.reset! }
5
+ after { ProxyFetcher.config.reset! }
6
+
7
+ context 'custom HTTP client' do
8
+ it 'successfully setups if class has all the required methods' do
9
+ class MyHTTPClient
10
+ def self.fetch(url)
11
+ url
12
+ end
13
+
14
+ def self.connectable?(*)
15
+ true
16
+ end
17
+ end
18
+
19
+ expect { ProxyFetcher.config.http_client = MyHTTPClient }.not_to raise_error
20
+ end
21
+
22
+ it 'failed on setup if required methods are missing' do
23
+ MyWrongHTTPClient = Class.new
24
+
25
+ expect { ProxyFetcher.config.http_client = MyWrongHTTPClient }
26
+ .to raise_error(ProxyFetcher::Configuration::WrongHttpClient)
27
+ end
28
+ end
29
+
30
+ context 'custom provider' do
31
+ it 'successfully setups if provider class registered' do
32
+ CustomProvider = Class.new(ProxyFetcher::Providers::Base)
33
+ ProxyFetcher::Configuration.register_provider(:custom_provider, CustomProvider)
34
+
35
+ expect { ProxyFetcher.config.provider = :custom_provider }.not_to raise_error
36
+ end
37
+
38
+ it 'failed on setup if provider class is not registered' do
39
+ expect { ProxyFetcher.config.provider = :unexisting_provider }
40
+ .to raise_error(ProxyFetcher::Configuration::UnknownProvider)
41
+ end
42
+
43
+ it 'failed on setup if provider class already registered' do
44
+ expect { ProxyFetcher::Configuration.register_provider(:xroxy, Class.new)}
45
+ .to raise_error(ProxyFetcher::Configuration::RegisteredProvider)
46
+ end
47
+ end
48
+ end
data/spec/proxy_fetcher/providers/base_spec.rb ADDED
@@ -0,0 +1,28 @@
1
+ require 'spec_helper'
2
+
3
+ describe ProxyFetcher::Providers::Base do
4
+ before { ProxyFetcher.config.reset! }
5
+ after { ProxyFetcher.config.reset! }
6
+
7
+ it 'does not allows to use not implemented methods' do
8
+ NotImplementedCustomProvider = Class.new(ProxyFetcher::Providers::Base)
9
+
10
+ ProxyFetcher::Configuration.register_provider(:provider_without_methods, NotImplementedCustomProvider)
11
+ ProxyFetcher.config.provider = :provider_without_methods
12
+
13
+ expect { ProxyFetcher::Manager.new }.to raise_error(NotImplementedError) do |error|
14
+ expect(error.message).to include('load_proxy_list')
15
+ end
16
+
17
+ # implement one of the methods
18
+ NotImplementedCustomProvider.class_eval do
19
+ def load_proxy_list(*)
20
+ [1, 2, 3]
21
+ end
22
+ end
23
+
24
+ expect { ProxyFetcher::Manager.new }.to raise_error(NotImplementedError) do |error|
25
+ expect(error.message).to include('to_proxy')
26
+ end
27
+ end
28
+ end
data/spec/proxy_fetcher/providers/proxy_list_spec.rb ADDED
@@ -0,0 +1,9 @@
1
+ require 'spec_helper'
2
+
3
+ describe ProxyFetcher::Providers::ProxyList do
4
+ before :all do
5
+ ProxyFetcher.config.provider = :proxy_list
6
+ end
7
+
8
+ it_behaves_like 'a manager'
9
+ end
data/spec/proxy_fetcher/proxy_spec.rb CHANGED
@@ -9,24 +9,24 @@ describe ProxyFetcher::Proxy do
9
9
  @manager = ProxyFetcher::Manager.new
10
10
  end
11
11
 
12
- let(:proxy) { @manager.proxies.first }
12
+ let(:proxy) { @manager.proxies.first.dup }
13
13
 
14
14
  it 'checks schema' do
15
- proxy.instance_variable_set(:@type, 'HTTP')
15
+ proxy.type = ProxyFetcher::Providers::Base::HTTP
16
16
  expect(proxy.http?).to be_truthy
17
17
  expect(proxy.https?).to be_falsey
18
18
 
19
- proxy.instance_variable_set(:@type, 'HTTPS')
19
+ proxy.type = ProxyFetcher::Providers::Base::HTTPS
20
20
  expect(proxy.https?).to be_truthy
21
21
  expect(proxy.http?).to be_falsey
22
22
  end
23
23
 
24
24
  it 'not connectable if IP addr is wrong' do
25
- allow_any_instance_of(ProxyFetcher::Proxy).to receive(:addr).and_return('192.168.1.1')
25
+ proxy.addr = '192.168.1.0'
26
26
  expect(proxy.connectable?).to be_falsey
27
27
  end
28
28
 
29
- it 'not connectable if ERR' do
29
+ it 'not connectable if there are some error during connection request' do
30
30
  allow_any_instance_of(Net::HTTP).to receive(:start).and_raise(Errno::ECONNABORTED)
31
31
  expect(proxy.connectable?).to be_falsey
32
32
  end
@@ -46,13 +46,13 @@ describe ProxyFetcher::Proxy do
46
46
  end
47
47
 
48
48
  it 'checks speed' do
49
- proxy.instance_variable_set(:@speed, :fast)
49
+ proxy.speed = :fast
50
50
  expect(proxy.fast?).to be_truthy
51
51
 
52
- proxy.instance_variable_set(:@speed, :slow)
52
+ proxy.speed = :slow
53
53
  expect(proxy.slow?).to be_truthy
54
54
 
55
- proxy.instance_variable_set(:@speed, :medium)
55
+ proxy.speed = :medium
56
56
  expect(proxy.medium?).to be_truthy
57
57
  end
58
58
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: proxy_fetcher
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.5
4
+ version: 0.3.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Nikita Bulai
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2017-08-17 00:00:00.000000000 Z
11
+ date: 2017-08-21 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: nokogiri
@@ -47,7 +47,8 @@ dependencies:
47
47
  description: This gem can help your Ruby application to make HTTP(S) requests from
48
48
  proxy server by fetching and validating proxy lists from the different providers.
49
49
  email: bulajnikita@gmail.com
50
- executables: []
50
+ executables:
51
+ - proxy_fetcher
51
52
  extensions: []
52
53
  extra_rdoc_files: []
53
54
  files:
@@ -58,6 +59,7 @@ files:
58
59
  - LICENSE
59
60
  - README.md
60
61
  - Rakefile
62
+ - bin/proxy_fetcher
61
63
  - lib/proxy_fetcher.rb
62
64
  - lib/proxy_fetcher/configuration.rb
63
65
  - lib/proxy_fetcher/manager.rb
@@ -66,15 +68,20 @@ files:
66
68
  - lib/proxy_fetcher/providers/free_proxy_list_ssl.rb
67
69
  - lib/proxy_fetcher/providers/hide_my_name.rb
68
70
  - lib/proxy_fetcher/providers/proxy_docker.rb
71
+ - lib/proxy_fetcher/providers/proxy_list.rb
69
72
  - lib/proxy_fetcher/providers/xroxy.rb
70
73
  - lib/proxy_fetcher/proxy.rb
71
- - lib/proxy_fetcher/utils/http_fetcher.rb
74
+ - lib/proxy_fetcher/utils/html.rb
75
+ - lib/proxy_fetcher/utils/http_client.rb
72
76
  - lib/proxy_fetcher/version.rb
73
77
  - proxy_fetcher.gemspec
78
+ - spec/proxy_fetcher/configuration_spec.rb
79
+ - spec/proxy_fetcher/providers/base_spec.rb
74
80
  - spec/proxy_fetcher/providers/free_proxy_list_spec.rb
75
81
  - spec/proxy_fetcher/providers/free_proxy_list_ssl_spec.rb
76
82
  - spec/proxy_fetcher/providers/hide_my_name_spec.rb
77
83
  - spec/proxy_fetcher/providers/proxy_docker_spec.rb
84
+ - spec/proxy_fetcher/providers/proxy_list_spec.rb
78
85
  - spec/proxy_fetcher/providers/xroxy_spec.rb
79
86
  - spec/proxy_fetcher/proxy_spec.rb
80
87
  - spec/spec_helper.rb
@@ -102,5 +109,5 @@ rubyforge_project:
102
109
  rubygems_version: 2.6.11
103
110
  signing_key:
104
111
  specification_version: 4
105
- summary: Ruby gem for dealing with proxy lists
112
+ summary: Ruby gem for dealing with proxy lists from different providers
106
113
  test_files: []
data/lib/proxy_fetcher/utils/http_fetcher.rb DELETED
@@ -1,24 +0,0 @@
1
- module ProxyFetcher
2
- class HTTPClient
3
- attr_reader :http
4
-
5
- def initialize(url)
6
- @uri = URI.parse(url)
7
- @http = Net::HTTP.new(@uri.host, @uri.port)
8
- @http.use_ssl = true if @uri.scheme.downcase == 'https'
9
- end
10
-
11
- def fetch
12
- request = Net::HTTP::Get.new(@uri.to_s)
13
- request['Connection'] = 'keep-alive'
14
- response = @http.request(request)
15
- response.body
16
- end
17
-
18
- class << self
19
- def fetch(url)
20
- new(url).fetch
21
- end
22
- end
23
- end
24
- end