proxy_fetcher 0.5.1 → 0.6.0

This diff shows the changes between two publicly released versions of the package, as they appear in the public registry. It is provided for informational purposes only.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: eb17af3b54c59182794340dd9e07351e3107c380
4
- data.tar.gz: f86f209533d12f4a536ac4a5ae07d48a8afc9e90
3
+ metadata.gz: 6a8a3fe3140e235a3b46ecdb64322c77c8ce69d4
4
+ data.tar.gz: 9ace00e654e55832242e050ee42d01642b26338c
5
5
  SHA512:
6
- metadata.gz: 0ecf1b6ee299550228ea651a220bcc9d85f3e383844974416862d567634a35d41d3650499ee6a8695af9b26c53fea3bba35a57336b1cfc795996df10c301751b
7
- data.tar.gz: dcb39302fa9b2b43869be96484794d273a579546f92c27d5103f535d2fe8768220ab3a781d8373a58e11ebe5a2abe9efb13c772c32b088aa393bf890ca66c385
6
+ metadata.gz: 961dd103ae502f947a7417b248ba7cbfe8e5907880bbe2b2b123b568de6f703f39ba6a2da281dbb139a62703d9f2a26f4bb02be6128df6b09457542aa7235ba3
7
+ data.tar.gz: d80b53cdfb9f67c76edf4176ab6bd3daa8151d814f67e82405b53495ecab4be0d963232e010efe333a5ef1dfd177251367b9d620d9cc78306fccfc2ef137b837
data/.gitignore CHANGED
@@ -15,6 +15,7 @@ pickle-email-*.html
15
15
  Gemfile.lock
16
16
  *.gem
17
17
  certs
18
+ gemfiles/*.gemfile.lock
18
19
 
19
20
  # TODO Comment out this rule if you are OK with secrets being uploaded to the repo
20
21
  config/initializers/secret_token.rb
data/.rubocop.yml CHANGED
@@ -1,7 +1,7 @@
1
1
  LineLength:
2
2
  Max: 120
3
3
  AllCops:
4
- TargetRubyVersion: 2.4
4
+ TargetRubyVersion: 2.1
5
5
  Exclude:
6
6
  - 'spec/**/*'
7
7
  - 'bin/*'
data/.travis.yml CHANGED
@@ -2,6 +2,12 @@ language: ruby
2
2
  before_install: gem install bundler
3
3
  bundler_args: --without yard guard benchmarks
4
4
  script: "rake spec"
5
+ env:
6
+ global:
7
+ - "JRUBY_OPTS='$JRUBY_OPTS --debug'"
8
+ gemfile:
9
+ - gemfiles/oga.gemfile
10
+ - gemfiles/nokogiri.gemfile
5
11
  rvm:
6
12
  - 2.0
7
13
  - 2.1
@@ -12,3 +18,6 @@ rvm:
12
18
  matrix:
13
19
  allow_failures:
14
20
  - rvm: ruby-head
21
+ exclude:
22
+ - rvm: 2.0
23
+ gemfile: gemfiles/nokogiri.gemfile # Nokogiri doesn't support Ruby 2.0
data/Gemfile CHANGED
@@ -2,7 +2,10 @@ source 'https://rubygems.org'
2
2
 
3
3
  gemspec
4
4
 
5
+ gem 'nokogiri', '~> 1.8'
6
+ gem 'oga', '~> 2.0'
7
+
5
8
  group :test do
6
9
  gem 'coveralls', require: false
7
- gem 'evil-proxy'
10
+ gem 'evil-proxy', '~> 0.2'
8
11
  end
data/README.md CHANGED
@@ -5,17 +5,19 @@
5
5
  [![Code Climate](https://codeclimate.com/github/nbulaj/proxy_fetcher/badges/gpa.svg)](https://codeclimate.com/github/nbulaj/proxy_fetcher)
6
6
  [![License](http://img.shields.io/badge/license-MIT-brightgreen.svg)](#license)
7
7
 
8
- This gem can help your Ruby application to make HTTP(S) requests from proxy by fetching and validating actual
8
+ This gem can help your Ruby application to make HTTP(S) requests using proxy by fetching and validating actual
9
9
  proxy lists from multiple providers.
10
10
 
11
- It gives you a `Manager` class that can load proxy lists, validate them and return random or specific proxies. Take a look
12
- at the documentation below to find all the gem features.
11
+ It gives you a special `Manager` class that can load proxy lists, validate them and return random or specific proxies.
12
+ It also has a `Client` class that encapsulates all the logic for the sending HTTP requests using proxies.
13
+ Take a look at the documentation below to find all the gem features.
13
14
 
14
15
  Also this gem can be used with any other programming language (Go / Python / etc) as standalone solution for downloading and
15
16
  validating proxy lists from the different providers. [Checkout examples](#standalone) of usage below.
16
17
 
17
18
  ## Table of Contents
18
19
 
20
+ - [Dependencies](#dependencies)
19
21
  - [Installation](#installation)
20
22
  - [Example of usage](#example-of-usage)
21
23
  - [In Ruby application](#in-ruby-application)
@@ -28,12 +30,24 @@ validating proxy lists from the different providers. [Checkout examples](#standa
28
30
  - [Contributing](#contributing)
29
31
  - [License](#license)
30
32
 
33
+ ## Dependencies
34
+
35
+ ProxyFetcher gem itself requires only Ruby `>= 2.0.0`.
36
+
37
+ However, it requires an adapter to parse HTML. If you do not specify any specific adapter, then it will use
38
+ default one - [Nokogiri](https://github.com/sparklemotion/nokogiri). It's OK for any Ruby on Rails project
39
+ (because they uses it by default).
40
+
41
+ But if you want to use some specific adapter (for example your Ruby application uses [Oga](https://gitlab.com/yorickpeterse/oga),
42
+ then you need to manually add your dependencies to your project and configure ProxyFetcher to use another adapter. Moreover,
43
+ you can implement your own adapter if it your use-case. Take a look at the [Configuration](#configuration) section for more details.
44
+
31
45
  ## Installation
32
46
 
33
47
  If using bundler, first add 'proxy_fetcher' to your Gemfile:
34
48
 
35
49
  ```ruby
36
- gem 'proxy_fetcher', '~> 0.5'
50
+ gem 'proxy_fetcher', '~> 0.6'
37
51
  ```
38
52
 
39
53
  or if you want to use the latest version (from `master` branch), then:
@@ -234,7 +248,25 @@ Btw, if you need support of JavaScript or some other features, you need to imple
234
248
 
235
249
  ## Configuration
236
250
 
237
- To change open/read timeout for `cleanup!` and `connectable?` methods you need to change `ProxyFetcher.config`:
251
+ ProxyFetcher is very flexible gem. You can configure the most important parts of the library and use your own solutions.
252
+
253
+ Default configuration looks as follows:
254
+
255
+ ```ruby
256
+ ProxyFetcher.configure do |config|
257
+ config.user_agent = ProxyFetcher::Configuration::DEFAULT_USER_AGENT
258
+ config.pool_size = 10
259
+ config.timeout = 3
260
+ config.http_client = ProxyFetcher::HTTPClient
261
+ config.proxy_validator = ProxyFetcher::ProxyValidator
262
+ config.providers = ProxyFetcher::Configuration.registered_providers
263
+ config.adapter = ProxyFetcher::Configuration::DEFAULT_ADAPTER # :nokogiri by default
264
+ end
265
+ ```
266
+
267
+ You can change any of the options above. Let's look at this deeper.
268
+
269
+ To change open/read timeout for `cleanup!` and `connectable?` methods you need to change `timeout` options:
238
270
 
239
271
  ```ruby
240
272
  ProxyFetcher.configure do |config|
@@ -245,7 +277,7 @@ manager = ProxyFetcher::Manager.new
245
277
  manager.cleanup!
246
278
  ```
247
279
 
248
- Also you can set your custom User-Agent:
280
+ Also you can set your custom User-Agent string:
249
281
 
250
282
  ```ruby
251
283
  ProxyFetcher.configure do |config|
@@ -253,10 +285,11 @@ ProxyFetcher.configure do |config|
253
285
  end
254
286
  ```
255
287
 
256
- ProxyFetcher uses simple Ruby solution for dealing with HTTP(S) requests - `net/http` library from the stdlib. If you wanna add, for example, your custom provider that
257
- was developed as a Single Page Application (SPA) with some JavaScript, then you will need something like [selenium-webdriver](https://github.com/SeleniumHQ/selenium/tree/master/rb)
258
- to properly load the content of the website. For those and other cases you can write your own class for fetching HTML content by the URL and setup it
259
- in the ProxyFetcher config:
288
+ ProxyFetcher uses standard Ruby solution for dealing with HTTP(S) requests - `net/http` library from the Ruby core.
289
+ If you wanna add, for example, your custom provider that was developed as a Single Page Application (SPA) with some JavaScript,
290
+ then you will need something like [selenium-webdriver](https://github.com/SeleniumHQ/selenium/tree/master/rb) to properly
291
+ load the content of the website. For those and other cases you can write your own class for fetching HTML content by
292
+ the URL and setup it in the ProxyFetcher config:
260
293
 
261
294
  ```ruby
262
295
  class MyHTTPClient
@@ -300,6 +333,21 @@ manager.validate!
300
333
  #=> [ ... ]
301
334
  ```
302
335
 
336
+ Be default, ProxyFetcher gem uses [Nokogiri](https://github.com/sparklemotion/nokogiri) for parsing HTML. If you want
337
+ to use [Oga](https://gitlab.com/yorickpeterse/oga) instead, then you need to add `gem 'oga'` to your Gemfile and configure
338
+ ProxyFetcher as follows:
339
+
340
+ ```ruby
341
+ ProxyFetcher.config.adapter = :oga
342
+ ```
343
+
344
+ Also you can write your own HTML parser implementation and use it, take a look at the [abstract class and implementations](lib/proxy_fetcher/document).
345
+ Configure it as:
346
+
347
+ ```ruby
348
+ ProxyFetcher.config.adapter = MyHTMLParserClass
349
+ ```
350
+
303
351
  ### Proxy validation speed
304
352
 
305
353
  There are some tricks to increase proxy list validation performance.
@@ -0,0 +1,11 @@
1
+ source 'https://rubygems.org'
2
+
3
+ gemspec path: '../'
4
+
5
+ gem 'nokogiri', '~> 1.8'
6
+
7
+ group :test do
8
+ gem 'coveralls', require: false
9
+ gem 'evil-proxy', '~> 0.2'
10
+ gem 'rspec-rails', '~> 3.6'
11
+ end
@@ -0,0 +1,11 @@
1
+ source 'https://rubygems.org'
2
+
3
+ gemspec path: '../'
4
+
5
+ gem 'oga', '~> 2.0'
6
+
7
+ group :test do
8
+ gem 'coveralls', require: false
9
+ gem 'evil-proxy', '~> 0.2'
10
+ gem 'rspec-rails', '~> 3.6'
11
+ end
@@ -1,11 +1,13 @@
1
1
  module ProxyFetcher
2
2
  class Configuration
3
- attr_accessor :providers, :timeout, :pool_size, :user_agent
4
- attr_accessor :http_client, :proxy_validator
3
+ attr_accessor :timeout, :pool_size, :user_agent
4
+ attr_reader :adapter, :http_client, :proxy_validator, :providers
5
5
 
6
6
  # rubocop:disable Metrics/LineLength
7
7
  DEFAULT_USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112 Safari/537.36'.freeze
8
8
 
9
+ DEFAULT_ADAPTER = :nokogiri
10
+
9
11
  class << self
10
12
  def providers_registry
11
13
  @registry ||= ProvidersRegistry.new
@@ -35,6 +37,11 @@ module ProxyFetcher
35
37
  self.providers = self.class.registered_providers
36
38
  end
37
39
 
40
+ def adapter=(name_or_class)
41
+ @adapter = ProxyFetcher::Document::Adapters.lookup(name_or_class)
42
+ @adapter.setup!
43
+ end
44
+
38
45
  def providers=(value)
39
46
  @providers = Array(value)
40
47
  end
@@ -0,0 +1,31 @@
1
+ module ProxyFetcher
2
+ class Document
3
+ class AbstractAdapter
4
+ attr_reader :document
5
+
6
+ def initialize(document)
7
+ @document = document
8
+ end
9
+
10
+ # You can override this method in your own adapter class
11
+ def xpath(selector)
12
+ document.xpath(selector)
13
+ end
14
+
15
+ # You can override this method in your own adapter class
16
+ def css(selector)
17
+ document.css(selector)
18
+ end
19
+
20
+ def proxy_node
21
+ self.class.const_get('Node')
22
+ end
23
+
24
+ def self.setup!(*args)
25
+ install_requirements!(*args)
26
+ rescue LoadError => error
27
+ raise Exceptions::AdapterSetupError.new(name, error.message)
28
+ end
29
+ end
30
+ end
31
+ end
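Review note on `AbstractAdapter.setup!`: the method is a small template-method hook. A subclass only says what to `require`, and the base class converts a failed `require` into a descriptive error. The sketch below reproduces that shape without the gem; `SketchSetupError`, `SketchAdapter`, `StdlibAdapter` and `MissingGemAdapter` are hypothetical names used only for illustration.

```ruby
# Hypothetical stand-in for ProxyFetcher::Exceptions::AdapterSetupError.
class SketchSetupError < StandardError
  def initialize(adapter_name, reason)
    super("can't set up '#{adapter_name}' adapter: #{reason}")
  end
end

# Hypothetical stand-in for the abstract adapter in this hunk.
class SketchAdapter
  # Subclasses implement install_requirements!; a LoadError raised there
  # is translated into SketchSetupError carrying the class name.
  def self.setup!(*args)
    install_requirements!(*args)
  rescue LoadError => error
    raise SketchSetupError.new(name, error.message)
  end
end

class StdlibAdapter < SketchAdapter
  def self.install_requirements!
    require 'json' # part of the standard library, so this succeeds
  end
end

class MissingGemAdapter < SketchAdapter
  def self.install_requirements!
    require 'surely_not_an_installed_gem_for_this_sketch'
  end
end

StdlibAdapter.setup! # no error

begin
  MissingGemAdapter.setup!
rescue SketchSetupError => e
  puts e.message
end
```

`LoadError` does not inherit from `StandardError`, so a bare `rescue` would not catch it; naming the class explicitly, as the hunk does, is what makes the translation work.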
@@ -0,0 +1,35 @@
1
+ module ProxyFetcher
2
+ class Document
3
+ class NokogiriAdapter < AbstractAdapter
4
+ def self.install_requirements!
5
+ require 'nokogiri'
6
+ end
7
+
8
+ def self.parse(data)
9
+ new(::Nokogiri::HTML(data))
10
+ end
11
+
12
+ class Node < ProxyFetcher::Document::Node
13
+ def at_xpath(*args)
14
+ self.class.new(node.at_xpath(*args))
15
+ end
16
+
17
+ def at_css(*args)
18
+ self.class.new(node.at_css(*args))
19
+ end
20
+
21
+ def attr(*args)
22
+ clear(node.attr(*args))
23
+ end
24
+
25
+ def content
26
+ clear(node.content)
27
+ end
28
+
29
+ def html
30
+ node.inner_html
31
+ end
32
+ end
33
+ end
34
+ end
35
+ end
@@ -0,0 +1,35 @@
1
+ module ProxyFetcher
2
+ class Document
3
+ class OgaAdapter < AbstractAdapter
4
+ def self.install_requirements!
5
+ require 'oga'
6
+ end
7
+
8
+ def self.parse(data)
9
+ new(::Oga.parse_html(data))
10
+ end
11
+
12
+ class Node < ProxyFetcher::Document::Node
13
+ def at_xpath(*args)
14
+ self.class.new(node.at_xpath(*args))
15
+ end
16
+
17
+ def at_css(*args)
18
+ self.class.new(node.at_css(*args))
19
+ end
20
+
21
+ def attr(*args)
22
+ clear(node.attribute(*args).value)
23
+ end
24
+
25
+ def content
26
+ clear(node.text)
27
+ end
28
+
29
+ def html
30
+ node.to_xml
31
+ end
32
+ end
33
+ end
34
+ end
35
+ end
@@ -0,0 +1,24 @@
1
+ module ProxyFetcher
2
+ class Document
3
+ class Adapters
4
+ ADAPTER = 'Adapter'.freeze
5
+ private_constant :ADAPTER
6
+
7
+ class << self
8
+ def lookup(name_or_class)
9
+ raise Exceptions::BlankAdapter if name_or_class.nil? || name_or_class.to_s.empty?
10
+
11
+ case name_or_class
12
+ when Symbol, String
13
+ adapter_name = name_or_class.to_s.capitalize << ADAPTER
14
+ ProxyFetcher::Document.const_get(adapter_name)
15
+ else
16
+ name_or_class
17
+ end
18
+ rescue NameError
19
+ raise Exceptions::UnknownAdapter, name_or_class
20
+ end
21
+ end
22
+ end
23
+ end
24
+ end
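Review note on `Adapters.lookup`: a symbol or string is resolved by capitalizing it and appending `Adapter`, then looking the constant up; anything else is returned unchanged. The following self-contained sketch mirrors that logic with stub classes. The module `AdapterLookupSketch` and both error classes are hypothetical, not part of the gem.

```ruby
module AdapterLookupSketch
  class BlankAdapterError < StandardError; end
  class UnknownAdapterError < StandardError; end

  # Stub adapters standing in for the real ones.
  class NokogiriAdapter; end
  class OgaAdapter; end

  def self.lookup(name_or_class)
    raise BlankAdapterError if name_or_class.nil? || name_or_class.to_s.empty?

    case name_or_class
    when Symbol, String
      # :oga -> "Oga" -> "OgaAdapter"
      const_get(name_or_class.to_s.capitalize << 'Adapter')
    else
      name_or_class # already a class, returned as-is
    end
  rescue NameError
    raise UnknownAdapterError, "unknown adapter '#{name_or_class}'"
  end
end

p AdapterLookupSketch.lookup(:oga)
p AdapterLookupSketch.lookup('nokogiri')
p AdapterLookupSketch.lookup(Struct)

begin
  # "my_parser".capitalize is "My_parser", so a multi-word name does not
  # map to a CamelCase constant such as MyParserAdapter.
  AdapterLookupSketch.lookup(:my_parser)
rescue AdapterLookupSketch::UnknownAdapterError => e
  puts e.message
end
```

One consequence of using `capitalize`: only single-word names resolve by symbol. An adapter with a multi-word class name would presumably have to be passed as a class.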
@@ -0,0 +1,35 @@
1
+ module ProxyFetcher
2
+ class Document
3
+ class Node
4
+ attr_reader :node
5
+
6
+ def initialize(node)
7
+ @node = node
8
+ end
9
+
10
+ def find(selector, method = :at_xpath)
11
+ self.class.new(node.public_send(method, selector))
12
+ end
13
+
14
+ def content_at(*args)
15
+ clear(find(*args).content)
16
+ end
17
+
18
+ def content
19
+ raise "#{__method__} must be implemented in descendant class!"
20
+ end
21
+
22
+ def html
23
+ raise "#{__method__} must be implemented in descendant class!"
24
+ end
25
+
26
+ protected
27
+
28
+ def clear(text)
29
+ return if text.nil? || text.empty?
30
+
31
+ text.strip.gsub(/[ \t]/i, '')
32
+ end
33
+ end
34
+ end
35
+ end
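Review note on `Node#clear`: this is the same normalization that the removed `ProxyFetcher::HTML.clear` performed. It strips the string and then deletes every space and tab, including those inside the text, and returns `nil` for `nil` or empty input. A standalone copy of the method makes the behavior easy to check:

```ruby
# Standalone copy of the normalization used by Node#clear.
def clear(text)
  return if text.nil? || text.empty?

  text.strip.gsub(/[ \t]/i, '')
end

p clear("  8080\t")        # surrounding whitespace removed
p clear('United States')   # interior space removed as well
p clear("a\nb")            # newlines inside the text are kept
p clear('')                # nil
p clear(nil)               # nil
```

Because interior spaces are removed too, multi-word cell values (a country name, for example) come back joined together.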
@@ -0,0 +1,23 @@
1
+ module ProxyFetcher
2
+ class Document
3
+ class << self
4
+ def parse(data)
5
+ new(ProxyFetcher.config.adapter.parse(data))
6
+ end
7
+ end
8
+
9
+ attr_reader :backend
10
+
11
+ def initialize(backend)
12
+ @backend = backend
13
+ end
14
+
15
+ def xpath(*args)
16
+ backend.xpath(*args).map { |node| backend.proxy_node.new(node) }
17
+ end
18
+
19
+ def css(*args)
20
+ backend.css(*args).map { |node| backend.proxy_node.new(node) }
21
+ end
22
+ end
23
+ end
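Review note on `ProxyFetcher::Document`: the class is a thin wrapper that delegates `xpath`/`css` to the adapter instance and wraps each result in the adapter's `Node` class. The sketch below shows that flow with a toy backend that treats each input line as a "row"; `FakeBackend`, `FakeNode` and `SketchDocument` are hypothetical and ignore the selector entirely.

```ruby
# Toy node wrapper, standing in for an adapter-specific Node class.
class FakeNode
  attr_reader :node

  def initialize(node)
    @node = node
  end

  def content
    node.to_s.strip
  end
end

# Toy backend: "parses" text into lines and returns all of them for any selector.
class FakeBackend
  def self.parse(data)
    new(data.lines.map(&:chomp))
  end

  def initialize(rows)
    @rows = rows
  end

  def xpath(_selector)
    @rows
  end

  def proxy_node
    FakeNode
  end
end

# Same shape as the Document class in this hunk, with the adapter passed in
# explicitly instead of being read from ProxyFetcher.config.
class SketchDocument
  attr_reader :backend

  def self.parse(data, adapter)
    new(adapter.parse(data))
  end

  def initialize(backend)
    @backend = backend
  end

  def xpath(*args)
    backend.xpath(*args).map { |node| backend.proxy_node.new(node) }
  end
end

doc = SketchDocument.parse("1.2.3.4:80\n5.6.7.8:8080\n", FakeBackend)
nodes = doc.xpath('//tr')
p nodes.map(&:content)
```

Read this way, a custom adapter needs a class-level `parse`, instance-level `xpath` and `css`, and a way to name its node class.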
@@ -32,5 +32,38 @@ module ProxyFetcher
32
32
  super('reached the maximum number of retries')
33
33
  end
34
34
  end
35
+
36
+ class UnknownAdapter < Error
37
+ def initialize(name)
38
+ super("unknown adapter '#{name}'")
39
+ end
40
+ end
41
+
42
+ class BlankAdapter < Error
43
+ def initialize(*)
44
+ super(<<-MSG.strip.squeeze
45
+ you need to specify adapter for HTML parsing: ProxyFetcher.config.adapter = :nokogiri.
46
+ You can use one of the predefined adapters (:nokogiri or :oga) or your own implementation.
47
+ MSG
48
+ )
49
+ end
50
+ end
51
+
52
+ class AdapterSetupError < Error
53
+ def initialize(adapter_name, reason)
54
+ adapter = demodulize(adapter_name.gsub('Adapter', ''))
55
+
56
+ super("can't setup '#{adapter}' adapter during the following error:\n\t#{reason}'")
57
+ end
58
+
59
+ private
60
+
61
+ def demodulize(path)
62
+ path = path.to_s
63
+ index = path.rindex('::')
64
+
65
+ index ? path[(index + 2)..-1] : path
66
+ end
67
+ end
35
68
  end
36
69
  end
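Review note on two string helpers in this hunk. `demodulize` keeps only the part after the last `::`, so the error message shows a short adapter name. `String#squeeze` without arguments collapses every run of repeated characters, not only spaces, so the `BlankAdapter` message appears to be affected: "need" would become "ned". The sketch below demonstrates both on standalone strings; the sample message is shortened, and `squeeze(' ')` is shown as one possible alternative rather than the author's intent.

```ruby
# Standalone copy of the private helper from AdapterSetupError.
def demodulize(path)
  path = path.to_s
  index = path.rindex('::')

  index ? path[(index + 2)..-1] : path
end

p demodulize('ProxyFetcher::Document::NokogiriAdapter'.gsub('Adapter', ''))
p demodulize('CustomAdapter'.gsub('Adapter', ''))

# Shortened sample of an indented heredoc-style message.
raw = "   you need to specify   adapter for HTML parsing.\n"

squeezed_all = raw.strip.squeeze         # collapses "ee" as well as spaces
squeezed_spaces = raw.strip.squeeze(' ') # collapses runs of spaces only

puts squeezed_all
puts squeezed_spaces
```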
@@ -1,12 +1,6 @@
1
- require 'forwardable'
2
-
3
1
  module ProxyFetcher
4
2
  module Providers
5
3
  class Base
6
- extend Forwardable
7
-
8
- def_delegators ProxyFetcher::HTML, :clear, :convert_to_int
9
-
10
4
  # Loads proxy provider page content, extract proxy list from it
11
5
  # and convert every entry to proxy object.
12
6
  def fetch_proxies!(filters = {})
@@ -14,8 +8,8 @@ module ProxyFetcher
14
8
  end
15
9
 
16
10
  class << self
17
- def fetch_proxies!(filters = {})
18
- new.fetch_proxies!(filters)
11
+ def fetch_proxies!(*args)
12
+ new.fetch_proxies!(*args)
19
13
  end
20
14
  end
21
15
 
@@ -23,12 +17,13 @@ module ProxyFetcher
23
17
 
24
18
  # Loads HTML document with Nokogiri by the URL combined with custom filters
25
19
  def load_document(url, filters = {})
26
- raise ArgumentError, 'filters must be a Hash' if filters && !filters.is_a?(Hash)
20
+ raise ArgumentError, 'filters must be a Hash' unless filters.is_a?(Hash)
27
21
 
28
22
  uri = URI.parse(url)
29
23
  uri.query = URI.encode_www_form(filters) if filters && filters.any?
30
24
 
31
- Nokogiri::HTML(ProxyFetcher.config.http_client.fetch(uri.to_s))
25
+ html = ProxyFetcher.config.http_client.fetch(uri.to_s)
26
+ ProxyFetcher::Document.parse(html)
32
27
  end
33
28
 
34
29
  # Get HTML elements with proxy info
@@ -40,11 +35,6 @@ module ProxyFetcher
40
35
  def to_proxy(*)
41
36
  raise NotImplementedError, "#{__method__} must be implemented in a descendant class!"
42
37
  end
43
-
44
- # Return normalized HTML element content by selector
45
- def parse_element(parent, selector, method = :at_xpath)
46
- clear(parent.public_send(method, selector).content)
47
- end
48
38
  end
49
39
  end
50
40
  end
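Review note on `load_document`: the guard changed from `if filters && !filters.is_a?(Hash)` to `unless filters.is_a?(Hash)`, so `nil` filters are now rejected instead of silently accepted. The URL-building part can be sketched on its own with the standard library; `build_url` and the example URL are hypothetical.

```ruby
require 'uri'

# Mirrors the filter handling in load_document, minus the HTTP fetch.
def build_url(url, filters = {})
  raise ArgumentError, 'filters must be a Hash' unless filters.is_a?(Hash)

  uri = URI.parse(url)
  uri.query = URI.encode_www_form(filters) if filters.any?
  uri.to_s
end

puts build_url('https://example.com/proxies', { country: 'PL', maxtime: 100 })
puts build_url('https://example.com/proxies')

begin
  build_url('https://example.com/proxies', nil)
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```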
@@ -9,20 +9,20 @@ module ProxyFetcher
9
9
  doc.xpath('//table[@id="proxylisttable"]/tbody/tr')
10
10
  end
11
11
 
12
- def to_proxy(html_element)
12
+ def to_proxy(html_node)
13
13
  ProxyFetcher::Proxy.new.tap do |proxy|
14
- proxy.addr = parse_element(html_element, 'td[1]')
15
- proxy.port = convert_to_int(parse_element(html_element, 'td[2]'))
16
- proxy.country = parse_element(html_element, 'td[4]')
17
- proxy.anonymity = parse_element(html_element, 'td[5]')
18
- proxy.type = parse_type(html_element)
14
+ proxy.addr = html_node.content_at('td[1]')
15
+ proxy.port = Integer(html_node.content_at('td[2]'))
16
+ proxy.country = html_node.content_at('td[4]')
17
+ proxy.anonymity = html_node.content_at('td[5]')
18
+ proxy.type = parse_type(html_node)
19
19
  end
20
20
  end
21
21
 
22
22
  private
23
23
 
24
- def parse_type(element)
25
- https = parse_element(element, 'td[6]')
24
+ def parse_type(html_node)
25
+ https = html_node.content_at('td[6]')
26
26
  https && https.casecmp('yes').zero? ? ProxyFetcher::Proxy::HTTPS : ProxyFetcher::Proxy::HTTP
27
27
  end
28
28
  end
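Review note on the switch from `convert_to_int` to `Integer(html_node.content_at(...))`: since `content_at` already applies `clear`, the two are equivalent, but `Kernel#Integer` is strict. It raises on `nil`, on empty strings and on non-numeric text, and without an explicit base it honors radix prefixes such as a leading `0`. The sketch below illustrates this; `safe_integer` is a hypothetical helper, not something the gem defines.

```ruby
# Hypothetical lenient wrapper: base-10 parse, nil on anything unparsable.
def safe_integer(text)
  Integer(text, 10)
rescue ArgumentError, TypeError
  nil
end

p Integer('8080')      # plain decimal string
p Integer('010')       # leading zero is read as an octal prefix
p Integer('010', 10)   # explicit base 10

['', '80a', nil].each do |value|
  begin
    Integer(value)
  rescue ArgumentError, TypeError => e
    puts "#{value.inspect} -> #{e.class}"
  end
end

p safe_integer('8080')
p safe_integer(nil)
```

Whether a provider page can actually yield such values is not visible from this diff; the point is only that a missing or malformed cell would raise inside `to_proxy`.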
@@ -9,12 +9,12 @@ module ProxyFetcher
9
9
  doc.xpath('//table[@id="proxylisttable"]/tbody/tr')
10
10
  end
11
11
 
12
- def to_proxy(html_element)
12
+ def to_proxy(html_node)
13
13
  ProxyFetcher::Proxy.new.tap do |proxy|
14
- proxy.addr = parse_element(html_element, 'td[1]')
15
- proxy.port = convert_to_int(parse_element(html_element, 'td[2]'))
16
- proxy.country = parse_element(html_element, 'td[4]')
17
- proxy.anonymity = parse_element(html_element, 'td[5]')
14
+ proxy.addr = html_node.content_at('td[1]')
15
+ proxy.port = Integer(html_node.content_at('td[2]'))
16
+ proxy.country = html_node.content_at('td[4]')
17
+ proxy.anonymity = html_node.content_at('td[5]')
18
18
  proxy.type = ProxyFetcher::Proxy::HTTPS
19
19
  end
20
20
  end
@@ -10,8 +10,8 @@ module ProxyFetcher
10
10
  doc.xpath('//div[@class="proxy-list"]/table/script')
11
11
  end
12
12
 
13
- def to_proxy(html_element)
14
- json = parse_json(html_element)
13
+ def to_proxy(html_node)
14
+ json = parse_json(html_node)
15
15
 
16
16
  ProxyFetcher::Proxy.new.tap do |proxy|
17
17
  proxy.addr = json['PROXY_IP']
@@ -25,8 +25,8 @@ module ProxyFetcher
25
25
 
26
26
  private
27
27
 
28
- def parse_json(element)
29
- javascript = clear(element.content)[/{.+}/im]
28
+ def parse_json(html_node)
29
+ javascript = html_node.content[/{.+}/im]
30
30
  JSON.parse(javascript)
31
31
  end
32
32
  end
@@ -8,34 +8,34 @@ module ProxyFetcher
8
8
  doc.xpath('//table[contains(@id, "GridView")]/tr[(count(td)>2)]')
9
9
  end
10
10
 
11
- def to_proxy(html_element)
11
+ def to_proxy(html_node)
12
12
  ProxyFetcher::Proxy.new.tap do |proxy|
13
- uri = parse_proxy_uri(html_element)
13
+ uri = parse_proxy_uri(html_node)
14
14
  proxy.addr = uri.host
15
15
  proxy.port = uri.port
16
16
 
17
- proxy.country = parse_country(html_element)
18
- proxy.anonymity = parse_anonymity(html_element)
17
+ proxy.country = parse_country(html_node)
18
+ proxy.anonymity = parse_anonymity(html_node)
19
19
  proxy.type = ProxyFetcher::Proxy::HTTP
20
20
  end
21
21
  end
22
22
 
23
23
  private
24
24
 
25
- def parse_proxy_uri(element)
26
- full_addr = parse_element(element, 'td[1]')
25
+ def parse_proxy_uri(html_node)
26
+ full_addr = html_node.content_at('td[1]')
27
27
  URI.parse("http://#{full_addr}")
28
28
  end
29
29
 
30
- def parse_country(element)
31
- element.at('img').attr('title')
30
+ def parse_country(html_node)
31
+ html_node.find('.//img').attr('title')
32
32
  end
33
33
 
34
- def parse_anonymity(element)
35
- transparency = parse_element(element, 'td[5]').to_sym
34
+ def parse_anonymity(html_node)
35
+ transparency = html_node.content_at('td[5]').to_sym
36
36
 
37
37
  {
38
- A: 'Anonimous',
38
+ A: 'Anonymous',
39
39
  E: 'Elite',
40
40
  T: 'Transparent',
41
41
  U: 'Unknown'
@@ -9,15 +9,15 @@ module ProxyFetcher
9
9
  doc.xpath('//table[contains(@class, "table")]/tr[(not(@id="proxy-table-header")) and (count(td)>2)]')
10
10
  end
11
11
 
12
- def to_proxy(html_element)
12
+ def to_proxy(html_node)
13
13
  ProxyFetcher::Proxy.new.tap do |proxy|
14
- uri = URI("//#{parse_element(html_element, 'td[1]')}")
14
+ uri = URI("//#{html_node.content_at('td[1]')}")
15
15
  proxy.addr = uri.host
16
16
  proxy.port = uri.port
17
17
 
18
- proxy.type = parse_element(html_element, 'td[2]')
19
- proxy.anonymity = parse_element(html_element, 'td[3]')
20
- proxy.country = parse_element(html_element, 'td[5]')
18
+ proxy.type = html_node.content_at('td[2]')
19
+ proxy.anonymity = html_node.content_at('td[3]')
20
+ proxy.country = html_node.content_at('td[5]')
21
21
  end
22
22
  end
23
23
  end
@@ -10,22 +10,22 @@ module ProxyFetcher
10
10
  doc.css('.table-wrap .table ul')
11
11
  end
12
12
 
13
- def to_proxy(html_element)
13
+ def to_proxy(html_node)
14
14
  ProxyFetcher::Proxy.new.tap do |proxy|
15
- uri = parse_proxy_uri(html_element)
15
+ uri = parse_proxy_uri(html_node)
16
16
  proxy.addr = uri.host
17
17
  proxy.port = uri.port
18
18
 
19
- proxy.type = parse_element(html_element, 'li[2]')
20
- proxy.anonymity = parse_element(html_element, 'li[4]')
21
- proxy.country = clear(html_element.at_xpath("li[5]//span[@class='country']").attr('title'))
19
+ proxy.type = html_node.content_at('li[2]')
20
+ proxy.anonymity = html_node.content_at('li[4]')
21
+ proxy.country = html_node.find("li[5]//span[@class='country']").attr('title')
22
22
  end
23
23
  end
24
24
 
25
25
  private
26
26
 
27
- def parse_proxy_uri(element)
28
- full_addr = ::Base64.decode64(element.at('li script').inner_html.match(/'(.+)'/)[1])
27
+ def parse_proxy_uri(html_node)
28
+ full_addr = ::Base64.decode64(html_node.at_css('li script').html.match(/'(.+)'/)[1])
29
29
  URI.parse("http://#{full_addr}")
30
30
  end
31
31
  end
@@ -8,21 +8,21 @@ module ProxyFetcher
8
8
  doc.xpath('//div[@id="content"]/table[1]/tr[contains(@class, "row")]')
9
9
  end
10
10
 
11
- def to_proxy(html_element)
11
+ def to_proxy(html_node)
12
12
  ProxyFetcher::Proxy.new.tap do |proxy|
13
- proxy.addr = parse_element(html_element, 'td[2]')
14
- proxy.port = convert_to_int(parse_element(html_element, 'td[3]'))
15
- proxy.anonymity = parse_element(html_element, 'td[4]')
16
- proxy.country = parse_element(html_element, 'td[6]')
17
- proxy.response_time = convert_to_int(parse_element(html_element, 'td[7]'))
18
- proxy.type = parse_type(html_element)
13
+ proxy.addr = html_node.content_at('td[2]')
14
+ proxy.port = Integer(html_node.content_at('td[3]'))
15
+ proxy.anonymity = html_node.content_at('td[4]')
16
+ proxy.country = html_node.content_at('td[6]')
17
+ proxy.response_time = Integer(html_node.content_at('td[7]'))
18
+ proxy.type = parse_type(html_node)
19
19
  end
20
20
  end
21
21
 
22
22
  private
23
23
 
24
- def parse_type(element)
25
- https = parse_element(element, 'td[5]')
24
+ def parse_type(html_node)
25
+ https = html_node.content_at('td[5]')
26
26
  https.casecmp('true').zero? ? ProxyFetcher::Proxy::HTTPS : ProxyFetcher::Proxy::HTTP
27
27
  end
28
28
  end
@@ -7,9 +7,9 @@ module ProxyFetcher
7
7
  # Major version number
8
8
  MAJOR = 0
9
9
  # Minor version number
10
- MINOR = 5
10
+ MINOR = 6
11
11
  # Smallest version number
12
- TINY = 1
12
+ TINY = 0
13
13
 
14
14
  # Full version number
15
15
  STRING = [MAJOR, MINOR, TINY].compact.join('.')
data/lib/proxy_fetcher.rb CHANGED
@@ -1,7 +1,5 @@
1
1
  require 'uri'
2
2
  require 'net/https'
3
- require 'nokogiri'
4
- require 'thread'
5
3
 
6
4
  require File.dirname(__FILE__) + '/proxy_fetcher/exceptions'
7
5
  require File.dirname(__FILE__) + '/proxy_fetcher/configuration'
@@ -10,12 +8,18 @@ require File.dirname(__FILE__) + '/proxy_fetcher/proxy'
10
8
  require File.dirname(__FILE__) + '/proxy_fetcher/manager'
11
9
 
12
10
  require File.dirname(__FILE__) + '/proxy_fetcher/utils/http_client'
13
- require File.dirname(__FILE__) + '/proxy_fetcher/utils/html'
14
11
  require File.dirname(__FILE__) + '/proxy_fetcher/utils/proxy_validator'
15
12
  require File.dirname(__FILE__) + '/proxy_fetcher/client/client'
16
13
  require File.dirname(__FILE__) + '/proxy_fetcher/client/request'
17
14
  require File.dirname(__FILE__) + '/proxy_fetcher/client/proxies_registry'
18
15
 
16
+ require File.dirname(__FILE__) + '/proxy_fetcher/document'
17
+ require File.dirname(__FILE__) + '/proxy_fetcher/document/adapters'
18
+ require File.dirname(__FILE__) + '/proxy_fetcher/document/node'
19
+ require File.dirname(__FILE__) + '/proxy_fetcher/document/adapters/abstract_adapter'
20
+ require File.dirname(__FILE__) + '/proxy_fetcher/document/adapters/nokogiri_adapter'
21
+ require File.dirname(__FILE__) + '/proxy_fetcher/document/adapters/oga_adapter'
22
+
19
23
  module ProxyFetcher
20
24
  module Providers
21
25
  require File.dirname(__FILE__) + '/proxy_fetcher/providers/base'
@@ -36,5 +40,13 @@ module ProxyFetcher
36
40
  def configure
37
41
  yield config
38
42
  end
43
+
44
+ private
45
+
46
+ def configure_adapter!
47
+ config.adapter = Configuration::DEFAULT_ADAPTER if config.adapter.nil?
48
+ end
39
49
  end
50
+
51
+ configure_adapter!
40
52
  end
@@ -5,10 +5,10 @@ require 'proxy_fetcher/version'
5
5
  Gem::Specification.new do |gem|
6
6
  gem.name = 'proxy_fetcher'
7
7
  gem.version = ProxyFetcher.gem_version
8
- gem.date = '2017-11-13'
8
+ gem.date = '2017-12-08'
9
9
  gem.summary = 'Ruby gem for dealing with proxy lists from different providers'
10
10
  gem.description = 'This gem can help your Ruby application to make HTTP(S) requests ' \
11
- 'from proxy server by fetching and validating proxy lists from the different providers.'
11
+ 'using proxies by fetching and validating proxy lists from the different providers.'
12
12
  gem.authors = ['Nikita Bulai']
13
13
  gem.email = 'bulajnikita@gmail.com'
14
14
  gem.require_paths = ['lib']
@@ -19,7 +19,5 @@ Gem::Specification.new do |gem|
19
19
  gem.license = 'MIT'
20
20
  gem.required_ruby_version = '>= 2.0.0'
21
21
 
22
- gem.add_runtime_dependency 'nokogiri', '~> 1.6', '>= 1.6'
23
-
24
22
  gem.add_development_dependency 'rspec', '~> 3.5'
25
23
  end
@@ -118,7 +118,7 @@ describe ProxyFetcher::Client do
118
118
  it 'refreshes proxy lists if no proxy found' do
119
119
  ProxyFetcher::Client::ProxiesRegistry.manager.instance_variable_set(:'@proxies', [])
120
120
 
121
- expect { ProxyFetcher::Client.get('http://httpbin.org') }.not_to raise_error(ProxyFetcher::Exceptions::MaximumRetriesReached)
121
+ expect { ProxyFetcher::Client.get('http://httpbin.org') }.not_to raise_error
122
122
  end
123
123
  end
124
124
 
@@ -43,16 +43,33 @@ describe ProxyFetcher::Configuration do
43
43
  end
44
44
 
45
45
  context 'custom provider' do
46
- it 'failed on registration if provider class already registered' do
46
+ it 'fails on registration if provider class already registered' do
47
47
  expect { ProxyFetcher::Configuration.register_provider(:xroxy, Class.new) }
48
48
  .to raise_error(ProxyFetcher::Exceptions::RegisteredProvider)
49
49
  end
50
50
 
51
- it "failed on proxy list fetching if provider doesn't registered" do
51
+ it "fails on proxy list fetching if provider doesn't registered" do
52
52
  ProxyFetcher.config.provider = :not_existing_provider
53
53
 
54
54
  expect { ProxyFetcher::Manager.new }
55
55
  .to raise_error(ProxyFetcher::Exceptions::UnknownProvider)
56
56
  end
57
57
  end
58
+
59
+ context 'custom HTML parsing adapter' do
60
+ it "fails if adapter can't be installed" do
61
+ old_config = ProxyFetcher.config.dup
62
+
63
+ class CustomAdapter < ProxyFetcher::Document::AbstractAdapter
64
+ def self.install_requirements!
65
+ require 'not_existing_gem'
66
+ end
67
+ end
68
+
69
+ expect { ProxyFetcher.config.adapter = CustomAdapter }
70
+ .to raise_error(ProxyFetcher::Exceptions::AdapterSetupError)
71
+
72
+ ProxyFetcher.instance_variable_set('@config', old_config)
73
+ end
74
+ end
58
75
  end
@@ -0,0 +1,24 @@
1
+ require 'spec_helper'
2
+
3
+ describe ProxyFetcher::Document::Adapters do
4
+ describe '#lookup' do
5
+ it 'returns predefined adapters if symbol or string passed' do
6
+ expect(described_class.lookup('nokogiri')).to eq(ProxyFetcher::Document::NokogiriAdapter)
7
+
8
+ expect(described_class.lookup(:oga)).to eq(ProxyFetcher::Document::OgaAdapter)
9
+ end
10
+
11
+ it 'returns self if class passed' do
12
+ expect(described_class.lookup(Struct)).to eq(Struct)
13
+ end
14
+
15
+ it 'raises an exception if passed value is blank' do
16
+ expect { described_class.lookup(nil) }.to raise_error(ProxyFetcher::Exceptions::BlankAdapter)
17
+ expect { described_class.lookup('') }.to raise_error(ProxyFetcher::Exceptions::BlankAdapter)
18
+ end
19
+
20
+ it "raises an exception if adapter doesn't exist" do
21
+ expect { described_class.lookup('wrong') }.to raise_error(ProxyFetcher::Exceptions::UnknownAdapter)
22
+ end
23
+ end
24
+ end
data/spec/spec_helper.rb CHANGED
@@ -15,6 +15,13 @@ require 'proxy_fetcher'
15
15
 
16
16
  Dir['./spec/support/**/*.rb'].sort.each { |f| require f }
17
17
 
18
+ adapter = ENV['BUNDLE_GEMFILE'][/.+\/(.+)\.gemfile/i, 1]
19
+ puts "Configured adapter: '#{adapter}'"
20
+
21
+ ProxyFetcher.configure do |config|
22
+ config.adapter = adapter
23
+ end
24
+
18
25
  RSpec.configure do |config|
19
26
  config.order = 'random'
20
27
  end
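Review note on the `spec_helper.rb` change: the adapter name is derived from the file name of the active Gemfile, so `gemfiles/oga.gemfile` selects `oga`. The regular expression can be tried in isolation; `adapter_from_gemfile` and the paths are hypothetical.

```ruby
# Same pattern as in spec_helper.rb, applied to an explicit path.
def adapter_from_gemfile(path)
  path[/.+\/(.+)\.gemfile/i, 1]
end

p adapter_from_gemfile('/home/ci/proxy_fetcher/gemfiles/oga.gemfile')
p adapter_from_gemfile('gemfiles/nokogiri.gemfile')
p adapter_from_gemfile('/home/ci/proxy_fetcher/Gemfile')
```

With the root `Gemfile` the pattern does not match and the result is `nil`, which `Adapters.lookup` rejects as blank. If `BUNDLE_GEMFILE` is not set at all, `ENV['BUNDLE_GEMFILE']` is `nil` and the `[]` call itself would raise, so the specs presumably expect to be run through one of the files in `gemfiles/`.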
metadata CHANGED
@@ -1,35 +1,15 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: proxy_fetcher
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.5.1
4
+ version: 0.6.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Nikita Bulai
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2017-11-13 00:00:00.000000000 Z
11
+ date: 2017-12-08 00:00:00.000000000 Z
12
12
  dependencies:
13
- - !ruby/object:Gem::Dependency
14
- name: nokogiri
15
- requirement: !ruby/object:Gem::Requirement
16
- requirements:
17
- - - "~>"
18
- - !ruby/object:Gem::Version
19
- version: '1.6'
20
- - - ">="
21
- - !ruby/object:Gem::Version
22
- version: '1.6'
23
- type: :runtime
24
- prerelease: false
25
- version_requirements: !ruby/object:Gem::Requirement
26
- requirements:
27
- - - "~>"
28
- - !ruby/object:Gem::Version
29
- version: '1.6'
30
- - - ">="
31
- - !ruby/object:Gem::Version
32
- version: '1.6'
33
13
  - !ruby/object:Gem::Dependency
34
14
  name: rspec
35
15
  requirement: !ruby/object:Gem::Requirement
@@ -44,8 +24,8 @@ dependencies:
44
24
  - - "~>"
45
25
  - !ruby/object:Gem::Version
46
26
  version: '3.5'
47
- description: This gem can help your Ruby application to make HTTP(S) requests from
48
- proxy server by fetching and validating proxy lists from the different providers.
27
+ description: This gem can help your Ruby application to make HTTP(S) requests using
28
+ proxies by fetching and validating proxy lists from the different providers.
49
29
  email: bulajnikita@gmail.com
50
30
  executables:
51
31
  - proxy_fetcher
@@ -62,12 +42,20 @@ files:
62
42
  - README.md
63
43
  - Rakefile
64
44
  - bin/proxy_fetcher
45
+ - gemfiles/nokogiri.gemfile
46
+ - gemfiles/oga.gemfile
65
47
  - lib/proxy_fetcher.rb
66
48
  - lib/proxy_fetcher/client/client.rb
67
49
  - lib/proxy_fetcher/client/proxies_registry.rb
68
50
  - lib/proxy_fetcher/client/request.rb
69
51
  - lib/proxy_fetcher/configuration.rb
70
52
  - lib/proxy_fetcher/configuration/providers_registry.rb
53
+ - lib/proxy_fetcher/document.rb
54
+ - lib/proxy_fetcher/document/adapters.rb
55
+ - lib/proxy_fetcher/document/adapters/abstract_adapter.rb
56
+ - lib/proxy_fetcher/document/adapters/nokogiri_adapter.rb
57
+ - lib/proxy_fetcher/document/adapters/oga_adapter.rb
58
+ - lib/proxy_fetcher/document/node.rb
71
59
  - lib/proxy_fetcher/exceptions.rb
72
60
  - lib/proxy_fetcher/manager.rb
73
61
  - lib/proxy_fetcher/providers/base.rb
@@ -79,13 +67,13 @@ files:
79
67
  - lib/proxy_fetcher/providers/proxy_list.rb
80
68
  - lib/proxy_fetcher/providers/xroxy.rb
81
69
  - lib/proxy_fetcher/proxy.rb
82
- - lib/proxy_fetcher/utils/html.rb
83
70
  - lib/proxy_fetcher/utils/http_client.rb
84
71
  - lib/proxy_fetcher/utils/proxy_validator.rb
85
72
  - lib/proxy_fetcher/version.rb
86
73
  - proxy_fetcher.gemspec
87
74
  - spec/proxy_fetcher/client_spec.rb
88
75
  - spec/proxy_fetcher/configuration_spec.rb
76
+ - spec/proxy_fetcher/document/adapters_spec.rb
89
77
  - spec/proxy_fetcher/providers/base_spec.rb
90
78
  - spec/proxy_fetcher/providers/free_proxy_list_spec.rb
91
79
  - spec/proxy_fetcher/providers/free_proxy_list_ssl_spec.rb
@@ -97,7 +85,6 @@ files:
97
85
  - spec/proxy_fetcher/providers/xroxy_spec.rb
98
86
  - spec/proxy_fetcher/proxy_spec.rb
99
87
  - spec/spec_helper.rb
100
- - spec/support/evil_proxy_patch.rb
101
88
  - spec/support/manager_examples.rb
102
89
  homepage: http://github.com/nbulaj/proxy_fetcher
103
90
  licenses:
@@ -1,15 +0,0 @@
1
- module ProxyFetcher
2
- class HTML
3
- class << self
4
- def clear(text)
5
- return if text.nil? || text.empty?
6
-
7
- text.strip.gsub(/[ \t]/i, '')
8
- end
9
-
10
- def convert_to_int(text)
11
- Integer(clear(text))
12
- end
13
- end
14
- end
15
- end
@@ -1,26 +0,0 @@
1
- require 'evil-proxy'
2
-
3
- EvilProxy::HTTPProxyServer.class_eval do
4
- def do_PUT(req, res)
5
- perform_proxy_request(req, res) do |http, path, header|
6
- http.put(path, req.body || '', header)
7
- end
8
- end
9
-
10
- def do_DELETE(req, res)
11
- perform_proxy_request(req, res) do |http, path, header|
12
- http.delete(path, header)
13
- end
14
- end
15
-
16
- def do_PATCH(req, res)
17
- perform_proxy_request(req, res) do |http, path, header|
18
- http.patch(path, req.body || '', header)
19
- end
20
- end
21
-
22
- # This method is not needed for PUT but I added for completeness
23
- def do_OPTIONS(_req, res)
24
- res['allow'] = 'GET,HEAD,POST,OPTIONS,CONNECT,PUT,PATCH,DELETE'
25
- end
26
- end