proxy_fetcher 0.1.5 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 04cf23c5f1bb6abfd29e5d6180a5a9e167a2d147
-  data.tar.gz: 5590d391eee582e511027a8282ef38917aa653a9
+  metadata.gz: 7a36eb7a048bc985836e08d8e380412ceadd6e4b
+  data.tar.gz: cdceec65f99f25a4bfe8b8ae3531a9d203fb8710
 SHA512:
-  metadata.gz: ccd0ef56339916919c8bf1b70dd3332d9972d6db1cc5b676b28b16691161955cffafbfa25bb310ba5b1d78a7b02dc7e8e440bb0a019736442b8bcbcfa3f8a004
-  data.tar.gz: e72892832780fddb424adc29dcb3b500ede2998d452c333aeffcb2a4766483a079288ec3e52e1eeca4ff61f7801ddae3ba80cddac2d527eaeabed09cbd8f6736
+  metadata.gz: 39954c3255c4ea3642023b4ceb0a9c03d4623d2cecd2b75237f4b219af71a292408583602ca5d7e3ce06ec54a8ac39c4e0c83b7d98ecceaea76ca8d86ddc5f81
+  data.tar.gz: 362d73388a1ed8b9d674e571c55763c265d808415efbc5db8c96e94c8ab110fffde2b137436ffe521f54b9940805df2c846a392c2a3316153f235a47f47ae96f
data/README.md CHANGED
@@ -5,14 +5,16 @@
 [![License](http://img.shields.io/badge/license-MIT-brightgreen.svg)](#license)

 This gem can help your Ruby application to make HTTP(S) requests from proxy server, fetching and validating
-current proxy lists from the [HideMyAss](http://hidemyass.com/) service.
+current proxy lists from the different proxy services like [HideMyAss](http://hidemyass.com/) or Hide My Name.
+
+**IMPORTANT** currently HideMyAss service closed free proxy list service, but it will be open soon and gem will be updated.

 ## Installation

 If using bundler, first add 'proxy_fetcher' to your Gemfile:

 ```ruby
-gem 'proxy_fetcher', '~> 0.1'
+gem 'proxy_fetcher', '~> 0.2'
 ```

 or if you want to use the latest version (from `master` branch), then:
@@ -30,7 +32,7 @@ bundle install
 Otherwise simply install the gem:

 ```sh
-gem install proxy_fetcher -v '0.1'
+gem install proxy_fetcher -v '0.2'
 ```

 ## Example of usage
@@ -79,9 +81,9 @@ Every proxy is a `ProxyFetcher::Proxy` object that has next readers:
 * `port`
 * `country` (USA or Brazil for example)
 * `response_time` (5217 for example)
-* `connection_time` (rank from 0 to 100, where 0 — slow, 100 — high)
-* `speed` (rank from 0 to 100, where 0 — slow, 100 — high)
-* `type` (URI schema, HTTP for example)
+* `connection_time` (rank from 0 to 100, where 0 — slow, 100 — high. **Note** depends on the proxy provider)
+* `speed` (rank from 0 to 100, where 0 — slow, 100 — high. **Note** depends on the proxy provider)
+* `type` (URI schema, HTTP or HTTPS)
 * `anonimity` (Low or High +KA for example)

 Also you can call next instance method for every Proxy object:
@@ -109,16 +111,45 @@ You can sort or find any proxy by speed using next 3 instance methods:
 * `medium?`
 * `slow?`'

-To change open/read timeout for `cleanup!` and `connectable?` methods yu need to change ProxyFetcher::Manager config:
+To change open/read timeout for `cleanup!` and `connectable?` methods yu need to change ProxyFetcher.config:

 ```ruby
-ProxyFetcher::Manager.config.read_timeout = 1 # default is 3
-ProxyFetcher::Manager.config.open_timeout = 1 # default is 3
+ProxyFetcher.config.read_timeout = 1 # default is 3
+ProxyFetcher.config.open_timeout = 1 # default is 3

 manager = ProxyFetcher::Manager.new
 manager.cleanup!
 ```

+## Providers
+
+Currently ProxyFetcher can deal with next proxy providers:
+
+* Hide My Name (default one)
+* Free Proxy List
+* HideMyAss
+
+If you wanna use one of them just setup required in the config:
+
+
+```ruby
+ProxyFetcher.config.provider = :free_proxy_list
+
+manager = ProxyFetcher::Manager.new
+manager.proxies
+#=> ...
+```
+
+Also you can write your own provider. All you need is to create a class, that would be inherited from the
+`ProxyFetcher::Providers::Base` class, and register your provider like this:
+
+```ruby
+ProxyFetcher::Configuration.register_provider(:your_provider, YourProviderClass)
+```
+
+Provider class must implement `self.load_proxy_list` and `#parse!(html_entry)` methods that will load and parse
+provider HTML page with proxy list. Take a look at the samples in the `proxy_fetcher/providers` directory.
+
 ## TODO

 * Proxy filters
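The new README section describes the provider contract (`self.load_proxy_list`, `#parse!`, and registration) only in prose. The following is a minimal sketch of that contract, not code from the gem. So that it runs without the gem installed, it uses stand-in classes under a hypothetical `Sketch` namespace that mirror the shapes shown in this diff. The `StaticList` provider and its `host:port` entries are assumptions made for illustration; a real provider would return HTML rows.

```ruby
# Stand-ins that mirror the registry, proxy and base provider shown in this
# diff. With the real gem these would be ProxyFetcher::Configuration,
# ProxyFetcher::Proxy and ProxyFetcher::Providers::Base.
module Sketch
  class Registry
    ProviderRegistered = Class.new(StandardError)

    def self.providers
      @providers ||= {}
    end

    def self.register_provider(name, klass)
      raise ProviderRegistered, "#{name} provider already registered!" if providers.key?(name.to_sym)

      providers[name.to_sym] = klass
    end
  end

  class Proxy
    attr_reader :addr, :port, :type
  end

  class BaseProvider
    def initialize(proxy_instance)
      @proxy = proxy_instance
    end

    # Writes a parsed value straight into the proxy's instance variable.
    def set!(name, value)
      @proxy.instance_variable_set(:"@#{name}", value)
    end

    def self.parse_entry(entry, proxy_instance)
      new(proxy_instance).parse!(entry)
    end
  end

  # Hypothetical provider: entries are "host:port" strings, not HTML rows.
  class StaticList < BaseProvider
    ENTRIES = ['192.0.2.10:8080', '192.0.2.11:3128'].freeze

    def self.load_proxy_list
      ENTRIES
    end

    def parse!(entry)
      host, port = entry.split(':')
      set!(:addr, host)
      set!(:port, Integer(port))
      set!(:type, 'HTTP')
    end
  end
end

Sketch::Registry.register_provider(:static_list, Sketch::StaticList)

# Roughly what the manager and proxy do together: load entries, then let the
# provider fill in each proxy object.
provider = Sketch::Registry.providers[:static_list]
proxies = provider.load_proxy_list.map do |entry|
  Sketch::Proxy.new.tap { |proxy| provider.parse_entry(entry, proxy) }
end
```

The provider never builds `Proxy` objects itself. It only fills in the instance it is handed, which is why `set!` in the base class writes instance variables directly.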
data/lib/proxy_fetcher.rb CHANGED
@@ -5,6 +5,15 @@ require 'nokogiri'
 require 'proxy_fetcher/configuration'
 require 'proxy_fetcher/proxy'
 require 'proxy_fetcher/manager'
+require 'proxy_fetcher/providers/base'
+require 'proxy_fetcher/providers/hide_my_ass'
+require 'proxy_fetcher/providers/hide_my_name'
+require 'proxy_fetcher/providers/free_proxy_list'

 module ProxyFetcher
+  class << self
+    def config
+      @config ||= ProxyFetcher::Configuration.new
+    end
+  end
 end
data/lib/proxy_fetcher/configuration.rb CHANGED
@@ -1,10 +1,33 @@
 module ProxyFetcher
   class Configuration
-    attr_accessor :open_timeout, :read_timeout
+    UnknownProvider = Class.new(StandardError)
+    ProviderRegistered = Class.new(StandardError)
+
+    attr_accessor :open_timeout, :read_timeout, :provider
+
+    class << self
+      def providers
+        @providers ||= {}
+      end
+
+      def register_provider(name, klass)
+        raise ProviderRegistered, "#{name} provider already registered!" if providers.key?(name.to_sym)
+
+        providers[name.to_sym] = klass
+      end
+    end

     def initialize
       @open_timeout = 3
       @read_timeout = 3
+
+      self.provider = :hide_my_name # currently default one
+    end
+
+    def provider=(name)
+      @provider = self.class.providers[name.to_sym]
+
+      raise UnknownProvider, "unregistered proxy provider (#{name})!" if @provider.nil?
     end
   end
 end
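Two details of the new `provider=` setter are easy to miss when reading the hunk: the reader returns the registered class rather than the symbol that was assigned, and the instance variable is written before the lookup is checked. The sketch below reproduces only that setter in a stand-in class to show both effects. `SketchConfiguration` and `StubProvider` are hypothetical names, not part of the gem.

```ruby
# Stand-in that copies the setter logic from the hunk above.
class SketchConfiguration
  UnknownProvider = Class.new(StandardError)

  attr_reader :provider

  def self.providers
    @providers ||= {}
  end

  def provider=(name)
    @provider = self.class.providers[name.to_sym]

    raise UnknownProvider, "unregistered proxy provider (#{name})!" if @provider.nil?
  end
end

StubProvider = Class.new
SketchConfiguration.providers[:stub] = StubProvider

config = SketchConfiguration.new
config.provider = 'stub'   # strings work too, because of to_sym
resolved = config.provider # the registered class, not :stub

# An unknown name raises, but only after @provider was overwritten with nil.
error = begin
  config.provider = :missing
  nil
rescue SketchConfiguration::UnknownProvider => e
  e.message
end
after_failure = config.provider
```

If the same holds in the gem, code that rescues `UnknownProvider` would need to reassign a valid provider before using the manager again.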
data/lib/proxy_fetcher/manager.rb CHANGED
@@ -1,12 +1,6 @@
 module ProxyFetcher
   class Manager
-    PROXY_PROVIDER_URL = 'http://proxylist.hidemyass.com/'.freeze
-
-    class << self
-      def config
-        @config ||= ProxyFetcher::Configuration.new
-      end
-    end
+    EmptyProxyList = Class.new(StandardError)

     attr_reader :proxies

@@ -22,9 +16,7 @@ module ProxyFetcher

     # Update current proxy list from the provider
     def refresh_list!
-      doc = Nokogiri::HTML(load_html(PROXY_PROVIDER_URL))
-      rows = doc.xpath('//table[@id="listable"]/tbody/tr')
-
+      rows = ProxyFetcher.config.provider.load_proxy_list
       @proxies = rows.map { |row| Proxy.new(row) }
     end

@@ -45,11 +37,11 @@ module ProxyFetcher
     # Pop first valid proxy (and back it to the end of the proxy list)
     # Invalid proxies will be removed from the list
     def get!
-      index = @proxies.find_index(&:connectable?)
+      index = proxies.find_index(&:connectable?)
       return if index.nil?

-      proxy = @proxies.delete_at(index)
-      tail = @proxies[index..-1]
+      proxy = proxies.delete_at(index)
+      tail = proxies[index..-1]

       @proxies = tail << proxy

@@ -65,7 +57,12 @@ module ProxyFetcher

     alias validate! cleanup!

-    # Just schema + host + port
+    # Return random proxy
+    def random
+      proxies.sample
+    end
+
+    # Returns array of proxy URLs (just schema + host + port)
     def raw_proxies
       proxies.map(&:url)
     end
@@ -74,15 +71,5 @@ module ProxyFetcher
     def inspect
       to_s
     end
-
-    private
-
-    # Get HTML from the requested URL
-    def load_html(url)
-      uri = URI.parse(url)
-      http = Net::HTTP.new(uri.host, uri.port)
-      response = http.get(uri.request_uri)
-      response.body
-    end
   end
 end
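The `get!` hunk in this file's diff only swaps `@proxies` for the reader, but the rotation it performs is compact enough to misread: everything in front of the first connectable proxy is dropped, and the chosen proxy moves to the end of the list. The sketch below replays that logic on a plain array. `FakeProxy` and `rotate!` are hypothetical stand-ins, and the `ok` flag fakes the network check done by `connectable?`.

```ruby
# Stand-in for ProxyFetcher::Proxy; `ok` replaces the real connectivity check.
FakeProxy = Struct.new(:name, :ok) do
  def connectable?
    ok
  end
end

# Same steps as the body of get!, returning the pick and the new list.
def rotate!(list)
  index = list.find_index(&:connectable?)
  return [nil, list] if index.nil?

  proxy = list.delete_at(index) # remove the first working proxy
  tail = list[index..-1]        # keep only what came after it
  [proxy, tail << proxy]        # re-queue the pick at the end
end

list = [
  FakeProxy.new('a', false),
  FakeProxy.new('b', true),
  FakeProxy.new('c', true)
]

picked, remaining = rotate!(list)
remaining_names = remaining.map(&:name)
```

Here `picked` is proxy `b`, and the remaining list holds `c` followed by `b`. Proxy `a` failed the check and is gone, which matches the "Invalid proxies will be removed from the list" comment in the hunk.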
data/lib/proxy_fetcher/providers/base.rb ADDED
@@ -0,0 +1,30 @@
+module ProxyFetcher
+  module Providers
+    class Base
+      attr_reader :proxy
+
+      def initialize(proxy_instance)
+        @proxy = proxy_instance
+      end
+
+      def set!(name, value)
+        @proxy.instance_variable_set(:"@#{name}", value)
+      end
+
+      class << self
+        def parse_entry(entry, proxy_instance)
+          new(proxy_instance).parse!(entry)
+        end
+
+        # Get HTML from the requested URL
+        def load_html(url)
+          uri = URI.parse(url)
+          http = Net::HTTP.new(uri.host, uri.port)
+          http.use_ssl = true if uri.scheme == 'https'
+          response = http.get(uri.request_uri)
+          response.body
+        end
+      end
+    end
+  end
+end
data/lib/proxy_fetcher/providers/free_proxy_list.rb ADDED
@@ -0,0 +1,47 @@
+module ProxyFetcher
+  module Providers
+    class FreeProxyList < Base
+      PROVIDER_URL = 'https://free-proxy-list.net/'.freeze
+
+      class << self
+        def load_proxy_list
+          doc = Nokogiri::HTML(load_html(PROVIDER_URL))
+          doc.xpath('//table[@id="proxylisttable"]/tbody/tr')
+        end
+      end
+
+      def parse!(html_entry)
+        html_entry.xpath('td').each_with_index do |td, index|
+          case index
+          when 0
+            set!(:addr, td.content.strip)
+          when 1 then
+            set!(:port, Integer(td.content.strip))
+          when 3 then
+            set!(:country, td.content.strip)
+          when 4
+            set!(:anonymity, td.content.strip)
+          when 6
+            set!(:type, parse_type(td))
+          else
+            # nothing
+          end
+        end
+      end
+
+      private
+
+      def parse_type(td)
+        type = td.content.strip
+
+        if type && type.downcase.include?('yes')
+          'HTTPS'
+        else
+          'HTTP'
+        end
+      end
+    end
+  end
+end
+
+ProxyFetcher::Configuration.register_provider(:free_proxy_list, ProxyFetcher::Providers::FreeProxyList)
data/lib/proxy_fetcher/providers/hide_my_ass.rb ADDED
@@ -0,0 +1,70 @@
+module ProxyFetcher
+  module Providers
+    class HideMyAss < Base
+      PROVIDER_URL = 'http://proxylist.hidemyass.com/'.freeze
+
+      class << self
+        def load_proxy_list
+          doc = Nokogiri::HTML(load_html(PROVIDER_URL))
+          doc.xpath('//table[@id="listable"]/tbody/tr')
+        end
+      end
+
+      def parse!(html_entry)
+        html_entry.xpath('td').each_with_index do |td, index|
+          case index
+          when 1
+            set!(:addr, parse_addr(td))
+          when 2 then
+            set!(:port, Integer(td.content.strip))
+          when 3 then
+            set!(:country, td.content.strip)
+          when 4
+            set!(:response_time, parse_response_time(td))
+            set!(:speed, parse_indicator_value(td))
+          when 5
+            set!(:connection_time, parse_indicator_value(td))
+          when 6 then
+            set!(:type, td.content.strip)
+          when 7
+            set!(:anonymity, td.content.strip)
+          else
+            # nothing
+          end
+        end
+      end
+
+      private
+
+      def parse_addr(html_doc)
+        good = []
+        bytes = []
+        css = html_doc.at_xpath('span/style/text()').to_s
+        css.split.each { |l| good << Regexp.last_match(1) if l =~ /\.(.+?)\{.*inline/ }
+
+        html_doc.xpath('span/span | span | span/text()').each do |span|
+          if span.is_a?(Nokogiri::XML::Text)
+            bytes << Regexp.last_match(1) if span.content.strip =~ /\.{0,1}(.+)\.{0,1}/
+          elsif (span['style'] && span['style'] =~ /inline/) ||
+                (span['class'] && good.include?(span['class'])) ||
+                (span['class'] =~ /^[0-9]/)
+
+            bytes << span.content
+          end
+        end
+
+        bytes.join('.').gsub(/\.+/, '.')
+      end
+
+      def parse_response_time(html_doc)
+        Integer(html_doc.at_xpath('div')['rel'])
+      end
+
+      def parse_indicator_value(html_doc)
+        Integer(html_doc.at('.indicator').attr('style').match(/width: (\d+)%/i)[1])
+      end
+    end
+  end
+end
+
+ProxyFetcher::Configuration.register_provider(:hide_my_ass, ProxyFetcher::Providers::HideMyAss)
data/lib/proxy_fetcher/providers/hide_my_name.rb ADDED
@@ -0,0 +1,49 @@
+module ProxyFetcher
+  module Providers
+    class HideMyName < Base
+      PROVIDER_URL = 'https://hidemy.name/en/proxy-list/?type=hs#list'.freeze
+
+      class << self
+        def load_proxy_list
+          doc = Nokogiri::HTML(load_html(PROVIDER_URL))
+          doc.xpath('//table[@class="proxy__t"]/tbody/tr')
+        end
+      end
+
+      def parse!(html_entry)
+        html_entry.xpath('td').each_with_index do |td, index|
+          case index
+          when 0
+            set!(:addr, td.content.strip)
+          when 1 then
+            set!(:port, Integer(td.content.strip))
+          when 2 then
+            set!(:country, td.at_xpath('*//span[1]/following-sibling::text()[1]').content.strip)
+          when 3
+            set!(:response_time, Integer(td.at('p').content.strip[/\d+/]))
+          when 4
+            set!(:type, parse_type(td))
+          when 5
+            set!(:anonymity, td.content.strip)
+          else
+            # nothing
+          end
+        end
+      end
+
+      private
+
+      def parse_type(td)
+        schemas = td.content.strip
+
+        if schemas && schemas.downcase.include?('https')
+          'HTTPS'
+        else
+          'HTTP'
+        end
+      end
+    end
+  end
+end
+
+ProxyFetcher::Configuration.register_provider(:hide_my_name, ProxyFetcher::Providers::HideMyName)
data/lib/proxy_fetcher/proxy.rb CHANGED
@@ -1,10 +1,10 @@
 module ProxyFetcher
   class Proxy
     attr_reader :addr, :port, :country, :response_time,
-                :connection_time, :speed, :type, :anonimity
+                :connection_time, :speed, :type, :anonymity

     def initialize(html_row)
-      parse_row!(html_row)
+      ProxyFetcher.config.provider.parse_entry(html_row, self)

       self
     end
@@ -12,13 +12,13 @@ module ProxyFetcher
     def connectable?
       connection = Net::HTTP.new(addr, port)
       connection.use_ssl = true if https?
-      connection.open_timeout = ProxyFetcher::Manager.config.open_timeout
-      connection.read_timeout = ProxyFetcher::Manager.config.read_timeout
+      connection.open_timeout = ProxyFetcher.config.open_timeout
+      connection.read_timeout = ProxyFetcher.config.read_timeout

       connection.start { |http| return true if http.request_head('/') }

       false
-    rescue Timeout::Error, Errno::ECONNREFUSED, Errno::ECONNRESET, Errno::ECONNABORTED
+    rescue Timeout::Error, Errno::ECONNREFUSED, Errno::ECONNRESET, Errno::ECONNABORTED, Errno::EOFError
       false
     end

@@ -51,60 +51,5 @@ module ProxyFetcher
     def url
       uri.to_s
     end
-
-    private
-
-    # HideMyAss proxy list rows parsing by columns
-    def parse_row!(html)
-      html.xpath('td').each_with_index do |td, index|
-        case index
-        when 1
-          @addr = parse_addr(td)
-        when 2 then
-          @port = Integer(td.content.strip)
-        when 3 then
-          @country = td.content.strip
-        when 4
-          @response_time = parse_response_time(td)
-          @speed = parse_indicator_value(td)
-        when 5
-          @connection_time = parse_indicator_value(td)
-        when 6 then
-          @type = td.content.strip
-        when 7
-          @anonymity = td.content.strip
-        else
-          # nothing
-        end
-      end
-    end
-
-    def parse_addr(html)
-      good = []
-      bytes = []
-      css = html.at_xpath('span/style/text()').to_s
-      css.split.each { |l| good << Regexp.last_match(1) if l =~ /\.(.+?)\{.*inline/ }
-
-      html.xpath('span/span | span | span/text()').each do |span|
-        if span.is_a?(Nokogiri::XML::Text)
-          bytes << Regexp.last_match(1) if span.content.strip =~ /\.{0,1}(.+)\.{0,1}/
-        elsif (span['style'] && span['style'] =~ /inline/) ||
-              (span['class'] && good.include?(span['class'])) ||
-              (span['class'] =~ /^[0-9]/)
-
-          bytes << span.content
-        end
-      end
-
-      bytes.join('.').gsub(/\.+/, '.')
-    end
-
-    def parse_response_time(html)
-      Integer(html.at_xpath('div')['rel'])
-    end
-
-    def parse_indicator_value(html)
-      Integer(html.at('.indicator').attr('style').match(/width: (\d+)%/i)[1])
-    end
   end
 end