proxy_fetcher 0.1.5 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 04cf23c5f1bb6abfd29e5d6180a5a9e167a2d147
4
- data.tar.gz: 5590d391eee582e511027a8282ef38917aa653a9
3
+ metadata.gz: 7a36eb7a048bc985836e08d8e380412ceadd6e4b
4
+ data.tar.gz: cdceec65f99f25a4bfe8b8ae3531a9d203fb8710
5
5
  SHA512:
6
- metadata.gz: ccd0ef56339916919c8bf1b70dd3332d9972d6db1cc5b676b28b16691161955cffafbfa25bb310ba5b1d78a7b02dc7e8e440bb0a019736442b8bcbcfa3f8a004
7
- data.tar.gz: e72892832780fddb424adc29dcb3b500ede2998d452c333aeffcb2a4766483a079288ec3e52e1eeca4ff61f7801ddae3ba80cddac2d527eaeabed09cbd8f6736
6
+ metadata.gz: 39954c3255c4ea3642023b4ceb0a9c03d4623d2cecd2b75237f4b219af71a292408583602ca5d7e3ce06ec54a8ac39c4e0c83b7d98ecceaea76ca8d86ddc5f81
7
+ data.tar.gz: 362d73388a1ed8b9d674e571c55763c265d808415efbc5db8c96e94c8ab110fffde2b137436ffe521f54b9940805df2c846a392c2a3316153f235a47f47ae96f
data/README.md CHANGED
@@ -5,14 +5,16 @@
5
5
  [![License](http://img.shields.io/badge/license-MIT-brightgreen.svg)](#license)
6
6
 
7
7
  This gem can help your Ruby application to make HTTP(S) requests from proxy server, fetching and validating
8
- current proxy lists from the [HideMyAss](http://hidemyass.com/) service.
8
+ current proxy lists from the different proxy services like [HideMyAss](http://hidemyass.com/) or Hide My Name.
9
+
10
+ **IMPORTANT** currently HideMyAss service closed free proxy list service, but it will be open soon and gem will be updated.
9
11
 
10
12
  ## Installation
11
13
 
12
14
  If using bundler, first add 'proxy_fetcher' to your Gemfile:
13
15
 
14
16
  ```ruby
15
- gem 'proxy_fetcher', '~> 0.1'
17
+ gem 'proxy_fetcher', '~> 0.2'
16
18
  ```
17
19
 
18
20
  or if you want to use the latest version (from `master` branch), then:
@@ -30,7 +32,7 @@ bundle install
30
32
  Otherwise simply install the gem:
31
33
 
32
34
  ```sh
33
- gem install proxy_fetcher -v '0.1'
35
+ gem install proxy_fetcher -v '0.2'
34
36
  ```
35
37
 
36
38
  ## Example of usage
@@ -79,9 +81,9 @@ Every proxy is a `ProxyFetcher::Proxy` object that has next readers:
79
81
  * `port`
80
82
  * `country` (USA or Brazil for example)
81
83
  * `response_time` (5217 for example)
82
- * `connection_time` (rank from 0 to 100, where 0 — slow, 100 — high)
83
- * `speed` (rank from 0 to 100, where 0 — slow, 100 — high)
84
- * `type` (URI schema, HTTP for example)
84
+ * `connection_time` (rank from 0 to 100, where 0 — slow, 100 — high. **Note** depends on the proxy provider)
85
+ * `speed` (rank from 0 to 100, where 0 — slow, 100 — high. **Note** depends on the proxy provider)
86
+ * `type` (URI schema, HTTP or HTTPS)
85
87
  * `anonimity` (Low or High +KA for example)
86
88
 
87
89
  Also you can call next instance method for every Proxy object:
@@ -109,16 +111,45 @@ You can sort or find any proxy by speed using next 3 instance methods:
109
111
  * `medium?`
110
112
  * `slow?`'
111
113
 
112
- To change open/read timeout for `cleanup!` and `connectable?` methods yu need to change ProxyFetcher::Manager config:
114
+ To change open/read timeout for `cleanup!` and `connectable?` methods yu need to change ProxyFetcher.config:
113
115
 
114
116
  ```ruby
115
- ProxyFetcher::Manager.config.read_timeout = 1 # default is 3
116
- ProxyFetcher::Manager.config.open_timeout = 1 # default is 3
117
+ ProxyFetcher.config.read_timeout = 1 # default is 3
118
+ ProxyFetcher.config.open_timeout = 1 # default is 3
117
119
 
118
120
  manager = ProxyFetcher::Manager.new
119
121
  manager.cleanup!
120
122
  ```
121
123
 
124
+ ## Providers
125
+
126
+ Currently ProxyFetcher can deal with next proxy providers:
127
+
128
+ * Hide My Name (default one)
129
+ * Free Proxy List
130
+ * HideMyAss
131
+
132
+ If you wanna use one of them just setup required in the config:
133
+
134
+
135
+ ```ruby
136
+ ProxyFetcher.config.provider = :free_proxy_list
137
+
138
+ manager = ProxyFetcher::Manager.new
139
+ manager.proxies
140
+ #=> ...
141
+ ```
142
+
143
+ Also you can write your own provider. All you need is to create a class, that would be inherited from the
144
+ `ProxyFetcher::Providers::Base` class, and register your provider like this:
145
+
146
+ ```ruby
147
+ ProxyFetcher::Configuration.register_provider(:your_provider, YourProviderClass)
148
+ ```
149
+
150
+ Provider class must implement `self.load_proxy_list` and `#parse!(html_entry)` methods that will load and parse
151
+ provider HTML page with proxy list. Take a look at the samples in the `proxy_fetcher/providers` directory.
152
+
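The provider contract described in the README hunk above can be sketched without the gem or network access. Everything under `SketchFetcher` is a stand-in that only mirrors the shapes shown in this diff, and `StaticList` with its fixed `host:port` entries is a hypothetical provider invented for illustration.

```ruby
# Stand-ins mirroring the shapes in this diff; NOT the gem's own classes.
module SketchFetcher
  class Configuration
    ProviderRegistered = Class.new(StandardError)

    def self.providers
      @providers ||= {}
    end

    def self.register_provider(name, klass)
      raise ProviderRegistered, "#{name} provider already registered!" if providers.key?(name.to_sym)

      providers[name.to_sym] = klass
    end
  end

  # Bare record standing in for ProxyFetcher::Proxy.
  class Proxy
    attr_reader :addr, :port, :type
  end

  module Providers
    class Base
      def initialize(proxy_instance)
        @proxy = proxy_instance
      end

      # Writes a parsed value straight into the proxy's instance variable.
      def set!(name, value)
        @proxy.instance_variable_set(:"@#{name}", value)
      end

      def self.parse_entry(entry, proxy_instance)
        new(proxy_instance).parse!(entry)
      end
    end

    # Hypothetical provider: serves a fixed list instead of scraping a page.
    class StaticList < Base
      ENTRIES = ['10.0.0.1:8080', '10.0.0.2:3128'].freeze

      # Class-level loader: returns one raw entry per proxy.
      def self.load_proxy_list
        ENTRIES
      end

      # Instance-level parser: fills in the proxy from one raw entry.
      def parse!(entry)
        addr, port = entry.split(':')
        set!(:addr, addr)
        set!(:port, Integer(port))
        set!(:type, 'HTTP')
      end
    end
  end
end

SketchFetcher::Configuration.register_provider(:static_list, SketchFetcher::Providers::StaticList)

provider = SketchFetcher::Configuration.providers[:static_list]
proxies = provider.load_proxy_list.map do |entry|
  proxy = SketchFetcher::Proxy.new
  provider.parse_entry(entry, proxy)
  proxy
end
```

The split matters: the class method only fetches raw entries, while each `parse!` call works on one entry and one proxy object.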
122
153
  ## TODO
123
154
 
124
155
  * Proxy filters
data/lib/proxy_fetcher.rb CHANGED
@@ -5,6 +5,15 @@ require 'nokogiri'
5
5
  require 'proxy_fetcher/configuration'
6
6
  require 'proxy_fetcher/proxy'
7
7
  require 'proxy_fetcher/manager'
8
+ require 'proxy_fetcher/providers/base'
9
+ require 'proxy_fetcher/providers/hide_my_ass'
10
+ require 'proxy_fetcher/providers/hide_my_name'
11
+ require 'proxy_fetcher/providers/free_proxy_list'
8
12
 
9
13
  module ProxyFetcher
14
+ class << self
15
+ def config
16
+ @config ||= ProxyFetcher::Configuration.new
17
+ end
18
+ end
10
19
  end
data/lib/proxy_fetcher/configuration.rb CHANGED
@@ -1,10 +1,33 @@
1
1
  module ProxyFetcher
2
2
  class Configuration
3
- attr_accessor :open_timeout, :read_timeout
3
+ UnknownProvider = Class.new(StandardError)
4
+ ProviderRegistered = Class.new(StandardError)
5
+
6
+ attr_accessor :open_timeout, :read_timeout, :provider
7
+
8
+ class << self
9
+ def providers
10
+ @providers ||= {}
11
+ end
12
+
13
+ def register_provider(name, klass)
14
+ raise ProviderRegistered, "#{name} provider already registered!" if providers.key?(name.to_sym)
15
+
16
+ providers[name.to_sym] = klass
17
+ end
18
+ end
4
19
 
5
20
  def initialize
6
21
  @open_timeout = 3
7
22
  @read_timeout = 3
23
+
24
+ self.provider = :hide_my_name # currently default one
25
+ end
26
+
27
+ def provider=(name)
28
+ @provider = self.class.providers[name.to_sym]
29
+
30
+ raise UnknownProvider, "unregistered proxy provider (#{name})!" if @provider.nil?
8
31
  end
9
32
  end
10
33
  end
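A minimal sketch of how the registry and the `provider=` setter in this hunk behave, assuming the logic is copied as shown. `SketchConfiguration` and `DummyProvider` are hypothetical names used so the example runs on its own.

```ruby
class SketchConfiguration
  UnknownProvider = Class.new(StandardError)
  ProviderRegistered = Class.new(StandardError)

  attr_reader :provider

  def self.providers
    @providers ||= {}
  end

  def self.register_provider(name, klass)
    raise ProviderRegistered, "#{name} provider already registered!" if providers.key?(name.to_sym)

    providers[name.to_sym] = klass
  end

  # Looks the name up in the registry and stores the class, not the symbol.
  def provider=(name)
    @provider = self.class.providers[name.to_sym]

    raise UnknownProvider, "unregistered proxy provider (#{name})!" if @provider.nil?
  end
end

DummyProvider = Class.new

# String and symbol names land on the same key because of to_sym.
SketchConfiguration.register_provider('dummy', DummyProvider)

config = SketchConfiguration.new
config.provider = :dummy
resolved = config.provider

duplicate_error =
  begin
    SketchConfiguration.register_provider(:dummy, DummyProvider)
    nil
  rescue SketchConfiguration::ProviderRegistered => e
    e.message
  end

unknown_error =
  begin
    config.provider = :missing
    nil
  rescue SketchConfiguration::UnknownProvider => e
    e.message
  end
```

Two details follow from the code as written: `config.provider` returns the provider class itself, and a failed assignment has already overwritten `@provider` with `nil` by the time `UnknownProvider` is raised.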
data/lib/proxy_fetcher/manager.rb CHANGED
@@ -1,12 +1,6 @@
1
1
  module ProxyFetcher
2
2
  class Manager
3
- PROXY_PROVIDER_URL = 'http://proxylist.hidemyass.com/'.freeze
4
-
5
- class << self
6
- def config
7
- @config ||= ProxyFetcher::Configuration.new
8
- end
9
- end
3
+ EmptyProxyList = Class.new(StandardError)
10
4
 
11
5
  attr_reader :proxies
12
6
 
@@ -22,9 +16,7 @@ module ProxyFetcher
22
16
 
23
17
  # Update current proxy list from the provider
24
18
  def refresh_list!
25
- doc = Nokogiri::HTML(load_html(PROXY_PROVIDER_URL))
26
- rows = doc.xpath('//table[@id="listable"]/tbody/tr')
27
-
19
+ rows = ProxyFetcher.config.provider.load_proxy_list
28
20
  @proxies = rows.map { |row| Proxy.new(row) }
29
21
  end
30
22
 
@@ -45,11 +37,11 @@ module ProxyFetcher
45
37
  # Pop first valid proxy (and back it to the end of the proxy list)
46
38
  # Invalid proxies will be removed from the list
47
39
  def get!
48
- index = @proxies.find_index(&:connectable?)
40
+ index = proxies.find_index(&:connectable?)
49
41
  return if index.nil?
50
42
 
51
- proxy = @proxies.delete_at(index)
52
- tail = @proxies[index..-1]
43
+ proxy = proxies.delete_at(index)
44
+ tail = proxies[index..-1]
53
45
 
54
46
  @proxies = tail << proxy
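The rotation in `get!` can be traced on a plain array. This is a sketch only: `rotate_to_first_valid` is a hypothetical helper, the block stands in for `Proxy#connectable?`, and the return value is chosen for the illustration because the rest of the method is not shown in this hunk.

```ruby
# Hypothetical helper mirroring the three steps in the hunk on a plain array.
def rotate_to_first_valid(list, &connectable)
  index = list.find_index(&connectable)
  return [nil, list] if index.nil?

  item = list.delete_at(index) # remove the first valid entry
  tail = list[index..-1]       # entries that followed it; earlier ones are dropped
  [item, tail << item]         # the valid entry moves to the end
end

picked, remaining = rotate_to_first_valid(%w[bad1 bad2 ok1 ok2 ok3]) do |name|
  name.start_with?('ok')
end
```

Here `picked` is `'ok1'` and `remaining` is `['ok2', 'ok3', 'ok1']`: the entries ahead of the first valid one are discarded, which matches the comment that invalid proxies are removed from the list.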
55
47
 
@@ -65,7 +57,12 @@ module ProxyFetcher
65
57
 
66
58
  alias validate! cleanup!
67
59
 
68
- # Just schema + host + port
60
+ # Return random proxy
61
+ def random
62
+ proxies.sample
63
+ end
64
+
65
+ # Returns array of proxy URLs (just schema + host + port)
69
66
  def raw_proxies
70
67
  proxies.map(&:url)
71
68
  end
@@ -74,15 +71,5 @@ module ProxyFetcher
74
71
  def inspect
75
72
  to_s
76
73
  end
77
-
78
- private
79
-
80
- # Get HTML from the requested URL
81
- def load_html(url)
82
- uri = URI.parse(url)
83
- http = Net::HTTP.new(uri.host, uri.port)
84
- response = http.get(uri.request_uri)
85
- response.body
86
- end
87
74
  end
88
75
  end
data/lib/proxy_fetcher/providers/base.rb ADDED
@@ -0,0 +1,30 @@
1
+ module ProxyFetcher
2
+ module Providers
3
+ class Base
4
+ attr_reader :proxy
5
+
6
+ def initialize(proxy_instance)
7
+ @proxy = proxy_instance
8
+ end
9
+
10
+ def set!(name, value)
11
+ @proxy.instance_variable_set(:"@#{name}", value)
12
+ end
13
+
14
+ class << self
15
+ def parse_entry(entry, proxy_instance)
16
+ new(proxy_instance).parse!(entry)
17
+ end
18
+
19
+ # Get HTML from the requested URL
20
+ def load_html(url)
21
+ uri = URI.parse(url)
22
+ http = Net::HTTP.new(uri.host, uri.port)
23
+ http.use_ssl = true if uri.scheme == 'https'
24
+ response = http.get(uri.request_uri)
25
+ response.body
26
+ end
27
+ end
28
+ end
29
+ end
30
+ end
data/lib/proxy_fetcher/providers/free_proxy_list.rb ADDED
@@ -0,0 +1,47 @@
1
+ module ProxyFetcher
2
+ module Providers
3
+ class FreeProxyList < Base
4
+ PROVIDER_URL = 'https://free-proxy-list.net/'.freeze
5
+
6
+ class << self
7
+ def load_proxy_list
8
+ doc = Nokogiri::HTML(load_html(PROVIDER_URL))
9
+ doc.xpath('//table[@id="proxylisttable"]/tbody/tr')
10
+ end
11
+ end
12
+
13
+ def parse!(html_entry)
14
+ html_entry.xpath('td').each_with_index do |td, index|
15
+ case index
16
+ when 0
17
+ set!(:addr, td.content.strip)
18
+ when 1 then
19
+ set!(:port, Integer(td.content.strip))
20
+ when 3 then
21
+ set!(:country, td.content.strip)
22
+ when 4
23
+ set!(:anonymity, td.content.strip)
24
+ when 6
25
+ set!(:type, parse_type(td))
26
+ else
27
+ # nothing
28
+ end
29
+ end
30
+ end
31
+
32
+ private
33
+
34
+ def parse_type(td)
35
+ type = td.content.strip
36
+
37
+ if type && type.downcase.include?('yes')
38
+ 'HTTPS'
39
+ else
40
+ 'HTTP'
41
+ end
42
+ end
43
+ end
44
+ end
45
+ end
46
+
47
+ ProxyFetcher::Configuration.register_provider(:free_proxy_list, ProxyFetcher::Providers::FreeProxyList)
data/lib/proxy_fetcher/providers/hide_my_ass.rb ADDED
@@ -0,0 +1,70 @@
1
+ module ProxyFetcher
2
+ module Providers
3
+ class HideMyAss < Base
4
+ PROVIDER_URL = 'http://proxylist.hidemyass.com/'.freeze
5
+
6
+ class << self
7
+ def load_proxy_list
8
+ doc = Nokogiri::HTML(load_html(PROVIDER_URL))
9
+ doc.xpath('//table[@id="listable"]/tbody/tr')
10
+ end
11
+ end
12
+
13
+ def parse!(html_entry)
14
+ html_entry.xpath('td').each_with_index do |td, index|
15
+ case index
16
+ when 1
17
+ set!(:addr, parse_addr(td))
18
+ when 2 then
19
+ set!(:port, Integer(td.content.strip))
20
+ when 3 then
21
+ set!(:country, td.content.strip)
22
+ when 4
23
+ set!(:response_time, parse_response_time(td))
24
+ set!(:speed, parse_indicator_value(td))
25
+ when 5
26
+ set!(:connection_time, parse_indicator_value(td))
27
+ when 6 then
28
+ set!(:type, td.content.strip)
29
+ when 7
30
+ set!(:anonymity, td.content.strip)
31
+ else
32
+ # nothing
33
+ end
34
+ end
35
+ end
36
+
37
+ private
38
+
39
+ def parse_addr(html_doc)
40
+ good = []
41
+ bytes = []
42
+ css = html_doc.at_xpath('span/style/text()').to_s
43
+ css.split.each { |l| good << Regexp.last_match(1) if l =~ /\.(.+?)\{.*inline/ }
44
+
45
+ html_doc.xpath('span/span | span | span/text()').each do |span|
46
+ if span.is_a?(Nokogiri::XML::Text)
47
+ bytes << Regexp.last_match(1) if span.content.strip =~ /\.{0,1}(.+)\.{0,1}/
48
+ elsif (span['style'] && span['style'] =~ /inline/) ||
49
+ (span['class'] && good.include?(span['class'])) ||
50
+ (span['class'] =~ /^[0-9]/)
51
+
52
+ bytes << span.content
53
+ end
54
+ end
55
+
56
+ bytes.join('.').gsub(/\.+/, '.')
57
+ end
58
+
59
+ def parse_response_time(html_doc)
60
+ Integer(html_doc.at_xpath('div')['rel'])
61
+ end
62
+
63
+ def parse_indicator_value(html_doc)
64
+ Integer(html_doc.at('.indicator').attr('style').match(/width: (\d+)%/i)[1])
65
+ end
66
+ end
67
+ end
68
+ end
69
+
70
+ ProxyFetcher::Configuration.register_provider(:hide_my_ass, ProxyFetcher::Providers::HideMyAss)
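The regex used by `parse_indicator_value` can be tried on a bare style string. This sketch departs from the hunk in two labeled ways: `indicator_value` is a hypothetical helper that returns `nil` when nothing matches (the original would raise on a missing match), and it passes base 10 to `Integer` so a zero-padded value is not read as octal.

```ruby
# Hypothetical helper: pulls the percentage out of an inline style string.
def indicator_value(style)
  match = style.match(/width: (\d+)%/i)
  return nil if match.nil?

  Integer(match[1], 10) # explicit base 10, an addition not present in the diff
end

with_width = indicator_value('width: 73%; background: #0a0')
without_width = indicator_value('height: 10px')
```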
data/lib/proxy_fetcher/providers/hide_my_name.rb ADDED
@@ -0,0 +1,49 @@
1
+ module ProxyFetcher
2
+ module Providers
3
+ class HideMyName < Base
4
+ PROVIDER_URL = 'https://hidemy.name/en/proxy-list/?type=hs#list'.freeze
5
+
6
+ class << self
7
+ def load_proxy_list
8
+ doc = Nokogiri::HTML(load_html(PROVIDER_URL))
9
+ doc.xpath('//table[@class="proxy__t"]/tbody/tr')
10
+ end
11
+ end
12
+
13
+ def parse!(html_entry)
14
+ html_entry.xpath('td').each_with_index do |td, index|
15
+ case index
16
+ when 0
17
+ set!(:addr, td.content.strip)
18
+ when 1 then
19
+ set!(:port, Integer(td.content.strip))
20
+ when 2 then
21
+ set!(:country, td.at_xpath('*//span[1]/following-sibling::text()[1]').content.strip)
22
+ when 3
23
+ set!(:response_time, Integer(td.at('p').content.strip[/\d+/]))
24
+ when 4
25
+ set!(:type, parse_type(td))
26
+ when 5
27
+ set!(:anonymity, td.content.strip)
28
+ else
29
+ # nothing
30
+ end
31
+ end
32
+ end
33
+
34
+ private
35
+
36
+ def parse_type(td)
37
+ schemas = td.content.strip
38
+
39
+ if schemas && schemas.downcase.include?('https')
40
+ 'HTTPS'
41
+ else
42
+ 'HTTP'
43
+ end
44
+ end
45
+ end
46
+ end
47
+ end
48
+
49
+ ProxyFetcher::Configuration.register_provider(:hide_my_name, ProxyFetcher::Providers::HideMyName)
data/lib/proxy_fetcher/proxy.rb CHANGED
@@ -1,10 +1,10 @@
1
1
  module ProxyFetcher
2
2
  class Proxy
3
3
  attr_reader :addr, :port, :country, :response_time,
4
- :connection_time, :speed, :type, :anonimity
4
+ :connection_time, :speed, :type, :anonymity
5
5
 
6
6
  def initialize(html_row)
7
- parse_row!(html_row)
7
+ ProxyFetcher.config.provider.parse_entry(html_row, self)
8
8
 
9
9
  self
10
10
  end
@@ -12,13 +12,13 @@ module ProxyFetcher
12
12
  def connectable?
13
13
  connection = Net::HTTP.new(addr, port)
14
14
  connection.use_ssl = true if https?
15
- connection.open_timeout = ProxyFetcher::Manager.config.open_timeout
16
- connection.read_timeout = ProxyFetcher::Manager.config.read_timeout
15
+ connection.open_timeout = ProxyFetcher.config.open_timeout
16
+ connection.read_timeout = ProxyFetcher.config.read_timeout
17
17
 
18
18
  connection.start { |http| return true if http.request_head('/') }
19
19
 
20
20
  false
21
- rescue Timeout::Error, Errno::ECONNREFUSED, Errno::ECONNRESET, Errno::ECONNABORTED
21
+ rescue Timeout::Error, Errno::ECONNREFUSED, Errno::ECONNRESET, Errno::ECONNABORTED, Errno::EOFError
22
22
  false
23
23
  end
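One point worth checking in the rescue list above: Ruby core defines `EOFError` at the top level (as a subclass of `IOError`), and `Errno::EOFError` does not appear to be a constant Ruby defines, so the clause as written may raise `NameError` when an exception reaches it. A minimal sketch of the same pattern with the top-level constant, where `reachable?` is a hypothetical wrapper and the block stands in for the `Net::HTTP` call:

```ruby
require 'timeout'

# Hypothetical wrapper: runs the block and maps common connection failures to false.
def reachable?
  result = yield
  result ? true : false
rescue Timeout::Error, Errno::ECONNREFUSED, Errno::ECONNRESET, Errno::ECONNABORTED, EOFError
  false
end

ok = reachable? { :response }
eof = reachable? { raise EOFError, 'end of file reached' }
refused = reachable? { raise Errno::ECONNREFUSED }
```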
24
24
 
@@ -51,60 +51,5 @@ module ProxyFetcher
51
51
  def url
52
52
  uri.to_s
53
53
  end
54
-
55
- private
56
-
57
- # HideMyAss proxy list rows parsing by columns
58
- def parse_row!(html)
59
- html.xpath('td').each_with_index do |td, index|
60
- case index
61
- when 1
62
- @addr = parse_addr(td)
63
- when 2 then
64
- @port = Integer(td.content.strip)
65
- when 3 then
66
- @country = td.content.strip
67
- when 4
68
- @response_time = parse_response_time(td)
69
- @speed = parse_indicator_value(td)
70
- when 5
71
- @connection_time = parse_indicator_value(td)
72
- when 6 then
73
- @type = td.content.strip
74
- when 7
75
- @anonymity = td.content.strip
76
- else
77
- # nothing
78
- end
79
- end
80
- end
81
-
82
- def parse_addr(html)
83
- good = []
84
- bytes = []
85
- css = html.at_xpath('span/style/text()').to_s
86
- css.split.each { |l| good << Regexp.last_match(1) if l =~ /\.(.+?)\{.*inline/ }
87
-
88
- html.xpath('span/span | span | span/text()').each do |span|
89
- if span.is_a?(Nokogiri::XML::Text)
90
- bytes << Regexp.last_match(1) if span.content.strip =~ /\.{0,1}(.+)\.{0,1}/
91
- elsif (span['style'] && span['style'] =~ /inline/) ||
92
- (span['class'] && good.include?(span['class'])) ||
93
- (span['class'] =~ /^[0-9]/)
94
-
95
- bytes << span.content
96
- end
97
- end
98
-
99
- bytes.join('.').gsub(/\.+/, '.')
100
- end
101
-
102
- def parse_response_time(html)
103
- Integer(html.at_xpath('div')['rel'])
104
- end
105
-
106
- def parse_indicator_value(html)
107
- Integer(html.at('.indicator').attr('style').match(/width: (\d+)%/i)[1])
108
- end
109
54
  end
110
55
  end