proxy_fetcher 0.3.1 → 0.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 27d97dfb3997e004b2fb1a07e935ba7a1132148d
4
- data.tar.gz: 8084082dc54f59c2c6bc8c0457b2ea766c479db0
3
+ metadata.gz: '0529f88d50000f7c2d3fad641372f5dcddfd40b3'
4
+ data.tar.gz: eab7966ac9aacf6cb62a8673280beed0fbb34330
5
5
  SHA512:
6
- metadata.gz: 44a592c9882b108ff9f78cc274c27d6e33e8460aa8b3599a0388abb75e787829565223fd4bdfe48a798ca8e153c79c9a3bd1db6184540bf47567b85fa5bf01a9
7
- data.tar.gz: 6bce6a4e8d0c197852edd8766c62a2e5850ab75e657f56b90f521307a98a3fa3995efd1219ff963648ad31dad2b5bc467a4c783887b724925e826393605cf36f
6
+ metadata.gz: 45964603f17dc94e09075a1fb2264bc35a56a0667bcbc03a2142156414a23f35ee2d98aad1326fd64c80be7db8a31f5103244852253b56f50c168bb33fac94c1
7
+ data.tar.gz: 88ef9ef4e52e31277d0881f0b14b8ebf3095af18b993bee6cf1f0941edd56251a5ee3ea71f7b8ce2f725a127c8c6ef6f9d2c1b5c290103d43879b621cdd1813b
@@ -3,6 +3,7 @@ LineLength:
3
3
  AllCops:
4
4
  Exclude:
5
5
  - 'spec/**/*'
6
+ - 'bin/*'
6
7
  DisplayCopNames: true
7
8
  Rails:
8
9
  Enabled: false
@@ -0,0 +1,83 @@
1
+ # Proxy Fetcher Changelog
2
+
3
+ Reverse Chronological Order:
4
+
5
+ ## `0.4.0` (2017-08-26)
6
+
7
+ * Support operations with multiple providers
8
+ * Refactor filtering
9
+ * Small bugfixes
10
+ * Documentation
11
+
12
+ ## `0.3.1` (2017-08-24)
13
+
14
+ * Remove speed from proxy (no need to)
15
+ * Extract proxy validation from the HTTPClient to separate class
16
+ * Make proxy validator configurable
17
+ * Refactor proxy validation behavior
18
+ * Refactor Proxy object (OpenStruct => PORO, url / uri methods, etc)
19
+ * Optimize proxy list check with threads
20
+ * Improve proxy_fetcher bin
21
+
22
+ ## `0.3.0` (2017-08-21)
23
+
24
+ * Proxy providers refactoring
25
+ * Proxy object refactoring
26
+ * Specs refactoring
27
+ * New providers
28
+ * Custom HTTP client
29
+ * Configuration improvements
30
+ * Proxy filters
31
+
32
+ ## `0.2.5` (2017-08-17)
33
+
34
+ * Configurable HTTPClient
35
+ * Fix errors handling
36
+
37
+ ## `0.2.3` (2017-08-10)
38
+
39
+ * Fix broken providers
40
+ * Add new providers
41
+ * Docs
42
+
43
+ ## `0.2.2` (2017-07-20)
44
+
45
+ * Code & specs refactoring
46
+
47
+ ## `0.2.1` (2017-07-19)
48
+
49
+ * New proxy providers
50
+ * Bugfixes
51
+
52
+ ## `0.2.0` (2017-07-17)
53
+
54
+ * New proxy providers
55
+ * Custom providers
56
+ * Network errors handling
57
+ * Specs refactoring
58
+
59
+ ## `0.1.4` (2017-05-31)
60
+
61
+ * Code & specs refactoring
62
+ * Add `speed` to `Proxy` object
63
+ * Docs
64
+
65
+ ## `0.1.3` (2017-05-25)
66
+
67
+ * Proxy list management with `ProxyFetcher::Manager`
68
+
69
+ ## `0.1.2` (2017-05-23)
70
+
71
+ * HTTPS processing
72
+ * `Proxy` object sugar
73
+ * Specs improvements
74
+ * Docs improvements
75
+
76
+ ## `0.1.1` (2017-05-22)
77
+
78
+ * Configuration (timeouts)
79
+ * Documentation
80
+
81
+ ## `0.1.0` (2017-05-19)
82
+
83
+ * Initial release
@@ -0,0 +1,46 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
6
+
7
+ ## Our Standards
8
+
9
+ Examples of behavior that contributes to creating a positive environment include:
10
+
11
+ * Using welcoming and inclusive language
12
+ * Being respectful of differing viewpoints and experiences
13
+ * Gracefully accepting constructive criticism
14
+ * Focusing on what is best for the community
15
+ * Showing empathy towards other community members
16
+
17
+ Examples of unacceptable behavior by participants include:
18
+
19
+ * The use of sexualized language or imagery and unwelcome sexual attention or advances
20
+ * Trolling, insulting/derogatory comments, and personal or political attacks
21
+ * Public or private harassment
22
+ * Publishing others' private information, such as a physical or electronic address, without explicit permission
23
+ * Other conduct which could reasonably be considered inappropriate in a professional setting
24
+
25
+ ## Our Responsibilities
26
+
27
+ Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
28
+
29
+ Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
30
+
31
+ ## Scope
32
+
33
+ This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
34
+
35
+ ## Enforcement
36
+
37
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at bulajnikita@gmail.com. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
38
+
39
+ Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
40
+
41
+ ## Attribution
42
+
43
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version]
44
+
45
+ [homepage]: http://contributor-covenant.org
46
+ [version]: http://contributor-covenant.org/version/1/4/
data/README.md CHANGED
@@ -6,20 +6,33 @@
6
6
  [![License](http://img.shields.io/badge/license-MIT-brightgreen.svg)](#license)
7
7
 
8
8
  This gem can help your Ruby application to make HTTP(S) requests from proxy by fetching and validating actual
9
- proxy lists from the different providers like [HideMyName](https://hidemy.name/en/).
9
+ proxy lists from multiple providers like [HideMyName](https://hidemy.name/en/).
10
10
 
11
- It gives you a `Manager` class that can load proxy list, validate it and return random or specific proxy entry. Take a look
11
+ It gives you a `Manager` class that can load proxy lists, validate them and return random or specific proxies. Take a look
12
12
  at the documentation below to find all the gem features.
13
13
 
14
- Also this gem can be used as standalone solution for downloading and validating proxy lists from the different providers.
15
- Checkout examples of usage below.
14
+ This gem can also be used with any other programming language (Go / Python / etc) as a standalone solution for downloading and
15
+ validating proxy lists from the different providers. [Check out the examples](#standalone) of usage below.
16
+
17
+ ## Table of Contents
18
+
19
+ - [Installation](#installation)
20
+ - [Example of usage](#example-of-usage)
21
+ - [In Ruby application](#in-ruby-application)
22
+ - [Standalone](#standalone)
23
+ - [Configuration](#configuration)
24
+ - [Proxy validation speed](#proxy-validation-speed)
25
+ - [Proxy object](#proxy-object)
26
+ - [Providers](#providers)
27
+ - [Contributing](#contributing)
28
+ - [License](#license)
16
29
 
17
30
  ## Installation
18
31
 
19
32
  If using bundler, first add 'proxy_fetcher' to your Gemfile:
20
33
 
21
34
  ```ruby
22
- gem 'proxy_fetcher', '~> 0.3'
35
+ gem 'proxy_fetcher', '~> 0.4'
23
36
  ```
24
37
 
25
38
  or if you want to use the latest version (from `master` branch), then:
@@ -37,7 +50,7 @@ bundle install
37
50
  Otherwise simply install the gem:
38
51
 
39
52
  ```sh
40
- gem install proxy_fetcher -v '0.3'
53
+ gem install proxy_fetcher -v '0.4'
41
54
  ```
42
55
 
43
56
  ## Example of usage
@@ -63,12 +76,17 @@ manager.proxies
63
76
  #=> []
64
77
  ```
65
78
 
66
- If you wanna clean current proxy list from some dead servers that does not respond to the requests, than you can just call `cleanup!` method:
79
+ If you want to clean the current proxy list of dead servers that do not respond to requests, then you can just call the `cleanup!` method:
67
80
 
68
81
  ```ruby
69
82
  manager.cleanup! # or manager.validate!
70
83
  ```
71
84
 
85
+ To speed up this operation, proxy list validation is performed using Ruby threads.
86
+ By default, the gem creates a pool of 10 threads, but you can increase this number by changing the
87
+ `pool_size` option in the config: `ProxyFetcher.config.pool_size = 50`. In that case ProxyFetcher will
88
+ process all the fetched proxies in groups of 50 threads.
89
+
72
90
  Get raw proxy URLs as Strings:
73
91
 
74
92
  ```ruby
@@ -88,33 +106,61 @@ manager.refresh_list! # or manager.fetch!
88
106
  # @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
89
107
  ```
90
108
 
91
- If you need to filter proxy list, for example, by country or response time and selected provider supports filtering by GET params, then you
92
- can pass your filters to the Manager instance like that:
109
+ You can use two methods to get the first proxy from the list:
110
+
111
+ * `get` or aliased `pop` (will return first proxy and move it to the end of the list)
112
+ * `get!` or aliased `pop!` (will return first **connectable** proxy and move it to the end of the list; all the proxies till the working one will be removed)
113
+
114
+ Or you can just get a random proxy by calling `manager.random_proxy` or its alias `manager.random`.
115
+
116
+ If you need to filter proxy list, for example, by country or response time and selected provider supports filtering with GET params,
117
+ then you can just pass your filters like a simple Ruby hash to the Manager instance:
93
118
 
94
119
  ```ruby
95
- ProxyFetcher.config.provider = :hide_my_name
120
+ ProxyFetcher.config.providers = :hide_my_name
96
121
 
97
- manager = ProxyFetcher::Manager.new(filters: { country: 'AO', maxtime: '500' })
122
+ manager = ProxyFetcher::Manager.new(filters: { country: 'PL', maxtime: '500' })
98
123
  manager.proxies
99
124
 
100
125
  # => [...]
101
126
  ```
102
127
 
103
- *NOTE*: not all the providers support filtering. Take a look at the provider class to see if it supports custom filters.
128
+ If you are using multiple providers, then you can split your filters by proxy provider names:
104
129
 
105
- You can use two methods to get the first proxy from the list:
130
+ ```ruby
131
+ ProxyFetcher.config.providers = [:hide_my_name, :xroxy]
132
+
133
+ manager = ProxyFetcher::Manager.new(filters: {
134
+ hide_my_name: {
135
+ country: 'PL',
136
+ maxtime: '500'
137
+ },
138
+ xroxy: {
139
+ type: 'All_http'
140
+ }
141
+ })
142
+
143
+ manager.proxies
106
144
 
107
- * `get` or aliased `pop` (will return first proxy and move it to the end of the list)
108
- * `get!` or aliased `pop!` (will return first **connectable** proxy and move it to the end of the list; all the proxies till the working one will be removed)
145
+ # => [...]
146
+ ```
109
147
 
110
- Or you can get just random proxy by calling `manager.random_proxy` or it's alias `manager.random`.
148
+ You can apply different filters every time you call the `#refresh_list!` (or `#fetch!`) method:
149
+
150
+ ```ruby
151
+ manager.refresh_list!(country: 'PL', maxtime: '500')
152
+
153
+ # => [...]
154
+ ```
155
+
156
+ *NOTE*: not all the providers support filtering. Take a look at the provider classes to see if they support custom filters.
111
157
 
112
158
  ### Standalone
113
159
 
114
- All you need to use this gem is Ruby >= 2.0 (2.3 is recommended). You can install it in a different ways. If you are using Ubuntu Xenial (16.04LTS)
160
+ All you need to use this gem is Ruby >= 2.0 (2.4 is recommended). You can install it in different ways. If you are using Ubuntu Xenial (16.04LTS)
115
161
  then you already have Ruby 2.3 installed. In other cases you can install it with [RVM](https://rvm.io/) or [rbenv](https://github.com/rbenv/rbenv).
116
162
 
117
- Just install the gem by running `gem install proxy_fetcher` in your terminal and run it:
163
+ After installing Ruby, just install the gem by running `gem install proxy_fetcher` in your terminal, and then you can run it:
118
164
 
119
165
  ```bash
120
166
  proxy_fetcher >> proxies.txt # Will download proxies from the default provider, validate them and write to file
@@ -142,27 +188,6 @@ To get all the possible options run:
142
188
  proxy_fetcher --help
143
189
  ```
144
190
 
145
- ## Proxy object
146
-
147
- Every proxy is a `ProxyFetcher::Proxy` object that has next readers (instance variables):
148
-
149
- * `addr` (IP address)
150
- * `port`
151
- * `type` (proxy type, can be HTTP, HTTPS, SOCKS4 or/and SOCKS5)
152
- * `country` (USA or Brazil for example)
153
- * `response_time` (5217 for example)
154
- * `anonymity` (`Low`, `Elite proxy` or `High +KA` for example)
155
-
156
- Also you can call next instance methods for every Proxy object:
157
-
158
- * `connectable?` (whether proxy server is available)
159
- * `http?` (whether proxy server has a HTTP protocol)
160
- * `https?` (whether proxy server has a HTTPS protocol)
161
- * `socks4?`
162
- * `socks5?`
163
- * `uri` (returns `URI::Generic` object)
164
- * `url` (returns a formatted URL like "_http://IP:PORT_" )
165
-
166
191
  ## Configuration
167
192
 
168
193
  To change open/read timeout for `cleanup!` and `connectable?` methods you need to change ProxyFetcher.config:
@@ -215,7 +240,7 @@ ProxyFetcher.config.proxy_validator = MyProxyValidator
215
240
  manager = ProxyFetcher::Manager.new
216
241
  manager.proxies
217
242
 
218
- #=> [#<ProxyFetcher::Proxy:0x00000002879680 @addr="97.77.104.22", @port=3128, @country="USA",
243
+ #=> [#<ProxyFetcher::Proxy:0x00000002879680 @addr="97.77.104.22", @port=3128, @country="USA",
219
244
  # @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
220
245
 
221
246
  manager.validate!
@@ -223,6 +248,48 @@ manager.validate!
223
248
  #=> [ ... ]
224
249
  ```
225
250
 
251
+ ### Proxy validation speed
252
+
253
+ There are some tricks to increase proxy list validation performance.
254
+
255
+ In a few words, ProxyFetcher gem uses threads to validate proxies for availability. Every proxy is checked in a
256
+ separate thread. By default, ProxyFetcher uses a pool with a maximum of 10 threads. You can increase this number by
257
+ setting max number of threads in the config:
258
+
259
+ ```ruby
260
+ ProxyFetcher.config.pool_size = 50
261
+ ```
262
+
263
+ You can experiment with the thread pool size to find the optimal maximum number of threads for your PC and OS.
264
+ This will definitely give you some performance improvements.
265
+
266
+ Moreover, the common proxy validation speed depends on `ProxyFetcher.config.connection_timeout` option that is equal
267
+ to `3` by default. It means that gem will wait 3 seconds for the server answer to check if particular proxy is connectable.
268
+ You can decrease this option to `1`, for example, and it will heavily increase proxy validation speed (**but remember**
269
+ that some proxies could be connectable, but slow, so with this option you will also clear the proxy list of proxies that
270
+ work, but very slowly).
271
+
272
+ ## Proxy object
273
+
274
+ Every proxy is a `ProxyFetcher::Proxy` object that has next readers (instance variables):
275
+
276
+ * `addr` (IP address)
277
+ * `port`
278
+ * `type` (proxy type, can be HTTP, HTTPS, SOCKS4 or/and SOCKS5)
279
+ * `country` (USA or Brazil for example)
280
+ * `response_time` (5217 for example)
281
+ * `anonymity` (`Low`, `Elite proxy` or `High +KA` for example)
282
+
283
+ Also you can call next instance methods for every Proxy object:
284
+
285
+ * `connectable?` (whether proxy server is available)
286
+ * `http?` (whether proxy server has a HTTP protocol)
287
+ * `https?` (whether proxy server has a HTTPS protocol)
288
+ * `socks4?`
289
+ * `socks5?`
290
+ * `uri` (returns `URI::Generic` object)
291
+ * `url` (returns a formatted URL like "_http://IP:PORT_" )
292
+
226
293
  ## Providers
227
294
 
228
295
  Currently ProxyFetcher can deal with next proxy providers (services):
@@ -234,7 +301,7 @@ Currently ProxyFetcher can deal with next proxy providers (services):
234
301
  * Proxy List
235
302
  * XRoxy
236
303
 
237
- If you wanna use one of them just setup required in the config:
304
+ If you want to use one of them, just set it in the config:
238
305
 
239
306
  ```ruby
240
307
  ProxyFetcher.config.provider = :free_proxy_list
@@ -244,7 +311,29 @@ manager.proxies
244
311
  #=> ...
245
312
  ```
246
313
 
247
- Also you can write your own provider. All you need is to create a class, that would be inherited from the
314
+ You can use multiple providers at the same time:
315
+
316
+ ```ruby
317
+ ProxyFetcher.config.providers = :free_proxy_list, :xroxy, :proxy_docker
318
+
319
+ manager = ProxyFetcher::Manager.new
320
+ manager.proxies
321
+ #=> ...
322
+ ```
323
+
324
+ If you want to use all the possible proxy providers then you can configure ProxyFetcher as follows:
325
+
326
+ ```ruby
327
+ ProxyFetcher.config.providers = ProxyFetcher::Configuration.registered_providers
328
+
329
+ manager = ProxyFetcher::Manager.new
330
+ manager.proxies
331
+
332
+ #=> [#<ProxyFetcher::Proxy:0x00000002879680 @addr="97.77.104.22", @port=3128, @country="USA",
333
+ # @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
334
+ ```
335
+
336
+ Moreover, you can write your own provider! All you need is to create a class, that would be inherited from the
248
337
  `ProxyFetcher::Providers::Base` class, and register your provider like this:
249
338
 
250
339
  ```ruby
@@ -17,17 +17,8 @@ OptionParser.new do |opts|
17
17
  exit(0)
18
18
  end
19
19
 
20
- opts.on('-p', '--provider=NAME', '# Use specific proxy provider') do |value|
21
- provider_name = value.downcase
22
-
23
- unless ProxyFetcher::Configuration.providers.include?(provider_name.to_sym)
24
- possible_providers = ProxyFetcher::Configuration.providers.keys
25
-
26
- puts "Unknown provider - '#{value}'.\nUse one of the following: #{possible_providers.join(', ')}."
27
- exit(0)
28
- end
29
-
30
- options[:provider] = provider_name
20
+ opts.on('-p', '--providers=NAME1,NAME2', Array, '# Use specific proxy providers') do |values|
21
+ options[:providers] = values
31
22
  end
32
23
 
33
24
  opts.on('-n', '--no-validate', '# Dump all the proxies without validation') do
@@ -49,7 +40,7 @@ OptionParser.new do |opts|
49
40
  end
50
41
  end.parse!
51
42
 
52
- ProxyFetcher.config.provider = options[:provider] if options[:provider]
43
+ ProxyFetcher.config.providers = options[:providers] if options[:providers]
53
44
  ProxyFetcher.config.connection_timeout = options[:timeout] if options[:timeout]
54
45
 
55
46
  manager = ProxyFetcher::Manager.new(filters: options[:filters])
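The `Array` argument passed to `opts.on` in the hunk above asks Ruby's standard `OptionParser` to split a comma-separated value into an array of strings. A minimal sketch of that behaviour, assuming nothing beyond the standard library; the `parse_providers` helper and the sample arguments are illustrative and not part of the gem:

```ruby
require 'optparse'

# Hypothetical helper (not part of proxy_fetcher): parses an argv-like array
# and returns the collected options hash.
def parse_providers(argv)
  options = {}

  OptionParser.new do |opts|
    # Passing Array as the argument type makes OptionParser split on commas.
    opts.on('-p', '--providers=NAME1,NAME2', Array, 'Use specific proxy providers') do |values|
      options[:providers] = values
    end
  end.parse!(argv.dup) # parse! mutates its argument, so work on a copy

  options
end

options = parse_providers(['--providers=hide_my_name,xroxy'])
p options[:providers] # => ["hide_my_name", "xroxy"]
```

The values arrive as strings rather than symbols, which is why the registry lookup elsewhere in this diff calls `to_sym` on the provider name.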
@@ -4,6 +4,7 @@ require 'nokogiri'
4
4
  require 'thread'
5
5
 
6
6
  require File.dirname(__FILE__) + '/proxy_fetcher/configuration'
7
+ require File.dirname(__FILE__) + '/proxy_fetcher/configuration/providers_registry'
7
8
  require File.dirname(__FILE__) + '/proxy_fetcher/proxy'
8
9
  require File.dirname(__FILE__) + '/proxy_fetcher/manager'
9
10
 
@@ -1,21 +1,21 @@
1
1
  module ProxyFetcher
2
2
  class Configuration
3
- UnknownProvider = Class.new(StandardError)
4
- RegisteredProvider = Class.new(StandardError)
5
3
  WrongCustomClass = Class.new(StandardError)
6
4
 
7
- attr_accessor :provider, :connection_timeout
8
- attr_accessor :http_client, :proxy_validator, :logger
5
+ attr_accessor :providers, :connection_timeout, :pool_size
6
+ attr_accessor :http_client, :proxy_validator
9
7
 
10
8
  class << self
11
- def providers
12
- @providers ||= {}
9
+ def providers_registry
10
+ @registry ||= ProvidersRegistry.new
13
11
  end
14
12
 
15
13
  def register_provider(name, klass)
16
- raise RegisteredProvider, "`#{name}` provider already registered!" if providers.key?(name.to_sym)
14
+ providers_registry.register(name, klass)
15
+ end
17
16
 
18
- providers[name.to_sym] = klass
17
+ def registered_providers
18
+ providers_registry.providers.keys
19
19
  end
20
20
  end
21
21
 
@@ -23,20 +23,23 @@ module ProxyFetcher
23
23
  reset!
24
24
  end
25
25
 
26
+ # Sets default configuration options
26
27
  def reset!
28
+ @pool_size = 10
27
29
  @connection_timeout = 3
28
30
  @http_client = HTTPClient
29
31
  @proxy_validator = ProxyValidator
30
32
 
31
- self.provider = :hide_my_name # currently default one
33
+ self.providers = [:hide_my_name] # currently default one
32
34
  end
33
35
 
34
- def provider=(name)
35
- @provider = self.class.providers[name.to_sym]
36
-
37
- raise UnknownProvider, "unregistered proxy provider `#{name}`!" if @provider.nil?
36
+ def providers=(value)
37
+ @providers = Array(value)
38
38
  end
39
39
 
40
+ alias provider providers
41
+ alias provider= providers=
42
+
40
43
  def http_client=(klass)
41
44
  @http_client = setup_custom_class(klass, required_methods: :fetch)
42
45
  end
@@ -47,6 +50,7 @@ module ProxyFetcher
47
50
 
48
51
  private
49
52
 
53
+ # Checks if custom class has some required class methods
50
54
  def setup_custom_class(klass, required_methods: [])
51
55
  unless klass.respond_to?(*required_methods)
52
56
  raise WrongCustomClass, "#{klass} must respond to [#{Array(required_methods).join(', ')}] class methods!"
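The new `providers=` writer relies on `Kernel#Array` to accept either a single name or a list. A small sketch of that normalization; `SketchConfig` is a hypothetical stand-in for `ProxyFetcher::Configuration` so that the example runs without the gem:

```ruby
# Hypothetical stand-in that reproduces only the providers= writer from the diff.
class SketchConfig
  attr_reader :providers

  def providers=(value)
    # Array() wraps a single value, keeps arrays as they are and maps nil to [].
    @providers = Array(value)
  end
end

config = SketchConfig.new

config.providers = :hide_my_name
p config.providers # => [:hide_my_name]

# With several right-hand values Ruby passes them to the writer as one Array,
# which is why the comma-separated form shown in the README works.
config.providers = :free_proxy_list, :xroxy
p config.providers # => [:free_proxy_list, :xroxy]
```

This is also what allows the `provider=` alias to keep accepting a single symbol, as in the older configuration examples.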
@@ -0,0 +1,29 @@
1
+ module ProxyFetcher
2
+ class ProvidersRegistry
3
+ UnknownProvider = Class.new(StandardError)
4
+ RegisteredProvider = Class.new(StandardError)
5
+
6
+ def providers
7
+ @providers ||= {}
8
+ end
9
+
10
+ # Add custom provider to common registry.
11
+ # Requires proxy provider name ('hide_my_name' for example) and a class
12
+ # that implements the parsing logic.
13
+ def register(name, klass)
14
+ raise RegisteredProvider, "`#{name}` provider already registered!" if providers.key?(name.to_sym)
15
+
16
+ providers[name.to_sym] = klass
17
+ end
18
+
19
+ # Returns a class for specific provider if it is
20
+ # registered in the registry. Otherwise throws an exception.
21
+ def class_for(provider_name)
22
+ provider_name = provider_name.to_sym
23
+
24
+ providers.fetch(provider_name)
25
+ rescue KeyError
26
+ raise UnknownProvider, "unregistered proxy provider `#{provider_name}`"
27
+ end
28
+ end
29
+ end
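To illustrate how the registry behaves from the caller's side, here is a self-contained sketch. `SketchRegistry` mirrors the class added above so the example runs without the gem, and `FakeProvider` is a made-up provider class; neither name exists in proxy_fetcher.

```ruby
# Stand-in that mirrors the ProvidersRegistry from the diff.
class SketchRegistry
  UnknownProvider = Class.new(StandardError)
  RegisteredProvider = Class.new(StandardError)

  def providers
    @providers ||= {}
  end

  def register(name, klass)
    raise RegisteredProvider, "`#{name}` provider already registered!" if providers.key?(name.to_sym)

    providers[name.to_sym] = klass
  end

  def class_for(provider_name)
    # Hash#fetch raises KeyError for a missing key, which is translated
    # into the registry's own error class.
    providers.fetch(provider_name.to_sym)
  rescue KeyError
    raise UnknownProvider, "unregistered proxy provider `#{provider_name}`"
  end
end

FakeProvider = Class.new # hypothetical provider class

registry = SketchRegistry.new
registry.register(:fake, FakeProvider)

registry.class_for('fake') # String names work too, thanks to #to_sym

begin
  registry.class_for(:missing)
rescue SketchRegistry::UnknownProvider => e
  puts e.message
end
```

One consequence visible in the spec changes further down: an unknown provider name is no longer rejected when the configuration is set, only when the proxy list is fetched.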
@@ -1,22 +1,29 @@
1
1
  module ProxyFetcher
2
2
  class Manager
3
- attr_reader :proxies, :filters
3
+ attr_reader :proxies
4
4
 
5
5
  # refresh: true - load proxy list from the remote server on initialization
6
6
  # refresh: false - just initialize the class, proxy list will be empty ([])
7
- def initialize(refresh: true, filters: {})
8
- @filters = filters
9
-
7
+ def initialize(refresh: true, validate: false, filters: {})
10
8
  if refresh
11
- refresh_list!
9
+ refresh_list!(filters)
12
10
  else
13
11
  @proxies = []
14
12
  end
13
+
14
+ cleanup! if validate
15
15
  end
16
16
 
17
17
  # Update current proxy list from the provider
18
- def refresh_list!
19
- @proxies = ProxyFetcher.config.provider.fetch_proxies!(filters)
18
+ def refresh_list!(filters = nil)
19
+ @proxies = []
20
+
21
+ ProxyFetcher.config.providers.each do |provider_name|
22
+ provider = ProxyFetcher::Configuration.providers_registry.class_for(provider_name)
23
+ provider_filters = filters && filters.fetch(provider_name.to_sym, filters)
24
+
25
+ @proxies.concat(provider.fetch_proxies!(provider_filters))
26
+ end
20
27
  end
21
28
 
22
29
  alias fetch! refresh_list!
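The per-provider filter lookup in `refresh_list!` can be tried in isolation. The sketch below copies only that expression into a hypothetical `filters_for` helper (not a method of the gem) and uses the filter values from the README examples:

```ruby
# Hypothetical helper mirroring the expression used in refresh_list!:
# use the provider's own sub-hash if present, else fall back to the whole hash.
def filters_for(provider_name, filters)
  filters && filters.fetch(provider_name.to_sym, filters)
end

flat  = { country: 'PL', maxtime: '500' }
split = { hide_my_name: { country: 'PL' }, xroxy: { type: 'All_http' } }

p filters_for(:xroxy, flat)       # flat filters are handed to every provider unchanged
p filters_for('xroxy', split)     # a provider-specific sub-hash is picked by name
p filters_for(:xroxy, nil)        # nil stays nil when no filters are given
p filters_for(:proxy_list, split) # provider absent from a split hash: the whole hash comes back
```

The last case suggests that, when filters are split by provider, it is safest to include a key for every configured provider, since a provider without its own entry appears to receive the entire nested hash.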
@@ -50,10 +57,10 @@ module ProxyFetcher
50
57
  alias pop! get!
51
58
 
52
59
  # Clean current proxy list from dead proxies (that doesn't respond by timeout)
53
- def cleanup!(pool_size = 10)
60
+ def cleanup!
54
61
  lock = Mutex.new
55
62
 
56
- proxies.dup.each_slice(pool_size) do |proxy_group|
63
+ proxies.dup.each_slice(ProxyFetcher.config.pool_size) do |proxy_group|
57
64
  threads = proxy_group.map do |group_proxy|
58
65
  Thread.new(group_proxy, proxies) do |proxy, proxies|
59
66
  lock.synchronize { proxies.delete(proxy) } unless proxy.connectable?
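The grouped, threaded validation can be sketched without any network access. In the example below `FakeProxy` is a hypothetical stand-in whose `connectable?` answer is a plain flag, the pool size is passed as an argument instead of being read from `ProxyFetcher.config.pool_size`, and the `join` step is an assumption, since the hunk above is truncated before that point.

```ruby
# Hypothetical stand-in for a proxy: connectivity is a flag, not a network check.
FakeProxy = Struct.new(:addr, :alive) do
  def connectable?
    alive
  end
end

# Removes non-connectable entries, checking at most pool_size entries at a time.
def cleanup!(proxies, pool_size)
  lock = Mutex.new

  # Iterate over a copy so that deleting from the original list is safe.
  proxies.dup.each_slice(pool_size) do |group|
    threads = group.map do |proxy|
      Thread.new(proxy) do |candidate|
        # The list is shared between threads, so deletion is guarded by a mutex.
        lock.synchronize { proxies.delete(candidate) } unless candidate.connectable?
      end
    end

    threads.each(&:join) # wait for the whole group before starting the next one
  end

  proxies
end

list = [
  FakeProxy.new('10.0.0.1', true),
  FakeProxy.new('10.0.0.2', false),
  FakeProxy.new('10.0.0.3', true),
  FakeProxy.new('10.0.0.4', false),
  FakeProxy.new('10.0.0.5', true)
]

p cleanup!(list, 2).map(&:addr) # => ["10.0.0.1", "10.0.0.3", "10.0.0.5"]
```

Because each group is joined before the next one starts, total time is roughly the number of groups multiplied by the slowest check in each group, which is why both the pool size and the connection timeout affect validation speed.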
@@ -7,8 +7,8 @@ module ProxyFetcher
7
7
 
8
8
  def_delegators ProxyFetcher::HTML, :clear, :convert_to_int
9
9
 
10
- attr_reader :proxy
11
-
10
+ # Loads proxy provider page content, extract proxy list from it
11
+ # and convert every entry to proxy object.
12
12
  def fetch_proxies!(filters = {})
13
13
  load_proxy_list(filters).map { |html| to_proxy(html) }
14
14
  end
@@ -23,8 +23,10 @@ module ProxyFetcher
23
23
 
24
24
  # Loads HTML document with Nokogiri by the URL combined with custom filters
25
25
  def load_document(url, filters = {})
26
+ raise ArgumentError, 'filters must be a Hash' if filters && !filters.is_a?(Hash)
27
+
26
28
  uri = URI.parse(url)
27
- uri.query = URI.encode_www_form(filters) if filters.any?
29
+ uri.query = URI.encode_www_form(filters) if filters && filters.any?
28
30
 
29
31
  Nokogiri::HTML(ProxyFetcher.config.http_client.fetch(uri.to_s))
30
32
  end
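The URL handling in `load_document` can be exercised on its own. A minimal sketch that copies the guard and the query building into a hypothetical `url_with_filters` helper and leaves out the HTTP request and the Nokogiri parsing; the example URL is a placeholder:

```ruby
require 'uri'

# Hypothetical helper mirroring load_document's URL handling only.
def url_with_filters(url, filters = {})
  raise ArgumentError, 'filters must be a Hash' if filters && !filters.is_a?(Hash)

  uri = URI.parse(url)
  # nil and empty filters leave the URL untouched.
  uri.query = URI.encode_www_form(filters) if filters && filters.any?
  uri.to_s
end

puts url_with_filters('http://example.com/proxy-list', { country: 'PL', maxtime: '500' })
# prints http://example.com/proxy-list?country=PL&maxtime=500

puts url_with_filters('http://example.com/proxy-list', nil)
# prints http://example.com/proxy-list
```

The added `filters &&` guard matters because `refresh_list!` now defaults `filters` to `nil`, and calling `any?` on `nil` would raise `NoMethodError`.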
@@ -11,12 +11,18 @@ module ProxyFetcher
11
11
 
12
12
  TYPES.each do |proxy_type|
13
13
  define_method "#{proxy_type.downcase}?" do
14
- type && type.upcase.include?(proxy_type)
14
+ !type.nil? && type.upcase.include?(proxy_type)
15
15
  end
16
16
  end
17
17
 
18
18
  alias ssl? https?
19
19
 
20
+ def initialize(attributes = {})
21
+ attributes.each do |attr, value|
22
+ public_send("#{attr}=", value)
23
+ end
24
+ end
25
+
20
26
  def connectable?
21
27
  ProxyFetcher.config.proxy_validator.connectable?(addr, port)
22
28
  end
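The attribute-hash initializer and the generated type predicates can be tried with a small stand-in class. `SketchProxy` is hypothetical, and its `TYPES` list is assumed from the proxy types named in the README, since the constant's definition is not part of this hunk.

```ruby
# Hypothetical stand-in for ProxyFetcher::Proxy with a reduced attribute set.
class SketchProxy
  TYPES = %w[HTTP HTTPS SOCKS4 SOCKS5].freeze # assumed list

  attr_accessor :addr, :port, :type

  # Generates http?, https?, socks4? and socks5? predicates.
  TYPES.each do |proxy_type|
    define_method "#{proxy_type.downcase}?" do
      # !type.nil? makes the predicate return false instead of nil for a missing type.
      !type.nil? && type.upcase.include?(proxy_type)
    end
  end

  # Assigns each attribute through its public writer,
  # so an unknown key raises NoMethodError.
  def initialize(attributes = {})
    attributes.each do |attr, value|
      public_send("#{attr}=", value)
    end
  end
end

proxy = SketchProxy.new(addr: '192.168.1.1', port: 8080, type: 'HTTPS')

p proxy.https?          # => true
p proxy.http?           # => true, because "HTTPS" contains "HTTP"
p SketchProxy.new.http? # => false
```

The substring match means an HTTPS proxy also answers `true` to `http?`, which is consistent with the README describing `type` as possibly holding several protocols at once.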
@@ -7,9 +7,9 @@ module ProxyFetcher
7
7
  # Major version number
8
8
  MAJOR = 0
9
9
  # Minor version number
10
- MINOR = 3
10
+ MINOR = 4
11
11
  # Smallest version number
12
- TINY = 1
12
+ TINY = 0
13
13
 
14
14
  # Full version number
15
15
  STRING = [MAJOR, MINOR, TINY].compact.join('.')
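As a quick check of how the new version string relates to the `~> 0.4` constraint recommended in the README, the sketch below rebuilds the string the same way and tests it with RubyGems' own requirement class. The top-level constants are local copies for illustration, not the gem's module.

```ruby
require 'rubygems'

# Local copies of the version segments from the diff.
MAJOR = 0
MINOR = 4
TINY  = 0

# compact would drop a nil segment before joining.
STRING = [MAJOR, MINOR, TINY].compact.join('.')

requirement = Gem::Requirement.new('~> 0.4')

p STRING                                                # => "0.4.0"
p requirement.satisfied_by?(Gem::Version.new(STRING))   # => true
p requirement.satisfied_by?(Gem::Version.new('0.3.1'))  # => false
p requirement.satisfied_by?(Gem::Version.new('1.0.0'))  # => false
```

In other words, `~> 0.4` accepts any 0.x release from 0.4 upwards but neither the previous 0.3.1 nor a future 1.0.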
@@ -5,7 +5,7 @@ require 'proxy_fetcher/version'
5
5
  Gem::Specification.new do |gem|
6
6
  gem.name = 'proxy_fetcher'
7
7
  gem.version = ProxyFetcher.gem_version
8
- gem.date = '2017-08-21'
8
+ gem.date = '2017-08-28'
9
9
  gem.summary = 'Ruby gem for dealing with proxy lists from different providers'
10
10
  gem.description = 'This gem can help your Ruby application to make HTTP(S) requests ' \
11
11
  'from proxy server by fetching and validating proxy lists from the different providers.'
@@ -43,21 +43,16 @@ describe ProxyFetcher::Configuration do
43
43
  end
44
44
 
45
45
  context 'custom provider' do
46
- it 'successfully setups if provider class registered' do
47
- CustomProvider = Class.new(ProxyFetcher::Providers::Base)
48
- ProxyFetcher::Configuration.register_provider(:custom_provider, CustomProvider)
49
-
50
- expect { ProxyFetcher.config.provider = :custom_provider }.not_to raise_error
46
+ it 'failed on registration if provider class already registered' do
47
+ expect { ProxyFetcher::Configuration.register_provider(:xroxy, Class.new) }
48
+ .to raise_error(ProxyFetcher::ProvidersRegistry::RegisteredProvider)
51
49
  end
52
50
 
53
- it 'failed on setup if provider class is not registered' do
54
- expect { ProxyFetcher.config.provider = :unexisting_provider }
55
- .to raise_error(ProxyFetcher::Configuration::UnknownProvider)
56
- end
51
+ it "fails on proxy list fetching if provider isn't registered" do
52
+ ProxyFetcher.config.provider = :not_existing_provider
57
53
 
58
- it 'failed on setup if provider class already registered' do
59
- expect { ProxyFetcher::Configuration.register_provider(:xroxy, Class.new)}
60
- .to raise_error(ProxyFetcher::Configuration::RegisteredProvider)
54
+ expect { ProxyFetcher::Manager.new }
55
+ .to raise_error(ProxyFetcher::ProvidersRegistry::UnknownProvider)
61
56
  end
62
57
  end
63
58
  end
@@ -0,0 +1,21 @@
1
+ require 'spec_helper'
2
+
3
+ describe 'Multiple proxy providers' do
4
+ before { ProxyFetcher.config.reset! }
5
+ after { ProxyFetcher.config.reset! }
6
+
7
+ it 'combine proxies from multiple providers' do
8
+ proxy_stub = ProxyFetcher::Proxy.new(addr: '192.168.1.1', port: 8080)
9
+
10
+ # Each proxy provider will return 2 proxies
11
+ ProxyFetcher::Configuration.providers_registry.providers.each do |_name, klass|
12
+ allow_any_instance_of(klass).to receive(:load_proxy_list).and_return([1, 2])
13
+ allow_any_instance_of(klass).to receive(:to_proxy).and_return(proxy_stub)
14
+ end
15
+
16
+ all_providers = ProxyFetcher::Configuration.registered_providers
17
+ ProxyFetcher.config.providers = all_providers
18
+
19
+ expect(ProxyFetcher::Manager.new.proxies.size).to eq(all_providers.size * 2)
20
+ end
21
+ end
@@ -11,6 +11,15 @@ describe ProxyFetcher::Proxy do
11
11
 
12
12
  let(:proxy) { @manager.proxies.first.dup }
13
13
 
14
+ it 'can initialize a new proxy object' do
15
+ proxy = described_class.new(addr: '192.169.1.1', port: 8080, type: 'HTTP')
16
+
17
+ expect(proxy).not_to be_nil
18
+ expect(proxy.addr).to eq('192.169.1.1')
19
+ expect(proxy.port).to eq(8080)
20
+ expect(proxy.type).to eq('HTTP')
21
+ end
22
+
14
23
  it 'checks schema' do
15
24
  proxy.type = ProxyFetcher::Proxy::HTTP
16
25
  expect(proxy.http?).to be_truthy
@@ -9,9 +9,18 @@ RSpec.shared_examples 'a manager' do
9
9
  expect(manager.proxies).to be_empty
10
10
  end
11
11
 
12
- it 'returns Proxy objects' do
12
+ it 'returns valid Proxy objects' do
13
13
  manager = ProxyFetcher::Manager.new
14
14
  expect(manager.proxies).to all(be_a(ProxyFetcher::Proxy))
15
+
16
+ manager.proxies.each do |proxy|
17
+ expect(proxy.addr).to match(/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/i)
18
+ expect(proxy.port).to be_a_kind_of(Numeric)
19
+ expect(proxy.type).not_to be_empty
20
+ expect(proxy.country).not_to be_empty
21
+ expect(proxy.anonymity).not_to be_empty
22
+ expect(proxy.response_time).to be_nil.or(be_a_kind_of(Numeric))
23
+ end
15
24
  end
16
25
 
17
26
  it 'returns raw proxies (HOST:PORT)' do
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: proxy_fetcher
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.1
4
+ version: 0.4.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Nikita Bulai
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2017-08-21 00:00:00.000000000 Z
11
+ date: 2017-08-28 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: nokogiri
@@ -55,6 +55,8 @@ files:
55
55
  - ".gitignore"
56
56
  - ".rubocop.yml"
57
57
  - ".travis.yml"
58
+ - CHANGELOG.md
59
+ - CODE_OF_CONDUCT.md
58
60
  - Gemfile
59
61
  - LICENSE
60
62
  - README.md
@@ -62,6 +64,7 @@ files:
62
64
  - bin/proxy_fetcher
63
65
  - lib/proxy_fetcher.rb
64
66
  - lib/proxy_fetcher/configuration.rb
67
+ - lib/proxy_fetcher/configuration/providers_registry.rb
65
68
  - lib/proxy_fetcher/manager.rb
66
69
  - lib/proxy_fetcher/providers/base.rb
67
70
  - lib/proxy_fetcher/providers/free_proxy_list.rb
@@ -81,6 +84,7 @@ files:
81
84
  - spec/proxy_fetcher/providers/free_proxy_list_spec.rb
82
85
  - spec/proxy_fetcher/providers/free_proxy_list_ssl_spec.rb
83
86
  - spec/proxy_fetcher/providers/hide_my_name_spec.rb
87
+ - spec/proxy_fetcher/providers/multiple_providers_spec.rb
84
88
  - spec/proxy_fetcher/providers/proxy_docker_spec.rb
85
89
  - spec/proxy_fetcher/providers/proxy_list_spec.rb
86
90
  - spec/proxy_fetcher/providers/xroxy_spec.rb