proxy_fetcher 0.3.1 → 0.4.0
- checksums.yaml +4 -4
- data/.rubocop.yml +1 -0
- data/CHANGELOG.md +83 -0
- data/CODE_OF_CONDUCT.md +46 -0
- data/README.md +131 -42
- data/bin/proxy_fetcher +3 -12
- data/lib/proxy_fetcher.rb +1 -0
- data/lib/proxy_fetcher/configuration.rb +17 -13
- data/lib/proxy_fetcher/configuration/providers_registry.rb +29 -0
- data/lib/proxy_fetcher/manager.rb +16 -9
- data/lib/proxy_fetcher/providers/base.rb +5 -3
- data/lib/proxy_fetcher/proxy.rb +7 -1
- data/lib/proxy_fetcher/version.rb +2 -2
- data/proxy_fetcher.gemspec +1 -1
- data/spec/proxy_fetcher/configuration_spec.rb +7 -12
- data/spec/proxy_fetcher/providers/multiple_providers_spec.rb +21 -0
- data/spec/proxy_fetcher/proxy_spec.rb +9 -0
- data/spec/support/manager_examples.rb +10 -1
- metadata +6 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: '0529f88d50000f7c2d3fad641372f5dcddfd40b3'
+  data.tar.gz: eab7966ac9aacf6cb62a8673280beed0fbb34330
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 45964603f17dc94e09075a1fb2264bc35a56a0667bcbc03a2142156414a23f35ee2d98aad1326fd64c80be7db8a31f5103244852253b56f50c168bb33fac94c1
+  data.tar.gz: 88ef9ef4e52e31277d0881f0b14b8ebf3095af18b993bee6cf1f0941edd56251a5ee3ea71f7b8ce2f725a127c8c6ef6f9d2c1b5c290103d43879b621cdd1813b
data/.rubocop.yml
CHANGED
data/CHANGELOG.md
ADDED
@@ -0,0 +1,83 @@
+# Proxy Fetcher Changelog
+
+Reverse Chronological Order:
+
+## `0.4.0` (2017-08-26)
+
+* Support operations with multiple providers
+* Refactor filtering
+* Small bugfixes
+* Documentation
+
+## `0.3.1` (2017-08-24)
+
+* Remove speed from proxy (no need to)
+* Extract proxy validation from the HTTPClient to separate class
+* Make proxy validator configurable
+* Refactor proxy validation behavior
+* Refactor Proxy object (OpenStruct => PORO, url / uri methods, etc)
+* Optimize proxy list check with threads
+* Improve proxy_fetcher bin
+
+## `0.3.0` (2017-08-21)
+
+* Proxy providers refactoring
+* Proxy object refactoring
+* Specs refactoring
+* New providers
+* Custom HTTP client
+* Configuration improvements
+* Proxy filters
+
+## `0.2.5` (2017-08-17)
+
+* Configurable HTTPClient
+* Fix errors handling
+
+## `0.2.3` (2017-08-10)
+
+* Fix broken providers
+* Add new providers
+* Docs
+
+## `0.2.2` (2017-07-20)
+
+* Code & specs refactoring
+
+## `0.2.1` (2017-07-19)
+
+* New proxy providers
+* Bugfixes
+
+## `0.2.0` (2017-07-17)
+
+* New proxy providers
+* Custom providers
+* Network errors handling
+* Specs refactorirng
+
+## `0.1.4` (2017-05-31)
+
+* Code & specs refactoring
+* Add `speed` to `Proxy` object
+* Docs
+
+## `0.1.3` (2017-05-25)
+
+* Proxy list management with `ProxyFetcher::Manager`
+
+## `0.1.2` (2017-05-23)
+
+* HTTPS proccesing
+* `Proxy` object sugar
+* Specs improvements
+* Docs improvements
+
+## `0.1.1` (2017-05-22)
+
+* Configuration (timeouts)
+* Documentation
+
+## `0.1.0` (2017-05-19)
+
+* Initial release
data/CODE_OF_CONDUCT.md
ADDED
@@ -0,0 +1,46 @@
+# Contributor Covenant Code of Conduct
+
+## Our Pledge
+
+In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
+
+## Our Standards
+
+Examples of behavior that contributes to creating a positive environment include:
+
+* Using welcoming and inclusive language
+* Being respectful of differing viewpoints and experiences
+* Gracefully accepting constructive criticism
+* Focusing on what is best for the community
+* Showing empathy towards other community members
+
+Examples of unacceptable behavior by participants include:
+
+* The use of sexualized language or imagery and unwelcome sexual attention or advances
+* Trolling, insulting/derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or electronic address, without explicit permission
+* Other conduct which could reasonably be considered inappropriate in a professional setting
+
+## Our Responsibilities
+
+Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
+
+Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
+
+## Scope
+
+This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at bulajnikita@gmail.com. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
+
+Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version]
+
+[homepage]: http://contributor-covenant.org
+[version]: http://contributor-covenant.org/version/1/4/
data/README.md
CHANGED
@@ -6,20 +6,33 @@
 [![License](http://img.shields.io/badge/license-MIT-brightgreen.svg)](#license)

 This gem can help your Ruby application to make HTTP(S) requests from proxy by fetching and validating actual
-proxy lists from
+proxy lists from multiple providers like [HideMyName](https://hidemy.name/en/).

-It gives you a `Manager` class that can load proxy
+It gives you a `Manager` class that can load proxy lists, validate them and return random or specific proxies. Take a look
 at the documentation below to find all the gem features.

-Also this gem can be used
-Checkout examples of usage below.
+Also this gem can be used with any other programming language (Go / Python / etc) as standalone solution for downloading and
+validating proxy lists from the different providers. [Checkout examples](#standalone) of usage below.
+
+## Table of Contents
+
+- [Installation](#installation)
+- [Example of usage](#example-of-usage)
+- [In Ruby application](#in-ruby-application)
+- [Standalone](#standalone)
+- [Configuration](#configuration)
+- [Proxy validation speed](#proxy-validation-speed)
+- [Proxy object](#proxy-object)
+- [Providers](#providers)
+- [Contributing](#contributing)
+- [License](#license)

 ## Installation

 If using bundler, first add 'proxy_fetcher' to your Gemfile:

 ```ruby
-gem 'proxy_fetcher', '~> 0.
+gem 'proxy_fetcher', '~> 0.4'
 ```

 or if you want to use the latest version (from `master` branch), then:
@@ -37,7 +50,7 @@ bundle install
 Otherwise simply install the gem:

 ```sh
-gem install proxy_fetcher -v '0.
+gem install proxy_fetcher -v '0.4'
 ```

 ## Example of usage
@@ -63,12 +76,17 @@ manager.proxies
 #=> []
 ```

-If you
+If you want to clean current proxy list from the dead servers that does not respond to the requests, than you can just call `cleanup!` method:

 ```ruby
 manager.cleanup! # or manager.validate!
 ```

+In order to increase the speed of this operation proxy list validation is performed using Ruby threads.
+By default, gem creates a pool with 10 threads, but you can increase this number by passing threads pool
+size to the `#cleanup!` (or `#validate!`) method: `manager.validate!(50)`. In that case ProxyFetcher will
+process all the fetched proxies in group of 50 threads.
+
 Get raw proxy URLs as Strings:

 ```ruby
@@ -88,33 +106,61 @@ manager.refresh_list! # or manager.fetch!
 # @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
 ```

-
-
+You can use two methods to get the first proxy from the list:
+
+* `get` or aliased `pop` (will return first proxy and move it to the end of the list)
+* `get!` or aliased `pop!` (will return first **connectable** proxy and move it to the end of the list; all the proxies till the working one will be removed)
+
+Or you can get just random proxy by calling `manager.random_proxy` or it's alias `manager.random`.
+
+If you need to filter proxy list, for example, by country or response time and selected provider supports filtering with GET params,
+then you can just pass your filters like a simple Ruby hash to the Manager instance:

 ```ruby
-ProxyFetcher.config.
+ProxyFetcher.config.providers = :hide_my_name

-manager = ProxyFetcher::Manager.new(filters: { country: '
+manager = ProxyFetcher::Manager.new(filters: { country: 'PL', maxtime: '500' })
 manager.proxies

 # => [...]
 ```

-
+If you are using multiple providers, then you can split your filters by proxy provider names:

-
+```ruby
+ProxyFetcher.config.providers = [:hide_my_name, :xroxy]
+
+manager = ProxyFetcher::Manager.new(filters: {
+  hide_my_name: {
+    country: 'PL',
+    maxtime: '500'
+  },
+  xroxy: {
+    type: 'All_http'
+  }
+})
+
+manager.proxies

-
-
+# => [...]
+```

-
+You can apply different filters every time you calling `#refresh_list!` (or `#fetch!`) method:
+
+```ruby
+manager.refresh_list!(country: 'PL', maxtime: '500')
+
+# => [...]
+```
+
+*NOTE*: not all the providers support filtering. Take a look at the provider classes to see if it supports custom filters.

 ### Standalone

-All you need to use this gem is Ruby >= 2.0 (2.
+All you need to use this gem is Ruby >= 2.0 (2.4 is recommended). You can install it in a different ways. If you are using Ubuntu Xenial (16.04LTS)
 then you already have Ruby 2.3 installed. In other cases you can install it with [RVM](https://rvm.io/) or [rbenv](https://github.com/rbenv/rbenv).

-
+After installing Ruby just bundle the gem by running `gem install proxy_fetcher` in your terminal and now you can run it:

 ```bash
 proxy_fetcher >> proxies.txt # Will download proxies from the default provider, validate them and write to file
@@ -142,27 +188,6 @@ To get all the possible options run:
 proxy_fetcher --help
 ```

-## Proxy object
-
-Every proxy is a `ProxyFetcher::Proxy` object that has next readers (instance variables):
-
-* `addr` (IP address)
-* `port`
-* `type` (proxy type, can be HTTP, HTTPS, SOCKS4 or/and SOCKS5)
-* `country` (USA or Brazil for example)
-* `response_time` (5217 for example)
-* `anonymity` (`Low`, `Elite proxy` or `High +KA` for example)
-
-Also you can call next instance methods for every Proxy object:
-
-* `connectable?` (whether proxy server is available)
-* `http?` (whether proxy server has a HTTP protocol)
-* `https?` (whether proxy server has a HTTPS protocol)
-* `socks4?`
-* `socks5?`
-* `uri` (returns `URI::Generic` object)
-* `url` (returns a formatted URL like "_http://IP:PORT_" )
-
 ## Configuration

 To change open/read timeout for `cleanup!` and `connectable?` methods you need to change ProxyFetcher.config:
@@ -215,7 +240,7 @@ ProxyFetcher.config.proxy_validator = MyProxyValidator
 manager = ProxyFetcher::Manager.new
 manager.proxies

-#=> [#<ProxyFetcher::Proxy:0x00000002879680 @addr="97.77.104.22", @port=3128, @country="USA",
+#=> [#<ProxyFetcher::Proxy:0x00000002879680 @addr="97.77.104.22", @port=3128, @country="USA",
 # @response_time=5217, @type="HTTP", @anonymity="High">, ... ]

 manager.validate!
@@ -223,6 +248,48 @@ manager.validate!
 #=> [ ... ]
 ```

+### Proxy validation speed
+
+There are some tricks to increase proxy list validation performance.
+
+In a few words, ProxyFetcher gem uses threads to validate proxies for availability. Every proxy is checked in a
+separate thread. By default, ProxyFetcher uses a pool with a maximum of 10 threads. You can increase this number by
+setting max number of threads in the config:
+
+```ruby
+ProxyFetcher.config.pool_size = 50
+```
+
+You can experiment with the threads pool size to find an optimal number of maximum threads count for you PC and OS.
+This will definitely give you some performance improvements.
+
+Moreover, the common proxy validation speed depends on `ProxyFetcher.config.connection_timeout` option that is equal
+to `3` by default. It means that gem will wait 3 seconds for the server answer to check if particular proxy is connectable.
+You can decrease this option to `1`, for example, and it will heavily increase proxy validation speed (**but remember**
+that some proxies could be connectable, but slow, so with this option you will clear proxy list from the proxies that
+works, but very slow).
+
+## Proxy object
+
+Every proxy is a `ProxyFetcher::Proxy` object that has next readers (instance variables):
+
+* `addr` (IP address)
+* `port`
+* `type` (proxy type, can be HTTP, HTTPS, SOCKS4 or/and SOCKS5)
+* `country` (USA or Brazil for example)
+* `response_time` (5217 for example)
+* `anonymity` (`Low`, `Elite proxy` or `High +KA` for example)
+
+Also you can call next instance methods for every Proxy object:
+
+* `connectable?` (whether proxy server is available)
+* `http?` (whether proxy server has a HTTP protocol)
+* `https?` (whether proxy server has a HTTPS protocol)
+* `socks4?`
+* `socks5?`
+* `uri` (returns `URI::Generic` object)
+* `url` (returns a formatted URL like "_http://IP:PORT_" )
+
 ## Providers

 Currently ProxyFetcher can deal with next proxy providers (services):
@@ -234,7 +301,7 @@ Currently ProxyFetcher can deal with next proxy providers (services):
 * Proxy List
 * XRoxy

-If you wanna use one of them just setup
+If you wanna use one of them just setup it in the config:

 ```ruby
 ProxyFetcher.config.provider = :free_proxy_list
@@ -244,7 +311,29 @@ manager.proxies
 #=> ...
 ```

-
+You can use multiple providers at the same time:
+
+```ruby
+ProxyFetcher.config.providers = :free_proxy_list, :xroxy, :proxy_docker
+
+manager = ProxyFetcher::Manager.new
+manager.proxies
+#=> ...
+```
+
+If you want to use all the possible proxy providers then you can configure ProxyFetcher as follows:
+
+```ruby
+ProxyFetcher.config.providers = ProxyFetcher::Configuration.registered_providers
+
+manager = ProxyFetcher::Manager.new.proxies
+manager.proxies
+
+#=> [#<ProxyFetcher::Proxy:0x00000002879680 @addr="97.77.104.22", @port=3128, @country="USA",
+# @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
+```
+
+Moreover, you can write your own provider! All you need is to create a class, that would be inherited from the
 `ProxyFetcher::Providers::Base` class, and register your provider like this:

 ```ruby
data/bin/proxy_fetcher
CHANGED
@@ -17,17 +17,8 @@ OptionParser.new do |opts|
     exit(0)
   end

-  opts.on('-p', '--
-
-
-    unless ProxyFetcher::Configuration.providers.include?(provider_name.to_sym)
-      possible_providers = ProxyFetcher::Configuration.providers.keys
-
-      puts "Unknown provider - '#{value}'.\nUse one of the following: #{possible_providers.join(', ')}."
-      exit(0)
-    end
-
-    options[:provider] = provider_name
+  opts.on('-p', '--providers=NAME1,NAME2', Array, '# Use specific proxy providers') do |values|
+    options[:providers] = values
   end

   opts.on('-n', '--no-validate', '# Dump all the proxies without validation') do
@@ -49,7 +40,7 @@ OptionParser.new do |opts|
   end
 end.parse!

-ProxyFetcher.config.
+ProxyFetcher.config.providers = options[:providers] if options[:providers]
 ProxyFetcher.config.connection_timeout = options[:timeout] if options[:timeout]

 manager = ProxyFetcher::Manager.new(filters: options[:filters])
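The new `-p/--providers` switch in the bin/proxy_fetcher hunk relies on OptionParser's built-in `Array` acceptor, which splits a comma-separated value into an Array of Strings. A minimal standalone sketch of just that parsing step; `parse_providers` is a hypothetical helper, not part of the gem:

```ruby
require 'optparse'

# Hypothetical helper that mirrors only the '-p/--providers' switch from
# bin/proxy_fetcher. The Array acceptor turns "a,b" into ["a", "b"].
def parse_providers(argv)
  options = {}

  OptionParser.new do |opts|
    opts.on('-p', '--providers=NAME1,NAME2', Array, '# Use specific proxy providers') do |values|
      options[:providers] = values
    end
  end.parse!(argv)

  options
end

options = parse_providers(['--providers=hide_my_name,xroxy'])
p options[:providers] # => ["hide_my_name", "xroxy"]
```

The names arrive as Strings. Judging by the diff this is harmless, because `ProvidersRegistry#class_for` calls `to_sym` before the lookup.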
data/lib/proxy_fetcher.rb
CHANGED
@@ -4,6 +4,7 @@ require 'nokogiri'
 require 'thread'

 require File.dirname(__FILE__) + '/proxy_fetcher/configuration'
+require File.dirname(__FILE__) + '/proxy_fetcher/configuration/providers_registry'
 require File.dirname(__FILE__) + '/proxy_fetcher/proxy'
 require File.dirname(__FILE__) + '/proxy_fetcher/manager'

data/lib/proxy_fetcher/configuration.rb
CHANGED
@@ -1,21 +1,21 @@
 module ProxyFetcher
   class Configuration
-    UnknownProvider = Class.new(StandardError)
-    RegisteredProvider = Class.new(StandardError)
     WrongCustomClass = Class.new(StandardError)

-    attr_accessor :
-    attr_accessor :http_client, :proxy_validator
+    attr_accessor :providers, :connection_timeout, :pool_size
+    attr_accessor :http_client, :proxy_validator

     class << self
-      def
-        @
+      def providers_registry
+        @registry ||= ProvidersRegistry.new
       end

       def register_provider(name, klass)
-
+        providers_registry.register(name, klass)
+      end

-
+      def registered_providers
+        providers_registry.providers.keys
       end
     end

@@ -23,20 +23,23 @@ module ProxyFetcher
       reset!
     end

+    # Sets default configuration options
     def reset!
+      @pool_size = 10
       @connection_timeout = 3
       @http_client = HTTPClient
       @proxy_validator = ProxyValidator

-      self.
+      self.providers = [:hide_my_name] # currently default one
     end

-    def
-      @
-
-      raise UnknownProvider, "unregistered proxy provider `#{name}`!" if @provider.nil?
+    def providers=(value)
+      @providers = Array(value)
     end

+    alias provider providers
+    alias provider= providers=
+
     def http_client=(klass)
       @http_client = setup_custom_class(klass, required_methods: :fetch)
     end
@@ -47,6 +50,7 @@ module ProxyFetcher

     private

+    # Checks if custom class has some required class methods
     def setup_custom_class(klass, required_methods: [])
       unless klass.respond_to?(*required_methods)
         raise WrongCustomClass, "#{klass} must respond to [#{Array(required_methods).join(', ')}] class methods!"
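`Configuration#providers=` passes its argument through `Kernel#Array`, so a single Symbol, an Array, or the multi-value assignment used in the README all end up as an Array, and the `provider`/`provider=` aliases keep the 0.3.x single-provider calls working. A hypothetical `MiniConfig` class reproduces only that part so it runs without the gem:

```ruby
# Hypothetical stand-in for the providers part of ProxyFetcher::Configuration.
class MiniConfig
  attr_reader :providers

  def providers=(value)
    # Kernel#Array: nil -> [], :a -> [:a], [:a, :b] -> [:a, :b]
    @providers = Array(value)
  end

  # Same aliases as in the 0.4.0 diff, so the old singular API still works.
  alias provider providers
  alias provider= providers=
end

config = MiniConfig.new

config.providers = :hide_my_name
p config.providers # => [:hide_my_name]

# `a.b = x, y` passes [x, y] to the writer.
config.providers = :free_proxy_list, :xroxy, :proxy_docker
p config.providers # => [:free_proxy_list, :xroxy, :proxy_docker]

config.provider = :xroxy
p config.provider # => [:xroxy]
```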
data/lib/proxy_fetcher/configuration/providers_registry.rb
ADDED
@@ -0,0 +1,29 @@
+module ProxyFetcher
+  class ProvidersRegistry
+    UnknownProvider = Class.new(StandardError)
+    RegisteredProvider = Class.new(StandardError)
+
+    def providers
+      @providers ||= {}
+    end
+
+    # Add custom provider to common registry.
+    # Requires proxy provider name ('hide_my_name' for example) and a class
+    # that implements the parsing logic.
+    def register(name, klass)
+      raise RegisteredProvider, "`#{name}` provider already registered!" if providers.key?(name.to_sym)
+
+      providers[name.to_sym] = klass
+    end
+
+    # Returns a class for specific provider if it is
+    # registered in the registry. Otherwise throws an exception.
+    def class_for(provider_name)
+      provider_name = provider_name.to_sym
+
+      providers.fetch(provider_name)
+    rescue KeyError
+      raise UnknownProvider, "unregistered proxy provider `#{provider_name}`"
+    end
+  end
+end
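Because the registry keys everything by `name.to_sym`, String and Symbol names resolve to the same entry, a second registration under a taken name raises `RegisteredProvider`, and a missing name surfaces as `UnknownProvider` instead of a bare `KeyError`. The sketch below copies that logic under a hypothetical `MiniRegistry` name so it runs on its own, and registers a hypothetical `MyProvider` class (a real provider would inherit from `ProxyFetcher::Providers::Base`):

```ruby
# Hypothetical copy of the ProvidersRegistry logic, renamed to run standalone.
class MiniRegistry
  UnknownProvider = Class.new(StandardError)
  RegisteredProvider = Class.new(StandardError)

  def providers
    @providers ||= {}
  end

  def register(name, klass)
    raise RegisteredProvider, "`#{name}` provider already registered!" if providers.key?(name.to_sym)

    providers[name.to_sym] = klass
  end

  def class_for(provider_name)
    provider_name = provider_name.to_sym
    providers.fetch(provider_name)
  rescue KeyError
    raise UnknownProvider, "unregistered proxy provider `#{provider_name}`"
  end
end

# Hypothetical provider class used only for this demonstration.
class MyProvider; end

registry = MiniRegistry.new
registry.register('my_provider', MyProvider)

# String and Symbol names share one key.
p registry.class_for(:my_provider) # => MyProvider

begin
  registry.register(:my_provider, Class.new)
rescue MiniRegistry::RegisteredProvider => e
  puts e.message # => `my_provider` provider already registered!
end

begin
  registry.class_for('missing')
rescue MiniRegistry::UnknownProvider => e
  puts e.message # => unregistered proxy provider `missing`
end
```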
data/lib/proxy_fetcher/manager.rb
CHANGED
@@ -1,22 +1,29 @@
 module ProxyFetcher
   class Manager
-    attr_reader :proxies
+    attr_reader :proxies

     # refresh: true - load proxy list from the remote server on initialization
     # refresh: false - just initialize the class, proxy list will be empty ([])
-    def initialize(refresh: true, filters: {})
-      @filters = filters
-
+    def initialize(refresh: true, validate: false, filters: {})
       if refresh
-        refresh_list!
+        refresh_list!(filters)
       else
         @proxies = []
       end
+
+      cleanup! if validate
     end

     # Update current proxy list from the provider
-    def refresh_list!
-      @proxies =
+    def refresh_list!(filters = nil)
+      @proxies = []
+
+      ProxyFetcher.config.providers.each do |provider_name|
+        provider = ProxyFetcher::Configuration.providers_registry.class_for(provider_name)
+        provider_filters = filters && filters.fetch(provider_name.to_sym, filters)
+
+        @proxies.concat(provider.fetch_proxies!(provider_filters))
+      end
     end

     alias fetch! refresh_list!
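`refresh_list!` now picks filters per provider with `filters && filters.fetch(provider_name.to_sym, filters)`: a key matching the provider name selects that sub-hash, otherwise the whole hash is passed on unchanged. The hypothetical `filters_for` helper below isolates that expression to show the cases, including one that is easy to miss: with filters split by provider, a configured provider that has no entry of its own receives the entire split hash.

```ruby
# Hypothetical helper isolating the filter lookup from Manager#refresh_list!.
def filters_for(filters, provider_name)
  filters && filters.fetch(provider_name.to_sym, filters)
end

# No filters at all: every provider gets nil.
p filters_for(nil, :hide_my_name) # => nil

# Flat filters: every provider gets the very same hash.
flat = { country: 'PL', maxtime: '500' }
p filters_for(flat, :hide_my_name).equal?(flat) # => true

# Filters split by provider name: each provider gets its own entry.
split = { hide_my_name: { country: 'PL', maxtime: '500' }, xroxy: { type: 'All_http' } }
p(filters_for(split, 'xroxy') == { type: 'All_http' }) # => true

# A provider without its own entry falls back to the whole split hash.
p filters_for(split, :proxy_docker).keys # => [:hide_my_name, :xroxy]
```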
@@ -50,10 +57,10 @@ module ProxyFetcher
     alias pop! get!

     # Clean current proxy list from dead proxies (that doesn't respond by timeout)
-    def cleanup!
+    def cleanup!
       lock = Mutex.new

-      proxies.dup.each_slice(pool_size) do |proxy_group|
+      proxies.dup.each_slice(ProxyFetcher.config.pool_size) do |proxy_group|
         threads = proxy_group.map do |group_proxy|
           Thread.new(group_proxy, proxies) do |proxy, proxies|
             lock.synchronize { proxies.delete(proxy) } unless proxy.connectable?
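Judging by this hunk, `cleanup!` no longer takes an argument in 0.4.0 and reads its batch size from `ProxyFetcher.config.pool_size` (10 by default, per `reset!`), so the `manager.validate!(50)` form in the README text would need `ProxyFetcher.config.pool_size = 50` instead. Below is a self-contained sketch of the batching pattern, using a hypothetical `FakeProxy` whose `connectable?` is simulated rather than opening a connection. The hunk is cut off before the threads are joined, so the `join` step here is an assumption:

```ruby
require 'thread'

# Hypothetical stand-in for ProxyFetcher::Proxy; connectable? just returns
# the stored flag instead of contacting the server.
FakeProxy = Struct.new(:addr, :port, :alive) do
  def connectable?
    alive
  end
end

# Sketch of the batching in Manager#cleanup!: proxies are checked in slices
# of `pool_size`, one thread per proxy, and dead ones are deleted from the
# shared list under a Mutex.
def cleanup(proxies, pool_size)
  lock = Mutex.new

  proxies.dup.each_slice(pool_size) do |proxy_group|
    threads = proxy_group.map do |group_proxy|
      Thread.new(group_proxy) do |proxy|
        lock.synchronize { proxies.delete(proxy) } unless proxy.connectable?
      end
    end

    # Assumed step: wait for the whole batch before starting the next one.
    threads.each(&:join)
  end

  proxies
end

proxies = (1..25).map { |i| FakeProxy.new("10.0.0.#{i}", 8080, i.even?) }
cleanup(proxies, 10)
p proxies.size # => 12
```

A larger `pool_size` means fewer, bigger batches; with real network checks each batch takes roughly as long as its slowest proxy, which is why `connection_timeout` matters as much as the pool size.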
data/lib/proxy_fetcher/providers/base.rb
CHANGED
@@ -7,8 +7,8 @@ module ProxyFetcher

       def_delegators ProxyFetcher::HTML, :clear, :convert_to_int

-
-
+      # Loads proxy provider page content, extract proxy list from it
+      # and convert every entry to proxy object.
       def fetch_proxies!(filters = {})
         load_proxy_list(filters).map { |html| to_proxy(html) }
       end
@@ -23,8 +23,10 @@ module ProxyFetcher

       # Loads HTML document with Nokogiri by the URL combined with custom filters
       def load_document(url, filters = {})
+        raise ArgumentError, 'filters must be a Hash' if filters && !filters.is_a?(Hash)
+
         uri = URI.parse(url)
-        uri.query = URI.encode_www_form(filters) if filters.any?
+        uri.query = URI.encode_www_form(filters) if filters && filters.any?

         Nokogiri::HTML(ProxyFetcher.config.http_client.fetch(uri.to_s))
       end
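With this change `load_document` tolerates `nil` filters (previously `nil.any?` would raise `NoMethodError`) and rejects anything that is not a Hash before building the query string with `URI.encode_www_form`. `nil` is what reaches providers when `refresh_list!` is called without arguments, assuming each provider's `load_proxy_list` forwards its filters unchanged. The hypothetical `build_url` helper mirrors only the URL-building part, with `http://example.com/proxies` as a placeholder URL:

```ruby
require 'uri'

# Hypothetical helper mirroring the URL-building part of
# Providers::Base#load_document; no HTTP request is made.
def build_url(url, filters = {})
  raise ArgumentError, 'filters must be a Hash' if filters && !filters.is_a?(Hash)

  uri = URI.parse(url)
  uri.query = URI.encode_www_form(filters) if filters && filters.any?
  uri.to_s
end

filters = { country: 'PL', maxtime: '500' }
puts build_url('http://example.com/proxies', filters)
# => http://example.com/proxies?country=PL&maxtime=500

puts build_url('http://example.com/proxies', nil)
# => http://example.com/proxies
```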
data/lib/proxy_fetcher/proxy.rb
CHANGED
@@ -11,12 +11,18 @@ module ProxyFetcher

     TYPES.each do |proxy_type|
       define_method "#{proxy_type.downcase}?" do
-        type && type.upcase.include?(proxy_type)
+        !type.nil? && type.upcase.include?(proxy_type)
       end
     end

     alias ssl? https?

+    def initialize(attributes = {})
+      attributes.each do |attr, value|
+        public_send("#{attr}=", value)
+      end
+    end
+
     def connectable?
       ProxyFetcher.config.proxy_validator.connectable?(addr, port)
     end
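The new constructor assigns every attribute through its public writer, and the type predicates now return a strict `false` when no type is set (`nil && ...` evaluates to `nil`, whereas `!type.nil? && ...` evaluates to `false`). A hypothetical `MiniProxy` class with just those two pieces; its `TYPES` list and accessors are assumptions based on the README's HTTP/HTTPS/SOCKS4/SOCKS5 list, not code copied from the gem:

```ruby
# Hypothetical cut-down version of ProxyFetcher::Proxy.
class MiniProxy
  TYPES = %w[HTTP HTTPS SOCKS4 SOCKS5].freeze # assumed list

  attr_accessor :addr, :port, :type

  # Defines http?, https?, socks4?, socks5?
  TYPES.each do |proxy_type|
    define_method "#{proxy_type.downcase}?" do
      !type.nil? && type.upcase.include?(proxy_type)
    end
  end

  # Each key needs a matching public writer, otherwise NoMethodError is raised.
  def initialize(attributes = {})
    attributes.each do |attr, value|
      public_send("#{attr}=", value)
    end
  end
end

proxy = MiniProxy.new(addr: '192.168.1.1', port: 8080, type: 'HTTP')
p proxy.http?          # => true
p proxy.socks5?        # => false
p MiniProxy.new.https? # => false (was nil with `type && ...`)
```

An unknown key such as `bogus: 1` raises `NoMethodError` from `public_send`, so only keys with matching writers can be passed.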
data/proxy_fetcher.gemspec
CHANGED
@@ -5,7 +5,7 @@ require 'proxy_fetcher/version'
 Gem::Specification.new do |gem|
   gem.name = 'proxy_fetcher'
   gem.version = ProxyFetcher.gem_version
-  gem.date = '2017-08-
+  gem.date = '2017-08-28'
   gem.summary = 'Ruby gem for dealing with proxy lists from different providers'
   gem.description = 'This gem can help your Ruby application to make HTTP(S) requests ' \
                     'from proxy server by fetching and validating proxy lists from the different providers.'
data/spec/proxy_fetcher/configuration_spec.rb
CHANGED
@@ -43,21 +43,16 @@ describe ProxyFetcher::Configuration do
   end

   context 'custom provider' do
-    it '
-
-
-
-      expect { ProxyFetcher.config.provider = :custom_provider }.not_to raise_error
+    it 'failed on registration if provider class already registered' do
+      expect { ProxyFetcher::Configuration.register_provider(:xroxy, Class.new) }
+        .to raise_error(ProxyFetcher::ProvidersRegistry::RegisteredProvider)
     end

-    it
-
-        .to raise_error(ProxyFetcher::Configuration::UnknownProvider)
-    end
+    it "failed on proxy list fetching if provider doesn't registered" do
+      ProxyFetcher.config.provider = :not_existing_provider

-
-
-        .to raise_error(ProxyFetcher::Configuration::RegisteredProvider)
+      expect { ProxyFetcher::Manager.new }
+        .to raise_error(ProxyFetcher::ProvidersRegistry::UnknownProvider)
     end
   end
 end
data/spec/proxy_fetcher/providers/multiple_providers_spec.rb
ADDED
@@ -0,0 +1,21 @@
+require 'spec_helper'
+
+describe 'Multiple proxy providers' do
+  before { ProxyFetcher.config.reset! }
+  after { ProxyFetcher.config.reset! }
+
+  it 'combine proxies from multiple providers' do
+    proxy_stub = ProxyFetcher::Proxy.new(addr: '192.168.1.1', port: 8080)
+
+    # Each proxy provider will return 2 proxies
+    ProxyFetcher::Configuration.providers_registry.providers.each do |_name, klass|
+      allow_any_instance_of(klass).to receive(:load_proxy_list).and_return([1, 2])
+      allow_any_instance_of(klass).to receive(:to_proxy).and_return(proxy_stub)
+    end
+
+    all_providers = ProxyFetcher::Configuration.registered_providers
+    ProxyFetcher.config.providers = all_providers
+
+    expect(ProxyFetcher::Manager.new.proxies.size).to eq(all_providers.size * 2)
+  end
+end
data/spec/proxy_fetcher/proxy_spec.rb
CHANGED
@@ -11,6 +11,15 @@ describe ProxyFetcher::Proxy do

   let(:proxy) { @manager.proxies.first.dup }

+  it 'can initialize a new proxy object' do
+    proxy = described_class.new(addr: '192.169.1.1', port: 8080, type: 'HTTP')
+
+    expect(proxy).not_to be_nil
+    expect(proxy.addr).to eq('192.169.1.1')
+    expect(proxy.port).to eq(8080)
+    expect(proxy.type).to eq('HTTP')
+  end
+
   it 'checks schema' do
     proxy.type = ProxyFetcher::Proxy::HTTP
     expect(proxy.http?).to be_truthy
data/spec/support/manager_examples.rb
CHANGED
@@ -9,9 +9,18 @@ RSpec.shared_examples 'a manager' do
     expect(manager.proxies).to be_empty
   end

-  it 'returns Proxy objects' do
+  it 'returns valid Proxy objects' do
     manager = ProxyFetcher::Manager.new
     expect(manager.proxies).to all(be_a(ProxyFetcher::Proxy))
+
+    manager.proxies.each do |proxy|
+      expect(proxy.addr).to match(/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/i)
+      expect(proxy.port).to be_a_kind_of(Numeric)
+      expect(proxy.type).not_to be_empty
+      expect(proxy.country).not_to be_empty
+      expect(proxy.anonymity).not_to be_empty
+      expect(proxy.response_time).to be_nil.or(be_a_kind_of(Numeric))
+    end
   end

   it 'returns raw proxies (HOST:PORT)' do
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: proxy_fetcher
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.4.0
 platform: ruby
 authors:
 - Nikita Bulai
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-08-
+date: 2017-08-28 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri
@@ -55,6 +55,8 @@ files:
 - ".gitignore"
 - ".rubocop.yml"
 - ".travis.yml"
+- CHANGELOG.md
+- CODE_OF_CONDUCT.md
 - Gemfile
 - LICENSE
 - README.md
@@ -62,6 +64,7 @@ files:
 - bin/proxy_fetcher
 - lib/proxy_fetcher.rb
 - lib/proxy_fetcher/configuration.rb
+- lib/proxy_fetcher/configuration/providers_registry.rb
 - lib/proxy_fetcher/manager.rb
 - lib/proxy_fetcher/providers/base.rb
 - lib/proxy_fetcher/providers/free_proxy_list.rb
@@ -81,6 +84,7 @@ files:
 - spec/proxy_fetcher/providers/free_proxy_list_spec.rb
 - spec/proxy_fetcher/providers/free_proxy_list_ssl_spec.rb
 - spec/proxy_fetcher/providers/hide_my_name_spec.rb
+- spec/proxy_fetcher/providers/multiple_providers_spec.rb
 - spec/proxy_fetcher/providers/proxy_docker_spec.rb
 - spec/proxy_fetcher/providers/proxy_list_spec.rb
 - spec/proxy_fetcher/providers/xroxy_spec.rb