proxy_fetcher 0.7.0 → 0.7.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +13 -0
- data/lib/proxy_fetcher/providers/xroxy.rb +1 -1
- data/lib/proxy_fetcher/version.rb +1 -1
- data/proxy_fetcher.gemspec +1 -1
- data/spec/proxy_fetcher/version_spec.rb +1 -1
- metadata +2 -5
- data/.rubocop.yml +0 -14
- data/.travis.yml +0 -26
- data/README.md +0 -489
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 59ba9fd7169617766f99b048a1c889e0213ac57c
+data.tar.gz: 03e3d51d71fb5c06c0c7c7d5808a8a43c79c50a4
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 495ccb751c9f927f1ef31e1b2536c600aec0f68189ea11d1f8061d59bfe41caf2dd0e818d9b780a5b8279c66b20c05873ffba4976cadb101c4aff2a7a449bc94
+data.tar.gz: a2aae32a4a3649b10399e23520fbf0a19df3ca564a3519290352bd51366cda6f8a709aa7a1783de31e5e474ab23bd86ea2fb41fff320a81eb57a16b3ef111b30
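`checksums.yaml` stores a SHA1 and a SHA512 digest for two members of the `.gem` archive, `metadata.gz` and `data.tar.gz`, so all four values change whenever the packaged contents change. As a minimal sketch, assuming the archive has been downloaded locally (for example with `gem fetch proxy_fetcher -v 0.7.1`), the digests can be recomputed with RubyGems' own tar reader; `gem_checksums` is a hypothetical helper, not part of RubyGems or proxy_fetcher:

```ruby
require 'digest/sha1'
require 'digest/sha2'
require 'rubygems/package'

# Hypothetical helper: recompute the digests that checksums.yaml records for
# the metadata.gz and data.tar.gz members of a .gem file (a plain tar archive).
def gem_checksums(io)
  result = { 'SHA1' => {}, 'SHA512' => {} }
  reader = Gem::Package::TarReader.new(io)

  reader.each do |entry|
    # checksums.yaml only covers these two members; skip everything else.
    next unless %w[metadata.gz data.tar.gz].include?(entry.full_name)

    # Entry#read returns nil for a zero-length member, so fall back to ''.
    content = entry.read || ''
    result['SHA1'][entry.full_name] = Digest::SHA1.hexdigest(content)
    result['SHA512'][entry.full_name] = Digest::SHA512.hexdigest(content)
  end

  result
end
```

For example, `File.open('proxy_fetcher-0.7.1.gem', 'rb') { |f| gem_checksums(f) }` should produce values comparable to the `+` lines above, assuming the local file is the same archive the registry serves.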
data/CHANGELOG.md
CHANGED
@@ -2,6 +2,19 @@
 
 Reverse Chronological Order:
 
+## `0.7.0` (2018-06-04)
+
+* Migrate to `HTTP.rb` instead of `Net::HTTP`
+* Fixes
+
+## `0.6.5` (2018-04-20)
+
+* Fix providers
+
+## `0.6.4` (2018-03-26)
+
+* Fix providers
+
 ## `0.6.3` (2018-01-26)
 
 * Add ability to use own proxy for `ProxyFetcher::Client`
data/lib/proxy_fetcher/providers/xroxy.rb
CHANGED
@@ -5,7 +5,7 @@ module ProxyFetcher
 # XRoxy provider class.
 class XRoxy < Base
 # Provider URL to fetch proxy list
-PROVIDER_URL = '
+PROVIDER_URL = 'https://www.xroxy.com/proxylist.php'.freeze
 
 # Fetches HTML content by sending HTTP request to the provider URL and
 # parses the document (built as abstract <code>ProxyFetcher::Document</code>)
data/proxy_fetcher.gemspec
CHANGED
@@ -12,7 +12,7 @@ Gem::Specification.new do |gem|
 gem.email = 'bulajnikita@gmail.com'
 gem.require_paths = ['lib']
 gem.bindir = 'bin'
-gem.files = `git ls-files`.split($RS)
+gem.files = `git ls-files`.split($RS) - %w[README.md .travis.yml .rubocop.yml]
 gem.executables = `git ls-files -- bin/*`.split("\n").map { |f| File.basename(f) }
 gem.homepage = 'http://github.com/nbulaj/proxy_fetcher'
 gem.license = 'MIT'
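The only gemspec change subtracts three top-level files from the `git ls-files` listing, which matches the three `files:` entries dropped in the `metadata` hunk. `$RS` comes from Ruby's `English` library, where it is an alias for `$/` (the input record separator). This hunk does not show whether the gemspec requires `English`; without that require `$RS` is `nil`, and `split(nil)` falls back to splitting on whitespace. A small sketch of the same expression, with the Git output passed in as a string so it runs outside a checkout (`EXCLUDED_FILES` and `packaged_files` are hypothetical names):

```ruby
require 'English' # defines $RS as an alias for $/ ("\n" by default)

# Top-level files that the 0.7.1 gemspec removes from the package.
EXCLUDED_FILES = %w[README.md .travis.yml .rubocop.yml].freeze

# Mirrors the new `gem.files` expression, taking the output of
# `git ls-files` as an argument instead of shelling out.
def packaged_files(ls_files_output)
  ls_files_output.split($RS) - EXCLUDED_FILES
end
```

For instance, `packaged_files("README.md\nlib/proxy_fetcher.rb\n.travis.yml\n")` returns `["lib/proxy_fetcher.rb"]`. Because `Array#-` compares whole strings, a nested path such as `docs/README.md` would still be packaged.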
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: proxy_fetcher
 version: !ruby/object:Gem::Version
-version: 0.7.
+version: 0.7.1
 platform: ruby
 authors:
 - Nikita Bulai
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-
+date: 2018-07-13 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: http
@@ -47,13 +47,10 @@ extensions: []
 extra_rdoc_files: []
 files:
 - ".gitignore"
-- ".rubocop.yml"
-- ".travis.yml"
 - CHANGELOG.md
 - CODE_OF_CONDUCT.md
 - Gemfile
 - LICENSE
-- README.md
 - Rakefile
 - bin/proxy_fetcher
 - gemfiles/nokogiri.gemfile
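The `metadata` hunks bump the version and, consistent with the gemspec change, drop `.rubocop.yml`, `.travis.yml` and `README.md` from the packaged file list. To check whether an existing version constraint picks up this patch release, RubyGems' standard `Gem::Version` and `Gem::Requirement` classes can be queried directly. The constraints below are illustrative; `'~> 0.7'` is the pin shown in the deleted README's Installation section:

```ruby
require 'rubygems'

new_version = Gem::Version.new('0.7.1')

# '~> 0.7' allows >= 0.7 and < 1.0; '~> 0.7.0' allows >= 0.7.0 and < 0.8.
loose_pin  = Gem::Requirement.new('~> 0.7')
strict_pin = Gem::Requirement.new('~> 0.7.0')

puts new_version > Gem::Version.new('0.7.0')              # true
puts loose_pin.satisfied_by?(new_version)                 # true
puts strict_pin.satisfied_by?(new_version)                # true
puts strict_pin.satisfied_by?(Gem::Version.new('0.8.0'))  # false
```

Either pin therefore resolves to 0.7.1 on a fresh `bundle update proxy_fetcher`, assuming no other constraint in the Gemfile interferes.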
data/.rubocop.yml
DELETED
data/.travis.yml
DELETED
@@ -1,26 +0,0 @@
-language: ruby
-before_install: gem install bundler
-bundler_args: --without yard guard benchmarks
-script: "rake spec"
-env: JRUBY_OPTS="$JRUBY_OPTS --debug"
-gemfile:
-- gemfiles/oga.gemfile
-- gemfiles/nokogiri.gemfile
-gemfile:
-- gemfiles/oga.gemfile
-- gemfiles/nokogiri.gemfile
-rvm:
-- 2.0
-- 2.1
-- 2.2.4
-- 2.3.3
-- 2.4.3
-- 2.5.0
-- ruby-head
-- jruby-9.1.15.0
-matrix:
-allow_failures:
-- rvm: ruby-head
-exclude:
-- rvm: 2.0
-gemfile: gemfiles/nokogiri.gemfile # Nokogiri doesn't support Ruby 2.0
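The deleted `.travis.yml` crossed every `rvm` entry with every `gemfile` entry, then removed the Ruby 2.0 / Nokogiri pairing through `matrix.exclude`. `allow_failures` keeps the `ruby-head` jobs but tolerates their failure, so it does not change the job count, and the duplicated `gemfile:` key lists the same two Gemfiles, so it is treated as a single axis here. A rough sketch of that expansion in plain Ruby, assuming an exclude entry drops every job whose listed keys all match; this illustrates the idea and is not Travis CI's implementation:

```ruby
rubies   = %w[2.0 2.1 2.2.4 2.3.3 2.4.3 2.5.0 ruby-head jruby-9.1.15.0]
gemfiles = %w[gemfiles/oga.gemfile gemfiles/nokogiri.gemfile]
excludes = [{ rvm: '2.0', gemfile: 'gemfiles/nokogiri.gemfile' }]

# Cartesian product of both axes, one hash per job.
jobs = rubies.product(gemfiles).map { |rvm, gemfile| { rvm: rvm, gemfile: gemfile } }

# Drop every job that matches all key/value pairs of some exclude entry.
jobs.reject! { |job| excludes.any? { |ex| ex.all? { |key, value| job[key] == value } } }

puts jobs.size # 15
```

Eight Rubies times two Gemfiles gives 16 combinations, and the single exclude removes one of them.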
data/README.md
DELETED
@@ -1,489 +0,0 @@
-# Ruby / JRuby lib for managing proxies
-[](http://badge.fury.io/rb/proxy_fetcher)
-[](https://travis-ci.org/nbulaj/proxy_fetcher)
-[](https://coveralls.io/github/nbulaj/proxy_fetcher)
-[](https://codeclimate.com/github/nbulaj/proxy_fetcher)
-[](http://inch-ci.org/github/nbulaj/proxy_fetcher)
-[](#license)
-
-This gem can help your Ruby / JRuby application to make HTTP(S) requests using
-proxy by fetching and validating actual proxy lists from multiple providers.
-
-It gives you a special `Manager` class that can load proxy lists, validate them and return random or specific proxies.
-It also has a `Client` class that encapsulates all the logic for sending HTTP requests using proxies, automatically
-fetched and validated by the gem. Take a look at the documentation below to find all the gem features.
-
-Also this gem can be used with any other programming language (Go / Python / etc) as standalone solution for downloading and
-validating proxy lists from the different providers. [Checkout examples](#standalone) of usage below.
-
-## Documentation valid for `master` branch
-
-Please check the documentation for the version of doorkeeper you are using in:
-https://github.com/nbulaj/proxy_fetcher/releases
-
-## Table of Contents
-
-- [Dependencies](#dependencies)
-- [Installation](#installation)
-- [Example of usage](#example-of-usage)
-- [In Ruby application](#in-ruby-application)
-- [Standalone](#standalone)
-- [Client](#client)
-- [Configuration](#configuration)
-- [Proxy validation speed](#proxy-validation-speed)
-- [Proxy object](#proxy-object)
-- [Providers](#providers)
-- [Contributing](#contributing)
-- [License](#license)
-
-## Dependencies
-
-ProxyFetcher gem itself requires Ruby `>= 2.0.0` (or [JRuby](http://jruby.org/) `> 9.0`, but maybe earlier too,
-[see Travis build matrix](.travis.yml)) and great [HTTP.rb gem](https://github.com/httprb/http).
-
-However, it requires an adapter to parse HTML. If you do not specify any specific adapter, then it will use
-default one - [Nokogiri](https://github.com/sparklemotion/nokogiri). It's OK for any Ruby on Rails project
-(because they use it by default).
-
-But if you want to use some specific adapter (for example your application uses [Oga](https://gitlab.com/yorickpeterse/oga),
-then you need to manually add your dependencies to your project and configure ProxyFetcher to use another adapter. Moreover,
-you can implement your own adapter if it your use-case. Take a look at the [Configuration](#configuration) section for more details.
-
-## Installation
-
-If using bundler, first add 'proxy_fetcher' to your Gemfile:
-
-```ruby
-gem 'proxy_fetcher', '~> 0.7'
-```
-
-or if you want to use the latest version (from `master` branch), then:
-
-```ruby
-gem 'proxy_fetcher', git: 'https://github.com/nbulaj/proxy_fetcher.git'
-```
-
-And run:
-
-```sh
-bundle install
-```
-
-Otherwise simply install the gem:
-
-```sh
-gem install proxy_fetcher -v '0.7'
-```
-
-## Example of usage
-
-### In Ruby application
-
-By default ProxyFetcher uses all the available proxy providers. To get current proxy list without validation you
-need to initialize an instance of `ProxyFetcher::Manager` class. During this process ProxyFetcher will automatically load
-and parse all the proxies:
-
-```ruby
-manager = ProxyFetcher::Manager.new # will immediately load proxy list from the server
-manager.proxies
-
-#=> [#<ProxyFetcher::Proxy:0x00000002879680 @addr="97.77.104.22", @port=3128, @country="USA",
-# @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
-```
-
-You can initialize proxy manager without immediate load of the proxy list from the remote server by passing
-`refresh: false` on initialization:
-
-```ruby
-manager = ProxyFetcher::Manager.new(refresh: false) # just initialize class instance
-manager.proxies
-
-#=> []
-```
-
-`ProxyFetcher::Manager` class is very helpful when you need to manipulate and manager proxies. To get the proxy
-from the list yoy can call `.get` or `.pop` method that will return first proxy and move it to the end of the list.
-This methods has some equivalents like `get!` or aliased `pop!` that will return first **connectable** proxy and
-move it to the end of the list. They both marked as danger methods because all dead proxies will be removed from the list.
-
-If you need just some random proxy then call `manager.random_proxy` or it's alias `manager.random`.
-
-To clean current proxy list from the dead entries that does not respond to the requests you you need to use `cleanup!`
-or `validate!` method:
-
-```ruby
-manager.cleanup! # or manager.validate!
-```
-
-This action will enumerate proxy list and remove all the entries that doesn't respond by timeout or returns errors.
-
-In order to increase the performance proxy list validation is performed using Ruby threads. By default gem creates a
-pool with 10 threads, but you can increase this number by changing `pool_size` configuration option: `ProxyFetcher.config.pool_size = 50`.
-Read more in [Proxy validation speed](#proxy-validation-speed) section.
-
-If you need raw proxy URLs (like `host:port`) then you can use `raw_proxies` methods that will return array of strings:
-
-```ruby
-manager = ProxyFetcher::Manager.new
-manager.raw_proxies
-
-# => ["97.77.104.22:3128", "94.23.205.32:3128", "209.79.65.140:8080",
-# "91.217.42.2:8080", "97.77.104.22:80", "165.234.102.177:8080", ...]
-```
-
-You don't need to initialize a new manager every time you want to load actual proxy list from the providers. All you
-need is to refresh the proxy list by calling `#refresh_list!` (or `#fetch!`) method for your `ProxyFetcher::Manager` instance:
-
-```ruby
-manager.refresh_list! # or manager.fetch!
-
-#=> [#<ProxyFetcher::Proxy:0x00000002879680 @addr="97.77.104.22", @port=3128, @country="USA",
-# @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
-```
-
-If you need to filter proxy list, for example, by country or response time and selected provider supports filtering with GET params,
-then you can just pass your filters like a simple Ruby hash to the Manager instance:
-
-```ruby
-ProxyFetcher.config.providers = :proxy_docker
-
-manager = ProxyFetcher::Manager.new(filters: { country: 'PL', maxtime: '500' })
-manager.proxies
-
-# => [...]
-```
-
-If you are using multiple providers, then you can split your filters by proxy provider names:
-
-```ruby
-ProxyFetcher.config.providers = [:proxy_docker, :xroxy]
-
-manager = ProxyFetcher::Manager.new(filters: {
-hide_my_name: {
-country: 'PL',
-maxtime: '500'
-},
-xroxy: {
-type: 'All_http'
-}
-})
-
-manager.proxies
-
-# => [...]
-```
-
-You can apply different filters every time you calling `#refresh_list!` (or `#fetch!`) method:
-
-```ruby
-manager.refresh_list!(country: 'PL', maxtime: '500')
-
-# => [...]
-```
-
-*NOTE*: not all the providers support filtering. Take a look at the provider classes to see if it supports custom filters.
-
-### Standalone
-
-All you need to use this gem is Ruby >= 2.0 (2.4 is recommended). You can install it in a different ways. If you are using Ubuntu Xenial (16.04LTS)
-then you already have Ruby 2.3 installed. In other cases you can install it with [RVM](https://rvm.io/) or [rbenv](https://github.com/rbenv/rbenv).
-
-After installing Ruby just bundle the gem by running `gem install proxy_fetcher` in your terminal and now you can run it:
-
-```bash
-proxy_fetcher >> proxies.txt # Will download proxies from the default provider, validate them and write to file
-```
-
-If you need a list of proxies from some specific provider, then you need to pass it's name with `-p` option:
-
-```bash
-proxy_fetcher -p proxy_docker >> proxies.txt # Will download proxies from the default provider, validate them and write to file
-```
-
-If you need a list of proxies in JSON format just pass a `--json` option to the command:
-
-```bash
-proxy_fetcher --json
-
-# Will print:
-# {"proxies":["120.26.206.178:80","119.61.13.242:1080","117.40.213.26:80","92.62.72.242:1080","77.53.105.155:3124"
-# "58.20.41.172:35923","204.116.192.151:35923","190.5.96.58:1080","170.250.109.97:35923","121.41.82.99:1080"]}
-```
-
-To get all the possible options run:
-
-```bash
-proxy_fetcher --help
-```
-
-## Client
-
-ProxyFetcher gem provides you a ready-to-use HTTP client that made requesting with proxies easy. It does all the work
-with the proxy lists for you (load, validate, refresh, find proxy by type, follow redirects, etc). All you need it to
-make HTTP(S) requests:
-
-```ruby
-require 'proxy-fetcher'
-
-ProxyFetcher::Client.get 'https://example.com/resource'
-
-ProxyFetcher::Client.post 'https://example.com/resource', { param: 'value' }
-
-ProxyFetcher::Client.post 'https://example.com/resource', 'Any data'
-
-ProxyFetcher::Client.post 'https://example.com/resource', { param: 'value'}.to_json , headers: { 'Content-Type': 'application/json' }
-
-ProxyFetcher::Client.put 'https://example.com/resource', { param: 'value' }
-
-ProxyFetcher::Client.patch 'https://example.com/resource', { param: 'value' }
-
-ProxyFetcher::Client.delete 'https://example.com/resource'
-```
-
-By default, `ProxyFetcher::Client` makes 1000 attempts to send a HTTP request in case if proxy is out of order or the
-remote server returns an error. You can increase or decrease this number for your case or set it to `nil` if you want to
-make infinite number of requests (or before your Ruby process will die :skull:):
-
-```ruby
-require 'proxy-fetcher'
-
-ProxyFetcher::Client.get 'https://example.com/resource', options: { max_retries: 10_000 }
-```
-
-You can also use your own proxy object when using ProxyFetcher client:
-
-```ruby
-require 'proxy-fetcher'
-
-manager = ProxyFetcher::Manager.new # will immediately load proxy list from the server
-
-#random will return random proxy object from the list
-ProxyFetcher::Client.get 'https://example.com/resource', options: { proxy: manager.random }
-```
-
-Btw, if you need support of JavaScript or some other features, you need to implement your own client using, for example,
-`selenium-webdriver`.
-
-## Configuration
-
-ProxyFetcher is very flexible gem. You can configure the most important parts of the library and use your own solutions.
-
-Default configuration looks as follows:
-
-```ruby
-ProxyFetcher.configure do |config|
-config.logger = Logger.new(STDOUT)
-config.user_agent = ProxyFetcher::Configuration::DEFAULT_USER_AGENT
-config.pool_size = 10
-config.timeout = 3
-config.http_client = ProxyFetcher::HTTPClient
-config.proxy_validator = ProxyFetcher::ProxyValidator
-config.providers = ProxyFetcher::Configuration.registered_providers
-config.adapter = ProxyFetcher::Configuration::DEFAULT_ADAPTER # :nokogiri by default
-end
-```
-
-You can change any of the options above. Let's look at this deeper.
-
-To change open/read timeout for `cleanup!` and `connectable?` methods you need to change `timeout` options:
-
-```ruby
-ProxyFetcher.configure do |config|
-config.timeout = 1 # default is 3
-end
-
-manager = ProxyFetcher::Manager.new
-manager.cleanup!
-```
-
-Also you can set your custom User-Agent string:
-
-```ruby
-ProxyFetcher.configure do |config|
-config.user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
-end
-```
-
-ProxyFetcher uses HTTP.rb gem for dealing with HTTP(S) requests. It is fast enough and has a great chainable API.
-If you wanna add, for example, your custom provider that was developed as a Single Page Application (SPA) with some JavaScript,
-then you will need something like [selenium-webdriver](https://github.com/SeleniumHQ/selenium/tree/master/rb) to properly
-load the content of the website. For those and other cases you can write your own class for fetching HTML content by
-the URL and setup it in the ProxyFetcher config:
-
-```ruby
-class MyHTTPClient
-# [IMPORTANT]: below methods are required!
-def self.fetch(url)
-# ... some magic to return proper HTML ...
-end
-end
-
-ProxyFetcher.config.http_client = MyHTTPClient
-
-manager = ProxyFetcher::Manager.new
-manager.proxies
-
-#=> [#<ProxyFetcher::Proxy:0x00000002879680 @addr="97.77.104.22", @port=3128, @country="USA",
-# @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
-```
-
-You can take a look at the [lib/proxy_fetcher/utils/http_client.rb](lib/proxy_fetcher/utils/http_client.rb) for an example.
-
-Moreover, you can write your own proxy validator to check if proxy is valid or not:
-
-```ruby
-class MyProxyValidator
-# [IMPORTANT]: below methods are required!
-def self.connectable?(proxy_addr, proxy_port)
-# ... some magic to check if proxy is valid ...
-end
-end
-
-ProxyFetcher.config.proxy_validator = MyProxyValidator
-
-manager = ProxyFetcher::Manager.new
-manager.proxies
-
-#=> [#<ProxyFetcher::Proxy:0x00000002879680 @addr="97.77.104.22", @port=3128, @country="USA",
-# @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
-
-manager.validate!
-
-#=> [ ... ]
-```
-
-Be default, ProxyFetcher gem uses [Nokogiri](https://github.com/sparklemotion/nokogiri) for parsing HTML. If you want
-to use [Oga](https://gitlab.com/yorickpeterse/oga) instead, then you need to add `gem 'oga'` to your Gemfile and configure
-ProxyFetcher as follows:
-
-```ruby
-ProxyFetcher.config.adapter = :oga
-```
-
-Also you can write your own HTML parser implementation and use it, take a look at the [abstract class and implementations](lib/proxy_fetcher/document).
-Configure it as:
-
-```ruby
-ProxyFetcher.config.adapter = MyHTMLParserClass
-```
-
-### Proxy validation speed
-
-There are some tricks to increase proxy list validation performance.
-
-In a few words, ProxyFetcher gem uses threads to validate proxies for availability. Every proxy is checked in a
-separate thread. By default, ProxyFetcher uses a pool with a maximum of 10 threads. You can increase this number by
-setting max number of threads in the config:
-
-```ruby
-ProxyFetcher.config.pool_size = 50
-```
-
-You can experiment with the threads pool size to find an optimal number of maximum threads count for you PC and OS.
-This will definitely give you some performance improvements.
-
-Moreover, the common proxy validation speed depends on `ProxyFetcher.config.timeout` option that is equal
-to `3` by default. It means that gem will wait 3 seconds for the server answer to check if particular proxy is connectable.
-You can decrease this option to `1`, for example, and it will heavily increase proxy validation speed (**but remember**
-that some proxies could be connectable, but slow, so with this option you will clear proxy list from the proxies that
-works, but very slow).
-
-## Proxy object
-
-Every proxy is a `ProxyFetcher::Proxy` object that has next readers (instance variables):
-
-* `addr` (IP address)
-* `port`
-* `type` (proxy type, can be HTTP, HTTPS, SOCKS4 or/and SOCKS5)
-* `country` (USA or Brazil for example)
-* `response_time` (5217 for example)
-* `anonymity` (`Low`, `Elite proxy` or `High +KA` for example)
-
-Also you can call next instance methods for every Proxy object:
-
-* `connectable?` (whether proxy server is available)
-* `http?` (whether proxy server has a HTTP protocol)
-* `https?` (whether proxy server has a HTTPS protocol)
-* `socks4?`
-* `socks5?`
-* `uri` (returns `URI::Generic` object)
-* `url` (returns a formatted URL like "_IP:PORT_" or "_http://IP:PORT_" if `scheme: true` provided)
-
-## Providers
-
-Currently ProxyFetcher can deal with next proxy providers (services):
-
-* Free Proxy List
-* Free SSL Proxies
-* Proxy Docker
-* Gather Proxy
-* HTTP Tunnel Genius
-* Proxy List
-* XRoxy
-
-If you wanna use one of them just setup it in the config:
-
-```ruby
-ProxyFetcher.config.provider = :free_proxy_list
-
-manager = ProxyFetcher::Manager.new
-manager.proxies
-#=> ...
-```
-
-You can use multiple providers at the same time:
-
-```ruby
-ProxyFetcher.config.providers = :free_proxy_list, :xroxy, :proxy_docker
-
-manager = ProxyFetcher::Manager.new
-manager.proxies
-#=> ...
-```
-
-If you want to use all the possible proxy providers then you can configure ProxyFetcher as follows:
-
-```ruby
-ProxyFetcher.config.providers = ProxyFetcher::Configuration.registered_providers
-
-manager = ProxyFetcher::Manager.new.proxies
-manager.proxies
-
-#=> [#<ProxyFetcher::Proxy:0x00000002879680 @addr="97.77.104.22", @port=3128, @country="USA",
-# @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
-```
-
-Moreover, you can write your own provider! All you need is to create a class, that would be inherited from the
-`ProxyFetcher::Providers::Base` class, and register your provider like this:
-
-```ruby
-ProxyFetcher::Configuration.register_provider(:your_provider, YourProviderClass)
-```
-
-Provider class must implement `self.load_proxy_list` and `#to_proxy(html_element)` methods that will load and parse
-provider HTML page with proxy list. Take a look at the existing providers in the [lib/proxy_fetcher/providers](lib/proxy_fetcher/providers) directory.
-
-## Contributing
-
-You are very welcome to help improve ProxyFetcher if you have suggestions for features that other people can use.
-
-To contribute:
-
-1. Fork the project.
-2. Create your feature branch (`git checkout -b my-new-feature`).
-3. Implement your feature or bug fix.
-4. Add documentation for your feature or bug fix.
-5. Run <tt>rake doc:yard</tt>. If your changes are not 100% documented, go back to step 4.
-6. Add tests for your feature or bug fix.
-7. Run `rake spec` to make sure all tests pass.
-8. Commit your changes (`git commit -am 'Add new feature'`).
-9. Push to the branch (`git push origin my-new-feature`).
-10. Create new pull request.
-
-Thanks.
-
-## License
-
-`proxy_fetcher` gem is released under the [MIT License](http://www.opensource.org/licenses/MIT).
-
-Copyright (c) 2017—2018 Nikita Bulai (bulajnikita@gmail.com).
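One extension point the deleted README documented is a custom validator: any class responding to `self.connectable?(proxy_addr, proxy_port)` can be assigned to `ProxyFetcher.config.proxy_validator`. A minimal stdlib-only sketch, assuming a plain TCP connect is an acceptable liveness check. It is weaker than sending a request through the proxy, and both the class name `TcpProxyValidator` and its 3-second timeout (picked to match the README's default `timeout`) are hypothetical:

```ruby
require 'socket'

# Hypothetical validator following the interface described in the README.
# It only checks that a TCP connection to the proxy can be opened; it does
# not verify that the remote end actually speaks HTTP or SOCKS.
class TcpProxyValidator
  TIMEOUT = 3 # seconds

  def self.connectable?(proxy_addr, proxy_port)
    # With a block, Socket.tcp yields the socket, closes it afterwards and
    # returns the block's value.
    Socket.tcp(proxy_addr, proxy_port.to_i, connect_timeout: TIMEOUT) { true }
  rescue SystemCallError, SocketError, IOError
    # Refused or timed-out connections, unreachable hosts, DNS failures.
    false
  end
end
```

It would be wired in as `ProxyFetcher.config.proxy_validator = TcpProxyValidator`, as the README describes. An open port that is not a working proxy would still pass this check.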