proxy_fetcher 0.7.0 → 0.7.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +13 -0
- data/lib/proxy_fetcher/providers/xroxy.rb +1 -1
- data/lib/proxy_fetcher/version.rb +1 -1
- data/proxy_fetcher.gemspec +1 -1
- data/spec/proxy_fetcher/version_spec.rb +1 -1
- metadata +2 -5
- data/.rubocop.yml +0 -14
- data/.travis.yml +0 -26
- data/README.md +0 -489
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 59ba9fd7169617766f99b048a1c889e0213ac57c
+data.tar.gz: 03e3d51d71fb5c06c0c7c7d5808a8a43c79c50a4
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 495ccb751c9f927f1ef31e1b2536c600aec0f68189ea11d1f8061d59bfe41caf2dd0e818d9b780a5b8279c66b20c05873ffba4976cadb101c4aff2a7a449bc94
+data.tar.gz: a2aae32a4a3649b10399e23520fbf0a19df3ca564a3519290352bd51366cda6f8a709aa7a1783de31e5e474ab23bd86ea2fb41fff320a81eb57a16b3ef111b30
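The values in this hunk are SHA1 and SHA512 digests recorded for the two archives inside the `.gem` package. As a rough sketch of how such a digest can be recomputed and compared, the snippet below hashes a throwaway temp file with Ruby's standard `Digest` library. The file contents and the `digest_matches?` helper are assumptions made for illustration; they are not part of proxy_fetcher or RubyGems.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: true when the SHA512 hex digest of the file at `path`
# equals `expected_hex`.
def digest_matches?(path, expected_hex)
  Digest::SHA512.file(path).hexdigest == expected_hex
end

# Stand-in for data.tar.gz; a real check would point at the extracted archive.
archive = Tempfile.new('data.tar.gz')
archive.write('example payload')
archive.flush

expected = Digest::SHA512.hexdigest('example payload')
puts digest_matches?(archive.path, expected)   # digest of the same bytes
puts digest_matches?(archive.path, '0' * 128)  # deliberately wrong digest
archive.close!
```

A real verification would compare against the `data.tar.gz` and `metadata.gz` values listed in `checksums.yaml` rather than a digest computed in the same script.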
data/CHANGELOG.md
CHANGED
@@ -2,6 +2,19 @@
 
 Reverse Chronological Order:
 
+## `0.7.0` (2018-06-04)
+
+* Migrate to `HTTP.rb` instead of `Net::HTTP`
+* Fixes
+
+## `0.6.5` (2018-04-20)
+
+* Fix providers
+
+## `0.6.4` (2018-03-26)
+
+* Fix providers
+
 ## `0.6.3` (2018-01-26)
 
 * Add ability to use own proxy for `ProxyFetcher::Client`
data/lib/proxy_fetcher/providers/xroxy.rb
CHANGED
@@ -5,7 +5,7 @@ module ProxyFetcher
 # XRoxy provider class.
 class XRoxy < Base
 # Provider URL to fetch proxy list
-PROVIDER_URL = '
+PROVIDER_URL = 'https://www.xroxy.com/proxylist.php'.freeze
 
 # Fetches HTML content by sending HTTP request to the provider URL and
 # parses the document (built as abstract <code>ProxyFetcher::Document</code>)
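The new `PROVIDER_URL` value is frozen, so the string held by the constant cannot be mutated in place. A small sketch of what `.freeze` does to a string follows; the `ExampleProvider` class is a hypothetical stand-in, not the gem's `XRoxy` provider.

```ruby
# Hypothetical stand-in class used only to illustrate a frozen constant.
class ExampleProvider
  PROVIDER_URL = 'https://www.xroxy.com/proxylist.php'.freeze
end

url = ExampleProvider::PROVIDER_URL
puts url.frozen?

begin
  # In-place mutation of a frozen string raises FrozenError
  # (a RuntimeError subclass on Ruby >= 2.5, plain RuntimeError before that).
  url << '?page=1'
rescue RuntimeError => e
  puts "mutation rejected: #{e.class}"
end

# Building a new string leaves the constant untouched.
paged_url = "#{url}?page=1"
puts paged_url
```

The `?page=1` query string is only an illustration and is not a documented parameter of the provider.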
data/proxy_fetcher.gemspec
CHANGED
@@ -12,7 +12,7 @@ Gem::Specification.new do |gem|
 gem.email = 'bulajnikita@gmail.com'
 gem.require_paths = ['lib']
 gem.bindir = 'bin'
-gem.files = `git ls-files`.split($RS)
+gem.files = `git ls-files`.split($RS) - %w[README.md .travis.yml .rubocop.yml]
 gem.executables = `git ls-files -- bin/*`.split("\n").map { |f| File.basename(f) }
 gem.homepage = 'http://github.com/nbulaj/proxy_fetcher'
 gem.license = 'MIT'
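The changed `gem.files` line relies on Ruby's `Array#-` to drop the listed paths from whatever `git ls-files` returns. A minimal sketch of that behaviour is shown below, using a hard-coded file list as a stand-in for the real `git ls-files` output (the list itself is an assumption for illustration).

```ruby
# Stand-in for `git ls-files`.split($RS); the real list comes from the repository.
tracked_files = %w[
  .gitignore .rubocop.yml .travis.yml CHANGELOG.md
  README.md Rakefile lib/proxy_fetcher.rb
]

# Paths excluded from the packaged gem, as in the changed gemspec line.
excluded = %w[README.md .travis.yml .rubocop.yml]

# Array#- returns a new array without any element that appears in `excluded`,
# keeping the original order of the remaining entries.
packaged_files = tracked_files - excluded

puts packaged_files
```

This is why `README.md`, `.travis.yml` and `.rubocop.yml` show up as deleted in this diff: they are no longer packaged into the gem.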
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: proxy_fetcher
 version: !ruby/object:Gem::Version
-version: 0.7.
+version: 0.7.1
 platform: ruby
 authors:
 - Nikita Bulai
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-
+date: 2018-07-13 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: http
@@ -47,13 +47,10 @@ extensions: []
 extra_rdoc_files: []
 files:
 - ".gitignore"
-- ".rubocop.yml"
-- ".travis.yml"
 - CHANGELOG.md
 - CODE_OF_CONDUCT.md
 - Gemfile
 - LICENSE
-- README.md
 - Rakefile
 - bin/proxy_fetcher
 - gemfiles/nokogiri.gemfile
data/.rubocop.yml
DELETED
data/.travis.yml
DELETED
@@ -1,26 +0,0 @@
-language: ruby
-before_install: gem install bundler
-bundler_args: --without yard guard benchmarks
-script: "rake spec"
-env: JRUBY_OPTS="$JRUBY_OPTS --debug"
-gemfile:
-- gemfiles/oga.gemfile
-- gemfiles/nokogiri.gemfile
-gemfile:
-- gemfiles/oga.gemfile
-- gemfiles/nokogiri.gemfile
-rvm:
-- 2.0
-- 2.1
-- 2.2.4
-- 2.3.3
-- 2.4.3
-- 2.5.0
-- ruby-head
-- jruby-9.1.15.0
-matrix:
-allow_failures:
-- rvm: ruby-head
-exclude:
-- rvm: 2.0
-gemfile: gemfiles/nokogiri.gemfile # Nokogiri doesn't support Ruby 2.0
data/README.md
DELETED
@@ -1,489 +0,0 @@
-# Ruby / JRuby lib for managing proxies
-[![Gem Version](https://badge.fury.io/rb/proxy_fetcher.svg)](http://badge.fury.io/rb/proxy_fetcher)
-[![Build Status](https://travis-ci.org/nbulaj/proxy_fetcher.svg?branch=master)](https://travis-ci.org/nbulaj/proxy_fetcher)
-[![Coverage Status](https://coveralls.io/repos/github/nbulaj/proxy_fetcher/badge.svg)](https://coveralls.io/github/nbulaj/proxy_fetcher)
-[![Code Climate](https://codeclimate.com/github/nbulaj/proxy_fetcher/badges/gpa.svg)](https://codeclimate.com/github/nbulaj/proxy_fetcher)
-[![Inline docs](http://inch-ci.org/github/nbulaj/proxy_fetcher.png?branch=master)](http://inch-ci.org/github/nbulaj/proxy_fetcher)
-[![License](http://img.shields.io/badge/license-MIT-brightgreen.svg)](#license)
-
-This gem can help your Ruby / JRuby application to make HTTP(S) requests using
-proxy by fetching and validating actual proxy lists from multiple providers.
-
-It gives you a special `Manager` class that can load proxy lists, validate them and return random or specific proxies.
-It also has a `Client` class that encapsulates all the logic for sending HTTP requests using proxies, automatically
-fetched and validated by the gem. Take a look at the documentation below to find all the gem features.
-
-Also this gem can be used with any other programming language (Go / Python / etc) as standalone solution for downloading and
-validating proxy lists from the different providers. [Checkout examples](#standalone) of usage below.
-
-## Documentation valid for `master` branch
-
-Please check the documentation for the version of doorkeeper you are using in:
-https://github.com/nbulaj/proxy_fetcher/releases
-
-## Table of Contents
-
-- [Dependencies](#dependencies)
-- [Installation](#installation)
-- [Example of usage](#example-of-usage)
-- [In Ruby application](#in-ruby-application)
-- [Standalone](#standalone)
-- [Client](#client)
-- [Configuration](#configuration)
-- [Proxy validation speed](#proxy-validation-speed)
-- [Proxy object](#proxy-object)
-- [Providers](#providers)
-- [Contributing](#contributing)
-- [License](#license)
-
-## Dependencies
-
-ProxyFetcher gem itself requires Ruby `>= 2.0.0` (or [JRuby](http://jruby.org/) `> 9.0`, but maybe earlier too,
-[see Travis build matrix](.travis.yml)) and great [HTTP.rb gem](https://github.com/httprb/http).
-
-However, it requires an adapter to parse HTML. If you do not specify any specific adapter, then it will use
-default one - [Nokogiri](https://github.com/sparklemotion/nokogiri). It's OK for any Ruby on Rails project
-(because they use it by default).
-
-But if you want to use some specific adapter (for example your application uses [Oga](https://gitlab.com/yorickpeterse/oga),
-then you need to manually add your dependencies to your project and configure ProxyFetcher to use another adapter. Moreover,
-you can implement your own adapter if it your use-case. Take a look at the [Configuration](#configuration) section for more details.
-
-## Installation
-
-If using bundler, first add 'proxy_fetcher' to your Gemfile:
-
-```ruby
-gem 'proxy_fetcher', '~> 0.7'
-```
-
-or if you want to use the latest version (from `master` branch), then:
-
-```ruby
-gem 'proxy_fetcher', git: 'https://github.com/nbulaj/proxy_fetcher.git'
-```
-
-And run:
-
-```sh
-bundle install
-```
-
-Otherwise simply install the gem:
-
-```sh
-gem install proxy_fetcher -v '0.7'
-```
-
-## Example of usage
-
-### In Ruby application
-
-By default ProxyFetcher uses all the available proxy providers. To get current proxy list without validation you
-need to initialize an instance of `ProxyFetcher::Manager` class. During this process ProxyFetcher will automatically load
-and parse all the proxies:
-
-```ruby
-manager = ProxyFetcher::Manager.new # will immediately load proxy list from the server
-manager.proxies
-
-#=> [#<ProxyFetcher::Proxy:0x00000002879680 @addr="97.77.104.22", @port=3128, @country="USA",
-# @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
-```
-
-You can initialize proxy manager without immediate load of the proxy list from the remote server by passing
-`refresh: false` on initialization:
-
-```ruby
-manager = ProxyFetcher::Manager.new(refresh: false) # just initialize class instance
-manager.proxies
-
-#=> []
-```
-
-`ProxyFetcher::Manager` class is very helpful when you need to manipulate and manager proxies. To get the proxy
-from the list yoy can call `.get` or `.pop` method that will return first proxy and move it to the end of the list.
-This methods has some equivalents like `get!` or aliased `pop!` that will return first **connectable** proxy and
-move it to the end of the list. They both marked as danger methods because all dead proxies will be removed from the list.
-
-If you need just some random proxy then call `manager.random_proxy` or it's alias `manager.random`.
-
-To clean current proxy list from the dead entries that does not respond to the requests you you need to use `cleanup!`
-or `validate!` method:
-
-```ruby
-manager.cleanup! # or manager.validate!
-```
-
-This action will enumerate proxy list and remove all the entries that doesn't respond by timeout or returns errors.
-
-In order to increase the performance proxy list validation is performed using Ruby threads. By default gem creates a
-pool with 10 threads, but you can increase this number by changing `pool_size` configuration option: `ProxyFetcher.config.pool_size = 50`.
-Read more in [Proxy validation speed](#proxy-validation-speed) section.
-
-If you need raw proxy URLs (like `host:port`) then you can use `raw_proxies` methods that will return array of strings:
-
-```ruby
-manager = ProxyFetcher::Manager.new
-manager.raw_proxies
-
-# => ["97.77.104.22:3128", "94.23.205.32:3128", "209.79.65.140:8080",
-# "91.217.42.2:8080", "97.77.104.22:80", "165.234.102.177:8080", ...]
-```
-
-You don't need to initialize a new manager every time you want to load actual proxy list from the providers. All you
-need is to refresh the proxy list by calling `#refresh_list!` (or `#fetch!`) method for your `ProxyFetcher::Manager` instance:
-
-```ruby
-manager.refresh_list! # or manager.fetch!
-
-#=> [#<ProxyFetcher::Proxy:0x00000002879680 @addr="97.77.104.22", @port=3128, @country="USA",
-# @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
-```
-
-If you need to filter proxy list, for example, by country or response time and selected provider supports filtering with GET params,
-then you can just pass your filters like a simple Ruby hash to the Manager instance:
-
-```ruby
-ProxyFetcher.config.providers = :proxy_docker
-
-manager = ProxyFetcher::Manager.new(filters: { country: 'PL', maxtime: '500' })
-manager.proxies
-
-# => [...]
-```
-
-If you are using multiple providers, then you can split your filters by proxy provider names:
-
-```ruby
-ProxyFetcher.config.providers = [:proxy_docker, :xroxy]
-
-manager = ProxyFetcher::Manager.new(filters: {
-hide_my_name: {
-country: 'PL',
-maxtime: '500'
-},
-xroxy: {
-type: 'All_http'
-}
-})
-
-manager.proxies
-
-# => [...]
-```
-
-You can apply different filters every time you calling `#refresh_list!` (or `#fetch!`) method:
-
-```ruby
-manager.refresh_list!(country: 'PL', maxtime: '500')
-
-# => [...]
-```
-
-*NOTE*: not all the providers support filtering. Take a look at the provider classes to see if it supports custom filters.
-
-### Standalone
-
-All you need to use this gem is Ruby >= 2.0 (2.4 is recommended). You can install it in a different ways. If you are using Ubuntu Xenial (16.04LTS)
-then you already have Ruby 2.3 installed. In other cases you can install it with [RVM](https://rvm.io/) or [rbenv](https://github.com/rbenv/rbenv).
-
-After installing Ruby just bundle the gem by running `gem install proxy_fetcher` in your terminal and now you can run it:
-
-```bash
-proxy_fetcher >> proxies.txt # Will download proxies from the default provider, validate them and write to file
-```
-
-If you need a list of proxies from some specific provider, then you need to pass it's name with `-p` option:
-
-```bash
-proxy_fetcher -p proxy_docker >> proxies.txt # Will download proxies from the default provider, validate them and write to file
-```
-
-If you need a list of proxies in JSON format just pass a `--json` option to the command:
-
-```bash
-proxy_fetcher --json
-
-# Will print:
-# {"proxies":["120.26.206.178:80","119.61.13.242:1080","117.40.213.26:80","92.62.72.242:1080","77.53.105.155:3124"
-# "58.20.41.172:35923","204.116.192.151:35923","190.5.96.58:1080","170.250.109.97:35923","121.41.82.99:1080"]}
-```
-
-To get all the possible options run:
-
-```bash
-proxy_fetcher --help
-```
-
-## Client
-
-ProxyFetcher gem provides you a ready-to-use HTTP client that made requesting with proxies easy. It does all the work
-with the proxy lists for you (load, validate, refresh, find proxy by type, follow redirects, etc). All you need it to
-make HTTP(S) requests:
-
-```ruby
-require 'proxy-fetcher'
-
-ProxyFetcher::Client.get 'https://example.com/resource'
-
-ProxyFetcher::Client.post 'https://example.com/resource', { param: 'value' }
-
-ProxyFetcher::Client.post 'https://example.com/resource', 'Any data'
-
-ProxyFetcher::Client.post 'https://example.com/resource', { param: 'value'}.to_json , headers: { 'Content-Type': 'application/json' }
-
-ProxyFetcher::Client.put 'https://example.com/resource', { param: 'value' }
-
-ProxyFetcher::Client.patch 'https://example.com/resource', { param: 'value' }
-
-ProxyFetcher::Client.delete 'https://example.com/resource'
-```
-
-By default, `ProxyFetcher::Client` makes 1000 attempts to send a HTTP request in case if proxy is out of order or the
-remote server returns an error. You can increase or decrease this number for your case or set it to `nil` if you want to
-make infinite number of requests (or before your Ruby process will die :skull:):
-
-```ruby
-require 'proxy-fetcher'
-
-ProxyFetcher::Client.get 'https://example.com/resource', options: { max_retries: 10_000 }
-```
-
-You can also use your own proxy object when using ProxyFetcher client:
-
-```ruby
-require 'proxy-fetcher'
-
-manager = ProxyFetcher::Manager.new # will immediately load proxy list from the server
-
-#random will return random proxy object from the list
-ProxyFetcher::Client.get 'https://example.com/resource', options: { proxy: manager.random }
-```
-
-Btw, if you need support of JavaScript or some other features, you need to implement your own client using, for example,
-`selenium-webdriver`.
-
-## Configuration
-
-ProxyFetcher is very flexible gem. You can configure the most important parts of the library and use your own solutions.
-
-Default configuration looks as follows:
-
-```ruby
-ProxyFetcher.configure do |config|
-config.logger = Logger.new(STDOUT)
-config.user_agent = ProxyFetcher::Configuration::DEFAULT_USER_AGENT
-config.pool_size = 10
-config.timeout = 3
-config.http_client = ProxyFetcher::HTTPClient
-config.proxy_validator = ProxyFetcher::ProxyValidator
-config.providers = ProxyFetcher::Configuration.registered_providers
-config.adapter = ProxyFetcher::Configuration::DEFAULT_ADAPTER # :nokogiri by default
-end
-```
-
-You can change any of the options above. Let's look at this deeper.
-
-To change open/read timeout for `cleanup!` and `connectable?` methods you need to change `timeout` options:
-
-```ruby
-ProxyFetcher.configure do |config|
-config.timeout = 1 # default is 3
-end
-
-manager = ProxyFetcher::Manager.new
-manager.cleanup!
-```
-
-Also you can set your custom User-Agent string:
-
-```ruby
-ProxyFetcher.configure do |config|
-config.user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
-end
-```
-
-ProxyFetcher uses HTTP.rb gem for dealing with HTTP(S) requests. It is fast enough and has a great chainable API.
-If you wanna add, for example, your custom provider that was developed as a Single Page Application (SPA) with some JavaScript,
-then you will need something like [selenium-webdriver](https://github.com/SeleniumHQ/selenium/tree/master/rb) to properly
-load the content of the website. For those and other cases you can write your own class for fetching HTML content by
-the URL and setup it in the ProxyFetcher config:
-
-```ruby
-class MyHTTPClient
-# [IMPORTANT]: below methods are required!
-def self.fetch(url)
-# ... some magic to return proper HTML ...
-end
-end
-
-ProxyFetcher.config.http_client = MyHTTPClient
-
-manager = ProxyFetcher::Manager.new
-manager.proxies
-
-#=> [#<ProxyFetcher::Proxy:0x00000002879680 @addr="97.77.104.22", @port=3128, @country="USA",
-# @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
-```
-
-You can take a look at the [lib/proxy_fetcher/utils/http_client.rb](lib/proxy_fetcher/utils/http_client.rb) for an example.
-
-Moreover, you can write your own proxy validator to check if proxy is valid or not:
-
-```ruby
-class MyProxyValidator
-# [IMPORTANT]: below methods are required!
-def self.connectable?(proxy_addr, proxy_port)
-# ... some magic to check if proxy is valid ...
-end
-end
-
-ProxyFetcher.config.proxy_validator = MyProxyValidator
-
-manager = ProxyFetcher::Manager.new
-manager.proxies
-
-#=> [#<ProxyFetcher::Proxy:0x00000002879680 @addr="97.77.104.22", @port=3128, @country="USA",
-# @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
-
-manager.validate!
-
-#=> [ ... ]
-```
-
-Be default, ProxyFetcher gem uses [Nokogiri](https://github.com/sparklemotion/nokogiri) for parsing HTML. If you want
-to use [Oga](https://gitlab.com/yorickpeterse/oga) instead, then you need to add `gem 'oga'` to your Gemfile and configure
-ProxyFetcher as follows:
-
-```ruby
-ProxyFetcher.config.adapter = :oga
-```
-
-Also you can write your own HTML parser implementation and use it, take a look at the [abstract class and implementations](lib/proxy_fetcher/document).
-Configure it as:
-
-```ruby
-ProxyFetcher.config.adapter = MyHTMLParserClass
-```
-
-### Proxy validation speed
-
-There are some tricks to increase proxy list validation performance.
-
-In a few words, ProxyFetcher gem uses threads to validate proxies for availability. Every proxy is checked in a
-separate thread. By default, ProxyFetcher uses a pool with a maximum of 10 threads. You can increase this number by
-setting max number of threads in the config:
-
-```ruby
-ProxyFetcher.config.pool_size = 50
-```
-
-You can experiment with the threads pool size to find an optimal number of maximum threads count for you PC and OS.
-This will definitely give you some performance improvements.
-
-Moreover, the common proxy validation speed depends on `ProxyFetcher.config.timeout` option that is equal
-to `3` by default. It means that gem will wait 3 seconds for the server answer to check if particular proxy is connectable.
-You can decrease this option to `1`, for example, and it will heavily increase proxy validation speed (**but remember**
-that some proxies could be connectable, but slow, so with this option you will clear proxy list from the proxies that
-works, but very slow).
-
-## Proxy object
-
-Every proxy is a `ProxyFetcher::Proxy` object that has next readers (instance variables):
-
-* `addr` (IP address)
-* `port`
-* `type` (proxy type, can be HTTP, HTTPS, SOCKS4 or/and SOCKS5)
-* `country` (USA or Brazil for example)
-* `response_time` (5217 for example)
-* `anonymity` (`Low`, `Elite proxy` or `High +KA` for example)
-
-Also you can call next instance methods for every Proxy object:
-
-* `connectable?` (whether proxy server is available)
-* `http?` (whether proxy server has a HTTP protocol)
-* `https?` (whether proxy server has a HTTPS protocol)
-* `socks4?`
-* `socks5?`
-* `uri` (returns `URI::Generic` object)
-* `url` (returns a formatted URL like "_IP:PORT_" or "_http://IP:PORT_" if `scheme: true` provided)
-
-## Providers
-
-Currently ProxyFetcher can deal with next proxy providers (services):
-
-* Free Proxy List
-* Free SSL Proxies
-* Proxy Docker
-* Gather Proxy
-* HTTP Tunnel Genius
-* Proxy List
-* XRoxy
-
-If you wanna use one of them just setup it in the config:
-
-```ruby
-ProxyFetcher.config.provider = :free_proxy_list
-
-manager = ProxyFetcher::Manager.new
-manager.proxies
-#=> ...
-```
-
-You can use multiple providers at the same time:
-
-```ruby
-ProxyFetcher.config.providers = :free_proxy_list, :xroxy, :proxy_docker
-
-manager = ProxyFetcher::Manager.new
-manager.proxies
-#=> ...
-```
-
-If you want to use all the possible proxy providers then you can configure ProxyFetcher as follows:
-
-```ruby
-ProxyFetcher.config.providers = ProxyFetcher::Configuration.registered_providers
-
-manager = ProxyFetcher::Manager.new.proxies
-manager.proxies
-
-#=> [#<ProxyFetcher::Proxy:0x00000002879680 @addr="97.77.104.22", @port=3128, @country="USA",
-# @response_time=5217, @type="HTTP", @anonymity="High">, ... ]
-```
-
-Moreover, you can write your own provider! All you need is to create a class, that would be inherited from the
-`ProxyFetcher::Providers::Base` class, and register your provider like this:
-
-```ruby
-ProxyFetcher::Configuration.register_provider(:your_provider, YourProviderClass)
-```
-
-Provider class must implement `self.load_proxy_list` and `#to_proxy(html_element)` methods that will load and parse
-provider HTML page with proxy list. Take a look at the existing providers in the [lib/proxy_fetcher/providers](lib/proxy_fetcher/providers) directory.
-
-## Contributing
-
-You are very welcome to help improve ProxyFetcher if you have suggestions for features that other people can use.
-
-To contribute:
-
-1. Fork the project.
-2. Create your feature branch (`git checkout -b my-new-feature`).
-3. Implement your feature or bug fix.
-4. Add documentation for your feature or bug fix.
-5. Run <tt>rake doc:yard</tt>. If your changes are not 100% documented, go back to step 4.
-6. Add tests for your feature or bug fix.
-7. Run `rake spec` to make sure all tests pass.
-8. Commit your changes (`git commit -am 'Add new feature'`).
-9. Push to the branch (`git push origin my-new-feature`).
-10. Create new pull request.
-
-Thanks.
-
-## License
-
-`proxy_fetcher` gem is released under the [MIT License](http://www.opensource.org/licenses/MIT).
-
-Copyright (c) 2017—2018 Nikita Bulai (bulajnikita@gmail.com).