metainspector 4.5.0 → 4.6.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +20 -7
- data/README.md +15 -0
- data/lib/meta_inspector/document.rb +4 -1
- data/lib/meta_inspector/request.rb +4 -1
- data/lib/meta_inspector/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6ee8411bb9ed926d53b27d0723f07ffb1ccf21d3
+  data.tar.gz: cb45f0aed52790578cc841f1caf35c933b290c80
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7c64e3088e40204c2e32a6a00b2531828f5610fddbac8addb1f8032944562c25bd8fb7fb68c54149adc5cc279fbca2633901d4e927ff2a783c73df1eb083bb37
+  data.tar.gz: 72bf81478a0efd1cb57c22311aa13568f78cd57409f3dd54e3d32af46651e5cbed3b5d7e996d921b2b5de9d5454b92cda8d2bf271171aa97ff927dda44a7b1e4
data/CHANGELOG.md
CHANGED
@@ -1,6 +1,19 @@
 # MetaInspector Changelog
 
-## Changes in 4.5
+## [Changes in 4.6](https://github.com/jaimeiniesta/metainspector/compare/v4.5.0...v4.6.0)
+
+Faraday can be passed options via `:faraday_options`. This is useful in cases where we need to
+customize the way we request the page, like for example disabling SSL verification, like this:
+
+```ruby
+MetaInspector.new('https://example.com')
+# Faraday::SSLError: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
+
+MetaInspector.new('https://example.com', faraday_options: { ssl: { verify: false } })
+# Now we can access the page
+```
+
+## [Changes in 4.5](https://github.com/jaimeiniesta/metainspector/compare/v4.4.0...v4.5.0)
 
 * The Document API now includes access to head/link elements
   * `page.head_links` returns an array of hashes of all head/links.
@@ -15,16 +28,16 @@
 * The images API has been extended:
   * `page.images.with_size` returns a sorted array (by descending area) of [image_url, width, height]
 
-## Changes in 4.4
+## [Changes in 4.4](https://github.com/jaimeiniesta/metainspector/compare/v4.3.0...v4.4.0)
 
 The default headers now include `'Accept-Encoding' => 'identity'` to minimize trouble with servers that respond with malformed compressed responses, [as explained here](https://github.com/lostisland/faraday/issues/337).
 
-## Changes in 4.3
+## [Changes in 4.3](https://github.com/jaimeiniesta/metainspector/compare/v4.3.0...v4.4.0)
 
 * The Document API has been extended with one new method `page.best_title` that returns the longest text available from a selection of candidates.
 * `to_hash` now includes `scheme`, `host`, `root_url`, `best_title` and `description`.
 
-## Changes in 4.2
+## [Changes in 4.2](https://github.com/jaimeiniesta/metainspector/compare/v4.1.0...v4.2.0)
 
 * The images API has been extended, with two new methods:
 
@@ -33,11 +46,11 @@ The default headers now include `'Accept-Encoding' => 'identity'` to minimize tr
 
 * The criteria for `page.images.best` has changed slightly, we'll now return the largest image instead of the first image if no owner-suggested image is found.
 
-## Changes in 4.1
+## [Changes in 4.1](https://github.com/jaimeiniesta/metainspector/compare/v4.0.0...v4.1.0)
 
 * Introduces the `:normalize_url` option, which allows to disable URL normalization.
 
-## Changes in 4.0
+## [Changes in 4.0](https://github.com/jaimeiniesta/metainspector/compare/v3.0.0...v4.0.0)
 
 * The links API has been changed, now instead of `page.links`, `page.internal_links` and `page.external_links` we have:
 
@@ -56,7 +69,7 @@ page.links.external # Returns all external HTTP links found
 
 * You can now specify 2 different timeouts, `connection_timeout` and `read_timeout`, instead of the previous single `timeout`.
 
-## Changes in 3.0
+## [Changes in 3.0](https://github.com/jaimeiniesta/metainspector/compare/v2.0.0...v3.0.0)
 
 * The redirect API has been changed, now the `:allow_redirections` option will expect only a boolean, which by default is `true`. That is, no more specifying `:safe`, `:unsafe` or `:all`.
 * We've dropped support for Ruby < 2.
data/README.md
CHANGED
@@ -311,6 +311,21 @@ If you want to override the default headers then use the `headers` option:
 page = MetaInspector.new('example.com', :headers => {'User-Agent' => 'My custom User-Agent'})
 ```
 
+### Disabling SSL verification (or any other Faraday options)
+
+Faraday can be passed options via `:faraday_options`.
+
+This is useful in cases where we need to
+customize the way we request the page, like for example disabling SSL verification, like this:
+
+```ruby
+MetaInspector.new('https://example.com')
+# Faraday::SSLError: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
+
+MetaInspector.new('https://example.com', faraday_options: { ssl: { verify: false } })
+# Now we can access the page
+```
+
 ### HTML Content Only
 
 MetaInspector will try to parse all URLs by default. If you want to raise an exception when trying to parse a non-html URL (one that has a content-type different than text/html), you can state it like this:
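Because `ssl: { verify: false }` turns off certificate checks for the whole request, it may be worth limiting it to hosts that are known to need it. The following is a minimal sketch of that idea, not part of the gem: `INSECURE_HOSTS` and `faraday_options_for` are hypothetical names introduced here for illustration, and the host list is a placeholder.

```ruby
require 'uri'

# Hypothetical allowlist of hosts whose certificates we knowingly cannot verify.
INSECURE_HOSTS = ['legacy.example.com'].freeze

# Hypothetical helper: builds the hash to pass as :faraday_options.
# Verification is disabled only for allowlisted hosts; any other host
# gets an empty hash, which leaves Faraday's defaults untouched.
def faraday_options_for(url, insecure_hosts = INSECURE_HOSTS)
  host = URI.parse(url).host
  insecure_hosts.include?(host) ? { ssl: { verify: false } } : {}
end

faraday_options_for('https://legacy.example.com/page') # verification disabled
faraday_options_for('https://example.com')             # empty hash, defaults apply

# Intended use (needs the metainspector gem, so it is left commented out):
# page = MetaInspector.new(url, faraday_options: faraday_options_for(url))
```

Returning an empty hash for every other host keeps the call site uniform, since the request code in this release already falls back to `{}` when no `:faraday_options` are given.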
data/lib/meta_inspector/document.rb
CHANGED
@@ -18,6 +18,7 @@ module MetaInspector
     # Can be :warn, :raise or nil
     # * headers: object containing custom headers for the request
     # * normalize_url: true by default
+    # * faraday_options: an optional hash of options to pass to Faraday on the request
     def initialize(initial_url, options = {})
       options = defaults.merge(options)
       @connection_timeout = options[:connection_timeout]
@@ -31,6 +32,7 @@ module MetaInspector
       @warn_level = options[:warn_level]
       @exception_log = options[:exception_log] || MetaInspector::ExceptionLog.new(warn_level: warn_level)
       @normalize_url = options[:normalize_url]
+      @faraday_options = options[:faraday_options]
       @url = MetaInspector::URL.new(initial_url, exception_log: @exception_log,
                                                  normalize: @normalize_url)
       @request = MetaInspector::Request.new(@url, allow_redirections: @allow_redirections,
@@ -38,7 +40,8 @@ module MetaInspector
                                                   read_timeout: @read_timeout,
                                                   retries: @retries,
                                                   exception_log: @exception_log,
-                                                  headers: @headers
+                                                  headers: @headers,
+                                                  faraday_options: @faraday_options) unless @document
       @parser = MetaInspector::Parser.new(self, exception_log: @exception_log,
                                                 download_images: @download_images)
     end
data/lib/meta_inspector/request.rb
CHANGED
@@ -17,6 +17,7 @@ module MetaInspector
       @retries = options[:retries]
       @exception_log = options[:exception_log]
       @headers = options[:headers]
+      @faraday_options = options[:faraday_options] || {}
 
       response # request early so we can fail early
     end
@@ -44,7 +45,9 @@ module MetaInspector
     private
 
     def fetch
-
+      @faraday_options.merge!(:url => url)
+
+      session = Faraday.new(@faraday_options) do |faraday|
         faraday.request :retry, max: @retries
 
         if @allow_redirections
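One detail of the `fetch` change is that `merge!` updates the stored hash in place, and that hash is the same object the caller passed as `:faraday_options`. The sketch below illustrates the difference between `merge!` and `merge` with a stand-in class; `RequestSketch` and its two methods are hypothetical and only mimic the two lines shown in the diff, they are not the gem's `MetaInspector::Request`.

```ruby
# Stand-in for the relevant part of the request class; not the gem's code.
class RequestSketch
  def initialize(url, options = {})
    @url = url
    # Same fallback as in the diff: an empty hash when the option is absent.
    @faraday_options = options[:faraday_options] || {}
  end

  # Mirrors the diff: merge! writes :url into the stored hash itself.
  def connection_options_in_place
    @faraday_options.merge!(url: @url)
  end

  # Assumed alternative: merge returns a new hash and leaves the stored one alone.
  def connection_options_copy
    @faraday_options.merge(url: @url)
  end
end

shared = { ssl: { verify: false } }

RequestSketch.new('https://a.example', faraday_options: shared).connection_options_copy
shared.key?(:url) # false, the caller's hash is unchanged

RequestSketch.new('https://a.example', faraday_options: shared).connection_options_in_place
shared.key?(:url) # true, the caller's hash now carries the request URL
```

In practice this is probably harmless, because `:url` is overwritten on every fetch, but a caller that reuses one options hash across pages will see the `:url` key appear in it.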
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: metainspector
 version: !ruby/object:Gem::Version
-  version: 4.
+  version: 4.6.0
 platform: ruby
 authors:
 - Jaime Iniesta
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-
+date: 2015-06-11 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri