metainspector 4.5.0 → 4.6.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +20 -7
- data/README.md +15 -0
- data/lib/meta_inspector/document.rb +4 -1
- data/lib/meta_inspector/request.rb +4 -1
- data/lib/meta_inspector/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 6ee8411bb9ed926d53b27d0723f07ffb1ccf21d3
|
4
|
+
data.tar.gz: cb45f0aed52790578cc841f1caf35c933b290c80
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 7c64e3088e40204c2e32a6a00b2531828f5610fddbac8addb1f8032944562c25bd8fb7fb68c54149adc5cc279fbca2633901d4e927ff2a783c73df1eb083bb37
|
7
|
+
data.tar.gz: 72bf81478a0efd1cb57c22311aa13568f78cd57409f3dd54e3d32af46651e5cbed3b5d7e996d921b2b5de9d5454b92cda8d2bf271171aa97ff927dda44a7b1e4
|
data/CHANGELOG.md
CHANGED
@@ -1,6 +1,19 @@
|
|
1
1
|
# MetaInspector Changelog
|
2
2
|
|
3
|
-
## Changes in 4.5
|
3
|
+
## [Changes in 4.6](https://github.com/jaimeiniesta/metainspector/compare/v4.5.0...v4.6.0)
|
4
|
+
|
5
|
+
Faraday can be passed options via `:faraday_options`. This is useful in cases where we need to
|
6
|
+
customize the way we request the page, for example by disabling SSL verification:
|
7
|
+
|
8
|
+
```ruby
|
9
|
+
MetaInspector.new('https://example.com')
|
10
|
+
# Faraday::SSLError: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
|
11
|
+
|
12
|
+
MetaInspector.new('https://example.com', faraday_options: { ssl: { verify: false } })
|
13
|
+
# Now we can access the page
|
14
|
+
```
|
15
|
+
|
16
|
+
## [Changes in 4.5](https://github.com/jaimeiniesta/metainspector/compare/v4.4.0...v4.5.0)
|
4
17
|
|
5
18
|
* The Document API now includes access to head/link elements
|
6
19
|
* `page.head_links` returns an array of hashes of all head/links.
|
@@ -15,16 +28,16 @@
|
|
15
28
|
* The images API has been extended:
|
16
29
|
* `page.images.with_size` returns a sorted array (by descending area) of [image_url, width, height]
|
17
30
|
|
18
|
-
## Changes in 4.4
|
31
|
+
## [Changes in 4.4](https://github.com/jaimeiniesta/metainspector/compare/v4.3.0...v4.4.0)
|
19
32
|
|
20
33
|
The default headers now include `'Accept-Encoding' => 'identity'` to minimize trouble with servers that respond with malformed compressed responses, [as explained here](https://github.com/lostisland/faraday/issues/337).
|
21
34
|
|
22
|
-
## Changes in 4.3
|
35
|
+
## [Changes in 4.3](https://github.com/jaimeiniesta/metainspector/compare/v4.2.0...v4.3.0)
|
23
36
|
|
24
37
|
* The Document API has been extended with one new method `page.best_title` that returns the longest text available from a selection of candidates.
|
25
38
|
* `to_hash` now includes `scheme`, `host`, `root_url`, `best_title` and `description`.
|
26
39
|
|
27
|
-
## Changes in 4.2
|
40
|
+
## [Changes in 4.2](https://github.com/jaimeiniesta/metainspector/compare/v4.1.0...v4.2.0)
|
28
41
|
|
29
42
|
* The images API has been extended, with two new methods:
|
30
43
|
|
@@ -33,11 +46,11 @@ The default headers now include `'Accept-Encoding' => 'identity'` to minimize tr
|
|
33
46
|
|
34
47
|
* The criteria for `page.images.best` have changed slightly: we'll now return the largest image instead of the first image if no owner-suggested image is found.
|
35
48
|
|
36
|
-
## Changes in 4.1
|
49
|
+
## [Changes in 4.1](https://github.com/jaimeiniesta/metainspector/compare/v4.0.0...v4.1.0)
|
37
50
|
|
38
51
|
* Introduces the `:normalize_url` option, which allows disabling URL normalization.
|
39
52
|
|
40
|
-
## Changes in 4.0
|
53
|
+
## [Changes in 4.0](https://github.com/jaimeiniesta/metainspector/compare/v3.0.0...v4.0.0)
|
41
54
|
|
42
55
|
* The links API has changed: instead of `page.links`, `page.internal_links` and `page.external_links`, we now have:
|
43
56
|
|
@@ -56,7 +69,7 @@ page.links.external # Returns all external HTTP links found
|
|
56
69
|
|
57
70
|
* You can now specify 2 different timeouts, `connection_timeout` and `read_timeout`, instead of the previous single `timeout`.
|
58
71
|
|
59
|
-
## Changes in 3.0
|
72
|
+
## [Changes in 3.0](https://github.com/jaimeiniesta/metainspector/compare/v2.0.0...v3.0.0)
|
60
73
|
|
61
74
|
* The redirect API has changed: the `:allow_redirections` option now expects only a boolean, which is `true` by default. That is, no more specifying `:safe`, `:unsafe` or `:all`.
|
62
75
|
* We've dropped support for Ruby < 2.
|
data/README.md
CHANGED
@@ -311,6 +311,21 @@ If you want to override the default headers then use the `headers` option:
|
|
311
311
|
page = MetaInspector.new('example.com', :headers => {'User-Agent' => 'My custom User-Agent'})
|
312
312
|
```
|
313
313
|
|
314
|
+
### Disabling SSL verification (or any other Faraday options)
|
315
|
+
|
316
|
+
Faraday can be passed options via `:faraday_options`.
|
317
|
+
|
318
|
+
This is useful in cases where we need to
|
319
|
+
customize the way we request the page, for example by disabling SSL verification:
|
320
|
+
|
321
|
+
```ruby
|
322
|
+
MetaInspector.new('https://example.com')
|
323
|
+
# Faraday::SSLError: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
|
324
|
+
|
325
|
+
MetaInspector.new('https://example.com', faraday_options: { ssl: { verify: false } })
|
326
|
+
# Now we can access the page
|
327
|
+
```
|
328
|
+
|
314
329
|
### HTML Content Only
|
315
330
|
|
316
331
|
MetaInspector will try to parse all URLs by default. If you want to raise an exception when trying to parse a non-html URL (one that has a content-type different than text/html), you can state it like this:
|
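As a hedged illustration of the README section on `:faraday_options`: disabling verification is the bluntest setting, and where the failure stems from an unknown certificate authority, pointing Faraday at a CA bundle may be enough. The sketch below only builds the options hash. `build_faraday_options` and the bundle path are hypothetical, and it assumes the hash is forwarded to Faraday unchanged, as the 4.6.0 changes suggest.

```ruby
# Hypothetical helper (not part of MetaInspector): builds a hash suitable
# for the :faraday_options key introduced in 4.6.0.
def build_faraday_options(ca_file: nil, verify: true)
  ssl = { verify: verify }
  # Only set :ca_file when a bundle path was actually given.
  ssl[:ca_file] = ca_file if ca_file
  { ssl: ssl }
end

# The bundle path is an assumption; it differs between systems.
strict   = build_faraday_options(ca_file: '/etc/ssl/certs/ca-certificates.crt')
insecure = build_faraday_options(verify: false)

# Either hash could then be passed along, for example:
#   MetaInspector.new('https://example.com', faraday_options: insecure)
p strict
p insecure
```

Keeping `verify: true` and supplying a bundle preserves certificate checking, which is usually preferable outside of debugging.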
data/lib/meta_inspector/document.rb
CHANGED
@@ -18,6 +18,7 @@ module MetaInspector
|
|
18
18
|
# Can be :warn, :raise or nil
|
19
19
|
# * headers: object containing custom headers for the request
|
20
20
|
# * normalize_url: true by default
|
21
|
+
# * faraday_options: an optional hash of options to pass to Faraday on the request
|
21
22
|
def initialize(initial_url, options = {})
|
22
23
|
options = defaults.merge(options)
|
23
24
|
@connection_timeout = options[:connection_timeout]
|
@@ -31,6 +32,7 @@ module MetaInspector
|
|
31
32
|
@warn_level = options[:warn_level]
|
32
33
|
@exception_log = options[:exception_log] || MetaInspector::ExceptionLog.new(warn_level: warn_level)
|
33
34
|
@normalize_url = options[:normalize_url]
|
35
|
+
@faraday_options = options[:faraday_options]
|
34
36
|
@url = MetaInspector::URL.new(initial_url, exception_log: @exception_log,
|
35
37
|
normalize: @normalize_url)
|
36
38
|
@request = MetaInspector::Request.new(@url, allow_redirections: @allow_redirections,
|
@@ -38,7 +40,8 @@ module MetaInspector
|
|
38
40
|
read_timeout: @read_timeout,
|
39
41
|
retries: @retries,
|
40
42
|
exception_log: @exception_log,
|
41
|
-
headers: @headers
|
43
|
+
headers: @headers,
|
44
|
+
faraday_options: @faraday_options) unless @document
|
42
45
|
@parser = MetaInspector::Parser.new(self, exception_log: @exception_log,
|
43
46
|
download_images: @download_images)
|
44
47
|
end
|
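The `Document` changes forward `options[:faraday_options]` to `Request` as-is, so the value is presumably `nil` when the caller omits it (this assumes `defaults` does not define the key, which the diff does not show). A minimal stand-alone sketch of that hand-off, using hypothetical `MiniDocument` and `MiniRequest` classes rather than the gem's own, mirrors the `|| {}` fallback that `request.rb` gains in this release:

```ruby
# Hypothetical stand-ins for MetaInspector::Request and ::Document.
class MiniRequest
  attr_reader :faraday_options

  def initialize(options = {})
    # Fall back to an empty hash so later hash operations never hit nil.
    @faraday_options = options[:faraday_options] || {}
  end
end

class MiniDocument
  attr_reader :request

  def initialize(options = {})
    # May be nil: the option is optional and has no default in this sketch.
    @faraday_options = options[:faraday_options]
    @request = MiniRequest.new(faraday_options: @faraday_options)
  end
end

plain  = MiniDocument.new
custom = MiniDocument.new(faraday_options: { ssl: { verify: false } })

p plain.request.faraday_options
p custom.request.faraday_options
```

Without the fallback, the later call that merges `:url` into the options would raise `NoMethodError` on `nil` whenever the option is omitted.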
data/lib/meta_inspector/request.rb
CHANGED
@@ -17,6 +17,7 @@ module MetaInspector
|
|
17
17
|
@retries = options[:retries]
|
18
18
|
@exception_log = options[:exception_log]
|
19
19
|
@headers = options[:headers]
|
20
|
+
@faraday_options = options[:faraday_options] || {}
|
20
21
|
|
21
22
|
response # request early so we can fail early
|
22
23
|
end
|
@@ -44,7 +45,9 @@ module MetaInspector
|
|
44
45
|
private
|
45
46
|
|
46
47
|
def fetch
|
47
|
-
|
48
|
+
@faraday_options.merge!(:url => url)
|
49
|
+
|
50
|
+
session = Faraday.new(@faraday_options) do |faraday|
|
48
51
|
faraday.request :retry, max: @retries
|
49
52
|
|
50
53
|
if @allow_redirections
|
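One detail of the `fetch` change worth noting: `merge!` modifies its receiver in place, and as far as the hunks shown indicate, `@faraday_options` is the same hash object the caller passed in, so the caller's hash would gain a `:url` key after the request. The plain-Ruby sketch below contrasts this with non-mutating `merge`; both method names are hypothetical stand-ins, not the gem's code.

```ruby
# Mutating variant, analogous to `@faraday_options.merge!(:url => url)`.
def connection_options_in_place(faraday_options, url)
  faraday_options.merge!(url: url)
end

# Non-mutating variant: returns a new hash and leaves the argument alone.
def connection_options_copy(faraday_options, url)
  faraday_options.merge(url: url)
end

caller_opts = { ssl: { verify: false } }

connection_options_copy(caller_opts, 'https://example.com')
after_copy = caller_opts.key?(:url)      # false: caller's hash untouched

connection_options_in_place(caller_opts, 'https://example.com')
after_in_place = caller_opts.key?(:url)  # true: caller's hash was modified

p after_copy
p after_in_place
```

This matters mainly when one options hash is reused across several `MetaInspector.new` calls, since each call overwrites the stored `:url`.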
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: metainspector
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 4.
|
4
|
+
version: 4.6.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Jaime Iniesta
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2015-
|
11
|
+
date: 2015-06-11 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: nokogiri
|