metainspector 4.7.0 → 4.7.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +14 -0
- data/README.md +15 -0
- data/lib/meta_inspector/parser.rb +2 -0
- data/lib/meta_inspector/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 76c49fb7187563a3e9daff74a8f3416521251fa6
+  data.tar.gz: 0c67668db7ec465badcbb6666b4d3183b1f3e70e
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 164db60a1bf7139c1fa4f92ad459073df0b0f0b2adf6a1c48aba960afcbcdcd02c8cbc66d4283b3b1f4967d8a1915f9aca2cf303384507ff74571cfaf17bf0c7
+  data.tar.gz: 3326aa3962c7136033557c4398c38a2f442e04d39d14656f7e8195a4b7de16e4f5f5e914a2a48d054ca808b7eff748543b1535f607a096a0a84c7819110c5b28
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,19 @@
 # MetaInpector Changelog
 
+## [Changes in 4.7](https://github.com/jaimeiniesta/metainspector/compare/v4.6.0...v4.7.1)
+
+MetaInspector can be configured to use [Faraday::HttpCache](https://github.com/plataformatec/faraday-http-cache) to cache page responses. For that you should pass the `faraday_http_cache` option with at least the `:store` key, for example:
+
+```ruby
+cache = ActiveSupport::Cache.lookup_store(:file_store, '/tmp/cache')
+page = MetaInspector.new('http://example.com', faraday_http_cache: { store: cache })
+```
+
+Bugfixes:
+
+* Parsing of the document is done as soon as it is initialized (just like we do with the request), so
+that parsing errors will be catched earlier.
+
 ## [Changes in 4.6](https://github.com/jaimeiniesta/metainspector/compare/v4.5.0...v4.6.0)
 
 Faraday can be passed options via `:faraday_options`. This is useful in cases where we need to
data/README.md
CHANGED
@@ -393,6 +393,21 @@ You can also set the `warn_level: :store` option so that exceptions found will b
 
 You should avoid using the `:store` option, or use it wisely, as silencing errors can be problematic, it's always better to face the errors and treat them accordingly.
 
+If you're using this exception store, you're advised to first initialize the document, check if it seems OK, and then proceed with the extractions, like this:
+
+```ruby
+# This will fail because the URL will return a text/xml document
+page = MetaInspector.new("http://example.com/rss",
+                         html_content_only: true,
+                         warn_level: :store )
+
+if page.ok?
+  puts "TITLE: #{page.title}"
+else
+  puts "There were some exceptions: #{page.exceptions}"
+end
+```
+
 ## Examples
 
 You can find some sample scripts on the `examples` folder, including a basic scraping and a spider that will follow external links using a queue. What follows is an example of use from irb:
data/lib/meta_inspector/parser.rb
CHANGED
@@ -19,6 +19,8 @@ module MetaInspector
       @download_images = options[:download_images]
       @images_parser = MetaInspector::Parsers::ImagesParser.new(self, download_images: @download_images)
       @texts_parser = MetaInspector::Parsers::TextsParser.new(self)
+
+      parsed # parse early so we can fail early
     end
 
     extend Forwardable
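Note on the hunk above: the added `parsed` call moves parsing from first use into the constructor. A minimal sketch of that effect follows. `LazyParser`, `EagerParser` and `fails_at_init?` are hypothetical names invented for this illustration, and `Integer(...)` merely stands in for HTML parsing; none of this is MetaInspector's actual code.

```ruby
# Hypothetical stand-in for a parser that memoizes its parsed result.
class LazyParser
  def initialize(raw)
    @raw = raw
  end

  # Parsing is deferred until first use, so bad input surfaces late.
  # Integer() raises ArgumentError on non-numeric strings.
  def parsed
    @parsed ||= Integer(@raw)
  end
end

# Same parser, but it triggers parsing inside the constructor.
class EagerParser < LazyParser
  def initialize(raw)
    super
    parsed # parse early so we can fail early
  end
end

# Returns true when constructing klass with raw raises a parsing error.
def fails_at_init?(klass, raw)
  klass.new(raw)
  false
rescue ArgumentError
  true
end

puts "lazy fails at init:  #{fails_at_init?(LazyParser, 'not a number')}"
puts "eager fails at init: #{fails_at_init?(EagerParser, 'not a number')}"
```

With the eager variant, a caller can check the object right after construction instead of hitting the error on the first extraction call.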
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: metainspector
 version: !ruby/object:Gem::Version
-  version: 4.7.
+  version: 4.7.1
 platform: ruby
 authors:
 - Jaime Iniesta
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-10-
+date: 2015-10-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri