metainspector 4.7.0 → 4.7.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +14 -0
- data/README.md +15 -0
- data/lib/meta_inspector/parser.rb +2 -0
- data/lib/meta_inspector/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 76c49fb7187563a3e9daff74a8f3416521251fa6
+  data.tar.gz: 0c67668db7ec465badcbb6666b4d3183b1f3e70e
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 164db60a1bf7139c1fa4f92ad459073df0b0f0b2adf6a1c48aba960afcbcdcd02c8cbc66d4283b3b1f4967d8a1915f9aca2cf303384507ff74571cfaf17bf0c7
+  data.tar.gz: 3326aa3962c7136033557c4398c38a2f442e04d39d14656f7e8195a4b7de16e4f5f5e914a2a48d054ca808b7eff748543b1535f607a096a0a84c7819110c5b28
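checksums.yaml records SHA1 and SHA512 digests of the two archives packed inside the `.gem` file, `metadata.gz` and `data.tar.gz`. As a sketch of how a downloaded gem could be checked against these values, assuming you have already extracted the `.gem` (it is a plain tar archive) and gunzipped `checksums.yaml.gz`, the hypothetical helper `checksum_mismatches` below recomputes each digest with Ruby's standard `Digest` library and reports the entries that differ:

```ruby
require 'digest/sha1'
require 'digest/sha2'
require 'yaml'

# Hypothetical helper (not part of RubyGems or MetaInspector).
# checksums_yaml: the text of an extracted checksums.yaml
# contents:       a Hash mapping file names such as "data.tar.gz" to raw bytes
# Returns a list like ["SHA512 data.tar.gz"]; empty means every digest matched.
def checksum_mismatches(checksums_yaml, contents)
  algorithms = { 'SHA1' => Digest::SHA1, 'SHA512' => Digest::SHA512 }
  listed = YAML.safe_load(checksums_yaml)
  mismatches = []
  algorithms.each do |name, digest_class|
    (listed[name] || {}).each do |file, expected|
      # fetch raises KeyError if a listed file was not supplied
      actual = digest_class.hexdigest(contents.fetch(file))
      mismatches << "#{name} #{file}" unless actual == expected
    end
  end
  mismatches
end
```

The helper only looks at the SHA1 and SHA512 sections that appear in this diff; any other section in the YAML is ignored.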
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,19 @@
 # MetaInpector Changelog
 
+## [Changes in 4.7](https://github.com/jaimeiniesta/metainspector/compare/v4.6.0...v4.7.1)
+
+MetaInspector can be configured to use [Faraday::HttpCache](https://github.com/plataformatec/faraday-http-cache) to cache page responses. For that you should pass the `faraday_http_cache` option with at least the `:store` key, for example:
+
+```ruby
+cache = ActiveSupport::Cache.lookup_store(:file_store, '/tmp/cache')
+page = MetaInspector.new('http://example.com', faraday_http_cache: { store: cache })
+```
+
+Bugfixes:
+
+* Parsing of the document is done as soon as it is initialized (just like we do with the request), so
+that parsing errors will be catched earlier.
+
 ## [Changes in 4.6](https://github.com/jaimeiniesta/metainspector/compare/v4.5.0...v4.6.0)
 
 Faraday can be passed options via `:faraday_options`. This is useful in cases where we need to
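The `:store` passed inside `faraday_http_cache` is a duck-typed cache object. The sketch below is a hypothetical Hash-backed store (`HashStore` is an illustrative name, not part of any gem) that can be handy for experiments where ActiveSupport isn't loaded. It assumes the store only needs `read`, `write` and `delete`, the methods ActiveSupport::Cache stores expose; check the faraday-http-cache documentation for the interface your version actually calls.

```ruby
# Hypothetical in-memory cache store for illustration only.
# Assumption: faraday-http-cache only calls #read, #write and #delete.
class HashStore
  def initialize
    @data = {}
  end

  # Return the cached value for key, or nil on a miss.
  def read(key)
    @data[key]
  end

  # Store value under key; options (e.g. expiry) are accepted but ignored.
  def write(key, value, _options = nil)
    @data[key] = value
    true
  end

  # Remove key; returns true if something was actually deleted.
  def delete(key)
    !@data.delete(key).nil?
  end
end
```

With the gem installed, it would be passed the same way as the file store in the CHANGELOG: `MetaInspector.new('http://example.com', faraday_http_cache: { store: HashStore.new })`. A plain Hash never evicts entries and is not thread-safe, so for anything beyond a quick test an ActiveSupport store is the better choice.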
data/README.md
CHANGED
@@ -393,6 +393,21 @@ You can also set the `warn_level: :store` option so that exceptions found will b
 
 You should avoid using the `:store` option, or use it wisely, as silencing errors can be problematic, it's always better to face the errors and treat them accordingly.
 
+If you're using this exception store, you're advised to first initialize the document, check if it seems OK, and then proceed with the extractions, like this:
+
+```ruby
+# This will fail because the URL will return a text/xml document
+page = MetaInspector.new("http://example.com/rss",
+                         html_content_only: true,
+                         warn_level: :store )
+
+if page.ok?
+  puts "TITLE: #{page.title}"
+else
+  puts "There were some exceptions: #{page.exceptions}"
+end
+```
+
 ## Examples
 
 You can find some sample scripts on the `examples` folder, including a basic scraping and a spider that will follow external links using a queue. What follows is an example of use from irb:
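The check-before-extract pattern from the README can be tried without network access. The sketch below is a hypothetical `StoringPage` class, not MetaInspector's implementation: it uses stdlib `JSON.parse` as a stand-in for loading a document (a parse error plays the role of the rejected text/xml response), and it mimics `warn_level: :store` by collecting exceptions raised during initialization instead of raising them, so callers can branch on `ok?` as in the README example.

```ruby
require 'json'

# Hypothetical stand-in for the warn_level: :store behaviour.
# :store collects errors; any other level (default :raise here) re-raises them.
class StoringPage
  attr_reader :exceptions

  def initialize(body, warn_level: :raise)
    @warn_level = warn_level
    @exceptions = []
    @doc = capture { JSON.parse(body) } # JSON stands in for HTML parsing
  end

  # True when nothing went wrong while loading the document.
  def ok?
    @exceptions.empty?
  end

  def title
    @doc && @doc['title']
  end

  private

  # Run the block; on error either re-raise or record it and return nil.
  def capture
    yield
  rescue StandardError => e
    raise unless @warn_level == :store
    @exceptions << e
    nil
  end
end

page = StoringPage.new('not json', warn_level: :store)
if page.ok?
  puts "TITLE: #{page.title}"
else
  puts "There were some exceptions: #{page.exceptions.map(&:message)}"
end
```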
data/lib/meta_inspector/parser.rb
CHANGED
@@ -19,6 +19,8 @@ module MetaInspector
       @download_images = options[:download_images]
       @images_parser = MetaInspector::Parsers::ImagesParser.new(self, download_images: @download_images)
       @texts_parser = MetaInspector::Parsers::TextsParser.new(self)
+
+      parsed # parse early so we can fail early
     end
 
     extend Forwardable
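The two added lines call `parsed` at the end of `initialize`, which moves a parse failure from the first extraction call to object construction. The sketch below contrasts the two behaviours with hypothetical `LazyParser` and `EagerParser` classes. It assumes `parsed` is a memoized method, so calling it early costs nothing on later calls, and it uses stdlib JSON in place of Nokogiri HTML parsing; it is an illustration, not the gem's code.

```ruby
require 'json'

# Hypothetical lazy version: the body is parsed on first use only.
class LazyParser
  def initialize(body)
    @body = body
  end

  def title
    parsed['title'] # a malformed body only fails here
  end

  private

  # Memoized parse, shared by every extraction method.
  def parsed
    @parsed ||= JSON.parse(@body)
  end
end

# Hypothetical eager version: same memoized parse, triggered in the constructor.
class EagerParser < LazyParser
  def initialize(body)
    super
    parsed # parse early so we can fail early
  end
end
```

`LazyParser.new('oops')` succeeds and only `title` raises, whereas `EagerParser.new('oops')` raises `JSON::ParserError` straight away, which is the "fail early" behaviour described in the CHANGELOG bugfix.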
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: metainspector
 version: !ruby/object:Gem::Version
-  version: 4.7.
+  version: 4.7.1
 platform: ruby
 authors:
 - Jaime Iniesta
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-10-
+date: 2015-10-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri