scrapifier 0.0.5 → 0.0.6

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: e52f7f47695c7b16fed80a51f99a119d3f98a3b7
- data.tar.gz: 24db438acd2fb2df421ab7b6c148e7cbed3331ee
+ metadata.gz: 66a555f6eaeccf042961100999adeeee7a4a084a
+ data.tar.gz: 0517ecc6004507c2c4a479c187d24d559d6cd2e0
  SHA512:
- metadata.gz: 41e5f58a1760d61196ee3b4f429644025789d4432e7a1cef1a70d31473a26c12a8e38091527adacc78e7650759fc55cbdb80e365ce3f86954fcf493304f08d86
- data.tar.gz: 248f23ec549f331a942d1847b9fc4d92935dea34993128fe5bd63fca9a4a4b5814ee531ef3c390e38502cd7f802c55a12dfeef6aa7eb434782d6f735a4d29e28
+ metadata.gz: 001a68b1afe7f2b88d7c8c6fd9155ef6754468b78bd66fc6be2c038e291e38ee7aac9a04e9d4f51ce87579a0590ec04b25dd07d6701b317ef37bb02b611b8a37
+ data.tar.gz: 7330af461a9585ecf840c054a7f96556036e1189af32198a86e2020cac90b9305f3596fdffc33a3d837a403129d31d085c424542502786da5c33f5e8fac13d70
data/README.md CHANGED
@@ -1,94 +1,119 @@
- # Scrapifier
-
- [![Build Status](https://travis-ci.org/tiagopog/scrapifier.svg?branch=master)](https://travis-ci.org/tiagopog/scrapifier)
- [![Code Climate](https://codeclimate.com/github/tiagopog/scrapifier.png)](https://codeclimate.com/github/tiagopog/scrapifier)
- [![Dependency Status](https://gemnasium.com/tiagopog/scrapifier.svg)](https://gemnasium.com/tiagopog/scrapifier)
- [![Gem Version](https://badge.fury.io/rb/scrapifier.svg)](http://badge.fury.io/rb/scrapifier)
-
- It's a Ruby gem that brings a very simple way to extract meta information from URIs using the screen scraping technique.
-
- Note: This gem is mainly focused on screen scraping URLs (presence of protocol, such as: "http", "https" and "ftp"), but it also works with URIs which have the "www" without any protocol defined, like: "www.google.com".
-
- ## Installation
-
- Compatible with Ruby 1.9.3+
-
- Add this line to your application's Gemfile:
-
- gem 'scrapifier'
-
- And then execute:
-
- $ bundle
-
- Or install it yourself as:
-
- $ gem install scrapifier
-
- An then require the gem:
-
- $ require 'scrapifier'
-
- ## Usage
-
- The String#scrapify method finds URIs in a string and then gets their metadata, e.g., the page's title, description, images and URI. All the data is returned in a well-formatted hash.
-
- #### Default usage.
-
- ``` ruby
- 'Wow! What an awesome site: http://adtangerine.com!'.scrapify
- #=> {
- # title: "AdTangerine | Advertising Platform for Social Media",
- # description: "AdTangerine is an advertising platform that uses the tangerine as a virtual currency for advertisers and publishers in order to share content on social networks.",
- # images: ["http://adtangerine.com/assets/logo_adt_og.png", "http://adtangerine.com/assets/logo_adt_og.png", "http://s3-us-west-2.amazonaws.com/adtangerine-prod/users/avatars/000/000/834/thumb/275747_1118382211_1929809351_n.jpg", "http://adtangerine.com/assets/foobar.gif"],
- # uri: "http://adtangerine.com"
- # }
- ```
-
- #### Allow only certain image types.
-
- ``` ruby
- 'Wow! What an awesome site: http://adtangerine.com!'.scrapify(images: :jpg)
- #=> {
- # title: "AdTangerine | Advertising Platform for Social Media",
- # description: "AdTangerine is an advertising platform that uses the tangerine as a virtual currency for advertisers and publishers in order to share content on social networks.",
- # images: ["http://s3-us-west-2.amazonaws.com/adtangerine-prod/users/avatars/000/000/834/thumb/275747_1118382211_1929809351_n.jpg"],
- # uri: "http://adtangerine.com"
- # }
-
- 'Wow! What an awesome site: http://adtangerine.com!'.scrapify(images: [:png, :gif])
- #=> {
- # title: "AdTangerine | Advertising Platform for Social Media",
- # description: "AdTangerine is an advertising platform that uses the tangerine as a virtual currency for advertisers and publishers in order to share content on social networks.",
- # images: ["http://adtangerine.com/assets/logo_adt_og.png", "http://adtangerine.com/assets/logo_adt_og.png", "http://adtangerine.com/assets/foobar.gif"],
- # uri: "http://adtangerine.com"
- # }
- ```
-
- #### Choose which URI you want it to be scraped.
-
- ``` ruby
- 'Check out: http://adtangerine.com and www.twitflink.com'.scrapify(which: 1)
- #=> {
- # title: "TwitFlink | Find a link!",
- # description: "TwitFlink is a very simple searching tool that allows people to find out links tweeted by any user from Twitter.",
- # images: ["http://www.twitflink.com//assets/tf_logo.png", "http://twitflink.com/assets/tf_logo.png"],
- # uri: "http://www.twitflink.com"
- # }
-
- 'Check out: http://adtangerine.com and www.twitflink.com'.scrapify(which: 0, images: :gif)
- #=> {
- # title: "AdTangerine | Advertising Platform for Social Media",
- # description: "AdTangerine is an advertising platform that uses the tangerine as a virtual currency for advertisers and publishers in order to share content on social networks.",
- # images: ["http://adtangerine.com/assets/foobar.gif"],
- # uri: "http://adtangerine.com"
- # }
- ```
-
- ## Contributing
-
- 1. Fork it
- 2. Create your feature branch (`git checkout -b my-new-feature`)
- 3. Commit your changes (`git commit -am 'Add some feature'`)
- 4. Push to the branch (`git push origin my-new-feature`)
- 5. Create new Pull Request
+ # Scrapifier
+
+ [![Build Status](https://travis-ci.org/tiagopog/scrapifier.svg?branch=master)](https://travis-ci.org/tiagopog/scrapifier)
+ [![Code Climate](https://codeclimate.com/github/tiagopog/scrapifier.png)](https://codeclimate.com/github/tiagopog/scrapifier)
+ [![Dependency Status](https://gemnasium.com/tiagopog/scrapifier.svg)](https://gemnasium.com/tiagopog/scrapifier)
+ [![Gem Version](https://badge.fury.io/rb/scrapifier.svg)](http://badge.fury.io/rb/scrapifier)
+
+ It's a Ruby gem that brings a very simple way to extract meta information from URIs using the screen scraping technique.
+
+ Note: This gem is mainly focused on screen scraping URLs (presence of protocol, such as: "http", "https" and "ftp"), but it also works with URIs which have the "www" without any protocol defined, like: "www.google.com".
+
+ ## Installation
+
+ Compatible with Ruby 1.9.3+
+
+ Add this line to your application's Gemfile:
+
+ gem 'scrapifier'
+
+ And then execute:
+
+ $ bundle
+
+ Or install it yourself as:
+
+ $ gem install scrapifier
+
+ An then require the gem:
+
+ $ require 'scrapifier'
+
+ ## Usage
+
+ The String#scrapify method finds URIs in a string and then gets their metadata, e.g., the page's title, description, images, keywords, language, encode, "reply to" email, author and URI. All the data is returned in a well-formatted hash.
+
+ #### Default usage.
+
+ ``` ruby
+ 'Wow! What an awesome site: http://adtangerine.com!'.scrapify
+ #=> {
+ # title: "AdTangerine | Boosting great ideas",
+ # description: "Advertising social network that uses tangerines as a virtual currency..." ,
+ # keywords: "ad network, ad, advertising, advertiser, publisher, social media",
+ # lang: "en-us",
+ # encode: "utf-8",
+ # reply_to: "sayhello@adtangerine.com",
+ # author: "Tiago Guedes, Jonatas de Paula, Raphael da Costa",
+ # images: ["http://adtangerine.com/assets/logo_adt_og.png", "http://adtangerine.com/assets/logo_adt_og.png", "http://s3-us-west-2.amazonaws.com/adtangerine-prod/users/avatars/000/000/834/thumb/275747_1118382211_1929809351_n.jpg", "http://adtangerine.com/assets/foobar.gif"],
+ # uri: "http://adtangerine.com"
+ # }
+ ```
+
+ #### Allow only certain image types.
+
+ ``` ruby
+ 'Wow! What an awesome site: http://adtangerine.com!'.scrapify(images: :jpg)
+ #=> {
+ # title: "AdTangerine | Boosting great ideas",
+ # description: "Advertising social network that uses tangerines as a virtual currency..." ,
+ # keywords: "ad network, ad, advertising, advertiser, publisher, social media",
+ # lang: "en-us",
+ # encode: "utf-8",
+ # reply_to: "sayhello@adtangerine.com",
+ # author: "Tiago Guedes, Jonatas de Paula, Raphael da Costa",
+ # images: ["http://s3-us-west-2.amazonaws.com/adtangerine-prod/users/avatars/000/000/834/thumb/275747_1118382211_1929809351_n.jpg"],
+ # uri: "http://adtangerine.com"
+ # }
+
+ 'Wow! What an awesome site: http://adtangerine.com!'.scrapify(images: [:png, :gif])
+ #=> {
+ # title: "AdTangerine | Boosting great ideas",
+ # description: "Advertising social network that uses tangerines as a virtual currency..." ,
+ # keywords: "ad network, ad, advertising, advertiser, publisher, social media",
+ # lang: "en-us",
+ # encode: "utf-8",
+ # reply_to: "sayhello@adtangerine.com",
+ # author: "Tiago Guedes, Jonatas de Paula, Raphael da Costa",
+ # images: ["http://adtangerine.com/assets/logo_adt_og.png", "http://adtangerine.com/assets/logo_adt_og.png", "http://adtangerine.com/assets/foobar.gif"],
+ # uri: "http://adtangerine.com"
+ # }
+ ```
+
+ #### Choose which URI you want it to be scraped.
+
+ ``` ruby
+ 'Check out: http://adtangerine.com and www.twitflink.com'.scrapify(which: 1)
+ #=> {
+ # title: "TwitFlink | Find a link!",
+ # description: "TwitFlink is a very simple searching tool that allows people to find out links tweeted...",
+ # keywords: "search, searching tool, link, twitter, social media",
+ # lang: "en-us",
+ # encode: "utf-8",
+ # reply_to: "sayhello@adtangerine.com",
+ # author: "Tiago Guedes",
+ # images: ["http://www.twitflink.com//assets/tf_logo.png", "http://twitflink.com/assets/tf_logo.png"],
+ # uri: "http://www.twitflink.com"
+ # }
+
+ 'Check out: http://adtangerine.com and www.twitflink.com'.scrapify(which: 0, images: :gif)
+ #=> {
+ # title: "AdTangerine | Boosting great ideas",
+ # description: "Advertising social network that uses tangerines as a virtual currency..." ,
+ # keywords: "ad network, ad, advertising, advertiser, publisher, social media",
+ # lang: "en-us",
+ # encode: "utf-8",
+ # reply_to: "sayhello@adtangerine.com",
+ # author: "Tiago Guedes, Jonatas de Paula, Raphael da Costa",
+ # images: ["http://adtangerine.com/assets/foobar.gif"],
+ # uri: "http://adtangerine.com"
+ # }
+ ```
+
+ ## Contributing
+
+ 1. Fork it
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
+ 4. Push to the branch (`git push origin my-new-feature`)
+ 5. Create new Pull Request
@@ -1,3 +1,3 @@
  module Scrapifier
- VERSION = '0.0.5'
+ VERSION = '0.0.6'
  end
@@ -41,14 +41,14 @@ module Scrapifier

  REPLY_TO =
  <<-END.gsub(/^\s+\|/, '')
- |//meta[@name="reply_to"]/@content
+ |//meta[@name="reply_to"]/@content|
+ |//meta[@name="Reply_to"]/@content
  END

  AUTHOR =
  <<-END.gsub(/^\s+\|/, '')
  |//meta[@name="author"]/@content|
- |//meta[@name="Author"]/@content|
- |//meta[@name="reply_to"]/@content
+ |//meta[@name="Author"]/@content
  END

  IMG =
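
Reading the hunk above: in 0.0.5 the `AUTHOR` expression also listed the `reply_to` path, so an author lookup could plausibly pick up a reply-to address; 0.0.6 drops that alternative from `AUTHOR` and adds a capitalized `Reply_to` variant to `REPLY_TO`. The following is a minimal sketch of how the heredoc-plus-`gsub` idiom turns the margin-piped lines into an XPath union string. The constant names and patterns are copied from the diff; the indentation, the `split` step, and the variable names `reply_to_paths` and `author_paths` are illustrative assumptions, and how the gem hands these strings to its parser is not shown in this diff.

```ruby
# 0.0.6 versions of the two expressions, as they appear in the hunk.
# The gsub strips each line's leading whitespace and margin pipe; the
# trailing pipe left on every line but the last is the XPath union operator.
REPLY_TO =
  <<-END.gsub(/^\s+\|/, '')
    |//meta[@name="reply_to"]/@content|
    |//meta[@name="Reply_to"]/@content
  END

AUTHOR =
  <<-END.gsub(/^\s+\|/, '')
    |//meta[@name="author"]/@content|
    |//meta[@name="Author"]/@content
  END

# Illustrative only: split the union back into its alternatives to
# inspect what each expression can match.
reply_to_paths = REPLY_TO.split('|').map(&:strip)
author_paths   = AUTHOR.split('|').map(&:strip)

puts reply_to_paths
puts author_paths
```

After the change, `AUTHOR` no longer contains any `reply_to` alternative, which is consistent with the README examples in this release listing `reply_to` and `author` as separate keys.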
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: scrapifier
  version: !ruby/object:Gem::Version
- version: 0.0.5
+ version: 0.0.6
  platform: ruby
  authors:
  - Tiago Guedes
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-06-28 00:00:00.000000000 Z
+ date: 2014-06-29 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: nokogiri