scrapifier 0.0.5 → 0.0.6

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: e52f7f47695c7b16fed80a51f99a119d3f98a3b7
- data.tar.gz: 24db438acd2fb2df421ab7b6c148e7cbed3331ee
+ metadata.gz: 66a555f6eaeccf042961100999adeeee7a4a084a
+ data.tar.gz: 0517ecc6004507c2c4a479c187d24d559d6cd2e0
  SHA512:
- metadata.gz: 41e5f58a1760d61196ee3b4f429644025789d4432e7a1cef1a70d31473a26c12a8e38091527adacc78e7650759fc55cbdb80e365ce3f86954fcf493304f08d86
- data.tar.gz: 248f23ec549f331a942d1847b9fc4d92935dea34993128fe5bd63fca9a4a4b5814ee531ef3c390e38502cd7f802c55a12dfeef6aa7eb434782d6f735a4d29e28
+ metadata.gz: 001a68b1afe7f2b88d7c8c6fd9155ef6754468b78bd66fc6be2c038e291e38ee7aac9a04e9d4f51ce87579a0590ec04b25dd07d6701b317ef37bb02b611b8a37
+ data.tar.gz: 7330af461a9585ecf840c054a7f96556036e1189af32198a86e2020cac90b9305f3596fdffc33a3d837a403129d31d085c424542502786da5c33f5e8fac13d70
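checksums.yaml records SHA1 and SHA512 digests of the two members of the .gem archive, metadata.gz and data.tar.gz, which is why every value above changes with the release. A minimal sketch of checking unpacked members against such a file, using only Ruby's standard `digest` and `yaml` libraries. It assumes the .gem (a plain tar archive) has already been unpacked into a directory and its checksums.yaml.gz decompressed; the helper name `verify_gem_checksums` is hypothetical and not part of the RubyGems API.

```ruby
require 'digest'
require 'digest/sha1'
require 'digest/sha2'
require 'yaml'

# Digest classes for the algorithm names used as top-level keys in checksums.yaml.
DIGESTS = {
  'SHA1'   => Digest::SHA1,
  'SHA256' => Digest::SHA256,
  'SHA512' => Digest::SHA512
}.freeze

# Hypothetical helper: returns { "ALGO file" => true/false } for every entry
# whose algorithm is known, comparing it with the file of that name in +dir+.
def verify_gem_checksums(checksums_yaml, dir)
  results = {}
  YAML.safe_load(checksums_yaml).each do |algorithm, files|
    digest_class = DIGESTS[algorithm]
    next unless digest_class # skip algorithms this sketch does not handle

    files.each do |name, expected|
      actual = digest_class.file(File.join(dir, name)).hexdigest
      results["#{algorithm} #{name}"] = (actual == expected.to_s)
    end
  end
  results
end
```

Returning a per-entry result instead of a single boolean makes it easy to see which member and which algorithm failed.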
data/README.md CHANGED
@@ -1,94 +1,119 @@
- # Scrapifier
-
- [![Build Status](https://travis-ci.org/tiagopog/scrapifier.svg?branch=master)](https://travis-ci.org/tiagopog/scrapifier)
- [![Code Climate](https://codeclimate.com/github/tiagopog/scrapifier.png)](https://codeclimate.com/github/tiagopog/scrapifier)
- [![Dependency Status](https://gemnasium.com/tiagopog/scrapifier.svg)](https://gemnasium.com/tiagopog/scrapifier)
- [![Gem Version](https://badge.fury.io/rb/scrapifier.svg)](http://badge.fury.io/rb/scrapifier)
-
- It's a Ruby gem that brings a very simple way to extract meta information from URIs using the screen scraping technique.
-
- Note: This gem is mainly focused on screen scraping URLs (presence of protocol, such as: "http", "https" and "ftp"), but it also works with URIs which have the "www" without any protocol defined, like: "www.google.com".
-
- ## Installation
-
- Compatible with Ruby 1.9.3+
-
- Add this line to your application's Gemfile:
-
- gem 'scrapifier'
-
- And then execute:
-
- $ bundle
-
- Or install it yourself as:
-
- $ gem install scrapifier
-
- An then require the gem:
-
- $ require 'scrapifier'
-
- ## Usage
-
- The String#scrapify method finds URIs in a string and then gets their metadata, e.g., the page's title, description, images and URI. All the data is returned in a well-formatted hash.
-
- #### Default usage.
-
- ``` ruby
- 'Wow! What an awesome site: http://adtangerine.com!'.scrapify
- #=> {
- # title: "AdTangerine | Advertising Platform for Social Media",
- # description: "AdTangerine is an advertising platform that uses the tangerine as a virtual currency for advertisers and publishers in order to share content on social networks.",
- # images: ["http://adtangerine.com/assets/logo_adt_og.png", "http://adtangerine.com/assets/logo_adt_og.png", "http://s3-us-west-2.amazonaws.com/adtangerine-prod/users/avatars/000/000/834/thumb/275747_1118382211_1929809351_n.jpg", "http://adtangerine.com/assets/foobar.gif"],
- # uri: "http://adtangerine.com"
- # }
- ```
-
- #### Allow only certain image types.
-
- ``` ruby
- 'Wow! What an awesome site: http://adtangerine.com!'.scrapify(images: :jpg)
- #=> {
- # title: "AdTangerine | Advertising Platform for Social Media",
- # description: "AdTangerine is an advertising platform that uses the tangerine as a virtual currency for advertisers and publishers in order to share content on social networks.",
- # images: ["http://s3-us-west-2.amazonaws.com/adtangerine-prod/users/avatars/000/000/834/thumb/275747_1118382211_1929809351_n.jpg"],
- # uri: "http://adtangerine.com"
- # }
-
- 'Wow! What an awesome site: http://adtangerine.com!'.scrapify(images: [:png, :gif])
- #=> {
- # title: "AdTangerine | Advertising Platform for Social Media",
- # description: "AdTangerine is an advertising platform that uses the tangerine as a virtual currency for advertisers and publishers in order to share content on social networks.",
- # images: ["http://adtangerine.com/assets/logo_adt_og.png", "http://adtangerine.com/assets/logo_adt_og.png", "http://adtangerine.com/assets/foobar.gif"],
- # uri: "http://adtangerine.com"
- # }
- ```
-
- #### Choose which URI you want it to be scraped.
-
- ``` ruby
- 'Check out: http://adtangerine.com and www.twitflink.com'.scrapify(which: 1)
- #=> {
- # title: "TwitFlink | Find a link!",
- # description: "TwitFlink is a very simple searching tool that allows people to find out links tweeted by any user from Twitter.",
- # images: ["http://www.twitflink.com//assets/tf_logo.png", "http://twitflink.com/assets/tf_logo.png"],
- # uri: "http://www.twitflink.com"
- # }
-
- 'Check out: http://adtangerine.com and www.twitflink.com'.scrapify(which: 0, images: :gif)
- #=> {
- # title: "AdTangerine | Advertising Platform for Social Media",
- # description: "AdTangerine is an advertising platform that uses the tangerine as a virtual currency for advertisers and publishers in order to share content on social networks.",
- # images: ["http://adtangerine.com/assets/foobar.gif"],
- # uri: "http://adtangerine.com"
- # }
- ```
-
- ## Contributing
-
- 1. Fork it
- 2. Create your feature branch (`git checkout -b my-new-feature`)
- 3. Commit your changes (`git commit -am 'Add some feature'`)
- 4. Push to the branch (`git push origin my-new-feature`)
- 5. Create new Pull Request
+ # Scrapifier
+
+ [![Build Status](https://travis-ci.org/tiagopog/scrapifier.svg?branch=master)](https://travis-ci.org/tiagopog/scrapifier)
+ [![Code Climate](https://codeclimate.com/github/tiagopog/scrapifier.png)](https://codeclimate.com/github/tiagopog/scrapifier)
+ [![Dependency Status](https://gemnasium.com/tiagopog/scrapifier.svg)](https://gemnasium.com/tiagopog/scrapifier)
+ [![Gem Version](https://badge.fury.io/rb/scrapifier.svg)](http://badge.fury.io/rb/scrapifier)
+
+ It's a Ruby gem that brings a very simple way to extract meta information from URIs using the screen scraping technique.
+
+ Note: This gem is mainly focused on screen scraping URLs (presence of protocol, such as: "http", "https" and "ftp"), but it also works with URIs which have the "www" without any protocol defined, like: "www.google.com".
+
+ ## Installation
+
+ Compatible with Ruby 1.9.3+
+
+ Add this line to your application's Gemfile:
+
+ gem 'scrapifier'
+
+ And then execute:
+
+ $ bundle
+
+ Or install it yourself as:
+
+ $ gem install scrapifier
+
+ An then require the gem:
+
+ $ require 'scrapifier'
+
+ ## Usage
+
+ The String#scrapify method finds URIs in a string and then gets their metadata, e.g., the page's title, description, images, keywords, language, encode, "reply to" email, author and URI. All the data is returned in a well-formatted hash.
+
+ #### Default usage.
+
+ ``` ruby
+ 'Wow! What an awesome site: http://adtangerine.com!'.scrapify
+ #=> {
+ # title: "AdTangerine | Boosting great ideas",
+ # description: "Advertising social network that uses tangerines as a virtual currency..." ,
+ # keywords: "ad network, ad, advertising, advertiser, publisher, social media",
+ # lang: "en-us",
+ # encode: "utf-8",
+ # reply_to: "sayhello@adtangerine.com",
+ # author: "Tiago Guedes, Jonatas de Paula, Raphael da Costa",
+ # images: ["http://adtangerine.com/assets/logo_adt_og.png", "http://adtangerine.com/assets/logo_adt_og.png", "http://s3-us-west-2.amazonaws.com/adtangerine-prod/users/avatars/000/000/834/thumb/275747_1118382211_1929809351_n.jpg", "http://adtangerine.com/assets/foobar.gif"],
+ # uri: "http://adtangerine.com"
+ # }
+ ```
+
+ #### Allow only certain image types.
+
+ ``` ruby
+ 'Wow! What an awesome site: http://adtangerine.com!'.scrapify(images: :jpg)
+ #=> {
+ # title: "AdTangerine | Boosting great ideas",
+ # description: "Advertising social network that uses tangerines as a virtual currency..." ,
+ # keywords: "ad network, ad, advertising, advertiser, publisher, social media",
+ # lang: "en-us",
+ # encode: "utf-8",
+ # reply_to: "sayhello@adtangerine.com",
+ # author: "Tiago Guedes, Jonatas de Paula, Raphael da Costa",
+ # images: ["http://s3-us-west-2.amazonaws.com/adtangerine-prod/users/avatars/000/000/834/thumb/275747_1118382211_1929809351_n.jpg"],
+ # uri: "http://adtangerine.com"
+ # }
+
+ 'Wow! What an awesome site: http://adtangerine.com!'.scrapify(images: [:png, :gif])
+ #=> {
+ # title: "AdTangerine | Boosting great ideas",
+ # description: "Advertising social network that uses tangerines as a virtual currency..." ,
+ # keywords: "ad network, ad, advertising, advertiser, publisher, social media",
+ # lang: "en-us",
+ # encode: "utf-8",
+ # reply_to: "sayhello@adtangerine.com",
+ # author: "Tiago Guedes, Jonatas de Paula, Raphael da Costa",
+ # images: ["http://adtangerine.com/assets/logo_adt_og.png", "http://adtangerine.com/assets/logo_adt_og.png", "http://adtangerine.com/assets/foobar.gif"],
+ # uri: "http://adtangerine.com"
+ # }
+ ```
+
+ #### Choose which URI you want it to be scraped.
+
+ ``` ruby
+ 'Check out: http://adtangerine.com and www.twitflink.com'.scrapify(which: 1)
+ #=> {
+ # title: "TwitFlink | Find a link!",
+ # description: "TwitFlink is a very simple searching tool that allows people to find out links tweeted...",
+ # keywords: "search, searching tool, link, twitter, social media",
+ # lang: "en-us",
+ # encode: "utf-8",
+ # reply_to: "sayhello@adtangerine.com",
+ # author: "Tiago Guedes",
+ # images: ["http://www.twitflink.com//assets/tf_logo.png", "http://twitflink.com/assets/tf_logo.png"],
+ # uri: "http://www.twitflink.com"
+ # }
+
+ 'Check out: http://adtangerine.com and www.twitflink.com'.scrapify(which: 0, images: :gif)
+ #=> {
+ # title: "AdTangerine | Boosting great ideas",
+ # description: "Advertising social network that uses tangerines as a virtual currency..." ,
+ # keywords: "ad network, ad, advertising, advertiser, publisher, social media",
+ # lang: "en-us",
+ # encode: "utf-8",
+ # reply_to: "sayhello@adtangerine.com",
+ # author: "Tiago Guedes, Jonatas de Paula, Raphael da Costa",
+ # images: ["http://adtangerine.com/assets/foobar.gif"],
+ # uri: "http://adtangerine.com"
+ # }
+ ```
+
+ ## Contributing
+
+ 1. Fork it
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
+ 4. Push to the branch (`git push origin my-new-feature`)
+ 5. Create new Pull Request
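The 0.0.6 README documents five extra keys in the hash returned by String#scrapify: keywords, lang, encode, reply_to and author. A minimal sketch of post-processing such a hash, assuming symbol keys as printed in the README and that a field a page does not provide comes back as nil or is absent (the README only shows fully populated examples). `normalize_scrapify_result` is a hypothetical helper, not part of the gem; it splits the comma-separated keywords string and drops repeated image URLs, since the documented output lists logo_adt_og.png twice. It works on a plain hash copied from the README, so it runs without network access or the gem installed.

```ruby
# Hypothetical helper (not part of scrapifier): tidies a hash shaped like the
# README's String#scrapify output. Symbol keys are assumed, as printed there.
def normalize_scrapify_result(meta)
  {
    title:    meta[:title],
    uri:      meta[:uri],
    author:   meta[:author],
    reply_to: meta[:reply_to],
    lang:     meta[:lang],
    encoding: meta[:encode], # the README names this key :encode
    # "ad network, ad, ..." => ["ad network", "ad", ...]; nil or "" => []
    keywords: meta[:keywords].to_s.split(',').map(&:strip).reject(&:empty?),
    # the documented output can repeat an image URL, so keep each one once
    images:   Array(meta[:images]).uniq
  }
end

# Sample input copied from the README's "Default usage" output (trimmed).
meta = {
  title: 'AdTangerine | Boosting great ideas',
  keywords: 'ad network, ad, advertising, advertiser, publisher, social media',
  lang: 'en-us',
  encode: 'utf-8',
  reply_to: 'sayhello@adtangerine.com',
  author: 'Tiago Guedes, Jonatas de Paula, Raphael da Costa',
  images: ['http://adtangerine.com/assets/logo_adt_og.png',
           'http://adtangerine.com/assets/logo_adt_og.png',
           'http://adtangerine.com/assets/foobar.gif'],
  uri: 'http://adtangerine.com'
}
summary = normalize_scrapify_result(meta)
```

In an application the input would instead be the return value of a `scrapify` call on a string containing a URI.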
@@ -1,3 +1,3 @@
  module Scrapifier
- VERSION = '0.0.5'
+ VERSION = '0.0.6'
  end
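Since keywords, lang, encode, reply_to and author first appear in the README with this 0.0.6 release, an application relying on them may want to require at least this version, for example with `gem 'scrapifier', '~> 0.0.6'` in its Gemfile. A small sketch using RubyGems' own `Gem::Requirement` and `Gem::Version` classes to show which versions that pessimistic constraint accepts; the version list is illustrative only.

```ruby
require 'rubygems' # harmless where RubyGems is already loaded

# '~> 0.0.6' is RubyGems' pessimistic operator: >= 0.0.6 and < 0.1
requirement = Gem::Requirement.new('~> 0.0.6')

# Illustrative versions to test against the constraint.
results = Hash[%w[0.0.5 0.0.6 0.0.9 0.1.0].map do |v|
  [v, requirement.satisfied_by?(Gem::Version.new(v))]
end]

results.each { |v, ok| puts "#{v}: #{ok ? 'accepted' : 'rejected'}" }
```

The pessimistic operator keeps patch-level upgrades while refusing a 0.1 release, which on a 0.0.x gem may change the returned hash again.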
@@ -41,14 +41,14 @@ module Scrapifier
 
  REPLY_TO =
  <<-END.gsub(/^\s+\|/, '')
- |//meta[@name="reply_to"]/@content
+ |//meta[@name="reply_to"]/@content|
+ |//meta[@name="Reply_to"]/@content
  END
 
  AUTHOR =
  <<-END.gsub(/^\s+\|/, '')
  |//meta[@name="author"]/@content|
- |//meta[@name="Author"]/@content|
- |//meta[@name="reply_to"]/@content
+ |//meta[@name="Author"]/@content
  END
 
  IMG =
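Both constants changed here are XPath 1.0 union expressions (`|` joins location paths), written as heredocs with a `|` left margin that `gsub(/^\s+\|/, '')` strips. In 0.0.6 REPLY_TO also matches `name="Reply_to"`, and AUTHOR no longer falls back to `reply_to`; judging from this diff alone, a page without an author meta tag should no longer get its reply-to address reported as the author. The sketch below rebuilds the new REPLY_TO string in plain Ruby to show what the `gsub` produces; the indentation is an assumption, since the diff view drops it.

```ruby
# Rebuild of the 0.0.6 REPLY_TO expression (indentation assumed).
# <<-END lets the terminator be indented but keeps the lines' leading spaces;
# gsub then removes each line's leading whitespace plus the "|" margin marker.
REPLY_TO =
  <<-END.gsub(/^\s+\|/, '')
    |//meta[@name="reply_to"]/@content|
    |//meta[@name="Reply_to"]/@content
  END

# The trailing "|" on the first line survives: it is the XPath union operator
# joining the two paths, and the newline between them is just whitespace.
puts REPLY_TO
```

Only a `|` that directly follows a line's leading whitespace is removed, so the margin marker and the union operator can share the same character.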
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: scrapifier
  version: !ruby/object:Gem::Version
- version: 0.0.5
+ version: 0.0.6
  platform: ruby
  authors:
  - Tiago Guedes
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-06-28 00:00:00.000000000 Z
+ date: 2014-06-29 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: nokogiri