opengraph_parser 0.2.2 → 0.2.5

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: 15c61cf496e86ff25b78ddf8dc79d60c9d72620dbcd88e0748457e1a56c40902
+   data.tar.gz: 25ab321fb45b195b5f5a4810922d9b35a9a7ce8256bfd09113bbd36217a2315b
+ SHA512:
+   metadata.gz: 95ae2946ea00c037102a263fa766a81f4ce6cca640c3ac7e03807ba4487263539b1a7c08467a56f9628b470921b005420d95b56a6dcab8f20783186784b39882
+   data.tar.gz: a8f6d57466a55350051206b12b099b721ef4cc89a9276fb1fea8029e59d514e2b603d7a9272175fbe22187e0bd7f92181c56846c3030c3ab8d8f281af1662cd1
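To check an unpacked gem against these digests, a minimal sketch using Ruby's standard `digest` library could look like the following. The helper name `sha256_matches?` and the local file path are hypothetical; the only library call used is `Digest::SHA256.file`.

```ruby
require 'digest'

# Hypothetical helper: compares a file's SHA256 digest with an expected hex string.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# Assumes data.tar.gz has been extracted from the .gem archive into the current
# directory; the expected value is the SHA256 entry from checksums.yaml.
expected = '25ab321fb45b195b5f5a4810922d9b35a9a7ce8256bfd09113bbd36217a2315b'
if File.exist?('data.tar.gz')
  puts sha256_matches?('data.tar.gz', expected) ? 'checksum OK' : 'checksum MISMATCH'
end
```

The same pattern should work for the SHA512 entries by swapping in `Digest::SHA512`.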
data/README.md ADDED
@@ -0,0 +1,74 @@
+ # OpengraphParser
+
+ OpengraphParser is a simple Ruby library for parsing Open Graph protocol information from a website. Learn more about the protocol at:
+ http://ogp.me
+
+ ## Installation
+
+ ```bash
+ gem install opengraph_parser
+ ```
+
+ or add to Gemfile
+
+ ```bash
+ gem "opengraph_parser"
+ ```
+
+ ## Usage
+
+ ### Parsing an URL
+
+ ```ruby
+ og = OpenGraph.new("http://ogp.me")
+ og.title # => "Open Graph protocol"
+ og.type # => "website"
+ og.url # => "http://ogp.me/"
+ og.description # => "The Open Graph protocol enables any web page to become a rich object in a social graph."
+ og.images # => ["http://ogp.me/logo.png"]
+ ```
+
+ You can also get other Open Graph metadata as:
+
+ ```ruby
+ og.metadata # => {"og:image:type"=>"image/png", "og:image:width"=>"300", "og:image:height"=>"300"}
+ ```
+
+ ### Parsing a HTML document
+
+ ```ruby
+ og = OpenGraph.new(html_string)
+ ```
+
+ ### Custom header fields
+ In some cases you may need to change fields in HTTP request header for an URL
+ ```ruby
+ og = OpenGraph.new("http://opg.me", { :headers => {'User-Agent' => 'Custom User Agent'} })
+ ```
+
+ ### Fallback
+ If you try to parse Open Graph information for a website that doesn’t have any Open Graph metadata, the library will try to find other information in the website as the following rules:
+
+ * `<title>` for title
+ * `<meta name="description">` for description
+ * `<link rel="image_src">` or all `<img>` tags for images
+
+ You can disable this fallback lookup by passing false to init method:
+
+ ```ruby
+ og = OpenGraph.new("http://ogp.me", false)
+ ```
+
+ ## Contributing to opengraph_parser
+
+ * Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
+ * Check out the issue tracker to make sure someone already hasn't requested it and/or contributed it.
+ * Fork the project.
+ * Start a feature/bugfix branch.
+ * Commit and push until you are happy with your contribution.
+ * Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
+ * Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or is otherwise necessary, that is fine, but please isolate to its own commit so I can cherry-pick around it.
+
+ ## Copyright
+
+ Copyright (c) 2013 Huy Ha. See LICENSE.txt for further details.
data/lib/open_graph.rb CHANGED
@@ -1,9 +1,10 @@
  require 'nokogiri'
  require 'redirect_follower'
  require "addressable/uri"
+ require 'uri'

  class OpenGraph
-   attr_accessor :src, :url, :type, :title, :description, :images, :metadata, :response, :original_images
+   attr_accessor :src, :url, :type, :title, :description, :images, :metadata, :response, :original_images, :html_content

    def initialize(src, fallback = true, options = {})
      if fallback.is_a? Hash
@@ -11,6 +12,7 @@ class OpenGraph
        fallback = true
      end
      @src = src
+     @body = nil
      @images = []
      @metadata = {}
      parse_opengraph(options)
@@ -21,15 +23,21 @@ class OpenGraph
    private
    def parse_opengraph(options = {})
      begin
-       @response = RedirectFollower.new(@src, options).resolve
+       if @src.include? '</html>'
+         @body = @src
+         @html_content = true
+       else
+         @body = RedirectFollower.new(@src, options).resolve.body
+         @html_content = false
+       end
      rescue
        @title = @url = @src
        return
      end

-     if @response && @response.body
+     if @body
        attrs_list = %w(title url type description)
-       doc = Nokogiri.parse(@response.body)
+       doc = Nokogiri.parse(@body)
        doc.css('meta').each do |m|
          if m.attribute('property') && m.attribute('property').to_s.match(/^og:(.+)$/i)
            m_content = m.attribute('content').to_s.strip
@@ -47,8 +55,8 @@ class OpenGraph
    end

    def load_fallback
-     if @response && @response.body
-       doc = Nokogiri.parse(@response.body)
+     if @body
+       doc = Nokogiri.parse(@body)

        if @title.to_s.empty? && doc.xpath("//head//title").size > 0
          @title = doc.xpath("//head//title").first.text.to_s.strip
@@ -60,6 +68,10 @@ class OpenGraph
          @description = description_meta.attribute("content").to_s.strip
        end

+       if @description.to_s.empty?
+         @description = fetch_first_text(doc)
+       end
+
        fetch_images(doc, "//head//link[@rel='image_src']", "href") if @images.empty?
        fetch_images(doc, "//img", "src") if @images.empty?
      end
@@ -67,7 +79,11 @@ class OpenGraph

    def check_images_path
      @original_images = @images.dup
-     uri = Addressable::URI.parse(@src)
+
+     uri = Addressable::URI.parse(@url || @src)
+
+     return unless uri
+
      imgs = @images.dup
      @images = []
      imgs.each do |img|
@@ -90,6 +106,13 @@ class OpenGraph
      end
    end

+   def fetch_first_text(doc)
+     doc.xpath('//p').each do |p|
+       s = p.text.to_s.strip
+       return s if s.length > 20
+     end
+   end
+
    def add_metadata(metadata_container, path, content)
      path_elements = path.split(':')
      if path_elements.size > 1
@@ -111,4 +134,4 @@ class OpenGraph
        metadata_container
      end
    end
- end
+ end
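The two behavioural changes in this file (treating the input as an HTML document when it contains `</html>`, and falling back to the first paragraph longer than 20 characters for the description) can be sketched without Nokogiri or network access. This is a simplified illustration under stated assumptions, not the gem's code: `html_source?` and `first_long_text` are hypothetical names, and paragraphs are represented as plain strings rather than parsed nodes.

```ruby
# Hypothetical helper mirroring the `@src.include? '</html>'` check in the diff:
# the input counts as an HTML document only if it contains a closing html tag.
def html_source?(src)
  src.include?('</html>')
end

# Hypothetical helper mirroring the intent of fetch_first_text: return the first
# stripped text longer than 20 characters. `find` yields nil when nothing
# qualifies, so callers get either a String or nil.
def first_long_text(texts)
  texts.map { |t| t.to_s.strip }.find { |s| s.length > 20 }
end

puts html_source?('http://ogp.me')                       # prints false
puts html_source?('<html><body><p>Hi</p></body></html>') # prints true
p first_long_text(['Short.', '  A paragraph with more than twenty characters.  '])
p first_long_text(['Short.'])                            # prints nil
```

Two consequences seem worth keeping in mind when reading the diff. The `include?` check is case-sensitive and requires the closing tag, so an HTML fragment without `</html>` would be handled as a URL. Also, a Ruby `each` call returns its receiver when the block never returns early, so the `fetch_first_text` added above may return the paragraph collection rather than `nil` when no paragraph is long enough; the `find`-based sketch avoids that.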
data/lib/redirect_follower.rb CHANGED
@@ -6,21 +6,18 @@ class RedirectFollower

    attr_accessor :url, :body, :redirect_limit, :response, :headers

-   def initialize(url, limit = REDIRECT_DEFAULT_LIMIT, options = {})
-     if limit.is_a? Hash
-       options = limit
-       limit = REDIRECT_DEFAULT_LIMIT
-     end
-     @url, @redirect_limit = url, limit
+   def initialize(url, options = {})
+     @url = url
+     @redirect_limit = options[:redirect_limit] || REDIRECT_DEFAULT_LIMIT
      @headers = options[:headers] || {}
    end

    def resolve
      raise TooManyRedirects if redirect_limit < 0

-     uri = URI.parse(URI.escape(url))
+     uri = Addressable::URI.parse(url)

-     http = Net::HTTP.new(uri.host, uri.port)
+     http = Net::HTTP.new(uri.host, uri.inferred_port)
      if uri.scheme == 'https'
        http.use_ssl = true
        http.verify_mode = OpenSSL::SSL::VERIFY_PEER
@@ -45,4 +42,4 @@ class RedirectFollower
        response['location']
      end
    end
- end
+ end
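The constructor change in `RedirectFollower` replaces the positional `limit` argument with a `:redirect_limit` key in the options hash. A stripped-down stand-in shows the resulting call shapes. `FollowerOptionsSketch` and its default of 5 are assumptions for illustration only; the actual value of `REDIRECT_DEFAULT_LIMIT` is not shown in this diff.

```ruby
# Hypothetical stand-in for the new initializer; it performs no HTTP requests.
class FollowerOptionsSketch
  DEFAULT_LIMIT = 5 # assumed value, not taken from the gem

  attr_reader :url, :redirect_limit, :headers

  def initialize(url, options = {})
    @url = url
    @redirect_limit = options[:redirect_limit] || DEFAULT_LIMIT
    @headers = options[:headers] || {}
  end
end

default = FollowerOptionsSketch.new('http://ogp.me')
custom  = FollowerOptionsSketch.new('http://ogp.me',
                                    { redirect_limit: 2,
                                      headers: { 'User-Agent' => 'Custom User Agent' } })

puts default.redirect_limit
puts custom.redirect_limit
puts custom.headers['User-Agent']
```

Under the new signature, callers that previously passed the limit positionally, such as `RedirectFollower.new(url, 3)`, would need to switch to the hash form, `RedirectFollower.new(url, redirect_limit: 3)`.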
metadata CHANGED
@@ -1,111 +1,98 @@
  --- !ruby/object:Gem::Specification
  name: opengraph_parser
  version: !ruby/object:Gem::Version
-   version: 0.2.2
-   prerelease:
+   version: 0.2.5
  platform: ruby
  authors:
  - Huy Ha
  - Duc Trinh
- autorequire:
+ autorequire:
  bindir: bin
  cert_chain: []
- date: 2013-02-21 00:00:00.000000000 Z
+ date: 2022-02-23 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: nokogiri
    requirement: !ruby/object:Gem::Requirement
-     none: false
      requirements:
-     - - ! '>='
+     - - ">="
        - !ruby/object:Gem::Version
          version: '0'
    type: :runtime
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
-     none: false
      requirements:
-     - - ! '>='
+     - - ">="
        - !ruby/object:Gem::Version
          version: '0'
  - !ruby/object:Gem::Dependency
    name: addressable
    requirement: !ruby/object:Gem::Requirement
-     none: false
      requirements:
-     - - ! '>='
+     - - ">="
        - !ruby/object:Gem::Version
          version: '0'
    type: :runtime
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
-     none: false
      requirements:
-     - - ! '>='
+     - - ">="
        - !ruby/object:Gem::Version
          version: '0'
  - !ruby/object:Gem::Dependency
    name: rspec
    requirement: !ruby/object:Gem::Requirement
-     none: false
      requirements:
-     - - ! '>='
+     - - ">="
        - !ruby/object:Gem::Version
          version: '0'
    type: :development
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
-     none: false
      requirements:
-     - - ! '>='
+     - - ">="
        - !ruby/object:Gem::Version
          version: '0'
  - !ruby/object:Gem::Dependency
    name: rdoc
    requirement: !ruby/object:Gem::Requirement
-     none: false
      requirements:
-     - - ! '>='
+     - - ">="
        - !ruby/object:Gem::Version
          version: '0'
    type: :development
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
-     none: false
      requirements:
-     - - ! '>='
+     - - ">="
        - !ruby/object:Gem::Version
          version: '0'
  - !ruby/object:Gem::Dependency
    name: bundler
    requirement: !ruby/object:Gem::Requirement
-     none: false
      requirements:
-     - - ! '>='
+     - - ">="
        - !ruby/object:Gem::Version
          version: '0'
    type: :development
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
-     none: false
      requirements:
-     - - ! '>='
+     - - ">="
        - !ruby/object:Gem::Version
          version: '0'
  - !ruby/object:Gem::Dependency
    name: jeweler
    requirement: !ruby/object:Gem::Requirement
-     none: false
      requirements:
-     - - ! '>='
+     - - ">="
        - !ruby/object:Gem::Version
          version: '0'
    type: :development
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
-     none: false
      requirements:
-     - - ! '>='
+     - - ">="
        - !ruby/object:Gem::Version
          version: '0'
  description: A simple Ruby library for parsing Open Graph Protocol information from
@@ -116,39 +103,34 @@ executables: []
  extensions: []
  extra_rdoc_files:
  - LICENSE.txt
- - README.rdoc
+ - README.md
  files:
+ - LICENSE.txt
+ - README.md
  - lib/open_graph.rb
  - lib/opengraph_parser.rb
  - lib/redirect_follower.rb
- - LICENSE.txt
- - README.rdoc
  homepage: http://github.com/huyha85/opengraph_parser
  licenses:
  - MIT
- post_install_message:
+ metadata: {}
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
  required_ruby_version: !ruby/object:Gem::Requirement
-   none: false
    requirements:
-   - - ! '>='
+   - - ">="
      - !ruby/object:Gem::Version
        version: '0'
-       segments:
-       - 0
-       hash: -2567811217494491510
  required_rubygems_version: !ruby/object:Gem::Requirement
-   none: false
    requirements:
-   - - ! '>='
+   - - ">="
      - !ruby/object:Gem::Version
        version: '0'
  requirements: []
- rubyforge_project:
- rubygems_version: 1.8.24
- signing_key:
+ rubygems_version: 3.0.3
+ signing_key:
  specification_version: 3
  summary: A simple Ruby library for parsing Open Graph Protocol information from a
    website.
data/README.rdoc DELETED
@@ -1,45 +0,0 @@
- = OpengraphParser
-
- OpengraphParser is a simple Ruby library for parsing Open Graph protocol information from a web site. Learn more about the protocol at:
- http://ogp.me
-
- == Installation
- gem install opengraph_parser
-
- or add to Gemfile
-
- gem "opengraph_parser"
-
- == Usage
- og = OpenGraph.new("http://ogp.me")
- og.title # => "Open Graph protocol"
- og.type # => "website"
- og.url # => "http://ogp.me/"
- og.description # => "The Open Graph protocol enables any web page to become a rich object in a social graph."
- og.images # => ["http://ogp.me/logo.png"]
-
- You can also get other Open Graph metadata as:
- og.metadata # => {"og:image:type"=>"image/png", "og:image:width"=>"300", "og:image:height"=>"300"}
-
- If you try to parse Open Graph information for a website that doesn’t have any Open Graph metadata, the library will try to find other information in the website as the following rules:
- <title> for title
- <meta name="description"> for description
- <link rel="image_src"> or all <img> tags for images
-
- You can disable this fallback lookup by passing false to init method:
- og = OpenGraph.new("http://ogp.me", false)
-
- == Contributing to opengraph_parser
-
- * Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
- * Check out the issue tracker to make sure someone already hasn't requested it and/or contributed it.
- * Fork the project.
- * Start a feature/bugfix branch.
- * Commit and push until you are happy with your contribution.
- * Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
- * Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or is otherwise necessary, that is fine, but please isolate to its own commit so I can cherry-pick around it.
-
- == Copyright
-
- Copyright (c) 2012 Huy Ha. See LICENSE.txt for
- further details.