opengraph_parser 0.2.0 → 0.2.4

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: df0c81f7b80017bff4c885d2c414eef59fabf51270ed39deb4c865ad26be55ac
4
+ data.tar.gz: '02908954486292679e02fe035aedce81fe1575d49ad8d09419bdcdaf43dceb3e'
5
+ SHA512:
6
+ metadata.gz: aeff1818cd446174ccd6737c11faa59296184186480f373a43967cab9a36a89412cca711a81d1ed6572b20bea2b5b5909b6bcd2614e2047128a2684827135e10
7
+ data.tar.gz: 196240064499563538c518f454ac15d8498003bae0a732f13e09e109ac0e5037bb3ce5431c47858ea1e797a0622d04ba51c52df3f9f8eab84bd69a7f1074a61f
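The digests in `checksums.yaml` can be compared against the bytes of the two archive entries. The following is a minimal sketch of that comparison using only Ruby's standard `digest` library; the helper name `mismatched_entries` and the demo contents are assumptions made for illustration, not part of the gem or of RubyGems.

```ruby
require 'digest'

# Hypothetical helper: returns the names of entries whose digest does not match.
# `checksums` mirrors the layout of checksums.yaml: { "SHA256" => { name => hex } }.
# `contents` maps entry names to raw bytes (in practice, read from the unpacked .gem).
def mismatched_entries(checksums, contents)
  digests = { 'SHA256' => Digest::SHA256, 'SHA512' => Digest::SHA512 }
  checksums.flat_map do |algorithm, entries|
    digest = digests.fetch(algorithm)
    entries.reject { |name, expected| digest.hexdigest(contents.fetch(name)) == expected.to_s }
           .keys
  end.uniq
end

# Demonstration with made-up bytes rather than the real gem contents.
contents = { 'metadata.gz' => 'demo metadata', 'data.tar.gz' => 'demo data' }
checksums = {
  'SHA256' => contents.transform_values { |bytes| Digest::SHA256.hexdigest(bytes) },
  'SHA512' => contents.transform_values { |bytes| Digest::SHA512.hexdigest(bytes) }
}

mismatched_entries(checksums, contents)  # => []
```

An empty result means every listed digest matched; any name in the result points at an entry whose bytes differ from what the checksum file records.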
data/README.md ADDED
@@ -0,0 +1,74 @@
1
+ # OpengraphParser
2
+
3
+ OpengraphParser is a simple Ruby library for parsing Open Graph protocol information from a website. Learn more about the protocol at:
4
+ http://ogp.me
5
+
6
+ ## Installation
7
+
8
+ ```bash
9
+ gem install opengraph_parser
10
+ ```
11
+
12
+ or add to Gemfile
13
+
14
+ ```bash
15
+ gem "opengraph_parser"
16
+ ```
17
+
18
+ ## Usage
19
+
20
+ ### Parsing a URL
21
+
22
+ ```ruby
23
+ og = OpenGraph.new("http://ogp.me")
24
+ og.title # => "Open Graph protocol"
25
+ og.type # => "website"
26
+ og.url # => "http://ogp.me/"
27
+ og.description # => "The Open Graph protocol enables any web page to become a rich object in a social graph."
28
+ og.images # => ["http://ogp.me/logo.png"]
29
+ ```
30
+
31
+ You can also get other Open Graph metadata as:
32
+
33
+ ```ruby
34
+ og.metadata # => {"og:image:type"=>"image/png", "og:image:width"=>"300", "og:image:height"=>"300"}
35
+ ```
36
+
37
+ ### Parsing an HTML document
38
+
39
+ ```ruby
40
+ og = OpenGraph.new(html_string)
41
+ ```
42
+
43
+ ### Custom header fields
44
+ In some cases you may need to set custom fields in the HTTP request headers sent when fetching a URL:
45
+ ```ruby
46
+ og = OpenGraph.new("http://ogp.me", { :headers => {'User-Agent' => 'Custom User Agent'} })
47
+ ```
48
+
49
+ ### Fallback
50
+ If you try to parse Open Graph information for a website that doesn’t have any Open Graph metadata, the library will try to find other information in the page using the following rules:
51
+
52
+ * `<title>` for title
53
+ * `<meta name="description">` for description
54
+ * `<link rel="image_src">` or all `<img>` tags for images
55
+
56
+ You can disable this fallback lookup by passing `false` as the second argument to the initializer:
57
+
58
+ ```ruby
59
+ og = OpenGraph.new("http://ogp.me", false)
60
+ ```
61
+
62
+ ## Contributing to opengraph_parser
63
+
64
+ * Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
65
+ * Check out the issue tracker to make sure someone hasn't already requested it and/or contributed it.
66
+ * Fork the project.
67
+ * Start a feature/bugfix branch.
68
+ * Commit and push until you are happy with your contribution.
69
+ * Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
70
+ * Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or a change there is otherwise necessary, that is fine, but please isolate it in its own commit so I can cherry-pick around it.
71
+
72
+ ## Copyright
73
+
74
+ Copyright (c) 2013 Huy Ha. See LICENSE.txt for further details.
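The README shows the options hash and the fallback flag in separate examples. Judging from the `initialize(src, fallback = true, options = {})` signature in the `lib/open_graph.rb` changes of this diff, a Hash passed in the second position is treated as options and the fallback stays enabled. The sketch below reproduces only that argument shuffling in a standalone method; `normalize_arguments` is a hypothetical name used for illustration and does not exist in the gem.

```ruby
# Hypothetical stand-in for the argument handling in OpenGraph#initialize:
# a Hash in the second position is treated as options, and fallback stays on.
def normalize_arguments(fallback = true, options = {})
  if fallback.is_a?(Hash)
    options = fallback
    fallback = true
  end
  [fallback, options]
end

headers = { :headers => { 'User-Agent' => 'Custom User Agent' } }

normalize_arguments                  # => [true, {}]
normalize_arguments(false)           # => [false, {}]
normalize_arguments(headers)         # => [true, headers]
normalize_arguments(false, headers)  # => [false, headers]
```

By the same logic, a call such as `OpenGraph.new(url, false, { :headers => { ... } })` should disable the fallback while still sending custom headers, although the README does not show that form.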
data/lib/open_graph.rb CHANGED
@@ -1,31 +1,43 @@
1
1
  require 'nokogiri'
2
2
  require 'redirect_follower'
3
3
  require "addressable/uri"
4
+ require 'uri'
4
5
 
5
6
  class OpenGraph
6
- attr_accessor :src, :url, :type, :title, :description, :images, :metadata, :response, :original_images
7
+ attr_accessor :src, :url, :type, :title, :description, :images, :metadata, :response, :original_images, :html_content
7
8
 
8
- def initialize(src, fallback = true)
9
+ def initialize(src, fallback = true, options = {})
10
+ if fallback.is_a? Hash
11
+ options = fallback
12
+ fallback = true
13
+ end
9
14
  @src = src
15
+ @body = nil
10
16
  @images = []
11
17
  @metadata = {}
12
- parse_opengraph
18
+ parse_opengraph(options)
13
19
  load_fallback if fallback
14
20
  check_images_path
15
21
  end
16
22
 
17
23
  private
18
- def parse_opengraph
24
+ def parse_opengraph(options = {})
19
25
  begin
20
- @response = RedirectFollower.new(@src).resolve
26
+ if @src.include? '</html>'
27
+ @body = @src
28
+ @html_content = true
29
+ else
30
+ @body = RedirectFollower.new(@src, options).resolve.body
31
+ @html_content = false
32
+ end
21
33
  rescue
22
34
  @title = @url = @src
23
35
  return
24
36
  end
25
37
 
26
- if @response && @response.body
38
+ if @body
27
39
  attrs_list = %w(title url type description)
28
- doc = Nokogiri.parse(@response.body)
40
+ doc = Nokogiri.parse(@body)
29
41
  doc.css('meta').each do |m|
30
42
  if m.attribute('property') && m.attribute('property').to_s.match(/^og:(.+)$/i)
31
43
  m_content = m.attribute('content').to_s.strip
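The hunk above decides between "HTML string" and "URL" with a plain substring test, `@src.include? '</html>'`. A small sketch of how that test classifies a few inputs may help when deciding what to pass in; `html_content?` is a hypothetical helper that mirrors the check and is not a method of the gem.

```ruby
# Hypothetical helper mirroring the `@src.include? '</html>'` check in parse_opengraph.
def html_content?(src)
  src.include?('</html>')
end

html_content?('<html><head><title>Hi</title></head></html>')  # => true
html_content?('http://ogp.me')                                # => false
html_content?('<p>fragment without a closing html tag</p>')   # => false
html_content?('<HTML></HTML>')                                # => false, the check is case-sensitive
```

Inputs for which the check is false are handed to `RedirectFollower` as if they were URLs, so an HTML fragment or an upper-case closing tag would presumably end up in the `rescue` branch rather than being parsed.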
@@ -43,8 +55,8 @@ class OpenGraph
43
55
  end
44
56
 
45
57
  def load_fallback
46
- if @response && @response.body
47
- doc = Nokogiri.parse(@response.body)
58
+ if @body
59
+ doc = Nokogiri.parse(@body)
48
60
 
49
61
  if @title.to_s.empty? && doc.xpath("//head//title").size > 0
50
62
  @title = doc.xpath("//head//title").first.text.to_s.strip
@@ -56,6 +68,10 @@ class OpenGraph
56
68
  @description = description_meta.attribute("content").to_s.strip
57
69
  end
58
70
 
71
+ if @description.to_s.empty?
72
+ @description = fetch_first_text(doc)
73
+ end
74
+
59
75
  fetch_images(doc, "//head//link[@rel='image_src']", "href") if @images.empty?
60
76
  fetch_images(doc, "//img", "src") if @images.empty?
61
77
  end
@@ -63,7 +79,11 @@ class OpenGraph
63
79
 
64
80
  def check_images_path
65
81
  @original_images = @images.dup
66
- uri = Addressable::URI.parse(@src)
82
+
83
+ uri = Addressable::URI.parse(@url || @src)
84
+
85
+ return unless uri
86
+
67
87
  imgs = @images.dup
68
88
  @images = []
69
89
  imgs.each do |img|
@@ -86,6 +106,13 @@ class OpenGraph
86
106
  end
87
107
  end
88
108
 
109
+ def fetch_first_text(doc)
110
+ doc.xpath('//p').each do |p|
111
+ s = p.text.to_s.strip
112
+ return s if s.length > 20
113
+ end
114
+ end
115
+
89
116
  def add_metadata(metadata_container, path, content)
90
117
  path_elements = path.split(':')
91
118
  if path_elements.size > 1
@@ -107,4 +134,4 @@ class OpenGraph
107
134
  metadata_container
108
135
  end
109
136
  end
110
- end
137
+ end
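This version adds a description fallback that is not listed in the README: `fetch_first_text` returns the first `<p>` whose stripped text is longer than 20 characters. One detail worth checking is what the method returns when no paragraph qualifies, because `each` returns its receiver. The sketch below reproduces the loop with plain strings instead of Nokogiri nodes; both method names are hypothetical, and the `find` variant is an assumption about one possible alternative, not the gem's code.

```ruby
# Stand-in for the loop in fetch_first_text, using plain strings instead of nodes.
def first_text_with_each(paragraphs)
  paragraphs.each do |text|
    s = text.to_s.strip
    return s if s.length > 20
  end
end

# Hypothetical variant that yields nil when nothing qualifies.
def first_text_with_find(paragraphs)
  paragraphs.map { |text| text.to_s.strip }.find { |s| s.length > 20 }
end

short_only = ['Hi', 'Too short']
mixed = ['Hi', '  A paragraph that is clearly long enough.  ']

first_text_with_each(mixed)       # => "A paragraph that is clearly long enough."
first_text_with_find(mixed)       # => "A paragraph that is clearly long enough."
first_text_with_each(short_only)  # => ["Hi", "Too short"] (the collection itself)
first_text_with_find(short_only)  # => nil
```

If the same holds for the node set returned by `doc.xpath('//p')`, then `@description` could end up holding that collection rather than a string or `nil` on pages with only short paragraphs.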
data/lib/redirect_follower.rb CHANGED
@@ -1,27 +1,30 @@
1
1
  require 'net/https'
2
2
 
3
3
  class RedirectFollower
4
+ REDIRECT_DEFAULT_LIMIT = 5
4
5
  class TooManyRedirects < StandardError; end
5
6
 
6
- attr_accessor :url, :body, :redirect_limit, :response
7
+ attr_accessor :url, :body, :redirect_limit, :response, :headers
7
8
 
8
- def initialize(url, limit = 5)
9
- @url, @redirect_limit = url, limit
9
+ def initialize(url, options = {})
10
+ @url = url
11
+ @redirect_limit = options[:redirect_limit] || REDIRECT_DEFAULT_LIMIT
12
+ @headers = options[:headers] || {}
10
13
  end
11
14
 
12
15
  def resolve
13
16
  raise TooManyRedirects if redirect_limit < 0
14
17
 
15
- uri = URI.parse(URI.escape(url))
18
+ uri = Addressable::URI.parse(url)
19
+
20
+ http = Net::HTTP.new(uri.host, uri.port)
16
21
  if uri.scheme == 'https'
17
- https = Net::HTTP.new(uri.host, 443)
18
- https.use_ssl = true
19
- https.verify_mode = OpenSSL::SSL::VERIFY_PEER
20
- self.response = https.request_get(uri.request_uri)
21
- else
22
- self.response = Net::HTTP.get_response(uri)
22
+ http.use_ssl = true
23
+ http.verify_mode = OpenSSL::SSL::VERIFY_PEER
23
24
  end
24
25
 
26
+ self.response = http.request_get(uri.request_uri, @headers)
27
+
25
28
  if response.kind_of?(Net::HTTPRedirection)
26
29
  self.url = redirect_url
27
30
  self.redirect_limit -= 1
@@ -39,4 +42,4 @@ class RedirectFollower
39
42
  response['location']
40
43
  end
41
44
  end
42
- end
45
+ end
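The `RedirectFollower` changes replace the positional `limit` argument with an options hash (`:redirect_limit`, `:headers`) and send the headers with every request. The following is a rough sketch of that behavior with the HTTP call replaced by an injected callable, so it runs without a network. `SketchFollower`, the `fetch` lambda, and the response hashes are all assumptions made for illustration; the gem itself uses `Net::HTTP` and recursion rather than a loop.

```ruby
# Hypothetical sketch of the option handling plus redirect following.
class SketchFollower
  DEFAULT_LIMIT = 5
  class TooManyRedirects < StandardError; end

  attr_reader :url, :redirect_limit, :headers

  def initialize(url, options = {})
    @url = url
    @redirect_limit = options[:redirect_limit] || DEFAULT_LIMIT
    @headers = options[:headers] || {}
  end

  # fetch.call(url, headers) must return a Hash with :status, :location and :body.
  def resolve(fetch)
    loop do
      raise TooManyRedirects if @redirect_limit < 0
      response = fetch.call(@url, @headers)
      return response[:body] unless (300..399).cover?(response[:status])
      @url = response[:location]
      @redirect_limit -= 1
    end
  end
end

# Fake transport: one redirect, then a page.
pages = {
  'http://a.example' => { :status => 301, :location => 'http://b.example', :body => '' },
  'http://b.example' => { :status => 200, :location => nil, :body => '<html></html>' }
}
seen_headers = []
fetch = lambda do |url, headers|
  seen_headers << headers
  pages.fetch(url)
end

follower = SketchFollower.new('http://a.example', :headers => { 'User-Agent' => 'Custom User Agent' })
body = follower.resolve(fetch)  # => "<html></html>"
follower.url                    # => "http://b.example"
follower.redirect_limit         # => 4
```

Note that callers of the old `RedirectFollower.new(url, 5)` form would now be passing an Integer where a Hash is expected, so code that used the positional limit directly may need updating.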
metadata CHANGED
@@ -1,83 +1,100 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: opengraph_parser
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
5
- prerelease:
4
+ version: 0.2.4
6
5
  platform: ruby
7
6
  authors:
8
7
  - Huy Ha
9
8
  - Duc Trinh
10
- autorequire:
9
+ autorequire:
11
10
  bindir: bin
12
11
  cert_chain: []
13
- date: 2013-01-16 00:00:00.000000000 Z
12
+ date: 2021-12-23 00:00:00.000000000 Z
14
13
  dependencies:
15
14
  - !ruby/object:Gem::Dependency
16
15
  name: nokogiri
17
- requirement: &70256296829180 !ruby/object:Gem::Requirement
18
- none: false
16
+ requirement: !ruby/object:Gem::Requirement
19
17
  requirements:
20
- - - ! '>='
18
+ - - ">="
21
19
  - !ruby/object:Gem::Version
22
20
  version: '0'
23
21
  type: :runtime
24
22
  prerelease: false
25
- version_requirements: *70256296829180
23
+ version_requirements: !ruby/object:Gem::Requirement
24
+ requirements:
25
+ - - ">="
26
+ - !ruby/object:Gem::Version
27
+ version: '0'
26
28
  - !ruby/object:Gem::Dependency
27
29
  name: addressable
28
- requirement: &70256296828540 !ruby/object:Gem::Requirement
29
- none: false
30
+ requirement: !ruby/object:Gem::Requirement
30
31
  requirements:
31
- - - ! '>='
32
+ - - ">="
32
33
  - !ruby/object:Gem::Version
33
34
  version: '0'
34
35
  type: :runtime
35
36
  prerelease: false
36
- version_requirements: *70256296828540
37
+ version_requirements: !ruby/object:Gem::Requirement
38
+ requirements:
39
+ - - ">="
40
+ - !ruby/object:Gem::Version
41
+ version: '0'
37
42
  - !ruby/object:Gem::Dependency
38
43
  name: rspec
39
- requirement: &70256296827720 !ruby/object:Gem::Requirement
40
- none: false
44
+ requirement: !ruby/object:Gem::Requirement
41
45
  requirements:
42
- - - ! '>='
46
+ - - ">="
43
47
  - !ruby/object:Gem::Version
44
48
  version: '0'
45
49
  type: :development
46
50
  prerelease: false
47
- version_requirements: *70256296827720
51
+ version_requirements: !ruby/object:Gem::Requirement
52
+ requirements:
53
+ - - ">="
54
+ - !ruby/object:Gem::Version
55
+ version: '0'
48
56
  - !ruby/object:Gem::Dependency
49
57
  name: rdoc
50
- requirement: &70256296826860 !ruby/object:Gem::Requirement
51
- none: false
58
+ requirement: !ruby/object:Gem::Requirement
52
59
  requirements:
53
- - - ! '>='
60
+ - - ">="
54
61
  - !ruby/object:Gem::Version
55
62
  version: '0'
56
63
  type: :development
57
64
  prerelease: false
58
- version_requirements: *70256296826860
65
+ version_requirements: !ruby/object:Gem::Requirement
66
+ requirements:
67
+ - - ">="
68
+ - !ruby/object:Gem::Version
69
+ version: '0'
59
70
  - !ruby/object:Gem::Dependency
60
71
  name: bundler
61
- requirement: &70256296825620 !ruby/object:Gem::Requirement
62
- none: false
72
+ requirement: !ruby/object:Gem::Requirement
63
73
  requirements:
64
- - - ! '>='
74
+ - - ">="
65
75
  - !ruby/object:Gem::Version
66
76
  version: '0'
67
77
  type: :development
68
78
  prerelease: false
69
- version_requirements: *70256296825620
79
+ version_requirements: !ruby/object:Gem::Requirement
80
+ requirements:
81
+ - - ">="
82
+ - !ruby/object:Gem::Version
83
+ version: '0'
70
84
  - !ruby/object:Gem::Dependency
71
85
  name: jeweler
72
- requirement: &70256296818460 !ruby/object:Gem::Requirement
73
- none: false
86
+ requirement: !ruby/object:Gem::Requirement
74
87
  requirements:
75
- - - ! '>='
88
+ - - ">="
76
89
  - !ruby/object:Gem::Version
77
90
  version: '0'
78
91
  type: :development
79
92
  prerelease: false
80
- version_requirements: *70256296818460
93
+ version_requirements: !ruby/object:Gem::Requirement
94
+ requirements:
95
+ - - ">="
96
+ - !ruby/object:Gem::Version
97
+ version: '0'
81
98
  description: A simple Ruby library for parsing Open Graph Protocol information from
82
99
  a website. It also includes a fallback solution when the website has no Open Graph
83
100
  information.
@@ -86,39 +103,34 @@ executables: []
86
103
  extensions: []
87
104
  extra_rdoc_files:
88
105
  - LICENSE.txt
89
- - README.rdoc
106
+ - README.md
90
107
  files:
108
+ - LICENSE.txt
109
+ - README.md
91
110
  - lib/open_graph.rb
92
111
  - lib/opengraph_parser.rb
93
112
  - lib/redirect_follower.rb
94
- - LICENSE.txt
95
- - README.rdoc
96
113
  homepage: http://github.com/huyha85/opengraph_parser
97
114
  licenses:
98
115
  - MIT
99
- post_install_message:
116
+ metadata: {}
117
+ post_install_message:
100
118
  rdoc_options: []
101
119
  require_paths:
102
120
  - lib
103
121
  required_ruby_version: !ruby/object:Gem::Requirement
104
- none: false
105
122
  requirements:
106
- - - ! '>='
123
+ - - ">="
107
124
  - !ruby/object:Gem::Version
108
125
  version: '0'
109
- segments:
110
- - 0
111
- hash: -777082750492777387
112
126
  required_rubygems_version: !ruby/object:Gem::Requirement
113
- none: false
114
127
  requirements:
115
- - - ! '>='
128
+ - - ">="
116
129
  - !ruby/object:Gem::Version
117
130
  version: '0'
118
131
  requirements: []
119
- rubyforge_project:
120
- rubygems_version: 1.8.10
121
- signing_key:
132
+ rubygems_version: 3.0.3
133
+ signing_key:
122
134
  specification_version: 3
123
135
  summary: A simple Ruby library for parsing Open Graph Protocol information from a
124
136
  website.
data/README.rdoc DELETED
@@ -1,45 +0,0 @@
1
- = OpengraphParser
2
-
3
- OpengraphParser is a simple Ruby library for parsing Open Graph protocol information from a web site. Learn more about the protocol at:
4
- http://ogp.me
5
-
6
- == Installation
7
- gem install opengraph_parser
8
-
9
- or add to Gemfile
10
-
11
- gem "opengraph_parser"
12
-
13
- == Usage
14
- og = OpenGraph.new("http://ogp.me")
15
- og.title # => "Open Graph protocol"
16
- og.type # => "website"
17
- og.url # => "http://ogp.me/"
18
- og.description # => "The Open Graph protocol enables any web page to become a rich object in a social graph."
19
- og.images # => ["http://ogp.me/logo.png"]
20
-
21
- You can also get other Open Graph metadata as:
22
- og.metadata # => {"og:image:type"=>"image/png", "og:image:width"=>"300", "og:image:height"=>"300"}
23
-
24
- If you try to parse Open Graph information for a website that doesn’t have any Open Graph metadata, the library will try to find other information in the website as the following rules:
25
- <title> for title
26
- <meta name="description"> for description
27
- <link rel="image_src"> or all <img> tags for images
28
-
29
- You can disable this fallback lookup by passing false to init method:
30
- og = OpenGraph.new("http://ogp.me", false)
31
-
32
- == Contributing to opengraph_parser
33
-
34
- * Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
35
- * Check out the issue tracker to make sure someone already hasn't requested it and/or contributed it.
36
- * Fork the project.
37
- * Start a feature/bugfix branch.
38
- * Commit and push until you are happy with your contribution.
39
- * Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
40
- * Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or is otherwise necessary, that is fine, but please isolate to its own commit so I can cherry-pick around it.
41
-
42
- == Copyright
43
-
44
- Copyright (c) 2012 Huy Ha. See LICENSE.txt for
45
- further details.