metainspector 1.0.2 → 1.2.0

data/.gitignore ADDED
@@ -0,0 +1,5 @@
+ *.gem
+ .bundle
+ .rvmrc
+ Gemfile.lock
+ pkg/*
data/Gemfile ADDED
@@ -0,0 +1,4 @@
+ source "http://rubygems.org"
+
+ # Specify your gem's dependencies in MetaInspector.gemspec
+ gemspec
data/MIT-LICENSE ADDED
@@ -0,0 +1,20 @@
+ Copyright (c) 2009-2011 Jaime Iniesta
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc ADDED
@@ -0,0 +1,90 @@
+ = MetaInspector
+
+ MetaInspector is a gem for web scraping purposes. You give it a URL, and it lets you easily get its title, links, and meta tags.
+
+ = Installation
+
+ Install the gem from RubyGems:
+
+   gem install metainspector
+
+ = Usage
+
+ Initialize a scraper instance for a URL, like this:
+
+   page = MetaInspector::Scraper.new('http://pagerankalert.com')
+
+ or, for short, a convenience alias is also available:
+
+   page = MetaInspector.new('http://pagerankalert.com')
+
+ Then you can see the scraped data like this:
+
+   page.address # URL of the page
+   page.title # title of the page, as a string
+   page.links # array of strings, with every link found on the page
+   page.meta_description # meta description, as a string
+   page.meta_keywords # meta keywords, as a string
+
+ MetaInspector uses dynamic methods for meta tag discovery, so all of these will work: each call is converted into a search for a meta tag with the corresponding name, returning its content attribute.
+
+   page.meta_description # <meta name="description" content="..." />
+   page.meta_keywords # <meta name="keywords" content="..." />
+   page.meta_robots # <meta name="robots" content="..." />
+   page.meta_generator # <meta name="generator" content="..." />
+
+ It also works for meta tags of the form <meta http-equiv="name" ... />, like the following:
+
+   page.meta_content_language # <meta http-equiv="content-language" content="..." />
+   page.meta_Content_Type # <meta http-equiv="Content-Type" content="..." />
+
+ Please note that MetaInspector is case-sensitive, so page.meta_Content_Type is not the same as page.meta_content_type.
+
+ The full scraped document is accessible from:
+
+   page.parsed_document # Nokogiri document that you can use to get any element from the page
+
+ = Examples
+
+ You can find some sample scripts in the samples folder, including a basic scraper and a spider that follows external links using a queue. What follows is an example of use from irb:
+
+   $ irb
+   >> require 'metainspector'
+   => true
+
+   >> page = MetaInspector.new('http://pagerankalert.com')
+   => #<MetaInspector:0x11330c0 @document=nil, @links=nil, @address="http://pagerankalert.com", @description=nil, @keywords=nil, @title=nil>
+
+   >> page.title
+   => "PageRankAlert.com :: Track your PageRank changes"
+
+   >> page.meta_description
+   => "Track your PageRank(TM) changes and receive alerts by email"
+
+   >> page.meta_keywords
+   => "pagerank, seo, optimization, google"
+
+   >> page.links.size
+   => 8
+
+   >> page.links[5]
+   => "http://pagerankalert.posterous.com"
+
+   >> page.document.class
+   => String
+
+   >> page.parsed_document.class
+   => Nokogiri::HTML::Document
+
+ = To Do
+
+ * Get page.base_dir from the address
+ * Distinguish between external and internal links, returning page.links for all of them as found, and page.external_links and page.internal_links converted to absolute URLs
+ * Return the array of images in the page as absolute URLs
+ * Be able to set a timeout in seconds
+ * If keywords seem to be separated by blank spaces, replace them with commas
+ * Mocks
+ * Check the content type and process only HTML pages; don't try to scrape TAR files like http://ftp.ruby-lang.org/pub/ruby/ruby-1.9.1-p129.tar.bz2 or video files like http://isabel.dit.upm.es/component/option,com_docman/task,doc_download/gid,831/Itemid,74/
+ * Get the most important image by querying Facebook
+
+ Copyright (c) 2009-2011 Jaime Iniesta, released under the MIT license
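For the To Do item above about converting internal links to absolute URLs, Ruby's standard URI library already does most of the work. A rough sketch of the idea; this is not implemented in 1.2.0, and example.com is just a placeholder base:

    require 'uri'

    base = 'http://example.com/blog/'

    # Relative paths, absolute paths and full URLs all resolve correctly
    ['/about', 'archive/2011', 'http://other.com/'].each do |link|
      puts URI.join(base, link).to_s
    end
    # => http://example.com/about
    #    http://example.com/blog/archive/2011
    #    http://other.com/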
data/Rakefile CHANGED
@@ -1,20 +1,2 @@
- # -*- ruby -*-
-
- require 'rubygems'
- require 'hoe'
- require 'open-uri'
- require 'hpricot'
- require './lib/metainspector.rb'
-
- Hoe.new('metainspector', MetaInspector::VERSION) do |p|
-   p.rubyforge_name = 'metainspector'
-   p.author = 'Jaime Iniesta'
-   p.email = 'jaimeiniesta@gmail.com'
-   p.summary = 'Ruby gem for web scraping purposes. It scrapes a given URL, and returns you a hash with data from it like for example the title, meta description, meta keywords, an array with all the links, all the images in it, etc.'
-   p.description = p.paragraphs_of('README.txt', 2..5).join("\n\n")
-   p.url = p.paragraphs_of('README.txt', 0).first.split(/\n/)[1..-1]
-   p.changes = p.paragraphs_of('History.txt', 0..1).join("\n\n")
-   p.extra_deps << "hpricot"
- end
-
- # vim: syntax=Ruby
+ require 'bundler'
+ Bundler::GemHelper.install_tasks
data/lib/meta_inspector.rb ADDED
@@ -0,0 +1,12 @@
+ # -*- encoding: utf-8 -*-
+
+ require_relative 'meta_inspector/scraper'
+
+ module MetaInspector
+   extend self
+
+   # Sugar method to be able to create a scraper in a shorter way
+   def new(url)
+     Scraper.new(url)
+   end
+ end
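The extend self above makes new callable directly on the module, so both forms below build the same object. A small usage illustration; example.com is a placeholder URL, and nothing is fetched until an attribute is actually read:

    require 'metainspector'

    # Both return a MetaInspector::Scraper instance
    page_a = MetaInspector.new('http://example.com')
    page_b = MetaInspector::Scraper.new('http://example.com')

    page_a.class # => MetaInspector::Scraper
    page_b.class # => MetaInspector::Scraper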
data/lib/meta_inspector/scraper.rb ADDED
@@ -0,0 +1,81 @@
+ # -*- encoding: utf-8 -*-
+
+ require 'open-uri'
+ require 'rubygems'
+ require 'nokogiri'
+ require 'charguess'
+ require 'iconv'
+
+ # MetaInspector provides an easy way to scrape web pages and get their elements
+ module MetaInspector
+   class Scraper
+     attr_reader :address
+
+     # Initializes a new instance of MetaInspector, setting the URL address to the one given
+     # TODO: validate the address as an http URL, don't initialize it if the format is wrong
+     def initialize(address)
+       @address = address
+
+       @document = @title = @description = @keywords = @links = nil
+     end
+
+     # Returns the parsed document title, from the content of the <title> tag.
+     # This is not the same as the meta_title tag
+     def title
+       @title ||= parsed_document.css('title').inner_html rescue nil
+     end
+
+     # Returns the parsed document links
+     def links
+       @links ||= parsed_document.search("//a").map {|link| link.attributes["href"].to_s.strip} rescue nil
+     end
+
+     # Returns the charset
+     # TODO: We should trust the charset expressed on the Content-Type meta tag
+     # and only guess it if none is given
+     def charset
+       @charset ||= CharGuess.guess(document).downcase
+     end
+
+     # Returns the whole parsed document
+     def parsed_document
+       @parsed_document ||= Nokogiri::HTML(document)
+
+     rescue
+       warn 'An exception occurred while trying to scrape the page!'
+     end
+
+     # Returns the original, unparsed document
+     def document
+       @document ||= open(@address).read
+
+     rescue SocketError
+       warn 'MetaInspector exception: The url provided does not exist or is temporarily unavailable (socket error)'
+       @scraped = false
+     rescue TimeoutError
+       warn 'Timeout!!!'
+     rescue
+       warn 'An exception occurred while trying to fetch the page!'
+     end
+
+     # Scrapers for all meta_tags in the form of "meta_name" are automatically defined. This has been tested for
+     # meta name: keywords, description, robots, generator
+     # meta http-equiv: content-language, Content-Type
+     #
+     # It will first try with meta name="..." and, if nothing is found,
+     # with meta http-equiv="...", substituting "_" with "-"
+     # TODO: this should be case insensitive, so meta_robots gets the results from the HTML for robots, Robots, ROBOTS...
+     # TODO: cache results in instance variables, using ||=
+     # TODO: define respond_to? to return true on the meta_name methods
+     def method_missing(method_name)
+       if method_name.to_s =~ /^meta_(.*)/
+         content = parsed_document.css("meta[@name='#{$1}']").first['content'] rescue nil
+         content = parsed_document.css("meta[@http-equiv='#{$1.gsub("_", "-")}']").first['content'] rescue nil if content.nil?
+
+         content
+       else
+         super
+       end
+     end
+   end
+ end
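Two of the TODOs noted on method_missing above (caching results and answering respond_to?) could be tackled along these lines. This is only a sketch of a possible follow-up, not code shipped in 1.2.0; it reopens the class as released:

    require 'metainspector'

    module MetaInspector
      class Scraper
        # Ghost meta_* methods should be reported as available
        def respond_to?(method_name, include_private = false)
          method_name.to_s =~ /^meta_/ ? true : super
        end

        def method_missing(method_name)
          if method_name.to_s =~ /^meta_(.*)/
            @meta_cache ||= {}
            # Cache by method name so repeated calls don't re-run the CSS queries,
            # even when the result was nil
            unless @meta_cache.key?(method_name)
              content = parsed_document.css("meta[@name='#{$1}']").first['content'] rescue nil
              content = parsed_document.css("meta[@http-equiv='#{$1.gsub("_", "-")}']").first['content'] rescue nil if content.nil?
              @meta_cache[method_name] = content
            end
            @meta_cache[method_name]
          else
            super
          end
        end
      end
    end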
data/lib/meta_inspector/version.rb ADDED
@@ -0,0 +1,5 @@
+ # -*- encoding: utf-8 -*-
+
+ module MetaInspector
+   VERSION = "1.2.0"
+ end
data/lib/metainspector.rb CHANGED
@@ -1,47 +1,3 @@
- class MetaInspector
-   require 'open-uri'
-   require 'rubygems'
-   require 'hpricot'
+ # -*- encoding: utf-8 -*-
 
-   VERSION = '1.0.2'
-
-   Hpricot.buffer_size = 300000
-
-   def self.scrape(url)
-     doc = Hpricot(open(url))
-
-     # Searching title...
-     if doc.at('title')
-       title = doc.at('title').inner_html
-     else
-       title = ""
-     end
-
-     # Searching meta description...
-     if doc.at("meta[@name='description']")
-       description = doc.at("meta[@name='description']")['content']
-     else
-       description = ""
-     end
-
-     # Searching meta keywords...
-     if doc.at("meta[@name='keywords']")
-       keywords = doc.at("meta[@name='keywords']")['content']
-     else
-       keywords = ""
-     end
-
-     # Searching links...
-     links = []
-     doc.search("//a").each do |link|
-       links << link.attributes["href"] if (!link.attributes["href"].nil?)
-     end
-
-     # Returning all data...
-     {'ok' => true, 'title' => title, 'description' => description, 'keywords' => keywords, 'links' => links}
-
-   rescue SocketError
-     puts 'MetaInspector exception: The url provided does not exist or is temporarily unavailable (socket error)'
-     {'ok' => false, 'title' => nil, 'description' => nil, 'keywords' => nil, 'links' => nil}
-   end
- end
+ require 'meta_inspector'
data/meta_inspector.gemspec ADDED
@@ -0,0 +1,26 @@
+ # -*- encoding: utf-8 -*-
+ $:.push File.expand_path("../lib", __FILE__)
+ require "meta_inspector/version"
+
+ Gem::Specification.new do |s|
+   s.name = "metainspector"
+   s.version = MetaInspector::VERSION
+   s.platform = Gem::Platform::RUBY
+   s.authors = ["Jaime Iniesta"]
+   s.email = ["jaimeiniesta@gmail.com"]
+   s.homepage = "https://rubygems.org/gems/metainspector"
+   s.summary = %q{MetaInspector is a ruby gem for web scraping purposes, that returns a hash with metadata from a given URL}
+   s.description = %q{MetaInspector lets you scrape a web page and get its title, charset, link and meta tags}
+
+   s.rubyforge_project = "MetaInspector"
+
+   s.files = `git ls-files`.split("\n")
+   s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
+   s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
+   s.require_paths = ["lib"]
+
+   s.add_dependency 'nokogiri', '1.4.4'
+   s.add_dependency 'charguess', '1.3.20110226181011'
+
+   s.add_development_dependency 'rspec', '2.5.0'
+ end
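Since the gemspec above pins nokogiri and charguess to exact versions, an application only has to declare metainspector itself and Bundler resolves the pinned runtime dependencies transitively. A minimal consumer Gemfile sketch; the version constraint shown is simply this release:

    source "http://rubygems.org"

    gem "metainspector", "1.2.0"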
data/samples/basic_scraping.rb ADDED
@@ -0,0 +1,17 @@
+ # Some basic MetaInspector samples
+
+ require_relative '../lib/meta_inspector.rb'
+
+ puts "Enter a valid http address to scrape it"
+ address = gets.strip
+ page = MetaInspector.new(address)
+ puts "...please wait while scraping the page..."
+
+ puts "Scraping #{page.address} returned these results:"
+ puts "TITLE: #{page.title}"
+ puts "META DESCRIPTION: #{page.meta_description}"
+ puts "META KEYWORDS: #{page.meta_keywords}"
+ puts "#{page.links.size} links found..."
+ page.links.each do |link|
+   puts " ==> #{link}"
+ end
data/samples/spider.rb ADDED
@@ -0,0 +1,28 @@
+ # A basic spider that will follow links in an infinite loop
+ require_relative '../lib/meta_inspector.rb'
+
+ q = Queue.new
+ visited_links = []
+
+ puts "Enter a valid http address to spider it following external links"
+ address = gets.strip
+
+ page = MetaInspector.new(address)
+ q.push(address)
+
+ while q.size > 0
+   visited_links << address = q.pop
+   page = MetaInspector.new(address)
+   puts "Spidering #{page.address}"
+
+   puts "TITLE: #{page.title}"
+   puts "META DESCRIPTION: #{page.meta_description}"
+   puts "META KEYWORDS: #{page.meta_keywords}"
+   puts "LINKS: #{page.links.size}"
+   page.links.each do |link|
+     if link[0..6] == 'http://' && !visited_links.include?(link)
+       q.push(link)
+     end
+   end
+   puts "#{visited_links.size} pages visited, #{q.size} pages on queue\n\n"
+ end
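One caveat in the sample above: links are only checked against visited_links, so the same URL can be queued several times before it is first visited. A compact variation that also tracks what has already been queued; this is a sketch only, not part of the shipped sample, with example.com as a placeholder start URL:

    require 'metainspector'

    start = 'http://example.com'
    queue, seen = [start], { start => true }

    until queue.empty?
      address = queue.shift
      page = MetaInspector.new(address)
      puts "Spidering #{page.address}: #{page.title}"

      # links can be nil if the fetch failed, so guard against that
      (page.links || []).each do |link|
        # Queue each external link only once
        next unless link[0..6] == 'http://' && !seen[link]
        seen[link] = true
        queue.push(link)
      end
    end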
data/spec/metainspector_spec.rb ADDED
@@ -0,0 +1,77 @@
+ # -*- encoding: utf-8 -*-
+
+ require File.join(File.dirname(__FILE__), "/spec_helper")
+
+ describe MetaInspector do
+
+   context 'Doing a basic scrape' do
+     before(:each) do
+       @m = MetaInspector.new('http://pagerankalert.com')
+     end
+
+     it "should get the title" do
+       @m.title.should == 'PageRankAlert.com :: Track your PageRank changes'
+     end
+
+     it "should get the links" do
+       @m.links.size.should == 8
+     end
+
+     it "should have a Nokogiri::HTML::Document as parsed_document" do
+       @m.parsed_document.class.should == Nokogiri::HTML::Document
+     end
+
+     it "should have a String as document" do
+       @m.document.class.should == String
+     end
+   end
+
+   context 'Getting meta tags by ghost methods' do
+     before(:each) do
+       @m = MetaInspector.new('http://pagerankalert.com')
+     end
+
+     it "should get the robots meta tag" do
+       @m.meta_robots.should == 'all,follow'
+     end
+
+     it "should get the description meta tag" do
+       @m.meta_description.should == 'Track your PageRank(TM) changes and receive alerts by email'
+     end
+
+     it "should get the keywords meta tag" do
+       @m.meta_keywords.should == "pagerank, seo, optimization, google"
+     end
+
+     it "should get the content-language meta tag" do
+       pending "mocks"
+       @m.meta_content_language.should == "en"
+     end
+
+     it "should get the Content-Type meta tag" do
+       pending "mocks"
+       @m.meta_Content_Type.should == "text/html; charset=utf-8"
+     end
+
+     it "should get the generator meta tag" do
+       pending "mocks"
+       @m.meta_generator.should == 'WordPress 2.8.4'
+     end
+
+     it "should return nil for non-found meta_tags" do
+       @m.meta_lollypop.should == nil
+     end
+   end
+
+   context 'Charset detection' do
+     it "should detect windows-1252 charset" do
+       @m = MetaInspector.new('http://www.alazan.com')
+       @m.charset.should == "windows-1252"
+     end
+
+     it "should detect utf-8 charset" do
+       @m = MetaInspector.new('http://www.pagerankalert.com')
+       @m.charset.should == "utf-8"
+     end
+   end
+ end
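The three specs above marked pending "mocks" need meta tags that the live page does not expose. One way to unblock them without hitting the network is to stub Scraper#document with a local HTML string via rspec-mocks. This is a sketch only, with invented HTML, not part of the shipped spec suite:

    require File.join(File.dirname(__FILE__), "/spec_helper")

    describe MetaInspector do
      it "should get the content-language meta tag" do
        html = '<html><head><meta http-equiv="content-language" content="en" /></head><body></body></html>'
        m = MetaInspector.new('http://example.com')
        m.stub(:document).and_return(html) # parsed_document then parses this local string
        m.meta_content_language.should == "en"
      end
    end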
data/spec/spec_helper.rb ADDED
@@ -0,0 +1,4 @@
+ # -*- encoding: utf-8 -*-
+
+ $: << File.join(File.dirname(__FILE__), "/../lib")
+ require 'meta_inspector'
metadata CHANGED
@@ -1,72 +1,123 @@
 --- !ruby/object:Gem::Specification
- rubygems_version: 0.9.2
- specification_version: 1
 name: metainspector
 version: !ruby/object:Gem::Version
- version: 1.0.2
- date: 2007-12-10 00:00:00 +01:00
- summary: Ruby gem for web scraping purposes. It scrapes a given URL, and returns you a hash with data from it like for example the title, meta description, meta keywords, an array with all the links, all the images in it, etc.
- require_paths:
- - lib
- email: jaimeiniesta@gmail.com
- homepage: " by Jaime Iniesta"
- rubyforge_project: metainspector
- description: "== FEATURES/PROBLEMS: * Scrape a given URL and return data from its HTML == SYNOPSIS: # Require all gems and libs needed... require 'rubygems' require 'open-uri' require 'hpricot' require 'metainspector' # Scrape an URL... page_data = MetaInspector.scrape(url)"
- autorequire:
- default_executable:
- bindir: bin
- has_rdoc: true
- required_ruby_version: !ruby/object:Gem::Version::Requirement
- requirements:
- - - ">"
- - !ruby/object:Gem::Version
- version: 0.0.0
- version:
+ prerelease: false
+ segments:
+ - 1
+ - 2
+ - 0
+ version: 1.2.0
 platform: ruby
- signing_key:
- cert_chain:
- post_install_message:
 authors:
 - Jaime Iniesta
- files:
- - History.txt
- - Manifest.txt
- - README.txt
- - Rakefile
- - bin/metainspector
- - lib/metainspector.rb
- - test/test_metainspector.rb
- test_files:
- - test/test_metainspector.rb
- rdoc_options:
- - --main
- - README.txt
- extra_rdoc_files:
- - History.txt
- - Manifest.txt
- - README.txt
- executables:
- - metainspector
- extensions: []
-
- requirements: []
+ autorequire:
+ bindir: bin
+ cert_chain: []
 
+ date: 2011-05-05 00:00:00 +02:00
+ default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
- name: hpricot
- version_requirement:
- version_requirements: !ruby/object:Gem::Version::Requirement
+ name: nokogiri
+ prerelease: false
+ requirement: &id001 !ruby/object:Gem::Requirement
+ none: false
+ requirements:
+ - - "="
+ - !ruby/object:Gem::Version
+ segments:
+ - 1
+ - 4
+ - 4
+ version: 1.4.4
+ type: :runtime
+ version_requirements: *id001
+ - !ruby/object:Gem::Dependency
+ name: charguess
+ prerelease: false
+ requirement: &id002 !ruby/object:Gem::Requirement
+ none: false
 requirements:
- - - ">"
+ - - "="
 - !ruby/object:Gem::Version
- version: 0.0.0
- version:
+ segments:
+ - 1
+ - 3
+ - 20110226181011
+ version: 1.3.20110226181011
+ type: :runtime
+ version_requirements: *id002
 - !ruby/object:Gem::Dependency
- name: hoe
- version_requirement:
- version_requirements: !ruby/object:Gem::Version::Requirement
+ name: rspec
+ prerelease: false
+ requirement: &id003 !ruby/object:Gem::Requirement
+ none: false
 requirements:
- - - ">="
+ - - "="
 - !ruby/object:Gem::Version
- version: 1.3.0
- version:
+ segments:
+ - 2
+ - 5
+ - 0
+ version: 2.5.0
+ type: :development
+ version_requirements: *id003
+ description: MetaInspector lets you scrape a web page and get its title, charset, link and meta tags
+ email:
+ - jaimeiniesta@gmail.com
+ executables: []
+
+ extensions: []
+
+ extra_rdoc_files: []
+
+ files:
+ - .gitignore
+ - Gemfile
+ - MIT-LICENSE
+ - README.rdoc
+ - Rakefile
+ - lib/meta_inspector.rb
+ - lib/meta_inspector/scraper.rb
+ - lib/meta_inspector/version.rb
+ - lib/metainspector.rb
+ - meta_inspector.gemspec
+ - samples/basic_scraping.rb
+ - samples/spider.rb
+ - spec/metainspector_spec.rb
+ - spec/spec_helper.rb
+ has_rdoc: true
+ homepage: https://rubygems.org/gems/metainspector
+ licenses: []
+
+ post_install_message:
+ rdoc_options: []
+
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+ none: false
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ segments:
+ - 0
+ version: "0"
+ required_rubygems_version: !ruby/object:Gem::Requirement
+ none: false
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ segments:
+ - 0
+ version: "0"
+ requirements: []
+
+ rubyforge_project: MetaInspector
+ rubygems_version: 1.3.7
+ signing_key:
+ specification_version: 3
+ summary: MetaInspector is a ruby gem for web scraping purposes, that returns a hash with metadata from a given URL
+ test_files:
+ - spec/metainspector_spec.rb
+ - spec/spec_helper.rb
data/History.txt DELETED
@@ -1,21 +0,0 @@
- == 1.0.2 / 2007-12-10
-
- * Open-uri, Rubygems and Hpricot required at the MetaInspector class, so you won't need to require them. Just require metainspector and they will be included along.
-
- * Rescue in case of socket error. If the URL does not exist or is unreachable, it will catch the SocketError exception and return 'ok' => false.
-
- * Added hpricot as extra dependency so it will be automatically installed when you install the metainspector gem.
-
- * Misc code cleanup... "if !doc.at('title').nil?" is the same as "if doc.at('title')"
-
- * Thanks to David Calavera (http://thinkincode.net) and Juan Alvarez (http://ruby.reboot.com.mx/) for their comments and contributions to this release.
-
- == 1.0.1 / 2007-12-06
-
- * Added some info at README.txt, translated all methods to English
-
- == 1.0.0 / 2007-12-06
-
- * MetaInspector is born!
- * Birthday!
-
data/Manifest.txt DELETED
@@ -1,7 +0,0 @@
- History.txt
- Manifest.txt
- README.txt
- Rakefile
- bin/metainspector
- lib/metainspector.rb
- test/test_metainspector.rb
data/README.txt DELETED
@@ -1,62 +0,0 @@
- metainspector
- by Jaime Iniesta
- http://metainspector.rubyforge.org/
-
- == DESCRIPTION:
-
- Ruby gem for web scraping purposes. It scrapes a given URL, and returns you a hash with data from it like for example the title, meta description, meta keywords, an array with all the links, all the images in it, etc.
-
- == FEATURES/PROBLEMS:
-
- * Scrape a given URL and return data from its HTML
-
- == SYNOPSIS:
-
- # Require all gems and libs needed...
- require 'rubygems'
- require 'open-uri'
- require 'hpricot'
- require 'metainspector'
-
- # Scrape an URL...
- page_data = MetaInspector.scrape(url)
-
- # See extracted data...
- page_data['title']
- page_data['description']
- page_data['keywords']
- page_data['links']
-
- == REQUIREMENTS:
-
- * open-uri
- * hpricot
-
- == INSTALL:
-
- * sudo gem install metainspector
-
- == LICENSE:
-
- (The MIT License)
-
- Copyright (c) 2007 Jaime Iniesta
-
- Permission is hereby granted, free of charge, to any person obtaining
- a copy of this software and associated documentation files (the
- 'Software'), to deal in the Software without restriction, including
- without limitation the rights to use, copy, modify, merge, publish,
- distribute, sublicense, and/or sell copies of the Software, and to
- permit persons to whom the Software is furnished to do so, subject to
- the following conditions:
-
- The above copyright notice and this permission notice shall be
- included in all copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
- EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
- IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
- CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
- TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
- SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
File without changes
@@ -1 +0,0 @@
1
-