gscraper 0.3.0 → 0.4.0

Sign up to get free protection for your applications and to get access to all the features.
data/.rspec ADDED
@@ -0,0 +1 @@
1
+ --colour --format documentation
@@ -1,3 +1,25 @@
1
+ ### 0.4.0 / 2012-04-26
2
+
3
+ * Switched from Bundler to rubygems-tasks ~> 0.1.
4
+ * Switched from json_pure to json ~> 1.6.
5
+ * Require uri-query_params ~> 0.5.
6
+ * Require mechanize ~> 2.0.
7
+ * Added {GScraper::Search::Blocked}.
8
+ * Added {GScraper::Hosts}.
9
+ * Added {GScraper::Languages}.
10
+ * Added {GScraper::Search::Query#define}.
11
+ * Added `:load_balance` option to {GScraper::Search::Query#initialize}, which
12
+ will randomize {GScraper::Search::Query#search_host}.
13
+ * Allow `:all*` / `:with*` search options to accept a String or Array values.
14
+ * Allow {GScraper::Search::WebQuery} and {GScraper::Search::AJAXQuery} to
15
+ submit queries to alternate domains via the `:search_host` option.
16
+ * Renamed `#occurrs_within`, `:occurrs_within` to `#occurs_within`,
17
+ `:occurs_within`, respectively in {GScraper::Search::WebQuery}.
18
+ * Prefer XPath over CSS-path expressions.
19
+ * Fixed XPath expressions in {GScraper::Search::WebQuery#page}
20
+ (thanks Jake Auswick and Ezekiel Templin).
21
+ * Fixed spelling errors.
22
+
1
23
  ### 0.3.0 / 2010-07-01
2
24
 
3
25
  * Upgraded to mechanize ~> 1.0.0.
@@ -13,7 +35,7 @@
13
35
  * Aliased {GScraper::Search::WebQuery#links_to=} to `link=`.
14
36
  * Removed `GScraper.open_uri`.
15
37
  * Removed `GScraper.open_page`.
16
- * Fixed the escaping/unescaping of URL query params in {URI::QueryParams}.
38
+ * Fixed the escaping/unescaping of URL query params in `URI::QueryParams`.
17
39
  * Use `yield` instead of `block.call`, when possible.
18
40
  * All enumerable methods now return an `Enumerator` object, if no block was
19
41
  given.
@@ -57,7 +79,7 @@
57
79
  ### 0.1.8 / 2008-04-30
58
80
 
59
81
  * Added the {GScraper.user_agent_alias=} method.
60
- * Added {URI::HTTP::QueryParams} module.
82
+ * Added `URI::HTTP::QueryParams` module.
61
83
  * Changed license from MIT to GPL-2.
62
84
 
63
85
  ### 0.1.7 / 2008-04-28
data/README.md CHANGED
@@ -1,8 +1,8 @@
1
1
  # GScraper
2
2
 
3
- * [github.com/postmodern/gscraper](http://github.com/postmodern/gscraper/)
4
- * [github.com/postmodern/gscraper/issues](http://github.com/postmodern/gscraper/issues)
5
- * Postmodern (postmodern.mod3 at gmail.com)
3
+ * [Source](https://github.com/postmodern/gscraper/)
4
+ * [Issues](https://github.com/postmodern/gscraper/issues)
5
+ * [Email](mailto:postmodern.mod3 at gmail.com)
6
6
 
7
7
  ## Description
8
8
 
@@ -18,11 +18,16 @@ GScraper is a web-scraping interface to various Google Services.
18
18
 
19
19
  ## Requirements
20
20
 
21
- * [mechanize](http://mechanize.rubyforge.org/mechanize/) ~> 1.0.0
21
+ * [json](http://flori.github.com/json/)
22
+ ~> 1.6
23
+ * [uri-query_params](https://github.com/postmodern/uri-query_params#readme)
24
+ ~> 0.5
25
+ * [mechanize](http://mechanize.rubyforge.org/mechanize/)
26
+ ~> 2.0
22
27
 
23
28
  ## Install
24
29
 
25
- $ sudo gem install gscraper
30
+ $ gem install gscraper
26
31
 
27
32
  ## Examples
28
33
 
@@ -44,7 +49,7 @@ Queries from URLs:
44
49
 
45
50
  q.query # => "ruby"
46
51
  q.with_words # => "rails"
47
- q.occurrs_within # => :title
52
+ q.occurs_within # => :title
48
53
  q.rights # => :cc_by_nc
49
54
 
50
55
  Getting the search results:
@@ -128,7 +133,7 @@ Setting the User-Agent globally:
128
133
 
129
134
  GScraper - A web-scraping interface to various Google Services.
130
135
 
131
- Copyright (c) 2007-2009 Hal Brodigan (postmodern.mod3 at gmail.com)
136
+ Copyright (c) 2007-2012 Hal Brodigan (postmodern.mod3 at gmail.com)
132
137
 
133
138
  This program is free software; you can redistribute it and/or modify
134
139
  it under the terms of the GNU General Public License as published by
data/Rakefile CHANGED
@@ -1,38 +1,35 @@
1
1
  require 'rubygems'
2
- require 'bundler'
2
+ require 'rake'
3
3
 
4
4
  begin
5
- Bundler.setup(:development, :doc)
6
- rescue Bundler::BundlerError => e
7
- STDERR.puts e.message
8
- STDERR.puts "Run `bundle install` to install missing gems"
9
- exit e.status_code
10
- end
5
+ gem 'rubygems-tasks', '~> 0.1'
6
+ require 'rubygems/tasks'
11
7
 
12
- require 'rake'
13
- require 'jeweler'
14
- require './lib/gscraper/version.rb'
15
-
16
- Jeweler::Tasks.new do |gem|
17
- gem.name = 'gscraper'
18
- gem.version = GScraper::VERSION
19
- gem.license = 'GPL-2'
20
- gem.summary = %Q{GScraper is a web-scraping interface to various Google Services.}
21
- gem.description = %Q{GScraper is a web-scraping interface to various Google Services.}
22
- gem.email = 'postmodern.mod3@gmail.com'
23
- gem.homepage = 'http://github.com/postmodern/gscraper'
24
- gem.authors = ['Postmodern']
25
- gem.has_rdoc = 'yard'
8
+ Gem::Tasks.new
9
+ rescue LoadError => e
10
+ warn e.message
11
+ warn "Run `gem install rubygems-tasks` to install 'rubygems/tasks'."
26
12
  end
27
13
 
28
- require 'spec/rake/spectask'
29
- Spec::Rake::SpecTask.new(:spec) do |spec|
30
- spec.libs += ['lib', 'spec']
31
- spec.spec_files = FileList['spec/**/*_spec.rb']
32
- spec.spec_opts = ['--options', '.specopts']
33
- end
14
+ begin
15
+ gem 'rspec', '~> 2.4'
16
+ require 'rspec/core/rake_task'
34
17
 
18
+ RSpec::Core::RakeTask.new
19
+ rescue LoadError => e
20
+ task :spec do
21
+ abort "Please run `gem install rspec` to install RSpec."
22
+ end
23
+ end
35
24
  task :default => :spec
36
25
 
37
- require 'yard'
38
- YARD::Rake::YardocTask.new
26
+ begin
27
+ gem 'yard', '~> 0.6.0'
28
+ require 'yard'
29
+
30
+ YARD::Rake::YardocTask.new
31
+ rescue LoadError => e
32
+ task :yard do
33
+ abort "Please run `gem install yard` to install YARD."
34
+ end
35
+ end
@@ -0,0 +1,20 @@
1
+ name: gscraper
2
+ summary: Web-scraping interface to various Google Services.
3
+ description:
4
+ GScraper is a web-scraping interface to various Google Services.
5
+
6
+ license: GPL-2
7
+ authors: Postmodern
8
+ email: postmodern.mod3@gmail.com
9
+ homepage: https://github.com/postmodern/gscraper
10
+ has_yard: true
11
+
12
+ dependencies:
13
+ json: ~> 1.6
14
+ uri-query_params: ~> 0.5
15
+ mechanize: ~> 2.0
16
+
17
+ development_dependencies:
18
+ rubygems-tasks: ~> 0.1
19
+ rspec: ~> 2.4
20
+ yard: ~> 0.6
@@ -1,112 +1,127 @@
1
- # Generated by jeweler
2
- # DO NOT EDIT THIS FILE DIRECTLY
3
- # Instead, edit Jeweler::Tasks in Rakefile, and run the gemspec command
4
- # -*- encoding: utf-8 -*-
5
-
6
- Gem::Specification.new do |s|
7
- s.name = %q{gscraper}
8
- s.version = "0.3.0"
9
-
10
- s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
11
- s.authors = ["Postmodern"]
12
- s.date = %q{2010-07-02}
13
- s.description = %q{GScraper is a web-scraping interface to various Google Services.}
14
- s.email = %q{postmodern.mod3@gmail.com}
15
- s.extra_rdoc_files = [
16
- "ChangeLog.md",
17
- "README.md"
18
- ]
19
- s.files = [
20
- ".gitignore",
21
- ".specopts",
22
- ".yardopts",
23
- "COPYING.txt",
24
- "ChangeLog.md",
25
- "Gemfile",
26
- "README.md",
27
- "Rakefile",
28
- "gscraper.gemspec",
29
- "lib/gscraper.rb",
30
- "lib/gscraper/extensions.rb",
31
- "lib/gscraper/extensions/uri.rb",
32
- "lib/gscraper/extensions/uri/http.rb",
33
- "lib/gscraper/extensions/uri/query_params.rb",
34
- "lib/gscraper/gscraper.rb",
35
- "lib/gscraper/has_pages.rb",
36
- "lib/gscraper/licenses.rb",
37
- "lib/gscraper/page.rb",
38
- "lib/gscraper/search.rb",
39
- "lib/gscraper/search/ajax_query.rb",
40
- "lib/gscraper/search/page.rb",
41
- "lib/gscraper/search/query.rb",
42
- "lib/gscraper/search/result.rb",
43
- "lib/gscraper/search/search.rb",
44
- "lib/gscraper/search/web_query.rb",
45
- "lib/gscraper/sponsored_ad.rb",
46
- "lib/gscraper/sponsored_links.rb",
47
- "lib/gscraper/version.rb",
48
- "spec/extensions/uri/http_spec.rb",
49
- "spec/extensions/uri/query_params_spec.rb",
50
- "spec/gscraper_spec.rb",
51
- "spec/has_pages_examples.rb",
52
- "spec/has_sponsored_links_examples.rb",
53
- "spec/helpers/query.rb",
54
- "spec/helpers/uri.rb",
55
- "spec/page_has_results_examples.rb",
56
- "spec/search/ajax_query_spec.rb",
57
- "spec/search/page_has_results_examples.rb",
58
- "spec/search/query_spec.rb",
59
- "spec/search/web_query_spec.rb",
60
- "spec/spec_helper.rb"
61
- ]
62
- s.has_rdoc = %q{yard}
63
- s.homepage = %q{http://github.com/postmodern/gscraper}
64
- s.licenses = ["GPL-2"]
65
- s.require_paths = ["lib"]
66
- s.rubygems_version = %q{1.3.7}
67
- s.summary = %q{GScraper is a web-scraping interface to various Google Services.}
68
- s.test_files = [
69
- "spec/extensions/uri/http_spec.rb",
70
- "spec/extensions/uri/query_params_spec.rb",
71
- "spec/gscraper_spec.rb",
72
- "spec/has_pages_examples.rb",
73
- "spec/has_sponsored_links_examples.rb",
74
- "spec/helpers/query.rb",
75
- "spec/helpers/uri.rb",
76
- "spec/page_has_results_examples.rb",
77
- "spec/search/ajax_query_spec.rb",
78
- "spec/search/page_has_results_examples.rb",
79
- "spec/search/query_spec.rb",
80
- "spec/search/web_query_spec.rb",
81
- "spec/spec_helper.rb"
82
- ]
83
-
84
- if s.respond_to? :specification_version then
85
- current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
86
- s.specification_version = 3
87
-
88
- if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
89
- s.add_runtime_dependency(%q<json_pure>, ["~> 1.4.0"])
90
- s.add_runtime_dependency(%q<mechanize>, ["~> 1.0.0"])
91
- s.add_development_dependency(%q<bundler>, ["~> 0.9.19"])
92
- s.add_development_dependency(%q<rake>, ["~> 0.8.7"])
93
- s.add_development_dependency(%q<jeweler>, ["~> 1.4.0"])
94
- s.add_development_dependency(%q<rspec>, ["~> 1.3.0"])
95
- else
96
- s.add_dependency(%q<json_pure>, ["~> 1.4.0"])
97
- s.add_dependency(%q<mechanize>, ["~> 1.0.0"])
98
- s.add_dependency(%q<bundler>, ["~> 0.9.19"])
99
- s.add_dependency(%q<rake>, ["~> 0.8.7"])
100
- s.add_dependency(%q<jeweler>, ["~> 1.4.0"])
101
- s.add_dependency(%q<rspec>, ["~> 1.3.0"])
1
+ # encoding: utf-8
2
+
3
+ require 'yaml'
4
+
5
+ Gem::Specification.new do |gemspec|
6
+ files = if File.directory?('.git')
7
+ `git ls-files`.split($/)
8
+ elsif File.directory?('.hg')
9
+ `hg manifest`.split($/)
10
+ elsif File.directory?('.svn')
11
+ `svn ls -R`.split($/).select { |path| File.file?(path) }
12
+ else
13
+ Dir['{**/}{.*,*}'].select { |path| File.file?(path) }
14
+ end
15
+
16
+ filter_files = lambda { |paths|
17
+ case paths
18
+ when Array
19
+ (files & paths)
20
+ when String
21
+ (files & Dir[paths])
102
22
  end
103
- else
104
- s.add_dependency(%q<json_pure>, ["~> 1.4.0"])
105
- s.add_dependency(%q<mechanize>, ["~> 1.0.0"])
106
- s.add_dependency(%q<bundler>, ["~> 0.9.19"])
107
- s.add_dependency(%q<rake>, ["~> 0.8.7"])
108
- s.add_dependency(%q<jeweler>, ["~> 1.4.0"])
109
- s.add_dependency(%q<rspec>, ["~> 1.3.0"])
23
+ }
24
+
25
+ version = {
26
+ :file => 'lib/gscraper/version.rb',
27
+ :constant => 'GScraper::VERSION'
28
+ }
29
+
30
+ defaults = {
31
+ 'name' => File.basename(File.dirname(__FILE__)),
32
+ 'files' => files,
33
+ 'executables' => filter_files['bin/*'].map { |path| File.basename(path) },
34
+ 'test_files' => filter_files['{test/{**/}*_test.rb,spec/{**/}*_spec.rb}'],
35
+ 'extra_doc_files' => filter_files['*.{txt,rdoc,md,markdown,tt,textile}'],
36
+ }
37
+
38
+ metadata = defaults.merge(YAML.load_file('gemspec.yml'))
39
+
40
+ gemspec.name = metadata.fetch('name',defaults[:name])
41
+ gemspec.version = if metadata['version']
42
+ metadata['version']
43
+ elsif File.file?(version[:file])
44
+ require File.join('.',version[:file])
45
+ eval(version[:constant])
46
+ end
47
+
48
+ gemspec.summary = metadata.fetch('summary',metadata['description'])
49
+ gemspec.description = metadata.fetch('description',metadata['summary'])
50
+
51
+ case metadata['license']
52
+ when Array
53
+ gemspec.licenses = metadata['license']
54
+ when String
55
+ gemspec.license = metadata['license']
56
+ end
57
+
58
+ case metadata['authors']
59
+ when Array
60
+ gemspec.authors = metadata['authors']
61
+ when String
62
+ gemspec.author = metadata['authors']
63
+ end
64
+
65
+ gemspec.email = metadata['email']
66
+ gemspec.homepage = metadata['homepage']
67
+
68
+ case metadata['require_paths']
69
+ when Array
70
+ gemspec.require_paths = metadata['require_paths']
71
+ when String
72
+ gemspec.require_path = metadata['require_paths']
73
+ end
74
+
75
+ gemspec.files = filter_files[metadata['files']]
76
+
77
+ gemspec.executables = metadata['executables']
78
+ gemspec.extensions = metadata['extensions']
79
+
80
+ if Gem::VERSION < '1.7.'
81
+ gemspec.default_executable = gemspec.executables.first
110
82
  end
111
- end
112
83
 
84
+ gemspec.test_files = filter_files[metadata['test_files']]
85
+
86
+ unless gemspec.files.include?('.document')
87
+ gemspec.extra_rdoc_files = metadata['extra_doc_files']
88
+ end
89
+
90
+ gemspec.post_install_message = metadata['post_install_message']
91
+ gemspec.requirements = metadata['requirements']
92
+
93
+ if gemspec.respond_to?(:required_ruby_version=)
94
+ gemspec.required_ruby_version = metadata['required_ruby_version']
95
+ end
96
+
97
+ if gemspec.respond_to?(:required_rubygems_version=)
98
+ gemspec.required_rubygems_version = metadata['required_ruby_version']
99
+ end
100
+
101
+ parse_versions = lambda { |versions|
102
+ case versions
103
+ when Array
104
+ versions.map { |v| v.to_s }
105
+ when String
106
+ versions.split(/,\s*/)
107
+ end
108
+ }
109
+
110
+ if metadata['dependencies']
111
+ metadata['dependencies'].each do |name,versions|
112
+ gemspec.add_dependency(name,parse_versions[versions])
113
+ end
114
+ end
115
+
116
+ if metadata['runtime_dependencies']
117
+ metadata['runtime_dependencies'].each do |name,versions|
118
+ gemspec.add_runtime_dependency(name,parse_versions[versions])
119
+ end
120
+ end
121
+
122
+ if metadata['development_dependencies']
123
+ metadata['development_dependencies'].each do |name,versions|
124
+ gemspec.add_development_dependency(name,parse_versions[versions])
125
+ end
126
+ end
127
+ end
@@ -1,7 +1,7 @@
1
1
  #
2
2
  # GScraper - A web-scraping interface to various Google Services.
3
3
  #
4
- # Copyright (c) 2007-2009 Hal Brodigan (postmodern.mod3 at gmail.com)
4
+ # Copyright (c) 2007-2012 Hal Brodigan (postmodern.mod3 at gmail.com)
5
5
  #
6
6
  # This program is free software; you can redistribute it and/or modify
7
7
  # it under the terms of the GNU General Public License as published by
@@ -1,7 +1,7 @@
1
1
  #
2
2
  # GScraper - A web-scraping interface to various Google Services.
3
3
  #
4
- # Copyright (c) 2007-2009 Hal Brodigan (postmodern.mod3 at gmail.com)
4
+ # Copyright (c) 2007-2012 Hal Brodigan (postmodern.mod3 at gmail.com)
5
5
  #
6
6
  # This program is free software; you can redistribute it and/or modify
7
7
  # it under the terms of the GNU General Public License as published by
@@ -21,7 +21,6 @@
21
21
  require 'uri/http'
22
22
  require 'mechanize'
23
23
  require 'nokogiri'
24
- require 'open-uri'
25
24
 
26
25
  module GScraper
27
26
  # Common proxy port.
@@ -32,8 +31,13 @@ module GScraper
32
31
  #
33
32
  # @return [Hash]
34
33
  #
35
- def GScraper.proxy
36
- @@gscraper_proxy ||= {:host => nil, :port => COMMON_PROXY_PORT, :user => nil, :password => nil}
34
+ def self.proxy
35
+ @@gscraper_proxy ||= {
36
+ :host => nil,
37
+ :port => COMMON_PROXY_PORT,
38
+ :user => nil,
39
+ :password => nil
40
+ }
37
41
  end
38
42
 
39
43
  #
@@ -54,13 +58,13 @@ module GScraper
54
58
  # @option proxy_info [String] :password
55
59
  # The password to login with.
56
60
  #
57
- def GScraper.proxy_uri(proxy_info=GScraper.proxy)
58
- if GScraper.proxy[:host]
61
+ def self.proxy_uri(proxy=self.proxy)
62
+ if proxy[:host]
59
63
  return URI::HTTP.build(
60
- :host => GScraper.proxy[:host],
61
- :port => GScraper.proxy[:port],
62
- :userinfo => "#{GScraper.proxy[:user]}:#{GScraper.proxy[:password]}",
63
- :path => '/'
64
+ :host => proxy[:host],
65
+ :port => proxy[:port],
66
+ :userinfo => "#{proxy[:user]}:#{proxy[:password]}",
67
+ :path => '/'
64
68
  )
65
69
  end
66
70
  end
@@ -70,7 +74,7 @@ module GScraper
70
74
  #
71
75
  # @return [Array<String>]
72
76
  #
73
- def GScraper.user_agent_aliases
77
+ def self.user_agent_aliases
74
78
  Mechanize::AGENT_ALIASES
75
79
  end
76
80
 
@@ -79,8 +83,8 @@ module GScraper
79
83
  #
80
84
  # @return [String]
81
85
  #
82
- def GScraper.user_agent
83
- @@gscraper_user_agent ||= GScraper.user_agent_aliases['Windows IE 6']
86
+ def self.user_agent
87
+ @@gscraper_user_agent ||= self.user_agent_aliases['Windows IE 6']
84
88
  end
85
89
 
86
90
  #
@@ -92,7 +96,7 @@ module GScraper
92
96
  # @return [String]
93
97
  # The new User-Agent string.
94
98
  #
95
- def GScraper.user_agent=(agent)
99
+ def self.user_agent=(agent)
96
100
  @@gscraper_user_agent = agent
97
101
  end
98
102
 
@@ -105,8 +109,8 @@ module GScraper
105
109
  # @return [String]
106
110
  # The new User-Agent string.
107
111
  #
108
- def GScraper.user_agent_alias=(name)
109
- @@gscraper_user_agent = GScraper.user_agent_aliases[name.to_s]
112
+ def self.user_agent_alias=(name)
113
+ @@gscraper_user_agent = self.user_agent_aliases[name.to_s]
110
114
  end
111
115
 
112
116
  #
@@ -143,18 +147,18 @@ module GScraper
143
147
  # GScraper.web_agent(:user_agent_alias => 'Linux Mozilla')
144
148
  # GScraper.web_agent(:user_agent => 'Google Bot')
145
149
  #
146
- def GScraper.web_agent(options={})
150
+ def self.web_agent(options={})
147
151
  agent = Mechanize.new
148
152
 
149
153
  if options[:user_agent_alias]
150
154
  agent.user_agent_alias = options[:user_agent_alias]
151
155
  elsif options[:user_agent]
152
156
  agent.user_agent = options[:user_agent]
153
- elsif GScraper.user_agent
154
- agent.user_agent = GScraper.user_agent
157
+ elsif user_agent
158
+ agent.user_agent = self.user_agent
155
159
  end
156
160
 
157
- proxy = (options[:proxy] || GScraper.proxy)
161
+ proxy = (options[:proxy] || self.proxy)
158
162
  if proxy[:host]
159
163
  agent.set_proxy(proxy[:host],proxy[:port],proxy[:user],proxy[:password])
160
164
  end