alexrabarts-big_sitemap 0.2.1 → 0.3.0

data/History.txt CHANGED
@@ -1,3 +1,14 @@
+ === 0.3.0 / 2009-04-06
+
+ * API change: Pass model through as first argument to add method, e.g. sitemap.add(Posts, {:path => 'articles'})
+ * API change: Use Rails' polymorphic_url helper to generate URLs if Rails is being used
+ * API change: Only ping search engines when ping_search_engines is explicitly called
+ * Add support for passing options through to the model's find method, e.g. :conditions
+ * Allow base URL to be specified as a hash as well as a string
+ * Add support for changefreq and priority
+ * Pluralize sitemap model filenames
+ * GZipping may optionally be turned off
+
  === 0.2.1 / 2009-03-12
 
  * Normalize path arguments so it no longer matters whether a leading slash is used or not
data/LICENSE ADDED
@@ -0,0 +1,22 @@
+ (The MIT License)
+
+ Copyright (c) 2009 Stateless Systems (http://statelesssystems.com)
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ 'Software'), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc CHANGED
@@ -1,104 +1,110 @@
  = BigSitemap
 
- == DESCRIPTION
+ BigSitemap is a Sitemap (http://sitemaps.org) generator suitable for applications with greater than 50,000 URLs. It splits large Sitemaps into multiple files, gzips the files to minimize bandwidth usage, batches database queries to minimize memory usage, can be set up with just a few lines of code and is compatible with just about any framework.
 
- BigSitemap is a Sitemap (http://sitemaps.org) generator specifically designed for large sites (although it works equally well with small sites). It splits large Sitemaps into multiple files, gzips the files to minimize bandwidth usage, batches database queries so it doesn't take your site down, can be set up with just a few lines of code and is compatible with just about any framework.
+ BigSitemap is best run periodically through a Rake/Thor task.
 
- == INSTALL
+   sitemap = BigSitemap.new(:url_options => {:host => 'example.com'})
 
- Via git:
+   # Add a model
+   sitemap.add Product
 
-   git clone git://github.com/alexrabarts/big_sitemap.git
+   # Add another model with some options
+   sitemap.add(Post, {
+     :conditions => {:published => true},
+     :path => 'articles',
+     :change_frequency => 'daily',
+     :priority => 0.5
+   })
 
- Via gem:
-
-   gem install alexrabarts-big_sitemap -s http://gems.github.com
-
- == SYNOPSIS
+   # Generate the files
+   sitemap.generate
 
- The minimum required to generate a sitemap is:
+ The code above will create a minimum of three files:
 
-   BigSitemap.new(:base_url => 'http://example.com').add(:model => MyModel, :path => 'my_controller').generate
+ 1. public/sitemaps/sitemap_index.xml.gz
+ 2. public/sitemaps/sitemap_products.xml.gz
+ 3. public/sitemaps/sitemap_posts.xml.gz
 
- You can put this in a rake/thor task and create a cron job to run it periodically. It should be enough for most Rails/Merb applications. You can add more models by further calls to the <code>add</code> method. Note that the methods are chainable, although you can call them on an instance variable if you prefer:
+ If your sitemaps grow beyond 50,000 URLs (this limit can be overridden with the <code>:max_per_sitemap</code> option), the sitemap files will be partitioned into multiple files (<code>sitemap_products_1.xml.gz</code>, <code>sitemap_products_2.xml.gz</code>, …).
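The partitioning rule described in this paragraph can be sketched in a few lines of Ruby. This is an illustration under stated assumptions, not BigSitemap's own code: `sitemap_filenames` is a hypothetical helper, and the pluralized base name is passed in directly instead of being derived from the model class.

```ruby
# Hypothetical helper (not part of BigSitemap): lists the sitemap files
# expected for one model, given its record count and the per-file limit.
def sitemap_filenames(base_name, count, max_per_sitemap = 50_000)
  # At least one file is assumed, even when the model has no records.
  num_sitemaps = [(count.to_f / max_per_sitemap).ceil, 1].max

  (1..num_sitemaps).map do |n|
    name = "sitemap_#{base_name}"
    name += "_#{n}" if num_sitemaps > 1 # numeric suffix only when split
    "#{name}.xml.gz"
  end
end

puts sitemap_filenames('products', 120_000)
```

With 120,000 records and the default limit, the sketch lists three numbered files; below the limit it lists a single unnumbered file.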
 
-   sitemap = BigSitemap.new(:base_url => 'http://example.com')
-   sitemap.add(:model => Posts, :path => 'articles')
-   sitemap.add(:model => Comments, :path => 'comments')
-   sitemap.generate
+ If you're using Rails then the URLs for each database record are generated with the <code>polymorphic_url</code> helper. That means that the URL for a record will be exactly what you would expect: generated with respect to the routing setup of your app. In other contexts where this helper isn't available, the URLs are generated in the form:
 
- === Find Methods
+   :base_url/:path/:to_param
 
- Your models must provide either a <code>find_for_sitemap</code> or <code>all</code> class method that returns the instances that are to be included in the sitemap.
+ If the <code>to_param</code> method does not exist, then <code>id</code> will be used.
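As a rough sketch of the non-Rails URL form, the following plain Ruby builds a URL from a base URL, a path and a record, preferring `to_param` and falling back to `id`. The record types and the `fallback_url` helper are hypothetical names used only for illustration.

```ruby
# Hypothetical record types: one defines to_param, the other only has id.
SluggedPost = Struct.new(:id, :slug) do
  def to_param
    slug
  end
end
PlainPost = Struct.new(:id)

# Hypothetical helper mirroring the :base_url/:path/:to_param form.
def fallback_url(base_url, path, record)
  # Prefer to_param; fall back to id when the record does not define it.
  param_method = [:to_param, :id].find { |m| record.respond_to?(m) }
  "#{base_url}/#{path.sub(%r{^/}, '')}/#{record.send(param_method)}"
end

puts fallback_url('http://example.com', 'articles', SluggedPost.new(1, 'hello-world'))
puts fallback_url('http://example.com', '/articles', PlainPost.new(42))
```

A leading slash on the path is stripped first, so `'articles'` and `'/articles'` produce the same URL.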
 
- Additionally, you models must provide a <code>count_for_sitemap</code> or <code>count</code> class method that returns a count of the instances to be included.
+ == Install
 
- If you're using ActiveRecord (Rails) or DataMapper then <code>all</code> and <code>count</code> are already provided and you don't need to do anything unless you want to include a subset of records. If you provide your own <code>find_for_sitemap</code> or <code>all</code> method then it should be able to handle the <code>:offset</code> and <code>:limit</code> options, in the same way that ActiveRecord and DataMapper handle them. This is especially important if you have more than 50,000 URLs.
+ Via gem:
 
- === URL Format
+   gem install alexrabarts-big_sitemap -s http://gems.github.com
 
- To generate the URLs, BigSitemap will combine the constructor arguments with the <code>to_param</code> method of each instance returned (provided by ActiveRecord but not DataMapper). If this method is not present, <code>id</code> will be used. The URL is constructed as:
+ == Advanced
 
-   :base_url/:path/:to_param (if to_param exists)
-   :base_url/:path/:id (if to_param does not exist)
+ === Options
 
- === Sitemap Location
+ * <code>:url_options</code> -- hash with <code>:host</code>, optionally <code>:port</code> and <code>:protocol</code>
+ * <code>:base_url</code> -- string alternative to <code>:url_options</code>, e.g. "https://example.com:8080/"
+ * <code>:document_root</code> -- string defaults to <code>Rails.root</code> or <code>Merb.root</code> if available
+ * <code>:path</code> -- string defaults to 'sitemaps', which places sitemap files under the <code>/sitemaps</code> directory
+ * <code>:max_per_sitemap</code> -- <code>50000</code>, which is the limit dictated by Google but can be less
+ * <code>:batch_size</code> -- <code>1001</code> (not <code>1000</code> due to a bug in DataMapper)
+ * <code>:gzip</code> -- <code>true</code>
+ * <code>:ping_google</code> -- <code>true</code>
+ * <code>:ping_yahoo</code> -- <code>false</code>, needs <code>:yahoo_app_id</code>
+ * <code>:ping_msn</code> -- <code>false</code>
+ * <code>:ping_ask</code> -- <code>false</code>
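To make the relationship between <code>:base_url</code> and <code>:url_options</code> concrete, here is a minimal sketch that splits a base URL string into host, port and protocol with Ruby's standard `URI` library. The helper name `url_options_from` is an assumption made for this example and is not part of the gem's public API.

```ruby
require 'uri'

# Hypothetical helper: breaks a :base_url string into the :host, :port and
# :protocol keys that a :url_options hash carries.
def url_options_from(base_url)
  uri = URI.parse(base_url)
  { :host => uri.host, :port => uri.port, :protocol => uri.scheme }
end

p url_options_from('https://example.com:8080/')
```

Note that `URI` reports the scheme's default port (80 for `http`) when the string does not name one.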
 
- BigSitemap knows about the document root of Rails and Merb. If you are using another framework then you can specify the document root with the <code>:document_root</code> option. e.g.:
+ === Chaining
 
-   BigSitemap.new(:base_url => 'http://example.com', :document_root => "#{FOO_ROOT}/httpdocs")
+ You can chain methods together. You could even get away with as little code as:
 
- By default, the sitemap files are created under <code>/sitemaps</code>. You can modify this with the <code>:path</code> option:
+   BigSitemap.new(:url_options => {:host => 'example.com'}).add(Post).generate
 
-   BigSitemap.new(:base_url => 'http://example.com', :path => 'google-sitemaps') # places Sitemaps under /google-sitemaps
+ === Pinging Search Engines
 
- === Cleaning the Sitemaps Directory
+ To ping search engines, call <code>ping_search_engines</code> after you generate the sitemap:
 
- Calling the <code>clean</code> method will remove all files from the Sitemaps directory.
-
- === Maximum Number of URLs
-
- Sitemaps will be split across several files if more than 50,000 records are returned. You can customize this limit with the <code>:max_per_sitemap</code> option:
+   sitemap.generate
+   sitemap.ping_search_engines
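A ping is an HTTP GET whose query string carries the escaped sitemap URL. The sketch below only builds the request path and sends nothing. The helper name is hypothetical, the Google endpoint path is the one that appears in lib/big_sitemap.rb in this release (it may no longer be served), and `URI.encode_www_form_component` is swapped in for the `CGI::escape` call the library itself uses.

```ruby
require 'uri'

# Hypothetical helper: builds the path of a Google ping request without
# performing any network call.
def google_ping_path(sitemap_url)
  # Percent-encode the sitemap URL so it is safe inside a query string.
  "/webmasters/tools/ping?sitemap=#{URI.encode_www_form_component(sitemap_url)}"
end

puts google_ping_path('http://example.com/sitemaps/sitemap_index.xml.gz')
```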
 
-   BigSitemap.new(:base_url => 'http://example.com', :max_per_sitemap => 1000) # Max of 1000 URLs per Sitemap
+ === Change Frequency and Priority
 
- === Batched Database Queries
+ You can control "changefreq" and "priority" values for each record individually by passing lambdas instead of fixed values:
 
- The database is queried in batches to prevent large SQL select statements from locking the database for too long. By default, the batch size is 1001 (not 1000 due to an obscure bug in DataMapper). You can customize the batch size with the <code>:batch_size</code> option:
+   sitemap.add(Posts,
+     :change_frequency => lambda {|post| ... },
+     :priority => lambda {|post| ... }
+   )
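The lambda bodies are left elided in the README. As one possible way to fill them in, the sketch below shows how a fixed value or a per-record lambda could be resolved to the value written into the sitemap. `DemoPost`, `resolve_option` and the age-based policy are all assumptions made for illustration.

```ruby
# Hypothetical record type; only title and updated_at matter here.
DemoPost = Struct.new(:title, :updated_at)

# An option may be a fixed value or a lambda that receives the record.
def resolve_option(value, record)
  value.is_a?(Proc) ? value.call(record) : value
end

ONE_DAY = 24 * 60 * 60

# Assumed policy: posts edited within the last week change 'daily'.
CHANGE_FREQUENCY = lambda { |post| (Time.now - post.updated_at) < 7 * ONE_DAY ? 'daily' : 'monthly' }
# Assumed policy: untitled posts get a low priority.
PRIORITY = lambda { |post| post.title.empty? ? 0.1 : 0.8 }

FRESH = DemoPost.new('Hello', Time.now - ONE_DAY)
STALE = DemoPost.new('', Time.now - 90 * ONE_DAY)

puts resolve_option(CHANGE_FREQUENCY, FRESH)
puts resolve_option(PRIORITY, STALE)
puts resolve_option('weekly', STALE)
```

Passing the same lambdas as <code>:change_frequency</code> and <code>:priority</code> lets each record decide its own values, while a plain string or number applies to every record of the model.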
 
-   BigSitemap.new(:base_url => 'http://example.com', :batch_size => 5000) # Database is queried in batches of 5,000
+ === Find Methods
 
- === Search Engine Notification
+ Your models must provide either a <code>find_for_sitemap</code> or <code>all</code> class method that returns the instances that are to be included in the sitemap.
 
- Google, Yahoo!, MSN and Ask are pinged once the Sitemap files are generated. You can turn one or more of these off:
+ Additionally, your models must provide a <code>count_for_sitemap</code> or <code>count</code> class method that returns a count of the instances to be included.
 
-   BigSitemap.new(
-     :base_url => 'http://example.com',
-     :ping_google => false,
-     :ping_yahoo => false,
-     :ping_msn => false,
-     :ping_ask => false
-   )
+ If you're using ActiveRecord (Rails) or DataMapper then <code>all</code> and <code>count</code> are already provided and you don't need to do anything unless you want to include a subset of records. If you provide your own <code>find_for_sitemap</code> or <code>all</code> method then it should be able to handle the <code>:offset</code> and <code>:limit</code> options, in the same way that ActiveRecord and DataMapper handle them. This is especially important if you have more than 50,000 URLs.
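For a model that is not backed by ActiveRecord or DataMapper, the find and count contract might be satisfied along these lines. This is a minimal sketch: `ArchivedPage` and its in-memory array are hypothetical stand-ins for a real data source.

```ruby
# Hypothetical non-ORM model; an array stands in for a database table.
class ArchivedPage
  attr_reader :id, :published

  def initialize(id, published)
    @id = id
    @published = published
  end

  # Ten records, of which the even-numbered ones are published.
  ALL = (1..10).map { |n| new(n, n.even?) }

  # Only the subset that belongs in the sitemap.
  def self.sitemap_scope
    ALL.select(&:published)
  end

  def self.count_for_sitemap
    sitemap_scope.size
  end

  # Honors :offset and :limit so that batched queries work.
  def self.find_for_sitemap(options = {})
    offset = options.fetch(:offset, 0)
    limit  = options.fetch(:limit, count_for_sitemap)
    sitemap_scope.drop(offset).first(limit)
  end
end

p ArchivedPage.find_for_sitemap(:offset => 2, :limit => 2).map(&:id)
```

Keeping the count and the find method on the same scope matters, because the batch offsets are derived from the count.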
 
- You must provide an App ID in order to ping Yahoo! (more info at http://developer.yahoo.com/search/siteexplorer/V1/updateNotification.html):
+ === Cleaning the Sitemaps Directory
 
-   BigSitemap.new(:base_url => 'http://example.com', :yahoo_app_id => 'myYahooAppId') # Yahoo! will now be pinged
+ Calling the <code>clean</code> method will remove all files from the Sitemaps directory.
 
- == LIMITATIONS
+ == Limitations
 
  If your database is likely to shrink during the time it takes to create the sitemap then you might run into problems (the final, batched SQL select will overrun by setting a limit that is too large since it is calculated from the count, which is queried at the very beginning). Patches welcome!
 
  == TODO
 
- * Support for <code>priority</code>
- * Support for <code>changefreq</code> (currently hard-coded to <code>weekly</code>)
+ Tests for Rails components.
 
- == CREDITS
+ == Credits
 
  Thanks to Alastair Brunton and Harry Love, whose work provided a starting point for this library.
  http://scoop.cheerfactory.co.uk/2008/02/26/google-sitemap-generator/
 
- == COPYRIGHT
+ Thanks to Mislav Marohnić for contributing patches.
 
- Copyright (c) 2009 Stateless Systems (http://statelesssystems.com). See LICENSE for details.
+ == Copyright
 
+ Copyright (c) 2009 Stateless Systems (http://statelesssystems.com). See LICENSE for details.
data/VERSION.yml CHANGED
@@ -1,4 +1,4 @@
  ---
- :minor: 2
- :patch: 1
+ :patch: 0
  :major: 0
+ :minor: 3
data/lib/big_sitemap.rb CHANGED
@@ -1,82 +1,100 @@
- require 'net/http'
  require 'uri'
  require 'zlib'
  require 'builder'
  require 'extlib'
 
  class BigSitemap
+   DEFAULTS = {
+     :max_per_sitemap => 50000,
+     :batch_size => 1001,
+     :path => 'sitemaps',
+     :gzip => true,
+
+     # opinionated
+     :ping_google => true,
+     :ping_yahoo => false, # needs :yahoo_app_id
+     :ping_msn => false,
+     :ping_ask => false
+   }
+
+   COUNT_METHODS = [:count_for_sitemap, :count]
+   FIND_METHODS = [:find_for_sitemap, :all]
+   TIMESTAMP_METHODS = [:updated_at, :updated_on, :updated, :created_at, :created_on, :created]
+   PARAM_METHODS = [:to_param, :id]
+
+   include ActionController::UrlWriter if defined? Rails
+
    def initialize(options)
-     document_root = options.delete(:document_root)
+     @options = DEFAULTS.merge options
+
+     # Use Rails' default_url_options if available
+     @default_url_options = defined?(Rails) ? default_url_options : {}
+
+     if @options[:url_options]
+       @default_url_options.update @options[:url_options]
+     elsif @options[:base_url]
+       uri = URI.parse(@options[:base_url])
+       @default_url_options[:host] = uri.host
+       @default_url_options[:port] = uri.port
+       @default_url_options[:protocol] = uri.scheme
+     else
+       raise ArgumentError, 'you must specify either ":url_options" hash or ":base_url" string'
+     end
+
+     if @options[:batch_size] > @options[:max_per_sitemap]
+       raise ArgumentError, '":batch_size" must be less than ":max_per_sitemap"'
+     end
 
-     if document_root.nil?
-       if defined? RAILS_ROOT
-         document_root = "#{RAILS_ROOT}/public"
+     @options[:document_root] ||= begin
+       if defined? Rails
+         "#{Rails.root}/public"
        elsif defined? Merb
-         document_root = "#{Merb.root}/public"
+         "#{Merb.root}/public"
        end
      end
 
-     raise ArgumentError, 'Document root must be specified with the :document_root option' if document_root.nil?
-
-     @base_url = options.delete(:base_url)
-     @max_per_sitemap = options.delete(:max_per_sitemap) || 50000
-     @batch_size = options.delete(:batch_size) || 1001 # TODO: Set this to 1000 once DM offset 37000 bug is fixed
-     @web_path = strip_leading_slash(options.delete(:path) || 'sitemaps')
-     @ping_google = options[:ping_google].nil? ? true : options.delete(:ping_google)
-     @ping_yahoo = options[:ping_yahoo].nil? ? true : options.delete(:ping_yahoo)
-     @yahoo_app_id = options.delete(:yahoo_app_id)
-     @ping_msn = options[:ping_msn].nil? ? true : options.delete(:ping_msn)
-     @ping_ask = options[:ping_ask].nil? ? true : options.delete(:ping_ask)
-     @file_path = "#{document_root}/#{@web_path}"
-     @sources = []
-
-     raise ArgumentError, "Base URL must be specified with the :base_url option" if @base_url.nil?
-
-     raise(
-       ArgumentError,
-       'Batch size (:batch_size) must be less than or equal to maximum URLs per sitemap (:max_per_sitemap)'
-     ) if @batch_size > @max_per_sitemap
+     unless @options[:document_root]
+       raise ArgumentError, 'Document root must be specified with the ":document_root" option'
+     end
 
+     @file_path = "#{@options[:document_root]}/#{strip_leading_slash(@options[:path])}"
      Dir.mkdir(@file_path) unless File.exists? @file_path
+
+     @sources = []
+     @sitemap_files = []
    end
 
-   def add(options)
-     raise ArgumentError, ':model and :path options must be provided' unless options[:model] && options[:path]
-     @sources << options.update(:path => strip_leading_slash(options[:path]))
-     self # Chainable
+   def add(model, options={})
+     options[:path] ||= Extlib::Inflection.tableize(model.to_s)
+     @sources << [model, options.dup]
+     return self
    end
 
    def clean
-     unless @file_path.nil?
-       Dir.foreach(@file_path) do |f|
-         f = "#{@file_path}/#{f}"
-         File.delete(f) if File.file?(f)
-       end
+     Dir["#{@file_path}/sitemap_*.{xml,xml.gz}"].each do |file|
+       FileUtils.rm file
      end
-     self # Chainable
+     return self
    end
 
    def generate
-     @sources.each do |source|
-       klass = source[:model]
+     for model, options in @sources
+       count_method = pick_method(model, COUNT_METHODS)
+       find_method = pick_method(model, FIND_METHODS)
+       raise ArgumentError, "#{model} must provide a count_for_sitemap class method" if count_method.nil?
+       raise ArgumentError, "#{model} must provide a find_for_sitemap class method" if find_method.nil?
 
-       count_method = pick_method(klass, [:count_for_sitemap, :count])
-       find_method = pick_method(klass, [:find_for_sitemap, :all])
-       raise ArgumentError, "#{klass} must provide a count_for_sitemap class method" if count_method.nil?
-       raise ArgumentError, "#{klass} must provide a find_for_sitemap class method" if find_method.nil?
-
-       count = klass.send(count_method)
+       count = model.send(count_method)
        num_sitemaps = 1
        num_batches = 1
 
-       if count > @batch_size
-         num_batches = (count.to_f / @batch_size.to_f).ceil
-         num_sitemaps = (count.to_f / @max_per_sitemap.to_f).ceil
+       if count > @options[:batch_size]
+         num_batches = (count.to_f / @options[:batch_size].to_f).ceil
+         num_sitemaps = (count.to_f / @options[:max_per_sitemap].to_f).ceil
        end
        batches_per_sitemap = num_batches.to_f / num_sitemaps.to_f
 
-       # Update the @sources hash so that the index file knows how many sitemaps to link to
-       source[:num_sitemaps] = num_sitemaps
+       find_options = options.dup
 
        for sitemap_num in 1..num_sitemaps
          # Work out the start and end batch numbers for this sitemap
@@ -84,126 +102,147 @@ class BigSitemap
          batch_num_end = (batch_num_start + [batches_per_sitemap, num_batches].min).floor - 1
 
          # Stream XML output to a file
-         filename = "sitemap_#{Extlib::Inflection::underscore(klass.to_s)}"
+         filename = "sitemap_#{Extlib::Inflection::tableize(model.to_s)}"
          filename << "_#{sitemap_num}" if num_sitemaps > 1
 
-         gz = gz_writer("#{filename}.xml.gz")
+         f = xml_open(filename)
 
-         xml = Builder::XmlMarkup.new(:target => gz)
+         xml = Builder::XmlMarkup.new(:target => f)
          xml.instruct!
          xml.urlset(:xmlns => 'http://www.sitemaps.org/schemas/sitemap/0.9') do
            for batch_num in batch_num_start..batch_num_end
-             offset = ((batch_num - 1) * @batch_size)
-             limit = (count - offset) < @batch_size ? (count - offset - 1) : @batch_size
-             find_options = num_batches > 1 ? {:limit => limit, :offset => offset} : {}
-
-             klass.send(find_method, find_options).each do |r|
-               last_mod_method = pick_method(
-                 r,
-                 [:updated_at, :updated_on, :updated, :created_at, :created_on, :created]
-               )
+             offset = ((batch_num - 1) * @options[:batch_size])
+             limit = (count - offset) < @options[:batch_size] ? (count - offset - 1) : @options[:batch_size]
+             find_options.update(:limit => limit, :offset => offset) if num_batches > 1
+
+             model.send(find_method, find_options).each do |r|
+               last_mod_method = pick_method(r, TIMESTAMP_METHODS)
                last_mod = last_mod_method.nil? ? Time.now : r.send(last_mod_method)
 
-               param_method = pick_method(r, [:to_param, :id])
-               raise ArgumentError, "#{klass} must provide a to_param instance method" if param_method.nil?
+               param_method = pick_method(r, PARAM_METHODS)
 
                xml.url do
-                 xml.loc("#{@base_url}/#{source[:path]}/#{r.send(param_method)}")
+                 location = defined?(Rails) ?
+                   polymorphic_url(r) :
+                   "#{root_url}/#{strip_leading_slash(options[:path])}/#{r.send(param_method)}"
+                 xml.loc(location)
+
                  xml.lastmod(last_mod.strftime('%Y-%m-%d')) unless last_mod.nil?
-                 xml.changefreq('weekly')
+
+                 change_frequency = options[:change_frequency] || 'weekly'
+                 xml.changefreq(change_frequency.is_a?(Proc) ? change_frequency.call(r) : change_frequency)
+
+                 priority = options[:priority]
+                 unless priority.nil?
+                   xml.priority(priority.is_a?(Proc) ? priority.call(r) : priority)
+                 end
                end
              end
            end
          end
 
-         gz.close
+         f.close
        end
 
      end
 
      generate_sitemap_index
-     ping_search_engines
-     self # Chainable
+
+     return self
    end
 
-   private
-     def strip_leading_slash(str)
-       str.sub(/^\//, '')
+   def ping_search_engines
+     require 'net/http'
+     require 'cgi'
+
+     sitemap_uri = CGI::escape(url_for_sitemap(@sitemap_files.last))
+
+     if @options[:ping_google]
+       Net::HTTP.get('www.google.com', "/webmasters/tools/ping?sitemap=#{sitemap_uri}")
      end
 
-     def pick_method(klass, candidates)
-       method = nil
-       candidates.each do |candidate|
-         if klass.respond_to? candidate
-           method = candidate
-           break
-         end
+     if @options[:ping_yahoo]
+       if @options[:yahoo_app_id]
+         Net::HTTP.get(
+           'search.yahooapis.com', "/SiteExplorerService/V1/updateNotification?" +
+             "appid=#{@options[:yahoo_app_id]}&url=#{sitemap_uri}"
+         )
+       else
+         $stderr.puts 'unable to ping Yahoo: no ":yahoo_app_id" provided'
        end
-       method
      end
 
-     def gz_writer(filename)
-       Zlib::GzipWriter.new(File.open("#{@file_path}/#{filename}", 'w+'))
+     if @options[:ping_msn]
+       Net::HTTP.get('webmaster.live.com', "/ping.aspx?siteMap=#{sitemap_uri}")
      end
 
-     def sitemap_index_filename
-       'sitemap_index.xml.gz'
+     if @options[:ping_ask]
+       Net::HTTP.get('submissions.ask.com', "/ping?sitemap=#{sitemap_uri}")
      end
+   end
 
-     # Create a sitemap index document
-     def generate_sitemap_index
-       xml = ''
-       builder = Builder::XmlMarkup.new(:target => xml)
-       builder.instruct!
-       builder.sitemapindex(:xmlns => 'http://www.sitemaps.org/schemas/sitemap/0.9') do
-         @sources.each do |source|
-           num_sitemaps = source[:num_sitemaps]
-           for i in 1..num_sitemaps
-             loc = "#{@base_url}/#{@web_path}/sitemap_#{Extlib::Inflection::underscore(source[:model].to_s)}"
-             loc << "_#{i}" if num_sitemaps > 1
-             loc << '.xml.gz'
-
-             builder.sitemap do
-               builder.loc(loc)
-               builder.lastmod(Time.now.strftime('%Y-%m-%d'))
-             end
-           end
-         end
-       end
-
-       gz = gz_writer(sitemap_index_filename)
-       gz.write(xml)
-       gz.close
+   def root_url
+     @root_url ||= begin
+       url = ''
+       url << (@default_url_options[:protocol] || 'http')
+       url << '://' unless url.match('://')
+       url << @default_url_options[:host]
+       url << ":#{port}" if port = @default_url_options[:port] and port != 80
      end
+   end
 
-     def sitemap_uri
-       URI.escape("#{@base_url}/#{@web_path}/#{sitemap_index_filename}")
-     end
+   private
 
-     # Notify Google of the new sitemap index file
-     def ping_google
-       Net::HTTP.get('www.google.com', "/webmasters/tools/ping?sitemap=#{sitemap_uri}")
-     end
+   def strip_leading_slash(str)
+     str.sub(/^\//, '')
+   end
 
-     # Notify Yahoo! of the new sitemap index file
-     def ping_yahoo
-       Net::HTTP.get('search.yahooapis.com', "/SiteExplorerService/V1/updateNotification?appid=#{@yahoo_app_id}&url=#{sitemap_uri}")
+   def pick_method(model, candidates)
+     method = nil
+     candidates.each do |candidate|
+       if model.respond_to? candidate
+         method = candidate
+         break
+       end
      end
+     method
+   end
 
-     # Notify MSN of the new sitemap index file
-     def ping_msn
-       Net::HTTP.get('webmaster.live.com', "/ping.aspx?siteMap=#{sitemap_uri}")
-     end
+   def xml_open(filename)
+     filename << '.xml'
+     filename << '.gz' if @options[:gzip]
 
-     # Notify Ask of the new sitemap index file
-     def ping_ask
-       Net::HTTP.get('submissions.ask.com', "/ping?sitemap=#{sitemap_uri}")
+     file = File.open("#{@file_path}/#{filename}", 'w+')
+
+     @sitemap_files << file.path
+
+     writer = @options[:gzip] ? Zlib::GzipWriter.new(file) : file
+
+     if block_given?
+       yield writer
+       writer.close
      end
 
-     def ping_search_engines
-       ping_google if @ping_google
-       ping_yahoo if @ping_yahoo && @yahoo_app_id
-       ping_msn if @ping_msn
-       ping_ask if @ping_ask
+     writer
+   end
+
+   def url_for_sitemap(path)
+     "#{root_url}/#{File.basename(path)}"
+   end
+
+   # Create a sitemap index document
+   def generate_sitemap_index
+     xml_open 'sitemap_index' do |file|
+       xml = Builder::XmlMarkup.new(:target => file)
+       xml.instruct!
+       xml.sitemapindex(:xmlns => 'http://www.sitemaps.org/schemas/sitemap/0.9') do
+         for path in @sitemap_files[0..-2]
+           xml.sitemap do
+             xml.loc(url_for_sitemap(path))
+             xml.lastmod(Time.now.strftime('%Y-%m-%d'))
+           end
+         end
+       end
      end
- end
+   end
+ end
@@ -15,6 +15,14 @@ class BigSitemapTest < Test::Unit::TestCase
      assert_raise(ArgumentError) { BigSitemap.new(:document_root => tmp_dir) }
    end
 
+   should 'generate the same base URL' do
+     options = {:document_root => tmp_dir}
+     assert_equal(
+       BigSitemap.new(options.merge(:base_url => 'http://example.com')).root_url,
+       BigSitemap.new(options.merge(:url_options => {:host => 'example.com'})).root_url
+     )
+   end
+
    should 'generate a sitemap index file' do
      generate_sitemap_files
      assert File.exists?(sitemaps_index_file)
@@ -27,21 +35,14 @@ class BigSitemapTest < Test::Unit::TestCase
      assert File.exists?(single_sitemaps_model_file), "#{single_sitemaps_model_file} exists"
    end
 
-   should 'generate exactly two sitemap model files' do
-     generate_exactly_two_model_sitemap_files
+   should 'generate two sitemap model files' do
+     generate_two_model_sitemap_files
      assert File.exists?(first_sitemaps_model_file), "#{first_sitemaps_model_file} exists"
      assert File.exists?(second_sitemaps_model_file), "#{second_sitemaps_model_file} exists"
      third_sitemaps_model_file = "#{sitemaps_dir}/sitemap_test_model_3.xml.gz"
      assert !File.exists?(third_sitemaps_model_file), "#{third_sitemaps_model_file} does not exist"
    end
 
-   should 'clean all sitemap files' do
-     generate_sitemap_files
-     assert Dir.entries(sitemaps_dir).size > 2, "#{sitemaps_dir} is not empty" # ['.', '..'].size == 2
-     @sitemap.clean
-     assert_equal 2, Dir.entries(sitemaps_dir).size, "#{sitemaps_dir} is empty"
-   end
-
    context 'Sitemap index file' do
      should 'contain one sitemapindex element' do
        generate_sitemap_files
@@ -54,79 +55,125 @@ class BigSitemapTest < Test::Unit::TestCase
54
55
  end
55
56
 
56
57
  should 'contain one loc element' do
57
- generate_sitemap_files
58
+ generate_one_sitemap_model_file
58
59
  assert_equal 1, num_elements(sitemaps_index_file, 'loc')
59
60
  end
60
61
 
61
62
  should 'contain one lastmod element' do
62
- generate_sitemap_files
63
+ generate_one_sitemap_model_file
63
64
  assert_equal 1, num_elements(sitemaps_index_file, 'lastmod')
64
65
  end
65
66
 
66
67
  should 'contain two loc elements' do
67
- generate_exactly_two_model_sitemap_files
68
+ generate_two_model_sitemap_files
68
69
  assert_equal 2, num_elements(sitemaps_index_file, 'loc')
69
70
  end
70
71
 
71
72
  should 'contain two lastmod elements' do
72
- generate_exactly_two_model_sitemap_files
73
+ generate_two_model_sitemap_files
73
74
  assert_equal 2, num_elements(sitemaps_index_file, 'lastmod')
74
75
  end
76
+
77
+ should 'not be gzipped' do
78
+ generate_sitemap_files(:gzip => false)
79
+ assert File.exists?(unzipped_sitemaps_index_file)
80
+ end
75
81
  end
76
82
 
77
83
  context 'Sitemap model file' do
78
84
  should 'contain one urlset element' do
79
- generate_sitemap_files
85
+ generate_one_sitemap_model_file
80
86
  assert_equal 1, num_elements(single_sitemaps_model_file, 'urlset')
81
87
  end
82
88
 
83
89
  should 'contain several loc elements' do
84
- generate_sitemap_files
90
+ generate_one_sitemap_model_file
85
91
  assert_equal default_num_items, num_elements(single_sitemaps_model_file, 'loc')
86
92
  end
87
93
 
88
94
  should 'contain several lastmod elements' do
89
- generate_sitemap_files
95
+ generate_one_sitemap_model_file
90
96
  assert_equal default_num_items, num_elements(single_sitemaps_model_file, 'lastmod')
91
97
  end
92
98
 
93
99
  should 'contain several changefreq elements' do
94
- generate_sitemap_files
100
+ generate_one_sitemap_model_file
95
101
  assert_equal default_num_items, num_elements(single_sitemaps_model_file, 'changefreq')
96
102
  end
97
103
 
104
+ should 'contain several priority elements' do
105
+ generate_one_sitemap_model_file(:priority => 0.2)
106
+ assert_equal default_num_items, num_elements(single_sitemaps_model_file, 'priority')
107
+ end
108
+
109
+ should 'have a change frequency of weekly by default' do
110
+ generate_one_sitemap_model_file
111
+ assert_equal 'weekly', elements(single_sitemaps_model_file, 'changefreq').first.text
112
+ end
113
+
114
+ should 'have a change frequency of daily' do
115
+ generate_one_sitemap_model_file(:change_frequency => 'daily')
116
+ assert_equal 'daily', elements(single_sitemaps_model_file, 'changefreq').first.text
117
+ end
118
+
119
+ should 'be able to use a lambda to specify change frequency' do
120
+ generate_one_sitemap_model_file(:change_frequency => lambda {|m| m.change_frequency})
121
+ assert_equal TestModel.new.change_frequency, elements(single_sitemaps_model_file, 'changefreq').first.text
122
+ end
123
+
124
+ should 'have a priority of 0.2' do
125
+ generate_one_sitemap_model_file(:priority => 0.2)
126
+ assert_equal '0.2', elements(single_sitemaps_model_file, 'priority').first.text
127
+ end
128
+
129
+ should 'be able to use a lambda to specify priority' do
130
+ generate_one_sitemap_model_file(:priority => lambda {|m| m.priority})
131
+ assert_equal TestModel.new.priority.to_s, elements(single_sitemaps_model_file, 'priority').first.text
132
+ end
133
+
     should 'contain one loc element' do
-      generate_exactly_two_model_sitemap_files
+      generate_two_model_sitemap_files
       assert_equal 1, num_elements(first_sitemaps_model_file, 'loc')
       assert_equal 1, num_elements(second_sitemaps_model_file, 'loc')
     end
 
     should 'contain one lastmod element' do
-      generate_exactly_two_model_sitemap_files
+      generate_two_model_sitemap_files
       assert_equal 1, num_elements(first_sitemaps_model_file, 'lastmod')
       assert_equal 1, num_elements(second_sitemaps_model_file, 'lastmod')
     end
 
     should 'contain one changefreq element' do
-      generate_exactly_two_model_sitemap_files
+      generate_two_model_sitemap_files
       assert_equal 1, num_elements(first_sitemaps_model_file, 'changefreq')
       assert_equal 1, num_elements(second_sitemaps_model_file, 'changefreq')
     end
 
+    should 'contain one priority element' do
+      generate_two_model_sitemap_files(:priority => 0.2)
+      assert_equal 1, num_elements(first_sitemaps_model_file, 'priority')
+      assert_equal 1, num_elements(second_sitemaps_model_file, 'priority')
+    end
+
     should 'strip leading slashes from controller paths' do
       create_sitemap
-      @sitemap.add(:model => TestModel, :path => '/test_controller').generate
+      @sitemap.add(TestModel, :path => '/test_controller').generate
       assert(
         !elements(single_sitemaps_model_file, 'loc').first.text.match(/\/\/test_controller\//),
         'URL does not contain a double-slash before the controller path'
       )
     end
+
+    should 'not be gzipped' do
+      generate_one_sitemap_model_file(:gzip => false)
+      assert File.exists?(unzipped_single_sitemaps_model_file)
+    end
   end
 
   context 'add method' do
     should 'be chainable' do
       create_sitemap
-      assert_equal BigSitemap, @sitemap.add(:model => TestModel, :path => 'test_controller').class
+      assert_equal BigSitemap, @sitemap.add(TestModel).class
     end
   end
 
@@ -135,6 +182,13 @@ class BigSitemapTest < Test::Unit::TestCase
       create_sitemap
       assert_equal BigSitemap, @sitemap.clean.class
     end
+
+    should 'clean all sitemap files' do
+      generate_sitemap_files
+      assert Dir.entries(sitemaps_dir).size > 2, "#{sitemaps_dir} is not empty" # ['.', '..'].size == 2
+      @sitemap.clean
+      assert_equal 2, Dir.entries(sitemaps_dir).size, "#{sitemaps_dir} is empty"
+    end
   end
 
   context 'generate method' do
@@ -157,22 +211,32 @@ class BigSitemapTest < Test::Unit::TestCase
     }.update(options))
   end
 
-  def generate_sitemap_files
-    create_sitemap
+  def generate_sitemap_files(options={})
+    create_sitemap(options)
     add_model
     @sitemap.generate
   end
 
-  def generate_exactly_two_model_sitemap_files
-    create_sitemap(:max_per_sitemap => 1, :batch_size => 1)
-    add_model(:num_items => 2)
+  def generate_one_sitemap_model_file(options={})
+    change_frequency = options.delete(:change_frequency)
+    priority = options.delete(:priority)
+    create_sitemap(options.merge(:max_per_sitemap => default_num_items, :batch_size => default_num_items))
+    add_model(:change_frequency => change_frequency, :priority => priority)
+    @sitemap.generate
+  end
+
+  def generate_two_model_sitemap_files(options={})
+    change_frequency = options.delete(:change_frequency)
+    priority = options.delete(:priority)
+    create_sitemap(options.merge(:max_per_sitemap => 1, :batch_size => 1))
+    add_model(:num_items => 2, :change_frequency => change_frequency, :priority => priority)
     @sitemap.generate
   end
 
   def add_model(options={})
     num_items = options.delete(:num_items) || default_num_items
     TestModel.stubs(:num_items).returns(num_items)
-    @sitemap.add({:model => TestModel, :path => 'test_controller'}.update(options))
+    @sitemap.add(TestModel, options)
   end
 
   def default_num_items
@@ -180,19 +244,27 @@ class BigSitemapTest < Test::Unit::TestCase
   end
 
   def sitemaps_index_file
-    "#{sitemaps_dir}/sitemap_index.xml.gz"
+    "#{unzipped_sitemaps_index_file}.gz"
+  end
+
+  def unzipped_sitemaps_index_file
+    "#{sitemaps_dir}/sitemap_index.xml"
   end
 
   def single_sitemaps_model_file
-    "#{sitemaps_dir}/sitemap_test_model.xml.gz"
+    "#{unzipped_single_sitemaps_model_file}.gz"
+  end
+
+  def unzipped_single_sitemaps_model_file
+    "#{sitemaps_dir}/sitemap_test_models.xml"
   end
 
   def first_sitemaps_model_file
-    "#{sitemaps_dir}/sitemap_test_model_1.xml.gz"
+    "#{sitemaps_dir}/sitemap_test_models_1.xml.gz"
   end
 
   def second_sitemaps_model_file
-    "#{sitemaps_dir}/sitemap_test_model_2.xml.gz"
+    "#{sitemaps_dir}/sitemap_test_models_2.xml.gz"
   end
 
   def sitemaps_dir
@@ -215,4 +287,4 @@ class BigSitemapTest < Test::Unit::TestCase
   def num_elements(filename, el)
     elements(filename, el).size
   end
-end
+end
@@ -3,6 +3,14 @@ class TestModel
     object_id
   end
 
+  def change_frequency
+    'monthly'
+  end
+
+  def priority
+    0.8
+  end
+
   class << self
     def count_for_sitemap
       self.find_for_sitemap.size
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: alexrabarts-big_sitemap
 version: !ruby/object:Gem::Version
-  version: 0.2.1
+  version: 0.3.0
 platform: ruby
 authors:
 - Alex Rabarts
@@ -9,7 +9,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2009-03-12 00:00:00 -07:00
+date: 2009-04-06 00:00:00 -07:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
@@ -38,8 +38,9 @@ executables: []
 
 extensions: []
 
-extra_rdoc_files: []
-
+extra_rdoc_files:
+- README.rdoc
+- LICENSE
 files:
 - History.txt
 - README.rdoc
@@ -49,6 +50,7 @@ files:
 - test/fixtures
 - test/fixtures/test_model.rb
 - test/test_helper.rb
+- LICENSE
 has_rdoc: true
 homepage: http://github.com/alexrabarts/big_sitemap
 post_install_message: