big_sitemap 0.5.1

data/History.txt ADDED
@@ -0,0 +1,62 @@
1
+ === 0.5.1 / 2009-09-07
2
+
3
+ * Fixes an issue with the :last_modified key being passed into the find method options
4
+
5
+ === 0.5.0 / 2009-09-07
6
+
7
+ * Add support for lambdas when specifying lastmod
8
+
9
+ === 0.4.0 / 2009-08-09
10
+
11
+ * Use Bing instead of Live/MSN. Note, this breaks backwards compatibility as
12
+ the old :ping_msn option is now :ping_bing.
13
+
14
+ === 0.3.5 / 2009-08-05
15
+
16
+ * Fixed bugs in root_url generation and url_for_sitemap generation
17
+
18
+ === 0.3.4 / 2009-07-02
19
+
20
+ * BigSitemap-specific options are no longer passed through to the ORM's find method
21
+
22
+ === 0.3.2 / 2009-06-09
23
+
24
+ * Better handling of URLs when Rails' polymorphic_url isn't available in the model
25
+
26
+ === 0.3.2 / 2009-06-09
27
+
28
+ * Fixes "uninitialized constant ActionController" error
29
+ * Fixes "Unknown key(s): path" error
30
+
31
+ === 0.3.1 / 2009-04-18
32
+
33
+ * Fixes broken gemspec
34
+
35
+ === 0.3.0 / 2009-04-06
36
+
37
+ * API change: Pass model through as first argument to add method, e.g. sitemap.add(Posts, {:path => 'articles'})
38
+ * API change: Use Rails' polymorphic_url helper to generate URLs if Rails is being used
39
+ * API change: Only ping search engines when ping_search_engines is explicitly called
40
+ * Add support for passing options through to the model's find method, e.g. :conditions
41
+ * Allow base URL to be specified as a hash as well as a string
42
+ * Add support for changefreq and priority
43
+ * Pluralize sitemap model filenames
44
+ * GZipping may optionally be turned off
45
+
46
+ === 0.2.1 / 2009-03-12
47
+
48
+ * Normalize path arguments so it no longer matters whether a leading slash is used or not
49
+
50
+ === 0.2.0 / 2009-03-11
51
+
52
+ * Methods are now chainable
53
+
54
+ === 0.1.4 / 2009-03-11
55
+
56
+ * Add clean method to clear out Sitemaps directory
57
+ * Make methods chainable
58
+
59
+ === 0.1.3 / 2009-03-10
60
+
61
+ * Initial release
62
+
data/LICENSE ADDED
@@ -0,0 +1,22 @@
1
+ (The MIT License)
2
+
3
+ Copyright (c) 2009 Stateless Systems (http://statelesssystems.com)
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ 'Software'), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
19
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
20
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
21
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
22
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc ADDED
@@ -0,0 +1,117 @@
1
+ = BigSitemap
2
+
3
+ BigSitemap is a Sitemap (http://sitemaps.org) generator suitable for applications with more than 50,000 URLs. It splits large Sitemaps into multiple files, gzips the files to minimize bandwidth usage, and batches database queries to minimize memory usage. It can be set up with just a few lines of code and is compatible with just about any framework.
4
+
5
+ BigSitemap is best run periodically through a Rake/Thor task.
6
+
7
+ require 'big_sitemap'
8
+
9
+ sitemap = BigSitemap.new(:url_options => {:host => 'example.com'})
10
+
11
+ # Add a model
12
+ sitemap.add Product
13
+
14
+ # Add another model with some options
15
+ sitemap.add(Post, {
16
+ :conditions => {:published => true},
17
+ :path => 'articles',
18
+ :change_frequency => 'daily',
19
+ :priority => 0.5
20
+ })
21
+
22
+ # Generate the files
23
+ sitemap.generate
24
+
25
+ The code above will create a minimum of three files:
26
+
27
+ 1. public/sitemaps/sitemap_index.xml.gz
28
+ 2. public/sitemaps/sitemap_products.xml.gz
29
+ 3. public/sitemaps/sitemap_posts.xml.gz
30
+
31
+ If your sitemaps grow beyond 50,000 URLs (this limit can be overridden with the <code>:max_per_sitemap</code> option), the sitemap files will be partitioned into multiple files (<code>sitemap_products_1.xml.gz</code>, <code>sitemap_products_2.xml.gz</code>, ...).
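As a rough illustration of the partitioning described above, the sketch below lists the file names one model would produce for a given number of URLs. The helper <code>sitemap_filenames</code> is hypothetical and not part of BigSitemap; it assumes the naming used by <code>lib/big_sitemap/builder.rb</code> in this release, where the first file carries no numeric suffix and later parts are numbered from 1.

```ruby
# Hypothetical helper (not part of BigSitemap): lists the files one model
# would be split into, assuming the first file has no numeric suffix and
# later parts are numbered _1, _2, ...
def sitemap_filenames(name, url_count, max_per_sitemap = 50_000, gzip = true)
  # At least one file is written, even for an empty model.
  parts = [(url_count.to_f / max_per_sitemap).ceil, 1].max
  (0...parts).map do |part|
    filename = "sitemap_#{name}"
    filename += "_#{part}" if part > 0
    filename += '.xml'
    filename += '.gz' if gzip
    filename
  end
end

puts sitemap_filenames('products', 120_000)
```

With 120,000 URLs and the default limit this yields three files; with fewer than 50,000 URLs it yields a single unsuffixed file.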
32
+
33
+ If you're using Rails then the URLs for each database record are generated with the <code>polymorphic_url</code> helper. That means that the URL for a record will be exactly what you would expect: generated with respect to the routing setup of your app. In other contexts where this helper isn't available, the URLs are generated in the form:
34
+
35
+ :base_url/:path/:to_param
36
+
37
+ If the <code>to_param</code> method does not exist, then <code>id</code> will be used.
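The fallback form can be sketched in plain Ruby as follows. This is a minimal illustration under assumptions: <code>fallback_url</code>, <code>Product</code> and <code>Article</code> are hypothetical names made up for the example, and the real library builds the base URL from its own options.

```ruby
# Hypothetical records: one only has an id, the other defines to_param.
Product = Struct.new(:id)

class Article
  def initialize(slug)
    @slug = slug
  end

  def to_param
    @slug
  end
end

# Builds :base_url/:path/:to_param, preferring to_param and falling back to id.
def fallback_url(base_url, path, record)
  param_method = [:to_param, :id].find { |name| record.respond_to?(name) }
  base = base_url.sub(%r{/+\z}, '')   # drop any trailing slash
  segment = path.sub(%r{\A/}, '')     # a leading slash on the path is optional
  "#{base}/#{segment}/#{record.send(param_method)}"
end

puts fallback_url('http://example.com', '/products', Product.new(42))
puts fallback_url('http://example.com/', 'articles', Article.new('hello-world'))
```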
38
+
39
+ == Install
40
+
41
+ Via gem:
42
+
43
+ sudo gem install alexrabarts-big_sitemap -s http://gems.github.com
44
+
45
+ == Advanced
46
+
47
+ === Options
48
+
49
+ * <code>:url_options</code> -- hash with <code>:host</code>, optionally <code>:port</code> and <code>:protocol</code>
50
+ * <code>:base_url</code> -- string alternative to <code>:url_options</code>, e.g. "https://example.com:8080/"
51
+ * <code>:document_root</code> -- string, defaults to the <code>public</code> directory under <code>Rails.root</code> or <code>Merb.root</code> if available; must be given explicitly otherwise
52
+ * <code>:path</code> -- string, defaults to 'sitemaps', which places sitemap files in the <code>sitemaps</code> directory under the document root
53
+ * <code>:max_per_sitemap</code> -- defaults to <code>50000</code>, the maximum number of URLs per file allowed by the Sitemaps protocol; may be set lower, but must be greater than 1
54
+ * <code>:batch_size</code> -- defaults to <code>1001</code> (not <code>1000</code> due to a bug in DataMapper); must not exceed <code>:max_per_sitemap</code>
55
+ * <code>:gzip</code> -- <code>true</code>
56
+ * <code>:ping_google</code> -- <code>true</code>
57
+ * <code>:ping_yahoo</code> -- <code>false</code>, needs <code>:yahoo_app_id</code>
58
+ * <code>:ping_bing</code> -- <code>false</code>
59
+ * <code>:ping_ask</code> -- <code>false</code>
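The way the defaults and the two size checks interact can be sketched as below. This is an illustrative stand-in, not the library's constructor: <code>SKETCH_DEFAULTS</code> and <code>resolve_options</code> are hypothetical names, and only a few of the options listed above are modelled.

```ruby
# Hypothetical subset of the defaults listed above.
SKETCH_DEFAULTS = {
  :max_per_sitemap => 50_000,
  :batch_size      => 1001,
  :path            => 'sitemaps',
  :gzip            => true
}

# Merges caller options over the defaults and applies the two size checks.
def resolve_options(overrides = {})
  options = SKETCH_DEFAULTS.merge(overrides)
  if options[:max_per_sitemap] <= 1
    raise ArgumentError, '":max_per_sitemap" must be greater than 1'
  end
  if options[:batch_size] > options[:max_per_sitemap]
    raise ArgumentError, '":batch_size" must not exceed ":max_per_sitemap"'
  end
  options
end

p resolve_options(:gzip => false, :max_per_sitemap => 10_000)
```

One consequence worth noting: because the default batch size is 1001, lowering <code>:max_per_sitemap</code> below that value also requires lowering <code>:batch_size</code>.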
60
+
61
+ === Chaining
62
+
63
+ You can chain methods together. You could even get away with as little code as:
64
+
65
+ BigSitemap.new(:url_options => {:host => 'example.com'}).add(Post).generate
66
+
67
+ === Pinging Search Engines
68
+
69
+ To ping search engines, call <code>ping_search_engines</code> after you generate the sitemap:
70
+
71
+ sitemap.generate
72
+ sitemap.ping_search_engines
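For reference, the request path that this release builds for the Google ping can be sketched as below. The helper name <code>google_ping_path</code> is hypothetical, and the sketch uses <code>URI.encode_www_form_component</code> from the standard library where the gem calls <code>CGI::escape</code>; for a URL like the one shown, both escape it the same way. Whether the endpoint still accepts pings is outside the scope of this example.

```ruby
require 'uri'

# Hypothetical helper: escapes the sitemap index URL and appends it to the
# ping path used in lib/big_sitemap.rb for Google.
def google_ping_path(sitemap_index_url)
  "/webmasters/tools/ping?sitemap=#{URI.encode_www_form_component(sitemap_index_url)}"
end

puts google_ping_path('http://example.com/sitemaps/sitemap_index.xml.gz')
```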
73
+
74
+ === Change Frequency, Priority and Last Modified
75
+
76
+ You can control "changefreq", "priority" and "lastmod" values for each record individually by passing lambdas instead of fixed values:
77
+
78
+ sitemap.add(Post,
79
+ :change_frequency => lambda {|post| ... },
80
+ :priority => lambda {|post| ... },
81
+ :last_modified => lambda {|post| ... }
82
+ )
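To make the elided lambda bodies concrete, here is one possible set, runnable without BigSitemap. Everything in it is an assumption for illustration: the <code>Post</code> struct, its <code>featured</code> field, the thresholds, and the <code>resolve</code> helper, which imitates how a value is used directly unless it is a Proc.

```ruby
# Hypothetical record type standing in for a real model.
Post = Struct.new(:updated_at, :featured)

DAY = 24 * 60 * 60
now = Time.now

options = {
  # Recently edited posts are assumed to change more often.
  :change_frequency => lambda { |post| (now - post.updated_at) < 7 * DAY ? 'daily' : 'monthly' },
  # Featured posts get a higher priority.
  :priority         => lambda { |post| post.featured ? 0.9 : 0.5 },
  :last_modified    => lambda { |post| post.updated_at }
}

# Call the option with the record if it is a Proc, otherwise use it as a fixed value.
def resolve(option, record)
  option.is_a?(Proc) ? option.call(record) : option
end

post = Post.new(now - 2 * DAY, true)
puts resolve(options[:change_frequency], post)
puts resolve(options[:priority], post)
```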
83
+
84
+ === Find Methods
85
+
86
+ Your models must provide either a <code>find_for_sitemap</code> or <code>all</code> class method that returns the instances that are to be included in the sitemap.
87
+
88
+ Additionally, your models must provide a <code>count_for_sitemap</code> or <code>count</code> class method that returns a count of the instances to be included.
89
+
90
+ If you're using ActiveRecord (Rails) or DataMapper then <code>all</code> and <code>count</code> are already provided and you don't need to do anything unless you want to include a subset of records. If you provide your own <code>find_for_sitemap</code> or <code>all</code> method then it should be able to handle the <code>:offset</code> and <code>:limit</code> options, in the same way that ActiveRecord and DataMapper handle them. This is especially important if you have more than 50,000 URLs.
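For a model that is not backed by ActiveRecord or DataMapper, the two class methods might look like the sketch below. The <code>Page</code> class and its in-memory <code>STORE</code> are hypothetical; a real model would query its data store instead. The point is that <code>find_for_sitemap</code> honours <code>:offset</code> and <code>:limit</code>, and that <code>count_for_sitemap</code> counts the same subset.

```ruby
# Hypothetical in-memory model; only published pages go into the sitemap.
class Page
  attr_reader :id, :published

  def initialize(id, published)
    @id = id
    @published = published
  end

  # 25 pages, of which the odd-numbered ones are published.
  STORE = (1..25).map { |i| new(i, i.odd?) }

  class << self
    # Returns the records for one batch, honouring :offset and :limit.
    def find_for_sitemap(options = {})
      records = STORE.select { |page| page.published }
      offset = options[:offset] || 0
      limit = options[:limit] || records.size
      records[offset, limit] || []
    end

    # Must count the same subset that find_for_sitemap returns.
    def count_for_sitemap
      STORE.count { |page| page.published }
    end
  end
end

puts Page.count_for_sitemap
puts Page.find_for_sitemap(:offset => 10, :limit => 5).map { |page| page.id }.inspect
```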
91
+
92
+ === Cleaning the Sitemaps Directory
93
+
94
+ Calling the <code>clean</code> method will remove all generated sitemap files (<code>sitemap_*.xml</code> and <code>sitemap_*.xml.gz</code>) from the sitemaps directory.
95
+
96
+ == Limitations
97
+
98
+ If your database is likely to shrink during the time it takes to create the sitemap then you might run into problems (the final, batched SQL select will overrun by setting a limit that is too large since it is calculated from the count, which is queried at the very beginning). Patches welcome!
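The limitation follows from how batches are planned. The sketch below is a simplified illustration, not the library's actual loop (which also groups batches per sitemap file): the hypothetical <code>batches</code> helper derives every offset and limit from a count taken once, up front.

```ruby
# Hypothetical helper: plans [offset, limit] pairs from a count taken once.
def batches(count, batch_size)
  (0...count).step(batch_size).map do |offset|
    [offset, [batch_size, count - offset].min]
  end
end

batches(2500, 1001).each do |offset, limit|
  puts "offset=#{offset} limit=#{limit}"
end
```

If rows are deleted after the count is taken, the later offsets point past the end of the table, so the final queries return fewer records than planned, or none at all.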
99
+
100
+ == TODO
101
+
102
+ Tests for Rails components.
103
+
104
+ == Credits
105
+
106
+ Thanks to Alastair Brunton and Harry Love, whose work provided a starting point for this library.
107
+ http://scoop.cheerfactory.co.uk/2008/02/26/google-sitemap-generator/
108
+
109
+ Thanks to those who have contributed patches:
110
+
111
+ * Mislav Marohnić
112
+ * Jeff Schoolcraft
113
+ * Dalibor Nasevic
114
+
115
+ == Copyright
116
+
117
+ Copyright (c) 2009 Stateless Systems (http://statelesssystems.com). See LICENSE for details.
data/VERSION.yml ADDED
@@ -0,0 +1,4 @@
1
+ ---
2
+ :patch: 1
3
+ :major: 0
4
+ :minor: 5
data/lib/big_sitemap.rb ADDED
@@ -0,0 +1,239 @@
1
+ require 'uri'
2
+ require 'big_sitemap/builder'
3
+ require 'extlib'
4
+ require 'action_controller' if defined? Rails
5
+
6
+ class BigSitemap
7
+ DEFAULTS = {
8
+ :max_per_sitemap => Builder::MAX_URLS,
9
+ :batch_size => 1001,
10
+ :path => 'sitemaps',
11
+ :gzip => true,
12
+
13
+ # opinionated
14
+ :ping_google => true,
15
+ :ping_yahoo => false, # needs :yahoo_app_id
16
+ :ping_bing => false,
17
+ :ping_ask => false
18
+ }
19
+
20
+ COUNT_METHODS = [:count_for_sitemap, :count]
21
+ FIND_METHODS = [:find_for_sitemap, :all]
22
+ TIMESTAMP_METHODS = [:updated_at, :updated_on, :updated, :created_at, :created_on, :created]
23
+ PARAM_METHODS = [:to_param, :id]
24
+
25
+ include ActionController::UrlWriter if defined? Rails
26
+
27
+ def initialize(options)
28
+ @options = DEFAULTS.merge options
29
+
30
+ # Use Rails' default_url_options if available
31
+ @default_url_options = defined?(Rails) ? default_url_options : {}
32
+
33
+ if @options[:max_per_sitemap] <= 1
34
+ raise ArgumentError, '":max_per_sitemap" must be greater than 1'
35
+ end
36
+
37
+ if @options[:url_options]
38
+ @default_url_options.update @options[:url_options]
39
+ elsif @options[:base_url]
40
+ uri = URI.parse(@options[:base_url])
41
+ @default_url_options[:host] = uri.host
42
+ @default_url_options[:port] = uri.port
43
+ @default_url_options[:protocol] = uri.scheme
44
+ else
45
+ raise ArgumentError, 'you must specify either ":url_options" hash or ":base_url" string'
46
+ end
47
+
48
+ if @options[:batch_size] > @options[:max_per_sitemap]
49
+ raise ArgumentError, '":batch_size" must be less than or equal to ":max_per_sitemap"'
50
+ end
51
+
52
+ @options[:document_root] ||= begin
53
+ if defined? Rails
54
+ "#{Rails.root}/public"
55
+ elsif defined? Merb
56
+ "#{Merb.root}/public"
57
+ end
58
+ end
59
+
60
+ unless @options[:document_root]
61
+ raise ArgumentError, 'Document root must be specified with the ":document_root" option'
62
+ end
63
+
64
+ @file_path = "#{@options[:document_root]}/#{strip_leading_slash(@options[:path])}"
65
+ Dir.mkdir(@file_path) unless File.exist? @file_path
66
+
67
+ @sources = []
68
+ @sitemap_files = []
69
+ end
70
+
71
+ def add(model, options={})
72
+ options[:path] ||= Extlib::Inflection.tableize(model.to_s)
73
+ @sources << [model, options.dup]
74
+ return self
75
+ end
76
+
77
+ def clean
78
+ Dir["#{@file_path}/sitemap_*.{xml,xml.gz}"].each do |file|
79
+ File.delete file
80
+ end
81
+ return self
82
+ end
83
+
84
+ def generate
85
+ for model, options in @sources
86
+ with_sitemap(Extlib::Inflection::tableize(model.to_s)) do |sitemap|
87
+ count_method = pick_method(model, COUNT_METHODS)
88
+ find_method = pick_method(model, FIND_METHODS)
89
+ raise ArgumentError, "#{model} must provide a count_for_sitemap or count class method" if count_method.nil?
90
+ raise ArgumentError, "#{model} must provide a find_for_sitemap or all class method" if find_method.nil?
91
+
92
+ count = model.send(count_method)
93
+ num_sitemaps = 1
94
+ num_batches = 1
95
+
96
+ if count > @options[:batch_size]
97
+ num_batches = (count.to_f / @options[:batch_size].to_f).ceil
98
+ num_sitemaps = (count.to_f / @options[:max_per_sitemap].to_f).ceil
99
+ end
100
+ batches_per_sitemap = num_batches.to_f / num_sitemaps.to_f
101
+
102
+ find_options = options.except(:path, :num_items, :priority, :change_frequency, :last_modified)
103
+
104
+ for sitemap_num in 1..num_sitemaps
105
+ # Work out the start and end batch numbers for this sitemap
106
+ batch_num_start = sitemap_num == 1 ? 1 : ((sitemap_num * batches_per_sitemap).ceil - batches_per_sitemap + 1).to_i
107
+ batch_num_end = (batch_num_start + [batches_per_sitemap, num_batches].min).floor - 1
108
+
109
+ for batch_num in batch_num_start..batch_num_end
110
+ offset = ((batch_num - 1) * @options[:batch_size])
111
+ limit = [count - offset, @options[:batch_size]].min
112
+ find_options.update(:limit => limit, :offset => offset) if num_batches > 1
113
+
114
+ model.send(find_method, find_options).each do |record|
115
+ last_mod = options[:last_modified]
116
+ if last_mod.is_a?(Proc)
117
+ last_mod = last_mod.call(record)
118
+ elsif last_mod.nil?
119
+ last_mod_method = pick_method(record, TIMESTAMP_METHODS)
120
+ last_mod = last_mod_method.nil? ? Time.now : record.send(last_mod_method)
121
+ end
122
+
123
+ param_method = pick_method(record, PARAM_METHODS)
124
+
125
+ location = defined?(Rails) ? polymorphic_url(record) : nil rescue nil
126
+ location ||= "#{root_url}/#{strip_leading_slash(options[:path])}/#{record.send(param_method)}"
127
+
128
+ change_frequency = options[:change_frequency] || 'weekly'
129
+ freq = change_frequency.is_a?(Proc) ? change_frequency.call(record) : change_frequency
130
+
131
+ priority = options[:priority]
132
+ pri = priority.is_a?(Proc) ? priority.call(record) : priority
133
+
134
+ sitemap.add_url!(location, last_mod, freq, pri)
135
+ end
136
+ end
137
+ end
138
+ end
139
+ end
140
+
141
+ generate_sitemap_index
142
+
143
+ return self
144
+ end
145
+
146
+ def ping_search_engines
147
+ require 'net/http'
148
+ require 'cgi'
149
+
150
+ sitemap_uri = CGI::escape(url_for_sitemap(@sitemap_files.last))
151
+
152
+ if @options[:ping_google]
153
+ Net::HTTP.get('www.google.com', "/webmasters/tools/ping?sitemap=#{sitemap_uri}")
154
+ end
155
+
156
+ if @options[:ping_yahoo]
157
+ if @options[:yahoo_app_id]
158
+ Net::HTTP.get(
159
+ 'search.yahooapis.com', "/SiteExplorerService/V1/updateNotification?" +
160
+ "appid=#{@options[:yahoo_app_id]}&url=#{sitemap_uri}"
161
+ )
162
+ else
163
+ $stderr.puts 'unable to ping Yahoo: no ":yahoo_app_id" provided'
164
+ end
165
+ end
166
+
167
+ if @options[:ping_bing]
168
+ Net::HTTP.get('www.bing.com', "/webmaster/ping.aspx?siteMap=#{sitemap_uri}")
169
+ end
170
+
171
+ if @options[:ping_ask]
172
+ Net::HTTP.get('submissions.ask.com', "/ping?sitemap=#{sitemap_uri}")
173
+ end
174
+ end
175
+
176
+ def root_url
177
+ @root_url ||= begin
178
+ url = ''
179
+ url << (@default_url_options[:protocol] || 'http')
180
+ url << '://' unless url.match('://')
181
+ url << @default_url_options[:host]
182
+ url << ":#{port}" if port = @default_url_options[:port] and port != 80
183
+ url
184
+ end
185
+ end
186
+
187
+ private
188
+
189
+ def with_sitemap(name, options={})
190
+ options[:index] = name == 'index'
191
+ options[:filename] = "#{@file_path}/sitemap_#{name}"
192
+ options[:max_urls] = @options[:max_per_sitemap]
193
+
194
+ unless options[:gzip] = @options[:gzip]
195
+ options[:indent] = 2
196
+ end
197
+
198
+ sitemap = Builder.new(options)
199
+
200
+ begin
201
+ yield sitemap
202
+ ensure
203
+ sitemap.close!
204
+ @sitemap_files.concat sitemap.paths!
205
+ end
206
+ end
207
+
208
+ def strip_leading_slash(str)
209
+ str.sub(/^\//, '')
210
+ end
211
+
212
+ def pick_method(model, candidates)
213
+ method = nil
214
+ candidates.each do |candidate|
215
+ if model.respond_to? candidate
216
+ method = candidate
217
+ break
218
+ end
219
+ end
220
+ method
221
+ end
222
+
223
+ def url_for_sitemap(path)
224
+ if @options[:path].blank?
225
+ "#{root_url}/#{File.basename(path)}"
226
+ else
227
+ "#{root_url}/#{@options[:path]}/#{File.basename(path)}"
228
+ end
229
+ end
230
+
231
+ # Create a sitemap index document
232
+ def generate_sitemap_index
233
+ with_sitemap 'index' do |sitemap|
234
+ for path in @sitemap_files
235
+ sitemap.add_url!(url_for_sitemap(path), File.stat(path).mtime)
236
+ end
237
+ end
238
+ end
239
+ end
data/lib/big_sitemap/builder.rb ADDED
@@ -0,0 +1,124 @@
1
+ require 'builder'
2
+ require 'zlib'
3
+
4
+ class BigSitemap
5
+ class Builder < Builder::XmlMarkup
6
+ NAMESPACE = 'http://www.sitemaps.org/schemas/sitemap/0.9'
7
+ MAX_URLS = 50000
8
+
9
+ def initialize(options)
10
+ @gzip = options.delete(:gzip)
11
+ @max_urls = options.delete(:max_urls) || MAX_URLS
12
+ @index = options.delete(:index)
13
+ @paths = []
14
+ @parts = 0
15
+
16
+ if @filename = options.delete(:filename)
17
+ options[:target] = _get_writer
18
+ end
19
+
20
+ super(options)
21
+
22
+ @opened_tags = []
23
+ _init_document
24
+ end
25
+
26
+ def add_url!(url, time = nil, frequency = nil, priority = nil)
27
+ _rotate if @max_urls == @urls
28
+
29
+ tag!(@index ? 'sitemap' : 'url') do
30
+ loc url
31
+ # W3C datetime format is a subset of ISO 8601
32
+ lastmod(time.utc.strftime('%Y-%m-%dT%H:%M:%S+00:00')) unless time.nil?
33
+ changefreq(frequency) unless frequency.nil?
34
+ priority(priority) unless priority.nil?
35
+ end
36
+ @urls += 1
37
+ end
38
+
39
+ def close!
40
+ _close_document
41
+ target!.close if target!.respond_to?(:close)
42
+ end
43
+
44
+ def paths!
45
+ @paths
46
+ end
47
+
48
+ private
49
+
50
+ def _get_writer
51
+ if @filename
52
+ filename = @filename.dup
53
+ filename << "_#{@parts}" if @parts > 0
54
+ filename << '.xml'
55
+ filename << '.gz' if @gzip
56
+ _open_writer(filename)
57
+ else
58
+ target!
59
+ end
60
+ end
61
+
62
+ def _open_writer(filename)
63
+ file = File.open(filename, 'w+')
64
+ @paths << filename
65
+ @gzip ? Zlib::GzipWriter.new(file) : file
66
+ end
67
+
68
+ def _init_document
69
+ @urls = 0
70
+ instruct!
71
+ _open_tag(@index ? 'sitemapindex' : 'urlset', :xmlns => NAMESPACE)
72
+ end
73
+
74
+ def _rotate
75
+ # write out the current document and start writing into a new file
76
+ close!
77
+ @parts += 1
78
+ @target = _get_writer
79
+ _init_document
80
+ end
81
+
82
+ # add support for:
83
+ # xml.open_foo!(attrs)
84
+ # xml.close_foo!
85
+ def method_missing(method, *args, &block)
86
+ if method.to_s =~ /^(open|close)_(.+)!$/
87
+ operation, name = $1, $2
88
+ name = "#{name}:#{args.shift}" if Symbol === args.first
89
+
90
+ if 'open' == operation
91
+ _open_tag(name, args.first)
92
+ else
93
+ _close_tag(name)
94
+ end
95
+ else
96
+ super
97
+ end
98
+ end
99
+
100
+ # opens a tag, bumps up level but doesn't require a block
101
+ def _open_tag(name, attrs)
102
+ _indent
103
+ _start_tag(name, attrs)
104
+ _newline
105
+ @level += 1
106
+ @opened_tags << name
107
+ end
108
+
109
+ # closes a tag block by decreasing the level and inserting a close tag
110
+ def _close_tag(name)
111
+ @opened_tags.pop
112
+ @level -= 1
113
+ _indent
114
+ _end_tag(name)
115
+ _newline
116
+ end
117
+
118
+ def _close_document
119
+ for name in @opened_tags.reverse
120
+ _close_tag(name)
121
+ end
122
+ end
123
+ end
124
+ end
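The <code>lastmod</code> value written by <code>add_url!</code> is a UTC timestamp in W3C datetime form. A standalone sketch of that formatting, using the same format string, is shown below; the helper name <code>w3c_lastmod</code> is hypothetical, and it uses <code>getutc</code> rather than <code>utc</code> so that the caller's Time object is left unchanged.

```ruby
# Hypothetical helper: formats a Time as a UTC W3C datetime string.
def w3c_lastmod(time)
  # getutc returns a new Time in UTC instead of converting the receiver in place.
  time.getutc.strftime('%Y-%m-%dT%H:%M:%S+00:00')
end

puts w3c_lastmod(Time.at(1_000_000_000))
```

For the fixture timestamp <code>Time.at(1000000000)</code> used in the tests, this prints <code>2001-09-09T01:46:40+00:00</code>.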
data/test/big_sitemap_test.rb ADDED
@@ -0,0 +1,294 @@
1
+ require File.dirname(__FILE__) + '/test_helper'
2
+ require 'nokogiri'
3
+
4
+ class BigSitemapTest < Test::Unit::TestCase
5
+ def setup
6
+ delete_tmp_files
7
+ end
8
+
9
+ def teardown
10
+ delete_tmp_files
11
+ end
12
+
13
+ should 'raise an error if the :base_url option is not specified' do
14
+ assert_nothing_raised { BigSitemap.new(:base_url => 'http://example.com', :document_root => tmp_dir) }
15
+ assert_raise(ArgumentError) { BigSitemap.new(:document_root => tmp_dir) }
16
+ end
17
+
18
+ should 'generate the same base URL' do
19
+ options = {:document_root => tmp_dir}
20
+ assert_equal(
21
+ BigSitemap.new(options.merge(:base_url => 'http://example.com')).root_url,
22
+ BigSitemap.new(options.merge(:url_options => {:host => 'example.com'})).root_url
23
+ )
24
+ end
25
+
26
+ should 'generate a sitemap index file' do
27
+ generate_sitemap_files
28
+ assert File.exists?(sitemaps_index_file)
29
+ end
30
+
31
+ should 'generate a single sitemap model file' do
32
+ create_sitemap
33
+ add_model
34
+ @sitemap.generate
35
+ assert File.exists?(first_sitemaps_model_file), "#{first_sitemaps_model_file} exists"
36
+ end
37
+
38
+ should 'generate two sitemap model files' do
39
+ generate_two_model_sitemap_files
40
+ assert File.exists?(first_sitemaps_model_file), "#{first_sitemaps_model_file} exists"
41
+ assert File.exists?(second_sitemaps_model_file), "#{second_sitemaps_model_file} exists"
42
+ assert !File.exists?(third_sitemaps_model_file), "#{third_sitemaps_model_file} does not exist"
43
+ end
44
+
45
+ context 'Sitemap index file' do
46
+ should 'contain one sitemapindex element' do
47
+ generate_sitemap_files
48
+ assert_equal 1, num_elements(sitemaps_index_file, 'sitemapindex')
49
+ end
50
+
51
+ should 'contain one sitemap element' do
52
+ generate_sitemap_files
53
+ assert_equal 1, num_elements(sitemaps_index_file, 'sitemap')
54
+ end
55
+
56
+ should 'contain one loc element' do
57
+ generate_one_sitemap_model_file
58
+ assert_equal 1, num_elements(sitemaps_index_file, 'loc')
59
+ end
60
+
61
+ should 'contain one lastmod element' do
62
+ generate_one_sitemap_model_file
63
+ assert_equal 1, num_elements(sitemaps_index_file, 'lastmod')
64
+ end
65
+
66
+ should 'contain two loc elements' do
67
+ generate_two_model_sitemap_files
68
+ assert_equal 2, num_elements(sitemaps_index_file, 'loc')
69
+ end
70
+
71
+ should 'contain two lastmod elements' do
72
+ generate_two_model_sitemap_files
73
+ assert_equal 2, num_elements(sitemaps_index_file, 'lastmod')
74
+ end
75
+
76
+ should 'not be gzipped' do
77
+ generate_sitemap_files(:gzip => false)
78
+ assert File.exists?(unzipped_sitemaps_index_file)
79
+ end
80
+ end
81
+
82
+ context 'Sitemap model file' do
83
+ should 'contain one urlset element' do
84
+ generate_one_sitemap_model_file
85
+ assert_equal 1, num_elements(first_sitemaps_model_file, 'urlset')
86
+ end
87
+
88
+ should 'contain several loc elements' do
89
+ generate_one_sitemap_model_file
90
+ assert_equal default_num_items, num_elements(first_sitemaps_model_file, 'loc')
91
+ end
92
+
93
+ should 'contain several lastmod elements' do
94
+ generate_one_sitemap_model_file
95
+ assert_equal default_num_items, num_elements(first_sitemaps_model_file, 'lastmod')
96
+ end
97
+
98
+ should 'contain several changefreq elements' do
99
+ generate_one_sitemap_model_file
100
+ assert_equal default_num_items, num_elements(first_sitemaps_model_file, 'changefreq')
101
+ end
102
+
103
+ should 'contain several priority elements' do
104
+ generate_one_sitemap_model_file(:priority => 0.2)
105
+ assert_equal default_num_items, num_elements(first_sitemaps_model_file, 'priority')
106
+ end
107
+
108
+ should 'have a change frequency of weekly by default' do
109
+ generate_one_sitemap_model_file
110
+ assert_equal 'weekly', elements(first_sitemaps_model_file, 'changefreq').first.text
111
+ end
112
+
113
+ should 'have a change frequency of daily' do
114
+ generate_one_sitemap_model_file(:change_frequency => 'daily')
115
+ assert_equal 'daily', elements(first_sitemaps_model_file, 'changefreq').first.text
116
+ end
117
+
118
+ should 'be able to use a lambda to specify change frequency' do
119
+ generate_one_sitemap_model_file(:change_frequency => lambda {|m| m.change_frequency})
120
+ assert_equal TestModel.new.change_frequency, elements(first_sitemaps_model_file, 'changefreq').first.text
121
+ end
122
+
123
+ should 'have a priority of 0.2' do
124
+ generate_one_sitemap_model_file(:priority => 0.2)
125
+ assert_equal '0.2', elements(first_sitemaps_model_file, 'priority').first.text
126
+ end
127
+
128
+ should 'be able to use a lambda to specify priority' do
129
+ generate_one_sitemap_model_file(:priority => lambda {|m| m.priority})
130
+ assert_equal TestModel.new.priority.to_s, elements(first_sitemaps_model_file, 'priority').first.text
131
+ end
132
+
133
+ should 'be able to use a lambda to specify lastmod' do
134
+ generate_one_sitemap_model_file(:last_modified => lambda {|m| m.updated_at})
135
+ assert_equal TestModel.new.updated_at.utc.strftime('%Y-%m-%dT%H:%M:%S+00:00'), elements(first_sitemaps_model_file, 'lastmod').first.text
136
+ end
137
+
138
+ should 'contain two loc elements' do
139
+ generate_two_model_sitemap_files
140
+ assert_equal 2, num_elements(first_sitemaps_model_file, 'loc')
141
+ assert_equal 2, num_elements(second_sitemaps_model_file, 'loc')
142
+ end
143
+
144
+ should 'contain two lastmod elements' do
145
+ generate_two_model_sitemap_files
146
+ assert_equal 2, num_elements(first_sitemaps_model_file, 'lastmod')
147
+ assert_equal 2, num_elements(second_sitemaps_model_file, 'lastmod')
148
+ end
149
+
150
+ should 'contain two changefreq elements' do
151
+ generate_two_model_sitemap_files
152
+ assert_equal 2, num_elements(first_sitemaps_model_file, 'changefreq')
153
+ assert_equal 2, num_elements(second_sitemaps_model_file, 'changefreq')
154
+ end
155
+
156
+ should 'contain two priority elements' do
157
+ generate_two_model_sitemap_files(:priority => 0.2)
158
+ assert_equal 2, num_elements(first_sitemaps_model_file, 'priority')
159
+ assert_equal 2, num_elements(second_sitemaps_model_file, 'priority')
160
+ end
161
+
162
+ should 'strip leading slashes from controller paths' do
163
+ create_sitemap
164
+ @sitemap.add(TestModel, :path => '/test_controller').generate
165
+ assert(
166
+ !elements(first_sitemaps_model_file, 'loc').first.text.match(/\/\/test_controller\//),
167
+ 'URL does not contain a double-slash before the controller path'
168
+ )
169
+ end
170
+
171
+ should 'not be gzipped' do
172
+ generate_one_sitemap_model_file(:gzip => false)
173
+ assert File.exists?(unzipped_first_sitemaps_model_file)
174
+ end
175
+ end
176
+
177
+ context 'add method' do
178
+ should 'be chainable' do
179
+ create_sitemap
180
+ assert_equal BigSitemap, @sitemap.add(TestModel).class
181
+ end
182
+ end
183
+
184
+ context 'clean method' do
185
+ should 'be chainable' do
186
+ create_sitemap
187
+ assert_equal BigSitemap, @sitemap.clean.class
188
+ end
189
+
190
+ should 'clean all sitemap files' do
191
+ generate_sitemap_files
192
+ assert Dir.entries(sitemaps_dir).size > 2, "#{sitemaps_dir} is not empty" # ['.', '..'].size == 2
193
+ @sitemap.clean
194
+ assert_equal 2, Dir.entries(sitemaps_dir).size, "#{sitemaps_dir} is empty"
195
+ end
196
+ end
197
+
198
+ context 'generate method' do
199
+ should 'be chainable' do
200
+ create_sitemap
201
+ assert_equal BigSitemap, @sitemap.generate.class
202
+ end
203
+ end
204
+
205
+ private
206
+ def delete_tmp_files
207
+ FileUtils.rm_rf(sitemaps_dir)
208
+ end
209
+
210
+ def create_sitemap(options={})
211
+ @sitemap = BigSitemap.new({
212
+ :base_url => 'http://example.com',
213
+ :document_root => tmp_dir,
214
+ :ping_google => false
215
+ }.update(options))
216
+ end
217
+
218
+ def generate_sitemap_files(options={})
219
+ create_sitemap(options)
220
+ add_model
221
+ @sitemap.generate
222
+ end
223
+
224
+ def generate_one_sitemap_model_file(options={})
225
+ change_frequency = options.delete(:change_frequency)
226
+ priority = options.delete(:priority)
227
+ create_sitemap(options.merge(:max_per_sitemap => default_num_items, :batch_size => default_num_items))
228
+ add_model(:change_frequency => change_frequency, :priority => priority)
229
+ @sitemap.generate
230
+ end
231
+
232
+ def generate_two_model_sitemap_files(options={})
233
+ change_frequency = options.delete(:change_frequency)
234
+ priority = options.delete(:priority)
235
+ create_sitemap(options.merge(:max_per_sitemap => 2, :batch_size => 1))
236
+ add_model(:num_items => 4, :change_frequency => change_frequency, :priority => priority)
237
+ @sitemap.generate
238
+ end
239
+
240
+ def add_model(options={})
241
+ num_items = options.delete(:num_items) || default_num_items
242
+ TestModel.stubs(:num_items).returns(num_items)
243
+ @sitemap.add(TestModel, options)
244
+ end
245
+
246
+ def default_num_items
247
+ 10
248
+ end
249
+
250
+ def sitemaps_index_file
251
+ "#{unzipped_sitemaps_index_file}.gz"
252
+ end
253
+
254
+ def unzipped_sitemaps_index_file
255
+ "#{sitemaps_dir}/sitemap_index.xml"
256
+ end
257
+
258
+ def unzipped_first_sitemaps_model_file
259
+ "#{sitemaps_dir}/sitemap_test_models.xml"
260
+ end
261
+
262
+ def first_sitemaps_model_file
263
+ "#{sitemaps_dir}/sitemap_test_models.xml.gz"
264
+ end
265
+
266
+ def second_sitemaps_model_file
267
+ "#{sitemaps_dir}/sitemap_test_models_1.xml.gz"
268
+ end
269
+
270
+ def third_sitemaps_model_file
271
+ "#{sitemaps_dir}/sitemap_test_models_2.xml.gz"
272
+ end
273
+
274
+ def sitemaps_dir
275
+ "#{tmp_dir}/sitemaps"
276
+ end
277
+
278
+ def tmp_dir
279
+ '/tmp'
280
+ end
281
+
282
+ def ns
283
+ {'s' => 'http://www.sitemaps.org/schemas/sitemap/0.9'}
284
+ end
285
+
286
+ def elements(filename, el)
287
+ data = Nokogiri::XML.parse(Zlib::GzipReader.open(filename).read)
288
+ data.search("//s:#{el}", ns)
289
+ end
290
+
291
+ def num_elements(filename, el)
292
+ elements(filename, el).size
293
+ end
294
+ end
data/test/fixtures/test_model.rb ADDED
@@ -0,0 +1,34 @@
1
+ class TestModel
2
+ def to_param
3
+ object_id
4
+ end
5
+
6
+ def change_frequency
7
+ 'monthly'
8
+ end
9
+
10
+ def priority
11
+ 0.8
12
+ end
13
+
14
+ def updated_at
15
+ Time.at(1000000000)
16
+ end
17
+
18
+ class << self
19
+ def count_for_sitemap
20
+ self.find_for_sitemap.size
21
+ end
22
+
23
+ def num_items
24
+ 10
25
+ end
26
+
27
+ def find_for_sitemap(options={})
28
+ instances = []
29
+ num_times = options.delete(:limit) || self.num_items
30
+ num_times.times { instances.push(self.new) }
31
+ instances
32
+ end
33
+ end
34
+ end
data/test/test_helper.rb ADDED
@@ -0,0 +1,11 @@
1
+ require 'rubygems'
2
+ require 'test/unit'
3
+ require 'shoulda'
4
+ require 'mocha'
5
+ require 'test/fixtures/test_model'
6
+
7
+ $LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
8
+ require 'big_sitemap'
9
+
10
+ class Test::Unit::TestCase
11
+ end
metadata ADDED
@@ -0,0 +1,108 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: big_sitemap
3
+ version: !ruby/object:Gem::Version
4
+ hash: 9
5
+ prerelease: false
6
+ segments:
7
+ - 0
8
+ - 5
9
+ - 1
10
+ version: 0.5.1
11
+ platform: ruby
12
+ authors:
13
+ - Alex Rabarts
14
+ autorequire:
15
+ bindir: bin
16
+ cert_chain: []
17
+
18
+ date: 2009-09-07 00:00:00 +01:00
19
+ default_executable:
20
+ dependencies:
21
+ - !ruby/object:Gem::Dependency
22
+ name: builder
23
+ prerelease: false
24
+ requirement: &id001 !ruby/object:Gem::Requirement
25
+ none: false
26
+ requirements:
27
+ - - ">="
28
+ - !ruby/object:Gem::Version
29
+ hash: 15
30
+ segments:
31
+ - 2
32
+ - 1
33
+ - 2
34
+ version: 2.1.2
35
+ type: :runtime
36
+ version_requirements: *id001
37
+ - !ruby/object:Gem::Dependency
38
+ name: extlib
39
+ prerelease: false
40
+ requirement: &id002 !ruby/object:Gem::Requirement
41
+ none: false
42
+ requirements:
43
+ - - ">="
44
+ - !ruby/object:Gem::Version
45
+ hash: 41
46
+ segments:
47
+ - 0
48
+ - 9
49
+ - 9
50
+ version: 0.9.9
51
+ type: :runtime
52
+ version_requirements: *id002
53
+ description: A Sitemap generator specifically designed for large sites (although it works equally well with small sites)
54
+ email: alexrabarts@gmail.com
55
+ executables: []
56
+
57
+ extensions: []
58
+
59
+ extra_rdoc_files:
60
+ - README.rdoc
61
+ - LICENSE
62
+ files:
63
+ - History.txt
64
+ - README.rdoc
65
+ - VERSION.yml
66
+ - lib/big_sitemap/builder.rb
67
+ - lib/big_sitemap.rb
68
+ - test/big_sitemap_test.rb
69
+ - test/fixtures/test_model.rb
70
+ - test/test_helper.rb
71
+ - LICENSE
72
+ has_rdoc: true
73
+ homepage: http://github.com/alexrabarts/big_sitemap
74
+ licenses: []
75
+
76
+ post_install_message:
77
+ rdoc_options:
78
+ - --inline-source
79
+ - --charset=UTF-8
80
+ require_paths:
81
+ - lib
82
+ required_ruby_version: !ruby/object:Gem::Requirement
83
+ none: false
84
+ requirements:
85
+ - - ">="
86
+ - !ruby/object:Gem::Version
87
+ hash: 3
88
+ segments:
89
+ - 0
90
+ version: "0"
91
+ required_rubygems_version: !ruby/object:Gem::Requirement
92
+ none: false
93
+ requirements:
94
+ - - ">="
95
+ - !ruby/object:Gem::Version
96
+ hash: 3
97
+ segments:
98
+ - 0
99
+ version: "0"
100
+ requirements: []
101
+
102
+ rubyforge_project:
103
+ rubygems_version: 1.3.7
104
+ signing_key:
105
+ specification_version: 2
106
+ summary: A Sitemap generator specifically designed for large sites (although it works equally well with small sites)
107
+ test_files: []
108
+