pj_nitin-big_sitemap 0.3.1
- data/History.txt +28 -0
- data/LICENSE +22 -0
- data/README.rdoc +110 -0
- data/VERSION.yml +4 -0
- data/lib/big_sitemap.rb +229 -0
- data/test/big_sitemap_test.rb +289 -0
- data/test/fixtures/test_model.rb +30 -0
- data/test/test_helper.rb +11 -0
- metadata +82 -0
data/History.txt
ADDED
@@ -0,0 +1,28 @@
+=== 0.3.0 / 2009-04-06
+
+* API change: Pass model through as first argument to add method, e.g. sitemap.add(Post, {:path => 'articles'})
+* API change: Use Rails' polymorphic_url helper to generate URLs if Rails is being used
+* API change: Only ping search engines when ping_search_engines is explicitly called
+* Add support for passing options through to the model's find method, e.g. :conditions
+* Allow base URL to be specified as a hash as well as a string
+* Add support for changefreq and priority
+* Pluralize sitemap model filenames
+* GZipping may optionally be turned off
+
+=== 0.2.1 / 2009-03-12
+
+* Normalize path arguments so it no longer matters whether a leading slash is used or not
+
+=== 0.2.0 / 2009-03-11
+
+* Methods are now chainable
+
+=== 0.1.4 / 2009-03-11
+
+* Add clean method to clear out Sitemaps directory
+* Make methods chainable
+
+=== 0.1.3 / 2009-03-10
+
+* Initial release
+
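The 0.3.0 entry allowing the base URL to be given "as a hash as well as a string" is easiest to see side by side. The sketch below is a standalone approximation of what BigSitemap#initialize and #root_url do with `:base_url` and `:url_options`; `root_url_for` is a hypothetical name, not part of the gem. Like the gem, it omits only port 80, so an https URL would keep an explicit `:443`.

```ruby
require 'uri'

# Hypothetical helper (not part of BigSitemap's API): builds a root URL from
# either a :base_url string or a :url_options hash, following the same rules
# as BigSitemap#initialize and #root_url.
def root_url_for(options)
  url_options =
    if options[:url_options]
      options[:url_options]
    elsif options[:base_url]
      uri = URI.parse(options[:base_url])
      { :host => uri.host, :port => uri.port, :protocol => uri.scheme }
    else
      raise ArgumentError, 'specify either :url_options or :base_url'
    end

  protocol = url_options[:protocol] || 'http'
  url = protocol.include?('://') ? protocol.dup : "#{protocol}://"
  url << url_options[:host]
  port = url_options[:port]
  url << ":#{port}" if port && port != 80 # only port 80 is treated as implicit
  url
end

puts root_url_for(:base_url => 'http://example.com')
puts root_url_for(:url_options => { :host => 'example.com' })
puts root_url_for(:base_url => 'http://example.com:8080/')
```

Both forms of the first two calls produce the same root URL, which is what the "generate the same base URL" test checks.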
data/LICENSE
ADDED
@@ -0,0 +1,22 @@
+(The MIT License)
+
+Copyright (c) 2009 Stateless Systems (http://statelesssystems.com)
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+'Software'), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc
ADDED
@@ -0,0 +1,110 @@
+= BigSitemap
+
+BigSitemap is a Sitemap (http://sitemaps.org) generator suitable for applications with greater than 50,000 URLs. It splits large Sitemaps into multiple files, gzips the files to minimize bandwidth usage, batches database queries to minimize memory usage, can be set up with just a few lines of code and is compatible with just about any framework.
+
+BigSitemap is best run periodically through a Rake/Thor task.
+
+  sitemap = BigSitemap.new(:url_options => {:host => 'example.com'})
+
+  # Add a model
+  sitemap.add Product
+
+  # Add another model with some options
+  sitemap.add(Post, {
+    :conditions       => {:published => true},
+    :path             => 'articles',
+    :change_frequency => 'daily',
+    :priority         => 0.5
+  })
+
+  # Generate the files
+  sitemap.generate
+
+The code above will create a minimum of three files:
+
+1. public/sitemaps/sitemap_index.xml.gz
+2. public/sitemaps/sitemap_products.xml.gz
+3. public/sitemaps/sitemap_posts.xml.gz
+
+If your sitemaps grow beyond 50,000 URLs (this limit can be overridden with the <code>:max_per_sitemap</code> option), the sitemap files will be partitioned into multiple files (<code>sitemap_products_1.xml.gz</code>, <code>sitemap_products_2.xml.gz</code>, ...).
+
+If you're using Rails then the URLs for each database record are generated with the <code>polymorphic_url</code> helper. That means that the URL for a record will be exactly what you would expect: generated with respect to the routing setup of your app. In other contexts where this helper isn't available, the URLs are generated in the form:
+
+  :base_url/:path/:to_param
+
+If the <code>to_param</code> method does not exist, then <code>id</code> will be used.
+
+== Install
+
+Via gem:
+
+  gem install alexrabarts-big_sitemap -s http://gems.github.com
+
+== Advanced
+
+=== Options
+
+* <code>:url_options</code> -- hash with <code>:host</code>, optionally <code>:port</code> and <code>:protocol</code>
+* <code>:base_url</code> -- string alternative to <code>:url_options</code>, e.g. "https://example.com:8080/"
+* <code>:document_root</code> -- string, defaults to <code>Rails.root/public</code> or <code>Merb.root/public</code> if available
+* <code>:path</code> -- string, defaults to 'sitemaps', which places sitemap files under the <code>/sitemaps</code> directory
+* <code>:max_per_sitemap</code> -- <code>50000</code>, the maximum allowed by the Sitemaps protocol (you can set it lower)
+* <code>:batch_size</code> -- <code>1001</code> (not <code>1000</code> due to a bug in DataMapper)
+* <code>:gzip</code> -- <code>true</code>
+* <code>:ping_google</code> -- <code>true</code>
+* <code>:ping_yahoo</code> -- <code>false</code>, needs <code>:yahoo_app_id</code>
+* <code>:ping_msn</code> -- <code>false</code>
+* <code>:ping_ask</code> -- <code>false</code>
+
+=== Chaining
+
+You can chain methods together. You could even get away with as little code as:
+
+  BigSitemap.new(:url_options => {:host => 'example.com'}).add(Post).generate
+
+=== Pinging Search Engines
+
+To ping search engines, call <code>ping_search_engines</code> after you generate the sitemap:
+
+  sitemap.generate
+  sitemap.ping_search_engines
+
+=== Change Frequency and Priority
+
+You can control the "changefreq" and "priority" values for each record individually by passing lambdas instead of fixed values:
+
+  sitemap.add(Post,
+    :change_frequency => lambda {|post| ... },
+    :priority         => lambda {|post| ... }
+  )
+
+=== Find Methods
+
+Your models must provide either a <code>find_for_sitemap</code> or <code>all</code> class method that returns the instances that are to be included in the sitemap.
+
+Additionally, your models must provide a <code>count_for_sitemap</code> or <code>count</code> class method that returns a count of the instances to be included.
+
+If you're using ActiveRecord (Rails) or DataMapper then <code>all</code> and <code>count</code> are already provided and you don't need to do anything unless you want to include a subset of records. If you provide your own <code>find_for_sitemap</code> or <code>all</code> method then it should be able to handle the <code>:offset</code> and <code>:limit</code> options, in the same way that ActiveRecord and DataMapper handle them. This is especially important if you have more than 50,000 URLs.
+
+=== Cleaning the Sitemaps Directory
+
+Calling the <code>clean</code> method will remove all sitemap files (<code>sitemap_*.xml</code> and <code>sitemap_*.xml.gz</code>) from the Sitemaps directory.
+
+== Limitations
+
+If your database is likely to shrink while the sitemap is being generated, you might run into problems: the limit for the final batched SQL select is calculated from the count, which is queried at the very beginning, so it may be too large and overrun. Patches welcome!
+
+== TODO
+
+Tests for Rails components.
+
+== Credits
+
+Thanks to Alastair Brunton and Harry Love, whose work provided a starting point for this library.
+http://scoop.cheerfactory.co.uk/2008/02/26/google-sitemap-generator/
+
+Thanks to Mislav Marohnić for contributing patches.
+
+== Copyright
+
+Copyright (c) 2009 Stateless Systems (http://statelesssystems.com). See LICENSE for details.
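The README's Find Methods section asks hand-written finders to honour `:offset` and `:limit`. Below is a minimal sketch of such a model without any ORM, assuming an in-memory array stands in for the database; the `Article` class and its data are hypothetical. BigSitemap#generate passes the whole options hash given to `add` (including `:path`, `:change_frequency` and `:priority`) to the find method, so a custom finder should tolerate keys it does not use.

```ruby
# Hypothetical plain-Ruby model (no ORM) satisfying the README's contract:
# count_for_sitemap plus a find_for_sitemap that honours :offset and :limit.
class Article
  # updated_at is one of the timestamp methods BigSitemap checks for <lastmod>.
  attr_reader :id, :updated_at

  def initialize(id)
    @id = id
    @updated_at = Time.utc(2009, 4, 1)
  end

  # Used outside Rails to build ":base_url/:path/:to_param".
  def to_param
    id.to_s
  end

  # In-memory stand-in for a database table.
  RECORDS = (1..25).map { |i| new(i) }

  def self.count_for_sitemap
    RECORDS.size
  end

  # Returns at most :limit records starting at :offset; other keys are ignored.
  def self.find_for_sitemap(options = {})
    offset = options[:offset] || 0
    limit  = options[:limit] || RECORDS.size
    RECORDS[offset, limit] || []
  end
end

# Fetch the second batch of ten, as a batched caller would.
batch = Article.find_for_sitemap(:offset => 10, :limit => 10)
puts batch.map(&:to_param).join(',')
```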
data/VERSION.yml
ADDED
data/lib/big_sitemap.rb
ADDED
@@ -0,0 +1,229 @@
+require 'uri'
+require 'big_sitemap/builder'
+require 'activesupport'
+
+class BigSitemap
+  DEFAULTS = {
+    :max_per_sitemap => Builder::MAX_URLS,
+    :batch_size      => 1001,
+    :path            => 'sitemaps',
+    :gzip            => true,
+
+    # opinionated
+    :ping_google => true,
+    :ping_yahoo  => false, # needs :yahoo_app_id
+    :ping_msn    => false,
+    :ping_ask    => false
+  }
+
+  COUNT_METHODS     = [:count_for_sitemap, :count]
+  FIND_METHODS      = [:find_for_sitemap, :all]
+  TIMESTAMP_METHODS = [:updated_at, :updated_on, :updated, :created_at, :created_on, :created]
+  PARAM_METHODS     = [:to_param, :id]
+
+  include ActionController::UrlWriter if defined? Rails
+
+  def initialize(options)
+    @options = DEFAULTS.merge options
+
+    # Use Rails' default_url_options if available
+    @default_url_options = defined?(Rails) ? default_url_options : {}
+
+    if @options[:max_per_sitemap] <= 1
+      raise ArgumentError, '":max_per_sitemap" must be greater than 1'
+    end
+
+    if @options[:url_options]
+      @default_url_options.update @options[:url_options]
+    elsif @options[:base_url]
+      uri = URI.parse(@options[:base_url])
+      @default_url_options[:host] = uri.host
+      @default_url_options[:port] = uri.port
+      @default_url_options[:protocol] = uri.scheme
+    else
+      raise ArgumentError, 'you must specify either ":url_options" hash or ":base_url" string'
+    end
+
+    if @options[:batch_size] > @options[:max_per_sitemap]
+      raise ArgumentError, '":batch_size" must be less than ":max_per_sitemap"'
+    end
+
+    @options[:document_root] ||= begin
+      if defined? Rails
+        "#{Rails.root}/public"
+      elsif defined? Merb
+        "#{Merb.root}/public"
+      end
+    end
+
+    unless @options[:document_root]
+      raise ArgumentError, 'Document root must be specified with the ":document_root" option'
+    end
+
+    @file_path = "#{@options[:document_root]}/#{strip_leading_slash(@options[:path])}"
+    Dir.mkdir(@file_path) unless File.exists? @file_path
+
+    @sources = []
+    @sitemap_files = []
+  end
+
+  def add(model, options={})
+    options[:path] ||= ActiveSupport::Inflector.tableize(model.to_s)
+    @sources << [model, options.dup]
+    return self
+  end
+
+  def clean
+    Dir["#{@file_path}/sitemap_*.{xml,xml.gz}"].each do |file|
+      FileUtils.rm file
+    end
+    return self
+  end
+
+  def generate
+    for model, options in @sources
+      with_sitemap(ActiveSupport::Inflector.tableize(model)) do |sitemap|
+        count_method = pick_method(model, COUNT_METHODS)
+        find_method = pick_method(model, FIND_METHODS)
+        raise ArgumentError, "#{model} must provide a count_for_sitemap class method" if count_method.nil?
+        raise ArgumentError, "#{model} must provide a find_for_sitemap class method" if find_method.nil?
+
+        count = model.send(count_method)
+        num_sitemaps = 1
+        num_batches = 1
+
+        if count > @options[:batch_size]
+          num_batches = (count.to_f / @options[:batch_size].to_f).ceil
+          num_sitemaps = (count.to_f / @options[:max_per_sitemap].to_f).ceil
+        end
+        batches_per_sitemap = num_batches.to_f / num_sitemaps.to_f
+
+        find_options = options.dup
+
+        for sitemap_num in 1..num_sitemaps
+          # Work out the start and end batch numbers for this sitemap
+          batch_num_start = sitemap_num == 1 ? 1 : ((sitemap_num * batches_per_sitemap).ceil - batches_per_sitemap + 1).to_i
+          batch_num_end = (batch_num_start + [batches_per_sitemap, num_batches].min).floor - 1
+
+          for batch_num in batch_num_start..batch_num_end
+            offset = ((batch_num - 1) * @options[:batch_size])
+            limit = (count - offset) < @options[:batch_size] ? (count - offset - 1) : @options[:batch_size]
+            find_options.update(:limit => limit, :offset => offset) if num_batches > 1
+
+            model.send(find_method, find_options).each do |record|
+              last_mod_method = pick_method(record, TIMESTAMP_METHODS)
+              last_mod = last_mod_method.nil? ? Time.now : record.send(last_mod_method)
+
+              param_method = pick_method(record, PARAM_METHODS)
+
+              location = defined?(Rails) ?
+                polymorphic_url(record) :
+                "#{root_url}/#{strip_leading_slash(options[:path])}/#{record.send(param_method)}"
+
+              change_frequency = options[:change_frequency] || 'weekly'
+              freq = change_frequency.is_a?(Proc) ? change_frequency.call(record) : change_frequency
+
+              priority = options[:priority]
+              pri = priority.is_a?(Proc) ? priority.call(record) : priority
+
+              sitemap.add_url!(location, last_mod, freq, pri)
+            end
+          end
+        end
+      end
+    end
+
+    generate_sitemap_index
+
+    return self
+  end
+
+  def ping_search_engines
+    require 'net/http'
+    require 'cgi'
+
+    sitemap_uri = CGI::escape(url_for_sitemap(@sitemap_files.last))
+
+    if @options[:ping_google]
+      Net::HTTP.get('www.google.com', "/webmasters/tools/ping?sitemap=#{sitemap_uri}")
+    end
+
+    if @options[:ping_yahoo]
+      if @options[:yahoo_app_id]
+        Net::HTTP.get(
+          'search.yahooapis.com', "/SiteExplorerService/V1/updateNotification?" +
+            "appid=#{@options[:yahoo_app_id]}&url=#{sitemap_uri}"
+        )
+      else
+        $stderr.puts 'unable to ping Yahoo: no ":yahoo_app_id" provided'
+      end
+    end
+
+    if @options[:ping_msn]
+      Net::HTTP.get('webmaster.live.com', "/ping.aspx?siteMap=#{sitemap_uri}")
+    end
+
+    if @options[:ping_ask]
+      Net::HTTP.get('submissions.ask.com', "/ping?sitemap=#{sitemap_uri}")
+    end
+  end
+
+  def root_url
+    @root_url ||= begin
+      url = ''
+      url << (@default_url_options[:protocol] || 'http')
+      url << '://' unless url.match('://')
+      url << @default_url_options[:host]
+      url << ((port = @default_url_options[:port]) && port != 80 ? ":#{port}" : '')
+    end
+  end
+
+  private
+
+  def with_sitemap(name, options={})
+    options[:index] = name == 'index'
+    options[:filename] = "#{@file_path}/sitemap_#{name}"
+    options[:max_urls] = @options[:max_per_sitemap]
+
+    unless options[:gzip] = @options[:gzip]
+      options[:indent] = 2
+    end
+
+    sitemap = Builder.new(options)
+
+    begin
+      yield sitemap
+    ensure
+      sitemap.close!
+      @sitemap_files.concat sitemap.paths!
+    end
+  end
+
+  def strip_leading_slash(str)
+    str.sub(/^\//, '')
+  end
+
+  def pick_method(model, candidates)
+    method = nil
+    candidates.each do |candidate|
+      if model.respond_to? candidate
+        method = candidate
+        break
+      end
+    end
+    method
+  end
+
+  def url_for_sitemap(path)
+    "#{root_url}/#{strip_leading_slash(@options[:path])}/#{File.basename(path)}"
+  end
+
+  # Create a sitemap index document
+  def generate_sitemap_index
+    with_sitemap 'index' do |sitemap|
+      for path in @sitemap_files
+        sitemap.add_url!(url_for_sitemap(path), File.stat(path).mtime)
+      end
+    end
+  end
+end
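To make the batching in #generate concrete, here is a simplified planner that splits a record count into sitemap files and `:offset`/`:limit` queries. It is an illustration under its own assumptions, not a transcription of the gem's floating-point `batches_per_sitemap` arithmetic: each file holds at most `max_per_sitemap` records, each query fetches at most `batch_size`, and the last query of a file fetches exactly the records that remain. `plan_batches` is a hypothetical name.

```ruby
# Simplified planner (an illustration, not BigSitemap's own arithmetic):
# splits `count` records into sitemap files of at most `max_per_sitemap`
# URLs, and each file into :offset/:limit queries of at most `batch_size`.
def plan_batches(count, batch_size, max_per_sitemap)
  if batch_size > max_per_sitemap
    raise ArgumentError, 'batch_size must not exceed max_per_sitemap'
  end

  plans = []
  file_start = 0
  while file_start < count
    file_end = [file_start + max_per_sitemap, count].min # exclusive bound
    batches = []
    offset = file_start
    while offset < file_end
      limit = [batch_size, file_end - offset].min # never read past the file
      batches << { :offset => offset, :limit => limit }
      offset += limit
    end
    plans << batches
    file_start = file_end
  end
  plans
end

plan_batches(4, 1, 2).each_with_index do |batches, i|
  puts "sitemap #{i + 1}: #{batches.map { |b| "#{b[:offset]}+#{b[:limit]}" }.join(', ')}"
end
```

For the four-record, `:batch_size => 1`, `:max_per_sitemap => 2` setup used by the tests, this yields two files of two single-record queries each, and every record is fetched exactly once.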
data/test/big_sitemap_test.rb
ADDED
@@ -0,0 +1,289 @@
+require File.dirname(__FILE__) + '/test_helper'
+require 'nokogiri'
+
+class BigSitemapTest < Test::Unit::TestCase
+  def setup
+    delete_tmp_files
+  end
+
+  def teardown
+    delete_tmp_files
+  end
+
+  should 'raise an error if the :base_url option is not specified' do
+    assert_nothing_raised { BigSitemap.new(:base_url => 'http://example.com', :document_root => tmp_dir) }
+    assert_raise(ArgumentError) { BigSitemap.new(:document_root => tmp_dir) }
+  end
+
+  should 'generate the same base URL' do
+    options = {:document_root => tmp_dir}
+    assert_equal(
+      BigSitemap.new(options.merge(:base_url => 'http://example.com')).root_url,
+      BigSitemap.new(options.merge(:url_options => {:host => 'example.com'})).root_url
+    )
+  end
+
+  should 'generate a sitemap index file' do
+    generate_sitemap_files
+    assert File.exists?(sitemaps_index_file)
+  end
+
+  should 'generate a single sitemap model file' do
+    create_sitemap
+    add_model
+    @sitemap.generate
+    assert File.exists?(first_sitemaps_model_file), "#{first_sitemaps_model_file} exists"
+  end
+
+  should 'generate two sitemap model files' do
+    generate_two_model_sitemap_files
+    assert File.exists?(first_sitemaps_model_file), "#{first_sitemaps_model_file} exists"
+    assert File.exists?(second_sitemaps_model_file), "#{second_sitemaps_model_file} exists"
+    assert !File.exists?(third_sitemaps_model_file), "#{third_sitemaps_model_file} does not exist"
+  end
+
+  context 'Sitemap index file' do
+    should 'contain one sitemapindex element' do
+      generate_sitemap_files
+      assert_equal 1, num_elements(sitemaps_index_file, 'sitemapindex')
+    end
+
+    should 'contain one sitemap element' do
+      generate_sitemap_files
+      assert_equal 1, num_elements(sitemaps_index_file, 'sitemap')
+    end
+
+    should 'contain one loc element' do
+      generate_one_sitemap_model_file
+      assert_equal 1, num_elements(sitemaps_index_file, 'loc')
+    end
+
+    should 'contain one lastmod element' do
+      generate_one_sitemap_model_file
+      assert_equal 1, num_elements(sitemaps_index_file, 'lastmod')
+    end
+
+    should 'contain two loc elements' do
+      generate_two_model_sitemap_files
+      assert_equal 2, num_elements(sitemaps_index_file, 'loc')
+    end
+
+    should 'contain two lastmod elements' do
+      generate_two_model_sitemap_files
+      assert_equal 2, num_elements(sitemaps_index_file, 'lastmod')
+    end
+
+    should 'not be gzipped' do
+      generate_sitemap_files(:gzip => false)
+      assert File.exists?(unzipped_sitemaps_index_file)
+    end
+  end
+
+  context 'Sitemap model file' do
+    should 'contain one urlset element' do
+      generate_one_sitemap_model_file
+      assert_equal 1, num_elements(first_sitemaps_model_file, 'urlset')
+    end
+
+    should 'contain several loc elements' do
+      generate_one_sitemap_model_file
+      assert_equal default_num_items, num_elements(first_sitemaps_model_file, 'loc')
+    end
+
+    should 'contain several lastmod elements' do
+      generate_one_sitemap_model_file
+      assert_equal default_num_items, num_elements(first_sitemaps_model_file, 'lastmod')
+    end
+
+    should 'contain several changefreq elements' do
+      generate_one_sitemap_model_file
+      assert_equal default_num_items, num_elements(first_sitemaps_model_file, 'changefreq')
+    end
+
+    should 'contain several priority elements' do
+      generate_one_sitemap_model_file(:priority => 0.2)
+      assert_equal default_num_items, num_elements(first_sitemaps_model_file, 'priority')
+    end
+
+    should 'have a change frequency of weekly by default' do
+      generate_one_sitemap_model_file
+      assert_equal 'weekly', elements(first_sitemaps_model_file, 'changefreq').first.text
+    end
+
+    should 'have a change frequency of daily' do
+      generate_one_sitemap_model_file(:change_frequency => 'daily')
+      assert_equal 'daily', elements(first_sitemaps_model_file, 'changefreq').first.text
+    end
+
+    should 'be able to use a lambda to specify change frequency' do
+      generate_one_sitemap_model_file(:change_frequency => lambda {|m| m.change_frequency})
+      assert_equal TestModel.new.change_frequency, elements(first_sitemaps_model_file, 'changefreq').first.text
+    end
+
+    should 'have a priority of 0.2' do
+      generate_one_sitemap_model_file(:priority => 0.2)
+      assert_equal '0.2', elements(first_sitemaps_model_file, 'priority').first.text
+    end
+
+    should 'be able to use a lambda to specify priority' do
+      generate_one_sitemap_model_file(:priority => lambda {|m| m.priority})
+      assert_equal TestModel.new.priority.to_s, elements(first_sitemaps_model_file, 'priority').first.text
+    end
+
+    should 'contain two loc elements' do
+      generate_two_model_sitemap_files
+      assert_equal 2, num_elements(first_sitemaps_model_file, 'loc')
+      assert_equal 2, num_elements(second_sitemaps_model_file, 'loc')
+    end
+
+    should 'contain two lastmod elements' do
+      generate_two_model_sitemap_files
+      assert_equal 2, num_elements(first_sitemaps_model_file, 'lastmod')
+      assert_equal 2, num_elements(second_sitemaps_model_file, 'lastmod')
+    end
+
+    should 'contain two changefreq elements' do
+      generate_two_model_sitemap_files
+      assert_equal 2, num_elements(first_sitemaps_model_file, 'changefreq')
+      assert_equal 2, num_elements(second_sitemaps_model_file, 'changefreq')
+    end
+
+    should 'contain two priority elements' do
+      generate_two_model_sitemap_files(:priority => 0.2)
+      assert_equal 2, num_elements(first_sitemaps_model_file, 'priority')
+      assert_equal 2, num_elements(second_sitemaps_model_file, 'priority')
+    end
+
+    should 'strip leading slashes from controller paths' do
+      create_sitemap
+      @sitemap.add(TestModel, :path => '/test_controller').generate
+      assert(
+        !elements(first_sitemaps_model_file, 'loc').first.text.match(/\/\/test_controller\//),
+        'URL does not contain a double-slash before the controller path'
+      )
+    end
+
+    should 'not be gzipped' do
+      generate_one_sitemap_model_file(:gzip => false)
+      assert File.exists?(unzipped_first_sitemaps_model_file)
+    end
+  end
+
+  context 'add method' do
+    should 'be chainable' do
+      create_sitemap
+      assert_equal BigSitemap, @sitemap.add(TestModel).class
+    end
+  end
+
+  context 'clean method' do
+    should 'be chainable' do
+      create_sitemap
+      assert_equal BigSitemap, @sitemap.clean.class
+    end
+
+    should 'clean all sitemap files' do
+      generate_sitemap_files
+      assert Dir.entries(sitemaps_dir).size > 2, "#{sitemaps_dir} is not empty" # ['.', '..'].size == 2
+      @sitemap.clean
+      assert_equal 2, Dir.entries(sitemaps_dir).size, "#{sitemaps_dir} is empty"
+    end
+  end
+
+  context 'generate method' do
+    should 'be chainable' do
+      create_sitemap
+      assert_equal BigSitemap, @sitemap.generate.class
+    end
+  end
+
+  private
+  def delete_tmp_files
+    FileUtils.rm_rf(sitemaps_dir)
+  end
+
+  def create_sitemap(options={})
+    @sitemap = BigSitemap.new({
+      :base_url      => 'http://example.com',
+      :document_root => tmp_dir,
+      :update_google => false
+    }.update(options))
+  end
+
+  def generate_sitemap_files(options={})
+    create_sitemap(options)
+    add_model
+    @sitemap.generate
+  end
+
+  def generate_one_sitemap_model_file(options={})
+    change_frequency = options.delete(:change_frequency)
+    priority = options.delete(:priority)
+    create_sitemap(options.merge(:max_per_sitemap => default_num_items, :batch_size => default_num_items))
+    add_model(:change_frequency => change_frequency, :priority => priority)
+    @sitemap.generate
+  end
+
+  def generate_two_model_sitemap_files(options={})
+    change_frequency = options.delete(:change_frequency)
+    priority = options.delete(:priority)
+    create_sitemap(options.merge(:max_per_sitemap => 2, :batch_size => 1))
+    add_model(:num_items => 4, :change_frequency => change_frequency, :priority => priority)
+    @sitemap.generate
+  end
+
+  def add_model(options={})
+    num_items = options.delete(:num_items) || default_num_items
+    TestModel.stubs(:num_items).returns(num_items)
+    @sitemap.add(TestModel, options)
+  end
+
+  def default_num_items
+    10
+  end
+
+  def sitemaps_index_file
+    "#{unzipped_sitemaps_index_file}.gz"
+  end
+
+  def unzipped_sitemaps_index_file
+    "#{sitemaps_dir}/sitemap_index.xml"
+  end
+
+  def unzipped_first_sitemaps_model_file
+    "#{sitemaps_dir}/sitemap_test_models.xml"
+  end
+
+  def first_sitemaps_model_file
+    "#{sitemaps_dir}/sitemap_test_models.xml.gz"
+  end
+
+  def second_sitemaps_model_file
+    "#{sitemaps_dir}/sitemap_test_models_1.xml.gz"
+  end
+
+  def third_sitemaps_model_file
+    "#{sitemaps_dir}/sitemap_test_models_2.xml.gz"
+  end
+
+  def sitemaps_dir
+    "#{tmp_dir}/sitemaps"
+  end
+
+  def tmp_dir
+    '/tmp'
+  end
+
+  def ns
+    {'s' => 'http://www.sitemaps.org/schemas/sitemap/0.9'}
+  end
+
+  def elements(filename, el)
+    data = Nokogiri::XML.parse(Zlib::GzipReader.open(filename).read)
+    data.search("//s:#{el}", ns)
+  end
+
+  def num_elements(filename, el)
+    elements(filename, el).size
+  end
+end
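The tests read generated files with Nokogiri. When inspecting output by hand without that dependency, the standard library can do the same job. The sketch below assumes Zlib and REXML are available (REXML ships as a bundled gem since Ruby 3.0) and writes a small sample sitemap to a temp file so it runs on its own; `count_sitemap_elements` is a hypothetical helper mirroring the test's `num_elements`.

```ruby
require 'zlib'
require 'rexml/document'
require 'tempfile'

SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9'

# Counts elements named `name` in the sitemap namespace inside a gzipped
# sitemap file, using only Zlib and REXML.
def count_sitemap_elements(filename, name)
  xml = Zlib::GzipReader.open(filename) { |gz| gz.read }
  doc = REXML::Document.new(xml)
  # Map the prefix "s" to the default sitemap namespace, as the tests do.
  REXML::XPath.match(doc, "//s:#{name}", 's' => SITEMAP_NS).size
end

# Write a tiny two-URL sitemap so the example is self-contained.
sample = <<-XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/test_models/1</loc><changefreq>weekly</changefreq></url>
  <url><loc>http://example.com/test_models/2</loc><changefreq>weekly</changefreq></url>
</urlset>
XML

file = Tempfile.new(['sitemap_test_models', '.xml.gz'])
Zlib::GzipWriter.open(file.path) { |gz| gz.write(sample) }
puts count_sitemap_elements(file.path, 'loc')
```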
data/test/fixtures/test_model.rb
ADDED
@@ -0,0 +1,30 @@
+class TestModel
+  def to_param
+    object_id
+  end
+
+  def change_frequency
+    'monthly'
+  end
+
+  def priority
+    0.8
+  end
+
+  class << self
+    def count_for_sitemap
+      self.find_for_sitemap.size
+    end
+
+    def num_items
+      10
+    end
+
+    def find_for_sitemap(options={})
+      instances = []
+      num_times = options.delete(:limit) || self.num_items
+      num_times.times { instances.push(self.new) }
+      instances
+    end
+  end
+end
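TestModel's per-instance `change_frequency` and `priority` exist so the tests can pass lambdas. The rule #generate applies is small enough to isolate: a Proc is called with the record, any other value is used directly, and change frequency falls back to 'weekly' while priority has no default. The sketch below reproduces that rule with a hypothetical `resolve_option` helper and a Struct standing in for a model instance.

```ruby
# Hypothetical helper mirroring how BigSitemap#generate treats
# :change_frequency and :priority: a Proc (e.g. a lambda) is called with the
# record; anything else is used as-is. `||=` matches the gem's `||` fallback.
def resolve_option(value, record, default = nil)
  value ||= default
  value.is_a?(Proc) ? value.call(record) : value
end

# Stand-in record exposing the same per-record methods as TestModel.
Record = Struct.new(:change_frequency, :priority)
record = Record.new('monthly', 0.8)

puts resolve_option(nil, record, 'weekly')                       # falls back to the default
puts resolve_option('daily', record, 'weekly')                   # fixed value
puts resolve_option(lambda { |r| r.change_frequency }, record)   # computed per record
puts resolve_option(lambda { |r| r.priority }, record)
```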
data/test/test_helper.rb
ADDED
metadata
ADDED
@@ -0,0 +1,82 @@
+--- !ruby/object:Gem::Specification
+name: pj_nitin-big_sitemap
+version: !ruby/object:Gem::Version
+  version: 0.3.1
+platform: ruby
+authors:
+- Alex Rabarts
+autorequire:
+bindir: bin
+cert_chain: []
+
+date: 2009-04-17 00:00:00 -07:00
+default_executable:
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: builder
+  type: :runtime
+  version_requirement:
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 2.1.2
+    version:
+- !ruby/object:Gem::Dependency
+  name: activesupport
+  type: :runtime
+  version_requirement:
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: "0"
+    version:
+description: (Now depends on activesupport) A Sitemap generator specifically designed for large sites (although it works equally well with small sites)
+email: alexrabarts@gmail.com
+executables: []
+
+extensions: []
+
+extra_rdoc_files:
+- README.rdoc
+- LICENSE
+files:
+- History.txt
+- README.rdoc
+- VERSION.yml
+- lib/big_sitemap.rb
+- test/big_sitemap_test.rb
+- test/fixtures
+- test/fixtures/test_model.rb
+- test/test_helper.rb
+- LICENSE
+has_rdoc: true
+homepage: http://github.com/pj_nitin/big_sitemap
+post_install_message:
+rdoc_options:
+- --inline-source
+- --charset=UTF-8
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+  version:
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+  version:
+requirements: []
+
+rubyforge_project:
+rubygems_version: 1.2.0
+signing_key:
+specification_version: 2
+summary: A Sitemap generator specifically designed for large sites (although it works equally well with small sites)
+test_files: []
+