alexrabarts-big_sitemap 0.2.1 → 0.3.0
- data/History.txt +11 -0
- data/LICENSE +22 -0
- data/README.rdoc +63 -57
- data/VERSION.yml +2 -2
- data/lib/big_sitemap.rb +169 -130
- data/test/big_sitemap_test.rb +105 -33
- data/test/fixtures/test_model.rb +8 -0
- metadata +6 -4
data/History.txt
CHANGED
@@ -1,3 +1,14 @@
|
|
1
|
+
=== 0.3.0 / 2009-04-06
|
2
|
+
|
3
|
+
* API change: Pass model through as first argument to add method, e.g. sitemap.add(Posts, {:path => 'articles'})
|
4
|
+
* API change: Use Rails' polymorphic_url helper to generate URLs if Rails is being used
|
5
|
+
* API change: Only ping search engines when ping_search_engines is explicitly called
|
6
|
+
* Add support for passing options through to the model's find method, e.g. :conditions
|
7
|
+
* Allow base URL to be specified as a hash as well as a string
|
8
|
+
* Add support for changefreq and priority
|
9
|
+
* Pluralize sitemap model filenames
|
10
|
+
* GZipping may optionally be turned off
|
11
|
+
|
1
12
|
=== 0.2.1 / 2009-03-12
|
2
13
|
|
3
14
|
* Normalize path arguments so it no longer matters whether a leading slash is used or not
|
data/LICENSE
ADDED
@@ -0,0 +1,22 @@
|
|
1
|
+
(The MIT License)
|
2
|
+
|
3
|
+
Copyright (c) 2009 Stateless Systems (http://statelesssystems.com)
|
4
|
+
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining
|
6
|
+
a copy of this software and associated documentation files (the
|
7
|
+
'Software'), to deal in the Software without restriction, including
|
8
|
+
without limitation the rights to use, copy, modify, merge, publish,
|
9
|
+
distribute, sublicense, and/or sell copies of the Software, and to
|
10
|
+
permit persons to whom the Software is furnished to do so, subject to
|
11
|
+
the following conditions:
|
12
|
+
|
13
|
+
The above copyright notice and this permission notice shall be
|
14
|
+
included in all copies or substantial portions of the Software.
|
15
|
+
|
16
|
+
THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
|
17
|
+
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
18
|
+
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
|
19
|
+
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
|
20
|
+
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
|
21
|
+
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
|
22
|
+
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/README.rdoc
CHANGED
@@ -1,104 +1,110 @@
|
|
1
1
|
= BigSitemap
|
2
2
|
|
3
|
-
|
3
|
+
BigSitemap is a Sitemap (http://sitemaps.org) generator suitable for applications with greater than 50,000 URLs. It splits large Sitemaps into multiple files, gzips the files to minimize bandwidth usage, batches database queries to minimize memory usage, can be set up with just a few lines of code and is compatible with just about any framework.
|
4
4
|
|
5
|
-
BigSitemap is
|
5
|
+
BigSitemap is best run periodically through a Rake/Thor task.
|
6
6
|
|
7
|
-
|
7
|
+
sitemap = BigSitemap.new(:url_options => {:host => 'example.com'})
|
8
8
|
|
9
|
-
|
9
|
+
# Add a model
|
10
|
+
sitemap.add Product
|
10
11
|
|
11
|
-
|
12
|
+
# Add another model with some options
|
13
|
+
sitemap.add(Post, {
|
14
|
+
:conditions => {:published => true},
|
15
|
+
:path => 'articles',
|
16
|
+
:change_frequency => 'daily',
|
17
|
+
:priority => 0.5
|
18
|
+
})
|
12
19
|
|
13
|
-
|
14
|
-
|
15
|
-
gem install alexrabarts-big_sitemap -s http://gems.github.com
|
16
|
-
|
17
|
-
== SYNOPSIS
|
20
|
+
# Generate the files
|
21
|
+
sitemap.generate
|
18
22
|
|
19
|
-
The
|
23
|
+
The code above will create a minimum of three files:
|
20
24
|
|
21
|
-
|
25
|
+
1. public/sitemaps/sitemap_index.xml.gz
|
26
|
+
2. public/sitemaps/sitemap_products.xml.gz
|
27
|
+
3. public/sitemaps/sitemap_posts.xml.gz
|
22
28
|
|
23
|
-
|
29
|
+
If your sitemaps grow beyond 50,000 URLs (this limit can be overridden with the <code>:max_per_sitemap</code> option), the sitemap files will be partitioned into multiple files (<code>sitemap_products_1.xml.gz</code>, <code>sitemap_products_2.xml.gz</code>, …).
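The partitioning described above comes down to a ceiling division. The following is a minimal sketch, not BigSitemap's own code: <code>sitemap_filenames</code> is a hypothetical helper, and it ignores the batch-size handling that the library performs alongside this calculation.

```ruby
# Hypothetical helper (not part of BigSitemap): lists the files that would be
# written for one model, given a record count and a per-file URL limit.
def sitemap_filenames(name, count, max_per_sitemap = 50_000, gzip = true)
  ext = gzip ? '.xml.gz' : '.xml'
  # Ceiling division; a model with no records is assumed to yield one file.
  num_sitemaps = [(count.to_f / max_per_sitemap).ceil, 1].max
  # A single file carries no numeric suffix.
  return ["sitemap_#{name}#{ext}"] if num_sitemaps == 1
  (1..num_sitemaps).map { |i| "sitemap_#{name}_#{i}#{ext}" }
end

puts sitemap_filenames('products', 120_000)
# sitemap_products_1.xml.gz
# sitemap_products_2.xml.gz
# sitemap_products_3.xml.gz
```

With the default limit, 120,000 products need three files, while anything at or below 50,000 stays in a single unnumbered file.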
|
24
30
|
|
25
|
-
|
26
|
-
sitemap.add(:model => Posts, :path => 'articles')
|
27
|
-
sitemap.add(:model => Comments, :path => 'comments')
|
28
|
-
sitemap.generate
|
31
|
+
If you're using Rails then the URLs for each database record are generated with the <code>polymorphic_url</code> helper. That means that the URL for a record will be exactly what you would expect: generated with respect to the routing setup of your app. In other contexts where this helper isn't available, the URLs are generated in the form:
|
29
32
|
|
30
|
-
|
33
|
+
:base_url/:path/:to_param
|
31
34
|
|
32
|
-
|
35
|
+
If the <code>to_param</code> method does not exist, then <code>id</code> will be used.
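As an illustration of the <code>:base_url/:path/:to_param</code> form and the <code>id</code> fallback, here is a small sketch. <code>record_url</code>, <code>SlugPost</code> and <code>PlainRecord</code> are hypothetical names made up for this example; the library's own implementation may differ in detail.

```ruby
# Hypothetical helper: builds a URL in the :base_url/:path/:to_param form.
def record_url(base_url, path, record)
  # Prefer to_param; fall back to id when to_param is not defined.
  param_method = [:to_param, :id].find { |m| record.respond_to?(m) }
  raise ArgumentError, "#{record.class} must provide to_param or id" if param_method.nil?
  # Trim stray slashes so 'articles' and '/articles' give the same result.
  "#{base_url.chomp('/')}/#{path.sub(%r{\A/}, '')}/#{record.public_send(param_method)}"
end

# Hypothetical records: one defines to_param, the other only has an id.
SlugPost = Struct.new(:id, :slug) do
  def to_param
    slug
  end
end
PlainRecord = Struct.new(:id)

puts record_url('http://example.com', '/articles', SlugPost.new(1, 'hello-world'))
# http://example.com/articles/hello-world
puts record_url('http://example.com', 'comments', PlainRecord.new(42))
# http://example.com/comments/42
```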
|
33
36
|
|
34
|
-
|
37
|
+
== Install
|
35
38
|
|
36
|
-
|
39
|
+
Via gem:
|
37
40
|
|
38
|
-
|
41
|
+
gem install alexrabarts-big_sitemap -s http://gems.github.com
|
39
42
|
|
40
|
-
|
43
|
+
== Advanced
|
41
44
|
|
42
|
-
|
43
|
-
:base_url/:path/:id (if to_param does not exist)
|
45
|
+
=== Options
|
44
46
|
|
45
|
-
|
47
|
+
* <code>:url_options</code> -- hash with <code>:host</code>, optionally <code>:port</code> and <code>:protocol</code>
|
48
|
+
* <code>:base_url</code> -- string alternative to <code>:url_options</code>, e.g. "https://example.com:8080/"
|
49
|
+
* <code>:document_root</code> -- string, defaults to the <code>public</code> directory under <code>Rails.root</code> or <code>Merb.root</code> if available
|
50
|
+
* <code>:path</code> -- string defaults to 'sitemaps', which places sitemap files under the <code>/sitemaps</code> directory
|
51
|
+
* <code>:max_per_sitemap</code> -- <code>50000</code>, the maximum number of URLs the Sitemaps protocol allows per file; it can be set lower
|
52
|
+
* <code>:batch_size</code> -- <code>1001</code> (not <code>1000</code> due to a bug in DataMapper)
|
53
|
+
* <code>:gzip</code> -- <code>true</code>
|
54
|
+
* <code>:ping_google</code> -- <code>true</code>
|
55
|
+
* <code>:ping_yahoo</code> -- <code>false</code>, needs <code>:yahoo_app_id</code>
|
56
|
+
* <code>:ping_msn</code> -- <code>false</code>
|
57
|
+
* <code>:ping_ask</code> -- <code>false</code>
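To show how a <code>:base_url</code> string relates to a <code>:url_options</code> hash, the sketch below splits a URL into host, port and protocol and then rebuilds the root URL. Both helper names are hypothetical, and omitting port 443 for https is an assumption of this sketch rather than documented behaviour.

```ruby
require 'uri'

# Hypothetical helper: derive a :url_options-style hash from a base URL string.
def url_options_from(base_url)
  uri = URI.parse(base_url)
  { :host => uri.host, :port => uri.port, :protocol => uri.scheme }
end

# Hypothetical helper: rebuild the root URL, leaving out the default port.
def root_url_from(url_options)
  protocol = url_options[:protocol] || 'http'
  url = "#{protocol}://#{url_options[:host]}"
  port = url_options[:port]
  default_port = protocol == 'https' ? 443 : 80 # assumption for this sketch
  url += ":#{port}" if port && port != default_port
  url
end

puts root_url_from(url_options_from('https://example.com:8080/'))
# https://example.com:8080
puts root_url_from(:host => 'example.com')
# http://example.com
```

Under these assumptions, <code>:base_url => 'http://example.com'</code> and <code>:url_options => {:host => 'example.com'}</code> lead to the same root URL.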
|
46
58
|
|
47
|
-
|
59
|
+
=== Chaining
|
48
60
|
|
49
|
-
|
61
|
+
You can chain methods together. You could even get away with as little code as:
|
50
62
|
|
51
|
-
|
63
|
+
BigSitemap.new(:url_options => {:host => 'example.com'}).add(Post).generate
|
52
64
|
|
53
|
-
|
65
|
+
=== Pinging Search Engines
|
54
66
|
|
55
|
-
|
67
|
+
To ping search engines, call <code>ping_search_engines</code> after you generate the sitemap:
|
56
68
|
|
57
|
-
|
58
|
-
|
59
|
-
=== Maximum Number of URLs
|
60
|
-
|
61
|
-
Sitemaps will be split across several files if more than 50,000 records are returned. You can customize this limit with the <code>:max_per_sitemap</code> option:
|
69
|
+
sitemap.generate
|
70
|
+
sitemap.ping_search_engines
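A ping is a plain HTTP GET with the escaped sitemap index URL in the query string. The sketch below only builds the request paths and performs no network calls. <code>ping_paths</code> is a hypothetical helper, the hosts and paths are the ones that appear in the 0.3.0 source, and <code>URI.encode_www_form_component</code> is swapped in here for the <code>CGI::escape</code> call the library uses. These endpoints date from 2009 and may no longer be available.

```ruby
require 'uri'

# Hypothetical helper: maps each search engine host to the GET path
# that would announce the given sitemap index URL.
def ping_paths(sitemap_index_url)
  escaped = URI.encode_www_form_component(sitemap_index_url)
  {
    'www.google.com'      => "/webmasters/tools/ping?sitemap=#{escaped}",
    'webmaster.live.com'  => "/ping.aspx?siteMap=#{escaped}",
    'submissions.ask.com' => "/ping?sitemap=#{escaped}"
  }
end

ping_paths('http://example.com/sitemaps/sitemap_index.xml.gz').each do |host, path|
  puts "#{host}#{path}"
end
```

Escaping matters because the sitemap URL contains <code>:</code> and <code>/</code> characters that would otherwise be read as part of the ping URL itself.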
|
62
71
|
|
63
|
-
|
72
|
+
=== Change Frequency and Priority
|
64
73
|
|
65
|
-
|
74
|
+
You can control "changefreq" and "priority" values for each record individually by passing lambdas instead of fixed values:
|
66
75
|
|
67
|
-
|
76
|
+
sitemap.add(Posts,
|
77
|
+
:change_frequency => lambda {|post| ... },
|
78
|
+
:priority => lambda {|post| ... }
|
79
|
+
)
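The lambdas receive the record and return the value to write. As a rough sketch of how such an option can be resolved per record, consider the following; <code>resolve_option</code> and <code>Article</code> are hypothetical names used only for this illustration.

```ruby
# Hypothetical helper: an option is either a fixed value or a lambda
# that is called with the record.
def resolve_option(option, record)
  option.is_a?(Proc) ? option.call(record) : option
end

# Hypothetical record type for the example.
Article = Struct.new(:title, :comments_count)

change_frequency = lambda { |article| article.comments_count > 10 ? 'daily' : 'monthly' }
priority         = lambda { |article| article.comments_count > 10 ? 0.8 : 0.3 }

busy  = Article.new('Busy thread', 25)
quiet = Article.new('Quiet thread', 1)

puts resolve_option(change_frequency, busy)  # daily
puts resolve_option(priority, quiet)         # 0.3
puts resolve_option('weekly', quiet)         # weekly
```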
|
68
80
|
|
69
|
-
|
81
|
+
=== Find Methods
|
70
82
|
|
71
|
-
|
83
|
+
Your models must provide either a <code>find_for_sitemap</code> or <code>all</code> class method that returns the instances that are to be included in the sitemap.
|
72
84
|
|
73
|
-
|
85
|
+
Additionally, your models must provide a <code>count_for_sitemap</code> or <code>count</code> class method that returns a count of the instances to be included.
|
74
86
|
|
75
|
-
|
76
|
-
:base_url => 'http://example.com',
|
77
|
-
:ping_google => false,
|
78
|
-
:ping_yahoo => false,
|
79
|
-
:ping_msn => false,
|
80
|
-
:ping_ask => false
|
81
|
-
)
|
87
|
+
If you're using ActiveRecord (Rails) or DataMapper then <code>all</code> and <code>count</code> are already provided and you don't need to do anything unless you want to include a subset of records. If you provide your own <code>find_for_sitemap</code> or <code>all</code> method then it should be able to handle the <code>:offset</code> and <code>:limit</code> options, in the same way that ActiveRecord and DataMapper handle them. This is especially important if you have more than 50,000 URLs.
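For a model that is not backed by ActiveRecord or DataMapper, the two class methods might look like the sketch below. <code>StaticPage</code> and its in-memory records are hypothetical. Judging from the 0.3.0 source, the options given to <code>add</code> (such as <code>:path</code>) appear to be passed along to the find method as well, so the sketch simply ignores keys it does not know.

```ruby
# Hypothetical model backed by an in-memory array instead of a database.
class StaticPage
  Record = Struct.new(:id, :updated_at)
  RECORDS = (1..5).map { |i| Record.new(i, Time.now) }

  # Number of records that belong in the sitemap.
  def self.count_for_sitemap
    RECORDS.size
  end

  # Returns one batch of records, honouring :offset and :limit
  # and ignoring any other option keys.
  def self.find_for_sitemap(options = {})
    offset = options[:offset] || 0
    limit  = options[:limit] || RECORDS.size
    RECORDS[offset, limit] || []
  end
end

puts StaticPage.count_for_sitemap
# 5
puts StaticPage.find_for_sitemap(:offset => 2, :limit => 2).map(&:id).inspect
# [3, 4]
```

Returning an empty array for an out-of-range offset keeps batched iteration from failing on the last page.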
|
82
88
|
|
83
|
-
|
89
|
+
=== Cleaning the Sitemaps Directory
|
84
90
|
|
85
|
-
|
91
|
+
Calling the <code>clean</code> method will remove all sitemap files (<code>sitemap_*.xml</code> and <code>sitemap_*.xml.gz</code>) from the sitemaps directory.
|
86
92
|
|
87
|
-
==
|
93
|
+
== Limitations
|
88
94
|
|
89
95
|
If your database is likely to shrink during the time it takes to create the sitemap then you might run into problems (the final, batched SQL select will overrun by setting a limit that is too large since it is calculated from the count, which is queried at the very beginning). Patches welcome!
|
90
96
|
|
91
97
|
== TODO
|
92
98
|
|
93
|
-
|
94
|
-
* Support for <code>changefreq</code> (currently hard-coded to <code>weekly</code>)
|
99
|
+
Tests for Rails components.
|
95
100
|
|
96
|
-
==
|
101
|
+
== Credits
|
97
102
|
|
98
103
|
Thanks to Alastair Brunton and Harry Love, whose work provided a starting point for this library.
|
99
104
|
http://scoop.cheerfactory.co.uk/2008/02/26/google-sitemap-generator/
|
100
105
|
|
101
|
-
|
106
|
+
Thanks to Mislav Marohnić for contributing patches.
|
102
107
|
|
103
|
-
Copyright
|
108
|
+
== Copyright
|
104
109
|
|
110
|
+
Copyright (c) 2009 Stateless Systems (http://statelesssystems.com). See LICENSE for details.
|
data/VERSION.yml
CHANGED
data/lib/big_sitemap.rb
CHANGED
@@ -1,82 +1,100 @@
|
|
1
|
-
require 'net/http'
|
2
1
|
require 'uri'
|
3
2
|
require 'zlib'
|
4
3
|
require 'builder'
|
5
4
|
require 'extlib'
|
6
5
|
|
7
6
|
class BigSitemap
|
7
|
+
DEFAULTS = {
|
8
|
+
:max_per_sitemap => 50000,
|
9
|
+
:batch_size => 1001,
|
10
|
+
:path => 'sitemaps',
|
11
|
+
:gzip => true,
|
12
|
+
|
13
|
+
# opinionated
|
14
|
+
:ping_google => true,
|
15
|
+
:ping_yahoo => false, # needs :yahoo_app_id
|
16
|
+
:ping_msn => false,
|
17
|
+
:ping_ask => false
|
18
|
+
}
|
19
|
+
|
20
|
+
COUNT_METHODS = [:count_for_sitemap, :count]
|
21
|
+
FIND_METHODS = [:find_for_sitemap, :all]
|
22
|
+
TIMESTAMP_METHODS = [:updated_at, :updated_on, :updated, :created_at, :created_on, :created]
|
23
|
+
PARAM_METHODS = [:to_param, :id]
|
24
|
+
|
25
|
+
include ActionController::UrlWriter if defined? Rails
|
26
|
+
|
8
27
|
def initialize(options)
|
9
|
-
|
28
|
+
@options = DEFAULTS.merge options
|
29
|
+
|
30
|
+
# Use Rails' default_url_options if available
|
31
|
+
@default_url_options = defined?(Rails) ? default_url_options : {}
|
32
|
+
|
33
|
+
if @options[:url_options]
|
34
|
+
@default_url_options.update @options[:url_options]
|
35
|
+
elsif @options[:base_url]
|
36
|
+
uri = URI.parse(@options[:base_url])
|
37
|
+
@default_url_options[:host] = uri.host
|
38
|
+
@default_url_options[:port] = uri.port
|
39
|
+
@default_url_options[:protocol] = uri.scheme
|
40
|
+
else
|
41
|
+
raise ArgumentError, 'you must specify either ":url_options" hash or ":base_url" string'
|
42
|
+
end
|
43
|
+
|
44
|
+
if @options[:batch_size] > @options[:max_per_sitemap]
|
45
|
+
raise ArgumentError, '":batch_size" must be less than or equal to ":max_per_sitemap"'
|
46
|
+
end
|
10
47
|
|
11
|
-
|
12
|
-
if defined?
|
13
|
-
|
48
|
+
@options[:document_root] ||= begin
|
49
|
+
if defined? Rails
|
50
|
+
"#{Rails.root}/public"
|
14
51
|
elsif defined? Merb
|
15
|
-
|
52
|
+
"#{Merb.root}/public"
|
16
53
|
end
|
17
54
|
end
|
18
55
|
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
@max_per_sitemap = options.delete(:max_per_sitemap) || 50000
|
23
|
-
@batch_size = options.delete(:batch_size) || 1001 # TODO: Set this to 1000 once DM offset 37000 bug is fixed
|
24
|
-
@web_path = strip_leading_slash(options.delete(:path) || 'sitemaps')
|
25
|
-
@ping_google = options[:ping_google].nil? ? true : options.delete(:ping_google)
|
26
|
-
@ping_yahoo = options[:ping_yahoo].nil? ? true : options.delete(:ping_yahoo)
|
27
|
-
@yahoo_app_id = options.delete(:yahoo_app_id)
|
28
|
-
@ping_msn = options[:ping_msn].nil? ? true : options.delete(:ping_msn)
|
29
|
-
@ping_ask = options[:ping_ask].nil? ? true : options.delete(:ping_ask)
|
30
|
-
@file_path = "#{document_root}/#{@web_path}"
|
31
|
-
@sources = []
|
32
|
-
|
33
|
-
raise ArgumentError, "Base URL must be specified with the :base_url option" if @base_url.nil?
|
34
|
-
|
35
|
-
raise(
|
36
|
-
ArgumentError,
|
37
|
-
'Batch size (:batch_size) must be less than or equal to maximum URLs per sitemap (:max_per_sitemap)'
|
38
|
-
) if @batch_size > @max_per_sitemap
|
56
|
+
unless @options[:document_root]
|
57
|
+
raise ArgumentError, 'Document root must be specified with the ":document_root" option'
|
58
|
+
end
|
39
59
|
|
60
|
+
@file_path = "#{@options[:document_root]}/#{strip_leading_slash(@options[:path])}"
|
40
61
|
Dir.mkdir(@file_path) unless File.exists? @file_path
|
62
|
+
|
63
|
+
@sources = []
|
64
|
+
@sitemap_files = []
|
41
65
|
end
|
42
66
|
|
43
|
-
def add(options)
|
44
|
-
|
45
|
-
@sources << options.
|
46
|
-
self
|
67
|
+
def add(model, options={})
|
68
|
+
options[:path] ||= Extlib::Inflection.tableize(model.to_s)
|
69
|
+
@sources << [model, options.dup]
|
70
|
+
return self
|
47
71
|
end
|
48
72
|
|
49
73
|
def clean
|
50
|
-
|
51
|
-
|
52
|
-
f = "#{@file_path}/#{f}"
|
53
|
-
File.delete(f) if File.file?(f)
|
54
|
-
end
|
74
|
+
Dir["#{@file_path}/sitemap_*.{xml,xml.gz}"].each do |file|
|
75
|
+
FileUtils.rm file
|
55
76
|
end
|
56
|
-
self
|
77
|
+
return self
|
57
78
|
end
|
58
79
|
|
59
80
|
def generate
|
60
|
-
@sources
|
61
|
-
|
81
|
+
for model, options in @sources
|
82
|
+
count_method = pick_method(model, COUNT_METHODS)
|
83
|
+
find_method = pick_method(model, FIND_METHODS)
|
84
|
+
raise ArgumentError, "#{model} must provide a count_for_sitemap class method" if count_method.nil?
|
85
|
+
raise ArgumentError, "#{model} must provide a find_for_sitemap class method" if find_method.nil?
|
62
86
|
|
63
|
-
|
64
|
-
find_method = pick_method(klass, [:find_for_sitemap, :all])
|
65
|
-
raise ArgumentError, "#{klass} must provide a count_for_sitemap class method" if count_method.nil?
|
66
|
-
raise ArgumentError, "#{klass} must provide a find_for_sitemap class method" if find_method.nil?
|
67
|
-
|
68
|
-
count = klass.send(count_method)
|
87
|
+
count = model.send(count_method)
|
69
88
|
num_sitemaps = 1
|
70
89
|
num_batches = 1
|
71
90
|
|
72
|
-
if count > @batch_size
|
73
|
-
num_batches = (count.to_f / @batch_size.to_f).ceil
|
74
|
-
num_sitemaps = (count.to_f / @max_per_sitemap.to_f).ceil
|
91
|
+
if count > @options[:batch_size]
|
92
|
+
num_batches = (count.to_f / @options[:batch_size].to_f).ceil
|
93
|
+
num_sitemaps = (count.to_f / @options[:max_per_sitemap].to_f).ceil
|
75
94
|
end
|
76
95
|
batches_per_sitemap = num_batches.to_f / num_sitemaps.to_f
|
77
96
|
|
78
|
-
|
79
|
-
source[:num_sitemaps] = num_sitemaps
|
97
|
+
find_options = options.dup
|
80
98
|
|
81
99
|
for sitemap_num in 1..num_sitemaps
|
82
100
|
# Work out the start and end batch numbers for this sitemap
|
@@ -84,126 +102,147 @@ class BigSitemap
|
|
84
102
|
batch_num_end = (batch_num_start + [batches_per_sitemap, num_batches].min).floor - 1
|
85
103
|
|
86
104
|
# Stream XML output to a file
|
87
|
-
filename = "sitemap_#{Extlib::Inflection::
|
105
|
+
filename = "sitemap_#{Extlib::Inflection::tableize(model.to_s)}"
|
88
106
|
filename << "_#{sitemap_num}" if num_sitemaps > 1
|
89
107
|
|
90
|
-
|
108
|
+
f = xml_open(filename)
|
91
109
|
|
92
|
-
xml = Builder::XmlMarkup.new(:target =>
|
110
|
+
xml = Builder::XmlMarkup.new(:target => f)
|
93
111
|
xml.instruct!
|
94
112
|
xml.urlset(:xmlns => 'http://www.sitemaps.org/schemas/sitemap/0.9') do
|
95
113
|
for batch_num in batch_num_start..batch_num_end
|
96
|
-
offset = ((batch_num - 1) * @batch_size)
|
97
|
-
limit = (count - offset) < @batch_size ? (count - offset - 1) : @batch_size
|
98
|
-
find_options
|
99
|
-
|
100
|
-
|
101
|
-
last_mod_method = pick_method(
|
102
|
-
r,
|
103
|
-
[:updated_at, :updated_on, :updated, :created_at, :created_on, :created]
|
104
|
-
)
|
114
|
+
offset = ((batch_num - 1) * @options[:batch_size])
|
115
|
+
limit = (count - offset) < @options[:batch_size] ? (count - offset - 1) : @options[:batch_size]
|
116
|
+
find_options.update(:limit => limit, :offset => offset) if num_batches > 1
|
117
|
+
|
118
|
+
model.send(find_method, find_options).each do |r|
|
119
|
+
last_mod_method = pick_method(r, TIMESTAMP_METHODS)
|
105
120
|
last_mod = last_mod_method.nil? ? Time.now : r.send(last_mod_method)
|
106
121
|
|
107
|
-
param_method = pick_method(r,
|
108
|
-
raise ArgumentError, "#{klass} must provide a to_param instance method" if param_method.nil?
|
122
|
+
param_method = pick_method(r, PARAM_METHODS)
|
109
123
|
|
110
124
|
xml.url do
|
111
|
-
|
125
|
+
location = defined?(Rails) ?
|
126
|
+
polymorphic_url(r) :
|
127
|
+
"#{root_url}/#{strip_leading_slash(options[:path])}/#{r.send(param_method)}"
|
128
|
+
xml.loc(location)
|
129
|
+
|
112
130
|
xml.lastmod(last_mod.strftime('%Y-%m-%d')) unless last_mod.nil?
|
113
|
-
|
131
|
+
|
132
|
+
change_frequency = options[:change_frequency] || 'weekly'
|
133
|
+
xml.changefreq(change_frequency.is_a?(Proc) ? change_frequency.call(r) : change_frequency)
|
134
|
+
|
135
|
+
priority = options[:priority]
|
136
|
+
unless priority.nil?
|
137
|
+
xml.priority(priority.is_a?(Proc) ? priority.call(r) : priority)
|
138
|
+
end
|
114
139
|
end
|
115
140
|
end
|
116
141
|
end
|
117
142
|
end
|
118
143
|
|
119
|
-
|
144
|
+
f.close
|
120
145
|
end
|
121
146
|
|
122
147
|
end
|
123
148
|
|
124
149
|
generate_sitemap_index
|
125
|
-
|
126
|
-
self
|
150
|
+
|
151
|
+
return self
|
127
152
|
end
|
128
153
|
|
129
|
-
|
130
|
-
|
131
|
-
|
154
|
+
def ping_search_engines
|
155
|
+
require 'net/http'
|
156
|
+
require 'cgi'
|
157
|
+
|
158
|
+
sitemap_uri = CGI::escape(url_for_sitemap(@sitemap_files.last))
|
159
|
+
|
160
|
+
if @options[:ping_google]
|
161
|
+
Net::HTTP.get('www.google.com', "/webmasters/tools/ping?sitemap=#{sitemap_uri}")
|
132
162
|
end
|
133
163
|
|
134
|
-
|
135
|
-
|
136
|
-
|
137
|
-
|
138
|
-
|
139
|
-
|
140
|
-
|
164
|
+
if @options[:ping_yahoo]
|
165
|
+
if @options[:yahoo_app_id]
|
166
|
+
Net::HTTP.get(
|
167
|
+
'search.yahooapis.com', "/SiteExplorerService/V1/updateNotification?" +
|
168
|
+
"appid=#{@options[:yahoo_app_id]}&url=#{sitemap_uri}"
|
169
|
+
)
|
170
|
+
else
|
171
|
+
$stderr.puts 'unable to ping Yahoo: no ":yahoo_app_id" provided'
|
141
172
|
end
|
142
|
-
method
|
143
173
|
end
|
144
174
|
|
145
|
-
|
146
|
-
|
175
|
+
if @options[:ping_msn]
|
176
|
+
Net::HTTP.get('webmaster.live.com', "/ping.aspx?siteMap=#{sitemap_uri}")
|
147
177
|
end
|
148
178
|
|
149
|
-
|
150
|
-
'
|
179
|
+
if @options[:ping_ask]
|
180
|
+
Net::HTTP.get('submissions.ask.com', "/ping?sitemap=#{sitemap_uri}")
|
151
181
|
end
|
182
|
+
end
|
152
183
|
|
153
|
-
|
154
|
-
|
155
|
-
|
156
|
-
|
157
|
-
|
158
|
-
|
159
|
-
|
160
|
-
num_sitemaps = source[:num_sitemaps]
|
161
|
-
for i in 1..num_sitemaps
|
162
|
-
loc = "#{@base_url}/#{@web_path}/sitemap_#{Extlib::Inflection::underscore(source[:model].to_s)}"
|
163
|
-
loc << "_#{i}" if num_sitemaps > 1
|
164
|
-
loc << '.xml.gz'
|
165
|
-
|
166
|
-
builder.sitemap do
|
167
|
-
builder.loc(loc)
|
168
|
-
builder.lastmod(Time.now.strftime('%Y-%m-%d'))
|
169
|
-
end
|
170
|
-
end
|
171
|
-
end
|
172
|
-
end
|
173
|
-
|
174
|
-
gz = gz_writer(sitemap_index_filename)
|
175
|
-
gz.write(xml)
|
176
|
-
gz.close
|
184
|
+
def root_url
|
185
|
+
@root_url ||= begin
|
186
|
+
url = ''
|
187
|
+
url << (@default_url_options[:protocol] || 'http')
|
188
|
+
url << '://' unless url.match('://')
|
189
|
+
url << @default_url_options[:host]
|
190
|
+
url << ":#{port}" if port = @default_url_options[:port] and port != 80
|
177
191
|
end
|
192
|
+
end
|
178
193
|
|
179
|
-
|
180
|
-
URI.escape("#{@base_url}/#{@web_path}/#{sitemap_index_filename}")
|
181
|
-
end
|
194
|
+
private
|
182
195
|
|
183
|
-
|
184
|
-
|
185
|
-
|
186
|
-
end
|
196
|
+
def strip_leading_slash(str)
|
197
|
+
str.sub(/^\//, '')
|
198
|
+
end
|
187
199
|
|
188
|
-
|
189
|
-
|
190
|
-
|
200
|
+
def pick_method(model, candidates)
|
201
|
+
method = nil
|
202
|
+
candidates.each do |candidate|
|
203
|
+
if model.respond_to? candidate
|
204
|
+
method = candidate
|
205
|
+
break
|
206
|
+
end
|
191
207
|
end
|
208
|
+
method
|
209
|
+
end
|
192
210
|
|
193
|
-
|
194
|
-
|
195
|
-
|
196
|
-
end
|
211
|
+
def xml_open(filename)
|
212
|
+
filename << '.xml'
|
213
|
+
filename << '.gz' if @options[:gzip]
|
197
214
|
|
198
|
-
|
199
|
-
|
200
|
-
|
215
|
+
file = File.open("#{@file_path}/#{filename}", 'w+')
|
216
|
+
|
217
|
+
@sitemap_files << file.path
|
218
|
+
|
219
|
+
writer = @options[:gzip] ? Zlib::GzipWriter.new(file) : file
|
220
|
+
|
221
|
+
if block_given?
|
222
|
+
yield writer
|
223
|
+
writer.close
|
201
224
|
end
|
202
225
|
|
203
|
-
|
204
|
-
|
205
|
-
|
206
|
-
|
207
|
-
|
226
|
+
writer
|
227
|
+
end
|
228
|
+
|
229
|
+
def url_for_sitemap(path)
|
230
|
+
"#{root_url}/#{File.basename(path)}"
|
231
|
+
end
|
232
|
+
|
233
|
+
# Create a sitemap index document
|
234
|
+
def generate_sitemap_index
|
235
|
+
xml_open 'sitemap_index' do |file|
|
236
|
+
xml = Builder::XmlMarkup.new(:target => file)
|
237
|
+
xml.instruct!
|
238
|
+
xml.sitemapindex(:xmlns => 'http://www.sitemaps.org/schemas/sitemap/0.9') do
|
239
|
+
for path in @sitemap_files[0..-2]
|
240
|
+
xml.sitemap do
|
241
|
+
xml.loc(url_for_sitemap(path))
|
242
|
+
xml.lastmod(Time.now.strftime('%Y-%m-%d'))
|
243
|
+
end
|
244
|
+
end
|
245
|
+
end
|
208
246
|
end
|
209
|
-
end
|
247
|
+
end
|
248
|
+
end
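The optional gzipping can be pictured as wrapping the output file in a <code>Zlib::GzipWriter</code> only when compression is wanted. The following is a minimal standalone sketch using the standard library alone; <code>write_sitemap</code> is a hypothetical helper and is not how the class above structures its file handling.

```ruby
require 'zlib'
require 'tmpdir'

# Hypothetical helper: writes body to <basename>.xml or <basename>.xml.gz
# inside dir and returns the path of the file that was written.
def write_sitemap(dir, basename, body, gzip = true)
  filename = "#{basename}.xml"
  filename += '.gz' if gzip
  path = File.join(dir, filename)
  File.open(path, 'wb') do |file|
    writer = gzip ? Zlib::GzipWriter.new(file) : file
    writer.write(body)
    # finish writes the gzip trailer but leaves the file open;
    # the File.open block closes it afterwards.
    writer.finish if gzip
  end
  path
end

Dir.mktmpdir do |dir|
  body = '<urlset></urlset>'
  zipped = write_sitemap(dir, 'sitemap_posts', body)
  plain  = write_sitemap(dir, 'sitemap_pages', body, false)
  puts File.basename(zipped) # sitemap_posts.xml.gz
  puts File.basename(plain)  # sitemap_pages.xml
  puts Zlib::GzipReader.open(zipped) { |gz| gz.read } == body # true
end
```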
|
data/test/big_sitemap_test.rb
CHANGED
@@ -15,6 +15,14 @@ class BigSitemapTest < Test::Unit::TestCase
|
|
15
15
|
assert_raise(ArgumentError) { BigSitemap.new(:document_root => tmp_dir) }
|
16
16
|
end
|
17
17
|
|
18
|
+
should 'generate the same base URL' do
|
19
|
+
options = {:document_root => tmp_dir}
|
20
|
+
assert_equal(
|
21
|
+
BigSitemap.new(options.merge(:base_url => 'http://example.com')).root_url,
|
22
|
+
BigSitemap.new(options.merge(:url_options => {:host => 'example.com'})).root_url
|
23
|
+
)
|
24
|
+
end
|
25
|
+
|
18
26
|
should 'generate a sitemap index file' do
|
19
27
|
generate_sitemap_files
|
20
28
|
assert File.exists?(sitemaps_index_file)
|
@@ -27,21 +35,14 @@ class BigSitemapTest < Test::Unit::TestCase
|
|
27
35
|
assert File.exists?(single_sitemaps_model_file), "#{single_sitemaps_model_file} exists"
|
28
36
|
end
|
29
37
|
|
30
|
-
should 'generate
|
31
|
-
|
38
|
+
should 'generate two sitemap model files' do
|
39
|
+
generate_two_model_sitemap_files
|
32
40
|
assert File.exists?(first_sitemaps_model_file), "#{first_sitemaps_model_file} exists"
|
33
41
|
assert File.exists?(second_sitemaps_model_file), "#{second_sitemaps_model_file} exists"
|
34
42
|
third_sitemaps_model_file = "#{sitemaps_dir}/sitemap_test_model_3.xml.gz"
|
35
43
|
assert !File.exists?(third_sitemaps_model_file), "#{third_sitemaps_model_file} does not exist"
|
36
44
|
end
|
37
45
|
|
38
|
-
should 'clean all sitemap files' do
|
39
|
-
generate_sitemap_files
|
40
|
-
assert Dir.entries(sitemaps_dir).size > 2, "#{sitemaps_dir} is not empty" # ['.', '..'].size == 2
|
41
|
-
@sitemap.clean
|
42
|
-
assert_equal 2, Dir.entries(sitemaps_dir).size, "#{sitemaps_dir} is empty"
|
43
|
-
end
|
44
|
-
|
45
46
|
context 'Sitemap index file' do
|
46
47
|
should 'contain one sitemapindex element' do
|
47
48
|
generate_sitemap_files
|
@@ -54,79 +55,125 @@ class BigSitemapTest < Test::Unit::TestCase
|
|
54
55
|
end
|
55
56
|
|
56
57
|
should 'contain one loc element' do
|
57
|
-
|
58
|
+
generate_one_sitemap_model_file
|
58
59
|
assert_equal 1, num_elements(sitemaps_index_file, 'loc')
|
59
60
|
end
|
60
61
|
|
61
62
|
should 'contain one lastmod element' do
|
62
|
-
|
63
|
+
generate_one_sitemap_model_file
|
63
64
|
assert_equal 1, num_elements(sitemaps_index_file, 'lastmod')
|
64
65
|
end
|
65
66
|
|
66
67
|
should 'contain two loc elements' do
|
67
|
-
|
68
|
+
generate_two_model_sitemap_files
|
68
69
|
assert_equal 2, num_elements(sitemaps_index_file, 'loc')
|
69
70
|
end
|
70
71
|
|
71
72
|
should 'contain two lastmod elements' do
|
72
|
-
|
73
|
+
generate_two_model_sitemap_files
|
73
74
|
assert_equal 2, num_elements(sitemaps_index_file, 'lastmod')
|
74
75
|
end
|
76
|
+
|
77
|
+
should 'not be gzipped' do
|
78
|
+
generate_sitemap_files(:gzip => false)
|
79
|
+
assert File.exists?(unzipped_sitemaps_index_file)
|
80
|
+
end
|
75
81
|
end
|
76
82
|
|
77
83
|
context 'Sitemap model file' do
|
78
84
|
should 'contain one urlset element' do
|
79
|
-
|
85
|
+
generate_one_sitemap_model_file
|
80
86
|
assert_equal 1, num_elements(single_sitemaps_model_file, 'urlset')
|
81
87
|
end
|
82
88
|
|
83
89
|
should 'contain several loc elements' do
|
84
|
-
|
90
|
+
generate_one_sitemap_model_file
|
85
91
|
assert_equal default_num_items, num_elements(single_sitemaps_model_file, 'loc')
|
86
92
|
end
|
87
93
|
|
88
94
|
should 'contain several lastmod elements' do
|
89
|
-
|
95
|
+
generate_one_sitemap_model_file
|
90
96
|
assert_equal default_num_items, num_elements(single_sitemaps_model_file, 'lastmod')
|
91
97
|
end
|
92
98
|
|
93
99
|
should 'contain several changefreq elements' do
|
94
|
-
|
100
|
+
generate_one_sitemap_model_file
|
95
101
|
assert_equal default_num_items, num_elements(single_sitemaps_model_file, 'changefreq')
|
96
102
|
end
|
97
103
|
|
104
|
+
should 'contain several priority elements' do
|
105
|
+
generate_one_sitemap_model_file(:priority => 0.2)
|
106
|
+
assert_equal default_num_items, num_elements(single_sitemaps_model_file, 'priority')
|
107
|
+
end
|
108
|
+
|
109
|
+
should 'have a change frequency of weekly by default' do
|
110
|
+
generate_one_sitemap_model_file
|
111
|
+
assert_equal 'weekly', elements(single_sitemaps_model_file, 'changefreq').first.text
|
112
|
+
end
|
113
|
+
|
114
|
+
should 'have a change frequency of daily' do
|
115
|
+
generate_one_sitemap_model_file(:change_frequency => 'daily')
|
116
|
+
assert_equal 'daily', elements(single_sitemaps_model_file, 'changefreq').first.text
|
117
|
+
end
|
118
|
+
|
119
|
+
should 'be able to use a lambda to specify change frequency' do
|
120
|
+
generate_one_sitemap_model_file(:change_frequency => lambda {|m| m.change_frequency})
|
121
|
+
assert_equal TestModel.new.change_frequency, elements(single_sitemaps_model_file, 'changefreq').first.text
|
122
|
+
end
|
123
|
+
|
124
|
+
should 'have a priority of 0.2' do
|
125
|
+
generate_one_sitemap_model_file(:priority => 0.2)
|
126
|
+
assert_equal '0.2', elements(single_sitemaps_model_file, 'priority').first.text
|
127
|
+
end
|
128
|
+
|
129
|
+
should 'be able to use a lambda to specify priority' do
|
130
|
+
generate_one_sitemap_model_file(:priority => lambda {|m| m.priority})
|
131
|
+
assert_equal TestModel.new.priority.to_s, elements(single_sitemaps_model_file, 'priority').first.text
|
132
|
+
end
|
133
|
+
|
98
134
|
should 'contain one loc element' do
|
99
|
-
|
135
|
+
generate_two_model_sitemap_files
|
100
136
|
assert_equal 1, num_elements(first_sitemaps_model_file, 'loc')
|
101
137
|
assert_equal 1, num_elements(second_sitemaps_model_file, 'loc')
|
102
138
|
end
|
103
139
|
|
104
140
|
should 'contain one lastmod element' do
|
105
|
-
|
141
|
+
generate_two_model_sitemap_files
|
106
142
|
assert_equal 1, num_elements(first_sitemaps_model_file, 'lastmod')
|
107
143
|
assert_equal 1, num_elements(second_sitemaps_model_file, 'lastmod')
|
108
144
|
end
|
109
145
|
|
110
146
|
should 'contain one changefreq element' do
|
111
|
-
|
147
|
+
generate_two_model_sitemap_files
|
112
148
|
assert_equal 1, num_elements(first_sitemaps_model_file, 'changefreq')
|
113
149
|
assert_equal 1, num_elements(second_sitemaps_model_file, 'changefreq')
|
114
150
|
end
|
115
151
|
|
152
|
+
should 'contain one priority element' do
|
153
|
+
generate_two_model_sitemap_files(:priority => 0.2)
|
154
|
+
assert_equal 1, num_elements(first_sitemaps_model_file, 'priority')
|
155
|
+
assert_equal 1, num_elements(second_sitemaps_model_file, 'priority')
|
156
|
+
end
|
157
|
+
|
116
158
|
should 'strip leading slashes from controller paths' do
|
117
159
|
create_sitemap
|
118
|
-
@sitemap.add(
|
160
|
+
@sitemap.add(TestModel, :path => '/test_controller').generate
|
119
161
|
assert(
|
120
162
|
!elements(single_sitemaps_model_file, 'loc').first.text.match(/\/\/test_controller\//),
|
121
163
|
'URL does not contain a double-slash before the controller path'
|
122
164
|
)
|
123
165
|
end
|
166
|
+
|
167
|
+
should 'not be gzipped' do
|
168
|
+
generate_one_sitemap_model_file(:gzip => false)
|
169
|
+
assert File.exists?(unzipped_single_sitemaps_model_file)
|
170
|
+
end
|
124
171
|
end
|
125
172
|
|
126
173
|
context 'add method' do
|
127
174
|
should 'be chainable' do
|
128
175
|
create_sitemap
|
129
|
-
assert_equal BigSitemap, @sitemap.add(
|
176
|
+
assert_equal BigSitemap, @sitemap.add(TestModel).class
|
130
177
|
end
|
131
178
|
end
|
132
179
|
|
@@ -135,6 +182,13 @@ class BigSitemapTest < Test::Unit::TestCase
|
|
135
182
|
create_sitemap
|
136
183
|
assert_equal BigSitemap, @sitemap.clean.class
|
137
184
|
end
|
185
|
+
|
186
|
+
should 'clean all sitemap files' do
|
187
|
+
generate_sitemap_files
|
188
|
+
assert Dir.entries(sitemaps_dir).size > 2, "#{sitemaps_dir} is not empty" # ['.', '..'].size == 2
|
189
|
+
@sitemap.clean
|
190
|
+
assert_equal 2, Dir.entries(sitemaps_dir).size, "#{sitemaps_dir} is empty"
|
191
|
+
end
|
138
192
|
end
|
139
193
|
|
140
194
|
context 'generate method' do
|
@@ -157,22 +211,32 @@ class BigSitemapTest < Test::Unit::TestCase
|
|
157
211
|
}.update(options))
|
158
212
|
end
|
159
213
|
|
160
|
-
def generate_sitemap_files
|
161
|
-
create_sitemap
|
214
|
+
def generate_sitemap_files(options={})
|
215
|
+
create_sitemap(options)
|
162
216
|
add_model
|
163
217
|
@sitemap.generate
|
164
218
|
end
|
165
219
|
|
166
|
-
def
|
167
|
-
|
168
|
-
|
220
|
+
def generate_one_sitemap_model_file(options={})
|
221
|
+
change_frequency = options.delete(:change_frequency)
|
222
|
+
priority = options.delete(:priority)
|
223
|
+
create_sitemap(options.merge(:max_per_sitemap => default_num_items, :batch_size => default_num_items))
|
224
|
+
add_model(:change_frequency => change_frequency, :priority => priority)
|
225
|
+
@sitemap.generate
|
226
|
+
end
|
227
|
+
|
228
|
+
def generate_two_model_sitemap_files(options={})
|
229
|
+
change_frequency = options.delete(:change_frequency)
|
230
|
+
priority = options.delete(:priority)
|
231
|
+
create_sitemap(options.merge(:max_per_sitemap => 1, :batch_size => 1))
|
232
|
+
add_model(:num_items => 2, :change_frequency => change_frequency, :priority => priority)
|
169
233
|
@sitemap.generate
|
170
234
|
end
|
171
235
|
|
172
236
|
def add_model(options={})
|
173
237
|
num_items = options.delete(:num_items) || default_num_items
|
174
238
|
TestModel.stubs(:num_items).returns(num_items)
|
175
|
-
@sitemap.add(
|
239
|
+
@sitemap.add(TestModel, options)
|
176
240
|
end
|
177
241
|
|
178
242
|
def default_num_items
|
@@ -180,19 +244,27 @@ class BigSitemapTest < Test::Unit::TestCase
|
|
180
244
|
end
|
181
245
|
|
182
246
|
def sitemaps_index_file
|
183
|
-
"#{
|
247
|
+
"#{unzipped_sitemaps_index_file}.gz"
|
248
|
+
end
|
249
|
+
|
250
|
+
def unzipped_sitemaps_index_file
|
251
|
+
"#{sitemaps_dir}/sitemap_index.xml"
|
184
252
|
end
|
185
253
|
|
186
254
|
def single_sitemaps_model_file
|
187
|
-
"#{
|
255
|
+
"#{unzipped_single_sitemaps_model_file}.gz"
|
256
|
+
end
|
257
|
+
|
258
|
+
def unzipped_single_sitemaps_model_file
|
259
|
+
"#{sitemaps_dir}/sitemap_test_models.xml"
|
188
260
|
end
|
189
261
|
|
190
262
|
def first_sitemaps_model_file
|
191
|
-
"#{sitemaps_dir}/
|
263
|
+
"#{sitemaps_dir}/sitemap_test_models_1.xml.gz"
|
192
264
|
end
|
193
265
|
|
194
266
|
def second_sitemaps_model_file
|
195
|
-
"#{sitemaps_dir}/
|
267
|
+
"#{sitemaps_dir}/sitemap_test_models_2.xml.gz"
|
196
268
|
end
|
197
269
|
|
198
270
|
def sitemaps_dir
|
@@ -215,4 +287,4 @@ class BigSitemapTest < Test::Unit::TestCase
|
|
215
287
|
def num_elements(filename, el)
|
216
288
|
elements(filename, el).size
|
217
289
|
end
|
218
|
-
end
|
290
|
+
end
|
data/test/fixtures/test_model.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: alexrabarts-big_sitemap
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 0.3.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Alex Rabarts
|
@@ -9,7 +9,7 @@ autorequire:
|
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
11
|
|
12
|
-
date: 2009-
|
12
|
+
date: 2009-04-06 00:00:00 -07:00
|
13
13
|
default_executable:
|
14
14
|
dependencies:
|
15
15
|
- !ruby/object:Gem::Dependency
|
@@ -38,8 +38,9 @@ executables: []
|
|
38
38
|
|
39
39
|
extensions: []
|
40
40
|
|
41
|
-
extra_rdoc_files:
|
42
|
-
|
41
|
+
extra_rdoc_files:
|
42
|
+
- README.rdoc
|
43
|
+
- LICENSE
|
43
44
|
files:
|
44
45
|
- History.txt
|
45
46
|
- README.rdoc
|
@@ -49,6 +50,7 @@ files:
|
|
49
50
|
- test/fixtures
|
50
51
|
- test/fixtures/test_model.rb
|
51
52
|
- test/test_helper.rb
|
53
|
+
- LICENSE
|
52
54
|
has_rdoc: true
|
53
55
|
homepage: http://github.com/alexrabarts/big_sitemap
|
54
56
|
post_install_message:
|