big_sitemap 0.5.1 → 0.8.1
- data/.gitignore +3 -0
- data/Gemfile +7 -0
- data/Gemfile.lock +16 -0
- data/README.rdoc +76 -26
- data/Rakefile +52 -0
- data/VERSION.yml +1 -1
- data/big_sitemap.gemspec +58 -0
- data/lib/big_sitemap/builder.rb +104 -60
- data/lib/big_sitemap.rb +170 -47
- data/test/big_sitemap_test.rb +190 -3
- data/test/fixtures/test_model.rb +15 -1
- data/test/test_helper.rb +2 -1
- metadata +26 -35
data/.gitignore
ADDED
data/Gemfile
ADDED
data/Gemfile.lock
ADDED
data/README.rdoc
CHANGED
@@ -1,55 +1,68 @@
|
|
1
1
|
= BigSitemap
|
2
2
|
|
3
|
-
BigSitemap is a Sitemap
|
3
|
+
BigSitemap is a {Sitemap}[http://sitemaps.org] generator suitable for applications with more than 50,000 URLs. It splits large Sitemaps into multiple files, gzips the files to minimize bandwidth usage, batches database queries to minimize memory usage, supports incremental updates, can be set up with just a few lines of code and is compatible with just about any framework.
|
4
4
|
|
5
5
|
BigSitemap is best run periodically through a Rake/Thor task.
|
6
6
|
|
7
7
|
require 'big_sitemap'
|
8
8
|
|
9
|
-
sitemap = BigSitemap.new(
|
9
|
+
sitemap = BigSitemap.new(
|
10
|
+
:url_options => {:host => 'example.com'},
|
11
|
+
:document_root => "#{APP_ROOT}/public"
|
12
|
+
)
|
10
13
|
|
11
14
|
# Add a model
|
12
15
|
sitemap.add Product
|
13
16
|
|
14
17
|
# Add another model with some options
|
15
|
-
sitemap.add(Post,
|
18
|
+
sitemap.add(Post,
|
16
19
|
:conditions => {:published => true},
|
17
20
|
:path => 'articles',
|
18
21
|
:change_frequency => 'daily',
|
19
22
|
:priority => 0.5
|
20
|
-
|
23
|
+
)
|
24
|
+
|
25
|
+
# Add a static resource
|
26
|
+
sitemap.add_static('http://example.com/about', Time.now, 'monthly', 0.1)
|
21
27
|
|
22
28
|
# Generate the files
|
23
29
|
sitemap.generate
|
24
30
|
|
25
|
-
The code above will create a minimum of
|
31
|
+
The code above will create a minimum of four files:
|
26
32
|
|
27
33
|
1. public/sitemaps/sitemap_index.xml.gz
|
28
34
|
2. public/sitemaps/sitemap_products.xml.gz
|
29
35
|
3. public/sitemaps/sitemap_posts.xml.gz
|
36
|
+
4. public/sitemaps/sitemap_static.xml.gz
|
30
37
|
|
31
38
|
If your sitemaps grow beyond 50,000 URLs (this limit can be overridden with the <code>:max_per_sitemap</code> option), the sitemap files will be partitioned into multiple files (<code>sitemap_products_1.xml.gz</code>, <code>sitemap_products_2.xml.gz</code>, ...).
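The partitioning described above can be sketched in plain Ruby. This is a minimal illustration, not the gem's code: `part_filenames` is a hypothetical helper, and its naming follows the `_get_writer` method shown later in this diff (the first file carries no numeric suffix; with `:partial_update` the suffix would be a record ID instead of a counter).

```ruby
# Hypothetical helper (not part of big_sitemap): lists the file names one
# model's sitemap would be split into for a given number of URLs.
def part_filenames(base, url_count, max_per_sitemap = 50_000, gzip = true)
  # At least one file is written, even for an empty model.
  parts = [(url_count.to_f / max_per_sitemap).ceil, 1].max
  (0...parts).map do |part|
    name = base.dup
    name << "_#{part}" if part > 0 # the first file has no suffix
    name << '.xml'
    name << '.gz' if gzip
    name
  end
end

part_filenames('sitemap_products', 120_000).each { |name| puts name }
```

With 120,000 URLs and the default limit, the sketch yields three file names.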
|
32
39
|
|
33
|
-
|
40
|
+
=== Framework-specific Classes
|
41
|
+
|
42
|
+
Use the framework-specific classes to take advantage of built-in shortcuts.
|
43
|
+
|
44
|
+
==== Rails
|
45
|
+
|
46
|
+
<code>BigSitemapRails</code> includes <code>UrlWriter</code> (useful for making use of your Rails routes - see the Location URLs section) and deals with setting the <code>:document_root</code> and <code>:url_options</code> initialization options.
|
34
47
|
|
35
|
-
|
48
|
+
==== Merb
|
36
49
|
|
37
|
-
|
50
|
+
<code>BigSitemapMerb</code> deals with setting the <code>:document_root</code> initialization option.
|
38
51
|
|
39
52
|
== Install
|
40
53
|
|
41
54
|
Via gem:
|
42
55
|
|
43
|
-
sudo gem install
|
56
|
+
sudo gem install big_sitemap
|
44
57
|
|
45
58
|
== Advanced
|
46
59
|
|
47
|
-
=== Options
|
60
|
+
=== Initialization Options
|
48
61
|
|
49
62
|
* <code>:url_options</code> -- hash with <code>:host</code>, optionally <code>:port</code> and <code>:protocol</code>
|
50
|
-
* <code>:base_url</code> -- string alternative to <code>:url_options</code>, e.g.
|
51
|
-
* <code>:document_root</code> -- string
|
52
|
-
* <code>:path</code> -- string defaults to 'sitemaps'
|
63
|
+
* <code>:base_url</code> -- string alternative to <code>:url_options</code>, e.g. <code>'https://example.com:8080/'</code>
|
64
|
+
* <code>:document_root</code> -- string
|
65
|
+
* <code>:path</code> -- string defaults to <code>'sitemaps'</code>, which places sitemap files under the <code>/sitemaps</code> directory
|
53
66
|
* <code>:max_per_sitemap</code> -- <code>50000</code>, which is the limit dictated by Google but can be less
|
54
67
|
* <code>:batch_size</code> -- <code>1001</code> (not <code>1000</code> due to a bug in DataMapper)
|
55
68
|
* <code>:gzip</code> -- <code>true</code>
|
@@ -57,28 +70,44 @@ Via gem:
|
|
57
70
|
* <code>:ping_yahoo</code> -- <code>false</code>, needs <code>:yahoo_app_id</code>
|
58
71
|
* <code>:ping_bing</code> -- <code>false</code>
|
59
72
|
* <code>:ping_ask</code> -- <code>false</code>
|
73
|
+
* <code>:partial_update</code> -- <code>false</code>
|
60
74
|
|
61
75
|
=== Chaining
|
62
76
|
|
63
|
-
You can chain methods together
|
77
|
+
You can chain methods together:
|
64
78
|
|
65
79
|
BigSitemap.new(:url_options => {:host => 'example.com'}).add(Post).generate
|
66
80
|
|
81
|
+
With the Rails-specific class, you could even get away with as little code as:
|
82
|
+
|
83
|
+
BigSitemapRails.new.add(Post).generate
|
84
|
+
|
67
85
|
=== Pinging Search Engines
|
68
86
|
|
69
87
|
To ping search engines, call <code>ping_search_engines</code> after you generate the sitemap:
|
70
88
|
|
71
|
-
sitemap.generate
|
72
|
-
|
89
|
+
sitemap.generate.ping_search_engines
|
90
|
+
|
91
|
+
=== Location URLs
|
92
|
+
|
93
|
+
By default, URLs for the "loc" values are generated in the form:
|
94
|
+
|
95
|
+
:base_url/:path|<table_name>/<to_param>|<id>
|
96
|
+
|
97
|
+
Alternatively, you can pass a lambda. For example, to make use of your Rails route helper:
|
98
|
+
|
99
|
+
sitemap.add(Post,
|
100
|
+
:location => lambda { |post| post_url(post) }
|
101
|
+
)
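The two ways of producing a "loc" value described above can be sketched as follows. This is an illustration under assumptions: `location_for` is a hypothetical helper, and `Post` is a plain `Struct` standing in for a model, not an ActiveRecord class.

```ruby
# Stand-in for a model instance; a plain Struct, not an ORM class.
Post = Struct.new(:id, :slug) do
  def to_param
    "#{id}-#{slug}"
  end
end

# Hypothetical helper: uses the lambda when one is given, otherwise falls
# back to the default <base_url>/<path>/<to_param or id> form.
def location_for(record, root_url, path, location = nil)
  return location.call(record) if location.is_a?(Proc)

  param = record.respond_to?(:to_param) ? record.to_param : record.id
  clean_path = path.sub(/^\//, '') # avoid a double slash in the URL
  "#{root_url}/#{clean_path}/#{param}"
end

post = Post.new(7, 'hello-world')
puts location_for(post, 'http://example.com', '/articles')
puts location_for(post, 'http://example.com', 'articles',
                  lambda { |p| "http://example.com/blog/#{p.slug}" })
```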
|
73
102
|
|
74
103
|
=== Change Frequency, Priority and Last Modified
|
75
104
|
|
76
105
|
You can control "changefreq", "priority" and "lastmod" values for each record individually by passing lambdas instead of fixed values:
|
77
106
|
|
78
|
-
sitemap.add(
|
79
|
-
:change_frequency => lambda {|post| ... },
|
80
|
-
:priority => lambda {|post| ... },
|
81
|
-
:last_modified => lambda {|post| ... }
|
107
|
+
sitemap.add(Post,
|
108
|
+
:change_frequency => lambda { |post| ... },
|
109
|
+
:priority => lambda { |post| ... },
|
110
|
+
:last_modified => lambda { |post| ... }
|
82
111
|
)
|
83
112
|
|
84
113
|
=== Find Methods
|
@@ -87,7 +116,28 @@ Your models must provide either a <code>find_for_sitemap</code> or <code>all</co
|
|
87
116
|
|
88
117
|
Additionally, your models must provide a <code>count_for_sitemap</code> or <code>count</code> class method that returns a count of the instances to be included.
|
89
118
|
|
90
|
-
If you're using ActiveRecord (Rails) or DataMapper then <code>all</code> and <code>count</code> are already provided and you
|
119
|
+
If you're using ActiveRecord (Rails) or DataMapper then <code>all</code> and <code>count</code> are already provided and you can make use of any supported parameter: (:conditions, :limit, :joins, :select, :order, :include, :group)
|
120
|
+
|
121
|
+
sitemap.add(Track,
|
122
|
+
:select => "id, permalink, user_id, updated_at",
|
123
|
+
:include => :user,
|
124
|
+
:conditions => "public = 1 AND state = 'finished' AND user_id IS NOT NULL",
|
125
|
+
:order => "id ASC"
|
126
|
+
)
|
127
|
+
|
128
|
+
If you provide your own <code>find_for_sitemap</code> or <code>all</code> method then it should be able to handle the <code>:offset</code> and <code>:limit</code> options, in the same way that ActiveRecord and DataMapper handle them. This is especially important if you have more than 50,000 URLs.
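As a rough sketch of what such a model could look like outside an ORM: the `Page` class and its in-memory `ALL` array below are hypothetical, and a model added to a real sitemap may need more than this (for example a table name and timestamp methods).

```ruby
# Hypothetical non-ORM model backed by an in-memory array.
class Page
  attr_reader :id, :updated_at

  def initialize(id)
    @id = id
    @updated_at = Time.now
  end

  ALL = (1..5).map { |id| new(id) }

  # Returns the number of records that would appear in the sitemap.
  def self.count_for_sitemap(_options = {})
    ALL.size
  end

  # Honours :offset and :limit so that records can be fetched in batches.
  def self.find_for_sitemap(options = {})
    rows = ALL.drop(options[:offset].to_i)
    options[:limit] ? rows.first(options[:limit].to_i) : rows
  end
end

puts Page.find_for_sitemap(:offset => 2, :limit => 2).map(&:id).inspect
```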
|
129
|
+
|
130
|
+
=== Partial Update
|
131
|
+
|
132
|
+
If you enable <code>:partial_update</code>, the filename will include an id smaller than the id of the first entry. This makes it possible to update just the last file with new entries, without regenerating the files that already exist.
|
133
|
+
|
134
|
+
=== Lock Generation Process
|
135
|
+
|
136
|
+
To prevent another process from overwriting the generated files, use the <code>with_lock</code> method:
|
137
|
+
|
138
|
+
sitemap.with_lock do
|
139
|
+
sitemap.generate
|
140
|
+
end
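The general idea of a lock file can be sketched with Ruby's standard library. This is not the gem's implementation: `with_file_lock` is a hypothetical helper that relies on the `File::CREAT | File::EXCL` flags, which make `File.open` raise `Errno::EEXIST` when the file is already there.

```ruby
require 'tmpdir'

# Hypothetical helper: creates the lock file exclusively, runs the block,
# and always removes the lock afterwards.
def with_file_lock(path)
  lock = File.open(path, File::WRONLY | File::CREAT | File::EXCL)
  begin
    yield
  ensure
    lock.close
    File.delete(path)
  end
end

Dir.mktmpdir do |dir|
  lock_path = File.join(dir, 'generator.lock')
  with_file_lock(lock_path) do
    begin
      # A second attempt while the lock is held is refused.
      with_file_lock(lock_path) { puts 'unreachable' }
    rescue Errno::EEXIST
      puts 'second run refused: lock file exists'
    end
  end
end
```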
|
91
141
|
|
92
142
|
=== Cleaning the Sitemaps Directory
|
93
143
|
|
@@ -95,23 +145,23 @@ Calling the <code>clean</code> method will remove all files from the Sitemaps di
|
|
95
145
|
|
96
146
|
== Limitations
|
97
147
|
|
98
|
-
If your database is likely to shrink during the time it takes to create the sitemap then you might run into problems (the final, batched SQL select will overrun by setting a limit that is too large since it is calculated from the count, which is queried at the very beginning).
|
148
|
+
If your database is likely to shrink during the time it takes to create the sitemap then you might run into problems (the final, batched SQL select will overrun by setting a limit that is too large since it is calculated from the count, which is queried at the very beginning). In this case, if your database uses incremental primary IDs, you might want to use the <code>:partial_update</code> option, which looks at the last ID instead of paginating.
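The difference between offset-based batches and "looking at the last ID" can be illustrated with an in-memory sketch. It assumes rows are hashes with an incrementing `:id`; `each_batch_by_id` is a hypothetical helper and not the gem's query code.

```ruby
# Hypothetical helper: walks rows in batches, using the last seen ID as the
# starting point of the next batch instead of a numeric offset.
def each_batch_by_id(rows, batch_size)
  last_id = nil
  loop do
    batch = rows.select { |row| last_id.nil? || row[:id] > last_id }
                .sort_by { |row| row[:id] }
                .first(batch_size)
    break if batch.empty?

    yield batch
    last_id = batch.last[:id]
  end
end

rows = (1..5).map { |id| { :id => id } }
each_batch_by_id(rows, 2) do |batch|
  puts batch.map { |row| row[:id] }.inspect
  rows.delete_if { |row| row[:id] == 1 } # simulate a row vanishing mid-run
end
```

Although row 1 disappears after the first batch, rows 3, 4 and 5 are still visited; an offset of 2 applied to the shrunken data would have skipped row 3.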
|
99
149
|
|
100
150
|
== TODO
|
101
151
|
|
102
|
-
Tests for
|
152
|
+
Tests for framework-specific components.
|
103
153
|
|
104
154
|
== Credits
|
105
155
|
|
106
156
|
Thanks to Alastair Brunton and Harry Love, whose work provided a starting point for this library.
|
107
|
-
http://scoop.cheerfactory.co.uk/2008/02/26/google-sitemap-generator/
|
108
157
|
|
109
|
-
Thanks to those who have contributed patches:
|
158
|
+
Thanks also to those who have contributed patches:
|
110
159
|
|
111
160
|
* Mislav Marohnić
|
112
161
|
* Jeff Schoolcraft
|
113
162
|
* Dalibor Nasevic
|
163
|
+
* Tobias Bielohlawek (http://www.rngtng.com)
|
114
164
|
|
115
165
|
== Copyright
|
116
166
|
|
117
|
-
Copyright (c)
|
167
|
+
Copyright (c) 2010 Stateless Systems (http://statelesssystems.com). See LICENSE for details.
|
data/Rakefile
ADDED
@@ -0,0 +1,52 @@
|
|
1
|
+
require 'rake'
|
2
|
+
|
3
|
+
begin
|
4
|
+
require 'jeweler'
|
5
|
+
Jeweler::Tasks.new do |s|
|
6
|
+
s.name = "big_sitemap"
|
7
|
+
s.summary = %Q{A Sitemap generator specifically designed for large sites (although it works equally well with small sites)}
|
8
|
+
s.email = %w(tobi@soundcloud.com alexrabarts@gmail.com)
|
9
|
+
s.homepage = "http://github.com/rngtng/big_sitemap"
|
10
|
+
s.description = "A Sitemap generator specifically designed for large sites (although it works equally well with small sites)"
|
11
|
+
s.authors = ["Tobias Bielohlawek", "Alex Rabarts"]
|
12
|
+
s.add_dependency 'bundler'
|
13
|
+
end
|
14
|
+
rescue LoadError
|
15
|
+
puts "Jeweler not available. Install it with: sudo gem install technicalpickles-jeweler -s http://gems.github.com"
|
16
|
+
end
|
17
|
+
|
18
|
+
require 'rake/rdoctask'
|
19
|
+
Rake::RDocTask.new do |rdoc|
|
20
|
+
rdoc.rdoc_dir = 'rdoc'
|
21
|
+
rdoc.title = 'big_sitemap'
|
22
|
+
rdoc.options << '--line-numbers' << '--inline-source'
|
23
|
+
rdoc.rdoc_files.include('README*')
|
24
|
+
rdoc.rdoc_files.include('lib/**/*.rb')
|
25
|
+
end
|
26
|
+
|
27
|
+
require 'rake/testtask'
|
28
|
+
Rake::TestTask.new(:test) do |t|
|
29
|
+
t.libs << 'lib' << 'test' << Rake.original_dir
|
30
|
+
t.pattern = 'test/**/*_test.rb'
|
31
|
+
t.verbose = false
|
32
|
+
end
|
33
|
+
|
34
|
+
begin
|
35
|
+
require 'rcov/rcovtask'
|
36
|
+
Rcov::RcovTask.new do |t|
|
37
|
+
t.libs << 'test'
|
38
|
+
t.test_files = FileList['test/**/*_test.rb']
|
39
|
+
t.verbose = true
|
40
|
+
end
|
41
|
+
rescue LoadError
|
42
|
+
puts "RCov is not available. In order to run rcov, you must: sudo gem install spicycode-rcov"
|
43
|
+
end
|
44
|
+
|
45
|
+
begin
|
46
|
+
require 'cucumber/rake/task'
|
47
|
+
Cucumber::Rake::Task.new(:features)
|
48
|
+
rescue LoadError
|
49
|
+
puts "Cucumber is not available. In order to run features, you must: sudo gem install cucumber"
|
50
|
+
end
|
51
|
+
|
52
|
+
task :default => :test
|
data/VERSION.yml
CHANGED
data/big_sitemap.gemspec
ADDED
@@ -0,0 +1,58 @@
|
|
1
|
+
# Generated by jeweler
|
2
|
+
# DO NOT EDIT THIS FILE
|
3
|
+
# Instead, edit Jeweler::Tasks in Rakefile, and run `rake gemspec`
|
4
|
+
# -*- encoding: utf-8 -*-
|
5
|
+
|
6
|
+
Gem::Specification.new do |s|
|
7
|
+
s.name = %q{big_sitemap}
|
8
|
+
s.version = "0.8.1"
|
9
|
+
|
10
|
+
s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
|
11
|
+
s.authors = ["Tobias Bielohlawek", "Alex Rabarts"]
|
12
|
+
s.date = %q{2011-01-25}
|
13
|
+
s.description = %q{A Sitemap generator specifically designed for large sites (although it works equally well with small sites)}
|
14
|
+
s.email = ["tobi@soundcloud.com", "alexrabarts@gmail.com"]
|
15
|
+
s.extra_rdoc_files = [
|
16
|
+
"LICENSE",
|
17
|
+
"README.rdoc"
|
18
|
+
]
|
19
|
+
s.files = [
|
20
|
+
".gitignore",
|
21
|
+
"Gemfile",
|
22
|
+
"Gemfile.lock",
|
23
|
+
"History.txt",
|
24
|
+
"LICENSE",
|
25
|
+
"README.rdoc",
|
26
|
+
"Rakefile",
|
27
|
+
"VERSION.yml",
|
28
|
+
"big_sitemap.gemspec",
|
29
|
+
"lib/big_sitemap.rb",
|
30
|
+
"lib/big_sitemap/builder.rb",
|
31
|
+
"test/big_sitemap_test.rb",
|
32
|
+
"test/fixtures/test_model.rb",
|
33
|
+
"test/test_helper.rb"
|
34
|
+
]
|
35
|
+
s.homepage = %q{http://github.com/rngtng/big_sitemap}
|
36
|
+
s.rdoc_options = ["--charset=UTF-8"]
|
37
|
+
s.require_paths = ["lib"]
|
38
|
+
s.rubygems_version = %q{1.3.7}
|
39
|
+
s.summary = %q{A Sitemap generator specifically designed for large sites (although it works equally well with small sites)}
|
40
|
+
s.test_files = [
|
41
|
+
"test/big_sitemap_test.rb",
|
42
|
+
"test/fixtures/test_model.rb",
|
43
|
+
"test/test_helper.rb"
|
44
|
+
]
|
45
|
+
|
46
|
+
if s.respond_to? :specification_version then
|
47
|
+
current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
|
48
|
+
s.specification_version = 3
|
49
|
+
|
50
|
+
if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
|
51
|
+
s.add_runtime_dependency(%q<bundler>, [">= 0"])
|
52
|
+
else
|
53
|
+
s.add_dependency(%q<bundler>, [">= 0"])
|
54
|
+
end
|
55
|
+
else
|
56
|
+
s.add_dependency(%q<bundler>, [">= 0"])
|
57
|
+
end
|
58
|
+
end
|
data/lib/big_sitemap/builder.rb
CHANGED
@@ -1,104 +1,96 @@
|
|
1
|
-
require '
|
1
|
+
require 'fileutils'
|
2
2
|
require 'zlib'
|
3
3
|
|
4
4
|
class BigSitemap
|
5
|
-
class Builder
|
6
|
-
NAMESPACE = 'http://www.sitemaps.org/schemas/sitemap/0.9'
|
5
|
+
class Builder
|
7
6
|
MAX_URLS = 50000
|
7
|
+
HEADER_ATTRIBUTES = {
|
8
|
+
'xmlns' => 'http://www.sitemaps.org/schemas/sitemap/0.9',
|
9
|
+
'xmlns:xsi' => "http://www.w3.org/2001/XMLSchema-instance",
|
10
|
+
'xsi:schemaLocation' => "http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
|
11
|
+
}
|
8
12
|
|
9
13
|
def initialize(options)
|
10
|
-
@gzip
|
11
|
-
@max_urls
|
12
|
-
@
|
13
|
-
@paths
|
14
|
-
@parts
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
|
14
|
+
@gzip = options.delete(:gzip)
|
15
|
+
@max_urls = options.delete(:max_urls) || MAX_URLS
|
16
|
+
@type = options.delete(:type)
|
17
|
+
@paths = []
|
18
|
+
@parts = options.delete(:start_part_id) || 0
|
19
|
+
@custom_part_nr = options.delete(:partial_update)
|
20
|
+
|
21
|
+
@filename = options.delete(:filename)
|
22
|
+
@current_filename = nil
|
23
|
+
@tmp_filename = nil
|
24
|
+
@target = _get_writer
|
21
25
|
|
26
|
+
@level = 0
|
22
27
|
@opened_tags = []
|
23
28
|
_init_document
|
24
29
|
end
|
25
30
|
|
26
|
-
def add_url!(url, time = nil, frequency = nil, priority = nil)
|
27
|
-
_rotate if @max_urls == @urls
|
31
|
+
def add_url!(url, time = nil, frequency = nil, priority = nil, part_nr = nil)
|
32
|
+
_rotate(part_nr) if @max_urls == @urls
|
33
|
+
|
34
|
+
_open_tag 'url'
|
35
|
+
tag! 'loc', url
|
36
|
+
tag! 'lastmod', time.utc.strftime('%Y-%m-%dT%H:%M:%S+00:00') if time
|
37
|
+
tag! 'changefreq', frequency if frequency
|
38
|
+
tag! 'priority', priority if priority
|
39
|
+
_close_tag 'url'
|
28
40
|
|
29
|
-
tag!(@index ? 'sitemap' : 'url') do
|
30
|
-
loc url
|
31
|
-
# W3C format is the subset of ISO 8601
|
32
|
-
lastmod(time.utc.strftime('%Y-%m-%dT%H:%M:%S+00:00')) unless time.nil?
|
33
|
-
changefreq(frequency) unless frequency.nil?
|
34
|
-
priority(priority) unless priority.nil?
|
35
|
-
end
|
36
41
|
@urls += 1
|
37
42
|
end
|
38
43
|
|
44
|
+
def paths!
|
45
|
+
@paths
|
46
|
+
end
|
47
|
+
|
39
48
|
def close!
|
40
49
|
_close_document
|
41
50
|
target!.close if target!.respond_to?(:close)
|
51
|
+
File.delete(@current_filename) if File.exists?(@current_filename)
|
52
|
+
File.rename(@tmp_filename, @current_filename)
|
42
53
|
end
|
43
54
|
|
44
|
-
def
|
45
|
-
@
|
55
|
+
def target!
|
56
|
+
@target
|
46
57
|
end
|
47
58
|
|
48
59
|
private
|
49
60
|
|
50
61
|
def _get_writer
|
51
|
-
|
52
|
-
|
53
|
-
|
54
|
-
|
55
|
-
|
56
|
-
_open_writer(filename)
|
57
|
-
else
|
58
|
-
target!
|
59
|
-
end
|
62
|
+
filename = @filename.dup
|
63
|
+
filename << "_#{@parts}" if @parts > 0
|
64
|
+
filename << '.xml'
|
65
|
+
filename << '.gz' if @gzip
|
66
|
+
_open_writer(filename)
|
60
67
|
end
|
61
68
|
|
62
69
|
def _open_writer(filename)
|
63
|
-
|
70
|
+
@current_filename = filename
|
71
|
+
@tmp_filename = filename + ".tmp"
|
64
72
|
@paths << filename
|
65
|
-
|
73
|
+
file = ::File.open(@tmp_filename, 'w+')
|
74
|
+
@gzip ? ::Zlib::GzipWriter.new(file) : file
|
66
75
|
end
|
67
76
|
|
68
|
-
def _init_document
|
77
|
+
def _init_document( name = 'urlset', attrs = HEADER_ATTRIBUTES)
|
69
78
|
@urls = 0
|
70
|
-
|
71
|
-
|
79
|
+
target!.print '<?xml version="1.0" encoding="UTF-8"?>'
|
80
|
+
_newline
|
81
|
+
_open_tag name, attrs
|
72
82
|
end
|
73
83
|
|
74
|
-
def _rotate
|
84
|
+
def _rotate(part_nr = nil)
|
75
85
|
# write out the current document and start writing into a new file
|
76
86
|
close!
|
77
|
-
@parts
|
87
|
+
@parts = (part_nr && @custom_part_nr) ? part_nr : @parts + 1
|
78
88
|
@target = _get_writer
|
79
89
|
_init_document
|
80
90
|
end
|
81
91
|
|
82
|
-
# add support for:
|
83
|
-
# xml.open_foo!(attrs)
|
84
|
-
# xml.close_foo!
|
85
|
-
def method_missing(method, *args, &block)
|
86
|
-
if method.to_s =~ /^(open|close)_(.+)!$/
|
87
|
-
operation, name = $1, $2
|
88
|
-
name = "#{name}:#{args.shift}" if Symbol === args.first
|
89
|
-
|
90
|
-
if 'open' == operation
|
91
|
-
_open_tag(name, args.first)
|
92
|
-
else
|
93
|
-
_close_tag(name)
|
94
|
-
end
|
95
|
-
else
|
96
|
-
super
|
97
|
-
end
|
98
|
-
end
|
99
|
-
|
100
92
|
# opens a tag, bumps up level but doesn't require a block
|
101
|
-
def _open_tag(name, attrs)
|
93
|
+
def _open_tag(name, attrs = {})
|
102
94
|
_indent
|
103
95
|
_start_tag(name, attrs)
|
104
96
|
_newline
|
@@ -106,6 +98,23 @@ class BigSitemap
|
|
106
98
|
@opened_tags << name
|
107
99
|
end
|
108
100
|
|
101
|
+
def _start_tag(name, attrs = {})
|
102
|
+
attrs = attrs.map { |attr,value| %Q( #{attr}="#{value}") }.join('')
|
103
|
+
target!.print "<#{name}#{attrs}>"
|
104
|
+
end
|
105
|
+
|
106
|
+
def tag!(name, content, attrs = {})
|
107
|
+
_indent
|
108
|
+
_start_tag(name, attrs)
|
109
|
+
target!.print content.to_s.gsub('&', '&amp;')
|
110
|
+
_end_tag(name)
|
111
|
+
_newline
|
112
|
+
end
|
113
|
+
|
114
|
+
def _end_tag(name)
|
115
|
+
target!.print "</#{name}>"
|
116
|
+
end
|
117
|
+
|
109
118
|
# closes a tag block by decreasing the level and inserting a close tag
|
110
119
|
def _close_tag(name)
|
111
120
|
@opened_tags.pop
|
@@ -120,5 +129,40 @@ class BigSitemap
|
|
120
129
|
_close_tag(name)
|
121
130
|
end
|
122
131
|
end
|
132
|
+
|
133
|
+
def _indent
|
134
|
+
return if @gzip
|
135
|
+
target!.print " " * @level
|
136
|
+
end
|
137
|
+
|
138
|
+
def _newline
|
139
|
+
return if @gzip
|
140
|
+
target!.puts ''
|
141
|
+
end
|
123
142
|
end
|
143
|
+
|
144
|
+
class IndexBuilder < Builder
|
145
|
+
def _init_document(name = 'sitemapindex', attrs = {'xmlns' => 'http://www.sitemaps.org/schemas/sitemap/0.9'})
|
146
|
+
attrs.merge('xmlns:geo' => "http://www.google.com/geo/schemas/sitemap/1.0")
|
147
|
+
super(name, attrs)
|
148
|
+
end
|
149
|
+
|
150
|
+
def add_url!(url, time = nil)
|
151
|
+
_open_tag 'sitemap'
|
152
|
+
tag! 'loc', url
|
153
|
+
tag! 'lastmod', time.utc.strftime('%Y-%m-%dT%H:%M:%S+00:00') if time
|
154
|
+
_close_tag 'sitemap'
|
155
|
+
end
|
156
|
+
end
|
157
|
+
|
158
|
+
class GeoBuilder < Builder
|
159
|
+
#_build_geo if @geo
|
160
|
+
|
161
|
+
# def _build_geo
|
162
|
+
# geo :geo do
|
163
|
+
# geo :format, 'kml'
|
164
|
+
# end
|
165
|
+
# end
|
166
|
+
end
|
167
|
+
|
124
168
|
end
|
data/lib/big_sitemap.rb
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
require 'uri'
|
2
|
+
require 'fileutils'
|
3
|
+
|
2
4
|
require 'big_sitemap/builder'
|
3
|
-
require 'extlib'
|
4
|
-
require 'action_controller' if defined? Rails
|
5
5
|
|
6
6
|
class BigSitemap
|
7
7
|
DEFAULTS = {
|
@@ -22,13 +22,10 @@ class BigSitemap
|
|
22
22
|
TIMESTAMP_METHODS = [:updated_at, :updated_on, :updated, :created_at, :created_on, :created]
|
23
23
|
PARAM_METHODS = [:to_param, :id]
|
24
24
|
|
25
|
-
include ActionController::UrlWriter if defined? Rails
|
26
|
-
|
27
25
|
def initialize(options)
|
28
26
|
@options = DEFAULTS.merge options
|
29
27
|
|
30
|
-
|
31
|
-
@default_url_options = defined?(Rails) ? default_url_options : {}
|
28
|
+
@default_url_options = options.delete(:default_url_options) || {}
|
32
29
|
|
33
30
|
if @options[:max_per_sitemap] <= 1
|
34
31
|
raise ArgumentError, '":max_per_sitemap" must be greater than 1'
|
@@ -49,13 +46,7 @@ class BigSitemap
|
|
49
46
|
raise ArgumentError, '":batch_size" must be less than ":max_per_sitemap"'
|
50
47
|
end
|
51
48
|
|
52
|
-
@options[:document_root] ||=
|
53
|
-
if defined? Rails
|
54
|
-
"#{Rails.root}/public"
|
55
|
-
elsif defined? Merb
|
56
|
-
"#{Merb.root}/public"
|
57
|
-
end
|
58
|
-
end
|
49
|
+
@options[:document_root] ||= document_root
|
59
50
|
|
60
51
|
unless @options[:document_root]
|
61
52
|
raise ArgumentError, 'Document root must be specified with the ":document_root" option'
|
@@ -69,27 +60,77 @@ class BigSitemap
|
|
69
60
|
end
|
70
61
|
|
71
62
|
def add(model, options={})
|
72
|
-
options[:path]
|
63
|
+
options[:path] ||= table_name(model)
|
64
|
+
options[:filename] ||= file_name(model)
|
65
|
+
options[:primary_column] ||= 'id' if model.new.respond_to?('id')
|
66
|
+
options[:partial_update] = @options[:partial_update] && options[:partial_update] != false
|
73
67
|
@sources << [model, options.dup]
|
74
|
-
|
68
|
+
self
|
69
|
+
end
|
70
|
+
|
71
|
+
def add_static(url, time = nil, frequency = nil, priority = nil)
|
72
|
+
@static_pages ||= []
|
73
|
+
@static_pages << [url, time, frequency, priority]
|
74
|
+
self
|
75
|
+
end
|
76
|
+
|
77
|
+
def with_lock
|
78
|
+
lock!
|
79
|
+
begin
|
80
|
+
yield
|
81
|
+
ensure
|
82
|
+
unlock!
|
83
|
+
end
|
84
|
+
rescue Errno::EACCES => e
|
85
|
+
STDERR.puts "Lockfile exists"
|
86
|
+
end
|
87
|
+
|
88
|
+
def table_name(model)
|
89
|
+
model.table_name
|
90
|
+
end
|
91
|
+
|
92
|
+
def file_name(name)
|
93
|
+
name = table_name(name) unless name.is_a? String
|
94
|
+
"#{@file_path}/sitemap_#{name}"
|
95
|
+
end
|
96
|
+
|
97
|
+
def document_root
|
75
98
|
end
|
76
99
|
|
77
100
|
def clean
|
78
101
|
Dir["#{@file_path}/sitemap_*.{xml,xml.gz}"].each do |file|
|
79
102
|
FileUtils.rm file
|
80
103
|
end
|
81
|
-
|
104
|
+
self
|
82
105
|
end
|
83
106
|
|
84
107
|
def generate
|
108
|
+
prepare_update
|
109
|
+
|
110
|
+
generate_models
|
111
|
+
generate_static
|
112
|
+
generate_sitemap_index
|
113
|
+
self
|
114
|
+
end
|
115
|
+
|
116
|
+
def generate_models
|
85
117
|
for model, options in @sources
|
86
|
-
with_sitemap(
|
118
|
+
with_sitemap(model, options.dup) do |sitemap|
|
119
|
+
last_id = nil #id of last processed item
|
87
120
|
count_method = pick_method(model, COUNT_METHODS)
|
88
121
|
find_method = pick_method(model, FIND_METHODS)
|
89
122
|
raise ArgumentError, "#{model} must provide a count_for_sitemap class method" if count_method.nil?
|
90
123
|
raise ArgumentError, "#{model} must provide a find_for_sitemap class method" if find_method.nil?
|
91
124
|
|
92
|
-
|
125
|
+
find_options = {}
|
126
|
+
[:conditions, :limit, :joins, :select, :order, :include, :group].each do |key|
|
127
|
+
find_options[key] = options.delete(key)
|
128
|
+
end
|
129
|
+
|
130
|
+
primary_column = options.delete(:primary_column)
|
131
|
+
|
132
|
+
count = model.send(count_method, find_options.merge(:select => (primary_column || '*'), :include => nil))
|
133
|
+
count = find_options[:limit].to_i if find_options[:limit] && find_options[:limit].to_i < count
|
93
134
|
num_sitemaps = 1
|
94
135
|
num_batches = 1
|
95
136
|
|
@@ -99,18 +140,22 @@ class BigSitemap
|
|
99
140
|
end
|
100
141
|
batches_per_sitemap = num_batches.to_f / num_sitemaps.to_f
|
101
142
|
|
102
|
-
find_options = options.except(:path, :num_items, :priority, :change_frequency, :last_modified)
|
103
|
-
|
104
143
|
for sitemap_num in 1..num_sitemaps
|
105
144
|
# Work out the start and end batch numbers for this sitemap
|
106
145
|
batch_num_start = sitemap_num == 1 ? 1 : ((sitemap_num * batches_per_sitemap).ceil - batches_per_sitemap + 1).to_i
|
107
146
|
batch_num_end = (batch_num_start + [batches_per_sitemap, num_batches].min).floor - 1
|
108
147
|
|
109
148
|
for batch_num in batch_num_start..batch_num_end
|
110
|
-
offset
|
111
|
-
limit
|
149
|
+
offset = (batch_num - 1) * @options[:batch_size]
|
150
|
+
limit = (count - offset) < @options[:batch_size] ? (count - offset) : @options[:batch_size]
|
112
151
|
find_options.update(:limit => limit, :offset => offset) if num_batches > 1
|
113
152
|
|
153
|
+
if last_id && primary_column
|
154
|
+
find_options.update(:limit => limit, :offset => nil)
|
155
|
+
primary_column_value = last_id.to_s.gsub("'", %q(\\\')) #escape '
|
156
|
+
find_options.update(:conditions => [find_options[:conditions], "(#{primary_column} > '#{primary_column_value}')"].compact.join(' AND '))
|
157
|
+
end
|
158
|
+
|
114
159
|
model.send(find_method, find_options).each do |record|
|
115
160
|
last_mod = options[:last_modified]
|
116
161
|
if last_mod.is_a?(Proc)
|
@@ -122,8 +167,12 @@ class BigSitemap
|
|
122
167
|
|
123
168
|
param_method = pick_method(record, PARAM_METHODS)
|
124
169
|
|
125
|
-
location =
|
126
|
-
location
|
170
|
+
location = options[:location]
|
171
|
+
if location.is_a?(Proc)
|
172
|
+
location = location.call(record)
|
173
|
+
else
|
174
|
+
location = "#{root_url}/#{strip_leading_slash(options[:path])}/#{record.send(param_method)}"
|
175
|
+
end
|
127
176
|
|
128
177
|
change_frequency = options[:change_frequency] || 'weekly'
|
129
178
|
freq = change_frequency.is_a?(Proc) ? change_frequency.call(record) : change_frequency
|
@@ -131,16 +180,36 @@ class BigSitemap
|
|
131
180
|
priority = options[:priority]
|
132
181
|
pri = priority.is_a?(Proc) ? priority.call(record) : priority
|
133
182
|
|
134
|
-
|
183
|
+
last_id = primary_column ? record.send(primary_column) : nil
|
184
|
+
sitemap.add_url!(location, last_mod, freq, pri, last_id)
|
135
185
|
end
|
136
186
|
end
|
137
187
|
end
|
138
188
|
end
|
139
189
|
end
|
190
|
+
self
|
191
|
+
end
|
140
192
|
|
141
|
-
|
193
|
+
def generate_static
|
194
|
+
return self if Array(@static_pages).empty?
|
195
|
+
with_sitemap('static', :type => 'static') do |sitemap|
|
196
|
+
@static_pages.each do |location, last_mod, freq, pri|
|
197
|
+
sitemap.add_url!(location, last_mod, freq, pri)
|
198
|
+
end
|
199
|
+
end
|
200
|
+
self
|
201
|
+
end
|
142
202
|
|
143
|
-
|
203
|
+
# Create a sitemap index document
|
204
|
+
def generate_sitemap_index(files = nil)
|
205
|
+
files ||= Dir["#{@file_path}/sitemap_*.{xml,xml.gz}"]
|
206
|
+
with_sitemap 'index', :type => 'index' do |sitemap|
|
207
|
+
for path in files
|
208
|
+
next if path =~ /index/
|
209
|
+
sitemap.add_url!(url_for_sitemap(path), File.stat(path).mtime)
|
210
|
+
end
|
211
|
+
end
|
212
|
+
self
|
144
213
|
end
|
145
214
|
|
146
215
|
def ping_search_engines
|
@@ -186,16 +255,40 @@ class BigSitemap
|
|
186
255
|
|
187
256
|
private
|
188
257
|
|
189
|
-
def
|
190
|
-
|
191
|
-
|
192
|
-
|
193
|
-
|
194
|
-
|
195
|
-
|
258
|
+
def prepare_update
|
259
|
+
@files_to_move = []
|
260
|
+
@sources.each do |model, options|
|
261
|
+
if options[:partial_update] && primary_column = options[:primary_column] && last_id = get_last_id(options[:filename])
|
262
|
+
primary_column_value = last_id.to_s.gsub("'", %q(\\\')) #escape '
|
263
|
+
options[:conditions] = [options[:conditions], "(#{primary_column} >= '#{primary_column_value}')"].compact.join(' AND ')
|
264
|
+
options[:start_part_id] = last_id
|
265
|
+
end
|
196
266
|
end
|
267
|
+
end
|
268
|
+
|
269
|
+
def lock!(lock_file = 'generator.lock')
|
270
|
+
File.open("#{@file_path}/#{lock_file}", 'w', File::EXCL)
|
271
|
+
end
|
197
272
|
|
198
|
-
|
273
|
+
def unlock!(lock_file = 'generator.lock')
|
274
|
+
FileUtils.rm "#{@file_path}/#{lock_file}"
|
275
|
+
end
|
276
|
+
|
277
|
+
def with_sitemap(name, options={})
|
278
|
+
options[:filename] ||= file_name(name)
|
279
|
+
options[:type] ||= 'sitemap'
|
280
|
+
options[:max_urls] ||= @options["max_per_#{options[:type]}".to_sym]
|
281
|
+
options[:gzip] ||= @options[:gzip]
|
282
|
+
options[:indent] = options[:gzip] ? 0 : 2
|
283
|
+
|
284
|
+
sitemap = if options[:type] == 'index'
|
285
|
+
IndexBuilder.new(options)
|
286
|
+
elsif options[:geo]
|
287
|
+
options[:filename] << '_kml'
|
288
|
+
GeoBuilder.new(options)
|
289
|
+
else
|
290
|
+
Builder.new(options)
|
291
|
+
end
|
199
292
|
|
200
293
|
begin
|
201
294
|
yield sitemap
|
@@ -209,6 +302,12 @@ class BigSitemap
|
|
209
302
|
str.sub(/^\//, '')
|
210
303
|
end
|
211
304
|
|
305
|
+
def get_last_id(filename)
|
306
|
+
Dir["#{filename}*.{xml,xml.gz}"].map do |file|
|
307
|
+
file.to_s.scan(/#{filename}_(.+).xml/).flatten.last.to_i
|
308
|
+
end.sort.last
|
309
|
+
end
|
310
|
+
|
212
311
|
def pick_method(model, candidates)
|
213
312
|
method = nil
|
214
313
|
candidates.each do |candidate|
|
@@ -221,19 +320,43 @@ class BigSitemap
|
|
221
320
|
end
|
222
321
|
|
223
322
|
def url_for_sitemap(path)
|
224
|
-
|
225
|
-
"#{root_url}/#{File.basename(path)}"
|
226
|
-
else
|
227
|
-
"#{root_url}/#{@options[:path]}/#{File.basename(path)}"
|
228
|
-
end
|
323
|
+
[root_url, @options[:path], File.basename(path)].compact.join('/')
|
229
324
|
end
|
230
325
|
|
231
|
-
|
232
|
-
|
233
|
-
|
234
|
-
|
235
|
-
|
236
|
-
|
237
|
-
|
326
|
+
end
|
327
|
+
|
328
|
+
|
329
|
+
|
330
|
+
class BigSitemapRails < BigSitemap
|
331
|
+
|
332
|
+
include ActionController::UrlWriter if defined? Rails
|
333
|
+
|
334
|
+
def initialize(options)
|
335
|
+
require 'action_controller'
|
336
|
+
|
337
|
+
super options.merge(:default_url_options => default_url_options)
|
338
|
+
end
|
339
|
+
|
340
|
+
def document_root
|
341
|
+
"#{Rails.root}/public"
|
238
342
|
end
|
343
|
+
end
|
344
|
+
|
345
|
+
|
346
|
+
|
347
|
+
class BigSitemapMerb < BigSitemap
|
348
|
+
|
349
|
+
def initialize(options)
|
350
|
+
require 'extlib'
|
351
|
+
super
|
352
|
+
end
|
353
|
+
|
354
|
+
def document_root
|
355
|
+
"#{Merb.root}/public"
|
356
|
+
end
|
357
|
+
|
358
|
+
def table_name(model)
|
359
|
+
Extlib::Inflection.tableize(model.to_s)
|
360
|
+
end
|
361
|
+
|
239
362
|
end
|
data/test/big_sitemap_test.rb
CHANGED
@@ -161,6 +161,7 @@ class BigSitemapTest < Test::Unit::TestCase
|
|
161
161
|
|
162
162
|
should 'strip leading slashes from controller paths' do
|
163
163
|
create_sitemap
|
164
|
+
add_model
|
164
165
|
@sitemap.add(TestModel, :path => '/test_controller').generate
|
165
166
|
assert(
|
166
167
|
!elements(first_sitemaps_model_file, 'loc').first.text.match(/\/\/test_controller\//),
|
@@ -181,6 +182,29 @@ class BigSitemapTest < Test::Unit::TestCase
|
|
181
182
|
end
|
182
183
|
end
|
183
184
|
|
185
|
+
context 'add static method' do
|
186
|
+
should 'should generate static content' do
|
187
|
+
create_sitemap
|
188
|
+
@sitemap.add_static('/', Time.now, 'weekly', 0.5)
|
189
|
+
@sitemap.add_static('/about', Time.now, 'weekly', 0.5)
|
190
|
+
@sitemap.generate_static
|
191
|
+
elems = elements(static_sitemaps_file, 'loc')
|
192
|
+
assert_equal "/", elems.first.text
|
193
|
+
assert_equal "/about", elems.last.text
|
194
|
+
end
|
195
|
+
end
|
196
|
+
|
197
|
+
context 'sanatize XML chars' do
|
198
|
+
should 'should transform ampersands' do
|
199
|
+
create_sitemap
|
200
|
+
@sitemap.add_static('/something&else', Time.now, 'weekly', 0.5)
|
201
|
+
@sitemap.generate_static
|
202
|
+
elems = elements(static_sitemaps_file, 'loc')
|
203
|
+
assert Zlib::GzipReader.open(static_sitemaps_file).read.include?("/something&amp;else")
|
204
|
+
assert_equal "/something&else", elems.first.text
|
205
|
+
end
|
206
|
+
end
|
207
|
+
|
184
208
|
context 'clean method' do
|
185
209
|
should 'be chainable' do
|
186
210
|
create_sitemap
|
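Note on the ampersand test in the hunk above: a URL written into a sitemap file has to carry XML entities for reserved characters, while a parser hands back the plain character. A minimal sketch of the escaping step, assuming a helper named `escape_xml` and a constant `XML_ESCAPES` (both hypothetical, how the gem's builder escapes text is not shown in this diff):

```ruby
# Hypothetical escaping helper; the gem's builder may do this differently.
XML_ESCAPES = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  '"' => '&quot;',
  "'" => '&apos;'
}.freeze

def escape_xml(text)
  # gsub with a hash replaces each matched character by its mapped entity.
  text.gsub(/[&<>"']/, XML_ESCAPES)
end

raw = "<loc>#{escape_xml('/something&else')}</loc>"
puts raw
```

The raw file content is what a substring check on the gzipped file would see, whereas an XML parser such as the one used in the `elements` test helper returns the unescaped text of the `loc` element.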
@@ -202,6 +226,164 @@ class BigSitemapTest < Test::Unit::TestCase
|
|
202
226
|
end
|
203
227
|
end
|
204
228
|
|
229
|
+
context 'sitemap index' do
|
230
|
+
should 'generate for all xml files in directory' do
|
231
|
+
create_sitemap
|
232
|
+
@sitemap.clean
|
233
|
+
File.open("#{sitemaps_dir}/sitemap_file1.xml", 'w')
|
234
|
+
File.open("#{sitemaps_dir}/sitemap_file2.xml.gz", 'w')
|
235
|
+
File.open("#{sitemaps_dir}/sitemap_file3.txt", 'w')
|
236
|
+
File.open("#{sitemaps_dir}/file4.xml", 'w')
|
237
|
+
File.open(unzipped_sitemaps_index_file, 'w')
|
238
|
+
@sitemap.send :generate_sitemap_index
|
239
|
+
|
240
|
+
elem = elements(sitemaps_index_file, 'loc')
|
241
|
+
assert_equal 2, elem.size #no index and file3 and file4 found
|
242
|
+
assert_equal "http://example.com/sitemaps/sitemap_file1.xml", elem.first.text
|
243
|
+
assert_equal "http://example.com/sitemaps/sitemap_file2.xml.gz", elem.last.text
|
244
|
+
end
|
245
|
+
|
246
|
+
should 'generate for all for given file' do
|
247
|
+
create_sitemap
|
248
|
+
@sitemap.clean
|
249
|
+
File.open("#{sitemaps_dir}/sitemap_file1.xml", 'w')
|
250
|
+
File.open("#{sitemaps_dir}/sitemap_file2.xml.gz", 'w')
|
251
|
+
files = ["#{sitemaps_dir}/sitemap_file1.xml", "#{sitemaps_dir}/sitemap_file2.xml.gz"]
|
252
|
+
@sitemap.send :generate_sitemap_index, files
|
253
|
+
|
254
|
+
elem = elements(sitemaps_index_file, 'loc')
|
255
|
+
assert_equal 2, elem.size
|
256
|
+
assert_equal "http://example.com/sitemaps/sitemap_file1.xml", elem.first.text
|
257
|
+
assert_equal "http://example.com/sitemaps/sitemap_file2.xml.gz", elem.last.text
|
258
|
+
end
|
259
|
+
end
|
260
|
+
|
261
|
+
context 'get_last_id' do
|
262
|
+
should 'return last id' do
|
263
|
+
create_sitemap.clean
|
264
|
+
filename = "#{sitemaps_dir}/sitemap_file"
|
265
|
+
File.open("#{filename}_1.xml", 'w')
|
266
|
+
File.open("#{filename}_23.xml", 'w')
|
267
|
+
File.open("#{filename}_42.xml.gz", 'w')
|
268
|
+
File.open("#{filename}_9.xml", 'w')
|
269
|
+
assert_equal 42, @sitemap.send(:get_last_id, filename)
|
270
|
+
end
|
271
|
+
|
272
|
+
should 'return nil' do
|
273
|
+
create_sitemap.clean
|
274
|
+
filename = "#{sitemaps_dir}/sitemap_file"
|
275
|
+
assert_equal nil, @sitemap.send(:get_last_id, filename)
|
276
|
+
end
|
277
|
+
end
|
278
|
+
|
279
|
+
context 'partial update' do
|
280
|
+
should 'generate for all xml files in directory and delete last file' do
|
281
|
+
TestModel.current_id = last_id = 27
|
282
|
+
filename = "#{sitemaps_dir}/sitemap_test_models"
|
283
|
+
|
284
|
+
create_sitemap(:partial_update => true, :gzip => false, :batch_size => 5, :max_per_sitemap => 5, :max_per_index => 100).clean
|
285
|
+
add_model(:num_items => 50 - last_id) #TestModel
|
286
|
+
|
287
|
+
File.open("#{filename}.xml", 'w')
|
288
|
+
File.open("#{filename}_5.xml", 'w')
|
289
|
+
File.open("#{filename}_9.xml", 'w')
|
290
|
+
File.open("#{filename}_23.xml", 'w')
|
291
|
+
File.open("#{filename}_#{last_id}.xml", 'w')
|
292
|
+
@sitemap.generate
|
293
|
+
|
294
|
+
# Dir["#{sitemaps_dir}/*"].each do |d| puts d; end
|
295
|
+
|
296
|
+
assert File.exists?("#{filename}_48.xml")
|
297
|
+
assert File.exists?("#{filename}_#{last_id}.xml")
|
298
|
+
elems = elements("#{filename}_#{last_id}.xml", 'loc').map(&:text)
|
299
|
+
|
300
|
+
assert_equal 5, elems.size
|
301
|
+
(28..32).each do |i|
|
302
|
+
assert elems.include? "http://example.com/test_models/#{i}"
|
303
|
+
end
|
304
|
+
|
305
|
+
elems = elements(unzipped_sitemaps_index_file, 'loc').map(&:text)
|
306
|
+
assert elems.include? "http://example.com/sitemaps/sitemap_test_models.xml"
|
307
|
+
assert elems.include? "http://example.com/sitemaps/sitemap_test_models_9.xml"
|
308
|
+
assert elems.include? "http://example.com/sitemaps/sitemap_test_models_#{last_id}.xml"
|
309
|
+
assert elems.include? "http://example.com/sitemaps/sitemap_test_models_48.xml"
|
310
|
+
end
|
311
|
+
|
312
|
+
should 'generate sitemap, update should respect old files' do
|
313
|
+
max_id = 23
|
314
|
+
TestModel.current_id = 0
|
315
|
+
filename = "#{sitemaps_dir}/sitemap_test_models"
|
316
|
+
|
317
|
+
create_sitemap(:partial_update => true, :gzip => false, :batch_size => 5, :max_per_sitemap => 5, :max_per_index => 100).clean
|
318
|
+
add_model(:num_items => max_id) #TestModel
|
319
|
+
@sitemap.generate
|
320
|
+
|
321
|
+
# Dir["#{sitemaps_dir}/*"].each do |d| puts d; end
|
322
|
+
|
323
|
+
assert_equal 5, elements("#{filename}.xml", 'loc').size
|
324
|
+
assert_equal 5, elements("#{filename}_6.xml", 'loc').size
|
325
|
+
assert_equal 3, elements("#{filename}_21.xml", 'loc').size
|
326
|
+
|
327
|
+
TestModel.current_id = 20 #last_id is 21, so start with one below
|
328
|
+
create_sitemap(:partial_update => true, :gzip => false, :batch_size => 5, :max_per_sitemap => 5, :max_per_index => 100)
|
329
|
+
add_model( :num_items => 48 - TestModel.current_id ) #TestModel
|
330
|
+
@sitemap.generate
|
331
|
+
|
332
|
+
assert_equal 5, elements("#{filename}_6.xml", 'loc').size
|
333
|
+
assert_equal 5, elements("#{filename}_21.xml", 'loc').size
|
334
|
+
|
335
|
+
# Dir["#{sitemaps_dir}/*"].each do |d| puts d; end
|
336
|
+
|
337
|
+
elems = elements("#{filename}_26.xml", 'loc').map(&:text)
|
338
|
+
(26..30).each do |i|
|
339
|
+
assert elems.include? "http://example.com/test_models/#{i}"
|
340
|
+
end
|
341
|
+
|
342
|
+
#puts `cat /tmp/sitemaps/sitemap_test_models_41.xml`
|
343
|
+
|
344
|
+
assert_equal 3, elements("#{filename}_46.xml", 'loc').size
|
345
|
+
end
|
346
|
+
|
347
|
+
context 'lockfile' do
|
348
|
+
should 'create and delete lock file' do
|
349
|
+
sitemap = BigSitemap.new(:base_url => 'http://example.com', :document_root => tmp_dir)
|
350
|
+
|
351
|
+
sitemap.with_lock do
|
352
|
+
assert File.exists?('/tmp/sitemaps/generator.lock')
|
353
|
+
end
|
354
|
+
|
355
|
+
assert !File.exists?('/tmp/sitemaps/generator.lock')
|
356
|
+
end
|
357
|
+
|
358
|
+
should 'not catch error not related to lock' do
|
359
|
+
sitemap = BigSitemap.new(:base_url => 'http://example.com', :document_root => tmp_dir)
|
360
|
+
|
361
|
+
assert_raise RuntimeError do
|
362
|
+
sitemap.with_lock do
|
363
|
+
raise "Wrong"
|
364
|
+
end
|
365
|
+
end
|
366
|
+
|
367
|
+
end
|
368
|
+
|
369
|
+
should 'throw error if lock exits' do
|
370
|
+
sitemap = BigSitemap.new(:base_url => 'http://example.com', :document_root => tmp_dir)
|
371
|
+
|
372
|
+
sitemap.with_lock do
|
373
|
+
sitemap2 = BigSitemap.new(:base_url => 'http://example.com', :document_root => tmp_dir)
|
374
|
+
|
375
|
+
assert_nothing_raised do
|
376
|
+
sitemap2.with_lock do
|
377
|
+
raise "Should not be called"
|
378
|
+
end
|
379
|
+
end
|
380
|
+
|
381
|
+
end
|
382
|
+
end
|
383
|
+
|
384
|
+
end
|
385
|
+
end
|
386
|
+
|
205
387
|
private
|
206
388
|
def delete_tmp_files
|
207
389
|
FileUtils.rm_rf(sitemaps_dir)
|
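Note on the `get_last_id` tests in the hunk above: they expect the highest numeric suffix among files sharing a prefix (42 for `_1`, `_23`, `_42`, `_9`) and `nil` when no numbered file exists. A minimal sketch of one way to get that behaviour, assuming a helper named `last_id_for` (hypothetical, the private method's real body is not part of this diff):

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-in for the private get_last_id method.
def last_id_for(prefix)
  ids = Dir["#{prefix}_*.{xml,xml.gz}"].map do |file|
    # Capture the digits between the last underscore and the extension.
    file[/_(\d+)\.xml(\.gz)?\z/, 1]
  end
  # Compare as integers so that 9 sorts below 23 and 42; max of [] is nil.
  ids.compact.map(&:to_i).max
end

Dir.mktmpdir do |dir|
  prefix = File.join(dir, 'sitemap_file')
  puts last_id_for(prefix).inspect
  %w[_1.xml _23.xml _42.xml.gz _9.xml].each do |suffix|
    FileUtils.touch("#{prefix}#{suffix}")
  end
  puts last_id_for(prefix)
end
```

Converting to integers before taking the maximum matters here: a plain string comparison would rank `"9"` above `"42"`.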
@@ -211,7 +393,7 @@ class BigSitemapTest < Test::Unit::TestCase
|
|
211
393
|
@sitemap = BigSitemap.new({
|
212
394
|
:base_url => 'http://example.com',
|
213
395
|
:document_root => tmp_dir,
|
214
|
-
:
|
396
|
+
:ping_google => false
|
215
397
|
}.update(options))
|
216
398
|
end
|
217
399
|
|
@@ -239,7 +421,7 @@ class BigSitemapTest < Test::Unit::TestCase
|
|
239
421
|
|
240
422
|
def add_model(options={})
|
241
423
|
num_items = options.delete(:num_items) || default_num_items
|
242
|
-
TestModel.stubs(:
|
424
|
+
TestModel.stubs(:count_for_sitemap).returns(num_items)
|
243
425
|
@sitemap.add(TestModel, options)
|
244
426
|
end
|
245
427
|
|
@@ -263,6 +445,10 @@ class BigSitemapTest < Test::Unit::TestCase
|
|
263
445
|
"#{sitemaps_dir}/sitemap_test_models.xml.gz"
|
264
446
|
end
|
265
447
|
|
448
|
+
def static_sitemaps_file
|
449
|
+
"#{sitemaps_dir}/sitemap_static.xml.gz"
|
450
|
+
end
|
451
|
+
|
266
452
|
def second_sitemaps_model_file
|
267
453
|
"#{sitemaps_dir}/sitemap_test_models_1.xml.gz"
|
268
454
|
end
|
@@ -284,7 +470,8 @@ class BigSitemapTest < Test::Unit::TestCase
|
|
284
470
|
end
|
285
471
|
|
286
472
|
def elements(filename, el)
|
287
|
-
|
473
|
+
file_class = filename.include?('.gz') ? Zlib::GzipReader : File
|
474
|
+
data = Nokogiri::XML.parse(file_class.open(filename).read)
|
288
475
|
data.search("//s:#{el}", ns)
|
289
476
|
end
|
290
477
|
|
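Note on the `elements` helper change above: it now picks `Zlib::GzipReader` or `File` depending on the filename, so the same assertions work with `:gzip => false`. A minimal sketch of that reader selection using only the standard library, assuming a helper named `read_sitemap` (hypothetical) and leaving out the Nokogiri parsing step:

```ruby
require 'zlib'
require 'tmpdir'

# Hypothetical helper following the same reader choice as the test helper.
# It checks the file extension with end_with?, which is slightly stricter
# than the include?('.gz') test used in the diff.
def read_sitemap(filename)
  reader = filename.end_with?('.gz') ? Zlib::GzipReader : File
  # Both classes accept open(path) with a block and respond to read.
  reader.open(filename) { |io| io.read }
end

Dir.mktmpdir do |dir|
  xml = '<urlset><url><loc>http://example.com/</loc></url></urlset>'
  plain = File.join(dir, 'sitemap.xml')
  zipped = File.join(dir, 'sitemap.xml.gz')

  File.write(plain, xml)
  Zlib::GzipWriter.open(zipped) { |gz| gz.write(xml) }

  puts read_sitemap(plain) == read_sitemap(zipped)
end
```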
data/test/fixtures/test_model.rb
CHANGED
@@ -1,6 +1,10 @@
|
|
1
1
|
class TestModel
|
2
2
|
def to_param
|
3
|
-
object_id
|
3
|
+
id #|| object_id
|
4
|
+
end
|
5
|
+
|
6
|
+
def id
|
7
|
+
@id ||= TestModel.current_id += 1
|
4
8
|
end
|
5
9
|
|
6
10
|
def change_frequency
|
@@ -16,6 +20,10 @@ class TestModel
|
|
16
20
|
end
|
17
21
|
|
18
22
|
class << self
|
23
|
+
def table_name
|
24
|
+
'test_models'
|
25
|
+
end
|
26
|
+
|
19
27
|
def count_for_sitemap
|
20
28
|
self.find_for_sitemap.size
|
21
29
|
end
|
@@ -30,5 +38,11 @@ class TestModel
|
|
30
38
|
num_times.times { instances.push(self.new) }
|
31
39
|
instances
|
32
40
|
end
|
41
|
+
|
42
|
+
attr_writer :current_id
|
43
|
+
|
44
|
+
def current_id
|
45
|
+
@current_id ||= 0
|
46
|
+
end
|
33
47
|
end
|
34
48
|
end
|
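Note on the fixture change above: `TestModel` now hands out sequential ids from a class-level counter, which is what lets the partial-update tests predict file names such as `sitemap_test_models_27.xml`. A minimal sketch of the same pattern, assuming a stand-in class named `Item` (hypothetical):

```ruby
# Hypothetical stand-in for the TestModel fixture's id handling.
class Item
  class << self
    # Lets a test reset or pre-seed the counter.
    attr_writer :current_id

    def current_id
      @current_id ||= 0
    end
  end

  def id
    # Memoised, so repeated calls on one instance return the same id.
    @id ||= (Item.current_id += 1)
  end
end

Item.current_id = 27
first = Item.new
second = Item.new
puts first.id   # prints 28
puts second.id  # prints 29
puts first.id   # prints 28 again
```

Because ids are assigned lazily on the first call to `id`, the order in which instances are asked for their id, not the order of construction, determines the numbering.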
data/test/test_helper.rb
CHANGED
metadata
CHANGED
@@ -1,81 +1,70 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: big_sitemap
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
hash:
|
4
|
+
hash: 61
|
5
5
|
prerelease: false
|
6
6
|
segments:
|
7
7
|
- 0
|
8
|
-
-
|
8
|
+
- 8
|
9
9
|
- 1
|
10
|
-
version: 0.
|
10
|
+
version: 0.8.1
|
11
11
|
platform: ruby
|
12
12
|
authors:
|
13
|
+
- Tobias Bielohlawek
|
13
14
|
- Alex Rabarts
|
14
15
|
autorequire:
|
15
16
|
bindir: bin
|
16
17
|
cert_chain: []
|
17
18
|
|
18
|
-
date:
|
19
|
+
date: 2011-01-25 00:00:00 +00:00
|
19
20
|
default_executable:
|
20
21
|
dependencies:
|
21
22
|
- !ruby/object:Gem::Dependency
|
22
|
-
name:
|
23
|
+
name: bundler
|
23
24
|
prerelease: false
|
24
25
|
requirement: &id001 !ruby/object:Gem::Requirement
|
25
26
|
none: false
|
26
27
|
requirements:
|
27
28
|
- - ">="
|
28
29
|
- !ruby/object:Gem::Version
|
29
|
-
hash:
|
30
|
-
segments:
|
31
|
-
- 2
|
32
|
-
- 1
|
33
|
-
- 2
|
34
|
-
version: 2.1.2
|
35
|
-
type: :runtime
|
36
|
-
version_requirements: *id001
|
37
|
-
- !ruby/object:Gem::Dependency
|
38
|
-
name: extlib
|
39
|
-
prerelease: false
|
40
|
-
requirement: &id002 !ruby/object:Gem::Requirement
|
41
|
-
none: false
|
42
|
-
requirements:
|
43
|
-
- - ">="
|
44
|
-
- !ruby/object:Gem::Version
|
45
|
-
hash: 41
|
30
|
+
hash: 3
|
46
31
|
segments:
|
47
32
|
- 0
|
48
|
-
|
49
|
-
- 9
|
50
|
-
version: 0.9.9
|
33
|
+
version: "0"
|
51
34
|
type: :runtime
|
52
|
-
version_requirements: *
|
35
|
+
version_requirements: *id001
|
53
36
|
description: A Sitemap generator specifically designed for large sites (although it works equally well with small sites)
|
54
|
-
email:
|
37
|
+
email:
|
38
|
+
- tobi@soundcloud.com
|
39
|
+
- alexrabarts@gmail.com
|
55
40
|
executables: []
|
56
41
|
|
57
42
|
extensions: []
|
58
43
|
|
59
44
|
extra_rdoc_files:
|
60
|
-
- README.rdoc
|
61
45
|
- LICENSE
|
46
|
+
- README.rdoc
|
62
47
|
files:
|
48
|
+
- .gitignore
|
49
|
+
- Gemfile
|
50
|
+
- Gemfile.lock
|
63
51
|
- History.txt
|
52
|
+
- LICENSE
|
64
53
|
- README.rdoc
|
54
|
+
- Rakefile
|
65
55
|
- VERSION.yml
|
66
|
-
-
|
56
|
+
- big_sitemap.gemspec
|
67
57
|
- lib/big_sitemap.rb
|
58
|
+
- lib/big_sitemap/builder.rb
|
68
59
|
- test/big_sitemap_test.rb
|
69
60
|
- test/fixtures/test_model.rb
|
70
61
|
- test/test_helper.rb
|
71
|
-
- LICENSE
|
72
62
|
has_rdoc: true
|
73
|
-
homepage: http://github.com/
|
63
|
+
homepage: http://github.com/rngtng/big_sitemap
|
74
64
|
licenses: []
|
75
65
|
|
76
66
|
post_install_message:
|
77
67
|
rdoc_options:
|
78
|
-
- --inline-source
|
79
68
|
- --charset=UTF-8
|
80
69
|
require_paths:
|
81
70
|
- lib
|
@@ -102,7 +91,9 @@ requirements: []
|
|
102
91
|
rubyforge_project:
|
103
92
|
rubygems_version: 1.3.7
|
104
93
|
signing_key:
|
105
|
-
specification_version:
|
94
|
+
specification_version: 3
|
106
95
|
summary: A Sitemap generator specifically designed for large sites (although it works equally well with small sites)
|
107
|
-
test_files:
|
108
|
-
|
96
|
+
test_files:
|
97
|
+
- test/big_sitemap_test.rb
|
98
|
+
- test/fixtures/test_model.rb
|
99
|
+
- test/test_helper.rb
|