alexrabarts-big_sitemap 0.2.0 → 0.2.1
- data/History.txt +17 -0
- data/{README.markdown → README.rdoc} +39 -45
- data/VERSION.yml +1 -1
- data/lib/big_sitemap.rb +6 -2
- data/test/big_sitemap_test.rb +16 -3
- data/test/fixtures/test_model.rb +4 -0
- metadata +3 -3
data/History.txt
ADDED
@@ -0,0 +1,17 @@
+=== 0.2.1 / 2009-03-12
+
+* Normalize path arguments so it no longer matters whether a leading slash is used or not
+
+=== 0.2.0 / 2009-03-11
+
+* Methods are now chainable
+
+=== 0.1.4 / 2009-03-11
+
+* Add clean method to clear out Sitemaps directory
+* Make methods chainable
+
+=== 0.1.3 / 2009-03-10
+
+* Initial release
+
data/{README.markdown → README.rdoc}
@@ -1,82 +1,77 @@
-
+= BigSitemap
 
-
+== DESCRIPTION
 
-BigSitemap is a Sitemap generator specifically designed for large sites (although it works equally well with small sites). It splits large Sitemaps into multiple files, gzips the files to minimize bandwidth usage, batches database queries so it doesn't take your site down, can be set up with just a few lines of code and is compatible with just about any framework.
+BigSitemap is a Sitemap (http://sitemaps.org) generator specifically designed for large sites (although it works equally well with small sites). It splits large Sitemaps into multiple files, gzips the files to minimize bandwidth usage, batches database queries so it doesn't take your site down, can be set up with just a few lines of code and is compatible with just about any framework.
 
-
+== INSTALL
 
-
-* Via gem: gem install alexrabarts-big_sitemap -s http://gems.github.com
+Via git:
 
-
+git clone git://github.com/alexrabarts/big_sitemap.git
+
+Via gem:
+
+gem install alexrabarts-big_sitemap -s http://gems.github.com
+
+== SYNOPSIS
 
 The minimum required to generate a sitemap is:
 
-<pre>
 BigSitemap.new(:base_url => 'http://example.com').add(:model => MyModel, :path => 'my_controller').generate
-</pre>
 
-You can put this in a rake/thor task and create a cron job to run it periodically. It should be enough for most Rails/Merb applications. Note that the methods are chainable, although you can call them on an instance variable if you prefer:
+You can put this in a rake/thor task and create a cron job to run it periodically. It should be enough for most Rails/Merb applications. You can add more models by further calls to the <code>add</code> method. Note that the methods are chainable, although you can call them on an instance variable if you prefer:
 
-<pre>
 sitemap = BigSitemap.new(:base_url => 'http://example.com')
-sitemap.add(:model =>
+sitemap.add(:model => Posts, :path => 'articles')
+sitemap.add(:model => Comments, :path => 'comments')
 sitemap.generate
-</pre>
 
-
+=== Find Methods
 
-Your models must provide either a <code>find_for_sitemap</code> or <code>all</code> class method that returns the instances that are to be included in the sitemap.
+Your models must provide either a <code>find_for_sitemap</code> or <code>all</code> class method that returns the instances that are to be included in the sitemap.
 
-
+Additionally, you models must provide a <code>count_for_sitemap</code> or <code>count</code> class method that returns a count of the instances to be included.
+
+If you're using ActiveRecord (Rails) or DataMapper then <code>all</code> and <code>count</code> are already provided and you don't need to do anything unless you want to include a subset of records. If you provide your own <code>find_for_sitemap</code> or <code>all</code> method then it should be able to handle the <code>:offset</code> and <code>:limit</code> options, in the same way that ActiveRecord and DataMapper handle them. This is especially important if you have more than 50,000 URLs.
+
+=== URL Format
 
 To generate the URLs, BigSitemap will combine the constructor arguments with the <code>to_param</code> method of each instance returned (provided by ActiveRecord but not DataMapper). If this method is not present, <code>id</code> will be used. The URL is constructed as:
 
-
-
-":base_url/:path/:id" # (if to_param does not exist)
-</pre>
+:base_url/:path/:to_param (if to_param exists)
+:base_url/:path/:id (if to_param does not exist)
 
-
+=== Sitemap Location
 
 BigSitemap knows about the document root of Rails and Merb. If you are using another framework then you can specify the document root with the <code>:document_root</code> option. e.g.:
 
-<pre>
 BigSitemap.new(:base_url => 'http://example.com', :document_root => "#{FOO_ROOT}/httpdocs")
-</pre>
 
 By default, the sitemap files are created under <code>/sitemaps</code>. You can modify this with the <code>:path</code> option:
 
-<pre>
 BigSitemap.new(:base_url => 'http://example.com', :path => 'google-sitemaps') # places Sitemaps under /google-sitemaps
-</pre>
 
-
+=== Cleaning the Sitemaps Directory
 
 Calling the <code>clean</code> method will remove all files from the Sitemaps directory.
 
-
+=== Maximum Number of URLs
 
 Sitemaps will be split across several files if more than 50,000 records are returned. You can customize this limit with the <code>:max_per_sitemap</code> option:
 
-<pre>
 BigSitemap.new(:base_url => 'http://example.com', :max_per_sitemap => 1000) # Max of 1000 URLs per Sitemap
-</pre>
 
-
+=== Batched Database Queries
 
-The database is queried in batches to prevent large SQL select statements from locking the database for too long. By default, the batch size is 1001 (not 1000 due to an obscure bug in DataMapper
+The database is queried in batches to prevent large SQL select statements from locking the database for too long. By default, the batch size is 1001 (not 1000 due to an obscure bug in DataMapper). You can customize the batch size with the <code>:batch_size</code> option:
 
-
-BigSitemap.new(:base_url => 'http://example.com, :batch_size => 5000) # Database is queried in batches of 5,000
-</pre>
+BigSitemap.new(:base_url => 'http://example.com', :batch_size => 5000) # Database is queried in batches of 5,000
 
-
+=== Search Engine Notification
 
 Google, Yahoo!, MSN and Ask are pinged once the Sitemap files are generated. You can turn one or more of these off:
 
-<pre>
 BigSitemap.new(
 :base_url => 'http://example.com',
 :ping_google => false,
@@ -84,27 +79,26 @@ Google, Yahoo!, MSN and Ask are pinged once the Sitemap files are generated. Yo
 :ping_msn => false,
 :ping_ask => false
 )
-</pre>
 
 You must provide an App ID in order to ping Yahoo! (more info at http://developer.yahoo.com/search/siteexplorer/V1/updateNotification.html):
 
-<pre>
 BigSitemap.new(:base_url => 'http://example.com', :yahoo_app_id => 'myYahooAppId') # Yahoo! will now be pinged
-</pre>
 
-
+== LIMITATIONS
 
 If your database is likely to shrink during the time it takes to create the sitemap then you might run into problems (the final, batched SQL select will overrun by setting a limit that is too large since it is calculated from the count, which is queried at the very beginning). Patches welcome!
 
-
+== TODO
 
-* Support for priority
+* Support for <code>priority</code>
+* Support for <code>changefreq</code> (currently hard-coded to <code>weekly</code>)
 
-
+== CREDITS
 
 Thanks to Alastair Brunton and Harry Love, who's work provided a starting point for this library.
 http://scoop.cheerfactory.co.uk/2008/02/26/google-sitemap-generator/
 
-
+== COPYRIGHT
+
+Copyright (c) 2009 Stateless Systems (http://statelesssystems.com). See LICENSE for details.
 
-Copyright (c) 2009 Stateless Systems (http://statelesssystems.com). See LICENSE for details.
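The new "Find Methods" section of the README asks custom finders to accept :offset and :limit, and to pair them with a matching count method. A minimal sketch of what such a model might look like follows. The Article class, its in-memory RECORDS data and the each_sitemap_batch helper are all hypothetical and exist only for illustration; they are not part of BigSitemap, whose own batching code is not shown in this diff.

```ruby
# Hypothetical in-memory model; a real application would query a database.
class Article
  RECORDS = (1..2500).map { |i| { :id => i, :published => i.odd? } }.freeze

  # Returns only the records that belong in the sitemap, honouring
  # :offset and :limit the way ActiveRecord and DataMapper finders do.
  def self.find_for_sitemap(options = {})
    published = RECORDS.select { |r| r[:published] }
    offset = options.fetch(:offset, 0)
    limit  = options.fetch(:limit, published.size)
    published[offset, limit] || []
  end

  # Must agree with find_for_sitemap so that batch boundaries line up.
  def self.count_for_sitemap
    RECORDS.count { |r| r[:published] }
  end
end

# Illustrative batching loop (an assumption, not BigSitemap's own code):
# walk the result set in fixed-size slices instead of loading it at once.
def each_sitemap_batch(model, batch_size = 1001)
  total = model.count_for_sitemap
  0.step(total - 1, batch_size) do |offset|
    yield model.find_for_sitemap(:offset => offset, :limit => batch_size)
  end
end

batch_sizes = []
each_sitemap_batch(Article, 500) { |batch| batch_sizes << batch.size }
p batch_sizes # => [500, 500, 250]
```

Because the count is read once before the loop starts, this sketch shares the limitation the README describes: if records disappear while the loop runs, the last slices may come back short or empty.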
data/VERSION.yml
CHANGED
data/lib/big_sitemap.rb
CHANGED
@@ -21,7 +21,7 @@ class BigSitemap
     @base_url = options.delete(:base_url)
     @max_per_sitemap = options.delete(:max_per_sitemap) || 50000
     @batch_size = options.delete(:batch_size) || 1001 # TODO: Set this to 1000 once DM offset 37000 bug is fixed
-    @web_path = options.delete(:path) || 'sitemaps'
+    @web_path = strip_leading_slash(options.delete(:path) || 'sitemaps')
     @ping_google = options[:ping_google].nil? ? true : options.delete(:ping_google)
     @ping_yahoo = options[:ping_yahoo].nil? ? true : options.delete(:ping_yahoo)
     @yahoo_app_id = options.delete(:yahoo_app_id)
@@ -42,7 +42,7 @@ class BigSitemap
 
   def add(options)
     raise ArgumentError, ':model and :path options must be provided' unless options[:model] && options[:path]
-    @sources << options
+    @sources << options.update(:path => strip_leading_slash(options[:path]))
     self # Chainable
   end
 
@@ -127,6 +127,10 @@ class BigSitemap
   end
 
   private
+    def strip_leading_slash(str)
+      str.sub(/^\//, '')
+    end
+
     def pick_method(klass, candidates)
       method = nil
       candidates.each do |candidate|
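The effect of the new strip_leading_slash helper can be tried in isolation. The sketch below copies its one-line body from the diff and adds two things that are assumptions rather than part of this release: a stricter variant using \A, and a hypothetical sitemap_url helper that joins the pieces in the :base_url/:path/:id shape the README describes. BigSitemap's real URL-building code is not shown in this diff.

```ruby
# Same substitution as in the diff: drop one leading slash, if present.
def strip_leading_slash(str)
  str.sub(/^\//, '')
end

# Stricter variant (an assumption, not part of this release). In Ruby, ^
# matches at the start of any line, while \A matches only at the start of
# the whole string, so the two differ for multi-line input.
def strip_leading_slash_strict(str)
  str.sub(/\A\//, '')
end

# Hypothetical helper showing why the normalization matters: with the slash
# removed, joining the parts never produces "//" before the path.
def sitemap_url(base_url, path, param)
  [base_url.chomp('/'), strip_leading_slash(path), param].join('/')
end

puts sitemap_url('http://example.com', '/articles', 42) # http://example.com/articles/42
puts sitemap_url('http://example.com', 'articles', 42)  # http://example.com/articles/42
```

For ordinary single-line paths such as '/test_controller' both variants behave identically, so the choice only matters if path values could ever contain newlines.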
data/test/big_sitemap_test.rb
CHANGED
@@ -112,12 +112,21 @@ class BigSitemapTest < Test::Unit::TestCase
       assert_equal 1, num_elements(first_sitemaps_model_file, 'changefreq')
       assert_equal 1, num_elements(second_sitemaps_model_file, 'changefreq')
     end
+
+    should 'strip leading slashes from controller paths' do
+      create_sitemap
+      @sitemap.add(:model => TestModel, :path => '/test_controller').generate
+      assert(
+        !elements(single_sitemaps_model_file, 'loc').first.text.match(/\/\/test_controller\//),
+        'URL does not contain a double-slash before the controller path'
+      )
+    end
   end
 
   context 'add method' do
     should 'be chainable' do
       create_sitemap
-      assert_equal BigSitemap, @sitemap.add(
+      assert_equal BigSitemap, @sitemap.add(:model => TestModel, :path => 'test_controller').class
     end
   end
 
@@ -198,8 +207,12 @@ class BigSitemapTest < Test::Unit::TestCase
     {'s' => 'http://www.sitemaps.org/schemas/sitemap/0.9'}
   end
 
-  def
+  def elements(filename, el)
     data = Nokogiri::XML.parse(Zlib::GzipReader.open(filename).read)
-    data.search("//s:#{el}", ns)
+    data.search("//s:#{el}", ns)
+  end
+
+  def num_elements(filename, el)
+    elements(filename, el).size
   end
 end
data/test/fixtures/test_model.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: alexrabarts-big_sitemap
 version: !ruby/object:Gem::Version
-  version: 0.2.
+  version: 0.2.1
 platform: ruby
 authors:
 - Alex Rabarts
@@ -9,7 +9,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2009-03-
+date: 2009-03-12 00:00:00 -07:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
@@ -42,7 +42,7 @@ extra_rdoc_files: []
 
 files:
 - History.txt
-- README.
+- README.rdoc
 - VERSION.yml
 - lib/big_sitemap.rb
 - test/big_sitemap_test.rb