alexrabarts-big_sitemap 0.2.0 → 0.2.1

data/History.txt ADDED
@@ -0,0 +1,17 @@
+ === 0.2.1 / 2009-03-12
+
+ * Normalize path arguments so it no longer matters whether a leading slash is used or not
+
+ === 0.2.0 / 2009-03-11
+
+ * Methods are now chainable
+
+ === 0.1.4 / 2009-03-11
+
+ * Add clean method to clear out Sitemaps directory
+ * Make methods chainable
+
+ === 0.1.3 / 2009-03-10
+
+ * Initial release
+
@@ -1,82 +1,77 @@
- # BigSitemap
+ = BigSitemap

- ## DESCRIPTION
+ == DESCRIPTION

- BigSitemap is a Sitemap generator specifically designed for large sites (although it works equally well with small sites). It splits large Sitemaps into multiple files, gzips the files to minimize bandwidth usage, batches database queries so it doesn't take your site down, can be set up with just a few lines of code and is compatible with just about any framework.
+ BigSitemap is a Sitemap (http://sitemaps.org) generator specifically designed for large sites (although it works equally well with small sites). It splits large Sitemaps into multiple files, gzips the files to minimize bandwidth usage, batches database queries so it doesn't take your site down, can be set up with just a few lines of code and is compatible with just about any framework.

- ## INSTALL
+ == INSTALL

- * Via git: git clone git://github.com/alexrabarts/big_sitemap.git
- * Via gem: gem install alexrabarts-big_sitemap -s http://gems.github.com
+ Via git:

- ## SYNOPSIS
+ git clone git://github.com/alexrabarts/big_sitemap.git
+
+ Via gem:
+
+ gem install alexrabarts-big_sitemap -s http://gems.github.com
+
+ == SYNOPSIS

  The minimum required to generate a sitemap is:

- <pre>
  BigSitemap.new(:base_url => 'http://example.com').add(:model => MyModel, :path => 'my_controller').generate
- </pre>

- You can put this in a rake/thor task and create a cron job to run it periodically. It should be enough for most Rails/Merb applications. Note that the methods are chainable, although you can call them on an instance variable if you prefer:
+ You can put this in a rake/thor task and create a cron job to run it periodically. It should be enough for most Rails/Merb applications. You can add more models by further calls to the <code>add</code> method. Note that the methods are chainable, although you can call them on an instance variable if you prefer:

- <pre>
  sitemap = BigSitemap.new(:base_url => 'http://example.com')
- sitemap.add(:model => MyModel, :path => 'my_controller')
+ sitemap.add(:model => Posts, :path => 'articles')
+ sitemap.add(:model => Comments, :path => 'comments')
  sitemap.generate
- </pre>

- ### Find Methods
+ === Find Methods

- Your models must provide either a <code>find_for_sitemap</code> or <code>all</code> class method that returns the instances that are to be included in the sitemap. Additionally, you models must provide a <code>count_for_sitemap</code> or <code>count</code> class method that returns a count of the instances to be included. If you're using ActiveRecord (Rails) or DataMapper then <code>all</code> and <code>count</code> are already provided and you don't need to do anything unless you want to include a subset of records. If you provide your own <code>find_for_sitemap</code> or <code>all</code> method then it should be able to handle the <code>:offset</code> and <code>:limit</code> options, in the same way that ActiveRecord and DataMapper handle them. This is especially important if you have more than 50,000 URLs.
+ Your models must provide either a <code>find_for_sitemap</code> or <code>all</code> class method that returns the instances that are to be included in the sitemap.

- ### URL Format
+ Additionally, you models must provide a <code>count_for_sitemap</code> or <code>count</code> class method that returns a count of the instances to be included.
+
+ If you're using ActiveRecord (Rails) or DataMapper then <code>all</code> and <code>count</code> are already provided and you don't need to do anything unless you want to include a subset of records. If you provide your own <code>find_for_sitemap</code> or <code>all</code> method then it should be able to handle the <code>:offset</code> and <code>:limit</code> options, in the same way that ActiveRecord and DataMapper handle them. This is especially important if you have more than 50,000 URLs.
+
+ === URL Format

  To generate the URLs, BigSitemap will combine the constructor arguments with the <code>to_param</code> method of each instance returned (provided by ActiveRecord but not DataMapper). If this method is not present, <code>id</code> will be used. The URL is constructed as:

- <pre>
- ":base_url/:path/:to_param" # (if to_param exists)
- ":base_url/:path/:id" # (if to_param does not exist)
- </pre>
+ :base_url/:path/:to_param (if to_param exists)
+ :base_url/:path/:id (if to_param does not exist)

- ### Sitemap Location
+ === Sitemap Location

  BigSitemap knows about the document root of Rails and Merb. If you are using another framework then you can specify the document root with the <code>:document_root</code> option. e.g.:

- <pre>
  BigSitemap.new(:base_url => 'http://example.com', :document_root => "#{FOO_ROOT}/httpdocs")
- </pre>

  By default, the sitemap files are created under <code>/sitemaps</code>. You can modify this with the <code>:path</code> option:

- <pre>
  BigSitemap.new(:base_url => 'http://example.com', :path => 'google-sitemaps') # places Sitemaps under /google-sitemaps
- </pre>

- ### Cleaning the Sitemaps Directory
+ === Cleaning the Sitemaps Directory

  Calling the <code>clean</code> method will remove all files from the Sitemaps directory.

- ### Maximum Number of URLs
+ === Maximum Number of URLs

  Sitemaps will be split across several files if more than 50,000 records are returned. You can customize this limit with the <code>:max_per_sitemap</code> option:

- <pre>
  BigSitemap.new(:base_url => 'http://example.com', :max_per_sitemap => 1000) # Max of 1000 URLs per Sitemap
- </pre>

- ### Batched Database Queries
+ === Batched Database Queries

- The database is queried in batches to prevent large SQL select statements from locking the database for too long. By default, the batch size is 1001 (not 1000 due to an obscure bug in DataMapper that appears when an offset of 37000 is used). You can customize the batch size with the <code>:batch_size</code> option:
+ The database is queried in batches to prevent large SQL select statements from locking the database for too long. By default, the batch size is 1001 (not 1000 due to an obscure bug in DataMapper). You can customize the batch size with the <code>:batch_size</code> option:

- <pre>
- BigSitemap.new(:base_url => 'http://example.com, :batch_size => 5000) # Database is queried in batches of 5,000
- </pre>
+ BigSitemap.new(:base_url => 'http://example.com', :batch_size => 5000) # Database is queried in batches of 5,000

- ### Search Engine Notification
+ === Search Engine Notification

  Google, Yahoo!, MSN and Ask are pinged once the Sitemap files are generated. You can turn one or more of these off:

- <pre>
  BigSitemap.new(
  :base_url => 'http://example.com',
  :ping_google => false,
@@ -84,27 +79,26 @@ Google, Yahoo!, MSN and Ask are pinged once the Sitemap files are generated. Yo
  :ping_msn => false,
  :ping_ask => false
  )
- </pre>

  You must provide an App ID in order to ping Yahoo! (more info at http://developer.yahoo.com/search/siteexplorer/V1/updateNotification.html):

- <pre>
  BigSitemap.new(:base_url => 'http://example.com', :yahoo_app_id => 'myYahooAppId') # Yahoo! will now be pinged
- </pre>

- ## LIMITATIONS
+ == LIMITATIONS

  If your database is likely to shrink during the time it takes to create the sitemap then you might run into problems (the final, batched SQL select will overrun by setting a limit that is too large since it is calculated from the count, which is queried at the very beginning). Patches welcome!

- ## TODO
+ == TODO

- * Support for priority and changefreq (currently hard-coded to 'weekly')
+ * Support for <code>priority</code>
+ * Support for <code>changefreq</code> (currently hard-coded to <code>weekly</code>)

- ## CREDITS
+ == CREDITS

  Thanks to Alastair Brunton and Harry Love, who's work provided a starting point for this library.
  http://scoop.cheerfactory.co.uk/2008/02/26/google-sitemap-generator/

- ## COPYRIGHT
+ == COPYRIGHT
+
+ Copyright (c) 2009 Stateless Systems (http://statelesssystems.com). See LICENSE for details.

- Copyright (c) 2009 Stateless Systems (http://statelesssystems.com). See LICENSE for details.
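The Find Methods section of the README asks that a custom <code>find_for_sitemap</code> honor the <code>:offset</code> and <code>:limit</code> options and that a matching count method exist. A minimal sketch of a model that could satisfy that contract follows. The <code>Article</code> class, its in-memory <code>RECORDS</code> list and the <code>to_param</code> format are hypothetical and are not part of the gem:

```ruby
# Hypothetical in-memory model; the class name and data are illustrative only.
class Article
  attr_reader :id

  # Stand-in for rows that would normally live in a database.
  RECORDS = (1..25).to_a

  def initialize(id)
    @id = id
  end

  # Supplies the last URL segment; per the README, id is used when this is absent.
  def to_param
    "article-#{id}"
  end

  # Number of instances that should appear in the sitemap.
  def self.count_for_sitemap
    RECORDS.size
  end

  # Returns one batch of instances, honoring :offset and :limit.
  def self.find_for_sitemap(options = {})
    offset = options[:offset] || 0
    limit  = options[:limit] || RECORDS.size
    (RECORDS[offset, limit] || []).map { |id| new(id) }
  end
end

# The last batch is shorter than the limit because only 25 records exist.
last_batch = Article.find_for_sitemap(:offset => 20, :limit => 10)
```

With a finder of this shape, a batch that starts past the end of the data simply comes back empty rather than raising.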
data/VERSION.yml CHANGED
@@ -1,4 +1,4 @@
  ---
  :minor: 2
- :patch: 0
+ :patch: 1
  :major: 0
data/lib/big_sitemap.rb CHANGED
@@ -21,7 +21,7 @@ class BigSitemap
  @base_url = options.delete(:base_url)
  @max_per_sitemap = options.delete(:max_per_sitemap) || 50000
  @batch_size = options.delete(:batch_size) || 1001 # TODO: Set this to 1000 once DM offset 37000 bug is fixed
- @web_path = options.delete(:path) || 'sitemaps'
+ @web_path = strip_leading_slash(options.delete(:path) || 'sitemaps')
  @ping_google = options[:ping_google].nil? ? true : options.delete(:ping_google)
  @ping_yahoo = options[:ping_yahoo].nil? ? true : options.delete(:ping_yahoo)
  @yahoo_app_id = options.delete(:yahoo_app_id)
@@ -42,7 +42,7 @@ class BigSitemap

  def add(options)
    raise ArgumentError, ':model and :path options must be provided' unless options[:model] && options[:path]
-   @sources << options
+   @sources << options.update(:path => strip_leading_slash(options[:path]))
    self # Chainable
  end

@@ -127,6 +127,10 @@ class BigSitemap
  end

  private
+ def strip_leading_slash(str)
+   str.sub(/^\//, '')
+ end
+
  def pick_method(klass, candidates)
    method = nil
    candidates.each do |candidate|
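The 0.2.1 change to lib/big_sitemap.rb routes both the constructor's <code>:path</code> and each <code>add</code> call's <code>:path</code> through <code>strip_leading_slash</code>. A standalone sketch of the effect follows. The helper body is copied from the hunk above, while <code>build_url</code> is a hypothetical stand-in for the <code>:base_url/:path/:to_param</code> joining that the README describes, not the gem's actual URL code:

```ruby
# Copied from the 0.2.1 diff: removes a single leading slash, if present.
def strip_leading_slash(str)
  str.sub(/^\//, '')
end

# Hypothetical helper (not part of BigSitemap): joins the URL parts with '/'.
def build_url(base_url, path, param)
  [base_url, path, param].join('/')
end

# Without normalization, a leading slash yields a double slash in the URL.
raw = build_url('http://example.com', '/test_controller', 1)

# With normalization, both spellings of the path give the same URL.
normalized = build_url('http://example.com', strip_leading_slash('/test_controller'), 1)
unchanged  = strip_leading_slash('test_controller')
```

Note that <code>sub</code> replaces only the first match, so a path that starts with two slashes would still keep one of them.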
@@ -112,12 +112,21 @@ class BigSitemapTest < Test::Unit::TestCase
      assert_equal 1, num_elements(first_sitemaps_model_file, 'changefreq')
      assert_equal 1, num_elements(second_sitemaps_model_file, 'changefreq')
    end
+
+   should 'strip leading slashes from controller paths' do
+     create_sitemap
+     @sitemap.add(:model => TestModel, :path => '/test_controller').generate
+     assert(
+       !elements(single_sitemaps_model_file, 'loc').first.text.match(/\/\/test_controller\//),
+       'URL does not contain a double-slash before the controller path'
+     )
+   end
  end

  context 'add method' do
    should 'be chainable' do
      create_sitemap
-     assert_equal BigSitemap, @sitemap.add({:model => TestModel, :path => 'test_controller'}).class
+     assert_equal BigSitemap, @sitemap.add(:model => TestModel, :path => 'test_controller').class
    end
  end

@@ -198,8 +207,12 @@ class BigSitemapTest < Test::Unit::TestCase
      {'s' => 'http://www.sitemaps.org/schemas/sitemap/0.9'}
    end

-   def num_elements(filename, el)
+   def elements(filename, el)
      data = Nokogiri::XML.parse(Zlib::GzipReader.open(filename).read)
-     data.search("//s:#{el}", ns).size
+     data.search("//s:#{el}", ns)
+   end
+
+   def num_elements(filename, el)
+     elements(filename, el).size
    end
  end
@@ -8,6 +8,10 @@ class TestModel
    self.find_for_sitemap.size
  end

+ def num_items
+   10
+ end
+
  def find_for_sitemap(options={})
    instances = []
    num_times = options.delete(:limit) || self.num_items
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: alexrabarts-big_sitemap
  version: !ruby/object:Gem::Version
-   version: 0.2.0
+   version: 0.2.1
  platform: ruby
  authors:
  - Alex Rabarts
@@ -9,7 +9,7 @@ autorequire:
  bindir: bin
  cert_chain: []

- date: 2009-03-11 00:00:00 -07:00
+ date: 2009-03-12 00:00:00 -07:00
  default_executable:
  dependencies:
  - !ruby/object:Gem::Dependency
@@ -42,7 +42,7 @@ extra_rdoc_files: []

  files:
  - History.txt
- - README.markdown
+ - README.rdoc
  - VERSION.yml
  - lib/big_sitemap.rb
  - test/big_sitemap_test.rb