alexrabarts-big_sitemap 0.1.3 → 0.2.0

data/README.markdown CHANGED
@@ -1,17 +1,23 @@
1
1
  # BigSitemap
2
2
 
3
- ## DESCRIPTION:
3
+ ## DESCRIPTION
4
4
 
5
5
  BigSitemap is a Sitemap generator specifically designed for large sites (although it works equally well with small sites). It splits large Sitemaps into multiple files, gzips the files to minimize bandwidth usage, batches database queries so it doesn't take your site down, can be set up with just a few lines of code and is compatible with just about any framework.
6
6
 
7
- ## INSTALL:
7
+ ## INSTALL
8
8
 
9
9
  * Via git: git clone git://github.com/alexrabarts/big_sitemap.git
10
10
  * Via gem: gem install alexrabarts-big_sitemap -s http://gems.github.com
11
11
 
12
- ## SYNOPSIS:
12
+ ## SYNOPSIS
13
13
 
14
- The minimum required to generated a sitemap is:
14
+ The minimum required to generate a sitemap is:
15
+
16
+ <pre>
17
+ BigSitemap.new(:base_url => 'http://example.com').add(:model => MyModel, :path => 'my_controller').generate
18
+ </pre>
19
+
20
+ You can put this in a rake/thor task and create a cron job to run it periodically. It should be enough for most Rails/Merb applications. Note that the methods are chainable, although you can call them on an instance variable if you prefer:
15
21
 
16
22
  <pre>
17
23
  sitemap = BigSitemap.new(:base_url => 'http://example.com')
@@ -19,10 +25,12 @@ The minimum required to generated a sitemap is:
19
25
  sitemap.generate
20
26
  </pre>
21
27
 
22
- You can put this in a rake/thor task and create a cron job to run it periodically. It should be enough for most Rails/Merb applications.
28
+ ### Find Methods
23
29
 
24
30
  Your models must provide either a <code>find_for_sitemap</code> or <code>all</code> class method that returns the instances that are to be included in the sitemap. Additionally, your models must provide a <code>count_for_sitemap</code> or <code>count</code> class method that returns a count of the instances to be included. If you're using ActiveRecord (Rails) or DataMapper then <code>all</code> and <code>count</code> are already provided and you don't need to do anything unless you want to include a subset of records. If you provide your own <code>find_for_sitemap</code> or <code>all</code> method then it should be able to handle the <code>:offset</code> and <code>:limit</code> options, in the same way that ActiveRecord and DataMapper handle them. This is especially important if you have more than 50,000 URLs.
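
As a rough illustration of the find-method contract described above, here is a minimal sketch of a model that is not backed by an ORM. The `Article` class, its in-memory `RECORDS` data and the `sitemap_rows` helper are hypothetical, and the sketch assumes the options arrive as a single hash, as they do with ActiveRecord-style finders:

```ruby
# Hypothetical model backed by an in-memory array instead of an ORM.
class Article
  RECORDS = (1..25).map { |i| { :id => i, :published => i.odd? } }

  attr_reader :id

  def initialize(id)
    @id = id
  end

  # Only published rows should appear in the sitemap (a subset of all rows).
  def self.sitemap_rows
    RECORDS.select { |row| row[:published] }
  end

  # Honors :offset and :limit so the caller can page through the records.
  def self.find_for_sitemap(options = {})
    offset = options.fetch(:offset, 0)
    limit  = options.fetch(:limit, sitemap_rows.size)
    (sitemap_rows[offset, limit] || []).map { |row| new(row[:id]) }
  end

  # Must count exactly the rows that find_for_sitemap can return.
  def self.count_for_sitemap
    sitemap_rows.size
  end

  # Supplies the last URL segment for each instance.
  def to_param
    "article-#{id}"
  end
end
```

The point of the sketch is that the finder and the counter agree on the same subset; if they disagree, the batching arithmetic has nothing reliable to work from.
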
25
31
 
32
+ ### URL Format
33
+
26
34
  To generate the URLs, BigSitemap will combine the constructor arguments with the <code>to_param</code> method of each instance returned (provided by ActiveRecord but not DataMapper). If this method is not present, <code>id</code> will be used. The URL is constructed as:
27
35
 
28
36
  <pre>
@@ -30,6 +38,8 @@ To generate the URLs, BigSitemap will combine the constructor arguments with the
30
38
  ":base_url/:path/:id" # (if to_param does not exist)
31
39
  </pre>
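
The URL rule above can be sketched in isolation. This is a hedged illustration rather than the library's code: `sitemap_url`, `PlainRecord` and `SluggedRecord` are hypothetical names, and the lookup via `respond_to?` is an assumption about how the choice between <code>to_param</code> and <code>id</code> is made:

```ruby
# Hypothetical record types: one with to_param, one with only id.
PlainRecord = Struct.new(:id)
SluggedRecord = Struct.new(:id, :slug) do
  def to_param
    "#{id}-#{slug}"
  end
end

# Builds ":base_url/:path/:to_param", falling back to id when to_param is absent.
def sitemap_url(base_url, path, record)
  param_method = [:to_param, :id].find { |name| record.respond_to?(name) }
  raise ArgumentError, "#{record.class} must provide to_param or id" if param_method.nil?
  "#{base_url}/#{path}/#{record.send(param_method)}"
end

puts sitemap_url('http://example.com', 'posts', SluggedRecord.new(7, 'hello'))
# prints http://example.com/posts/7-hello
puts sitemap_url('http://example.com', 'posts', PlainRecord.new(7))
# prints http://example.com/posts/7
```
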
32
40
 
41
+ ### Sitemap Location
42
+
33
43
  BigSitemap knows about the document root of Rails and Merb. If you are using another framework then you can specify the document root with the <code>:document_root</code> option. e.g.:
34
44
 
35
45
  <pre>
@@ -42,18 +52,28 @@ By default, the sitemap files are created under <code>/sitemaps</code>. You can
42
52
  BigSitemap.new(:base_url => 'http://example.com', :path => 'google-sitemaps') # places Sitemaps under /google-sitemaps
43
53
  </pre>
44
54
 
55
+ ### Cleaning the Sitemaps Directory
56
+
57
+ Calling the <code>clean</code> method will remove all files from the Sitemaps directory.
58
+
59
+ ### Maximum Number of URLs
60
+
45
61
  Sitemaps will be split across several files if more than 50,000 records are returned. You can customize this limit with the <code>:max_per_sitemap</code> option:
46
62
 
47
63
  <pre>
48
64
  BigSitemap.new(:base_url => 'http://example.com', :max_per_sitemap => 1000) # Max of 1000 URLs per Sitemap
49
65
  </pre>
50
66
 
51
- The database is queries in batches to prevent large SQL select statements from locking the database for too long. By default, the batch size is 1001 (not 1000 due to an obscure bug in DataMapper that appears when an offset of 37000 is used). You can customize the batch size with the <code>:batch_size</code> option:
67
+ ### Batched Database Queries
68
+
69
+ The database is queried in batches to prevent large SQL select statements from locking the database for too long. By default, the batch size is 1001 (not 1000 due to an obscure bug in DataMapper that appears when an offset of 37000 is used). You can customize the batch size with the <code>:batch_size</code> option:
52
70
 
53
71
  <pre>
54
72
  BigSitemap.new(:base_url => 'http://example.com', :batch_size => 5000) # Database is queried in batches of 5,000
55
73
  </pre>
56
74
 
75
+ ### Search Engine Notification
76
+
57
77
  Google, Yahoo!, MSN and Ask are pinged once the Sitemap files are generated. You can turn one or more of these off:
58
78
 
59
79
  <pre>
@@ -72,7 +92,7 @@ You must provide an App ID in order to ping Yahoo! (more info at http://develope
72
92
  BigSitemap.new(:base_url => 'http://example.com', :yahoo_app_id => 'myYahooAppId') # Yahoo! will now be pinged
73
93
  </pre>
74
94
 
75
- ## LIMITATIONS:
95
+ ## LIMITATIONS
76
96
 
77
97
  If your database is likely to shrink during the time it takes to create the sitemap then you might run into problems (the final, batched SQL select will overrun by setting a limit that is too large since it is calculated from the count, which is queried at the very beginning). Patches welcome!
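
To make the limitation concrete, here is a small sketch of how offset/limit windows might be derived from a count that is taken once, up front. `batch_windows` is a hypothetical helper and is not BigSitemap's actual loop; it only illustrates the arithmetic the paragraph above describes:

```ruby
# Derives the offset/limit pairs for a fixed count taken before any query runs.
def batch_windows(count, batch_size)
  windows = []
  offset = 0
  while offset < count
    # The final window is sized from the original count, not from live data.
    limit = [batch_size, count - offset].min
    windows << { :offset => offset, :limit => limit }
    offset += limit
  end
  windows
end

batch_windows(2500, 1001).each { |w| puts w.inspect }
```

With a count of 2,500 and the default batch size of 1,001, the last window starts at offset 2,002 and asks for 498 rows. If rows are deleted after the count is taken, that window reaches past the data that is actually left, which is the overrun mentioned above.
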
78
98
 
data/VERSION.yml CHANGED
@@ -1,4 +1,4 @@
1
1
  ---
2
- :minor: 1
3
- :patch: 3
2
+ :minor: 2
3
+ :patch: 0
4
4
  :major: 0
data/lib/big_sitemap.rb CHANGED
@@ -6,7 +6,7 @@ require 'extlib'
6
6
 
7
7
  class BigSitemap
8
8
  def initialize(options)
9
- document_root = options.delete(:document_root)
9
+ document_root = options.delete(:document_root)
10
10
 
11
11
  if document_root.nil?
12
12
  if defined? RAILS_ROOT
@@ -37,20 +37,26 @@ class BigSitemap
37
37
  'Batch size (:batch_size) must be less than or equal to maximum URLs per sitemap (:max_per_sitemap)'
38
38
  ) if @batch_size > @max_per_sitemap
39
39
 
40
- unless File.exists? @file_path
41
- Dir.mkdir(@file_path)
42
- end
40
+ Dir.mkdir(@file_path) unless File.exists? @file_path
43
41
  end
44
42
 
45
43
  def add(options)
46
44
  raise ArgumentError, ':model and :path options must be provided' unless options[:model] && options[:path]
47
45
  @sources << options
46
+ self # Chainable
48
47
  end
49
48
 
50
- def generate
51
- paths = []
52
- sitemaps = []
49
+ def clean
50
+ unless @file_path.nil?
51
+ Dir.foreach(@file_path) do |f|
52
+ f = "#{@file_path}/#{f}"
53
+ File.delete(f) if File.file?(f)
54
+ end
55
+ end
56
+ self # Chainable
57
+ end
53
58
 
59
+ def generate
54
60
  @sources.each do |source|
55
61
  klass = source[:model]
56
62
 
@@ -101,11 +107,9 @@ class BigSitemap
101
107
  param_method = pick_method(r, [:to_param, :id])
102
108
  raise ArgumentError, "#{klass} must provide a to_param instance method" if param_method.nil?
103
109
 
104
- path = {:url => "#{source[:path]}/#{r.send(param_method)}", :last_mod => last_mod}
105
-
106
110
  xml.url do
107
- xml.loc("#{@base_url}/#{path[:url]}")
108
- xml.lastmod(path[:last_mod].strftime('%Y-%m-%d')) unless path[:last_mod].nil?
111
+ xml.loc("#{@base_url}/#{source[:path]}/#{r.send(param_method)}")
112
+ xml.lastmod(last_mod.strftime('%Y-%m-%d')) unless last_mod.nil?
109
113
  xml.changefreq('weekly')
110
114
  end
111
115
  end
@@ -119,6 +123,7 @@ class BigSitemap
119
123
 
120
124
  generate_sitemap_index
121
125
  ping_search_engines
126
+ self # Chainable
122
127
  end
123
128
 
124
129
  private
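
The new <code>clean</code> method in this diff deletes only regular files that sit directly in the sitemaps directory. A standalone sketch of the same approach, run against a temporary directory, shows that behavior; `clean_directory` and the file names used here are hypothetical stand-ins and are not part of the gem:

```ruby
require 'tmpdir'
require 'fileutils'

# Deletes regular files directly inside dir; subdirectories are left alone.
def clean_directory(dir)
  return if dir.nil?
  Dir.foreach(dir) do |entry|
    full_path = File.join(dir, entry)
    File.delete(full_path) if File.file?(full_path)
  end
end

Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, 'sitemap_index.xml.gz'))
  FileUtils.mkdir(File.join(dir, 'nested'))
  clean_directory(dir)
  puts((Dir.entries(dir) - ['.', '..']).inspect)
end
```

Because `File.file?` is false for directories (including `.` and `..`), nested directories survive the clean, so "all files" in the README should be read as "all top-level files".
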
data/test/big_sitemap_test.rb CHANGED
@@ -35,6 +35,13 @@ class BigSitemapTest < Test::Unit::TestCase
35
35
  assert !File.exists?(third_sitemaps_model_file), "#{third_sitemaps_model_file} does not exist"
36
36
  end
37
37
 
38
+ should 'clean all sitemap files' do
39
+ generate_sitemap_files
40
+ assert Dir.entries(sitemaps_dir).size > 2, "#{sitemaps_dir} is empty" # ['.', '..'].size == 2
41
+ @sitemap.clean
42
+ assert_equal 2, Dir.entries(sitemaps_dir).size, "#{sitemaps_dir} is not empty"
43
+ end
44
+
38
45
  context 'Sitemap index file' do
39
46
  should 'contain one sitemapindex element' do
40
47
  generate_sitemap_files
@@ -107,6 +114,27 @@ class BigSitemapTest < Test::Unit::TestCase
107
114
  end
108
115
  end
109
116
 
117
+ context 'add method' do
118
+ should 'be chainable' do
119
+ create_sitemap
120
+ assert_equal BigSitemap, @sitemap.add({:model => TestModel, :path => 'test_controller'}).class
121
+ end
122
+ end
123
+
124
+ context 'clean method' do
125
+ should 'be chainable' do
126
+ create_sitemap
127
+ assert_equal BigSitemap, @sitemap.clean.class
128
+ end
129
+ end
130
+
131
+ context 'generate method' do
132
+ should 'be chainable' do
133
+ create_sitemap
134
+ assert_equal BigSitemap, @sitemap.generate.class
135
+ end
136
+ end
137
+
110
138
  private
111
139
  def delete_tmp_files
112
140
  FileUtils.rm_rf(sitemaps_dir)
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: alexrabarts-big_sitemap
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.3
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Alex Rabarts
@@ -9,7 +9,7 @@ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
11
 
12
- date: 2009-03-10 00:00:00 -07:00
12
+ date: 2009-03-11 00:00:00 -07:00
13
13
  default_executable:
14
14
  dependencies:
15
15
  - !ruby/object:Gem::Dependency
@@ -41,12 +41,13 @@ extensions: []
41
41
  extra_rdoc_files: []
42
42
 
43
43
  files:
44
- - VERSION.yml
44
+ - History.txt
45
45
  - README.markdown
46
+ - VERSION.yml
46
47
  - lib/big_sitemap.rb
48
+ - test/big_sitemap_test.rb
47
49
  - test/fixtures
48
50
  - test/fixtures/test_model.rb
49
- - test/big_sitemap_test.rb
50
51
  - test/test_helper.rb
51
52
  has_rdoc: true
52
53
  homepage: http://github.com/alexrabarts/big_sitemap