pj_nitin-big_sitemap 0.3.1

data/History.txt ADDED
@@ -0,0 +1,28 @@
+ === 0.3.0 / 2009-04-06
+
+ * API change: Pass model through as first argument to add method, e.g. sitemap.add(Posts, {:path => 'articles'})
+ * API change: Use Rails' polymorphic_url helper to generate URLs if Rails is being used
+ * API change: Only ping search engines when ping_search_engines is explicitly called
+ * Add support for passing options through to the model's find method, e.g. :conditions
+ * Allow base URL to be specified as a hash as well as a string
+ * Add support for changefreq and priority
+ * Pluralize sitemap model filenames
+ * GZipping may optionally be turned off
+
+ === 0.2.1 / 2009-03-12
+
+ * Normalize path arguments so it no longer matters whether a leading slash is used or not
+
+ === 0.2.0 / 2009-03-11
+
+ * Methods are now chainable
+
+ === 0.1.4 / 2009-03-11
+
+ * Add clean method to clear out Sitemaps directory
+ * Make methods chainable
+
+ === 0.1.3 / 2009-03-10
+
+ * Initial release
+
data/LICENSE ADDED
@@ -0,0 +1,22 @@
+ (The MIT License)
+
+ Copyright (c) 2009 Stateless Systems (http://statelesssystems.com)
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ 'Software'), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc ADDED
@@ -0,0 +1,110 @@
+ = BigSitemap
+
+ BigSitemap is a Sitemap (http://sitemaps.org) generator suitable for applications with more than 50,000 URLs. It splits large Sitemaps into multiple files, gzips the files to minimize bandwidth usage, batches database queries to minimize memory usage, can be set up with just a few lines of code, and is compatible with just about any framework.
+
+ BigSitemap is best run periodically through a Rake/Thor task.
+
+   sitemap = BigSitemap.new(:url_options => {:host => 'example.com'})
+
+   # Add a model
+   sitemap.add Product
+
+   # Add another model with some options
+   sitemap.add(Post, {
+     :conditions => {:published => true},
+     :path => 'articles',
+     :change_frequency => 'daily',
+     :priority => 0.5
+   })
+
+   # Generate the files
+   sitemap.generate
+
+ The code above will create a minimum of three files:
+
+ 1. public/sitemaps/sitemap_index.xml.gz
+ 2. public/sitemaps/sitemap_products.xml.gz
+ 3. public/sitemaps/sitemap_posts.xml.gz
+
+ If your sitemaps grow beyond 50,000 URLs (this limit can be overridden with the <code>:max_per_sitemap</code> option), the sitemap files will be partitioned into multiple files (<code>sitemap_products_1.xml.gz</code>, <code>sitemap_products_2.xml.gz</code>, ...).
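As a rough illustration of the partitioning rule above, the number of files needed for one model can be estimated with a ceiling division. This is only a sketch: `sitemap_file_count` is a hypothetical helper, not part of BigSitemap, and its default of 50,000 simply mirrors the limit quoted above.

```ruby
# Hypothetical helper (not part of BigSitemap): estimates how many sitemap
# files one model needs for a given number of records.
def sitemap_file_count(record_count, max_per_sitemap = 50_000)
  raise ArgumentError, 'max_per_sitemap must be positive' unless max_per_sitemap > 0

  # Integer ceiling division; every model gets at least one file.
  [(record_count + max_per_sitemap - 1) / max_per_sitemap, 1].max
end

puts sitemap_file_count(120_000)
puts sitemap_file_count(120_000, 10_000)
```

Lowering the second argument models the effect of passing a smaller `:max_per_sitemap`.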
+
+ If you're using Rails then the URLs for each database record are generated with the <code>polymorphic_url</code> helper. That means that the URL for a record will be exactly what you would expect: generated with respect to the routing setup of your app. In other contexts where this helper isn't available, the URLs are generated in the form:
+
+   :base_url/:path/:to_param
+
+ If the <code>to_param</code> method does not exist, then <code>id</code> will be used.
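The non-Rails URL form can be sketched in plain Ruby as follows. The record types and the `record_url` helper are assumptions made for illustration; they are not the gem's API, only a small model of the `:base_url/:path/:to_param` rule and its fallback to `id`.

```ruby
# Stand-in records for illustration only; neither type is part of BigSitemap.
PlainRecord = Struct.new(:id)
SluggedRecord = Struct.new(:id, :slug) do
  def to_param
    slug
  end
end

# Hypothetical helper: builds :base_url/:path/:to_param, falling back to id
# when the record does not respond to to_param.
def record_url(base_url, path, record)
  param_method = [:to_param, :id].find { |name| record.respond_to?(name) }
  raise ArgumentError, 'record needs to_param or id' if param_method.nil?

  base = base_url.sub(%r{/+\z}, '') # drop trailing slashes
  segment = path.sub(%r{\A/+}, '')  # a leading slash on the path is optional
  "#{base}/#{segment}/#{record.public_send(param_method)}"
end

puts record_url('http://example.com', 'articles', SluggedRecord.new(7, 'hello-world'))
puts record_url('http://example.com/', '/articles', PlainRecord.new(42))
```

Note that this sketch assumes plain Ruby objects; with ActiveSupport loaded, most objects already respond to `to_param`.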
+
+ == Install
+
+ Via gem:
+
+   gem install alexrabarts-big_sitemap -s http://gems.github.com
+
+ == Advanced
+
+ === Options
+
+ * <code>:url_options</code> -- hash with <code>:host</code>, optionally <code>:port</code> and <code>:protocol</code>
+ * <code>:base_url</code> -- string alternative to <code>:url_options</code>, e.g. "https://example.com:8080/"
+ * <code>:document_root</code> -- string defaults to <code>Rails.root</code> or <code>Merb.root</code> if available
+ * <code>:path</code> -- string defaults to 'sitemaps', which places sitemap files under the <code>/sitemaps</code> directory
+ * <code>:max_per_sitemap</code> -- <code>50000</code>, which is the limit dictated by Google but can be less
+ * <code>:batch_size</code> -- <code>1001</code> (not <code>1000</code> due to a bug in DataMapper)
+ * <code>:gzip</code> -- <code>true</code>
+ * <code>:ping_google</code> -- <code>true</code>
+ * <code>:ping_yahoo</code> -- <code>false</code>, needs <code>:yahoo_app_id</code>
+ * <code>:ping_msn</code> -- <code>false</code>
+ * <code>:ping_ask</code> -- <code>false</code>
+
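To show how a <code>:base_url</code> string relates to the <code>:url_options</code> hash, here is a minimal sketch using Ruby's standard `URI` library. The helper name `url_options_from` is hypothetical; the mapping of host, port and scheme mirrors what the library's initializer does with `URI.parse`.

```ruby
require 'uri'

# Hypothetical helper: converts a base URL string into the equivalent
# :url_options hash (host, port, protocol).
def url_options_from(base_url)
  uri = URI.parse(base_url)
  { :host => uri.host, :port => uri.port, :protocol => uri.scheme }
end

p url_options_from('https://example.com:8080/')
p url_options_from('http://example.com')
```

When the string omits a port, `URI` reports the scheme's default port (80 for http).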
+ === Chaining
+
+ You can chain methods together. You could even get away with as little code as:
+
+   BigSitemap.new(:url_options => {:host => 'example.com'}).add(Post).generate
+
+ === Pinging Search Engines
+
+ To ping search engines, call <code>ping_search_engines</code> after you generate the sitemap:
+
+   sitemap.generate
+   sitemap.ping_search_engines
+
+ === Change Frequency and Priority
+
+ You can control "changefreq" and "priority" values for each record individually by passing lambdas instead of fixed values:
+
+   sitemap.add(Post,
+     :change_frequency => lambda {|post| ... },
+     :priority => lambda {|post| ... }
+   )
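One possible shape for such lambdas is sketched below. Everything here is an assumption for illustration: the `Article` struct, the one-week threshold, and the `resolve_option` helper, which only models the "fixed value or lambda" behaviour described above.

```ruby
# Hypothetical record type for illustration.
Article = Struct.new(:updated_at, :featured)

WEEK = 7 * 24 * 60 * 60 # seconds

# Assumption: recently edited articles are expected to change more often.
CHANGE_FREQUENCY = lambda { |article| (Time.now - article.updated_at) < WEEK ? 'daily' : 'monthly' }
# Assumption: featured articles deserve a higher priority.
PRIORITY = lambda { |article| article.featured ? 0.9 : 0.5 }

# Accepts either a fixed value or a lambda and returns the value for a record.
def resolve_option(option, record)
  option.is_a?(Proc) ? option.call(record) : option
end

fresh = Article.new(Time.now, true)
puts resolve_option(CHANGE_FREQUENCY, fresh)
puts resolve_option(PRIORITY, fresh)
puts resolve_option('weekly', fresh)
```

Lambdas like these would be passed as the `:change_frequency` and `:priority` options in place of fixed values.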
+
+ === Find Methods
+
+ Your models must provide either a <code>find_for_sitemap</code> or <code>all</code> class method that returns the instances that are to be included in the sitemap.
+
+ Additionally, your models must provide a <code>count_for_sitemap</code> or <code>count</code> class method that returns a count of the instances to be included.
+
+ If you're using ActiveRecord (Rails) or DataMapper then <code>all</code> and <code>count</code> are already provided and you don't need to do anything unless you want to include a subset of records. If you provide your own <code>find_for_sitemap</code> or <code>all</code> method then it should be able to handle the <code>:offset</code> and <code>:limit</code> options, in the same way that ActiveRecord and DataMapper handle them. This is especially important if you have more than 50,000 URLs.
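For a model that is not backed by ActiveRecord or DataMapper, the two class methods might look like the sketch below. The `Page` class and its in-memory `SLUGS` list are hypothetical stand-ins for a real data store; only the method names and the `:offset` / `:limit` options come from the text above.

```ruby
# Hypothetical plain-Ruby model; SLUGS stands in for a real data store.
class Page
  SLUGS = (1..25).map { |i| "page-#{i}" }

  attr_reader :slug

  def initialize(slug)
    @slug = slug
  end

  def to_param
    slug
  end

  # Number of records that belong in the sitemap.
  def self.count_for_sitemap
    SLUGS.size
  end

  # Returns one batch of records, honouring :offset and :limit.
  def self.find_for_sitemap(options = {})
    offset = options.fetch(:offset, 0)
    limit = options.fetch(:limit, SLUGS.size)
    (SLUGS[offset, limit] || []).map { |slug| new(slug) }
  end
end

batch = Page.find_for_sitemap(:offset => 20, :limit => 10)
puts batch.map(&:to_param).join(', ')
```

Returning an empty array for an out-of-range offset keeps a batched caller from failing on its final window.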
+
+ === Cleaning the Sitemaps Directory
+
+ Calling the <code>clean</code> method will remove all generated sitemap files (<code>sitemap_*.xml</code> and <code>sitemap_*.xml.gz</code>) from the sitemaps directory.
+
+ == Limitations
+
+ If your database is likely to shrink during the time it takes to create the sitemap, you might run into problems: the record count is queried once at the very beginning, so the final batched SQL select can be issued with a limit that is larger than the number of rows remaining. Patches welcome!
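The limitation follows from computing every batch window up front from a single count. The sketch below illustrates that idea under stated assumptions: `batch_windows` is a hypothetical helper and deliberately simplified, so it is not the gem's exact arithmetic.

```ruby
# Hypothetical helper: derives [offset, limit] pairs from a count that was
# read once, before any batch is fetched.
def batch_windows(count, batch_size)
  (0...count).step(batch_size).map do |offset|
    [offset, [batch_size, count - offset].min]
  end
end

windows = batch_windows(2500, 1001)
p windows

# If rows are deleted after the count was taken, the last window can reach
# past the end of the table.
rows_remaining = 2400
last_offset, last_limit = windows.last
puts last_offset + last_limit > rows_remaining
```

Re-reading the count before each batch, or tolerating short batches, would be two ways to address this.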
+
+ == TODO
+
+ Tests for Rails components.
+
+ == Credits
+
+ Thanks to Alastair Brunton and Harry Love, whose work provided a starting point for this library.
+ http://scoop.cheerfactory.co.uk/2008/02/26/google-sitemap-generator/
+
+ Thanks to Mislav Marohnić for contributing patches.
+
+ == Copyright
+
+ Copyright (c) 2009 Stateless Systems (http://statelesssystems.com). See LICENSE for details.
data/VERSION.yml ADDED
@@ -0,0 +1,4 @@
+ ---
+ :patch: 0
+ :major: 0
+ :minor: 3
data/lib/big_sitemap.rb ADDED
@@ -0,0 +1,229 @@
+ require 'uri'
+ require 'fileutils' # FileUtils.rm is used by #clean
+ require 'big_sitemap/builder'
+ require 'activesupport'
+
+ class BigSitemap
+   DEFAULTS = {
+     :max_per_sitemap => Builder::MAX_URLS,
+     :batch_size => 1001,
+     :path => 'sitemaps',
+     :gzip => true,
+
+     # opinionated
+     :ping_google => true,
+     :ping_yahoo => false, # needs :yahoo_app_id
+     :ping_msn => false,
+     :ping_ask => false
+   }
+
+   COUNT_METHODS = [:count_for_sitemap, :count]
+   FIND_METHODS = [:find_for_sitemap, :all]
+   TIMESTAMP_METHODS = [:updated_at, :updated_on, :updated, :created_at, :created_on, :created]
+   PARAM_METHODS = [:to_param, :id]
+
+   include ActionController::UrlWriter if defined? Rails
+
+   def initialize(options)
+     @options = DEFAULTS.merge options
+
+     # Use Rails' default_url_options if available
+     @default_url_options = defined?(Rails) ? default_url_options : {}
+
+     if @options[:max_per_sitemap] <= 1
+       raise ArgumentError, '":max_per_sitemap" must be greater than 1'
+     end
+
+     if @options[:url_options]
+       @default_url_options.update @options[:url_options]
+     elsif @options[:base_url]
+       uri = URI.parse(@options[:base_url])
+       @default_url_options[:host] = uri.host
+       @default_url_options[:port] = uri.port
+       @default_url_options[:protocol] = uri.scheme
+     else
+       raise ArgumentError, 'you must specify either ":url_options" hash or ":base_url" string'
+     end
+
+     if @options[:batch_size] > @options[:max_per_sitemap]
+       raise ArgumentError, '":batch_size" must be less than or equal to ":max_per_sitemap"'
+     end
+
+     @options[:document_root] ||= begin
+       if defined? Rails
+         "#{Rails.root}/public"
+       elsif defined? Merb
+         "#{Merb.root}/public"
+       end
+     end
+
+     unless @options[:document_root]
+       raise ArgumentError, 'Document root must be specified with the ":document_root" option'
+     end
+
+     @file_path = "#{@options[:document_root]}/#{strip_leading_slash(@options[:path])}"
+     Dir.mkdir(@file_path) unless File.exist? @file_path
+
+     @sources = []
+     @sitemap_files = []
+   end
+
+   def add(model, options={})
+     options[:path] ||= ActiveSupport::Inflector.tableize(model.to_s)
+     @sources << [model, options.dup]
+     return self
+   end
+
+   def clean
+     Dir["#{@file_path}/sitemap_*.{xml,xml.gz}"].each do |file|
+       FileUtils.rm file
+     end
+     return self
+   end
+
+   def generate
+     for model, options in @sources
+       with_sitemap(ActiveSupport::Inflector.tableize(model.to_s)) do |sitemap|
+         count_method = pick_method(model, COUNT_METHODS)
+         find_method = pick_method(model, FIND_METHODS)
+         raise ArgumentError, "#{model} must provide a count_for_sitemap class method" if count_method.nil?
+         raise ArgumentError, "#{model} must provide a find_for_sitemap class method" if find_method.nil?
+
+         count = model.send(count_method)
+         num_sitemaps = 1
+         num_batches = 1
+
+         if count > @options[:batch_size]
+           num_batches = (count.to_f / @options[:batch_size].to_f).ceil
+           num_sitemaps = (count.to_f / @options[:max_per_sitemap].to_f).ceil
+         end
+         batches_per_sitemap = num_batches.to_f / num_sitemaps.to_f
+
+         find_options = options.dup
+
+         for sitemap_num in 1..num_sitemaps
+           # Work out the start and end batch numbers for this sitemap
+           batch_num_start = sitemap_num == 1 ? 1 : ((sitemap_num * batches_per_sitemap).ceil - batches_per_sitemap + 1).to_i
+           batch_num_end = (batch_num_start + [batches_per_sitemap, num_batches].min).floor - 1
+
+           for batch_num in batch_num_start..batch_num_end
+             offset = ((batch_num - 1) * @options[:batch_size])
+             limit = (count - offset) < @options[:batch_size] ? (count - offset - 1) : @options[:batch_size]
+             find_options.update(:limit => limit, :offset => offset) if num_batches > 1
+
+             model.send(find_method, find_options).each do |record|
+               last_mod_method = pick_method(record, TIMESTAMP_METHODS)
+               last_mod = last_mod_method.nil? ? Time.now : record.send(last_mod_method)
+
+               param_method = pick_method(record, PARAM_METHODS)
+
+               location = defined?(Rails) ?
+                 polymorphic_url(record) :
+                 "#{root_url}/#{strip_leading_slash(options[:path])}/#{record.send(param_method)}"
+
+               change_frequency = options[:change_frequency] || 'weekly'
+               freq = change_frequency.is_a?(Proc) ? change_frequency.call(record) : change_frequency
+
+               priority = options[:priority]
+               pri = priority.is_a?(Proc) ? priority.call(record) : priority
+
+               sitemap.add_url!(location, last_mod, freq, pri)
+             end
+           end
+         end
+       end
+     end
+
+     generate_sitemap_index
+
+     return self
+   end
+
+   def ping_search_engines
+     require 'net/http'
+     require 'cgi'
+
+     sitemap_uri = CGI::escape(url_for_sitemap(@sitemap_files.last))
+
+     if @options[:ping_google]
+       Net::HTTP.get('www.google.com', "/webmasters/tools/ping?sitemap=#{sitemap_uri}")
+     end
+
+     if @options[:ping_yahoo]
+       if @options[:yahoo_app_id]
+         Net::HTTP.get(
+           'search.yahooapis.com', "/SiteExplorerService/V1/updateNotification?" +
+           "appid=#{@options[:yahoo_app_id]}&url=#{sitemap_uri}"
+         )
+       else
+         $stderr.puts 'unable to ping Yahoo: no ":yahoo_app_id" provided'
+       end
+     end
+
+     if @options[:ping_msn]
+       Net::HTTP.get('webmaster.live.com', "/ping.aspx?siteMap=#{sitemap_uri}")
+     end
+
+     if @options[:ping_ask]
+       Net::HTTP.get('submissions.ask.com', "/ping?sitemap=#{sitemap_uri}")
+     end
+   end
+
+   def root_url
+     @root_url ||= begin
+       url = ''
+       url << (@default_url_options[:protocol] || 'http')
+       url << '://' unless url.match('://')
+       url << @default_url_options[:host]
+       # Assign the port before interpolating it, and return the string itself:
+       # ending the block on a modifier-if would yield nil when no port is appended
+       port = @default_url_options[:port]
+       url << ":#{port}" if port && port != 80
+       url
+     end
+   end
+
+   private
+
+   def with_sitemap(name, options={})
+     options[:index] = name == 'index'
+     options[:filename] = "#{@file_path}/sitemap_#{name}"
+     options[:max_urls] = @options[:max_per_sitemap]
+
+     unless options[:gzip] = @options[:gzip]
+       options[:indent] = 2
+     end
+
+     sitemap = Builder.new(options)
+
+     begin
+       yield sitemap
+     ensure
+       sitemap.close!
+       @sitemap_files.concat sitemap.paths!
+     end
+   end
+
+   def strip_leading_slash(str)
+     str.sub(/^\//, '')
+   end
+
+   def pick_method(model, candidates)
+     method = nil
+     candidates.each do |candidate|
+       if model.respond_to? candidate
+         method = candidate
+         break
+       end
+     end
+     method
+   end
+
+   def url_for_sitemap(path)
+     "#{root_url}/#{File.basename(path)}"
+   end
+
+   # Create a sitemap index document
+   def generate_sitemap_index
+     with_sitemap 'index' do |sitemap|
+       for path in @sitemap_files
+         sitemap.add_url!(url_for_sitemap(path), File.stat(path).mtime)
+       end
+     end
+   end
+ end
data/test/big_sitemap_test.rb ADDED
@@ -0,0 +1,289 @@
+ require File.dirname(__FILE__) + '/test_helper'
+ require 'fileutils' # FileUtils.rm_rf is used in delete_tmp_files
+ require 'zlib' # Zlib::GzipReader is used to read generated files
+ require 'nokogiri'
+
+ class BigSitemapTest < Test::Unit::TestCase
+   def setup
+     delete_tmp_files
+   end
+
+   def teardown
+     delete_tmp_files
+   end
+
+   should 'raise an error if the :base_url option is not specified' do
+     assert_nothing_raised { BigSitemap.new(:base_url => 'http://example.com', :document_root => tmp_dir) }
+     assert_raise(ArgumentError) { BigSitemap.new(:document_root => tmp_dir) }
+   end
+
+   should 'generate the same base URL' do
+     options = {:document_root => tmp_dir}
+     assert_equal(
+       BigSitemap.new(options.merge(:base_url => 'http://example.com')).root_url,
+       BigSitemap.new(options.merge(:url_options => {:host => 'example.com'})).root_url
+     )
+   end
+
+   should 'generate a sitemap index file' do
+     generate_sitemap_files
+     assert File.exist?(sitemaps_index_file)
+   end
+
+   should 'generate a single sitemap model file' do
+     create_sitemap
+     add_model
+     @sitemap.generate
+     assert File.exist?(first_sitemaps_model_file), "#{first_sitemaps_model_file} exists"
+   end
+
+   should 'generate two sitemap model files' do
+     generate_two_model_sitemap_files
+     assert File.exist?(first_sitemaps_model_file), "#{first_sitemaps_model_file} exists"
+     assert File.exist?(second_sitemaps_model_file), "#{second_sitemaps_model_file} exists"
+     assert !File.exist?(third_sitemaps_model_file), "#{third_sitemaps_model_file} does not exist"
+   end
+
+   context 'Sitemap index file' do
+     should 'contain one sitemapindex element' do
+       generate_sitemap_files
+       assert_equal 1, num_elements(sitemaps_index_file, 'sitemapindex')
+     end
+
+     should 'contain one sitemap element' do
+       generate_sitemap_files
+       assert_equal 1, num_elements(sitemaps_index_file, 'sitemap')
+     end
+
+     should 'contain one loc element' do
+       generate_one_sitemap_model_file
+       assert_equal 1, num_elements(sitemaps_index_file, 'loc')
+     end
+
+     should 'contain one lastmod element' do
+       generate_one_sitemap_model_file
+       assert_equal 1, num_elements(sitemaps_index_file, 'lastmod')
+     end
+
+     should 'contain two loc elements' do
+       generate_two_model_sitemap_files
+       assert_equal 2, num_elements(sitemaps_index_file, 'loc')
+     end
+
+     should 'contain two lastmod elements' do
+       generate_two_model_sitemap_files
+       assert_equal 2, num_elements(sitemaps_index_file, 'lastmod')
+     end
+
+     should 'not be gzipped' do
+       generate_sitemap_files(:gzip => false)
+       assert File.exist?(unzipped_sitemaps_index_file)
+     end
+   end
+
+   context 'Sitemap model file' do
+     should 'contain one urlset element' do
+       generate_one_sitemap_model_file
+       assert_equal 1, num_elements(first_sitemaps_model_file, 'urlset')
+     end
+
+     should 'contain several loc elements' do
+       generate_one_sitemap_model_file
+       assert_equal default_num_items, num_elements(first_sitemaps_model_file, 'loc')
+     end
+
+     should 'contain several lastmod elements' do
+       generate_one_sitemap_model_file
+       assert_equal default_num_items, num_elements(first_sitemaps_model_file, 'lastmod')
+     end
+
+     should 'contain several changefreq elements' do
+       generate_one_sitemap_model_file
+       assert_equal default_num_items, num_elements(first_sitemaps_model_file, 'changefreq')
+     end
+
+     should 'contain several priority elements' do
+       generate_one_sitemap_model_file(:priority => 0.2)
+       assert_equal default_num_items, num_elements(first_sitemaps_model_file, 'priority')
+     end
+
+     should 'have a change frequency of weekly by default' do
+       generate_one_sitemap_model_file
+       assert_equal 'weekly', elements(first_sitemaps_model_file, 'changefreq').first.text
+     end
+
+     should 'have a change frequency of daily' do
+       generate_one_sitemap_model_file(:change_frequency => 'daily')
+       assert_equal 'daily', elements(first_sitemaps_model_file, 'changefreq').first.text
+     end
+
+     should 'be able to use a lambda to specify change frequency' do
+       generate_one_sitemap_model_file(:change_frequency => lambda {|m| m.change_frequency})
+       assert_equal TestModel.new.change_frequency, elements(first_sitemaps_model_file, 'changefreq').first.text
+     end
+
+     should 'have a priority of 0.2' do
+       generate_one_sitemap_model_file(:priority => 0.2)
+       assert_equal '0.2', elements(first_sitemaps_model_file, 'priority').first.text
+     end
+
+     should 'be able to use a lambda to specify priority' do
+       generate_one_sitemap_model_file(:priority => lambda {|m| m.priority})
+       assert_equal TestModel.new.priority.to_s, elements(first_sitemaps_model_file, 'priority').first.text
+     end
+
+     should 'contain two loc elements per file' do
+       generate_two_model_sitemap_files
+       assert_equal 2, num_elements(first_sitemaps_model_file, 'loc')
+       assert_equal 2, num_elements(second_sitemaps_model_file, 'loc')
+     end
+
+     should 'contain two lastmod elements per file' do
+       generate_two_model_sitemap_files
+       assert_equal 2, num_elements(first_sitemaps_model_file, 'lastmod')
+       assert_equal 2, num_elements(second_sitemaps_model_file, 'lastmod')
+     end
+
+     should 'contain two changefreq elements per file' do
+       generate_two_model_sitemap_files
+       assert_equal 2, num_elements(first_sitemaps_model_file, 'changefreq')
+       assert_equal 2, num_elements(second_sitemaps_model_file, 'changefreq')
+     end
+
+     should 'contain two priority elements per file' do
+       generate_two_model_sitemap_files(:priority => 0.2)
+       assert_equal 2, num_elements(first_sitemaps_model_file, 'priority')
+       assert_equal 2, num_elements(second_sitemaps_model_file, 'priority')
+     end
+
+     should 'strip leading slashes from controller paths' do
+       create_sitemap
+       @sitemap.add(TestModel, :path => '/test_controller').generate
+       assert(
+         !elements(first_sitemaps_model_file, 'loc').first.text.match(/\/\/test_controller\//),
+         'URL does not contain a double-slash before the controller path'
+       )
+     end
+
+     should 'not be gzipped' do
+       generate_one_sitemap_model_file(:gzip => false)
+       assert File.exist?(unzipped_first_sitemaps_model_file)
+     end
+   end
+
+   context 'add method' do
+     should 'be chainable' do
+       create_sitemap
+       assert_equal BigSitemap, @sitemap.add(TestModel).class
+     end
+   end
+
+   context 'clean method' do
+     should 'be chainable' do
+       create_sitemap
+       assert_equal BigSitemap, @sitemap.clean.class
+     end
+
+     should 'clean all sitemap files' do
+       generate_sitemap_files
+       assert Dir.entries(sitemaps_dir).size > 2, "#{sitemaps_dir} is not empty" # ['.', '..'].size == 2
+       @sitemap.clean
+       assert_equal 2, Dir.entries(sitemaps_dir).size, "#{sitemaps_dir} is empty"
+     end
+   end
+
+   context 'generate method' do
+     should 'be chainable' do
+       create_sitemap
+       assert_equal BigSitemap, @sitemap.generate.class
+     end
+   end
+
+   private
+   def delete_tmp_files
+     FileUtils.rm_rf(sitemaps_dir)
+   end
+
+   def create_sitemap(options={})
+     @sitemap = BigSitemap.new({
+       :base_url => 'http://example.com',
+       :document_root => tmp_dir,
+       :ping_google => false
+     }.update(options))
+   end
+
+   def generate_sitemap_files(options={})
+     create_sitemap(options)
+     add_model
+     @sitemap.generate
+   end
+
+   def generate_one_sitemap_model_file(options={})
+     change_frequency = options.delete(:change_frequency)
+     priority = options.delete(:priority)
+     create_sitemap(options.merge(:max_per_sitemap => default_num_items, :batch_size => default_num_items))
+     add_model(:change_frequency => change_frequency, :priority => priority)
+     @sitemap.generate
+   end
+
+   def generate_two_model_sitemap_files(options={})
+     change_frequency = options.delete(:change_frequency)
+     priority = options.delete(:priority)
+     create_sitemap(options.merge(:max_per_sitemap => 2, :batch_size => 1))
+     add_model(:num_items => 4, :change_frequency => change_frequency, :priority => priority)
+     @sitemap.generate
+   end
+
+   def add_model(options={})
+     num_items = options.delete(:num_items) || default_num_items
+     TestModel.stubs(:num_items).returns(num_items)
+     @sitemap.add(TestModel, options)
+   end
+
+   def default_num_items
+     10
+   end
+
+   def sitemaps_index_file
+     "#{unzipped_sitemaps_index_file}.gz"
+   end
+
+   def unzipped_sitemaps_index_file
+     "#{sitemaps_dir}/sitemap_index.xml"
+   end
+
+   def unzipped_first_sitemaps_model_file
+     "#{sitemaps_dir}/sitemap_test_models.xml"
+   end
+
+   def first_sitemaps_model_file
+     "#{sitemaps_dir}/sitemap_test_models.xml.gz"
+   end
+
+   def second_sitemaps_model_file
+     "#{sitemaps_dir}/sitemap_test_models_1.xml.gz"
+   end
+
+   def third_sitemaps_model_file
+     "#{sitemaps_dir}/sitemap_test_models_2.xml.gz"
+   end
+
+   def sitemaps_dir
+     "#{tmp_dir}/sitemaps"
+   end
+
+   def tmp_dir
+     '/tmp'
+   end
+
+   def ns
+     {'s' => 'http://www.sitemaps.org/schemas/sitemap/0.9'}
+   end
+
+   def elements(filename, el)
+     data = Nokogiri::XML.parse(Zlib::GzipReader.open(filename).read)
+     data.search("//s:#{el}", ns)
+   end
+
+   def num_elements(filename, el)
+     elements(filename, el).size
+   end
+ end
data/test/fixtures/test_model.rb ADDED
@@ -0,0 +1,30 @@
+ class TestModel
+   def to_param
+     object_id
+   end
+
+   def change_frequency
+     'monthly'
+   end
+
+   def priority
+     0.8
+   end
+
+   class << self
+     def count_for_sitemap
+       self.find_for_sitemap.size
+     end
+
+     def num_items
+       10
+     end
+
+     def find_for_sitemap(options={})
+       instances = []
+       num_times = options.delete(:limit) || self.num_items
+       num_times.times { instances.push(self.new) }
+       instances
+     end
+   end
+ end
data/test/test_helper.rb ADDED
@@ -0,0 +1,11 @@
+ require 'rubygems'
+ require 'test/unit'
+ require 'shoulda'
+ require 'mocha'
+ require 'test/fixtures/test_model'
+
+ $LOAD_PATH.unshift(File.dirname(__FILE__))
+ require 'big_sitemap'
+
+ class Test::Unit::TestCase
+ end
metadata ADDED
@@ -0,0 +1,82 @@
+ --- !ruby/object:Gem::Specification
+ name: pj_nitin-big_sitemap
+ version: !ruby/object:Gem::Version
+   version: 0.3.1
+ platform: ruby
+ authors:
+ - Alex Rabarts
+ autorequire:
+ bindir: bin
+ cert_chain: []
+
+ date: 2009-04-17 00:00:00 -07:00
+ default_executable:
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: builder
+   type: :runtime
+   version_requirement:
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: 2.1.2
+     version:
+ - !ruby/object:Gem::Dependency
+   name: activesupport
+   type: :runtime
+   version_requirement:
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: "0"
+     version:
+ description: (Now depends on activesupport) A Sitemap generator specifically designed for large sites (although it works equally well with small sites)
+ email: alexrabarts@gmail.com
+ executables: []
+
+ extensions: []
+
+ extra_rdoc_files:
+ - README.rdoc
+ - LICENSE
+ files:
+ - History.txt
+ - README.rdoc
+ - VERSION.yml
+ - lib/big_sitemap.rb
+ - test/big_sitemap_test.rb
+ - test/fixtures
+ - test/fixtures/test_model.rb
+ - test/test_helper.rb
+ - LICENSE
+ has_rdoc: true
+ homepage: http://github.com/pj_nitin/big_sitemap
+ post_install_message:
+ rdoc_options:
+ - --inline-source
+ - --charset=UTF-8
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+   version:
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+   version:
+ requirements: []
+
+ rubyforge_project:
+ rubygems_version: 1.2.0
+ signing_key:
+ specification_version: 2
+ summary: A Sitemap generator specifically designed for large sites (although it works equally well with small sites)
+ test_files: []
+