airblade-sitemap_generator 0.3.4

@@ -0,0 +1,20 @@
1
+ Copyright (c) 2009 [name of plugin creator]
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,232 @@
1
+ N.B. This fork is Ruby 1.8.6-compatible (and probably therefore not compatible with 1.8.7+).
2
+
3
+ <hr>
4
+
5
+ SitemapGenerator
6
+ ================
7
+
8
+ A Rails 3-compatible gem/plugin to generate ['enterprise-class'][enterprise_class] Sitemaps using a familiar Rails Routes-like DSL. Sitemaps are readable by all search engines and adhere to the ['Sitemap protocol specification'][sitemap_protocol]. Automatically pings search engines to notify them of new sitemaps (including Google, Yahoo and Bing). Provides rake tasks to easily manage your sitemaps. Supports image sitemaps and handles millions of links.
9
+
10
+ Features
11
+ -------
12
+
13
+ - v0.2.6: **Support ['image sitemaps'][sitemap_images]**!
14
+ - v0.2.5: **Support Rails 3**!
15
+
16
+ - Adheres to the ['Sitemap protocol specification'][sitemap_protocol]
17
+ - Handles millions of links
18
+ - Automatic Gzip of Sitemap files
19
+ - Automatic ping of search engines to notify them of new sitemaps: Google, Yahoo, Bing, Ask, SitemapWriter
20
+ - Won't clobber your old sitemaps if the new one fails to generate
21
+ - Set the priority of links, change frequency etc
22
+ - You control which links are included
23
+ - You set the host name, so it doesn't matter if your application is in a subdirectory
24
+
25
+ Foreword
26
+ -------
27
+
28
+ Unfortunately, Adam Salter passed away in 2009. Those who knew him know what an amazing guy he was, and what an excellent Rails programmer he was. His passing is a great loss to the Rails community.
29
+
30
+ [Karl Varga](http://github.com/kjvarga) has taken over development of SitemapGenerator. The canonical repository is [http://github.com/kjvarga/sitemap_generator][canonical_repo]
31
+
32
+ Installation
33
+ =======
34
+
35
+ **Rails 3:**
36
+
37
+ 1. Add the gem to your <tt>Gemfile</tt>
38
+
39
+ <code>gem 'sitemap_generator'</code>
40
+
41
+ 2. `$ rake sitemap:install`
42
+
43
+ **Rails 2.x: As a gem**
44
+
45
+ 1. Add the gem as a dependency in your <tt>config/environment.rb</tt>
46
+
47
+ <code>config.gem 'sitemap_generator', :lib => false</code>
48
+
49
+ 2. `$ rake gems:install`
50
+
51
+ 3. Add the following to your <tt>RAILS_ROOT/Rakefile</tt>
52
+
53
+ <pre>begin
54
+ require 'sitemap_generator/tasks'
55
+ rescue Exception => e
56
+ puts "Warning, couldn't load gem tasks: #{e.message}! Skipping..."
57
+ end</pre>
58
+
59
+ 4. `$ rake sitemap:install`
60
+
61
+ **Rails 2.x: As a plugin**
62
+
63
+ 1. <code>$ ./script/plugin install git://github.com/kjvarga/sitemap_generator.git</code>
64
+
65
+ ----
66
+
67
+ Installation creates a <tt>config/sitemap.rb</tt> file which will contain your logic for generating the Sitemap files. If you want to create this file manually run <code>rake sitemap:install</code>.
68
+
69
+ You can run <code>rake sitemap:refresh</code> as needed to create Sitemap files. This will also ping these ['major search engines'][sitemap_engines]: Google, Yahoo, Bing, Ask, SitemapWriter. If you want to disable all non-essential output run the rake task with <code>rake -s sitemap:refresh</code>.
70
+
71
+ To keep your Sitemaps up-to-date, set up a cron job. Pass the <tt>-s</tt> option to the rake task to silence all but the most important output. If you're using Whenever, then your schedule would look something like:
72
+
73
+ # config/schedule.rb
74
+ every 1.day, :at => '5:00 am' do
75
+ rake "-s sitemap:refresh"
76
+ end
77
+
78
+ Optionally, you can add the following to your <code>public/robots.txt</code> file, so that robots can find the sitemap file:
79
+
80
+ Sitemap: <hostname>/sitemap_index.xml.gz
81
+
82
+ The Sitemap URL in the robots file should be the complete URL to the Sitemap Index, such as <tt>http://www.example.org/sitemap_index.xml.gz</tt>
83
+
84
+
85
+ Example 'config/sitemap.rb'
86
+ ==========
87
+
88
+ # Set the host name for URL creation
89
+ SitemapGenerator::Sitemap.default_host = "http://www.example.com"
90
+
91
+ SitemapGenerator::Sitemap.add_links do |sitemap|
92
+ # Put links creation logic here.
93
+ #
94
+ # The Root Path ('/') and Sitemap Index file are added automatically.
95
+ # Links are added to the Sitemap output in the order they are specified.
96
+ #
97
+ # Usage: sitemap.add path, options
98
+ # (default options are used if you don't specify them)
99
+ #
100
+ # Defaults: :priority => 0.5, :changefreq => 'weekly',
101
+ # :lastmod => Time.now, :host => default_host
102
+
103
+
104
+ # Examples:
105
+
106
+ # add '/articles'
107
+ sitemap.add articles_path, :priority => 0.7, :changefreq => 'daily'
108
+
109
+ # add all individual articles
110
+ Article.find(:all).each do |a|
111
+ sitemap.add article_path(a), :lastmod => a.updated_at
112
+ end
113
+
114
+ # add merchant path
115
+ sitemap.add '/purchase', :priority => 0.7, :host => "https://www.example.com"
116
+
117
+ # add all individual news with images
118
+ News.all.each do |n|
119
+ sitemap.add news_path(n), :lastmod => n.updated_at, :images => n.images.collect { |r| { :loc => r.image.url, :title => r.image.name } }
120
+ end
121
+
122
+ end
123
+
124
+ # Including Sitemaps from Rails Engines.
125
+ #
126
+ # These Sitemaps should be almost identical to a regular Sitemap file except
127
+ # they needn't define their own SitemapGenerator::Sitemap.default_host since
128
+ # they will undoubtedly share the host name of the application they belong to.
129
+ #
130
+ # As an example, say we have a Rails Engine in vendor/plugins/cadability_client
131
+ # We can include its Sitemap here as follows:
132
+ #
133
+ file = File.join(Rails.root, 'vendor/plugins/cadability_client/config/sitemap.rb')
134
+ eval(open(file).read, binding, file)
135
+
136
+ Raison d'être
137
+ -------
138
+
139
+ Most of the Sitemap plugins out there seem to try to recreate the Sitemap links by iterating the Rails routes. In some cases this is possible, but for a great deal of cases it isn't.
140
+
141
+ a) There are probably quite a few routes in your routes file that don't need inclusion in the Sitemap. (AJAX routes I'm looking at you.)
142
+
143
+ and
144
+
145
+ b) How would you infer the correct series of links for the following route?
146
+
147
+ map.zipcode 'location/:state/:city/:zipcode', :controller => 'zipcode', :action => 'index'
148
+
149
+ Don't tell me it's trivial, because it isn't. It just looks trivial.
150
+
151
+ So my idea is to have another file similar to 'routes.rb' called 'sitemap.rb', where you can define what goes into the Sitemap.
152
+
153
+ Here's my solution:
154
+
155
+ Zipcode.find(:all, :include => :city).each do |z|
156
+ sitemap.add zipcode_path(:state => z.city.state, :city => z.city, :zipcode => z)
157
+ end
158
+
159
+ Easy hey?
160
+
161
+ Other Sitemap settings for the link, like `lastmod`, `priority`, `changefreq` and `host` are entered automatically, although you can override them if you need to.
162
+
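The way defaults and overrides combine can be sketched in plain Ruby. This is only an illustration of the merge behaviour described above, not the gem's own code: `DEFAULT_HOST` and `link_options` are hypothetical names, and a plain `Hash#merge` stands in for whatever the library does internally.

```ruby
# DEFAULT_HOST and link_options are hypothetical names used for illustration.
DEFAULT_HOST = "http://www.example.com"

# Return the options for one link: caller-supplied values win,
# everything else falls back to the documented defaults.
def link_options(overrides = {})
  defaults = {
    :priority   => 0.5,
    :changefreq => 'weekly',
    :lastmod    => Time.now,
    :host       => DEFAULT_HOST
  }
  defaults.merge(overrides)
end

plain    = link_options
featured = link_options(:priority => 0.7, :changefreq => 'daily')

puts plain[:changefreq]   # weekly
puts featured[:priority]  # 0.7
puts featured[:host]      # http://www.example.com
```

Only the keys you pass are overridden, so a link that sets `:priority` still picks up the default `:changefreq`, `:lastmod` and `:host` unless those are given too.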
163
+ Compatibility
164
+ =======
165
+
166
+ Tested and working on:
167
+
168
+ - **Rails** 3.0.0, sitemap_generator version >= 0.2.5
169
+ - **Rails** 1.x - 2.3.5
170
+ - **Ruby** 1.8.7, 1.9.1
171
+
172
+ Notes
173
+ =======
174
+
175
+ 1) For large sitemaps it may be useful to split your generation into batches to avoid running out of memory. E.g.:
176
+
177
+ # add movies
178
+ Movie.find_in_batches(:batch_size => 1000) do |movies|
179
+ movies.each do |movie|
180
+ sitemap.add "/movies/show/#{movie.to_param}", :lastmod => movie.updated_at, :changefreq => 'weekly'
181
+ end
182
+ end
183
+
184
+ 2) New Capistrano deploys will remove your Sitemap files, unless you run `rake sitemap:refresh`. The way around this is to create a cap task:
185
+
186
+ after "deploy:update_code", "deploy:copy_old_sitemap"
187
+
188
+ namespace :deploy do
189
+ task :copy_old_sitemap do
190
+ run "if [ -e #{previous_release}/public/sitemap_index.xml.gz ]; then cp #{previous_release}/public/sitemap* #{current_release}/public/; fi"
191
+ end
192
+ end
193
+
194
+ 3) If generation of your sitemap fails for some reason, the old sitemap will remain in public/. This ensures that robots will always find a valid sitemap. If you run silently (`rake -s sitemap:refresh`) with email forwarding set up, you'll only get an email if your sitemap fails to build, and no notification when everything is fine, which will be most of the time.
195
+
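The batching idea in note 1 can be tried without a database. The sketch below uses `Array#each_slice` as a stand-in for `find_in_batches`, and the record ids are made up for illustration. Unlike the ActiveRecord call, the array here is already in memory, so this shows the shape of the loop and not the memory saving.

```ruby
# Stand-in for a table of movies: 2,500 hypothetical record ids.
record_ids = (1..2_500).to_a

batch_sizes = []
paths = []

# each_slice hands over at most 1,000 ids at a time, much as
# find_in_batches(:batch_size => 1000) yields arrays of records.
record_ids.each_slice(1_000) do |batch|
  batch_sizes << batch.size
  batch.each { |id| paths << "/movies/show/#{id}" }
end

puts batch_sizes.inspect  # [1000, 1000, 500]
puts paths.size           # 2500
```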
196
+ Known Bugs
197
+ ========
198
+
199
+ - There's no check on the size of a URL which [isn't supposed to exceed 2,048 bytes][sitemaps_xml].
200
+ - Currently only supports one Sitemap Index file, which can contain 50,000 Sitemap files, each of which can contain 50,000 URLs, so it _only_ supports up to 2,500,000,000 (2.5 billion) URLs. I personally have no need of support for more URLs, but the plugin could be improved to support this.
201
+
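Until the plugin checks URL sizes itself, a guard along these lines could be applied before adding a link. This is a sketch under assumptions: `MAX_URL_BYTES` and `url_within_limit?` are hypothetical names, and the comparison follows the "isn't supposed to exceed 2,048 bytes" wording above, so check the protocol page if you need the exact boundary. The last lines simply confirm the capacity arithmetic from the second item.

```ruby
# MAX_URL_BYTES and url_within_limit? are hypothetical names for this sketch.
MAX_URL_BYTES = 2_048

# True when the URL does not exceed the byte limit quoted above.
def url_within_limit?(url)
  url.bytesize <= MAX_URL_BYTES
end

short_url = "http://www.example.com/articles"
long_url  = "http://www.example.com/" + ("a" * 2_048)

puts url_within_limit?(short_url)  # true
puts url_within_limit?(long_url)   # false

# One index file of 50,000 sitemaps, each holding 50,000 URLs.
max_urls = 50_000 * 50_000
puts max_urls                      # 2500000000
```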
202
+ Wishlist & Coming Soon
203
+ ========
204
+
205
+ - Support for generating sitemaps for sites with multiple domains. Sitemaps are generated into subdirectories and we use a Rack middleware to rewrite requests for sitemaps to the correct subdirectory based on the request host.
206
+ - I want to refactor the code because it has grown a lot. Part of this refactoring will include implementing some more checks to make sure we adhere to standards as well as making sure that the sitemaps are being generated as efficiently as possible.
207
+
208
+ I'd like to simplify adding links to a sitemap. Right now it's all or nothing. I'd like to break it up so you can add batches.
209
+ - Auto coverage testing. Generate a report of broken URLs by checking the status codes of each page in the sitemap.
210
+
211
+ Thanks (in no particular order)
212
+ ========
213
+
214
+ - [Alexandre Bini](http://github.com/alexandrebini) for image sitemaps
215
+ - [Dan Pickett](http://github.com/dpickett)
216
+ - [Rob Biedenharn](http://github.com/rab)
217
+ - [Richie Vos](http://github.com/jerryvos)
218
+ - [Adrian Mugnolo](http://github.com/xymbol)
219
+ - [Jason Weathered](http://github.com/jasoncodes)
220
+
221
+ Copyright (c) 2009 Karl Varga released under the MIT license
222
+
223
+ [canonical_repo]:http://github.com/kjvarga/sitemap_generator
224
+ [enterprise_class]:https://twitter.com/dhh/status/1631034662 "I use enterprise in the same sense the Phusion guys do - i.e. Enterprise Ruby. Please don't look down on my use of the word 'enterprise' to represent being a cut above. It doesn't mean you ever have to work for a company the size of IBM. Or constantly fight inertia, writing crappy software, adhering to change management practices and spending hours in meetings... Not that there's anything wrong with that - Wait, what?"
225
+ [sitemap_engines]:http://en.wikipedia.org/wiki/Sitemap_index "http://en.wikipedia.org/wiki/Sitemap_index"
226
+ [sitemaps_org]:http://www.sitemaps.org/protocol.php "http://www.sitemaps.org/protocol.php"
227
+ [sitemaps_xml]:http://www.sitemaps.org/protocol.php#xmlTagDefinitions "XML Tag Definitions"
228
+ [sitemap_generator_usage]:http://wiki.github.com/adamsalter/sitemap_generator/sitemapgenerator-usage "http://wiki.github.com/adamsalter/sitemap_generator/sitemapgenerator-usage"
229
+ [boost_juice]:http://www.boostjuice.com.au/ "Mmmm, sweet, sweet Boost Juice."
230
+ [cb]:http://codebright.net "http://codebright.net"
231
+ [sitemap_images]:http://www.google.com/support/webmasters/bin/answer.py?answer=178636
232
+ [sitemap_protocol]:http://sitemaps.org/protocol.php
@@ -0,0 +1,114 @@
1
+ require 'rake'
2
+ require 'rake/rdoctask'
3
+ require 'rubygems'
4
+ gem 'rspec', '1.3.0'
5
+ require 'spec/rake/spectask'
6
+ gem 'nokogiri'
7
+
8
+ begin
9
+ require 'jeweler'
10
+ Jeweler::Tasks.new do |gem|
11
+ gem.name = "airblade-sitemap_generator"
12
+ gem.summary = %Q{Easily generate enterprise class Sitemaps for your Rails site using a familiar Rails Routes-like DSL}
13
+ gem.description = %Q{A Rails 3-compatible gem/plugin to generate enterprise-class Sitemaps using a familiar Rails Routes-like DSL. Sitemaps are readable by all search engines and adhere to the Sitemap protocol specification. Automatically pings search engines to notify them of new sitemaps (including Google, Yahoo and Bing). Provides rake tasks to easily manage your sitemaps. Supports image sitemaps and handles millions of links.}
14
+ gem.email = "boss@airbladesoftware.com"
15
+ gem.homepage = "http://github.com/airblade/sitemap_generator"
16
+ gem.authors = ["Adam Salter", "Karl Varga"]
17
+ gem.files = FileList["[A-Z]*", "{bin,lib,rails,templates,tasks}/**/*"]
18
+ gem.test_files = []
19
+ gem.add_development_dependency "rspec"
20
+ gem.add_development_dependency "nokogiri"
21
+ end
22
+ Jeweler::GemcutterTasks.new
23
+ rescue LoadError
24
+ puts "Jeweler (or a dependency) not available. Install it with: gem install jeweler"
25
+ end
26
+
27
+ #
28
+ # Helper methods
29
+ #
30
+ module Helpers
31
+ extend self
32
+
33
+ # Return a full local path to path fragment <tt>path</tt>
34
+ def local_path(path)
35
+ File.join(File.dirname(__FILE__), path)
36
+ end
37
+
38
+ # Copy all of the local files into <tt>path</tt> after completely cleaning it
39
+ def prepare_path(path)
40
+ rm_rf path
41
+ mkdir_p path
42
+ cp_r(FileList["[A-Z]*", "{bin,lib,rails,templates,tasks}"], path)
43
+ end
44
+ end
45
+
46
+ #
47
+ # Tasks
48
+ #
49
+ task :default => :test
50
+
51
+ namespace :test do
52
+ #desc "Test as a gem, plugin and Rails 3 gem"
53
+ #task :all => ['test:gem', 'test:plugin']
54
+
55
+ task :gem => ['test:prepare:gem', 'multi_spec']
56
+ task :plugin => ['test:prepare:plugin', 'multi_spec']
57
+ task :rails3 => ['test:prepare:rails3', 'multi_spec']
58
+
59
+ task :multi_spec do
60
+ Rake::Task['spec'].invoke
61
+ Rake::Task['spec'].reenable
62
+ end
63
+
64
+ namespace :prepare do
65
+ task :gem do
66
+ ENV["SITEMAP_RAILS"] = 'gem'
67
+ Helpers.prepare_path(Helpers.local_path('spec/mock_app_gem/vendor/gems/sitemap_generator-1.2.3'))
68
+ rm_rf(Helpers.local_path('spec/mock_app_gem/public/sitemap*'))
69
+ end
70
+
71
+ task :plugin do
72
+ ENV["SITEMAP_RAILS"] = 'plugin'
73
+ Helpers.prepare_path(Helpers.local_path('spec/mock_app_plugin/vendor/plugins/sitemap_generator-1.2.3'))
74
+ rm_rf(Helpers.local_path('spec/mock_app_plugin/public/sitemap*'))
75
+ end
76
+
77
+ task :rails3 do
78
+ ENV["SITEMAP_RAILS"] = 'rails3'
79
+ rm_rf(Helpers.local_path('spec/mock_rails3_gem/public/sitemap*'))
80
+ end
81
+ end
82
+ end
83
+
84
+ desc "Release a new patch version"
85
+ task :release_new_version do
86
+ Rake::Task['version:bump:patch'].invoke
87
+ Rake::Task['github:release'].invoke
88
+ Rake::Task['git:release'].invoke
89
+ Rake::Task['gemcutter:release'].invoke
90
+ end
91
+
92
+ desc "Run tests as a gem install"
93
+ task :test => ['test:gem']
94
+
95
+ Spec::Rake::SpecTask.new(:spec) do |spec|
96
+ spec.libs << 'lib' << 'spec'
97
+ spec.spec_files = FileList['spec/**/*_spec.rb']
98
+ end
99
+ task :spec => :check_dependencies
100
+
101
+ Spec::Rake::SpecTask.new(:rcov) do |spec|
102
+ spec.libs << 'lib' << 'spec'
103
+ spec.pattern = 'spec/**/*_spec.rb'
104
+ spec.rcov = true
105
+ end
106
+
107
+ desc 'Generate documentation'
108
+ Rake::RDocTask.new(:rdoc) do |rdoc|
109
+ rdoc.rdoc_dir = 'rdoc'
110
+ rdoc.title = 'SitemapGenerator'
111
+ rdoc.options << '--line-numbers' << '--inline-source'
112
+ rdoc.rdoc_files.include('README.md')
113
+ rdoc.rdoc_files.include('lib/**/*.rb')
114
+ end
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.3.4
@@ -0,0 +1,28 @@
1
+ require 'sitemap_generator/builder'
2
+ require 'sitemap_generator/mapper'
3
+ require 'sitemap_generator/link'
4
+ require 'sitemap_generator/link_set'
5
+ require 'sitemap_generator/templates'
6
+ require 'sitemap_generator/utilities'
7
+ require 'sitemap_generator/railtie' if SitemapGenerator::Utilities.rails3?
8
+
9
+ require 'active_support/core_ext/numeric'
10
+
11
+ module SitemapGenerator
12
+ silence_warnings do
13
+ VERSION = File.read(File.dirname(__FILE__) + "/../VERSION").strip
14
+ MAX_SITEMAP_FILES = 50_000 # max sitemap links per index file
15
+ MAX_SITEMAP_LINKS = 50_000 # max links per sitemap
16
+ MAX_SITEMAP_IMAGES = 1_000 # max images per url
17
+ MAX_SITEMAP_FILESIZE = 10.megabytes # bytes
18
+
19
+ Sitemap = LinkSet.new
20
+ end
21
+
22
+ class << self
23
+ attr_accessor :root, :templates
24
+ end
25
+
26
+ self.root = File.expand_path(File.join(File.dirname(__FILE__), '../'))
27
+ self.templates = SitemapGenerator::Templates.new(self.root)
28
+ end
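To make the limits above concrete, here is a plain-Ruby sketch of the kind of capacity check they feed into (compare `file_can_fit?` in `SitemapFile`). The names `MAX_LINKS`, `MAX_FILESIZE` and `can_fit?` are hypothetical, and the byte count is written out by hand because `10.megabytes` needs ActiveSupport.

```ruby
# Hypothetical stand-ins for the constants defined above.
MAX_LINKS    = 50_000
MAX_FILESIZE = 10 * 1024 * 1024 # the value 10.megabytes evaluates to

# A new entry fits only if it keeps the file under the size limit
# and the link count is still below the maximum.
def can_fit?(current_bytes, current_links, new_bytes)
  (current_bytes + new_bytes) < MAX_FILESIZE && current_links < MAX_LINKS
end

puts MAX_FILESIZE                            # 10485760
puts can_fit?(0, 0, 100)                     # true
puts can_fit?(MAX_FILESIZE - 50, 10, 100)    # false, too many bytes
puts can_fit?(0, MAX_LINKS, 100)             # false, too many links
```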
@@ -0,0 +1,9 @@
1
+ require 'sitemap_generator/builder/helper'
2
+ require 'sitemap_generator/builder/sitemap_file'
3
+ require 'sitemap_generator/builder/sitemap_index_file'
4
+
5
+ module SitemapGenerator
6
+ module Builder
7
+
8
+ end
9
+ end
@@ -0,0 +1,10 @@
1
+ module SitemapGenerator
2
+ module Builder
3
+ module Helper
4
+
5
+ def w3c_date(date)
6
+ date.utc.strftime("%Y-%m-%dT%H:%M:%S+00:00")
7
+ end
8
+ end
9
+ end
10
+ end
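A quick way to see what `w3c_date` produces is to run the same `strftime` format outside the module. The sketch below copies the method body as a top-level function; the sample timestamps are made up for illustration.

```ruby
# Same formatting as the helper above, defined at top level for the sketch.
def w3c_date(time)
  time.utc.strftime("%Y-%m-%dT%H:%M:%S+00:00")
end

utc_time   = Time.utc(2010, 6, 15, 8, 30, 0)
local_time = Time.new(2010, 6, 15, 10, 30, 0, "+02:00")

puts w3c_date(utc_time)    # 2010-06-15T08:30:00+00:00
puts w3c_date(local_time)  # 2010-06-15T08:30:00+00:00
```

Note that `Time#utc` converts the receiver in place. If the caller needs the original zone preserved, `Time#getutc` returns a converted copy instead.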
@@ -0,0 +1,124 @@
1
+ require 'sitemap_generator/builder/helper'
2
+ require 'builder'
3
+ require 'zlib'
4
+
5
+ module SitemapGenerator
6
+ module Builder
7
+ class SitemapFile
8
+ include SitemapGenerator::Builder::Helper
9
+
10
+ attr_accessor :sitemap_path, :public_path, :filesize, :link_count, :hostname
11
+
12
+ # <tt>public_path</tt> full path of the directory to write sitemaps in.
13
+ # Usually your Rails <tt>public/</tt> directory.
14
+ #
15
+ # <tt>sitemap_path</tt> relative path including filename of the sitemap
16
+ # file relative to <tt>public_path</tt>
17
+ #
18
+ # <tt>hostname</tt> hostname including protocol to use in all links
19
+ # e.g. http://en.google.ca
20
+ def initialize(public_path, sitemap_path, hostname)
21
+ self.sitemap_path = sitemap_path
22
+ self.public_path = public_path
23
+ self.hostname = hostname
24
+ self.link_count = 0
25
+
26
+ @xml_content = '' # XML urlset content
27
+ @xml_wrapper_start = <<-HTML
28
+ <?xml version="1.0" encoding="UTF-8"?>
29
+ <urlset
30
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
31
+ xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
32
+ xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
33
+ http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
34
+ xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
35
+ >
36
+ HTML
37
+ @xml_wrapper_start = @xml_wrapper_start.gsub(/\s+/, ' ').gsub(/ *> */, '>').strip
38
+ @xml_wrapper_end = %q[</urlset>]
39
+ self.filesize = @xml_wrapper_start.length + @xml_wrapper_end.length
40
+ end
41
+
42
+ def lastmod
43
+ File.mtime(self.full_path) rescue nil
44
+ end
45
+
46
+ def empty?
47
+ self.link_count == 0
48
+ end
49
+
50
+ def full_url
51
+ URI.join(self.hostname, self.sitemap_path).to_s
52
+ end
53
+
54
+ def full_path
55
+ @full_path ||= File.join(self.public_path, self.sitemap_path)
56
+ end
57
+
58
+ # Return a boolean indicating whether the sitemap file can fit another link
59
+ # of <tt>bytes</tt> bytes in size.
60
+ def file_can_fit?(bytes)
61
+ (self.filesize + bytes) < SitemapGenerator::MAX_SITEMAP_FILESIZE && self.link_count < SitemapGenerator::MAX_SITEMAP_LINKS
62
+ end
63
+
64
+ # Add a link to the sitemap file and return a boolean indicating whether the
65
+ # link was added.
66
+ #
67
+ # If a link cannot be added, the file is too large or the link limit has been reached.
68
+ def add_link(link)
69
+ xml = build_xml(::Builder::XmlMarkup.new, link)
70
+ unless file_can_fit?(xml.length)
71
+ self.finalize!
72
+ return false
73
+ end
74
+
75
+ @xml_content << xml
76
+ self.filesize += xml.length
77
+ self.link_count += 1
78
+ true
79
+ end
80
+ alias_method :<<, :add_link
81
+
82
+ # Return XML as a String
83
+ def build_xml(builder, link)
84
+ builder.url do
85
+ builder.loc link[:loc]
86
+ builder.lastmod w3c_date(link[:lastmod]) if link[:lastmod]
87
+ builder.changefreq link[:changefreq] if link[:changefreq]
88
+ builder.priority link[:priority] if link[:priority]
89
+
90
+ unless link[:images].blank?
91
+ link[:images].each do |image|
92
+ builder.image :image do
93
+ builder.image :loc, image[:loc]
94
+ builder.image :caption, image[:caption] if image[:caption]
95
+ builder.image :geo_location, image[:geo_location] if image[:geo_location]
96
+ builder.image :title, image[:title] if image[:title]
97
+ builder.image :license, image[:license] if image[:license]
98
+ end
99
+ end
100
+ end
101
+ end
102
+ builder << ''
103
+ end
104
+
105
+ # Insert the content into the XML "wrapper" and write and close the file.
106
+ #
107
+ # All the xml content in the instance is cleared, but attributes like
108
+ # <tt>filesize</tt> are still available.
109
+ def finalize!
110
+ return if self.frozen?
111
+
112
+ open(self.full_path, 'wb') do |file|
113
+ gz = Zlib::GzipWriter.new(file)
114
+ gz.write @xml_wrapper_start
115
+ gz.write @xml_content
116
+ gz.write @xml_wrapper_end
117
+ gz.close
118
+ end
119
+ @xml_content = @xml_wrapper_start = @xml_wrapper_end = ''
120
+ self.freeze
121
+ end
122
+ end
123
+ end
124
+ end
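The gzip step in `finalize!` can be exercised on its own. The sketch below is an assumption-laden miniature: it writes to an in-memory `StringIO` instead of a file under `public/`, and the XML string is a made-up one-link sitemap. It then reads the data back to confirm the round trip.

```ruby
require 'zlib'
require 'stringio'

# A made-up, minimal sitemap body for illustration.
xml = '<?xml version="1.0" encoding="UTF-8"?>' \
      '<urlset><url><loc>http://www.example.com/</loc></url></urlset>'

# Compress into an in-memory binary buffer instead of a file on disk.
buffer = StringIO.new(String.new)
gz = Zlib::GzipWriter.new(buffer)
gz.write xml
gz.close # writes the gzip trailer and closes the underlying buffer

compressed = buffer.string
restored   = Zlib::GzipReader.new(StringIO.new(compressed)).read

puts restored == xml                          # true
puts compressed.bytes[0, 2] == [0x1f, 0x8b]   # true, the gzip magic number
```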
@@ -0,0 +1,33 @@
1
+ module SitemapGenerator
2
+ module Builder
3
+ class SitemapIndexFile < SitemapFile
4
+
5
+ def initialize(*args)
6
+ super(*args)
7
+
8
+ @xml_content = '' # XML sitemapindex content
9
+ @xml_wrapper_start = <<-HTML
10
+ <?xml version="1.0" encoding="UTF-8"?>
11
+ <sitemapindex
12
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
13
+ xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
14
+ http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd"
15
+ xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
16
+ >
17
+ HTML
18
+ @xml_wrapper_start = @xml_wrapper_start.gsub(/\s+/, ' ').gsub(/ *> */, '>').strip
19
+ @xml_wrapper_end = %q[</sitemapindex>]
20
+ self.filesize = @xml_wrapper_start.length + @xml_wrapper_end.length
21
+ end
22
+
23
+ # Return XML as a String
24
+ def build_xml(builder, link)
25
+ builder.sitemap do
26
+ builder.loc link[:loc]
27
+ builder.lastmod w3c_date(link[:lastmod]) if link[:lastmod]
28
+ end
29
+ builder << ''
30
+ end
31
+ end
32
+ end
33
+ end
@@ -0,0 +1,28 @@
1
+ module SitemapGenerator
2
+
3
+ # Evaluate a sitemap config file within the context of a class that includes the
4
+ # Rails URL helpers.
5
+ class Interpreter
6
+
7
+ if SitemapGenerator::Utilities.rails3?
8
+ include ::Rails.application.routes.url_helpers
9
+ else
10
+ require 'action_controller'
11
+ include ActionController::UrlWriter
12
+ end
13
+
14
+ def initialize(sitemap_config_file=nil)
15
+ sitemap_config_file ||= File.join(::Rails.root, 'config/sitemap.rb')
16
+ eval(open(sitemap_config_file).read)
17
+ end
18
+
19
+ # KJV do we need this? We should be using path_* helpers.
20
+ # def self.default_url_options(options = nil)
21
+ # { :host => SitemapGenerator::Sitemap.default_host }
22
+ # end
23
+
24
+ def self.run
25
+ new
26
+ end
27
+ end
28
+ end
@@ -0,0 +1,36 @@
1
+ module SitemapGenerator
2
+ module Link
3
+ extend self
4
+
5
+ # Return a Hash of options suitable to pass to a SitemapGenerator::Builder::SitemapFile instance.
6
+ def generate(path, options = {})
7
+ if path.is_a?(SitemapGenerator::Builder::SitemapFile)
8
+ options.reverse_merge!(:host => path.hostname, :lastmod => path.lastmod)
9
+ path = path.sitemap_path
10
+ end
11
+
12
+ options.assert_valid_keys(:priority, :changefreq, :lastmod, :host, :images)
13
+ options.reverse_merge!(:priority => 0.5, :changefreq => 'weekly', :lastmod => Time.now, :host => Sitemap.default_host, :images => [])
14
+ {
15
+ :path => path,
16
+ :priority => options[:priority],
17
+ :changefreq => options[:changefreq],
18
+ :lastmod => options[:lastmod],
19
+ :host => options[:host],
20
+ :loc => URI.join(options[:host], path).to_s,
21
+ :images => prepare_images(options[:images], options[:host])
22
+ }
23
+ end
24
+
25
+ # Return an Array of image option Hashes suitable to be parsed by SitemapGenerator::Builder::SitemapFile
26
+ def prepare_images(images, host)
27
+ images.delete_if { |image| image[:loc].nil? }
28
+ images.each do |r|
29
+ r.assert_valid_keys(:loc, :caption, :geo_location, :title, :license)
30
+ r[:loc] = URI.join(host, r[:loc]).to_s
31
+ end
32
+ images[0..(SitemapGenerator::MAX_SITEMAP_IMAGES-1)]
33
+ end
34
+ end
35
+ end
36
+
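Both `:loc` values above are built with `URI.join`, so it is worth seeing how it resolves a path against a host. The sketch below uses a hypothetical `full_location` wrapper and example hosts; the last call shows that an already absolute URL (for instance an image served from another host) is returned unchanged.

```ruby
require 'uri'

# full_location is a hypothetical wrapper around the URI.join call used above.
def full_location(host, path)
  URI.join(host, path).to_s
end

puts full_location("http://www.example.com", "/articles")
# http://www.example.com/articles
puts full_location("https://www.example.com", "/purchase")
# https://www.example.com/purchase
puts full_location("http://www.example.com", "http://cdn.example.com/a.png")
# http://cdn.example.com/a.png
```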
@@ -0,0 +1,174 @@
1
+ require 'builder'
2
+ require 'action_view'
3
+
4
+ # A LinkSet provisions a bunch of links to sitemap files. It also writes the index file
5
+ # which lists all the sitemap files written.
6
+ module SitemapGenerator
7
+ class LinkSet
8
+ include ActionView::Helpers::NumberHelper # for number_with_delimiter
9
+
10
+ attr_accessor :default_host, :public_path, :sitemaps_path
11
+ attr_accessor :sitemap, :sitemaps, :sitemap_index
12
+ attr_accessor :verbose, :yahoo_app_id
13
+
14
+ # Evaluate the sitemap config file and write all sitemaps.
15
+ #
16
+ # This should be refactored so that we can have multiple instances
17
+ # of LinkSet.
18
+ def create
19
+ require 'sitemap_generator/interpreter'
20
+
21
+ self.public_path = File.join(::Rails.root, 'public/') if self.public_path.nil?
22
+
23
+ start_time = Time.now
24
+ SitemapGenerator::Interpreter.run
25
+ finalize!
26
+ end_time = Time.now
27
+
28
+ puts "\nSitemap stats: #{number_with_delimiter(self.link_count)} links / #{self.sitemaps.size} files / " + ("%dm%02ds" % (end_time - start_time).divmod(60)) if verbose
29
+ end
30
+
31
+ # <tt>public_path</tt> (optional) full path to the directory to write sitemaps in.
32
+ # Defaults to your Rails <tt>public/</tt> directory.
33
+ #
34
+ # <tt>sitemaps_path</tt> (optional) path fragment within public to write sitemaps
35
+ # to e.g. 'en/'. Sitemaps are written to <tt>public_path</tt> + <tt>sitemaps_path</tt>
36
+ #
37
+ # <tt>default_host</tt> hostname including protocol to use in all sitemap links
38
+ # e.g. http://en.google.ca
39
+ def initialize(public_path = nil, sitemaps_path = nil, default_host = nil)
40
+ self.default_host = default_host
41
+ self.public_path = public_path
42
+ self.sitemaps_path = sitemaps_path
43
+
44
+ # Completed sitemaps
45
+ self.sitemaps = []
46
+ end
47
+
48
+ def link_count
49
+ self.sitemaps.inject(0) { |link_count_sum, sitemap| link_count_sum + sitemap.link_count }
50
+ end
51
+
52
+ # Called within the user's eval'ed sitemap config file. Add links to sitemap files
53
+ # passing a block.
54
+ #
55
+ # TODO: Refactor. The call chain is confusing and convoluted here.
56
+ def add_links
57
+ raise ArgumentError, "Default hostname not set" if default_host.blank?
58
+
59
+ # I'd rather have these calls in <tt>create</tt> but we have to wait
60
+ # for <tt>default_host</tt> to be set by the user's sitemap config
61
+ new_sitemap
62
+ add_default_links
63
+
64
+ yield Mapper.new(self)
65
+ end
66
+
67
+ # Called from Mapper.
68
+ #
69
+ # Add a link to the current sitemap.
70
+ def add_link(link)
71
+ unless self.sitemap << link
72
+ new_sitemap
73
+ self.sitemap << link
74
+ end
75
+ end
76
+
77
+ # Add the current sitemap to the <tt>sitemaps</tt> Array and
78
+ # start a new sitemap.
79
+ #
80
+ # If the current sitemap is nil or empty it is not added.
81
+ def new_sitemap
82
+ unless self.sitemap_index
83
+ self.sitemap_index = SitemapGenerator::Builder::SitemapIndexFile.new(public_path, sitemap_index_path, default_host)
84
+ end
85
+
86
+ unless self.sitemap
87
+ self.sitemap = SitemapGenerator::Builder::SitemapFile.new(public_path, new_sitemap_path, default_host)
88
+ end
89
+
90
+ # Mark the sitemap as complete and add it to the sitemap index
91
+ unless self.sitemap.empty?
92
+ self.sitemap.finalize!
93
+ self.sitemap_index << Link.generate(self.sitemap)
94
+ self.sitemaps << self.sitemap
95
+ show_progress(self.sitemap) if verbose
96
+
97
+ self.sitemap = SitemapGenerator::Builder::SitemapFile.new(public_path, new_sitemap_path, default_host)
98
+ end
99
+ end
100
+
101
+ # Report progress line.
102
+ def show_progress(sitemap)
103
+ uncompressed_size = number_to_human_size(sitemap.filesize)
104
+ compressed_size = number_to_human_size(File.size?(sitemap.full_path))
105
+ puts "+ #{sitemap.sitemap_path} #{sitemap.link_count} links / #{uncompressed_size} / #{compressed_size} gzipped"
106
+ end
107
+
108
+ # Finalize all sitemap files
109
+ def finalize!
110
+ new_sitemap
111
+ self.sitemap_index.finalize!
112
+ end
113
+
114
+ # Ping search engines.
115
+ #
116
+ # @see http://en.wikipedia.org/wiki/Sitemap_index
117
+ def ping_search_engines
118
+ require 'open-uri'
119
+
120
+ sitemap_index_url = CGI.escape(self.sitemap_index.full_url)
121
+ search_engines = {
122
+ :google => "http://www.google.com/webmasters/sitemaps/ping?sitemap=#{sitemap_index_url}",
123
+ :yahoo => "http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=#{sitemap_index_url}&appid=#{yahoo_app_id}",
124
+ :ask => "http://submissions.ask.com/ping?sitemap=#{sitemap_index_url}",
125
+ :bing => "http://www.bing.com/webmaster/ping.aspx?siteMap=#{sitemap_index_url}",
126
+ :sitemap_writer => "http://www.sitemapwriter.com/notify.php?crawler=all&url=#{sitemap_index_url}"
127
+ }
128
+
129
+ puts "\n" if verbose
130
+ search_engines.each do |engine, link|
131
+ next if engine == :yahoo && !self.yahoo_app_id
132
+ begin
133
+ open(link)
134
+ puts "Successful ping of #{engine.to_s.titleize}" if verbose
135
+ rescue Timeout::Error, StandardError => e
136
+ puts "Ping failed for #{engine.to_s.titleize}: #{e.inspect} (URL #{link})" if verbose
137
+ end
138
+ end
139
+
140
+ if !self.yahoo_app_id && verbose
141
+ puts "\n"
142
+ puts <<-END.gsub(/^\s+/, '')
143
+ To ping Yahoo you require a Yahoo AppID. Add it to your config/sitemap.rb with:
144
+
145
+ SitemapGenerator::Sitemap.yahoo_app_id = "my_app_id"
146
+
147
+ For more information see http://developer.yahoo.com/search/siteexplorer/V1/updateNotification.html
148
+ END
149
+ end
150
+ end
151
+
152
+ protected
153
+
154
+ def add_default_links
155
+ self.sitemap << Link.generate('/', :lastmod => Time.now, :changefreq => 'always', :priority => 1.0)
156
+ self.sitemap << Link.generate(self.sitemap_index, :lastmod => Time.now, :changefreq => 'always', :priority => 1.0)
157
+ end
158
+
159
+ # Return the current sitemap filename with index.
160
+ #
161
+ # The index depends on the length of the <tt>sitemaps</tt> array.
162
+ def new_sitemap_path
163
+ File.join(self.sitemaps_path || '', "sitemap#{self.sitemaps.length + 1}.xml.gz")
164
+ end
165
+
166
+ # Return the current sitemap index filename.
167
+ #
168
+ # At the moment we only support one index file which can link to
169
+ # up to 50,000 sitemap files.
170
+ def sitemap_index_path
171
+ File.join(self.sitemaps_path || '', 'sitemap_index.xml.gz')
172
+ end
173
+ end
174
+ end
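The ping URLs in `ping_search_engines` are the engine endpoint plus the escaped sitemap index URL. The sketch below only builds such a string and sends no request. `ping_url` is a hypothetical helper, and `URI.encode_www_form_component` is swapped in for the `CGI.escape` call used above; for a simple URL like this one both percent-encode the colon and slashes.

```ruby
require 'uri'

# ping_url is a hypothetical helper; it builds the string and sends nothing.
def ping_url(endpoint, param, sitemap_index_url)
  "#{endpoint}?#{param}=#{URI.encode_www_form_component(sitemap_index_url)}"
end

url = ping_url(
  "http://www.google.com/webmasters/sitemaps/ping",
  "sitemap",
  "http://www.example.com/sitemap_index.xml.gz"
)

puts url
# http://www.google.com/webmasters/sitemaps/ping?sitemap=http%3A%2F%2Fwww.example.com%2Fsitemap_index.xml.gz
```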
@@ -0,0 +1,16 @@
1
+ module SitemapGenerator
2
+ # Mapper instances are used to build links.
3
+ # The object passed to the add_links block in config/sitemap.rb is a Mapper instance.
4
+ class Mapper
5
+ attr_accessor :set
6
+
7
+ def initialize(set)
8
+ @set = set
9
+ end
10
+
11
+ def add(loc, options = {})
12
+ set.add_link Link.generate(loc, options)
13
+ end
14
+ end
15
+ end
16
+
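The mapper above is a thin wrapper: `add` builds a link and hands it to the set. A rough sketch of that delegation, using stand-in classes, might look like the following. `SketchLink`, `SketchSet` and `SketchMapper` are hypothetical names, and the plain-hash link is an assumption; only the default `:priority` and `:changefreq` values are taken from the comments in templates/sitemap.rb.

```ruby
# Stand-in for Link.generate: merges caller options over defaults and
# returns a plain hash. The gem's real Link class may behave differently.
module SketchLink
  DEFAULTS = { :priority => 0.5, :changefreq => 'weekly' }

  def self.generate(loc, options = {})
    DEFAULTS.merge(options).merge(:loc => loc)
  end
end

# Stand-in for the link set: just collects links in insertion order.
class SketchSet
  attr_reader :links

  def initialize
    @links = []
  end

  def add_link(link)
    @links << link
  end
end

# Mirrors the shape of the mapper: build a link, delegate to the set.
class SketchMapper
  def initialize(set)
    @set = set
  end

  def add(loc, options = {})
    @set.add_link SketchLink.generate(loc, options)
  end
end

set = SketchSet.new
mapper = SketchMapper.new(set)
mapper.add '/articles', :priority => 0.7
mapper.add '/about'
```

After these two calls `set.links` holds two hashes, the first with the overridden priority and the second with the defaults.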
@@ -0,0 +1,7 @@
1
+ module SitemapGenerator
2
+ class Railtie < Rails::Railtie
3
+ rake_tasks do
4
+ load File.expand_path('../../../tasks/sitemap_generator_tasks.rake', __FILE__)
5
+ end
6
+ end
7
+ end
@@ -0,0 +1 @@
1
+ load File.expand_path(File.join(File.dirname(__FILE__), '../../tasks/sitemap_generator_tasks.rake'))
@@ -0,0 +1,41 @@
1
+ module SitemapGenerator
2
+ # Provide convenient access to template files. E.g.
3
+ #
4
+ # SitemapGenerator.templates.sitemap_index
5
+ #
6
+ # Lazy-load and cache for efficient access.
7
+ # Define an accessor method for each template file.
8
+ class Templates
9
+ FILES = {
10
+ :sitemap_sample => 'sitemap.rb',
11
+ }
12
+
13
+ # Dynamically define accessors for each key defined in <tt>FILES</tt>
14
+ attr_accessor *FILES.keys
15
+ FILES.keys.each do |name|
16
+ eval <<-END
17
+ define_method(:#{name}) do
18
+ @#{name} ||= read_template(:#{name})
19
+ end
20
+ END
21
+ end
22
+
23
+ def initialize(root = SitemapGenerator.root)
24
+ @root = root
25
+ end
26
+
27
+ # Return the full path to a template.
28
+ #
29
+ # <tt>template</tt> template symbol, e.g. <tt>:sitemap_sample</tt>
30
+ def template_path(template)
31
+ File.join(@root, 'templates', self.class::FILES[template])
32
+ end
33
+
34
+ protected
35
+
36
+ # Read the template file and return its contents.
37
+ def read_template(template)
38
+ File.read(template_path(template))
39
+ end
40
+ end
41
+ end
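The class above defines one lazily loaded, cached reader per entry in `FILES` by evaluating a heredoc. The same effect can be had with `define_method` alone; the sketch below is one possible way to write it, not the gem's implementation. `SketchTemplates` and the temporary directory layout are assumptions, and `File.write` assumes a Ruby newer than the 1.8.6 this fork targets.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical re-creation of the lazy-load-and-cache idea without eval.
class SketchTemplates
  FILES = { :sitemap_sample => 'sitemap.rb' }

  FILES.keys.each do |name|
    define_method(name) do
      ivar = "@#{name}"
      # Read the file on first access only; later calls return the cache.
      instance_variable_get(ivar) ||
        instance_variable_set(ivar, File.read(template_path(name)))
    end
  end

  def initialize(root)
    @root = root
  end

  def template_path(template)
    File.join(@root, 'templates', FILES[template])
  end
end

first_read = second_read = nil
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'templates'))
  path = File.join(root, 'templates', 'sitemap.rb')
  File.write(path, "# sample\n")

  templates = SketchTemplates.new(root)
  first_read = templates.sitemap_sample

  # Changing the file afterwards does not affect the cached value.
  File.write(path, "# changed\n")
  second_read = templates.sitemap_sample
end
```

Caching means a template edited on disk is not picked up until a new instance is created, which is usually acceptable for static template files.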
@@ -0,0 +1,54 @@
1
+ module SitemapGenerator
2
+ module Utilities
3
+ extend self
4
+
5
+ # Copy templates/sitemap.rb to config if not there yet.
6
+ def install_sitemap_rb(verbose=false)
7
+ if File.exist?(File.join(RAILS_ROOT, 'config/sitemap.rb'))
8
+ puts "already exists: config/sitemap.rb, file not copied" if verbose
9
+ else
10
+ FileUtils.cp(
11
+ SitemapGenerator.templates.template_path(:sitemap_sample),
12
+ File.join(RAILS_ROOT, 'config/sitemap.rb'))
13
+ puts "created: config/sitemap.rb" if verbose
14
+ end
15
+ end
16
+
17
+ # Remove config/sitemap.rb if it exists.
18
+ def uninstall_sitemap_rb
19
+ if File.exist?(File.join(RAILS_ROOT, 'config/sitemap.rb'))
20
+ FileUtils.rm(File.join(RAILS_ROOT, 'config/sitemap.rb'))
21
+ end
22
+ end
23
+
24
+ # Clean sitemap files in output directory.
25
+ def clean_files
26
+ FileUtils.rm(Dir[File.join(RAILS_ROOT, 'public/sitemap*.xml.gz')])
27
+ end
28
+
29
+ # Returns whether this environment is using ActionPack
30
+ # version 3.0.0 or greater.
31
+ #
32
+ # @return [Boolean]
33
+ def self.rails3?
34
+ # The ActionPack module is always loaded automatically in Rails >= 3
35
+ return false unless defined?(ActionPack) && defined?(ActionPack::VERSION)
36
+
37
+ version =
38
+ if defined?(ActionPack::VERSION::MAJOR)
39
+ ActionPack::VERSION::MAJOR
40
+ else
41
+ # Rails 1.2
42
+ ActionPack::VERSION::Major
43
+ end
44
+
45
+ # 3.0.0.beta1 acts more like ActionPack 2
46
+ # for purposes of this method
47
+ # (checking whether block helpers require = or -).
48
+ # This extra check can be removed when beta2 is out.
49
+ version >= 3 &&
50
+ !(defined?(ActionPack::VERSION::TINY) &&
51
+ ActionPack::VERSION::TINY == "0.beta")
52
+ end
53
+ end
54
+ end
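The install and uninstall helpers above are meant to be safe to run repeatedly: installing skips an existing file, and uninstalling only deletes a file that is present. A small sketch of that behaviour outside Rails follows. `install_config` and `uninstall_config` are hypothetical functions that take explicit paths instead of `RAILS_ROOT`, and the returned symbols are an assumption added so the outcome can be inspected.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical variant of install_sitemap_rb with explicit paths.
def install_config(template, app_root)
  target = File.join(app_root, 'config', 'sitemap.rb')
  return :exists if File.exist?(target)

  FileUtils.mkdir_p(File.dirname(target))
  FileUtils.cp(template, target)
  :created
end

# Hypothetical variant of uninstall_sitemap_rb. Note that deletion goes
# through FileUtils.rm (File.delete would also work); File has no rm method.
def uninstall_config(app_root)
  target = File.join(app_root, 'config', 'sitemap.rb')
  return :missing unless File.exist?(target)

  FileUtils.rm(target)
  :removed
end

outcomes = []
Dir.mktmpdir do |dir|
  template = File.join(dir, 'template.rb')
  File.write(template, "# sample sitemap config\n")

  outcomes << install_config(template, dir)  # first run copies the file
  outcomes << install_config(template, dir)  # second run leaves it alone
  outcomes << uninstall_config(dir)          # removes the copy
  outcomes << uninstall_config(dir)          # nothing left to remove
end
```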
@@ -0,0 +1,2 @@
1
+ # Install hook code here
2
+ SitemapGenerator::Utilities.install_sitemap_rb
@@ -0,0 +1,2 @@
1
+ # Uninstall hook code here
2
+ SitemapGenerator::Utilities.uninstall_sitemap_rb
@@ -0,0 +1,31 @@
1
+ begin
2
+ require 'sitemap_generator'
3
+ rescue LoadError, NameError
4
+ # Application should work without sitemap_generator
5
+ end
6
+
7
+ namespace :sitemap do
8
+ desc "Install a default config/sitemap.rb file"
9
+ task :install do
10
+ SitemapGenerator::Utilities.install_sitemap_rb(verbose)
11
+ end
12
+
13
+ desc "Delete all Sitemap files in public/ directory"
14
+ task :clean do
15
+ SitemapGenerator::Utilities.clean_files
16
+ end
17
+
18
+ desc "Create Sitemap XML files in public/ directory and ping search engines (rake -s for no output)"
19
+ task :refresh => ['sitemap:create'] do
20
+ SitemapGenerator::Sitemap.ping_search_engines
21
+ end
22
+
23
+ desc "Create Sitemap XML files (don't ping search engines)"
24
+ task 'refresh:no_ping' => ['sitemap:create']
25
+
26
+ task :create => [:environment] do
27
+ SitemapGenerator::Sitemap.verbose = verbose
28
+ SitemapGenerator::Sitemap.create
29
+ end
30
+ end
31
+
@@ -0,0 +1,42 @@
1
+ # Set the host name for URL creation
2
+ SitemapGenerator::Sitemap.default_host = "http://www.example.com"
3
+
4
+ SitemapGenerator::Sitemap.add_links do |sitemap|
5
+ # Put link creation logic here.
6
+ #
7
+ # The root path '/' and sitemap index file are added automatically.
8
+ # Links are added to the Sitemap in the order they are specified.
9
+ #
10
+ # Usage: sitemap.add path, options
11
+ # (default options are used if you don't specify)
12
+ #
13
+ # Defaults: :priority => 0.5, :changefreq => 'weekly',
14
+ # :lastmod => Time.now, :host => default_host
15
+
16
+
17
+ # Examples:
18
+
19
+ # add '/articles'
20
+ sitemap.add articles_path, :priority => 0.7, :changefreq => 'daily'
21
+
22
+ # add all individual articles
23
+ Article.find(:all).each do |a|
24
+ sitemap.add article_path(a), :lastmod => a.updated_at
25
+ end
26
+
27
+ # add merchant path
28
+ sitemap.add '/purchase', :priority => 0.7, :host => "https://www.example.com"
29
+
30
+ end
31
+
32
+ # Including Sitemaps from Rails Engines.
33
+ #
34
+ # These Sitemaps should be almost identical to a regular Sitemap file except
35
+ # they needn't define their own SitemapGenerator::Sitemap.default_host since
36
+ # they will undoubtedly share the host name of the application they belong to.
37
+ #
38
+ # As an example, say we have a Rails Engine in vendor/plugins/cadability_client.
39
+ # We can include its Sitemap here as follows:
40
+ #
41
+ # file = File.join(Rails.root, 'vendor/plugins/cadability_client/config/sitemap.rb')
42
+ # eval(open(file).read, binding, file)
metadata ADDED
@@ -0,0 +1,115 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: airblade-sitemap_generator
3
+ version: !ruby/object:Gem::Version
4
+ hash: 27
5
+ prerelease: false
6
+ segments:
7
+ - 0
8
+ - 3
9
+ - 4
10
+ version: 0.3.4
11
+ platform: ruby
12
+ authors:
13
+ - Adam Salter
14
+ - Karl Varga
15
+ autorequire:
16
+ bindir: bin
17
+ cert_chain: []
18
+
19
+ date: 2010-06-28 00:00:00 +01:00
20
+ default_executable:
21
+ dependencies:
22
+ - !ruby/object:Gem::Dependency
23
+ name: rspec
24
+ prerelease: false
25
+ requirement: &id001 !ruby/object:Gem::Requirement
26
+ none: false
27
+ requirements:
28
+ - - ">="
29
+ - !ruby/object:Gem::Version
30
+ hash: 3
31
+ segments:
32
+ - 0
33
+ version: "0"
34
+ type: :development
35
+ version_requirements: *id001
36
+ - !ruby/object:Gem::Dependency
37
+ name: nokogiri
38
+ prerelease: false
39
+ requirement: &id002 !ruby/object:Gem::Requirement
40
+ none: false
41
+ requirements:
42
+ - - ">="
43
+ - !ruby/object:Gem::Version
44
+ hash: 3
45
+ segments:
46
+ - 0
47
+ version: "0"
48
+ type: :development
49
+ version_requirements: *id002
50
+ description: A Rails 3-compatible gem/plugin to generate enterprise-class Sitemaps using a familiar Rails Routes-like DSL. Sitemaps are readable by all search engines and adhere to the Sitemap protocol specification. Automatically pings search engines to notify them of new sitemaps (including Google, Yahoo and Bing). Provides rake tasks to easily manage your sitemaps. Supports image sitemaps and handles millions of links.
51
+ email: boss@airbladesoftware.com
52
+ executables: []
53
+
54
+ extensions: []
55
+
56
+ extra_rdoc_files:
57
+ - README.md
58
+ files:
59
+ - MIT-LICENSE
60
+ - README.md
61
+ - Rakefile
62
+ - VERSION
63
+ - lib/sitemap_generator.rb
64
+ - lib/sitemap_generator/builder.rb
65
+ - lib/sitemap_generator/builder/helper.rb
66
+ - lib/sitemap_generator/builder/sitemap_file.rb
67
+ - lib/sitemap_generator/builder/sitemap_index_file.rb
68
+ - lib/sitemap_generator/interpreter.rb
69
+ - lib/sitemap_generator/link.rb
70
+ - lib/sitemap_generator/link_set.rb
71
+ - lib/sitemap_generator/mapper.rb
72
+ - lib/sitemap_generator/railtie.rb
73
+ - lib/sitemap_generator/tasks.rb
74
+ - lib/sitemap_generator/templates.rb
75
+ - lib/sitemap_generator/utilities.rb
76
+ - rails/install.rb
77
+ - rails/uninstall.rb
78
+ - tasks/sitemap_generator_tasks.rake
79
+ - templates/sitemap.rb
80
+ has_rdoc: true
81
+ homepage: http://github.com/airblade/sitemap_generator
82
+ licenses: []
83
+
84
+ post_install_message:
85
+ rdoc_options:
86
+ - --charset=UTF-8
87
+ require_paths:
88
+ - lib
89
+ required_ruby_version: !ruby/object:Gem::Requirement
90
+ none: false
91
+ requirements:
92
+ - - ">="
93
+ - !ruby/object:Gem::Version
94
+ hash: 3
95
+ segments:
96
+ - 0
97
+ version: "0"
98
+ required_rubygems_version: !ruby/object:Gem::Requirement
99
+ none: false
100
+ requirements:
101
+ - - ">="
102
+ - !ruby/object:Gem::Version
103
+ hash: 3
104
+ segments:
105
+ - 0
106
+ version: "0"
107
+ requirements: []
108
+
109
+ rubyforge_project:
110
+ rubygems_version: 1.3.7
111
+ signing_key:
112
+ specification_version: 3
113
+ summary: Easily generate enterprise class Sitemaps for your Rails site using a familiar Rails Routes-like DSL
114
+ test_files: []
115
+