apsoto-sitemap_generator 1.0.1.dev
- data/MIT-LICENSE +20 -0
- data/README.md +227 -0
- data/Rakefile +123 -0
- data/VERSION +1 -0
- data/lib/sitemap_generator/builder/helper.rb +10 -0
- data/lib/sitemap_generator/builder/sitemap_file.rb +155 -0
- data/lib/sitemap_generator/builder/sitemap_index_file.rb +33 -0
- data/lib/sitemap_generator/builder.rb +9 -0
- data/lib/sitemap_generator/interpreter.rb +28 -0
- data/lib/sitemap_generator/link.rb +37 -0
- data/lib/sitemap_generator/link_set.rb +174 -0
- data/lib/sitemap_generator/mapper.rb +16 -0
- data/lib/sitemap_generator/railtie.rb +7 -0
- data/lib/sitemap_generator/tasks.rb +1 -0
- data/lib/sitemap_generator/templates.rb +41 -0
- data/lib/sitemap_generator/utilities.rb +36 -0
- data/lib/sitemap_generator.rb +28 -0
- data/rails/install.rb +2 -0
- data/rails/uninstall.rb +2 -0
- data/tasks/sitemap_generator_tasks.rake +39 -0
- data/templates/sitemap.rb +42 -0
- metadata +133 -0
data/MIT-LICENSE
ADDED
@@ -0,0 +1,20 @@
Copyright (c) 2009 [name of plugin creator]

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,227 @@
SitemapGenerator
================

SitemapGenerator is a Rails gem that makes it easy to generate ['enterprise-class'][enterprise_class] Sitemaps readable by all search engines. Generated Sitemaps adhere to the ['Sitemap protocol specification'][sitemap_protocol]. When you generate new Sitemaps, SitemapGenerator can automatically ping the major search engines (including Google, Yahoo and Bing) to notify them. SitemapGenerator includes rake tasks to easily manage your sitemaps.

Features
-------

- v0.2.6: ['Google Image Sitemap'][sitemap_images] support
- v0.2.5: Rails 3 support (beta)

- Adheres to the ['Sitemap protocol specification'][sitemap_protocol]
- Handles millions of links
- Automatic Gzip of Sitemap files
- Automatic ping of search engines to notify them of new sitemaps: Google, Yahoo, Bing, Ask, SitemapWriter
- Leaves your old sitemaps in place if a new one fails to generate
- Allows you to set the hostname for the links in your Sitemap

Foreword
-------

Unfortunately, Adam Salter passed away in 2009. Those who knew him know what an amazing guy he was, and what an excellent Rails programmer he was. His passing is a great loss to the Rails community.

[Karl Varga](http://github.com/kjvarga) has taken over development of SitemapGenerator. The canonical repository is [http://github.com/kjvarga/sitemap_generator][canonical_repo].

Installation
=======

**Rails 3:**

1. Add the gem to your <tt>Gemfile</tt>

    <code>gem 'sitemap_generator'</code>

2. `$ rake sitemap:install`

**Rails 2.x: As a gem**

1. Add the gem as a dependency in your <tt>config/environment.rb</tt>

    <code>config.gem 'sitemap_generator', :lib => false</code>

2. `$ rake gems:install`

3. Add the following to your <tt>RAILS_ROOT/Rakefile</tt>

    <pre>begin
      require 'sitemap_generator/tasks'
    rescue Exception => e
      puts "Warning, couldn't load gem tasks: #{e.message}! Skipping..."
    end</pre>

4. `$ rake sitemap:install`

**Rails 2.x: As a plugin**

1. <code>$ ./script/plugin install git://github.com/kjvarga/sitemap_generator.git</code>

----

Installation creates a <tt>config/sitemap.rb</tt> file which will contain your logic for generating the Sitemap files. If you want to create this file manually, run <code>rake sitemap:install</code>.

You can run <code>rake sitemap:refresh</code> as needed to create Sitemap files. This will also ping these ['major search engines'][sitemap_engines]: Google, Yahoo, Bing, Ask, SitemapWriter. If you want to disable all non-essential output, run the rake task with <code>rake -s sitemap:refresh</code>.

To keep your Sitemaps up-to-date, set up a cron job. Pass the <tt>-s</tt> option to the rake task to silence all but the most important output. If you're using Whenever, then your schedule would look something like:

    # config/schedule.rb
    every 1.day, :at => '5:00 am' do
      rake "-s sitemap:refresh"
    end

Optionally, you can add the following to your <code>public/robots.txt</code> file, so that robots can find the sitemap file:

    Sitemap: <hostname>/sitemap_index.xml.gz

The Sitemap URL in the robots file should be the complete URL to the Sitemap Index, such as <tt>http://www.example.org/sitemap_index.xml.gz</tt>.

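As a small illustration of the robots.txt note above, the sketch below builds the complete `Sitemap:` line from a host name. The helper name `robots_sitemap_line` is hypothetical and not part of SitemapGenerator; the only assumption taken from the README is that the index file is called `sitemap_index.xml.gz`.

```ruby
require 'uri'

# Hypothetical helper (not part of the gem): build the robots.txt directive
# that points crawlers at the Sitemap Index for a given host.
def robots_sitemap_line(host, index_file = 'sitemap_index.xml.gz')
  "Sitemap: #{URI.join(host, index_file)}"
end

puts robots_sitemap_line('http://www.example.org')
```

The point of joining against the host is that the directive must be an absolute URL, not a path relative to <tt>public/</tt>.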
Example 'config/sitemap.rb'
==========

    # Set the host name for URL creation
    SitemapGenerator::Sitemap.default_host = "http://www.example.com"

    SitemapGenerator::Sitemap.add_links do |sitemap|
      # Put links creation logic here.
      #
      # The Root Path ('/') and Sitemap Index file are added automatically.
      # Links are added to the Sitemap output in the order they are specified.
      #
      # Usage: sitemap.add path, options
      #        (default options are used if you don't specify them)
      #
      # Defaults: :priority => 0.5, :changefreq => 'weekly',
      #           :lastmod => Time.now, :host => default_host


      # Examples:

      # add '/articles'
      sitemap.add articles_path, :priority => 0.7, :changefreq => 'daily'

      # add all individual articles
      Article.find(:all).each do |a|
        sitemap.add article_path(a), :lastmod => a.updated_at
      end

      # add merchant path
      sitemap.add '/purchase', :priority => 0.7, :host => "https://www.example.com"

      # add all individual news with images
      # (each image is a Hash, so the block must return one wrapped in braces)
      News.all.each do |n|
        sitemap.add news_path(n), :lastmod => n.updated_at, :images => n.images.collect { |r| { :loc => r.image.url, :title => r.image.name } }
      end

    end

    # Including Sitemaps from Rails Engines.
    #
    # These Sitemaps should be almost identical to a regular Sitemap file except
    # they needn't define their own SitemapGenerator::Sitemap.default_host since
    # they will undoubtedly share the host name of the application they belong to.
    #
    # As an example, say we have a Rails Engine in vendor/plugins/cadability_client
    # We can include its Sitemap here as follows:
    #
    file = File.join(Rails.root, 'vendor/plugins/cadability_client/config/sitemap.rb')
    eval(open(file).read, binding, file)

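The defaults listed in the comments above can be pictured as a plain Hash merge in which caller-supplied options win. The sketch below is a stand-in that runs without Rails; the method name `link_options` and the example host are assumptions for illustration, not the gem's API.

```ruby
# Hypothetical stand-in for the option handling described in the README:
# start from the documented defaults and let explicit options override them.
def link_options(options = {}, default_host = 'http://www.example.com')
  defaults = {
    :priority   => 0.5,
    :changefreq => 'weekly',
    :lastmod    => Time.now,
    :host       => default_host
  }
  defaults.merge(options) # caller-supplied values take precedence
end

opts = link_options(:priority => 0.7, :changefreq => 'daily')
puts opts[:priority]   # explicit value
puts opts[:host]       # falls back to the default host
```

Only the keys you pass are overridden, which is why the '/purchase' example can change <tt>:host</tt> alone and keep the other defaults.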
Raison d'être
-------

Most of the Sitemap plugins out there seem to try to recreate the Sitemap links by iterating the Rails routes. In some cases this is possible, but in a great many cases it isn't.

a) There are probably quite a few routes in your routes file that don't need inclusion in the Sitemap. (AJAX routes I'm looking at you.)

and

b) How would you infer the correct series of links for the following route?

    map.zipcode 'location/:state/:city/:zipcode', :controller => 'zipcode', :action => 'index'

Don't tell me it's trivial, because it isn't. It just looks trivial.

So my idea is to have another file similar to 'routes.rb' called 'sitemap.rb', where you can define what goes into the Sitemap.

Here's my solution:

    Zipcode.find(:all, :include => :city).each do |z|
      sitemap.add zipcode_path(:state => z.city.state, :city => z.city, :zipcode => z)
    end

Easy hey?

Other Sitemap settings for the link, like `lastmod`, `priority`, `changefreq` and `host` are entered automatically, although you can override them if you need to.

Compatibility
=======

Tested and working on:

- **Rails** 3.0.0, sitemap_generator version >= 0.2.5
- **Rails** 1.x - 2.3.5
- **Ruby** 1.8.6, 1.8.7, 1.9.1

Notes
=======

1) For large sitemaps it may be useful to split your generation into batches to avoid running out of memory. E.g.:

        # add movies
        Movie.find_in_batches(:batch_size => 1000) do |movies|
          movies.each do |movie|
            sitemap.add "/movies/show/#{movie.to_param}", :lastmod => movie.updated_at, :changefreq => 'weekly'
          end
        end

2) New Capistrano deploys will remove your Sitemap files, unless you run `rake sitemap:refresh`. The way around this is to create a cap task:

        after "deploy:update_code", "deploy:copy_old_sitemap"

        namespace :deploy do
          task :copy_old_sitemap do
            run "if [ -e #{previous_release}/public/sitemap_index.xml.gz ]; then cp #{previous_release}/public/sitemap* #{current_release}/public/; fi"
          end
        end

3) If generation of your sitemap fails for some reason, the old sitemap will remain in public/. This ensures that robots will always find a valid sitemap. If you run silently (`rake -s sitemap:refresh`) with email forwarding set up, you'll only get an email if your sitemap fails to build, and no notification when everything is fine - which will be most of the time.

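To make the batching idea in note 1 concrete without a database, here is a rough sketch that uses `each_slice` on a plain Array as a stand-in for `find_in_batches`. The record ids and the `add` lambda are hypothetical; in a real <tt>config/sitemap.rb</tt> each path would be handed to `sitemap.add` instead.

```ruby
# Stand-in data: 2,500 record ids instead of ActiveRecord models.
records = (1..2_500).to_a

batch_sizes = []
links_added = 0

# Stand-in for sitemap.add; it only counts the links it receives.
add = lambda { |_path| links_added += 1 }

# Process 1,000 records at a time, as find_in_batches would, so that only
# one batch needs to be held in memory at once.
records.each_slice(1_000) do |batch|
  batch_sizes << batch.size
  batch.each { |id| add.call("/movies/show/#{id}") }
end

puts batch_sizes.inspect
puts links_added
```

The last batch is simply smaller than the others; nothing special is needed to handle the remainder.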
Known Bugs
========

- There's no check on the size of a URL which [isn't supposed to exceed 2,048 bytes][sitemaps_xml].
- Currently only supports one Sitemap Index file, which can contain 50,000 Sitemap files which can each contain 50,000 urls, so it _only_ supports up to 2,500,000,000 (2.5 billion) urls. I personally have no need of support for more urls, but the plugin could be improved to support this.

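The two limits mentioned above are easy to express in a few lines. This is only a sketch of what a URL size check might look like, not code from the gem; the constant and method names are hypothetical, and treating exactly 2,048 bytes as acceptable is an assumption based on the wording "isn't supposed to exceed".

```ruby
# Hypothetical constants mirroring the limits quoted in the README.
MAX_URL_BYTES          = 2_048
MAX_LINKS_PER_SITEMAP  = 50_000
MAX_SITEMAPS_PER_INDEX = 50_000

# True when the URL is within the byte limit (bytes, not characters,
# so multi-byte UTF-8 characters count more than once).
def url_within_limit?(url)
  url.bytesize <= MAX_URL_BYTES
end

# Upper bound on links reachable from a single Sitemap Index file.
capacity = MAX_LINKS_PER_SITEMAP * MAX_SITEMAPS_PER_INDEX

puts url_within_limit?('http://www.example.com/articles')
puts capacity
```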
Wishlist & Coming Soon
========

- Support for generating sitemaps for sites with multiple domains. Sitemaps are generated into subdirectories and we use a Rack middleware to rewrite requests for sitemaps to the correct subdirectory based on the request host.
- I want to refactor the code because it has grown a lot. Part of this refactoring will include implementing some more checks to make sure we adhere to standards as well as making sure that the sitemaps are being generated as efficiently as possible.

I'd like to simplify adding links to a sitemap. Right now it's all or nothing. I'd like to break it up so you can add batches.
- Auto coverage testing. Generate a report of broken URLs by checking the status codes of each page in the sitemap.

Thanks (in no particular order)
========

- [Alexandre Bini](http://github.com/alexandrebini) for image sitemaps
- [Dan Pickett](http://github.com/dpickett)
- [Rob Biedenharn](http://github.com/rab)
- [Richie Vos](http://github.com/jerryvos)
- [Adrian Mugnolo](http://github.com/xymbol)
- [Jason Weathered](http://github.com/jasoncodes)
- [Andy Stewart](http://github.com/airblade)

Copyright (c) 2009 Karl Varga released under the MIT license

[canonical_repo]:http://github.com/kjvarga/sitemap_generator
[enterprise_class]:https://twitter.com/dhh/status/1631034662 "I use enterprise in the same sense the Phusion guys do - i.e. Enterprise Ruby. Please don't look down on my use of the word 'enterprise' to represent being a cut above. It doesn't mean you ever have to work for a company the size of IBM. Or constantly fight inertia, writing crappy software, adhering to change management practices and spending hours in meetings... Not that there's anything wrong with that - Wait, what?"
[sitemap_engines]:http://en.wikipedia.org/wiki/Sitemap_index "http://en.wikipedia.org/wiki/Sitemap_index"
[sitemaps_org]:http://www.sitemaps.org/protocol.php "http://www.sitemaps.org/protocol.php"
[sitemaps_xml]:http://www.sitemaps.org/protocol.php#xmlTagDefinitions "XML Tag Definitions"
[sitemap_generator_usage]:http://wiki.github.com/adamsalter/sitemap_generator/sitemapgenerator-usage "http://wiki.github.com/adamsalter/sitemap_generator/sitemapgenerator-usage"
[boost_juice]:http://www.boostjuice.com.au/ "Mmmm, sweet, sweet Boost Juice."
[cb]:http://codebright.net "http://codebright.net"
[sitemap_images]:http://www.google.com/support/webmasters/bin/answer.py?answer=178636
[sitemap_protocol]:http://sitemaps.org/protocol.php
data/Rakefile
ADDED
@@ -0,0 +1,123 @@
require 'rake'
require 'rake/rdoctask'
require 'rubygems'
gem 'rspec', '1.3.0'
require 'spec/rake/spectask'
gem 'nokogiri'

begin
  require 'jeweler'
  Jeweler::Tasks.new do |gem|
    gem.name = "apsoto-sitemap_generator"
    gem.summary = %Q{Easily generate enterprise class Sitemaps for your Rails site using a familiar Rails Routes-like DSL}
    gem.description = %Q{SitemapGenerator is a Rails gem that makes it easy to generate enterprise-class Sitemaps readable by all search engines. Generated Sitemaps adhere to the Sitemap protocol specification. When you generate new Sitemaps, SitemapGenerator can automatically ping the major search engines (including Google, Yahoo and Bing) to notify them. SitemapGenerator includes rake tasks to easily manage your sitemaps.}
    gem.email = "apsoto@gmail.com"
    gem.homepage = "http://github.com/apsoto/sitemap_generator"
    gem.authors = ["Alex Soto", "Karl Varga", "Adam Salter"]
    gem.files = FileList["[A-Z]*", "{bin,lib,rails,templates,tasks}/**/*"]
    gem.test_files = []
    gem.add_development_dependency "rspec"
    gem.add_development_dependency "nokogiri"
    gem.add_development_dependency "sqlite3-ruby"
  end
  Jeweler::GemcutterTasks.new
rescue LoadError
  puts "Jeweler (or a dependency) not available. Install it with: gem install jeweler"
end

#
# Helper methods
#
module Helpers
  extend self

  # Return a full local path to path fragment <tt>path</tt>
  def local_path(path)
    File.join(File.dirname(__FILE__), path)
  end

  # Copy all of the local files into <tt>path</tt> after completely cleaning it
  def prepare_path(path)
    rm_rf path
    mkdir_p path
    cp_r(FileList["[A-Z]*", "{bin,lib,rails,templates,tasks}"], path)
  end
end

#
# Tasks
#
task :default => :test

namespace :test do
  #desc "Test as a gem, plugin and Rails 3 gem"
  #task :all => ['test:gem', 'test:plugin']

  task :gem => ['test:prepare:gem', 'multi_spec']
  task :plugin => ['test:prepare:plugin', 'multi_spec']
  task :rails3 => ['test:prepare:rails3', 'multi_spec']

  task :multi_spec do
    Rake::Task['spec'].invoke
    Rake::Task['spec'].reenable
  end

  namespace :prepare do
    task :gem do
      ENV["SITEMAP_RAILS"] = 'gem'
      Helpers.prepare_path(Helpers.local_path('spec/mock_app_gem/vendor/gems/sitemap_generator-1.2.3'))
      # rm_rf does not expand wildcards itself, so expand the pattern with Dir[] first
      rm_rf(Dir[Helpers.local_path('spec/mock_app_gem/public/sitemap*')])
    end

    task :plugin do
      ENV["SITEMAP_RAILS"] = 'plugin'
      Helpers.prepare_path(Helpers.local_path('spec/mock_app_plugin/vendor/plugins/sitemap_generator-1.2.3'))
      rm_rf(Dir[Helpers.local_path('spec/mock_app_plugin/public/sitemap*')])
    end

    task :rails3 do
      ENV["SITEMAP_RAILS"] = 'rails3'
      rm_rf(Dir[Helpers.local_path('spec/mock_rails3_gem/public/sitemap*')])
    end
  end
end

desc "Run tests as a gem install"
task :test => ['test:gem']

Spec::Rake::SpecTask.new(:spec) do |spec|
  spec.libs << 'lib' << 'spec'
  spec.spec_files = FileList['spec/**/*_spec.rb']
end
task :spec => :check_dependencies

Spec::Rake::SpecTask.new(:rcov) do |spec|
  spec.libs << 'lib' << 'spec'
  spec.pattern = 'spec/**/*_spec.rb'
  spec.rcov = true
end

desc 'Generate documentation'
Rake::RDocTask.new(:rdoc) do |rdoc|
  rdoc.rdoc_dir = 'rdoc'
  rdoc.title = 'SitemapGenerator'
  rdoc.options << '--line-numbers' << '--inline-source'
  rdoc.rdoc_files.include('README.md')
  rdoc.rdoc_files.include('lib/**/*.rb')
end

namespace :release do

  desc "Release a new patch version"
  task :patch do
    Rake::Task['version:bump:patch'].invoke
    Rake::Task['release:current'].invoke
  end

  desc "Release the current version (e.g. after a version bump). This rebuilds the gemspec, pushes the updated code, tags it and releases to RubyGems"
  task :current do
    Rake::Task['github:release'].invoke
    Rake::Task['git:release'].invoke
    Rake::Task['gemcutter:release'].invoke
  end
end
data/VERSION
ADDED
@@ -0,0 +1 @@
1.0.1.dev
data/lib/sitemap_generator/builder/sitemap_file.rb
ADDED
@@ -0,0 +1,155 @@
require 'sitemap_generator/builder/helper'
require 'builder'
require 'zlib'
require 'uri' # needed for URI.join in full_url

module SitemapGenerator
  module Builder
    class SitemapFile
      include SitemapGenerator::Builder::Helper

      attr_accessor :sitemap_path, :public_path, :filesize, :link_count, :hostname

      # <tt>public_path</tt> full path of the directory to write sitemaps in.
      #   Usually your Rails <tt>public/</tt> directory.
      #
      # <tt>sitemap_path</tt> relative path including filename of the sitemap
      #   file relative to <tt>public_path</tt>
      #
      # <tt>hostname</tt> hostname including protocol to use in all links
      #   e.g. http://en.google.ca
      def initialize(public_path, sitemap_path, hostname)
        self.sitemap_path = sitemap_path
        self.public_path = public_path
        self.hostname = hostname
        self.link_count = 0

        @xml_content = '' # XML urlset content
        @xml_wrapper_start = <<-HTML
          <?xml version="1.0" encoding="UTF-8"?>
            <urlset
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
              xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
                http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
              xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
              xmlns:video="http://www.google.com/schemas/sitemap-video/1.1"
            >
        HTML
        # Collapse whitespace. Non-bang methods avoid the nil that gsub!/strip!
        # return when nothing changes.
        @xml_wrapper_start = @xml_wrapper_start.gsub(/\s+/, ' ').gsub(/ *> */, '>').strip
        @xml_wrapper_end = %q[</urlset>]
        self.filesize = bytesize(@xml_wrapper_start) + bytesize(@xml_wrapper_end)
      end

      def lastmod
        File.mtime(self.full_path) rescue nil
      end

      def empty?
        self.link_count == 0
      end

      def full_url
        URI.join(self.hostname, self.sitemap_path).to_s
      end

      def full_path
        @full_path ||= File.join(self.public_path, self.sitemap_path)
      end

      # Return a boolean indicating whether the sitemap file can fit another link
      # of <tt>bytes</tt> bytes in size.
      def file_can_fit?(bytes)
        (self.filesize + bytes) < SitemapGenerator::MAX_SITEMAP_FILESIZE && self.link_count < SitemapGenerator::MAX_SITEMAP_LINKS
      end

      # Add a link to the sitemap file and return a boolean indicating whether the
      # link was added.
      #
      # If a link cannot be added, the file is too large or the link limit has been reached.
      def add_link(link)
        xml = build_xml(::Builder::XmlMarkup.new, link)
        unless file_can_fit?(bytesize(xml))
          self.finalize!
          return false
        end

        @xml_content << xml
        self.filesize += bytesize(xml)
        self.link_count += 1
        true
      end
      alias_method :<<, :add_link

      # Return XML as a String
      def build_xml(builder, link)
        builder.url do
          builder.loc link[:loc]
          builder.lastmod w3c_date(link[:lastmod]) if link[:lastmod]
          builder.changefreq link[:changefreq] if link[:changefreq]
          builder.priority link[:priority] if link[:priority]

          unless link[:images].blank?
            link[:images].each do |image|
              builder.image :image do
                builder.image :loc, image[:loc]
                builder.image :caption, image[:caption] if image[:caption]
                builder.image :geo_location, image[:geo_location] if image[:geo_location]
                builder.image :title, image[:title] if image[:title]
                builder.image :license, image[:license] if image[:license]
              end
            end
          end

          unless link[:video].blank?
            video = link[:video]
            builder.video :video do
              # required elements
              builder.video :thumbnail_loc, video[:thumbnail_loc]
              builder.video :title, video[:title]
              builder.video :description, video[:description]

              builder.video :content_loc, video[:content_loc] if video[:content_loc]
              if video[:player_loc]
                builder.video :player_loc, video[:player_loc], :allow_embed => (video[:allow_embed] ? 'yes' : 'no'), :autoplay => video[:autoplay]
              end

              builder.video :rating, video[:rating] if video[:rating]
              builder.video :view_count, video[:view_count] if video[:view_count]
              builder.video :publication_date, video[:publication_date] if video[:publication_date]
              builder.video :expiration_date, video[:expiration_date] if video[:expiration_date]
              # duration is emitted once (the original emitted it twice)
              builder.video :duration, video[:duration] if video[:duration]
              # test for the key so that an explicit false is written as 'no'
              builder.video :family_friendly, (video[:family_friendly] ? 'yes' : 'no') if video.has_key?(:family_friendly)
              video[:tags].each {|tag| builder.video :tag, tag } if video[:tags]
              video[:categories].each {|category| builder.video :category, category} if video[:categories]
            end
          end
        end
        builder << ''
      end

      # Insert the content into the XML "wrapper" and write and close the file.
      #
      # All the xml content in the instance is cleared, but attributes like
      # <tt>filesize</tt> are still available.
      def finalize!
        return if self.frozen?

        open(self.full_path, 'wb') do |file|
          gz = Zlib::GzipWriter.new(file)
          gz.write @xml_wrapper_start
          gz.write @xml_content
          gz.write @xml_wrapper_end
          gz.close
        end
        @xml_content = @xml_wrapper_start = @xml_wrapper_end = ''
        self.freeze
      end

      # Return the bytesize length of the string
      def bytesize(string)
        string.respond_to?(:bytesize) ? string.bytesize : string.length
      end
    end
  end
end
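The interplay between `file_can_fit?` and `add_link` is easier to see in isolation. The sketch below is a simplified stand-in, not the gem's code: `TinySitemap`, `MAX_LINKS` and `MAX_BYTES` are hypothetical names, and the limits are deliberately tiny so the rollover to a second file is visible. The caller-side loop mirrors the pattern of retrying the same link on a fresh file when `<<` returns false.

```ruby
MAX_LINKS = 3      # tiny stand-in for SitemapGenerator::MAX_SITEMAP_LINKS
MAX_BYTES = 1_000  # tiny stand-in for SitemapGenerator::MAX_SITEMAP_FILESIZE

# Hypothetical, in-memory stand-in for a sitemap file.
class TinySitemap
  attr_reader :links, :bytes

  def initialize
    @links = []
    @bytes = 0
  end

  # A link fits only if both the byte budget and the link budget allow it.
  def can_fit?(xml)
    (@bytes + xml.bytesize) < MAX_BYTES && @links.size < MAX_LINKS
  end

  # Returns true when the link was stored, false when the file is full.
  def <<(xml)
    return false unless can_fit?(xml)
    @links << xml
    @bytes += xml.bytesize
    true
  end
end

files = [TinySitemap.new]
%w[/a /b /c /d /e].each do |path|
  xml = "<url><loc>http://www.example.com#{path}</loc></url>"
  unless files.last << xml
    files << TinySitemap.new # current file is full: start a new one
    files.last << xml        # and retry the same link there
  end
end

puts files.map { |f| f.links.size }.inspect
```

Returning a boolean from `<<` keeps the file object unaware of its siblings; whoever owns the collection of files decides what happens when one fills up.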
data/lib/sitemap_generator/builder/sitemap_index_file.rb
ADDED
@@ -0,0 +1,33 @@
module SitemapGenerator
  module Builder
    class SitemapIndexFile < SitemapFile

      def initialize(*args)
        super(*args)

        @xml_content = '' # XML sitemapindex content
        @xml_wrapper_start = <<-HTML
          <?xml version="1.0" encoding="UTF-8"?>
            <sitemapindex
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
                http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd"
              xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
            >
        HTML
        # Collapse whitespace. Non-bang methods avoid the nil that gsub!/strip!
        # return when nothing changes.
        @xml_wrapper_start = @xml_wrapper_start.gsub(/\s+/, ' ').gsub(/ *> */, '>').strip
        @xml_wrapper_end = %q[</sitemapindex>]
        self.filesize = bytesize(@xml_wrapper_start) + bytesize(@xml_wrapper_end)
      end

      # Return XML as a String
      def build_xml(builder, link)
        builder.sitemap do
          builder.loc link[:loc]
          builder.lastmod w3c_date(link[:lastmod]) if link[:lastmod]
        end
        builder << ''
      end
    end
  end
end
data/lib/sitemap_generator/interpreter.rb
ADDED
@@ -0,0 +1,28 @@
module SitemapGenerator

  # Evaluate a sitemap config file within the context of a class that includes the
  # Rails URL helpers.
  class Interpreter

    if SitemapGenerator::Utilities.rails3?
      include ::Rails.application.routes.url_helpers
    else
      require 'action_controller'
      include ActionController::UrlWriter
    end

    def initialize(sitemap_config_file=nil)
      sitemap_config_file ||= File.join(::Rails.root, 'config/sitemap.rb')
      eval(open(sitemap_config_file).read)
    end

    # KJV do we need this?  We should be using path_* helpers.
    # def self.default_url_options(options = nil)
    #   { :host => SitemapGenerator::Sitemap.default_host }
    # end

    def self.run
      new
    end
  end
end
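The reason the config file can call route helpers such as `articles_path` is that `eval` inside `initialize` runs the file's code with `self` set to the interpreter instance. A minimal sketch of that mechanism, using a hypothetical `TinyInterpreter` class with a hand-written helper in place of the Rails URL helpers:

```ruby
# Hypothetical stand-in for the Interpreter: config code is eval'ed inside
# initialize, so bare method calls in the config resolve against this instance.
class TinyInterpreter
  attr_reader :added

  def initialize(config_source)
    @added = []
    eval(config_source) # self is this instance while the config runs
  end

  # Stand-in for a Rails-generated URL helper.
  def articles_path
    '/articles'
  end

  # Stand-in for sitemap.add.
  def add(path)
    @added << path
  end
end

interpreter = TinyInterpreter.new("add articles_path")
puts interpreter.added.inspect
```

Because the config is plain Ruby evaluated in-process, it should only ever come from a trusted file in your own application.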
data/lib/sitemap_generator/link.rb
ADDED
@@ -0,0 +1,37 @@
module SitemapGenerator
  module Link
    extend self

    # Return a Hash of options suitable to pass to a SitemapGenerator::Builder::SitemapFile instance.
    def generate(path, options = {})
      if path.is_a?(SitemapGenerator::Builder::SitemapFile)
        options.reverse_merge!(:host => path.hostname, :lastmod => path.lastmod)
        path = path.sitemap_path
      end

      options.assert_valid_keys(:priority, :changefreq, :lastmod, :host, :images, :video)
      options.reverse_merge!(:priority => 0.5, :changefreq => 'weekly', :lastmod => Time.now, :host => Sitemap.default_host, :images => [])
      {
        :path => path,
        :priority => options[:priority],
        :changefreq => options[:changefreq],
        :lastmod => options[:lastmod],
        :host => options[:host],
        :loc => URI.join(options[:host], path).to_s,
        :images => prepare_images(options[:images], options[:host]),
        :video => options[:video]
      }
    end

    # Return an Array of image option Hashes suitable to be parsed by SitemapGenerator::Builder::SitemapFile
    def prepare_images(images, host)
      # Drop images without a :loc; each element is a Hash of image options.
      images.delete_if { |image| image[:loc].nil? }
      images.each do |r|
        r.assert_valid_keys(:loc, :caption, :geo_location, :title, :license)
        r[:loc] = URI.join(host, r[:loc]).to_s
      end
      images[0..(SitemapGenerator::MAX_SITEMAP_IMAGES-1)]
    end
  end
end

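Both `:loc` values above are produced with `URI.join`, so it may help to see how the standard library resolves a path or an absolute URL against a host. The host names below are illustrative only.

```ruby
require 'uri'

host = 'http://www.example.com'

# A path is resolved against the host.
page_loc = URI.join(host, '/articles').to_s

# A different :host (e.g. an https one) simply changes the base.
secure_loc = URI.join('https://www.example.com', '/purchase').to_s

# An image :loc that is already absolute is left pointing at its own host.
image_loc = URI.join(host, 'http://cdn.example.com/a.png').to_s

puts page_loc
puts secure_loc
puts image_loc
```

This is why image URLs served from another host, such as a CDN, can be passed through `prepare_images` unchanged.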
data/lib/sitemap_generator/link_set.rb
ADDED
@@ -0,0 +1,174 @@
|
|
1
|
+
require 'builder'
|
2
|
+
require 'action_view'
|
3
|
+
|
4
|
+
# A LinkSet provisions a bunch of links to sitemap files. It also writes the index file
|
5
|
+
# which lists all the sitemap files written.
|
6
|
+
module SitemapGenerator
|
7
|
+
class LinkSet
|
8
|
+
include ActionView::Helpers::NumberHelper # for number_with_delimiter
|
9
|
+
|
10
|
+
attr_accessor :default_host, :public_path, :sitemaps_path
|
11
|
+
attr_accessor :sitemap, :sitemaps, :sitemap_index
|
12
|
+
attr_accessor :verbose, :yahoo_app_id
|
13
|
+
|
14
|
+
# Evaluate the sitemap config file and write all sitemaps.
|
15
|
+
#
|
16
|
+
# This should be refactored so that we can have multiple instances
|
17
|
+
# of LinkSet.
|
18
|
+
def create
|
19
|
+
require 'sitemap_generator/interpreter'
|
20
|
+
|
21
|
+
self.public_path = File.join(::Rails.root, 'public/') if self.public_path.nil?
|
22
|
+
|
23
|
+
start_time = Time.now
|
24
|
+
SitemapGenerator::Interpreter.run
+      finalize!
+      end_time = Time.now
+
+      puts "\nSitemap stats: #{number_with_delimiter(self.link_count)} links / #{self.sitemaps.size} files / " + ("%dm%02ds" % (end_time - start_time).divmod(60)) if verbose
+    end
+
+    # <tt>public_path</tt> (optional) full path to the directory to write sitemaps in.
+    #   Defaults to your Rails <tt>public/</tt> directory.
+    #
+    # <tt>sitemaps_path</tt> (optional) path fragment within public to write sitemaps
+    #   to e.g. 'en/'.  Sitemaps are written to <tt>public_path</tt> + <tt>sitemaps_path</tt>
+    #
+    # <tt>default_host</tt> hostname including protocol to use in all sitemap links
+    #   e.g. http://en.google.ca
+    def initialize(public_path = nil, sitemaps_path = nil, default_host = nil)
+      self.default_host = default_host
+      self.public_path = public_path
+      self.sitemaps_path = sitemaps_path
+
+      # Completed sitemaps
+      self.sitemaps = []
+    end
+
+    def link_count
+      self.sitemaps.inject(0) { |link_count_sum, sitemap| link_count_sum + sitemap.link_count }
+    end
+
+    # Called within the user's eval'ed sitemap config file.  Add links to sitemap files
+    # passing a block.
+    #
+    # TODO: Refactor.  The call chain is confusing and convoluted here.
+    def add_links
+      raise ArgumentError, "Default hostname not set" if default_host.blank?
+
+      # I'd rather have these calls in <tt>create</tt> but we have to wait
+      # for <tt>default_host</tt> to be set by the user's sitemap config
+      new_sitemap
+      add_default_links
+
+      yield Mapper.new(self)
+    end
+
+    # Called from Mapper.
+    #
+    # Add a link to the current sitemap.
+    def add_link(link)
+      unless self.sitemap << link
+        new_sitemap
+        self.sitemap << link
+      end
+    end
+
+    # Add the current sitemap to the <tt>sitemaps</tt> Array and
+    # start a new sitemap.
+    #
+    # If the current sitemap is nil or empty it is not added.
+    def new_sitemap
+      unless self.sitemap_index
+        self.sitemap_index = SitemapGenerator::Builder::SitemapIndexFile.new(public_path, sitemap_index_path, default_host)
+      end
+
+      unless self.sitemap
+        self.sitemap = SitemapGenerator::Builder::SitemapFile.new(public_path, new_sitemap_path, default_host)
+      end
+
+      # Mark the sitemap as complete and add it to the sitemap index
+      unless self.sitemap.empty?
+        self.sitemap.finalize!
+        self.sitemap_index << Link.generate(self.sitemap)
+        self.sitemaps << self.sitemap
+        show_progress(self.sitemap) if verbose
+
+        self.sitemap = SitemapGenerator::Builder::SitemapFile.new(public_path, new_sitemap_path, default_host)
+      end
+    end
+
+    # Report progress line.
+    def show_progress(sitemap)
+      uncompressed_size = number_to_human_size(sitemap.filesize)
+      compressed_size = number_to_human_size(File.size?(sitemap.full_path))
+      puts "+ #{sitemap.sitemap_path} #{sitemap.link_count} links / #{uncompressed_size} / #{compressed_size} gzipped"
+    end
+
+    # Finalize all sitemap files
+    def finalize!
+      new_sitemap
+      self.sitemap_index.finalize!
+    end
+
+    # Ping search engines.
+    #
+    # @see http://en.wikipedia.org/wiki/Sitemap_index
+    def ping_search_engines
+      require 'open-uri'
+
+      sitemap_index_url = CGI.escape(self.sitemap_index.full_url)
+      search_engines = {
+        :google => "http://www.google.com/webmasters/sitemaps/ping?sitemap=#{sitemap_index_url}",
+        :yahoo => "http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=#{sitemap_index_url}&appid=#{yahoo_app_id}",
+        :ask => "http://submissions.ask.com/ping?sitemap=#{sitemap_index_url}",
+        :bing => "http://www.bing.com/webmaster/ping.aspx?siteMap=#{sitemap_index_url}",
+        :sitemap_writer => "http://www.sitemapwriter.com/notify.php?crawler=all&url=#{sitemap_index_url}"
+      }
+
+      puts "\n" if verbose
+      search_engines.each do |engine, link|
+        next if engine == :yahoo && !self.yahoo_app_id
+        begin
+          open(link)
+          puts "Successful ping of #{engine.to_s.titleize}" if verbose
+        rescue Timeout::Error, StandardError => e
+          puts "Ping failed for #{engine.to_s.titleize}: #{e.inspect} (URL #{link})" if verbose
+        end
+      end
+
+      if !self.yahoo_app_id && verbose
+        puts "\n"
+        puts <<-END.gsub(/^\s+/, '')
+          To ping Yahoo you require a Yahoo AppID.  Add it to your config/sitemap.rb with:
+
+          SitemapGenerator::Sitemap.yahoo_app_id = "my_app_id"
+
+          For more information see http://developer.yahoo.com/search/siteexplorer/V1/updateNotification.html
+        END
+      end
+    end
+
+    protected
+
+    def add_default_links
+      self.sitemap << Link.generate('/', :lastmod => Time.now, :changefreq => 'always', :priority => 1.0)
+      self.sitemap << Link.generate(self.sitemap_index, :lastmod => Time.now, :changefreq => 'always', :priority => 1.0)
+    end
+
+    # Return the current sitemap filename with index.
+    #
+    # The index depends on the length of the <tt>sitemaps</tt> array.
+    def new_sitemap_path
+      File.join(self.sitemaps_path || '', "sitemap#{self.sitemaps.length + 1}.xml.gz")
+    end
+
+    # Return the current sitemap index filename.
+    #
+    # At the moment we only support one index file which can link to
+    # up to 50,000 sitemap files.
+    def sitemap_index_path
+      File.join(self.sitemaps_path || '', 'sitemap_index.xml.gz')
+    end
+  end
+end
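The rollover behaviour of `add_link` and `new_sitemap` above (append to the current file, and when the append is refused, archive the file and retry on a fresh one) can be sketched in isolation. This is a minimal illustration, not the gem's implementation: `TinySitemap`, `TinyLinkSet`, `roll_over` and the two-link limit are hypothetical names and values chosen for the example, and it assumes that `<<` returns a falsy value when the file is full.

```ruby
# Hypothetical stand-in for a sitemap file with a small link limit.
class TinySitemap
  attr_reader :links

  def initialize(max_links)
    @max_links = max_links
    @links = []
  end

  # Returns false (without adding) when the file is full. This is the
  # contract the rollover logic is assumed to rely on.
  def <<(link)
    return false if @links.size >= @max_links
    @links << link
    true
  end

  def empty?
    @links.empty?
  end
end

# Hypothetical, simplified link set that rolls over to a new file when full.
class TinyLinkSet
  attr_reader :sitemaps

  def initialize(max_links)
    @max_links = max_links
    @sitemaps = [] # completed sitemaps
    @current = TinySitemap.new(max_links)
  end

  def add_link(link)
    unless @current << link
      roll_over
      @current << link
    end
  end

  # Archive the current sitemap if it holds any links, then start a new one.
  def roll_over
    return if @current.empty?
    @sitemaps << @current
    @current = TinySitemap.new(@max_links)
  end

  # Flush the last, partially filled sitemap.
  def finalize!
    roll_over
  end
end

set = TinyLinkSet.new(2)
%w[/a /b /c /d /e].each { |path| set.add_link(path) }
set.finalize!
file_sizes = set.sitemaps.map { |sitemap| sitemap.links.size }
puts file_sizes.inspect
```

Skipping empty sitemaps in `roll_over` mirrors the "nil or empty is not added" rule in the comment on `new_sitemap`, so calling `finalize!` twice would not create an empty trailing file.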
@@ -0,0 +1,16 @@
+module SitemapGenerator
+  # Mapper instances are used to build links.
+  # The object passed to the add_links block in config/sitemap.rb is a Mapper instance.
+  class Mapper
+    attr_accessor :set
+
+    def initialize(set)
+      @set = set
+    end
+
+    def add(loc, options = {})
+      set.add_link Link.generate(loc, options)
+    end
+  end
+end
+
@@ -0,0 +1 @@
+load File.expand_path(File.join(File.dirname(__FILE__), '../../tasks/sitemap_generator_tasks.rake'))
@@ -0,0 +1,41 @@
+module SitemapGenerator
+  # Provide convenient access to template files.  E.g.
+  #
+  #   SitemapGenerator.templates.sitemap_sample
+  #
+  # Lazy-load and cache for efficient access.
+  # Define an accessor method for each template file.
+  class Templates
+    FILES = {
+      :sitemap_sample => 'sitemap.rb',
+    }
+
+    # Dynamically define accessors for each key defined in <tt>FILES</tt>
+    attr_accessor *FILES.keys
+    FILES.keys.each do |name|
+      eval <<-END
+        define_method(:#{name}) do
+          @#{name} ||= read_template(:#{name})
+        end
+      END
+    end
+
+    def initialize(root = SitemapGenerator.root)
+      @root = root
+    end
+
+    # Return the full path to a template.
+    #
+    # <tt>template</tt> template symbol e.g. <tt>:sitemap_sample</tt>
+    def template_path(template)
+      File.join(@root, 'templates', self.class::FILES[template])
+    end
+
+    protected
+
+    # Read the template file and return its contents.
+    def read_template(template)
+      File.read(template_path(template))
+    end
+  end
+end
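The lazy-load-and-cache accessors in `Templates` above can also be written without string `eval`, because `define_method` accepts a block. The following is a sketch under assumptions, not the gem's code: `LazyTemplates` and its `@cache` hash are hypothetical, and the temporary directory only exists so the example runs on its own.

```ruby
require 'tmpdir'

# Hypothetical re-implementation for illustration; not the gem's class.
class LazyTemplates
  FILES = { :sitemap_sample => 'sitemap.rb' }

  # One reader per template key. The file is read on first access only
  # and served from @cache afterwards.
  FILES.each_key do |name|
    define_method(name) do
      @cache[name] ||= File.read(template_path(name))
    end
  end

  def initialize(root)
    @root = root
    @cache = {}
  end

  # Full path to the template file for a given key.
  def template_path(name)
    File.join(@root, 'templates', FILES.fetch(name))
  end
end

first_read = nil
second_read = nil
Dir.mktmpdir do |root|
  Dir.mkdir(File.join(root, 'templates'))
  path = File.join(root, 'templates', 'sitemap.rb')
  File.write(path, "# sample\n")

  templates = LazyTemplates.new(root)
  first_read = templates.sitemap_sample

  # Changing the file afterwards does not affect the cached contents.
  File.write(path, "# changed\n")
  second_read = templates.sitemap_sample
end
puts first_read
```

Using `FILES.fetch` rather than `FILES[...]` is a design choice of this sketch: an unknown template key raises a `KeyError` instead of passing `nil` to `File.join`.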
@@ -0,0 +1,36 @@
+module SitemapGenerator
+  module Utilities
+    extend self
+
+    # Copy templates/sitemap.rb to config if not there yet.
+    def install_sitemap_rb(verbose=false)
+      if File.exist?(File.join(RAILS_ROOT, 'config/sitemap.rb'))
+        puts "already exists: config/sitemap.rb, file not copied" if verbose
+      else
+        FileUtils.cp(
+          SitemapGenerator.templates.template_path(:sitemap_sample),
+          File.join(RAILS_ROOT, 'config/sitemap.rb'))
+        puts "created: config/sitemap.rb" if verbose
+      end
+    end
+
+    # Remove config/sitemap.rb if it exists.
+    def uninstall_sitemap_rb
+      if File.exist?(File.join(RAILS_ROOT, 'config/sitemap.rb'))
+        FileUtils.rm(File.join(RAILS_ROOT, 'config/sitemap.rb'))
+      end
+    end
+
+    # Clean sitemap files in output directory.
+    def clean_files
+      FileUtils.rm(Dir[File.join(RAILS_ROOT, 'public/sitemap*.xml.gz')])
+    end
+
+    # Returns a boolean indicating whether this environment is Rails 3
+    #
+    # @return [Boolean]
+    def self.rails3?
+      Rails.version.to_f >= 3
+    end
+  end
+end
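The `rails3?` check above compares a version string via `to_f`, which only looks at the leading "major.minor" part. As a hedged alternative, the same kind of test can be expressed with `Gem::Version`, which compares segment by segment. `version_at_least?` is a hypothetical helper introduced for this sketch; it is not part of the gem.

```ruby
require 'rubygems' # provides Gem::Version (loaded by default on current Rubies)

# Hypothetical helper: true when version_string is at least minimum.
def version_at_least?(version_string, minimum)
  Gem::Version.new(version_string) >= Gem::Version.new(minimum)
end

checks = {
  '2.3.8'  => version_at_least?('2.3.8', '3'),
  '3.0.0'  => version_at_least?('3.0.0', '3'),
  '10.1.2' => version_at_least?('10.1.2', '3')
}
checks.each { |version, result| puts "#{version}: #{result}" }
```

One behavioural difference to keep in mind: `Gem::Version` sorts prerelease strings such as `3.0.0.beta4` below `3`, whereas the `to_f` comparison treats them as 3.0.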
@@ -0,0 +1,28 @@
+require 'sitemap_generator/builder'
+require 'sitemap_generator/mapper'
+require 'sitemap_generator/link'
+require 'sitemap_generator/link_set'
+require 'sitemap_generator/templates'
+require 'sitemap_generator/utilities'
+require 'sitemap_generator/railtie' if SitemapGenerator::Utilities.rails3?
+
+require 'active_support/core_ext/numeric'
+
+module SitemapGenerator
+  silence_warnings do
+    VERSION = File.read(File.dirname(__FILE__) + "/../VERSION").strip
+    MAX_SITEMAP_FILES = 50_000 # max sitemap links per index file
+    MAX_SITEMAP_LINKS = 50_000 # max links per sitemap
+    MAX_SITEMAP_IMAGES = 1_000 # max images per url
+    MAX_SITEMAP_FILESIZE = 10.megabytes # bytes
+
+    Sitemap = LinkSet.new
+  end
+
+  class << self
+    attr_accessor :root, :templates
+  end
+
+  self.root = File.expand_path(File.join(File.dirname(__FILE__), '../'))
+  self.templates = SitemapGenerator::Templates.new(self.root)
+end
data/rails/install.rb
ADDED
data/rails/uninstall.rb
ADDED
@@ -0,0 +1,39 @@
+environment = begin
+
+  # Try to require the library.  If we are installed as a gem, this should work.
+  # We don't need to load the environment.
+  require 'sitemap_generator'
+  []
+
+rescue LoadError
+
+  # We must be installed as a plugin.  Make sure the environment is loaded
+  # when running all rake tasks.
+  [:environment]
+
+end
+
+namespace :sitemap do
+  desc "Install a default config/sitemap.rb file"
+  task :install => environment do
+    SitemapGenerator::Utilities.install_sitemap_rb(verbose)
+  end
+
+  desc "Delete all Sitemap files in public/ directory"
+  task :clean => environment do
+    SitemapGenerator::Utilities.clean_files
+  end
+
+  desc "Create Sitemap XML files in public/ directory (rake -s for no output)"
+  task :refresh => ['sitemap:create'] do
+    SitemapGenerator::Sitemap.ping_search_engines
+  end
+
+  desc "Create Sitemap XML files (don't ping search engines)"
+  task 'refresh:no_ping' => ['sitemap:create']
+
+  task :create => [:environment] do
+    SitemapGenerator::Sitemap.verbose = verbose
+    SitemapGenerator::Sitemap.create
+  end
+end
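The rake file above picks its task prerequisites by trying to `require` the library and falling back to `[:environment]` on `LoadError`. That decision can be sketched on its own, outside rake. `task_prerequisites` is a hypothetical helper name, and the library names passed to it are only there to exercise both branches.

```ruby
# Hypothetical helper mirroring the begin/rescue block in the rake file:
# an empty prerequisite list when the library loads (gem install), or
# [:environment] when it does not (plugin install).
def task_prerequisites(library)
  require library
  []
rescue LoadError
  [:environment]
end

loaded  = task_prerequisites('set')                        # stdlib, always loadable
missing = task_prerequisites('no_such_library_for_sketch') # deliberately absent
puts loaded.inspect
puts missing.inspect
```

Returning an array in both branches keeps the call sites uniform, since `task :name => prerequisites` accepts an empty list as well as a populated one.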
@@ -0,0 +1,42 @@
+# Set the host name for URL creation
+SitemapGenerator::Sitemap.default_host = "http://www.example.com"
+
+SitemapGenerator::Sitemap.add_links do |sitemap|
+  # Put links creation logic here.
+  #
+  # The root path '/' and sitemap index file are added automatically.
+  # Links are added to the Sitemap in the order they are specified.
+  #
+  # Usage: sitemap.add path, options
+  #        (default options are used if you don't specify)
+  #
+  # Defaults: :priority => 0.5, :changefreq => 'weekly',
+  #           :lastmod => Time.now, :host => default_host
+
+
+  # Examples:
+
+  # add '/articles'
+  sitemap.add articles_path, :priority => 0.7, :changefreq => 'daily'
+
+  # add all individual articles
+  Article.find(:all).each do |a|
+    sitemap.add article_path(a), :lastmod => a.updated_at
+  end
+
+  # add merchant path
+  sitemap.add '/purchase', :priority => 0.7, :host => "https://www.example.com"
+
+end
+
+# Including Sitemaps from Rails Engines.
+#
+# These Sitemaps should be almost identical to a regular Sitemap file except
+# they needn't define their own SitemapGenerator::Sitemap.default_host since
+# they will undoubtedly share the host name of the application they belong to.
+#
+# As an example, say we have a Rails Engine in vendor/plugins/cadability_client
+# We can include its Sitemap here as follows:
+#
+# file = File.join(Rails.root, 'vendor/plugins/cadability_client/config/sitemap.rb')
+# eval(open(file).read, binding, file)
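The template above lists the default options applied when `sitemap.add` is called without them. How such defaults might combine with per-link overrides can be sketched as a plain hash merge. This is an illustration only: `link_options` and the `:loc` key are hypothetical and do not describe how `Link.generate` is actually implemented; the default values are copied from the template's comment.

```ruby
# Hypothetical helper: merge per-link options over the documented defaults
# and build the absolute URL from the effective host.
def link_options(path, options = {}, default_host = 'http://www.example.com')
  defaults = {
    :priority   => 0.5,
    :changefreq => 'weekly',
    :lastmod    => Time.now,
    :host       => default_host
  }
  merged = defaults.merge(options) # explicit options win over defaults
  merged.merge(:loc => merged[:host] + path)
end

plain    = link_options('/articles')
purchase = link_options('/purchase', :priority => 0.7, :host => 'https://www.example.com')
puts plain[:loc]
puts purchase[:loc]
```

With this shape, a link that needs a different scheme or host, like the `/purchase` entry in the template, only has to pass `:host` and inherits everything else.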
metadata
ADDED
@@ -0,0 +1,133 @@
+--- !ruby/object:Gem::Specification
+name: apsoto-sitemap_generator
+version: !ruby/object:Gem::Version
+  hash: 858800296
+  prerelease: true
+  segments:
+  - 1
+  - 0
+  - 1
+  - dev
+  version: 1.0.1.dev
+platform: ruby
+authors:
+- Alex Soto
+- Karl Varga
+- Adam Salter
+autorequire:
+bindir: bin
+cert_chain: []
+
+date: 2010-09-13 00:00:00 -07:00
+default_executable:
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: rspec
+  prerelease: false
+  requirement: &id001 !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        hash: 3
+        segments:
+        - 0
+        version: "0"
+  type: :development
+  version_requirements: *id001
+- !ruby/object:Gem::Dependency
+  name: nokogiri
+  prerelease: false
+  requirement: &id002 !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        hash: 3
+        segments:
+        - 0
+        version: "0"
+  type: :development
+  version_requirements: *id002
+- !ruby/object:Gem::Dependency
+  name: sqlite3-ruby
+  prerelease: false
+  requirement: &id003 !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        hash: 3
+        segments:
+        - 0
+        version: "0"
+  type: :development
+  version_requirements: *id003
+description: SitemapGenerator is a Rails gem that makes it easy to generate enterprise-class Sitemaps readable by all search engines. Generated Sitemaps adhere to the Sitemap protocol specification. When you generate new Sitemaps, SitemapGenerator can automatically ping the major search engines (including Google, Yahoo and Bing) to notify them. SitemapGenerator includes rake tasks to easily manage your sitemaps.
+email: apsoto@gmail.com
+executables: []
+
+extensions: []
+
+extra_rdoc_files:
+- README.md
+files:
+- MIT-LICENSE
+- README.md
+- Rakefile
+- VERSION
+- lib/sitemap_generator.rb
+- lib/sitemap_generator/builder.rb
+- lib/sitemap_generator/builder/helper.rb
+- lib/sitemap_generator/builder/sitemap_file.rb
+- lib/sitemap_generator/builder/sitemap_index_file.rb
+- lib/sitemap_generator/interpreter.rb
+- lib/sitemap_generator/link.rb
+- lib/sitemap_generator/link_set.rb
+- lib/sitemap_generator/mapper.rb
+- lib/sitemap_generator/railtie.rb
+- lib/sitemap_generator/tasks.rb
+- lib/sitemap_generator/templates.rb
+- lib/sitemap_generator/utilities.rb
+- rails/install.rb
+- rails/uninstall.rb
+- tasks/sitemap_generator_tasks.rake
+- templates/sitemap.rb
+has_rdoc: true
+homepage: http://github.com/apsoto/sitemap_generator
+licenses: []
+
+post_install_message:
+rdoc_options:
+- --charset=UTF-8
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      hash: 3
+      segments:
+      - 0
+      version: "0"
+required_rubygems_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ">"
+    - !ruby/object:Gem::Version
+      hash: 25
+      segments:
+      - 1
+      - 3
+      - 1
+      version: 1.3.1
+requirements: []
+
+rubyforge_project:
+rubygems_version: 1.3.7
+signing_key:
+specification_version: 3
+summary: Easily generate enterprise class Sitemaps for your Rails site using a familiar Rails Routes-like DSL
+test_files: []
+