ionfish-urlify 0.1.0

data/README.md ADDED
@@ -0,0 +1,69 @@
1
+ URLify
2
+ ======
3
+
4
+ A tiny library to convert diacritical marks to unaccented equivalents, for
5
+ ASCII-safe URI creation. It also includes a utility method to remove subtitles.
6
+
7
+
8
+ Installation
9
+ ------------
10
+
11
+ sudo gem install ionfish-urlify
12
+
13
+
14
+ API
15
+ ---
16
+
17
+ URLify.deaccentuate("Kurt Gödel") # => "Kurt Godel"
18
+
19
+ URLify.strip_subtitle "Begriffsschrift:
20
+ eine der arithmetischen nachgebildete
21
+ Formelsprache des reinen Denkens" # => "Begriffsschrift"
22
+
23
+ URLify.urlify "Über Sinn und Bedeutung" # => "uber_sinn_und_bedeutung"
24
+
25
+ URLify.urlify "Moses Schönfinkel", "-" # => "moses-schonfinkel"
26
+
27
+ The `URLify` module may be mixed into the `String` class to add the above class
28
+ methods--`deaccentuate`, `strip_subtitle` and `urlify`--as instance methods on
29
+ the `String` class. It is not mixed in by default, so `String` is only patched if you opt in.
30
+
31
+ class String
32
+ include URLify
33
+ end
34
+
35
+ "Grundzüge der theoretischen Logik".urlify
36
+ # => "grundzuge_der_theoretischen_logik"
37
+
38
+ Please note that `urlify` removes any character that is not `a-z`, a digit
39
+ or a separator, and that `deaccentuate` only replaces characters found in
40
+ URLify's accent library with their ASCII counterparts. If the library lacks a
41
+ particular conversion, please consider forking the project and adding it.
42
+
43
+
44
+ Licence
45
+ -------
46
+
47
+ Copyright (c) 2009, Benedict Eastaugh. All rights reserved.
48
+
49
+ Redistribution and use in source and binary forms, with or without
50
+ modification, are permitted provided that the following conditions are met:
51
+
52
+ * Redistributions of source code must retain the above copyright notice, this
53
+ list of conditions and the following disclaimer.
54
+ * Redistributions in binary form must reproduce the above copyright notice,
55
+ this list of conditions and the following disclaimer in the documentation
56
+ and/or other materials provided with the distribution.
57
+ * The name of the author may not be used to endorse or promote products
58
+ derived from this software without specific prior written permission.
59
+
60
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
61
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
62
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
63
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
64
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
65
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
66
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
67
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
68
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
69
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
data/Rakefile ADDED
@@ -0,0 +1,32 @@
1
+ require 'lib/urlify'
2
+
3
+ begin
4
+ require 'jeweler'
5
+ Jeweler::Tasks.new do |s|
6
+ s.name = "urlify"
7
+ s.summary = "Diacritic conversion"
8
+ s.email = "benedict@eastaugh.net"
9
+ s.homepage = "http://ionfish.github.com/urlify/"
10
+ s.description = "A small library for converting accented characters " +
11
+ "to their ASCII equivalents."
12
+ s.authors = ["Benedict Eastaugh"]
13
+ end
14
+ rescue LoadError
15
+ puts "Jeweler not available. Install it with: sudo gem install " +
16
+ "technicalpickles-jeweler -s http://gems.github.com"
17
+ end
18
+
19
+ task :default => :test
20
+
21
+ desc "Run the URLify test suite"
22
+ task :test do
23
+ require 'test/unit'
24
+
25
+ testdir = "test"
26
+ Dir.foreach(testdir) do |f|
27
+ path = "#{testdir}/#{f}"
28
+ if File.ftype(path) == "file" && File.basename(f).match(/_test\.rb$/)
29
+ load path
30
+ end
31
+ end
32
+ end
data/VERSION.yml ADDED
@@ -0,0 +1,4 @@
1
+ ---
2
+ :minor: 1
3
+ :patch: 0
4
+ :major: 0
data/lib/urlify.rb ADDED
@@ -0,0 +1,66 @@
1
+ # encoding: UTF-8
2
+
3
+ module URLify
4
+
5
+ URLIFY_PATH = File.expand_path(File.dirname(__FILE__)) + '/urlify/'
6
+ require URLIFY_PATH + 'accents'
7
+
8
+ # Converts an input string into a URL-safe string.
9
+ #
10
+ # * Leading and trailing whitespace is removed.
11
+ # * Diacritics are removed from all characters.
12
+ # * All letters are converted to lower case.
13
+ # * Remaining whitespace is replaced with separators.
14
+ # * Any remaining character which is not a letter, a digit or a valid
15
+ # separator is removed.
16
+ #
17
+ # Only underscores, dashes, plus signs and the empty string are allowed as
18
+ # separators, although combinations are permitted, so "_", "--", "+_-" and ""
19
+ # are all valid separators.
20
+ def self.urlify(string, separator = "_")
21
+ unless separator =~ /\A[\-\_\+]*\z/
22
+ separator = "_"
23
+ end
24
+
25
+ deaccentuate(strip_subtitle(string.strip)).
26
+ downcase.
27
+ gsub(/\s/, separator).
28
+ gsub(/[^a-z\d\_\-\+]/, "")
29
+ end
30
+
31
+ # Removes everything from a string after the first colon.
32
+ #
33
+ # Ensures that titles with really long subtitles don't convert to equally
34
+ # long permalinks.
35
+ def self.strip_subtitle(string)
36
+ string.split(/\s*\:\s*/).first || ""
37
+ end
38
+
39
+ # Removes diacritics from an input string's characters.
40
+ #
41
+ # So a lowercase 'u' with an umlaut, ü, becomes u, while an uppercase 'A'
42
+ # with an acute accent, Á, becomes A. This method is UTF-8 safe.
43
+ def self.deaccentuate(string)
44
+ (RUBY_VERSION >= "1.9.0" ? string.chars : string.split(//u)).map {|c|
45
+ ACCENTMAP[c] || c
46
+ }.join("")
47
+ end
48
+
49
+ # Instance method version of URLify.urlify, so that the library can be used
50
+ # as a mixin for the String class.
51
+ def urlify(separator = "_")
52
+ URLify.urlify(self, separator)
53
+ end
54
+
55
+ # Instance method version of URLify.strip_subtitle, so that the library can
56
+ # be used as a mixin for the String class.
57
+ def strip_subtitle
58
+ URLify.strip_subtitle(self)
59
+ end
60
+
61
+ # Instance method version of URLify.deaccentuate, so that the library can be
62
+ # used as a mixin for the String class.
63
+ def deaccentuate
64
+ URLify.deaccentuate(self)
65
+ end
66
+ end
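The steps listed in the `urlify` comment above can be traced in a small standalone sketch. This is not the library itself: `MiniURLify` and its three-entry `ACCENTS` table are hypothetical stand-ins so the example runs without the gem, and the separator check here is anchored with `\A` and `\z` so that it tests the whole string.

```ruby
# encoding: UTF-8

# Hypothetical stand-in for URLify, trimmed to a handful of mappings.
module MiniURLify
  ACCENTS = { 'Ü' => 'U', 'ö' => 'o', 'ø' => 'oe', '&' => 'and' }

  def self.urlify(string, separator = "_")
    # Fall back to "_" unless the separator is made only of "-", "_" and "+".
    separator = "_" unless separator =~ /\A[-_+]*\z/

    # Trim, then keep only the text before the first colon (the title).
    title = string.strip.split(/\s*:\s*/).first.to_s

    # Replace each character that has an entry in the table.
    plain = title.chars.map { |c| ACCENTS[c] || c }.join

    # Lower-case, turn whitespace into separators, drop everything else.
    plain.downcase.gsub(/\s/, separator).gsub(/[^a-z\d_\-+]/, "")
  end
end

puts MiniURLify.urlify("Über Sinn und Bedeutung")   # uber_sinn_und_bedeutung
puts MiniURLify.urlify("Moses Schönfinkel", "-")    # moses-schonfinkel
puts MiniURLify.urlify("Boyd: The Fighter Pilot")   # boyd
puts MiniURLify.urlify("Søren & Co.", " ")          # soeren_and_co
```

The last call passes a space as the separator, which is not in the allowed set, so the sketch falls back to the underscore.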
data/lib/urlify/accents.rb ADDED
@@ -0,0 +1,74 @@
1
+ # encoding: UTF-8
2
+
3
+ module URLify
4
+
5
+ ACCENTMAP = {
6
+ 'À' => 'A',
7
+ 'Á' => 'A',
8
+ 'Â' => 'A',
9
+ 'Ã' => 'A',
10
+ 'Ä' => 'A',
11
+ 'Å' => 'AA',
12
+ 'Æ' => 'AE',
13
+ 'Ç' => 'C',
14
+ 'È' => 'E',
15
+ 'É' => 'E',
16
+ 'Ê' => 'E',
17
+ 'Ë' => 'E',
18
+ 'Ì' => 'I',
19
+ 'Í' => 'I',
20
+ 'Î' => 'I',
21
+ 'Ï' => 'I',
22
+ 'Ð' => 'D',
23
+ 'Ñ' => 'N',
24
+ 'Ò' => 'O',
25
+ 'Ó' => 'O',
26
+ 'Ô' => 'O',
27
+ 'Õ' => 'O',
28
+ 'Ö' => 'O',
29
+ 'Ø' => 'OE',
30
+ 'Ù' => 'U',
31
+ 'Ú' => 'U',
32
+ 'Ü' => 'U',
33
+ 'Û' => 'U',
34
+ 'Ý' => 'Y',
35
+ 'Þ' => 'Th',
36
+ 'ß' => 'ss',
37
+ 'à' => 'a',
38
+ 'á' => 'a',
39
+ 'â' => 'a',
40
+ 'ã' => 'a',
41
+ 'ä' => 'a',
42
+ 'å' => 'aa',
43
+ 'æ' => 'ae',
44
+ 'ç' => 'c',
45
+ 'è' => 'e',
46
+ 'é' => 'e',
47
+ 'ê' => 'e',
48
+ 'ë' => 'e',
49
+ 'ì' => 'i',
50
+ 'í' => 'i',
51
+ 'î' => 'i',
52
+ 'ï' => 'i',
53
+ 'ð' => 'd',
54
+ 'ñ' => 'n',
55
+ 'ò' => 'o',
56
+ 'ó' => 'o',
57
+ 'ô' => 'o',
58
+ 'õ' => 'o',
59
+ 'ō' => 'o',
60
+ 'ö' => 'o',
61
+ 'ø' => 'oe',
62
+ 'ù' => 'u',
63
+ 'ú' => 'u',
64
+ 'û' => 'u',
65
+ 'ū' => 'u',
66
+ 'ü' => 'u',
67
+ 'ý' => 'y',
68
+ 'þ' => 'th',
69
+ 'ÿ' => 'y',
70
+ 'Œ' => 'OE',
71
+ 'œ' => 'oe',
72
+ '&' => 'and'}
73
+
74
+ end
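The README asks users to add conversions the table lacks. One way to find candidates, sketched here under assumptions, is to list the non-ASCII characters in a string that have no entry in the map. `SAMPLE_MAP` and `unmapped_characters` are hypothetical names for illustration and are not part of URLify; with the gem loaded, `URLify::ACCENTMAP` could presumably be passed instead.

```ruby
# encoding: UTF-8

# Hypothetical excerpt of an accent table, kept short for the example.
SAMPLE_MAP = { 'á' => 'a', 'ö' => 'o', 'ø' => 'oe' }

# Returns the distinct non-ASCII characters in `string` that have no entry
# in `map`, in order of first appearance.
def unmapped_characters(string, map)
  string.chars.reject { |c| c.ascii_only? || map.key?(c) }.uniq
end

p unmapped_characters("Antonín Dvořák", SAMPLE_MAP)  # ["í", "ř"]
p unmapped_characters("Kurt Gödel", SAMPLE_MAP)      # []
```

Each character reported this way is a candidate for a new key in the table.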
data/test/urlify_test.rb ADDED
@@ -0,0 +1,39 @@
1
+ # encoding: UTF-8
2
+
3
+ class String
4
+ include URLify
5
+ end
6
+
7
+ class URLifyTest < Test::Unit::TestCase
8
+
9
+ def setup
10
+ @philosopher = "Søren Kierkegaard"
11
+ @biography = "Boyd: The Fighter Pilot Who Changed the Art of War"
12
+ end
13
+
14
+ def test_subtitle_stripping
15
+ assert_equal("Boyd", URLify.strip_subtitle(@biography))
16
+ end
17
+
18
+ def test_mixin_subtitle_stripping
19
+ assert_equal("Boyd", @biography.strip_subtitle)
20
+ end
21
+
22
+ def test_deaccentuation
23
+ assert_equal("Soeren Kierkegaard", URLify.deaccentuate(@philosopher))
24
+ end
25
+
26
+ def test_mixin_deaccentuation
27
+ assert_equal("Soeren Kierkegaard", @philosopher.deaccentuate)
28
+ end
29
+
30
+ def test_urlification
31
+ assert_equal("soeren_kierkegaard", URLify.urlify(@philosopher))
32
+ assert_equal("boyd", URLify.urlify(@biography))
33
+ end
34
+
35
+ def test_mixin_urlification
36
+ assert_equal("soeren_kierkegaard", @philosopher.urlify)
37
+ assert_equal("boyd", @biography.urlify)
38
+ end
39
+ end
data/urlify.gemspec ADDED
@@ -0,0 +1,42 @@
1
+ # -*- encoding: utf-8 -*-
2
+
3
+ Gem::Specification.new do |s|
4
+ s.name = %q{urlify}
5
+ s.version = "0.1.0"
6
+
7
+ s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
8
+ s.authors = ["Benedict Eastaugh"]
9
+ s.date = %q{2009-06-29}
10
+ s.description = %q{A small library for converting accented characters to their ASCII equivalents.}
11
+ s.email = %q{benedict@eastaugh.net}
12
+ s.extra_rdoc_files = [
13
+ "README.md"
14
+ ]
15
+ s.files = [
16
+ "README.md",
17
+ "Rakefile",
18
+ "VERSION.yml",
19
+ "lib/urlify.rb",
20
+ "lib/urlify/accents.rb",
21
+ "test/urlify_test.rb",
22
+ "urlify.gemspec"
23
+ ]
24
+ s.homepage = %q{http://ionfish.github.com/urlify/}
25
+ s.rdoc_options = ["--charset=UTF-8"]
26
+ s.require_paths = ["lib"]
27
+ s.rubygems_version = %q{1.3.4}
28
+ s.summary = %q{Diacritic conversion}
29
+ s.test_files = [
30
+ "test/urlify_test.rb"
31
+ ]
32
+
33
+ if s.respond_to? :specification_version then
34
+ current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
35
+ s.specification_version = 3
36
+
37
+ if Gem::Version.new(Gem::RubyGemsVersion) >= Gem::Version.new('1.2.0') then
38
+ else
39
+ end
40
+ else
41
+ end
42
+ end
metadata ADDED
@@ -0,0 +1,59 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: ionfish-urlify
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Benedict Eastaugh
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+
12
+ date: 2009-06-29 00:00:00 -07:00
13
+ default_executable:
14
+ dependencies: []
15
+
16
+ description: A small library for converting accented characters to their ASCII equivalents.
17
+ email: benedict@eastaugh.net
18
+ executables: []
19
+
20
+ extensions: []
21
+
22
+ extra_rdoc_files:
23
+ - README.md
24
+ files:
25
+ - README.md
26
+ - Rakefile
27
+ - VERSION.yml
28
+ - lib/urlify.rb
29
+ - lib/urlify/accents.rb
30
+ - test/urlify_test.rb
31
+ - urlify.gemspec
32
+ has_rdoc: false
33
+ homepage: http://ionfish.github.com/urlify/
34
+ post_install_message:
35
+ rdoc_options:
36
+ - --charset=UTF-8
37
+ require_paths:
38
+ - lib
39
+ required_ruby_version: !ruby/object:Gem::Requirement
40
+ requirements:
41
+ - - ">="
42
+ - !ruby/object:Gem::Version
43
+ version: "0"
44
+ version:
45
+ required_rubygems_version: !ruby/object:Gem::Requirement
46
+ requirements:
47
+ - - ">="
48
+ - !ruby/object:Gem::Version
49
+ version: "0"
50
+ version:
51
+ requirements: []
52
+
53
+ rubyforge_project:
54
+ rubygems_version: 1.2.0
55
+ signing_key:
56
+ specification_version: 3
57
+ summary: Diacritic conversion
58
+ test_files:
59
+ - test/urlify_test.rb