babosa 0.3.11 → 2.0.0.beta

Files changed (57)
  1. checksums.yaml +5 -5
  2. data/Changelog.md +117 -17
  3. data/README.md +82 -119
  4. data/Rakefile +9 -8
  5. data/lib/babosa.rb +2 -21
  6. data/lib/babosa/identifier.rb +87 -124
  7. data/lib/babosa/transliterator/base.rb +59 -43
  8. data/lib/babosa/transliterator/bulgarian.rb +3 -2
  9. data/lib/babosa/transliterator/cyrillic.rb +5 -5
  10. data/lib/babosa/transliterator/danish.rb +3 -3
  11. data/lib/babosa/transliterator/german.rb +3 -2
  12. data/lib/babosa/transliterator/greek.rb +4 -3
  13. data/lib/babosa/transliterator/hindi.rb +138 -0
  14. data/lib/babosa/transliterator/latin.rb +5 -5
  15. data/lib/babosa/transliterator/macedonian.rb +3 -2
  16. data/lib/babosa/transliterator/norwegian.rb +3 -3
  17. data/lib/babosa/transliterator/romanian.rb +3 -2
  18. data/lib/babosa/transliterator/russian.rb +3 -2
  19. data/lib/babosa/transliterator/serbian.rb +29 -27
  20. data/lib/babosa/transliterator/spanish.rb +2 -2
  21. data/lib/babosa/transliterator/swedish.rb +3 -3
  22. data/lib/babosa/transliterator/turkish.rb +8 -0
  23. data/lib/babosa/transliterator/ukrainian.rb +23 -3
  24. data/lib/babosa/transliterator/vietnamese.rb +4 -3
  25. data/lib/babosa/version.rb +3 -1
  26. data/spec/identifier_spec.rb +157 -0
  27. data/spec/spec_helper.rb +15 -12
  28. data/spec/transliterators/base_spec.rb +7 -8
  29. data/spec/transliterators/bulgarian_spec.rb +4 -5
  30. data/spec/transliterators/danish_spec.rb +5 -6
  31. data/spec/transliterators/german_spec.rb +6 -7
  32. data/spec/transliterators/greek_spec.rb +7 -7
  33. data/spec/transliterators/hindi_spec.rb +17 -0
  34. data/spec/transliterators/latin_spec.rb +8 -0
  35. data/spec/transliterators/macedonian_spec.rb +3 -4
  36. data/spec/transliterators/norwegian_spec.rb +4 -4
  37. data/spec/transliterators/polish_spec.rb +12 -0
  38. data/spec/transliterators/romanian_spec.rb +5 -6
  39. data/spec/transliterators/russian_spec.rb +3 -4
  40. data/spec/transliterators/serbian_spec.rb +6 -7
  41. data/spec/transliterators/spanish_spec.rb +5 -6
  42. data/spec/transliterators/swedish_spec.rb +7 -7
  43. data/spec/transliterators/turkish_spec.rb +24 -0
  44. data/spec/transliterators/ukrainian_spec.rb +81 -3
  45. data/spec/transliterators/vietnamese_spec.rb +10 -10
  46. metadata +41 -46
  47. data/init.rb +0 -3
  48. data/lib/babosa/candidates.rb +0 -45
  49. data/lib/babosa/generator.rb +0 -24
  50. data/lib/babosa/utf8/active_support_proxy.rb +0 -20
  51. data/lib/babosa/utf8/dumb_proxy.rb +0 -42
  52. data/lib/babosa/utf8/java_proxy.rb +0 -22
  53. data/lib/babosa/utf8/mappings.rb +0 -193
  54. data/lib/babosa/utf8/proxy.rb +0 -118
  55. data/lib/babosa/utf8/unicode_proxy.rb +0 -21
  56. data/spec/babosa_spec.rb +0 -145
  57. data/spec/utf8_proxy_spec.rb +0 -48
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
- SHA1:
-   metadata.gz: 2ee04fad8c458a32dea08b5f6483d817d359dab4
-   data.tar.gz: 22385c9ae0e279fc6531a6ad6e9851ec3d4f4e81
+ SHA256:
+   metadata.gz: ef1346a05f1b3a1104af8a095b7829eaa853427927335c7905d41d5ab47b2e9c
+   data.tar.gz: 54e543a250c7eff0c9613bea3f22d2e79317707f60cbb364e6e68979237c4c53
  SHA512:
-   metadata.gz: ad5f5a7e2bbfd63ab2e0a89878b54dc52a96f6929f68dbb94070d86e7b0515d7f3fb0e6e30d4e3152320a07fe7684833f23135569e7200e4aad0f9930bb3b261
-   data.tar.gz: 95e010b9b9c5138af14454258f332a0e32e735354ad49a65ed4f22514110e159909e4a4e714a06443a02f4d9f64edfe00ac2cf48f10036488beb2f4a4abde977
+   metadata.gz: d4f6732579088cda9d4514b4d1dddd32f2cd933f2818ed4c74bf0f00168ccc3b20e6ce220bc1c473a0453d29ef50de256689ed17e1f809b42dcb8956377c0e76
+   data.tar.gz: 797887db1d626a92b28249883f2dcc55e3b92d9f423aa23052ba74cc45856a15ace2b6c823c983b6e6baa0907cfc3d467b9a52e05cd9de9d9fcf37c779bd0647
data/Changelog.md CHANGED
@@ -1,19 +1,119 @@
  # Babosa Changelog

- * 0.3.11 - Added support for Vietnamese
- * 0.3.10 - Fixed Macedonian "S/S". Don't `include JRuby` unnecessarily.
- * 0.3.9 - Added missing Greek vowels with diaeresis.
- * 0.3.8 - Correct and improve Macedonian support.
- * 0.3.7 - Fix compatibility with Ruby 1.8.7. Add Swedish support.
- * 0.3.6 - Allow multiple transliterators. Add Greek support.
- * 0.3.5 - Don't strip underscores from identifiers.
- * 0.3.4 - Add Romanian support.
- * 0.3.3 - Add Norwegian support.
- * 0.3.2 - Improve Macedonian support.
- * 0.3.1 - Small fixes to Cyrillic.
- * 0.3.0 - Cyrillic support. Improve support for various Unicode spaces and dashes.
- * 0.2.2 - Fix for "smart" quote handling.
- * 0.2.1 - Implement #empty? for compatiblity with Active Support's #blank?.
- * 0.2.0 - Added support for Danish. Added method to generate Ruby identifiers. Improved performance.
- * 0.1.1 - Added support for Serbian.
- * 0.1.0 - Initial extraction from FriendlyId.
+ ## 2.0.0
+
+ This release contains no important changes. I had a week off from work and
+ decided to refactor the code. However there are some small breaking changes so
+ I have released it as 2.0.0.
+
+ * Refactor internals for simplicity
+ * Use built-in Ruby UTF-8 support in places of other gems.
+ * Drop support for Ruby < 2.5.0.
+ * `Babosa::Identifier#word_chars` no longer removes dashes
+ * `Babosa::Identifier#to_ruby_method` default argument `allow_bangs` is now a keyword argument
+
+ ## 1.0.4
+
+ * Fix nil being cast to frozen string (https://github.com/norman/babosa/pull/52)
+
+ ## 1.0.3
+
+ * Fix Active Support 6 deprecations (https://github.com/norman/babosa/pull/50)
+
+ ## 1.0.2
+
+ * Fix regression in ActiveSupport UTF8 proxy.
+
+ ## 1.0.1
+
+ * Fix error with tidy_bytes on Rubinius.
+ * Simplify Active Support UTF8 proxy.
+ * Fix `allow_bangs` argument to to_ruby_method being silently ignored.
+ * Raise error when generating an impossible Ruby method name.
+
+ ## 1.0.0
+
+ * Adopt semantic versioning.
+ * When using Active Support, require 3.2 or greater.
+ * Require Ruby 2.0 or greater.
+ * Fix Ruby warnings.
+ * Improve support for Ukrainian.
+ * Support some additional punctuation characters used by Chinese and others.
+ * Add Polish spec.
+ * Use native Unicode normalization on Ruby 2.2 in UTF8::DumbProxy.
+ * Invoke Ruby-native upcase/downcase in UTF8::DumbProxy.
+ * Proxy `tidy_bytes` method to Active Support when possible.
+ * Remove SlugString constant.
+
+ ## 0.3.11
+
+ * Add support for Vietnamese.
+
+ ## 0.3.10
+
+ * Fix Macedonian "S/S". Don't `include JRuby` unnecessarily.
+
+ ## 0.3.9
+
+ * Add missing Greek vowels with diaeresis.
+
+ ## 0.3.8
+
+ * Correct and improve Macedonian support.
+
+ ## 0.3.7
+
+ * Fix compatibility with Ruby 1.8.7.
+ * Add Swedish support.
+
+ ## 0.3.6
+
+ * Allow multiple transliterators.
+ * Add Greek support.
+
+ ## 0.3.5
+
+ * Don't strip underscores from identifiers.
+
+ ## 0.3.4
+
+ * Add Romanian support.
+
+ ## 0.3.3
+
+ * Add Norwegian support.
+
+ ## 0.3.2
+
+ * Improve Macedonian support.
+
+ ## 0.3.1
+
+ * Small fixes to Cyrillic.
+
+ ## 0.3.0
+
+ * Cyrillic support.
+ * Improve support for various Unicode spaces and dashes.
+
+ ## 0.2.2
+
+ * Fix for "smart" quote handling.
+
+ ## 0.2.1
+
+ * Implement #empty? for compatiblity with Active Support's #blank?.
+
+ ## 0.2.0
+
+ * Add support for Danish.
+ * Add method to generate Ruby identifiers.
+ * Improve performance.
+
+ ## 0.1.1
+
+ * Add support for Serbian.
+
+ ## 0.1.0
+
+ * Initial extraction from FriendlyId.
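
The 2.0.0 entry above says that `allow_bangs` on `Babosa::Identifier#to_ruby_method` is now a keyword argument. The plain-Ruby sketch below shows what that kind of signature change means for callers. `method_name_for` is a hypothetical stand-in written for illustration, not Babosa's implementation, and its cleanup rules are deliberately simplistic.

```ruby
# Hypothetical stand-in for illustration only; not Babosa's code.
# `allow_bangs` is a keyword argument, mirroring the 2.0.0 changelog entry.
def method_name_for(text, allow_bangs: true)
  # Lowercase, turn runs of whitespace into underscores, and drop every
  # character outside a-z, 0-9, "_", "!" and "?".
  name = text.strip.downcase.gsub(/\s+/, "_").gsub(/[^a-z0-9_!?]/, "")
  allow_bangs ? name : name.delete("!?")
end

puts method_name_for("cool stuff!")                      # keyword omitted
puts method_name_for("cool stuff!", allow_bangs: false)  # keyword style

# A positional second argument does not match this signature in plain Ruby.
begin
  method_name_for("cool stuff!", false)
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```

Call sites written in the old README style, `to_ruby_method(false)`, would be rewritten as `to_ruby_method(allow_bangs: false)`, which is the form the updated README uses.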
data/README.md CHANGED
@@ -15,12 +15,16 @@ FriendlyId.

  ### Transliterate UTF-8 characters to ASCII

-     "Gölcük, Turkey".to_slug.transliterate.to_s #=> "Golcuk, Turkey"
+ ```ruby
+ "Gölcük, Turkey".to_slug.transliterate.to_s #=> "Golcuk, Turkey"
+ ```

  ### Locale sensitive transliteration, with support for many languages

-     "Jürgen Müller".to_slug.transliterate.to_s #=> "Jurgen Muller"
-     "Jürgen Müller".to_slug.transliterate(:german).to_s #=> "Juergen Mueller"
+ ```ruby
+ "Jürgen Müller".to_slug.transliterate.to_s #=> "Jurgen Muller"
+ "Jürgen Müller".to_slug.transliterate(:german).to_s #=> "Juergen Mueller"
+ ```

  Currently supported languages include:

@@ -28,6 +32,7 @@ Currently supported languages include:
  * Danish
  * German
  * Greek
+ * Hindi
  * Macedonian
  * Norwegian
  * Romanian
@@ -35,124 +40,125 @@ Currently supported languages include:
  * Serbian
  * Spanish
  * Swedish
+ * Turkish
  * Ukrainian
+ * Vietnamese
+
+ Additionally there are generic transliterators for transliterating from the
+ Cyrillic alphabet and Latin alphabet with diacritics. The Latin transliterator
+ can be used, for example, with Czech. There is also a transliterator named
+ "Hindi" which may be sufficient for other Indic languages using Devanagari, but
+ I do not know enough to say whether the transliterations would make sense.

  I'll gladly accept contributions from fluent speakers to support more languages.

  ### Strip non-ASCII characters

-     "Gölcük, Turkey".to_slug.to_ascii.to_s #=> "Glck, Turkey"
+ ```ruby
+ "Gölcük, Turkey".to_slug.to_ascii.to_s #=> "Glck, Turkey"
+ ```

  ### Truncate by characters

-     "üüü".to_slug.truncate(2).to_s #=> "üü"
+ ```ruby
+ "üüü".to_slug.truncate(2).to_s #=> "üü"
+ ```

  ### Truncate by bytes

  This can be useful to ensure the generated slug will fit in a database column
  whose length is limited by bytes rather than UTF-8 characters.

-     "üüü".to_slug.truncate_bytes(2).to_s #=> "ü"
+ ```ruby
+ "üüü".to_slug.truncate_bytes(2).to_s #=> "ü"
+ ```

  ### Remove punctuation chars

-     "this is, um, **really** cool, huh?".to_slug.word_chars.to_s #=> "this is um really cool huh"
+ ```ruby
+ "this is, um, **really** cool, huh?".to_slug.word_chars.to_s #=> "this is um really cool huh"
+ ```

  ### All-in-one

-     "Gölcük, Turkey".to_slug.normalize.to_s #=> "golcuk-turkey"
+ ```ruby
+ "Gölcük, Turkey".to_slug.normalize.to_s #=> "golcuk-turkey"
+ ```

  ### Other stuff

- #### Using Babosa With FriendlyId 4
-
-     require "babosa"
+ #### Using Babosa With FriendlyId 4+

-     class Person < ActiveRecord::Base
-       friendly_id :name, use: :slugged
-
-       def normalize_friendly_id(input)
-         input.to_s.to_slug.normalize(transliterations: :russian).to_s
-       end
-     end
+ ```ruby
+ require "babosa"

- #### Pedantic UTF-8 support
+ class Person < ActiveRecord::Base
+   friendly_id :name, use: :slugged

- Babosa goes out of its way to handle [nasty Unicode issues you might never think
- you would have](https://github.com/norman/enc/blob/master/equivalence.rb) by
- checking, sanitizing and normalizing your string input.
+   def normalize_friendly_id(input)
+     input.to_s.to_slug.normalize(transliterations: :russian).to_s
+   end
+ end
+ ```

- It will automatically use whatever Unicode library you have loaded before
- Babosa, or fall back to a simple built-in library. Supported
- Unicode libraries include:
+ #### UTF-8 support

- * Java (only on JRuby of course)
- * Active Support
- * [Unicode](https://github.com/blackwinter/unicode)
- * Built-in
-
- This built-in module is much faster than Active Support but much slower than
- Java or Unicode. It can only do **very** naive Unicode composition to ensure
- that, for example, "é" will always be composed to a single codepoint rather than
- an "e" and a "´" - making it safe to use as a hash key.
-
- But seriously - save yourself the headache and install a real Unicode library.
- If you are using Babosa with a language that uses the Cyrillic alphabet, Babosa
- requires either Unicode, Active Support or Java.
+ Babosa normalizes all input strings [to NFC](https://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms).

  #### Ruby Method Names

- Babosa can also generate strings for Ruby method names. (Yes, Ruby 1.9 can use
+ Babosa can generate strings for Ruby method names. (Yes, Ruby 1.9+ can use
  UTF-8 chars in method names, but you may not want to):


-     "this is a method".to_slug.to_ruby_method! #=> this_is_a_method
-     "über cool stuff!".to_slug.to_ruby_method! #=> uber_cool_stuff!
+ ```ruby
+ "this is a method".to_slug.to_ruby_method! #=> this_is_a_method
+ "über cool stuff!".to_slug.to_ruby_method! #=> uber_cool_stuff!

-     # You can also disallow trailing punctuation chars
-     "über cool stuff!".to_slug.to_ruby_method(false) #=> uber_cool_stuff
+ # You can also disallow trailing punctuation chars
+ "über cool stuff!".to_slug.to_ruby_method(allow_bangs: false) #=> uber_cool_stuff
+ ```

  #### Easy to Extend

  You can add custom transliterators for your language with very little code. For
  example here's the transliterator for German:

-     # encoding: utf-8
-     module Babosa
-       module Transliterator
-         class German < Latin
-           APPROXIMATIONS = {
-             "ä" => "ae",
-             "ö" => "oe",
-             "ü" => "ue",
-             "Ä" => "Ae",
-             "Ö" => "Oe",
-             "Ü" => "Ue"
-           }
-         end
-       end
+ ```ruby
+ module Babosa
+   module Transliterator
+     class German < Latin
+       APPROXIMATIONS = {
+         "ä" => "ae",
+         "ö" => "oe",
+         "ü" => "ue",
+         "Ä" => "Ae",
+         "Ö" => "Oe",
+         "Ü" => "Ue"
+       }
      end
+   end
+ end
+ ```

  And a spec (you can use this as a template):

-     # encoding: utf-8
-     require File.expand_path("../../spec_helper", __FILE__)
-
-     describe Babosa::Transliterator::German do
+ ```ruby
+ require "spec_helper"

-       let(:t) { described_class.instance }
-       it_behaves_like "a latin transliterator"
+ describe Babosa::Transliterator::German do
+   let(:t) { described_class.instance }
+   it_behaves_like "a latin transliterator"

-       it "should transliterate Eszett" do
-         t.transliterate("ß").should eql("ss")
-       end
-
-       it "should transliterate vowels with umlauts" do
-         t.transliterate("üöä").should eql("ueoeae")
-       end
-
-     end
+   it "should transliterate Eszett" do
+     t.transliterate("ß").should eql("ss")
+   end

+   it "should transliterate vowels with umlauts" do
+     t.transliterate("üöä").should eql("ueoeae")
+   end
+ end
+ ```

  ### Rails 3.x and higher

@@ -167,46 +173,6 @@ and
  [parameterize](http://api.rubyonrails.org/classes/ActiveSupport/Inflector.html#method-i-parameterize)
  to see if they suit your needs.

- ### Babosa vs. Stringex
-
- Babosa provides much of the functionality provided by the
- [Stringex](https://github.com/rsl/stringex) gem, but in the subjective opinion
- of the author, is for most use cases a better choice.
-
- #### Fewer Features
-
- Stringex offers functionality for storing slugs in an Active Record model, like
- a simple version of [FriendlyId](http://github.com/norman/friendly_id), in
- addition to string processing. Babosa only does string processing.
-
- #### Less Aggressive Unicode Transliteration
-
- Stringex uses an agressive Unicode to ASCII mapping which outputs gibberish for
- almost anything but Western European langages and Mandarin Chinese. Babosa
- supports only languages for which fluent speakers have provided
- transliterations, to ensure that the output makes sense to users.
-
- #### Unicode Support
-
- Stringex does no Unicode normalization or validation before transliterating
- strings, so if you pass in strings with encoding errors or with different
- Unicode normalizations, you'll get unpredictable results.
-
- #### No Locale Assumptions
-
- Babosa avoids making assumptions about locales like Stringex does, so it doesn't
- offer transliterations like this out of the box:
-
-     "$12 worth of Ruby power".to_url => "12-dollars-worth-of-ruby-power"
-
- This is because the symbol "$" is used in many Latin American countries for the
- peso. Stringex does this in many places, for example, transliterating all Han
- characters into Pinyin, effectively treating Japanese text as if it were
- Mandarin Chinese.
-
-
- ### More info
-
  Please see the [API docs](http://rubydoc.info/github/norman/babosa/master/frames) and source code for
  more info.

@@ -218,9 +184,6 @@ Babosa can be installed via Rubygems:

  You can get the source code from its [Github repository](http://github.com/norman/babosa).

- Babosa is tested to be compatible with Ruby 1.8.7-2.0.0, JRuby 1.4+, and
- Rubinius 1.0+ It's probably compatible with other Rubies as well.
-
  ## Reporting bugs

  Please use Babosa's [Github issue
@@ -229,7 +192,7 @@ tracker](http://github.com/norman/babosa/issues).

  ## Misc

- "Babosa" means slug in Spanish.
+ "Babosa" means "slug" in Spanish.

  ## Author

@@ -239,6 +202,7 @@ tracker](http://github.com/norman/babosa/issues).

  Many thanks to the following people for their help:

+ * [Dmitry A. Ilyashevich](https://github.com/dmitry-ilyashevich) - Deprecation fixes
  * [anhkind](https://github.com/anhkind) - Vietnamese support
  * [Martins Zakis](https://github.com/martins) - Bug fixes
  * [Vassilis Rodokanakis](https://github.com/vrodokanakis) - Greek support
@@ -255,10 +219,9 @@ Many thanks to the following people for their help:
  * [Molte Emil Strange Andersen](https://github.com/molte) - Danish support
  * [Milan Dobrota](https://github.com/milandobrota) - Serbian support

-
  ## Copyright

- Copyright (c) 2010-2013 Norman Clarke
+ Copyright (c) 2010-2020 Norman Clarke

  Permission is hereby granted, free of charge, to any person obtaining a copy of
  this software and associated documentation files (the "Software"), to deal in
@@ -276,4 +239,4 @@ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
  AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
  OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- SOFTWARE.
+ SOFTWARE.
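
The README changes above replace the "Pedantic UTF-8 support" section with a statement that input is normalized to NFC, and they keep the byte-truncation example. The sketch below illustrates both ideas using only Ruby's built-in String methods. `same_after_nfc?` and `truncate_to_bytes` are hypothetical helpers introduced here for illustration, not part of Babosa's API, and Babosa's own `truncate_bytes` may well be implemented differently.

```ruby
# Plain-Ruby sketch; these helpers are hypothetical and not part of Babosa.

# Two spellings of "é": one precomposed code point (U+00E9) versus
# "e" followed by a combining acute accent (U+0301).
COMPOSED   = "\u00E9"
DECOMPOSED = "e\u0301"

# True when both strings are equal once normalized to NFC.
def same_after_nfc?(left, right)
  left.unicode_normalize(:nfc) == right.unicode_normalize(:nfc)
end

# Keep at most max_bytes bytes, then discard any partial character
# left behind when the cut lands inside a multibyte sequence.
def truncate_to_bytes(string, max_bytes)
  string.byteslice(0, max_bytes).scrub("")
end

puts COMPOSED == DECOMPOSED                 # compared code point by code point
puts same_after_nfc?(COMPOSED, DECOMPOSED)  # compared after NFC normalization
puts truncate_to_bytes("üüü", 3).bytesize   # "ü" occupies two bytes in UTF-8
```

Without normalization the two spellings are distinct strings and therefore distinct hash keys, which is the problem the removed README text described. The byte-based cut never returns more than `max_bytes` bytes and never ends on half a character.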