babosa 0.3.11 → 2.0.0.beta
- checksums.yaml +5 -5
- data/Changelog.md +117 -17
- data/README.md +82 -119
- data/Rakefile +9 -8
- data/lib/babosa.rb +2 -21
- data/lib/babosa/identifier.rb +87 -124
- data/lib/babosa/transliterator/base.rb +59 -43
- data/lib/babosa/transliterator/bulgarian.rb +3 -2
- data/lib/babosa/transliterator/cyrillic.rb +5 -5
- data/lib/babosa/transliterator/danish.rb +3 -3
- data/lib/babosa/transliterator/german.rb +3 -2
- data/lib/babosa/transliterator/greek.rb +4 -3
- data/lib/babosa/transliterator/hindi.rb +138 -0
- data/lib/babosa/transliterator/latin.rb +5 -5
- data/lib/babosa/transliterator/macedonian.rb +3 -2
- data/lib/babosa/transliterator/norwegian.rb +3 -3
- data/lib/babosa/transliterator/romanian.rb +3 -2
- data/lib/babosa/transliterator/russian.rb +3 -2
- data/lib/babosa/transliterator/serbian.rb +29 -27
- data/lib/babosa/transliterator/spanish.rb +2 -2
- data/lib/babosa/transliterator/swedish.rb +3 -3
- data/lib/babosa/transliterator/turkish.rb +8 -0
- data/lib/babosa/transliterator/ukrainian.rb +23 -3
- data/lib/babosa/transliterator/vietnamese.rb +4 -3
- data/lib/babosa/version.rb +3 -1
- data/spec/identifier_spec.rb +157 -0
- data/spec/spec_helper.rb +15 -12
- data/spec/transliterators/base_spec.rb +7 -8
- data/spec/transliterators/bulgarian_spec.rb +4 -5
- data/spec/transliterators/danish_spec.rb +5 -6
- data/spec/transliterators/german_spec.rb +6 -7
- data/spec/transliterators/greek_spec.rb +7 -7
- data/spec/transliterators/hindi_spec.rb +17 -0
- data/spec/transliterators/latin_spec.rb +8 -0
- data/spec/transliterators/macedonian_spec.rb +3 -4
- data/spec/transliterators/norwegian_spec.rb +4 -4
- data/spec/transliterators/polish_spec.rb +12 -0
- data/spec/transliterators/romanian_spec.rb +5 -6
- data/spec/transliterators/russian_spec.rb +3 -4
- data/spec/transliterators/serbian_spec.rb +6 -7
- data/spec/transliterators/spanish_spec.rb +5 -6
- data/spec/transliterators/swedish_spec.rb +7 -7
- data/spec/transliterators/turkish_spec.rb +24 -0
- data/spec/transliterators/ukrainian_spec.rb +81 -3
- data/spec/transliterators/vietnamese_spec.rb +10 -10
- metadata +41 -46
- data/init.rb +0 -3
- data/lib/babosa/candidates.rb +0 -45
- data/lib/babosa/generator.rb +0 -24
- data/lib/babosa/utf8/active_support_proxy.rb +0 -20
- data/lib/babosa/utf8/dumb_proxy.rb +0 -42
- data/lib/babosa/utf8/java_proxy.rb +0 -22
- data/lib/babosa/utf8/mappings.rb +0 -193
- data/lib/babosa/utf8/proxy.rb +0 -118
- data/lib/babosa/utf8/unicode_proxy.rb +0 -21
- data/spec/babosa_spec.rb +0 -145
- data/spec/utf8_proxy_spec.rb +0 -48
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
-
-  metadata.gz:
-  data.tar.gz:
+SHA256:
+  metadata.gz: ef1346a05f1b3a1104af8a095b7829eaa853427927335c7905d41d5ab47b2e9c
+  data.tar.gz: 54e543a250c7eff0c9613bea3f22d2e79317707f60cbb364e6e68979237c4c53
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d4f6732579088cda9d4514b4d1dddd32f2cd933f2818ed4c74bf0f00168ccc3b20e6ce220bc1c473a0453d29ef50de256689ed17e1f809b42dcb8956377c0e76
+  data.tar.gz: 797887db1d626a92b28249883f2dcc55e3b92d9f423aa23052ba74cc45856a15ace2b6c823c983b6e6baa0907cfc3d467b9a52e05cd9de9d9fcf37c779bd0647
data/Changelog.md
CHANGED
@@ -1,19 +1,119 @@
 # Babosa Changelog
 
-
-
-
-
-
-
-*
-*
-*
-*
-*
-
-
-
-*
-
-
+## 2.0.0
+
+This release contains no important changes. I had a week off from work and
+decided to refactor the code. However there are some small breaking changes so
+I have released it as 2.0.0.
+
+* Refactor internals for simplicity
+* Use built-in Ruby UTF-8 support in places of other gems.
+* Drop support for Ruby < 2.5.0.
+* `Babosa::Identifier#word_chars` no longer removes dashes
+* `Babosa::Identifier#to_ruby_method` default argument `allow_bangs` is now a keyword argument
+
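Note on the `allow_bangs` entry above: turning a defaulted positional parameter into a keyword argument means callers that passed the flag positionally will raise `ArgumentError` after upgrading. The 1.x signature is not shown in this diff, so the sketch below uses two hypothetical stand-in methods (`old_to_ruby_method` and `new_to_ruby_method`, not Babosa code) purely to illustrate the calling-convention change.

```ruby
# Hypothetical stand-ins, not Babosa's implementation. They only strip one
# trailing "!" or "?" so the effect of the flag is visible.
def old_to_ruby_method(name, allow_bangs = true)
  allow_bangs ? name : name.sub(/[!?]\z/, "")
end

def new_to_ruby_method(name, allow_bangs: true)
  allow_bangs ? name : name.sub(/[!?]\z/, "")
end

puts old_to_ruby_method("cool_stuff!", false)               # positional flag
puts new_to_ruby_method("cool_stuff!", allow_bangs: false)  # keyword flag

begin
  new_to_ruby_method("cool_stuff!", false)                  # old call style
rescue ArgumentError => e
  puts "call site needs updating: #{e.message}"
end
```

Call sites that relied on the default (no second argument) are unaffected in this sketch; only explicit positional flags break.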
+## 1.0.4
+
+* Fix nil being cast to frozen string (https://github.com/norman/babosa/pull/52)
+
+## 1.0.3
+
+* Fix Active Support 6 deprecations (https://github.com/norman/babosa/pull/50)
+
+## 1.0.2
+
+* Fix regression in ActiveSupport UTF8 proxy.
+
+## 1.0.1
+
+* Fix error with tidy_bytes on Rubinius.
+* Simplify Active Support UTF8 proxy.
+* Fix `allow_bangs` argument to to_ruby_method being silently ignored.
+* Raise error when generating an impossible Ruby method name.
+
+## 1.0.0
+
+* Adopt semantic versioning.
+* When using Active Support, require 3.2 or greater.
+* Require Ruby 2.0 or greater.
+* Fix Ruby warnings.
+* Improve support for Ukrainian.
+* Support some additional punctuation characters used by Chinese and others.
+* Add Polish spec.
+* Use native Unicode normalization on Ruby 2.2 in UTF8::DumbProxy.
+* Invoke Ruby-native upcase/downcase in UTF8::DumbProxy.
+* Proxy `tidy_bytes` method to Active Support when possible.
+* Remove SlugString constant.
+
+## 0.3.11
+
+* Add support for Vietnamese.
+
+## 0.3.10
+
+* Fix Macedonian "S/S". Don't `include JRuby` unnecessarily.
+
+## 0.3.9
+
+* Add missing Greek vowels with diaeresis.
+
+## 0.3.8
+
+* Correct and improve Macedonian support.
+
+## 0.3.7
+
+* Fix compatibility with Ruby 1.8.7.
+* Add Swedish support.
+
+## 0.3.6
+
+* Allow multiple transliterators.
+* Add Greek support.
+
+## 0.3.5
+
+* Don't strip underscores from identifiers.
+
+## 0.3.4
+
+* Add Romanian support.
+
+## 0.3.3
+
+* Add Norwegian support.
+
+## 0.3.2
+
+* Improve Macedonian support.
+
+## 0.3.1
+
+* Small fixes to Cyrillic.
+
+## 0.3.0
+
+* Cyrillic support.
+* Improve support for various Unicode spaces and dashes.
+
+## 0.2.2
+
+* Fix for "smart" quote handling.
+
+## 0.2.1
+
+* Implement #empty? for compatiblity with Active Support's #blank?.
+
+## 0.2.0
+
+* Add support for Danish.
+* Add method to generate Ruby identifiers.
+* Improve performance.
+
+## 0.1.1
+
+* Add support for Serbian.
+
+## 0.1.0
+
+* Initial extraction from FriendlyId.
data/README.md
CHANGED
@@ -15,12 +15,16 @@ FriendlyId.
 
 ### Transliterate UTF-8 characters to ASCII
 
-
+```ruby
+"Gölcük, Turkey".to_slug.transliterate.to_s #=> "Golcuk, Turkey"
+```
 
 ### Locale sensitive transliteration, with support for many languages
 
-
-
+```ruby
+"Jürgen Müller".to_slug.transliterate.to_s #=> "Jurgen Muller"
+"Jürgen Müller".to_slug.transliterate(:german).to_s #=> "Juergen Mueller"
+```
 
 Currently supported languages include:
 
@@ -28,6 +32,7 @@ Currently supported languages include:
 * Danish
 * German
 * Greek
+* Hindi
 * Macedonian
 * Norwegian
 * Romanian
@@ -35,124 +40,125 @@ Currently supported languages include:
 * Serbian
 * Spanish
 * Swedish
+* Turkish
 * Ukrainian
+* Vietnamese
+
+Additionally there are generic transliterators for transliterating from the
+Cyrillic alphabet and Latin alphabet with diacritics. The Latin transliterator
+can be used, for example, with Czech. There is also a transliterator named
+"Hindi" which may be sufficient for other Indic languages using Devanagari, but
+I do not know enough to say whether the transliterations would make sense.
 
 I'll gladly accept contributions from fluent speakers to support more languages.
 
 ### Strip non-ASCII characters
 
-
+```ruby
+"Gölcük, Turkey".to_slug.to_ascii.to_s #=> "Glck, Turkey"
+```
 
 ### Truncate by characters
 
-
+```ruby
+"üüü".to_slug.truncate(2).to_s #=> "üü"
+```
 
 ### Truncate by bytes
 
 This can be useful to ensure the generated slug will fit in a database column
 whose length is limited by bytes rather than UTF-8 characters.
 
-
+```ruby
+"üüü".to_slug.truncate_bytes(2).to_s #=> "ü"
+```
 
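Note on the two truncation examples above: the difference between a character limit and a byte limit can be reproduced with plain Ruby string methods. This is a minimal sketch using only core `String` calls (`[]`, `byteslice`, `scrub`); how Babosa implements `truncate` and `truncate_bytes` internally is not shown in this hunk, so nothing here should be read as its implementation.

```ruby
s = "\u00FC\u00FC\u00FC"        # "üüü": 3 characters, 6 bytes in UTF-8

by_chars = s[0, 2]              # limit counted in characters
by_bytes = s.byteslice(0, 2)    # limit counted in bytes

# A byte limit can land in the middle of a character. scrub("") drops the
# dangling partial byte so the result is valid UTF-8 again.
cut_mid_char = s.byteslice(0, 3).scrub("")

puts by_chars.bytesize              # bytes used by the 2-character slice
puts by_bytes.length                # characters that fit in 2 bytes
puts cut_mid_char.valid_encoding?
```

With a 3-byte limit only one "ü" fits, which is why a byte-limited column can hold fewer characters than its nominal length suggests.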
 ### Remove punctuation chars
 
-
+```ruby
+"this is, um, **really** cool, huh?".to_slug.word_chars.to_s #=> "this is um really cool huh"
+```
 
 ### All-in-one
 
-
+```ruby
+"Gölcük, Turkey".to_slug.normalize.to_s #=> "golcuk-turkey"
+```
 
 ### Other stuff
 
-#### Using Babosa With FriendlyId 4
-
-    require "babosa"
+#### Using Babosa With FriendlyId 4+
 
-
-
-
-      def normalize_friendly_id(input)
-        input.to_s.to_slug.normalize(transliterations: :russian).to_s
-      end
-    end
+```ruby
+require "babosa"
 
-
+class Person < ActiveRecord::Base
+  friendly_id :name, use: :slugged
 
-
-
-
+  def normalize_friendly_id(input)
+    input.to_s.to_slug.normalize(transliterations: :russian).to_s
+  end
+end
+```
 
-
-Babosa, or fall back to a simple built-in library. Supported
-Unicode libraries include:
+#### UTF-8 support
 
-
-* Active Support
-* [Unicode](https://github.com/blackwinter/unicode)
-* Built-in
-
-This built-in module is much faster than Active Support but much slower than
-Java or Unicode. It can only do **very** naive Unicode composition to ensure
-that, for example, "é" will always be composed to a single codepoint rather than
-an "e" and a "´" - making it safe to use as a hash key.
-
-But seriously - save yourself the headache and install a real Unicode library.
-If you are using Babosa with a language that uses the Cyrillic alphabet, Babosa
-requires either Unicode, Active Support or Java.
+Babosa normalizes all input strings [to NFC](https://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms).
 
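Note on the NFC line above: the removed text explained why composition matters ("é" as one codepoint rather than "e" plus an accent, so it is safe as a hash key). A minimal sketch of that effect using Ruby's core `String#unicode_normalize` follows. Whether Babosa 2.0.0 calls this exact method is an assumption; this diff only shows that the bundled Unicode proxies were removed.

```ruby
decomposed = "e\u0301"   # "e" followed by a combining acute accent
composed   = "\u00E9"    # precomposed "é"

# The two strings render the same but differ codepoint by codepoint.
puts decomposed == composed          # false
puts decomposed.length               # 2
puts composed.length                 # 1

# After NFC normalization both forms are the same string, so they
# resolve to the same hash key.
nfc = decomposed.unicode_normalize(:nfc)
slugs = { composed => "post-1" }
puts nfc == composed                 # true
puts slugs[nfc]                      # post-1
```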
 #### Ruby Method Names
 
-Babosa can
+Babosa can generate strings for Ruby method names. (Yes, Ruby 1.9+ can use
 UTF-8 chars in method names, but you may not want to):
 
 
-
-
+```ruby
+"this is a method".to_slug.to_ruby_method! #=> this_is_a_method
+"über cool stuff!".to_slug.to_ruby_method! #=> uber_cool_stuff!
 
-
-
+# You can also disallow trailing punctuation chars
+"über cool stuff!".to_slug.to_ruby_method(allow_bangs: false) #=> uber_cool_stuff
+```
 
 #### Easy to Extend
 
 You can add custom transliterators for your language with very little code. For
 example here's the transliterator for German:
 
-
-
-
-
-
-
-
-
-
-
-
-
-        end
-      end
+```ruby
+module Babosa
+  module Transliterator
+    class German < Latin
+      APPROXIMATIONS = {
+        "ä" => "ae",
+        "ö" => "oe",
+        "ü" => "ue",
+        "Ä" => "Ae",
+        "Ö" => "Oe",
+        "Ü" => "Ue"
+      }
     end
+  end
+end
+```
 
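Note on the German transliterator above: a table like `APPROXIMATIONS` can be applied to a string with a single `gsub` call that takes a hash of replacements. The sketch below is standalone plain Ruby with hypothetical names (`GERMAN_APPROXIMATIONS`, `approximate`); it does not load Babosa and is not how `Babosa::Transliterator::Base` is implemented, which this hunk does not show.

```ruby
# Hypothetical table mirroring the German mapping above. Escapes are used
# so the example does not depend on the source file's encoding.
GERMAN_APPROXIMATIONS = {
  "\u00E4" => "ae", # ä
  "\u00F6" => "oe", # ö
  "\u00FC" => "ue", # ü
  "\u00C4" => "Ae", # Ä
  "\u00D6" => "Oe", # Ö
  "\u00DC" => "Ue"  # Ü
}.freeze

# Replace every character that has an entry in the table and leave the
# rest of the string untouched.
def approximate(string, table = GERMAN_APPROXIMATIONS)
  string.gsub(Regexp.union(table.keys), table)
end

puts approximate("J\u00FCrgen M\u00FCller")   # Juergen Mueller
```

Characters without an entry, such as "ß", pass through unchanged in this sketch; in the gem they are presumably handled by the `Latin` parent class, as the spec for "Eszett" suggests.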
 And a spec (you can use this as a template):
 
-
-
-
-    describe Babosa::Transliterator::German do
+```ruby
+require "spec_helper"
 
-
-
+describe Babosa::Transliterator::German do
+  let(:t) { described_class.instance }
+  it_behaves_like "a latin transliterator"
 
-
-
-
-
-      it "should transliterate vowels with umlauts" do
-        t.transliterate("üöä").should eql("ueoeae")
-      end
-
-    end
+  it "should transliterate Eszett" do
+    t.transliterate("ß").should eql("ss")
+  end
 
+  it "should transliterate vowels with umlauts" do
+    t.transliterate("üöä").should eql("ueoeae")
+  end
+end
+```
 
 ### Rails 3.x and higher
 
@@ -167,46 +173,6 @@ and
 [parameterize](http://api.rubyonrails.org/classes/ActiveSupport/Inflector.html#method-i-parameterize)
 to see if they suit your needs.
 
-### Babosa vs. Stringex
-
-Babosa provides much of the functionality provided by the
-[Stringex](https://github.com/rsl/stringex) gem, but in the subjective opinion
-of the author, is for most use cases a better choice.
-
-#### Fewer Features
-
-Stringex offers functionality for storing slugs in an Active Record model, like
-a simple version of [FriendlyId](http://github.com/norman/friendly_id), in
-addition to string processing. Babosa only does string processing.
-
-#### Less Aggressive Unicode Transliteration
-
-Stringex uses an agressive Unicode to ASCII mapping which outputs gibberish for
-almost anything but Western European langages and Mandarin Chinese. Babosa
-supports only languages for which fluent speakers have provided
-transliterations, to ensure that the output makes sense to users.
-
-#### Unicode Support
-
-Stringex does no Unicode normalization or validation before transliterating
-strings, so if you pass in strings with encoding errors or with different
-Unicode normalizations, you'll get unpredictable results.
-
-#### No Locale Assumptions
-
-Babosa avoids making assumptions about locales like Stringex does, so it doesn't
-offer transliterations like this out of the box:
-
-    "$12 worth of Ruby power".to_url => "12-dollars-worth-of-ruby-power"
-
-This is because the symbol "$" is used in many Latin American countries for the
-peso. Stringex does this in many places, for example, transliterating all Han
-characters into Pinyin, effectively treating Japanese text as if it were
-Mandarin Chinese.
-
-
-### More info
-
 Please see the [API docs](http://rubydoc.info/github/norman/babosa/master/frames) and source code for
 more info.
 
@@ -218,9 +184,6 @@ Babosa can be installed via Rubygems:
 
 You can get the source code from its [Github repository](http://github.com/norman/babosa).
 
-Babosa is tested to be compatible with Ruby 1.8.7-2.0.0, JRuby 1.4+, and
-Rubinius 1.0+ It's probably compatible with other Rubies as well.
-
 ## Reporting bugs
 
 Please use Babosa's [Github issue
@@ -229,7 +192,7 @@ tracker](http://github.com/norman/babosa/issues).
 
 ## Misc
 
-"Babosa" means slug in Spanish.
+"Babosa" means "slug" in Spanish.
 
 ## Author
 
@@ -239,6 +202,7 @@ tracker](http://github.com/norman/babosa/issues).
 
 Many thanks to the following people for their help:
 
+* [Dmitry A. Ilyashevich](https://github.com/dmitry-ilyashevich) - Deprecation fixes
 * [anhkind](https://github.com/anhkind) - Vietnamese support
 * [Martins Zakis](https://github.com/martins) - Bug fixes
 * [Vassilis Rodokanakis](https://github.com/vrodokanakis) - Greek support
@@ -255,10 +219,9 @@ Many thanks to the following people for their help:
 * [Molte Emil Strange Andersen](https://github.com/molte) - Danish support
 * [Milan Dobrota](https://github.com/milandobrota) - Serbian support
 
-
 ## Copyright
 
-Copyright (c) 2010-
+Copyright (c) 2010-2020 Norman Clarke
 
 Permission is hereby granted, free of charge, to any person obtaining a copy of
 this software and associated documentation files (the "Software"), to deal in
@@ -276,4 +239,4 @@ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
+SOFTWARE.