byk 0.6.0 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 34203e0b4291cde495d17da65522df586de7e712
4
- data.tar.gz: 290d743dab23c58241520252bd81d4ae4115ce98
3
+ metadata.gz: cc996c9d9dc81f884e02cc1dd760eeb57b6545fc
4
+ data.tar.gz: de07860c2cb41bcb39b299fee4500fd2bf01db73
5
5
  SHA512:
6
- metadata.gz: f11d00e9ac1057a5596804e03c6c4a6c41841bedc21030a9ed776cfbaaabba85a341a62de71c990c8deadc8f8384bf263b41d477b33b299af26b55acef47fe0c
7
- data.tar.gz: 335ddfeca9f6793f2887c1cc93cfc916e011f7dd01fd97073162871148d0fe61395bdb1115c5ed4f7583ff207f6b6d27462c8920a7a70b016b9427420796bc28
6
+ metadata.gz: 16e97855924c380b205e2e651fdcde391785fe051c2971d948f801ff4260eb691dc4c3304ac17b3083fc0a2469f26d134c9622f74b058f6950d5fd8dfaf62383
7
+ data.tar.gz: c85659aaaccbc5e1db30305b52e2f4955de160dcb7a617ae564877619fe5f36d852ea7c17228f301e639aae3d4793133baa5d644faeffc83942d6b179bef53e9
@@ -1,5 +1,10 @@
1
1
  # Changelog
2
2
 
3
+ ### Byk 1.0.0 (2016-04-09)
4
+
5
+ * Introduced `#to_cyrillic` and `#to_cyrillic!`
6
+ * Introduced console utility
7
+
3
8
  ### Byk 0.6.0 (2015-04-25)
4
9
 
5
10
  * Introduced module methods and the optional safe require
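
For orientation, the new `#to_cyrillic` behaviour can be approximated in plain Ruby. This is only a sketch: the module name `CyrillizeSketch` and its constants are hypothetical, and the gem itself does the work in its C extension. It assumes, as the cyrillization edge cases in the spec hunk suggest, that the first letter of a Latin digraph decides the case of the Cyrillic output.

```ruby
# coding: utf-8

# Hypothetical pure-Ruby sketch of Latin -> Cyrillic transliteration.
# Not the gem's implementation, which lives in the C extension.
module CyrillizeSketch
  LATIN    = %w[a b v g d đ e ž z i j k l m n o p r s t ć u f h c č š
                A B V G D Đ E Ž Z I J K L M N O P R S T Ć U F H C Č Š]
  CYRILLIC = %w[а б в г д ђ е ж з и ј к л м н о п р с т ћ у ф х ц ч ш
                А Б В Г Д Ђ Е Ж З И Ј К Л М Н О П Р С Т Ћ У Ф Х Ц Ч Ш]

  # One-to-one letter mapping; anything missing is left untouched.
  SINGLES = Hash[LATIN.zip(CYRILLIC)].freeze

  # Keyed by the first letter of the digraph, which decides the case.
  DIGRAPHS = { "l" => "љ", "n" => "њ", "d" => "џ",
               "L" => "Љ", "N" => "Њ", "D" => "Џ" }.freeze

  # Digraph alternatives come first so "nj" wins over "n" followed by "j".
  PATTERN = /[lLnN][jJ]|[dD][žŽ]|./m

  def self.to_cyrillic(text)
    text.gsub(PATTERN) do |match|
      if match.length == 2
        DIGRAPHS[match[0]]
      else
        SINGLES.fetch(match, match)
      end
    end
  end
end

puts CyrillizeSketch.to_cyrillic("Šeširdžija")
```

Like any table-driven approach, this treats every `lj`, `nj` and `dž` pair as a digraph, so words where the two letters are pronounced separately would need special handling.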
data/README.md CHANGED
@@ -4,39 +4,85 @@ Byk
4
4
  [![Gem Version](https://badge.fury.io/rb/byk.svg)](https://rubygems.org/gems/byk)
5
5
  [![Build Status](https://travis-ci.org/topalovic/byk.svg?branch=master)](https://travis-ci.org/topalovic/byk)
6
6
 
7
- Ruby gem for fast transliteration of Serbian Cyrillic into Latin
8
- <br />
9
- <sub>Inspired by @dejan's
10
- [nice little gem](https://github.com/dejan/srbovanje),
11
- this one comes with a C-optimized twist</sub>
7
+ Ruby gem for fast transliteration of Serbian Cyrillic ↔ Latin
12
8
 
13
9
  ![byk](https://cloud.githubusercontent.com/assets/626128/7155207/07545960-e35d-11e4-804e-5fdee70a3e30.png)
14
10
 
15
11
 
16
12
  ## Installation
17
13
 
18
- Add this line to your application's Gemfile:
14
+ Byk can be used as a standalone console utility or as a `String`
15
+ extension in your Ruby programs. It has zero dependencies beyond
16
+ vanilla Ruby and the toolchain for building native gems <sup>1</sup>.
17
+
18
+ You can install it directly:
19
+
20
+ ```ruby
21
+ $ gem install byk
22
+ ```
23
+
24
+ or add it as a dependency in your application's Gemfile:
19
25
 
20
26
  ```ruby
21
27
  gem "byk"
22
28
  ```
23
29
 
24
- And then execute:
30
+ <sub><sup>1</sup> For Windows, you might want to check out
31
+ [DevKit](https://github.com/oneclick/rubyinstaller/wiki/Development-Kit)</sub>
32
+
33
+
34
+ ## Usage
35
+
36
+ ### As a standalone utility
37
+
38
+ Here's the help banner with all the available options:
25
39
 
26
40
  ```
27
- $ bundle
41
+ usage: byk [options] [files]
42
+
43
+ options:
44
+ -c, --cyrillic convert input to Cyrillic (default)
45
+ -l, --latin convert input to Latin
46
+ -a, --ascii convert input to "ASCII Latin"
47
+ -v, --version show version
28
48
  ```
29
49
 
30
- Or install it yourself as:
50
+ Translation goes to stdout so you can redirect it or pipe it as you
51
+ see fit. Let's take a look at some common scenarios.
31
52
 
53
+ To translate files to Cyrillic:
54
+ ```sh
55
+ $ byk in1.txt in2.txt > out.txt
32
56
  ```
33
- $ gem install byk
57
+
58
+ To translate files to Latin and search for a phrase:
59
+ ```sh
60
+ $ byk -l file.txt | grep stvar
34
61
  ```
35
62
 
63
+ Ad hoc conversion:
64
+ ```sh
65
+ $ echo "Вук Стефановић Караџић" | byk -a
66
+ Vuk Stefanovic Karadzic
67
+ ```
36
68
 
37
- ## Usage
69
+ or simply omit args and type away:
70
+ ```sh
71
+ $ byk
72
+ a u ruke Mandušića Vuka
73
+ biće svaka puška ubojita!
74
+ ^D
75
+ а у руке Мандушића Вука
76
+ биће свака пушка убојита!
77
+ ```
38
78
 
39
- First, make sure to require the gem in your initializer:
79
+ `^D` being <kbd>ctrl</kbd> <kbd>d</kbd>.
80
+
81
+
82
+ ### As a `String` extension
83
+
84
+ Unless you're using Bundler, make sure to require the gem in your
85
+ initializer:
40
86
 
41
87
  ```ruby
42
88
  require "byk"
@@ -45,22 +91,23 @@ require "byk"
45
91
  This will extend `String` with a couple of simple methods:
46
92
 
47
93
  ```ruby
48
- "Шеширџија".to_latin # => "Šeširdžija"
49
- "Шеширџија".to_ascii_latin # => "Sesirdzija"
50
- "Šeširdžija".to_ascii_latin # => "Sesirdzija"
94
+ "Šeširdžija".to_cyrillic # => "Шеширџија"
95
+ "Шеширџија".to_latin # => "Šeširdžija"
96
+ "Шеширџија".to_ascii_latin # => "Sesirdzija"
51
97
  ```
52
98
 
53
- There's also a destructive variant of each:
99
+ These do not modify the receiver. For that, there's a destructive
100
+ variant of each:
54
101
 
55
102
  ```ruby
56
- text = "Жвазбука"
57
- text.to_latin! # => "Žvazbuka"
58
- text # => "Žvazbuka"
59
- text.to_ascii_latin! # => "Zvazbuka"
60
- text # => "Zvazbuka"
103
+ text = "Šeširdžija"
104
+ text.to_cyrillic! # => "Шеширџија"
105
+ text.to_latin! # => "Šeširdžija"
106
+ text.to_ascii_latin! # => "Sesirdzija"
107
+ text # => "Sesirdzija"
61
108
  ```
62
109
 
63
- Note that these methods take into account the
110
+ Note that both latinization methods observe
64
111
  [digraph capitalization rules](http://sr.wikipedia.org/wiki/Гајица#.D0.94.D0.B8.D0.B3.D1.80.D0.B0.D1.84.D0.B8):
65
112
 
66
113
  ```ruby
@@ -68,63 +115,88 @@ Note that these methods take into account the
68
115
  "ĐORĐE Đorđević".to_ascii_latin # => "DJORDJE Djordjevic"
69
116
  ```
70
117
 
71
- If you prefer not to monkey patch your strings, you can use the "safe"
72
- require:
118
+
119
+ ### Safe require
120
+
121
+ If you prefer not to monkey patch `String`, you can do a "safe"
122
+ require in your Gemfile:
123
+
73
124
 
74
125
  ```ruby
75
- require "byk/safe"
126
+ gem "byk", :require => "byk/safe"
76
127
  ```
77
128
 
78
- and then call the module methods:
129
+ or initializer:
79
130
 
80
131
  ```ruby
81
- text = "Вук"
82
- Byk.to_latin(text) # => "Vuk"
83
- text # => "Byk"
84
- Byk.to_latin!(text) # => "Vuk"
85
- text # => "Vuk"
132
+ require "byk/safe"
86
133
  ```
87
134
 
135
+ Then, you should rely on module methods:
88
136
 
89
- ## Testing
137
+ ```ruby
138
+ text = "Жвазбука"
90
139
 
91
- To test the gem, clone the repo and run:
140
+ Byk.to_latin(text) # => "Žvazbuka"
141
+ text # => "Жвазбука"
142
+
143
+ Byk.to_latin!(text) # => "Žvazbuka"
144
+ text # => "Žvazbuka"
92
145
 
146
+ # etc.
93
147
  ```
94
- $ bundle
95
- $ bundle exec rake
148
+
149
+
150
+ ## How fast is "fast" transliteration?
151
+
152
+ Here's a quick test:
153
+
154
+ ```sh
155
+ $ wget https://sr.wikipedia.org/ -O sample
156
+ $ du -h sample
157
+ 128K
158
+
159
+ $ time byk -l sample > /dev/null
160
+ 0.08s user 0.04s system 96% cpu 0.126 total
96
161
  ```
97
162
 
163
+ Let's up the ante:
164
+
165
+ ```sh
166
+ $ for i in {1..800}; do cat sample; done > big
167
+ $ du -h big
168
+ 97M
169
+
170
+ $ time byk -l big > /dev/null
171
+ 1.71s user 0.13s system 99% cpu 1.846 total
172
+ ```
98
173
 
99
- ## How fast is fast?
174
+ So, ~100MB in under 2s. Fast enough, I suppose. You can expect it to
175
+ scale linearly.
100
176
 
101
- About [10-40x faster](benchmark) than the baseline Ruby implementation
102
- on my hardware, depending on the string's Cyrillic content ratio. YMMV
103
- of course.
177
+ Compared to the pure Ruby implementation, it is about
178
+ [10-30x faster](benchmark), depending on the input composition and the
179
+ transliteration method applied.
104
180
 
105
181
 
106
- ## Raison d'être
182
+ ## Testing
107
183
 
108
- This kind of speed-up might be worthwhile for massive localization
109
- projects, e.g. sites supporting dual script content. Remember,
110
- `Benchmark` is your friend.
184
+ To test the gem, clone the repo and run:
111
185
 
112
- I found transliteration to be a straightforward little problem that
113
- lends itself well to optimization. It also gave me an excuse to play
114
- with Ruby extensions, so there :smirk_cat:
186
+ ```
187
+ $ bundle && bundle exec rake
188
+ ```
115
189
 
116
190
 
117
191
  ## Compatibility
118
192
 
119
- Byk is supported under MRI Ruby >= 1.9.2.
193
+ Byk is supported under MRI 1.9.2+. I might try my hand in writing a
194
+ JRuby extension in a future release.
120
195
 
121
- I don't plan to support 1.8.7 or older due to substantial C API
122
- changes between 1.8 and 1.9. It doesn't build under Rubinius
123
- currently, but I intend to support it in future releases.
124
196
 
125
197
 
126
198
  ## License
127
199
 
128
- This gem is released under the [MIT License](http://www.opensource.org/licenses/MIT).
200
+ This gem is released under the [MIT License](LICENSE).
129
201
 
130
202
  Уздравље!
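
The digraph capitalization rule mentioned in the README can be illustrated with a small pure-Ruby sketch. The module name `LatinizeSketch` and its helpers are hypothetical. The rule encoded here is an assumption read off the C changes in this diff (`capitalize || is_cap(codepoint2)`): the second letter of a digraph is upper-cased when the neighbouring character before or after it is a capital.

```ruby
# coding: utf-8

# Hypothetical pure-Ruby sketch of Cyrillic -> Latin transliteration
# with digraph capitalization. Not the gem's implementation.
module LatinizeSketch
  CYRILLIC      = %w[а б в г д ђ е ж з и ј к л љ м н њ о п р с т ћ у ф х ц ч џ ш]
  LATIN         = %w[a b v g d đ e ž z i j k l lj m n nj o p r s t ć u f h c č dž š]
  CYRILLIC_CAPS = %w[А Б В Г Д Ђ Е Ж З И Ј К Л Љ М Н Њ О П Р С Т Ћ У Ф Х Ц Ч Џ Ш]
  LATIN_CAPS    = %w[A B V G D Đ E Ž Z I J K L Lj M N Nj O P R S T Ć U F H C Č Dž Š]

  LOWER = Hash[CYRILLIC.zip(LATIN)].freeze
  UPPER = Hash[CYRILLIC_CAPS.zip(LATIN_CAPS)].freeze

  # Upper-case forms of the letters that can end a digraph.
  SECOND_CAP = { "j" => "J", "ž" => "Ž" }.freeze

  # Every character counted as a capital: ASCII, Serbian Latin, Serbian Cyrillic.
  CAPS = (("A".."Z").to_a + %w[Ć Č Đ Š Ž] + CYRILLIC_CAPS).freeze

  def self.cap?(char)
    CAPS.include?(char)
  end

  def self.to_latin(text)
    chars = text.chars.to_a
    chars.each_with_index.map do |char, i|
      out = LOWER[char] || UPPER[char]
      next char unless out # not Serbian Cyrillic, copy as-is

      if out.length == 2 && UPPER.key?(char)
        prev_cap = i > 0 && cap?(chars[i - 1])
        next_cap = cap?(chars[i + 1])
        out = out[0] + SECOND_CAP[out[1]] if prev_cap || next_cap
      end
      out
    end.join
  end
end

puts LatinizeSketch.to_latin("ПАЖЊУ")
```

A lone capital digraph therefore comes out in title case (`Nj`), while the same letter inside an all-caps word comes out fully capitalized (`NJ`).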
data/exe/byk ADDED
@@ -0,0 +1,51 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "byk/safe"
4
+ require "optparse"
5
+
6
+ trap "SIGINT" do
7
+ exit 130
8
+ end
9
+
10
+ method_name = :to_cyrillic
11
+
12
+ opts = OptionParser.new do |opt|
13
+ opt.banner = "usage: byk [options] [files]"
14
+ opt.summary_width = 20
15
+
16
+ opt.separator ""
17
+ opt.separator "options:"
18
+
19
+ opt.on("-c", "--cyrillic", "convert input to Cyrillic (default)") do
20
+ method_name = :to_cyrillic
21
+ end
22
+
23
+ opt.on("-l", "--latin", "convert input to Latin") do
24
+ method_name = :to_latin
25
+ end
26
+
27
+ opt.on("-a", "--ascii", 'convert input to "ASCII Latin"') do
28
+ method_name = :to_ascii_latin
29
+ end
30
+
31
+ opt.on_tail("-v", "--version", "show version") do
32
+ puts Byk::VERSION
33
+ exit
34
+ end
35
+ end
36
+
37
+ begin
38
+ opts.parse!
39
+ rescue OptionParser::InvalidOption => e
40
+ puts e
41
+ puts
42
+ puts opts
43
+ exit 1
44
+ end
45
+
46
+ begin
47
+ puts Byk.send(method_name, ARGF.read)
48
+ rescue => e
49
+ puts e
50
+ exit 1
51
+ end
@@ -3,103 +3,225 @@
3
3
 
4
4
  #define STR_ENC_GET(str) rb_enc_from_index(ENCODING_GET(str))
5
5
 
6
- #define STR_CAT_COND_ASCII(ascii, dest, chr, ascii_chr, len, enc) \
7
- ascii ? rb_str_buf_cat(dest, chr, len) \
8
- : str_cat_char(dest, ascii_chr, enc)
6
+ static inline void
7
+ _str_cat_char(VALUE str, unsigned c, rb_encoding *enc)
8
+ {
9
+ char s[16];
10
+ int n = rb_enc_codelen(c, enc);
11
+ rb_enc_mbcput(c, s, enc);
12
+ rb_str_buf_cat(str, s, n);
13
+ }
9
14
 
10
15
  enum {
11
- LAT_CAP_TJ = 0x106,
12
- LAT_TJ,
13
- LAT_CAP_CH = 0x10c,
14
- LAT_CH,
15
- LAT_CAP_DJ = 0x110,
16
- LAT_DJ,
17
- LAT_CAP_SH = 0x160,
18
- LAT_SH,
19
- LAT_CAP_ZH = 0x17d,
20
- LAT_ZH,
21
- CYR_CAP_DJ = 0x402,
22
- CYR_CAP_J = 0x408,
23
- CYR_CAP_LJ,
24
- CYR_CAP_NJ,
25
- CYR_CAP_TJ,
26
- CYR_CAP_DZ = 0x40f,
27
- CYR_CAP_A,
28
- CYR_CAP_ZH = 0x416,
29
- CYR_CAP_C = 0x426,
30
- CYR_CAP_CH,
31
- CYR_CAP_SH,
32
- CYR_A = 0x430,
33
- CYR_ZH = 0x436,
34
- CYR_C = 0x446,
35
- CYR_CH,
36
- CYR_SH,
37
- CYR_DJ = 0x452,
38
- CYR_J = 0x458,
39
- CYR_LJ,
40
- CYR_NJ,
41
- CYR_TJ,
42
- CYR_DZ = 0x45f
16
+ LAT_CAP_TJ=262, LAT_TJ, LAT_CAP_CH=268, LAT_CH,
17
+ LAT_CAP_DJ=272, LAT_DJ, LAT_CAP_SH=352, LAT_SH,
18
+ LAT_CAP_ZH=381, LAT_ZH, CYR_CAP_DJ=1026, CYR_CAP_J=1032,
19
+ CYR_CAP_LJ, CYR_CAP_NJ, CYR_CAP_TJ, CYR_CAP_DZ=1039,
20
+ CYR_CAP_A, CYR_CAP_B, CYR_CAP_V, CYR_CAP_G,
21
+ CYR_CAP_D, CYR_CAP_E, CYR_CAP_ZH, CYR_CAP_Z,
22
+ CYR_CAP_I, CYR_CAP_K=1050, CYR_CAP_L, CYR_CAP_M,
23
+ CYR_CAP_N, CYR_CAP_O, CYR_CAP_P, CYR_CAP_R,
24
+ CYR_CAP_S, CYR_CAP_T, CYR_CAP_U, CYR_CAP_F,
25
+ CYR_CAP_H, CYR_CAP_C, CYR_CAP_CH, CYR_CAP_SH,
26
+ CYR_A=1072, CYR_B, CYR_V, CYR_G, CYR_D,
27
+ CYR_E, CYR_ZH, CYR_Z, CYR_I, CYR_K=1082,
28
+ CYR_L, CYR_M, CYR_N, CYR_O, CYR_P,
29
+ CYR_R, CYR_S, CYR_T, CYR_U, CYR_F,
30
+ CYR_H, CYR_C, CYR_CH, CYR_SH, CYR_DJ=1106,
31
+ CYR_J=1112, CYR_LJ, CYR_NJ, CYR_TJ, CYR_DZ=1119
43
32
  };
44
33
 
45
- static inline unsigned int
46
- is_cyrillic(unsigned int c)
34
+ static inline unsigned
35
+ is_cap(unsigned codepoint)
47
36
  {
48
- return c >= CYR_CAP_DJ && c <= CYR_DZ;
37
+ if (codepoint >= 65 && codepoint <= 90) return 1;
38
+ if (codepoint >= CYR_CAP_DJ && codepoint <= CYR_CAP_SH) return 1;
39
+
40
+ switch(codepoint) {
41
+ case LAT_CAP_TJ:
42
+ case LAT_CAP_CH:
43
+ case LAT_CAP_DJ:
44
+ case LAT_CAP_SH:
45
+ case LAT_CAP_ZH:
46
+ return 1;
47
+ default:
48
+ return 0;
49
+ }
49
50
  }
50
51
 
51
- static inline unsigned int
52
- is_upper(unsigned int c)
52
+ static inline unsigned
53
+ is_digraph(unsigned codepoint)
53
54
  {
54
- return (c >= 65 && c <= 90)
55
- || (c >= CYR_CAP_DJ && c <= CYR_CAP_SH)
56
- || c == LAT_CAP_TJ
57
- || c == LAT_CAP_CH
58
- || c == LAT_CAP_DJ
59
- || c == LAT_CAP_SH
60
- || c == LAT_CAP_ZH;
55
+ switch(codepoint) {
56
+ case CYR_LJ:
57
+ case CYR_NJ:
58
+ case CYR_DZ:
59
+ case CYR_CAP_LJ:
60
+ case CYR_CAP_NJ:
61
+ case CYR_CAP_DZ:
62
+ return 1;
63
+ default:
64
+ return 0;
65
+ }
61
66
  }
62
67
 
63
- static inline unsigned int
64
- maps_directly(unsigned int c)
68
+ static unsigned
69
+ digraph_to_cyr(unsigned codepoint, unsigned codepoint2, unsigned capitalize, unsigned *next_out)
65
70
  {
66
- return c != CYR_ZH
67
- && c != CYR_CAP_ZH
68
- && ((c >= CYR_A && c <= CYR_C) || (c >= CYR_CAP_A && c <= CYR_CAP_C));
71
+ static unsigned CYR_MAP[] = {
72
+ CYR_A, CYR_B, CYR_C, CYR_D, CYR_E, CYR_F,
73
+ CYR_G, CYR_H, CYR_I, CYR_J, CYR_K, CYR_L,
74
+ CYR_M, CYR_N, CYR_O, CYR_P, 0, CYR_R,
75
+ CYR_S, CYR_T, CYR_U, CYR_V, 0, 0, 0, CYR_Z
76
+ };
77
+
78
+ static unsigned CYR_CAPS_MAP[] = {
79
+ CYR_CAP_A, CYR_CAP_B, CYR_CAP_C, CYR_CAP_D, CYR_CAP_E, CYR_CAP_F,
80
+ CYR_CAP_G, CYR_CAP_H, CYR_CAP_I, CYR_CAP_J, CYR_CAP_K, CYR_CAP_L,
81
+ CYR_CAP_M, CYR_CAP_N, CYR_CAP_O, CYR_CAP_P, 0, CYR_CAP_R,
82
+ CYR_CAP_S, CYR_CAP_T, CYR_CAP_U, CYR_CAP_V, 0, 0, 0, CYR_CAP_Z
83
+ };
84
+
85
+ if (codepoint2 == LAT_CAP_ZH || codepoint2 == LAT_ZH) {
86
+ switch (codepoint) {
87
+ case 'd': return CYR_DZ;
88
+ case 'D': return CYR_CAP_DZ;
89
+ }
90
+ }
91
+
92
+ if (codepoint2 == 'j' || codepoint2 == 'J') {
93
+ switch (codepoint) {
94
+ case 'l': return CYR_LJ;
95
+ case 'n': return CYR_NJ;
96
+ case 'L': return CYR_CAP_LJ;
97
+ case 'N': return CYR_CAP_NJ;
98
+ }
99
+ }
100
+
101
+ if (codepoint >= 'a' && codepoint <= 'z') return CYR_MAP[codepoint - 'a'];
102
+ if (codepoint >= 'A' && codepoint <= 'Z') return CYR_CAPS_MAP[codepoint - 'A'];
103
+
104
+ switch (codepoint) {
105
+ case LAT_CH: return CYR_CH;
106
+ case LAT_DJ: return CYR_DJ;
107
+ case LAT_SH: return CYR_SH;
108
+ case LAT_TJ: return CYR_TJ;
109
+ case LAT_ZH: return CYR_ZH;
110
+ case LAT_CAP_CH: return CYR_CAP_CH;
111
+ case LAT_CAP_DJ: return CYR_CAP_DJ;
112
+ case LAT_CAP_SH: return CYR_CAP_SH;
113
+ case LAT_CAP_TJ: return CYR_CAP_TJ;
114
+ case LAT_CAP_ZH: return CYR_CAP_ZH;
115
+ }
116
+
117
+ return 0;
69
118
  }
70
119
 
71
- static void
72
- str_cat_char(VALUE str, unsigned int c, rb_encoding *enc)
120
+ static unsigned
121
+ digraph_to_latin(unsigned codepoint, unsigned codepoint2, unsigned capitalize, unsigned *next_out)
73
122
  {
74
- char s[16];
75
- int n = rb_enc_codelen(c, enc);
76
- rb_enc_mbcput(c, s, enc);
77
- rb_str_buf_cat(str, s, n);
123
+ static char LAT_MAP[] = {
124
+ 'a', 'b', 'v', 'g', 'd', 'e', 0, 'z', 'i', 0, 'k', 'l',
125
+ 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', 'h', 'c'
126
+ };
127
+
128
+ static char LAT_CAPS_MAP[] = {
129
+ 'A', 'B', 'V', 'G', 'D', 'E', 0, 'Z', 'I', 0, 'K', 'L',
130
+ 'M', 'N', 'O', 'P', 'R', 'S', 'T', 'U', 'F', 'H', 'C'
131
+ };
132
+
133
+ if (codepoint < CYR_CAP_DJ || codepoint > CYR_DZ) return 0;
134
+
135
+ switch (codepoint) {
136
+ case CYR_ZH: return LAT_ZH;
137
+ case CYR_CAP_ZH: return LAT_CAP_ZH;
138
+ }
139
+
140
+ if (codepoint >= CYR_A && codepoint <= CYR_C)
141
+ return LAT_MAP[codepoint - CYR_A];
142
+
143
+ if (codepoint >= CYR_CAP_A && codepoint <= CYR_CAP_C)
144
+ return LAT_CAPS_MAP[codepoint - CYR_CAP_A];
145
+
146
+ if (codepoint >= CYR_A) {
147
+ switch (codepoint) {
148
+ case CYR_J: return 'j';
149
+ case CYR_TJ: return LAT_TJ;
150
+ case CYR_CH: return LAT_CH;
151
+ case CYR_SH: return LAT_SH;
152
+ case CYR_DJ: return LAT_DJ;
153
+ case CYR_LJ: *next_out = 'j'; return 'l';
154
+ case CYR_NJ: *next_out = 'j'; return 'n';
155
+ case CYR_DZ: *next_out = LAT_ZH; return 'd';
156
+ }
157
+ }
158
+ else {
159
+ switch (codepoint) {
160
+ case CYR_CAP_J: return 'J';
161
+ case CYR_CAP_TJ: return LAT_CAP_TJ;
162
+ case CYR_CAP_CH: return LAT_CAP_CH;
163
+ case CYR_CAP_SH: return LAT_CAP_SH;
164
+ case CYR_CAP_DJ: return LAT_CAP_DJ;
165
+ case CYR_CAP_LJ: *next_out = (capitalize || is_cap(codepoint2)) ? 'J' : 'j'; return 'L';
166
+ case CYR_CAP_NJ: *next_out = (capitalize || is_cap(codepoint2)) ? 'J' : 'j'; return 'N';
167
+ case CYR_CAP_DZ: *next_out = (capitalize || is_cap(codepoint2)) ? LAT_CAP_ZH : LAT_ZH; return 'D';
168
+ }
169
+ }
170
+
171
+ return 0;
172
+ }
173
+
174
+ static unsigned
175
+ digraph_to_ascii(unsigned codepoint, unsigned codepoint2, unsigned capitalize, unsigned *next_out)
176
+ {
177
+ switch (codepoint) {
178
+ case LAT_TJ:
179
+ case LAT_CH:
180
+ case CYR_TJ:
181
+ case CYR_CH: return 'c';
182
+ case LAT_SH:
183
+ case CYR_SH: return 's';
184
+ case LAT_ZH:
185
+ case CYR_ZH: return 'z';
186
+ case LAT_DJ:
187
+ case CYR_DJ: *next_out = 'j'; return 'd';
188
+ case LAT_CAP_TJ:
189
+ case LAT_CAP_CH:
190
+ case CYR_CAP_TJ:
191
+ case CYR_CAP_CH: return 'C';
192
+ case LAT_CAP_SH:
193
+ case CYR_CAP_SH: return 'S';
194
+ case LAT_CAP_ZH:
195
+ case CYR_CAP_ZH: return 'Z';
196
+ case LAT_CAP_DJ:
197
+ case CYR_CAP_DJ:
198
+ *next_out = (capitalize || is_cap(codepoint2)) ? 'J' : 'j'; return 'D';
199
+ case CYR_DZ:
200
+ *next_out = (capitalize || is_cap(codepoint2)) ? 'Z' : 'z'; return 'd';
201
+ case CYR_CAP_DZ:
202
+ *next_out = (capitalize || is_cap(codepoint2)) ? 'Z' : 'z'; return 'D';
203
+ default:
204
+ return digraph_to_latin(codepoint, codepoint2, capitalize, next_out);
205
+ }
78
206
  }
79
207
 
80
208
  static VALUE
81
- str_to_latin(VALUE str, int ascii, int bang)
209
+ str_to_srb(VALUE str, int strategy, int bang)
82
210
  {
83
211
  VALUE dest;
84
- long dest_len;
212
+ rb_encoding *enc;
213
+
85
214
  int len, next_len;
86
- int seen_upper = 0;
87
- int force_upper = 0;
215
+ unsigned in, in2, out, out2, seen_cap = 0;
88
216
  char *pos, *end, *seq_start = 0;
89
- char cyr;
90
- unsigned int codepoint = 0;
91
- unsigned int next_codepoint = 0;
92
- rb_encoding *enc;
93
217
 
94
- char CYR_MAP[] = {
95
- 'a', 'b', 'v', 'g', 'd', 'e', '\0', 'z', 'i', '\0', 'k',
96
- 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', 'h', 'c'
97
- };
218
+ unsigned (*method)(unsigned, unsigned, unsigned, unsigned*);
98
219
 
99
- char CYR_CAPS_MAP[] = {
100
- 'A', 'B', 'V', 'G', 'D', 'E', '\0', 'Z', 'I', '\0', 'K',
101
- 'L', 'M', 'N', 'O', 'P', 'R', 'S', 'T', 'U', 'F', 'H', 'C'
102
- };
220
+ switch(strategy) {
221
+ case 0: method = &digraph_to_cyr; break;
222
+ case 1: method = &digraph_to_latin; break;
223
+ default: method = &digraph_to_ascii;
224
+ }
103
225
 
104
226
  StringValue(str);
105
227
  pos = RSTRING_PTR(str);
@@ -107,123 +229,50 @@ str_to_latin(VALUE str, int ascii, int bang)
107
229
 
108
230
  end = RSTRING_END(str);
109
231
  enc = STR_ENC_GET(str);
110
- dest_len = RSTRING_LEN(str) + 30;
111
- dest = rb_str_buf_new(dest_len);
232
+ dest = rb_str_buf_new(RSTRING_LEN(str) + 30);
112
233
  rb_enc_associate(dest, enc);
113
234
 
114
- codepoint = rb_enc_codepoint_len(pos, end, &len, enc);
235
+ in = rb_enc_codepoint_len(pos, end, &len, enc);
115
236
 
116
237
  while (pos < end) {
117
- if (pos + len < end) {
118
- next_codepoint = rb_enc_codepoint_len(pos + len, end, &next_len, enc);
119
- }
238
+ in2 = out2 = 0;
120
239
 
121
- /* Latin -> "ASCII Latin" conversion */
122
- if (ascii && codepoint >= LAT_CAP_TJ && codepoint <= LAT_ZH) {
123
- if (seq_start) {
124
- rb_str_buf_cat(dest, seq_start, pos - seq_start);
125
- seq_start = 0;
126
- }
240
+ if (pos + len < end)
241
+ in2 = rb_enc_codepoint_len(pos + len, end, &next_len, enc);
127
242
 
128
- switch (codepoint) {
129
- case LAT_TJ:
130
- case LAT_CH: rb_str_buf_cat(dest, "c", 1); break;
131
- case LAT_DJ: rb_str_buf_cat(dest, "dj", 2); break;
132
- case LAT_SH: rb_str_buf_cat(dest, "s", 1); break;
133
- case LAT_ZH: rb_str_buf_cat(dest, "z", 1); break;
134
- case LAT_CAP_TJ:
135
- case LAT_CAP_CH: rb_str_buf_cat(dest, "C", 1); break;
136
- case LAT_CAP_SH: rb_str_buf_cat(dest, "S", 1); break;
137
- case LAT_CAP_ZH: rb_str_buf_cat(dest, "Z", 1); break;
138
- case LAT_CAP_DJ:
139
- (seen_upper || is_upper(next_codepoint))
140
- ? rb_str_buf_cat(dest, "DJ", 2)
141
- : rb_str_buf_cat(dest, "Dj", 2);
142
- break;
143
- default:
144
- rb_str_buf_cat(dest, pos, len);
145
- }
146
- }
243
+ out = (*method)(in, in2, seen_cap, &out2);
147
244
 
148
- /* Cyrillic coderange */
149
- else if (is_cyrillic(codepoint)) {
245
+ if (out) {
246
+ /* flush previous untranslatable sequence */
150
247
  if (seq_start) {
151
248
  rb_str_buf_cat(dest, seq_start, pos - seq_start);
152
249
  seq_start = 0;
153
250
  }
154
251
 
155
- if (codepoint >= CYR_A) {
156
- if (maps_directly(codepoint)) {
157
- cyr = CYR_MAP[codepoint - CYR_A];
158
- cyr ? rb_str_buf_cat(dest, &cyr, 1)
159
- : rb_str_buf_cat(dest, pos, len);
160
- }
161
- else {
162
- switch (codepoint) {
163
- case CYR_J: rb_str_buf_cat(dest, "j", 1); break;
164
- case CYR_LJ: rb_str_buf_cat(dest, "lj", 2); break;
165
- case CYR_NJ: rb_str_buf_cat(dest, "nj", 2); break;
166
- case CYR_DJ: STR_CAT_COND_ASCII(ascii, dest, "dj", LAT_DJ, 2, enc); break;
167
- case CYR_TJ: STR_CAT_COND_ASCII(ascii, dest, "c", LAT_TJ, 1, enc); break;
168
- case CYR_CH: STR_CAT_COND_ASCII(ascii, dest, "c", LAT_CH, 1, enc); break;
169
- case CYR_SH: STR_CAT_COND_ASCII(ascii, dest, "s", LAT_SH, 1, enc); break;
170
- case CYR_ZH: STR_CAT_COND_ASCII(ascii, dest, "z", LAT_ZH, 1, enc); break;
171
- case CYR_DZ:
172
- rb_str_buf_cat(dest, "d", 1);
173
- STR_CAT_COND_ASCII(ascii, dest, "z", LAT_ZH, 1, enc);
174
- break;
175
- default:
176
- rb_str_buf_cat(dest, pos, len);
177
- }
178
- }
179
- }
180
- else {
181
- if (maps_directly(codepoint)) {
182
- cyr = CYR_CAPS_MAP[codepoint - CYR_CAP_A];
183
- cyr ? rb_str_buf_cat(dest, &cyr, 1)
184
- : rb_str_buf_cat(dest, pos, len);
185
- }
186
- else {
187
- force_upper = seen_upper || is_upper(next_codepoint);
188
-
189
- switch (codepoint) {
190
- case CYR_CAP_J: rb_str_buf_cat(dest, "J", 1); break;
191
- case CYR_CAP_LJ: rb_str_buf_cat(dest, (force_upper ? "LJ" : "Lj"), 2); break;
192
- case CYR_CAP_NJ: rb_str_buf_cat(dest, (force_upper ? "NJ" : "Nj"), 2); break;
193
- case CYR_CAP_TJ: STR_CAT_COND_ASCII(ascii, dest, "C", LAT_CAP_TJ, 1, enc); break;
194
- case CYR_CAP_CH: STR_CAT_COND_ASCII(ascii, dest, "C", LAT_CAP_CH, 1, enc); break;
195
- case CYR_CAP_SH: STR_CAT_COND_ASCII(ascii, dest, "S", LAT_CAP_SH, 1, enc); break;
196
- case CYR_CAP_ZH: STR_CAT_COND_ASCII(ascii, dest, "Z", LAT_CAP_ZH, 1, enc); break;
197
- case CYR_CAP_DJ: STR_CAT_COND_ASCII(ascii, dest, (force_upper ? "DJ" : "Dj"), LAT_CAP_DJ, 2, enc); break;
198
- case CYR_CAP_DZ:
199
- rb_str_buf_cat(dest, "D", 1);
200
- force_upper ? STR_CAT_COND_ASCII(ascii, dest, "Z", LAT_CAP_ZH, 1, enc)
201
- : STR_CAT_COND_ASCII(ascii, dest, "z", LAT_ZH, 1, enc);
202
- break;
203
- default:
204
- rb_str_buf_cat(dest, pos, len);
205
- }
206
- }
207
- }
252
+ _str_cat_char(dest, out, enc);
253
+ if (out2) _str_cat_char(dest, out2, enc);
208
254
  }
209
- else {
210
- /* Mark the start of a copyable sequence */
211
- if (!seq_start) seq_start = pos;
255
+ else if (!seq_start) {
256
+ /* mark the beginning of an untranslatable sequence */
257
+ seq_start = pos;
258
+ }
259
+
260
+ /* for cyrillic output, skip the second half of an input digraph */
261
+ if (strategy == 0 && is_digraph(out)) {
262
+ pos += next_len;
263
+ if (pos + len < end)
264
+ in2 = rb_enc_codepoint_len(pos + len, end, &next_len, enc);
212
265
  }
213
266
 
214
- seen_upper = is_upper(codepoint);
267
+ seen_cap = is_cap(in);
215
268
 
216
269
  pos += len;
217
270
  len = next_len;
218
-
219
- codepoint = next_codepoint;
220
- next_codepoint = 0;
271
+ in = in2;
221
272
  }
222
273
 
223
- /* Flush the last sequence, if any */
224
- if (seq_start) {
225
- rb_str_buf_cat(dest, seq_start, pos - seq_start);
226
- }
274
+ /* flush final sequence */
275
+ if (seq_start) rb_str_buf_cat(dest, seq_start, pos - seq_start);
227
276
 
228
277
  if (bang) {
229
278
  rb_str_shared_replace(str, dest);
@@ -237,7 +286,35 @@ str_to_latin(VALUE str, int ascii, int bang)
237
286
  }
238
287
 
239
288
  /**
240
- * Returns a copy of <i>str</i> with the Serbian Cyrillic characters
289
+ * Returns a copy of <i>str</i> with Latin characters transliterated
290
+ * into Serbian Cyrillic.
291
+ *
292
+ * @overload to_cyrillic(str)
293
+ * @param [String] str text to be transliterated
294
+ * @return [String] transliterated text
295
+ */
296
+ static VALUE
297
+ rb_str_to_cyrillic(VALUE self, VALUE str)
298
+ {
299
+ return str_to_srb(str, 0, 0);
300
+ }
301
+
302
+ /**
303
+ * Performs transliteration of <code>Byk.to_cyrillic</code> in place,
304
+ * returning <i>str</i>, whether any changes were made or not.
305
+ *
306
+ * @overload to_cyrillic!(str)
307
+ * @param [String] str text to be transliterated
308
+ * @return [String] transliterated text
309
+ */
310
+ static VALUE
311
+ rb_str_to_cyrillic_bang(VALUE self, VALUE str)
312
+ {
313
+ return str_to_srb(str, 0, 1);
314
+ }
315
+
316
+ /**
317
+ * Returns a copy of <i>str</i> with Serbian Cyrillic characters
241
318
  * transliterated into Latin.
242
319
  *
243
320
  * @overload to_latin(str)
@@ -247,12 +324,12 @@ str_to_latin(VALUE str, int ascii, int bang)
247
324
  static VALUE
248
325
  rb_str_to_latin(VALUE self, VALUE str)
249
326
  {
250
- return str_to_latin(str, 0, 0);
327
+ return str_to_srb(str, 1, 0);
251
328
  }
252
329
 
253
330
  /**
254
- * Performs the transliteration of <code>Byk.to_latin</code> in place,
255
- * returning <i>str</i>, whether changes were made or not.
331
+ * Performs transliteration of <code>Byk.to_latin</code> in place,
332
+ * returning <i>str</i>, whether any changes were made or not.
256
333
  *
257
334
  * @overload to_latin!(str)
258
335
  * @param [String] str text to be transliterated
@@ -261,12 +338,12 @@ rb_str_to_latin(VALUE self, VALUE str)
261
338
  static VALUE
262
339
  rb_str_to_latin_bang(VALUE self, VALUE str)
263
340
  {
264
- return str_to_latin(str, 0, 1);
341
+ return str_to_srb(str, 1, 1);
265
342
  }
266
343
 
267
344
  /**
268
- * Returns a copy of <i>str</i> with the Serbian Cyrillic
269
- * characters transliterated into ASCII Latin.
345
+ * Returns a copy of <i>str</i> with Serbian characters transliterated
346
+ * into ASCII Latin.
270
347
  *
271
348
  * @overload to_ascii_latin(str)
272
349
  * @param [String] str text to be transliterated
@@ -275,12 +352,12 @@ rb_str_to_latin_bang(VALUE self, VALUE str)
275
352
  static VALUE
276
353
  rb_str_to_ascii_latin(VALUE self, VALUE str)
277
354
  {
278
- return str_to_latin(str, 1, 0);
355
+ return str_to_srb(str, 2, 0);
279
356
  }
280
357
 
281
358
  /**
282
- * Performs the transliteration of <code>Byk.to_ascii_latin</code> in
283
- * place, returning <i>str</i>, whether changes were made or not.
359
+ * Performs transliteration of <code>Byk.to_ascii_latin</code> in
360
+ * place, returning <i>str</i>, whether any changes were made or not.
284
361
  *
285
362
  * @overload to_ascii_latin!(str)
286
363
  * @param [String] str text to be transliterated
@@ -289,12 +366,14 @@ rb_str_to_ascii_latin(VALUE self, VALUE str)
289
366
  static VALUE
290
367
  rb_str_to_ascii_latin_bang(VALUE self, VALUE str)
291
368
  {
292
- return str_to_latin(str, 1, 1);
369
+ return str_to_srb(str, 2, 1);
293
370
  }
294
371
 
295
372
  void Init_byk_native(void)
296
373
  {
297
374
  VALUE Byk = rb_define_module("Byk");
375
+ rb_define_singleton_method(Byk, "to_cyrillic", rb_str_to_cyrillic, 1);
376
+ rb_define_singleton_method(Byk, "to_cyrillic!", rb_str_to_cyrillic_bang, 1);
298
377
  rb_define_singleton_method(Byk, "to_latin", rb_str_to_latin, 1);
299
378
  rb_define_singleton_method(Byk, "to_latin!", rb_str_to_latin_bang, 1);
300
379
  rb_define_singleton_method(Byk, "to_ascii_latin", rb_str_to_ascii_latin, 1);
@@ -1,3 +1,3 @@
1
1
  module Byk
2
- VERSION = "0.6.0"
2
+ VERSION = "1.0.0"
3
3
  end
@@ -1,5 +1,4 @@
1
1
  # coding: utf-8
2
-
3
2
  require "spec_helper"
4
3
 
5
4
  describe Byk do
@@ -24,70 +23,114 @@ describe Byk do
24
23
  let(:non_serbian_cyrillic) { non_serbian_cyrillic_coderange.join }
25
24
 
26
25
  let(:ascii) { "The quick brown fox jumps over the lazy dog." }
27
- let(:other) { "संस्कृतम् saṃskṛtam" }
26
+ let(:other) { "संस्कृतम्" }
28
27
 
29
- let(:mixed) { "संस्कृतम् saṃskṛtam илити Sanskrit, obrati ПАЖЊУ." }
30
- let(:mixed_latin) { "संस्कृतम् saṃskṛtam iliti Sanskrit, obrati PAŽNJU." }
31
- let(:mixed_ascii_latin) { "संस्कृतम् saṃskṛtam iliti Sanskrit, obrati PAZNJU." }
28
+ let(:mixed) { "संस्कृतम् илити Sanskrit, obrati ПАЖЊУ." }
29
+ let(:mixed_cyrillic) { "संस्कृतम् илити Санскрит, обрати ПАЖЊУ." }
30
+ let(:mixed_latin) { "संस्कृतम् iliti Sanskrit, obrati PAŽNJU." }
31
+ let(:mixed_ascii_latin) { "संस्कृतम् iliti Sanskrit, obrati PAZNJU." }
32
32
 
33
- it "doesn't convert an empty string" do
33
+ it "doesn't translate an empty string" do
34
34
  expect(Byk.send(method, "")).to eq ""
35
35
  end
36
36
 
37
- it "doesn't convert ASCII text" do
38
- expect(Byk.send(method, ascii)).to eq ascii
37
+ it "doesn't translate foreign coderanges" do
38
+ expect(Byk.send(method, other)).to eq other
39
39
  end
40
+ end
40
41
 
41
- it "doesn't convert non-Serbian Cyrillic" do
42
+ shared_examples :cyrillization_method do |method|
43
+ include_examples :base, method
44
+
45
+ let(:edge_cases) do
46
+ [
47
+ ["lJ", "љ"],
48
+ ["nJ", "њ"],
49
+ ["dŽ", "џ"]
50
+ ]
51
+ end
52
+
53
+ it "doesn't translate Cyrillic" do
54
+ expect(Byk.send(method, pangram)).to eq pangram
55
+ end
56
+
57
+ it "doesn't translate non-Serbian Cyrillic" do
42
58
  expect(Byk.send(method, non_serbian_cyrillic)).to eq non_serbian_cyrillic
43
59
  end
44
60
 
45
- it "doesn't convert other coderanges" do
46
- expect(Byk.send(method, other)).to eq other
61
+ it "translates Latin to Cyrillic" do
62
+ expect(Byk.send(method, pangram_latin)).to eq pangram
63
+ end
64
+
65
+ it "translates Latin caps to Cyrillic caps" do
66
+ expect(Byk.send(method, pangram_latin_caps)).to eq pangram_caps
67
+ end
68
+
69
+ it "translates mixed text properly" do
70
+ expect(Byk.send(method, mixed)).to eq mixed_cyrillic
71
+ end
72
+
73
+ it "translates edge cases properly" do
74
+ edge_cases.each do |input, output|
75
+ expect(Byk.send(method, input)).to eq output
76
+ end
77
+ end
78
+
79
+ it "translates ABECEDA to AZBUKA" do
80
+ expect(Byk::ABECEDA.map { |l| l.dup.send(:to_cyrillic) }).to match_array(Byk::AZBUKA)
81
+ end
82
+
83
+ it "translates ABECEDA_CAPS to AZBUKA_CAPS" do
84
+ expect(Byk::ABECEDA_CAPS.map { |l| l.dup.send(:to_cyrillic) }).to match_array(Byk::AZBUKA_CAPS)
47
85
  end
48
86
  end
49
87
 
50
88
  shared_examples :latinization_method do |method|
51
89
  include_examples :base, method
52
90
 
53
- let(:edge_cases) {
91
+ let(:edge_cases) do
54
92
  [
55
- ["Њ", "Nj"],
56
- ["Љ", "Lj"],
57
- ["Џ", "Dž"],
58
- ["ЊЊ", "NJNJ"],
59
93
  ["ЉЉ", "LJLJ"],
94
+ ["ЊЊ", "NJNJ"],
60
95
  ["ЏЏ", "DŽDŽ"]
61
96
  ]
62
- }
97
+ end
63
98
 
64
- it "doesn't convert Latin" do
99
+ it "doesn't translate ASCII" do
100
+ expect(Byk.send(method, ascii)).to eq ascii
101
+ end
102
+
103
+ it "doesn't translate Latin" do
65
104
  expect(Byk.send(method, pangram_latin)).to eq pangram_latin
66
105
  end
67
106
 
68
- it "converts Cyrillic to Latin" do
107
+ it "doesn't translate non-Serbian Cyrillic" do
108
+ expect(Byk.send(method, non_serbian_cyrillic)).to eq non_serbian_cyrillic
109
+ end
110
+
111
+ it "translates Cyrillic to Latin" do
69
112
  expect(Byk.send(method, pangram)).to eq pangram_latin
70
113
  end
71
114
 
72
- it "converts Cyrillic caps to Latin caps" do
115
+ it "translates Cyrillic caps to Latin caps" do
73
116
  expect(Byk.send(method, pangram_caps)).to eq pangram_latin_caps
74
117
  end
75
118
 
76
- it "converts mixed text properly" do
119
+ it "translates mixed text properly" do
77
120
  expect(Byk.send(method, mixed)).to eq mixed_latin
78
121
  end
79
122
 
80
- it "converts edge cases properly" do
123
+ it "translates edge cases properly" do
81
124
  edge_cases.each do |input, output|
82
125
  expect(Byk.send(method, input)).to eq output
83
126
  end
84
127
  end
85
128
 
86
- it "converts AZBUKA to ABECEDA" do
129
+ it "translates AZBUKA to ABECEDA" do
87
130
  expect(Byk::AZBUKA.map { |l| l.dup.send(method) }).to match_array(Byk::ABECEDA)
88
131
  end
89
132
 
90
- it "converts AZBUKA_CAPS to ABECEDA_CAPS" do
133
+ it "translates AZBUKA_CAPS to ABECEDA_CAPS" do
91
134
  expect(Byk::AZBUKA_CAPS.map { |l| l.dup.send(method) }).to match_array(Byk::ABECEDA_CAPS)
92
135
  end
93
136
  end
@@ -95,7 +138,7 @@ describe Byk do
95
138
  shared_examples :ascii_latinization_method do |method|
96
139
  include_examples :base, method
97
140
 
98
- let(:edge_cases) {
141
+ let(:edge_cases) do
99
142
  [
100
143
  ["Њ", "Nj"],
101
144
  ["Љ", "Lj"],
@@ -107,32 +150,36 @@ describe Byk do
107
150
  ["ЏЏ", "DZDZ"],
108
151
  ["ЂЂ", "DJDJ"],
109
152
  ["ĐĐ", "DJDJ"],
110
- ["ЂУРАЂ Ђорђевић", "DJURADJ Djordjevic"],
111
- ["ĐURAĐ Đorđević", "DJURADJ Djordjevic"]
153
+ ["ЂУРАЂ Ђурђевић", "DJURADJ Djurdjevic"],
154
+ ["ĐURAĐ Đurđević", "DJURADJ Djurdjevic"]
112
155
  ]
113
- }
114
-
115
- it "converts Cyrillic to ASCII Latin" do
116
- expect(Byk.send(method, pangram)).to eq pangram_ascii_latin
117
156
  end
118
157
 
119
- it "converts Cyrillic caps to ASCII Latin caps" do
120
- expect(Byk.send(method, pangram_caps)).to eq pangram_ascii_latin_caps
158
+ it "doesn't translate ASCII" do
159
+ expect(Byk.send(method, ascii)).to eq ascii
121
160
  end
122
161
 
123
- it "converts Latin to ASCII Latin" do
162
+ it "translates Latin to ASCII Latin" do
124
163
  expect(Byk.send(method, pangram_latin)).to eq pangram_ascii_latin
125
164
  end
126
165
 
127
- it "converts Latin caps to ASCII Latin caps" do
166
+ it "translates Latin caps to ASCII Latin caps" do
128
167
  expect(Byk.send(method, pangram_latin_caps)).to eq pangram_ascii_latin_caps
129
168
  end
130
169
 
131
- it "converts mixed text properly" do
170
+ it "translates Cyrillic to ASCII Latin" do
171
+ expect(Byk.send(method, pangram)).to eq pangram_ascii_latin
172
+ end
173
+
174
+ it "translates Cyrillic caps to ASCII Latin caps" do
175
+ expect(Byk.send(method, pangram_caps)).to eq pangram_ascii_latin_caps
176
+ end
177
+
178
+ it "translates mixed text properly" do
132
179
  expect(Byk.send(method, mixed)).to eq mixed_ascii_latin
133
180
  end
134
181
 
135
- it "converts edge cases properly" do
182
+ it "translates edge cases properly" do
136
183
  edge_cases.each do |input, output|
137
184
  expect(Byk.send(method, input)).to eq output
138
185
  end
@@ -141,18 +188,28 @@ describe Byk do
141
188
 
142
189
  shared_examples :non_destructive_method do |method|
143
190
  it "doesn't modify the arg" do
144
- str = "Ж"
191
+ str = "ЖŽ"
145
192
  expect { Byk.send(method, str) }.to_not change { str }
146
193
  end
147
194
  end
148
195
 
149
196
  shared_examples :destructive_method do |method|
150
197
  it "modifies the arg" do
151
- str = "Ж"
198
+ str = "ЖŽ"
152
199
  expect { Byk.send(method, str) }.to change { str }
153
200
  end
154
201
  end
155
202
 
203
+ describe ".to_cyrillic" do
204
+ it_behaves_like :cyrillization_method, :to_cyrillic
205
+ it_behaves_like :non_destructive_method, :to_cyrillic
206
+ end
207
+
208
+ describe ".to_cyrillic!" do
209
+ it_behaves_like :cyrillization_method, :to_cyrillic!
210
+ it_behaves_like :destructive_method, :to_cyrillic!
211
+ end
212
+
156
213
  describe ".to_latin" do
157
214
  it_behaves_like :latinization_method, :to_latin
158
215
  it_behaves_like :non_destructive_method, :to_latin
@@ -176,7 +233,7 @@ end
176
233
 
177
234
  describe String do
178
235
  it "responds to Byk methods" do
179
- Byk.instance_methods.each do |method|
236
+ Byk.singleton_methods.each do |method|
180
237
  expect("").to respond_to(method)
181
238
  end
182
239
  end
metadata CHANGED
@@ -1,15 +1,29 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: byk
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.6.0
4
+ version: 1.0.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Nikola Topalović
8
8
  autorequire:
9
- bindir: bin
9
+ bindir: exe
10
10
  cert_chain: []
11
- date: 2015-04-25 00:00:00.000000000 Z
11
+ date: 2016-04-09 00:00:00.000000000 Z
12
12
  dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: rake
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '10.5'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '10.5'
13
27
  - !ruby/object:Gem::Dependency
14
28
  name: rake-compiler
15
29
  requirement: !ruby/object:Gem::Requirement
@@ -38,10 +52,11 @@ dependencies:
38
52
  - - "~>"
39
53
  - !ruby/object:Gem::Version
40
54
  version: '3.2'
41
- description: Provides C-optimized methods for transliteration of Serbian Cyrillic
42
- into Latin.
55
+ description: Fast transliteration of Serbian Cyrillic to Latin and back. Brzo preslovljavanje
56
+ ćirilice u latinicu i obratno.
43
57
  email: nikola.topalovic@gmail.com
44
- executables: []
58
+ executables:
59
+ - byk
45
60
  extensions:
46
61
  - ext/byk/extconf.rb
47
62
  extra_rdoc_files: []
@@ -49,6 +64,7 @@ files:
49
64
  - CHANGELOG.md
50
65
  - LICENSE
51
66
  - README.md
67
+ - exe/byk
52
68
  - ext/byk/byk.c
53
69
  - ext/byk/extconf.rb
54
70
  - lib/byk.rb
@@ -76,9 +92,10 @@ required_rubygems_version: !ruby/object:Gem::Requirement
76
92
  version: '0'
77
93
  requirements: []
78
94
  rubyforge_project:
79
- rubygems_version: 2.4.5
95
+ rubygems_version: 2.5.1
80
96
  signing_key:
81
97
  specification_version: 4
82
- summary: Fast transliteration of Serbian Cyrillic into Latin.
98
+ summary: Fast transliteration of Serbian Cyrillic to Latin and back. Brzo preslovljavanje
99
+ ćirilice u latinicu i obratno.
83
100
  test_files:
84
101
  - spec/byk_spec.rb