byk 0.6.0 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: 34203e0b4291cde495d17da65522df586de7e712
-   data.tar.gz: 290d743dab23c58241520252bd81d4ae4115ce98
+   metadata.gz: cc996c9d9dc81f884e02cc1dd760eeb57b6545fc
+   data.tar.gz: de07860c2cb41bcb39b299fee4500fd2bf01db73
  SHA512:
-   metadata.gz: f11d00e9ac1057a5596804e03c6c4a6c41841bedc21030a9ed776cfbaaabba85a341a62de71c990c8deadc8f8384bf263b41d477b33b299af26b55acef47fe0c
-   data.tar.gz: 335ddfeca9f6793f2887c1cc93cfc916e011f7dd01fd97073162871148d0fe61395bdb1115c5ed4f7583ff207f6b6d27462c8920a7a70b016b9427420796bc28
+   metadata.gz: 16e97855924c380b205e2e651fdcde391785fe051c2971d948f801ff4260eb691dc4c3304ac17b3083fc0a2469f26d134c9622f74b058f6950d5fd8dfaf62383
+   data.tar.gz: c85659aaaccbc5e1db30305b52e2f4955de160dcb7a617ae564877619fe5f36d852ea7c17228f301e639aae3d4793133baa5d644faeffc83942d6b179bef53e9
@@ -1,5 +1,10 @@
  # Changelog

+ ### Byk 1.0.0 (2016-04-09)
+
+ * Introduced `#to_cyrillic` and `#to_cyrillic!`
+ * Introduced console utility
+
  ### Byk 0.6.0 (2015-04-25)

  * Introduced module methods and the optional safe require
data/README.md CHANGED
@@ -4,39 +4,85 @@ Byk
4
4
  [![Gem Version](https://badge.fury.io/rb/byk.svg)](https://rubygems.org/gems/byk)
5
5
  [![Build Status](https://travis-ci.org/topalovic/byk.svg?branch=master)](https://travis-ci.org/topalovic/byk)
6
6
 
7
- Ruby gem for fast transliteration of Serbian Cyrillic into Latin
8
- <br />
9
- <sub>Inspired by @dejan's
10
- [nice little gem](https://github.com/dejan/srbovanje),
11
- this one comes with a C-optimized twist</sub>
7
+ Ruby gem for fast transliteration of Serbian Cyrillic ↔ Latin
12
8
 
13
9
  ![byk](https://cloud.githubusercontent.com/assets/626128/7155207/07545960-e35d-11e4-804e-5fdee70a3e30.png)
14
10
 
15
11
 
16
12
  ## Installation
17
13
 
18
- Add this line to your application's Gemfile:
14
+ Byk can be used as a standalone console utility or as a `String`
15
+ extension in your Ruby programs. It has zero dependencies beyond
16
+ vanilla Ruby and the toolchain for building native gems <sup>1</sup>.
17
+
18
+ You can install it directly:
19
+
20
+ ```sh
21
+ $ gem install byk
22
+ ```
23
+
24
+ or add it as a dependency in your application's Gemfile:
19
25
 
20
26
  ```ruby
21
27
  gem "byk"
22
28
  ```
23
29
 
24
- And then execute:
30
+ <sub><sup>1</sup> For Windows, you might want to check out
31
+ [DevKit](https://github.com/oneclick/rubyinstaller/wiki/Development-Kit)</sub>
32
+
33
+
34
+ ## Usage
35
+
36
+ ### As a standalone utility
37
+
38
+ Here's the help banner with all the available options:
25
39
 
26
40
  ```
27
- $ bundle
41
+ usage: byk [options] [files]
42
+
43
+ options:
44
+ -c, --cyrillic convert input to Cyrillic (default)
45
+ -l, --latin convert input to Latin
46
+ -a, --ascii convert input to "ASCII Latin"
47
+ -v, --version show version
28
48
  ```
29
49
 
30
- Or install it yourself as:
50
+ Translation goes to stdout so you can redirect it or pipe it as you
51
+ see fit. Let's take a look at some common scenarios.
31
52
 
53
+ To translate files to Cyrillic:
54
+ ```sh
55
+ $ byk in1.txt in2.txt > out.txt
32
56
  ```
33
- $ gem install byk
57
+
58
+ To translate files to Latin and search for a phrase:
59
+ ```sh
60
+ $ byk -l file.txt | grep stvar
34
61
  ```
35
62
 
63
+ Ad hoc conversion:
64
+ ```sh
65
+ $ echo "Вук Стефановић Караџић" | byk -a
66
+ Vuk Stefanovic Karadzic
67
+ ```
36
68
 
37
- ## Usage
69
+ or simply omit args and type away:
70
+ ```sh
71
+ $ byk
72
+ a u ruke Mandušića Vuka
73
+ biće svaka puška ubojita!
74
+ ^D
75
+ а у руке Мандушића Вука
76
+ биће свака пушка убојита!
77
+ ```
38
78
 
39
- First, make sure to require the gem in your initializer:
79
+ `^D` being <kbd>ctrl</kbd> <kbd>d</kbd>.
80
+
81
+
82
+ ### As a `String` extension
83
+
84
+ Unless you're using Bundler, make sure to require the gem in your
85
+ initializer:
40
86
 
41
87
  ```ruby
42
88
  require "byk"
@@ -45,22 +91,23 @@ require "byk"
45
91
  This will extend `String` with a couple of simple methods:
46
92
 
47
93
  ```ruby
48
- "Шеширџија".to_latin # => "Šeširdžija"
49
- "Шеширџија".to_ascii_latin # => "Sesirdzija"
50
- "Šeširdžija".to_ascii_latin # => "Sesirdzija"
94
+ "Šeširdžija".to_cyrillic # => "Шеширџија"
95
+ "Шеширџија".to_latin # => "Šeširdžija"
96
+ "Шеширџија".to_ascii_latin # => "Sesirdzija"
51
97
  ```
52
98
 
53
- There's also a destructive variant of each:
99
+ These do not modify the receiver. For that, there's a destructive
100
+ variant of each:
54
101
 
55
102
  ```ruby
56
- text = "Жвазбука"
57
- text.to_latin! # => "Žvazbuka"
58
- text # => "Žvazbuka"
59
- text.to_ascii_latin! # => "Zvazbuka"
60
- text # => "Zvazbuka"
103
+ text = "Šeširdžija"
104
+ text.to_cyrillic! # => "Шеширџија"
105
+ text.to_latin! # => "Šeširdžija"
106
+ text.to_ascii_latin! # => "Sesirdzija"
107
+ text # => "Sesirdzija"
61
108
  ```
62
109
 
63
- Note that these methods take into account the
110
+ Note that both latinization methods observe
64
111
  [digraph capitalization rules](http://sr.wikipedia.org/wiki/Гајица#.D0.94.D0.B8.D0.B3.D1.80.D0.B0.D1.84.D0.B8):
65
112
 
66
113
  ```ruby
@@ -68,63 +115,88 @@ Note that these methods take into account the
68
115
  "ĐORĐE Đorđević".to_ascii_latin # => "DJORDJE Djordjevic"
69
116
  ```
70
117
 
71
- If you prefer not to monkey patch your strings, you can use the "safe"
72
- require:
118
+
119
+ ### Safe require
120
+
121
+ If you prefer not to monkey patch `String`, you can do a "safe"
122
+ require in your Gemfile:
123
+
73
124
 
74
125
  ```ruby
75
- require "byk/safe"
126
+ gem "byk", :require => "byk/safe"
76
127
  ```
77
128
 
78
- and then call the module methods:
129
+ or initializer:
79
130
 
80
131
  ```ruby
81
- text = "Вук"
82
- Byk.to_latin(text) # => "Vuk"
83
- text # => "Byk"
84
- Byk.to_latin!(text) # => "Vuk"
85
- text # => "Vuk"
132
+ require "byk/safe"
86
133
  ```
87
134
 
135
+ Then, you should rely on module methods:
88
136
 
89
- ## Testing
137
+ ```ruby
138
+ text = "Жвазбука"
90
139
 
91
- To test the gem, clone the repo and run:
140
+ Byk.to_latin(text) # => "Žvazbuka"
141
+ text # => "Жвазбука"
142
+
143
+ Byk.to_latin!(text) # => "Žvazbuka"
144
+ text # => "Žvazbuka"
92
145
 
146
+ # etc.
93
147
  ```
94
- $ bundle
95
- $ bundle exec rake
148
+
149
+
150
+ ## How fast is "fast" transliteration?
151
+
152
+ Here's a quick test:
153
+
154
+ ```sh
155
+ $ wget https://sr.wikipedia.org/ -O sample
156
+ $ du -h sample
157
+ 128K
158
+
159
+ $ time byk -l sample > /dev/null
160
+ 0.08s user 0.04s system 96% cpu 0.126 total
96
161
  ```
97
162
 
163
+ Let's up the ante:
164
+
165
+ ```sh
166
+ $ for i in {1..800}; do cat sample; done > big
167
+ $ du -h big
168
+ 97M
169
+
170
+ $ time byk -l big > /dev/null
171
+ 1.71s user 0.13s system 99% cpu 1.846 total
172
+ ```
98
173
 
99
- ## How fast is fast?
174
+ So, ~100MB in under 2s. Fast enough, I suppose. You can expect it to
175
+ scale linearly.
100
176
 
101
- About [10-40x faster](benchmark) than the baseline Ruby implementation
102
- on my hardware, depending on the string's Cyrillic content ratio. YMMV
103
- of course.
177
+ Compared to the pure Ruby implementation, it is about
178
+ [10-30x faster](benchmark), depending on the input composition and the
179
+ transliteration method applied.
104
180
 
105
181
 
106
- ## Raison d'être
182
+ ## Testing
107
183
 
108
- This kind of speed-up might be worthwhile for massive localization
109
- projects, e.g. sites supporting dual script content. Remember,
110
- `Benchmark` is your friend.
184
+ To test the gem, clone the repo and run:
111
185
 
112
- I found transliteration to be a straightforward little problem that
113
- lends itself well to optimization. It also gave me an excuse to play
114
- with Ruby extensions, so there :smirk_cat:
186
+ ```
187
+ $ bundle && bundle exec rake
188
+ ```
115
189
 
116
190
 
117
191
  ## Compatibility
118
192
 
119
- Byk is supported under MRI Ruby >= 1.9.2.
193
+ Byk is supported under MRI 1.9.2+. I might try my hand at writing a
194
+ JRuby extension in a future release.
120
195
 
121
- I don't plan to support 1.8.7 or older due to substantial C API
122
- changes between 1.8 and 1.9. It doesn't build under Rubinius
123
- currently, but I intend to support it in future releases.
124
196
 
125
197
 
126
198
  ## License
127
199
 
128
- This gem is released under the [MIT License](http://www.opensource.org/licenses/MIT).
200
+ This gem is released under the [MIT License](LICENSE).
129
201
 
130
202
  Уздравље!
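The digraph capitalization rule mentioned in the README can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the gem's C implementation: `LOWER`, `upper?` and `sketch_to_latin` are hypothetical names, and the sketch relies on Unicode-aware `String#downcase` (Ruby 2.4+). An uppercase Cyrillic letter that maps to a digraph is fully capitalized when one of its neighbours is uppercase, and title-cased otherwise.

```ruby
# Hypothetical pure-Ruby sketch of Cyrillic -> Latin transliteration.
# Not part of Byk; it only illustrates the digraph capitalization rule.
CYR = "абвгдђежзијклљмнњопрстћуфхцчџш".chars
LAT = %w[a b v g d đ e ž z i j k l lj m n nj o p r s t ć u f h c č dž š]
LOWER = CYR.zip(LAT).to_h

# True for any character that changes when downcased (nil-safe).
def upper?(ch)
  !ch.nil? && ch != ch.downcase
end

def sketch_to_latin(text)
  chars = text.chars
  mapped = chars.each_with_index.map do |ch, i|
    lat = LOWER[ch.downcase]
    next ch unless lat             # not Serbian Cyrillic: copy through
    next lat if ch == ch.downcase  # lowercase input maps directly

    # Uppercase input: capitalize the whole digraph if a neighbour is uppercase.
    all_caps = (i > 0 && upper?(chars[i - 1])) || upper?(chars[i + 1])
    all_caps ? lat.upcase : lat.capitalize
  end
  mapped.join
end

puts sketch_to_latin("Шеширџија")  # => Šeširdžija
puts sketch_to_latin("ЉЉ")         # => LJLJ
puts sketch_to_latin("Њ")          # => Nj
```

A table lookup like this is easy to read but allocates a string per character, which is the kind of overhead a native extension avoids.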
data/exe/byk ADDED
@@ -0,0 +1,51 @@
+ #!/usr/bin/env ruby
+
+ require "byk/safe"
+ require "optparse"
+
+ trap "SIGINT" do
+   exit 130
+ end
+
+ method_name = :to_cyrillic
+
+ opts = OptionParser.new do |opt|
+   opt.banner = "usage: byk [options] [files]"
+   opt.summary_width = 20
+
+   opt.separator ""
+   opt.separator "options:"
+
+   opt.on("-c", "--cyrillic", "convert input to Cyrillic (default)") do
+     method_name = :to_cyrillic
+   end
+
+   opt.on("-l", "--latin", "convert input to Latin") do
+     method_name = :to_latin
+   end
+
+   opt.on("-a", "--ascii", 'convert input to "ASCII Latin"') do
+     method_name = :to_ascii_latin
+   end
+
+   opt.on_tail("-v", "--version", "show version") do
+     puts Byk::VERSION
+     exit
+   end
+ end
+
+ begin
+   opts.parse!
+ rescue OptionParser::InvalidOption => e
+   puts e
+   puts
+   puts opts
+   exit 1
+ end
+
+ begin
+   puts Byk.send(method_name, ARGF.read)
+ rescue => e
+   puts e
+   exit 1
+ end
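The flag handling in the script above boils down to "the last conversion flag wins, Cyrillic by default". A minimal sketch of that dispatch, pulled into a function so it can be exercised without touching `ARGV` or `ARGF`, might look like the following. `pick_method` is a hypothetical helper and not part of Byk; it uses only the standard `optparse` library.

```ruby
require "optparse"

# Hypothetical helper (not in Byk): maps command-line flags to the name of
# the module method to call, and returns the remaining file arguments.
def pick_method(argv)
  method_name = :to_cyrillic # default, as in the script above

  parser = OptionParser.new do |opt|
    opt.on("-c", "--cyrillic") { method_name = :to_cyrillic }
    opt.on("-l", "--latin")    { method_name = :to_latin }
    opt.on("-a", "--ascii")    { method_name = :to_ascii_latin }
  end

  # parse (without the bang) leaves argv untouched and returns non-option args
  files = parser.parse(argv)
  [method_name, files]
end

p pick_method(%w[-l file.txt]) # => [:to_latin, ["file.txt"]]
p pick_method([])              # => [:to_cyrillic, []]
```

As in the script, an unknown flag raises `OptionParser::InvalidOption`, which the caller is expected to rescue.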
@@ -3,103 +3,225 @@
3
3
 
4
4
  #define STR_ENC_GET(str) rb_enc_from_index(ENCODING_GET(str))
5
5
 
6
- #define STR_CAT_COND_ASCII(ascii, dest, chr, ascii_chr, len, enc) \
7
- ascii ? rb_str_buf_cat(dest, chr, len) \
8
- : str_cat_char(dest, ascii_chr, enc)
6
+ static inline void
7
+ _str_cat_char(VALUE str, unsigned c, rb_encoding *enc)
8
+ {
9
+ char s[16];
10
+ int n = rb_enc_codelen(c, enc);
11
+ rb_enc_mbcput(c, s, enc);
12
+ rb_str_buf_cat(str, s, n);
13
+ }
9
14
 
10
15
  enum {
11
- LAT_CAP_TJ = 0x106,
12
- LAT_TJ,
13
- LAT_CAP_CH = 0x10c,
14
- LAT_CH,
15
- LAT_CAP_DJ = 0x110,
16
- LAT_DJ,
17
- LAT_CAP_SH = 0x160,
18
- LAT_SH,
19
- LAT_CAP_ZH = 0x17d,
20
- LAT_ZH,
21
- CYR_CAP_DJ = 0x402,
22
- CYR_CAP_J = 0x408,
23
- CYR_CAP_LJ,
24
- CYR_CAP_NJ,
25
- CYR_CAP_TJ,
26
- CYR_CAP_DZ = 0x40f,
27
- CYR_CAP_A,
28
- CYR_CAP_ZH = 0x416,
29
- CYR_CAP_C = 0x426,
30
- CYR_CAP_CH,
31
- CYR_CAP_SH,
32
- CYR_A = 0x430,
33
- CYR_ZH = 0x436,
34
- CYR_C = 0x446,
35
- CYR_CH,
36
- CYR_SH,
37
- CYR_DJ = 0x452,
38
- CYR_J = 0x458,
39
- CYR_LJ,
40
- CYR_NJ,
41
- CYR_TJ,
42
- CYR_DZ = 0x45f
16
+ LAT_CAP_TJ=262, LAT_TJ, LAT_CAP_CH=268, LAT_CH,
17
+ LAT_CAP_DJ=272, LAT_DJ, LAT_CAP_SH=352, LAT_SH,
18
+ LAT_CAP_ZH=381, LAT_ZH, CYR_CAP_DJ=1026, CYR_CAP_J=1032,
19
+ CYR_CAP_LJ, CYR_CAP_NJ, CYR_CAP_TJ, CYR_CAP_DZ=1039,
20
+ CYR_CAP_A, CYR_CAP_B, CYR_CAP_V, CYR_CAP_G,
21
+ CYR_CAP_D, CYR_CAP_E, CYR_CAP_ZH, CYR_CAP_Z,
22
+ CYR_CAP_I, CYR_CAP_K=1050, CYR_CAP_L, CYR_CAP_M,
23
+ CYR_CAP_N, CYR_CAP_O, CYR_CAP_P, CYR_CAP_R,
24
+ CYR_CAP_S, CYR_CAP_T, CYR_CAP_U, CYR_CAP_F,
25
+ CYR_CAP_H, CYR_CAP_C, CYR_CAP_CH, CYR_CAP_SH,
26
+ CYR_A=1072, CYR_B, CYR_V, CYR_G, CYR_D,
27
+ CYR_E, CYR_ZH, CYR_Z, CYR_I, CYR_K=1082,
28
+ CYR_L, CYR_M, CYR_N, CYR_O, CYR_P,
29
+ CYR_R, CYR_S, CYR_T, CYR_U, CYR_F,
30
+ CYR_H, CYR_C, CYR_CH, CYR_SH, CYR_DJ=1106,
31
+ CYR_J=1112, CYR_LJ, CYR_NJ, CYR_TJ, CYR_DZ=1119
43
32
  };
44
33
 
45
- static inline unsigned int
46
- is_cyrillic(unsigned int c)
34
+ static inline unsigned
35
+ is_cap(unsigned codepoint)
47
36
  {
48
- return c >= CYR_CAP_DJ && c <= CYR_DZ;
37
+ if (codepoint >= 65 && codepoint <= 90) return 1;
38
+ if (codepoint >= CYR_CAP_DJ && codepoint <= CYR_CAP_SH) return 1;
39
+
40
+ switch(codepoint) {
41
+ case LAT_CAP_TJ:
42
+ case LAT_CAP_CH:
43
+ case LAT_CAP_DJ:
44
+ case LAT_CAP_SH:
45
+ case LAT_CAP_ZH:
46
+ return 1;
47
+ default:
48
+ return 0;
49
+ }
49
50
  }
50
51
 
51
- static inline unsigned int
52
- is_upper(unsigned int c)
52
+ static inline unsigned
53
+ is_digraph(unsigned codepoint)
53
54
  {
54
- return (c >= 65 && c <= 90)
55
- || (c >= CYR_CAP_DJ && c <= CYR_CAP_SH)
56
- || c == LAT_CAP_TJ
57
- || c == LAT_CAP_CH
58
- || c == LAT_CAP_DJ
59
- || c == LAT_CAP_SH
60
- || c == LAT_CAP_ZH;
55
+ switch(codepoint) {
56
+ case CYR_LJ:
57
+ case CYR_NJ:
58
+ case CYR_DZ:
59
+ case CYR_CAP_LJ:
60
+ case CYR_CAP_NJ:
61
+ case CYR_CAP_DZ:
62
+ return 1;
63
+ default:
64
+ return 0;
65
+ }
61
66
  }
62
67
 
63
- static inline unsigned int
64
- maps_directly(unsigned int c)
68
+ static unsigned
69
+ digraph_to_cyr(unsigned codepoint, unsigned codepoint2, unsigned capitalize, unsigned *next_out)
65
70
  {
66
- return c != CYR_ZH
67
- && c != CYR_CAP_ZH
68
- && ((c >= CYR_A && c <= CYR_C) || (c >= CYR_CAP_A && c <= CYR_CAP_C));
71
+ static unsigned CYR_MAP[] = {
72
+ CYR_A, CYR_B, CYR_C, CYR_D, CYR_E, CYR_F,
73
+ CYR_G, CYR_H, CYR_I, CYR_J, CYR_K, CYR_L,
74
+ CYR_M, CYR_N, CYR_O, CYR_P, 0, CYR_R,
75
+ CYR_S, CYR_T, CYR_U, CYR_V, 0, 0, 0, CYR_Z
76
+ };
77
+
78
+ static unsigned CYR_CAPS_MAP[] = {
79
+ CYR_CAP_A, CYR_CAP_B, CYR_CAP_C, CYR_CAP_D, CYR_CAP_E, CYR_CAP_F,
80
+ CYR_CAP_G, CYR_CAP_H, CYR_CAP_I, CYR_CAP_J, CYR_CAP_K, CYR_CAP_L,
81
+ CYR_CAP_M, CYR_CAP_N, CYR_CAP_O, CYR_CAP_P, 0, CYR_CAP_R,
82
+ CYR_CAP_S, CYR_CAP_T, CYR_CAP_U, CYR_CAP_V, 0, 0, 0, CYR_CAP_Z
83
+ };
84
+
85
+ if (codepoint2 == LAT_CAP_ZH || codepoint2 == LAT_ZH) {
86
+ switch (codepoint) {
87
+ case 'd': return CYR_DZ;
88
+ case 'D': return CYR_CAP_DZ;
89
+ }
90
+ }
91
+
92
+ if (codepoint2 == 'j' || codepoint2 == 'J') {
93
+ switch (codepoint) {
94
+ case 'l': return CYR_LJ;
95
+ case 'n': return CYR_NJ;
96
+ case 'L': return CYR_CAP_LJ;
97
+ case 'N': return CYR_CAP_NJ;
98
+ }
99
+ }
100
+
101
+ if (codepoint >= 'a' && codepoint <= 'z') return CYR_MAP[codepoint - 'a'];
102
+ if (codepoint >= 'A' && codepoint <= 'Z') return CYR_CAPS_MAP[codepoint - 'A'];
103
+
104
+ switch (codepoint) {
105
+ case LAT_CH: return CYR_CH;
106
+ case LAT_DJ: return CYR_DJ;
107
+ case LAT_SH: return CYR_SH;
108
+ case LAT_TJ: return CYR_TJ;
109
+ case LAT_ZH: return CYR_ZH;
110
+ case LAT_CAP_CH: return CYR_CAP_CH;
111
+ case LAT_CAP_DJ: return CYR_CAP_DJ;
112
+ case LAT_CAP_SH: return CYR_CAP_SH;
113
+ case LAT_CAP_TJ: return CYR_CAP_TJ;
114
+ case LAT_CAP_ZH: return CYR_CAP_ZH;
115
+ }
116
+
117
+ return 0;
69
118
  }
70
119
 
71
- static void
72
- str_cat_char(VALUE str, unsigned int c, rb_encoding *enc)
120
+ static unsigned
121
+ digraph_to_latin(unsigned codepoint, unsigned codepoint2, unsigned capitalize, unsigned *next_out)
73
122
  {
74
- char s[16];
75
- int n = rb_enc_codelen(c, enc);
76
- rb_enc_mbcput(c, s, enc);
77
- rb_str_buf_cat(str, s, n);
123
+ static char LAT_MAP[] = {
124
+ 'a', 'b', 'v', 'g', 'd', 'e', 0, 'z', 'i', 0, 'k', 'l',
125
+ 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', 'h', 'c'
126
+ };
127
+
128
+ static char LAT_CAPS_MAP[] = {
129
+ 'A', 'B', 'V', 'G', 'D', 'E', 0, 'Z', 'I', 0, 'K', 'L',
130
+ 'M', 'N', 'O', 'P', 'R', 'S', 'T', 'U', 'F', 'H', 'C'
131
+ };
132
+
133
+ if (codepoint < CYR_CAP_DJ || codepoint > CYR_DZ) return 0;
134
+
135
+ switch (codepoint) {
136
+ case CYR_ZH: return LAT_ZH;
137
+ case CYR_CAP_ZH: return LAT_CAP_ZH;
138
+ }
139
+
140
+ if (codepoint >= CYR_A && codepoint <= CYR_C)
141
+ return LAT_MAP[codepoint - CYR_A];
142
+
143
+ if (codepoint >= CYR_CAP_A && codepoint <= CYR_CAP_C)
144
+ return LAT_CAPS_MAP[codepoint - CYR_CAP_A];
145
+
146
+ if (codepoint >= CYR_A) {
147
+ switch (codepoint) {
148
+ case CYR_J: return 'j';
149
+ case CYR_TJ: return LAT_TJ;
150
+ case CYR_CH: return LAT_CH;
151
+ case CYR_SH: return LAT_SH;
152
+ case CYR_DJ: return LAT_DJ;
153
+ case CYR_LJ: *next_out = 'j'; return 'l';
154
+ case CYR_NJ: *next_out = 'j'; return 'n';
155
+ case CYR_DZ: *next_out = LAT_ZH; return 'd';
156
+ }
157
+ }
158
+ else {
159
+ switch (codepoint) {
160
+ case CYR_CAP_J: return 'J';
161
+ case CYR_CAP_TJ: return LAT_CAP_TJ;
162
+ case CYR_CAP_CH: return LAT_CAP_CH;
163
+ case CYR_CAP_SH: return LAT_CAP_SH;
164
+ case CYR_CAP_DJ: return LAT_CAP_DJ;
165
+ case CYR_CAP_LJ: *next_out = (capitalize || is_cap(codepoint2)) ? 'J' : 'j'; return 'L';
166
+ case CYR_CAP_NJ: *next_out = (capitalize || is_cap(codepoint2)) ? 'J' : 'j'; return 'N';
167
+ case CYR_CAP_DZ: *next_out = (capitalize || is_cap(codepoint2)) ? LAT_CAP_ZH : LAT_ZH; return 'D';
168
+ }
169
+ }
170
+
171
+ return 0;
172
+ }
173
+
174
+ static unsigned
175
+ digraph_to_ascii(unsigned codepoint, unsigned codepoint2, unsigned capitalize, unsigned *next_out)
176
+ {
177
+ switch (codepoint) {
178
+ case LAT_TJ:
179
+ case LAT_CH:
180
+ case CYR_TJ:
181
+ case CYR_CH: return 'c';
182
+ case LAT_SH:
183
+ case CYR_SH: return 's';
184
+ case LAT_ZH:
185
+ case CYR_ZH: return 'z';
186
+ case LAT_DJ:
187
+ case CYR_DJ: *next_out = 'j'; return 'd';
188
+ case LAT_CAP_TJ:
189
+ case LAT_CAP_CH:
190
+ case CYR_CAP_TJ:
191
+ case CYR_CAP_CH: return 'C';
192
+ case LAT_CAP_SH:
193
+ case CYR_CAP_SH: return 'S';
194
+ case LAT_CAP_ZH:
195
+ case CYR_CAP_ZH: return 'Z';
196
+ case LAT_CAP_DJ:
197
+ case CYR_CAP_DJ:
198
+ *next_out = (capitalize || is_cap(codepoint2)) ? 'J' : 'j'; return 'D';
199
+ case CYR_DZ:
200
+ *next_out = (capitalize || is_cap(codepoint2)) ? 'Z' : 'z'; return 'd';
201
+ case CYR_CAP_DZ:
202
+ *next_out = (capitalize || is_cap(codepoint2)) ? 'Z' : 'z'; return 'D';
203
+ default:
204
+ return digraph_to_latin(codepoint, codepoint2, capitalize, next_out);
205
+ }
78
206
  }
79
207
 
80
208
  static VALUE
81
- str_to_latin(VALUE str, int ascii, int bang)
209
+ str_to_srb(VALUE str, int strategy, int bang)
82
210
  {
83
211
  VALUE dest;
84
- long dest_len;
212
+ rb_encoding *enc;
213
+
85
214
  int len, next_len;
86
- int seen_upper = 0;
87
- int force_upper = 0;
215
+ unsigned in, in2, out, out2, seen_cap = 0;
88
216
  char *pos, *end, *seq_start = 0;
89
- char cyr;
90
- unsigned int codepoint = 0;
91
- unsigned int next_codepoint = 0;
92
- rb_encoding *enc;
93
217
 
94
- char CYR_MAP[] = {
95
- 'a', 'b', 'v', 'g', 'd', 'e', '\0', 'z', 'i', '\0', 'k',
96
- 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', 'h', 'c'
97
- };
218
+ unsigned (*method)(unsigned, unsigned, unsigned, unsigned*);
98
219
 
99
- char CYR_CAPS_MAP[] = {
100
- 'A', 'B', 'V', 'G', 'D', 'E', '\0', 'Z', 'I', '\0', 'K',
101
- 'L', 'M', 'N', 'O', 'P', 'R', 'S', 'T', 'U', 'F', 'H', 'C'
102
- };
220
+ switch(strategy) {
221
+ case 0: method = &digraph_to_cyr; break;
222
+ case 1: method = &digraph_to_latin; break;
223
+ default: method = &digraph_to_ascii;
224
+ }
103
225
 
104
226
  StringValue(str);
105
227
  pos = RSTRING_PTR(str);
@@ -107,123 +229,50 @@ str_to_latin(VALUE str, int ascii, int bang)
107
229
 
108
230
  end = RSTRING_END(str);
109
231
  enc = STR_ENC_GET(str);
110
- dest_len = RSTRING_LEN(str) + 30;
111
- dest = rb_str_buf_new(dest_len);
232
+ dest = rb_str_buf_new(RSTRING_LEN(str) + 30);
112
233
  rb_enc_associate(dest, enc);
113
234
 
114
- codepoint = rb_enc_codepoint_len(pos, end, &len, enc);
235
+ in = rb_enc_codepoint_len(pos, end, &len, enc);
115
236
 
116
237
  while (pos < end) {
117
- if (pos + len < end) {
118
- next_codepoint = rb_enc_codepoint_len(pos + len, end, &next_len, enc);
119
- }
238
+ in2 = out2 = 0;
120
239
 
121
- /* Latin -> "ASCII Latin" conversion */
122
- if (ascii && codepoint >= LAT_CAP_TJ && codepoint <= LAT_ZH) {
123
- if (seq_start) {
124
- rb_str_buf_cat(dest, seq_start, pos - seq_start);
125
- seq_start = 0;
126
- }
240
+ if (pos + len < end)
241
+ in2 = rb_enc_codepoint_len(pos + len, end, &next_len, enc);
127
242
 
128
- switch (codepoint) {
129
- case LAT_TJ:
130
- case LAT_CH: rb_str_buf_cat(dest, "c", 1); break;
131
- case LAT_DJ: rb_str_buf_cat(dest, "dj", 2); break;
132
- case LAT_SH: rb_str_buf_cat(dest, "s", 1); break;
133
- case LAT_ZH: rb_str_buf_cat(dest, "z", 1); break;
134
- case LAT_CAP_TJ:
135
- case LAT_CAP_CH: rb_str_buf_cat(dest, "C", 1); break;
136
- case LAT_CAP_SH: rb_str_buf_cat(dest, "S", 1); break;
137
- case LAT_CAP_ZH: rb_str_buf_cat(dest, "Z", 1); break;
138
- case LAT_CAP_DJ:
139
- (seen_upper || is_upper(next_codepoint))
140
- ? rb_str_buf_cat(dest, "DJ", 2)
141
- : rb_str_buf_cat(dest, "Dj", 2);
142
- break;
143
- default:
144
- rb_str_buf_cat(dest, pos, len);
145
- }
146
- }
243
+ out = (*method)(in, in2, seen_cap, &out2);
147
244
 
148
- /* Cyrillic coderange */
149
- else if (is_cyrillic(codepoint)) {
245
+ if (out) {
246
+ /* flush previous untranslatable sequence */
150
247
  if (seq_start) {
151
248
  rb_str_buf_cat(dest, seq_start, pos - seq_start);
152
249
  seq_start = 0;
153
250
  }
154
251
 
155
- if (codepoint >= CYR_A) {
156
- if (maps_directly(codepoint)) {
157
- cyr = CYR_MAP[codepoint - CYR_A];
158
- cyr ? rb_str_buf_cat(dest, &cyr, 1)
159
- : rb_str_buf_cat(dest, pos, len);
160
- }
161
- else {
162
- switch (codepoint) {
163
- case CYR_J: rb_str_buf_cat(dest, "j", 1); break;
164
- case CYR_LJ: rb_str_buf_cat(dest, "lj", 2); break;
165
- case CYR_NJ: rb_str_buf_cat(dest, "nj", 2); break;
166
- case CYR_DJ: STR_CAT_COND_ASCII(ascii, dest, "dj", LAT_DJ, 2, enc); break;
167
- case CYR_TJ: STR_CAT_COND_ASCII(ascii, dest, "c", LAT_TJ, 1, enc); break;
168
- case CYR_CH: STR_CAT_COND_ASCII(ascii, dest, "c", LAT_CH, 1, enc); break;
169
- case CYR_SH: STR_CAT_COND_ASCII(ascii, dest, "s", LAT_SH, 1, enc); break;
170
- case CYR_ZH: STR_CAT_COND_ASCII(ascii, dest, "z", LAT_ZH, 1, enc); break;
171
- case CYR_DZ:
172
- rb_str_buf_cat(dest, "d", 1);
173
- STR_CAT_COND_ASCII(ascii, dest, "z", LAT_ZH, 1, enc);
174
- break;
175
- default:
176
- rb_str_buf_cat(dest, pos, len);
177
- }
178
- }
179
- }
180
- else {
181
- if (maps_directly(codepoint)) {
182
- cyr = CYR_CAPS_MAP[codepoint - CYR_CAP_A];
183
- cyr ? rb_str_buf_cat(dest, &cyr, 1)
184
- : rb_str_buf_cat(dest, pos, len);
185
- }
186
- else {
187
- force_upper = seen_upper || is_upper(next_codepoint);
188
-
189
- switch (codepoint) {
190
- case CYR_CAP_J: rb_str_buf_cat(dest, "J", 1); break;
191
- case CYR_CAP_LJ: rb_str_buf_cat(dest, (force_upper ? "LJ" : "Lj"), 2); break;
192
- case CYR_CAP_NJ: rb_str_buf_cat(dest, (force_upper ? "NJ" : "Nj"), 2); break;
193
- case CYR_CAP_TJ: STR_CAT_COND_ASCII(ascii, dest, "C", LAT_CAP_TJ, 1, enc); break;
194
- case CYR_CAP_CH: STR_CAT_COND_ASCII(ascii, dest, "C", LAT_CAP_CH, 1, enc); break;
195
- case CYR_CAP_SH: STR_CAT_COND_ASCII(ascii, dest, "S", LAT_CAP_SH, 1, enc); break;
196
- case CYR_CAP_ZH: STR_CAT_COND_ASCII(ascii, dest, "Z", LAT_CAP_ZH, 1, enc); break;
197
- case CYR_CAP_DJ: STR_CAT_COND_ASCII(ascii, dest, (force_upper ? "DJ" : "Dj"), LAT_CAP_DJ, 2, enc); break;
198
- case CYR_CAP_DZ:
199
- rb_str_buf_cat(dest, "D", 1);
200
- force_upper ? STR_CAT_COND_ASCII(ascii, dest, "Z", LAT_CAP_ZH, 1, enc)
201
- : STR_CAT_COND_ASCII(ascii, dest, "z", LAT_ZH, 1, enc);
202
- break;
203
- default:
204
- rb_str_buf_cat(dest, pos, len);
205
- }
206
- }
207
- }
252
+ _str_cat_char(dest, out, enc);
253
+ if (out2) _str_cat_char(dest, out2, enc);
208
254
  }
209
- else {
210
- /* Mark the start of a copyable sequence */
211
- if (!seq_start) seq_start = pos;
255
+ else if (!seq_start) {
256
+ /* mark the beginning of an untranslatable sequence */
257
+ seq_start = pos;
258
+ }
259
+
260
+ /* for cyrillic output, skip the second half of an input digraph */
261
+ if (strategy == 0 && is_digraph(out)) {
262
+ pos += next_len;
263
+ if (pos + len < end)
264
+ in2 = rb_enc_codepoint_len(pos + len, end, &next_len, enc);
212
265
  }
213
266
 
214
- seen_upper = is_upper(codepoint);
267
+ seen_cap = is_cap(in);
215
268
 
216
269
  pos += len;
217
270
  len = next_len;
218
-
219
- codepoint = next_codepoint;
220
- next_codepoint = 0;
271
+ in = in2;
221
272
  }
222
273
 
223
- /* Flush the last sequence, if any */
224
- if (seq_start) {
225
- rb_str_buf_cat(dest, seq_start, pos - seq_start);
226
- }
274
+ /* flush final sequence */
275
+ if (seq_start) rb_str_buf_cat(dest, seq_start, pos - seq_start);
227
276
 
228
277
  if (bang) {
229
278
  rb_str_shared_replace(str, dest);
@@ -237,7 +286,35 @@ str_to_latin(VALUE str, int ascii, int bang)
237
286
  }
238
287
 
239
288
  /**
240
- * Returns a copy of <i>str</i> with the Serbian Cyrillic characters
289
+ * Returns a copy of <i>str</i> with Latin characters transliterated
290
+ * into Serbian Cyrillic.
291
+ *
292
+ * @overload to_cyrillic(str)
293
+ * @param [String] str text to be transliterated
294
+ * @return [String] transliterated text
295
+ */
296
+ static VALUE
297
+ rb_str_to_cyrillic(VALUE self, VALUE str)
298
+ {
299
+ return str_to_srb(str, 0, 0);
300
+ }
301
+
302
+ /**
303
+ * Performs transliteration of <code>Byk.to_cyrillic</code> in place,
304
+ * returning <i>str</i>, whether any changes were made or not.
305
+ *
306
+ * @overload to_cyrillic!(str)
307
+ * @param [String] str text to be transliterated
308
+ * @return [String] transliterated text
309
+ */
310
+ static VALUE
311
+ rb_str_to_cyrillic_bang(VALUE self, VALUE str)
312
+ {
313
+ return str_to_srb(str, 0, 1);
314
+ }
315
+
316
+ /**
317
+ * Returns a copy of <i>str</i> with Serbian Cyrillic characters
241
318
  * transliterated into Latin.
242
319
  *
243
320
  * @overload to_latin(str)
@@ -247,12 +324,12 @@ str_to_latin(VALUE str, int ascii, int bang)
247
324
  static VALUE
248
325
  rb_str_to_latin(VALUE self, VALUE str)
249
326
  {
250
- return str_to_latin(str, 0, 0);
327
+ return str_to_srb(str, 1, 0);
251
328
  }
252
329
 
253
330
  /**
254
- * Performs the transliteration of <code>Byk.to_latin</code> in place,
255
- * returning <i>str</i>, whether changes were made or not.
331
+ * Performs transliteration of <code>Byk.to_latin</code> in place,
332
+ * returning <i>str</i>, whether any changes were made or not.
256
333
  *
257
334
  * @overload to_latin!(str)
258
335
  * @param [String] str text to be transliterated
@@ -261,12 +338,12 @@ rb_str_to_latin(VALUE self, VALUE str)
261
338
  static VALUE
262
339
  rb_str_to_latin_bang(VALUE self, VALUE str)
263
340
  {
264
- return str_to_latin(str, 0, 1);
341
+ return str_to_srb(str, 1, 1);
265
342
  }
266
343
 
267
344
  /**
268
- * Returns a copy of <i>str</i> with the Serbian Cyrillic
269
- * characters transliterated into ASCII Latin.
345
+ * Returns a copy of <i>str</i> with Serbian characters transliterated
346
+ * into ASCII Latin.
270
347
  *
271
348
  * @overload to_ascii_latin(str)
272
349
  * @param [String] str text to be transliterated
@@ -275,12 +352,12 @@ rb_str_to_latin_bang(VALUE self, VALUE str)
275
352
  static VALUE
276
353
  rb_str_to_ascii_latin(VALUE self, VALUE str)
277
354
  {
278
- return str_to_latin(str, 1, 0);
355
+ return str_to_srb(str, 2, 0);
279
356
  }
280
357
 
281
358
  /**
282
- * Performs the transliteration of <code>Byk.to_ascii_latin</code> in
283
- * place, returning <i>str</i>, whether changes were made or not.
359
+ * Performs transliteration of <code>Byk.to_ascii_latin</code> in
360
+ * place, returning <i>str</i>, whether any changes were made or not.
284
361
  *
285
362
  * @overload to_ascii_latin!(str)
286
363
  * @param [String] str text to be transliterated
@@ -289,12 +366,14 @@ rb_str_to_ascii_latin(VALUE self, VALUE str)
289
366
  static VALUE
290
367
  rb_str_to_ascii_latin_bang(VALUE self, VALUE str)
291
368
  {
292
- return str_to_latin(str, 1, 1);
369
+ return str_to_srb(str, 2, 1);
293
370
  }
294
371
 
295
372
  void Init_byk_native(void)
296
373
  {
297
374
  VALUE Byk = rb_define_module("Byk");
375
+ rb_define_singleton_method(Byk, "to_cyrillic", rb_str_to_cyrillic, 1);
376
+ rb_define_singleton_method(Byk, "to_cyrillic!", rb_str_to_cyrillic_bang, 1);
298
377
  rb_define_singleton_method(Byk, "to_latin", rb_str_to_latin, 1);
299
378
  rb_define_singleton_method(Byk, "to_latin!", rb_str_to_latin_bang, 1);
300
379
  rb_define_singleton_method(Byk, "to_ascii_latin", rb_str_to_ascii_latin, 1);
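The Cyrillic direction needs one character of lookahead, because `lj`, `nj` and `dž` each collapse into a single Cyrillic letter and the second half of the pair has to be skipped. The sketch below shows that idea in plain Ruby. It is an illustration only, not a port of the C code: `DIGRAPHS`, `SINGLES` and `sketch_to_cyrillic` are hypothetical names, and it assumes Unicode-aware case mapping (Ruby 2.4+). Letters without a Serbian counterpart (`q`, `w`, `x`, `y`) are copied through unchanged.

```ruby
# Hypothetical pure-Ruby sketch of Latin -> Cyrillic with digraph lookahead.
DIGRAPHS = { "lj" => "љ", "nj" => "њ", "dž" => "џ" }.freeze

SINGLES = {
  "a" => "а", "b" => "б", "c" => "ц", "d" => "д", "e" => "е", "f" => "ф",
  "g" => "г", "h" => "х", "i" => "и", "j" => "ј", "k" => "к", "l" => "л",
  "m" => "м", "n" => "н", "o" => "о", "p" => "п", "r" => "р", "s" => "с",
  "t" => "т", "u" => "у", "v" => "в", "z" => "з",
  "č" => "ч", "ć" => "ћ", "đ" => "ђ", "š" => "ш", "ž" => "ж"
}.freeze

def sketch_to_cyrillic(text)
  chars = text.chars
  out = +""
  i = 0

  while i < chars.length
    ch = chars[i]
    pair = "#{ch}#{chars[i + 1]}".downcase # chars[i + 1] may be nil at the end

    if (cyr = DIGRAPHS[pair])
      # Case follows the first letter; consume both halves of the digraph.
      out << (ch == ch.downcase ? cyr : cyr.upcase)
      i += 2
    elsif (cyr = SINGLES[ch.downcase])
      out << (ch == ch.downcase ? cyr : cyr.upcase)
      i += 1
    else
      out << ch # untranslatable: copy through
      i += 1
    end
  end

  out
end

puts sketch_to_cyrillic("Šeširdžija") # => Шеширџија
puts sketch_to_cyrillic("lJ")         # => љ
```

Like any purely mechanical scheme, this treats every `nj`, `lj` or `dž` pair as a digraph, so words where the two letters belong to different morphemes would need special handling.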
@@ -1,3 +1,3 @@
  module Byk
-   VERSION = "0.6.0"
+   VERSION = "1.0.0"
  end
@@ -1,5 +1,4 @@
1
1
  # coding: utf-8
2
-
3
2
  require "spec_helper"
4
3
 
5
4
  describe Byk do
@@ -24,70 +23,114 @@ describe Byk do
24
23
  let(:non_serbian_cyrillic) { non_serbian_cyrillic_coderange.join }
25
24
 
26
25
  let(:ascii) { "The quick brown fox jumps over the lazy dog." }
27
- let(:other) { "संस्कृतम् saṃskṛtam" }
26
+ let(:other) { "संस्कृतम्" }
28
27
 
29
- let(:mixed) { "संस्कृतम् saṃskṛtam илити Sanskrit, obrati ПАЖЊУ." }
30
- let(:mixed_latin) { "संस्कृतम् saṃskṛtam iliti Sanskrit, obrati PAŽNJU." }
31
- let(:mixed_ascii_latin) { "संस्कृतम् saṃskṛtam iliti Sanskrit, obrati PAZNJU." }
28
+ let(:mixed) { "संस्कृतम् илити Sanskrit, obrati ПАЖЊУ." }
29
+ let(:mixed_cyrillic) { "संस्कृतम् илити Санскрит, обрати ПАЖЊУ." }
30
+ let(:mixed_latin) { "संस्कृतम् iliti Sanskrit, obrati PAŽNJU." }
31
+ let(:mixed_ascii_latin) { "संस्कृतम् iliti Sanskrit, obrati PAZNJU." }
32
32
 
33
- it "doesn't convert an empty string" do
33
+ it "doesn't translate an empty string" do
34
34
  expect(Byk.send(method, "")).to eq ""
35
35
  end
36
36
 
37
- it "doesn't convert ASCII text" do
38
- expect(Byk.send(method, ascii)).to eq ascii
37
+ it "doesn't translate foreign coderanges" do
38
+ expect(Byk.send(method, other)).to eq other
39
39
  end
40
+ end
40
41
 
41
- it "doesn't convert non-Serbian Cyrillic" do
42
+ shared_examples :cyrillization_method do |method|
43
+ include_examples :base, method
44
+
45
+ let(:edge_cases) do
46
+ [
47
+ ["lJ", "љ"],
48
+ ["nJ", "њ"],
49
+ ["dŽ", "џ"]
50
+ ]
51
+ end
52
+
53
+ it "doesn't translate Cyrillic" do
54
+ expect(Byk.send(method, pangram)).to eq pangram
55
+ end
56
+
57
+ it "doesn't translate non-Serbian Cyrillic" do
42
58
  expect(Byk.send(method, non_serbian_cyrillic)).to eq non_serbian_cyrillic
43
     end
 
-    it "doesn't convert other coderanges" do
-      expect(Byk.send(method, other)).to eq other
+    it "translates Latin to Cyrillic" do
+      expect(Byk.send(method, pangram_latin)).to eq pangram
+    end
+
+    it "translates Latin caps to Cyrillic caps" do
+      expect(Byk.send(method, pangram_latin_caps)).to eq pangram_caps
+    end
+
+    it "translates mixed text properly" do
+      expect(Byk.send(method, mixed)).to eq mixed_cyrillic
+    end
+
+    it "translates edge cases properly" do
+      edge_cases.each do |input, output|
+        expect(Byk.send(method, input)).to eq output
+      end
+    end
+
+    it "translates ABECEDA to AZBUKA" do
+      expect(Byk::ABECEDA.map { |l| l.dup.send(:to_cyrillic) }).to match_array(Byk::AZBUKA)
+    end
+
+    it "translates ABECEDA_CAPS to AZBUKA_CAPS" do
+      expect(Byk::ABECEDA_CAPS.map { |l| l.dup.send(:to_cyrillic) }).to match_array(Byk::AZBUKA_CAPS)
     end
   end
 
   shared_examples :latinization_method do |method|
     include_examples :base, method
 
-    let(:edge_cases) {
+    let(:edge_cases) do
       [
-        ["Њ", "Nj"],
-        ["Љ", "Lj"],
-        ["Џ", "Dž"],
-        ["ЊЊ", "NJNJ"],
         ["ЉЉ", "LJLJ"],
+        ["ЊЊ", "NJNJ"],
         ["ЏЏ", "DŽDŽ"]
       ]
-    }
+    end
 
-    it "doesn't convert Latin" do
+    it "doesn't translate ASCII" do
+      expect(Byk.send(method, ascii)).to eq ascii
+    end
+
+    it "doesn't translate Latin" do
       expect(Byk.send(method, pangram_latin)).to eq pangram_latin
     end
 
-    it "converts Cyrillic to Latin" do
+    it "doesn't translate non-Serbian Cyrillic" do
+      expect(Byk.send(method, non_serbian_cyrillic)).to eq non_serbian_cyrillic
+    end
+
+    it "translates Cyrillic to Latin" do
       expect(Byk.send(method, pangram)).to eq pangram_latin
     end
 
-    it "converts Cyrillic caps to Latin caps" do
+    it "translates Cyrillic caps to Latin caps" do
       expect(Byk.send(method, pangram_caps)).to eq pangram_latin_caps
     end
 
-    it "converts mixed text properly" do
+    it "translates mixed text properly" do
       expect(Byk.send(method, mixed)).to eq mixed_latin
     end
 
-    it "converts edge cases properly" do
+    it "translates edge cases properly" do
       edge_cases.each do |input, output|
         expect(Byk.send(method, input)).to eq output
       end
     end
 
-    it "converts AZBUKA to ABECEDA" do
+    it "translates AZBUKA to ABECEDA" do
       expect(Byk::AZBUKA.map { |l| l.dup.send(method) }).to match_array(Byk::ABECEDA)
     end
 
-    it "converts AZBUKA_CAPS to ABECEDA_CAPS" do
+    it "translates AZBUKA_CAPS to ABECEDA_CAPS" do
       expect(Byk::AZBUKA_CAPS.map { |l| l.dup.send(method) }).to match_array(Byk::ABECEDA_CAPS)
     end
   end
@@ -95,7 +138,7 @@ describe Byk do
   shared_examples :ascii_latinization_method do |method|
     include_examples :base, method
 
-    let(:edge_cases) {
+    let(:edge_cases) do
       [
         ["Њ", "Nj"],
         ["Љ", "Lj"],
@@ -107,32 +150,36 @@ describe Byk do
         ["ЏЏ", "DZDZ"],
         ["ЂЂ", "DJDJ"],
         ["ĐĐ", "DJDJ"],
-        ["ЂУРАЂ Ђорђевић", "DJURADJ Djordjevic"],
-        ["ĐURAĐ Đorđević", "DJURADJ Djordjevic"]
+        ["ЂУРАЂ Ђурђевић", "DJURADJ Djurdjevic"],
+        ["ĐURAĐ Đurđević", "DJURADJ Djurdjevic"]
       ]
-    }
-
-    it "converts Cyrillic to ASCII Latin" do
-      expect(Byk.send(method, pangram)).to eq pangram_ascii_latin
     end
 
-    it "converts Cyrillic caps to ASCII Latin caps" do
-      expect(Byk.send(method, pangram_caps)).to eq pangram_ascii_latin_caps
+    it "doesn't translate ASCII" do
+      expect(Byk.send(method, ascii)).to eq ascii
     end
 
-    it "converts Latin to ASCII Latin" do
+    it "translates Latin to ASCII Latin" do
       expect(Byk.send(method, pangram_latin)).to eq pangram_ascii_latin
     end
 
-    it "converts Latin caps to ASCII Latin caps" do
+    it "translates Latin caps to ASCII Latin caps" do
       expect(Byk.send(method, pangram_latin_caps)).to eq pangram_ascii_latin_caps
     end
 
-    it "converts mixed text properly" do
+    it "translates Cyrillic to ASCII Latin" do
+      expect(Byk.send(method, pangram)).to eq pangram_ascii_latin
+    end
+
+    it "translates Cyrillic caps to ASCII Latin caps" do
+      expect(Byk.send(method, pangram_caps)).to eq pangram_ascii_latin_caps
+    end
+
+    it "translates mixed text properly" do
       expect(Byk.send(method, mixed)).to eq mixed_ascii_latin
     end
 
-    it "converts edge cases properly" do
+    it "translates edge cases properly" do
       edge_cases.each do |input, output|
         expect(Byk.send(method, input)).to eq output
       end
@@ -141,18 +188,28 @@ describe Byk do
 
   shared_examples :non_destructive_method do |method|
     it "doesn't modify the arg" do
-      str = "Ж"
+      str = "ЖŽ"
       expect { Byk.send(method, str) }.to_not change { str }
     end
   end
 
   shared_examples :destructive_method do |method|
     it "modifies the arg" do
-      str = "Ж"
+      str = "ЖŽ"
       expect { Byk.send(method, str) }.to change { str }
     end
   end
 
+  describe ".to_cyrillic" do
+    it_behaves_like :cyrillization_method, :to_cyrillic
+    it_behaves_like :non_destructive_method, :to_cyrillic
+  end
+
+  describe ".to_cyrillic!" do
+    it_behaves_like :cyrillization_method, :to_cyrillic!
+    it_behaves_like :destructive_method, :to_cyrillic!
+  end
+
   describe ".to_latin" do
     it_behaves_like :latinization_method, :to_latin
     it_behaves_like :non_destructive_method, :to_latin
@@ -176,7 +233,7 @@ end
 
 describe String do
   it "responds to Byk methods" do
-    Byk.instance_methods.each do |method|
+    Byk.singleton_methods.each do |method|
       expect("").to respond_to(method)
     end
   end
metadata CHANGED
@@ -1,15 +1,29 @@
 --- !ruby/object:Gem::Specification
 name: byk
 version: !ruby/object:Gem::Version
-  version: 0.6.0
+  version: 1.0.0
 platform: ruby
 authors:
 - Nikola Topalović
 autorequire:
-bindir: bin
+bindir: exe
 cert_chain: []
-date: 2015-04-25 00:00:00.000000000 Z
+date: 2016-04-09 00:00:00.000000000 Z
 dependencies:
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.5'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.5'
 - !ruby/object:Gem::Dependency
   name: rake-compiler
   requirement: !ruby/object:Gem::Requirement
@@ -38,10 +52,11 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '3.2'
-description: Provides C-optimized methods for transliteration of Serbian Cyrillic
-  into Latin.
+description: Fast transliteration of Serbian Cyrillic to Latin and back. Brzo preslovljavanje
+  ćirilice u latinicu i obratno.
 email: nikola.topalovic@gmail.com
-executables: []
+executables:
+- byk
 extensions:
 - ext/byk/extconf.rb
 extra_rdoc_files: []
@@ -49,6 +64,7 @@ files:
 - CHANGELOG.md
 - LICENSE
 - README.md
+- exe/byk
 - ext/byk/byk.c
 - ext/byk/extconf.rb
 - lib/byk.rb
@@ -76,9 +92,10 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.4.5
+rubygems_version: 2.5.1
 signing_key:
 specification_version: 4
-summary: Fast transliteration of Serbian Cyrillic into Latin.
+summary: Fast transliteration of Serbian Cyrillic to Latin and back. Brzo preslovljavanje
+  ćirilice u latinicu i obratno.
 test_files:
 - spec/byk_spec.rb