smstools 0.0.1 → 0.2.2
- checksums.yaml +5 -5
- data/CHANGELOG.md +23 -1
- data/README.md +152 -21
- data/Rakefile +6 -7
- data/lib/assets/javascripts/sms_tools/message.js.coffee +25 -4
- data/lib/sms_tools.rb +21 -0
- data/lib/sms_tools/encoding_detection.rb +25 -6
- data/lib/sms_tools/gsm_encoding.rb +13 -6
- data/lib/sms_tools/unicode_encoding.rb +15 -0
- data/lib/sms_tools/version.rb +1 -1
- data/spec/sms_tools/encoding_detection_spec.rb +95 -12
- data/spec/sms_tools/gsm_encoding_spec.rb +36 -0
- metadata +9 -7
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
|
-
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
2
|
+
SHA256:
|
3
|
+
metadata.gz: f2cecee4608c47f5abf1cf0a980b3a3a646e358d50a72e6b0f1931f554f86c5f
|
4
|
+
data.tar.gz: 46eae0938780419f4672581f4b1105a8d51cc4fe7f6150b2e44fdc3c00f16c6e
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 6f40d959431dc1185a989b179c91858363978b5200bba7504e499b214ba8b1493c859eebb3308b343ac8000ec747db89ccfbc664692721315bb68c41f96000a0
|
7
|
+
data.tar.gz: 7670ac023de1612cd5e4573ad4879526e5f45e38c871416cef3466e4bbee6d1740d0b2ae760b907d57811f913670862e2907b3ed3c8f268b876085195bed2a61
|
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,25 @@
|
|
1
|
-
## 0.
|
1
|
+
## 0.2.2 (20 Jan 2021)
|
2
|
+
|
3
|
+
* #9 Fix the way some complex Unicode characters (like composite emojis) are counted. Thanks to @bryanrite for the neat implementation. Note the fix could be **potentially backwards-incompatible** if you were relying on the incorrect behaviour previously. Technically it's still a bug fix.
|
4
|
+
|
5
|
+
## 0.2.1 (18 Aug 2020)
|
6
|
+
|
7
|
+
* #7 Introduce `SmsTools.use_ascii_encoding` option (defaults to `true` for backwards-compatibility) that allows disabling the `:ascii` workaround encoding. See #6 and #7 for details. Thanks @kingsley-wang.
|
8
|
+
|
9
|
+
## 0.2.0 (2 March 2017)
|
10
|
+
|
11
|
+
* The non-breaking space character (0x00A0 in Unicode and "\xC2\xA0" in UTF-8) is no longer regarded as a valid GSM 7-bit symbol. [#4](https://github.com/livebg/smstools/issues/4)
|
12
|
+
* GsmEncoding.to_utf8 will now raise errors in case the provided argument is not a valid GSM 7-bit text.
|
13
|
+
|
14
|
+
## 0.1.1 (18 April 2016)
|
15
|
+
|
16
|
+
* Replaces the small c with cedilla with the capital one, as per the GSM 03.38 standard (by @skliask)
|
17
|
+
|
18
|
+
## 0.1.0 (08 October 2015)
|
19
|
+
|
20
|
+
* distinguish between ascii encoding and gsm encoding
|
21
|
+
* add option for preventing the use of gsm encoding, that is to use unicode instead
|
22
|
+
|
23
|
+
## 0.0.1 (17 January 2014)
|
2
24
|
|
3
25
|
* Initial release.
|
data/README.md
CHANGED
@@ -1,23 +1,80 @@
|
|
1
1
|
# Sms Tools
|
2
2
|
|
3
|
-
A small collection of
|
4
|
-
|
3
|
+
A small collection of Ruby and JavaScript classes implementing often needed functionality for
|
4
|
+
dealing with SMS messages.
|
5
5
|
|
6
|
-
The gem
|
7
|
-
|
6
|
+
The gem can also be used in a Rails application as an engine. It integrates with the asset pipeline
|
7
|
+
and gives you access to some client-side SMS manipulation functionality.
|
8
8
|
|
9
9
|
## Features
|
10
10
|
|
11
11
|
The following features are available on both the server side and the client
|
12
12
|
side:
|
13
13
|
|
14
|
-
- Detection of the most optimal encoding for sending an SMS message (GSM 7-bit
|
15
|
-
|
16
|
-
- Correctly determining the message's length according to the most optimal
|
17
|
-
encoding.
|
14
|
+
- Detection of the most optimal encoding for sending an SMS message (GSM 7-bit or Unicode).
|
15
|
+
- Correctly determining a message's length in the most optimal encoding.
|
18
16
|
- Concatenation detection and concatenated message parts counting.
|
19
17
|
|
20
|
-
|
18
|
+
The following can be accomplished only on the server with Ruby:
|
19
|
+
|
20
|
+
- Converting a UTF-8 string to a GSM 7-bit encoding and vice versa.
|
21
|
+
- Detecting if a UTF-8 string can be safely represented in a GSM 7-bit encoding.
|
22
|
+
- Detection of double-byte chars in the GSM 7-bit encoding.
|
23
|
+
|
24
|
+
And possibly more.
|
25
|
+
|
26
|
+
### Note on the GSM encoding
|
27
|
+
|
28
|
+
All references to the "GSM" encoding or the "GSM 7-bit alphabet" in this text actually refer to the
|
29
|
+
[GSM 03.38 spec](http://en.wikipedia.org/wiki/GSM_03.38) and [its latest
|
30
|
+
version](ftp://ftp.unicode.org/Public/MAPPINGS/ETSI/GSM0338.TXT), as defined by the Unicode
|
31
|
+
consortium.
|
32
|
+
|
33
|
+
This encoding is the most widely used one when sending SMS messages.
|
34
|
+
|
35
|
+
### Note regarding non-ASCII symbols from the GSM encoding
|
36
|
+
|
37
|
+
The GSM 03.38 encoding is used by default. This standard defines a set of
|
38
|
+
symbols which can be encoded in 7-bits each, thus allowing up to 160 symbols
|
39
|
+
per SMS message (each SMS message can contain up to 140 bytes of data).
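The arithmetic behind these limits can be sketched in a few lines of Ruby. This is only an illustration: the constant and method names are made up for the example and are not part of the gem, and the 6-byte header reserved in each part of a concatenated message is an assumption about the common case.

```ruby
# Illustrative sketch only; none of these names exist in smstools.
PAYLOAD_BYTES       = 140 # user data bytes available in one SMS
CONCAT_HEADER_BYTES = 6   # assumed header size in each concatenated part

# Number of whole symbols that fit in `bytes` when every symbol
# takes `bits_per_symbol` bits.
def symbols_in(bytes, bits_per_symbol)
  (bytes * 8) / bits_per_symbol
end

gsm_single     = symbols_in(PAYLOAD_BYTES, 7)                        # => 160
unicode_single = symbols_in(PAYLOAD_BYTES, 16)                       # => 70
gsm_part       = symbols_in(PAYLOAD_BYTES - CONCAT_HEADER_BYTES, 7)  # => 153
unicode_part   = symbols_in(PAYLOAD_BYTES - CONCAT_HEADER_BYTES, 16) # => 67
```

The 160 and 153 figures match the `normal` and `concatenated` limits used for the GSM encoding elsewhere in this library.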
|
40
|
+
|
41
|
+
This standard covers most of the ASCII table, but also includes some non-ASCII
|
42
|
+
symbols such as `æ`, `ø` and `å`. If you use these in your messages, you can
|
43
|
+
still send them as GSM encoded, having a 160-symbol limit. This is technically
|
44
|
+
correct.
|
45
|
+
|
46
|
+
In reality, however, some SMS routes have problems delivering messages which
|
47
|
+
contain such non-ASCII symbols in the GSM encoding. The special symbols might
|
48
|
+
be omitted, or the message might not arrive at all.
|
49
|
+
|
50
|
+
Thus, it might be safer to just send messages in Unicode if the message's text
|
51
|
+
contains any non-ASCII symbols. This is not the default as it reduces the max
|
52
|
+
symbols count to 70 per message, instead of 160, and you might not have any
|
53
|
+
issues with GSM-encoded messages. In case you do, however, you can turn off
|
54
|
+
support for the GSM encoding and just treat messages as Unicode if they contain
|
55
|
+
non-ASCII symbols.
|
56
|
+
|
57
|
+
In case you decide to do so, you have to specify it in both the Ruby and the
|
58
|
+
JavaScript part of the library, like so:
|
59
|
+
|
60
|
+
#### In Ruby
|
61
|
+
|
62
|
+
SmsTools.use_gsm_encoding = false
|
63
|
+
|
64
|
+
#### In JavaScript
|
65
|
+
|
66
|
+
//= require sms_tools
|
67
|
+
SmsTools.use_gsm_encoding = false;
|
68
|
+
|
69
|
+
There is another alternative as well. As explained in this commit – f1ffd948d4b8c – SmsTools will by
|
70
|
+
default detect the encoding as `:ascii` if the SMS message contains ASCII-only symbols. The safest
|
71
|
+
way to send messages would be to use an ASCII subset of the GSM encoding.
|
72
|
+
|
73
|
+
The `:ascii` encoding is informative only, however. Your SMS sending implementation will have to
|
74
|
+
decide how to handle it. You may also find it confusing that the dummy `:ascii` encoding does not
|
75
|
+
consider double-byte chars at all when counting the length of the message.
|
76
|
+
|
77
|
+
To disable this dummy `:ascii` encoding, set `SmsTools.use_ascii_encoding` to `false`.
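
How the two switches interact can be sketched as a small decision function. This is a simplified illustration, not the gem's code: `detect_encoding`, `gsm_compatible?` and `GSM_SAMPLE_ALPHABET` are hypothetical names, and the alphabet is a deliberately incomplete sample of GSM 03.38.

```ruby
# Deliberately incomplete sample of the GSM 03.38 alphabet (assumption for this sketch).
GSM_SAMPLE_ALPHABET = (
  ('a'..'z').to_a + ('A'..'Z').to_a + ('0'..'9').to_a +
  [' ', '.', ',', '!', '?', "\u00E6", "\u00F8", "\u00E5", "\u00C6", "\u00D8", "\u00C5"]
).freeze

# True when every character is part of the sample alphabet above.
def gsm_compatible?(text)
  text.each_char.all? { |char| GSM_SAMPLE_ALPHABET.include?(char) }
end

# ASCII is tried first, then GSM, and Unicode is the fallback.
def detect_encoding(text, use_ascii: true, use_gsm: true)
  if use_ascii && text.ascii_only?
    :ascii
  elsif use_gsm && gsm_compatible?(text)
    :gsm
  else
    :unicode
  end
end

detect_encoding('Hello world.')                      # => :ascii
detect_encoding('Hello world.', use_ascii: false)    # => :gsm
detect_encoding("bl\u00E5b\u00E6r")                  # => :gsm
detect_encoding("bl\u00E5b\u00E6r", use_gsm: false)  # => :unicode
```

With both switches left at their defaults, plain ASCII text is reported as `:ascii` and only text with non-ASCII GSM symbols is reported as `:gsm`.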
|
21
78
|
|
22
79
|
## Installation
|
23
80
|
|
@@ -33,32 +90,99 @@ Or install it yourself as:
|
|
33
90
|
|
34
91
|
$ gem install smstools
|
35
92
|
|
93
|
+
If you're using the gem in Rails, you may also want to add the following to your `application.js`
|
94
|
+
manifest file to gain access to the client-side features:
|
95
|
+
|
96
|
+
//= require sms_tools
|
97
|
+
|
36
98
|
## Usage
|
37
99
|
|
38
100
|
The gem consists of both server-side (Ruby) and client-side classes. You can
|
39
|
-
use either
|
101
|
+
use either.
|
40
102
|
|
41
103
|
### Server-side code
|
42
104
|
|
43
|
-
|
44
|
-
|
105
|
+
First make sure you have installed the gem and have required the appropriate files.
|
106
|
+
|
107
|
+
#### Encoding detection
|
108
|
+
|
109
|
+
The `SmsTools::EncodingDetection` class provides you with a few simple methods to detect the most
|
110
|
+
optimal encoding for sending an SMS message, to correctly calculate its length in that encoding and
|
111
|
+
to see if the text would need to be concatenated or will fit in a single message.
|
112
|
+
|
113
|
+
Here is an example with a non-concatenated message which is best encoded in the GSM 7-bit alphabet:
|
114
|
+
|
115
|
+
```ruby
|
116
|
+
sms_text = 'Text in GSM 03.38: ÄäøÆ with a double-byte char: ~ '
|
117
|
+
sms_encoding = SmsTools::EncodingDetection.new sms_text
|
118
|
+
|
119
|
+
sms_encoding.gsm? # => true
|
120
|
+
sms_encoding.unicode? # => false
|
121
|
+
sms_encoding.length # => 52 (because of the double-byte char)
|
122
|
+
sms_encoding.concatenated? # => false
|
123
|
+
sms_encoding.concatenated_parts # => 1
|
124
|
+
sms_encoding.encoding # => :gsm
|
125
|
+
```
|
126
|
+
|
127
|
+
Here's another example with a concatenated Unicode message:
|
128
|
+
|
129
|
+
```ruby
|
130
|
+
sms_text = 'Я' * 90
|
131
|
+
sms_encoding = SmsTools::EncodingDetection.new sms_text
|
132
|
+
|
133
|
+
sms_encoding.gsm? # => false
|
134
|
+
sms_encoding.unicode? # => true
|
135
|
+
sms_encoding.length # => 90
|
136
|
+
sms_encoding.concatenated? # => true
|
137
|
+
sms_encoding.concatenated_parts # => 2
|
138
|
+
sms_encoding.encoding # => :unicode
|
139
|
+
```
|
140
|
+
|
141
|
+
You can check the specs for this class for more examples.
|
45
142
|
|
46
|
-
####
|
47
|
-
|
143
|
+
#### GSM 03.38 encoding conversion
|
144
|
+
|
145
|
+
The `SmsTools::GsmEncoding` class can be used to check if a given UTF-8 string can be fully
|
146
|
+
represented in the GSM 03.38 encoding as well as to convert from UTF-8 to GSM 03.38 and vice-versa.
|
147
|
+
|
148
|
+
The main API this class provides is the following:
|
149
|
+
|
150
|
+
```ruby
|
151
|
+
SmsTools::GsmEncoding.valid? message_text_in_utf8 # => true or false
|
152
|
+
|
153
|
+
SmsTools::GsmEncoding.from_utf8 utf8_encoded_string # => a GSM 03.38 encoded string
|
154
|
+
SmsTools::GsmEncoding.to_utf8 gsm_encoded_string # => a UTF-8 encoded string
|
155
|
+
```
|
156
|
+
|
157
|
+
Check out the source code of the class to find out more.
|
48
158
|
|
49
159
|
### Client-side code
|
50
160
|
|
51
|
-
If you're using the gem in Rails 3.
|
52
|
-
|
161
|
+
If you're using the gem in Rails 3.1 or newer, you can gain access to the `SmsTools.Message` class.
|
162
|
+
Its interface is similar to the one of `SmsTools::EncodingDetection`. Here is an example in
|
163
|
+
CoffeeScript:
|
53
164
|
|
54
|
-
|
165
|
+
```coffeescript
|
166
|
+
message = new SmsTools.Message 'The text of the message: ~'
|
55
167
|
|
56
|
-
|
168
|
+
message.encoding # => 'gsm'
|
169
|
+
message.length # => 27
|
170
|
+
message.concatenatedPartsCount # => 1
|
171
|
+
```
|
57
172
|
|
58
|
-
|
173
|
+
You can also check how long this message can be in the current most optimal encoding, if we want to
|
174
|
+
limit the number of concatenated messages we will allow to be sent:
|
59
175
|
|
60
|
-
|
61
|
-
|
176
|
+
```coffeescript
|
177
|
+
maxConcatenatedPartsCount = 2
|
178
|
+
message.maxLengthFor(maxConcatenatedPartsCount) # => 306
|
179
|
+
```
|
180
|
+
|
181
|
+
This allows you to have a dynamic instead of a fixed length limit: when you use a non-GSM 03.38
|
182
|
+
symbol in your text, your message length limit decreases significantly.
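
The same calculation can be sketched in Ruby. This is a hedged illustration rather than the library's implementation: `LIMITS` and `max_length_for` are hypothetical names, the Unicode limits (70 and 67) are the usual values and are assumed here, and so is the rule that anything above one part counts as concatenated.

```ruby
# Hypothetical helper mirroring the calculation described above; not the gem's API.
LIMITS = {
  gsm:     { normal: 160, concatenated: 153 },
  unicode: { normal: 70,  concatenated: 67 }, # assumed values
}.freeze

# Longest text, in symbols, that still fits in `max_parts` message parts.
def max_length_for(encoding, max_parts)
  message_type = max_parts > 1 ? :concatenated : :normal
  max_parts * LIMITS.fetch(encoding).fetch(message_type)
end

max_length_for(:gsm, 1)     # => 160
max_length_for(:gsm, 2)     # => 306
max_length_for(:unicode, 2) # => 134
```

Switching from `:gsm` to `:unicode` for the same two parts drops the limit from 306 to 134 symbols, which is why a dynamic limit is useful in a text field.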
|
183
|
+
|
184
|
+
Note that to use this client-side code, a Rails application with an active asset pipeline is
|
185
|
+
assumed. It might be possible to use it in other setups as well, but you're on your own there.
|
62
186
|
|
63
187
|
## Contributing
|
64
188
|
|
@@ -69,3 +193,10 @@ CoffeeScript preprocessor set up.
|
|
69
193
|
5. Commit your changes (`git commit -am 'Add some feature'`)
|
70
194
|
6. Push to the branch (`git push origin my-new-feature`)
|
71
195
|
7. Send a pull request.
|
196
|
+
|
197
|
+
## Publishing a new version
|
198
|
+
|
199
|
+
1. Pick a version number according to Semantic Versioning.
|
200
|
+
2. Update `CHANGELOG.md`, `version.rb` and potentially this readme.
|
201
|
+
3. Commit the changes, tag them with `vX.Y.Z` (e.g. `v0.2.1`) and push all with `git push --tags`.
|
202
|
+
4. Build and publish the new version of the gem with `gem build smstools.gemspec && gem push *.gem`.
|
data/Rakefile
CHANGED
@@ -1,11 +1,10 @@
|
|
1
1
|
require 'bundler/gem_tasks'
|
2
|
+
require 'rake/testtask'
|
2
3
|
|
3
|
-
task :test
|
4
|
-
test_files = Dir[File.expand_path('../spec/**/*_spec.rb', __FILE__)]
|
5
|
-
command = "ruby -Ispec #{test_files.join ' '}"
|
4
|
+
task default: :test
|
6
5
|
|
7
|
-
|
8
|
-
|
6
|
+
Rake::TestTask.new do |t|
|
7
|
+
t.libs << 'spec'
|
8
|
+
t.test_files = FileList['spec/**/*_spec.rb']
|
9
|
+
t.verbose = true
|
9
10
|
end
|
10
|
-
|
11
|
-
task default: :test
|
@@ -2,6 +2,9 @@ window.SmsTools ?= {}
|
|
2
2
|
|
3
3
|
class SmsTools.Message
|
4
4
|
maxLengthForEncoding:
|
5
|
+
ascii:
|
6
|
+
normal: 160
|
7
|
+
concatenated: 153
|
5
8
|
gsm:
|
6
9
|
normal: 160
|
7
10
|
concatenated: 153
|
@@ -20,6 +23,7 @@ class SmsTools.Message
|
|
20
23
|
'€': true
|
21
24
|
'\\': true
|
22
25
|
|
26
|
+
asciiPattern: /^[\x00-\x7F]*$/
|
23
27
|
gsmEncodingPattern: /^[0-9a-zA-Z@Δ¡¿£_!Φ"¥Γ#èΛ¤éΩ%ùΠ&ìΨòΣçΘΞ:Ø;ÄäøÆ,<Ööæ=ÑñÅß>Üüåɧà€~ \$\.\-\+\(\)\*\\\/\?\|\^\}\{\[\]\'\r\n]*$/
|
24
28
|
|
25
29
|
constructor: (@text) ->
|
@@ -33,8 +37,25 @@ class SmsTools.Message
|
|
33
37
|
|
34
38
|
concatenatedPartsCount * @maxLengthForEncoding[@encoding][messageType]
|
35
39
|
|
40
|
+
use_gsm_encoding: ->
|
41
|
+
if SmsTools['use_gsm_encoding'] == undefined
|
42
|
+
true
|
43
|
+
else
|
44
|
+
SmsTools['use_gsm_encoding']
|
45
|
+
|
46
|
+
use_ascii_encoding: ->
|
47
|
+
if SmsTools['use_ascii_encoding'] == undefined
|
48
|
+
true
|
49
|
+
else
|
50
|
+
SmsTools['use_ascii_encoding']
|
51
|
+
|
36
52
|
_encoding: ->
|
37
|
-
if @
|
53
|
+
if @asciiPattern.test(@text) and @use_ascii_encoding()
|
54
|
+
'ascii'
|
55
|
+
else if @use_gsm_encoding() and @gsmEncodingPattern.test(@text)
|
56
|
+
'gsm'
|
57
|
+
else
|
58
|
+
'unicode'
|
38
59
|
|
39
60
|
_concatenatedPartsCount: ->
|
40
61
|
encoding = @encoding
|
@@ -45,9 +66,9 @@ class SmsTools.Message
|
|
45
66
|
else
|
46
67
|
parseInt Math.ceil(length / @maxLengthForEncoding[encoding].concatenated), 10
|
47
68
|
|
48
|
-
|
49
|
-
|
50
|
-
|
69
|
+
# Returns the number of symbols which the given text will eat up in an SMS
|
70
|
+
# message, taking into account any double-space symbols in the GSM 03.38
|
71
|
+
# encoding.
|
51
72
|
_length: ->
|
52
73
|
length = @text.length
|
53
74
|
|
data/lib/sms_tools.rb
CHANGED
@@ -1,7 +1,28 @@
|
|
1
1
|
require 'sms_tools/version'
|
2
2
|
require 'sms_tools/encoding_detection'
|
3
3
|
require 'sms_tools/gsm_encoding'
|
4
|
+
require 'sms_tools/unicode_encoding'
|
4
5
|
|
5
6
|
if defined?(::Rails) and ::Rails.version >= '3.1'
|
6
7
|
require 'sms_tools/rails/engine'
|
7
8
|
end
|
9
|
+
|
10
|
+
module SmsTools
|
11
|
+
class << self
|
12
|
+
def use_gsm_encoding?
|
13
|
+
@use_gsm_encoding.nil? ? true : @use_gsm_encoding
|
14
|
+
end
|
15
|
+
|
16
|
+
def use_gsm_encoding=(value)
|
17
|
+
@use_gsm_encoding = value
|
18
|
+
end
|
19
|
+
|
20
|
+
def use_ascii_encoding?
|
21
|
+
@use_ascii_encoding.nil? ? true : @use_ascii_encoding
|
22
|
+
end
|
23
|
+
|
24
|
+
def use_ascii_encoding=(value)
|
25
|
+
@use_ascii_encoding = value
|
26
|
+
end
|
27
|
+
end
|
28
|
+
end
|
@@ -3,6 +3,10 @@ require 'sms_tools/gsm_encoding'
|
|
3
3
|
module SmsTools
|
4
4
|
class EncodingDetection
|
5
5
|
MAX_LENGTH_FOR_ENCODING = {
|
6
|
+
ascii: {
|
7
|
+
normal: 160,
|
8
|
+
concatenated: 153,
|
9
|
+
},
|
6
10
|
gsm: {
|
7
11
|
normal: 160,
|
8
12
|
concatenated: 153,
|
@@ -20,7 +24,18 @@ module SmsTools
|
|
20
24
|
end
|
21
25
|
|
22
26
|
def encoding
|
23
|
-
@encoding ||=
|
27
|
+
@encoding ||=
|
28
|
+
if text.ascii_only? and SmsTools.use_ascii_encoding?
|
29
|
+
:ascii
|
30
|
+
elsif SmsTools.use_gsm_encoding? and GsmEncoding.valid?(text)
|
31
|
+
:gsm
|
32
|
+
else
|
33
|
+
:unicode
|
34
|
+
end
|
35
|
+
end
|
36
|
+
|
37
|
+
def ascii?
|
38
|
+
encoding == :ascii
|
24
39
|
end
|
25
40
|
|
26
41
|
def gsm?
|
@@ -49,12 +64,16 @@ module SmsTools
|
|
49
64
|
concatenated_parts * MAX_LENGTH_FOR_ENCODING[encoding][message_type]
|
50
65
|
end
|
51
66
|
|
52
|
-
|
53
|
-
|
54
|
-
|
67
|
+
# Returns the number of symbols which the given text will eat up in an SMS
|
68
|
+
# message, taking into account any double-space symbols in the GSM 03.38
|
69
|
+
# encoding.
|
55
70
|
def length
|
56
|
-
|
57
|
-
|
71
|
+
if unicode?
|
72
|
+
length = text.chars.sum { |char| UnicodeEncoding.character_count(char) }
|
73
|
+
else
|
74
|
+
length = text.length
|
75
|
+
length += text.chars.count { |char| GsmEncoding.double_byte?(char) } if gsm?
|
76
|
+
end
|
58
77
|
|
59
78
|
length
|
60
79
|
end
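
The GSM branch of this length calculation can be illustrated on its own. The sketch below is self-contained and does not use the gem: `GSM_EXTENSION_CHARS` and `gsm_length` are hypothetical names, and the list contains only the printable extension-table characters.

```ruby
# Printable characters from the GSM 03.38 extension table. Each one is sent
# as an escape code plus a second code, so it takes two positions.
GSM_EXTENSION_CHARS = ['^', '{', '}', '[', ']', '~', '|', '\\', "\u20AC"].freeze

# Length of a GSM-encodable text, counting extension characters twice.
def gsm_length(text)
  text.length + text.each_char.count { |char| GSM_EXTENSION_CHARS.include?(char) }
end

gsm_length('foo bar')                    # => 7
gsm_length('[]')                         # => 4
gsm_length('The text of the message: ~') # => 27
```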
|
@@ -4,6 +4,8 @@ module SmsTools
|
|
4
4
|
module GsmEncoding
|
5
5
|
extend self
|
6
6
|
|
7
|
+
GSM_EXTENSION_TABLE_ESCAPE_CODE = "\x1B".freeze
|
8
|
+
|
7
9
|
UTF8_TO_GSM_BASE_TABLE = {
|
8
10
|
0x0040 => "\x00", # COMMERCIAL AT
|
9
11
|
0x00A3 => "\x01", # POUND SIGN
|
@@ -14,7 +16,7 @@ module SmsTools
|
|
14
16
|
0x00F9 => "\x06", # LATIN SMALL LETTER U WITH GRAVE
|
15
17
|
0x00EC => "\x07", # LATIN SMALL LETTER I WITH GRAVE
|
16
18
|
0x00F2 => "\x08", # LATIN SMALL LETTER O WITH GRAVE
|
17
|
-
|
19
|
+
0x00C7 => "\x09", # LATIN CAPITAL LETTER C WITH CEDILLA
|
18
20
|
0x000A => "\x0A", # LINE FEED
|
19
21
|
0x00D8 => "\x0B", # LATIN CAPITAL LETTER O WITH STROKE
|
20
22
|
0x00F8 => "\x0C", # LATIN SMALL LETTER O WITH STROKE
|
@@ -32,7 +34,7 @@ module SmsTools
|
|
32
34
|
0x03A3 => "\x18", # GREEK CAPITAL LETTER SIGMA
|
33
35
|
0x0398 => "\x19", # GREEK CAPITAL LETTER THETA
|
34
36
|
0x039E => "\x1A", # GREEK CAPITAL LETTER XI
|
35
|
-
|
37
|
+
nil => "\x1B", # ESCAPE TO EXTENSION TABLE or NON-BREAKING SPACE
|
36
38
|
0x00C6 => "\x1C", # LATIN CAPITAL LETTER AE
|
37
39
|
0x00E6 => "\x1D", # LATIN SMALL LETTER AE
|
38
40
|
0x00DF => "\x1E", # LATIN SMALL LETTER SHARP S (German)
|
@@ -176,20 +178,25 @@ module SmsTools
|
|
176
178
|
def to_utf8(gsm_encoded_string)
|
177
179
|
utf8_encoded_string = ''
|
178
180
|
escape = false
|
179
|
-
escape_code = "\e".freeze
|
180
181
|
|
181
182
|
gsm_encoded_string.each_char do |char|
|
182
|
-
if char ==
|
183
|
+
if char == GSM_EXTENSION_TABLE_ESCAPE_CODE
|
183
184
|
escape = true
|
184
185
|
elsif escape
|
185
186
|
escape = false
|
186
|
-
utf8_encoded_string << [
|
187
|
+
utf8_encoded_string << [fetch_utf8_char(GSM_EXTENSION_TABLE_ESCAPE_CODE + char)].pack('U')
|
187
188
|
else
|
188
|
-
utf8_encoded_string << [
|
189
|
+
utf8_encoded_string << [fetch_utf8_char(char)].pack('U')
|
189
190
|
end
|
190
191
|
end
|
191
192
|
|
192
193
|
utf8_encoded_string
|
193
194
|
end
|
195
|
+
|
196
|
+
private
|
197
|
+
|
198
|
+
def fetch_utf8_char(char)
|
199
|
+
GSM_TO_UTF8.fetch(char) { raise "Unsupported symbol in GSM-7 encoding: #{char}" }
|
200
|
+
end
|
194
201
|
end
|
195
202
|
end
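
The escape handling in `to_utf8` is a small two-state loop. The sketch below reproduces that idea in isolation; `decode_gsm` and both tables are hypothetical and hold only the handful of entries the example needs, not the real GSM 03.38 mapping.

```ruby
ESCAPE = "\x1B".freeze

# Tiny, assumed subsets of the real tables, just enough for the example.
BASE_TABLE      = { "\x00" => '@', 'G' => 'G', 'S' => 'S', 'M' => 'M', ' ' => ' ' }.freeze
EXTENSION_TABLE = { '<' => '[', '>' => ']' }.freeze

def decode_gsm(gsm_string)
  result  = +''
  escaped = false

  gsm_string.each_char do |char|
    if char == ESCAPE
      # Remember the escape; the next char selects from the extension table.
      escaped = true
    elsif escaped
      escaped = false
      result << EXTENSION_TABLE.fetch(char) { raise "Unsupported symbol: #{char.inspect}" }
    else
      result << BASE_TABLE.fetch(char) { raise "Unsupported symbol: #{char.inspect}" }
    end
  end

  result
end

decode_gsm("GSM \e<\e>") # => "GSM []"
decode_gsm("\x00")       # => "@"
decode_gsm("\x1B")       # => ""
```

A trailing escape code with nothing after it is dropped, because the flag is set but never consumed.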
|
@@ -0,0 +1,15 @@
|
|
1
|
+
module SmsTools
|
2
|
+
module UnicodeEncoding
|
3
|
+
extend self
|
4
|
+
|
5
|
+
BASIC_PLANE = 0x0000..0xFFFF
|
6
|
+
|
7
|
+
# UCS-2/UTF-16 is used for unicode text messaging. UCS-2/UTF-16 represents characters in minimum
|
8
|
+
# 2-bytes, any characters in the basic plane are represented with 2-bytes, so each codepoint
|
9
|
+
# within the Basic Plane counts as a single character. Any codepoint outside the Basic Plane is
|
10
|
+
# encoded using 4-bytes and therefore counts as 2 characters in a text message.
|
11
|
+
def character_count(char)
|
12
|
+
char.each_codepoint.sum { |codepoint| BASIC_PLANE.include?(codepoint) ? 1 : 2 }
|
13
|
+
end
|
14
|
+
end
|
15
|
+
end
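
One way to cross-check the codepoint rule above is to encode the text as UTF-16 and count 16-bit code units. The sketch below is an independent illustration: `utf16_units` and `codepoint_units` are hypothetical names and not part of the gem.

```ruby
# Counts UTF-16 code units by actually encoding the text.
def utf16_units(text)
  text.encode('UTF-16BE').bytesize / 2
end

# Same result from the codepoint rule: 1 unit inside the Basic
# Multilingual Plane, 2 units (a surrogate pair) outside it.
def codepoint_units(text)
  text.each_codepoint.sum { |codepoint| codepoint <= 0xFFFF ? 1 : 2 }
end

utf16_units("\u042F")              # => 1 (CYRILLIC CAPITAL LETTER YA)
utf16_units("\u{1F634}")           # => 2 (sleeping face emoji)
utf16_units("\u{1F6CC}\u{1F3FD}")  # => 4 (emoji plus skin tone modifier)
codepoint_units("\u{1F6CC}\u{1F3FD}") # => 4
```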
|
data/lib/sms_tools/version.rb
CHANGED
@@ -1,5 +1,5 @@
|
|
1
1
|
require 'spec_helper'
|
2
|
-
require 'sms_tools
|
2
|
+
require 'sms_tools'
|
3
3
|
|
4
4
|
describe SmsTools::EncodingDetection do
|
5
5
|
it "exposes the original text as a method" do
|
@@ -7,27 +7,77 @@ describe SmsTools::EncodingDetection do
|
|
7
7
|
end
|
8
8
|
|
9
9
|
describe "encoding" do
|
10
|
-
it "defaults to
|
11
|
-
detection_for('').encoding.must_equal :
|
10
|
+
it "defaults to ASCII encoding for empty messages" do
|
11
|
+
detection_for('').encoding.must_equal :ascii
|
12
12
|
end
|
13
13
|
|
14
|
-
it "returns
|
15
|
-
detection_for('foo bar baz').encoding.must_equal :
|
14
|
+
it "returns ASCII as encoding for simple ASCII text" do
|
15
|
+
detection_for('foo bar baz').encoding.must_equal :ascii
|
16
16
|
end
|
17
17
|
|
18
18
|
it "returns GSM as encoding for special symbols defined in GSM 03.38" do
|
19
|
-
detection_for('09azAZ@Δ¡¿£_!Φ"
|
19
|
+
detection_for('09azAZ@Δ¡¿£_!Φ"¥Γ#èΛ¤éΩ%ùΠ&ìΨòΣCΘΞ:Ø;ÄäøÆ,<Ööæ=ÑñÅß>Üüåɧà€~').encoding.must_equal :gsm
|
20
20
|
end
|
21
21
|
|
22
|
-
it "returns
|
23
|
-
detection_for('Foo bar {} [baz]! Larodi $5. What else?').encoding.must_equal :
|
24
|
-
detection_for("Spaces and newlines are GSM 03.38, too: \r\n").encoding.must_equal :
|
22
|
+
it "returns ASCII as encoding for punctuation and newline symbols" do
|
23
|
+
detection_for('Foo bar {} [baz]! Larodi $5. What else?').encoding.must_equal :ascii
|
24
|
+
detection_for("Spaces and newlines are GSM 03.38, too: \r\n").encoding.must_equal :ascii
|
25
25
|
end
|
26
26
|
|
27
27
|
it "returns Unicode when non-GSM Unicode symbols are used" do
|
28
28
|
detection_for('Foo bar лароди').encoding.must_equal :unicode
|
29
29
|
detection_for('∞').encoding.must_equal :unicode
|
30
30
|
end
|
31
|
+
|
32
|
+
it 'considers the non-breaking space character as a non-GSM Unicode symbol' do
|
33
|
+
non_breaking_space = "\xC2\xA0"
|
34
|
+
|
35
|
+
detection_for(non_breaking_space).encoding.must_equal :unicode
|
36
|
+
end
|
37
|
+
|
38
|
+
describe 'with SmsTools.use_gsm_encoding = false' do
|
39
|
+
before do
|
40
|
+
SmsTools.use_gsm_encoding = false
|
41
|
+
end
|
42
|
+
|
43
|
+
after do
|
44
|
+
SmsTools.use_gsm_encoding = true
|
45
|
+
end
|
46
|
+
|
47
|
+
it "returns Unicode as encoding for special symbols defined in GSM 03.38" do
|
48
|
+
detection_for('09azAZ@Δ¡¿£_!Φ"¥Γ#èΛ¤éΩ%ùΠ&ìΨòΣCΘΞ:Ø;ÄäøÆ,<Ööæ=ÑñÅß>Üüåɧà€~').encoding.must_equal :unicode
|
49
|
+
end
|
50
|
+
|
51
|
+
it 'returns ASCII for simple ASCII text' do
|
52
|
+
detection_for('Hello world.').encoding.must_equal :ascii
|
53
|
+
end
|
54
|
+
|
55
|
+
it "defaults to ASCII encoding for empty messages" do
|
56
|
+
detection_for('').encoding.must_equal :ascii
|
57
|
+
end
|
58
|
+
end
|
59
|
+
|
60
|
+
describe 'with SmsTools.use_ascii_encoding = false' do
|
61
|
+
before do
|
62
|
+
SmsTools.use_ascii_encoding = false
|
63
|
+
end
|
64
|
+
|
65
|
+
after do
|
66
|
+
SmsTools.use_ascii_encoding = true
|
67
|
+
end
|
68
|
+
|
69
|
+
it "returns GSM 03.38 as encoding for special symbols defined in GSM 03.38" do
|
70
|
+
detection_for('09azAZ@Δ¡¿£_!Φ"¥Γ#èΛ¤éΩ%ùΠ&ìΨòΣCΘΞ:Ø;ÄäøÆ,<Ööæ=ÑñÅß>Üüåɧà€~').encoding.must_equal :gsm
|
71
|
+
end
|
72
|
+
|
73
|
+
it 'returns GSM 03.38 for simple ASCII text' do
|
74
|
+
detection_for('Hello world.').encoding.must_equal :gsm
|
75
|
+
end
|
76
|
+
|
77
|
+
it "defaults to GSM 03.38 encoding for empty messages" do
|
78
|
+
detection_for('').encoding.must_equal :gsm
|
79
|
+
end
|
80
|
+
end
|
31
81
|
end
|
32
82
|
|
33
83
|
describe "message length" do
|
@@ -38,7 +88,7 @@ describe SmsTools::EncodingDetection do
|
|
38
88
|
end
|
39
89
|
|
40
90
|
it "computes the length of non-trivial GSM encoded messages correctly" do
|
41
|
-
detection_for('GSM: 09azAZ@Δ¡¿£_!Φ"
|
91
|
+
detection_for('GSM: 09azAZ@Δ¡¿£_!Φ"¥Γ#èΛ¤éΩ%ùΠ&ìΨòΣÇΘΞ:Ø;ÄäøÆ,<Ööæ=ÑñÅß>Üüåɧà').length.must_equal 63
|
42
92
|
end
|
43
93
|
|
44
94
|
it "correctly counts the length of whitespace-only messages" do
|
@@ -67,6 +117,34 @@ describe SmsTools::EncodingDetection do
|
|
67
117
|
detection_for('Уникод: ^{}[~]|€\\').length.must_equal 17
|
68
118
|
detection_for('Уникод: Σ: €').length.must_equal 12
|
69
119
|
end
|
120
|
+
|
121
|
+
it "counts ZWJ unicode characters correctly" do
|
122
|
+
detection_for('😴').length.must_equal 2
|
123
|
+
detection_for('🛌🏽').length.must_equal 4
|
124
|
+
detection_for('🤾🏽♀️').length.must_equal 7
|
125
|
+
detection_for('🇵🇵').length.must_equal 4
|
126
|
+
detection_for('👩❤️👩').length.must_equal 8
|
127
|
+
end
|
128
|
+
|
129
|
+
describe 'with SmsTools.use_gsm_encoding = false' do
|
130
|
+
before do
|
131
|
+
SmsTools.use_gsm_encoding = false
|
132
|
+
end
|
133
|
+
|
134
|
+
it "returns ASCII encoded length for some specific symbols which are also in GSM 03.38" do
|
135
|
+
detection_for('[]').length.must_equal 2
|
136
|
+
end
|
137
|
+
end
|
138
|
+
|
139
|
+
describe 'with SmsTools.use_ascii_encoding = false' do
|
140
|
+
before do
|
141
|
+
SmsTools.use_ascii_encoding = false
|
142
|
+
end
|
143
|
+
|
144
|
+
it "returns GSM 03.38 encoded length for some specific symbols which are also in ASCII" do
|
145
|
+
detection_for('[]').length.must_equal 4
|
146
|
+
end
|
147
|
+
end
|
70
148
|
end
|
71
149
|
|
72
150
|
describe "concatenated message parts counting" do
|
@@ -96,11 +174,16 @@ describe SmsTools::EncodingDetection do
|
|
96
174
|
concatenated_parts_for length: 135, encoding: :unicode, must_be: 3
|
97
175
|
end
|
98
176
|
|
99
|
-
it "counts parts for actual GSM-encoded
|
177
|
+
it "counts parts for actual GSM-encoded messages" do
|
100
178
|
detection_for('').concatenated_parts.must_equal 1
|
101
|
-
detection_for('Я').concatenated_parts.must_equal 1
|
102
179
|
detection_for('Σ' * 160).concatenated_parts.must_equal 1
|
103
180
|
detection_for('Σ' * 159 + '~').concatenated_parts.must_equal 2
|
181
|
+
end
|
182
|
+
|
183
|
+
it "counts parts for actual Unicode-encoded messages" do
|
184
|
+
detection_for('Я').concatenated_parts.must_equal 1
|
185
|
+
detection_for('Я' * 70).concatenated_parts.must_equal 1
|
186
|
+
detection_for('Я' * 71).concatenated_parts.must_equal 2
|
104
187
|
detection_for('Я' * 133 + '~').concatenated_parts.must_equal 2
|
105
188
|
end
|
106
189
|
end
|
@@ -0,0 +1,36 @@
|
|
1
|
+
require 'spec_helper'
|
2
|
+
require 'sms_tools'
|
3
|
+
|
4
|
+
describe SmsTools::GsmEncoding do
|
5
|
+
describe 'from_utf8' do
|
6
|
+
it 'converts simple UTF-8 text to GSM 03.38' do
|
7
|
+
SmsTools::GsmEncoding.from_utf8('simple').must_equal 'simple'
|
8
|
+
end
|
9
|
+
|
10
|
+
it 'converts UTF-8 text with double-byte chars to GSM 03.38' do
|
11
|
+
SmsTools::GsmEncoding.from_utf8('foo []').must_equal "foo \e<\e>"
|
12
|
+
end
|
13
|
+
|
14
|
+
it 'raises an exception if the UTF-8 text contains chars outside of GSM 03.38' do
|
15
|
+
-> { SmsTools::GsmEncoding.from_utf8('баба') }.must_raise RuntimeError, /Unsupported symbol in GSM-7 encoding/
|
16
|
+
end
|
17
|
+
end
|
18
|
+
|
19
|
+
describe 'to_utf8' do
|
20
|
+
it 'converts simple GSM 03.38 to UTF-8' do
|
21
|
+
SmsTools::GsmEncoding.to_utf8('simple').must_equal 'simple'
|
22
|
+
end
|
23
|
+
|
24
|
+
it 'converts GSM 03.38 text with double-byte chars to UTF-8' do
|
25
|
+
SmsTools::GsmEncoding.to_utf8("GSM \e<\e>").must_equal 'GSM []'
|
26
|
+
end
|
27
|
+
|
28
|
+
it 'raises an exception if the GSM 03.38 text contains chars outside of GSM 03.38' do
|
29
|
+
-> { SmsTools::GsmEncoding.to_utf8('баба') }.must_raise RuntimeError, /Unsupported symbol in GSM-7 encoding/
|
30
|
+
end
|
31
|
+
|
32
|
+
it 'ignores single occurrences of the GSM-7 extension table escape code' do
|
33
|
+
SmsTools::GsmEncoding.to_utf8("\x1B").must_equal ''
|
34
|
+
end
|
35
|
+
end
|
36
|
+
end
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: smstools
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 0.2.2
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Dimitar Dimitrov
|
8
|
-
autorequire:
|
8
|
+
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date:
|
11
|
+
date: 2021-01-20 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: bundler
|
@@ -86,16 +86,18 @@ files:
|
|
86
86
|
- lib/sms_tools/encoding_detection.rb
|
87
87
|
- lib/sms_tools/gsm_encoding.rb
|
88
88
|
- lib/sms_tools/rails/engine.rb
|
89
|
+
- lib/sms_tools/unicode_encoding.rb
|
89
90
|
- lib/sms_tools/version.rb
|
90
91
|
- lib/smstools.rb
|
91
92
|
- smstools.gemspec
|
92
93
|
- spec/sms_tools/encoding_detection_spec.rb
|
94
|
+
- spec/sms_tools/gsm_encoding_spec.rb
|
93
95
|
- spec/spec_helper.rb
|
94
96
|
homepage: https://github.com/mitio/smstools
|
95
97
|
licenses:
|
96
98
|
- MIT
|
97
99
|
metadata: {}
|
98
|
-
post_install_message:
|
100
|
+
post_install_message:
|
99
101
|
rdoc_options: []
|
100
102
|
require_paths:
|
101
103
|
- lib
|
@@ -110,11 +112,11 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
110
112
|
- !ruby/object:Gem::Version
|
111
113
|
version: '0'
|
112
114
|
requirements: []
|
113
|
-
|
114
|
-
|
115
|
-
signing_key:
|
115
|
+
rubygems_version: 3.0.3
|
116
|
+
signing_key:
|
116
117
|
specification_version: 4
|
117
118
|
summary: Small library of classes for common SMS-related functionality.
|
118
119
|
test_files:
|
119
120
|
- spec/sms_tools/encoding_detection_spec.rb
|
121
|
+
- spec/sms_tools/gsm_encoding_spec.rb
|
120
122
|
- spec/spec_helper.rb
|