smstools 0.0.1 → 0.2.2
- checksums.yaml +5 -5
- data/CHANGELOG.md +23 -1
- data/README.md +152 -21
- data/Rakefile +6 -7
- data/lib/assets/javascripts/sms_tools/message.js.coffee +25 -4
- data/lib/sms_tools.rb +21 -0
- data/lib/sms_tools/encoding_detection.rb +25 -6
- data/lib/sms_tools/gsm_encoding.rb +13 -6
- data/lib/sms_tools/unicode_encoding.rb +15 -0
- data/lib/sms_tools/version.rb +1 -1
- data/spec/sms_tools/encoding_detection_spec.rb +95 -12
- data/spec/sms_tools/gsm_encoding_spec.rb +36 -0
- metadata +9 -7
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
-
-  metadata.gz:
-  data.tar.gz:
+SHA256:
+  metadata.gz: f2cecee4608c47f5abf1cf0a980b3a3a646e358d50a72e6b0f1931f554f86c5f
+  data.tar.gz: 46eae0938780419f4672581f4b1105a8d51cc4fe7f6150b2e44fdc3c00f16c6e
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6f40d959431dc1185a989b179c91858363978b5200bba7504e499b214ba8b1493c859eebb3308b343ac8000ec747db89ccfbc664692721315bb68c41f96000a0
+  data.tar.gz: 7670ac023de1612cd5e4573ad4879526e5f45e38c871416cef3466e4bbee6d1740d0b2ae760b907d57811f913670862e2907b3ed3c8f268b876085195bed2a61
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,25 @@
-## 0.
+## 0.2.2 (20 Jan 2021)
+
+* #9 Fix the way some complex Unicode characters (like composite emojis) are counted. Thanks to @bryanrite for the neat implementation. Note the fix could be **potentially backwards-incompatible** if you were relying on the incorrect behaviour previously. Technically it's still a bug fix.
+
+## 0.2.1 (18 Aug 2020)
+
+* #7 Introduce `SmsTools.use_ascii_encoding` option (defaults to `true` for backwards-compatibility) that allows disabling the `:ascii` workaround encoding. See #6 and #7 for details. Thanks @kingsley-wang.
+
+## 0.2.0 (2 March 2017)
+
+* The non-breaking space character (0x00A0 in Unicode and "\xC2\xA0" in UTF-8) is no longer regarded as a valid GSM 7-bit symbol. [#4](https://github.com/livebg/smstools/issues/4)
+* GsmEncoding.to_utf8 will now raise errors in case the provided argument is not a valid GSM 7-bit text.
+
+## 0.1.1 (18 April 2016)
+
+* Replaces small c with cedilla to capital one, as per the GSM 03.38 standard (by @skliask)
+
+## 0.1.0 (08 October 2015)
+
+* distinguish between ascii encoding and gsm encoding
+* add option for preventing the use of gsm encoding, that is to use unicode instead
+
+## 0.0.1 (17 January 2014)
 
 * Initial release.
data/README.md
CHANGED
@@ -1,23 +1,80 @@
 # Sms Tools
 
-A small collection of
-
+A small collection of Ruby and JavaScript classes implementing often needed functionality for
+dealing with SMS messages.
 
-The gem
-
+The gem can also be used in a Rails application as an engine. It integrates with the asset pipeline
+and gives you access to some client-side SMS manipulation functionality.
 
 ## Features
 
 The following features are available on both the server side and the client
 side:
 
-- Detection of the most optimal encoding for sending an SMS message (GSM 7-bit
-
-- Correctly determining the message's length according to the most optimal
-  encoding.
+- Detection of the most optimal encoding for sending an SMS message (GSM 7-bit or Unicode).
+- Correctly determining a message's length in the most optimal encoding.
 - Concatenation detection and concatenated message parts counting.
 
-
+The following can be accomplished only on the server with Ruby:
+
+- Converting a UTF-8 string to a GSM 7-bit encoding and vice versa.
+- Detecting if a UTF-8 string can be safely represented in a GSM 7-bit encoding.
+- Detection of double-byte chars in the GSM 7-bit encoding.
+
+And possibly more.
+
+### Note on the GSM encoding
+
+All references to the "GSM" encoding or the "GSM 7-bit alphabet" in this text actually refer to the
+[GSM 03.38 spec](http://en.wikipedia.org/wiki/GSM_03.38) and [its latest
+version](ftp://ftp.unicode.org/Public/MAPPINGS/ETSI/GSM0338.TXT), as defined by the Unicode
+consortium.
+
+This encoding is the most widely used one when sending SMS messages.
+
+### Note regarding non-ASCII symbols from the GSM encoding
+
+The GSM 03.38 encoding is used by default. This standard defines a set of
+symbols which can be encoded in 7-bits each, thus allowing up to 160 symbols
+per SMS message (each SMS message can contain up to 140 bytes of data).
+
+This standard covers most of the ASCII table, but also includes some non-ASCII
+symbols such as `æ`, `ø` and `å`. If you use these in your messages, you can
+still send them as GSM encoded, having a 160-symbol limit. This is technically
+correct.
+
+In reality, however, some SMS routes have problems delivering messages which
+contain such non-ASCII symbols in the GSM encoding. The special symbols might
+be omitted, or the message might not arrive at all.
+
+Thus, it might be safer to just send messages in Unicode if the message's text
+contains any non-ASCII symbols. This is not the default as it reduces the max
+symbols count to 70 per message, instead of 160, and you might not have any
+issues with GSM-encoded messages. In case you do, however, you can turn off
+support for the GSM encoding and just treat messages as Unicode if they contain
+non-ASCII symbols.
+
+In case you decide to do so, you have to specify it in both the Ruby and the
+JavaScript part of the library, like so:
+
+#### In Ruby
+
+    SmsTools.use_gsm_encoding = false
+
+#### In Javascript
+
+    //= require sms_tools
+    SmsTools.use_gsm_encoding = false;
+
+There is another alternative as well. As explained in this commit – f1ffd948d4b8c – SmsTools will by
+default detect the encoding as `:ascii` if the SMS message contains ASCII-only symbols. The safest
+way to send messages would be to use an ASCII subset of the GSM encoding.
+
+The `:ascii` encoding is informative only, however. Your SMS sending implementation will have to
+decide how to handle it. You may also find it confusing that the dummy `:ascii` encoding does not
+consider double-byte chars at all when counting the length of the message.
+
+To disable this dummy `:ascii` encoding, set `SmsTools.use_ascii_encoding` to `false`.
 
 ## Installation
 
@@ -33,32 +90,99 @@ Or install it yourself as:
 
     $ gem install smstools
 
+If you're using the gem in Rails, you may also want to add the following to your `application.js`
+manifest file to gain access to the client-side features:
+
+    //= require sms_tools
+
 ## Usage
 
 The gem consists of both server-side (Ruby) and client-side classes. You can
-use either
+use either.
 
 ### Server-side code
 
-
-
+First make sure you have installed the gem and have required the appropriate files.
+
+#### Encoding detection
+
+The `SmsTools::EncodingDetection` class provides you with a few simple methods to detect the most
+optimal encoding for sending an SMS message, to correctly calculate its length in that encoding and
+to see if the text would need to be concatenated or will fit in a single message.
+
+Here is an example with a non-concatenated message which is best encoded in the GSM 7-bit alphabet:
+
+```ruby
+sms_text = 'Text in GSM 03.38: ÄäøÆ with a double-byte char: ~ '
+sms_encoding = SmsTools::EncodingDetection.new sms_text
+
+sms_encoding.gsm? # => true
+sms_encoding.unicode? # => false
+sms_encoding.length # => 52 (because of the double-byte char)
+sms_encoding.concatenated? # => false
+sms_encoding.concatenated_parts # => 1
+sms_encoding.encoding # => :gsm
+```
+
+Here's another example with a concatenated Unicode message:
+
+```ruby
+sms_text = 'Я' * 90
+sms_encoding = SmsTools::EncodingDetection.new sms_text
+
+sms_encoding.gsm? # => false
+sms_encoding.unicode? # => true
+sms_encoding.length # => 90
+sms_encoding.concatenated? # => true
+sms_encoding.concatenated_parts # => 2
+sms_encoding.encoding # => :unicode
+```
+
+You can check the specs for this class for more examples.
 
-####
-
+#### GSM 03.38 encoding conversion
+
+The `SmsTools::GsmEncoding` class can be used to check if a given UTF-8 string can be fully
+represented in the GSM 03.38 encoding as well as to convert from UTF-8 to GSM 03.38 and vice-versa.
+
+The main API this class provides is the following:
+
+```ruby
+SmsTools::GsmEncoding.valid? message_text_in_utf8 # => true or false
+
+SmsTools::GsmEncoding.from_utf8 utf8_encoded_string # => a GSM 03.38 encoded string
+SmsTools::GsmEncoding.to_utf8 gsm_encoded_string # => an UTF-8 encoded string
+```
+
+Check out the source code of the class to find out more.
 
 ### Client-side code
 
-If you're using the gem in Rails 3.
-
+If you're using the gem in Rails 3.1 or newer, you can gain access to the `SmsTools.Message` class.
+Its interface is similar to the one of `SmsTools::EncodingDetection`. Here is an example in
+CoffeeScript:
 
-
+```coffeescript
+message = new SmsTools.Message 'The text of the message: ~'
 
-
+message.encoding # => 'gsm'
+message.length # => 27
+message.concatenatedPartsCount # => 1
+```
 
-
+You can also check how long can this message be in the current most optimal encoding, if we want to
+limit the number of concatenated messages we will allow to be sent:
 
-
-
+```coffeescript
+maxConcatenatedPartsCount = 2
+message.maxLengthFor(maxConcatenatedPartsCount) # => 306
+```
+
+This allows you to have a dynamic instead of a fixed length limit, for when you use a non-GSM 03.38
+symbol in your text, your message length limit decreases significantly.
+
+Note that to use this client-side code, a Rails application with an active asset pipeline is
+assumed. It might be possible to use it in other setups as well, but you're on your own there.
 
 ## Contributing
 
@@ -69,3 +193,10 @@ CoffeeScript preprocessor set up.
 5. Commit your changes (`git commit -am 'Add some feature'`)
 6. Push to the branch (`git push origin my-new-feature`)
 7. Send a pull request.
+
+## Publishing a new version
+
+1. Pick a version number according to Semantic Versioning.
+2. Update `CHANGELOG.md`, `version.rb` and potentially this readme.
+3. Commit the changes, tag them with `vX.Y.Z` (e.g. `v0.2.1`) and push all with `git push --tags`.
+4. Build and publish the new version of the gem with `gem build smstools.gemspec && gem push *.gem`.
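The part-counting rules the README describes (160 or 70 symbols for a single message, fewer per part once a message is concatenated) can be sketched in a few lines of Ruby. This is only an illustration, not the gem's implementation: `LIMITS`, `parts_for` and `max_length_for` are hypothetical names. The 160/153 figures come from the `maxLengthForEncoding` table in this diff; the 70/67 Unicode figures are an assumption based on the README's 70-symbol limit and on the spec expectations (71 symbols need 2 parts, 135 need 3).

```ruby
# Per-part capacities. The :gsm row mirrors the table shown in this diff;
# the :unicode row is an assumption (see the lead-in above).
LIMITS = {
  gsm:     { normal: 160, concatenated: 153 },
  unicode: { normal: 70,  concatenated: 67 }
}.freeze

# Number of SMS parts needed for a message of `length` symbols.
def parts_for(length, encoding)
  limits = LIMITS.fetch(encoding)
  return 1 if length <= limits[:normal]

  (length.to_f / limits[:concatenated]).ceil
end

# Longest message (in symbols) that still fits in `parts` parts.
def max_length_for(parts, encoding)
  limits = LIMITS.fetch(encoding)
  parts * limits[parts > 1 ? :concatenated : :normal]
end
```

Under these assumptions, `max_length_for(2, :gsm)` gives the 306 shown in the README's `maxLengthFor` example, and `parts_for(90, :unicode)` gives the 2 parts from the concatenated Unicode example.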
data/Rakefile
CHANGED
@@ -1,11 +1,10 @@
 require 'bundler/gem_tasks'
+require 'rake/testtask'
 
-task :test
-  test_files = Dir[File.expand_path('../spec/**/*_spec.rb', __FILE__)]
-  command = "ruby -Ispec #{test_files.join ' '}"
+task default: :test
 
-
-
+Rake::TestTask.new do |t|
+  t.libs << 'spec'
+  t.test_files = FileList['spec/**/*_spec.rb']
+  t.verbose = true
 end
-
-task default: :test
data/lib/assets/javascripts/sms_tools/message.js.coffee
CHANGED
@@ -2,6 +2,9 @@ window.SmsTools ?= {}
 
 class SmsTools.Message
   maxLengthForEncoding:
+    ascii:
+      normal: 160
+      concatenated: 153
     gsm:
       normal: 160
       concatenated: 153
@@ -20,6 +23,7 @@ class SmsTools.Message
     '€': true
     '\\': true
 
+  asciiPattern: /^[\x00-\x7F]*$/
   gsmEncodingPattern: /^[0-9a-zA-Z@Δ¡¿£_!Φ"¥Γ#èΛ¤éΩ%ùΠ&ìΨòΣçΘΞ:Ø;ÄäøÆ,<Ööæ=ÑñÅß>Üüåɧà€~ \$\.\-\+\(\)\*\\\/\?\|\^\}\{\[\]\'\r\n]*$/
 
   constructor: (@text) ->
@@ -33,8 +37,25 @@ class SmsTools.Message
 
     concatenatedPartsCount * @maxLengthForEncoding[@encoding][messageType]
 
+  use_gsm_encoding: ->
+    if SmsTools['use_gsm_encoding'] == undefined
+      true
+    else
+      SmsTools['use_gsm_encoding']
+
+  use_ascii_encoding: ->
+    if SmsTools['use_ascii_encoding'] == undefined
+      true
+    else
+      SmsTools['use_ascii_encoding']
+
   _encoding: ->
-    if @
+    if @asciiPattern.test(@text) and @use_ascii_encoding()
+      'ascii'
+    else if @use_gsm_encoding() and @gsmEncodingPattern.test(@text)
+      'gsm'
+    else
+      'unicode'
 
   _concatenatedPartsCount: ->
     encoding = @encoding
@@ -45,9 +66,9 @@ class SmsTools.Message
     else
       parseInt Math.ceil(length / @maxLengthForEncoding[encoding].concatenated), 10
 
-
-
-
+  # Returns the number of symbols which the given text will eat up in an SMS
+  # message, taking into account any double-space symbols in the GSM 03.38
+  # encoding.
   _length: ->
     length = @text.length
 
data/lib/sms_tools.rb
CHANGED
@@ -1,7 +1,28 @@
 require 'sms_tools/version'
 require 'sms_tools/encoding_detection'
 require 'sms_tools/gsm_encoding'
+require 'sms_tools/unicode_encoding'
 
 if defined?(::Rails) and ::Rails.version >= '3.1'
   require 'sms_tools/rails/engine'
 end
+
+module SmsTools
+  class << self
+    def use_gsm_encoding?
+      @use_gsm_encoding.nil? ? true : @use_gsm_encoding
+    end
+
+    def use_gsm_encoding=(value)
+      @use_gsm_encoding = value
+    end
+
+    def use_ascii_encoding?
+      @use_ascii_encoding.nil? ? true : @use_ascii_encoding
+    end
+
+    def use_ascii_encoding=(value)
+      @use_ascii_encoding = value
+    end
+  end
+end
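The two module-level switches above share one pattern: an unset flag counts as enabled, and only an explicit `false` disables it. The sketch below isolates that pattern on a hypothetical `DemoTools` module (not part of the gem) and adds an assumed convenience helper, `with_ascii_encoding`, which the gem itself does not provide as far as this diff shows.

```ruby
module DemoTools
  class << self
    # Plain writer; the flag is stored on the module object itself.
    attr_writer :use_ascii_encoding

    # An unset (nil) flag means "enabled". Writing `@use_ascii_encoding || true`
    # would be wrong here, because an explicit `false` would become `true` again.
    def use_ascii_encoding?
      @use_ascii_encoding.nil? ? true : @use_ascii_encoding
    end

    # Hypothetical helper: change the flag for the duration of a block and
    # restore the previous raw value afterwards, even if the block raises.
    def with_ascii_encoding(value)
      previous = @use_ascii_encoding
      @use_ascii_encoding = value
      yield
    ensure
      @use_ascii_encoding = previous
    end
  end
end
```

A block-scoped helper like this could stand in for the paired `before`/`after` hooks that the specs in this diff use to reset the flags, assuming tests do not run concurrently.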
data/lib/sms_tools/encoding_detection.rb
CHANGED
@@ -3,6 +3,10 @@ require 'sms_tools/gsm_encoding'
 module SmsTools
   class EncodingDetection
     MAX_LENGTH_FOR_ENCODING = {
+      ascii: {
+        normal: 160,
+        concatenated: 153,
+      },
       gsm: {
         normal: 160,
         concatenated: 153,
@@ -20,7 +24,18 @@ module SmsTools
     end
 
     def encoding
-      @encoding ||=
+      @encoding ||=
+        if text.ascii_only? and SmsTools.use_ascii_encoding?
+          :ascii
+        elsif SmsTools.use_gsm_encoding? and GsmEncoding.valid?(text)
+          :gsm
+        else
+          :unicode
+        end
+    end
+
+    def ascii?
+      encoding == :ascii
     end
 
     def gsm?
@@ -49,12 +64,16 @@ module SmsTools
       concatenated_parts * MAX_LENGTH_FOR_ENCODING[encoding][message_type]
     end
 
-
-
-
+    # Returns the number of symbols which the given text will eat up in an SMS
+    # message, taking into account any double-space symbols in the GSM 03.38
+    # encoding.
     def length
-
-
+      if unicode?
+        length = text.chars.sum { |char| UnicodeEncoding.character_count(char) }
+      else
+        length = text.length
+        length += text.chars.count { |char| GsmEncoding.double_byte?(char) } if gsm?
+      end
 
       length
     end
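The detection order in the hunks above (ASCII first, then GSM, then Unicode) and its effect on the reported length can be tried out with a standalone sketch. Everything here is illustrative: `detect_encoding`, `symbol_length` and the two sample tables are hypothetical, the tables are heavily abbreviated, and the validity check is a deliberate simplification of the gem's `GsmEncoding.valid?`.

```ruby
# Hypothetical, abbreviated stand-ins for the gem's GSM 03.38 tables.
GSM_NON_ASCII_SAMPLE   = ['Σ', 'Ä', 'ä', 'ø', 'Æ', '€'].freeze
GSM_DOUBLE_BYTE_SAMPLE = ['[', ']', '{', '}', '~', '^', '|', '\\', '€'].freeze

# Simplification: treats every ASCII character as valid GSM, which the real
# alphabet does not (the backtick, for example, is not part of it).
def gsm_valid?(text)
  text.each_char.all? { |char| char.ascii_only? || GSM_NON_ASCII_SAMPLE.include?(char) }
end

# Same precedence as the diff: :ascii wins over :gsm, :unicode is the fallback.
def detect_encoding(text, use_ascii: true, use_gsm: true)
  if use_ascii && text.ascii_only?
    :ascii
  elsif use_gsm && gsm_valid?(text)
    :gsm
  else
    :unicode
  end
end

# Length in symbols for :ascii and :gsm only. Unicode needs UTF-16 counting
# and is left out of this sketch on purpose.
def symbol_length(text, encoding)
  raise ArgumentError, "unsupported encoding: #{encoding}" unless %i[ascii gsm].include?(encoding)

  length = text.length
  length += text.each_char.count { |char| GSM_DOUBLE_BYTE_SAMPLE.include?(char) } if encoding == :gsm
  length
end
```

This reproduces the spec expectations from the diff for `'[]'`: 2 symbols when detected as `:ascii`, 4 when the ASCII shortcut is disabled and the text is detected as `:gsm`.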
data/lib/sms_tools/gsm_encoding.rb
CHANGED
@@ -4,6 +4,8 @@ module SmsTools
   module GsmEncoding
     extend self
 
+    GSM_EXTENSION_TABLE_ESCAPE_CODE = "\x1B".freeze
+
     UTF8_TO_GSM_BASE_TABLE = {
       0x0040 => "\x00", # COMMERCIAL AT
       0x00A3 => "\x01", # POUND SIGN
@@ -14,7 +16,7 @@ module SmsTools
       0x00F9 => "\x06", # LATIN SMALL LETTER U WITH GRAVE
       0x00EC => "\x07", # LATIN SMALL LETTER I WITH GRAVE
       0x00F2 => "\x08", # LATIN SMALL LETTER O WITH GRAVE
-
+      0x00C7 => "\x09", # LATIN CAPITAL LETTER C WITH CEDILLA
       0x000A => "\x0A", # LINE FEED
       0x00D8 => "\x0B", # LATIN CAPITAL LETTER O WITH STROKE
       0x00F8 => "\x0C", # LATIN SMALL LETTER O WITH STROKE
@@ -32,7 +34,7 @@ module SmsTools
       0x03A3 => "\x18", # GREEK CAPITAL LETTER SIGMA
       0x0398 => "\x19", # GREEK CAPITAL LETTER THETA
       0x039E => "\x1A", # GREEK CAPITAL LETTER XI
-
+      nil    => "\x1B", # ESCAPE TO EXTENSION TABLE or NON-BREAKING SPACE
       0x00C6 => "\x1C", # LATIN CAPITAL LETTER AE
       0x00E6 => "\x1D", # LATIN SMALL LETTER AE
       0x00DF => "\x1E", # LATIN SMALL LETTER SHARP S (German)
@@ -176,20 +178,25 @@ module SmsTools
     def to_utf8(gsm_encoded_string)
       utf8_encoded_string = ''
       escape = false
-      escape_code = "\e".freeze
 
       gsm_encoded_string.each_char do |char|
-        if char ==
+        if char == GSM_EXTENSION_TABLE_ESCAPE_CODE
           escape = true
         elsif escape
           escape = false
-          utf8_encoded_string << [
+          utf8_encoded_string << [fetch_utf8_char(GSM_EXTENSION_TABLE_ESCAPE_CODE + char)].pack('U')
         else
-          utf8_encoded_string << [
+          utf8_encoded_string << [fetch_utf8_char(char)].pack('U')
         end
       end
 
       utf8_encoded_string
     end
+
+    private
+
+    def fetch_utf8_char(char)
+      GSM_TO_UTF8.fetch(char) { raise "Unsupported symbol in GSM-7 encoding: #{char}" }
+    end
   end
 end
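The escape handling in `to_utf8` is a small two-state decoder: an escape byte sets a flag, and the following byte is then looked up together with the escape prefix. The sketch below reproduces that idea on its own. `decode_gsm` and `GSM_TO_UTF8_SAMPLE` are hypothetical, and the table is a tiny assumed subset that maps directly to characters, whereas the gem's `GSM_TO_UTF8` table appears to hold code points that are packed with `'U'`.

```ruby
ESCAPE = "\x1B".freeze

# Hypothetical, abbreviated GSM 03.38 -> UTF-8 table: letters, digits and the
# space map to themselves, plus a few base and extension table entries.
PLAIN_CHARS = ('a'..'z').to_a + ('A'..'Z').to_a + ('0'..'9').to_a + [' ']
GSM_TO_UTF8_SAMPLE = PLAIN_CHARS.map { |char| [char, char] }.to_h.merge(
  "\x00"       => '@',
  "\x02"       => '$',
  "#{ESCAPE}<" => '[',
  "#{ESCAPE}>" => ']',
  "#{ESCAPE}e" => '€'
).freeze

# Decodes a GSM 03.38 string, raising on anything the sample table lacks.
def decode_gsm(gsm_encoded_string)
  decoded = +''
  escape = false

  gsm_encoded_string.each_char do |char|
    if char == ESCAPE
      escape = true # remember the escape and wait for the next character
    else
      key = escape ? ESCAPE + char : char
      escape = false
      decoded << GSM_TO_UTF8_SAMPLE.fetch(key) do
        raise "Unsupported symbol in GSM-7 encoding: #{char}"
      end
    end
  end

  decoded
end
```

As in the spec added by this diff, a lone trailing escape code produces no output, and `decode_gsm("GSM \e<\e>")` yields `'GSM []'`.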
data/lib/sms_tools/unicode_encoding.rb
ADDED
@@ -0,0 +1,15 @@
+module SmsTools
+  module UnicodeEncoding
+    extend self
+
+    BASIC_PLANE = 0x0000..0xFFFF
+
+    # UCS-2/UTF-16 is used for unicode text messaging. UCS-2/UTF-16 represents characters in minimum
+    # 2-bytes, any characters in the basic plane are represented with 2-bytes, so each codepoint
+    # within the Basic Plane counts as a single character. Any codepoint outside the Basic Plane is
+    # encoded using 4-bytes and therefore counts as 2 characters in a text message.
+    def character_count(char)
+      char.each_codepoint.sum { |codepoint| BASIC_PLANE.include?(codepoint) ? 1 : 2 }
+    end
+  end
+end
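The counting rule in the comment above amounts to counting UTF-16 code units. As a rough cross-check, the sketch below computes the count two ways: per code point, as `character_count` does, and by encoding to UTF-16 and halving the byte size. Both method names are hypothetical and neither is part of the gem; the second approach assumes the input is valid UTF-8 that Ruby can transcode.

```ruby
BASIC_PLANE_MAX = 0xFFFF

# One symbol per Basic Plane code point, two for anything above it
# (such code points need a surrogate pair in UTF-16).
def sms_unicode_length(text)
  text.each_codepoint.sum { |codepoint| codepoint > BASIC_PLANE_MAX ? 2 : 1 }
end

# Alternative: let Ruby do the UTF-16 encoding and count 2-byte code units.
def utf16_code_units(text)
  text.encode('UTF-16LE').bytesize / 2
end

samples = [
  "\u{1F634}",                                  # single emoji outside the Basic Plane
  "\u{1F6CC}\u{1F3FD}",                         # emoji plus skin tone modifier
  "\u{1F93E}\u{1F3FD}\u200D\u2640\uFE0F",       # ZWJ sequence with variation selector
  "\u{1F469}\u200D\u2764\uFE0F\u200D\u{1F469}"  # longer ZWJ sequence
]

samples.each do |sample|
  puts "#{sample.inspect}: #{sms_unicode_length(sample)} / #{utf16_code_units(sample)}"
end
```

For these four samples both methods give 2, 4, 7 and 8 symbols, matching the "counts ZWJ unicode characters correctly" expectations in the spec diff.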
data/lib/sms_tools/version.rb
CHANGED
data/spec/sms_tools/encoding_detection_spec.rb
CHANGED
@@ -1,5 +1,5 @@
 require 'spec_helper'
-require 'sms_tools
+require 'sms_tools'
 
 describe SmsTools::EncodingDetection do
   it "exposes the original text as a method" do
@@ -7,27 +7,77 @@ describe SmsTools::EncodingDetection do
   end
 
   describe "encoding" do
-    it "defaults to
-      detection_for('').encoding.must_equal :
+    it "defaults to ASCII encoding for empty messages" do
+      detection_for('').encoding.must_equal :ascii
     end
 
-    it "returns
-      detection_for('foo bar baz').encoding.must_equal :
+    it "returns ASCII as encoding for simple ASCII text" do
+      detection_for('foo bar baz').encoding.must_equal :ascii
     end
 
     it "returns GSM as encoding for special symbols defined in GSM 03.38" do
-      detection_for('09azAZ@Δ¡¿£_!Φ"
+      detection_for('09azAZ@Δ¡¿£_!Φ"¥Γ#èΛ¤éΩ%ùΠ&ìΨòΣCΘΞ:Ø;ÄäøÆ,<Ööæ=ÑñÅß>Üüåɧà€~').encoding.must_equal :gsm
     end
 
-    it "returns
-      detection_for('Foo bar {} [baz]! Larodi $5. What else?').encoding.must_equal :
-      detection_for("Spaces and newlines are GSM 03.38, too: \r\n").encoding.must_equal :
+    it "returns ASCII as encoding for punctuation and newline symbols" do
+      detection_for('Foo bar {} [baz]! Larodi $5. What else?').encoding.must_equal :ascii
+      detection_for("Spaces and newlines are GSM 03.38, too: \r\n").encoding.must_equal :ascii
     end
 
     it "returns Unicode when non-GSM Unicode symbols are used" do
       detection_for('Foo bar лароди').encoding.must_equal :unicode
       detection_for('∞').encoding.must_equal :unicode
     end
+
+    it 'considers the non-breaking space character as a non-GSM Unicode symbol' do
+      non_breaking_space = "\xC2\xA0"
+
+      detection_for(non_breaking_space).encoding.must_equal :unicode
+    end
+
+    describe 'with SmsTools.use_gsm_encoding = false' do
+      before do
+        SmsTools.use_gsm_encoding = false
+      end
+
+      after do
+        SmsTools.use_gsm_encoding = true
+      end
+
+      it "returns Unicode as encoding for special symbols defined in GSM 03.38" do
+        detection_for('09azAZ@Δ¡¿£_!Φ"¥Γ#èΛ¤éΩ%ùΠ&ìΨòΣCΘΞ:Ø;ÄäøÆ,<Ööæ=ÑñÅß>Üüåɧà€~').encoding.must_equal :unicode
+      end
+
+      it 'returns ASCII for simple ASCII text' do
+        detection_for('Hello world.').encoding.must_equal :ascii
+      end
+
+      it "defaults to ASCII encoding for empty messages" do
+        detection_for('').encoding.must_equal :ascii
+      end
+    end
+
+    describe 'with SmsTools.use_ascii_encoding = false' do
+      before do
+        SmsTools.use_ascii_encoding = false
+      end
+
+      after do
+        SmsTools.use_ascii_encoding = true
+      end
+
+      it "returns GSM 03.38 as encoding for special symbols defined in GSM 03.38" do
+        detection_for('09azAZ@Δ¡¿£_!Φ"¥Γ#èΛ¤éΩ%ùΠ&ìΨòΣCΘΞ:Ø;ÄäøÆ,<Ööæ=ÑñÅß>Üüåɧà€~').encoding.must_equal :gsm
+      end
+
+      it 'returns GSM 03.38 for simple ASCII text' do
+        detection_for('Hello world.').encoding.must_equal :gsm
+      end
+
+      it "defaults to GSM 03.38 encoding for empty messages" do
+        detection_for('').encoding.must_equal :gsm
+      end
+    end
   end
 
   describe "message length" do
@@ -38,7 +88,7 @@ describe SmsTools::EncodingDetection do
     end
 
     it "computes the length of non-trivial GSM encoded messages correctly" do
-      detection_for('GSM: 09azAZ@Δ¡¿£_!Φ"
+      detection_for('GSM: 09azAZ@Δ¡¿£_!Φ"¥Γ#èΛ¤éΩ%ùΠ&ìΨòΣÇΘΞ:Ø;ÄäøÆ,<Ööæ=ÑñÅß>Üüåɧà').length.must_equal 63
     end
 
     it "correctly counts the length of whitespace-only messages" do
@@ -67,6 +117,34 @@ describe SmsTools::EncodingDetection do
       detection_for('Уникод: ^{}[~]|€\\').length.must_equal 17
       detection_for('Уникод: Σ: €').length.must_equal 12
     end
+
+    it "counts ZWJ unicode characters correctly" do
+      detection_for('😴').length.must_equal 2
+      detection_for('🛌🏽').length.must_equal 4
+      detection_for('🤾🏽‍♀️').length.must_equal 7
+      detection_for('🇵🇵').length.must_equal 4
+      detection_for('👩‍❤️‍👩').length.must_equal 8
+    end
+
+    describe 'with SmsTools.use_gsm_encoding = false' do
+      before do
+        SmsTools.use_gsm_encoding = false
+      end
+
+      it "returns ASCII encoded length for some specific symbols which are also in GSM 03.38" do
+        detection_for('[]').length.must_equal 2
+      end
+    end
+
+    describe 'with SmsTools.use_ascii_encoding = false' do
+      before do
+        SmsTools.use_ascii_encoding = false
+      end
+
+      it "returns GSM 03.38 encoded length for some specific symbols which are also in ASCII" do
+        detection_for('[]').length.must_equal 4
+      end
+    end
   end
 
   describe "concatenated message parts counting" do
@@ -96,11 +174,16 @@ describe SmsTools::EncodingDetection do
       concatenated_parts_for length: 135, encoding: :unicode, must_be: 3
     end
 
-    it "counts parts for actual GSM-encoded
+    it "counts parts for actual GSM-encoded messages" do
       detection_for('').concatenated_parts.must_equal 1
-      detection_for('Я').concatenated_parts.must_equal 1
       detection_for('Σ' * 160).concatenated_parts.must_equal 1
       detection_for('Σ' * 159 + '~').concatenated_parts.must_equal 2
+    end
+
+    it "counts parts for actual Unicode-encoded messages" do
+      detection_for('Я').concatenated_parts.must_equal 1
+      detection_for('Я' * 70).concatenated_parts.must_equal 1
+      detection_for('Я' * 71).concatenated_parts.must_equal 2
       detection_for('Я' * 133 + '~').concatenated_parts.must_equal 2
     end
   end
data/spec/sms_tools/gsm_encoding_spec.rb
ADDED
@@ -0,0 +1,36 @@
+require 'spec_helper'
+require 'sms_tools'
+
+describe SmsTools::GsmEncoding do
+  describe 'from_utf8' do
+    it 'converts simple UTF-8 text to GSM 03.38' do
+      SmsTools::GsmEncoding.from_utf8('simple').must_equal 'simple'
+    end
+
+    it 'converts UTF-8 text with double-byte chars to GSM 03.38' do
+      SmsTools::GsmEncoding.from_utf8('foo []').must_equal "foo \e<\e>"
+    end
+
+    it 'raises an exception if the UTF-8 text contains chars outside of GSM 03.38' do
+      -> { SmsTools::GsmEncoding.from_utf8('баба') }.must_raise RuntimeError, /Unsupported symbol in GSM-7 encoding/
+    end
+  end
+
+  describe 'to_utf8' do
+    it 'converts simple GSM 03.38 to UTF-8' do
+      SmsTools::GsmEncoding.to_utf8('simple').must_equal 'simple'
+    end
+
+    it 'converts UTF-8 text with double-byte chars to GSM 03.38' do
+      SmsTools::GsmEncoding.to_utf8("GSM \e<\e>").must_equal 'GSM []'
+    end
+
+    it 'raises an exception if the UTF-8 text contains chars outside of GSM 03.38' do
+      -> { SmsTools::GsmEncoding.to_utf8('баба') }.must_raise RuntimeError, /Unsupported symbol in GSM-7 encoding/
+    end
+
+    it 'ignores single occurrences of the GSM-7 extension table escape code' do
+      SmsTools::GsmEncoding.to_utf8("\x1B").must_equal ''
+    end
+  end
+end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: smstools
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.2.2
 platform: ruby
 authors:
 - Dimitar Dimitrov
-autorequire:
+autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2021-01-20 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -86,16 +86,18 @@ files:
 - lib/sms_tools/encoding_detection.rb
 - lib/sms_tools/gsm_encoding.rb
 - lib/sms_tools/rails/engine.rb
+- lib/sms_tools/unicode_encoding.rb
 - lib/sms_tools/version.rb
 - lib/smstools.rb
 - smstools.gemspec
 - spec/sms_tools/encoding_detection_spec.rb
+- spec/sms_tools/gsm_encoding_spec.rb
 - spec/spec_helper.rb
 homepage: https://github.com/mitio/smstools
 licenses:
 - MIT
 metadata: {}
-post_install_message:
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -110,11 +112,11 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-
-
-signing_key:
+rubygems_version: 3.0.3
+signing_key:
 specification_version: 4
 summary: Small library of classes for common SMS-related functionality.
 test_files:
 - spec/sms_tools/encoding_detection_spec.rb
+- spec/sms_tools/gsm_encoding_spec.rb
 - spec/spec_helper.rb