unibits 2.1.0 → 2.1.1

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 93d43c363159fed623abd9acd021e0aa6ed90061
4
- data.tar.gz: 7fe58c18ca55c1a595345235fa3cf6fa9b0d3635
3
+ metadata.gz: 62770bc741ec5a7693759d4e888d46cf7b3860f2
4
+ data.tar.gz: 171c479f1f4ebdcf11bc97c4e4e0b94bac713abb
5
5
  SHA512:
6
- metadata.gz: 4b580cbfc775274bd9790e0b000a039ed304ffe4e5f41349f5e445118154b9351d5084dfe204c77203b0f4dd4b3e777fa3119e3b47bbba0a422df89bdfec6bad
7
- data.tar.gz: 54da485c62dcf7390dde62460d977472130e1c3c6d091e3fa3a71be07c3854260c06c2eb067f7ac2abc96d934e69c6393861c06197a519d1b432f541bc9cf0f8
6
+ metadata.gz: 793cdfeca34183b1b1f6cc219ddd822b9faac5542e92f1ec1065c3fd6ff07c38762becc5312d9bd29fbe3f976bd6eabb3e4c5e228bcfbe8e6340e90b5407234e
7
+ data.tar.gz: 9691e9072ff3084706e5dfe1ed51b1e6c206d2b11a9eaf710a37ee7b5f1e09b15a6eb3d85e3cb4517a64ef9e5b6aa0bb6bfadee72285487a4a1d0f3f66fa32bd
data/CHANGELOG.md CHANGED
@@ -1,5 +1,9 @@
1
1
  ## CHANGELOG
2
2
 
3
+ ### 2.1.1
4
+
5
+ * Proper UTF-32 validness / invalid codepoint highlighting, see https://bugs.ruby-lang.org/issues/13292
6
+
3
7
  ### 2.1.0
4
8
 
5
9
  * Support more encoding: IBMX, CP85X, macX, TIS-620/Windows-874, and KOI8-X
data/README.md CHANGED
@@ -4,8 +4,8 @@ Ruby library and CLI command that visualizes various Unicode and ASCII/single by
4
4
 
5
5
  - Makes analyzing encodings easier
6
6
  - Helps you with debugging strings
7
- - Supports **UTF-8**, **UTF-16LE**/**UTF-16BE**, **UTF-32LE**/**UTF-32BE**, **ISO-8859-X**, **Windows-125X**, **IBMX**, **CP85X**, **macX**, **TIS-620**/**Windows-874**, **KOI8-R**/**KOI8-U**, arbitrary **BINARY** data, and 7-Bit **ASCII**
8
7
  - Highlights invalid/special/blank bytes/characters/codepoints
8
+ - Supports *UTF-8*, *UTF-16LE*/*UTF-16BE*, *UTF-32LE*/*UTF-32BE*, *ISO-8859-X*, *Windows-125X*, *IBMX*, *CP85X*, *macX*, *TIS-620*/*Windows-874*, *KOI8-R*/*KOI8-U*, 7-Bit *ASCII*, and arbitrary *BINARY* data
9
9
 
10
10
  ## Color Coding
11
11
 
@@ -53,7 +53,7 @@ unibits "🌫 Idiosyncrätic ℜսᖯʏ"
53
53
  - *wide-ambiguous*: Treat characters of ambiguous width as 2 spaces instead of 1 ([more info](https://github.com/janlelis/unicode-display_width))
54
54
  - *width (w)*: Set a custom column width, if not set, *unibits* will retrieve it from the terminal or just use 80
55
55
 
56
- ## Output of Different Valid Encodings
56
+ ## Examples of Valid Encodings
57
57
  ### UTF-8
58
58
 
59
59
  CLI: `$ unibits -e utf-8 -c utf-8 "🌫 Idiosyncrätic ℜսᖯʏ"`
@@ -70,22 +70,6 @@ Ruby: `unibits "🌫 Idiosyncrätic ℜսᖯʏ", encoding: 'utf-8', convert:
70
70
 
71
71
  ![Screenshot UTF-16LE](/screenshots/utf-16le.png?raw=true "UTF-16LE")
72
72
 
73
- ### UTF-16BE
74
-
75
- CLI: `$ unibits -e utf-8 -c utf-16be "🌫 Idiosyncrätic ℜսᖯʏ"`
76
-
77
- Ruby: `unibits "🌫 Idiosyncrätic ℜսᖯʏ", encoding: 'utf-8', convert: 'utf-16be'`
78
-
79
- ![Screenshot UTF-16BE](/screenshots/utf-16be.png?raw=true "UTF-16BE")
80
-
81
- ### UTF-32LE
82
-
83
- CLI: `$ unibits -e utf-8 -c utf-32le "🌫 Idiosyncrätic ℜսᖯʏ"`
84
-
85
- Ruby: `unibits "🌫 Idiosyncrätic ℜսᖯʏ", encoding: 'utf-8', convert: 'utf-32le'`
86
-
87
- ![Screenshot UTF-32LE](/screenshots/utf-32le.png?raw=true "UTF-32LE")
88
-
89
73
  ### UTF-32BE
90
74
 
91
75
  CLI: `$ unibits -e utf-8 -c utf-32be "🌫 Idiosyncrätic ℜսᖯʏ"`
@@ -106,11 +90,11 @@ Ruby: `unibits "🌫 Idiosyncrätic ℜսᖯʏ", encoding: 'binary'`
106
90
 
107
91
  CLI: `$ unibits -e utf-8 -c ascii "ascii"`
108
92
 
109
- Ruby: `unibits "ASCII String", encoding: 'utf-8', convert: 'ascii'`
93
+ Ruby: `unibits "ascii", encoding: 'utf-8', convert: 'ascii'`
110
94
 
111
95
  ![Screenshot ASCII](/screenshots/ascii.png?raw=true "ASCII")
112
96
 
113
- ## Invalid Encodings
97
+ ## Examples of Invalid Encodings
114
98
  ### UTF-8
115
99
 
116
100
  Example in Ruby: `unibits "unexpected \x80 | not enough \xF0\x9F\x8C | overlong \xE0\x81\x81 | surrogate \xED\xA0\x80 | too large \xF5\x8F\xBF\xBF"`
@@ -123,10 +107,6 @@ Example in Ruby: `unibits "🌫 Idiosyncrätic ℜսᖯʏ", encoding: 'ascii'
123
107
 
124
108
  ![Screenshot invalid ASCII](/screenshots/ascii.invalid.png?raw=true "Invalid ASCII")
125
109
 
126
- ### BINARY
127
-
128
- Not possible to produce invalid binary strings
129
-
130
110
  ## Notes
131
111
 
132
112
  Also see
data/lib/unibits.rb CHANGED
@@ -79,6 +79,7 @@ module Unibits
79
79
  puts
80
80
  string.each_char{ |char|
81
81
  char_info = Characteristics.create_for_type(char, type)
82
+ double_check_utf32_validness!(char, char_info)
82
83
  current_color = determine_char_color(char_info)
83
84
 
84
85
  current_encoding_error = nil if char_info.valid?
@@ -184,10 +185,13 @@ module Unibits
184
185
  codepoint = "invalid"
185
186
  end
186
187
  when 'UTF-32LE', 'UTF-32BE'
187
- if char.bytesize != "4"
188
+ if char.bytesize % 4 != 0
188
189
  codepoint = "incompl."
190
+ elsif char.b.unpack("C*")[encoding_name == 'UTF-32LE' ? 2 : 1] > 16 ||
191
+ char.b.unpack("C*")[encoding_name == 'UTF-32LE' ? 3 : 0] > 0
192
+ codepoint = "toolarge"
189
193
  else
190
- codepoint = "invalid"
194
+ codepoint = "sur.gate"
191
195
  end
192
196
  end
193
197
  end
@@ -305,4 +309,15 @@ module Unibits
305
309
  res << Paint[ bin_byte_2, current_color, :underline ] unless !bin_byte_2 || bin_byte_2.empty?
306
310
  res
307
311
  end
312
+
313
+ def self.double_check_utf32_validness!(char, char_info)
314
+ return if RUBY_VERSION > "2.4.0" || char_info.encoding.name[0, 6] != "UTF-32" || !char_info.valid?
315
+ byte_values = char.b.unpack("C*")
316
+ le = char_info.encoding.name == 'UTF-32LE'
317
+ if byte_values[le ? 2 : 1] > 16 ||
318
+ byte_values[le ? 3 : 0] > 0 ||
319
+ byte_values[le ? 1 : 2] >= 216 && byte_values[le ? 1 : 2] <= 223
320
+ char_info.instance_variable_set(:@is_valid, false)
321
+ end
322
+ end
308
323
  end
@@ -1,3 +1,3 @@
1
1
  module Unibits
2
- VERSION = "2.1.0".freeze
2
+ VERSION = "2.1.1".freeze
3
3
  end
data/spec/unibits_spec.rb CHANGED
@@ -259,21 +259,26 @@ describe Unibits do
259
259
  result.must_match "�"
260
260
  end
261
261
 
262
- # TODO implement when https://bugs.ruby-lang.org/issues/13292 is released
263
-
264
- # it "- too large codepoint" do
265
- # string = "\x00\x00\x11\x00".force_encoding("UTF-32LE")
266
- # result = Paint.unpaint(Unibits.visualize(string))
267
- # result.must_match "�"
268
- # result.must_match /toolarge.*toolarge.*toolarge.*toolarge/m
269
- # end
270
-
271
- # it "- has surrogate" do
272
- # string = "\x00\xD8\x00\x00".force_encoding("UTF-32LE")
273
- # result = Paint.unpaint(Unibits.visualize(string))
274
- # result.must_match "�"
275
- # result.must_match "sur.gate"
276
- # end
262
+ it "- too large codepoint (1/2)" do
263
+ string = "\x00\x00\x11\x00".force_encoding("UTF-32LE")
264
+ result = Paint.unpaint(Unibits.visualize(string))
265
+ result.must_match ""
266
+ result.must_match "toolarge"
267
+ end
268
+
269
+ it "- too large codepoint (2/2)" do
270
+ string = "\x00\x00\x00\x01".force_encoding("UTF-32LE")
271
+ result = Paint.unpaint(Unibits.visualize(string))
272
+ result.must_match ""
273
+ result.must_match "toolarge"
274
+ end
275
+
276
+ it "- has surrogate" do
277
+ string = "\x00\xD8\x00\x00".force_encoding("UTF-32LE")
278
+ result = Paint.unpaint(Unibits.visualize(string))
279
+ result.must_match "�"
280
+ result.must_match "sur.gate"
281
+ end
277
282
  end
278
283
 
279
284
  describe "invalid ASCII encodings" do
data/unibits.gemspec CHANGED
@@ -5,8 +5,8 @@ require File.dirname(__FILE__) + "/lib/unibits/version"
5
5
  Gem::Specification.new do |gem|
6
6
  gem.name = "unibits"
7
7
  gem.version = Unibits::VERSION
8
- gem.summary = "Visualizes Unicode encodings."
9
- gem.description = "Visualizes Unicode encodings in the terminal. Supports UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE, US-ASCII, and ASCII-8BIT. Comes as CLI command and as Ruby Kernel method."
8
+ gem.summary = "Visualizes encodings."
9
+ gem.description = "Visualizes encodings in the terminal. Supports UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE, US-ASCII, ASCII-8BIT, and most of Rubies single-byte encodings. Comes as CLI command and as Ruby Kernel method."
10
10
  gem.authors = ["Jan Lelis"]
11
11
  gem.email = ["mail@janlelis.de"]
12
12
  gem.homepage = "https://github.com/janlelis/unibits"
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: unibits
3
3
  version: !ruby/object:Gem::Version
4
- version: 2.1.0
4
+ version: 2.1.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jan Lelis
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2017-03-14 00:00:00.000000000 Z
11
+ date: 2017-03-19 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: paint
@@ -72,9 +72,9 @@ dependencies:
72
72
  - - "~>"
73
73
  - !ruby/object:Gem::Version
74
74
  version: '2.0'
75
- description: Visualizes Unicode encodings in the terminal. Supports UTF-8, UTF-16LE,
76
- UTF-16BE, UTF-32LE, UTF-32BE, US-ASCII, and ASCII-8BIT. Comes as CLI command and
77
- as Ruby Kernel method.
75
+ description: Visualizes encodings in the terminal. Supports UTF-8, UTF-16LE, UTF-16BE,
76
+ UTF-32LE, UTF-32BE, US-ASCII, ASCII-8BIT, and most of Rubies single-byte encodings.
77
+ Comes as CLI command and as Ruby Kernel method.
78
78
  email:
79
79
  - mail@janlelis.de
80
80
  executables:
@@ -121,6 +121,6 @@ rubyforge_project:
121
121
  rubygems_version: 2.6.8
122
122
  signing_key:
123
123
  specification_version: 4
124
- summary: Visualizes Unicode encodings.
124
+ summary: Visualizes encodings.
125
125
  test_files:
126
126
  - spec/unibits_spec.rb