unicoder 0.1.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: f79eb48ad06b13b61fc4ceb7fc5e176ee4e9e984
4
+ data.tar.gz: 94a62eb108e01e1d7da774b58352ab0585235bc7
5
+ SHA512:
6
+ metadata.gz: 01714742c72568ab92a9c3df0b700f3918e32482b7f658da8f099e2cfb54359e098e90fa1caa72a343cbdf2ede36081a9c01a6d65ee76cee841e65b87c9083ad
7
+ data.tar.gz: dd5b55100962d9408a503b338ebf25062c3dee7dc1ff9ceaccd97e30d57f97d131191ef96ce266e267cc729d70e6a0860702f22fbb1c9a6e4b512547ff1b5805
data/.gitignore ADDED
@@ -0,0 +1,3 @@
1
+ Gemfile.lock
2
+ /pkg
3
+ /data
data/.travis.yml ADDED
@@ -0,0 +1,20 @@
1
+ sudo: false
2
+ language: ruby
3
+
4
+ script: bundle exec ruby spec/unicoder_spec.rb
5
+
6
+ rvm:
7
+ - 2.3.0
8
+ - 2.2
9
+ - 2.1
10
+ - 2.0
11
+ - ruby-head
12
+ - rbx-2
13
+ - jruby-head
14
+ - jruby-9000
15
+
16
+ cache:
17
+ - bundler
18
+
19
+ # matrix:
20
+ # fast_finish: true
data/CHANGELOG.md ADDED
@@ -0,0 +1,5 @@
1
+ ## CHANGELOG
2
+
3
+ ### 0.1.0
4
+
5
+ * WIP
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at opensource@janlelis.com. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [http://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: http://contributor-covenant.org
74
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,5 @@
1
+ source 'https://rubygems.org'
2
+
3
+ gemspec
4
+
5
+ gem 'minitest'
data/MIT-LICENSE.txt ADDED
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2016 Jan Lelis, mail@janlelis.de
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,15 @@
1
+ # unicoder [![[version]](https://badge.fury.io/rb/unicoder.svg)](http://badge.fury.io/rb/unicoder)
2
+
3
+ WIP
4
+
5
+
6
+ ## Usage
7
+
8
+ ```
9
+ $ unicoder build index_name
10
+ ```
11
+
12
+
13
+ ## MIT License
14
+
15
+ Copyright (C) 2016 Jan Lelis <http://janlelis.com>. Released under the MIT license.
data/Rakefile ADDED
@@ -0,0 +1,35 @@
1
+ # # #
2
+ # Get gemspec info
3
+
4
+ gemspec_file = Dir['*.gemspec'].first
5
+ gemspec = eval File.read(gemspec_file), binding, gemspec_file
6
+ info = "#{gemspec.name} | #{gemspec.version} | " \
7
+ "#{gemspec.runtime_dependencies.size} dependencies | " \
8
+ "#{gemspec.files.size} files"
9
+
10
+
11
+ # # #
12
+ # Gem build and install task
13
+
14
+ desc info
15
+ task :gem do
16
+ puts info + "\n\n"
17
+ print " "; sh "gem build #{gemspec_file}"
18
+ FileUtils.mkdir_p 'pkg'
19
+ FileUtils.mv "#{gemspec.name}-#{gemspec.version}.gem", 'pkg'
20
+ puts; sh %{gem install --no-document pkg/#{gemspec.name}-#{gemspec.version}.gem}
21
+ end
22
+
23
+
24
+ # # #
25
+ # Start an IRB session with the gem loaded
26
+
27
+ desc "#{gemspec.name} | IRB"
28
+ task :irb do
29
+ sh "irb -I ./lib -r #{gemspec.name.gsub '-','/'}"
30
+ end
31
+
32
+ # # #
33
+ # Require self
34
+
35
+ require_relative 'lib/unicoder'
data/bin/unicoder ADDED
@@ -0,0 +1,40 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require_relative "../lib/unicoder"
4
+ require "rationalist"
5
+
6
+ args = Rationalist.parse
7
+ command = args[:_][0]
8
+ identifier = args[:_][1]
9
+ KNOWN_OPTIONS = [:version, :help, :verbose, :format, :gzip]
10
+ options = args.select { |option,| KNOWN_OPTIONS.include? option }
11
+
12
+ if options.has_key?(:version)
13
+ puts "unicoder #{Unicoder::VERSION}"
14
+ elsif options.has_key?(:help)
15
+ puts <<USAGE_INSTRUCTIONS
16
+
17
+ USAGE
18
+
19
+ unicoder fetch <data_identifier>
20
+ unicoder build <builder_name> <output_file>
21
+
22
+ DATA FILE IDENTIFIERS
23
+
24
+
25
+
26
+ BUILDERS
27
+
28
+
29
+
30
+ USAGE_INSTRUCTIONS
31
+ else
32
+ case command
33
+ when "fetch"
34
+ Unicoder::Downloader.fetch(identifier) # fetch accepts none of the KNOWN_OPTIONS as keywords
35
+ when "build"
36
+ Unicoder::Builder.build(identifier, **options)
37
+ else
38
+ raise ArgumentError, "Unknown unicoder command!"
39
+ end
40
+ end
data/lib/unicoder.rb ADDED
@@ -0,0 +1,8 @@
1
+ require_relative "unicoder/constants"
2
+ require_relative "unicoder/downloader"
3
+ require_relative "unicoder/builder"
4
+ require_relative "unicoder/multi_dimensional_array_builder"
5
+
6
+ if defined?(Rake)
7
+ Rake.add_rakelib(File.expand_path('../unicoder', __FILE__))
8
+ end
data/lib/unicoder/builder.rb ADDED
@@ -0,0 +1,76 @@
1
+ require "json"
2
+
3
+ module Unicoder
4
+ # A builder defines a parse! method which translates one (or more) Unicode data
5
+ # files into an index
6
+ module Builder
7
+ attr_reader :index
8
+
9
+ def initialize(unicode_version = nil)
10
+ @unicode_version = unicode_version
11
+ initialize_index
12
+ end
13
+
14
+ def initialize_index
15
+ @index = {}
16
+ end
17
+
18
+ def assign_codepoint(codepoint, value, index = @index)
19
+ index[codepoint] = value
20
+ end
21
+
22
+ def parse!
23
+ raise ArgumentError, "abstract"
24
+ end
25
+
26
+ def parse_file(identifier, parse_mode, **parse_options)
27
+ filename = UNICODE_FILES[identifier.to_sym]
28
+ raise ArgumentError, "No valid file identifier given" if !filename
29
+ filename = filename.sub('VERSION', @unicode_version) # don't mutate the shared constant
30
+ Downloader.fetch(identifier, unicode_version: @unicode_version) unless File.exist?(LOCAL_DATA_DIRECTORY + filename)
31
+ file = File.read(LOCAL_DATA_DIRECTORY + filename)
32
+
33
+ if parse_mode == :line
34
+ file.each_line{ |line|
35
+ yield Hash[ $~.names.zip( $~.captures ) ] if line =~ parse_options[:regex]
36
+ }
37
+ end
38
+ end
39
+
40
+ def export(format: :marshal, **options)
41
+ p index if options[:verbose]
42
+
43
+ case format.to_sym
44
+ when :marshal
45
+ index_file = Marshal.dump(index)
46
+ when :json
47
+ index_file = JSON.dump(index)
48
+ end
49
+
50
+ # Optionally gzip the serialized index
51
+ if options[:gzip]
52
+ Gem.gzip(index_file)
53
+ else
54
+ index_file
55
+ end
56
+ end
57
+
58
+ def self.build(identifier, **options)
59
+ format = options[:format] || :marshal
60
+ require_relative "builders/#{identifier}"
61
+ # require "unicoder/builders/#{identifier}"
62
+ builder_class = self.const_get(identifier.to_s.gsub(/(?:^|_)([a-z])/){ $1.upcase })
63
+ builder = builder_class.new(options[:unicode_version] || CURRENT_UNICODE_VERSION)
64
+ puts "Building index for #{identifier}…"
65
+ builder.parse!
66
+ index_file = builder.export(**options)
67
+
68
+ destination = options[:destination] || identifier.to_s
69
+ destination += ".#{format}"
70
+ destination += ".gz" if options[:gzip]
71
+ bytes = File.write destination, index_file
72
+
73
+ puts "Index created at: #{destination} (#{bytes} bytes written)"
74
+ end
75
+ end
76
+ end
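`Builder.build` turns the identifier into a class name with a `gsub`, and `export` serializes the index with `Marshal` or `JSON`. The following is a minimal sketch of those two steps as standalone functions; `builder_class_name` and `serialize_index` are hypothetical names, not part of the gem, and unlike `export` the sketch raises on an unknown format instead of returning `nil`.

```ruby
require "json"

# Same camel-casing rule Builder.build applies to a builder identifier
# before looking the class up with const_get.
def builder_class_name(identifier)
  identifier.to_s.gsub(/(?:^|_)([a-z])/) { $1.upcase }
end

# Sketch of the two serialization formats Builder#export supports.
def serialize_index(index, format = :marshal)
  case format.to_sym
  when :marshal then Marshal.dump(index)
  when :json    then JSON.dump(index)
  else raise ArgumentError, "Unknown index format: #{format}"
  end
end

index = { 0x41 => "Lu", 0x61 => "Ll" }
p builder_class_name(:display_width)          # prints "DisplayWidth"
p Marshal.load(serialize_index(index))        # Integer keys survive
p JSON.parse(serialize_index(index, "json"))  # keys come back as Strings
```

Design note: `Marshal` keeps the Integer codepoint keys, whereas JSON object keys are always strings, so a consumer loading a `.json` index has to convert keys back (for example with `to_i`).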
data/lib/unicoder/builders/blocks.rb ADDED
@@ -0,0 +1,17 @@
1
+ module Unicoder
2
+ module Builder
3
+ class Blocks
4
+ include Builder
5
+
6
+ def initialize_index
7
+ @index = []
8
+ end
9
+
10
+ def parse!
11
+ parse_file :blocks, :line, regex: /^(?<from>\S+?)\.\.(?<to>\S+);\s(?<name>.+)$/ do |line|
12
+ @index << [line["from"].to_i(16), line["to"].to_i(16), line["name"]]
13
+ end
14
+ end
15
+ end
16
+ end
17
+ end
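The Blocks builder relies on `parse_file` in `:line` mode, which yields a Hash of named captures for every line that matches the regex. Below is a sketch of that loop applied to a few lines in the `Blocks.txt` format, plus a hypothetical `block_for` lookup that binary-searches the resulting sorted `[from, to, name]` triples. The lookup is an assumption about how such an index could be consumed, not code from the gem.

```ruby
# Same pattern the Blocks builder passes to parse_file.
BLOCK_LINE = /^(?<from>\S+?)\.\.(?<to>\S+);\s(?<name>.+)$/

# Mirrors parse_file's :line mode: each matching line becomes a Hash of
# named captures; comment and blank lines simply don't match.
def each_match(text, regex)
  text.each_line do |line|
    yield Hash[$~.names.zip($~.captures)] if line =~ regex
  end
end

def parse_blocks(text)
  blocks = []
  each_match(text, BLOCK_LINE) do |m|
    blocks << [m["from"].to_i(16), m["to"].to_i(16), m["name"]]
  end
  blocks
end

# Hypothetical lookup: the triples are sorted, so bsearch finds the first
# block ending at or after the codepoint, then checks that it starts before it.
def block_for(blocks, codepoint)
  found = blocks.bsearch { |_from, to, _name| to >= codepoint }
  found && found[0] <= codepoint ? found[2] : nil
end

sample = [
  "# Blocks-8.0.0.txt",
  "0000..007F; Basic Latin",
  "0080..00FF; Latin-1 Supplement",
].join("\n")

blocks = parse_blocks(sample)
p block_for(blocks, 0xE9) # prints "Latin-1 Supplement"
```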
data/lib/unicoder/builders/categories.rb ADDED
@@ -0,0 +1,43 @@
1
+ module Unicoder
2
+ module Builder
3
+ # Assigns categories to every codepoint using a multi dimensional Array index structure
4
+ class Categories
5
+ include Builder
6
+ include MultiDimensionalArrayBuilder
7
+
8
+ def initialize_index
9
+ @index = {
10
+ CATEGORIES: [],
11
+ CATEGORY_NAMES: {},
12
+ }
13
+ @range_start = nil
14
+ end
15
+
16
+ def parse!
17
+ parse_file :unicode_data, :line, regex: /^(?<codepoint>.+?);(?<range><(?!control).+>)?.*?;(?<category>.+?);.*$/ do |line|
18
+ if line["range"]
19
+ if line["range"] =~ /First/
20
+ @range_start = line["codepoint"].to_i(16)
21
+ elsif line["range"] =~ /Last/ && @range_start
22
+ (@range_start..line["codepoint"].to_i(16)).each{ |codepoint|
23
+ assign_codepoint(codepoint, line["category"], @index[:CATEGORIES])
24
+ }
25
+ else
26
+ raise ArgumentError, "inconsistent range found in data, don't know what to do"
27
+ end
28
+ else
29
+ assign_codepoint(line["codepoint"].to_i(16), line["category"], @index[:CATEGORIES])
30
+ end
31
+ end
32
+
33
+ 4.times{ compress! @index[:CATEGORIES] }
34
+
35
+ parse_file :property_value_aliases, :line, regex: /^gc ; (?<short>\S{2}?) *; (?<long>\S+).*$/ do |line|
36
+ @index[:CATEGORY_NAMES][line["short"]] = line["long"]
37
+ end
38
+
39
+ @index
40
+ end
41
+ end
42
+ end
43
+ end
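`UnicodeData.txt` does not list every codepoint of large ranges such as the CJK ideograph blocks; it gives a `<..., First>` line and a `<..., Last>` line instead, which is why the Categories builder tracks `@range_start`. A simplified sketch of that expansion, assuming a flat Hash instead of the multi-dimensional array index and reusing the builder's regex; `parse_categories` is a hypothetical name and the sample lines are written in the `UnicodeData.txt` format.

```ruby
# Same pattern the Categories builder uses for UnicodeData.txt.
CATEGORY_LINE = /^(?<codepoint>.+?);(?<range><(?!control).+>)?.*?;(?<category>.+?);.*$/

# Expands First/Last range pairs into one entry per codepoint.
def parse_categories(text)
  categories = {}
  range_start = nil
  text.each_line do |line|
    m = CATEGORY_LINE.match(line) or next
    codepoint = m[:codepoint].to_i(16)
    if m[:range].nil?
      categories[codepoint] = m[:category]
    elsif m[:range].include?("First")
      range_start = codepoint
    elsif m[:range].include?("Last") && range_start
      (range_start..codepoint).each { |cp| categories[cp] = m[:category] }
      range_start = nil
    else
      raise ArgumentError, "inconsistent range found in data"
    end
  end
  categories
end

sample = [
  "0000;<control>;Cc;0;BN;;;;;N;NULL;;;;",
  "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
  "3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;",
  "4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;",
].join("\n")

categories = parse_categories(sample)
p categories[0x4000] # prints "Lo"
```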
data/lib/unicoder/builders/confusable.rb ADDED
@@ -0,0 +1,21 @@
1
+ module Unicoder
2
+ module Builder
3
+ class Confusable
4
+ include Builder
5
+
6
+ def parse!
7
+ parse_file :confusables, :line, regex: /^(?<from>\S+)\s+;\s+(?<to>.+?)\s+;.*$/ do |line|
8
+ source = line["from"].to_i(16)
9
+ if line["to"].include?(" ")
10
+ replace_with = line["to"].split(" ").map{ |codepoint|
11
+ codepoint.to_i(16)
12
+ }
13
+ else
14
+ replace_with = line["to"].to_i(16)
15
+ end
16
+ @index[source] = replace_with
17
+ end
18
+ end
19
+ end
20
+ end
21
+ end
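Each `confusables.txt` entry maps one source codepoint to one or more target codepoints. The sketch below (hypothetical `parse_confusables`, illustrative sample lines written in the file's layout) uses a lazy `(?<to>.+?)` so the capture ends at the first ` ;` separator; with a greedy `.+`, a line whose trailing comment itself contains ` ;`, like the GREEK QUESTION MARK entry, would capture part of that comment.

```ruby
CONFUSABLE_LINE = /^(?<from>\S+)\s+;\s+(?<to>.+?)\s+;.*$/

def parse_confusables(text)
  text.each_line.with_object({}) do |line, index|
    m = CONFUSABLE_LINE.match(line) or next
    targets = m[:to].split(" ").map { |codepoint| codepoint.to_i(16) }
    # Single targets are stored as an Integer, sequences as an Array,
    # the same shape the Confusable builder produces.
    index[m[:from].to_i(16)] = targets.size == 1 ? targets.first : targets
  end
end

sample = [
  "# confusables.txt",
  "037E ;\t003B ;\tMA\t# ( ; → ; ) GREEK QUESTION MARK → SEMICOLON",
  "2474 ;\t0028 0031 0029 ;\tMA\t# PARENTHESIZED DIGIT ONE",
].join("\n")

confusables = parse_confusables(sample)
p confusables[0x037E] # prints 59
```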
data/lib/unicoder/builders/display_width.rb ADDED
@@ -0,0 +1,71 @@
1
+ module Unicoder
2
+ module Builder
3
+ class DisplayWidth
4
+ include Builder
5
+ include MultiDimensionalArrayBuilder
6
+
7
+ IGNORE_CATEGORIES = %w[Cs Co Cn].freeze
8
+ ZERO_WIDTH_CATEGORIES = %w[Mn Me Cf].freeze
9
+ ZERO_WIDTH_CODEPOINTS = [*0x1160..0x11FF].freeze
10
+ SPECIAL_WIDTHS = {
11
+ 0x0 => 0, # \0 NULL
12
+ 0x5 => 0, # ENQUIRY
13
+ 0x7 => 0, # \a BELL
14
+ 0x8 => -1, # \b BACKSPACE
15
+ 0xA => 0, # \n LINE FEED
16
+ 0xB => 0, # \v LINE TABULATION
17
+ 0xC => 0, # \f FORM FEED
18
+ 0xD => 0, # \r CARRIAGE RETURN
19
+ 0xE => 0, # SHIFT OUT
20
+ 0xF => 0, # SHIFT IN
21
+ 0x00AD => 1, # SOFT HYPHEN
22
+ 0x2E3A => 2, # TWO-EM DASH
23
+ 0x2E3B => 3, # THREE-EM DASH
24
+ }.freeze
25
+
26
+ def initialize_index
27
+ @index = []
28
+ end
29
+
30
+ def parse!
31
+ parse_file :east_asian_width, :line, regex: /^(?<codepoints>\S+?);(?<width>\S+)\s+#\s(?<category>\S+).*$/ do |line|
32
+ next if IGNORE_CATEGORIES.include?(line["category"])
33
+
34
+ if line["codepoints"]['..']
35
+ codepoints = Range.new(*line["codepoints"].split('..').map{ |codepoint|
36
+ codepoint.to_i(16)
37
+ })
38
+ else
39
+ codepoints = [line["codepoints"].to_i(16)]
40
+ end
41
+
42
+ codepoints.each{ |codepoint|
43
+ assign_codepoint codepoint, determine_width(codepoint, line["category"], line["width"])
44
+ }
45
+ end
46
+
47
+ SPECIAL_WIDTHS.each{ |codepoint, value|
48
+ assign_codepoint codepoint, value
49
+ }
50
+
51
+ 4.times{ compress! }
52
+
53
+ @index
54
+ end
55
+
56
+ def determine_width(codepoint, category, east_asian_width)
57
+ if ( ZERO_WIDTH_CATEGORIES.include?(category) &&
58
+ [codepoint].pack('U') !~ /\p{Cf}(?<=\p{Arabic})/ ) ||
59
+ ZERO_WIDTH_CODEPOINTS.include?(codepoint)
60
+ 0
61
+ elsif east_asian_width == "F" || east_asian_width == "W"
62
+ 2
63
+ elsif east_asian_width == "A"
64
+ :A
65
+ else
66
+ nil
67
+ end
68
+ end
69
+ end
70
+ end
71
+ end
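The DisplayWidth builder derives a width from the category and East_Asian_Width fields of each `EastAsianWidth.txt` line, then applies `SPECIAL_WIDTHS` on top. Here is a simplified sketch of that precedence. It assumes a flat Hash index, keeps only two of the special widths, and omits the builder's extra check for Arabic format characters; `width_for` and `parse_widths` are hypothetical names, and the sample lines follow the file's format.

```ruby
EAW_LINE = /^(?<codepoints>\S+?);(?<width>\S+)\s+#\s(?<category>\S+).*$/
ZERO_WIDTH_CATEGORIES = %w[Mn Me Cf].freeze
ZERO_WIDTH_CODEPOINTS = 0x1160..0x11FF
SPECIAL_WIDTHS = { 0x0 => 0, 0x8 => -1 }.freeze # small subset of the builder's table

# Zero-width checks win over East_Asian_Width; returns nil when no rule applies.
def width_for(codepoint, category, east_asian_width)
  if ZERO_WIDTH_CATEGORIES.include?(category) || ZERO_WIDTH_CODEPOINTS.cover?(codepoint)
    0
  elsif east_asian_width == "F" || east_asian_width == "W"
    2
  elsif east_asian_width == "A"
    :A
  end
end

def parse_widths(text)
  widths = {}
  text.each_line do |line|
    m = EAW_LINE.match(line) or next
    from, to = m[:codepoints].split("..").map { |cp| cp.to_i(16) }
    (from..(to || from)).each do |cp|
      widths[cp] = width_for(cp, m[:category], m[:width])
    end
  end
  widths.merge(SPECIAL_WIDTHS) # special widths override parsed values
end

sample = [
  "# EastAsianWidth-8.0.0.txt",
  "0000..001F;N     # Cc    [32] <control-0000>..<control-001F>",
  "0020;Na          # Zs         SPACE",
  "00A1;A           # Po         INVERTED EXCLAMATION MARK",
  "0300..036F;A     # Mn   [112] COMBINING GRAVE ACCENT..COMBINING LATIN SMALL LETTER X",
  "1100..115F;W     # Lo    [96] HANGUL CHOSEONG KIYEOK..HANGUL CHOSEONG FILLER",
].join("\n")

widths = parse_widths(sample)
p widths[0x1100] # prints 2
```

`nil` and `:A` are stored as-is, presumably so a consumer can apply its own default width and its own policy for ambiguous characters.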
data/lib/unicoder/builders/scripts.rb ADDED
@@ -0,0 +1,59 @@
1
+ module Unicoder
2
+ module Builder
3
+ class Scripts
4
+ include Builder
5
+ include MultiDimensionalArrayBuilder
6
+
7
+ def initialize_index
8
+ @index = {
9
+ SCRIPTS: [],
10
+ SCRIPT_EXTENSIONS: {},
11
+ SCRIPT_ALIASES: {},
12
+ SCRIPT_NAMES: [],
13
+ }
14
+ @reverse_script_names = {}
15
+ @reverse_script_extension_names = {}
16
+ end
17
+
18
+ def lookup_extension_names(extension_scripts_string)
19
+ extension_scripts_string.split(" ").map{ |extension_script|
20
+ @reverse_script_extension_names[extension_script]
21
+ }
22
+ end
23
+
24
+ def parse!
25
+ parse_file :property_value_aliases, :line, regex: /^sc ; (?<short>\S+?)\s*; (?<long>\S+?)(?:\s*; (?<short2>\S+))?$/ do |line|
26
+ @index[:SCRIPT_NAMES] << line["long"]
27
+ script_number = @reverse_script_names.size
28
+ @reverse_script_names[line["long"]] = script_number
29
+
30
+ @index[:SCRIPT_ALIASES][line["short" ]] = script_number
31
+ @index[:SCRIPT_ALIASES][line["short2"]] = script_number if line["short2"]
32
+ @reverse_script_extension_names[line["short"]] = script_number
33
+ end
34
+
35
+ parse_file :scripts, :line, regex: /^(?<from>\S+?)(\.\.(?<to>\S+))?\s+; (?<script>\S+) #.*$/ do |line|
36
+ if line["to"]
37
+ (line["from"].to_i(16)..line["to"].to_i(16)).each{ |codepoint|
38
+ assign_codepoint codepoint, @reverse_script_names[line["script"]], @index[:SCRIPTS]
39
+ }
40
+ else
41
+ assign_codepoint line["from"].to_i(16), @reverse_script_names[line["script"]], @index[:SCRIPTS]
42
+ end
43
+ end
44
+
45
+ 4.times{ compress! @index[:SCRIPTS] }
46
+
47
+ parse_file :script_extensions, :line, regex: /^(?<from>\S+?)(\.\.(?<to>\S+))?\s+; (?<scripts>.+?) #.*$/ do |line|
48
+ if line["to"]
49
+ (line["from"].to_i(16)..line["to"].to_i(16)).each{ |codepoint|
50
+ @index[:SCRIPT_EXTENSIONS][codepoint] = lookup_extension_names(line["scripts"])
51
+ }
52
+ else
53
+ @index[:SCRIPT_EXTENSIONS][line["from"].to_i(16)] = lookup_extension_names(line["scripts"])
54
+ end
55
+ end
56
+ end
57
+ end
58
+ end
59
+ end
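The Scripts builder numbers scripts in the order `PropertyValueAliases.txt` lists them and stores only those numbers per codepoint; Script_Extensions entries list short script names and are translated with the same mapping. A sketch of both steps with flat Hashes and a few sample lines in the files' formats (`parse_script_aliases` and `parse_extensions` are hypothetical names; unlike the builder, the sketch also accepts the secondary short alias when translating extensions):

```ruby
ALIAS_LINE = /^sc ; (?<short>\S+?)\s*; (?<long>\S+?)(?:\s*; (?<short2>\S+))?$/
EXTENSION_LINE = /^(?<from>\S+?)(\.\.(?<to>\S+))?\s+; (?<scripts>.+?) #.*$/

# names[number] is the long script name; aliases maps short names to numbers.
def parse_script_aliases(text)
  names = []
  aliases = {}
  text.each_line do |line|
    m = ALIAS_LINE.match(line) or next
    number = names.size
    names << m[:long]
    aliases[m[:short]] = number
    aliases[m[:short2]] = number if m[:short2]
  end
  [names, aliases]
end

# Translates each codepoint's list of short script names into script numbers.
def parse_extensions(text, aliases)
  extensions = {}
  text.each_line do |line|
    m = EXTENSION_LINE.match(line) or next
    from = m[:from].to_i(16)
    to = m[:to] ? m[:to].to_i(16) : from
    numbers = m[:scripts].split(" ").map { |short| aliases[short] }
    (from..to).each { |cp| extensions[cp] = numbers }
  end
  extensions
end

alias_sample = [
  "sc ; Arab ; Arabic",
  "sc ; Latn ; Latin",
  "sc ; Syrc ; Syriac",
  "sc ; Zinh ; Inherited ; Qaai",
].join("\n")

extension_sample = [
  "# ScriptExtensions-8.0.0.txt",
  "064B..0655    ; Arab Syrc # Mn  [11] ARABIC FATHATAN..ARABIC HAMZA BELOW",
  "0670          ; Arab Syrc # Mn       ARABIC LETTER SUPERSCRIPT ALEF",
].join("\n")

names, aliases = parse_script_aliases(alias_sample)
extensions = parse_extensions(extension_sample, aliases)
p extensions[0x0670].map { |n| names[n] } # prints ["Arabic", "Syriac"]
```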
data/lib/unicoder/constants.rb ADDED
@@ -0,0 +1,29 @@
1
+ module Unicoder
2
+ VERSION = "0.1.0".freeze
3
+
4
+ CURRENT_UNICODE_VERSION = "8.0.0".freeze
5
+
6
+ UNICODE_VERSIONS = %w[
7
+ 6.3.0
8
+ 7.0.0
9
+ 8.0.0
10
+ 9.0.0
11
+ ].freeze
12
+
13
+ UNICODE_DATA_ENDPOINT = "ftp://ftp.unicode.org/Public".freeze
14
+
15
+ LOCAL_DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../data/unicode").freeze
16
+
17
+ UNICODE_FILES = {
18
+ east_asian_width: "/VERSION/ucd/EastAsianWidth.txt",
19
+ unicode_data: "/VERSION/ucd/UnicodeData.txt",
20
+ name_aliases: "/VERSION/ucd/NameAliases.txt",
21
+ confusables: "/security/VERSION/confusables.txt",
22
+ blocks: "/VERSION/ucd/Blocks.txt",
23
+ scripts: "/VERSION/ucd/Scripts.txt",
24
+ script_extensions: "/VERSION/ucd/ScriptExtensions.txt",
25
+ property_value_aliases: "/VERSION/ucd/PropertyValueAliases.txt",
26
+ general_categories: "/VERSION/ucd/extracted/DerivedGeneralCategory.txt",
27
+ }
28
+ end
29
+
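Data files are located by substituting a Unicode version for the `VERSION` placeholder of a `UNICODE_FILES` path and prefixing `UNICODE_DATA_ENDPOINT` (remote) or `LOCAL_DATA_DIRECTORY` (local). A minimal sketch of that resolution, using a hypothetical `resolve` helper and two of the templates above: it relies on the non-destructive `String#sub` on frozen templates, so the same template can be resolved again for another version. An in-place `sub!` on a shared constant would remove the placeholder after the first call.

```ruby
# Values copied from UNICODE_DATA_ENDPOINT and two UNICODE_FILES entries above.
ENDPOINT = "ftp://ftp.unicode.org/Public".freeze
FILE_TEMPLATES = {
  blocks:      "/VERSION/ucd/Blocks.txt".freeze,
  confusables: "/security/VERSION/confusables.txt".freeze,
}.freeze

# Hypothetical helper: returns [remote URL, local path] for an identifier.
def resolve(identifier, unicode_version, local_directory)
  template = FILE_TEMPLATES.fetch(identifier.to_sym)
  path = template.sub("VERSION", unicode_version) # new String, template untouched
  [ENDPOINT + path, File.join(local_directory, path)]
end

url, local = resolve(:blocks, "8.0.0", "data/unicode")
puts url   # prints ftp://ftp.unicode.org/Public/8.0.0/ucd/Blocks.txt
puts local # prints data/unicode/8.0.0/ucd/Blocks.txt
```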
data/lib/unicoder/downloader.rb ADDED
@@ -0,0 +1,28 @@
1
+ require "open-uri"
2
+ require "fileutils"
3
+
4
+ module Unicoder
5
+ module Downloader
6
+ def self.fetch(identifier,
7
+ unicode_version: CURRENT_UNICODE_VERSION,
8
+ destination_directory: LOCAL_DATA_DIRECTORY,
9
+ destination: nil,
10
+ filename: nil
11
+ )
12
+ filename = UNICODE_FILES[identifier.to_sym] || filename
13
+ raise ArgumentError, "No valid file identifier or filename given" if !filename
14
+ filename = filename.sub('VERSION', unicode_version)
15
+ source = UNICODE_DATA_ENDPOINT + filename
16
+ destination ||= destination_directory + filename
17
+
18
+ open(source){ |f|
19
+ FileUtils.mkdir_p(File.dirname(destination))
20
+ File.write(destination, f.read)
21
+ }
22
+
23
+ puts "GET #{source} => #{destination}"
24
+ rescue => e
25
+ $stderr.puts "#{e.class}: #{e.message}"
26
+ end
27
+ end
28
+ end
data/lib/unicoder/multi_dimensional_array_builder.rb ADDED
@@ -0,0 +1,64 @@
1
+ require "json"
2
+
3
+ module Unicoder
4
+ # Include after Builder
5
+ module MultiDimensionalArrayBuilder
6
+ def initialize_index
7
+ @index = []
8
+ end
9
+
10
+ def assign_codepoint(codepoint, value, index = @index)
11
+ plane = codepoint / 0x10000
12
+ plane_offset = codepoint % 0x10000
13
+ row = plane_offset / 0x1000
14
+ row_offset = plane_offset % 0x1000
15
+ byte = row_offset / 0x100
16
+ byte_offset = row_offset % 0x100
17
+ nibble = byte_offset / 0x10
18
+ nibble_offset = byte_offset % 0x10
19
+
20
+ index[plane] ||= []
21
+ index[plane][row] ||= []
22
+ index[plane][row][byte] ||= []
23
+ index[plane][row][byte][nibble] ||= []
24
+ index[plane][row][byte][nibble][nibble_offset] = value
25
+ end
26
+
27
+ def compress!(index = @index)
28
+ index.map!{ |plane|
29
+ if !plane.is_a?(Array)
30
+ plane
31
+ elsif plane.flatten.uniq.size == 1
32
+ plane[0]
33
+ else
34
+ plane.map!{ |row|
35
+ if !row.is_a?(Array)
36
+ row
37
+ elsif row.flatten.uniq.size == 1
38
+ row[0]
39
+ else
40
+ row.map!{ |byte|
41
+ if !byte.is_a?(Array)
42
+ byte
43
+ elsif byte.uniq.size == 1
44
+ byte[0]
45
+ else
46
+ byte.map! { |nibble|
47
+ if !nibble.is_a?(Array)
48
+ nibble
49
+ elsif nibble.uniq.size == 1
50
+ nibble[0]
51
+ else
52
+ nibble
53
+ end
54
+ }
55
+ end
56
+ }
57
+ end
58
+ }
59
+ end
60
+ }
61
+ end
62
+
63
+ end
64
+ end
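`assign_codepoint` splits a codepoint into plane, row, byte, nibble, and offset indexes, and `compress!` replaces uniform subtrees with their single value, so a consumer walks the levels until it reaches a non-Array node. The following sketch shows assignment, compression, and lookup together. `path_for`, `assign`, `compress`, `compress_index`, and `lookup` are hypothetical helpers, and this `compress` is a recursive variant rather than the gem's repeated `4.times { compress! }` passes.

```ruby
# Same plane/row/byte/nibble split as assign_codepoint (16 slots per level
# below the plane).
def path_for(codepoint)
  [
    codepoint / 0x10000,            # plane
    (codepoint % 0x10000) / 0x1000, # row
    (codepoint % 0x1000) / 0x100,   # byte
    (codepoint % 0x100) / 0x10,     # nibble
    codepoint % 0x10,               # offset inside the nibble
  ]
end

def assign(index, codepoint, value)
  *branch, leaf = path_for(codepoint)
  node = branch.reduce(index) { |parent, i| parent[i] ||= [] }
  node[leaf] = value
end

# Collapses a subtree only when all 16 slots hold the same non-Array value;
# stricter than the flatten/uniq check above, which also collapses
# partially filled subtrees.
def compress(node)
  return node unless node.is_a?(Array)
  children = node.map { |child| compress(child) }
  uniform = children.size == 16 && children.none? { |c| c.is_a?(Array) } && children.uniq.size == 1
  uniform ? children.first : children
end

def compress_index(index)
  index.map { |plane| compress(plane) }
end

# A non-Array node covers its whole subtree; unassigned slots read as nil.
def lookup(index, codepoint)
  node = index
  path_for(codepoint).each do |i|
    return node unless node.is_a?(Array)
    node = node[i]
  end
  node
end

index = []
(0x0000..0x0FFF).each { |cp| assign(index, cp, 1) }
(0x1100..0x115F).each { |cp| assign(index, cp, 2) }
compact = compress_index(index)
p lookup(compact, 0x41)   # prints 1
p lookup(compact, 0x1160) # prints nil
```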
data/lib/unicoder/tasks.rake ADDED
@@ -0,0 +1,11 @@
1
+ namespace :unicoder do
2
+ desc "Fetch a Unicode data file by identifier"
3
+ task :fetch, [:identifier] do |t, args|
4
+ Unicoder::Downloader.fetch(args.identifier)
5
+ end
6
+
7
+ desc "Build an index with the given builder"
8
+ task :index, [:identifier] do |t, args|
9
+ Unicoder::Builder.build(args.identifier)
10
+ end
11
+ end
data/spec/unicoder_spec.rb ADDED
@@ -0,0 +1,9 @@
1
+ require_relative "../lib/unicoder"
2
+ require "minitest/autorun"
3
+
4
+ describe Unicoder do
5
+ it "works" do
6
+ skip "WIP: no specs yet"
7
+ end
8
+ end
9
+
data/unicoder.gemspec ADDED
@@ -0,0 +1,22 @@
1
+ # -*- encoding: utf-8 -*-
2
+
3
+ require File.dirname(__FILE__) + "/lib/unicoder/constants"
4
+
5
+ Gem::Specification.new do |gem|
6
+ gem.name = "unicoder"
7
+ gem.version = Unicoder::VERSION
8
+ gem.summary = "Create specialized indexes for Unicode data lookup"
9
+ gem.description = "Generate specialized indexes for Unicode data lookup"
10
+ gem.authors = ["Jan Lelis"]
11
+ gem.email = ["mail@janlelis.de"]
12
+ gem.homepage = "https://github.com/janlelis/unicoder"
13
+ gem.license = "MIT"
14
+
15
+ gem.files = Dir["{**/}{.*,*}"].select{ |path| File.file?(path) && path !~ /^pkg/ }
16
+ gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
17
+ gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
18
+ gem.require_paths = ["lib"]
19
+
20
+ gem.required_ruby_version = "~> 2.0"
21
+ gem.add_dependency "rationalist", "~> 2.0"
22
+ end
metadata ADDED
@@ -0,0 +1,93 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: unicoder
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Jan Lelis
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2016-04-13 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: rationalist
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '2.0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '2.0'
27
+ description: Generate specialized indexes for Unicode data lookup
28
+ email:
29
+ - mail@janlelis.de
30
+ executables:
31
+ - unicoder
32
+ extensions: []
33
+ extra_rdoc_files: []
34
+ files:
35
+ - ".gitignore"
36
+ - ".travis.yml"
37
+ - CHANGELOG.md
38
+ - CODE_OF_CONDUCT.md
39
+ - Gemfile
40
+ - MIT-LICENSE.txt
41
+ - README.md
42
+ - Rakefile
43
+ - bin/unicoder
44
+ - data/.keep
45
+ - data/unicode/8.0.0/ucd/Blocks.txt
46
+ - data/unicode/8.0.0/ucd/EastAsianWidth.txt
47
+ - data/unicode/8.0.0/ucd/NameAliases.txt
48
+ - data/unicode/8.0.0/ucd/PropertyValueAliases.txt
49
+ - data/unicode/8.0.0/ucd/ScriptExtensions.txt
50
+ - data/unicode/8.0.0/ucd/Scripts.txt
51
+ - data/unicode/8.0.0/ucd/UnicodeData.txt
52
+ - data/unicode/8.0.0/ucd/extracted/DerivedGeneralCategory.txt
53
+ - data/unicode/security/8.0.0/confusables.txt
54
+ - lib/unicoder.rb
55
+ - lib/unicoder/builder.rb
56
+ - lib/unicoder/builders/blocks.rb
57
+ - lib/unicoder/builders/categories.rb
58
+ - lib/unicoder/builders/confusable.rb
59
+ - lib/unicoder/builders/display_width.rb
60
+ - lib/unicoder/builders/scripts.rb
61
+ - lib/unicoder/constants.rb
62
+ - lib/unicoder/downloader.rb
63
+ - lib/unicoder/multi_dimensional_array_builder.rb
64
+ - lib/unicoder/tasks.rake
65
+ - spec/unicoder_spec.rb
66
+ - unicoder.gemspec
67
+ homepage: https://github.com/janlelis/unicoder
68
+ licenses:
69
+ - MIT
70
+ metadata: {}
71
+ post_install_message:
72
+ rdoc_options: []
73
+ require_paths:
74
+ - lib
75
+ required_ruby_version: !ruby/object:Gem::Requirement
76
+ requirements:
77
+ - - "~>"
78
+ - !ruby/object:Gem::Version
79
+ version: '2.0'
80
+ required_rubygems_version: !ruby/object:Gem::Requirement
81
+ requirements:
82
+ - - ">="
83
+ - !ruby/object:Gem::Version
84
+ version: '0'
85
+ requirements: []
86
+ rubyforge_project:
87
+ rubygems_version: 2.6.3
88
+ signing_key:
89
+ specification_version: 4
90
+ summary: Create specialized indexes for Unicode data lookup
91
+ test_files:
92
+ - spec/unicoder_spec.rb
93
+ has_rdoc: