unicoder 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: f79eb48ad06b13b61fc4ceb7fc5e176ee4e9e984
4
+ data.tar.gz: 94a62eb108e01e1d7da774b58352ab0585235bc7
5
+ SHA512:
6
+ metadata.gz: 01714742c72568ab92a9c3df0b700f3918e32482b7f658da8f099e2cfb54359e098e90fa1caa72a343cbdf2ede36081a9c01a6d65ee76cee841e65b87c9083ad
7
+ data.tar.gz: dd5b55100962d9408a503b338ebf25062c3dee7dc1ff9ceaccd97e30d57f97d131191ef96ce266e267cc729d70e6a0860702f22fbb1c9a6e4b512547ff1b5805
@@ -0,0 +1,3 @@
1
+ Gemfile.lock
2
+ /pkg
3
+ /data
@@ -0,0 +1,20 @@
1
+ sudo: false
2
+ language: ruby
3
+
4
+ script: bundle exec ruby spec/unicoder_spec.rb
5
+
6
+ rvm:
7
+ - 2.3.0
8
+ - 2.2
9
+ - 2.1
10
+ - 2.0
11
+ - ruby-head
12
+ - rbx-2
13
+ - jruby-head
14
+ - jruby-9000
15
+
16
+ cache:
17
+ - bundler
18
+
19
+ # matrix:
20
+ # fast_finish: true
@@ -0,0 +1,5 @@
1
+ ## CHANGELOG
2
+
3
+ ### 0.1.0
4
+
5
+ * WIP
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at opensource@janlelis.com. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [http://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: http://contributor-covenant.org
74
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,5 @@
1
+ source 'https://rubygems.org'
2
+
3
+ gemspec
4
+
5
+ gem 'minitest'
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2016 Jan Lelis, mail@janlelis.de
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,15 @@
1
+ # unicoder [![[version]](https://badge.fury.io/rb/unicoder.svg)](http://badge.fury.io/rb/unicoder)
2
+
3
+ WIP: Create specialized indexes for Unicode data lookup.
4
+
5
+
6
+ ## Usage
7
+
8
+ ```
9
+ $ unicoder build index_name
10
+ ```
11
+
12
+
13
+ ## MIT License
14
+
15
+ Copyright (C) 2016 Jan Lelis <http://janlelis.com>. Released under the MIT license.
@@ -0,0 +1,35 @@
1
+ # # #
2
+ # Get gemspec info
3
+
4
+ gemspec_file = Dir['*.gemspec'].first
5
+ gemspec = eval File.read(gemspec_file), binding, gemspec_file
6
+ info = "#{gemspec.name} | #{gemspec.version} | " \
7
+ "#{gemspec.runtime_dependencies.size} dependencies | " \
8
+ "#{gemspec.files.size} files"
9
+
10
+
11
+ # # #
12
+ # Gem build and install task
13
+
14
+ desc info
15
+ task :gem do
16
+ puts info + "\n\n"
17
+ print " "; sh "gem build #{gemspec_file}"
18
+ FileUtils.mkdir_p 'pkg'
19
+ FileUtils.mv "#{gemspec.name}-#{gemspec.version}.gem", 'pkg'
20
+ puts; sh %{gem install --no-document pkg/#{gemspec.name}-#{gemspec.version}.gem}
21
+ end
22
+
23
+
24
+ # # #
25
+ # Start an IRB session with the gem loaded
26
+
27
+ desc "#{gemspec.name} | IRB"
28
+ task :irb do
29
+ sh "irb -I ./lib -r #{gemspec.name.gsub '-','/'}"
30
+ end
31
+
32
+ # # #
33
+ # Require self
34
+
35
+ require_relative 'lib/unicoder'
@@ -0,0 +1,40 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require_relative "../lib/unicoder"
4
+ require "rationalist"
5
+
6
+ args = Rationalist.parse
7
+ command = args[:_][0]
8
+ identifier = args[:_][1]
9
+ KNOWN_OPTIONS = [:version, :help, :verbose, :format, :gzip]
10
+ options = args.select { |option,| KNOWN_OPTIONS.include? option }
11
+
12
+ if options.has_key?(:version)
13
+ puts "unicoder #{Unicoder::VERSION}"
14
+ elsif options.has_key?(:help)
15
+ puts <<USAGE_INSTRUCTIONS
16
+
17
+ USAGE
18
+
19
+ unicoder fetch <data_identifier>
20
+ unicoder build <builder_name> [--format marshal|json] [--gzip]
21
+
22
+ DATA FILE IDENTIFIERS
23
+
24
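+ east_asian_width, unicode_data, name_aliases, confusables, blocks, scripts, script_extensions, property_value_aliases, general_categories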
25
+
26
+ BUILDERS
27
+
28
+ blocks, categories, confusable, display_width, scripts
29
+
30
+ USAGE_INSTRUCTIONS
31
+ else
32
+ case command
33
+ when "fetch"
34
+ Unicoder::Downloader.fetch(identifier, **options)
35
+ when "build"
36
+ Unicoder::Builder.build(identifier, **options)
37
+ else
38
+ raise ArgumentError, "Unknown unicoder command!"
39
+ end
40
+ end
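`Unicoder::Downloader.fetch` only declares the keywords `unicode_version:`, `destination_directory:`, `destination:` and `filename:`, so splatting the CLI's `options` into it would raise `ArgumentError` as soon as a flag such as `--verbose` is present (assuming Rationalist returns symbol keys, as the `KNOWN_OPTIONS` filter implies). A minimal sketch of one way to guard against that, deriving the accepted keywords from the target method with `Method#parameters`; `stub_fetch` and `keyword_options_for` are hypothetical stand-ins, not part of unicoder:

```ruby
# Hypothetical stand-in with the same keyword signature as
# Unicoder::Downloader.fetch in this release.
def stub_fetch(identifier, unicode_version: "8.0.0", destination_directory: "data",
               destination: nil, filename: nil)
  [identifier, unicode_version]
end

# Keep only the options the target method declares as keyword parameters
# (:key = optional keyword, :keyreq = required keyword).
def keyword_options_for(method, options)
  accepted = method.parameters
                   .select { |type, _name| type == :key || type == :keyreq }
                   .map { |_type, name| name }
  options.select { |name, _value| accepted.include?(name) }
end

cli_options = { verbose: true, unicode_version: "7.0.0" }
safe = keyword_options_for(method(:stub_fetch), cli_options)
stub_fetch("blocks", **safe) # => ["blocks", "7.0.0"]
```

The same filter could be applied per command, since `Builder.build` accepts arbitrary options while `Downloader.fetch` does not.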
@@ -0,0 +1,8 @@
1
+ require_relative "unicoder/constants"
2
+ require_relative "unicoder/downloader"
3
+ require_relative "unicoder/builder"
4
+ require_relative "unicoder/multi_dimensional_array_builder"
5
+
6
+ if defined?(Rake)
7
+ Rake.add_rakelib(File.expand_path('../unicoder', __FILE__))
8
+ end
@@ -0,0 +1,76 @@
1
+ require "json"
2
+
3
+ module Unicoder
4
+ # A builder defines a parse! method which translates one (or more) Unicode data
5
+ # files into an index hash
6
+ module Builder
7
+ attr_reader :index
8
+
9
+ def initialize(unicode_version = nil)
10
+ @unicode_version = unicode_version
11
+ initialize_index
12
+ end
13
+
14
+ def initialize_index
15
+ @index = {}
16
+ end
17
+
18
+ def assign_codepoint(codepoint, value, index = @index)
19
+ index[codepoint] = value
20
+ end
21
+
22
+ def parse!
23
+ raise ArgumentError, "abstract"
24
+ end
25
+
26
+ def parse_file(identifier, parse_mode, **parse_options)
27
+ filename = UNICODE_FILES[identifier.to_sym] || filename
28
+ raise ArgumentError, "No valid file identifier or filename given" if !filename
29
+ filename = filename.sub('VERSION', @unicode_version) # don't mutate UNICODE_FILES
30
+ Downloader.fetch(identifier, unicode_version: @unicode_version) unless File.exist?(LOCAL_DATA_DIRECTORY + filename)
31
+ file = File.read(LOCAL_DATA_DIRECTORY + filename)
32
+
33
+ if parse_mode == :line
34
+ file.each_line{ |line|
35
+ yield Hash[ $~.names.zip( $~.captures ) ] if line =~ parse_options[:regex]
36
+ }
37
+ end
38
+ end
39
+
40
+ def export(format: :marshal, **options)
41
+ p index if options[:verbose]
42
+
43
+ case format.to_sym
44
+ when :marshal
45
+ index_file = Marshal.dump(index)
46
+ when :json
47
+ index_file = JSON.dump(index)
48
+ end
49
+
50
+ # Optionally gzip the serialized index
51
+ if options[:gzip]
52
+ Gem.gzip(index_file)
53
+ else
54
+ index_file
55
+ end
56
+ end
57
+
58
+ def self.build(identifier, **options)
59
+ format = options[:format] || :marshal
60
+ require_relative "builders/#{identifier}"
61
+ # require "unicoder/builders/#{identifier}"
62
+ builder_class = self.const_get(identifier.to_s.gsub(/(?:^|_)([a-z])/){ $1.upcase })
63
+ builder = builder_class.new(options[:unicode_version] || CURRENT_UNICODE_VERSION)
64
+ puts "Building index for #{identifier}…"
65
+ builder.parse!
66
+ index_file = builder.export(**options)
67
+
68
+ destination = options[:destination] || identifier.to_s
69
+ destination += ".#{format}"
70
+ destination += ".gz" if options[:gzip]
71
+ bytes = File.binwrite destination, index_file # Marshal/gzip output is binary
72
+
73
+ puts "Index created at: #{destination} (#{bytes} bytes written)"
74
+ end
75
+ end
76
+ end
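`Builder#export` serializes the index with `Marshal` (the default) or `JSON` and can gzip the result. A minimal sketch of reading such an index back, assuming the consumer uses Ruby's `zlib` and `stringio`; the `gzip`/`gunzip` helpers are illustrative and use `Zlib::GzipWriter`/`Zlib::GzipReader` rather than the `Gem.gzip` call above:

```ruby
require "json"
require "stringio"
require "zlib"

# Compress a serialized index in memory.
def gzip(data)
  io = StringIO.new("".b)
  writer = Zlib::GzipWriter.new(io)
  writer.write(data)
  writer.finish # writes the gzip footer without closing io
  io.string
end

# Decompress what gzip produced.
def gunzip(data)
  Zlib::GzipReader.new(StringIO.new(data)).read
end

index = { 0x41 => "Lu", 0x1F600 => "So" }

# Marshal round-trips Ruby objects exactly, including Integer keys.
restored = Marshal.load(gunzip(gzip(Marshal.dump(index))))

# JSON object keys are always strings, so Integer keys come back as strings.
from_json = JSON.parse(JSON.dump(index))
```

Keep this in mind for hash-based indexes such as the confusable index: a JSON export turns their Integer codepoint keys into strings, which a consumer has to convert back.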
@@ -0,0 +1,17 @@
1
+ module Unicoder
2
+ module Builder
3
+ class Blocks
4
+ include Builder
5
+
6
+ def initialize_index
7
+ @index = []
8
+ end
9
+
10
+ def parse!
11
+ parse_file :blocks, :line, regex: /^(?<from>\S+?)\.\.(?<to>\S+);\s(?<name>.+)$/ do |line|
12
+ @index << [line["from"].to_i(16), line["to"].to_i(16), line["name"]]
13
+ end
14
+ end
15
+ end
16
+ end
17
+ end
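The `Blocks` builder stores one `[from, to, name]` triple per matching line of `Blocks.txt`, in file order. Since the ranges in that file are ascending and non-overlapping, a consumer can find a codepoint's block with a binary search. A minimal sketch reusing the builder's regex; `parse_blocks` and `block_for` are hypothetical helpers, and the sample lines only mimic the `Blocks.txt` format:

```ruby
# Same pattern as Unicoder::Builder::Blocks#parse!
BLOCKS_REGEX = /^(?<from>\S+?)\.\.(?<to>\S+);\s(?<name>.+)$/

# Build the [from, to, name] triples; comment and blank lines don't match.
def parse_blocks(text)
  text.each_line.map { |line|
    match = BLOCKS_REGEX.match(line)
    match && [match[:from].to_i(16), match[:to].to_i(16), match[:name]]
  }.compact
end

# Array#bsearch (find-minimum mode) returns the first block whose upper
# bound is >= codepoint; it only counts if the lower bound fits as well.
def block_for(blocks, codepoint)
  block = blocks.bsearch { |_from, to, _name| to >= codepoint }
  block[2] if block && block[0] <= codepoint
end

BLOCKS_SAMPLE = <<-TXT
# Blocks-8.0.0.txt
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0370..03FF; Greek and Coptic
TXT
```

With `blocks = parse_blocks(BLOCKS_SAMPLE)`, `block_for(blocks, 0x41)` returns `"Basic Latin"`, while a codepoint between two listed ranges (such as `0x200` here) returns `nil`.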
@@ -0,0 +1,43 @@
1
+ module Unicoder
2
+ module Builder
3
+ # Assigns categories to every codepoint using a multi dimensional Array index structure
4
+ class Categories
5
+ include Builder
6
+ include MultiDimensionalArrayBuilder
7
+
8
+ def initialize_index
9
+ @index = {
10
+ CATEGORIES: [],
11
+ CATEGORY_NAMES: {},
12
+ }
13
+ @range_start = nil
14
+ end
15
+
16
+ def parse!
17
+ parse_file :unicode_data, :line, regex: /^(?<codepoint>.+?);(?<range><(?!control).+>)?.*?;(?<category>.+?);.*$/ do |line|
18
+ if line["range"]
19
+ if line["range"] =~ /First/
20
+ @range_start = line["codepoint"].to_i(16)
21
+ elsif line["range"] =~ /Last/ && @range_start
22
+ (@range_start..line["codepoint"].to_i(16)).each{ |codepoint|
23
+ assign_codepoint(codepoint, line["category"], @index[:CATEGORIES])
24
+ }
25
+ else
26
+ raise ArgumentError, "inconsistent range found in data, don't know what to do"
27
+ end
28
+ else
29
+ assign_codepoint(line["codepoint"].to_i(16), line["category"], @index[:CATEGORIES])
30
+ end
31
+ end
32
+
33
+ 4.times{ compress! @index[:CATEGORIES] }
34
+
35
+ parse_file :property_value_aliases, :line, regex: /^gc ; (?<short>\S{2}?) *; (?<long>\S+).*$/ do |line|
36
+ @index[:CATEGORY_NAMES][line["short"]] = line["long"]
37
+ end
38
+
39
+ @index
40
+ end
41
+ end
42
+ end
43
+ end
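`UnicodeData.txt` does not list every codepoint of large ranges such as the CJK ideographs; it marks them with a `<..., First>` line and a matching `<..., Last>` line, which is what the `range` capture in `Categories#parse!` detects (while `(?!control)` keeps `<control>` names out of it). A minimal sketch of that expansion on its own, assuming a plain Hash as the target instead of the multi-dimensional array; `expand_categories` is a hypothetical name and the sample lines follow the `UnicodeData.txt` field layout:

```ruby
# Same pattern as Unicoder::Builder::Categories#parse!: field 0 is the
# codepoint, field 1 may be a "<..., First>"/"<..., Last>" marker, and
# field 2 is the general category.
CATEGORY_REGEX = /^(?<codepoint>.+?);(?<range><(?!control).+>)?.*?;(?<category>.+?);.*$/

def expand_categories(text)
  categories = {}
  range_start = nil
  text.each_line do |line|
    match = CATEGORY_REGEX.match(line)
    next unless match
    codepoint = match[:codepoint].to_i(16)
    if match[:range] && match[:range].include?("First")
      range_start = codepoint # remember the start, assign on "Last"
    elsif match[:range] && match[:range].include?("Last")
      (range_start..codepoint).each { |cp| categories[cp] = match[:category] }
      range_start = nil
    else
      categories[codepoint] = match[:category]
    end
  end
  categories
end

UNICODE_DATA_SAMPLE = <<-TXT
0000;<control>;Cc;0;BN;;;;;N;NULL;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FD5;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
TXT
```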
@@ -0,0 +1,21 @@
1
+ module Unicoder
2
+ module Builder
3
+ class Confusable
4
+ include Builder
5
+
6
+ def parse!
7
+ parse_file :confusables, :line, regex: /^(?<from>\S+)\s+;\s+(?<to>.+?)\s+;.*$/ do |line|
8
+ source = line["from"].to_i(16)
9
+ if line["to"].include?(" ")
10
+ replace_with = line["to"].split(" ").map{ |codepoint|
11
+ codepoint.to_i(16)
12
+ }
13
+ else
14
+ replace_with = line["to"].to_i(16)
15
+ end
16
+ @index[source] = replace_with
17
+ end
18
+ end
19
+ end
20
+ end
21
+ end
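The `Confusable` index maps a source codepoint either to a single Integer or to an Array of Integers (when `confusables.txt` lists several target codepoints). A minimal sketch of applying such an index to a string, assuming a hand-written toy index; the full UTS #39 skeleton algorithm additionally normalizes to NFD before and after the mapping, which this sketch leaves out. `map_confusables` and `TOY_INDEX` are hypothetical:

```ruby
# Replace every codepoint that has an entry in the index. Values may be a
# single Integer or an Array of Integers; Array() normalizes both.
def map_confusables(string, index)
  string.each_codepoint.flat_map { |cp| Array(index.fetch(cp, cp)) }.pack("U*")
end

# Toy index in the shape the builder produces (illustrative values):
# CYRILLIC SMALL LETTER A => LATIN SMALL LETTER A,
# PARENTHESIZED DIGIT ONE => "(", "1", ")".
TOY_INDEX = { 0x0430 => 0x0061, 0x2474 => [0x0028, 0x0031, 0x0029] }.freeze
```

For example, `map_confusables("\u0430pple", TOY_INDEX)` yields the all-Latin `"apple"`, so two visually confusable strings compare equal after mapping.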
@@ -0,0 +1,71 @@
1
+ module Unicoder
2
+ module Builder
3
+ class DisplayWidth
4
+ include Builder
5
+ include MultiDimensionalArrayBuilder
6
+
7
+ IGNORE_CATEGORIES = %w[Cs Co Cn].freeze
8
+ ZERO_WIDTH_CATEGORIES = %w[Mn Me Cf].freeze
9
+ ZERO_WIDTH_CODEPOINTS = [*0x1160..0x11FF].freeze
10
+ SPECIAL_WIDTHS = {
11
+ 0x0 => 0, # \0 NULL
12
+ 0x5 => 0, # ENQUIRY
13
+ 0x7 => 0, # \a BELL
14
+ 0x8 => -1, # \b BACKSPACE
15
+ 0xA => 0, # \n LINE FEED
16
+ 0xB => 0, # \v LINE TABULATION
17
+ 0xC => 0, # \f FORM FEED
18
+ 0xD => 0, # \r CARRIAGE RETURN
19
+ 0xE => 0, # SHIFT OUT
20
+ 0xF => 0, # SHIFT IN
21
+ 0x00AD => 1, # SOFT HYPHEN
22
+ 0x2E3A => 2, # TWO-EM DASH
23
+ 0x2E3B => 3, # THREE-EM DASH
24
+ }.freeze
25
+
26
+ def initialize_index
27
+ @index = []
28
+ end
29
+
30
+ def parse!
31
+ parse_file :east_asian_width, :line, regex: /^(?<codepoints>\S+?);(?<width>\S+)\s+#\s(?<category>\S+).*$/ do |line|
32
+ next if IGNORE_CATEGORIES.include?(line["category"])
33
+
34
+ if line["codepoints"]['..']
35
+ codepoints = Range.new(*line["codepoints"].split('..').map{ |codepoint|
36
+ codepoint.to_i(16)
37
+ })
38
+ else
39
+ codepoints = [line["codepoints"].to_i(16)]
40
+ end
41
+
42
+ codepoints.each{ |codepoint|
43
+ assign_codepoint codepoint, determine_width(codepoint, line["category"], line["width"])
44
+ }
45
+ end
46
+
47
+ SPECIAL_WIDTHS.each{ |codepoint, value|
48
+ assign_codepoint codepoint, value
49
+ }
50
+
51
+ 4.times{ compress! }
52
+
53
+ @index
54
+ end
55
+
56
+ def determine_width(codepoint, category, east_asian_width)
57
+ if ( ZERO_WIDTH_CATEGORIES.include?(category) &&
58
+ [codepoint].pack('U') !~ /\p{Cf}(?<=\p{Arabic})/ ) ||
59
+ ZERO_WIDTH_CODEPOINTS.include?(codepoint)
60
+ 0
61
+ elsif east_asian_width == "F" || east_asian_width == "W"
62
+ 2
63
+ elsif east_asian_width == "A"
64
+ :A
65
+ else
66
+ nil
67
+ end
68
+ end
69
+ end
70
+ end
71
+ end
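`DisplayWidth#determine_width` stores `0` for zero-width and `2` for wide characters, `:A` for East Asian Ambiguous ones and `nil` otherwise, and `SPECIAL_WIDTHS` adds values such as `-1` for backspace. A minimal sketch of how a consumer might sum these values, assuming `nil` means a default width of 1 and that ambiguous characters are resolved by a caller-chosen setting; `string_width` and `TOY_WIDTHS` are hypothetical, with a Hash standing in for a lookup into the built index:

```ruby
# Sum per-codepoint widths. `widths` is anything that responds to [] with
# a codepoint.
def string_width(string, widths, ambiguous: 1)
  string.each_codepoint.inject(0) do |total, codepoint|
    width = widths[codepoint]
    columns =
      case width
      when nil then 1         # no special entry: assume one column
      when :A then ambiguous  # East Asian Ambiguous: caller decides
      else width              # 0, 2, -1, ...
      end
    total + columns
  end
end

# Toy table: HIRAGANA LETTER A (wide), COMBINING ACUTE ACCENT (zero width),
# PLUS-MINUS SIGN (ambiguous), BACKSPACE (-1).
TOY_WIDTHS = { 0x3042 => 2, 0x0301 => 0, 0x00B1 => :A, 0x8 => -1 }.freeze
```

Keeping `:A` unresolved in the index leaves the ambiguous-width decision to the consumer, which is why the sketch takes it as a keyword.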
@@ -0,0 +1,59 @@
1
+ module Unicoder
2
+ module Builder
3
+ class Scripts
4
+ include Builder
5
+ include MultiDimensionalArrayBuilder
6
+
7
+ def initialize_index
8
+ @index = {
9
+ SCRIPTS: [],
10
+ SCRIPT_EXTENSIONS: {},
11
+ SCRIPT_ALIASES: {},
12
+ SCRIPT_NAMES: [],
13
+ }
14
+ @reverse_script_names = {}
15
+ @reverse_script_extension_names = {}
16
+ end
17
+
18
+ def lookup_extension_names(extension_scripts_string)
19
+ extension_scripts_string.split(" ").map{ |extension_script|
20
+ @reverse_script_extension_names[extension_script]
21
+ }
22
+ end
23
+
24
+ def parse!
25
+ parse_file :property_value_aliases, :line, regex: /^sc ; (?<short>\S+?)\s*; (?<long>\S+?)(?:\s*; (?<short2>\S+))?$/ do |line|
26
+ @index[:SCRIPT_NAMES] << line["long"]
27
+ script_number = @reverse_script_names.size
28
+ @reverse_script_names[line["long"]] = script_number
29
+
30
+ @index[:SCRIPT_ALIASES][line["short" ]] = script_number
31
+ @index[:SCRIPT_ALIASES][line["short2"]] = script_number if line["short2"]
32
+ @reverse_script_extension_names[line["short"]] = script_number
33
+ end
34
+
35
+ parse_file :scripts, :line, regex: /^(?<from>\S+?)(\.\.(?<to>\S+))?\s+; (?<script>\S+) #.*$/ do |line|
36
+ if line["to"]
37
+ (line["from"].to_i(16)..line["to"].to_i(16)).each{ |codepoint|
38
+ assign_codepoint codepoint, @reverse_script_names[line["script"]], @index[:SCRIPTS]
39
+ }
40
+ else
41
+ assign_codepoint line["from"].to_i(16), @reverse_script_names[line["script"]], @index[:SCRIPTS]
42
+ end
43
+ end
44
+
45
+ 4.times{ compress! @index[:SCRIPTS] }
46
+
47
+ parse_file :script_extensions, :line, regex: /^(?<from>\S+?)(\.\.(?<to>\S+))?\s+; (?<scripts>.+?) #.*$/ do |line|
48
+ if line["to"]
49
+ (line["from"].to_i(16)..line["to"].to_i(16)).each{ |codepoint|
50
+ @index[:SCRIPT_EXTENSIONS][codepoint] = lookup_extension_names(line["scripts"])
51
+ }
52
+ else
53
+ @index[:SCRIPT_EXTENSIONS][line["from"].to_i(16)] = lookup_extension_names(line["scripts"])
54
+ end
55
+ end
56
+ end
57
+ end
58
+ end
59
+ end
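In `Scripts#parse!`, each `sc` line of `PropertyValueAliases.txt` gets a number equal to its position in `SCRIPT_NAMES`, and every short alias points to that number; `SCRIPTS` and `SCRIPT_EXTENSIONS` then store these numbers rather than names. A minimal sketch of the numbering step in isolation, reusing the builder's regex; `number_scripts` is a hypothetical helper and the sample lines only follow the file's `sc ; short ; long [; short2]` layout:

```ruby
# Same pattern as Unicoder::Builder::Scripts#parse!
SCRIPT_ALIAS_REGEX = /^sc ; (?<short>\S+?)\s*; (?<long>\S+?)(?:\s*; (?<short2>\S+))?$/

# Returns [names, aliases]: names[n] is the long script name for number n,
# and aliases maps every short alias to n.
def number_scripts(text)
  names = []
  aliases = {}
  text.each_line do |line|
    match = SCRIPT_ALIAS_REGEX.match(line)
    next unless match
    number = names.size
    names << match[:long]
    aliases[match[:short]] = number
    aliases[match[:short2]] = number if match[:short2]
  end
  [names, aliases]
end

PROPERTY_VALUE_ALIASES_SAMPLE = <<-TXT
# Script (sc)
sc ; Arab                             ; Arabic
sc ; Copt                             ; Coptic                           ; Qaac
gc ; Lu                               ; Uppercase_Letter
TXT
```

Resolving a stored number back to a name is then a plain array access, e.g. `names[aliases["Qaac"]]`.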
@@ -0,0 +1,29 @@
1
+ module Unicoder
2
+ VERSION = "0.1.0".freeze
3
+
4
+ CURRENT_UNICODE_VERSION = "8.0.0".freeze
5
+
6
+ UNICODE_VERSIONS = %w[
7
+ 6.3.0
8
+ 7.0.0
9
+ 8.0.0
10
+ 9.0.0
11
+ ].freeze
12
+
13
+ UNICODE_DATA_ENDPOINT = "ftp://ftp.unicode.org/Public".freeze
14
+
15
+ LOCAL_DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../data/unicode").freeze
16
+
17
+ UNICODE_FILES = {
18
+ east_asian_width: "/VERSION/ucd/EastAsianWidth.txt",
19
+ unicode_data: "/VERSION/ucd/UnicodeData.txt",
20
+ name_aliases: "/VERSION/ucd/NameAliases.txt",
21
+ confusables: "/security/VERSION/confusables.txt",
22
+ blocks: "/VERSION/ucd/Blocks.txt",
23
+ scripts: "/VERSION/ucd/Scripts.txt",
24
+ script_extensions: "/VERSION/ucd/ScriptExtensions.txt",
25
+ property_value_aliases: "/VERSION/ucd/PropertyValueAliases.txt",
26
+ general_categories: "/VERSION/ucd/extracted/DerivedGeneralCategory.txt",
27
+ }
28
+ end
29
+
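Both `Builder#parse_file` and `Downloader.fetch` turn an entry of `UNICODE_FILES` into a concrete path by replacing the `VERSION` placeholder, then prefix it with `UNICODE_DATA_ENDPOINT` (remote) or a local data directory. A minimal sketch of that resolution using non-mutating `String#sub`, so the shared hash keeps its placeholders across calls; `resolve_data_file` is a hypothetical helper, the constants are trimmed copies of the ones above, and the local directory is a placeholder path:

```ruby
UNICODE_DATA_ENDPOINT = "ftp://ftp.unicode.org/Public".freeze
LOCAL_DATA_DIRECTORY = "/path/to/unicoder/data/unicode".freeze # placeholder
UNICODE_FILES = {
  blocks: "/VERSION/ucd/Blocks.txt",
  confusables: "/security/VERSION/confusables.txt",
}.freeze

# Returns [remote_url, local_path] for a data file identifier.
def resolve_data_file(identifier, unicode_version)
  path = UNICODE_FILES.fetch(identifier.to_sym).sub("VERSION", unicode_version)
  [UNICODE_DATA_ENDPOINT + path, LOCAL_DATA_DIRECTORY + path]
end
```

Because `sub` returns a new string, resolving the same identifier for another Unicode version later still finds the `VERSION` placeholder.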
@@ -0,0 +1,28 @@
1
+ require "open-uri"
2
+ require "fileutils"
3
+
4
+ module Unicoder
5
+ module Downloader
6
+ def self.fetch(identifier,
7
+ unicode_version: CURRENT_UNICODE_VERSION,
8
+ destination_directory: LOCAL_DATA_DIRECTORY,
9
+ destination: nil,
10
+ filename: nil
11
+ )
12
+ filename = UNICODE_FILES[identifier.to_sym] || filename
13
+ raise ArgumentError, "No valid file identifier or filename given" if !filename
14
+ filename = filename.sub('VERSION', unicode_version) # don't mutate UNICODE_FILES
15
+ source = UNICODE_DATA_ENDPOINT + filename
16
+ destination ||= destination_directory + filename
17
+
18
+ URI.parse(source).open{ |f|
19
+ FileUtils.mkdir_p(File.dirname(destination))
20
+ File.write(destination, f.read)
21
+ }
22
+
23
+ puts "GET #{source} => #{destination}"
24
+ rescue => e
25
+ $stderr.puts "#{e.class}: #{e.message}"
26
+ end
27
+ end
28
+ end
@@ -0,0 +1,64 @@
1
+ require "json"
2
+
3
+ module Unicoder
4
+ # Include after Builder, so these methods take precedence over Builder's initialize_index and assign_codepoint
5
+ module MultiDimensionalArrayBuilder
6
+ def initialize_index
7
+ @index = []
8
+ end
9
+
10
+ def assign_codepoint(codepoint, value, index = @index)
11
+ plane = codepoint / 0x10000
12
+ plane_offset = codepoint % 0x10000
13
+ row = plane_offset / 0x1000
14
+ row_offset = plane_offset % 0x1000
15
+ byte = row_offset / 0x100
16
+ byte_offset = row_offset % 0x100
17
+ nibble = byte_offset / 0x10
18
+ nibble_offset = byte_offset % 0x10
19
+
20
+ index[plane] ||= []
21
+ index[plane][row] ||= []
22
+ index[plane][row][byte] ||= []
23
+ index[plane][row][byte][nibble] ||= []
24
+ index[plane][row][byte][nibble][nibble_offset] = value
25
+ end
26
+
27
+ def compress!(index = @index)
28
+ index.map!{ |plane|
29
+ if !plane.is_a?(Array)
30
+ plane
31
+ elsif plane.flatten.uniq.size == 1
32
+ plane[0]
33
+ else
34
+ plane.map!{ |row|
35
+ if !row.is_a?(Array)
36
+ row
37
+ elsif row.flatten.uniq.size == 1
38
+ row[0]
39
+ else
40
+ row.map!{ |byte|
41
+ if !byte.is_a?(Array)
42
+ byte
43
+ elsif byte.uniq.size == 1
44
+ byte[0]
45
+ else
46
+ byte.map! { |nibble|
47
+ if !nibble.is_a?(Array)
48
+ nibble
49
+ elsif nibble.uniq.size == 1
50
+ nibble[0]
51
+ else
52
+ nibble
53
+ end
54
+ }
55
+ end
56
+ }
57
+ end
58
+ }
59
+ end
60
+ }
61
+ end
62
+
63
+ end
64
+ end
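`assign_codepoint` splits a codepoint into five indices (plane, row, "byte", nibble, offset), and each `compress!` pass replaces a sub-array whose values are all identical with its first element; that is presumably why the builders call it four times, since one pass can remove one level of nesting. Reading a value back therefore means walking down until the current node is no longer an Array. A minimal sketch of such a reader, assuming only the layout shown above; `index_path` and `lookup` are hypothetical helpers, not part of unicoder 0.1.0:

```ruby
# Split a codepoint into the five array indices used by assign_codepoint.
def index_path(codepoint)
  [
    codepoint / 0x10000,         # plane (0..16)
    (codepoint / 0x1000) % 0x10, # row within the plane
    (codepoint / 0x100) % 0x10,  # "byte" block within the row
    (codepoint / 0x10) % 0x10,   # nibble block within the byte
    codepoint % 0x10,            # offset within the nibble
  ]
end

# Walk the nested arrays. A non-Array node means compress! collapsed that
# whole subtree into one shared value (or nothing was assigned: nil), so
# `break` ends the walk early and returns it.
def lookup(index, codepoint)
  index_path(codepoint).reduce(index) do |node, i|
    break node unless node.is_a?(Array)
    node[i]
  end
end
```

For example, `index_path(0x1F600)` is `[1, 15, 6, 0, 0]`; if row 15 of plane 1 was collapsed to a single value, `lookup` stops after two steps and returns that value for every codepoint in the row.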
@@ -0,0 +1,11 @@
1
+ namespace :unicoder do
2
+ desc "Download the Unicode data file for the given identifier"
3
+ task :fetch, [:identifier] do |t, args|
4
+ Unicoder::Downloader.fetch(args.identifier)
5
+ end
6
+
7
+ desc "Build an index with the given builder"
8
+ task :index, [:identifier] do |t, args|
9
+ Unicoder::Builder.build(args.identifier)
10
+ end
11
+ end
@@ -0,0 +1,9 @@
1
+ require_relative "../lib/unicoder"
2
+ require "minitest/autorun"
3
+
4
+ describe Unicoder do
5
+ it "defines a version number" do
6
+ assert_match(/\A\d+\.\d+\.\d+\z/, Unicoder::VERSION)
7
+ end
8
+ end
9
+
@@ -0,0 +1,22 @@
1
+ # -*- encoding: utf-8 -*-
2
+
3
+ require File.dirname(__FILE__) + "/lib/unicoder/constants"
4
+
5
+ Gem::Specification.new do |gem|
6
+ gem.name = "unicoder"
7
+ gem.version = Unicoder::VERSION
8
+ gem.summary = "Create specialized indexes for Unicode data lookup"
9
+ gem.description = "Generate specialized indexes for Unicode data lookup"
10
+ gem.authors = ["Jan Lelis"]
11
+ gem.email = ["mail@janlelis.de"]
12
+ gem.homepage = "https://github.com/janlelis/unicoder"
13
+ gem.license = "MIT"
14
+
15
+ gem.files = Dir["{**/}{.*,*}"].select{ |path| File.file?(path) && path !~ /^pkg/ }
16
+ gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
17
+ gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
18
+ gem.require_paths = ["lib"]
19
+
20
+ gem.required_ruby_version = "~> 2.0"
21
+ gem.add_dependency "rationalist", "~> 2.0"
22
+ end
metadata ADDED
@@ -0,0 +1,93 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: unicoder
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Jan Lelis
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2016-04-13 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: rationalist
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '2.0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '2.0'
27
+ description: Generate specialized indexes for Unicode data lookup
28
+ email:
29
+ - mail@janlelis.de
30
+ executables:
31
+ - unicoder
32
+ extensions: []
33
+ extra_rdoc_files: []
34
+ files:
35
+ - ".gitignore"
36
+ - ".travis.yml"
37
+ - CHANGELOG.md
38
+ - CODE_OF_CONDUCT.md
39
+ - Gemfile
40
+ - MIT-LICENSE.txt
41
+ - README.md
42
+ - Rakefile
43
+ - bin/unicoder
44
+ - data/.keep
45
+ - data/unicode/8.0.0/ucd/Blocks.txt
46
+ - data/unicode/8.0.0/ucd/EastAsianWidth.txt
47
+ - data/unicode/8.0.0/ucd/NameAliases.txt
48
+ - data/unicode/8.0.0/ucd/PropertyValueAliases.txt
49
+ - data/unicode/8.0.0/ucd/ScriptExtensions.txt
50
+ - data/unicode/8.0.0/ucd/Scripts.txt
51
+ - data/unicode/8.0.0/ucd/UnicodeData.txt
52
+ - data/unicode/8.0.0/ucd/extracted/DerivedGeneralCategory.txt
53
+ - data/unicode/security/8.0.0/confusables.txt
54
+ - lib/unicoder.rb
55
+ - lib/unicoder/builder.rb
56
+ - lib/unicoder/builders/blocks.rb
57
+ - lib/unicoder/builders/categories.rb
58
+ - lib/unicoder/builders/confusable.rb
59
+ - lib/unicoder/builders/display_width.rb
60
+ - lib/unicoder/builders/scripts.rb
61
+ - lib/unicoder/constants.rb
62
+ - lib/unicoder/downloader.rb
63
+ - lib/unicoder/multi_dimensional_array_builder.rb
64
+ - lib/unicoder/tasks.rake
65
+ - spec/unicoder_spec.rb
66
+ - unicoder.gemspec
67
+ homepage: https://github.com/janlelis/unicoder
68
+ licenses:
69
+ - MIT
70
+ metadata: {}
71
+ post_install_message:
72
+ rdoc_options: []
73
+ require_paths:
74
+ - lib
75
+ required_ruby_version: !ruby/object:Gem::Requirement
76
+ requirements:
77
+ - - "~>"
78
+ - !ruby/object:Gem::Version
79
+ version: '2.0'
80
+ required_rubygems_version: !ruby/object:Gem::Requirement
81
+ requirements:
82
+ - - ">="
83
+ - !ruby/object:Gem::Version
84
+ version: '0'
85
+ requirements: []
86
+ rubyforge_project:
87
+ rubygems_version: 2.6.3
88
+ signing_key:
89
+ specification_version: 4
90
+ summary: Create specialized indexes for Unicode data lookup
91
+ test_files:
92
+ - spec/unicoder_spec.rb
93
+ has_rdoc: