unicode-categories 1.0.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: b5ffea705e0ea825e2da7d52510acc2d07542f34
4
+ data.tar.gz: 807bf0ab0e1fee003ab34d39f20a8383573fe029
5
+ SHA512:
6
+ metadata.gz: 83e66060a6505ceee287bab54743ae5ff2a16719d3d4faf126b7a18448f6316660f4c0aa14d0658783d1482fe622c0e402eb948dc5b3d22ebf46b54fc867d20a
7
+ data.tar.gz: 5700854254d1d80e15d6459d3fed41f0a3c69e64437408eb244e1747e23e3597b0462637af7c2fb838b670f046ab2bb89931573db12e09fc2911c902ca1d6141
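The hunk above records SHA1 and SHA512 digests for the package's `metadata.gz` and `data.tar.gz`. In a `.gem` archive these digests are stored in `checksums.yaml.gz`.

Below is a minimal sketch of checking one of those digests with Ruby's `digest` standard library. It assumes you have already extracted the `.gem` tar archive, so the member file sits on disk. `verify_sha512` is a hypothetical helper name. The demo hashes a throwaway temp file rather than a real archive member.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: true when the SHA512 hex digest of the file at `path`
# equals `expected_hex` (for example a value copied from checksums.yaml).
def verify_sha512(path, expected_hex)
  Digest::SHA512.file(path).hexdigest == expected_hex.downcase
end

# Self-contained demo on a temporary file instead of a real gem member.
Tempfile.create("checksum-demo") do |file|
  file.binmode
  file.write("hello")
  file.flush
  puts verify_sha512(file.path, Digest::SHA512.hexdigest("hello")) # prints true
  puts verify_sha512(file.path, "0" * 128)                         # prints false
end
```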
@@ -0,0 +1,2 @@
1
+ Gemfile.lock
2
+ /pkg
@@ -0,0 +1,21 @@
1
+ sudo: false
2
+ language: ruby
3
+
4
+ script: bundle exec ruby spec/unicode_categories_spec.rb
5
+
6
+ rvm:
7
+ - 2.3.0
8
+ - 2.2
9
+ - 2.1
10
+ - ruby-head
11
+ - rbx-2
12
+ - jruby-head
13
+ - jruby-9.0.5.0
14
+
15
+ cache:
16
+ - bundler
17
+
18
+ matrix:
19
+ allow_failures:
20
+ - rvm: jruby-head
21
+ - rvm: rbx-2
@@ -0,0 +1,6 @@
1
+ ## CHANGELOG
2
+
3
+ ### 1.0.0
4
+
5
+ * Initial release
6
+
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at opensource@janlelis.com. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [http://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: http://contributor-covenant.org
74
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,5 @@
1
+ source 'https://rubygems.org'
2
+
3
+ gemspec
4
+
5
+ gem 'minitest'
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2016 Jan Lelis, mail@janlelis.de
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,92 @@
1
+ # Unicode::Categories [![[version]](https://badge.fury.io/rb/unicode-categories.svg)](http://badge.fury.io/rb/unicode-categories) [![[travis]](https://travis-ci.org/janlelis/unicode-categories.png)](https://travis-ci.org/janlelis/unicode-categories)
2
+
3
+ Returns which [General Categories](https://en.wikipedia.org/wiki/Unicode_character_property#General_Category) are contained in a Unicode string.
4
+
5
+ Unicode version: **8.0.0**
6
+
7
+ Supported Rubies: **2.3**, **2.2**, **2.1**
8
+
9
+ ## Gemfile
10
+
11
+ ```ruby
12
+ gem "unicode-categories"
13
+ ```
14
+
15
+ ## Usage
16
+
17
+ ```ruby
18
+ require "unicode/categories"
19
+
20
+ # All general categories of a string
21
+ Unicode::Categories.categories("A 2") # => ["Lu", "Nd", "Zs"]
22
+ Unicode::Categories.categories("A 2", format: :long)
23
+ # => ["Decimal_Number", "Space_Separator", "Uppercase_Letter"]
24
+
25
+ # Also aliased as .of
26
+ Unicode::Categories.of("\u{10c50}") # => ["Cn"]
27
+
28
+ # Single character
29
+ Unicode::Categories.category("☼", format: :long) # => "Other_Symbol"
30
+ ```
31
+
32
+ The list of categories is always sorted alphabetically.
33
+
34
+ ## Hints
35
+
36
+ ### Regex Matching
37
+
38
+ If you have a string and want to match substrings or characters from a specific General Category, you don't need this gem. Instead, you can use the [Regexp Unicode Property Syntax `\p{}`](http://ruby-doc.org/core-2.3.0/Regexp.html#class-Regexp-label-Character+Properties):
39
+
40
+ ```ruby
41
+ "Find decimal numbers (like 2 or 3) within a string".scan(/\p{Nd}+/) # => ["2", "3"]
42
+ ```
43
+
44
+ ### List of General Categories
45
+
46
+ You can retrieve a list of all General Categories like this:
47
+
48
+ ```ruby
49
+ require "unicode/categories"
50
+ puts Unicode::Categories.names
51
+
52
+ # # # Output # # #
53
+
54
+ Cc
55
+ Cf
56
+ Cn
57
+ Co
58
+ Cs
59
+ LC
60
+ Ll
61
+ Lm
62
+ Lo
63
+ Lt
64
+ Lu
65
+ Mc
66
+ Me
67
+ Mn
68
+ Nd
69
+ Nl
70
+ No
71
+ Pc
72
+ Pd
73
+ Pe
74
+ Pf
75
+ Pi
76
+ Po
77
+ Ps
78
+ Sc
79
+ Sk
80
+ Sm
81
+ So
82
+ Zl
83
+ Zp
84
+ Zs
85
+ ```
86
+
87
+
88
+ ## MIT License
89
+
90
+ - Copyright (C) 2016 Jan Lelis <http://janlelis.com>. Released under the MIT license.
91
+ - Unicode data: http://www.unicode.org/copyright.html#Exhibit1
92
+
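The README's regexp hint can be stretched into a rough, dependency-free approximation of `Unicode::Categories.categories`. The idea is to test each character against one anchored `\p{..}` pattern per two-letter General Category.

This is a sketch, not the gem's implementation, which reads a bundled Unicode 8.0.0 index. Some caveats:

- `regexp_category` and `regexp_categories` are hypothetical names.
- The answers follow the Unicode version of your Ruby's regexp engine, which may differ from 8.0.0.
- `LC` is left out because it is a grouping of `Lu`, `Ll` and `Lt` rather than a category of its own.

```ruby
# Hypothetical stand-in, not the gem's API: classifies characters with Ruby's
# built-in \p{..} regexp properties instead of the gem's Unicode 8.0.0 index.
GENERAL_CATEGORIES = %w[
  Cc Cf Cn Co Cs Ll Lm Lo Lt Lu Mc Me Mn Nd Nl No
  Pc Pd Pe Pf Pi Po Ps Sc Sk Sm So Zl Zp Zs
].freeze

# One anchored pattern per category, e.g. /\A\p{Lu}\z/
CATEGORY_PATTERNS = GENERAL_CATEGORIES.map { |gc|
  [gc, Regexp.new("\\A\\p{#{gc}}\\z")]
}.freeze

# Returns the two-letter category of a single character ("Cn" as fallback).
def regexp_category(char)
  pair = CATEGORY_PATTERNS.find { |_, re| re =~ char }
  pair ? pair[0] : "Cn"
end

# Unique, alphabetically sorted categories of all characters in `string`.
def regexp_categories(string)
  string.each_char.map { |char| regexp_category(char) }.uniq.sort
end

p regexp_categories("A 2") # => ["Lu", "Nd", "Zs"]
p regexp_category("☼")     # => "So"
```

This approach runs up to one regexp match per category for every character. The gem's precomputed index instead answers each character with a fixed number of array lookups.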
@@ -0,0 +1,30 @@
1
+ # # #
2
+ # Get gemspec info
3
+
4
+ gemspec_file = Dir['*.gemspec'].first
5
+ gemspec = eval File.read(gemspec_file), binding, gemspec_file
6
+ info = "#{gemspec.name} | #{gemspec.version} | " \
7
+ "#{gemspec.runtime_dependencies.size} dependencies | " \
8
+ "#{gemspec.files.size} files"
9
+
10
+
11
+ # # #
12
+ # Gem build and install task
13
+
14
+ desc info
15
+ task :gem do
16
+ puts info + "\n\n"
17
+ print " "; sh "gem build #{gemspec_file}"
18
+ FileUtils.mkdir_p 'pkg'
19
+ FileUtils.mv "#{gemspec.name}-#{gemspec.version}.gem", 'pkg'
20
+ puts; sh %{gem install --no-document pkg/#{gemspec.name}-#{gemspec.version}.gem}
21
+ end
22
+
23
+
24
+ # # #
25
+ # Start an IRB session with the gem loaded
26
+
27
+ desc "#{gemspec.name} | IRB"
28
+ task :irb do
29
+ sh "irb -I ./lib -r #{gemspec.name.gsub '-','/'}"
30
+ end
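The Rakefile loads its gemspec with `eval File.read(...)`. RubyGems also ships `Gem::Specification.load`, which evaluates a gemspec file and returns the specification. If evaluation fails, it prints a warning and returns `nil`.

Below is a minimal sketch that builds the same `info` summary string through that API, using a throwaway gemspec. `demo-gem` and the `gemspec_info` helper are hypothetical. The `<<~` heredoc needs Ruby 2.3 or later.

```ruby
require "rubygems"
require "tmpdir"

# Same summary format as the Rakefile's `info` string.
def gemspec_info(spec)
  "#{spec.name} | #{spec.version} | " \
  "#{spec.runtime_dependencies.size} dependencies | " \
  "#{spec.files.size} files"
end

# Dir.mktmpdir returns the block's value and deletes the directory afterwards.
DEMO_INFO = Dir.mktmpdir do |dir|
  path = File.join(dir, "demo-gem.gemspec")
  # A throwaway, hypothetical gemspec so the example needs no real project.
  File.write(path, <<~SPEC)
    Gem::Specification.new do |gem|
      gem.name    = "demo-gem"
      gem.version = "0.1.0"
      gem.summary = "Demo gem"
      gem.authors = ["Jane Doe"]
      gem.files   = []
    end
  SPEC
  spec = Gem::Specification.load(path) or raise "could not load #{path}"
  gemspec_info(spec)
end

puts DEMO_INFO # prints: demo-gem | 0.1.0 | 0 dependencies | 0 files
```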
@@ -0,0 +1,45 @@
1
+ require_relative "categories/constants"
2
+
3
+ module Unicode
4
+ module Categories
5
+ def self.categories(string, **options)
6
+ res = []
7
+ string.each_char{ |char|
8
+ category_name = category(char, **options)
9
+ res << category_name unless res.include?(category_name)
10
+ }
11
+ res.sort
12
+ end
13
+ class << self; alias of categories; end
14
+
15
+ def self.category(char, format: :short)
16
+ require_relative 'categories/index' unless defined? ::Unicode::Categories::INDEX
17
+ codepoint_depth_offset = char.unpack("U")[0] or
18
+ raise(ArgumentError, "Unicode::Categories.category must be given a valid char")
19
+ index_or_value = INDEX[:CATEGORIES]
20
+ [0x10000, 0x1000, 0x100, 0x10].each{ |depth|
21
+ index_or_value = index_or_value[codepoint_depth_offset / depth]
22
+ codepoint_depth_offset = codepoint_depth_offset % depth
23
+ unless index_or_value.is_a? Array
24
+ res = index_or_value || "Cn"
25
+ return format == :long ? INDEX[:CATEGORY_NAMES][res] : res
26
+ end
27
+ }
28
+
29
+ res = index_or_value[codepoint_depth_offset] || "Cn"
30
+ format == :long ? INDEX[:CATEGORY_NAMES][res] : res
31
+ end
32
+
33
+ def self.names(format: :short)
34
+ require_relative 'categories/index' unless defined? ::Unicode::Categories::INDEX
35
+ case format
36
+ when :long
37
+ INDEX[:CATEGORY_NAMES].values.sort
38
+ when :short
39
+ INDEX[:CATEGORY_NAMES].keys.sort
40
+ when :table
41
+ INDEX[:CATEGORY_NAMES].dup
42
+ end
43
+ end
44
+ end
45
+ end
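`Unicode::Categories.category` treats `INDEX[:CATEGORIES]` as a nested-array trie:

1. It divides the codepoint by `0x10000` (the plane), then by `0x1000`, `0x100` and `0x10`.
2. It stops as soon as it reaches a non-Array entry. That entry holds the category of the whole remaining sub-range, with `nil` meaning `Cn`.

The script that generated `data/categories.marshal.gz` is not part of this diff. The builder below is therefore only an assumption about how such an index could be produced. It uses a tiny hypothetical range table (`SAMPLE_RANGES`) instead of real Unicode data. The `lookup` method mirrors the loop in `category`.

```ruby
# Hypothetical sample data as [first, last, category]; anything else is "Cn".
SAMPLE_RANGES = [
  [0x0020, 0x0020, "Zs"],
  [0x0030, 0x0039, "Nd"],
  [0x0041, 0x005A, "Lu"],
  [0x0061, 0x007A, "Ll"],
  [0x263C, 0x263C, "So"],
].freeze

# Divisors used by Unicode::Categories.category while descending the index.
DEPTHS = [0x10000, 0x1000, 0x100, 0x10].freeze

# Category shared by every codepoint in lo..hi: a String, nil when nothing in
# the block is assigned, or :mixed when the block must be split further.
def uniform_category(lo, hi)
  hits = SAMPLE_RANGES.select { |first, last, _| first <= hi && last >= lo }
  return nil if hits.empty?
  first, last, category = hits.first
  hits.size == 1 && first <= lo && last >= hi ? category : :mixed
end

# Builds one trie node covering `size` codepoints starting at `lo`:
# either a single value for the whole block or an Array of 16 children.
def build_node(lo, size)
  category = uniform_category(lo, lo + size - 1)
  return category unless category == :mixed
  child_size = size / 16
  Array.new(16) { |i| build_node(lo + i * child_size, child_size) }
end

# Top level: one entry per plane (0..16), matching the division by 0x10000.
SAMPLE_INDEX = Array.new(17) { |plane| build_node(plane * 0x10000, 0x10000) }

# Same descent as Unicode::Categories.category, minus the :long formatting.
def lookup(index, codepoint)
  node = index
  offset = codepoint
  DEPTHS.each do |depth|
    node = node[offset / depth]
    offset %= depth
    return node || "Cn" unless node.is_a?(Array)
  end
  node[offset] || "Cn"
end

# Shorter spelling of the dedupe-and-sort loop in `categories`.
def sample_categories(string)
  string.each_char.map { |char| lookup(SAMPLE_INDEX, char.ord) }.uniq.sort
end

p lookup(SAMPLE_INDEX, "A".ord) # => "Lu"
p lookup(SAMPLE_INDEX, 0x10C50) # => "Cn"
p sample_categories("A 2")      # => ["Lu", "Nd", "Zs"]
```

Collapsing uniform blocks into a single value keeps such an index compact. In this sample, planes 1 to 16 are each stored as a single `nil`.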
@@ -0,0 +1,9 @@
1
+ module Unicode
2
+ module Categories
3
+ VERSION = "1.0.0".freeze
4
+ UNICODE_VERSION = "8.0.0".freeze
5
+ DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + '/../../../data/').freeze
6
+ INDEX_FILENAME = (DATA_DIRECTORY + '/categories.marshal.gz').freeze
7
+ end
8
+ end
9
+
@@ -0,0 +1,8 @@
1
+ require_relative 'constants'
2
+
3
+ module Unicode
4
+ module Categories
5
+ INDEX = Marshal.load(Gem.gunzip(File.binread(INDEX_FILENAME)))
6
+ end
7
+ end
8
+
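`index.rb` reads the prebuilt index with `Marshal.load(Gem.gunzip(File.binread(INDEX_FILENAME)))`.

Below is a sketch of the full round trip on a miniature, hypothetical index. It uses the same `:CATEGORIES` and `:CATEGORY_NAMES` keys the library code reads. It substitutes `Zlib.gzip` and `Zlib.gunzip` from the standard library (available since Ruby 2.4) for `Gem.gunzip`, and keeps the bytes in memory instead of writing a file. As with any `Marshal` data, only load files you trust.

```ruby
require "zlib"

# Hypothetical miniature index using the same two keys the library reads.
DEMO_INDEX = {
  CATEGORIES: [["Lu", nil], nil],
  CATEGORY_NAMES: { "Cn" => "Unassigned", "Lu" => "Uppercase_Letter" },
}.freeze

# What a data-generation step might write: Marshal bytes, gzip-compressed.
compressed = Zlib.gzip(Marshal.dump(DEMO_INDEX))

# The load side, mirroring index.rb but with Zlib.gunzip instead of Gem.gunzip.
DEMO_LOADED = Marshal.load(Zlib.gunzip(compressed))

p DEMO_LOADED == DEMO_INDEX          # => true
p DEMO_LOADED[:CATEGORY_NAMES]["Lu"] # => "Uppercase_Letter"
```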
@@ -0,0 +1,8 @@
1
+ require_relative "../categories"
2
+
3
+ class String
4
+ # Optional string extension for your convenience
5
+ def unicode_categories
6
+ Unicode::Categories.of(self)
7
+ end
8
+ end
@@ -0,0 +1,57 @@
1
+ require_relative "../lib/unicode/categories"
2
+ require "minitest/autorun"
3
+
4
+ describe Unicode::Categories do
5
+ describe ".categories (alias .of)" do
6
+ it "will always return an Array" do
7
+ assert_equal [], Unicode::Categories.of("")
8
+ end
9
+
10
+ it "will return all categories that characters in the string belong to" do
11
+ assert_equal ["Lu", "Nd", "Zs"], Unicode::Categories.of("A 2")
12
+ end
13
+
14
+ it "will return long identifiers for format: :long option" do
15
+ assert_equal ["Decimal_Number", "Space_Separator", "Uppercase_Letter"],
16
+ Unicode::Categories.of("A 2", format: :long)
17
+ end
18
+
19
+ it "will return all categories in sorted order" do
20
+ assert_equal ["Lu", "Nd"], Unicode::Categories.of("A2")
21
+ assert_equal ["Lu", "Nd"], Unicode::Categories.of("2A")
22
+ end
23
+
24
+ it "will call .category for every character" do
25
+ mocked_method = MiniTest::Mock.new
26
+ mocked_method.expect :call, "first category", ["A", {}]
27
+ mocked_method.expect :call, "second category", ["2", {}]
28
+ Unicode::Categories.stub :category, mocked_method do
29
+ Unicode::Categories.of("A2")
30
+ end
31
+ mocked_method.verify
32
+ end
33
+ end
34
+
35
+ describe ".category" do
36
+ it "will return category for that character" do
37
+ assert_equal "So", Unicode::Categories.category("☼")
38
+ end
39
+
40
+ it "will return Cn for unassigned codepoints" do
41
+ assert_equal "Cn", Unicode::Categories.category("\u{10c50}")
42
+ end
43
+ end
44
+
45
+ describe ".names" do
46
+ it "will return a list of all categories" do
47
+ assert_kind_of Array, Unicode::Categories.names
48
+ assert_includes Unicode::Categories.names, "Sc"
49
+ end
50
+
51
+ it "will return a list of all long category names when used with format: :long" do
52
+ assert_kind_of Array, Unicode::Categories.names(format: :long)
53
+ assert_includes Unicode::Categories.names(format: :long), "Currency_Symbol"
54
+ end
55
+ end
56
+ end
57
+
@@ -0,0 +1,21 @@
1
+ # -*- encoding: utf-8 -*-
2
+
3
+ require File.dirname(__FILE__) + "/lib/unicode/categories/constants"
4
+
5
+ Gem::Specification.new do |gem|
6
+ gem.name = "unicode-categories"
7
+ gem.version = Unicode::Categories::VERSION
8
+ gem.summary = "Determine the Unicode General Categories of a string."
9
+ gem.description = "[Unicode version: #{Unicode::Categories::UNICODE_VERSION}] Determine which Unicode General Categories a string belongs to."
10
+ gem.authors = ["Jan Lelis"]
11
+ gem.email = ["mail@janlelis.de"]
12
+ gem.homepage = "https://github.com/janlelis/unicode-categories"
13
+ gem.license = "MIT"
14
+
15
+ gem.files = Dir["{**/}{.*,*}"].select{ |path| File.file?(path) && path !~ /^pkg/ }
16
+ gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
17
+ gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
18
+ gem.require_paths = ["lib"]
19
+
20
+ gem.required_ruby_version = "~> 2.0"
21
+ end
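The `gem.files` line relies on glob brace expansion. `{**/}{.*,*}` expands to two patterns:

- `**/.*` picks up dotfiles such as `.gitignore`, which `*` alone skips.
- `**/*` picks up everything else.

Directories and anything under `pkg` are then filtered out. Below is a self-contained sketch that runs the same expression in a temporary directory with a hypothetical project layout. The result is sorted only to make the output deterministic.

```ruby
require "fileutils"
require "tmpdir"

# Same selection logic as the gemspec's gem.files line (plus sorting).
def packaged_files
  Dir["{**/}{.*,*}"].select { |path| File.file?(path) && path !~ /^pkg/ }.sort
end

Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    # Hypothetical project layout, including a built gem under pkg/
    FileUtils.mkdir_p(["lib/unicode", "pkg"])
    %w[.gitignore README.md lib/unicode/categories.rb pkg/demo-1.0.0.gem].each do |path|
      File.write(path, "")
    end
    DEMO_FILES = packaged_files
  end
end

p DEMO_FILES # => [".gitignore", "README.md", "lib/unicode/categories.rb"]
```

`/^pkg/` matches any path that starts with `pkg`, so it would also drop a top-level file such as a hypothetical `pkg_helper.rb`. `%r{\Apkg/}` would limit the exclusion to the `pkg` directory.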
metadata ADDED
@@ -0,0 +1,62 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: unicode-categories
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.0.0
5
+ platform: ruby
6
+ authors:
7
+ - Jan Lelis
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2016-04-13 00:00:00.000000000 Z
12
+ dependencies: []
13
+ description: "[Unicode version: 8.0.0] Determine which Unicode General Categories
14
+ a string belongs to."
15
+ email:
16
+ - mail@janlelis.de
17
+ executables: []
18
+ extensions: []
19
+ extra_rdoc_files: []
20
+ files:
21
+ - ".gitignore"
22
+ - ".travis.yml"
23
+ - CHANGELOG.md
24
+ - CODE_OF_CONDUCT.md
25
+ - Gemfile
26
+ - MIT-LICENSE.txt
27
+ - README.md
28
+ - Rakefile
29
+ - data/categories.marshal.gz
30
+ - lib/unicode/categories.rb
31
+ - lib/unicode/categories/constants.rb
32
+ - lib/unicode/categories/index.rb
33
+ - lib/unicode/categories/string_ext.rb
34
+ - spec/unicode_categories_spec.rb
35
+ - unicode-categories.gemspec
36
+ homepage: https://github.com/janlelis/unicode-categories
37
+ licenses:
38
+ - MIT
39
+ metadata: {}
40
+ post_install_message:
41
+ rdoc_options: []
42
+ require_paths:
43
+ - lib
44
+ required_ruby_version: !ruby/object:Gem::Requirement
45
+ requirements:
46
+ - - "~>"
47
+ - !ruby/object:Gem::Version
48
+ version: '2.0'
49
+ required_rubygems_version: !ruby/object:Gem::Requirement
50
+ requirements:
51
+ - - ">="
52
+ - !ruby/object:Gem::Version
53
+ version: '0'
54
+ requirements: []
55
+ rubyforge_project:
56
+ rubygems_version: 2.6.3
57
+ signing_key:
58
+ specification_version: 4
59
+ summary: Determine the Unicode General Categories of a string.
60
+ test_files:
61
+ - spec/unicode_categories_spec.rb
62
+ has_rdoc: