unicode-categories 1.0.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: b5ffea705e0ea825e2da7d52510acc2d07542f34
4
+ data.tar.gz: 807bf0ab0e1fee003ab34d39f20a8383573fe029
5
+ SHA512:
6
+ metadata.gz: 83e66060a6505ceee287bab54743ae5ff2a16719d3d4faf126b7a18448f6316660f4c0aa14d0658783d1482fe622c0e402eb948dc5b3d22ebf46b54fc867d20a
7
+ data.tar.gz: 5700854254d1d80e15d6459d3fed41f0a3c69e64437408eb244e1747e23e3597b0462637af7c2fb838b670f046ab2bb89931573db12e09fc2911c902ca1d6141
@@ -0,0 +1,2 @@
1
+ Gemfile.lock
2
+ /pkg
@@ -0,0 +1,21 @@
1
+ sudo: false
2
+ language: ruby
3
+
4
+ script: bundle exec ruby spec/unicode_categories_spec.rb
5
+
6
+ rvm:
7
+ - 2.3.0
8
+ - 2.2
9
+ - 2.1
10
+ - ruby-head
11
+ - rbx-2
12
+ - jruby-head
13
+ - jruby-9.0.5.0
14
+
15
+ cache:
16
+ - bundler
17
+
18
+ matrix:
19
+ allow_failures:
20
+ - rvm: jruby-head
21
+ - rvm: rbx-2
@@ -0,0 +1,6 @@
1
+ ## CHANGELOG
2
+
3
+ ### 1.0.0
4
+
5
+ * Initial release
6
+
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at opensource@janlelis.com. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [http://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: http://contributor-covenant.org
74
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,5 @@
1
+ source 'https://rubygems.org'
2
+
3
+ gemspec
4
+
5
+ gem 'minitest'
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2016 Jan Lelis, mail@janlelis.de
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,92 @@
1
+ # Unicode::Categories [![[version]](https://badge.fury.io/rb/unicode-categories.svg)](http://badge.fury.io/rb/unicode-categories) [![[travis]](https://travis-ci.org/janlelis/unicode-categories.png)](https://travis-ci.org/janlelis/unicode-categories)
2
+
3
+ Returns which [General Categories](https://en.wikipedia.org/wiki/Unicode_character_property#General_Category) are contained in a Unicode string.
4
+
5
+ Unicode version: **8.0.0**
6
+
7
+ Supported Rubies: **2.3**, **2.2**, **2.1**
8
+
9
+ ## Gemfile
10
+
11
+ ```ruby
12
+ gem "unicode-categories"
13
+ ```
14
+
15
+ ## Usage
16
+
17
+ ```ruby
18
+ require "unicode/categories"
19
+
20
+ # All general categories of a string
21
+ Unicode::Categories.categories("A 2") # => ["Lu", "Nd", "Zs"]
22
+ Unicode::Categories.categories("A 2", format: :long)
23
+ # => ["Decimal_Number", "Space_Separator", "Uppercase_Letter"]
24
+
25
+ # Also aliased as .of
26
+ Unicode::Categories.of("\u{10c50}") # => ["Cn"]
27
+
28
+ # Single character
29
+ Unicode::Categories.category("☼", format: :long) # => "Other_Symbol"
30
+ ```
31
+
32
+ The list of categories is always sorted alphabetically.
33
+
34
+ ## Hints
35
+
36
+ ### Regex Matching
37
+
38
+ If you have a string and want to match a substring/character from a specific Unicode General Category, you do not need this gem. Instead, you can use the [Regexp Unicode Property Syntax `\p{}`](http://ruby-doc.org/core-2.3.0/Regexp.html#class-Regexp-label-Character+Properties):
39
+
40
+ ```ruby
41
+ "Find decimal numbers (like 2 or 3) within a string".scan(/\p{Nd}+/) # => ["2", "3"]
42
+ ```
43
+
44
+ ### List of General Categories
45
+
46
+ You can retrieve a list of all General Categories like this:
47
+
48
+ ```ruby
49
+ require "unicode/categories"
50
+ puts Unicode::Categories.names
51
+
52
+ # # # Output # # #
53
+
54
+ Cc
55
+ Cf
56
+ Cn
57
+ Co
58
+ Cs
59
+ LC
60
+ Ll
61
+ Lm
62
+ Lo
63
+ Lt
64
+ Lu
65
+ Mc
66
+ Me
67
+ Mn
68
+ Nd
69
+ Nl
70
+ No
71
+ Pc
72
+ Pd
73
+ Pe
74
+ Pf
75
+ Pi
76
+ Po
77
+ Ps
78
+ Sc
79
+ Sk
80
+ Sm
81
+ So
82
+ Zl
83
+ Zp
84
+ Zs
85
+ ```
86
+
87
+
88
+ ## MIT License
89
+
90
+ - Copyright (C) 2016 Jan Lelis <http://janlelis.com>. Released under the MIT license.
91
+ - Unicode data: http://www.unicode.org/copyright.html#Exhibit1
92
+
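Review note on the README's Regex Matching hint: as a rough illustration, the same `\p{}` properties can approximate what `Unicode::Categories.categories` returns without loading the gem's index. This is a sketch under stated assumptions, not the gem's implementation. `SHORT_NAMES` and `regexp_categories` are hypothetical names, `LC` is left out of the list, and the results follow the Unicode version built into the running Ruby rather than the 8.0.0 data shipped with the gem.

```ruby
# Hypothetical helper, not part of unicode-categories.
# Short General Category names as printed in the README, minus "LC".
SHORT_NAMES = %w[
  Cc Cf Cn Co Cs Ll Lm Lo Lt Lu Mc Me Mn Nd Nl No
  Pc Pd Pe Pf Pi Po Ps Sc Sk Sm So Zl Zp Zs
].freeze

# Returns the short names whose \p{...} property matches at least one
# character of the string, in the alphabetical order of SHORT_NAMES.
def regexp_categories(string)
  SHORT_NAMES.select { |name| string =~ Regexp.new("\\p{#{name}}") }
end

p regexp_categories("A 2")
p "Find decimal numbers (like 2 or 3) within a string".scan(/\p{Nd}+/)
```

The helper compiles one pattern per category on every call, so it is only suitable for occasional checks. The gem's prebuilt index avoids that cost and pins the Unicode version.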
@@ -0,0 +1,30 @@
1
+ # # #
2
+ # Get gemspec info
3
+
4
+ gemspec_file = Dir['*.gemspec'].first
5
+ gemspec = eval File.read(gemspec_file), binding, gemspec_file
6
+ info = "#{gemspec.name} | #{gemspec.version} | " \
7
+ "#{gemspec.runtime_dependencies.size} dependencies | " \
8
+ "#{gemspec.files.size} files"
9
+
10
+
11
+ # # #
12
+ # Gem build and install task
13
+
14
+ desc info
15
+ task :gem do
16
+ puts info + "\n\n"
17
+ print " "; sh "gem build #{gemspec_file}"
18
+ FileUtils.mkdir_p 'pkg'
19
+ FileUtils.mv "#{gemspec.name}-#{gemspec.version}.gem", 'pkg'
20
+ puts; sh %{gem install --no-document pkg/#{gemspec.name}-#{gemspec.version}.gem}
21
+ end
22
+
23
+
24
+ # # #
25
+ # Start an IRB session with the gem loaded
26
+
27
+ desc "#{gemspec.name} | IRB"
28
+ task :irb do
29
+ sh "irb -I ./lib -r #{gemspec.name.gsub '-','/'}"
30
+ end
@@ -0,0 +1,45 @@
1
+ require_relative "categories/constants"
2
+
3
+ module Unicode
4
+ module Categories
5
+ def self.categories(string, **options)
6
+ res = []
7
+ string.each_char{ |char|
8
+ category_name = category(char, **options)
9
+ res << category_name unless res.include?(category_name)
10
+ }
11
+ res.sort
12
+ end
13
+ class << self; alias of categories; end
14
+
15
+ def self.category(char, format: :short)
16
+ require_relative 'categories/index' unless defined? ::Unicode::Categories::INDEX
17
+ codepoint_depth_offset = char.unpack("U")[0] or
18
+ raise(ArgumentError, "Unicode::Categories.category must be given a valid char")
19
+ index_or_value = INDEX[:CATEGORIES]
20
+ [0x10000, 0x1000, 0x100, 0x10].each{ |depth|
21
+ index_or_value = index_or_value[codepoint_depth_offset / depth]
22
+ codepoint_depth_offset = codepoint_depth_offset % depth
23
+ unless index_or_value.is_a? Array
24
+ res = index_or_value || "Cn"
25
+ return format == :long ? INDEX[:CATEGORY_NAMES][res] : res
26
+ end
27
+ }
28
+
29
+ res = index_or_value[codepoint_depth_offset] || "Cn"
30
+ format == :long ? INDEX[:CATEGORY_NAMES][res] : res
31
+ end
32
+
33
+ def self.names(format: :short)
34
+ require_relative 'categories/index' unless defined? ::Unicode::Categories::INDEX
35
+ case format
36
+ when :long
37
+ INDEX[:CATEGORY_NAMES].values.sort
38
+ when :short
39
+ INDEX[:CATEGORY_NAMES].keys.sort
40
+ when :table
41
+ INDEX[:CATEGORY_NAMES].dup
42
+ end
43
+ end
44
+ end
45
+ end
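Review note on `Unicode::Categories.category`: the method walks a nested array, splitting the codepoint at the depths `0x10000`, `0x1000`, `0x100` and `0x10`, and stops early when it reaches a non-Array value (a category shared by the whole range, or `nil` for unassigned). The toy below may help when reading that loop. `DEPTHS`, `toy_insert` and `toy_lookup` are hypothetical names, and the index is built by hand here. The real layout of `data/categories.marshal.gz` is assumed only as far as the lookup code implies it.

```ruby
# Hypothetical toy version of the nested index lookup.
DEPTHS = [0x10000, 0x1000, 0x100, 0x10].freeze

# Stores one category for one codepoint, creating nested arrays on demand.
def toy_insert(index, codepoint, category)
  node = index
  offset = codepoint
  DEPTHS.each do |depth|
    node = (node[offset / depth] ||= [])
    offset %= depth
  end
  node[offset] = category
end

# Same traversal as the diff: descend by depth, return early on a leaf.
def toy_lookup(index, codepoint)
  node = index
  offset = codepoint
  DEPTHS.each do |depth|
    node = node[offset / depth]
    offset %= depth
    return node || "Cn" unless node.is_a?(Array)
  end
  node[offset] || "Cn"
end

index = []
toy_insert(index, "A".unpack("U")[0], "Lu")
toy_insert(index, "2".unpack("U")[0], "Nd")
index[0xF] = "Co" # a whole plane collapsed into one leaf value

p toy_lookup(index, 0x41)     # stored individually
p toy_lookup(index, 0xF1234)  # answered by the collapsed plane
p toy_lookup(index, 0x10C50)  # nothing stored, falls back to "Cn"
```

Collapsing uniform ranges into a single leaf keeps the structure small, and the early return means such ranges are resolved after a single array access.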
@@ -0,0 +1,9 @@
1
+ module Unicode
2
+ module Categories
3
+ VERSION = "1.0.0".freeze
4
+ UNICODE_VERSION = "8.0.0".freeze
5
+ DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + '/../../../data/').freeze
6
+ INDEX_FILENAME = (DATA_DIRECTORY + '/categories.marshal.gz').freeze
7
+ end
8
+ end
9
+
@@ -0,0 +1,8 @@
1
+ require_relative 'constants'
2
+
3
+ module Unicode
4
+ module Categories
5
+ INDEX = Marshal.load(Gem.gunzip(File.binread(INDEX_FILENAME)))
6
+ end
7
+ end
8
+
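Review note on `index.rb`: the index is read as raw bytes, gunzipped, and then deserialized with `Marshal.load`. A minimal round trip of that format could look like the sketch below. `TOY_INDEX`, `pack_index` and `unpack_index` are hypothetical names, the table holds only two made-up entries, and `Zlib.gunzip` stands in for the `Gem.gunzip` call used in the diff.

```ruby
require "zlib"

# Hypothetical stand-in for the gem's index; the real file holds more data.
TOY_INDEX = {
  CATEGORY_NAMES: { "Lu" => "Uppercase_Letter", "Nd" => "Decimal_Number" },
}.freeze

# Serialize with Marshal, then gzip: the steps that index.rb reverses.
def pack_index(index)
  Zlib.gzip(Marshal.dump(index))
end

# Mirrors Marshal.load(Gem.gunzip(...)), using Zlib.gunzip directly.
def unpack_index(bytes)
  Marshal.load(Zlib.gunzip(bytes))
end

bytes = pack_index(TOY_INDEX)
p unpack_index(bytes)[:CATEGORY_NAMES]["Lu"]
```

Because `categories.rb` requires `categories/index` only when `INDEX` is not yet defined, this decompression cost is paid on the first lookup rather than at `require` time.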
@@ -0,0 +1,8 @@
1
+ require_relative "../categories"
2
+
3
+ class String
4
+ # Optional string extension for your convenience
5
+ def unicode_categories
6
+ Unicode::Categories.of(self)
7
+ end
8
+ end
@@ -0,0 +1,57 @@
1
+ require_relative "../lib/unicode/categories"
2
+ require "minitest/autorun"
3
+
4
+ describe Unicode::Categories do
5
+ describe ".categories (alias .of)" do
6
+ it "will always return an Array" do
7
+ assert_equal [], Unicode::Categories.of("")
8
+ end
9
+
10
+ it "will return all categories that characters in the string belong to" do
11
+ assert_equal ["Lu", "Nd", "Zs"], Unicode::Categories.of("A 2")
12
+ end
13
+
14
+ it "will return long identifiers for format: :long option" do
15
+ assert_equal ["Decimal_Number", "Space_Separator", "Uppercase_Letter"],
16
+ Unicode::Categories.of("A 2", format: :long)
17
+ end
18
+
19
+ it "will return all categories in sorted order" do
20
+ assert_equal ["Lu", "Nd"], Unicode::Categories.of("A2")
21
+ assert_equal ["Lu", "Nd"], Unicode::Categories.of("2A")
22
+ end
23
+
24
+ it "will call .category for every character" do
25
+ mocked_method = Minitest::Mock.new
26
+ mocked_method.expect :call, "first category", ["A", {}]
27
+ mocked_method.expect :call, "second category", ["2", {}]
28
+ Unicode::Categories.stub :category, mocked_method do
29
+ Unicode::Categories.of("A2")
30
+ end
31
+ mocked_method.verify
32
+ end
33
+ end
34
+
35
+ describe ".category" do
36
+ it "will return category for that character" do
37
+ assert_equal "So", Unicode::Categories.category("�")
38
+ end
39
+
40
+ it "will return Cn for unassigned codepoints" do
41
+ assert_equal "Cn", Unicode::Categories.category("\u{10c50}")
42
+ end
43
+ end
44
+
45
+ describe ".names" do
46
+ it "will return a list of all categories" do
47
+ assert_kind_of Array, Unicode::Categories.names
48
+ assert_includes Unicode::Categories.names, "Sc"
49
+ end
50
+
51
+ it "will return a list of all long category names when used with format: :long" do
52
+ assert_kind_of Array, Unicode::Categories.names(format: :long)
53
+ assert_includes Unicode::Categories.names(format: :long), "Currency_Symbol"
54
+ end
55
+ end
56
+ end
57
+
@@ -0,0 +1,21 @@
1
+ # -*- encoding: utf-8 -*-
2
+
3
+ require File.dirname(__FILE__) + "/lib/unicode/categories/constants"
4
+
5
+ Gem::Specification.new do |gem|
6
+ gem.name = "unicode-categories"
7
+ gem.version = Unicode::Categories::VERSION
8
+ gem.summary = "Determine the Unicode General Categories of a string."
9
+ gem.description = "[Unicode version: #{Unicode::Categories::UNICODE_VERSION}] Determine which Unicode General Categories a string belongs to."
10
+ gem.authors = ["Jan Lelis"]
11
+ gem.email = ["mail@janlelis.de"]
12
+ gem.homepage = "https://github.com/janlelis/unicode-categories"
13
+ gem.license = "MIT"
14
+
15
+ gem.files = Dir["{**/}{.*,*}"].select{ |path| File.file?(path) && path !~ /^pkg/ }
16
+ gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
17
+ gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
18
+ gem.require_paths = ["lib"]
19
+
20
+ gem.required_ruby_version = "~> 2.0"
21
+ end
metadata ADDED
@@ -0,0 +1,62 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: unicode-categories
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.0.0
5
+ platform: ruby
6
+ authors:
7
+ - Jan Lelis
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2016-04-13 00:00:00.000000000 Z
12
+ dependencies: []
13
+ description: "[Unicode version: 8.0.0] Determine which Unicode General Categories
14
+ a string belongs to."
15
+ email:
16
+ - mail@janlelis.de
17
+ executables: []
18
+ extensions: []
19
+ extra_rdoc_files: []
20
+ files:
21
+ - ".gitignore"
22
+ - ".travis.yml"
23
+ - CHANGELOG.md
24
+ - CODE_OF_CONDUCT.md
25
+ - Gemfile
26
+ - MIT-LICENSE.txt
27
+ - README.md
28
+ - Rakefile
29
+ - data/categories.marshal.gz
30
+ - lib/unicode/categories.rb
31
+ - lib/unicode/categories/constants.rb
32
+ - lib/unicode/categories/index.rb
33
+ - lib/unicode/categories/string_ext.rb
34
+ - spec/unicode_categories_spec.rb
35
+ - unicode-categories.gemspec
36
+ homepage: https://github.com/janlelis/unicode-categories
37
+ licenses:
38
+ - MIT
39
+ metadata: {}
40
+ post_install_message:
41
+ rdoc_options: []
42
+ require_paths:
43
+ - lib
44
+ required_ruby_version: !ruby/object:Gem::Requirement
45
+ requirements:
46
+ - - "~>"
47
+ - !ruby/object:Gem::Version
48
+ version: '2.0'
49
+ required_rubygems_version: !ruby/object:Gem::Requirement
50
+ requirements:
51
+ - - ">="
52
+ - !ruby/object:Gem::Version
53
+ version: '0'
54
+ requirements: []
55
+ rubyforge_project:
56
+ rubygems_version: 2.6.3
57
+ signing_key:
58
+ specification_version: 4
59
+ summary: Determine the Unicode General Categories of a string.
60
+ test_files:
61
+ - spec/unicode_categories_spec.rb
62
+ has_rdoc: