annex_29 0.1.1 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
- SHA1:
-   metadata.gz: 15eda304b9abf808d15c2c6698915c173143fbc0
-   data.tar.gz: 08c0d64e61736e7f235d8f9b2ac0d1bad2320113
+ SHA256:
+   metadata.gz: f75649a14a18b7d666e2cf04b18d144fc4ad66cf203d9707ece3794d7485d0f6
+   data.tar.gz: 0f0d5583462a3e3fb13150d6ebeabc3304e77009207371bf6dfb654f7b97d633
  SHA512:
-   metadata.gz: 7a6a3c5aff7be224bdc632a04c32e52c3ef6f9aeb526b687357fa6afcb8bd200b0531e40047c3b3cb6e3214afef2c12dcb0e5860394fbb8265b0d4361a676284
-   data.tar.gz: 02379b90df98b4d259d94cfad8c07a35f5a28156fdef4a519f7ed84d20229f19a712b1ee5cae690fbdb09e0c1f0ab37bd900a90fc7e7acf669a552cd97d798c5
+   metadata.gz: 61a738b96ece05bd6021d082e656dfedf25fc30ea8e7abdeb8cefe90968e5424ecb15881d9fcce714a33c1a282e57f6a36d5430e59f8cba593fdc29f353dd3d8
+   data.tar.gz: 8e6a1ab2006341cd08d1d2115f038b3033b61de18e17c5d7e5988d52646269fb9e163c116bfa474eefb7d4eca5441c1203d38fb8833d528cda74b0956fb7fa40
@@ -0,0 +1,26 @@
+ name: CI
+
+ on: [push, pull_request]
+
+ jobs:
+   build:
+     runs-on: ubuntu-latest
+     name: Ruby ${{ matrix.ruby }} / ${{ matrix.gemfile }}
+     strategy:
+       fail-fast: false
+       matrix:
+         gemfile:
+           - Gemfile
+         ruby: ["3.2"]
+     env:
+       BUNDLE_GEMFILE: ${{ matrix.gemfile }}
+     steps:
+       - name: Check out code
+         uses: actions/checkout@v3
+       - name: Set up Ruby ${{ matrix.ruby }}
+         uses: ruby/setup-ruby@v1
+         with:
+           ruby-version: ${{ matrix.ruby }}
+           bundler-cache: true
+       - name: Tests
+         run: bin/rspec
@@ -0,0 +1,22 @@
+ name: Contributor License Agreement (CLA)
+
+ on:
+   pull_request_target:
+     types: [opened, synchronize]
+   issue_comment:
+     types: [created]
+
+ jobs:
+   cla:
+     runs-on: ubuntu-latest
+     if: |
+       (github.event.issue.pull_request
+         && !github.event.issue.pull_request.merged_at
+         && contains(github.event.comment.body, 'signed')
+       )
+       || (github.event.pull_request && !github.event.pull_request.merged)
+     steps:
+       - uses: Shopify/shopify-cla-action@v1
+         with:
+           github-token: ${{ secrets.GITHUB_TOKEN }}
+           cla-token: ${{ secrets.CLA_TOKEN }}
data/.gitignore ADDED
@@ -0,0 +1,2 @@
+ lib/annex_29/word_segmentation.rl
+ spec/examples.txt
data/.rspec ADDED
@@ -0,0 +1,2 @@
+ --color
+ --require spec_helper
data/.ruby-version ADDED
@@ -0,0 +1 @@
+ 3.2.2
data/CHANGELOG.md ADDED
@@ -0,0 +1,38 @@
+ # Changelog
+
+ All notable changes to this project will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## How do I make a good changelog?
+
+ ### Guiding Principles
+
+ - Changelogs are for humans, not machines.
+ - There should be an entry for every single version.
+ - The same types of changes should be grouped.
+ - Versions and sections should be linkable.
+ - The latest version comes first.
+ - The release date of each version is displayed.
+ - Mention whether you follow Semantic Versioning.
+
+ ### Types of changes
+
+ - Added for new features.
+ - Changed for changes in existing functionality.
+ - Deprecated for soon-to-be removed features.
+ - Removed for now removed features.
+ - Fixed for any bug fixes.
+ - Security in case of vulnerabilities.
+
+ ## [Unreleased]
+
+ - nil
+
+ ---
+
+ [0.2.0] - 2023-12-11
+
+ - Add .ruby-version, Gemfile.lock, and GH test suite [#3](https://github.com/Shopify/annex-29/pull/3)
+ - Set "Shopify developers" as gem owners; move to MIT license [#5](https://github.com/Shopify/annex-29/pull/5)
+ - Bump rake gem from 11.3.0 to 13.1.0 [#6](https://github.com/Shopify/annex-29/pull/6)
data/Gemfile ADDED
@@ -0,0 +1,3 @@
+ source("https://rubygems.org")
+
+ gemspec
data/Gemfile.lock ADDED
@@ -0,0 +1,35 @@
+ PATH
+   remote: .
+   specs:
+     annex_29 (0.2.0)
+
+ GEM
+   remote: https://rubygems.org/
+   specs:
+     diff-lcs (1.5.0)
+     rake (13.1.0)
+     rspec (3.12.0)
+       rspec-core (~> 3.12.0)
+       rspec-expectations (~> 3.12.0)
+       rspec-mocks (~> 3.12.0)
+     rspec-core (3.12.2)
+       rspec-support (~> 3.12.0)
+     rspec-expectations (3.12.3)
+       diff-lcs (>= 1.2.0, < 2.0)
+       rspec-support (~> 3.12.0)
+     rspec-mocks (3.12.6)
+       diff-lcs (>= 1.2.0, < 2.0)
+       rspec-support (~> 3.12.0)
+     rspec-support (3.12.1)
+
+ PLATFORMS
+   arm64-darwin-22
+   x86_64-linux
+
+ DEPENDENCIES
+   annex_29!
+   rake (~> 13.1)
+   rspec (~> 3.5)
+
+ BUNDLED WITH
+    2.4.10
data/LICENSE.md ADDED
@@ -0,0 +1,64 @@
+ # License for the annex_29 gem
+
+ Copyright 2023-present, Shopify Inc.
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+ # Incorporated Components
+
+ ## Unicode CLDR
+
+ This software includes data files that have been derived from the
+ Unicode Common Locale Data Repository (CLDR, https://cldr.unicode.org/)
+ CLDR is governed by the following license:
+
+ UNICODE, INC. LICENSE AGREEMENT - DATA FILES AND SOFTWARE
+
+ See Terms of Use <https://www.unicode.org/copyright.html>
+ for definitions of Unicode Inc.’s Data Files and Software.
+
+ NOTICE TO USER: Carefully read the following legal agreement.
+ BY DOWNLOADING, INSTALLING, COPYING OR OTHERWISE USING UNICODE INC.'S
+ DATA FILES ("DATA FILES"), AND/OR SOFTWARE ("SOFTWARE"),
+ YOU UNEQUIVOCALLY ACCEPT, AND AGREE TO BE BOUND BY, ALL OF THE
+ TERMS AND CONDITIONS OF THIS AGREEMENT.
+ IF YOU DO NOT AGREE, DO NOT DOWNLOAD, INSTALL, COPY, DISTRIBUTE OR USE
+ THE DATA FILES OR SOFTWARE.
+
+ COPYRIGHT AND PERMISSION NOTICE
+
+ Copyright © 1991-2023 Unicode, Inc. All rights reserved.
+ Distributed under the Terms of Use in https://www.unicode.org/copyright.html.
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of the Unicode data files and any associated documentation
+ (the "Data Files") or Unicode software and any associated documentation
+ (the "Software") to deal in the Data Files or Software
+ without restriction, including without limitation the rights to use,
+ copy, modify, merge, publish, distribute, and/or sell copies of
+ the Data Files or Software, and to permit persons to whom the Data Files
+ or Software are furnished to do so, provided that either
+ (a) this copyright and permission notice appear with all copies
+ of the Data Files or Software, or
+ (b) this copyright and permission notice appear in associated
+ Documentation.
+
+ THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF
+ ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
+ WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT OF THIRD PARTY RIGHTS.
+ IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS
+ NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL
+ DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE,
+ DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
+ TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
+ PERFORMANCE OF THE DATA FILES OR SOFTWARE.
+
+ Except as contained in this notice, the name of a copyright holder
+ shall not be used in advertising or otherwise to promote the sale,
+ use or other dealings in these Data Files or Software without prior
+ written authorization of the copyright holder.
data/README.md ADDED
@@ -0,0 +1,3 @@
+ # Annex 29
+
+ Annex 29 is a Ruby gem that implements Unicode text segmentation. It is named after the section of the unicode spec that describes these algorithms.
data/Rakefile ADDED
@@ -0,0 +1,28 @@
+ require("bundler/gem_tasks")
+
+ require("erb")
+ require("pathname")
+
+ UNICODE_DATA = Dir.glob("data/*.txt")
+
+ ANNEX_29_PATH = Pathname.new("lib/annex_29/")
+ WORD_SEGMENTATION_ERB = ANNEX_29_PATH.join("word_segmentation.rl.erb").to_s
+ WORD_SEGMENTATION_RL = ANNEX_29_PATH.join("word_segmentation.rl").to_s
+ WORD_SEGMENTATION_RB = ANNEX_29_PATH.join("word_segmentation.rb").to_s
+
+ file(WORD_SEGMENTATION_RL => [WORD_SEGMENTATION_ERB, *UNICODE_DATA]) do
+   template = File.open(WORD_SEGMENTATION_ERB).read
+   rendered = ERB.new(template).result
+   File.open(
+     WORD_SEGMENTATION_RL,
+     File::CREAT | File::TRUNC | File::RDWR,
+   ).write(rendered)
+ end
+
+ file(WORD_SEGMENTATION_RB => [WORD_SEGMENTATION_RL]) do
+   sh("ragel", "-R", "-T1", "-o", WORD_SEGMENTATION_RB, WORD_SEGMENTATION_RL)
+ end
+
+ task(compile: [WORD_SEGMENTATION_RB])
+
+ task(default: [:compile])
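Review note: the first `file` task in this Rakefile renders the ERB template into the Ragel source, and the second task then hands that file to `ragel`. Because the rendered file is opened without a block and never explicitly closed, buffered output may not be flushed before the next step reads it. A minimal sketch of the same rendering step that closes both handles before returning is shown here; `render_template` is a hypothetical helper name and is not part of the gem.

```ruby
require "erb"

# Hypothetical helper (not in the annex_29 Rakefile): render an ERB template
# to an output path. File.read and File.write each open, use, and close the
# file in one call, so the output is fully on disk when the method returns.
def render_template(template_path, output_path)
  template = File.read(template_path)
  rendered = ERB.new(template).result
  File.write(output_path, rendered)
end
```

Inside the Rakefile, the body of the first `file` task could then reduce to a single call such as `render_template(WORD_SEGMENTATION_ERB, WORD_SEGMENTATION_RL)`, assuming the template needs no special binding.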
data/annex_29.gemspec ADDED
@@ -0,0 +1,28 @@
+ # frozen_string_literal: true
+
+ lib = File.expand_path("../lib", __FILE__)
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+ require "annex_29/version"
+
+ Gem::Specification.new do |spec|
+   spec.name = "annex_29"
+   spec.version = Annex29::VERSION
+   spec.summary = "Unicode annex 29 compliant word segmentation"
+   spec.author = "Shopify"
+   spec.email = "developers@shopify.com"
+   spec.homepage = "https://github.com/Shopify/annex-29"
+
+   spec.metadata["allowed_push_host"] = "https://rubygems.org/"
+
+   spec.files = %x(git ls-files -z).split("\x0").reject do |f|
+     f.match(%r{^(rake|test|spec|features)/})
+   end
+
+   spec.require_paths = ["lib"]
+
+   spec.required_ruby_version = ">= 3.2.0"
+
+
+   spec.add_development_dependency("rake", "~> 13.1")
+   spec.add_development_dependency("rspec", "~> 3.5")
+ end
data/bin/rake ADDED
@@ -0,0 +1,17 @@
+ #!/usr/bin/env ruby
+ # frozen_string_literal: true
+ #
+ # This file was generated by Bundler.
+ #
+ # The application 'rake' is installed as part of a gem, and
+ # this file is here to facilitate running it.
+ #
+
+ require "pathname"
+ ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../Gemfile",
+   Pathname.new(__FILE__).realpath)
+
+ require "rubygems"
+ require "bundler/setup"
+
+ load Gem.bin_path("rake", "rake")
data/bin/rspec ADDED
@@ -0,0 +1,17 @@
+ #!/usr/bin/env ruby
+ # frozen_string_literal: true
+ #
+ # This file was generated by Bundler.
+ #
+ # The application 'rspec' is installed as part of a gem, and
+ # this file is here to facilitate running it.
+ #
+
+ require "pathname"
+ ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../../Gemfile",
+   Pathname.new(__FILE__).realpath)
+
+ require "rubygems"
+ require "bundler/setup"
+
+ load Gem.bin_path("rspec-core", "rspec")
data/data/Blocks.txt ADDED
@@ -0,0 +1,309 @@
+ # Blocks-9.0.0.txt
+ # Date: 2016-02-05, 23:48:00 GMT [KW]
+ # © 2016 Unicode®, Inc.
+ # For terms of use, see http://www.unicode.org/terms_of_use.html
+ #
+ # Unicode Character Database
+ # For documentation, see http://www.unicode.org/reports/tr44/
+ #
+ # Format:
+ # Start Code..End Code; Block Name
+
+ # ================================================
+
+ # Note: When comparing block names, casing, whitespace, hyphens,
+ # and underbars are ignored.
+ # For example, "Latin Extended-A" and "latin extended a" are equivalent.
+ # For more information on the comparison of property values,
+ # see UAX #44: http://www.unicode.org/reports/tr44/
+ #
+ # All block ranges start with a value where (cp MOD 16) = 0,
+ # and end with a value where (cp MOD 16) = 15. In other words,
+ # the last hexadecimal digit of the start of range is ...0
+ # and the last hexadecimal digit of the end of range is ...F.
+ # This constraint on block ranges guarantees that allocations
+ # are done in terms of whole columns, and that code chart display
+ # never involves splitting columns in the charts.
+ #
+ # All code points not explicitly listed for Block
+ # have the value No_Block.
+
+ # Property: Block
+ #
+ # @missing: 0000..10FFFF; No_Block
+
+ 0000..007F; Basic Latin
+ 0080..00FF; Latin-1 Supplement
+ 0100..017F; Latin Extended-A
+ 0180..024F; Latin Extended-B
+ 0250..02AF; IPA Extensions
+ 02B0..02FF; Spacing Modifier Letters
+ 0300..036F; Combining Diacritical Marks
+ 0370..03FF; Greek and Coptic
+ 0400..04FF; Cyrillic
+ 0500..052F; Cyrillic Supplement
+ 0530..058F; Armenian
+ 0590..05FF; Hebrew
+ 0600..06FF; Arabic
+ 0700..074F; Syriac
+ 0750..077F; Arabic Supplement
+ 0780..07BF; Thaana
+ 07C0..07FF; NKo
+ 0800..083F; Samaritan
+ 0840..085F; Mandaic
+ 08A0..08FF; Arabic Extended-A
+ 0900..097F; Devanagari
+ 0980..09FF; Bengali
+ 0A00..0A7F; Gurmukhi
+ 0A80..0AFF; Gujarati
+ 0B00..0B7F; Oriya
+ 0B80..0BFF; Tamil
+ 0C00..0C7F; Telugu
+ 0C80..0CFF; Kannada
+ 0D00..0D7F; Malayalam
+ 0D80..0DFF; Sinhala
+ 0E00..0E7F; Thai
+ 0E80..0EFF; Lao
+ 0F00..0FFF; Tibetan
+ 1000..109F; Myanmar
+ 10A0..10FF; Georgian
+ 1100..11FF; Hangul Jamo
+ 1200..137F; Ethiopic
+ 1380..139F; Ethiopic Supplement
+ 13A0..13FF; Cherokee
+ 1400..167F; Unified Canadian Aboriginal Syllabics
+ 1680..169F; Ogham
+ 16A0..16FF; Runic
+ 1700..171F; Tagalog
+ 1720..173F; Hanunoo
+ 1740..175F; Buhid
+ 1760..177F; Tagbanwa
+ 1780..17FF; Khmer
+ 1800..18AF; Mongolian
+ 18B0..18FF; Unified Canadian Aboriginal Syllabics Extended
+ 1900..194F; Limbu
+ 1950..197F; Tai Le
+ 1980..19DF; New Tai Lue
+ 19E0..19FF; Khmer Symbols
+ 1A00..1A1F; Buginese
+ 1A20..1AAF; Tai Tham
+ 1AB0..1AFF; Combining Diacritical Marks Extended
+ 1B00..1B7F; Balinese
+ 1B80..1BBF; Sundanese
+ 1BC0..1BFF; Batak
+ 1C00..1C4F; Lepcha
+ 1C50..1C7F; Ol Chiki
+ 1C80..1C8F; Cyrillic Extended-C
+ 1CC0..1CCF; Sundanese Supplement
+ 1CD0..1CFF; Vedic Extensions
+ 1D00..1D7F; Phonetic Extensions
+ 1D80..1DBF; Phonetic Extensions Supplement
+ 1DC0..1DFF; Combining Diacritical Marks Supplement
+ 1E00..1EFF; Latin Extended Additional
+ 1F00..1FFF; Greek Extended
+ 2000..206F; General Punctuation
+ 2070..209F; Superscripts and Subscripts
+ 20A0..20CF; Currency Symbols
+ 20D0..20FF; Combining Diacritical Marks for Symbols
+ 2100..214F; Letterlike Symbols
+ 2150..218F; Number Forms
+ 2190..21FF; Arrows
+ 2200..22FF; Mathematical Operators
+ 2300..23FF; Miscellaneous Technical
+ 2400..243F; Control Pictures
+ 2440..245F; Optical Character Recognition
+ 2460..24FF; Enclosed Alphanumerics
+ 2500..257F; Box Drawing
+ 2580..259F; Block Elements
+ 25A0..25FF; Geometric Shapes
+ 2600..26FF; Miscellaneous Symbols
+ 2700..27BF; Dingbats
+ 27C0..27EF; Miscellaneous Mathematical Symbols-A
+ 27F0..27FF; Supplemental Arrows-A
+ 2800..28FF; Braille Patterns
+ 2900..297F; Supplemental Arrows-B
+ 2980..29FF; Miscellaneous Mathematical Symbols-B
+ 2A00..2AFF; Supplemental Mathematical Operators
+ 2B00..2BFF; Miscellaneous Symbols and Arrows
+ 2C00..2C5F; Glagolitic
+ 2C60..2C7F; Latin Extended-C
+ 2C80..2CFF; Coptic
+ 2D00..2D2F; Georgian Supplement
+ 2D30..2D7F; Tifinagh
+ 2D80..2DDF; Ethiopic Extended
+ 2DE0..2DFF; Cyrillic Extended-A
+ 2E00..2E7F; Supplemental Punctuation
+ 2E80..2EFF; CJK Radicals Supplement
+ 2F00..2FDF; Kangxi Radicals
+ 2FF0..2FFF; Ideographic Description Characters
+ 3000..303F; CJK Symbols and Punctuation
+ 3040..309F; Hiragana
+ 30A0..30FF; Katakana
+ 3100..312F; Bopomofo
+ 3130..318F; Hangul Compatibility Jamo
+ 3190..319F; Kanbun
+ 31A0..31BF; Bopomofo Extended
+ 31C0..31EF; CJK Strokes
+ 31F0..31FF; Katakana Phonetic Extensions
+ 3200..32FF; Enclosed CJK Letters and Months
+ 3300..33FF; CJK Compatibility
+ 3400..4DBF; CJK Unified Ideographs Extension A
+ 4DC0..4DFF; Yijing Hexagram Symbols
+ 4E00..9FFF; CJK Unified Ideographs
+ A000..A48F; Yi Syllables
+ A490..A4CF; Yi Radicals
+ A4D0..A4FF; Lisu
+ A500..A63F; Vai
+ A640..A69F; Cyrillic Extended-B
+ A6A0..A6FF; Bamum
+ A700..A71F; Modifier Tone Letters
+ A720..A7FF; Latin Extended-D
+ A800..A82F; Syloti Nagri
+ A830..A83F; Common Indic Number Forms
+ A840..A87F; Phags-pa
+ A880..A8DF; Saurashtra
+ A8E0..A8FF; Devanagari Extended
+ A900..A92F; Kayah Li
+ A930..A95F; Rejang
+ A960..A97F; Hangul Jamo Extended-A
+ A980..A9DF; Javanese
+ A9E0..A9FF; Myanmar Extended-B
+ AA00..AA5F; Cham
+ AA60..AA7F; Myanmar Extended-A
+ AA80..AADF; Tai Viet
+ AAE0..AAFF; Meetei Mayek Extensions
+ AB00..AB2F; Ethiopic Extended-A
+ AB30..AB6F; Latin Extended-E
+ AB70..ABBF; Cherokee Supplement
+ ABC0..ABFF; Meetei Mayek
+ AC00..D7AF; Hangul Syllables
+ D7B0..D7FF; Hangul Jamo Extended-B
+ D800..DB7F; High Surrogates
+ DB80..DBFF; High Private Use Surrogates
+ DC00..DFFF; Low Surrogates
+ E000..F8FF; Private Use Area
+ F900..FAFF; CJK Compatibility Ideographs
+ FB00..FB4F; Alphabetic Presentation Forms
+ FB50..FDFF; Arabic Presentation Forms-A
+ FE00..FE0F; Variation Selectors
+ FE10..FE1F; Vertical Forms
+ FE20..FE2F; Combining Half Marks
+ FE30..FE4F; CJK Compatibility Forms
+ FE50..FE6F; Small Form Variants
+ FE70..FEFF; Arabic Presentation Forms-B
+ FF00..FFEF; Halfwidth and Fullwidth Forms
+ FFF0..FFFF; Specials
+ 10000..1007F; Linear B Syllabary
+ 10080..100FF; Linear B Ideograms
+ 10100..1013F; Aegean Numbers
+ 10140..1018F; Ancient Greek Numbers
+ 10190..101CF; Ancient Symbols
+ 101D0..101FF; Phaistos Disc
+ 10280..1029F; Lycian
+ 102A0..102DF; Carian
+ 102E0..102FF; Coptic Epact Numbers
+ 10300..1032F; Old Italic
+ 10330..1034F; Gothic
+ 10350..1037F; Old Permic
+ 10380..1039F; Ugaritic
+ 103A0..103DF; Old Persian
+ 10400..1044F; Deseret
+ 10450..1047F; Shavian
+ 10480..104AF; Osmanya
+ 104B0..104FF; Osage
+ 10500..1052F; Elbasan
+ 10530..1056F; Caucasian Albanian
+ 10600..1077F; Linear A
+ 10800..1083F; Cypriot Syllabary
+ 10840..1085F; Imperial Aramaic
+ 10860..1087F; Palmyrene
+ 10880..108AF; Nabataean
+ 108E0..108FF; Hatran
+ 10900..1091F; Phoenician
+ 10920..1093F; Lydian
+ 10980..1099F; Meroitic Hieroglyphs
+ 109A0..109FF; Meroitic Cursive
+ 10A00..10A5F; Kharoshthi
+ 10A60..10A7F; Old South Arabian
+ 10A80..10A9F; Old North Arabian
+ 10AC0..10AFF; Manichaean
+ 10B00..10B3F; Avestan
+ 10B40..10B5F; Inscriptional Parthian
+ 10B60..10B7F; Inscriptional Pahlavi
+ 10B80..10BAF; Psalter Pahlavi
+ 10C00..10C4F; Old Turkic
+ 10C80..10CFF; Old Hungarian
+ 10E60..10E7F; Rumi Numeral Symbols
+ 11000..1107F; Brahmi
+ 11080..110CF; Kaithi
+ 110D0..110FF; Sora Sompeng
+ 11100..1114F; Chakma
+ 11150..1117F; Mahajani
+ 11180..111DF; Sharada
+ 111E0..111FF; Sinhala Archaic Numbers
+ 11200..1124F; Khojki
+ 11280..112AF; Multani
+ 112B0..112FF; Khudawadi
+ 11300..1137F; Grantha
+ 11400..1147F; Newa
+ 11480..114DF; Tirhuta
+ 11580..115FF; Siddham
+ 11600..1165F; Modi
+ 11660..1167F; Mongolian Supplement
+ 11680..116CF; Takri
+ 11700..1173F; Ahom
+ 118A0..118FF; Warang Citi
+ 11AC0..11AFF; Pau Cin Hau
+ 11C00..11C6F; Bhaiksuki
+ 11C70..11CBF; Marchen
+ 12000..123FF; Cuneiform
+ 12400..1247F; Cuneiform Numbers and Punctuation
+ 12480..1254F; Early Dynastic Cuneiform
+ 13000..1342F; Egyptian Hieroglyphs
+ 14400..1467F; Anatolian Hieroglyphs
+ 16800..16A3F; Bamum Supplement
+ 16A40..16A6F; Mro
+ 16AD0..16AFF; Bassa Vah
+ 16B00..16B8F; Pahawh Hmong
+ 16F00..16F9F; Miao
+ 16FE0..16FFF; Ideographic Symbols and Punctuation
+ 17000..187FF; Tangut
+ 18800..18AFF; Tangut Components
+ 1B000..1B0FF; Kana Supplement
+ 1BC00..1BC9F; Duployan
+ 1BCA0..1BCAF; Shorthand Format Controls
+ 1D000..1D0FF; Byzantine Musical Symbols
+ 1D100..1D1FF; Musical Symbols
+ 1D200..1D24F; Ancient Greek Musical Notation
+ 1D300..1D35F; Tai Xuan Jing Symbols
+ 1D360..1D37F; Counting Rod Numerals
+ 1D400..1D7FF; Mathematical Alphanumeric Symbols
+ 1D800..1DAAF; Sutton SignWriting
+ 1E000..1E02F; Glagolitic Supplement
+ 1E800..1E8DF; Mende Kikakui
+ 1E900..1E95F; Adlam
+ 1EE00..1EEFF; Arabic Mathematical Alphabetic Symbols
+ 1F000..1F02F; Mahjong Tiles
+ 1F030..1F09F; Domino Tiles
+ 1F0A0..1F0FF; Playing Cards
+ 1F100..1F1FF; Enclosed Alphanumeric Supplement
+ 1F200..1F2FF; Enclosed Ideographic Supplement
+ 1F300..1F5FF; Miscellaneous Symbols and Pictographs
+ 1F600..1F64F; Emoticons
+ 1F650..1F67F; Ornamental Dingbats
+ 1F680..1F6FF; Transport and Map Symbols
+ 1F700..1F77F; Alchemical Symbols
+ 1F780..1F7FF; Geometric Shapes Extended
+ 1F800..1F8FF; Supplemental Arrows-C
+ 1F900..1F9FF; Supplemental Symbols and Pictographs
+ 20000..2A6DF; CJK Unified Ideographs Extension B
+ 2A700..2B73F; CJK Unified Ideographs Extension C
+ 2B740..2B81F; CJK Unified Ideographs Extension D
+ 2B820..2CEAF; CJK Unified Ideographs Extension E
+ 2F800..2FA1F; CJK Compatibility Ideographs Supplement
+ E0000..E007F; Tags
+ E0100..E01EF; Variation Selectors Supplement
+ F0000..FFFFF; Supplementary Private Use Area-A
+ 100000..10FFFF; Supplementary Private Use Area-B
+
+ # EOF
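Review note: the header of Blocks.txt describes three rules: the `Start Code..End Code; Block Name` line format, loose matching of block names (casing, whitespace, hyphens, and underbars ignored), and the alignment of every range to whole 16-code-point columns. A rough Ruby sketch of those rules follows. It works on the raw file contents (without the leading `+` diff markers), and every name in it (`Block`, `parse_blocks`, `normalize_block_name`, `column_aligned?`, `block_name_for`) is hypothetical; this is not how the gem itself consumes the data, which the diff shown here does not reveal.

```ruby
# Hypothetical helpers for reading a UCD Blocks.txt file; these names are
# illustrative and are not part of the annex_29 gem.
Block = Struct.new(:range, :name)

# Parse "Start Code..End Code; Block Name" lines, skipping comments and blanks.
def parse_blocks(text)
  text.each_line.filter_map do |line|
    entry = line.sub(/#.*/, "").strip
    next if entry.empty?

    codes, name = entry.split(";", 2).map(&:strip)
    first, last = codes.split("..").map { |hex| hex.to_i(16) }
    Block.new(first..last, name)
  end
end

# Block names compare equal when casing, whitespace, hyphens, and underbars
# are ignored, so reduce a name to a canonical key before comparing.
def normalize_block_name(name)
  name.downcase.gsub(/[\s_-]/, "")
end

# Every range should start on a ...0 code point and end on a ...F code point.
def column_aligned?(block)
  block.range.first % 16 == 0 && block.range.last % 16 == 15
end

# Return the block name for a code point, or "No_Block" when none is listed.
def block_name_for(blocks, codepoint)
  found = blocks.find { |block| block.range.cover?(codepoint) }
  found ? found.name : "No_Block"
end
```

A linear `find` is enough for roughly 270 ranges; a binary search over the sorted ranges would be the natural next step if lookups were on a hot path.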