wenlin_db_scanner 0.2.0

@@ -0,0 +1,5 @@
1
+ lib/**/*.rb
2
+ bin/*
3
+ -
4
+ features/**/*.feature
5
+ LICENSE.txt
data/Gemfile ADDED
@@ -0,0 +1,13 @@
1
+ source "http://rubygems.org"
2
+ # Add dependencies required to use your gem here.
3
+ # Example:
4
+ # gem "activesupport", ">= 2.3.5"
5
+
6
+ # Add dependencies to develop your gem here.
7
+ # Include everything needed to run rake, tests, features, etc.
8
+ group :development do
9
+ gem "yard", ">= 0.8.2.1"
10
+ gem "rdoc", ">= 3.12"
11
+ gem "bundler", ">= 1.2.0"
12
+ gem "jeweler", ">= 1.8.4"
13
+ end
@@ -0,0 +1,23 @@
1
+ GEM
2
+ remote: http://rubygems.org/
3
+ specs:
4
+ git (1.2.5)
5
+ jeweler (1.8.4)
6
+ bundler (~> 1.0)
7
+ git (>= 1.2.5)
8
+ rake
9
+ rdoc
10
+ json (1.7.5)
11
+ rake (0.9.2.2)
12
+ rdoc (3.12)
13
+ json (~> 1.4)
14
+ yard (0.8.2.1)
15
+
16
+ PLATFORMS
17
+ ruby
18
+
19
+ DEPENDENCIES
20
+ bundler (>= 1.2.0)
21
+ jeweler (>= 1.8.4)
22
+ rdoc (>= 3.12)
23
+ yard (>= 0.8.2.1)
@@ -0,0 +1,122 @@
1
+ Creative Commons Legal Code
2
+
3
+ CC0 1.0 Universal
4
+
5
+ CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
6
+ LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN
7
+ ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
8
+ INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
9
+ REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS
10
+ PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM
11
+ THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED
12
+ HEREUNDER.
13
+
14
+ Statement of Purpose
15
+
16
+ The laws of most jurisdictions throughout the world automatically confer
17
+ exclusive Copyright and Related Rights (defined below) upon the creator
18
+ and subsequent owner(s) (each and all, an "owner") of an original work of
19
+ authorship and/or a database (each, a "Work").
20
+
21
+ Certain owners wish to permanently relinquish those rights to a Work for
22
+ the purpose of contributing to a commons of creative, cultural and
23
+ scientific works ("Commons") that the public can reliably and without fear
24
+ of later claims of infringement build upon, modify, incorporate in other
25
+ works, reuse and redistribute as freely as possible in any form whatsoever
26
+ and for any purposes, including without limitation commercial purposes.
27
+ These owners may contribute to the Commons to promote the ideal of a free
28
+ culture and the further production of creative, cultural and scientific
29
+ works, or to gain reputation or greater distribution for their Work in
30
+ part through the use and efforts of others.
31
+
32
+ For these and/or other purposes and motivations, and without any
33
+ expectation of additional consideration or compensation, the person
34
+ associating CC0 with a Work (the "Affirmer"), to the extent that he or she
35
+ is an owner of Copyright and Related Rights in the Work, voluntarily
36
+ elects to apply CC0 to the Work and publicly distribute the Work under its
37
+ terms, with knowledge of his or her Copyright and Related Rights in the
38
+ Work and the meaning and intended legal effect of CC0 on those rights.
39
+
40
+ 1. Copyright and Related Rights. A Work made available under CC0 may be
41
+ protected by copyright and related or neighboring rights ("Copyright and
42
+ Related Rights"). Copyright and Related Rights include, but are not
43
+ limited to, the following:
44
+
45
+ i. the right to reproduce, adapt, distribute, perform, display,
46
+ communicate, and translate a Work;
47
+ ii. moral rights retained by the original author(s) and/or performer(s);
48
+ iii. publicity and privacy rights pertaining to a person's image or
49
+ likeness depicted in a Work;
50
+ iv. rights protecting against unfair competition in regards to a Work,
51
+ subject to the limitations in paragraph 4(a), below;
52
+ v. rights protecting the extraction, dissemination, use and reuse of data
53
+ in a Work;
54
+ vi. database rights (such as those arising under Directive 96/9/EC of the
55
+ European Parliament and of the Council of 11 March 1996 on the legal
56
+ protection of databases, and under any national implementation
57
+ thereof, including any amended or successor version of such
58
+ directive); and
59
+ vii. other similar, equivalent or corresponding rights throughout the
60
+ world based on applicable law or treaty, and any national
61
+ implementations thereof.
62
+
63
+ 2. Waiver. To the greatest extent permitted by, but not in contravention
64
+ of, applicable law, Affirmer hereby overtly, fully, permanently,
65
+ irrevocably and unconditionally waives, abandons, and surrenders all of
66
+ Affirmer's Copyright and Related Rights and associated claims and causes
67
+ of action, whether now known or unknown (including existing as well as
68
+ future claims and causes of action), in the Work (i) in all territories
69
+ worldwide, (ii) for the maximum duration provided by applicable law or
70
+ treaty (including future time extensions), (iii) in any current or future
71
+ medium and for any number of copies, and (iv) for any purpose whatsoever,
72
+ including without limitation commercial, advertising or promotional
73
+ purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each
74
+ member of the public at large and to the detriment of Affirmer's heirs and
75
+ successors, fully intending that such Waiver shall not be subject to
76
+ revocation, rescission, cancellation, termination, or any other legal or
77
+ equitable action to disrupt the quiet enjoyment of the Work by the public
78
+ as contemplated by Affirmer's express Statement of Purpose.
79
+
80
+ 3. Public License Fallback. Should any part of the Waiver for any reason
81
+ be judged legally invalid or ineffective under applicable law, then the
82
+ Waiver shall be preserved to the maximum extent permitted taking into
83
+ account Affirmer's express Statement of Purpose. In addition, to the
84
+ extent the Waiver is so judged Affirmer hereby grants to each affected
85
+ person a royalty-free, non transferable, non sublicensable, non exclusive,
86
+ irrevocable and unconditional license to exercise Affirmer's Copyright and
87
+ Related Rights in the Work (i) in all territories worldwide, (ii) for the
88
+ maximum duration provided by applicable law or treaty (including future
89
+ time extensions), (iii) in any current or future medium and for any number
90
+ of copies, and (iv) for any purpose whatsoever, including without
91
+ limitation commercial, advertising or promotional purposes (the
92
+ "License"). The License shall be deemed effective as of the date CC0 was
93
+ applied by Affirmer to the Work. Should any part of the License for any
94
+ reason be judged legally invalid or ineffective under applicable law, such
95
+ partial invalidity or ineffectiveness shall not invalidate the remainder
96
+ of the License, and in such case Affirmer hereby affirms that he or she
97
+ will not (i) exercise any of his or her remaining Copyright and Related
98
+ Rights in the Work or (ii) assert any associated claims and causes of
99
+ action with respect to the Work, in either case contrary to Affirmer's
100
+ express Statement of Purpose.
101
+
102
+ 4. Limitations and Disclaimers.
103
+
104
+ a. No trademark or patent rights held by Affirmer are waived, abandoned,
105
+ surrendered, licensed or otherwise affected by this document.
106
+ b. Affirmer offers the Work as-is and makes no representations or
107
+ warranties of any kind concerning the Work, express, implied,
108
+ statutory or otherwise, including without limitation warranties of
109
+ title, merchantability, fitness for a particular purpose, non
110
+ infringement, or the absence of latent or other defects, accuracy, or
111
+ the present or absence of errors, whether or not discoverable, all to
112
+ the greatest extent permissible under applicable law.
113
+ c. Affirmer disclaims responsibility for clearing rights of other persons
114
+ that may apply to the Work or any use thereof, including without
115
+ limitation any person's Copyright and Related Rights in the Work.
116
+ Further, Affirmer disclaims responsibility for obtaining any necessary
117
+ consents, permissions or other rights required for any use of the
118
+ Work.
119
+ d. Affirmer understands and acknowledges that Creative Commons is not a
120
+ party to this document and has no duty or obligation with respect to
121
+ this CC0 or use of the Work.
122
+
@@ -0,0 +1,102 @@
1
+ # wenlin_db_scanner
2
+
3
+ Extracts the data from the Wenlin dictionary program.
4
+
5
+ The Wenlin Dictionary contains two great databases, the
6
+ [ABC English<->Chinese Dictionary](http://www.wenlin.com/abc.htm) and the
7
+ [Character Description Language](http://www.wenlin.com/cdl/) (CDL).
8
+
9
+ Unfortunately, this great data is wrapped by a less-than-great UI. This code is
10
+ intended to be useful to Chinese language students who wish to interact with
11
+ the data on their own terms.
12
+
13
+
14
+ ## Installation
15
+
16
+ The tool ships as a Ruby gem, and the standard installation process applies.
17
+ The code relies on Ruby 1.9 syntax and String encoding. It was tested to work
18
+ with MRI 1.9.3.
19
+
20
+ ```bash
21
+ gem install wenlin_db_scanner
22
+ ```
23
+
24
+
25
+ ## Command-Line Usage
26
+
27
+ The following commands assume that the current directory of your `Terminal` /
28
+ `Command Prompt` is the Wenlin application's main directory. If your current
29
+ directory contains a `W4DB` directory, you're probably in the right place.
30
+
31
+ ### wenlin_dict
32
+
33
+ Parses a dictionary database into a file containing one JSON line per entry.
34
+
35
+ ```bash
36
+ wenlin_dict W4DB/ en-zh > en_zh.json
37
+ wenlin_dict W4DB/ zh-en > zh_en.json
38
+ wenlin_dict W4DB/ hz-en > hz_en.json
39
+ ```
40
+
41
+ ### wenlin_hanzi
42
+
43
+ Parses the database that breaks down hanzi (Chinese characters) into
44
+ components.
45
+
46
+ ```bash
47
+ wenlin_hanzi W4DB > hanzi.json
48
+ ```
49
+
50
+ ### wenlin_parts
51
+
52
+ Parses a parts-of-speech database into a file containing one JSON line per part
53
+ of speech.
54
+
55
+ The parts of speech are referenced by the word definition databases, which use
56
+ their abbreviations.
57
+
58
+ ```bash
59
+ wenlin_parts W4DB/ en > en_parts.json
60
+ wenlin_parts W4DB/ zh > zh_parts.json
61
+ ```
62
+
63
+ ### wenlin_dbdump
64
+
65
+ Extracts the raw text entries in a .db file. Useful for debugging and
66
+ understanding the record format.
67
+
68
+ ```bash
69
+ wenlin_dbdump W4DB/abc_ce.db
70
+ ```
71
+
72
+
73
+ ## API Usage
74
+
75
+ The scripts in the `bin` directory are thin wrappers over the API. Read them if
76
+ you want to use the Ruby API directly.
77
+
78
+ It is very likely that you'll get your job done faster by using the output of
79
+ the CLI tools.
80
+
81
+
82
+ ## Testing
83
+
84
+ I test this code by running the tools inside `bin` against the Wenlin databases,
85
+ and by spot-checking the output.
86
+
87
+
88
+ ## Contributing
89
+
90
+ This tool works fairly well on the Wenlin 4 data files. Bugfixes and support
91
+ for new .db file formats are welcome; other features are most likely outside
92
+ the project's scope.
93
+
94
+ Note that this tool is designed to help move the data into another program,
95
+ so it only supports full table scans. Support for random access using the
96
+ B-tree indexes is outside the scope of this project.
97
+
98
+
99
+ ## Copyright
100
+
101
+ This code is licensed under the
102
+ [CC0 Public Domain](http://creativecommons.org/publicdomain/zero/1.0/) license.
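
Since the README describes each CLI tool as writing one JSON line per entry, the output can be loaded back with a few lines of Ruby. The following is a minimal sketch, not part of the gem: the helper name `parse_json_lines` is hypothetical, the sample entries are made up, and the keys (`char`, `pinyin`, `latin_pinyin`, `meaning`) assume the fields that `CharMeaning#to_hash` appears to emit for the `hz-en` dictionary.

```ruby
# encoding: utf-8
require 'json'
require 'stringio'

# Hypothetical helper (not part of wenlin_db_scanner): parses an IO holding
# one JSON object per line and returns an Array of Hashes, skipping blanks.
def parse_json_lines(io)
  io.each_line.map(&:strip).reject(&:empty?).map { |line| JSON.parse(line) }
end

# Made-up sample standing in for a file such as hz_en.json.
sample = StringIO.new([
  '{"char":"好","pinyin":["hǎo"],"latin_pinyin":["hao"],"meaning":"good"}',
  '{"char":"人","pinyin":["rén"],"latin_pinyin":["ren"],"meaning":"person"}'
].join("\n"))

entries = parse_json_lines(sample)
entries.each do |entry|
  # Keys are Strings after JSON.parse, even though to_hash used Symbols.
  puts "#{entry['char']} (#{entry['latin_pinyin'].join(', ')}): #{entry['meaning']}"
end
```

For a real dump, the same helper should work on a file handle, for example `File.open('hz_en.json', 'r:utf-8') { |f| parse_json_lines(f) }`, assuming the file was produced by `wenlin_dict` as shown in the README.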
@@ -0,0 +1,36 @@
1
+ # encoding: utf-8
2
+
3
+ require 'rubygems'
4
+ require 'bundler'
5
+ begin
6
+ Bundler.setup(:default, :development)
7
+ rescue Bundler::BundlerError => e
8
+ $stderr.puts e.message
9
+ $stderr.puts "Run `bundle install` to install missing gems"
10
+ exit e.status_code
11
+ end
12
+ require 'rake'
13
+
14
+ require 'jeweler'
15
+ Jeweler::Tasks.new do |gem|
16
+ # gem is a Gem::Specification... see http://docs.rubygems.org/read/chapter/20 for more options
17
+ gem.name = "wenlin_db_scanner"
18
+ gem.homepage = "http://github.com/pwnall/wenlin_db_scanner"
19
+ gem.license = "CC0"
20
+ gem.summary = %Q{Extracts the data from the Wenlin dictionary}
21
+ gem.description = <<END
22
+ The Wenlin dictionary contains two great databases, the ABC English<->Chinese
23
+ dictionary, and the Character Description Language (CDL). Unfortunately, this
24
+ data is wrapped by a less-than-great UI. This gem lets you extract the data so
25
+ you can build your own UI for it.
26
+ END
27
+ gem.email = "victor@costan.us"
28
+ gem.authors = ["Victor Costan"]
29
+ # dependencies defined in Gemfile
30
+ end
31
+ Jeweler::RubygemsDotOrgTasks.new
32
+
33
+ task :default => :install
34
+
35
+ require 'yard'
36
+ YARD::Rake::YardocTask.new
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.2.0
@@ -0,0 +1,24 @@
1
+ #!/usr/bin/env ruby
2
+ # Requires Ruby 1.9, tested on MRI 1.9.3.
3
+
4
+ require 'wenlin_db_scanner'
5
+
6
+ unless ARGV.length == 1
7
+ STDERR.puts "Usage: #{$0} path-to-db-file"
8
+ exit 1
9
+ end
10
+
11
+ db = WenlinDbScanner::Db.new ARGV[0]
12
+ db.records.each do |record|
13
+ puts "---------- record tag: #{record.tag} = 0b#{'%b' % record.tag}"
14
+
15
+ if record.binary?
16
+ puts "---------- binary record, size: #{record.size}"
17
+ next
18
+ end
19
+
20
+ puts record.text
21
+ puts "---------- record end"
22
+ end
23
+ db.close
24
+
@@ -0,0 +1,24 @@
1
+ #!/usr/bin/env ruby
2
+ # Requires Ruby 1.9, tested on MRI 1.9.3.
3
+
4
+ require 'json'
5
+ require 'wenlin_db_scanner'
6
+
7
+ unless ARGV.length == 2
8
+ STDERR.puts "Usage: #{$0} path-to-db-dir en-zh|zh-en|hz-en"
9
+ exit 1
10
+ end
11
+
12
+ case ARGV[1]
13
+ when 'en-zh'
14
+ entries = WenlinDbScanner::Dicts.en_zh ARGV[0]
15
+ when 'zh-en'
16
+ entries = WenlinDbScanner::Dicts.zh_en ARGV[0]
17
+ when 'hz-en'
18
+ entries = WenlinDbScanner::Chars.hz_en ARGV[0]
19
+ else
20
+ STDERR.puts "Unknown dictionary #{ARGV[1]}\nUse en-zh, zh-en, or hz-en\n"
21
+ exit 1
22
+ end
23
+
24
+ entries.each { |entry| puts entry.to_hash.to_json }
@@ -0,0 +1,13 @@
1
+ #!/usr/bin/env ruby
2
+ # Requires Ruby 1.9, tested on MRI 1.9.3.
3
+
4
+ require 'json'
5
+ require 'wenlin_db_scanner'
6
+
7
+ unless ARGV.length == 1
8
+ abort "Usage: #{$0} path-to-db-dir"
9
+ end
10
+
11
+ chars = WenlinDbScanner::Chars.hanzi ARGV[0]
12
+
13
+ chars.each { |char| puts char.to_hash.to_json }
@@ -0,0 +1,23 @@
1
+ #!/usr/bin/env ruby
2
+ # Requires Ruby 1.9, tested on MRI 1.9.3.
3
+
4
+ require 'json'
5
+ require 'wenlin_db_scanner'
6
+
7
+ unless ARGV.length == 2
8
+ STDERR.puts "Usage: #{$0} path-to-db-dir en|zh"
9
+ exit 1
10
+ end
11
+
12
+ case ARGV[1]
13
+ when 'en'
14
+ parts = WenlinDbScanner::SpeechParts.en ARGV[0]
15
+ when 'zh'
16
+ parts = WenlinDbScanner::SpeechParts.zh ARGV[0]
17
+ else
18
+ STDERR.puts "Unknown language #{ARGV[1]}\nUse en or zh"
19
+ exit 1
20
+ end
21
+
22
+ parts.each { |part| puts part.to_hash.to_json }
23
+
@@ -0,0 +1,13 @@
1
+ # Namespace for the Db scanner classes.
2
+ #
3
+ # The awfully long name was chosen on purpose, to save the good names for more
4
+ # useful libraries.
5
+ module WenlinDbScanner
6
+ end
7
+
8
+ require 'wenlin_db_scanner/chars.rb'
9
+ require 'wenlin_db_scanner/db.rb'
10
+ require 'wenlin_db_scanner/db_record.rb'
11
+ require 'wenlin_db_scanner/dict.rb'
12
+ require 'wenlin_db_scanner/speech_parts.rb'
13
+
@@ -0,0 +1,210 @@
1
+ # coding: utf-8
2
+
3
+ require 'rexml/document'
4
+
5
+ module WenlinDbScanner
6
+
7
+ # Parses the data in the character (hanzi) databases.
8
+ module Chars
9
+ # The entries in the database that breaks down hanzi into components.
10
+ #
11
+ # @param [String] db_root the directory containing the .db files
12
+ # @return [Enumerator<Hash>]
13
+ def self.hanzi(db_root)
14
+ _hanzi File.join(db_root, 'cdl.db')
15
+ end
16
+
17
+ # Decoder for a CDL database.
18
+ #
19
+ # @param [String] db_file path to the .db file containing CDL data
20
+ # @return [Enumerator<Hash>]
21
+ def self._hanzi(db_file)
22
+ Enumerator.new do |yielder|
23
+ db = Db.new db_file
24
+ db.records.each do |record|
25
+ next if record.binary?
26
+ xml = REXML::Document.new record.text
27
+
28
+ entry = {}
29
+ xml.root.attributes.each do |name, raw_value|
30
+ key = name.to_sym
31
+ entry[key] = cdl_attribute_value key, raw_value
32
+ end
33
+
34
+ entry[:parts] = xml.root.elements.map do |element|
35
+ part = { part: element.name.to_sym }
36
+ element.attributes.each do |name, raw_value|
37
+ key = name.to_sym
38
+ part[key] = cdl_attribute_value key, raw_value
39
+ end
40
+ part
41
+ end
42
+
43
+ yielder << entry
44
+ end
45
+ end
46
+ end
47
+
48
+ # Decodes known attributes for CDL XML elements.
49
+ #
50
+ # @param [Symbol] key the attribute's name, symbolized
51
+ # @param [String] raw_value the attribute's raw value
52
+ # @return [Integer, Array, String] a more programmer-friendly value
53
+ def self.cdl_attribute_value(key, raw_value)
54
+ case key
55
+ when :points # coordinates
56
+ raw_value.split(' ').map do |pair|
57
+ pair.split(',').map { |coord| coord.strip.to_i }
58
+ end
59
+ when :radical # dictionary radicals?
60
+ raw_value.strip.split(' ').map(&:strip)
61
+ when :type # stroke type
62
+ raw_value.strip.to_sym
63
+ when :uni # unicode value
64
+ raw_value.strip.to_i(16)
65
+ else
66
+ raw_value.strip
67
+ end
68
+ end
69
+
70
+ # The entries in the hanzi -> English meaning dictionary.
71
+ #
72
+ # @param [String] db_root the directory containing the .db files
73
+ # @return [Enumerator<CharMeaning>]
74
+ def self.hz_en(db_root)
75
+ _hz_en File.join(db_root, 'zidian.db')
76
+ end
77
+
78
+ # Decoder for a database of hanzi -> English meaning entries.
79
+ #
80
+ # @param [String] db_file path to the .db file containing dictionary data
81
+ # @return [Enumerator<CharMeaning>]
82
+ def self._hz_en(db_file)
83
+ Enumerator.new do |yielder|
84
+ db = Db.new db_file
85
+ db.records.each do |record|
86
+ next if record.binary?
87
+ lines = record.text.split("\n").map(&:strip).reject(&:empty?)
88
+
89
+ header = lines[0]
90
+
91
+ entry = CharMeaning.new
92
+ entry.char = header[0, 1]
93
+ header = header[1..-1]
94
+
95
+ entry.pinyin = header.scan(/\[([^\]]*)\]/).
96
+ map { |match| match.first.strip }
97
+ entry.latin_pinyin =
98
+ entry.pinyin.map { |pinyin| pinyin_to_latin pinyin }
99
+ header.gsub!(/\[[^\]]*\]/, '')
100
+ header.strip!
101
+
102
+ header.scan(/\([^\)]+\)/).each do |aside|
103
+ aside_text = aside[1...-1]
104
+ case aside_text[0]
105
+ when '='
106
+ entry.variants = aside_text[1..-1].chars.to_a
107
+ header.gsub! aside, ''
108
+ when '!', '?'
109
+ entry.related ||= []
110
+ entry.related += aside_text[1..-1].chars.to_a
111
+ header.gsub! aside, ''
112
+ when 'F'
113
+ entry.complex_forms = aside_text[1..-1].chars.to_a
114
+ header.gsub! aside, ''
115
+ when 'S'
116
+ entry.simplified_forms = aside_text[1..-1].chars.to_a
117
+ header.gsub! aside, ''
118
+ when 'u', 'U'
119
+ if /^Unihan/i =~ aside_text
120
+ header.gsub! aside, ''
121
+ end
122
+ end
123
+ end
124
+ header.strip!
125
+ # Many definitions start with a (note).
126
+ if note_match = /^\(([^\)]*)\)/.match(header)
127
+ entry.note = note_match[1]
128
+ header = header[note_match[0].length..-1].strip
129
+ end
130
+ entry.meaning = header.gsub(/\s*<hr\s*\/?>\s*/, "\n")
131
+
132
+ lines[1..-1].each do |line|
133
+ unless line[0] == ?#
134
+ if entry.note
135
+ entry.note << "/ #{line}"
136
+ else
137
+ entry.note = line
138
+ end
139
+ next
140
+ end
141
+
142
+ tag, data = line[1], line[2..-1].strip
143
+ case tag
144
+ when 'c'
145
+ entry.components = data.chars.to_a
146
+ when 'r'
147
+ # NOTE: skipping remarks
148
+ when 'y'
149
+ entry.cantonese = data
150
+ end
151
+ end
152
+
153
+ yielder << entry
154
+ end
155
+ end
156
+ end
157
+
158
+ # Removes the accents from a pinyin string.
159
+ #
160
+ # This computes the closest Latin alphabet string matching the given pinyin
161
+ # string. It is what users will most likely type to refer to the character,
162
+ # word or phrase inside the pinyin-spelling string.
163
+ #
164
+ # @param [String] pinyin a string that uses pinyin spelling
165
+ # @return [String] the closest approximation to the given string that only
166
+ # uses Latin characters
167
+ def self.pinyin_to_latin(pinyin)
168
+ pinyin.tr 'āēīōūǖĀĒĪŌŪǕáéíóúǘÁÉÍÓÚǗǎěǐǒǔǚǍĚǏǑǓǙàèìòùǜÀÈÌÒÙǛüÜ',
169
+ 'aeiouvAEIOUVaeiouvAEIOUVaeiouvAEIOUVaeiouvAEIOUVvV'
170
+ end
171
+ end # module WenlinDbScanner::Chars
172
+
173
+ # Wraps a record in a dictionary database
174
+ class CharMeaning < Struct.new(:char, :meaning, :note, :pinyin, :variants,
175
+ :complex_forms, :simplified_forms,
176
+ :components, :cantonese, :related,
177
+ :latin_pinyin)
178
+ # @!attribute [r] char
179
+ # @return [String] 1-character string containing the defined character
180
+ # @!attribute [r] meaning
181
+ # @return [String] the character's definition, in English
182
+ # @!attribute [r] note
183
+ # @return [String] e.g., "same as X" or
184
+ # @!attribute [r] pinyin
185
+ # @return [Array<String>] pinyin pronunciation(s) of the character
186
+ # @!attribute [r] latin_pinyin
187
+ # @return [Array<String>] pinyin pronunciation(s) of the character, with
188
+ # the accents removed; this is what users type to get the
189
+ # character
190
+ # @!attribute [r] variants
191
+ # @return [Array<String>] other variants of the character
192
+ # @!attribute [r] related
193
+ # @return [Array<String>] characters that are somehow related
194
+ # @!attribute [r] simplified_forms
195
+ # @return [Array<String>] simplified variants of the character
196
+ # @!attribute [r] complex_forms
197
+ # @return [Array<String>] this character is a simplified variant of them
198
+ # @!attribute [r] components
199
+ # @return [Array<String>] 1-character strings with characters that are
200
+ # contained in this character's image
201
+ # @!attribute [r] cantonese
202
+ # @return [String] character's pronunciation in Cantonese
203
+
204
+ # @return [Hash]
205
+ def to_hash
206
+ Hash[each_pair.reject { |k, v| v.nil? }.to_a]
207
+ end
208
+ end # class WenlinDbScanner::CharMeaning
209
+
210
+ end # namespace WenlinDbScanner
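
To make the attribute decoding and accent stripping in this file concrete, here is a standalone sketch that restates two pieces of its logic outside the gem. The method names `strip_tones` and `decode_points` are hypothetical stand-ins for `Chars.pinyin_to_latin` and the `:points` branch of `Chars.cdl_attribute_value`, and the sample inputs are made up rather than taken from a Wenlin database.

```ruby
# encoding: utf-8

# Same String#tr mapping as Chars.pinyin_to_latin: every accented vowel maps
# to its plain Latin counterpart, and every form of u-umlaut maps to "v".
def strip_tones(pinyin)
  pinyin.tr 'āēīōūǖĀĒĪŌŪǕáéíóúǘÁÉÍÓÚǗǎěǐǒǔǚǍĚǏǑǓǙàèìòùǜÀÈÌÒÙǛüÜ',
            'aeiouvAEIOUVaeiouvAEIOUVaeiouvAEIOUVaeiouvAEIOUVvV'
end

# Same decoding as the :points branch of Chars.cdl_attribute_value:
# "x1,y1 x2,y2" becomes [[x1, y1], [x2, y2]] with Integer coordinates.
def decode_points(raw_value)
  raw_value.split(' ').map do |pair|
    pair.split(',').map { |coord| coord.strip.to_i }
  end
end

puts strip_tones('nǚ hǎo')      # made-up pinyin input
p decode_points('0,0 64,128')   # made-up coordinate string
p '597D'.to_i(16)               # :uni values are parsed as hexadecimal
```

Mapping ü to `v` follows the convention used by the source's translation table; it matches how many pinyin input methods expect the vowel to be typed.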