ruby-jdict 0.0.7 → 0.0.8

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: b2236ff7f804ac1ade31ba8eb410d6a61c67a781
-   data.tar.gz: 2cd8e3132230029b547895cb0149f54861a91e3c
+   metadata.gz: f6e49f13e68f5f1a1f1421f9c814ce484a796da6
+   data.tar.gz: 726e6cae80ffd4da3041f5e1521f9f04a92c92f8
  SHA512:
-   metadata.gz: a016954bb6652c3822dfd5f4f720a4920e905ce4100027e89ca4c12a4571b60338b43dd65cdcd6eb40ed33c3c0bff3b1867d950d495cf932bd2f5cb45cdec7a5
-   data.tar.gz: aa5ba67f9b7ca1edaff47a702c765307e59044c4fba4ee30d010c4f78e199c895e5fcdd18eabe5f0dd46891345a07078bb470b3cc6690ec2ec3e2d337c2979eb
+   metadata.gz: cb7155746f0037ebe7a5f539766db87a9571d83b71a21a367872e150a149d3091c66bc4ace0c30e5e2f836fbd34799e702e2e49b287b0f38c2695eeb83f134dd
+   data.tar.gz: 9dc01e646e59e945691bdfb1098cb7afddd422120a5707f96160690e3f7401586c20ff5ba62d19079f7e3fff36254909751dc761a4b9fe1ceef53597368d471e
data/LICENSING CHANGED
@@ -1,28 +1,28 @@
- Copyright (C) 2015 Ian Pickering
- All rights reserved.
-
- Redistribution and use in source and binary forms, with or without
- modification, are permitted provided that the following conditions
- are met:
-
- 1. Redistributions of source code must retain the above copyright
- notice, this list of conditions and the following disclaimer.
- 2. Redistributions in binary form must reproduce the above copyright
- notice, this list of conditions and the following disclaimer in
- the documentation and/or other materials provided with the
- distribution.
- 3. The name of the author may not be used to endorse or promote
- products derived from this software without specific prior
- written permission.
-
- THIS SOFTWARE IS PROVIDED BY THE AUTHOR `AS IS'' AND ANY EXPRESS
- OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
- WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
- DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
- GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
- INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
- WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
- NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
- SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ Copyright (C) 2015 Ian Pickering
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+
+ 1. Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+ 2. Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in
+ the documentation and/or other materials provided with the
+ distribution.
+ 3. The name of the author may not be used to endorse or promote
+ products derived from this software without specific prior
+ written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE AUTHOR `AS IS'' AND ANY EXPRESS
+ OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
+ DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
+ GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
data/README.md CHANGED
@@ -1,20 +1,18 @@
- # Ruby-JDict
- Ruby gem for accessing Jim Breen's Japanese dictionaries. Can currently access the following:
- * JMdict (Japanese-English dictionary)
-
- *Note*: For the moment, uses SQLite (via [amalgalite](https://github.com/copiousfreetime/amalgalite)) for data storage. Not intended to be scalable.
-
- ## Install
- ```
- gem install ruby-jdict
- ```
-
- ## Usage
- See [this](https://github.com/Ruin0x11/ruby-jdict/blob/master/examples/query.rb) example for basic usage.
-
- If the dictionary file is not found, you will be prompted to download it.
-
- ## Issues
- * The code for inserting Entry objects into the database is horrible. Should create multiple tables for each datatype instead of a single table for all datatypes.
- * Some routines need to be generalized to allow for the usage of dictionaries besides JMDict (like Tatoeba or KANJIDIC).
- * Many functions are getting too large/unreadable.
+ # Ruby-JDict
+ Ruby gem for accessing Jim Breen's Japanese dictionaries. Can currently access the following:
+ * JMdict (Japanese-English dictionary)
+
+ *Note*: For the moment, uses SQLite (via [amalgalite](https://github.com/copiousfreetime/amalgalite)) for data storage. Not intended to be scalable.
+
+ ## Install
+ ```
+ gem install ruby-jdict
+ ```
+
+ ## Usage
+ See [this](https://github.com/Ruin0x11/ruby-jdict/blob/master/examples/query.rb) example for basic usage.
+
+ ## Issues
+ * The code for inserting Entry objects into the database is horrible. Should create multiple tables for each datatype instead of a single table for all datatypes.
+ * Some routines need to be generalized to allow for the usage of dictionaries besides JMDict (like Tatoeba or KANJIDIC).
+ * Many functions are getting too large/unreadable.
data/Rakefile CHANGED
@@ -1,30 +1,41 @@
- require 'rubygems'
- require 'rake' #task runner
-
- INDEX_PATH = 'index'
- JMDICT_PATH = 'dictionaries/JMdict'
-
- namespace :index do
-
-   desc "Build the dictionary's search index"
-   task :build do
-     raise "Index already exists at path #{File.expand_path(INDEX_PATH)}" if File.exists? INDEX_PATH
-     @index = DictIndex.new(INDEX_PATH,
-                            JMDICT_PATH,
-                            false) # lazy_loadind? no. don't lazy load
-     puts "Index created at path #{File.expand_path(INDEX_PATH)}" if File.exists? INDEX_PATH
-     puts "Index with #{@index.size} entries."
-   end
-
-   desc "Destroy the dictionary's search index"
-   task :destroy do
-     puts 'TODO: destory the index'
-     `sudo rm -R index`
-     # This will not work, because we don't have sudooooo.
-     # How do you delete folders in Ruby without sudo? Probably
-     # can't... that'd be more consistent actually.
-     # if File.exists? INDEX_PATH
-     #   File.delete INDEX_PATH
-     # end
-   end
- end
+ require 'rubygems'
+ require 'rake' #task runner
+
+ INDEX_PATH = 'index'
+ JMDICT_PATH = 'dictionaries/JMdict'
+
+ namespace :index do
+
+   desc "Build the dictionary's search index"
+   task :build do
+     raise "Index already exists at path #{File.expand_path(INDEX_PATH)}" if File.exists? INDEX_PATH
+     @index = DictIndex.new(INDEX_PATH,
+                            JMDICT_PATH,
+                            false) # lazy_loadind? no. don't lazy load
+     puts "Index created at path #{File.expand_path(INDEX_PATH)}" if File.exists? INDEX_PATH
+     puts "Index with #{@index.size} entries."
+   end
+
+   desc "Destroy the dictionary's search index"
+   task :destroy do
+     puts 'TODO: destory the index'
+     `sudo rm -R index`
+     # This will not work, because we don't have sudooooo.
+     # How do you delete folders in Ruby without sudo? Probably
+     # can't... that'd be more consistent actually.
+     # if File.exists? INDEX_PATH
+     #   File.delete INDEX_PATH
+     # end
+   end
+ end
+
+ task :examples do
+   require_relative "lib/ruby-jdict.rb"
+   Dir['examples/*'].each do |example_file|
+     next if File.directory?(example_file)
+     puts "Running #{example_file}..."
+     puts
+
+     require_relative example_file
+   end
+ end
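Review note: the comment in the `:destroy` task asks how to delete a folder from Ruby without `sudo`. The standard library's `FileUtils.rm_rf` removes a directory tree that the current user owns, so shelling out is not required. The following is a minimal sketch, not the gem's code: `destroy_index` is a hypothetical helper name, and the temporary directory stands in for the real `index` path. It uses `File.exist?` because `File.exists?` was deprecated and later removed in Ruby 3.2.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper (not part of ruby-jdict): remove an index file or
# directory tree without sudo. Returns true if something was removed.
def destroy_index(path)
  return false unless File.exist?(path)

  FileUtils.rm_rf(path) # works for plain files and for nested directories
  true
end

# Demonstration against a throwaway directory, so nothing real is touched.
Dir.mktmpdir do |tmp|
  index_path = File.join(tmp, 'index')
  FileUtils.mkdir_p(File.join(index_path, 'nested'))
  File.write(File.join(index_path, 'nested', 'data.bin'), 'x')

  puts "removed: #{destroy_index(index_path)}"
  puts "still exists: #{File.exist?(index_path)}"
end
```

`File.delete`, as tried in the commented-out code, only removes files, which is likely why that attempt did not work on a directory.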
data/examples/query.rb CHANGED
@@ -1,22 +1,19 @@
- # -*- coding: utf-8 -*-
- require 'jdict'
-
- BASE_PATH = ENV["HOME"]
- DICT_PATH = File.join(BASE_PATH, '.dicts')
-
- JDict.configure do |config|
-   config.dictionary_path = DICT_PATH # directory containing dictionary files
-   config.language = JDict::JMDictConstants::Languages::ENGLISH # language for search results
-   config.num_results = 50 # maximum results to return from searching
- end
-
- dict = JDict::JMDict.new
-
- query = ARGV.pop.dup unless ARGV.empty?
- query ||= "日本語"
-
- results = dict.search(query)
- results.each do |entry|
-   puts entry.to_s
-   puts
- end
+ # -*- coding: utf-8 -*-
+ require 'ruby-jdict'
+
+ DICT_PATH = File.join(ENV["HOME"], '.dicts')
+
+ dict = JDict::Dictionary.new(DICT_PATH)
+ dict.build_index!
+
+ query = ARGV.pop.dup unless ARGV.empty?
+ query ||= "日本語"
+
+ puts "Searching for \"#{query}\"."
+ puts
+
+ results = dict.search(query)
+ results.each do |entry|
+   puts entry.to_s
+   puts
+ end
data/lib/ruby-jdict.rb ADDED
@@ -0,0 +1,14 @@
+ require "ruby-jdict/dictionary"
+ require "ruby-jdict/constants"
+ require "ruby-jdict/convert"
+ require "ruby-jdict/dictionary"
+ require "ruby-jdict/index"
+ require "ruby-jdict/jdict"
+ require "ruby-jdict/version"
+
+ require "ruby-jdict/indexer/dictionary_indexer"
+ #require "ruby-jdict/indexer/libxml_dictionary_indexer"
+ require "ruby-jdict/indexer/nokogiri_dictionary_indexer"
+
+ require "ruby-jdict/models/entry"
+ require "ruby-jdict/models/sense"
@@ -1,64 +1,73 @@
- # Constants and descriptions for important elements/attributes
- # of the JMdict XML dictionary.
- # Descriptions come from JMdict.dtd (document type definition)
- module JDict
-   module JMDictConstants
-
-     # TODO: change these strings to symbols ?
-     # XML elements of the JMDict file
-     module Elements
-       # Entries consist of kanji elements, kana elements,
-       # general information and sense elements. Each entry must have at
-       # least one kana element and one sense element. Others are optional.
-       ENTRY = 'entry'
-       SEQUENCE = 'ent_seq'
-
-       # This element will contain a word or short phrase in Japanese
-       # which is written using at least one kanji. The valid characters are
-       # kanji, kana, related characters such as chouon and kurikaeshi, and
-       # in exceptional cases, letters from other alphabets.
-       KANJI = 'keb'
-
-       # This element content is restricted to kana and related
-       # characters such as chouon and kurikaeshi. Kana usage will be
-       # consistent between the keb and reb elements; e.g. if the keb
-       # contains katakana, so too will the reb.
-       KANA = 'reb'
-
-       # The sense element will record the translational equivalent
-       # of the Japanese word, plus other related information. Where there
-       # are several distinctly different meanings of the word, multiple
-       # sense elements will be employed.
-       SENSE = 'sense'
-
-       # Part-of-speech information about the entry/sense. Should use
-       # appropriate entity codes.
-       PART_OF_SPEECH = 'pos'
-
-       # Within each sense will be one or more "glosses", i.e.
-       # target-language words or phrases which are equivalents to the
-       # Japanese word. This element would normally be present, however it
-       # may be omitted in entries which are purely for a cross-reference.
-       GLOSS = 'gloss'
-
-       CROSSREFERENCE = 'xref'
-     end
-
-     # Constants for selecting the search language.
-     # Used in the "gloss" element's xml:lang attribute.
-     # :eng never appears as a xml:lang constant because gloss is assumed to be English when not specified
-     # :jpn never appears as a xml:lang because the dictionary itself pivots around Japanese
-     module Languages
-       JAPANESE = :jpn
-       ENGLISH = :eng
-       DUTCH = :dut
-       FRENCH = :fre
-       GERMAN = :ger
-       RUSSIAN = :rus
-       SPANISH = :spa
-       SLOVENIAN = :slv
-       SWEDISH = :swe
-       HUNGARIAN = :hun
-     end
-   end
- end
+ # Constants and descriptions for important elements/attributes
+ # of the JMdict XML dictionary.
+ # Descriptions come from JMdict.dtd (document type definition)
+ module JDict
+   module JMDictConstants
+     # TODO: change these strings to symbols ?
+     # XML elements of the JMDict file
+     module Elements
+       # Entries consist of kanji elements, kana elements,
+       # general information and sense elements. Each entry must have at
+       # least one kana element and one sense element. Others are optional.
+       ENTRY = 'entry'
+       SEQUENCE = 'ent_seq'
+
+       # This element will contain a word or short phrase in Japanese
+       # which is written using at least one kanji. The valid characters are
+       # kanji, kana, related characters such as chouon and kurikaeshi, and
+       # in exceptional cases, letters from other alphabets.
+       KANJI = 'keb'
+
+       # This element content is restricted to kana and related
+       # characters such as chouon and kurikaeshi. Kana usage will be
+       # consistent between the keb and reb elements; e.g. if the keb
+       # contains katakana, so too will the reb.
+       KANA = 'reb'
+
+       # The sense element will record the translational equivalent
+       # of the Japanese word, plus other related information. Where there
+       # are several distinctly different meanings of the word, multiple
+       # sense elements will be employed.
+       SENSE = 'sense'
+
+       # Part-of-speech information about the entry/sense. Should use
+       # appropriate entity codes.
+       PART_OF_SPEECH = 'pos'
+
+       # Within each sense will be one or more "glosses", i.e.
+       # target-language words or phrases which are equivalents to the
+       # Japanese word. This element would normally be present, however it
+       # may be omitted in entries which are purely for a cross-reference.
+       GLOSS = 'gloss'
+
+       CROSSREFERENCE = 'xref'
+     end
+
+     # Constants for selecting the search language.
+     # Used in the "gloss" element's xml:lang attribute.
+     # :eng never appears as a xml:lang constant because gloss is assumed to be English when not specified
+     # :jpn never appears as a xml:lang because the dictionary itself pivots around Japanese
+     module Languages
+       JAPANESE = :jpn
+       ENGLISH = :eng
+       DUTCH = :dut
+       FRENCH = :fre
+       GERMAN = :ger
+       RUSSIAN = :rus
+       SPANISH = :spa
+       SLOVENIAN = :slv
+       SWEDISH = :swe
+       HUNGARIAN = :hun
+     end
+
+     LANGUAGE_DEFAULT = Languages::ENGLISH
+   end
+
+   # Used when serializing entries to SQL.
+   module SerialConstants
+     LANGUAGE_SENTINEL = '&&'
+     MEANING_SENTINEL = '**'
+     PART_OF_SPEECH_SENTINEL = '$$'
+     SENSE_SENTINEL = '%%'
+   end
+ end
@@ -0,0 +1,33 @@
+ # coding: utf-8
+ module JDict
+   module Convert
+     HANKAKU_KATAKANA = "ﾊﾋﾌﾍﾎｳｶｷｸｹｺｻｼｽｾｿﾀﾁﾂﾃﾄｱｲｴｵﾅﾆﾇﾈﾉﾏﾐﾑﾒﾓﾔﾕﾖﾗﾘﾙﾚﾛﾜｦﾝｧｨｩｪｫｬｭｮｯ"
+     HANKAKU_VSYMBOLS= { '' => 0, 'ﾞ' => 1, 'ﾟ' => 2 }
+     ZENKAKU_KATAKANA = [
+       'ハヒフヘホウカキクケコサシスセソタチツテトアイエオ'+
+       'ナニヌネノマミムメモヤユヨラリルレロワヲンァィゥェォャュョッ',
+       'バビブベボヴガギグゲゴザジズゼゾダヂヅデド',
+       'パピプペポ']
+
+
+     def self.han_to_zen(term)
+       term.gsub!(/([ｦ-ｯｱ-ﾝ])([ﾞﾟ]?)/) do
+         katakana = $1
+         sym = $2
+         index = HANKAKU_VSYMBOLS[sym]
+         pos = HANKAKU_KATAKANA.index(katakana)
+         ZENKAKU_KATAKANA[index][pos] || ZENKAKU_KATAKANA[0][pos]
+       end
+     end
+
+     def self.fullwidth_kata_to_hira(term)
+       term.tr!('ァ-ン','ぁ-ん')
+     end
+
+     def self.kata_to_hira(term)
+       term = han_to_zen(term)
+       term = fullwidth_kata_to_hira(term)
+       term
+     end
+   end
+ end
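Review note: `gsub!` and `tr!` return `nil` when they change nothing, so `kata_to_hira` as written appears to fail on input that contains no half-width katakana, and it also mutates the caller's string. The sketch below shows a non-mutating alternative. It is not the gem's implementation: `KanaSketch` is a hypothetical module, and it swaps the lookup tables for Ruby's built-in `String#unicode_normalize(:nfkc)`, which maps half-width katakana and their voicing marks to composed full-width forms. Note that NFKC also folds other compatibility characters (for example, full-width Latin letters become ASCII), which the table-based version does not do.

```ruby
# Sketch only: KanaSketch is a hypothetical module, not part of ruby-jdict.
module KanaSketch
  # Half-width katakana -> full-width katakana via NFKC normalization.
  # Always returns a new string, even when nothing needed converting.
  def self.han_to_zen(term)
    term.unicode_normalize(:nfkc)
  end

  # Full-width katakana -> hiragana. The two ranges are the same length,
  # so each katakana maps to the hiragana at the same offset.
  def self.fullwidth_kata_to_hira(term)
    term.tr('ァ-ン', 'ぁ-ん')
  end

  def self.kata_to_hira(term)
    fullwidth_kata_to_hira(han_to_zen(term))
  end
end

# Half-width "gaikoku", written with escapes so the bytes are unambiguous.
puts KanaSketch.kata_to_hira("\uFF76\uFF9E\uFF72\uFF7A\uFF78")
# Input with nothing to convert comes back unchanged rather than as nil.
puts KanaSketch.kata_to_hira("ねこ")
```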
@@ -0,0 +1,59 @@
+ module JDict
+   class Dictionary
+     def initialize(path)
+       @dictionary_path = File.join(path, self.dict_file)
+       @entries = []
+
+       @index = DictIndex.new(@dictionary_path)
+     end
+
+     def size
+       @entries.size
+     end
+
+     def build_index!
+       @index.build_index!
+     end
+
+     def loaded?
+       @index.built?
+     end
+
+     def dict_file
+       "JMDict"
+     end
+
+     # Search this dictionary's index for the given string.
+     # @param query [String] the search query
+     # @return [Array(Entry)] the results of the search
+     def search(query, opts = {})
+       opts = opts.merge(default_search_options)
+
+       results = []
+       return results if query.empty?
+
+       results = @index.search(query, opts)
+     end
+
+     # Retrieves the definition of a part-of-speech from its abbreviation
+     # @param pos [String] the abbreviation for the part-of-speech
+     # @return [String] the full description of the part-of-speech
+     def get_pos(pos)
+       @index.get_pos(pos)
+     end
+
+     def delete!
+       @index.delete!
+     end
+
+     private
+
+     def default_search_options
+       {
+         max_results: 50,
+         language: JMDictConstants::LANGUAGE_DEFAULT,
+         exact: false,
+       }
+     end
+   end
+ end
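Review note: in `search`, `opts.merge(default_search_options)` gives the defaults precedence, because `Hash#merge` takes the argument's value when a key exists in both hashes. A caller passing `max_results: 5` would therefore still get 50. The source does not say whether that is intended; the sketch below only illustrates the merge order, using stand-in hashes rather than the gem's objects.

```ruby
# Stand-in hashes for illustration; the keys mirror default_search_options.
defaults    = { max_results: 50, exact: false }
caller_opts = { max_results: 5 }

# Order used in the diff: the defaults overwrite the caller's value.
defaults_win = caller_opts.merge(defaults)

# Reversed order: the caller's value overrides the default.
caller_wins = defaults.merge(caller_opts)

p defaults_win
p caller_wins
```

If caller overrides are the goal, the reversed form `default_search_options.merge(opts)` is the usual idiom.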