natto 0.3.0 → 0.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/.yardopts +1 -0
- data/CHANGELOG +60 -0
- data/LICENSE +1 -1
- data/README.md +9 -78
- data/lib/natto.rb +24 -20
- data/lib/natto/version.rb +1 -1
- data/test/natto/tc_dictionaryinfo.rb +2 -1
- data/test/natto/tc_mecab.rb +3 -0
- metadata +12 -7
data/.yardopts
CHANGED
data/CHANGELOG
ADDED
@@ -0,0 +1,60 @@
+## CHANGELOG
+
+- __2011/01/26__: 0.4.0 release.
+- Added support for mecab option input-buffer-size
+- Adding CHANGELOG file
+- Continuing update of documentation
+
+- __2011/01/22__: 0.3.0 release.
+- Refactoring of Natto::Binding to make mecab methods available as class methods
+- Refactoring of Natto::DictionaryInfo to override to_s method to return filename
+- Refactoring of Natto::MeCab to use class methods in Natto::Binding
+- Refactoring and logical separation of test cases
+- Continuing update of documentation
+
+- __2011/01/19__: 0.2.0 release.
+- Added support for mecab option allocate-sentence
+- Continuing update of documentation
+
+- __2011/01/15__: 0.1.1 release.
+- Refactored Natto::DictionaryInfo#method_missing
+- Continuing update of documentation
+
+- __2011/01/15__: 0.1.0 release.
+- Added accessors to Natto::DictionaryInfo
+- Added accessor for version in Natto::MeCab
+- Continuing update of documentation
+
+- __2011/01/13__: 0.0.9 release.
+- Further development and testing for mecab dictionary access/destruction
+- Continuing update of documentation
+
+- __2011/01/07__: 0.0.8 release.
+- Adding support for accessing dictionaries
+- Further tweaking of documentation with markdown
+
+- __2010/12/30__: 0.0.7 release.
+- Adding support for all-morphs and partial options
+- Further updating of documentation with markdown
+
+- __2010/12/28__: 0.0.6 release.
+- Correction to natto.gemspec to include lib/natto/binding.rb
+
+- __2010/12/28__: 0.0.5 release. (yanked)
+- On-going refactoring
+- Project structure refactored for greater maintainability
+
+- __2010/12/26__: 0.0.4 release.
+- On-going refactoring
+
+- __2010/12/23__: 0.0.3 release.
+- On-going refactoring
+- Adding documentation via yard
+
+- __2010/12/20__: 0.0.2 release.
+- Continuing development on proper resource deallocation
+- Adding options hash in object initializer
+
+- __2010/12/13__: Released version 0.0.1. The objective is to provide
+an easy-to-use, production-level Ruby binding to MeCab.
+- Initial release
data/LICENSE
CHANGED
data/README.md
CHANGED
@@ -4,6 +4,12 @@ A Tasty Ruby Binding with MeCab
 ## What is natto?
 natto combines the [Ruby programming language](http://www.ruby-lang.org/) with [MeCab](http://mecab.sourceforge.net/), the part-of-speech and morphological analyzer for the Japanese language.
 
+natto is a gem bridging Ruby and MeCab using FFI (foreign function interface). No compilation is necessary, and natto will run on CRuby (mri/yarv) and JRuby (jvm) equally well, on any OS.
+
+You can learn more about [natto at Google Code Projects](http://code.google.com/p/natto/).
+
+Comments and questions are welcome at the [natto-users Group](http://groups.google.com/group/natto-users).
+
 ## Requirements
 natto requires the following:
 
@@ -20,7 +26,7 @@ Install natto with the following gem command:
 - In case of <tt>LoadError</tt>, please set the <tt>MECAB_PATH</tt> environment variable to the exact name/path to your <tt>mecab</tt> library.
 
 e.g., for bash on UNIX/Linux
-export MECAB_PATH
+export MECAB_PATH=/usr/local/lib/libmecab.so
 e.g., on Windows
 set MECAB_PATH=C:\Program Files\MeCab\bin\libmecab.dll
 e.g., for Cygwin
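The MECAB_PATH instructions above amount to a simple lookup order: use the explicit library path when the variable is set, otherwise fall back to a default library name. The Ruby sketch below illustrates that order only; it is not natto's actual loading code, and both the helper name `mecab_library_path` and the per-platform default file names are assumptions.

```ruby
require 'rbconfig'

# Hypothetical helper (not part of natto): decide which MeCab shared
# library a binding should load. MECAB_PATH, when set and non-empty,
# always wins; otherwise an assumed per-platform default name is used.
def mecab_library_path(env = ENV, host_os = RbConfig::CONFIG['host_os'])
  explicit = env['MECAB_PATH']
  return explicit unless explicit.nil? || explicit.empty?

  case host_os
  when /mswin|mingw|cygwin/ then 'libmecab.dll'   # assumed Windows/Cygwin name
  when /darwin/             then 'libmecab.dylib' # assumed macOS name
  else                           'libmecab.so'    # assumed Unix/Linux name
  end
end

puts mecab_library_path
```

With `export MECAB_PATH=/usr/local/lib/libmecab.so` in effect, the helper returns that path unchanged, which is why an exact path resolves a `LoadError` caused by a library the loader cannot find on its own.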
@@ -71,82 +77,7 @@ e.g., for Cygwin
 - Please try not to mess with the Rakefile, version, or history. If you must have your own version, that is fine, but please isolate to its own commit so I can cherry-pick around it.
 
 ## Changelog
-
-- __2011/01/22: 0.3.0 release.
-- Refactoring of Natto::Binding to make mecab methods available as class methods
-- Refactoring of Natto::DictionaryInfo to override to_s method to return filename
-- Refactoring of Natto::MeCab to use class methods in Natto::Binding
-- Refactoring and logical separation of test cases
-- Continuing update of documentation
-
-- __2011/01/19__: 0.2.0 release.
-- Added support for mecab option allocate-sentence
-- Continuing update of documentation
-
-- __2011/01/15__: 0.1.1 release.
-- Refactored Natto::DictionaryInfo#method_missing
-- Continuing update of documentation
-
-- __2011/01/15__: 0.1.0 release.
-- Added accessors to Natto::DictionaryInfo
-- Added accessor for version in Natto::MeCab
-- Continuing update of documentation
-
-- __2011/01/13__: 0.0.9 release.
-- Further development and testing for mecab dictionary access/destruction
-- Continuing update of documentation
-
-- __2011/01/07__: 0.0.8 release.
-- Adding support for accessing dictionaries
-- Further tweaking of documentation with markdown
-
-- __2010/12/30__: 0.0.7 release.
-- Adding support for all-morphs and partial options
-- Further updating of documentation with markdown
-
-- __2010/12/28__: 0.0.6 release.
-- Correction to natto.gemspec to include lib/natto/binding.rb
-
-- __2010/12/28__: 0.0.5 release. (yanked)
-- On-going refactoring
-- Project structure refactored for greater maintainability
-
-- __2010/12/26__: 0.0.4 release.
-- On-going refactoring
-
-- __2010/12/23__: 0.0.3 release.
-- On-going refactoring
-- Adding documentation via yard
-
-- __2010/12/20__: 0.0.2 release.
-- Continuing development on proper resource deallocation
-- Adding options hash in object initializer
-
-- __2010/12/13__: Released version 0.0.1. The objective is to provide
-an easy-to-use, production-level Ruby binding to MeCab.
-- Initial release
+Please see the {file:CHANGELOG} for this gem's release history.
 
 ## Copyright
-Copyright ©
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without modification, are
-permitted provided that the following conditions are met:
-
-* Redistributions of source code must retain the above
-copyright notice, this list of conditions and the
-following disclaimer.
-
-* Redistributions in binary form must reproduce the above
-copyright notice, this list of conditions and the
-following disclaimer in the documentation and/or other
-materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
-WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
-PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
-ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
-INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
-TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
-ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+Copyright © 2011, Brooke M. Fujita. All rights reserved. Please see the {file:LICENSE} file for further details.
data/lib/natto.rb
CHANGED
@@ -14,21 +14,23 @@ module Natto
 # require 'rubygems' if RUBY_VERSION.to_f < 1.9
 # require 'natto'
 #
-# mecab = Natto::MeCab.new
-# => #<Natto::MeCab:
-# @options={}, \
+# mecab = Natto::MeCab.new(:output_format_type=>'wakati')
+# => #<Natto::MeCab:0x28dd471c @ptr=#<FFI::Pointer address=0x28a027d8>, \
+# @options={:output_format_type=>"wakati"}, \
 # @version="0.98", \
 # @dicts=[/usr/local/lib/mecab/dic/ipadic/sys.dic]>
 #
-#
-#
-#
-#
-#
-#
-#
-#
-#
+# output = mecab.parse('ネバネバの組み合わせ美味しいです。').split
+#
+# output.each do |token|
+# puts token
+# end
+# => ネバネバ
+# の
+# 組み合わせ
+# 美味しい
+# です
+# 。
 #
 class MeCab
 include Natto::Binding
@@ -40,7 +42,8 @@ module Natto
 SUPPORTED_OPTS = [ :rcfile, :dicdir, :userdic, :lattice_level, :all_morphs,
 :output_format_type, :partial, :node_format, :unk_format,
 :bos_format, :eos_format, :eon_format, :unk_feature,
-:allocate_sentence, :nbest, :theta,
+:input_buffer_size, :allocate_sentence, :nbest, :theta,
+:cost_factor ].freeze
 
 # Initializes the wrapped <tt>mecab</tt> instance with the
 # given <tt>options</tt> hash.
@@ -60,6 +63,7 @@ module Natto
 # - :eos_format -- user-defined end-of-sentence format
 # - :eon_format -- user-defined end-of-NBest format
 # - :unk_feature -- feature for unknown word
+# - :input_buffer_size -- set input buffer size (default 8192)
 # - :allocate_sentence -- allocate new memory for input sentence
 # - :nbest -- output N best results (integer, default 1)
 # - :theta -- temperature parameter theta (float, default 0.75)
@@ -98,8 +102,8 @@ module Natto
 raise MeCabError.new("Could not initialize MeCab with options: '#{opt_str}'") if @ptr.address == 0x0
 
 @dicts << Natto::DictionaryInfo.new(Natto::Binding.mecab_dictionary_info(@ptr))
-while @dicts.last
-@dicts << Natto::DictionaryInfo.new(@dicts.last
+while @dicts.last.next.address != 0x0
+@dicts << Natto::DictionaryInfo.new(@dicts.last.next)
 end
 
 @version = self.mecab_version
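In 0.4.0 the constructor walks MeCab's chain of dictionary-info structs, appending each one until the last entry's `next` pointer has address 0x0 (NULL). A minimal pure-Ruby sketch of the same traversal, assuming a hypothetical `DictNode` Struct in place of the FFI struct and `nil` in place of a NULL pointer:

```ruby
# Hypothetical stand-in for MeCab's dictionary-info struct: each node
# carries a filename and a link to the next dictionary (nil acts as NULL).
DictNode = Struct.new(:filename, :next)

# Collect every node in the chain, starting from +head+. Like the 0.4.0
# loop, it keeps appending the last node's successor until that successor
# is the NULL stand-in.
def collect_dicts(head)
  dicts = [head]
  dicts << dicts.last.next until dicts.last.next.nil?
  dicts
end

user = DictNode.new('/path/to/user.dic', nil) # hypothetical user dictionary
sys  = DictNode.new('/usr/local/lib/mecab/dic/ipadic/sys.dic', user)
puts collect_dicts(sys).map(&:filename)
```

Checking the successor before appending, rather than testing the last collected object for truthiness, is what lets the loop stop: a wrapper object is always truthy in Ruby, so only the pointer (here, `nil`) can signal the end of the chain.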
@@ -229,13 +233,13 @@ module Natto
 # @raise [NoMethodError] if <tt>attr_name</tt> is not a member of this <tt>mecab</tt> dictionary <tt>FFI::Struct</tt>
 def method_missing(attr_name)
 member_sym = attr_name.id2name.to_sym
-if self.members.include?(member_sym)
-
-else
-raise(NoMethodError.new("undefined method '#{attr_name}' for #{self}"))
-end
+return self[member_sym] if self.members.include?(member_sym)
+raise(NoMethodError.new("undefined method '#{attr_name}' for #{self}"))
 end
 
+# Returns the full-path file name for this dictionary. Overrides <tt>Object#to_s</tt>.
+#
+# @return [String] full-path filename for this dictionary
 def to_s
 self[:filename]
 end
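The refactored `method_missing` uses a guard clause: return the struct member if the name matches one, otherwise raise `NoMethodError`; `to_s` now returns the dictionary's filename. The sketch below reproduces that behaviour with a hypothetical `FakeDictionaryInfo` class backed by a Hash instead of an `FFI::Struct`. The `respond_to_missing?` override is an addition of this sketch, not something shown in the diff.

```ruby
# Hypothetical stand-in for Natto::DictionaryInfo: a fixed set of
# "members" backed by a Hash rather than an FFI::Struct.
class FakeDictionaryInfo
  def initialize(fields)
    @fields = fields
  end

  def members
    @fields.keys
  end

  def [](member)
    @fields.fetch(member)
  end

  # Guard-clause style, as in 0.4.0: known members are returned early,
  # anything else raises NoMethodError.
  def method_missing(attr_name, *args)
    member_sym = attr_name.to_sym
    return self[member_sym] if members.include?(member_sym)
    raise NoMethodError, "undefined method '#{attr_name}' for #{self}"
  end

  # Keeps respond_to? consistent with method_missing (sketch-only addition).
  def respond_to_missing?(name, include_private = false)
    members.include?(name.to_sym) || super
  end

  # Like DictionaryInfo#to_s: the dictionary's full-path file name.
  def to_s
    self[:filename]
  end
end

info = FakeDictionaryInfo.new(filename: '/usr/local/lib/mecab/dic/ipadic/sys.dic', charset: 'utf8')
puts info.charset # prints utf8
puts info         # prints /usr/local/lib/mecab/dic/ipadic/sys.dic
```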
data/lib/natto/version.rb
CHANGED
data/test/natto/tc_dictionaryinfo.rb
CHANGED
@@ -23,6 +23,7 @@ class TestDictionaryInfo < Test::Unit::TestCase
 assert_equal('/usr/local/lib/mecab/dic/ipadic/sys.dic', sysdic[:filename])
 assert_equal('utf8', sysdic[:charset])
 assert_equal(0x0, sysdic[:next].address)
+#assert_nil(sysdic.next)
 end
 
 # Tests the to_s method.
@@ -42,7 +43,7 @@ class TestDictionaryInfo < Test::Unit::TestCase
 
 # NoMethodError will be raised for anything else!
 assert_raise NoMethodError do
-sysdic.send :
+sysdic.send :unknown_attr
 end
 end
 end
data/test/natto/tc_mecab.rb
CHANGED
@@ -58,6 +58,9 @@ class TestMeCab < Test::Unit::TestCase
 res = Natto::MeCab.build_options_str(:unk_feature=>'%m\t%f[7]\n')
 assert_equal('--unk-feature=%m\t%f[7]\n', res)
 
+res = Natto::MeCab.build_options_str(:input_buffer_size=>102400)
+assert_equal('--input-buffer-size=102400', res)
+
 res = Natto::MeCab.build_options_str(:allocate_sentence=>true)
 assert_equal('--allocate-sentence', res)
 
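The assertions above pin down how an options hash maps onto MeCab's long command-line flags: underscores become dashes, valued options become `--name=value`, and `true` becomes a bare flag. The following is a hypothetical re-implementation that satisfies those assertions, not natto's own `build_options_str`. Ignoring unsupported keys, dropping `false`/`nil`, joining with a single space, and the shortened `SUPPORTED_OPTS` list are all assumptions of this sketch.

```ruby
# Hypothetical subset of natto's SUPPORTED_OPTS (the real list is longer).
SUPPORTED_OPTS = [:unk_feature, :input_buffer_size, :allocate_sentence,
                  :nbest, :theta, :cost_factor].freeze

# Sketch of an options-hash => MeCab option-string conversion.
# Assumed behaviour: unsupported keys are skipped, true yields a bare flag,
# false/nil are dropped, and multiple flags are joined with one space.
def build_options_str(options = {})
  options.each_with_object([]) do |(key, value), parts|
    next unless SUPPORTED_OPTS.include?(key)
    next if value.nil? || value == false

    flag = "--#{key.to_s.tr('_', '-')}"
    parts << (value == true ? flag : "#{flag}=#{value}")
  end.join(' ')
end

puts build_options_str(:input_buffer_size=>102400, :allocate_sentence=>true)
```

Against natto itself, the same setting would be passed through the constructor's options hash, e.g. `Natto::MeCab.new(:input_buffer_size=>102400)`.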
metadata
CHANGED
@@ -1,12 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: natto
 version: !ruby/object:Gem::Version
-
+hash: 15
+prerelease:
 segments:
 - 0
--
+- 4
 - 0
-version: 0.
+version: 0.4.0
 platform: ruby
 authors:
 - Brooke M. Fujita
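The `segments` entries under the serialized `Gem::Version` correspond to the version number split into its numeric parts, which RubyGems exposes as `Gem::Version#segments`. A short sketch using the two version numbers from this diff:

```ruby
require 'rubygems'

old_version = Gem::Version.new('0.3.0')
new_version = Gem::Version.new('0.4.0')

p new_version.segments      # prints [0, 4, 0]
p new_version.prerelease?   # prints false
p new_version > old_version # prints true
```

Comparing `Gem::Version` objects, rather than raw strings, orders versions by their segments numerically, which is how RubyGems decides that 0.4.0 supersedes 0.3.0.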
@@ -14,7 +15,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2011-01-
+date: 2011-01-26 00:00:00 +09:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
@@ -25,6 +26,7 @@ dependencies:
 requirements:
 - - ">="
 - !ruby/object:Gem::Version
+hash: 1
 segments:
 - 0
 - 6
@@ -34,7 +36,7 @@ dependencies:
 version_requirements: *id001
 description: |
 natto is a gem bridging Ruby and MeCab using FFI (foreign function interface).
-No compilation is necessary, and natto works on any platform and on any OS.
+No compilation is necessary, and natto works on any Ruby platform and on any OS.
 
 Find out more about natto by visiting the
 project homepage at http://code.google.com/p/natto/
@@ -54,8 +56,9 @@ files:
 - test/natto/tc_binding.rb
 - test/natto/tc_dictionaryinfo.rb
 - test/natto/tc_mecab.rb
-- LICENSE
 - README.md
+- LICENSE
+- CHANGELOG
 - .yardopts
 has_rdoc: true
 homepage: http://code.google.com/p/natto/
@@ -71,6 +74,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
 requirements:
 - - ">="
 - !ruby/object:Gem::Version
+hash: 57
 segments:
 - 1
 - 8
@@ -81,6 +85,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 requirements:
 - - ">="
 - !ruby/object:Gem::Version
+hash: 3
 segments:
 - 0
 version: "0"
@@ -88,7 +93,7 @@ requirements:
 - MeCab, 0.98 or greater
 - FFI, 0.6.3 or greater
 rubyforge_project:
-rubygems_version: 1.
+rubygems_version: 1.4.2
 signing_key:
 specification_version: 3
 summary: natto combines the Ruby programming language with MeCab, the part-of-speech and morphological analyzer for the Japanese language.
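The `requirements` entries in the gem metadata ("MeCab, 0.98 or greater" and "FFI, 0.6.3 or greater") are informational text; RubyGems does not enforce them at install time. A hypothetical pre-flight check could express them as `Gem::Requirement` objects. The `unmet_requirements` helper and the installed version numbers passed to it are placeholders for illustration only.

```ruby
require 'rubygems'

# The free-text requirements from the metadata, expressed as
# Gem::Requirement objects (hypothetical check, not part of natto).
MINIMUMS = {
  'MeCab' => Gem::Requirement.new('>= 0.98'),
  'FFI'   => Gem::Requirement.new('>= 0.6.3')
}.freeze

# Returns the names whose installed version is missing or too old.
# +installed+ maps a name to a version string (placeholder values below).
def unmet_requirements(installed)
  MINIMUMS.reject do |name, requirement|
    version = installed[name]
    version && requirement.satisfied_by?(Gem::Version.new(version))
  end.keys
end

p unmet_requirements('MeCab' => '0.98', 'FFI' => '1.0.5') # prints []
p unmet_requirements('MeCab' => '0.97', 'FFI' => '1.0.5') # prints ["MeCab"]
```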