natto 0.3.0 → 0.4.0
- data/.yardopts +1 -0
- data/CHANGELOG +60 -0
- data/LICENSE +1 -1
- data/README.md +9 -78
- data/lib/natto.rb +24 -20
- data/lib/natto/version.rb +1 -1
- data/test/natto/tc_dictionaryinfo.rb +2 -1
- data/test/natto/tc_mecab.rb +3 -0
- metadata +12 -7
data/.yardopts
CHANGED
data/CHANGELOG
ADDED
@@ -0,0 +1,60 @@
+ ## CHANGELOG
+
+ - __2011/01/26__: 0.4.0 release.
+ - Added support for mecab option input-buffer-size
+ - Adding CHANGELOG file
+ - Continuing update of documentation
+
+ - __2011/01/22__: 0.3.0 release.
+ - Refactoring of Natto::Binding to make mecab methods available as class methods
+ - Refactoring of Natto::DictionaryInfo to override to_s method to return filename
+ - Refactoring of Natto::MeCab to use class methods in Natto::Binding
+ - Refactoring and logical separation of test cases
+ - Continuing update of documentation
+
+ - __2011/01/19__: 0.2.0 release.
+ - Added support for mecab option allocate-sentence
+ - Continuing update of documentation
+
+ - __2011/01/15__: 0.1.1 release.
+ - Refactored Natto::DictionaryInfo#method_missing
+ - Continuing update of documentation
+
+ - __2011/01/15__: 0.1.0 release.
+ - Added accessors to Natto::DictionaryInfo
+ - Added accessor for version in Natto::MeCab
+ - Continuing update of documentation
+
+ - __2011/01/13__: 0.0.9 release.
+ - Further development and testing for mecab dictionary access/destruction
+ - Continuing update of documentation
+
+ - __2011/01/07__: 0.0.8 release.
+ - Adding support for accessing dictionaries
+ - Further tweaking of documentation with markdown
+
+ - __2010/12/30__: 0.0.7 release.
+ - Adding support for all-morphs and partial options
+ - Further updating of documentation with markdown
+
+ - __2010/12/28__: 0.0.6 release.
+ - Correction to natto.gemspec to include lib/natto/binding.rb
+
+ - __2010/12/28__: 0.0.5 release. (yanked)
+ - On-going refactoring
+ - Project structure refactored for greater maintainability
+
+ - __2010/12/26__: 0.0.4 release.
+ - On-going refactoring
+
+ - __2010/12/23__: 0.0.3 release.
+ - On-going refactoring
+ - Adding documentation via yard
+
+ - __2010/12/20__: 0.0.2 release.
+ - Continuing development on proper resource deallocation
+ - Adding options hash in object initializer
+
+ - __2010/12/13__: Released version 0.0.1. The objective is to provide
+ an easy-to-use, production-level Ruby binding to MeCab.
+ - Initial release
data/LICENSE
CHANGED
data/README.md
CHANGED
@@ -4,6 +4,12 @@ A Tasty Ruby Binding with MeCab
  ## What is natto?
  natto combines the [Ruby programming language](http://www.ruby-lang.org/) with [MeCab](http://mecab.sourceforge.net/), the part-of-speech and morphological analyzer for the Japanese language.

+ natto is a gem bridging Ruby and MeCab using FFI (foreign function interface). No compilation is necessary, and natto will run on CRuby (mri/yarv) and JRuby (jvm) equally well, on any OS.
+
+ You can learn more about [natto at Google Code Projects](http://code.google.com/p/natto/).
+
+ Comments and questions are welcome at the [natto-users Group](http://groups.google.com/group/natto-users).
+
  ## Requirements
  natto requires the following:

@@ -20,7 +26,7 @@ Install natto with the following gem command:
  - In case of <tt>LoadError</tt>, please set the <tt>MECAB_PATH</tt> environment variable to the exact name/path to your <tt>mecab</tt> library.

  e.g., for bash on UNIX/Linux
- export MECAB_PATH
+ export MECAB_PATH=/usr/local/lib/libmecab.so
  e.g., on Windows
  set MECAB_PATH=C:\Program Files\MeCab\bin\libmecab.dll
  e.g., for Cygwin
@@ -71,82 +77,7 @@ e.g., for Cygwin
  - Please try not to mess with the Rakefile, version, or history. If you must have your own version, that is fine, but please isolate to its own commit so I can cherry-pick around it.

  ## Changelog
-
- - __2011/01/22: 0.3.0 release.
- - Refactoring of Natto::Binding to make mecab methods available as class methods
- - Refactoring of Natto::DictionaryInfo to override to_s method to return filename
- - Refactoring of Natto::MeCab to use class methods in Natto::Binding
- - Refactoring and logical separation of test cases
- - Continuing update of documentation
-
- - __2011/01/19__: 0.2.0 release.
- - Added support for mecab option allocate-sentence
- - Continuing update of documentation
-
- - __2011/01/15__: 0.1.1 release.
- - Refactored Natto::DictionaryInfo#method_missing
- - Continuing update of documentation
-
- - __2011/01/15__: 0.1.0 release.
- - Added accessors to Natto::DictionaryInfo
- - Added accessor for version in Natto::MeCab
- - Continuing update of documentation
-
- - __2011/01/13__: 0.0.9 release.
- - Further development and testing for mecab dictionary access/destruction
- - Continuing update of documentation
-
- - __2011/01/07__: 0.0.8 release.
- - Adding support for accessing dictionaries
- - Further tweaking of documentation with markdown
-
- - __2010/12/30__: 0.0.7 release.
- - Adding support for all-morphs and partial options
- - Further updating of documentation with markdown
-
- - __2010/12/28__: 0.0.6 release.
- - Correction to natto.gemspec to include lib/natto/binding.rb
-
- - __2010/12/28__: 0.0.5 release. (yanked)
- - On-going refactoring
- - Project structure refactored for greater maintainability
-
- - __2010/12/26__: 0.0.4 release.
- - On-going refactoring
-
- - __2010/12/23__: 0.0.3 release.
- - On-going refactoring
- - Adding documentation via yard
-
- - __2010/12/20__: 0.0.2 release.
- - Continuing development on proper resource deallocation
- - Adding options hash in object initializer
-
- - __2010/12/13__: Released version 0.0.1. The objective is to provide
- an easy-to-use, production-level Ruby binding to MeCab.
- - Initial release
+ Please see the {file:CHANGELOG} for this gem's release history.

  ## Copyright
- Copyright ©
- All rights reserved.
-
- Redistribution and use in source and binary forms, with or without modification, are
- permitted provided that the following conditions are met:
-
- * Redistributions of source code must retain the above
- copyright notice, this list of conditions and the
- following disclaimer.
-
- * Redistributions in binary form must reproduce the above
- copyright notice, this list of conditions and the
- following disclaimer in the documentation and/or other
- materials provided with the distribution.
-
- THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
- WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
- PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
- ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
- INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
- TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
- ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ Copyright © 2011, Brooke M. Fujita. All rights reserved. Please see the {file:LICENSE} file for further details.
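The `MECAB_PATH` instructions in the README diff above suggest that the binding prefers an explicit library path from the environment and otherwise falls back to a default library name. The following is a minimal sketch of that kind of lookup, not natto's actual code: the method name `mecab_library_name` and the default `'mecab'` are assumptions made for illustration, and natto's real lookup is not shown in this diff.

```ruby
# Hypothetical lookup, for illustration only.
# Returns the value of MECAB_PATH when it is set and non-empty,
# otherwise the assumed default library name.
def mecab_library_name(env = ENV, default = 'mecab')
  path = env['MECAB_PATH']
  (path.nil? || path.empty?) ? default : path
end

puts mecab_library_name({ 'MECAB_PATH' => '/usr/local/lib/libmecab.so' })
puts mecab_library_name({})
```

Passing the environment in as a parameter keeps the sketch easy to exercise without touching the real process environment.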
data/lib/natto.rb
CHANGED
@@ -14,21 +14,23 @@ module Natto
  # require 'rubygems' if RUBY_VERSION.to_f < 1.9
  # require 'natto'
  #
- # mecab = Natto::MeCab.new
- # => #<Natto::MeCab:
- # @options={}, \
+ # mecab = Natto::MeCab.new(:output_format_type=>'wakati')
+ # => #<Natto::MeCab:0x28dd471c @ptr=#<FFI::Pointer address=0x28a027d8>, \
+ # @options={:output_format_type=>"wakati"}, \
  # @version="0.98", \
  # @dicts=[/usr/local/lib/mecab/dic/ipadic/sys.dic]>
  #
- #
- #
- #
- #
- #
- #
- #
- #
- #
+ # output = mecab.parse('ネバネバの組み合わせ美味しいです。').split
+ #
+ # output.each do |token|
+ # puts token
+ # end
+ # => ネバネバ
+ # の
+ # 組み合わせ
+ # 美味しい
+ # です
+ # 。
  #
  class MeCab
  include Natto::Binding
@@ -40,7 +42,8 @@ module Natto
  SUPPORTED_OPTS = [ :rcfile, :dicdir, :userdic, :lattice_level, :all_morphs,
  :output_format_type, :partial, :node_format, :unk_format,
  :bos_format, :eos_format, :eon_format, :unk_feature,
- :allocate_sentence, :nbest, :theta,
+ :input_buffer_size, :allocate_sentence, :nbest, :theta,
+ :cost_factor ].freeze

  # Initializes the wrapped <tt>mecab</tt> instance with the
  # given <tt>options</tt> hash.
@@ -60,6 +63,7 @@ module Natto
  # - :eos_format -- user-defined end-of-sentence format
  # - :eon_format -- user-defined end-of-NBest format
  # - :unk_feature -- feature for unknown word
+ # - :input_buffer_size -- set input buffer size (default 8192)
  # - :allocate_sentence -- allocate new memory for input sentence
  # - :nbest -- output N best results (integer, default 1)
  # - :theta -- temperature parameter theta (float, default 0.75)
@@ -98,8 +102,8 @@ module Natto
  raise MeCabError.new("Could not initialize MeCab with options: '#{opt_str}'") if @ptr.address == 0x0

  @dicts << Natto::DictionaryInfo.new(Natto::Binding.mecab_dictionary_info(@ptr))
- while @dicts.last
- @dicts << Natto::DictionaryInfo.new(@dicts.last
+ while @dicts.last.next.address != 0x0
+ @dicts << Natto::DictionaryInfo.new(@dicts.last.next)
  end

  @version = self.mecab_version
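The loop in this hunk walks a chain of dictionary-info structs, following each `next` pointer until its address is `0x0`. A minimal sketch of the same traversal in plain Ruby, with no FFI or MeCab dependency, might look like the following. `FakePointer`, `FakeDictionaryInfo`, and `collect_dicts` are hypothetical stand-ins introduced here for illustration; they are not natto or FFI classes.

```ruby
# Hypothetical stand-ins for FFI::Pointer and Natto::DictionaryInfo,
# used only to illustrate the null-terminated traversal.
FakePointer = Struct.new(:address, :target)
NULL_POINTER = FakePointer.new(0x0, nil)

FakeDictionaryInfo = Struct.new(:filename, :next)

# Follows the `next` pointers from the first dictionary and collects
# every entry, stopping when the pointer address is 0x0.
def collect_dicts(first)
  dicts = [first]
  while dicts.last.next.address != 0x0
    dicts << dicts.last.next.target
  end
  dicts
end

user_dic = FakeDictionaryInfo.new('user.dic', NULL_POINTER)
sys_dic  = FakeDictionaryInfo.new('sys.dic', FakePointer.new(0x1000, user_dic))

puts collect_dicts(sys_dic).map(&:filename).inspect
```

Checking the address of `next` before dereferencing it is what keeps the loop from wrapping a null pointer in a new struct.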
@@ -229,13 +233,13 @@ module Natto
  # @raise [NoMethodError] if <tt>attr_name</tt> is not a member of this <tt>mecab</tt> dictionary <tt>FFI::Struct</tt>
  def method_missing(attr_name)
  member_sym = attr_name.id2name.to_sym
- if self.members.include?(member_sym)
-
- else
- raise(NoMethodError.new("undefined method '#{attr_name}' for #{self}"))
- end
+ return self[member_sym] if self.members.include?(member_sym)
+ raise(NoMethodError.new("undefined method '#{attr_name}' for #{self}"))
  end

+ # Returns the full-path file name for this dictionary. Overrides <tt>Object#to_s</tt>.
+ #
+ # @return [String] full-path filename for this dictionary
  def to_s
  self[:filename]
  end
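The rewritten `method_missing` in this hunk uses a guard clause: return the struct member when the name matches, otherwise raise `NoMethodError`. The sketch below reproduces that pattern on a Hash-backed class so it can run without FFI. `MemberBag` is a hypothetical stand-in for the `FFI::Struct` subclass, and the `respond_to_missing?` override is an addition that does not appear in the natto diff.

```ruby
# Hypothetical Hash-backed stand-in for an FFI::Struct subclass.
class MemberBag
  def initialize(values)
    @values = values
  end

  # Plays the role of FFI::Struct#members: the list of known field names.
  def members
    @values.keys
  end

  def [](name)
    @values[name]
  end

  # Guard-clause form: return early for known members,
  # otherwise fall through to the error.
  def method_missing(attr_name, *args)
    member_sym = attr_name.to_sym
    return self[member_sym] if members.include?(member_sym)
    raise NoMethodError, "undefined method '#{attr_name}' for #{self}"
  end

  # Not part of the natto diff; keeps respond_to? consistent with
  # what method_missing actually answers.
  def respond_to_missing?(attr_name, include_private = false)
    members.include?(attr_name.to_sym) || super
  end

  def to_s
    self[:filename]
  end
end

sysdic = MemberBag.new(filename: '/usr/local/lib/mecab/dic/ipadic/sys.dic', charset: 'utf8')
puts sysdic.charset
```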
data/lib/natto/version.rb
CHANGED
data/test/natto/tc_dictionaryinfo.rb
CHANGED
@@ -23,6 +23,7 @@ class TestDictionaryInfo < Test::Unit::TestCase
  assert_equal('/usr/local/lib/mecab/dic/ipadic/sys.dic', sysdic[:filename])
  assert_equal('utf8', sysdic[:charset])
  assert_equal(0x0, sysdic[:next].address)
+ #assert_nil(sysdic.next)
  end

  # Tests the to_s method.
@@ -42,7 +43,7 @@ class TestDictionaryInfo < Test::Unit::TestCase

  # NoMethodError will be raised for anything else!
  assert_raise NoMethodError do
- sysdic.send :
+ sysdic.send :unknown_attr
  end
  end
  end
data/test/natto/tc_mecab.rb
CHANGED
@@ -58,6 +58,9 @@ class TestMeCab < Test::Unit::TestCase
  res = Natto::MeCab.build_options_str(:unk_feature=>'%m\t%f[7]\n')
  assert_equal('--unk-feature=%m\t%f[7]\n', res)

+ res = Natto::MeCab.build_options_str(:input_buffer_size=>102400)
+ assert_equal('--input-buffer-size=102400', res)
+
  res = Natto::MeCab.build_options_str(:allocate_sentence=>true)
  assert_equal('--allocate-sentence', res)

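The assertions in this hunk pin down how an options hash maps to a mecab option string: underscores become hyphens, valued options are rendered as `--name=value`, and `:allocate_sentence=>true` becomes a bare flag. The sketch below is one way to satisfy those three assertions. It is not natto's `build_options_str`; the helper name `build_options_sketch` and the `BOOLEAN_FLAGS` list are assumptions made for illustration.

```ruby
# Hypothetical helper; not natto's actual build_options_str.
# Options listed here take no value and are emitted only when truthy.
BOOLEAN_FLAGS = [:all_morphs, :partial, :allocate_sentence].freeze

def build_options_sketch(options = {})
  options.map do |key, value|
    flag = "--#{key.to_s.tr('_', '-')}"
    if BOOLEAN_FLAGS.include?(key)
      value ? flag : nil
    else
      "#{flag}=#{value}"
    end
  end.compact.join(' ')
end

puts build_options_sketch(:input_buffer_size => 102400)
puts build_options_sketch(:allocate_sentence => true)
```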
metadata
CHANGED
@@ -1,12 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: natto
  version: !ruby/object:Gem::Version
-
+ hash: 15
+ prerelease:
  segments:
  - 0
- -
+ - 4
  - 0
- version: 0.
+ version: 0.4.0
  platform: ruby
  authors:
  - Brooke M. Fujita
@@ -14,7 +15,7 @@ autorequire:
  bindir: bin
  cert_chain: []

- date: 2011-01-
+ date: 2011-01-26 00:00:00 +09:00
  default_executable:
  dependencies:
  - !ruby/object:Gem::Dependency
@@ -25,6 +26,7 @@ dependencies:
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
+ hash: 1
  segments:
  - 0
  - 6
@@ -34,7 +36,7 @@ dependencies:
  version_requirements: *id001
  description: |
  natto is a gem bridging Ruby and MeCab using FFI (foreign function interface).
- No compilation is necessary, and natto works on any platform and on any OS.
+ No compilation is necessary, and natto works on any Ruby platform and on any OS.

  Find out more about natto by visiting the
  project homepage at http://code.google.com/p/natto/
@@ -54,8 +56,9 @@ files:
  - test/natto/tc_binding.rb
  - test/natto/tc_dictionaryinfo.rb
  - test/natto/tc_mecab.rb
- - LICENSE
  - README.md
+ - LICENSE
+ - CHANGELOG
  - .yardopts
  has_rdoc: true
  homepage: http://code.google.com/p/natto/
@@ -71,6 +74,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
+ hash: 57
  segments:
  - 1
  - 8
@@ -81,6 +85,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
+ hash: 3
  segments:
  - 0
  version: "0"
@@ -88,7 +93,7 @@ requirements:
  - MeCab, 0.98 or greater
  - FFI, 0.6.3 or greater
  rubyforge_project:
- rubygems_version: 1.
+ rubygems_version: 1.4.2
  signing_key:
  specification_version: 3
  summary: natto combines the Ruby programming language with MeCab, the part-of-speech and morphological analyzer for the Japanese language.