natto 0.3.0 → 0.4.0

data/.yardopts CHANGED
@@ -4,3 +4,4 @@
  --markup-provider rdiscount
  -
  LICENSE
+ CHANGELOG
data/CHANGELOG ADDED
@@ -0,0 +1,60 @@
+ ## CHANGELOG
+
+ - __2011/01/26__: 0.4.0 release.
+ - Added support for mecab option input-buffer-size
+ - Adding CHANGELOG file
+ - Continuing update of documentation
+
+ - __2011/01/22__: 0.3.0 release.
+ - Refactoring of Natto::Binding to make mecab methods available as class methods
+ - Refactoring of Natto::DictionaryInfo to override to_s method to return filename
+ - Refactoring of Natto::MeCab to use class methods in Natto::Binding
+ - Refactoring and logical separation of test cases
+ - Continuing update of documentation
+
+ - __2011/01/19__: 0.2.0 release.
+ - Added support for mecab option allocate-sentence
+ - Continuing update of documentation
+
+ - __2011/01/15__: 0.1.1 release.
+ - Refactored Natto::DictionaryInfo#method_missing
+ - Continuing update of documentation
+
+ - __2011/01/15__: 0.1.0 release.
+ - Added accessors to Natto::DictionaryInfo
+ - Added accessor for version in Natto::MeCab
+ - Continuing update of documentation
+
+ - __2011/01/13__: 0.0.9 release.
+ - Further development and testing for mecab dictionary access/destruction
+ - Continuing update of documentation
+
+ - __2011/01/07__: 0.0.8 release.
+ - Adding support for accessing dictionaries
+ - Further tweaking of documentation with markdown
+
+ - __2010/12/30__: 0.0.7 release.
+ - Adding support for all-morphs and partial options
+ - Further updating of documentation with markdown
+
+ - __2010/12/28__: 0.0.6 release.
+ - Correction to natto.gemspec to include lib/natto/binding.rb
+
+ - __2010/12/28__: 0.0.5 release. (yanked)
+ - On-going refactoring
+ - Project structure refactored for greater maintainability
+
+ - __2010/12/26__: 0.0.4 release.
+ - On-going refactoring
+
+ - __2010/12/23__: 0.0.3 release.
+ - On-going refactoring
+ - Adding documentation via yard
+
+ - __2010/12/20__: 0.0.2 release.
+ - Continuing development on proper resource deallocation
+ - Adding options hash in object initializer
+
+ - __2010/12/13__: Released version 0.0.1. The objective is to provide
+ an easy-to-use, production-level Ruby binding to MeCab.
+ - Initial release
data/LICENSE CHANGED
@@ -1,4 +1,4 @@
- Copyright © 2010-2013, Brooke M. Fujita.
+ Copyright © 2011, Brooke M. Fujita.
  All rights reserved.
  
  Redistribution and use in source and binary forms, with or without modification, are
data/README.md CHANGED
@@ -4,6 +4,12 @@ A Tasty Ruby Binding with MeCab
  ## What is natto?
  natto combines the [Ruby programming language](http://www.ruby-lang.org/) with [MeCab](http://mecab.sourceforge.net/), the part-of-speech and morphological analyzer for the Japanese language.
  
+ natto is a gem bridging Ruby and MeCab using FFI (foreign function interface). No compilation is necessary, and natto will run on CRuby (mri/yarv) and JRuby (jvm) equally well, on any OS.
+
+ You can learn more about [natto at Google Code Projects](http://code.google.com/p/natto/).
+
+ Comments and questions are welcome at the [natto-users Group](http://groups.google.com/group/natto-users).
+
  ## Requirements
  natto requires the following:
  
@@ -20,7 +26,7 @@ Install natto with the following gem command:
  - In case of <tt>LoadError</tt>, please set the <tt>MECAB_PATH</tt> environment variable to the exact name/path to your <tt>mecab</tt> library.
  
  e.g., for bash on UNIX/Linux
- export MECAB_PATH=mecab.so
+ export MECAB_PATH=/usr/local/lib/libmecab.so
  e.g., on Windows
  set MECAB_PATH=C:\Program Files\MeCab\bin\libmecab.dll
  e.g., for Cygwin
@@ -71,82 +77,7 @@ e.g., for Cygwin
  - Please try not to mess with the Rakefile, version, or history. If you must have your own version, that is fine, but please isolate to its own commit so I can cherry-pick around it.
  
  ## Changelog
-
- - __2011/01/22: 0.3.0 release.
- - Refactoring of Natto::Binding to make mecab methods available as class methods
- - Refactoring of Natto::DictionaryInfo to override to_s method to return filename
- - Refactoring of Natto::MeCab to use class methods in Natto::Binding
- - Refactoring and logical separation of test cases
- - Continuing update of documentation
-
- - __2011/01/19__: 0.2.0 release.
- - Added support for mecab option allocate-sentence
- - Continuing update of documentation
-
- - __2011/01/15__: 0.1.1 release.
- - Refactored Natto::DictionaryInfo#method_missing
- - Continuing update of documentation
-
- - __2011/01/15__: 0.1.0 release.
- - Added accessors to Natto::DictionaryInfo
- - Added accessor for version in Natto::MeCab
- - Continuing update of documentation
-
- - __2011/01/13__: 0.0.9 release.
- - Further development and testing for mecab dictionary access/destruction
- - Continuing update of documentation
-
- - __2011/01/07__: 0.0.8 release.
- - Adding support for accessing dictionaries
- - Further tweaking of documentation with markdown
-
- - __2010/12/30__: 0.0.7 release.
- - Adding support for all-morphs and partial options
- - Further updating of documentation with markdown
-
- - __2010/12/28__: 0.0.6 release.
- - Correction to natto.gemspec to include lib/natto/binding.rb
-
- - __2010/12/28__: 0.0.5 release. (yanked)
- - On-going refactoring
- - Project structure refactored for greater maintainability
-
- - __2010/12/26__: 0.0.4 release.
- - On-going refactoring
-
- - __2010/12/23__: 0.0.3 release.
- - On-going refactoring
- - Adding documentation via yard
-
- - __2010/12/20__: 0.0.2 release.
- - Continuing development on proper resource deallocation
- - Adding options hash in object initializer
-
- - __2010/12/13__: Released version 0.0.1. The objective is to provide
- an easy-to-use, production-level Ruby binding to MeCab.
- - Initial release
+ Please see the {file:CHANGELOG} for this gem's release history.
  
  ## Copyright
- Copyright &copy; 2010-2013, Brooke M. Fujita.
- All rights reserved.
-
- Redistribution and use in source and binary forms, with or without modification, are
- permitted provided that the following conditions are met:
-
- * Redistributions of source code must retain the above
- copyright notice, this list of conditions and the
- following disclaimer.
-
- * Redistributions in binary form must reproduce the above
- copyright notice, this list of conditions and the
- following disclaimer in the documentation and/or other
- materials provided with the distribution.
-
- THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
- WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
- PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
- ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
- INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
- TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
- ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ Copyright &copy; 2011, Brooke M. Fujita. All rights reserved. Please see the {file:LICENSE} file for further details.
data/lib/natto.rb CHANGED
@@ -14,21 +14,23 @@ module Natto
  # require 'rubygems' if RUBY_VERSION.to_f < 1.9
  # require 'natto'
  #
- # mecab = Natto::MeCab.new
- # => #<Natto::MeCab:0x289b88e0 @ptr=#<FFI::Pointer address=0x288865c8>, \
- # @options={}, \
+ # mecab = Natto::MeCab.new(:output_format_type=>'wakati')
+ # => #<Natto::MeCab:0x28dd471c @ptr=#<FFI::Pointer address=0x28a027d8>, \
+ # @options={:output_format_type=>"wakati"}, \
  # @version="0.98", \
  # @dicts=[/usr/local/lib/mecab/dic/ipadic/sys.dic]>
  #
- # puts mecab.parse("ネバネバの組み合わせ美味しいです。")
- # ネバネバ 名詞,サ変接続,*,*,*,*,ネバネバ,ネバネバ,ネバネバ
- # の 助詞,連体化,*,*,*,*,の,ノ,ノ
- # 組み合わせ 名詞,一般,*,*,*,*,組み合わせ,クミアワセ,クミアワセ
- # 美味しい 形容詞,自立,*,*,形容詞・イ段,基本形,美味しい,オイシイ,オイシイ
- # です 助動詞,*,*,*,特殊・デス,基本形,です,デス,デス
- # 。 デス記号,句点,*,*,*,*,。,。,。
- # EOS
- # => nil
+ # output = mecab.parse('ネバネバの組み合わせ美味しいです。').split
+ #
+ # output.each do |token|
+ # puts token
+ # end
+ # => ネバネバ
+ #
+ # 組み合わせ
+ # 美味しい
+ # です
+ # 。
  #
  class MeCab
  include Natto::Binding
@@ -40,7 +42,8 @@ module Natto
  SUPPORTED_OPTS = [ :rcfile, :dicdir, :userdic, :lattice_level, :all_morphs,
  :output_format_type, :partial, :node_format, :unk_format,
  :bos_format, :eos_format, :eon_format, :unk_feature,
- :allocate_sentence, :nbest, :theta, :cost_factor ].freeze
+ :input_buffer_size, :allocate_sentence, :nbest, :theta,
+ :cost_factor ].freeze
  
  # Initializes the wrapped <tt>mecab</tt> instance with the
  # given <tt>options</tt> hash.
@@ -60,6 +63,7 @@ module Natto
  # - :eos_format -- user-defined end-of-sentence format
  # - :eon_format -- user-defined end-of-NBest format
  # - :unk_feature -- feature for unknown word
+ # - :input_buffer_size -- set input buffer size (default 8192)
  # - :allocate_sentence -- allocate new memory for input sentence
  # - :nbest -- output N best results (integer, default 1)
  # - :theta -- temperature parameter theta (float, default 0.75)
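The options documented in this hunk are passed as a Ruby hash and turned into a mecab-style option string; the `build_options_str` tests in this diff expect `:input_buffer_size=>102400` to become `--input-buffer-size=102400` and `:allocate_sentence=>true` to become `--allocate-sentence`. A minimal sketch of that mapping follows. The helper name `sketch_options_str` is hypothetical and this is not natto's actual implementation, only one way the expected strings could be produced.

```ruby
# Hypothetical helper, NOT natto's real build_options_str.
# Maps an options hash to a mecab-style command-line option string.
def sketch_options_str(options = {})
  options.map do |key, value|
    # :input_buffer_size becomes --input-buffer-size
    flag = "--#{key.to_s.tr('_', '-')}"
    # A boolean true is emitted as a bare switch; anything else as --flag=value.
    value == true ? flag : "#{flag}=#{value}"
  end.join(' ')
end

puts sketch_options_str(:input_buffer_size => 102400)
puts sketch_options_str(:allocate_sentence => true)
```

The sketch does not validate keys; a real implementation would presumably check them against a list such as `SUPPORTED_OPTS`.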
@@ -98,8 +102,8 @@ module Natto
  raise MeCabError.new("Could not initialize MeCab with options: '#{opt_str}'") if @ptr.address == 0x0
  
  @dicts << Natto::DictionaryInfo.new(Natto::Binding.mecab_dictionary_info(@ptr))
- while @dicts.last[:next].address != 0x0
- @dicts << Natto::DictionaryInfo.new(@dicts.last[:next])
+ while @dicts.last.next.address != 0x0
+ @dicts << Natto::DictionaryInfo.new(@dicts.last.next)
  end
  
  @version = self.mecab_version
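The loop in this hunk collects dictionaries by following each entry's `next` pointer until it reaches a null address. The same traversal can be sketched without FFI. Here `SketchDict` and `next_node` are hypothetical stand-ins, and a `nil` successor plays the role of the `0x0` pointer.

```ruby
# Hypothetical node type standing in for a mecab dictionary struct;
# a nil next_node plays the role of the null pointer (address 0x0).
SketchDict = Struct.new(:filename, :next_node)

user = SketchDict.new('user.dic', nil)
sys  = SketchDict.new('sys.dic', user)

dicts = [sys]
# Keep appending until the last collected node has no successor.
dicts << dicts.last.next_node while dicts.last.next_node

puts dicts.map(&:filename).join(', ')
```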
@@ -229,13 +233,13 @@ module Natto
  # @raise [NoMethodError] if <tt>attr_name</tt> is not a member of this <tt>mecab</tt> dictionary <tt>FFI::Struct</tt>
  def method_missing(attr_name)
  member_sym = attr_name.id2name.to_sym
- if self.members.include?(member_sym)
- self[member_sym]
- else
- raise(NoMethodError.new("undefined method '#{attr_name}' for #{self}"))
- end
+ return self[member_sym] if self.members.include?(member_sym)
+ raise(NoMethodError.new("undefined method '#{attr_name}' for #{self}"))
  end
  
+ # Returns the full-path file name for this dictionary. Overrides <tt>Object#to_s</tt>.
+ #
+ # @return [String] full-path filename for this dictionary
  def to_s
  self[:filename]
  end
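The refactored `method_missing` above exposes struct members as reader methods and raises `NoMethodError` for anything else. A self-contained sketch of that pattern is shown below. `SketchDictionaryInfo` is a hypothetical hash-backed class used in place of an `FFI::Struct`, and the `respond_to_missing?` override is an addition that is not part of the diff.

```ruby
# Hypothetical stand-in for an FFI::Struct-backed class; the member list
# and values here are illustrative only.
class SketchDictionaryInfo
  MEMBERS = [:filename, :charset].freeze

  def initialize(values)
    @values = values
  end

  def members
    MEMBERS
  end

  def [](member)
    @values[member]
  end

  # Expose known members as reader methods; defer everything else to
  # the default behaviour, which raises NoMethodError.
  def method_missing(attr_name, *args)
    return self[attr_name] if members.include?(attr_name)
    super
  end

  # Keeps respond_to? consistent with method_missing (not in the diff).
  def respond_to_missing?(attr_name, include_private = false)
    members.include?(attr_name) || super
  end

  # Mirrors the to_s override: a dictionary prints as its filename.
  def to_s
    self[:filename]
  end
end

dict = SketchDictionaryInfo.new(:filename => '/usr/local/lib/mecab/dic/ipadic/sys.dic',
                                :charset  => 'utf8')
puts dict.charset
puts dict
```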
data/lib/natto/version.rb CHANGED
@@ -16,5 +16,5 @@
  # which are made available via <tt>FFI</tt> bindings to <tt>mecab</tt>.
  module Natto
  # Version string for this Rubygem.
- VERSION = "0.3.0"
+ VERSION = "0.4.0"
  end
data/test/natto/tc_dictionaryinfo.rb CHANGED
@@ -23,6 +23,7 @@ class TestDictionaryInfo < Test::Unit::TestCase
  assert_equal('/usr/local/lib/mecab/dic/ipadic/sys.dic', sysdic[:filename])
  assert_equal('utf8', sysdic[:charset])
  assert_equal(0x0, sysdic[:next].address)
+ #assert_nil(sysdic.next)
  end
  
  # Tests the to_s method.
@@ -42,7 +43,7 @@ class TestDictionaryInfo < Test::Unit::TestCase
  
  # NoMethodError will be raised for anything else!
  assert_raise NoMethodError do
- sysdic.send :nomethoderror
+ sysdic.send :unknown_attr
  end
  end
  end
data/test/natto/tc_mecab.rb CHANGED
@@ -58,6 +58,9 @@ class TestMeCab < Test::Unit::TestCase
  res = Natto::MeCab.build_options_str(:unk_feature=>'%m\t%f[7]\n')
  assert_equal('--unk-feature=%m\t%f[7]\n', res)
  
+ res = Natto::MeCab.build_options_str(:input_buffer_size=>102400)
+ assert_equal('--input-buffer-size=102400', res)
+
  res = Natto::MeCab.build_options_str(:allocate_sentence=>true)
  assert_equal('--allocate-sentence', res)
  
metadata CHANGED
@@ -1,12 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: natto
  version: !ruby/object:Gem::Version
- prerelease: false
+ hash: 15
+ prerelease:
  segments:
  - 0
- - 3
+ - 4
  - 0
- version: 0.3.0
+ version: 0.4.0
  platform: ruby
  authors:
  - Brooke M. Fujita
@@ -14,7 +15,7 @@ autorequire:
  bindir: bin
  cert_chain: []
  
- date: 2011-01-22 00:00:00 +09:00
+ date: 2011-01-26 00:00:00 +09:00
  default_executable:
  dependencies:
  - !ruby/object:Gem::Dependency
@@ -25,6 +26,7 @@ dependencies:
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
+ hash: 1
  segments:
  - 0
  - 6
@@ -34,7 +36,7 @@ dependencies:
  version_requirements: *id001
  description: |
  natto is a gem bridging Ruby and MeCab using FFI (foreign function interface).
- No compilation is necessary, and natto works on any platform and on any OS.
+ No compilation is necessary, and natto works on any Ruby platform and on any OS.
  
  Find out more about natto by visiting the
  project homepage at http://code.google.com/p/natto/
@@ -54,8 +56,9 @@ files:
  - test/natto/tc_binding.rb
  - test/natto/tc_dictionaryinfo.rb
  - test/natto/tc_mecab.rb
- - LICENSE
  - README.md
+ - LICENSE
+ - CHANGELOG
  - .yardopts
  has_rdoc: true
  homepage: http://code.google.com/p/natto/
@@ -71,6 +74,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
+ hash: 57
  segments:
  - 1
  - 8
@@ -81,6 +85,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
+ hash: 3
  segments:
  - 0
  version: "0"
@@ -88,7 +93,7 @@ requirements:
  - MeCab, 0.98 or greater
  - FFI, 0.6.3 or greater
  rubyforge_project:
- rubygems_version: 1.3.7
+ rubygems_version: 1.4.2
  signing_key:
  specification_version: 3
  summary: natto combines the Ruby programming language with MeCab, the part-of-speech and morphological analyzer for the Japanese language.