natto 0.9.9 → 1.0.0

@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: dda6a3b2c795f26a54307637aeaa14b44f2035f0
+   data.tar.gz: 53b0944ea5822e7e3ab2e38bb07e0bc44d8336de
+ SHA512:
+   metadata.gz: aa4d39e5e0b2a11449a55f7c002a257444af14d1e534316ede57f5d2aedb64d95e9b00f5f77396f6701068e4203873856c91fea0fbe07d9ddf949e2913aa5a99
+   data.tar.gz: db01e6275b7348af0c58adcf0f7c7f62a29549bea5cff29c3e15e001550deebd8c10c66d0c88ce2ed6d9715d5d0681c4b2e3a554becd4c8d73f506777fe83287
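The checksums above are hex digests recorded for `metadata.gz` and `data.tar.gz`. As a minimal sketch of how such a digest could be checked with Ruby's standard `digest` library: the helper name `digest_matches?` is hypothetical and is not part of RubyGems or natto, and the sample input is the well-known `abc` test vector rather than either of the files listed here.

```ruby
require 'digest'

# Hypothetical helper: true when the hex digest of +bytes+ under
# +algorithm+ (:sha1 or :sha512) equals +expected_hex+.
def digest_matches?(bytes, algorithm, expected_hex)
  klass = { sha1: Digest::SHA1, sha512: Digest::SHA512 }.fetch(algorithm)
  klass.hexdigest(bytes) == expected_hex.downcase
end

# SHA1 of the string "abc" is a standard published test vector.
puts digest_matches?('abc', :sha1, 'a9993e364706816aba3e25717850c26c9cd0d89d')
```

To check a real archive member, the bytes would presumably come from something like `File.binread('data.tar.gz')` after unpacking the `.gem` file.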
data/CHANGELOG CHANGED
@@ -1,5 +1,18 @@
  ## CHANGELOG
 
+ - __2015/04/14__: 1.0.0 release.
+     - Issue 36: Fixed @param documentation
+     - Issue 37: README and bullet points under Automatic Configuration
+     - Issue 38: Updated URLs in documentation to point to Ruby 2.2.1
+     - Issue 39: Make refs to MeCab and Tagger consistent in docs
+     - Issue 40: Use new Model- and Lattice-based C APIs internally
+     - Issue 45: Add support for feature constraint parsing
+     - Issue 48: Put in a guard to prevent partial parsing of text that does not end with a new-line char
+     - Issue 50: Update all references to Natto::MeCab in documentation to reflect new internal structure
+     - Issue 52: Downloads and license badges for README.md & API docs
+     - Issue 55: Node parsing with --all-morphs option missing surface values
+
+
  - __2015/03/31__: 0.9.9 release.
      - Issue 21/34: Implemented boundary constraint parsing.
      - Issue 26: Removing deprecated methods parse_as_nodes, parse_as_strings, readnodes and readlines.
data/README.md CHANGED
@@ -1,4 +1,4 @@
- # natto [![Gem Version](https://badge.fury.io/rb/natto.svg)](http://badge.fury.io/rb/natto) [![Build Status](https://travis-ci.org/buruzaemon/natto.svg?branch=master)](https://travis-ci.org/buruzaemon/natto)
+ # natto [![Gem Version](https://badge.fury.io/rb/natto.svg)](https://rubygems.org/gems/natto) [![Build Status](https://travis-ci.org/buruzaemon/natto.svg?branch=master)](https://travis-ci.org/buruzaemon/natto) [![Gem Downloads](https://img.shields.io/gem/dt/natto.svg)](https://rubygems.org/gems/natto) [![Gem License](https://img.shields.io/badge/license-BSD-blue.svg)]()
  A Tasty Ruby Binding with MeCab
 
  ## What is natto?
@@ -47,6 +47,7 @@ However, if you are using a CRuby on Windows, then you will first need to instal
 
  ## Automatic Configuration
  No explicit configuration should be necessary, as natto will try to locate the MeCab library based upon its runtime environment.
+
  - On OS X and \*nix, it will query `mecab-config --libs`
  - On Windows, it will query the Windows Registry to determine where `libmecab.dll` is installed
 
@@ -78,14 +79,16 @@ Instantiate a reference to the MeCab library, and display some details:
  require 'natto'
 
  nm = Natto::MeCab.new
- => #<Natto::MeCab:0x28d30748
- @tagger=#<FFI::Pointer address=0x28a97d50>, \
- @libpath="/usr/local/lib/libmecab.so", \
- @options={}, \
- @dicts=[#<Natto::DictionaryInfo:0x28d3061c \
+ => #<Natto::MeCab:0x00000803633ae8
+ @model=#<FFI::Pointer address=0x000008035d4640>, \
+ @tagger=#<FFI::Pointer address=0x00000802b07c90>, \
+ @lattice=#<FFI::Pointer address=0x00000803602f80>, \
+ @libpath="/usr/local/lib/libmecab.so", \
+ @options={}, \
+ @dicts=[#<Natto::DictionaryInfo:0x000008036337c8 \
  @filepath="/usr/local/lib/mecab/dic/ipadic/sys.dic", \
- charset=utf8, \
- type=0>] \
+ charset=utf8, \
+ type=0>] \
  @version=0.996>
 
  puts nm.version
@@ -167,7 +170,7 @@ nodes:
 
  For more complex parsing, such as that for natural language
  processing tasks, it is far more efficient to use `enum_parse` to
- obtain an [`Enumerator`](http://ruby-doc.org/core-2.2.0/Enumerator.html)
+ obtain an [`Enumerator`](http://ruby-doc.org/core-2.2.1/Enumerator.html)
  to iterate over the resulting `MeCabNode` instances. An `Enumerator`
  yields each `MeCabNode` instance without first materializing all
  instances at once, thus being more efficient.
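The README passage in this hunk says an `Enumerator` yields nodes one at a time instead of building them all up front. The following is a minimal sketch of that behaviour using only core Ruby; `Node` is a hypothetical stand-in for `MeCabNode`, and the hard-coded token list is an assumption made so the example runs without MeCab installed.

```ruby
# Hypothetical stand-in for MeCabNode; only the surface string is modelled.
Node = Struct.new(:surface)

created = [] # records which nodes have actually been built

# Each node is built only when the consumer asks for the next element.
nodes = Enumerator.new do |yielder|
  %w[心 の 中 で].each do |surface|
    created << surface
    yielder << Node.new(surface)
  end
end

first_two = nodes.take(2).map(&:surface)
puts first_two.inspect
puts created.inspect # only the two requested nodes were built
```

Because `take(2)` stops iterating once it has two elements, the remaining tokens are never turned into `Node` objects, which is the saving the README describes.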
@@ -226,11 +229,13 @@ back to the beginning, and then iterate over it.
  ----
 
  [Partial parsing](http://taku910.github.io/mecab/partial.html) allows you to
- pass hints to MeCab on how to tokenize morphemes when parsing. With boundary
- constraint parsing, you can specify either
+ pass hints to MeCab on how to tokenize morphemes when parsing. Most useful are
+ boundary constraint parsing and feature constraint parsing.
+
+ With boundary constraint parsing, you can specify either
  a [Regexp](http://ruby-doc.org/core-2.2.1/Regexp.html) or
  [String](http://ruby-doc.org/core-2.2.1/String.html) to tell MeCab where the
- boundaries of a morpheme should be. Use the new `boundary_constraints` keyword.
+ boundaries of a morpheme should be. Use the `boundary_constraints` keyword.
  For hints on tokenization, please see
  [String#scan](http://ruby-doc.org/core-2.2.1/String.html#method-i-scan)
 
@@ -242,6 +247,7 @@ the resulting `MeCabNode` feature attribute to extract:
  - `%s` - node `stat` status value, 1 is `unknown`
 
  Note that any such morphemes captured will have node `stat` status of unknown.
+ Also note that MeCab will tag such nodes as a noun.
 
  nm = Natto::MeCab.new('-F%m,\s%f[0],\s%s')
 
@@ -268,6 +274,37 @@ Note that any such morphemes captured will have node `stat` status of unknown.
  ヒーロー見参, 名詞, 1
  !, 記号, 0
 
+ With feature constraint parsing, you can provide instructions to MeCab on
+ what feature to use for a matching morpheme. Use the `feature_constraints`
+ keyword to pass in a hash mapping a specific morpheme key (String)
+ to a corresponding feature (String).
+
+ # we re-use nm and text from above
+
+ nm.options
+ => {:node_format=>"%m,\\s%f[0],\\s%s"}
+
+ mapping = {"ヒーロー見参"=>"その他"}
+
+ nm.enum_parse(text, feature_constraints: mapping).each do |n|
+ puts n.feature if !(n.is_bos? || n.is_eos?)
+ end
+
+ # ヒーロー見参 will be treated as a single morpheme mapping to その他
+ 心, 名詞, 0
+ の, 助詞, 0
+ 中, 名詞, 0
+ で, 助詞, 0
+ 3, 名詞, 1
+ 回, 名詞, 0
+ 唱え, 動詞, 0
+ 、, 記号, 0
+ ヒーロー見参, その他, 1
+ !, 記号, 0
+ ヒーロー見参, その他, 1
+ !, 記号, 0
+ ヒーロー見参, その他, 1
+ !, 記号, 0
 
 
  ## Learn more
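The README addition above maps morpheme keys to features. The binding added later in this diff, `mecab_lattice_set_feature_constraint`, takes a begin position, an end position and a feature string. A rough sketch of how a `feature_constraints` hash could be turned into such triples follows. It is not natto's actual implementation: the helper name `feature_spans` is hypothetical, and treating the positions as byte offsets with an exclusive end is an assumption.

```ruby
# Hypothetical helper: find every occurrence of each key in +text+ and
# return [begin_byte, end_byte, feature] triples, sorted by position.
# Byte offsets with an exclusive end are an assumption of this sketch.
def feature_spans(text, mapping)
  spans = []
  mapping.each do |key, feature|
    next if key.empty? # an empty key would match at every position
    pattern = Regexp.new(Regexp.escape(key))
    pos = 0
    while (m = pattern.match(text, pos))
      # m.begin(0) is a character index; convert it to a byte offset.
      byte_begin = text[0, m.begin(0)].bytesize
      spans << [byte_begin, byte_begin + key.bytesize, feature]
      pos = m.end(0)
    end
  end
  spans.sort
end

text = 'ヒーロー見参!ヒーロー見参!'
p feature_spans(text, 'ヒーロー見参' => 'その他')
```

Each returned triple could then presumably be passed to the lattice before parsing. The conversion from character index to byte offset matters for Japanese text, where each of these characters occupies three bytes in UTF-8.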
@@ -1,3 +1,4 @@
+ # coding: utf-8
  require 'natto/natto'
 
  # Copyright (c) 2015, Brooke M. Fujita.
@@ -2,16 +2,12 @@
  module Natto
 
    # Module `Binding` encapsulates methods and behavior
-   # which are made available via `FFI` bindings to
-   # `mecab`.
+   # which are made available via `FFI` bindings to MeCab.
    module Binding
      require 'ffi'
      require 'rbconfig'
      extend FFI::Library
 
-     # String name for the environment variable used by
-     # `Natto` to indicate the absolute pathname
-     # to the `mecab` library.
      MECAB_PATH = 'MECAB_PATH'.freeze
 
      # @private
@@ -19,89 +15,91 @@ module Natto
        base.extend(ClassMethods)
      end
 
-     # Returns the absolute pathname to the `mecab` library based on
+     # Returns the absolute pathname to the MeCab library based on
      # the runtime environment.
-     #
-     # @return [String] absolute pathname to the `mecab` library
+     # @return [String] absolute pathname to the MeCab library
      # @raise [LoadError] if the library cannot be located
      def self.find_library
-       return File.absolute_path(ENV[MECAB_PATH]) if ENV[MECAB_PATH]
-
-       host_os = RbConfig::CONFIG['host_os']
-
-       if host_os =~ /mswin|mingw/i
-         require 'win32/registry'
-         begin
-           base = nil
-           Win32::Registry::HKEY_CURRENT_USER.open('Software\MeCab') do |r|
-             base = r['mecabrc'].split('etc').first
-           end
-           lib = File.join(base, 'bin/libmecab.dll')
-           File.absolute_path(lib)
-         rescue
-           raise LoadError, "Please set #{MECAB_PATH} to the full path to libmecab.dll"
-         end
+       if ENV[MECAB_PATH]
+         File.absolute_path(ENV[MECAB_PATH])
        else
-         require 'open3'
-         if host_os =~ /darwin/i
-           ext = 'dylib'
+         host_os = RbConfig::CONFIG['host_os']
+
+         if host_os =~ /mswin|mingw/i
+           require 'win32/registry'
+           begin
+             base = nil
+             Win32::Registry::HKEY_CURRENT_USER.open('Software\MeCab') do |r|
+               base = r['mecabrc'].split('etc').first
+             end
+             lib = File.join(base, 'bin/libmecab.dll')
+             File.absolute_path(lib)
+           rescue
+             raise LoadError, "Please set #{MECAB_PATH} to the full path to libmecab.dll"
+           end
          else
-           ext = 'so'
-         end
+           require 'open3'
+           if host_os =~ /darwin/i
+             ext = 'dylib'
+           else
+             ext = 'so'
+           end
 
-         begin
-           base, lib = nil, nil
-           cmd = 'mecab-config --libs'
-           Open3.popen3(cmd) do |stdin,stdout,stderr|
-             toks = stdout.read.split
-             base = toks[0][2..-1]
-             lib = toks[1][2..-1]
+           begin
+             base, lib = nil, nil
+             cmd = 'mecab-config --libs'
+             Open3.popen3(cmd) do |stdin,stdout,stderr|
+               toks = stdout.read.split
+               base = toks[0][2..-1]
+               lib = toks[1][2..-1]
+             end
+             File.absolute_path(File.join(base, "lib#{lib}.#{ext}"))
+           rescue
+             raise LoadError, "Please set #{MECAB_PATH} to the full path to libmecab.#{ext}"
            end
-           File.absolute_path(File.join(base, "lib#{lib}.#{ext}"))
-         rescue
-           raise LoadError, "Please set #{MECAB_PATH} to the full path to libmecab.#{ext}"
          end
        end
      end
 
      ffi_lib find_library
 
-     # C interface
-     attach_function :mecab_new2, [:string], :pointer
+     # Model interface
+     attach_function :mecab_model_new2, [:string], :pointer
+     attach_function :mecab_model_destroy, [:pointer], :void
+     attach_function :mecab_model_new_tagger, [:pointer], :pointer
+     attach_function :mecab_model_new_lattice, [:pointer], :pointer
+     attach_function :mecab_model_dictionary_info, [:pointer], :pointer
+
+     # Tagger interface
+     attach_function :mecab_destroy, [:pointer], :void
      attach_function :mecab_version, [], :string
      attach_function :mecab_strerror, [:pointer],:string
-     attach_function :mecab_destroy, [:pointer], :void
-     attach_function :mecab_set_partial, [:pointer, :int], :void
-     attach_function :mecab_set_theta, [:pointer, :float], :void
-     attach_function :mecab_set_lattice_level, [:pointer, :int], :void
-     attach_function :mecab_set_all_morphs, [:pointer, :int], :void
-     attach_function :mecab_sparse_tostr, [:pointer, :string], :string
-     attach_function :mecab_sparse_tonode, [:pointer, :string], :pointer
-     attach_function :mecab_nbest_init, [:pointer, :string], :int
-     attach_function :mecab_nbest_sparse_tostr, [:pointer, :int, :string], :string
-     attach_function :mecab_nbest_next_tonode, [:pointer], :pointer
      attach_function :mecab_format_node, [:pointer, :pointer], :string
-     attach_function :mecab_dictionary_info, [:pointer], :pointer
 
-     attach_function :mecab_lattice_new, [], :pointer
+     # Lattice interface
      attach_function :mecab_lattice_destroy, [:pointer], :void
      attach_function :mecab_lattice_clear, [:pointer], :void
      attach_function :mecab_lattice_is_available, [:pointer], :int
-     attach_function :mecab_lattice_get_bos_node, [:pointer], :pointer
+     attach_function :mecab_lattice_strerror, [:pointer], :string
+
+     attach_function :mecab_lattice_get_sentence, [:pointer], :string
      attach_function :mecab_lattice_set_sentence, [:pointer, :string], :void
      attach_function :mecab_lattice_get_size, [:pointer], :int
-     attach_function :mecab_lattice_set_z, [:pointer, :float], :void
      attach_function :mecab_lattice_set_theta, [:pointer, :float], :void
-     attach_function :mecab_lattice_next, [:pointer], :int
+     attach_function :mecab_lattice_set_z, [:pointer, :float], :void
      attach_function :mecab_lattice_get_request_type, [:pointer], :int
      attach_function :mecab_lattice_add_request_type, [:pointer, :int], :void
      attach_function :mecab_lattice_set_request_type, [:pointer, :int], :void
-     attach_function :mecab_lattice_tostr, [:pointer], :string
-     attach_function :mecab_lattice_nbest_tostr, [:pointer, :int], :string
      attach_function :mecab_lattice_get_boundary_constraint, [:pointer, :int], :int
      attach_function :mecab_lattice_set_boundary_constraint, [:pointer, :int, :int], :void
+     attach_function :mecab_lattice_get_feature_constraint, [:pointer, :int], :string
+     attach_function :mecab_lattice_set_feature_constraint, [:pointer, :int, :int, :string], :void
+
      attach_function :mecab_parse_lattice, [:pointer, :pointer], :int
-     attach_function :mecab_lattice_strerror, [:pointer], :string
+     attach_function :mecab_lattice_next, [:pointer], :int
+     attach_function :mecab_lattice_tostr, [:pointer], :string
+     attach_function :mecab_lattice_nbest_tostr, [:pointer, :int], :string
+     attach_function :mecab_lattice_get_bos_node, [:pointer], :pointer
 
      # @private
      module ClassMethods
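In the hunk above, `find_library` derives the library path on non-Windows hosts from the first two tokens printed by `mecab-config --libs`. The token handling can be sketched in isolation as below. The helper name `library_path_from` is hypothetical, and the sample output string is an assumption about what `mecab-config --libs` typically prints; the real method additionally wraps the result in `File.absolute_path` and raises `LoadError` on failure.

```ruby
# Hypothetical helper mirroring the token handling in find_library.
# +libs_output+ is assumed to look like "-L/usr/local/lib -lmecab -lstdc++".
def library_path_from(libs_output, ext)
  toks = libs_output.split
  base = toks[0][2..-1] # drop the leading "-L" to get the directory
  lib  = toks[1][2..-1] # drop the leading "-l" to get the library name
  File.join(base, "lib#{lib}.#{ext}")
end

puts library_path_from('-L/usr/local/lib -lmecab -lstdc++', 'so')
# prints /usr/local/lib/libmecab.so
```

The approach depends on the token order: if the `-L` flag is not first, or a flag is missing, the slices produce a wrong path or fail on `nil`, which is why the original code rescues and asks the user to set `MECAB_PATH`.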
@@ -110,75 +108,45 @@ module Natto
          Natto::Binding.find_library
        end
 
-       # ----------------------------------------
-       def mecab_new2(options_str)
-         Natto::Binding.mecab_new2(options_str)
-       end
-
-       def mecab_version
-         Natto::Binding.mecab_version
-       end
-
-       def mecab_strerror(tptr)
-         Natto::Binding.mecab_strerror(tptr)
+       # Model interface ------------------------
+       def mecab_model_new2(opts_str)
+         Natto::Binding.mecab_model_new2(opts_str)
        end
 
-       def mecab_destroy(tptr)
-         Natto::Binding.mecab_destroy(tptr)
+       def mecab_model_destroy(mptr)
+         Natto::Binding.mecab_model_destroy(mptr)
        end
 
-       def mecab_set_partial(tptr, ll)
-         Natto::Binding.mecab_set_partial(tptr, ll)
-       end
-
-       def mecab_set_theta(tptr, t)
-         Natto::Binding.mecab_set_theta(tptr, t)
+       def mecab_model_new_tagger(mptr)
+         Natto::Binding.mecab_model_new_tagger(mptr)
        end
 
-       def mecab_set_lattice_level(tptr, ll)
-         Natto::Binding.mecab_set_lattice_level(tptr, ll)
-       end
-
-       def mecab_set_all_morphs(tptr, am)
-         Natto::Binding.mecab_set_all_morphs(tptr, am)
+       def mecab_model_new_lattice(mptr)
+         Natto::Binding.mecab_model_new_lattice(mptr)
        end
 
-       def mecab_sparse_tostr(tptr, str)
-         Natto::Binding.mecab_sparse_tostr(tptr, str)
-       end
-
-       def mecab_sparse_tonode(tptr, str)
-         Natto::Binding.mecab_sparse_tonode(tptr, str)
-       end
-
-       def mecab_nbest_next_tonode(tptr)
-         Natto::Binding.mecab_nbest_next_tonode(tptr)
+       def mecab_model_dictionary_info(mptr)
+         Natto::Binding.mecab_model_dictionary_info(mptr)
        end
 
-       def mecab_nbest_init(tptr, str)
-         Natto::Binding.mecab_nbest_init(tptr, str)
+       # Tagger interface -----------------------
+       def mecab_destroy(tptr)
+         Natto::Binding.mecab_destroy(tptr)
        end
 
-       def mecab_nbest_sparse_tostr(tptr, n, str)
-         Natto::Binding.mecab_nbest_sparse_tostr(tptr, n, str)
+       def mecab_version
+         Natto::Binding.mecab_version
        end
 
-       def mecab_nbest_next_tonode(tptr)
-         Natto::Binding.mecab_nbest_next_tonode(tptr)
+       def mecab_strerror(tptr)
+         Natto::Binding.mecab_strerror(tptr)
        end
 
        def mecab_format_node(tptr, nptr)
          Natto::Binding.mecab_format_node(tptr, nptr)
        end
-
-       def mecab_dictionary_info(tptr)
-         Natto::Binding.mecab_dictionary_info(tptr)
-       end
-
-       def mecab_lattice_new()
-         Natto::Binding.mecab_lattice_new()
-       end
 
+       # Lattice interface ----------------------
        def mecab_lattice_destroy(lptr)
          Natto::Binding.mecab_lattice_destroy(lptr)
        end
@@ -191,10 +159,14 @@ module Natto
          Natto::Binding.mecab_lattice_is_available(lptr)
        end
 
-       def mecab_lattice_get_bos_node(lptr)
-         Natto::Binding.mecab_lattice_get_bos_node(lptr)
+       def mecab_lattice_strerror(lptr)
+         Natto::Binding.mecab_lattice_strerror(lptr)
        end
-
+
+       def mecab_lattice_get_sentence(lptr)
+         Natto::Binding.mecab_lattice_get_sentence(lptr)
+       end
+
        def mecab_lattice_set_sentence(lptr, str)
          Natto::Binding.mecab_lattice_set_sentence(lptr, str)
        end
@@ -202,39 +174,27 @@ module Natto
        def mecab_lattice_get_size(lptr)
          Natto::Binding.mecab_lattice_get_size(lptr)
        end
-
-       def mecab_lattice_set_z(lptr, z)
-         Natto::Binding.mecab_lattice_set_z(lptr, z)
-       end
-
+
        def mecab_lattice_set_theta(lptr, t)
          Natto::Binding.mecab_lattice_set_theta(lptr, t)
        end
 
-       def mecab_lattice_next(lptr)
-         Natto::Binding.mecab_lattice_next(lptr)
+       def mecab_lattice_set_z(lptr, z)
+         Natto::Binding.mecab_lattice_set_z(lptr, z)
        end
-
+
        def mecab_lattice_get_request_type(lptr)
          Natto::Binding.mecab_lattice_get_request_type(lptr)
        end
 
-       def mecab_lattice_add_request_type(lptr, rtype)
-         Natto::Binding.mecab_lattice_add_request_type(lptr, rtype)
+       def mecab_lattice_add_request_type(lptr, rt)
+         Natto::Binding.mecab_lattice_add_request_type(lptr, rt)
        end
 
        def mecab_lattice_set_request_type(lptr, rtype)
          Natto::Binding.mecab_lattice_set_request_type(lptr, rtype)
        end
 
-       def mecab_lattice_tostr(lptr)
-         Natto::Binding.mecab_lattice_tostr(lptr)
-       end
-
-       def mecab_lattice_nbest_tostr(lptr, n)
-         Natto::Binding.mecab_lattice_nbest_tostr(lptr, n)
-       end
-
        def mecab_lattice_get_boundary_constraint(lptr, pos)
          Natto::Binding.mecab_lattice_get_boundary_constraint(lptr, pos)
        end
@@ -243,12 +203,35 @@ module Natto
          Natto::Binding.mecab_lattice_set_boundary_constraint(lptr, pos, btype)
        end
 
+       def mecab_lattice_get_feature_constraint(lptr, bpos)
+         Natto::Binding.mecab_lattice_get_feature_constraint(lptr, bpos)
+       end
+
+       def mecab_lattice_set_feature_constraint(lptr, bpos, epos, feat)
+         Natto::Binding.mecab_lattice_set_feature_constraint(lptr,
+                                                              bpos,
+                                                              epos,
+                                                              feat)
+       end
+
        def mecab_parse_lattice(tptr, lptr)
          Natto::Binding.mecab_parse_lattice(tptr, lptr)
        end
 
-       def mecab_lattice_strerror(lptr)
-         Natto::Binding.mecab_lattice_strerror(lptr)
+       def mecab_lattice_next(lptr)
+         Natto::Binding.mecab_lattice_next(lptr)
+       end
+
+       def mecab_lattice_tostr(lptr)
+         Natto::Binding.mecab_lattice_tostr(lptr)
+       end
+
+       def mecab_lattice_nbest_tostr(lptr, n)
+         Natto::Binding.mecab_lattice_nbest_tostr(lptr, n)
+       end
+
+       def mecab_lattice_get_bos_node(lptr)
+         Natto::Binding.mecab_lattice_get_bos_node(lptr)
        end
      end
    end
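Every wrapper in `ClassMethods` above has the same shape: a class-level method that forwards its arguments to the module-level function of the same name, with `base.extend(ClassMethods)` making the wrappers available on any including class. A self-contained sketch of that pattern follows. `Greeter`, `greet` and `Client` are hypothetical names, the plain Ruby method stands in for an `attach_function` binding, and the hook is assumed to be `self.included`, which this diff does not show.

```ruby
# Hypothetical module standing in for Natto::Binding.
module Greeter
  # Assumed hook: runs when a class includes the module and adds the
  # wrappers in ClassMethods as class-level methods on that class.
  def self.included(base)
    base.extend(ClassMethods)
  end

  # Module-level function, analogous to a function attached via FFI.
  def self.greet(name)
    "hello, #{name}"
  end

  module ClassMethods
    # Thin wrapper that forwards to the module-level function.
    def greet(name)
      Greeter.greet(name)
    end
  end
end

class Client
  include Greeter
end

puts Client.greet('natto')
```

The indirection keeps the FFI surface in one module while letting including classes call the bindings without naming that module at every call site.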