natto 0.9.9 → 1.0.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: dda6a3b2c795f26a54307637aeaa14b44f2035f0
4
+ data.tar.gz: 53b0944ea5822e7e3ab2e38bb07e0bc44d8336de
5
+ SHA512:
6
+ metadata.gz: aa4d39e5e0b2a11449a55f7c002a257444af14d1e534316ede57f5d2aedb64d95e9b00f5f77396f6701068e4203873856c91fea0fbe07d9ddf949e2913aa5a99
7
+ data.tar.gz: db01e6275b7348af0c58adcf0f7c7f62a29549bea5cff29c3e15e001550deebd8c10c66d0c88ce2ed6d9715d5d0681c4b2e3a554becd4c8d73f506777fe83287
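The digests listed above can be compared against the files packed inside the `.gem` archive. The following is a minimal sketch, assuming `metadata.gz` and `data.tar.gz` have already been extracted to local paths; the helper name `checksum_matches?` is hypothetical and not part of natto or RubyGems.

```ruby
require 'digest'

# Hypothetical helper: returns true when the file at `path` hashes to
# `expected_hex` under the given digest class (SHA512 by default).
def checksum_matches?(path, expected_hex, algorithm = Digest::SHA512)
  algorithm.file(path).hexdigest == expected_hex
end
```

A call such as `checksum_matches?('data.tar.gz', '53b0944e...', Digest::SHA1)` would use the full SHA1 value from the listing; the argument here is abbreviated for readability only.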
data/CHANGELOG CHANGED
@@ -1,5 +1,18 @@
1
1
  ## CHANGELOG
2
2
 
3
+ - __2015/04/14__: 1.0.0 release.
4
+ - Issue 36: Fixed @param documentation
5
+ - Issue 37: README and bullet points under Automatic Configuration
6
+ - Issue 38: Updated URLs in documentation to point to Ruby 2.2.1
7
+ - Issue 39: Make refs to MeCab and Tagger consistent in docs
8
+ - Issue 40: Use new Model- and Lattice-based C APIs internally
9
+ - Issue 45: Add support for feature constraint parsing
10
+ - Issue 48: Put in a guard to prevent partial parsing of text that does not end with a new-line char
11
+ - Issue 50: Update all references to Natto::MeCab in documentation to reflect new internal structure
12
+ - Issue 52: Downloads and license badges for README.md & API docs
13
+ - Issue 55: Node parsing with --all-morphs option missing surface values
14
+
15
+
3
16
  - __2015/03/31__: 0.9.9 release.
4
17
  - Issue 21/34: Implemented boundary constraint parsing.
5
18
  - Issue 26: Removing deprecated methods parse_as_nodes, parse_as_strings, readnodes and readlines.
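The Issue 48 entry above mentions a guard for partial parsing of text that lacks a trailing new-line character. This diff does not show how natto implements that guard, so the following is only one plausible shape; the method name `guard_partial_text` and the choice of `ArgumentError` are assumptions made for illustration.

```ruby
# Hypothetical guard: refuse text that does not end with a new-line
# character, returning the text unchanged when it does.
def guard_partial_text(text)
  unless text.end_with?("\n")
    raise ArgumentError, 'partial parsing requires text ending with a new-line char'
  end
  text
end
```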
data/README.md CHANGED
@@ -1,4 +1,4 @@
1
- # natto [![Gem Version](https://badge.fury.io/rb/natto.svg)](http://badge.fury.io/rb/natto) [![Build Status](https://travis-ci.org/buruzaemon/natto.svg?branch=master)](https://travis-ci.org/buruzaemon/natto)
1
+ # natto [![Gem Version](https://badge.fury.io/rb/natto.svg)](https://rubygems.org/gems/natto) [![Build Status](https://travis-ci.org/buruzaemon/natto.svg?branch=master)](https://travis-ci.org/buruzaemon/natto) [![Gem Downloads](https://img.shields.io/gem/dt/natto.svg)](https://rubygems.org/gems/natto) [![Gem License](https://img.shields.io/badge/license-BSD-blue.svg)]()
2
2
  A Tasty Ruby Binding with MeCab
3
3
 
4
4
  ## What is natto?
@@ -47,6 +47,7 @@ However, if you are using a CRuby on Windows, then you will first need to instal
47
47
 
48
48
  ## Automatic Configuration
49
49
  No explicit configuration should be necessary, as natto will try to locate the MeCab library based upon its runtime environment.
50
+
50
51
  - On OS X and \*nix, it will query `mecab-config --libs`
51
52
  - On Windows, it will query the Windows Registry to determine where `libmecab.dll` is installed
52
53
 
@@ -78,14 +79,16 @@ Instantiate a reference to the MeCab library, and display some details:
78
79
  require 'natto'
79
80
 
80
81
  nm = Natto::MeCab.new
81
- => #<Natto::MeCab:0x28d30748
82
- @tagger=#<FFI::Pointer address=0x28a97d50>, \
83
- @libpath="/usr/local/lib/libmecab.so", \
84
- @options={}, \
85
- @dicts=[#<Natto::DictionaryInfo:0x28d3061c \
82
+ => #<Natto::MeCab:0x00000803633ae8
83
+ @model=#<FFI::Pointer address=0x000008035d4640>, \
84
+ @tagger=#<FFI::Pointer address=0x00000802b07c90>, \
85
+ @lattice=#<FFI::Pointer address=0x00000803602f80>, \
86
+ @libpath="/usr/local/lib/libmecab.so", \
87
+ @options={}, \
88
+ @dicts=[#<Natto::DictionaryInfo:0x000008036337c8 \
86
89
  @filepath="/usr/local/lib/mecab/dic/ipadic/sys.dic", \
87
- charset=utf8, \
88
- type=0>] \
90
+ charset=utf8, \
91
+ type=0>] \
89
92
  @version=0.996>
90
93
 
91
94
  puts nm.version
@@ -167,7 +170,7 @@ nodes:
167
170
 
168
171
  For more complex parsing, such as that for natural language
169
172
  processing tasks, it is far more efficient to use `enum_parse` to
170
- obtain an [`Enumerator`](http://ruby-doc.org/core-2.2.0/Enumerator.html)
173
+ obtain an [`Enumerator`](http://ruby-doc.org/core-2.2.1/Enumerator.html)
171
174
  to iterate over the resulting `MeCabNode` instances. An `Enumerator`
172
175
  yields each `MeCabNode` instance without first materializing all
173
176
  instances at once, thus being more efficient.
@@ -226,11 +229,13 @@ back to the beginning, and then iterate over it.
226
229
  ----
227
230
 
228
231
  [Partial parsing](http://taku910.github.io/mecab/partial.html) allows you to
229
- pass hints to MeCab on how to tokenize morphemes when parsing. With boundary
230
- constraint parsing, you can specify either
232
+ pass hints to MeCab on how to tokenize morphemes when parsing. Most useful are
233
+ boundary constraint parsing and feature constraint parsing.
234
+
235
+ With boundary constraint parsing, you can specify either
231
236
  a [Regexp](http://ruby-doc.org/core-2.2.1/Regexp.html) or
232
237
  [String](http://ruby-doc.org/core-2.2.1/String.html) to tell MeCab where the
233
- boundaries of a morpheme should be. Use the new `boundary_constraints` keyword.
238
+ boundaries of a morpheme should be. Use the `boundary_constraints` keyword.
234
239
  For hints on tokenization, please see
235
240
  [String#scan](http://ruby-doc.org/core-2.2.1/String.html#method-i-scan)
236
241
 
@@ -242,6 +247,7 @@ the resulting `MeCabNode` feature attribute to extract:
242
247
  - `%s` - node `stat` status value, 1 is `unknown`
243
248
 
244
249
  Note that any such morphemes captured will have node `stat` status of unknown.
250
+ Also note that MeCab will tag such nodes as a noun.
245
251
 
246
252
  nm = Natto::MeCab.new('-F%m,\s%f[0],\s%s')
247
253
 
@@ -268,6 +274,37 @@ Note that any such morphemes captured will have node `stat` status of unknown.
268
274
  ヒーロー見参, 名詞, 1
269
275
  !, 記号, 0
270
276
 
277
+ With feature constraint parsing, you can provide instructions to MeCab on
278
+ what feature to use for a matching morpheme. Use the `feature_constraints`
279
+ keyword to pass in a hash mapping a specific morpheme key (String)
280
+ to a corresponding feature (String).
281
+
282
+ # we re-use nm and text from above
283
+
284
+ nm.options
285
+ => {:node_format=>"%m,\\s%f[0],\\s%s"}
286
+
287
+ mapping = {"ヒーロー見参"=>"その他"}
288
+
289
+ nm.enum_parse(text, feature_constraints: mapping).each do |n|
290
+ puts n.feature if !(n.is_bos? || n.is_eos?)
291
+ end
292
+
293
+ # ヒーロー見参 will be treated as a single morpheme mapping to その他
294
+ 心, 名詞, 0
295
+ の, 助詞, 0
296
+ 中, 名詞, 0
297
+ で, 助詞, 0
298
+ 3, 名詞, 1
299
+ 回, 名詞, 0
300
+ 唱え, 動詞, 0
301
+ 、, 記号, 0
302
+ ヒーロー見参, その他, 1
303
+ !, 記号, 0
304
+ ヒーロー見参, その他, 1
305
+ !, 記号, 0
306
+ ヒーロー見参, その他, 1
307
+ !, 記号, 0
271
308
 
272
309
 
273
310
  ## Learn more
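The README addition above describes `feature_constraints` as a hash from morpheme to feature, while the binding added in this release exposes `mecab_lattice_set_feature_constraint` with two integer positions and a feature string. The sketch below shows one way such a hash could be turned into position/feature triples. It assumes the positions are byte offsets with an exclusive end, which this diff does not confirm, and the helper name `feature_constraint_spans` is hypothetical.

```ruby
# Hypothetical helper: for every occurrence of a key from `mapping` in
# `text`, return [begin_byte, end_byte, feature].
def feature_constraint_spans(text, mapping)
  pattern = Regexp.union(mapping.keys)
  spans = []
  text.scan(pattern) do
    m = Regexp.last_match
    begin_pos = m.pre_match.bytesize            # bytes before the match
    end_pos   = begin_pos + m[0].bytesize       # assumed exclusive end
    spans << [begin_pos, end_pos, mapping[m[0]]]
  end
  spans
end
```

Each triple could then be passed, together with a lattice pointer, to a call of the form `mecab_lattice_set_feature_constraint(lptr, bpos, epos, feat)`, matching the argument order declared in the binding.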
@@ -1,3 +1,4 @@
1
+ # coding: utf-8
1
2
  require 'natto/natto'
2
3
 
3
4
  # Copyright (c) 2015, Brooke M. Fujita.
@@ -2,16 +2,12 @@
2
2
  module Natto
3
3
 
4
4
  # Module `Binding` encapsulates methods and behavior
5
- # which are made available via `FFI` bindings to
6
- # `mecab`.
5
+ # which are made available via `FFI` bindings to MeCab.
7
6
  module Binding
8
7
  require 'ffi'
9
8
  require 'rbconfig'
10
9
  extend FFI::Library
11
10
 
12
- # String name for the environment variable used by
13
- # `Natto` to indicate the absolute pathname
14
- # to the `mecab` library.
15
11
  MECAB_PATH = 'MECAB_PATH'.freeze
16
12
 
17
13
  # @private
@@ -19,89 +15,91 @@ module Natto
19
15
  base.extend(ClassMethods)
20
16
  end
21
17
 
22
- # Returns the absolute pathname to the `mecab` library based on
18
+ # Returns the absolute pathname to the MeCab library based on
23
19
  # the runtime environment.
24
- #
25
- # @return [String] absolute pathname to the `mecab` library
20
+ # @return [String] absolute pathname to the MeCab library
26
21
  # @raise [LoadError] if the library cannot be located
27
22
  def self.find_library
28
- return File.absolute_path(ENV[MECAB_PATH]) if ENV[MECAB_PATH]
29
-
30
- host_os = RbConfig::CONFIG['host_os']
31
-
32
- if host_os =~ /mswin|mingw/i
33
- require 'win32/registry'
34
- begin
35
- base = nil
36
- Win32::Registry::HKEY_CURRENT_USER.open('Software\MeCab') do |r|
37
- base = r['mecabrc'].split('etc').first
38
- end
39
- lib = File.join(base, 'bin/libmecab.dll')
40
- File.absolute_path(lib)
41
- rescue
42
- raise LoadError, "Please set #{MECAB_PATH} to the full path to libmecab.dll"
43
- end
23
+ if ENV[MECAB_PATH]
24
+ File.absolute_path(ENV[MECAB_PATH])
44
25
  else
45
- require 'open3'
46
- if host_os =~ /darwin/i
47
- ext = 'dylib'
26
+ host_os = RbConfig::CONFIG['host_os']
27
+
28
+ if host_os =~ /mswin|mingw/i
29
+ require 'win32/registry'
30
+ begin
31
+ base = nil
32
+ Win32::Registry::HKEY_CURRENT_USER.open('Software\MeCab') do |r|
33
+ base = r['mecabrc'].split('etc').first
34
+ end
35
+ lib = File.join(base, 'bin/libmecab.dll')
36
+ File.absolute_path(lib)
37
+ rescue
38
+ raise LoadError, "Please set #{MECAB_PATH} to the full path to libmecab.dll"
39
+ end
48
40
  else
49
- ext = 'so'
50
- end
41
+ require 'open3'
42
+ if host_os =~ /darwin/i
43
+ ext = 'dylib'
44
+ else
45
+ ext = 'so'
46
+ end
51
47
 
52
- begin
53
- base, lib = nil, nil
54
- cmd = 'mecab-config --libs'
55
- Open3.popen3(cmd) do |stdin,stdout,stderr|
56
- toks = stdout.read.split
57
- base = toks[0][2..-1]
58
- lib = toks[1][2..-1]
48
+ begin
49
+ base, lib = nil, nil
50
+ cmd = 'mecab-config --libs'
51
+ Open3.popen3(cmd) do |stdin,stdout,stderr|
52
+ toks = stdout.read.split
53
+ base = toks[0][2..-1]
54
+ lib = toks[1][2..-1]
55
+ end
56
+ File.absolute_path(File.join(base, "lib#{lib}.#{ext}"))
57
+ rescue
58
+ raise LoadError, "Please set #{MECAB_PATH} to the full path to libmecab.#{ext}"
59
59
  end
60
- File.absolute_path(File.join(base, "lib#{lib}.#{ext}"))
61
- rescue
62
- raise LoadError, "Please set #{MECAB_PATH} to the full path to libmecab.#{ext}"
63
60
  end
64
61
  end
65
62
  end
66
63
 
67
64
  ffi_lib find_library
68
65
 
69
- # C interface
70
- attach_function :mecab_new2, [:string], :pointer
66
+ # Model interface
67
+ attach_function :mecab_model_new2, [:string], :pointer
68
+ attach_function :mecab_model_destroy, [:pointer], :void
69
+ attach_function :mecab_model_new_tagger, [:pointer], :pointer
70
+ attach_function :mecab_model_new_lattice, [:pointer], :pointer
71
+ attach_function :mecab_model_dictionary_info, [:pointer], :pointer
72
+
73
+ # Tagger interface
74
+ attach_function :mecab_destroy, [:pointer], :void
71
75
  attach_function :mecab_version, [], :string
72
76
  attach_function :mecab_strerror, [:pointer],:string
73
- attach_function :mecab_destroy, [:pointer], :void
74
- attach_function :mecab_set_partial, [:pointer, :int], :void
75
- attach_function :mecab_set_theta, [:pointer, :float], :void
76
- attach_function :mecab_set_lattice_level, [:pointer, :int], :void
77
- attach_function :mecab_set_all_morphs, [:pointer, :int], :void
78
- attach_function :mecab_sparse_tostr, [:pointer, :string], :string
79
- attach_function :mecab_sparse_tonode, [:pointer, :string], :pointer
80
- attach_function :mecab_nbest_init, [:pointer, :string], :int
81
- attach_function :mecab_nbest_sparse_tostr, [:pointer, :int, :string], :string
82
- attach_function :mecab_nbest_next_tonode, [:pointer], :pointer
83
77
  attach_function :mecab_format_node, [:pointer, :pointer], :string
84
- attach_function :mecab_dictionary_info, [:pointer], :pointer
85
78
 
86
- attach_function :mecab_lattice_new, [], :pointer
79
+ # Lattice interface
87
80
  attach_function :mecab_lattice_destroy, [:pointer], :void
88
81
  attach_function :mecab_lattice_clear, [:pointer], :void
89
82
  attach_function :mecab_lattice_is_available, [:pointer], :int
90
- attach_function :mecab_lattice_get_bos_node, [:pointer], :pointer
83
+ attach_function :mecab_lattice_strerror, [:pointer], :string
84
+
85
+ attach_function :mecab_lattice_get_sentence, [:pointer], :string
91
86
  attach_function :mecab_lattice_set_sentence, [:pointer, :string], :void
92
87
  attach_function :mecab_lattice_get_size, [:pointer], :int
93
- attach_function :mecab_lattice_set_z, [:pointer, :float], :void
94
88
  attach_function :mecab_lattice_set_theta, [:pointer, :float], :void
95
- attach_function :mecab_lattice_next, [:pointer], :int
89
+ attach_function :mecab_lattice_set_z, [:pointer, :float], :void
96
90
  attach_function :mecab_lattice_get_request_type, [:pointer], :int
97
91
  attach_function :mecab_lattice_add_request_type, [:pointer, :int], :void
98
92
  attach_function :mecab_lattice_set_request_type, [:pointer, :int], :void
99
- attach_function :mecab_lattice_tostr, [:pointer], :string
100
- attach_function :mecab_lattice_nbest_tostr, [:pointer, :int], :string
101
93
  attach_function :mecab_lattice_get_boundary_constraint, [:pointer, :int], :int
102
94
  attach_function :mecab_lattice_set_boundary_constraint, [:pointer, :int, :int], :void
95
+ attach_function :mecab_lattice_get_feature_constraint, [:pointer, :int], :string
96
+ attach_function :mecab_lattice_set_feature_constraint, [:pointer, :int, :int, :string], :void
97
+
103
98
  attach_function :mecab_parse_lattice, [:pointer, :pointer], :int
104
- attach_function :mecab_lattice_strerror, [:pointer], :string
99
+ attach_function :mecab_lattice_next, [:pointer], :int
100
+ attach_function :mecab_lattice_tostr, [:pointer], :string
101
+ attach_function :mecab_lattice_nbest_tostr, [:pointer, :int], :string
102
+ attach_function :mecab_lattice_get_bos_node, [:pointer], :pointer
105
103
 
106
104
  # @private
107
105
  module ClassMethods
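In the `find_library` hunk above, the non-Windows branch splits the output of `mecab-config --libs` and drops the first two characters of the first two tokens. The sketch below isolates that string handling so it can be tried without MeCab installed. The sample output string is an assumption about what `mecab-config --libs` typically prints, and `library_path_from` is a hypothetical name.

```ruby
# Hypothetical helper mirroring the token handling in find_library.
def library_path_from(libs_output, ext)
  toks = libs_output.split
  base = toks[0][2..-1]   # "-L/usr/local/lib" -> "/usr/local/lib"
  lib  = toks[1][2..-1]   # "-lmecab"          -> "mecab"
  File.join(base, "lib#{lib}.#{ext}")
end

# Assumed sample output; real output depends on the local installation.
sample = "-L/usr/local/lib -lmecab -lstdc++\n"
puts library_path_from(sample, 'so')
```

Note that this approach relies on the `-L` token coming first and the `-l` token second; output in any other order would produce a wrong path, which is presumably why the original code rescues and asks the user to set `MECAB_PATH`.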
@@ -110,75 +108,45 @@ module Natto
110
108
  Natto::Binding.find_library
111
109
  end
112
110
 
113
- # ----------------------------------------
114
- def mecab_new2(options_str)
115
- Natto::Binding.mecab_new2(options_str)
116
- end
117
-
118
- def mecab_version
119
- Natto::Binding.mecab_version
120
- end
121
-
122
- def mecab_strerror(tptr)
123
- Natto::Binding.mecab_strerror(tptr)
111
+ # Model interface ------------------------
112
+ def mecab_model_new2(opts_str)
113
+ Natto::Binding.mecab_model_new2(opts_str)
124
114
  end
125
115
 
126
- def mecab_destroy(tptr)
127
- Natto::Binding.mecab_destroy(tptr)
116
+ def mecab_model_destroy(mptr)
117
+ Natto::Binding.mecab_model_destroy(mptr)
128
118
  end
129
119
 
130
- def mecab_set_partial(tptr, ll)
131
- Natto::Binding.mecab_set_partial(tptr, ll)
132
- end
133
-
134
- def mecab_set_theta(tptr, t)
135
- Natto::Binding.mecab_set_theta(tptr, t)
120
+ def mecab_model_new_tagger(mptr)
121
+ Natto::Binding.mecab_model_new_tagger(mptr)
136
122
  end
137
123
 
138
- def mecab_set_lattice_level(tptr, ll)
139
- Natto::Binding.mecab_set_lattice_level(tptr, ll)
140
- end
141
-
142
- def mecab_set_all_morphs(tptr, am)
143
- Natto::Binding.mecab_set_all_morphs(tptr, am)
124
+ def mecab_model_new_lattice(mptr)
125
+ Natto::Binding.mecab_model_new_lattice(mptr)
144
126
  end
145
127
 
146
- def mecab_sparse_tostr(tptr, str)
147
- Natto::Binding.mecab_sparse_tostr(tptr, str)
148
- end
149
-
150
- def mecab_sparse_tonode(tptr, str)
151
- Natto::Binding.mecab_sparse_tonode(tptr, str)
152
- end
153
-
154
- def mecab_nbest_next_tonode(tptr)
155
- Natto::Binding.mecab_nbest_next_tonode(tptr)
128
+ def mecab_model_dictionary_info(mptr)
129
+ Natto::Binding.mecab_model_dictionary_info(mptr)
156
130
  end
157
131
 
158
- def mecab_nbest_init(tptr, str)
159
- Natto::Binding.mecab_nbest_init(tptr, str)
132
+ # Tagger interface -----------------------
133
+ def mecab_destroy(tptr)
134
+ Natto::Binding.mecab_destroy(tptr)
160
135
  end
161
136
 
162
- def mecab_nbest_sparse_tostr(tptr, n, str)
163
- Natto::Binding.mecab_nbest_sparse_tostr(tptr, n, str)
137
+ def mecab_version
138
+ Natto::Binding.mecab_version
164
139
  end
165
140
 
166
- def mecab_nbest_next_tonode(tptr)
167
- Natto::Binding.mecab_nbest_next_tonode(tptr)
141
+ def mecab_strerror(tptr)
142
+ Natto::Binding.mecab_strerror(tptr)
168
143
  end
169
144
 
170
145
  def mecab_format_node(tptr, nptr)
171
146
  Natto::Binding.mecab_format_node(tptr, nptr)
172
147
  end
173
-
174
- def mecab_dictionary_info(tptr)
175
- Natto::Binding.mecab_dictionary_info(tptr)
176
- end
177
-
178
- def mecab_lattice_new()
179
- Natto::Binding.mecab_lattice_new()
180
- end
181
148
 
149
+ # Lattice interface ----------------------
182
150
  def mecab_lattice_destroy(lptr)
183
151
  Natto::Binding.mecab_lattice_destroy(lptr)
184
152
  end
@@ -191,10 +159,14 @@ module Natto
191
159
  Natto::Binding.mecab_lattice_is_available(lptr)
192
160
  end
193
161
 
194
- def mecab_lattice_get_bos_node(lptr)
195
- Natto::Binding.mecab_lattice_get_bos_node(lptr)
162
+ def mecab_lattice_strerror(lptr)
163
+ Natto::Binding.mecab_lattice_strerror(lptr)
196
164
  end
197
-
165
+
166
+ def mecab_lattice_get_sentence(lptr)
167
+ Natto::Binding.mecab_lattice_get_sentence(lptr)
168
+ end
169
+
198
170
  def mecab_lattice_set_sentence(lptr, str)
199
171
  Natto::Binding.mecab_lattice_set_sentence(lptr, str)
200
172
  end
@@ -202,39 +174,27 @@ module Natto
202
174
  def mecab_lattice_get_size(lptr)
203
175
  Natto::Binding.mecab_lattice_get_size(lptr)
204
176
  end
205
-
206
- def mecab_lattice_set_z(lptr, z)
207
- Natto::Binding.mecab_lattice_set_z(lptr, z)
208
- end
209
-
177
+
210
178
  def mecab_lattice_set_theta(lptr, t)
211
179
  Natto::Binding.mecab_lattice_set_theta(lptr, t)
212
180
  end
213
181
 
214
- def mecab_lattice_next(lptr)
215
- Natto::Binding.mecab_lattice_next(lptr)
182
+ def mecab_lattice_set_z(lptr, z)
183
+ Natto::Binding.mecab_lattice_set_z(lptr, z)
216
184
  end
217
-
185
+
218
186
  def mecab_lattice_get_request_type(lptr)
219
187
  Natto::Binding.mecab_lattice_get_request_type(lptr)
220
188
  end
221
189
 
222
- def mecab_lattice_add_request_type(lptr, rtype)
223
- Natto::Binding.mecab_lattice_add_request_type(lptr, rtype)
190
+ def mecab_lattice_add_request_type(lptr, rt)
191
+ Natto::Binding.mecab_lattice_add_request_type(lptr, rt)
224
192
  end
225
193
 
226
194
  def mecab_lattice_set_request_type(lptr, rtype)
227
195
  Natto::Binding.mecab_lattice_set_request_type(lptr, rtype)
228
196
  end
229
197
 
230
- def mecab_lattice_tostr(lptr)
231
- Natto::Binding.mecab_lattice_tostr(lptr)
232
- end
233
-
234
- def mecab_lattice_nbest_tostr(lptr, n)
235
- Natto::Binding.mecab_lattice_nbest_tostr(lptr, n)
236
- end
237
-
238
198
  def mecab_lattice_get_boundary_constraint(lptr, pos)
239
199
  Natto::Binding.mecab_lattice_get_boundary_constraint(lptr, pos)
240
200
  end
@@ -243,12 +203,35 @@ module Natto
243
203
  Natto::Binding.mecab_lattice_set_boundary_constraint(lptr, pos, btype)
244
204
  end
245
205
 
206
+ def mecab_lattice_get_feature_constraint(lptr, bpos)
207
+ Natto::Binding.mecab_lattice_get_feature_constraint(lptr, bpos)
208
+ end
209
+
210
+ def mecab_lattice_set_feature_constraint(lptr, bpos, epos, feat)
211
+ Natto::Binding.mecab_lattice_set_feature_constraint(lptr,
212
+ bpos,
213
+ epos,
214
+ feat)
215
+ end
216
+
246
217
  def mecab_parse_lattice(tptr, lptr)
247
218
  Natto::Binding.mecab_parse_lattice(tptr, lptr)
248
219
  end
249
220
 
250
- def mecab_lattice_strerror(lptr)
251
- Natto::Binding.mecab_lattice_strerror(lptr)
221
+ def mecab_lattice_next(lptr)
222
+ Natto::Binding.mecab_lattice_next(lptr)
223
+ end
224
+
225
+ def mecab_lattice_tostr(lptr)
226
+ Natto::Binding.mecab_lattice_tostr(lptr)
227
+ end
228
+
229
+ def mecab_lattice_nbest_tostr(lptr, n)
230
+ Natto::Binding.mecab_lattice_nbest_tostr(lptr, n)
231
+ end
232
+
233
+ def mecab_lattice_get_bos_node(lptr)
234
+ Natto::Binding.mecab_lattice_get_bos_node(lptr)
252
235
  end
253
236
  end
254
237
  end
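The `ClassMethods` wrappers in this file all follow one pattern: a class-level method that forwards its arguments to the module-level function of the same name. The sketch below reproduces that pattern in plain Ruby without FFI. It assumes the hook that calls `base.extend(ClassMethods)` is `self.included`, which the hunks above do not show, and every name in it (`DemoBinding`, `DemoTagger`, `demo_version`) is hypothetical.

```ruby
module DemoBinding
  # Assumed hook: extend the including class with the wrapper methods.
  def self.included(base)
    base.extend(ClassMethods)
  end

  # Stand-in for a function that FFI's attach_function would define.
  def self.demo_version
    '0.996'
  end

  module ClassMethods
    # Wrapper that forwards to the module-level function.
    def demo_version
      DemoBinding.demo_version
    end
  end
end

class DemoTagger
  include DemoBinding
end

puts DemoTagger.demo_version
```

The design keeps the native entry points in a single module while letting including classes call them as their own class methods.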