natto 0.9.7 → 0.9.8

data/CHANGELOG CHANGED
@@ -1,5 +1,11 @@
  ## CHANGELOG
 
+ - __2015/02/10__: 0.9.8 release.
+ - Migrated natto code home from Bitbucket to GitHub.
+ - Improved documentation following said migration.
+ - Minor refactoring to Natto::MeCabNode#to_s.
+ - Updating LICENSE for year 2015.
+
  - __2014/12/20__: 0.9.7 release.
  - Issue 14: [adding automatic discovery for mecab library; no need to
  explicitly set
data/LICENSE CHANGED
@@ -1,4 +1,4 @@
- Copyright (c) 2014-2015, Brooke M. Fujita.
+ Copyright (c) 2015, Brooke M. Fujita.
  All rights reserved.
 
  Redistribution and use in source and binary forms, with or without
data/README.md CHANGED
@@ -9,10 +9,10 @@ and morphological analyzer for the Japanese language.
 
  - No compiler is necessary, as natto is _not_ a C extension.
  - It will run on CRuby (mri/yarv) and JRuby (jvm) equally well.
- - It will work with MeCab installations on Windows, Unix/Linux or Mac OS.
+ - It will work with MeCab installations on Windows, Unix/Linux or OS X.
  - natto provides a naturally Ruby-esque interface to MeCab.
 
- You can learn more about [natto at bitbucket](https://bitbucket.org/buruzaemon/natto/).
+ You can learn more about [natto at GitHub](https://github.com/buruzaemon/natto).
 
 
  ## Requirements
@@ -24,7 +24,7 @@ natto requires the following:
  - Ruby _1.9 or greater_
  - [ffi _1.9.0 or greater_](http://rubygems.org/gems/ffi)
 
- ## Installation on *nix and Mac OS
+ ## Installation on *nix and OS X
  Install natto with the following gem command:
 
  gem install natto
@@ -42,16 +42,16 @@ However, if you are using a CRuby on Windows, then you will first need to instal
 
  gem install natto
 
- 6. If you are on a 64-bit Windows and you use a 64-bit Ruby or JRuby, then you might want to [build a 64-bit version of libmecab.dll](https://bitbucket.org/buruzaemon/natto/wiki/64-Bit-Windows).
+ 6. If you are on a 64-bit Windows and you use a 64-bit Ruby or JRuby, then you might want to [build a 64-bit version of libmecab.dll](https://github.com/buruzaemon/natto/wiki/64-Bit-Windows).
 
 
  ## Configuration
  - ***No explicit configuration should be necessary, as natto will try to locate the `mecab` library based upon its runtime environment.***
  - On Windows, it will query the Windows Registry to determine where `libmecab.dll` is installed
- - On Mac OS and \*nix, it will query `mecab-config --libs`
+ - On OS X and \*nix, it will query `mecab-config --libs`
  - ***But if natto cannot find the `mecab` library, `LoadError` will be raised.***
  - Please set the `MECAB_PATH` environment variable to the exact name/path to your `mecab` library.
- - e.g., for Mac OS
+ - e.g., for OS X
 
  export MECAB_PATH=/usr/local/Cellar/mecab/0.996/lib/libmecab.dylib
 
@@ -134,7 +134,7 @@ However, if you are using a CRuby on Windows, then you will first need to instal
  EOS
 
  # parse more text and use a block to:
- # - iterate the resulting MeCab nodes
+ # - iterate over the resulting MeCabNode instances
  # - output morpheme surface and part-of-speech ID
  #
  # * ignore any end-of-sentence nodes
@@ -164,13 +164,11 @@ However, if you are using a CRuby on Windows, then you will first need to instal
  # language processing tasks, it is far more efficient
  # to iterate over MeCab nodes using an Enumerator
  #
- # this example uses the node-format option to customize
- # the resulting morpheme feature to extract:
- # - surface
- # - part-of-speech
- # - reading
- #
- # * again, ignore any end-of-sentence nodes
+ # this example uses the -F node-format option to customize
+ # the resulting MeCabNode feature attribute to extract:
+ # - %m ... surface
+ # - %f[0] ... part-of-speech
+ # - %f[7] ... reading
  #
  nm = Natto::MeCab.new('-F%m\t%f[0]\t%f[7]')
 
@@ -193,7 +191,8 @@ However, if you are using a CRuby on Windows, then you will first need to instal
 
  enum.rewind
 
- enum.each { |n| puts n.feature }
+ # again, ignore any end-of-sentence nodes
+ enum.each { |n| puts n.feature if !n.is_eos? }
  この 連体詞 コノ
  星 名詞 ホシ
  の 助詞 ノ
@@ -215,11 +214,11 @@ However, if you are using a CRuby on Windows, then you will first need to instal
 
 
  ## Learn more
- - You can read more about natto on the [project Wiki](https://bitbucket.org/buruzaemon/natto/wiki/Home).
+ - You can read more about natto on the [project Wiki](https://github.com/buruzaemon/natto/wiki).
 
  ## Contributing to natto
- - Use [mercurial](http://mercurial.selenic.com/) and [check out the latest code at bitbucket](https://bitbucket.org/buruzaemon/natto/src/) to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
- - [Browse the issue tracker](https://bitbucket.org/buruzaemon/natto/issues/) to make sure someone already hasn't requested it and/or contributed it.
+ - Use [git](http://git-scm.com/) and [check out the latest code at GitHub](https://github.com/buruzaemon/natto) to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
+ - [Browse the issue tracker](https://github.com/buruzaemon/natto/issues) to make sure someone already hasn't requested it and/or contributed it.
  - Fork the project.
  - Start a feature/bugfix branch.
  - Commit and push until you are happy with your contribution.
@@ -230,4 +229,4 @@ However, if you are using a CRuby on Windows, then you will first need to instal
  Please see the {file:CHANGELOG} for this gem's release history.
 
  ## Copyright
- Copyright © 2014-2015, Brooke M. Fujita. All rights reserved. Please see the {file:LICENSE} file for further details.
+ Copyright © 2015, Brooke M. Fujita. All rights reserved. Please see the {file:LICENSE} file for further details.
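The README hunks above document the `-F%m\t%f[0]\t%f[7]` node format, where each node's `feature` carries surface, part-of-speech and reading. The following is a minimal sketch of how such a feature string could be split back into fields, assuming the fields arrive tab-delimited. `Morpheme` and `split_feature` are illustrative names, not part of natto, and the sample string is typed in by hand so the snippet runs without MeCab installed.

```ruby
# Illustrative value object; natto itself exposes MeCabNode, not this struct.
Morpheme = Struct.new(:surface, :pos, :reading)

# Split a feature string produced under a '%m\t%f[0]\t%f[7]' node format.
# Assumption: exactly three tab-delimited fields, in that order.
def split_feature(feature)
  surface, pos, reading = feature.split("\t", 3)
  Morpheme.new(surface, pos, reading)
end

# Hand-written sample mirroring one line of the README output.
m = split_feature("星\t名詞\tホシ")
puts "#{m.surface} / #{m.pos} / #{m.reading}"
```

With a real tagger, the same helper would presumably be applied to `n.feature` for each node that is not end-of-sentence.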
@@ -1,6 +1,6 @@
  require 'natto/natto'
 
- # Copyright (c) 2014-2015, Brooke M. Fujita.
+ # Copyright (c) 2015, Brooke M. Fujita.
  # All rights reserved.
  #
  # Redistribution and use in source and binary forms, with or without
@@ -175,7 +175,7 @@ module Natto
  end
  end
 
- # Copyright (c) 2014-2015, Brooke M. Fujita.
+ # Copyright (c) 2015, Brooke M. Fujita.
  # All rights reserved.
  #
  # Redistribution and use in source and binary forms, with or without
@@ -13,35 +13,93 @@ module Natto
  #
  # require 'natto'
  #
- # nm = Natto::MeCab.new('-Ochasen')
+ # text = '凡人にしか見えねえ風景ってのがあるんだよ。'
+ #
+ # nm = Natto::MeCab.new
  # => #<Natto::MeCab:0x28d3bdc8 \
  # @tagger=#<FFI::Pointer address=0x28afb980>, \
  # @libpath="/usr/local/lib/libmecab.so" \
- # @options={:output_format_type=>"chasen"}, \
+ # @options={}, \
  # @dicts=[#<Natto::DictionaryInfo:0x289a1f14 \
  # @filepath="/usr/local/lib/mecab/dic/ipadic/sys.dic", \
- # charset=utf8 \
- # type=0>], \
+ # charset=utf8, \
+ # type=0>], \
  # @version=0.996>
  #
- # nm.parse('凡人にしか見えねえ風景ってのがあるんだよ。') do |n|
- # puts "#{n.surface}\t#{n.feature}"
+ # # print entire MeCab result to stdout
+ # #
+ # puts nm.parse(text)
+ # 凡人 名詞,一般,*,*,*,*,凡人,ボンジン,ボンジン
+ # に 助詞,格助詞,一般,*,*,*,に,ニ,ニ
+ # しか 助詞,係助詞,*,*,*,*,しか,シカ,シカ
+ # 見え 動詞,自立,*,*,一段,未然形,見える,ミエ,ミエ
+ # ねえ 助動詞,*,*,*,特殊・ナイ,音便基本形,ない,ネエ,ネー
+ # 風景 名詞,一般,*,*,*,*,風景,フウケイ,フーケイ
+ # って 助詞,格助詞,連語,*,*,*,って,ッテ,ッテ
+ # の 名詞,非自立,一般,*,*,*,の,ノ,ノ
+ # が 助詞,格助詞,一般,*,*,*,が,ガ,ガ
+ # ある 動詞,自立,*,*,五段・ラ行,基本形,ある,アル,アル
+ # ん 名詞,非自立,一般,*,*,*,ん,ン,ン
+ # だ 助動詞,*,*,*,特殊・ダ,基本形,だ,ダ,ダ
+ # よ 助詞,終助詞,*,*,*,*,よ,ヨ,ヨ
+ # 。 記号,句点,*,*,*,*,。,。,。
+ # EOS
+ #
+ # # pass a block to iterate over each MeCabNode instance
+ # #
+ # nm.parse(text) do |n|
+ # puts "#{n.surface},#{n.feature}" if !n.is_eos?
  # end
- # 凡人 名詞,一般,*,*,*,*,凡人,ボンジン,ボンジン
- # に 助詞,格助詞,一般,*,*,*,に,ニ,ニ
- # しか 助詞,係助詞,*,*,*,*,しか,シカ,シカ
- # 見え 動詞,自立,*,*,一段,未然形,見える,ミエ,ミエ
- # ねえ 助動詞,*,*,*,特殊・ナイ,音便基本形,ない,ネエ,ネー
- # 風景 名詞,一般,*,*,*,*,風景,フウケイ,フーケイ
- # って 助詞,格助詞,連語,*,*,*,って,ッテ,ッテ
- # の 名詞,非自立,一般,*,*,*,の,ノ,ノ
- # が 助詞,格助詞,一般,*,*,*,が,ガ,ガ
- # ある 動詞,自立,*,*,五段・ラ行,基本形,ある,アル,アル
- # ん 名詞,非自立,一般,*,*,*,ん,ン,ン
- # だ 助動詞,*,*,*一般,特殊・ダ,基本形,だ,ダ,ダ
- # よ 助詞,終助詞,*,*,*,*,よ,ã¨,ヨ
- # 。 記号,句点,*,*,*,*,。,。,。
- # BOS/EOS,*,*,*,*,*,*,*,*BOS
+ # 凡人,名詞,一般,*,*,*,*,凡人,ボンジン,ボンジン
+ # に,助詞,格助詞,一般,*,*,*,に,ニ,ニ
+ # しか,助詞,係助詞,*,*,*,*,しか,シカ,シカ
+ # 見え,動詞,自立,*,*,一段,未然形,見える,ミエ,ミエ
+ # ねえ,助動詞,*,*,*,特殊・ナイ,音便基本形,ない,ネエ,ネー
+ # 風景,名詞,一般,*,*,*,*,風景,フウケイ,フーケイ
+ # って,助詞,格助詞,連語,*,*,*,って,ッテ,ッテ
+ # の,名詞,非自立,一般,*,*,*,の,ノ,ノ
+ # が,助詞,格助詞,一般,*,*,*,が,ガ,ガ
+ # ある,動詞,自立,*,*,五段・ラ行,基本形,ある,アル,アル
+ # ん,名詞,非自立,一般,*,*,*,ん,ン,ン
+ # だ,助動詞,*,*,*,特殊・ダ,基本形,だ,ダ,ダ
+ # よ,助詞,終助詞,*,*,*,*,よ,ヨ,ヨ
+ # 。,記号,句点,*,*,*,*,。,。,。
+ #
+ #
+ # # customize MeCabNode feature attribute with node-formatting
+ # # %m ... morpheme surface
+ # # %F, ... comma-delimited ChaSen feature values
+ # # reading (index 7)
+ # # part-of-speech (index 0)
+ # # %h ... part-of-speech ID (IPADIC)
+ # #
+ # nm = Natto::MeCab.new('-F%m,%F,[7,0],%h')
+ #
+ # # Enumerator effectively iterates the MeCabNodes
+ # #
+ # enum = nm.enum_parse(text)
+ # => #<Enumerator: #<Enumerator::Generator:0x29cc5f8>:each>
+ #
+ # # output the feature attribute of each MeCabNode
+ # # only output normal nodes, ignoring any end-of-sentence
+ # # or unknown nodes
+ # #
+ # enum.map.with_index {|n,i| puts "#{i}: #{n.feature}" if n.is_nor?}
+ # 0: 凡人,ボンジン,名詞,38
+ # 1: に,ニ,助詞,13
+ # 2: しか,シカ,助詞,16
+ # 3: 見え,ミエ,動詞,31
+ # 4: ねえ,ネー,助動詞,25
+ # 5: 風景,フーケイ,名詞,38
+ # 6: って,ッテ,助詞,15
+ # 7: の,ノ,名詞,63
+ # 8: が,ガ,助詞,13
+ # 9: ある,アル,動詞,31
+ # 10: ん,ン,名詞,63
+ # 11: だ,ダ,助動詞,25
+ # 12: よ,ヨ,助詞,17
+ # 13: 。,。,記号,7
+ #
  #
  class MeCab
  include Natto::Binding
@@ -89,15 +147,15 @@ module Natto
  # <i>Use single-quotes to preserve format options that contain escape chars.</i><br/>
  # e.g.<br/>
  #
- # nm = Natto::MeCab.new(:node_format=>'%m¥t%f[7]¥n')
+ # nm = Natto::MeCab.new(node_format: '%m¥t%f[7]¥n')
  # => #<Natto::MeCab:0x28d2ae10
  # @tagger=#<FFI::Pointer address=0x28a97980>, \
- # @libpath="/usr/local/lib/libmecab.so", \
+ # @libpath="/usr/local/lib/libmecab.so", \
  # @options={:node_format=>"%m¥t%f[7]¥n"}, \
  # @dicts=[#<Natto::DictionaryInfo:0x28d2a85c \
  # @filepath="/usr/local/lib/mecab/dic/ipadic/sys.dic" \
- # charset=utf8, \
- # type=0>] \
+ # charset=utf8, \
+ # type=0>] \
  # @version=0.996>
  #
  # puts nm.parse('才能とは求める人間に与えられるものではない。')
@@ -121,6 +179,7 @@ module Natto
  def initialize(options={})
  @options = self.class.parse_mecab_options(options)
  @dicts = []
+ # TODO invoke function for enhancing MeCabNode after this point
 
  opt_str = self.class.build_options_str(@options)
  @tagger = self.class.mecab_new2(opt_str)
@@ -362,7 +421,7 @@ module Natto
  class MeCabError < RuntimeError; end
  end
 
- # Copyright (c) 2014-2015, Brooke M. Fujita.
+ # Copyright (c) 2015, Brooke M. Fujita.
  # All rights reserved.
  #
  # Redistribution and use in source and binary forms, with or without
@@ -1,3 +1,4 @@
+ # coding: utf-8
  module Natto
 
  # Module `OptionParse` encapsulates methods and behavior
@@ -116,7 +117,7 @@ module Natto
  end
  end
 
- # Copyright (c) 2014-2015, Brooke M. Fujita.
+ # Copyright (c) 2015, Brooke M. Fujita.
  # All rights reserved.
  #
  # Redistribution and use in source and binary forms, with or without
@@ -49,9 +49,11 @@ module Natto
  # puts sysdic.filepath
  # => /usr/local/lib/mecab/dic/ipadic/sys.dic
  #
+ # # what charset (encoding) is the system dictionary?
  # puts sysdic.charset
  # => utf8
  #
+ # # is this really the system dictionary?
  # puts sysdic.is_sysdic?
  # => true
  class DictionaryInfo < MeCabStruct
@@ -60,8 +62,10 @@ module Natto
 
  # System dictionary.
  SYS_DIC = 0
+
  # User dictionary.
  USR_DIC = 1
+
  # Unknown dictionary.
  UNK_DIC = 2
 
@@ -263,7 +267,11 @@ module Natto
  #
  # @return [String] encoded object id, underlying FFI pointer, stat, surface, and feature
  def to_s
- %(#{super.chop} @pointer=#{@pointer}, stat=#{self[:stat]}, @surface="#{self.surface}", @feature="#{self.feature}">)
+ [ super.chop,
+ "@pointer=#{@pointer},",
+ "stat=#{self[:stat]},",
+ "@surface=\"#{self.surface}\",",
+ "@feature=\"#{self.feature}\">" ].join(' ')
  end
 
  # Overrides `Object#inspect`.
@@ -303,10 +311,13 @@ module Natto
  def is_eon?
  self.stat == EON_NODE
  end
+
+
  end
+
  end
 
- # Copyright (c) 2014-2015, Brooke M. Fujita.
+ # Copyright (c) 2015, Brooke M. Fujita.
  # All rights reserved.
  #
  # Redistribution and use in source and binary forms, with or without
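The `to_s` change in this file swaps one long interpolated literal for an array of fragments joined with single spaces. The sketch below checks that the two forms yield the same string. It assumes a hypothetical `FakeNode` stand-in: plain instance variables replace the FFI-backed `self[:stat]`, and the `super` result is passed in as `base`, so none of this is the real `Natto::MeCabNode`.

```ruby
# Hypothetical stand-in for Natto::MeCabNode; no FFI or MeCab involved.
class FakeNode
  attr_reader :surface, :feature

  def initialize(pointer, stat, surface, feature)
    @pointer = pointer
    @stat = stat
    @surface = surface
    @feature = feature
  end

  # 0.9.7 style: a single interpolated string literal.
  def to_s_old(base)
    %(#{base.chop} @pointer=#{@pointer}, stat=#{@stat}, @surface="#{surface}", @feature="#{feature}">)
  end

  # 0.9.8 style: the same fragments, joined with single spaces.
  def to_s_new(base)
    [ base.chop,
      "@pointer=#{@pointer},",
      "stat=#{@stat},",
      "@surface=\"#{surface}\",",
      "@feature=\"#{feature}\">" ].join(' ')
  end
end

node = FakeNode.new('0x28afb980', 0, '星', '名詞')
base = '#<FakeNode:0x0001>'   # plays the role of the string returned by super
puts node.to_s_old(base) == node.to_s_new(base)
```

Because each fragment carries its own trailing comma and the join separator is one space, the refactoring reads as a layout change rather than a behavioral one.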
@@ -27,10 +27,10 @@
  # `Natto`.
  module Natto
  # Version string for this Rubygem.
- VERSION = "0.9.7"
+ VERSION = "0.9.8"
  end
 
- # Copyright (c) 2014-2015, Brooke M. Fujita.
+ # Copyright (c) 2015, Brooke M. Fujita.
  # All rights reserved.
  #
  # Redistribution and use in source and binary forms, with or without
metadata CHANGED
@@ -1,73 +1,79 @@
  --- !ruby/object:Gem::Specification
  name: natto
  version: !ruby/object:Gem::Version
- version: 0.9.7
+ version: 0.9.8
+ prerelease:
  platform: ruby
  authors:
  - Brooke M. Fujita
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-12-20 00:00:00.000000000 Z
+ date: 2015-02-10 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: ffi
  requirement: !ruby/object:Gem::Requirement
+ none: false
  requirements:
- - - '>='
+ - - ! '>='
  - !ruby/object:Gem::Version
  version: 1.9.0
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
+ none: false
  requirements:
- - - '>='
+ - - ! '>='
  - !ruby/object:Gem::Version
  version: 1.9.0
- description: |
- No compiler is necessary, as natto is not a C extension. It will run on CRuby (mri/yarv) and JRuby (jvm) equally well. It will also run on Windows, Unix/Linux, and Mac OS. natto provides a naturally Ruby-esque interface to MeCab.
+ description: ! 'No compiler is necessary, as natto is not a C extension. It will run
+ on CRuby (mri/yarv) and JRuby (jvm) equally well. It will also run on Windows, Unix/Linux,
+ and OS X. natto provides a naturally Ruby-esque interface to MeCab.
+
+ '
  email: buruzaemon@gmail.com
  executables: []
  extensions: []
  extra_rdoc_files: []
  files:
- - .yardopts
- - CHANGELOG
- - LICENSE
- - README.md
  - lib/natto.rb
  - lib/natto/binding.rb
  - lib/natto/natto.rb
  - lib/natto/option_parse.rb
  - lib/natto/struct.rb
  - lib/natto/version.rb
- homepage: https://bitbucket.org/buruzaemon/natto
+ - README.md
+ - LICENSE
+ - CHANGELOG
+ - .yardopts
+ homepage: https://github.com/buruzaemon/natto
  licenses:
  - BSD
- metadata: {}
  post_install_message:
  rdoc_options: []
  require_paths:
  - lib
  required_ruby_version: !ruby/object:Gem::Requirement
+ none: false
  requirements:
- - - '>='
+ - - ! '>='
  - !ruby/object:Gem::Version
  version: '1.9'
  required_rubygems_version: !ruby/object:Gem::Requirement
+ none: false
  requirements:
- - - '>='
+ - - ! '>='
  - !ruby/object:Gem::Version
  version: '0'
  requirements:
  - MeCab, 0.996 or greater
  - FFI, 1.9.0 or greater
  rubyforge_project:
- rubygems_version: 2.4.1
+ rubygems_version: 1.8.23.2
  signing_key:
- specification_version: 4
+ specification_version: 3
  summary: A gem leveraging FFI (foreign function interface), natto combines the Ruby
  programming language with MeCab, the part-of-speech and morphological analyzer for
  the Japanese language.
  test_files: []
- has_rdoc:
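Both versions of the gemspec declare `>= 1.9.0` for the `ffi` dependency; only the YAML serialization of the operator changes. As a rough sketch of what that constraint means in practice, the snippet below evaluates it with RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The helper name `satisfies?` and the candidate version strings are illustrative assumptions, not taken from the gem.

```ruby
require 'rubygems'

# The ffi constraint as it appears in the metadata above.
ffi_requirement = Gem::Requirement.new('>= 1.9.0')

# Illustrative helper: does a version string meet the requirement?
def satisfies?(requirement, version_string)
  requirement.satisfied_by?(Gem::Version.new(version_string))
end

puts satisfies?(ffi_requirement, '1.9.3')   # a version at or above the floor
puts satisfies?(ffi_requirement, '1.8.0')   # a version below the floor
```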
checksums.yaml DELETED
@@ -1,7 +0,0 @@
- ---
- SHA1:
- metadata.gz: fad99a300fd0a04d95e5ffacb7352b3855506e85
- data.tar.gz: 1e9ba71a7690d14099f45d0350fba7d388b7e4e9
- SHA512:
- metadata.gz: 185db00a5a3fba01b27ad27ea0e89e03e698b8d5ccfbef400539c0e48648ab77abe5b90e5cad9c7777dc5d6a79297b4dea99e3ecdae4111bb25f8d78614b164c
- data.tar.gz: fec5fd24301277deff762c68762b89fec9f33736a6cf918b7d4ec61a9019ff03433d6f96528d8220ca7f35a352950afd93f4b5500cf5fe9baf5b0ccbedeb5efe