natto 0.9.7 → 0.9.8

data/CHANGELOG CHANGED
@@ -1,5 +1,11 @@
1
1
  ## CHANGELOG
2
2
 
3
+ - __2015/02/10__: 0.9.8 release.
4
+ - Migrated natto code home from Bitbucket to GitHub.
5
+ - Improved documentation following said migration.
6
+ - Minor refactoring to Natto::MeCabNode#to_s.
7
+ - Updating LICENSE for year 2015.
8
+
3
9
  - __2014/12/20__: 0.9.7 release.
4
10
  - Issue 14: [adding automatic discovery for mecab library; no need to
5
11
  explicitly set
data/LICENSE CHANGED
@@ -1,4 +1,4 @@
1
- Copyright (c) 2014-2015, Brooke M. Fujita.
1
+ Copyright (c) 2015, Brooke M. Fujita.
2
2
  All rights reserved.
3
3
 
4
4
  Redistribution and use in source and binary forms, with or without
data/README.md CHANGED
@@ -9,10 +9,10 @@ and morphological analyzer for the Japanese language.
9
9
 
10
10
  - No compiler is necessary, as natto is _not_ a C extension.
11
11
  - It will run on CRuby (mri/yarv) and JRuby (jvm) equally well.
12
- - It will work with MeCab installations on Windows, Unix/Linux or Mac OS.
12
+ - It will work with MeCab installations on Windows, Unix/Linux or OS X.
13
13
  - natto provides a naturally Ruby-esque interface to MeCab.
14
14
 
15
- You can learn more about [natto at bitbucket](https://bitbucket.org/buruzaemon/natto/).
15
+ You can learn more about [natto at GitHub](https://github.com/buruzaemon/natto).
16
16
 
17
17
 
18
18
  ## Requirements
@@ -24,7 +24,7 @@ natto requires the following:
24
24
  - Ruby _1.9 or greater_
25
25
  - [ffi _1.9.0 or greater_](http://rubygems.org/gems/ffi)
26
26
 
27
- ## Installation on *nix and Mac OS
27
+ ## Installation on *nix and OS X
28
28
  Install natto with the following gem command:
29
29
 
30
30
  gem install natto
@@ -42,16 +42,16 @@ However, if you are using a CRuby on Windows, then you will first need to instal
42
42
 
43
43
  gem install natto
44
44
 
45
- 6. If you are on a 64-bit Windows and you use a 64-bit Ruby or JRuby, then you might want to [build a 64-bit version of libmecab.dll](https://bitbucket.org/buruzaemon/natto/wiki/64-Bit-Windows).
45
+ 6. If you are on a 64-bit Windows and you use a 64-bit Ruby or JRuby, then you might want to [build a 64-bit version of libmecab.dll](https://github.com/buruzaemon/natto/wiki/64-Bit-Windows).
46
46
 
47
47
 
48
48
  ## Configuration
49
49
  - ***No explicit configuration should be necessary, as natto will try to locate the `mecab` library based upon its runtime environment.***
50
50
  - On Windows, it will query the Windows Registry to determine where `libmecab.dll` is installed
51
- - On Mac OS and \*nix, it will query `mecab-config --libs`
51
+ - On OS X and \*nix, it will query `mecab-config --libs`
52
52
  - ***But if natto cannot find the `mecab` library, `LoadError` will be raised.***
53
53
  - Please set the `MECAB_PATH` environment variable to the exact name/path to your `mecab` library.
54
- - e.g., for Mac OS
54
+ - e.g., for OS X
55
55
 
56
56
  export MECAB_PATH=/usr/local/Cellar/mecab/0.996/lib/libmecab.dylib
57
57
 
@@ -134,7 +134,7 @@ However, if you are using a CRuby on Windows, then you will first need to instal
134
134
  EOS
135
135
 
136
136
  # parse more text and use a block to:
137
- # - iterate the resulting MeCab nodes
137
+ # - iterate over the resulting MeCabNode instances
138
138
  # - output morpheme surface and part-of-speech ID
139
139
  #
140
140
  # * ignore any end-of-sentence nodes
@@ -164,13 +164,11 @@ However, if you are using a CRuby on Windows, then you will first need to instal
164
164
  # language processing tasks, it is far more efficient
165
165
  # to iterate over MeCab nodes using an Enumerator
166
166
  #
167
- # this example uses the node-format option to customize
168
- # the resulting morpheme feature to extract:
169
- # - surface
170
- # - part-of-speech
171
- # - reading
172
- #
173
- # * again, ignore any end-of-sentence nodes
167
+ # this example uses the -F node-format option to customize
168
+ # the resulting MeCabNode feature attribute to extract:
169
+ # - %m ... surface
170
+ # - %f[0] ... part-of-speech
171
+ # - %f[7] ... reading
174
172
  #
175
173
  nm = Natto::MeCab.new('-F%m\t%f[0]\t%f[7]')
176
174
 
@@ -193,7 +191,8 @@ However, if you are using a CRuby on Windows, then you will first need to instal
193
191
 
194
192
  enum.rewind
195
193
 
196
- enum.each { |n| puts n.feature }
194
+ # again, ignore any end-of-sentence nodes
195
+ enum.each { |n| puts n.feature if !n.is_eos? }
197
196
  この 連体詞 コノ
198
197
  星 名詞 ホシ
199
198
  の 助詞 ノ
@@ -215,11 +214,11 @@ However, if you are using a CRuby on Windows, then you will first need to instal
215
214
 
216
215
 
217
216
  ## Learn more
218
- - You can read more about natto on the [project Wiki](https://bitbucket.org/buruzaemon/natto/wiki/Home).
217
+ - You can read more about natto on the [project Wiki](https://github.com/buruzaemon/natto/wiki).
219
218
 
220
219
  ## Contributing to natto
221
- - Use [mercurial](http://mercurial.selenic.com/) and [check out the latest code at bitbucket](https://bitbucket.org/buruzaemon/natto/src/) to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
222
- - [Browse the issue tracker](https://bitbucket.org/buruzaemon/natto/issues/) to make sure someone already hasn't requested it and/or contributed it.
220
+ - Use [git](http://git-scm.com/) and [check out the latest code at GitHub](https://github.com/buruzaemon/natto) to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
221
+ - [Browse the issue tracker](https://github.com/buruzaemon/natto/issues) to make sure someone already hasn't requested it and/or contributed it.
223
222
  - Fork the project.
224
223
  - Start a feature/bugfix branch.
225
224
  - Commit and push until you are happy with your contribution.
@@ -230,4 +229,4 @@ However, if you are using a CRuby on Windows, then you will first need to instal
230
229
  Please see the {file:CHANGELOG} for this gem's release history.
231
230
 
232
231
  ## Copyright
233
- Copyright © 2014-2015, Brooke M. Fujita. All rights reserved. Please see the {file:LICENSE} file for further details.
232
+ Copyright © 2015, Brooke M. Fujita. All rights reserved. Please see the {file:LICENSE} file for further details.
@@ -1,6 +1,6 @@
1
1
  require 'natto/natto'
2
2
 
3
- # Copyright (c) 2014-2015, Brooke M. Fujita.
3
+ # Copyright (c) 2015, Brooke M. Fujita.
4
4
  # All rights reserved.
5
5
  #
6
6
  # Redistribution and use in source and binary forms, with or without
@@ -175,7 +175,7 @@ module Natto
175
175
  end
176
176
  end
177
177
 
178
- # Copyright (c) 2014-2015, Brooke M. Fujita.
178
+ # Copyright (c) 2015, Brooke M. Fujita.
179
179
  # All rights reserved.
180
180
  #
181
181
  # Redistribution and use in source and binary forms, with or without
@@ -13,35 +13,93 @@ module Natto
13
13
  #
14
14
  # require 'natto'
15
15
  #
16
- # nm = Natto::MeCab.new('-Ochasen')
16
+ # text = '凡人にしか見えねえ風景ってのがあるんだよ。'
17
+ #
18
+ # nm = Natto::MeCab.new
17
19
  # => #<Natto::MeCab:0x28d3bdc8 \
18
20
  # @tagger=#<FFI::Pointer address=0x28afb980>, \
19
21
  # @libpath="/usr/local/lib/libmecab.so" \
20
- # @options={:output_format_type=>"chasen"}, \
22
+ # @options={}, \
21
23
  # @dicts=[#<Natto::DictionaryInfo:0x289a1f14 \
22
24
  # @filepath="/usr/local/lib/mecab/dic/ipadic/sys.dic", \
23
- # charset=utf8 \
24
- # type=0>], \
25
+ # charset=utf8, \
26
+ # type=0>], \
25
27
  # @version=0.996>
26
28
  #
27
- # nm.parse('凡人にしか見えねえ風景ってのがあるんだよ。') do |n|
28
- # puts "#{n.surface}\t#{n.feature}"
29
+ # # print entire MeCab result to stdout
30
+ # #
31
+ # puts nm.parse(text)
32
+ # 凡人 名詞,一般,*,*,*,*,凡人,ボンジン,ボンジン
33
+ # に 助詞,格助詞,一般,*,*,*,に,ニ,ニ
34
+ # しか 助詞,係助詞,*,*,*,*,しか,シカ,シカ
35
+ # 見え 動詞,自立,*,*,一段,未然形,見える,ミエ,ミエ
36
+ # ねえ 助動詞,*,*,*,特殊・ナイ,音便基本形,ない,ネエ,ネー
37
+ # 風景 名詞,一般,*,*,*,*,風景,フウケイ,フーケイ
38
+ # って 助詞,格助詞,連語,*,*,*,って,ッテ,ッテ
39
+ # の 名詞,非自立,一般,*,*,*,の,ノ,ノ
40
+ # が 助詞,格助詞,一般,*,*,*,が,ガ,ガ
41
+ # ある 動詞,自立,*,*,五段・ラ行,基本形,ある,アル,アル
42
+ # ん 名詞,非自立,一般,*,*,*,ん,ン,ン
43
+ # だ 助動詞,*,*,*,特殊・ダ,基本形,だ,ダ,ダ
44
+ # よ 助詞,終助詞,*,*,*,*,よ,ヨ,ヨ
45
+ # 。 記号,句点,*,*,*,*,。,。,。
46
+ # EOS
47
+ #
48
+ # # pass a block to iterate over each MeCabNode instance
49
+ # #
50
+ # nm.parse(text) do |n|
51
+ # puts "#{n.surface},#{n.feature}" if !n.is_eos?
29
52
  # end
30
- # 凡人 名詞,一般,*,*,*,*,凡人,ボンジン,ボンジン
31
- # に 助詞,格助詞,一般,*,*,*,に,ニ,ニ
32
- # しか 助詞,係助詞,*,*,*,*,しか,シカ,シカ
33
- # 見え 動詞,自立,*,*,一段,未然形,見える,ミエ,ミエ
34
- # ねえ 助動詞,*,*,*,特殊・ナイ,音便基本形,ない,ネエ,ネー
35
- # 風景 名詞,一般,*,*,*,*,風景,フウケイ,フーケイ
36
- # って 助詞,格助詞,連語,*,*,*,って,ッテ,ッテ
37
- # の 名詞,非自立,一般,*,*,*,の,ノ,ノ
38
- # が 助詞,格助詞,一般,*,*,*,が,ガ,ガ
39
- # ある 動詞,自立,*,*,五段・ラ行,基本形,ある,アル,アル
40
- # ん 名詞,非自立,一般,*,*,*,ん,ン,ン
41
- # だ 助動詞,*,*,*一般,特殊・ダ,基本形,だ,ダ,ダ
42
- # よ 助詞,終助詞,*,*,*,*,よ,ã¨,ヨ
43
- # 。 記号,句点,*,*,*,*,。,。,。
44
- # BOS/EOS,*,*,*,*,*,*,*,*BOS
53
+ # 凡人,名詞,一般,*,*,*,*,凡人,ボンジン,ボンジン
54
+ # に,助詞,格助詞,一般,*,*,*,に,ニ,ニ
55
+ # しか,助詞,係助詞,*,*,*,*,しか,シカ,シカ
56
+ # 見え,動詞,自立,*,*,一段,未然形,見える,ミエ,ミエ
57
+ # ねえ,助動詞,*,*,*,特殊・ナイ,音便基本形,ない,ネエ,ネー
58
+ # 風景,名詞,一般,*,*,*,*,風景,フウケイ,フーケイ
59
+ # って,助詞,格助詞,連語,*,*,*,って,ッテ,ッテ
60
+ # の,名詞,非自立,一般,*,*,*,の,ノ,ノ
61
+ # が,助詞,格助詞,一般,*,*,*,が,ガ,ガ
62
+ # ある,動詞,自立,*,*,五段・ラ行,基本形,ある,アル,アル
63
+ # ん,名詞,非自立,一般,*,*,*,ん,ン,ン
64
+ # だ,助動詞,*,*,*,特殊・ダ,基本形,だ,ダ,ダ
65
+ # よ,助詞,終助詞,*,*,*,*,よ,ヨ,ヨ
66
+ # 。,記号,句点,*,*,*,*,。,。,。
67
+ #
68
+ #
69
+ # # customize MeCabNode feature attribute with node-formatting
70
+ # # %m ... morpheme surface
71
+ # # %F, ... comma-delimited ChaSen feature values
72
+ # # reading (index 7)
73
+ # # part-of-speech (index 0)
74
+ # # %h ... part-of-speech ID (IPADIC)
75
+ # #
76
+ # nm = Natto::MeCab.new('-F%m,%F,[7,0],%h')
77
+ #
78
+ # # Enumerator effectively iterates the MeCabNodes
79
+ # #
80
+ # enum = nm.enum_parse(text)
81
+ # => #<Enumerator: #<Enumerator::Generator:0x29cc5f8>:each>
82
+ #
83
+ # # output the feature attribute of each MeCabNode
84
+ # # only output normal nodes, ignoring any end-of-sentence
85
+ # # or unknown nodes
86
+ # #
87
+ # enum.map.with_index {|n,i| puts "#{i}: #{n.feature}" if n.is_nor?}
88
+ # 0: 凡人,ボンジン,名詞,38
89
+ # 1: に,ニ,助詞,13
90
+ # 2: しか,シカ,助詞,16
91
+ # 3: 見え,ミエ,動詞,31
92
+ # 4: ねえ,ネー,助動詞,25
93
+ # 5: 風景,フーケイ,名詞,38
94
+ # 6: って,ッテ,助詞,15
95
+ # 7: の,ノ,名詞,63
96
+ # 8: が,ガ,助詞,13
97
+ # 9: ある,アル,動詞,31
98
+ # 10: ん,ン,名詞,63
99
+ # 11: だ,ダ,助動詞,25
100
+ # 12: よ,ヨ,助詞,17
101
+ # 13: 。,。,記号,7
102
+ #
45
103
  #
46
104
  class MeCab
47
105
  include Natto::Binding
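
Note on the hunk above: the new doc comment uses `-F%m,%F,[7,0],%h`, where `%F,[7,0]` asks MeCab to emit the feature fields at indices 7 and 0, joined by a comma. The sketch below only emulates that selection in plain Ruby on a feature string copied from the example output. It does not call MeCab or natto, and `pick_features` is a hypothetical helper name, not part of either library.

```ruby
# Hypothetical helper: emulates the field selection that the MeCab
# node-format directive %F,[7,0] performs on a comma-separated
# feature string. MeCab does this natively; this is illustration only.
def pick_features(feature, indices, delimiter = ',')
  fields = feature.split(',')
  indices.map { |i| fields[i] }.join(delimiter)
end

# Feature string for the first morpheme in the documented example.
surface = '凡人'
feature = '名詞,一般,*,*,*,*,凡人,ボンジン,ボンジン'

# Surface, then fields 7 and 0, mirroring the shape of '%m,%F,[7,0]'.
puts "#{surface},#{pick_features(feature, [7, 0])}"
```

The part-of-speech ID appended by `%h` comes from the dictionary, so it cannot be derived from the feature string and is left out of the sketch.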
@@ -89,15 +147,15 @@ module Natto
89
147
  # <i>Use single-quotes to preserve format options that contain escape chars.</i><br/>
90
148
  # e.g.<br/>
91
149
  #
92
- # nm = Natto::MeCab.new(:node_format=>'%m¥t%f[7]¥n')
150
+ # nm = Natto::MeCab.new(node_format: '%m¥t%f[7]¥n')
93
151
  # => #<Natto::MeCab:0x28d2ae10
94
152
  # @tagger=#<FFI::Pointer address=0x28a97980>, \
95
- # @libpath="/usr/local/lib/libmecab.so", \
153
+ # @libpath="/usr/local/lib/libmecab.so", \
96
154
  # @options={:node_format=>"%m¥t%f[7]¥n"}, \
97
155
  # @dicts=[#<Natto::DictionaryInfo:0x28d2a85c \
98
156
  # @filepath="/usr/local/lib/mecab/dic/ipadic/sys.dic" \
99
- # charset=utf8, \
100
- # type=0>] \
157
+ # charset=utf8, \
158
+ # type=0>] \
101
159
  # @version=0.996>
102
160
  #
103
161
  # puts nm.parse('才能とは求める人間に与えられるものではない。')
@@ -121,6 +179,7 @@ module Natto
121
179
  def initialize(options={})
122
180
  @options = self.class.parse_mecab_options(options)
123
181
  @dicts = []
182
+ # TODO invoke function for enhancing MeCabNode after this point
124
183
 
125
184
  opt_str = self.class.build_options_str(@options)
126
185
  @tagger = self.class.mecab_new2(opt_str)
@@ -362,7 +421,7 @@ module Natto
362
421
  class MeCabError < RuntimeError; end
363
422
  end
364
423
 
365
- # Copyright (c) 2014-2015, Brooke M. Fujita.
424
+ # Copyright (c) 2015, Brooke M. Fujita.
366
425
  # All rights reserved.
367
426
  #
368
427
  # Redistribution and use in source and binary forms, with or without
@@ -1,3 +1,4 @@
1
+ # coding: utf-8
1
2
  module Natto
2
3
 
3
4
  # Module `OptionParse` encapsulates methods and behavior
@@ -116,7 +117,7 @@ module Natto
116
117
  end
117
118
  end
118
119
 
119
- # Copyright (c) 2014-2015, Brooke M. Fujita.
120
+ # Copyright (c) 2015, Brooke M. Fujita.
120
121
  # All rights reserved.
121
122
  #
122
123
  # Redistribution and use in source and binary forms, with or without
@@ -49,9 +49,11 @@ module Natto
49
49
  # puts sysdic.filepath
50
50
  # => /usr/local/lib/mecab/dic/ipadic/sys.dic
51
51
  #
52
+ # # what charset (encoding) is the system dictionary?
52
53
  # puts sysdic.charset
53
54
  # => utf8
54
55
  #
56
+ # # is this really the system dictionary?
55
57
  # puts sysdic.is_sysdic?
56
58
  # => true
57
59
  class DictionaryInfo < MeCabStruct
@@ -60,8 +62,10 @@ module Natto
60
62
 
61
63
  # System dictionary.
62
64
  SYS_DIC = 0
65
+
63
66
  # User dictionary.
64
67
  USR_DIC = 1
68
+
65
69
  # Unknown dictionary.
66
70
  UNK_DIC = 2
67
71
 
@@ -263,7 +267,11 @@ module Natto
263
267
  #
264
268
  # @return [String] encoded object id, underlying FFI pointer, stat, surface, and feature
265
269
  def to_s
266
- %(#{super.chop} @pointer=#{@pointer}, stat=#{self[:stat]}, @surface="#{self.surface}", @feature="#{self.feature}">)
270
+ [ super.chop,
271
+ "@pointer=#{@pointer},",
272
+ "stat=#{self[:stat]},",
273
+ "@surface=\"#{self.surface}\",",
274
+ "@feature=\"#{self.feature}\">" ].join(' ')
267
275
  end
268
276
 
269
277
  # Overrides `Object#inspect`.
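
Note on the `to_s` hunk above: the CHANGELOG calls this a minor refactoring, and the two forms should produce the same string. A quick way to convince yourself is the stand-alone sketch below. Both method names and all argument values are made up for illustration; the real method reads them from the `MeCabNode` instance and `super`.

```ruby
# Old form: one long interpolated string.
def to_s_interpolated(base, pointer, stat, surface, feature)
  %(#{base.chop} @pointer=#{pointer}, stat=#{stat}, @surface="#{surface}", @feature="#{feature}">)
end

# New form: the same pieces collected in an array and joined with spaces.
def to_s_joined(base, pointer, stat, surface, feature)
  [ base.chop,
    "@pointer=#{pointer},",
    "stat=#{stat},",
    "@surface=\"#{surface}\",",
    "@feature=\"#{feature}\">" ].join(' ')
end

# Illustrative values only; `base` stands in for the result of `super`.
args = ['#<Natto::MeCabNode:0x28d3bdc8>', '0x28afb980', 0, '凡人', '名詞,一般']

puts to_s_joined(*args)
puts to_s_interpolated(*args) == to_s_joined(*args)
```

`String#chop` drops the trailing `>` from the default representation so the extra attributes can be appended before the closing bracket.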
@@ -303,10 +311,13 @@ module Natto
303
311
  def is_eon?
304
312
  self.stat == EON_NODE
305
313
  end
314
+
315
+
306
316
  end
317
+
307
318
  end
308
319
 
309
- # Copyright (c) 2014-2015, Brooke M. Fujita.
320
+ # Copyright (c) 2015, Brooke M. Fujita.
310
321
  # All rights reserved.
311
322
  #
312
323
  # Redistribution and use in source and binary forms, with or without
@@ -27,10 +27,10 @@
27
27
  # `Natto`.
28
28
  module Natto
29
29
  # Version string for this Rubygem.
30
- VERSION = "0.9.7"
30
+ VERSION = "0.9.8"
31
31
  end
32
32
 
33
- # Copyright (c) 2014-2015, Brooke M. Fujita.
33
+ # Copyright (c) 2015, Brooke M. Fujita.
34
34
  # All rights reserved.
35
35
  #
36
36
  # Redistribution and use in source and binary forms, with or without
metadata CHANGED
@@ -1,73 +1,79 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: natto
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.9.7
4
+ version: 0.9.8
5
+ prerelease:
5
6
  platform: ruby
6
7
  authors:
7
8
  - Brooke M. Fujita
8
9
  autorequire:
9
10
  bindir: bin
10
11
  cert_chain: []
11
- date: 2014-12-20 00:00:00.000000000 Z
12
+ date: 2015-02-10 00:00:00.000000000 Z
12
13
  dependencies:
13
14
  - !ruby/object:Gem::Dependency
14
15
  name: ffi
15
16
  requirement: !ruby/object:Gem::Requirement
17
+ none: false
16
18
  requirements:
17
- - - '>='
19
+ - - ! '>='
18
20
  - !ruby/object:Gem::Version
19
21
  version: 1.9.0
20
22
  type: :runtime
21
23
  prerelease: false
22
24
  version_requirements: !ruby/object:Gem::Requirement
25
+ none: false
23
26
  requirements:
24
- - - '>='
27
+ - - ! '>='
25
28
  - !ruby/object:Gem::Version
26
29
  version: 1.9.0
27
- description: |
28
- No compiler is necessary, as natto is not a C extension. It will run on CRuby (mri/yarv) and JRuby (jvm) equally well. It will also run on Windows, Unix/Linux, and Mac OS. natto provides a naturally Ruby-esque interface to MeCab.
30
+ description: ! 'No compiler is necessary, as natto is not a C extension. It will run
31
+ on CRuby (mri/yarv) and JRuby (jvm) equally well. It will also run on Windows, Unix/Linux,
32
+ and OS X. natto provides a naturally Ruby-esque interface to MeCab.
33
+
34
+ '
29
35
  email: buruzaemon@gmail.com
30
36
  executables: []
31
37
  extensions: []
32
38
  extra_rdoc_files: []
33
39
  files:
34
- - .yardopts
35
- - CHANGELOG
36
- - LICENSE
37
- - README.md
38
40
  - lib/natto.rb
39
41
  - lib/natto/binding.rb
40
42
  - lib/natto/natto.rb
41
43
  - lib/natto/option_parse.rb
42
44
  - lib/natto/struct.rb
43
45
  - lib/natto/version.rb
44
- homepage: https://bitbucket.org/buruzaemon/natto
46
+ - README.md
47
+ - LICENSE
48
+ - CHANGELOG
49
+ - .yardopts
50
+ homepage: https://github.com/buruzaemon/natto
45
51
  licenses:
46
52
  - BSD
47
- metadata: {}
48
53
  post_install_message:
49
54
  rdoc_options: []
50
55
  require_paths:
51
56
  - lib
52
57
  required_ruby_version: !ruby/object:Gem::Requirement
58
+ none: false
53
59
  requirements:
54
- - - '>='
60
+ - - ! '>='
55
61
  - !ruby/object:Gem::Version
56
62
  version: '1.9'
57
63
  required_rubygems_version: !ruby/object:Gem::Requirement
64
+ none: false
58
65
  requirements:
59
- - - '>='
66
+ - - ! '>='
60
67
  - !ruby/object:Gem::Version
61
68
  version: '0'
62
69
  requirements:
63
70
  - MeCab, 0.996 or greater
64
71
  - FFI, 1.9.0 or greater
65
72
  rubyforge_project:
66
- rubygems_version: 2.4.1
73
+ rubygems_version: 1.8.23.2
67
74
  signing_key:
68
- specification_version: 4
75
+ specification_version: 3
69
76
  summary: A gem leveraging FFI (foreign function interface), natto combines the Ruby
70
77
  programming language with MeCab, the part-of-speech and morphological analyzer for
71
78
  the Japanese language.
72
79
  test_files: []
73
- has_rdoc:
checksums.yaml DELETED
@@ -1,7 +0,0 @@
1
- ---
2
- SHA1:
3
- metadata.gz: fad99a300fd0a04d95e5ffacb7352b3855506e85
4
- data.tar.gz: 1e9ba71a7690d14099f45d0350fba7d388b7e4e9
5
- SHA512:
6
- metadata.gz: 185db00a5a3fba01b27ad27ea0e89e03e698b8d5ccfbef400539c0e48648ab77abe5b90e5cad9c7777dc5d6a79297b4dea99e3ecdae4111bb25f8d78614b164c
7
- data.tar.gz: fec5fd24301277deff762c68762b89fec9f33736a6cf918b7d4ec61a9019ff03433d6f96528d8220ca7f35a352950afd93f4b5500cf5fe9baf5b0ccbedeb5efe