natto 0.9.6 → 0.9.7
- checksums.yaml +4 -4
- data/CHANGELOG +18 -0
- data/LICENSE +13 -11
- data/README.md +233 -108
- data/lib/natto.rb +26 -0
- data/lib/natto/binding.rb +69 -25
- data/lib/natto/natto.rb +166 -72
- data/lib/natto/option_parse.rb +26 -0
- data/lib/natto/struct.rb +103 -80
- data/lib/natto/version.rb +27 -1
- metadata +12 -10
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: fad99a300fd0a04d95e5ffacb7352b3855506e85
|
4
|
+
data.tar.gz: 1e9ba71a7690d14099f45d0350fba7d388b7e4e9
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 185db00a5a3fba01b27ad27ea0e89e03e698b8d5ccfbef400539c0e48648ab77abe5b90e5cad9c7777dc5d6a79297b4dea99e3ecdae4111bb25f8d78614b164c
|
7
|
+
data.tar.gz: fec5fd24301277deff762c68762b89fec9f33736a6cf918b7d4ec61a9019ff03433d6f96528d8220ca7f35a352950afd93f4b5500cf5fe9baf5b0ccbedeb5efe
|
data/CHANGELOG
CHANGED
@@ -1,5 +1,23 @@
|
|
1
1
|
## CHANGELOG
|
2
2
|
|
3
|
+
- __2014/12/20__: 0.9.7 release.
|
4
|
+
- Issue 14: [adding automatic discovery for mecab library; no need to
|
5
|
+
explicitly set
|
6
|
+
MECAB_PATH!](https://bitbucket.org/buruzaemon/natto/issue/14/automatic-discovery-of-libmecab-path-and)
|
7
|
+
- Issue 15: [refactored node-parsing to use Enumerator instead of
|
8
|
+
materializing every node and stuffing into
|
9
|
+
array](https://bitbucket.org/buruzaemon/natto/issue/15/use-enumerator-when-parsing-mecab-nodes)
|
10
|
+
- Issue 17: [adding filepath to MeCab and
|
11
|
+
DictionaryInfo](https://bitbucket.org/buruzaemon/natto/issue/17/use-filerealpath-value-for-all-file-paths)
|
12
|
+
- Issue 18: [bug-fix for node-formatting during default node
|
13
|
+
parse](https://bitbucket.org/buruzaemon/natto/issue/18/no-node-formatting-when-using-default-node)
|
14
|
+
- Deprecating parse_as_nodes and parse_as_strings; please use parse instead!
|
15
|
+
- CAUTION: parse_as_nodes, parse_as_strings, readnodes and readlines will be removed in the following release!
|
16
|
+
- Enhancements to to_s methods for both MeCab and DictionaryInfo
|
17
|
+
- Enhancements to TestDictionaryInfo to allow for building user dic during setup on Windows as well
|
18
|
+
- Slight enhancement to benchmark task.
|
19
|
+
- Updating LICENSE (adding copyright year 2015), adding to all files
|
20
|
+
|
3
21
|
- __2013/07/07__: 0.9.6 release.
|
4
22
|
- Upgrade to mecab 0.996
|
5
23
|
- Adding support for partial parsing mode (-p / --partial)
|
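Issue 15 in the changelog above replaces "materialize every node into an array" with an Enumerator. The following is a minimal sketch of that difference, not natto's actual implementation: `Node`, `nodes_as_array` and `nodes_as_enum` are hypothetical names, and a plain Ruby linked list stands in for the MeCab node list so the example runs without MeCab installed.

```ruby
# Hypothetical stand-in for a MeCab node: a singly linked list element.
Node = Struct.new(:surface, :next_node)

# Build a tiny linked list: 世界 -> チャンプ -> nil
tail = Node.new('チャンプ', nil)
head = Node.new('世界', tail)

# Eager style: walk the whole list and collect every node into an Array.
def nodes_as_array(head)
  out = []
  node = head
  while node
    out << node
    node = node.next_node
  end
  out
end

# Lazy style: wrap the same walk in an Enumerator, so that each node is
# produced only when the caller asks for it.
def nodes_as_enum(head)
  Enumerator.new do |yielder|
    node = head
    while node
      yielder << node
      node = node.next_node
    end
  end
end

enum = nodes_as_enum(head)
puts enum.next.surface               # only the first node has been visited
puts enum.map(&:surface).join(' ')   # internal iteration walks the full list
```

With the Enumerator, a caller that only needs the first few morphemes never pays for the rest of the sentence, which is presumably the motivation behind the refactoring.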
data/LICENSE
CHANGED
@@ -1,8 +1,8 @@
|
|
1
|
-
Copyright
|
1
|
+
Copyright (c) 2014-2015, Brooke M. Fujita.
|
2
2
|
All rights reserved.
|
3
3
|
|
4
|
-
Redistribution and use in source and binary forms, with or without
|
5
|
-
permitted provided that the following conditions are met:
|
4
|
+
Redistribution and use in source and binary forms, with or without
|
5
|
+
modification, are permitted provided that the following conditions are met:
|
6
6
|
|
7
7
|
* Redistributions of source code must retain the above
|
8
8
|
copyright notice, this list of conditions and the
|
@@ -13,11 +13,13 @@ permitted provided that the following conditions are met:
|
|
13
13
|
following disclaimer in the documentation and/or other
|
14
14
|
materials provided with the distribution.
|
15
15
|
|
16
|
-
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
|
17
|
-
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
18
|
-
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
|
23
|
-
|
16
|
+
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
|
17
|
+
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
18
|
+
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
19
|
+
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
|
20
|
+
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
|
21
|
+
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
|
22
|
+
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
|
23
|
+
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
24
|
+
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
25
|
+
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
data/README.md
CHANGED
@@ -1,108 +1,233 @@
|
|
1
|
-
# natto
|
2
|
-
A Tasty Ruby Binding with MeCab
|
3
|
-
|
4
|
-
## What is natto?
|
5-108
|
-
(old README lines 5-108: content not preserved in this view)
|
1
|
+
# natto
|
2
|
+
A Tasty Ruby Binding with MeCab
|
3
|
+
|
4
|
+
## What is natto?
|
5
|
+
A gem leveraging FFI (foreign function interface), natto combines the
|
6
|
+
[Ruby programming language](http://www.ruby-lang.org/) with
|
7
|
+
[MeCab](http://mecab.googlecode.com/svn/trunk/mecab/doc/index.html), the part-of-speech
|
8
|
+
and morphological analyzer for the Japanese language.
|
9
|
+
|
10
|
+
- No compiler is necessary, as natto is _not_ a C extension.
|
11
|
+
- It will run on CRuby (mri/yarv) and JRuby (jvm) equally well.
|
12
|
+
- It will work with MeCab installations on Windows, Unix/Linux or Mac OS.
|
13
|
+
- natto provides a naturally Ruby-esque interface to MeCab.
|
14
|
+
|
15
|
+
You can learn more about [natto at bitbucket](https://bitbucket.org/buruzaemon/natto/).
|
16
|
+
|
17
|
+
|
18
|
+
## Requirements
|
19
|
+
natto requires the following:
|
20
|
+
|
21
|
+
- [MeCab _0.996_](http://code.google.com/p/mecab/downloads/list)
|
22
|
+
- A system dictionary, like [mecab-ipadic](https://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz) or [mecab-jumandic](https://mecab.googlecode.com/files/mecab-jumandic-5.1-20070304.tar.gz)
|
23
|
+
- `libmecab-devel` if you are on Linux, since natto uses `mecab-config`
|
24
|
+
- Ruby _1.9 or greater_
|
25
|
+
- [ffi _1.9.0 or greater_](http://rubygems.org/gems/ffi)
|
26
|
+
|
27
|
+
## Installation on *nix and Mac OS
|
28
|
+
Install natto with the following gem command:
|
29
|
+
|
30
|
+
gem install natto
|
31
|
+
|
32
|
+
This will automatically install the [ffi](http://rubygems.org/gems/ffi) rubygem, which natto uses to bind to the `mecab` library.
|
33
|
+
|
34
|
+
## Installation on Windows
|
35
|
+
If you are using CRuby on Windows, you will first need to install the [RubyInstaller Development Kit (DevKit)](https://github.com/oneclick/rubyinstaller/wiki/Development-Kit), an MSYS/MinGW-based toolkit that enables your Windows Ruby installation to build many of the native C/C++ extensions available, including ffi.
|
36
|
+
|
37
|
+
1. Download the latest release for RubyInstaller for Windows platforms and the corresponding DevKit from the [RubyInstaller for Windows downloads page](http://rubyinstaller.org/downloads/).
|
38
|
+
2. After installing RubyInstaller for Windows, double-click on the DevKit-tdm installer `.exe`, and expand the contents to an appropriate location, for example `C:\devkit`.
|
39
|
+
3. Open a command window under `C:\devkit`, and execute: `ruby dk.rb init`. This will locate all known ruby installations, and add them to `C:\devkit\config.yml`.
|
40
|
+
4. Next, execute: `ruby dk.rb install`, which will add the DevKit to all of the installed rubies listed in your `C:\devkit\config.yml`. Now you should be able to install and build the ffi rubygem correctly on your Windows-installed ruby.
|
41
|
+
5. Install natto with:
|
42
|
+
|
43
|
+
gem install natto
|
44
|
+
|
45
|
+
6. If you are on a 64-bit Windows and you use a 64-bit Ruby or JRuby, then you might want to [build a 64-bit version of libmecab.dll](https://bitbucket.org/buruzaemon/natto/wiki/64-Bit-Windows).
|
46
|
+
|
47
|
+
|
48
|
+
## Configuration
|
49
|
+
- ***No explicit configuration should be necessary, as natto will try to locate the `mecab` library based upon its runtime environment.***
|
50
|
+
- On Windows, it will query the Windows Registry to determine where `libmecab.dll` is installed
|
51
|
+
- On Mac OS and \*nix, it will query `mecab-config --libs`
|
52
|
+
- ***But if natto cannot find the `mecab` library, `LoadError` will be raised.***
|
53
|
+
- Please set the `MECAB_PATH` environment variable to the exact name/path to your `mecab` library.
|
54
|
+
- e.g., for Mac OS
|
55
|
+
|
56
|
+
export MECAB_PATH=/usr/local/Cellar/mecab/0.996/lib/libmecab.dylib
|
57
|
+
|
58
|
+
- e.g., for bash on UNIX/Linux
|
59
|
+
|
60
|
+
export MECAB_PATH=/usr/local/lib/libmecab.so
|
61
|
+
|
62
|
+
- e.g., on Windows
|
63
|
+
|
64
|
+
set MECAB_PATH=C:\Program Files\MeCab\bin\libmecab.dll
|
65
|
+
|
66
|
+
- e.g., from within a Ruby program
|
67
|
+
|
68
|
+
ENV['MECAB_PATH']='/usr/local/lib/libmecab.so'
|
69
|
+
|
70
|
+
## Usage
|
71
|
+
|
72
|
+
|
73
|
+
# Quick Start
|
74
|
+
# -----------
|
75
|
+
#
|
76
|
+
# No explicit configuration should be necessary!
|
77
|
+
#
|
78
|
+
require 'natto'
|
79
|
+
|
80
|
+
# first, create an instance of Natto::MeCab
|
81
|
+
#
|
82
|
+
nm = Natto::MeCab.new
|
83
|
+
=> #<Natto::MeCab:0x28d30748
|
84
|
+
@tagger=#<FFI::Pointer address=0x28a97d50>, \
|
85
|
+
@libpath="/usr/local/lib/libmecab.so", \
|
86
|
+
@options={}, \
|
87
|
+
@dicts=[#<Natto::DictionaryInfo:0x28d3061c \
|
88
|
+
@filepath="/usr/local/lib/mecab/dic/ipadic/sys.dic", \
|
89
|
+
charset=utf8, \
|
90
|
+
type=0>] \
|
91
|
+
@version=0.996>
|
92
|
+
|
93
|
+
# display MeCab version
|
94
|
+
#
|
95
|
+
puts nm.version
|
96
|
+
=> 0.996
|
97
|
+
|
98
|
+
# display full pathname to MeCab library
|
99
|
+
#
|
100
|
+
puts nm.libpath
|
101
|
+
=> /usr/local/lib/libmecab.so
|
102
|
+
|
103
|
+
# reference to MeCab system dictionary
|
104
|
+
#
|
105
|
+
sysdic = nm.dicts.first
|
106
|
+
|
107
|
+
# display full pathname to system dictionary file
|
108
|
+
#
|
109
|
+
puts sysdic.filepath
|
110
|
+
=> /usr/local/lib/mecab/dic/ipadic/sys.dic
|
111
|
+
|
112
|
+
# what charset (encoding) is the system dictionary?
|
113
|
+
#
|
114
|
+
puts sysdic.charset
|
115
|
+
=> utf8
|
116
|
+
|
117
|
+
# parse text and send output to stdout
|
118
|
+
#
|
119
|
+
puts nm.parse('俺の名前は星野豊だ!!そこんとこヨロシク!')
|
120
|
+
俺 名詞,代名詞,一般,*,*,*,俺,オレ,オレ
|
121
|
+
の 助詞,連体化,*,*,*,*,の,ノ,ノ
|
122
|
+
名前 名詞,一般,*,*,*,*,名前,ナマエ,ナマエ
|
123
|
+
は 助詞,係助詞,*,*,*,*,は,ハ,ワ
|
124
|
+
星野 名詞,固有名詞,人名,姓,*,*,星野,ホシノ,ホシノ
|
125
|
+
豊 名詞,固有名詞,人名,名,*,*,豊,ユタカ,ユタカ
|
126
|
+
だ 助動詞,*,*,*,特殊・ダ,基本形,だ,ダ,ダ
|
127
|
+
! 記号,一般,*,*,*,*,!,!,!
|
128
|
+
! 記号,一般,*,*,*,*,!,!,!
|
129
|
+
そこ 名詞,代名詞,一般,*,*,*,そこ,ソコ,ソコ
|
130
|
+
ん 助詞,特殊,*,*,*,*,ん,ン,ン
|
131
|
+
とこ 名詞,一般,*,*,*,*,とこ,トコ,トコ
|
132
|
+
ヨロシク 感動詞,*,*,*,*,*,ヨロシク,ヨロシク,ヨロシク
|
133
|
+
! 記号,一般,*,*,*,*,!,!,!
|
134
|
+
EOS
|
135
|
+
|
136
|
+
# parse more text and use a block to:
|
137
|
+
# - iterate the resulting MeCab nodes
|
138
|
+
# - output morpheme surface and part-of-speech ID
|
139
|
+
#
|
140
|
+
# * ignore any end-of-sentence nodes
|
141
|
+
#
|
142
|
+
nm.parse('世界チャンプ目指してんだなこれがっ!!夢なの、俺のっ!!') do |n|
|
143
|
+
puts "#{n.surface}\tpart-of-speech id: #{n.posid}" if !n.is_eos?
|
144
|
+
end
|
145
|
+
世界 part-of-speech id: 38
|
146
|
+
チャンプ part-of-speech id: 38
|
147
|
+
目指し part-of-speech id: 31
|
148
|
+
て part-of-speech id: 18
|
149
|
+
ん part-of-speech id: 63
|
150
|
+
だ part-of-speech id: 25
|
151
|
+
な part-of-speech id: 17
|
152
|
+
これ part-of-speech id: 59
|
153
|
+
がっ part-of-speech id: 32
|
154
|
+
!! part-of-speech id: 36
|
155
|
+
夢 part-of-speech id: 38
|
156
|
+
な part-of-speech id: 25
|
157
|
+
の part-of-speech id: 17
|
158
|
+
、 part-of-speech id: 9
|
159
|
+
俺 part-of-speech id: 59
|
160
|
+
のっ part-of-speech id: 31
|
161
|
+
!! part-of-speech id: 36
|
162
|
+
|
163
|
+
# for more complex parsing, such as that for natural
|
164
|
+
# language processing tasks, it is far more efficient
|
165
|
+
# to iterate over MeCab nodes using an Enumerator
|
166
|
+
#
|
167
|
+
# this example uses the node-format option to customize
|
168
|
+
# the resulting morpheme feature to extract:
|
169
|
+
# - surface
|
170
|
+
# - part-of-speech
|
171
|
+
# - reading
|
172
|
+
#
|
173
|
+
# * again, ignore any end-of-sentence nodes
|
174
|
+
#
|
175
|
+
nm = Natto::MeCab.new('-F%m\t%f[0]\t%f[7]')
|
176
|
+
|
177
|
+
enum = nm.enum_parse('この星の一等賞になりたいの卓球で俺は、そんだけ!')
|
178
|
+
=> #<Enumerator: #<Enumerator::Generator:0x00000002ff3898>:each>
|
179
|
+
|
180
|
+
enum.next
|
181
|
+
=> #<Natto::MeCabNode:0x000000032eed68 \
|
182
|
+
@pointer=#<FFI::Pointer address=0x000000005ffb48>, \
|
183
|
+
stat=0, \
|
184
|
+
@surface="この", \
|
185
|
+
@feature="この 連体詞 コノ">
|
186
|
+
|
187
|
+
enum.peek
|
188
|
+
=> #<Natto::MeCabNode:0x00000002fe2110a \
|
189
|
+
@pointer=#<FFI::Pointer address=0x000000005ffdb8>, \
|
190
|
+
stat=0, \
|
191
|
+
@surface="星", \
|
192
|
+
@feature="星 名詞 ホシ">
|
193
|
+
|
194
|
+
enum.rewind
|
195
|
+
|
196
|
+
enum.each { |n| puts n.feature }
|
197
|
+
この 連体詞 コノ
|
198
|
+
星 名詞 ホシ
|
199
|
+
の 助詞 ノ
|
200
|
+
一等 名詞 イットウ
|
201
|
+
賞 名詞 ショウ
|
202
|
+
に 助詞 ニ
|
203
|
+
なり 動詞 ナリ
|
204
|
+
たい 助動詞 タイ
|
205
|
+
の 助詞 ノ
|
206
|
+
卓球 名詞 タッキュウ
|
207
|
+
で 助詞 デ
|
208
|
+
俺 名詞 オレ
|
209
|
+
は 助詞 ハ
|
210
|
+
、 記号 、
|
211
|
+
そん 名詞 ソン
|
212
|
+
だけ 助詞 ダケ
|
213
|
+
! 記号 !
|
214
|
+
|
215
|
+
|
216
|
+
|
217
|
+
## Learn more
|
218
|
+
- You can read more about natto on the [project Wiki](https://bitbucket.org/buruzaemon/natto/wiki/Home).
|
219
|
+
|
220
|
+
## Contributing to natto
|
221
|
+
- Use [mercurial](http://mercurial.selenic.com/) and [check out the latest code at bitbucket](https://bitbucket.org/buruzaemon/natto/src/) to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
|
222
|
+
- [Browse the issue tracker](https://bitbucket.org/buruzaemon/natto/issues/) to make sure someone hasn't already requested it and/or contributed it.
|
223
|
+
- Fork the project.
|
224
|
+
- Start a feature/bugfix branch.
|
225
|
+
- Commit and push until you are happy with your contribution.
|
226
|
+
- Make sure to add tests for it. This is important so I don't break it in a future version unintentionally. I use [MiniTest::Unit](http://rubydoc.info/gems/minitest/MiniTest/Unit) as it is very natural and easy-to-use.
|
227
|
+
- Please try not to mess with the Rakefile, CHANGELOG, or version. If you must have your own version, that is fine, but please isolate to its own commit so I can cherry-pick around it.
|
228
|
+
|
229
|
+
## Changelog
|
230
|
+
Please see the {file:CHANGELOG} for this gem's release history.
|
231
|
+
|
232
|
+
## Copyright
|
233
|
+
Copyright © 2014-2015, Brooke M. Fujita. All rights reserved. Please see the {file:LICENSE} file for further details.
|
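The README usage section above relies on the node-format string `-F%m\t%f[0]\t%f[7]` to pull surface, part-of-speech and reading out of each node. As a rough illustration of what those indices refer to, the sketch below splits one line of the default output shown in the README by hand. It assumes that the surface and the feature string are separated by a tab; it does not call natto or MeCab.

```ruby
# One line of default MeCab output as shown in the README:
# the surface, a tab (assumed), then a comma-separated feature string.
line = "俺\t名詞,代名詞,一般,*,*,*,俺,オレ,オレ"

surface, feature = line.split("\t", 2)
fields = feature.split(',')

pos     = fields[0]   # the field that %f[0] selects in the node-format string
reading = fields[7]   # the field that %f[7] selects

puts [surface, pos, reading].join("\t")
```

Letting MeCab apply the node format is usually preferable to splitting strings afterwards; the manual version is shown only to make the feature indices concrete.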
data/lib/natto.rb
CHANGED
@@ -1 +1,27 @@
|
|
1
1
|
require 'natto/natto'
|
2
|
+
|
3
|
+
# Copyright (c) 2014-2015, Brooke M. Fujita.
|
4
|
+
# All rights reserved.
|
5
|
+
#
|
6
|
+
# Redistribution and use in source and binary forms, with or without
|
7
|
+
# modification, are permitted provided that the following conditions are met:
|
8
|
+
#
|
9
|
+
# * Redistributions of source code must retain the above
|
10
|
+
# copyright notice, this list of conditions and the
|
11
|
+
# following disclaimer.
|
12
|
+
#
|
13
|
+
# * Redistributions in binary form must reproduce the above
|
14
|
+
# copyright notice, this list of conditions and the
|
15
|
+
# following disclaimer in the documentation and/or other
|
16
|
+
# materials provided with the distribution.
|
17
|
+
#
|
18
|
+
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
|
19
|
+
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
20
|
+
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
21
|
+
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
|
22
|
+
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
|
23
|
+
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
|
24
|
+
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
|
25
|
+
# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
26
|
+
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
27
|
+
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
data/lib/natto/binding.rb
CHANGED
@@ -10,7 +10,7 @@ module Natto
|
|
10
10
|
extend FFI::Library
|
11
11
|
|
12
12
|
# String name for the environment variable used by
|
13
|
-
# `Natto` to indicate the
|
13
|
+
# `Natto` to indicate the absolute pathname
|
14
14
|
# to the `mecab` library.
|
15
15
|
MECAB_PATH = 'MECAB_PATH'.freeze
|
16
16
|
|
@@ -19,38 +19,52 @@ module Natto
|
|
19
19
|
base.extend(ClassMethods)
|
20
20
|
end
|
21
21
|
|
22
|
-
# Returns the
|
23
|
-
# the runtime environment.
|
24
|
-
#
|
25
|
-
#
|
26
|
-
#
|
27
|
-
# is _not_ set to the full path of the `mecab`
|
28
|
-
# library.
|
29
|
-
# @return name of the `mecab` library
|
30
|
-
# @raise [LoadError] if MECAB_PATH environment variable is not set in Windows
|
31
|
-
# <br/>
|
32
|
-
# e.g., for bash on UNIX/Linux
|
33
|
-
#
|
34
|
-
# export MECAB_PATH=/usr/local/lib/libmecab.so
|
35
|
-
#
|
36
|
-
# e.g., on Windows
|
37
|
-
#
|
38
|
-
# set MECAB_PATH=C:\Program Files\MeCab\bin\libmecab.dll
|
39
|
-
#
|
40
|
-
# e.g., from within a Ruby program
|
41
|
-
#
|
42
|
-
# ENV['MECAB_PATH']='usr/local/lib/libmecab.so'
|
22
|
+
# Returns the absolute pathname to the `mecab` library based on
|
23
|
+
# the runtime environment.
|
24
|
+
#
|
25
|
+
# @return [String] absolute pathname to the `mecab` library
|
26
|
+
# @raise [LoadError] if the library cannot be located
|
43
27
|
def self.find_library
|
28
|
+
return File.absolute_path(ENV[MECAB_PATH]) if ENV[MECAB_PATH]
|
29
|
+
|
44
30
|
host_os = RbConfig::CONFIG['host_os']
|
45
31
|
|
46
32
|
if host_os =~ /mswin|mingw/i
|
47
|
-
|
33
|
+
require 'win32/registry'
|
34
|
+
begin
|
35
|
+
base = nil
|
36
|
+
Win32::Registry::HKEY_CURRENT_USER.open('Software\MeCab') do |r|
|
37
|
+
base = r['mecabrc'].split('etc').first
|
38
|
+
end
|
39
|
+
lib = File.join(base, 'bin/libmecab.dll')
|
40
|
+
File.absolute_path(lib)
|
41
|
+
rescue
|
42
|
+
raise LoadError, "Please set #{MECAB_PATH} to the full path to libmecab.dll"
|
43
|
+
end
|
48
44
|
else
|
49
|
-
'
|
45
|
+
require 'open3'
|
46
|
+
if host_os =~ /darwin/i
|
47
|
+
ext = 'dylib'
|
48
|
+
else
|
49
|
+
ext = 'so'
|
50
|
+
end
|
51
|
+
|
52
|
+
begin
|
53
|
+
base, lib = nil, nil
|
54
|
+
cmd = 'mecab-config --libs'
|
55
|
+
Open3.popen3(cmd) do |stdin,stdout,stderr|
|
56
|
+
toks = stdout.read.split
|
57
|
+
base = toks[0][2..-1]
|
58
|
+
lib = toks[1][2..-1]
|
59
|
+
end
|
60
|
+
File.absolute_path(File.join(base, "lib#{lib}.#{ext}"))
|
61
|
+
rescue
|
62
|
+
raise LoadError, "Please set #{MECAB_PATH} to the full path to libmecab.#{ext}"
|
63
|
+
end
|
50
64
|
end
|
51
65
|
end
|
52
66
|
|
53
|
-
ffi_lib
|
67
|
+
ffi_lib find_library
|
54
68
|
|
55
69
|
# new interface
|
56
70
|
attach_function :mecab_model_new2, [:string], :pointer
|
@@ -77,6 +91,10 @@ module Natto
|
|
77
91
|
# @private
|
78
92
|
module ClassMethods
|
79
93
|
|
94
|
+
def find_library
|
95
|
+
Natto::Binding.find_library
|
96
|
+
end
|
97
|
+
|
80
98
|
def mecab_model_new2(options_str)
|
81
99
|
Natto::Binding.mecab_model_new2(options_str)
|
82
100
|
end
|
@@ -156,3 +174,29 @@ module Natto
|
|
156
174
|
end
|
157
175
|
end
|
158
176
|
end
|
177
|
+
|
178
|
+
# Copyright (c) 2014-2015, Brooke M. Fujita.
|
179
|
+
# All rights reserved.
|
180
|
+
#
|
181
|
+
# Redistribution and use in source and binary forms, with or without
|
182
|
+
# modification, are permitted provided that the following conditions are met:
|
183
|
+
#
|
184
|
+
# * Redistributions of source code must retain the above
|
185
|
+
# copyright notice, this list of conditions and the
|
186
|
+
# following disclaimer.
|
187
|
+
#
|
188
|
+
# * Redistributions in binary form must reproduce the above
|
189
|
+
# copyright notice, this list of conditions and the
|
190
|
+
# following disclaimer in the documentation and/or other
|
191
|
+
# materials provided with the distribution.
|
192
|
+
#
|
193
|
+
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
|
194
|
+
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
195
|
+
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
196
|
+
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
|
197
|
+
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
|
198
|
+
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
|
199
|
+
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
|
200
|
+
# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
201
|
+
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
202
|
+
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
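The new `find_library` in `binding.rb` above derives the library path on Mac OS and \*nix from the output of `mecab-config --libs`. The sketch below isolates just that string handling so it can be run without MeCab installed. `libmecab_path` is a hypothetical helper, and the sample output `-L/usr/local/lib -lmecab -lstdc++` is an assumed example of what the command prints. Unlike the gem's code, which takes the first two tokens by position, this version looks the tokens up by their `-L` / `-l` prefixes.

```ruby
# Turn the text printed by `mecab-config --libs` into a library pathname.
# The output text and the file extension are passed in as arguments so that
# the sketch does not need to shell out.
def libmecab_path(libs_output, ext)
  toks = libs_output.split
  dir  = toks.find { |t| t.start_with?('-L') }   # e.g. -L/usr/local/lib
  lib  = toks.find { |t| t.start_with?('-l') }   # e.g. -lmecab
  raise LoadError, 'could not parse mecab-config output' unless dir && lib

  # Strip the two-character flag prefix from each token and rebuild the name.
  File.join(dir[2..-1], "lib#{lib[2..-1]}.#{ext}")
end

puts libmecab_path('-L/usr/local/lib -lmecab -lstdc++', 'so')
```

Raising `LoadError` on unparseable output mirrors the behaviour described in the README, where the user is then expected to set `MECAB_PATH` explicitly.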