natto 0.9.6 → 0.9.7
- checksums.yaml +4 -4
- data/CHANGELOG +18 -0
- data/LICENSE +13 -11
- data/README.md +233 -108
- data/lib/natto.rb +26 -0
- data/lib/natto/binding.rb +69 -25
- data/lib/natto/natto.rb +166 -72
- data/lib/natto/option_parse.rb +26 -0
- data/lib/natto/struct.rb +103 -80
- data/lib/natto/version.rb +27 -1
- metadata +12 -10
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: fad99a300fd0a04d95e5ffacb7352b3855506e85
|
4
|
+
data.tar.gz: 1e9ba71a7690d14099f45d0350fba7d388b7e4e9
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 185db00a5a3fba01b27ad27ea0e89e03e698b8d5ccfbef400539c0e48648ab77abe5b90e5cad9c7777dc5d6a79297b4dea99e3ecdae4111bb25f8d78614b164c
|
7
|
+
data.tar.gz: fec5fd24301277deff762c68762b89fec9f33736a6cf918b7d4ec61a9019ff03433d6f96528d8220ca7f35a352950afd93f4b5500cf5fe9baf5b0ccbedeb5efe
|
data/CHANGELOG
CHANGED
@@ -1,5 +1,23 @@
|
|
1
1
|
## CHANGELOG
|
2
2
|
|
3
|
+
- __2014/12/20__: 0.9.7 release.
|
4
|
+
- Issue 14: [adding automatic discovery for mecab library; no need to
|
5
|
+
explicitly set
|
6
|
+
MECAB_PATH!](https://bitbucket.org/buruzaemon/natto/issue/14/automatic-discovery-of-libmecab-path-and)
|
7
|
+
- Issue 15: [refactored node-parsing to use Enumerator instead of
|
8
|
+
materializing every node and stuffing into
|
9
|
+
array](https://bitbucket.org/buruzaemon/natto/issue/15/use-enumerator-when-parsing-mecab-nodes)
|
10
|
+
- Issue 17: [adding filepath to MeCab and
|
11
|
+
DictionaryInfo](https://bitbucket.org/buruzaemon/natto/issue/17/use-filerealpath-value-for-all-file-paths)
|
12
|
+
- Issue 18: [bug-fix for node-formatting during default node
|
13
|
+
parse](https://bitbucket.org/buruzaemon/natto/issue/18/no-node-formatting-when-using-default-node)
|
14
|
+
- Deprecating parse_as_nodes and parse_as_strings; please use parse instead!
|
15
|
+
- CAUTION: parse_as_nodes, parse_as_strings, readnodes and readlines will be removed in the following release!
|
16
|
+
- Enhancements to to_s methods for both MeCab and DictionaryInfo
|
17
|
+
- Enhancements to TestDictionaryInfo to allow for building user dic during setup on Windows as well
|
18
|
+
- Slight enhancement to benchmark task.
|
19
|
+
- Updating LICENSE (adding copyright year 2015), adding to all files
|
20
|
+
|
3
21
|
- __2013/07/07__: 0.9.6 release.
|
4
22
|
- Upgrade to mecab 0.996
|
5
23
|
- Adding support for partial parsing mode (-p / --partial)
|
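Issue 15 in the changelog above describes moving node parsing from a fully materialized array to an Enumerator. The sketch below only illustrates that general idea on a toy linked list; `Node`, `build_list`, `nodes_as_array` and `nodes_as_enum` are hypothetical names for this example and are not taken from natto, whose real nodes are FFI structs.

```ruby
# Hypothetical stand-in for a MeCab node: a surface string plus a link
# to the next node. This is only a model, not natto's node class.
Node = Struct.new(:surface, :next_node)

# Build a small linked list from an array of surfaces (illustration only).
def build_list(surfaces)
  surfaces.reverse.reduce(nil) { |tail, s| Node.new(s, tail) }
end

# Eager style: walk the whole list and collect every node into an Array.
def nodes_as_array(head)
  nodes = []
  node = head
  while node
    nodes << node
    node = node.next_node
  end
  nodes
end

# Lazy style: return an Enumerator that yields one node at a time, so a
# caller that stops early never visits the rest of the list.
def nodes_as_enum(head)
  Enumerator.new do |yielder|
    node = head
    while node
      yielder << node
      node = node.next_node
    end
  end
end

head = build_list(%w[この 星 の])
enum = nodes_as_enum(head)
puts enum.next.surface              # consume the first node
puts enum.peek.surface              # look at the next node without consuming it
puts enum.map(&:surface).join(' ')  # internal iteration always starts from the head
```

With the lazy form, a call such as `nodes_as_enum(head).first(2)` stops after two nodes, whereas the eager form always walks the entire list before returning.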
data/LICENSE
CHANGED
@@ -1,8 +1,8 @@
|
|
1
|
-
Copyright
|
1
|
+
Copyright (c) 2014-2015, Brooke M. Fujita.
|
2
2
|
All rights reserved.
|
3
3
|
|
4
|
-
Redistribution and use in source and binary forms, with or without
|
5
|
-
permitted provided that the following conditions are met:
|
4
|
+
Redistribution and use in source and binary forms, with or without
|
5
|
+
modification, are permitted provided that the following conditions are met:
|
6
6
|
|
7
7
|
* Redistributions of source code must retain the above
|
8
8
|
copyright notice, this list of conditions and the
|
@@ -13,11 +13,13 @@ permitted provided that the following conditions are met:
|
|
13
13
|
following disclaimer in the documentation and/or other
|
14
14
|
materials provided with the distribution.
|
15
15
|
|
16
|
-
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
|
17
|
-
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
18
|
-
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
|
23
|
-
|
16
|
+
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
|
17
|
+
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
18
|
+
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
19
|
+
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
|
20
|
+
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
|
21
|
+
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
|
22
|
+
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
|
23
|
+
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
24
|
+
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
25
|
+
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
data/README.md
CHANGED
@@ -1,108 +1,233 @@
|
|
1
|
-
# natto
|
2
|
-
A Tasty Ruby Binding with MeCab
|
3
|
-
|
4
|
-
## What is natto?
|
5
|
-
|
6
|
-
|
7
|
-
|
8
|
-
|
9
|
-
|
10
|
-
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
##
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
|
23
|
-
|
24
|
-
|
25
|
-
|
26
|
-
|
27
|
-
|
28
|
-
|
29
|
-
|
30
|
-
|
31
|
-
|
32
|
-
|
33
|
-
|
34
|
-
|
35
|
-
|
36
|
-
|
37
|
-
|
38
|
-
|
39
|
-
|
40
|
-
|
41
|
-
|
42
|
-
|
43
|
-
|
44
|
-
|
45
|
-
|
46
|
-
|
47
|
-
|
48
|
-
|
49
|
-
|
50
|
-
|
51
|
-
|
52
|
-
|
53
|
-
|
54
|
-
|
55
|
-
|
56
|
-
|
57
|
-
|
58
|
-
|
59
|
-
|
60
|
-
|
61
|
-
|
62
|
-
|
63
|
-
|
64
|
-
|
65
|
-
|
66
|
-
|
67
|
-
|
68
|
-
|
69
|
-
|
70
|
-
|
71
|
-
|
72
|
-
|
73
|
-
|
74
|
-
|
75
|
-
|
76
|
-
|
77
|
-
|
78
|
-
|
79
|
-
|
80
|
-
|
81
|
-
|
82
|
-
|
83
|
-
|
84
|
-
|
85
|
-
|
86
|
-
|
87
|
-
|
88
|
-
|
89
|
-
|
90
|
-
|
91
|
-
|
92
|
-
|
93
|
-
|
94
|
-
|
95
|
-
|
96
|
-
|
97
|
-
|
98
|
-
|
99
|
-
|
100
|
-
|
101
|
-
|
102
|
-
|
103
|
-
|
104
|
-
|
105
|
-
|
106
|
-
|
107
|
-
|
108
|
-
|
1
|
+
# natto
|
2
|
+
A Tasty Ruby Binding with MeCab
|
3
|
+
|
4
|
+
## What is natto?
|
5
|
+
A gem leveraging FFI (foreign function interface), natto combines the
|
6
|
+
[Ruby programming language](http://www.ruby-lang.org/) with
|
7
|
+
[MeCab](http://mecab.googlecode.com/svn/trunk/mecab/doc/index.html), the part-of-speech
|
8
|
+
and morphological analyzer for the Japanese language.
|
9
|
+
|
10
|
+
- No compiler is necessary, as natto is _not_ a C extension.
|
11
|
+
- It will run on CRuby (mri/yarv) and JRuby (jvm) equally well.
|
12
|
+
- It will work with MeCab installations on Windows, Unix/Linux or Mac OS.
|
13
|
+
- natto provides a naturally Ruby-esque interface to MeCab.
|
14
|
+
|
15
|
+
You can learn more about [natto at bitbucket](https://bitbucket.org/buruzaemon/natto/).
|
16
|
+
|
17
|
+
|
18
|
+
## Requirements
|
19
|
+
natto requires the following:
|
20
|
+
|
21
|
+
- [MeCab _0.996_](http://code.google.com/p/mecab/downloads/list)
|
22
|
+
- A system dictionary, like [mecab-ipadic](https://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz) or [mecab-jumandic](https://mecab.googlecode.com/files/mecab-jumandic-5.1-20070304.tar.gz)
|
23
|
+
- `libmecab-devel` if you are on Linux, since natto uses `mecab-config`
|
24
|
+
- Ruby _1.9 or greater_
|
25
|
+
- [ffi _1.9.0 or greater_](http://rubygems.org/gems/ffi)
|
26
|
+
|
27
|
+
## Installation on *nix and Mac OS
|
28
|
+
Install natto with the following gem command:
|
29
|
+
|
30
|
+
gem install natto
|
31
|
+
|
32
|
+
This will automatically install the [ffi](http://rubygems.org/gems/ffi) rubygem, which natto uses to bind to the `mecab` library.
|
33
|
+
|
34
|
+
## Installation on Windows
|
35
|
+
However, if you are using a CRuby on Windows, then you will first need to install the [RubyInstaller Development Kit (DevKit)](https://github.com/oneclick/rubyinstaller/wiki/Development-Kit), a MSYS/MinGW based toolkit that enables your Windows Ruby installation to build many of the native C/C++ extensions available, including ffi.
|
36
|
+
|
37
|
+
1. Download the latest release for RubyInstaller for Windows platforms and the corresponding DevKit from the [RubyInstaller for Windows downloads page](http://rubyinstaller.org/downloads/).
|
38
|
+
2. After installing RubyInstaller for Windows, double-click on the DevKit-tdm installer `.exe`, and expand the contents to an appropriate location, for example `C:\devkit`.
|
39
|
+
3. Open a command window under `C:\devkit`, and execute: `ruby dk.rb init`. This will locate all known ruby installations, and add them to `C:\devkit\config.yml`.
|
40
|
+
4. Next, execute: `ruby dk.rb install`, which will add the DevKit to all of the installed rubies listed in your `C:\devkit\config.yml`. Now you should be able to install and build the ffi rubygem correctly on your Windows-installed ruby.
|
41
|
+
5. Install natto with:
|
42
|
+
|
43
|
+
gem install natto
|
44
|
+
|
45
|
+
6. If you are on a 64-bit Windows and you use a 64-bit Ruby or JRuby, then you might want to [build a 64-bit version of libmecab.dll](https://bitbucket.org/buruzaemon/natto/wiki/64-Bit-Windows).
|
46
|
+
|
47
|
+
|
48
|
+
## Configuration
|
49
|
+
- ***No explicit configuration should be necessary, as natto will try to locate the `mecab` library based upon its runtime environment.***
|
50
|
+
- On Windows, it will query the Windows Registry to determine where `libmecab.dll` is installed
|
51
|
+
- On Mac OS and \*nix, it will query `mecab-config --libs`
|
52
|
+
- ***But if natto cannot find the `mecab` library, `LoadError` will be raised.***
|
53
|
+
- Please set the `MECAB_PATH` environment variable to the exact name/path to your `mecab` library.
|
54
|
+
- e.g., for Mac OS
|
55
|
+
|
56
|
+
export MECAB_PATH=/usr/local/Cellar/mecab/0.996/lib/libmecab.dylib
|
57
|
+
|
58
|
+
- e.g., for bash on UNIX/Linux
|
59
|
+
|
60
|
+
export MECAB_PATH=/usr/local/lib/libmecab.so
|
61
|
+
|
62
|
+
- e.g., on Windows
|
63
|
+
|
64
|
+
set MECAB_PATH=C:\Program Files\MeCab\bin\libmecab.dll
|
65
|
+
|
66
|
+
- e.g., from within a Ruby program
|
67
|
+
|
68
|
+
ENV['MECAB_PATH']='/usr/local/lib/libmecab.so'
|
69
|
+
|
70
|
+
## Usage
|
71
|
+
|
72
|
+
|
73
|
+
# Quick Start
|
74
|
+
# -----------
|
75
|
+
#
|
76
|
+
# No explicit configuration should be necessary!
|
77
|
+
#
|
78
|
+
require 'natto'
|
79
|
+
|
80
|
+
# first, create an instance of Natto::MeCab
|
81
|
+
#
|
82
|
+
nm = Natto::MeCab.new
|
83
|
+
=> #<Natto::MeCab:0x28d30748
|
84
|
+
@tagger=#<FFI::Pointer address=0x28a97d50>, \
|
85
|
+
@libpath="/usr/local/lib/libmecab.so", \
|
86
|
+
@options={}, \
|
87
|
+
@dicts=[#<Natto::DictionaryInfo:0x28d3061c \
|
88
|
+
@filepath="/usr/local/lib/mecab/dic/ipadic/sys.dic", \
|
89
|
+
charset=utf8, \
|
90
|
+
type=0>] \
|
91
|
+
@version=0.996>
|
92
|
+
|
93
|
+
# display MeCab version
|
94
|
+
#
|
95
|
+
puts nm.version
|
96
|
+
=> 0.996
|
97
|
+
|
98
|
+
# display full pathname to MeCab library
|
99
|
+
#
|
100
|
+
puts nm.libpath
|
101
|
+
=> /usr/local/lib/libmecab.so
|
102
|
+
|
103
|
+
# reference to MeCab system dictionary
|
104
|
+
#
|
105
|
+
sysdic = nm.dicts.first
|
106
|
+
|
107
|
+
# display full pathname to system dictionary file
|
108
|
+
#
|
109
|
+
puts sysdic.filepath
|
110
|
+
=> /usr/local/lib/mecab/dic/ipadic/sys.dic
|
111
|
+
|
112
|
+
# what charset (encoding) is the system dictionary?
|
113
|
+
#
|
114
|
+
puts sysdic.charset
|
115
|
+
=> utf8
|
116
|
+
|
117
|
+
# parse text and send output to stdout
|
118
|
+
#
|
119
|
+
puts nm.parse('俺の名前は星野豊だ!!そこんとこヨロシク!')
|
120
|
+
俺 名詞,代名詞,一般,*,*,*,俺,オレ,オレ
|
121
|
+
の 助詞,連体化,*,*,*,*,の,ノ,ノ
|
122
|
+
名前 名詞,一般,*,*,*,*,名前,ナマエ,ナマエ
|
123
|
+
は 助詞,係助詞,*,*,*,*,は,ハ,ワ
|
124
|
+
星野 名詞,固有名詞,人名,姓,*,*,星野,ホシノ,ホシノ
|
125
|
+
豊 名詞,固有名詞,人名,名,*,*,豊,ユタカ,ユタカ
|
126
|
+
だ 助動詞,*,*,*,特殊・ダ,基本形,だ,ダ,ダ
|
127
|
+
! 記号,一般,*,*,*,*,!,!,!
|
128
|
+
! 記号,一般,*,*,*,*,!,!,!
|
129
|
+
そこ 名詞,代名詞,一般,*,*,*,そこ,ソコ,ソコ
|
130
|
+
ん 助詞,特殊,*,*,*,*,ん,ン,ン
|
131
|
+
とこ 名詞,一般,*,*,*,*,とこ,トコ,トコ
|
132
|
+
ヨロシク 感動詞,*,*,*,*,*,ヨロシク,ヨロシク,ヨロシク
|
133
|
+
! 記号,一般,*,*,*,*,!,!,!
|
134
|
+
EOS
|
135
|
+
|
136
|
+
# parse more text and use a block to:
|
137
|
+
# - iterate the resulting MeCab nodes
|
138
|
+
# - output morpheme surface and part-of-speech ID
|
139
|
+
#
|
140
|
+
# * ignore any end-of-sentence nodes
|
141
|
+
#
|
142
|
+
nm.parse('世界チャンプ目指してんだなこれがっ!!夢なの、俺のっ!!') do |n|
|
143
|
+
puts "#{n.surface}\tpart-of-speech id: #{n.posid}" if !n.is_eos?
|
144
|
+
end
|
145
|
+
世界 part-of-speech id: 38
|
146
|
+
チャンプ part-of-speech id: 38
|
147
|
+
目指し part-of-speech id: 31
|
148
|
+
て part-of-speech id: 18
|
149
|
+
ん part-of-speech id: 63
|
150
|
+
だ part-of-speech id: 25
|
151
|
+
な part-of-speech id: 17
|
152
|
+
これ part-of-speech id: 59
|
153
|
+
がっ part-of-speech id: 32
|
154
|
+
!! part-of-speech id: 36
|
155
|
+
夢 part-of-speech id: 38
|
156
|
+
な part-of-speech id: 25
|
157
|
+
の part-of-speech id: 17
|
158
|
+
、 part-of-speech id: 9
|
159
|
+
俺 part-of-speech id: 59
|
160
|
+
のっ part-of-speech id: 31
|
161
|
+
!! part-of-speech id: 36
|
162
|
+
|
163
|
+
# for more complex parsing, such as that for natural
|
164
|
+
# language processing tasks, it is far more efficient
|
165
|
+
# to iterate over MeCab nodes using an Enumerator
|
166
|
+
#
|
167
|
+
# this example uses the node-format option to customize
|
168
|
+
# the resulting morpheme feature to extract:
|
169
|
+
# - surface
|
170
|
+
# - part-of-speech
|
171
|
+
# - reading
|
172
|
+
#
|
173
|
+
# * again, ignore any end-of-sentence nodes
|
174
|
+
#
|
175
|
+
nm = Natto::MeCab.new('-F%m\t%f[0]\t%f[7]')
|
176
|
+
|
177
|
+
enum = nm.enum_parse('この星の一等賞になりたいの卓球で俺は、そんだけ!')
|
178
|
+
=> #<Enumerator: #<Enumerator::Generator:0x00000002ff3898>:each>
|
179
|
+
|
180
|
+
enum.next
|
181
|
+
=> #<Natto::MeCabNode:0x000000032eed68 \
|
182
|
+
@pointer=#<FFI::Pointer address=0x000000005ffb48>, \
|
183
|
+
stat=0, \
|
184
|
+
@surface="この", \
|
185
|
+
@feature="この 連体詞 コノ">
|
186
|
+
|
187
|
+
enum.peek
|
188
|
+
=> #<Natto::MeCabNode:0x00000002fe2110a \
|
189
|
+
@pointer=#<FFI::Pointer address=0x000000005ffdb8>, \
|
190
|
+
stat=0, \
|
191
|
+
@surface="星", \
|
192
|
+
@feature="星 名詞 ホシ">
|
193
|
+
|
194
|
+
enum.rewind
|
195
|
+
|
196
|
+
enum.each { |n| puts n.feature }
|
197
|
+
この 連体詞 コノ
|
198
|
+
星 名詞 ホシ
|
199
|
+
の 助詞 ノ
|
200
|
+
一等 名詞 イットウ
|
201
|
+
賞 名詞 ショウ
|
202
|
+
に 助詞 ニ
|
203
|
+
なり 動詞 ナリ
|
204
|
+
たい 助動詞 タイ
|
205
|
+
の 助詞 ノ
|
206
|
+
卓球 名詞 タッキュウ
|
207
|
+
で 助詞 デ
|
208
|
+
俺 名詞 オレ
|
209
|
+
は 助詞 ハ
|
210
|
+
、 記号 、
|
211
|
+
そん 名詞 ソン
|
212
|
+
だけ 助詞 ダケ
|
213
|
+
! 記号 !
|
214
|
+
|
215
|
+
|
216
|
+
|
217
|
+
## Learn more
|
218
|
+
- You can read more about natto on the [project Wiki](https://bitbucket.org/buruzaemon/natto/wiki/Home).
|
219
|
+
|
220
|
+
## Contributing to natto
|
221
|
+
- Use [mercurial](http://mercurial.selenic.com/) and [check out the latest code at bitbucket](https://bitbucket.org/buruzaemon/natto/src/) to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
|
222
|
+
- [Browse the issue tracker](https://bitbucket.org/buruzaemon/natto/issues/) to make sure someone already hasn't requested it and/or contributed it.
|
223
|
+
- Fork the project.
|
224
|
+
- Start a feature/bugfix branch.
|
225
|
+
- Commit and push until you are happy with your contribution.
|
226
|
+
- Make sure to add tests for it. This is important so I don't break it in a future version unintentionally. I use [MiniTest::Unit](http://rubydoc.info/gems/minitest/MiniTest/Unit) as it is very natural and easy-to-use.
|
227
|
+
- Please try not to mess with the Rakefile, CHANGELOG, or version. If you must have your own version, that is fine, but please isolate to its own commit so I can cherry-pick around it.
|
228
|
+
|
229
|
+
## Changelog
|
230
|
+
Please see the {file:CHANGELOG} for this gem's release history.
|
231
|
+
|
232
|
+
## Copyright
|
233
|
+
Copyright © 2014-2015, Brooke M. Fujita. All rights reserved. Please see the {file:LICENSE} file for further details.
|
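The README's last usage example sets the node format to `%m\t%f[0]\t%f[7]`, so each node's feature carries surface, part-of-speech and reading. As a rough sketch of how such a feature string could be consumed downstream, assuming the three fields arrive tab-separated, the snippet below splits them into a small struct. `Morpheme` and `parse_feature` are hypothetical helpers for this illustration and are not part of natto; the sample strings stand in for real parser output.

```ruby
# Hypothetical container for the three fields requested by the node format.
Morpheme = Struct.new(:surface, :pos, :reading)

# Split one feature string into its three fields.
# Assumes the fields are separated by tab characters.
def parse_feature(feature)
  surface, pos, reading = feature.split("\t", 3)
  Morpheme.new(surface, pos, reading)
end

# Sample feature strings written by hand for the example.
features = ["この\t連体詞\tコノ", "星\t名詞\tホシ", "の\t助詞\tノ"]

features.map { |f| parse_feature(f) }.each do |m|
  puts "#{m.surface} (#{m.pos}) read as #{m.reading}"
end
```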
data/lib/natto.rb
CHANGED
@@ -1 +1,27 @@
|
|
1
1
|
require 'natto/natto'
|
2
|
+
|
3
|
+
# Copyright (c) 2014-2015, Brooke M. Fujita.
|
4
|
+
# All rights reserved.
|
5
|
+
#
|
6
|
+
# Redistribution and use in source and binary forms, with or without
|
7
|
+
# modification, are permitted provided that the following conditions are met:
|
8
|
+
#
|
9
|
+
# * Redistributions of source code must retain the above
|
10
|
+
# copyright notice, this list of conditions and the
|
11
|
+
# following disclaimer.
|
12
|
+
#
|
13
|
+
# * Redistributions in binary form must reproduce the above
|
14
|
+
# copyright notice, this list of conditions and the
|
15
|
+
# following disclaimer in the documentation and/or other
|
16
|
+
# materials provided with the distribution.
|
17
|
+
#
|
18
|
+
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
|
19
|
+
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
20
|
+
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
21
|
+
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
|
22
|
+
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
|
23
|
+
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
|
24
|
+
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
|
25
|
+
# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
26
|
+
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
27
|
+
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
data/lib/natto/binding.rb
CHANGED
@@ -10,7 +10,7 @@ module Natto
|
|
10
10
|
extend FFI::Library
|
11
11
|
|
12
12
|
# String name for the environment variable used by
|
13
|
-
# `Natto` to indicate the
|
13
|
+
# `Natto` to indicate the absolute pathname
|
14
14
|
# to the `mecab` library.
|
15
15
|
MECAB_PATH = 'MECAB_PATH'.freeze
|
16
16
|
|
@@ -19,38 +19,52 @@ module Natto
|
|
19
19
|
base.extend(ClassMethods)
|
20
20
|
end
|
21
21
|
|
22
|
-
# Returns the
|
23
|
-
# the runtime environment.
|
24
|
-
#
|
25
|
-
#
|
26
|
-
#
|
27
|
-
# is _not_ set to the full path of the `mecab`
|
28
|
-
# library.
|
29
|
-
# @return name of the `mecab` library
|
30
|
-
# @raise [LoadError] if MECAB_PATH environment variable is not set in Windows
|
31
|
-
# <br/>
|
32
|
-
# e.g., for bash on UNIX/Linux
|
33
|
-
#
|
34
|
-
# export MECAB_PATH=/usr/local/lib/libmecab.so
|
35
|
-
#
|
36
|
-
# e.g., on Windows
|
37
|
-
#
|
38
|
-
# set MECAB_PATH=C:\Program Files\MeCab\bin\libmecab.dll
|
39
|
-
#
|
40
|
-
# e.g., from within a Ruby program
|
41
|
-
#
|
42
|
-
# ENV['MECAB_PATH']='usr/local/lib/libmecab.so'
|
22
|
+
# Returns the absolute pathname to the `mecab` library based on
|
23
|
+
# the runtime environment.
|
24
|
+
#
|
25
|
+
# @return [String] absolute pathname to the `mecab` library
|
26
|
+
# @raise [LoadError] if the library cannot be located
|
43
27
|
def self.find_library
|
28
|
+
return File.absolute_path(ENV[MECAB_PATH]) if ENV[MECAB_PATH]
|
29
|
+
|
44
30
|
host_os = RbConfig::CONFIG['host_os']
|
45
31
|
|
46
32
|
if host_os =~ /mswin|mingw/i
|
47
|
-
|
33
|
+
require 'win32/registry'
|
34
|
+
begin
|
35
|
+
base = nil
|
36
|
+
Win32::Registry::HKEY_CURRENT_USER.open('Software\MeCab') do |r|
|
37
|
+
base = r['mecabrc'].split('etc').first
|
38
|
+
end
|
39
|
+
lib = File.join(base, 'bin/libmecab.dll')
|
40
|
+
File.absolute_path(lib)
|
41
|
+
rescue
|
42
|
+
raise LoadError, "Please set #{MECAB_PATH} to the full path to libmecab.dll"
|
43
|
+
end
|
48
44
|
else
|
49
|
-
'
|
45
|
+
require 'open3'
|
46
|
+
if host_os =~ /darwin/i
|
47
|
+
ext = 'dylib'
|
48
|
+
else
|
49
|
+
ext = 'so'
|
50
|
+
end
|
51
|
+
|
52
|
+
begin
|
53
|
+
base, lib = nil, nil
|
54
|
+
cmd = 'mecab-config --libs'
|
55
|
+
Open3.popen3(cmd) do |stdin,stdout,stderr|
|
56
|
+
toks = stdout.read.split
|
57
|
+
base = toks[0][2..-1]
|
58
|
+
lib = toks[1][2..-1]
|
59
|
+
end
|
60
|
+
File.absolute_path(File.join(base, "lib#{lib}.#{ext}"))
|
61
|
+
rescue
|
62
|
+
raise LoadError, "Please set #{MECAB_PATH} to the full path to libmecab.#{ext}"
|
63
|
+
end
|
50
64
|
end
|
51
65
|
end
|
52
66
|
|
53
|
-
ffi_lib
|
67
|
+
ffi_lib find_library
|
54
68
|
|
55
69
|
# new interface
|
56
70
|
attach_function :mecab_model_new2, [:string], :pointer
|
@@ -77,6 +91,10 @@ module Natto
|
|
77
91
|
# @private
|
78
92
|
module ClassMethods
|
79
93
|
|
94
|
+
def find_library
|
95
|
+
Natto::Binding.find_library
|
96
|
+
end
|
97
|
+
|
80
98
|
def mecab_model_new2(options_str)
|
81
99
|
Natto::Binding.mecab_model_new2(options_str)
|
82
100
|
end
|
@@ -156,3 +174,29 @@ module Natto
|
|
156
174
|
end
|
157
175
|
end
|
158
176
|
end
|
177
|
+
|
178
|
+
# Copyright (c) 2014-2015, Brooke M. Fujita.
|
179
|
+
# All rights reserved.
|
180
|
+
#
|
181
|
+
# Redistribution and use in source and binary forms, with or without
|
182
|
+
# modification, are permitted provided that the following conditions are met:
|
183
|
+
#
|
184
|
+
# * Redistributions of source code must retain the above
|
185
|
+
# copyright notice, this list of conditions and the
|
186
|
+
# following disclaimer.
|
187
|
+
#
|
188
|
+
# * Redistributions in binary form must reproduce the above
|
189
|
+
# copyright notice, this list of conditions and the
|
190
|
+
# following disclaimer in the documentation and/or other
|
191
|
+
# materials provided with the distribution.
|
192
|
+
#
|
193
|
+
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
|
194
|
+
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
195
|
+
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
196
|
+
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
|
197
|
+
# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
|
198
|
+
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
|
199
|
+
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
|
200
|
+
# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
201
|
+
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
202
|
+
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
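The `find_library` change in `binding.rb` above resolves the library in a fixed order: an explicit `MECAB_PATH` wins, otherwise the path is derived from the Windows registry value or from `mecab-config --libs`. The sketch below reproduces only the string handling from that diff as pure functions so it can be run without MeCab installed. The function names, the keyword arguments and the sample inputs are assumptions made for this example; unlike the diff, it does not call `File.absolute_path`, the registry or a subprocess.

```ruby
# Turn the text printed by `mecab-config --libs` into a library pathname.
# As in the diff, the first token is treated as "-L<dir>", the second as
# "-l<name>", and [2..-1] drops the two-character flag prefix.
def libmecab_from_config(libs_output, host_os)
  ext = host_os =~ /darwin/i ? 'dylib' : 'so'
  toks = libs_output.split
  dir = toks[0][2..-1]
  name = toks[1][2..-1]
  File.join(dir, "lib#{name}.#{ext}")
end

# Derive the DLL pathname from a mecabrc path, as the Windows branch does:
# keep everything before "etc" and append bin/libmecab.dll.
def libmecab_from_mecabrc(mecabrc)
  base = mecabrc.split('etc').first
  File.join(base, 'bin/libmecab.dll')
end

# Resolution order modelled on the diff: MECAB_PATH first, then the
# platform-specific lookup. Any failure is reported as LoadError.
def resolve_libmecab(env, host_os, libs_output: nil, mecabrc: nil)
  return env['MECAB_PATH'] if env['MECAB_PATH']

  if host_os =~ /mswin|mingw/i
    libmecab_from_mecabrc(mecabrc)
  else
    libmecab_from_config(libs_output, host_os)
  end
rescue StandardError
  raise LoadError, 'Please set MECAB_PATH to the full path to the mecab library'
end

# Assumed sample output of `mecab-config --libs`.
sample = '-L/usr/local/lib -lmecab -lstdc++'
puts resolve_libmecab({}, 'linux-gnu', libs_output: sample)
puts resolve_libmecab({}, 'darwin14', libs_output: sample)
puts resolve_libmecab({ 'MECAB_PATH' => '/opt/mecab/lib/libmecab.so' }, 'linux-gnu')
```

Keeping the parsing separate from the subprocess and registry calls is a choice made for this sketch so the logic can be exercised with plain strings; the released code performs both steps inline.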