nimono 0.1.5

checksums.yaml.gz ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: d3cdab19e6fcea0cbe7e9138ffe590c349060643
4
+ data.tar.gz: ebe278b4338fbfa396cbb985ba30528ca6822998
5
+ SHA512:
6
+ metadata.gz: ed4637885b7c69a7bec258019f2d414249b34a07332ed81226dbf0408b9c0272d84cca674e6995831addc3c7a286a6b6e2e4587fc8a9a6259189ee60d2a3fa13
7
+ data.tar.gz: 1d41c5132aa08372c9389056191917c31a60d9b01c3ae56c0049f7cbe8f4c21a038a673cb3efe4b2d95e90823798c77a9f33cb02db9519ea5bcf87c042e78d3d
data/.gitignore ADDED
@@ -0,0 +1,11 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
10
+ *~
11
+ *.swp
data/.rspec ADDED
@@ -0,0 +1,2 @@
1
+ --format documentation
2
+ --color
data/.travis.yml ADDED
@@ -0,0 +1,5 @@
1
+ sudo: false
2
+ language: ruby
3
+ rvm:
4
+ - 1.9.3
5
+ before_install: gem install bundler -v 1.13.7
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at opal.s2000@gmail.com. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [http://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: http://contributor-covenant.org
74
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in nimono.gemspec
4
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2017 Takayoshi Yamazaki
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,85 @@
1
+ # nimono
2
+
3
+ nimono is an interface to CaboCha for MRI Ruby and JRuby that parses Japanese
4
+ sentences using the CaboCha library.
5
+ Because it depends on the CaboCha library, that library must be installed first.
6
+
7
+ ## Requirements
8
+
9
+ nimono requires the following:
10
+
11
+ - [CaboCha _0.69_](https://taku910.github.io/cabocha/)
12
+ - [CaboCha](https://taku910.github.io/cabocha/) requires [CRF++](https://taku910.github.io/crfpp/), [MeCab](http://taku910.github.io/mecab/#download) and one of the following dictionaries:
13
+ - mecab-ipadic, mecab-jumandic or unidic. For further information, please refer to the [MeCab documentation](http://taku910.github.io/mecab/#).
14
+ - [ffi _1.9.0 or higher_](https://rubygems.org/gems/ffi)
15
+
16
+ ## Installation
17
+
18
+ Install it as:
19
+
20
+ $ gem install nimono
21
+
22
+ ## Usage
23
+
24
+ Create an instance of Nimono::Cabocha and parse the sentence.
25
+ By default, the result is printed as a simple dependency tree.
26
+
27
+ ```ruby
28
+ require 'nimono'
29
+
30
+ nc = Nimono::Cabocha.new
31
+ puts nc.parse('太郎は花子が読んでいる本を次郎に渡した')
32
+ 太郎は---------D
33
+ 花子が-D |
34
+ 読んでいる-D |
35
+ 本を---D
36
+ 次郎に-D
37
+ 渡した
38
+ EOS
39
+ ```
40
+ Example of analyzing dependencies (the `-n1` option also enables named-entity tags, shown in the last column):
41
+ ```ruby
42
+ require 'nimono'
43
+
44
+ nc = Nimono::Cabocha.new('-n1')
45
+ nc.parse('太郎は花子が読んでいる本を次郎に渡した')
46
+ cid = 0
47
+ nc.tokens.each do |t|
48
+ unless t.chunk.nil?
49
+ puts "* #{cid} #{t.chunk.link}D #{t.chunk.head_pos}/#{t.chunk.func_pos} #{'%6f' % t.chunk.score}"
50
+ cid += 1
51
+ end
52
+ puts "#{t.surface}\t#{t.feature}\t#{t.ne}"
53
+ end
54
+
55
+ * 0 5D 0/1 -0.742128
56
+ 太郎 名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー B-PERSON
57
+ は 助詞,係助詞,*,*,*,*,は,ハ,ワ O
58
+ * 1 2D 0/1 1.700175
59
+ 花子 名詞,固有名詞,人名,名,*,*,花子,ハナコ,ハナコ B-PERSON
60
+ が 助詞,格助詞,一般,*,*,*,が,ガ,ガ O
61
+ * 2 3D 0/2 1.825021
62
+ 読ん 動詞,自立,*,*,五段・マ行,連用タ接続,読む,ヨン,ヨン O
63
+ で 助詞,接続助詞,*,*,*,*,で,デ,デ O
64
+ いる 動詞,非自立,*,*,一段,基本形,いる,イル,イル O
65
+ * 3 5D 0/1 -0.742128
66
+ 本 名詞,一般,*,*,*,*,本,ホン,ホン O
67
+ を 助詞,格助詞,一般,*,*,*,を,ヲ,ヲ O
68
+ * 4 5D 1/2 -0.742128
69
+ 次 名詞,一般,*,*,*,*,次,ツギ,ツギ O
70
+ 郎 名詞,一般,*,*,*,*,郎,ロウ,ロー O
71
+ に 助詞,格助詞,一般,*,*,*,に,ニ,ニ O
72
+ * 5 -1D 0/1 0.000000
73
+ 渡し 動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ O
74
+ た 助動詞,*,*,*,特殊・タ,基本形,た,タ,タ O
75
+
76
+ ```
77
+ ## Contributing
78
+
79
+ Bug reports and pull requests are welcome on GitHub at https://github.com/TakayoshiYamazaki/nimono. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
80
+
81
+
82
+ ## License
83
+
84
+ The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
85
+
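The README loop above prints each chunk header as `* <id> <link>D <head>/<func> <score>`, where `link` is the index of the chunk it depends on and `-1` marks the root. As a minimal sketch that runs without CaboCha installed, the hypothetical helpers `parse_lattice` and `dependencies` below (not part of nimono) read text in that shape back into chunks and list each dependency. They assume each token line starts with the surface form, followed by a tab if more columns are present, as the loop prints them.

```ruby
# -*- coding: utf-8 -*-
# Hypothetical helper, not part of nimono: parses lattice-style text in which
# each chunk starts with "* <id> <link>D <head>/<func> <score>" and is followed
# by one line per token (surface first, then tab-separated columns).
def parse_lattice(text)
  chunks = []
  text.each_line do |line|
    line = line.chomp
    next if line.empty? || line == 'EOS'
    if line.start_with?('* ')
      _, id, link, head_func, score = line.split(' ')
      head, func = head_func.split('/').map(&:to_i)
      # "5D".to_i == 5 and "-1D".to_i == -1 (the root chunk)
      chunks << { id: id.to_i, link: link.to_i, head: head, func: func,
                  score: score.to_f, surfaces: [] }
    elsif !chunks.empty?
      chunks.last[:surfaces] << line.split("\t").first
    end
  end
  chunks
end

# Returns one "dependent -> head" string per non-root chunk.
def dependencies(chunks)
  chunks.reject { |c| c[:link] < 0 }.map do |c|
    "#{c[:surfaces].join} -> #{chunks[c[:link]][:surfaces].join}"
  end
end

# Chunk headers copied from the README output above; token lines are
# reduced to their surface form for brevity.
sample = <<-LATTICE
* 0 5D 0/1 -0.742128
太郎
は
* 1 2D 0/1 1.700175
花子
が
* 2 3D 0/2 1.825021
読ん
で
いる
* 3 5D 0/1 -0.742128
本
を
* 4 5D 1/2 -0.742128
次
郎
に
* 5 -1D 0/1 0.000000
渡し
た
EOS
LATTICE

puts dependencies(parse_lattice(sample))
```

With the sample above this should list `太郎は -> 渡した`, `花子が -> 読んでいる`, `読んでいる -> 本を`, `本を -> 渡した` and `次郎に -> 渡した`. In real code the chunk objects from `Nimono::Cabocha#chunks` already carry `link`, so re-parsing text is only useful when working from saved output.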
data/Rakefile ADDED
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "nimono"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
data/lib/nimono.rb ADDED
@@ -0,0 +1,3 @@
1
+ #-*- coding:utf-8 -*-
2
+ require 'nimono/nimono'
3
+
data/lib/nimono/cabocha_lib.rb ADDED
@@ -0,0 +1,293 @@
1
+ #-*- coding: utf-8 -*-
2
+ module Nimono
3
+
4
+ # Module `CabochaLib` is a Ruby binding to the CaboCha library built on Ruby-FFI.
5
+ module CabochaLib
6
+ require 'ffi'
7
+ extend FFI::Library
8
+
9
+ CABOCHA_PATH = 'CABOCHA_PATH'.freeze
10
+
11
+ # @private
12
+ def self.included(klass)
13
+ klass.extend ClassMethods
14
+ end
15
+
16
+ # Returns the absolute path to the CaboCha library.
17
+ # @return [String] absolute path to the CaboCha library
18
+ def self.cabocha_library
19
+ if ENV[CABOCHA_PATH]
20
+ File.absolute_path(ENV[CABOCHA_PATH])
21
+ else
22
+ host_os = RbConfig::CONFIG['host_os'].downcase
23
+
24
+ lib_name = case host_os
25
+ when /mswin|mingw/ # /mswin|msys|mingw|cygwin|bccwin|wince|emc/
26
+ require 'win32/registry'
27
+ begin
28
+ path = nil
29
+ Win32::Registry::HKEY_CURRENT_USER.open('Software\Cabocha') {|r| path = r['cabocharc'].split('etc').first}
30
+ File.join(path, "bin\\libcabocha.dll")
31
+ rescue
32
+ raise LoadError, "Please set #{CABOCHA_PATH} to the full path to libcabocha.dll"
33
+ end
34
+ when /darwin/, /linux/, /solaris|bsd/
35
+ require 'open3'
36
+ ext = host_os =~ /darwin/ ? 'dylib' : 'so'
37
+ begin
38
+ Open3.popen3('cabocha-config --libs') {|i, o, e, t|
39
+ i.close
40
+ tokens = o.read.split
41
+ File.absolute_path(File.join(tokens[0][2..-1], "lib#{tokens[1][2..-1]}.#{ext}"))
42
+ }
43
+ rescue
44
+ raise LoadError, "Please set #{CABOCHA_PATH} to the full path to libcabocha.#{ext}"
45
+ end
46
+ else
47
+ raise CabochaError.new "unknown os: #{host_os.inspect}"
48
+ end
49
+ end
50
+ end
51
+
52
+ begin
53
+ ffi_lib cabocha_library
54
+ rescue LoadError => e
55
+ raise LoadError, "Failed to load the CaboCha library, patches appreciated! (#{e.message})"
56
+ end
57
+
58
+ # parser interfaces
59
+ attach_function :cabocha_new2, [:string], :pointer
60
+ attach_function :cabocha_destroy, [:pointer], :void
61
+ # attach_function :cabocha_parse_tree, [:pointer, :pointer], :pointer
62
+ attach_function :cabocha_sparse_tostr, [:pointer, :pointer], :string
63
+ attach_function :cabocha_sparse_totree, [:pointer, :pointer], :pointer
64
+
65
+ # tree interfaces
66
+ # attach_function :cabocha_tree_new, [], :pointer
67
+ attach_function :cabocha_tree_destroy, [:pointer], :void
68
+ # attach_function :cabocha_tree_set_output_layer, [:pointer, :int], :void
69
+ attach_function :cabocha_tree_tostr, [:pointer, :int], :string
70
+ attach_function :cabocha_tree_size, [:pointer], :size_t
71
+ attach_function :cabocha_tree_chunk_size, [:pointer], :size_t
72
+ attach_function :cabocha_tree_token_size, [:pointer], :size_t
73
+ attach_function :cabocha_tree_token, [:pointer, :size_t], :pointer
74
+ attach_function :cabocha_tree_chunk, [:pointer, :size_t], :pointer
75
+ # attach_function :cabocha_tree_charset, [:pointer], :int
76
+ # attach_function :cabocha_tree_posset, [:pointer], :int
77
+
78
+
79
+ # @private
80
+ module ClassMethods
81
+ def cabocha_library
82
+ Nimono::CabochaLib.cabocha_library
83
+ end
84
+ # Create parser
85
+ # @param opt_str [String] the option for CaboCha
86
+ def cabocha_new2(opt_str)
87
+ Nimono::CabochaLib.cabocha_new2(opt_str)
88
+ end
89
+ # Destroy parser
90
+ # @param pptr [#parser] parser instance #parser
91
+ def cabocha_destroy(pptr)
92
+ Nimono::CabochaLib.cabocha_destroy(pptr)
93
+ end
94
+
95
+ # def cabocha_parse_tree(pptr, tptr)
96
+ # Nimono::CabochaLib.cabocha_parse_tree(pptr, tptr)
97
+ # end
98
+
99
+ # Parse text and return the result as a string
100
+ # @param pptr [#parser] instance of parser #parser
101
+ # @param sptr [String] text for parsing
102
+ def cabocha_sparse_tostr(pptr, sptr)
103
+ Nimono::CabochaLib.cabocha_sparse_tostr(pptr, sptr)
104
+ end
105
+
106
+ def cabocha_sparse_totree(pptr, sptr)
107
+ Nimono::CabochaLib.cabocha_sparse_totree(pptr, sptr)
108
+ end
109
+
110
+ # tree
111
+ # def cabocha_tree_new
112
+ # Nimono::CabochaLib.cabocha_tree_new
113
+ # end
114
+
115
+ # def cabocha_tree_destroy(tptr)
116
+ # Nimono::CabochaLib.cabocha_tree_destroy(tptr)
117
+ # end
118
+
119
+ # def cabocha_tree_set_output_layer(tptr, n)
120
+ # Nimono::CabochaLib.cabocha_tree_set_output_layer(tptr, n)
121
+ # end
122
+
123
+ def cabocha_tree_tostr(tptr, n)
124
+ Nimono::CabochaLib.cabocha_tree_tostr(tptr, n)
125
+ end
126
+
127
+ def cabocha_tree_size(tptr)
128
+ Nimono::CabochaLib.cabocha_tree_size(tptr)
129
+ end
130
+
131
+ def cabocha_tree_chunk_size(tptr)
132
+ Nimono::CabochaLib.cabocha_tree_chunk_size(tptr)
133
+ end
134
+
135
+ def cabocha_tree_token_size(tptr)
136
+ Nimono::CabochaLib.cabocha_tree_token_size(tptr)
137
+ end
138
+
139
+ # # convert Tree to Sentence
140
+ # def cabocha_tree_sentence(tptr)
141
+ # Nimono::CabochaLib.cabocha_tree_sentence(tptr)
142
+ # end
143
+
144
+ # Returns a pointer to the n-th token of the tree
145
+ # @param tptr [FFI::Pointer] tree instance
146
+ # @param n [Integer] token index
147
+ def cabocha_tree_token(tptr, n)
148
+ Nimono::CabochaLib.cabocha_tree_token(tptr, n)
149
+ end
150
+
151
+ def cabocha_tree_chunk(tptr, n)
152
+ Nimono::CabochaLib.cabocha_tree_chunk(tptr, n)
153
+ end
154
+
155
+ # def cabocha_tree_charset(tptr)
156
+ # Nimono::CabochaLib.cabocha_tree_charset(tptr)
157
+ # end
158
+
159
+ # def cabocha_tree_posset(tptr)
160
+ # Nimono::CabochaLib.cabocha_tree_posset(tptr)
161
+ # end
162
+
163
+ end
164
+ end
165
+
166
+
167
+ # 'Chunk' is a wrapper class for the 'cabocha_chunk_t' structure.
168
+ class Chunk < FFI::Struct
169
+ attr_reader :link
170
+ attr_reader :head_pos
171
+ attr_reader :func_pos
172
+ attr_reader :token_size
173
+ attr_reader :token_pos
174
+ attr_reader :score
175
+ attr_reader :feature_list
176
+ attr_reader :additional_info
177
+ attr_reader :feature_list_size
178
+
179
+ # :features is a hash built from the colon-separated key:value elements of :feature_list.
180
+ attr_reader :features
181
+
182
+ layout :link, :int,
183
+ :head_pos, :size_t,
184
+ :func_pos, :size_t,
185
+ :token_size, :size_t,
186
+ :token_pos, :size_t,
187
+ :score, :float,
188
+ :feature_list, :pointer,
189
+ :additional_info, :string,
190
+ :feature_list_size, :ushort
191
+
192
+ def initialize(lptr)
193
+ super(lptr)
194
+ @pointer = lptr
195
+ if self[:additional_info]
196
+ @additional_info = self[:additional_info].force_encoding(Encoding.default_external)
197
+ end
198
+ if self[:link]
199
+ @link = self[:link]
200
+ end
201
+ if self[:head_pos]
202
+ @head_pos = self[:head_pos]
203
+ end
204
+ if self[:func_pos]
205
+ @func_pos = self[:func_pos]
206
+ end
207
+ if self[:token_size]
208
+ @token_size = self[:token_size]
209
+ end
210
+ if self[:token_pos]
211
+ @token_pos = self[:token_pos]
212
+ end
213
+ if self[:score]
214
+ @score = self[:score]
215
+ end
216
+ if self[:feature_list_size]
217
+ @feature_list_size = self[:feature_list_size]
218
+ end
219
+
220
+ if self[:feature_list_size] < 0
221
+ raise CabochaError.new "Feature list size error"
222
+ else
223
+ if self[:feature_list].null?
224
+ @feature_list = [].freeze
225
+ else
226
+ @feature_list = self[:feature_list].get_array_of_string(0, self[:feature_list_size]).each{|s| s.force_encoding(Encoding.default_external)}.freeze
227
+
228
+ # create a new hash from @feature_list
229
+ @features = Hash[*@feature_list.map {|f| f.split(':') }.flatten].freeze
230
+ end
231
+ end
232
+ end
233
+ end
234
+
235
+ # 'Token' is a wrapper class for the 'cabocha_token_t' structure.
236
+ class Token < FFI::Struct
237
+ attr_reader :surface
238
+ attr_reader :normalized_surface
239
+ attr_reader :feature
240
+ attr_reader :feature_list
241
+ attr_reader :feature_list_size
242
+ attr_reader :ne
243
+ attr_reader :additional_info
244
+ attr_reader :chunk
245
+
246
+ layout :surface, :string,
247
+ :normalized_surface, :string,
248
+ :feature, :string,
249
+ :feature_list, :pointer,
250
+ :feature_list_size, :ushort,
251
+ :ne, :string,
252
+ :additional_info, :string,
253
+ :chunk, :pointer
254
+
255
+ def initialize(lptr)
256
+ super(lptr)
257
+ @pointer = lptr
258
+ if self[:surface]
259
+ @surface = self[:surface].force_encoding(Encoding.default_external)
260
+ end
261
+ if self[:normalized_surface]
262
+ @normalized_surface = self[:normalized_surface].force_encoding(Encoding.default_external)
263
+ end
264
+ if self[:feature]
265
+ @feature = self[:feature].force_encoding(Encoding.default_external)
266
+ end
267
+ if self[:feature_list_size]
268
+ @feature_list_size = self[:feature_list_size]
269
+ end
270
+ if self[:ne]
271
+ @ne = self[:ne].force_encoding(Encoding.default_external)
272
+ end
273
+ if self[:additional_info]
274
+ @additional_info = self[:additional_info].force_encoding(Encoding.default_external)
275
+ end
276
+ if self[:chunk]
277
+ @chunk = Nimono::Chunk.new(self[:chunk]) unless self[:chunk].null?
278
+ end
279
+
280
+ if self[:feature_list_size] < 0
281
+ raise CabochaError.new "Feature list size error"
282
+ else
283
+ if self[:feature_list].null?
284
+ @feature_list = [].freeze
285
+ else
286
+ @feature_list = self[:feature_list].get_array_of_string(0, self[:feature_list_size]).each{|s| s.force_encoding(Encoding.default_external)}.freeze
287
+ end
288
+ end
289
+
290
+ end
291
+ end
292
+
293
+ end
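`CabochaLib.cabocha_library` above derives the library path from `cabocha-config --libs` by position (`tokens[0]` as `-L<dir>`, `tokens[1]` as `-l<name>`). The sketch below is a hypothetical pure-function variant (`cabocha_lib_path` is not part of nimono) that looks the flags up by prefix instead, so a reordered or longer flag list would still resolve. The sample strings only assume the `-L<dir> -lcabocha` shape that the original code already relies on.

```ruby
# Hypothetical helper, not part of nimono: resolves the CaboCha shared library
# path like CabochaLib.cabocha_library does on Unix-like hosts, but finds the
# -L/-l flags by prefix instead of by position.
def cabocha_lib_path(config_libs, host_os, env_path = nil)
  # An explicit path (e.g. from ENV['CABOCHA_PATH']) takes precedence.
  return File.absolute_path(env_path) if env_path

  ext = host_os =~ /darwin/ ? 'dylib' : 'so'
  tokens = config_libs.split
  lib_dir = tokens.find { |t| t.start_with?('-L') }
  lib_flag = tokens.find { |t| t.start_with?('-lcabocha') }
  if lib_dir.nil? || lib_flag.nil?
    raise LoadError, "unexpected cabocha-config output: #{config_libs.inspect}"
  end
  File.absolute_path(File.join(lib_dir[2..-1], "lib#{lib_flag[2..-1]}.#{ext}"))
end

# Sample strings shaped like `cabocha-config --libs` output (assumed format).
puts cabocha_lib_path("-L/usr/local/lib -lcabocha\n", 'linux-gnu')
puts cabocha_lib_path("-L/opt/local/lib -lcabocha", 'darwin16')
```

Keeping the string parsing separate from the `Open3` call makes the path logic testable without CaboCha installed. The real lookup would still pass the output of `cabocha-config --libs` and `RbConfig::CONFIG['host_os']`.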
data/lib/nimono/nimono.rb ADDED
@@ -0,0 +1,154 @@
1
+ #-*- coding: utf-8 -*-
2
+ require 'nimono/option_parse'
3
+ require 'nimono/cabocha_lib'
4
+ require 'nimono/version'
5
+
6
+ module Nimono
7
+
8
+ # `Cabocha` is a class providing an interface to the CaboCha library.
9
+ # In this class the arguments supported by CaboCha can be used in almost
10
+ # the same way.
11
+ class Cabocha
12
+ include Nimono::OptionParse
13
+ include Nimono::CabochaLib
14
+
15
+ # @return [Hash] CaboCha options as Key-Value pairs
16
+ attr_reader :options
17
+ # @return [String] absolute file path to CaboCha library
18
+ attr_reader :libpath
19
+ # @return [Array<Chunk>] chunks of the last parsed text
20
+ attr_reader :chunks
21
+ #
22
+ # @return [Array<Token>] tokens of the last parsed text
23
+ attr_reader :tokens
24
+
25
+ # Initializes CaboCha with the given `options`, which may be passed
27
+ # either as a string (CaboCha command line arguments) or
28
+ # as a Ruby-style hash.
28
+ #
29
+ # Options supported are:
30
+ #
31
+ # - :output_format
32
+ # - :input_layer
33
+ # - :output_layer
34
+ # - :ne
35
+ # - :parser_model
36
+ # - :chunker_model
37
+ # - :ne_model
38
+ # - :posset
39
+ # - :charset
40
+ # - :charset_file
41
+ # - :rcfile
42
+ # - :mecabrc
43
+ # - :mecab_dicdir
44
+ # - :mecab_userdic
45
+ # - :output
46
+ #
47
+ # <p>CaboCha command line arguments, short (-f1) or long (--output-format=1),
48
+ # may be used in addition to Ruby-style hashes.</p>
49
+ #
50
+ # e.g.<br />
51
+ #
52
+ # require 'nimono'
53
+ #
54
+ # nc = Nimono::Cabocha.new(output_format: 1)
55
+ # or nc = Nimono::Cabocha.new('-f1')
56
+ #
57
+ # => #<Nimono::Cabocha:0x6364e48d
58
+ # @sparse_tostr=#<Proc:0x74d917f5@/home/foo/nimono/lib/nimono/nimono.rb:54 (lambda)>,
59
+ # @libpath="/usr/local/lib/libcabocha.so",
60
+ # @options={:output_format=>1},
61
+ # @tree=#<FFI::Pointer address=0x7f6ecc2e3790>,
62
+ # @parser=#<FFI::Pointer address=0x7f6ecc2e3830>>
63
+ #
64
+ # puts nc.parse('太郎は花子が読んでいる本を次郎に渡した')
65
+ # 太郎 名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー
66
+ # は 助詞,係助詞,*,*,*,*,は,ハ,ワ
67
+ # * 1 2D 0/1 1.700175
68
+ # 花子 名詞,固有名詞,人名,名,*,*,花子,ハナコ,ハナコ
69
+ # が 助詞,格助詞,一般,*,*,*,が,ガ,ガ
70
+ # * 2 3D 0/2 1.825021
71
+ # 読ん 動詞,自立,*,*,五段・マ行,連用タ接続,読む,ヨン,ヨン
72
+ # で 助詞,接続助詞,*,*,*,*,で,デ,デ
73
+ # いる 動詞,非自立,*,*,一段,基本形,いる,イル,イル
74
+ # * 3 5D 0/1 -0.742128
75
+ # 本 名詞,一般,*,*,*,*,本,ホン,ホン
76
+ # を 助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
77
+ # * 4 5D 1/2 -0.742128
78
+ # 次 名詞,一般,*,*,*,*,次,ツギ,ツギ
79
+ # 郎 名詞,一般,*,*,*,*,郎,ロウ,ロー
80
+ # に 助詞,格助詞,一般,*,*,*,に,ニ,ニ
81
+ # * 5 -1D 0/1 0.000000
82
+ # 渡し 動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ
83
+ # た 助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
84
+ # EOS
85
+ # => nil
86
+ #
87
+ # @param options [Hash, String] the CaboCha options
88
+ # @raise [CabochaError] if Cabocha cannot be initialized with the given 'options'
89
+ def initialize(options={})
90
+ @options = self.class.parse_options(options)
91
+ opt_str = self.class.build_options_str(@options)
92
+ @libpath = self.class.cabocha_library
93
+
94
+ @parser = self.class.cabocha_new2(opt_str)
95
+ if @parser.address == 0x0
96
+ raise CabochaError.new("Could not initialize CaboCha with options: '#{opt_str}'")
97
+ end
98
+ @tree = self.class.cabocha_sparse_totree(@parser, "")
99
+
100
+ # --output-layer is already passed to the parser through opt_str;
101
+ # cabocha_tree_set_output_layer is not attached in CabochaLib, so calling
102
+ # it here would raise NoMethodError.
103
+
104
+ @sparse_tostr = ->(text) {
105
+ begin
106
+ self.class.cabocha_sparse_tostr(@parser, text).force_encoding(Encoding.default_external)
107
+ rescue
108
+ raise CabochaError.new 'Parse Error'
109
+ end
110
+ }
111
+ end
112
+
113
+ # Parses the given `text`, returning the CaboCha output as a string.
114
+ # Also populates #chunks and #tokens for the parsed text.
115
+ # @param text [String] the Japanese text to parse
116
+ # @return [String] parsing result from CaboCha
117
+ # @raise [CabochaError] if CaboCha cannot parse the given `text`
118
+ # @raise [CabochaError] if the given `text` argument is `nil`
119
+ def parse(text)
120
+ if text.nil?
121
+ raise CabochaError.new 'Text to parse cannot be nil'
122
+ else
123
+ @result = @sparse_tostr.call(text)
124
+ @tree = self.class.cabocha_sparse_totree(@parser, text)
125
+
126
+ @tokens = []
127
+ self.class.cabocha_tree_token_size(@tree).times do |i|
128
+ @tokens << Nimono::Token.new(self.class.cabocha_tree_token(@tree, i))
129
+ end
130
+ @tokens.freeze
131
+
132
+ @chunks = []
133
+ self.class.cabocha_tree_chunk_size(@tree).times do |i|
134
+ @chunks << Nimono::Chunk.new(self.class.cabocha_tree_chunk(@tree, i))
135
+ # chunk = Nimono::Chunk.new(self.class.cabocha_tree_chunk(@tree, i))
136
+ # chunk.instance_variable_set(:@tokens, @tokens[chunk.token_pos..(chunk.token_pos + chunk.token_size - 1)])
137
+ # @chunks << chunk
138
+ end
139
+ @chunks.freeze
140
+
141
+ self.to_s
142
+ end
143
+ end
144
+
145
+ # The result of parsing Japanese text
146
+ # @return [String] parsing result
147
+ def to_s
148
+ @result
149
+ end
150
+
151
+ end
152
+
153
+ class CabochaError < RuntimeError; end
154
+ end
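The commented-out lines in `Cabocha#parse` hint at attaching each chunk's tokens via `token_pos` and `token_size`. A minimal sketch of that grouping follows. `ChunkStub` and `TokenStub` are hypothetical stand-ins for `Nimono::Chunk` and `Nimono::Token`, which need libcabocha at runtime, and the positions and links are transcribed by hand from the README example sentence.

```ruby
# -*- coding: utf-8 -*-
# Hypothetical stand-ins for Nimono::Chunk and Nimono::Token (which wrap FFI
# structs and need libcabocha); only the fields used here are modelled.
ChunkStub = Struct.new(:link, :token_pos, :token_size)
TokenStub = Struct.new(:surface)

# Groups tokens by chunk. tokens[pos, size] selects the same elements as
# tokens[pos..(pos + size - 1)] in the commented-out code in #parse.
def tokens_by_chunk(chunks, tokens)
  chunks.map { |c| tokens[c.token_pos, c.token_size] }
end

# Token order and chunk links for the README sentence, written out by hand.
surfaces = %w(太郎 は 花子 が 読ん で いる 本 を 次 郎 に 渡し た)
tokens = surfaces.map { |s| TokenStub.new(s) }
chunks = [
  ChunkStub.new(5, 0, 2), ChunkStub.new(2, 2, 2), ChunkStub.new(3, 4, 3),
  ChunkStub.new(5, 7, 2), ChunkStub.new(5, 9, 3), ChunkStub.new(-1, 12, 2)
]

tokens_by_chunk(chunks, tokens).each_with_index do |group, i|
  link = chunks[i].link
  puts "#{group.map(&:surface).join} -> #{link < 0 ? '(root)' : link}"
end
```

Using the `[start, length]` slice avoids the off-by-one bookkeeping of an inclusive range and returns an empty array for a chunk with `token_size` of 0.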
data/lib/nimono/option_parse.rb ADDED
@@ -0,0 +1,106 @@
1
+ #-*- coding: utf-8 -*-
2
+ module Nimono
3
+
4
+ # Module 'OptionParse' encapsulates methods and behavior
5
+ # for parsing the various CaboCha options supported by 'Nimono'.
6
+ module OptionParse
7
+ require 'optparse'
8
+
9
+ # Supported CaboCha command line options are as follows.
10
+ # For further information please refer to CaboCha help.
11
+ SUPPORT_OPTS = {
12
+ '-f' => :output_format,
13
+ '-I' => :input_layer,
14
+ '-O' => :output_layer,
15
+ '-n' => :ne,
16
+ '-m' => :parser_model,
17
+ '-M' => :chunker_model,
18
+ '-N' => :ne_model,
19
+ '-P' => :posset,
20
+ '-t' => :charset,
21
+ '-T' => :charset_file,
22
+ '-r' => :rcfile,
23
+ '-b' => :mecabrc,
24
+ '-d' => :mecab_dicdir,
25
+ '-u' => :mecab_userdic,
26
+ '-o' => :output
27
+ }.freeze
28
+
29
+ # @private
30
+ def self.included(klass)
31
+ klass.extend(ClassMethods)
32
+ end
33
+
34
+ # @private
35
+ module ClassMethods
36
+
37
+ # Prepares and returns a hash mapping the symbols of the specified,
38
+ # recognized CaboCha options to their values. Accepts either a string
39
+ # (short or long argument style) or a hash.
40
+ def parse_options(options={})
41
+ opts = {}
42
+ if options.is_a? String
43
+ opt = OptionParser.new
44
+ opt.on('-f', '--output-format=VAL') {|v| opts[:output_format] = v.strip }
45
+ opt.on('-I', '--input-layer=VAL') {|v| opts[:input_layer] = v.strip }
46
+ opt.on('-O', '--output-layer=VAL') {|v| opts[:output_layer] = v.strip }
47
+ opt.on('-n', '--ne=VAL') {|v| opts[:ne] = v.strip }
48
+ opt.on('-m', '--parser-model=VAL') {|v| opts[:parser_model] = v.strip }
49
+ opt.on('-M', '--chunker-model=VAL') {|v| opts[:chunker_model] = v.strip }
50
+ opt.on('-N', '--ne-model=VAL') {|v| opts[:ne_model] = v.strip }
51
+ opt.on('-P', '--posset=VAL') {|v| opts[:posset] = v.strip }
52
+ opt.on('-t', '--charset=VAL') {|v| opts[:charset] = v.strip }
53
+ opt.on('-T', '--charset-file=VAL') {|v| opts[:charset_file] = v.strip }
54
+ opt.on('-r', '--rcfile=VAL') {|v| opts[:rcfile] = v.strip }
55
+ opt.on('-b', '--mecabrc=VAL') {|v| opts[:mecabrc] = v.strip }
56
+ opt.on('-d', '--mecab-dicdir=VAL') {|v| opts[:mecab_dicdir] = v.strip }
57
+ opt.on('-u', '--mecab-userdic=VAL') {|v| opts[:mecab_userdic] = v.strip }
58
+ opt.on('-o', '--output=VAL') {|v| opts[:output] = v.strip }
59
+ opt.parse!(options.split)
60
+ else
61
+ SUPPORT_OPTS.values.each do |k|
62
+ if options.has_key?(k)
63
+ opts[k] = options[k]
64
+ end
65
+ end
66
+ end
67
+
68
+ # validation
69
+ validate_numeric = ->(name, pattern) {
70
+ if opts.has_key?(name)
71
+ if opts[name].is_a?(String) && opts[name] =~ pattern
72
+ opts[name] = opts[name].to_i
73
+ elsif opts[name].is_a?(Integer) && opts[name].to_s =~ pattern
74
+ else
75
+ v = opts[name]
76
+ name_str = name.id2name.gsub('_', '-')
77
+ raise CabochaError.new("Invalid option: --#{name_str}=#{v}")
78
+ end
79
+ end
80
+ }
81
+
82
+ validate_numeric.(:output_format, /^[0-4]$/)
83
+ validate_numeric.(:input_layer, /^[0-3]$/)
84
+ validate_numeric.(:output_layer, /^[1-4]$/)
85
+ validate_numeric.(:ne, /^[0-2]$/)
86
+
87
+ opts
88
+ end
89
+
90
+ # Returns a string representation of the options to
91
+ # be passed in the construction of the CaboCha Parser.
92
+ # @param options [Hash] options for CaboCha
93
+ # @return [String] representation of the options to the CaboCha Parser
94
+ def build_options_str(options={})
95
+ opt = []
96
+ SUPPORT_OPTS.values.each do |k|
97
+ if options.has_key? k
98
+ key = k.to_s.gsub('_', '-')
99
+ opt << "--#{key}=#{options[k]}"
100
+ end
101
+ end
102
+ opt.empty? ? '' : opt.join(" ")
103
+ end
104
+ end
105
+ end
106
+ end
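`parse_options` validates the numeric options and `build_options_str` turns the hash into CaboCha long options. The following trimmed-down sketch shows the same round trip. It uses hypothetical names, only a subset of `SUPPORT_OPTS`, and `ArgumentError` in place of `CabochaError` so it runs without nimono loaded. It checks `Integer` rather than `Fixnum`, which was deprecated in Ruby 2.4 and later removed, and anchors the patterns with `\A`/`\z` so the whole value must match.

```ruby
# Hypothetical, trimmed-down version of the option handling above.
SUPPORTED_KEYS = [:output_format, :input_layer, :output_layer, :ne,
                  :parser_model, :mecab_dicdir].freeze

# \A and \z anchor the whole string; ^ and $ would only anchor one line.
NUMERIC_PATTERNS = {
  output_format: /\A[0-4]\z/,
  input_layer:   /\A[0-3]\z/,
  output_layer:  /\A[1-4]\z/,
  ne:            /\A[0-2]\z/
}.freeze

def build_option_string(options)
  NUMERIC_PATTERNS.each do |key, pattern|
    next unless options.key?(key)
    value = options[key]
    # Accept an Integer (Fixnum's replacement) or a digit string.
    unless (value.is_a?(Integer) || value.is_a?(String)) && value.to_s =~ pattern
      raise ArgumentError, "Invalid option: --#{key.to_s.tr('_', '-')}=#{value}"
    end
  end
  # Emit options in SUPPORTED_KEYS order, like build_options_str does.
  SUPPORTED_KEYS.select { |k| options.key?(k) }
                .map { |k| "--#{k.to_s.tr('_', '-')}=#{options[k]}" }
                .join(' ')
end

puts build_option_string(output_format: 1, ne: '1')
# The dictionary path below is only an illustrative value.
puts build_option_string(mecab_dicdir: '/usr/local/lib/mecab/dic/ipadic')
```

The first call should print `--output-format=1 --ne=1`. Because the string is rebuilt from a fixed key list, its option order does not depend on the order of the caller's hash.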
data/lib/nimono/version.rb ADDED
@@ -0,0 +1,4 @@
1
+ #-*- coding: utf-8 -*-
2
+ module Nimono
3
+ VERSION = "0.1.5"
4
+ end
data/nimono.gemspec ADDED
@@ -0,0 +1,42 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'nimono/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "nimono"
8
+ spec.version = Nimono::VERSION
9
+ spec.authors = ["Takayoshi Yamazaki"]
10
+ spec.email = ["opal.s2000@gmail.com"]
11
+
12
+ spec.summary = %q{nimono is an interface to CaboCha.}
13
+ spec.description = %q{nimono is an interface to CaboCha for CRuby (MRI/YARV)
14
+ and JRuby (JVM). It depends on the CaboCha library, so that library must
15
+ be installed first.}
16
+ spec.homepage = "https://github.com/TakayoshiYamazaki/nimono"
17
+ spec.license = "MIT"
18
+
19
+ # Prevent pushing this gem to RubyGems.org. To allow pushes either set the 'allowed_push_host'
20
+ # to allow pushing to a single host or delete this section to allow pushing to any host.
21
+ # if spec.respond_to?(:metadata)
22
+ # spec.metadata['allowed_push_host'] = "TODO: Set to 'http://mygemserver.com'"
23
+ # else
24
+ # raise "RubyGems 2.0 or newer is required to protect against " \
25
+ # "public gem pushes."
26
+ # end
27
+
28
+ spec.files = `git ls-files -z`.split("\x0").reject do |f|
29
+ f.match(%r{^(test|spec|features)/})
30
+ end
31
+ spec.bindir = "exe"
32
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
33
+ spec.require_paths = ["lib"]
34
+
35
+ spec.required_ruby_version = '>= 1.9'
36
+ spec.requirements << 'CaboCha 0.69'
37
+ spec.add_runtime_dependency "ffi", ">= 1.9.0"
38
+
39
+ spec.add_development_dependency "bundler", "~> 1.13"
40
+ spec.add_development_dependency "rake", "~> 10.0"
41
+ spec.add_development_dependency "rspec", "~> 3.0"
42
+ end
metadata ADDED
@@ -0,0 +1,118 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: nimono
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.5
5
+ platform: ruby
6
+ authors:
7
+ - Takayoshi Yamazaki
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2017-02-14 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: ffi
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: 1.9.0
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: 1.9.0
27
+ - !ruby/object:Gem::Dependency
28
+ name: bundler
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '1.13'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '1.13'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rake
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '10.0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '10.0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: rspec
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - "~>"
60
+ - !ruby/object:Gem::Version
61
+ version: '3.0'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - "~>"
67
+ - !ruby/object:Gem::Version
68
+ version: '3.0'
69
+ description: "nimono is a interface to CaboCha for CRuby(mri/yarv)\n and JRuby(jvm).
70
+ It depends on the CaboCha library so the library will have \n to install first."
71
+ email:
72
+ - opal.s2000@gmail.com
73
+ executables: []
74
+ extensions: []
75
+ extra_rdoc_files: []
76
+ files:
77
+ - ".gitignore"
78
+ - ".rspec"
79
+ - ".travis.yml"
80
+ - CODE_OF_CONDUCT.md
81
+ - Gemfile
82
+ - LICENSE.txt
83
+ - README.md
84
+ - Rakefile
85
+ - bin/console
86
+ - bin/setup
87
+ - lib/nimono.rb
88
+ - lib/nimono/cabocha_lib.rb
89
+ - lib/nimono/nimono.rb
90
+ - lib/nimono/option_parse.rb
91
+ - lib/nimono/version.rb
92
+ - nimono.gemspec
93
+ homepage: https://github.com/TakayoshiYamazaki/nimono
94
+ licenses:
95
+ - MIT
96
+ metadata: {}
97
+ post_install_message:
98
+ rdoc_options: []
99
+ require_paths:
100
+ - lib
101
+ required_ruby_version: !ruby/object:Gem::Requirement
102
+ requirements:
103
+ - - ">="
104
+ - !ruby/object:Gem::Version
105
+ version: '1.9'
106
+ required_rubygems_version: !ruby/object:Gem::Requirement
107
+ requirements:
108
+ - - ">="
109
+ - !ruby/object:Gem::Version
110
+ version: '0'
111
+ requirements:
112
+ - CaboCha 0.69
113
+ rubyforge_project:
114
+ rubygems_version: 2.6.10
115
+ signing_key:
116
+ specification_version: 4
117
+ summary: nimono is a interface to CaboCha.
118
+ test_files: []