bayon 0.1.0

data/COPYING ADDED
@@ -0,0 +1,339 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The licenses for most software are designed to take away your
+ freedom to share and change it. By contrast, the GNU General Public
+ License is intended to guarantee your freedom to share and change free
+ software--to make sure the software is free for all its users. This
+ General Public License applies to most of the Free Software
+ Foundation's software and to any other program whose authors commit to
+ using it. (Some other Free Software Foundation software is covered by
+ the GNU Lesser General Public License instead.) You can apply it to
+ your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+ price. Our General Public Licenses are designed to make sure that you
+ have the freedom to distribute copies of free software (and charge for
+ this service if you wish), that you receive source code or can get it
+ if you want it, that you can change the software or use pieces of it
+ in new free programs; and that you know you can do these things.
+
+ To protect your rights, we need to make restrictions that forbid
+ anyone to deny you these rights or to ask you to surrender the rights.
+ These restrictions translate to certain responsibilities for you if you
+ distribute copies of the software, or if you modify it.
+
+ For example, if you distribute copies of such a program, whether
+ gratis or for a fee, you must give the recipients all the rights that
+ you have. You must make sure that they, too, receive or can get the
+ source code. And you must show them these terms so they know their
+ rights.
+
+ We protect your rights with two steps: (1) copyright the software, and
+ (2) offer you this license which gives you legal permission to copy,
+ distribute and/or modify the software.
+
+ Also, for each author's protection and ours, we want to make certain
+ that everyone understands that there is no warranty for this free
+ software. If the software is modified by someone else and passed on, we
+ want its recipients to know that what they have is not the original, so
+ that any problems introduced by others will not reflect on the original
+ authors' reputations.
+
+ Finally, any free program is threatened constantly by software
+ patents. We wish to avoid the danger that redistributors of a free
+ program will individually obtain patent licenses, in effect making the
+ program proprietary. To prevent this, we have made it clear that any
+ patent must be licensed for everyone's free use or not licensed at all.
+
+ The precise terms and conditions for copying, distribution and
+ modification follow.
+
+ GNU GENERAL PUBLIC LICENSE
+ TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+ 0. This License applies to any program or other work which contains
+ a notice placed by the copyright holder saying it may be distributed
+ under the terms of this General Public License. The "Program", below,
+ refers to any such program or work, and a "work based on the Program"
+ means either the Program or any derivative work under copyright law:
+ that is to say, a work containing the Program or a portion of it,
+ either verbatim or with modifications and/or translated into another
+ language. (Hereinafter, translation is included without limitation in
+ the term "modification".) Each licensee is addressed as "you".
+
+ Activities other than copying, distribution and modification are not
+ covered by this License; they are outside its scope. The act of
+ running the Program is not restricted, and the output from the Program
+ is covered only if its contents constitute a work based on the
+ Program (independent of having been made by running the Program).
+ Whether that is true depends on what the Program does.
+
+ 1. You may copy and distribute verbatim copies of the Program's
+ source code as you receive it, in any medium, provided that you
+ conspicuously and appropriately publish on each copy an appropriate
+ copyright notice and disclaimer of warranty; keep intact all the
+ notices that refer to this License and to the absence of any warranty;
+ and give any other recipients of the Program a copy of this License
+ along with the Program.
+
+ You may charge a fee for the physical act of transferring a copy, and
+ you may at your option offer warranty protection in exchange for a fee.
+
+ 2. You may modify your copy or copies of the Program or any portion
+ of it, thus forming a work based on the Program, and copy and
+ distribute such modifications or work under the terms of Section 1
+ above, provided that you also meet all of these conditions:
+
+ a) You must cause the modified files to carry prominent notices
+ stating that you changed the files and the date of any change.
+
+ b) You must cause any work that you distribute or publish, that in
+ whole or in part contains or is derived from the Program or any
+ part thereof, to be licensed as a whole at no charge to all third
+ parties under the terms of this License.
+
+ c) If the modified program normally reads commands interactively
+ when run, you must cause it, when started running for such
+ interactive use in the most ordinary way, to print or display an
+ announcement including an appropriate copyright notice and a
+ notice that there is no warranty (or else, saying that you provide
+ a warranty) and that users may redistribute the program under
+ these conditions, and telling the user how to view a copy of this
+ License. (Exception: if the Program itself is interactive but
+ does not normally print such an announcement, your work based on
+ the Program is not required to print an announcement.)
+
+ These requirements apply to the modified work as a whole. If
+ identifiable sections of that work are not derived from the Program,
+ and can be reasonably considered independent and separate works in
+ themselves, then this License, and its terms, do not apply to those
+ sections when you distribute them as separate works. But when you
+ distribute the same sections as part of a whole which is a work based
+ on the Program, the distribution of the whole must be on the terms of
+ this License, whose permissions for other licensees extend to the
+ entire whole, and thus to each and every part regardless of who wrote it.
+
+ Thus, it is not the intent of this section to claim rights or contest
+ your rights to work written entirely by you; rather, the intent is to
+ exercise the right to control the distribution of derivative or
+ collective works based on the Program.
+
+ In addition, mere aggregation of another work not based on the Program
+ with the Program (or with a work based on the Program) on a volume of
+ a storage or distribution medium does not bring the other work under
+ the scope of this License.
+
+ 3. You may copy and distribute the Program (or a work based on it,
+ under Section 2) in object code or executable form under the terms of
+ Sections 1 and 2 above provided that you also do one of the following:
+
+ a) Accompany it with the complete corresponding machine-readable
+ source code, which must be distributed under the terms of Sections
+ 1 and 2 above on a medium customarily used for software interchange; or,
+
+ b) Accompany it with a written offer, valid for at least three
+ years, to give any third party, for a charge no more than your
+ cost of physically performing source distribution, a complete
+ machine-readable copy of the corresponding source code, to be
+ distributed under the terms of Sections 1 and 2 above on a medium
+ customarily used for software interchange; or,
+
+ c) Accompany it with the information you received as to the offer
+ to distribute corresponding source code. (This alternative is
+ allowed only for noncommercial distribution and only if you
+ received the program in object code or executable form with such
+ an offer, in accord with Subsection b above.)
+
+ The source code for a work means the preferred form of the work for
+ making modifications to it. For an executable work, complete source
+ code means all the source code for all modules it contains, plus any
+ associated interface definition files, plus the scripts used to
+ control compilation and installation of the executable. However, as a
+ special exception, the source code distributed need not include
+ anything that is normally distributed (in either source or binary
+ form) with the major components (compiler, kernel, and so on) of the
+ operating system on which the executable runs, unless that component
+ itself accompanies the executable.
+
+ If distribution of executable or object code is made by offering
+ access to copy from a designated place, then offering equivalent
+ access to copy the source code from the same place counts as
+ distribution of the source code, even though third parties are not
+ compelled to copy the source along with the object code.
+
+ 4. You may not copy, modify, sublicense, or distribute the Program
+ except as expressly provided under this License. Any attempt
+ otherwise to copy, modify, sublicense or distribute the Program is
+ void, and will automatically terminate your rights under this License.
+ However, parties who have received copies, or rights, from you under
+ this License will not have their licenses terminated so long as such
+ parties remain in full compliance.
+
+ 5. You are not required to accept this License, since you have not
+ signed it. However, nothing else grants you permission to modify or
+ distribute the Program or its derivative works. These actions are
+ prohibited by law if you do not accept this License. Therefore, by
+ modifying or distributing the Program (or any work based on the
+ Program), you indicate your acceptance of this License to do so, and
+ all its terms and conditions for copying, distributing or modifying
+ the Program or works based on it.
+
+ 6. Each time you redistribute the Program (or any work based on the
+ Program), the recipient automatically receives a license from the
+ original licensor to copy, distribute or modify the Program subject to
+ these terms and conditions. You may not impose any further
+ restrictions on the recipients' exercise of the rights granted herein.
+ You are not responsible for enforcing compliance by third parties to
+ this License.
+
+ 7. If, as a consequence of a court judgment or allegation of patent
+ infringement or for any other reason (not limited to patent issues),
+ conditions are imposed on you (whether by court order, agreement or
+ otherwise) that contradict the conditions of this License, they do not
+ excuse you from the conditions of this License. If you cannot
+ distribute so as to satisfy simultaneously your obligations under this
+ License and any other pertinent obligations, then as a consequence you
+ may not distribute the Program at all. For example, if a patent
+ license would not permit royalty-free redistribution of the Program by
+ all those who receive copies directly or indirectly through you, then
+ the only way you could satisfy both it and this License would be to
+ refrain entirely from distribution of the Program.
+
+ If any portion of this section is held invalid or unenforceable under
+ any particular circumstance, the balance of the section is intended to
+ apply and the section as a whole is intended to apply in other
+ circumstances.
+
+ It is not the purpose of this section to induce you to infringe any
+ patents or other property right claims or to contest validity of any
+ such claims; this section has the sole purpose of protecting the
+ integrity of the free software distribution system, which is
+ implemented by public license practices. Many people have made
+ generous contributions to the wide range of software distributed
+ through that system in reliance on consistent application of that
+ system; it is up to the author/donor to decide if he or she is willing
+ to distribute software through any other system and a licensee cannot
+ impose that choice.
+
+ This section is intended to make thoroughly clear what is believed to
+ be a consequence of the rest of this License.
+
+ 8. If the distribution and/or use of the Program is restricted in
+ certain countries either by patents or by copyrighted interfaces, the
+ original copyright holder who places the Program under this License
+ may add an explicit geographical distribution limitation excluding
+ those countries, so that distribution is permitted only in or among
+ countries not thus excluded. In such case, this License incorporates
+ the limitation as if written in the body of this License.
+
+ 9. The Free Software Foundation may publish revised and/or new versions
+ of the General Public License from time to time. Such new versions will
+ be similar in spirit to the present version, but may differ in detail to
+ address new problems or concerns.
+
+ Each version is given a distinguishing version number. If the Program
+ specifies a version number of this License which applies to it and "any
+ later version", you have the option of following the terms and conditions
+ either of that version or of any later version published by the Free
+ Software Foundation. If the Program does not specify a version number of
+ this License, you may choose any version ever published by the Free Software
+ Foundation.
+
+ 10. If you wish to incorporate parts of the Program into other free
+ programs whose distribution conditions are different, write to the author
+ to ask for permission. For software which is copyrighted by the Free
+ Software Foundation, write to the Free Software Foundation; we sometimes
+ make exceptions for this. Our decision will be guided by the two goals
+ of preserving the free status of all derivatives of our free software and
+ of promoting the sharing and reuse of software generally.
+
+ NO WARRANTY
+
+ 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+ FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
+ OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+ PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+ OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
+ TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+ PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+ REPAIR OR CORRECTION.
+
+ 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+ WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+ REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+ INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+ OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+ TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+ YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+ PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+ POSSIBILITY OF SUCH DAMAGES.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+ possible use to the public, the best way to achieve this is to make it
+ free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+ to attach them to the start of each source file to most effectively
+ convey the exclusion of warranty; and each file should have at least
+ the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License along
+ with this program; if not, write to the Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+ Also add information on how to contact you by electronic and paper mail.
+
+ If the program is interactive, make it output a short notice like this
+ when it starts in an interactive mode:
+
+ Gnomovision version 69, Copyright (C) year name of author
+ Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+ The hypothetical commands `show w' and `show c' should show the appropriate
+ parts of the General Public License. Of course, the commands you use may
+ be called something other than `show w' and `show c'; they could even be
+ mouse-clicks or menu items--whatever suits your program.
+
+ You should also get your employer (if you work as a programmer) or your
+ school, if any, to sign a "copyright disclaimer" for the program, if
+ necessary. Here is a sample; alter the names:
+
+ Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+ `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+ <signature of Ty Coon>, 1 April 1989
+ Ty Coon, President of Vice
+
+ This General Public License does not permit incorporating your program into
+ proprietary programs. If your program is a subroutine library, you may
+ consider it more useful to permit linking proprietary applications with the
+ library. If this is what you want to do, use the GNU Lesser General
+ Public License instead of this License.
data/README ADDED
@@ -0,0 +1,51 @@
+ = bayon/Ruby
+
+ Copyright (c) 2009 SUGAWARA Genki <sgwr_dts@yahoo.co.jp>
+
+ == Description
+
+ Ruby bindings for bayon.
+
+ see {bayon}[http://code.google.com/p/bayon/].
+
+ == Install
+
+ gem install bayon
+
+ == Example
+
+     require 'bayon'
+
+     docs = Bayon::Documents.new
+     docs.cluster_size_limit = 3
+     docs.add_document('Jacob' , 'J-POP' => 10, 'J-R&B' => 6, 'Rock' => 4)
+     docs.add_document('Emma' , 'Jazz' => 8, 'Reggae'=> 9)
+     docs.add_document('Michael' , 'Classical music' => 4, 'World music' => 4)
+     docs.add_document('Isabella', 'Jazz' => 9, 'Metal' => 2, 'Reggae' => 6)
+     docs.add_document('Ethan' , 'J-POP' => 4, 'Rock' => 3, 'Hip hop' => 3)
+     docs.add_document('Emily' , 'Classical music' => 8, 'Rock' => 1)
+
+     result = docs.do_clustering
+     #=> [["Emma", "Isabella"], ["Jacob", "Ethan"], ["Michael", "Emily"]]
+
+     result.each do |labels|
+       puts labels.join(', ')
+     end
+
+ == Project Page
+
+ http://rubyforge.org/projects/bayon
+
+ == Source Code
+
+ http://coderepos.org/share/browser/lang/ruby/ruby-bayon
+
+ == License
+
+ bayon/Ruby is distributed under the GNU General Public License.
+
+ see COPYING.
+
+ == Requirements
+
+ * {bayon - simple and fast clustering tool}[http://code.google.com/p/bayon/]
data/ext/bayonext.cpp ADDED
@@ -0,0 +1,204 @@
+ #include <string>
+ #include <vector>
+ #include <map>
+
+ #include <bayon/cluster.h>
+
+ #include "bayonext_internal.h"
+
+ namespace {
+
+ class CBayonDocument {
+   static VALUE rb_cBayonDocument;
+
+   bayon::Document* document_;
+
+   static void free(CBayonDocument *p) {
+     if (p->document_) {
+       delete p->document_;
+     }
+
+     delete p;
+   }
+
+   static VALUE alloc(VALUE klass) {
+     CBayonDocument *p;
+
+     p = new CBayonDocument;
+     p->document_ = 0;
+
+     return Data_Wrap_Struct(klass, 0, &free, p);
+   }
+
+   static VALUE initialize(VALUE self, VALUE v_docid) {
+     CBayonDocument *p;
+
+     long doc_id = NUM2LONG(v_docid);
+
+     Data_Get_Struct(self, CBayonDocument, p);
+     p->document_ = new bayon::Document(doc_id);
+
+     return Qnil;
+   }
+
+   static VALUE add_feature(VALUE self, VALUE v_key, VALUE v_value) {
+     CBayonDocument *p;
+
+     long key = NUM2LONG(v_key);
+     long value = NUM2LONG(v_value);
+
+     Data_Get_Struct(self, CBayonDocument, p);
+     p->document_->add_feature(key, value);
+
+     return Qnil;
+   }
+
+ public:
+   static void check_type(VALUE obj) {
+     const char * const classname = rb_class2name(CLASS_OF(obj));
+
+     if (!rb_obj_is_instance_of(obj, rb_cBayonDocument)) {
+       rb_raise(rb_eTypeError, "wrong argument type %s (expected Bayon::Document)", classname);
+     }
+   }
+
+   static bayon::Document* to_document(VALUE obj) {
+     CBayonDocument *p;
+
+     check_type(obj);
+     Data_Get_Struct(obj, CBayonDocument, p);
+
+     return p->document_;
+   }
+
+   static void init(VALUE &module) {
+     rb_cBayonDocument = rb_define_class_under(module, "Document", rb_cObject);
+
+     rb_define_alloc_func(rb_cBayonDocument, &alloc);
+     rb_define_method(rb_cBayonDocument, "initialize", __F(&initialize), 1);
+     rb_define_method(rb_cBayonDocument, "add_feature", __F(&add_feature), 2);
+   }
+ };
+
+ VALUE CBayonDocument::rb_cBayonDocument = Qnil;
+
+ class CBayonAnalyzer {
+   bayon::Analyzer* analyzer_;
+
+   static void free(CBayonAnalyzer *p) {
+     if (p->analyzer_) {
+       delete p->analyzer_;
+     }
+
+     delete p;
+   }
+
+   static VALUE alloc(VALUE klass) {
+     CBayonAnalyzer *p;
+
+     p = new CBayonAnalyzer;
+     p->analyzer_ = 0;
+
+     return Data_Wrap_Struct(klass, 0, &free, p);
+   }
+
+   static VALUE initialize(VALUE self) {
+     CBayonAnalyzer *p;
+
+     Data_Get_Struct(self, CBayonAnalyzer, p);
+     p->analyzer_ = new bayon::Analyzer;
+
+     return Qnil;
+   }
+
+   static VALUE add_document(VALUE self, VALUE v_doc) {
+     CBayonAnalyzer *p;
+     bayon::Document *document;
+
+     document = CBayonDocument::to_document(v_doc);
+     Data_Get_Struct(self, CBayonAnalyzer, p);
+     p->analyzer_->add_document(*document);
+
+     return Qnil;
+   }
+
+   static VALUE set_cluster_size_limit(VALUE self, VALUE v_limit) {
+     CBayonAnalyzer *p;
+
+     size_t limit = NUM2LONG(v_limit);
+
+     Data_Get_Struct(self, CBayonAnalyzer, p);
+     p->analyzer_->set_cluster_size_limit(limit);
+
+     return Qnil;
+   }
+
+   static VALUE set_eval_limit(VALUE self, VALUE v_limit) {
+     CBayonAnalyzer *p;
+
+     double limit = NUM2DBL(v_limit);
+
+     Data_Get_Struct(self, CBayonAnalyzer, p);
+     p->analyzer_->set_eval_limit(limit);
+
+     return Qnil;
+   }
+
+   static VALUE do_clustering(VALUE self, VALUE v_method) {
+     CBayonAnalyzer *p;
+
+     Check_Type(v_method, T_STRING);
+
+     Data_Get_Struct(self, CBayonAnalyzer, p);
+     p->analyzer_->do_clustering(std::string(RSTRING_PTR(v_method)));
+
+     return Qnil;
+   }
+
+   static VALUE get_next_result(VALUE self) {
+     CBayonAnalyzer *p;
+     bayon::Cluster cluster;
+
+     Data_Get_Struct(self, CBayonAnalyzer, p);
+
+     if (p->analyzer_->get_next_result(cluster)) {
+       const std::vector<bayon::Document *> documents = cluster.documents();
+       VALUE docids = rb_ary_new2(documents.size());
+
+       for (std::vector<bayon::Document *>::const_iterator i = documents.begin(); i != documents.end(); i++) {
+         bayon::Document* doc = *i;
+         rb_ary_push(docids, LONG2NUM(doc->id()));
+       }
+
+       return docids;
+     } else {
+       return Qnil;
+     }
+   }
+
+ public:
+   static void init(VALUE &module) {
+     VALUE rb_cBayonAnalyzer = rb_define_class_under(module, "Analyzer", rb_cObject);
+
+     rb_define_alloc_func(rb_cBayonAnalyzer, &alloc);
+     rb_define_method(rb_cBayonAnalyzer, "initialize", __F(&initialize), 0);
+     rb_define_method(rb_cBayonAnalyzer, "add_document", __F(&add_document), 1);
+     rb_define_method(rb_cBayonAnalyzer, "set_cluster_size_limit", __F(&set_cluster_size_limit), 1);
+     rb_define_method(rb_cBayonAnalyzer, "set_eval_limit", __F(&set_eval_limit), 1);
+     rb_define_method(rb_cBayonAnalyzer, "do_clustering", __F(&do_clustering), 1);
+     rb_define_method(rb_cBayonAnalyzer, "get_next_result", __F(&get_next_result), 0);
+
+     rb_define_const(rb_cBayonAnalyzer, "KMEANS", rb_str_new2("kmeans"));
+     rb_define_const(rb_cBayonAnalyzer, "REPEATED_BISECTION", rb_str_new2("rb"));
+   }
+ };
+
+ }
+
+ void Init_bayonext() {
+   VALUE rb_mBayon;
+
+   rb_mBayon = rb_define_module("Bayon");
+   CBayonDocument::init(rb_mBayon);
+   CBayonAnalyzer::init(rb_mBayon);
+ }
data/ext/bayonext_internal.h ADDED
@@ -0,0 +1,46 @@
+ #ifndef __BAYONEXT_INTERNAL_H__
+ #define __BAYONEXT_INTERNAL_H__
+
+ #ifdef PACKAGE_NAME
+ #undef PACKAGE_NAME
+ #endif
+
+ #ifdef PACKAGE_TARNAME
+ #undef PACKAGE_TARNAME
+ #endif
+
+ #ifdef PACKAGE_VERSION
+ #undef PACKAGE_VERSION
+ #endif
+
+ #ifdef PACKAGE_STRING
+ #undef PACKAGE_STRING
+ #endif
+
+ #ifdef PACKAGE_BUGREPORT
+ #undef PACKAGE_BUGREPORT
+ #endif
+
+ #include <ruby.h>
+
+ #ifndef RSTRING_PTR
+ #define RSTRING_PTR(s) (RSTRING(s)->ptr)
+ #endif
+ #ifndef RSTRING_LEN
+ #define RSTRING_LEN(s) (RSTRING(s)->len)
+ #endif
+
+ #ifdef _WIN32
+ #define __F(f) (reinterpret_cast<VALUE (__cdecl *)(...)>(f))
+ #else
+ #define __F(f) (reinterpret_cast<VALUE (*)(...)>(f))
+ #endif
+
+ extern "C" {
+ #ifdef _WIN32
+ __declspec(dllexport)
+ #endif
+ void Init_bayonext();
+ }
+
+ #endif // __BAYONEXT_INTERNAL_H__
data/ext/extconf.rb ADDED
@@ -0,0 +1,5 @@
+ require 'mkmf'
+
+ if have_library('stdc++') and have_library('bayon')
+   create_makefile('bayonext')
+ end
data/lib/bayon.rb ADDED
@@ -0,0 +1,68 @@
+ require 'bayonext'
+
+ module Bayon
+   class Documents
+     def initialize
+       @documents = []
+       @cluster_size_limit = nil
+       @eval_limit = nil
+     end
+
+     def cluster_size_limit=(limit)
+       unless limit.kind_of?(Integer)
+         raise TypeError, "wrong argument type #{limit.class} (expected Integer)"
+       end
+
+       @cluster_size_limit = limit
+     end
+
+     def eval_limit=(limit)
+       unless limit.kind_of?(Numeric)
+         raise TypeError, "wrong argument type #{limit.class} (expected Numeric)"
+       end
+
+       @eval_limit = limit
+     end
+
+     def add_document(label, features)
+       unless features.kind_of?(Hash)
+         raise TypeError, "wrong argument type #{features.class} (expected Hash)"
+       end
+
+       if (label_features = @documents.assoc(label))
+         label_features[1] = features
+       else
+         @documents << [label, features]
+       end
+     end
+
+     def do_clustering(method = Analyzer::REPEATED_BISECTION)
+       analyzer = Analyzer.new
+       analyzer.set_cluster_size_limit(@cluster_size_limit) if @cluster_size_limit
+       analyzer.set_eval_limit(@eval_limit) if @eval_limit
+
+       feature_set = []
+
+       @documents.each_with_index do |label_features, i|
+         doc = Document.new(i)
+
+         label_features[1].each do |feature, value|
+           feature_set << feature unless feature_set.include?(feature)
+           doc.add_feature(feature_set.index(feature), value)
+         end
+
+         analyzer.add_document(doc)
+       end
+
+       analyzer.do_clustering(method)
+
+       result = []
+
+       while (cluster = analyzer.get_next_result)
+         result << cluster.map {|doc_id| @documents[doc_id][0] }
+       end
+
+       return result
+     end
+   end
+ end
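Note on `do_clustering` above: the native `Document` and `Analyzer` classes accept only integer document ids and integer feature keys, so the wrapper numbers documents by their position and feature names by first appearance, then maps the returned ids back to labels. The following is a minimal pure-Ruby sketch of that bookkeeping, runnable without the extension. `index_features` is a hypothetical helper that is not part of the gem, and it uses a Hash lookup where the code above uses `Array#include?` and `Array#index`; the `clusters` array is a hand-written stand-in for what `get_next_result` would return.

```ruby
# Hypothetical helper (not part of the gem): turn [label, features] pairs
# into [doc_id, [[feature_key, value], ...]] pairs with integer keys.
def index_features(documents)
  feature_ids = {} # feature name => integer key, in first-seen order

  documents.each_with_index.map do |(_label, features), doc_id|
    indexed = features.map do |feature, value|
      # Assign the next free integer the first time a feature name appears.
      key = (feature_ids[feature] ||= feature_ids.size)
      [key, value]
    end
    [doc_id, indexed]
  end
end

documents = [
  ['Jacob', { 'J-POP' => 10, 'Rock' => 4 }],
  ['Ethan', { 'J-POP' => 4, 'Rock' => 3, 'Hip hop' => 3 }]
]

indexed = index_features(documents)

# Stand-in for the arrays of document ids that the analyzer hands back.
clusters = [[0, 1]]

# Map each document id back to the label it was registered under.
labels = clusters.map { |cluster| cluster.map { |doc_id| documents[doc_id][0] } }
```

The Hash keeps each lookup constant-time, whereas the array-based version scans the feature list for every feature of every document; for small inputs such as the README example the difference is unlikely to matter.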
metadata ADDED
@@ -0,0 +1,59 @@
+ --- !ruby/object:Gem::Specification
+ name: bayon
+ version: !ruby/object:Gem::Version
+   version: 0.1.0
+ platform: ruby
+ authors:
+ - winebarrel
+ autorequire:
+ bindir: bin
+ cert_chain: []
+
+ date: 2009-06-14 00:00:00 +09:00
+ default_executable:
+ dependencies: []
+
+ description:
+ email: sgwr_dts@yahoo.co.jp
+ executables: []
+
+ extensions:
+ - ext/extconf.rb
+ extra_rdoc_files:
+ - README
+ files:
+ - lib/bayon.rb
+ - ext/bayonext.cpp
+ - ext/bayonext_internal.h
+ - ext/extconf.rb
+ - README
+ - COPYING
+ has_rdoc: true
+ homepage: http://bayon.rubyforge.org
+ post_install_message:
+ rdoc_options:
+ - --title
+ - bayon/Ruby - Ruby bindings for bayon.
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+   version:
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+   version:
+ requirements: []
+
+ rubyforge_project: bayon
+ rubygems_version: 1.3.1
+ signing_key:
+ specification_version: 2
+ summary: Ruby bindings for bayon.
+ test_files: []
+