pluskid-rmmseg-cpp 0.2.0

= rmmseg-cpp

== Background

rmmseg (http://rmmseg.rubyforge.org) is a Chinese word segmentation library
written for and in Ruby. It features full integration with Ruby. However,
its performance (in both time and memory) is terrible in some cases,
especially when you use the complex algorithm.

So I re-implemented rmmseg in C++ and wrapped it as a Ruby extension. This
gem aims at high performance and is therefore less extensible than the
pure-Ruby rmmseg gem. There are also some differences between the two:

* The dictionary format is different. For performance reasons, the words
  dictionary of rmmseg-cpp includes word length information. See the rdoc
  of the Dictionary class for more information on the format. It is likely
  that I will upgrade rmmseg's dictionary format to make the two
  compatible; in any case, writing a Ruby script to convert a dictionary is
  almost trivial.

  While rmmseg loads its dictionaries automatically when needed, in
  rmmseg-cpp you need to load them explicitly.

* Currently only the complex algorithm is provided. I don't see any need
  to implement the simple algorithm here as long as the complex algorithm
  performs well and is much more accurate.
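
The complex algorithm itself is not described in this README. Assuming it is the "complex" maximum-matching variant of MMSEG, it examines every chunk of up to three words starting at the current position and filters the chunks with four rules in order: largest total length, largest average word length, smallest variance of word lengths, and largest sum of log frequencies of one-character words. The pure-Ruby sketch below illustrates only that idea; it is not rmmseg-cpp's code or API. DICT, CHAR_FREQ, MAX_WORD_LEN and all method names are hypothetical, the dictionary is a four-word toy, and the placeholder frequencies mean rule 4 never breaks a tie here.

```ruby
# Sketch of MMSEG-style "complex" maximum matching in pure Ruby.
# ASSUMPTION: illustrative only; this is NOT rmmseg-cpp's implementation or API.
# DICT and CHAR_FREQ are hypothetical toy data standing in for real dictionaries.

DICT = %w[研究 研究生 生命 起源].freeze
CHAR_FREQ = Hash.new(1)  # hypothetical single-character frequencies (rule 4)
MAX_WORD_LEN = 4         # longest dictionary word we try to match

# Candidate words starting at pos: the single character always counts,
# longer substrings only if they appear in DICT.
def words_at(chars, pos)
  words = [chars[pos]]
  (2..MAX_WORD_LEN).each do |len|
    break if pos + len > chars.size
    candidate = chars[pos, len].join
    words << candidate if DICT.include?(candidate)
  end
  words
end

# Every chunk of up to three consecutive words starting at pos.
def chunks_at(chars, pos, depth = 3)
  return [[]] if depth.zero? || pos >= chars.size
  words_at(chars, pos).flat_map do |word|
    chunks_at(chars, pos + word.size, depth - 1).map { |rest| [word] + rest }
  end
end

def variance(lengths)
  mean = lengths.sum.to_f / lengths.size
  lengths.sum { |l| (l - mean)**2 } / lengths.size
end

# The four filtering rules, applied in order until one chunk remains.
RULES = [
  ->(c) { c.sum(&:size) },                 # 1. maximum total length
  ->(c) { c.sum(&:size).to_f / c.size },   # 2. largest average word length
  ->(c) { -variance(c.map(&:size)) },      # 3. smallest variance of word lengths
  ->(c) { c.select { |w| w.size == 1 }.sum(0.0) { |w| Math.log(CHAR_FREQ[w]) } } # 4. morphemic freedom
].freeze

def best_chunk(chunks)
  RULES.each do |rule|
    top = chunks.map(&rule).max
    chunks = chunks.select { |c| rule.call(c) == top }
    break if chunks.size == 1
  end
  chunks.first # if a tie survives all rules, keep the first candidate
end

# Segment text by repeatedly taking the first word of the best chunk.
def segment(text)
  chars = text.chars
  pos = 0
  words = []
  while pos < chars.size
    word = best_chunk(chunks_at(chars, pos)).first
    words << word
    pos += word.size
  end
  words
end

puts segment("研究生命起源").join(" / ") # prints 研究 / 生命 / 起源
```

Committing only to the first word of the winning chunk and then re-chunking from the next position costs more work per position than simple maximum matching, which would greedily take 研究生 and end up with 研究生 / 命 / 起源.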

== Install

This project is hosted on GitHub (http://github.com/pluskid/rmmseg-cpp/).
You can use RubyGems to install rmmseg-cpp:

  sudo gem install pluskid-rmmseg-cpp --source=http://gems.github.com

Unfortunately, the command is long, and for now you have to use the
'pluskid' prefix. Alternatively, you can check out the latest source code:

  git clone git://github.com/pluskid/rmmseg-cpp.git

== Usage

Integration with Ferret is almost the same as before. To run the Ferret
example for rmmseg (http://rmmseg.rubyforge.org/#Analyzer-for-Ferret), only
one minor change is needed:

  RMMSeg::Dictionary.load_dictionaries

Add the line above to explicitly load the default dictionaries before
using the analyzer. You can also add your own dictionaries through
<tt>Dictionary#add</tt>.
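
A custom dictionary has to be in rmmseg-cpp's own word-list format, which differs from rmmseg's by carrying word length information. The sketch below converts a plain word list into a length-prefixed one. It rests on two assumptions this README does not confirm: that the old file holds one word per line, and that the new format is the word's length in characters, a space, then the word. Check the rdoc of the Dictionary class for the real format before relying on it; convert_line and convert_lines are hypothetical helper names.

```ruby
# Hypothetical converter from a plain word list to a length-prefixed list.
# ASSUMPTIONS (not confirmed by the README): the input has one word per line,
# and the target format is "<length in characters> <word>" per line.

# Convert one input line; returns nil for blank lines so they can be dropped.
def convert_line(line)
  word = line.strip
  return nil if word.empty?
  "#{word.length} #{word}" # String#length counts characters, not bytes
end

# Convert any enumerable of lines, skipping blank ones.
def convert_lines(lines)
  lines.map { |line| convert_line(line) }.compact
end

# Command-line use: ruby convert_dict.rb INPUT > OUTPUT
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  File.open(ARGV[0], "r:UTF-8") do |file|
    puts convert_lines(file.each_line)
  end
end
```

For example, <tt>ruby convert_dict.rb words.txt > words.dic</tt> (file names hypothetical) would turn a line 研究生 into 3 研究生 under the assumed format.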