rmmseg 0.0.1
- data/History.txt +6 -0
- data/Manifest.txt +37 -0
- data/README.txt +63 -0
- data/Rakefile +33 -0
- data/TODO.txt +3 -0
- data/bin/rmmseg +63 -0
- data/lib/rmmseg/algorithm.rb +157 -0
- data/lib/rmmseg/amibguity.rb +4 -0
- data/lib/rmmseg/chars.dic +12638 -0
- data/lib/rmmseg/chunk.rb +51 -0
- data/lib/rmmseg/complex_algorithm.rb +52 -0
- data/lib/rmmseg/config.rb +59 -0
- data/lib/rmmseg/dictionary.rb +66 -0
- data/lib/rmmseg/ferret.rb +43 -0
- data/lib/rmmseg/lawl_rule.rb +14 -0
- data/lib/rmmseg/lsdmfocw_rule.rb +15 -0
- data/lib/rmmseg/mm_rule.rb +15 -0
- data/lib/rmmseg/rule_helper.rb +22 -0
- data/lib/rmmseg/simple_algorithm.rb +22 -0
- data/lib/rmmseg/svwl_rule.rb +14 -0
- data/lib/rmmseg/token.rb +22 -0
- data/lib/rmmseg/word.rb +37 -0
- data/lib/rmmseg/words.dic +120330 -0
- data/lib/rmmseg.rb +15 -0
- data/misc/homepage.erb +93 -0
- data/misc/homepage.html +1063 -0
- data/spec/chunk_spec.rb +26 -0
- data/spec/complex_algorithm_spec.rb +18 -0
- data/spec/config_spec.rb +12 -0
- data/spec/dictionary_spec.rb +20 -0
- data/spec/lawl_rule_spec.rb +15 -0
- data/spec/lsdmfocw_rule_spec.rb +14 -0
- data/spec/mm_rule_spec.rb +15 -0
- data/spec/simple_algorithm_spec.rb +46 -0
- data/spec/spec_helper.rb +15 -0
- data/spec/svwl_rule_spec.rb +14 -0
- data/spec/word_spec.rb +9 -0
- metadata +101 -0
data/lib/rmmseg.rb
ADDED
@@ -0,0 +1,15 @@
+$KCODE = 'u'
+require 'jcode'
+
+require 'rmmseg/config'
+require 'rmmseg/simple_algorithm'
+require 'rmmseg/complex_algorithm'
+
+module RMMSeg
+  VERSION = '0.0.1'
+
+  # Segment +text+ using the algorithm configured.
+  def segment(text)
+    Config.algorithm_instance(text).segment
+  end
+end
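The file above loads a simple and a complex algorithm and hands the text to whichever one is configured. The project homepage describes the simple one as using only forward maximum matching. What follows is a minimal sketch of that technique, not the gem's actual `simple_algorithm.rb`: the dictionary `TOY_WORDS` and the method `forward_maximum_match` are hypothetical names made up for illustration, and the code assumes a current Ruby with native UTF-8 strings rather than the 1.8 series (with `$KCODE`) that the gem targets.

```ruby
require 'set'

# Toy dictionary (an assumption for this sketch); the gem ships
# words.dic and chars.dic instead.
TOY_WORDS = Set.new(%w[研究 研究生 生命 起源])
MAX_WORD_LENGTH = TOY_WORDS.map(&:length).max

# Hypothetical helper: always take the longest dictionary word that starts
# at the current position, falling back to a single character.
def forward_maximum_match(text, words = TOY_WORDS, max_len = MAX_WORD_LENGTH)
  tokens = []
  pos = 0
  while pos < text.length
    len = [max_len, text.length - pos].min
    # Shrink the candidate until it is a dictionary word or one character.
    len -= 1 while len > 1 && !words.include?(text[pos, len])
    tokens << text[pos, len]
    pos += len
  end
  tokens
end

p forward_maximum_match('研究生命起源')
```

With this toy dictionary the call returns 研究生, 命, 起源. The greedy choice commits to the longest first word without looking ahead, which is the kind of ambiguity the complex algorithm is meant to resolve.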
data/misc/homepage.erb
ADDED
@@ -0,0 +1,93 @@
+<%
+$title = "RMMSeg Homepage"
+$authors = { 'pluskid' => 'http://pluskid.lifegoo.com' }
+
+$unindent = ' '
+%>
+
+<% chapter "Introduction" do %>
+
+RMMSeg is an implementation of the
+"MMSEG":http://technology.chtsai.org/mmseg/ Chinese word
+segmentation algorithm. It is based on two variants of the maximum
+matching algorithm. Two algorithms are available:
+
+* a simple algorithm that uses only forward maximum matching.
+* a complex algorithm that uses three-word chunk maximum matching and 3
+additional rules to resolve ambiguities.
+
+For more information about the algorithm, please refer to the
+following essays:
+
+* http://technology.chtsai.org/mmseg/
+* http://pluskid.lifegoo.com/?p=261
+
+RMMSeg can be used as either a standalone program or an Analyser for
+"Ferret":http://ferret.davebalmain.com/trac.
+
+<% end %>
+
+<% chapter "Setup" do %>
+<% section "Requirements" do %>
+
+Your system needs the following software to run RMMSeg.
+
+|_. Software |_. Notes |
+| "Ruby":http://ruby-lang.org | Version 1.8.x is required |
+| "hoe":http://seattlerb.rubyforge.org/hoe/ | If you want to build the gem manually |
+| "Rake":http://rake.rubyforge.org/ | If you want to build the gem manually |
+| "rspec":http://rspec.rubyforge.org/ | If you want to run the test cases |
+
+<% end %>
+
+<% section "Installation" do %>
+<% section "Using RubyGems" do %>
+To install the gem remotely from "RubyForge":http://rubyforge.org :
+
+sudo gem install rmmseg
+
+Or you can download the gem file manually from RubyForge and install it locally:
+
+sudo gem install --local rmmseg-x.y.z.gem
+
+<% end %>
+
+<% section "From Subversion" do %>
+The latest source code is always available from the Subversion repository hosted at "RubyForge":http://rmmseg.rubyforge.org/svn/.
+<% note "The latest code might be unstable" do %>
+Some new features may only be available in the latest code in Subversion, but that code might be broken in some cases. The released gem package is therefore recommended for production use.
+<% end %>
+To check out the code from RubyForge, you need to install Subversion, then run:
+
+svn checkout http://rmmseg.rubyforge.org/svn/trunk/ rmmseg
+
+Then you can run
+
+rake gem
+
+to build the gem file.
+<% end %>
+<% end %>
+<% end %>
+
+<% chapter "Usage" do %>
+
+<% section "Stand Alone rmmseg" do %>
+RMMSeg comes with a script, @rmmseg@. For basic usage information, run it with the @-h@ option:
+
+rmmseg -h
+
+It reads from STDIN and prints the result to STDOUT.
+<% end %>
+
+<% end %>
+
+<% chapter "Resources" do %>
+* "Project Home":http://rmmseg.rubyforge.org/: The project page at RubyForge.
+* "Implementation Details":http://pluskid.lifegoo.com/?p=261: My blog post about the implementation details of RMMSeg.
+* "Ferret Homepage":http://ferret.davebalmain.com/trac: The homepage of the Ferret project.
+<% end %>
+
+<% footer do %>
+"[Validate]":http://validator.w3.org/check/referer
+<% end %>
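The homepage text above describes the complex algorithm as three-word chunk maximum matching plus additional rules for resolving ambiguities. The sketch below illustrates that idea under stated assumptions and is not the gem's `complex_algorithm.rb`: `TOY_DICT`, `candidates`, `chunks`, `best_chunk` and `complex_segment` are hypothetical names, the dictionary is a four-word toy, and only three filters are applied in order (largest total chunk length, largest average word length, smallest variance of word lengths). The frequency-based rule that the file `lsdmfocw_rule.rb` suggests is left out because it needs character frequency data not shown here. The code assumes a current Ruby with native UTF-8 strings.

```ruby
require 'set'

# Toy dictionary standing in for the gem's words.dic (an assumption).
TOY_DICT = Set.new(%w[研究 研究生 生命 起源])
MAX_LEN = TOY_DICT.map(&:length).max

# Candidate words starting at +pos+: any single character, plus every
# longer substring that is found in the dictionary.
def candidates(text, pos)
  limit = [MAX_LEN, text.length - pos].min
  (1..limit).map { |len| text[pos, len] }
            .select { |w| w.length == 1 || TOY_DICT.include?(w) }
end

# All chunks of up to +depth+ consecutive words starting at +pos+.
def chunks(text, pos, depth = 3)
  return [[]] if depth.zero? || pos >= text.length
  candidates(text, pos).flat_map do |word|
    chunks(text, pos + word.length, depth - 1).map { |rest| [word] + rest }
  end
end

def total_length(chunk)
  chunk.sum(&:length)
end

def average_length(chunk)
  total_length(chunk).to_f / chunk.size
end

def variance(chunk)
  avg = average_length(chunk)
  chunk.sum { |w| (w.length - avg)**2 } / chunk.size
end

# Each rule scores a chunk; a higher score is better, so variance is negated.
RULES = [
  ->(c) { total_length(c) },
  ->(c) { average_length(c) },
  ->(c) { -variance(c) }
].freeze

# Apply the rules in order, keeping only the top-scoring chunks each time.
def best_chunk(all)
  RULES.each do |score|
    break if all.size == 1
    best = all.map(&score).max
    all = all.select { |c| score.call(c) == best }
  end
  all.first
end

# Emit the first word of the best chunk, then repeat from the next position.
def complex_segment(text)
  tokens = []
  pos = 0
  while pos < text.length
    word = best_chunk(chunks(text, pos)).first
    tokens << word
    pos += word.length
  end
  tokens
end

p complex_segment('研究生命起源')
```

For the input 研究生命起源 this returns 研究, 生命, 起源. At the first position two chunks tie on total and average length, and the variance rule prefers the evenly sized 研究 / 生命 / 起源 over 研究生 / 命 / 起源.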