rmmseg 0.1.4 → 0.1.5

data/History.txt CHANGED
@@ -1,3 +1,7 @@
+ === 0.1.5 / 2008-03-03
+
+ * Bug fix: Ferret Token is not Duck-Typing. We need to construct Ferret token instead of reuse RMMSeg Token.
+
  === 0.1.4 / 2008-03-02
 
  * Let user store their customized word to Dictionary after loaded.
data/README.txt CHANGED
@@ -10,7 +10,7 @@ algorithms. Two algorithms are available for using:
 
  * simple algorithm that uses only forward maximum matching.
  * complex algorithm that uses three-word chunk maximum matching and 3
- aditonal rules to solve ambiguities.
+ additonal rules to solve ambiguities.
 
  For more information about the algorithm, please refer to the
  following essays:
data/TODO.txt CHANGED
@@ -1,4 +1,5 @@
  === TODO
 
+ * Add mock test for RMMSeg::Ferret.
  * Avoid Memory Leak
  * Improve Performance
data/lib/rmmseg/ferret.rb CHANGED
@@ -39,7 +39,11 @@ module RMMSeg
 
  # Get next token
  def next
-   @algor.next_token
+   tok = @algor.next_token
+   if tok
+     tok = ::Ferret::Analysis::Token.new(tok.text, tok.start, tok.end)
+   end
+   tok
  end
 
  # Get the text being tokenized
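
The change above stops returning the segmenter's own token and builds a Ferret token from its fields instead, passing nil through when the input is exhausted. A minimal sketch of that conversion pattern follows, which could also serve as a starting point for the mock test mentioned in TODO.txt. It assumes nothing about Ferret itself: `SegToken`, `TargetToken`, `FakeAlgorithm` and `ConvertingTokenizer` are hypothetical stand-ins, not classes from RMMSeg or Ferret.

```ruby
# Stand-ins for illustration only; neither is the real RMMSeg or Ferret class.
SegToken    = Struct.new(:text, :start, :end)            # plays the role of the segmenter's token
TargetToken = Struct.new(:text, :start, :end, :pos_inc)  # plays the role of the analyzer's token

# Hypothetical algorithm object: hands out tokens one by one, then nil.
class FakeAlgorithm
  def initialize(tokens)
    @tokens = tokens.dup
  end

  def next_token
    @tokens.shift
  end
end

# Mirrors the shape of the patched #next: convert when a token exists,
# return nil unchanged when the algorithm has nothing left.
class ConvertingTokenizer
  def initialize(algor)
    @algor = algor
  end

  def next
    tok = @algor.next_token
    # A position increment of 1 matches the value the removed
    # RMMSeg::Token#initialize used to set.
    tok && TargetToken.new(tok.text, tok.start, tok.end, 1)
  end
end
```

Calling `next` on the sketch tokenizer yields a `TargetToken` per input token and then nil, so a caller that loops until nil behaves the same as before the conversion was added.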
data/lib/rmmseg/token.rb CHANGED
@@ -18,9 +18,6 @@ module RMMSeg
  # token. This is *byte* index instead of character.
  attr_accessor :end
 
- # See Ferret document for Token.
- attr_accessor :pos_inc
-
  # +text+ is the ref to the whole text. In other words:
  # +text[start_pos...end_pos]+ should be the string held by this
  # token.
@@ -28,23 +25,7 @@ module RMMSeg
    @text = text
    @start = start_pos
    @end = end_pos
-   @pos_inc = 1
- end
-
- def <=> other
-   if @start > other.start
-     return 1
-   elsif @start < other.start
-     return -1
-   elsif @end > other.end
-     return 1
-   elsif @end < other.end
-     return -1
-   else
-     return @text <=> other.text
-   end
  end
- include Comparable
 
  def to_s
    @text.dup
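
For reference, the removed `<=>` ordered tokens by start offset, then end offset, then text. The same ordering can be written more compactly with array comparison. This is only a sketch: `SpanToken` is a hypothetical class, not part of RMMSeg.

```ruby
# Hypothetical token class showing the ordering the removed code implemented.
class SpanToken
  include Comparable
  attr_reader :text, :start, :end

  def initialize(text, start_pos, end_pos)
    @text  = text
    @start = start_pos
    @end   = end_pos
  end

  # Arrays compare element by element, so this orders by start,
  # then by end, then by text.
  def <=>(other)
    [@start, @end, @text] <=> [other.start, other.end, other.text]
  end
end
```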
data/lib/rmmseg.rb CHANGED
@@ -6,7 +6,7 @@ require 'rmmseg/simple_algorithm'
  require 'rmmseg/complex_algorithm'
 
  module RMMSeg
-   VERSION = '0.1.4'
+   VERSION = '0.1.5'
 
    # Segment +text+ using the algorithm configured.
    def segment(text)
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: rmmseg
  version: !ruby/object:Gem::Version
-   version: 0.1.4
+   version: 0.1.5
  platform: ruby
  authors:
  - pluskid
@@ -9,11 +9,11 @@ autorequire:
  bindir: bin
  cert_chain: []
 
- date: 2008-03-02 00:00:00 +00:00
+ date: 2008-03-04 00:00:00 +00:00
  default_executable:
  dependencies: []
 
- description: "RMMSeg is an implementation of MMSEG Chinese word segmentation algorithm. It is based on two variants of maximum matching algorithms. Two algorithms are available for using: * simple algorithm that uses only forward maximum matching. * complex algorithm that uses three-word chunk maximum matching and 3 aditonal rules to solve ambiguities. For more information about the algorithm, please refer to the following essays: * http://technology.chtsai.org/mmseg/ * http://pluskid.lifegoo.com/?p=261"
+ description: "RMMSeg is an implementation of MMSEG Chinese word segmentation algorithm. It is based on two variants of maximum matching algorithms. Two algorithms are available for using: * simple algorithm that uses only forward maximum matching. * complex algorithm that uses three-word chunk maximum matching and 3 additonal rules to solve ambiguities. For more information about the algorithm, please refer to the following essays: * http://technology.chtsai.org/mmseg/ * http://pluskid.lifegoo.com/?p=261"
  email: pluskid@gmail.com
  executables:
  - rmmseg