rbtagger 0.3.2 → 0.4.0
- data/README +44 -0
- data/Rakefile +78 -4
- data/ext/rule_tagger/registry.c +4 -4
- data/ext/rule_tagger/registry.h +1 -1
- data/ext/word_tagger/rtagger.cc +23 -1
- data/ext/word_tagger/tagger.cc +9 -4
- data/ext/word_tagger/tagger.h +2 -0
- data/ext/word_tagger/test.rb +2 -2
- data/lib/brill/brown/{LEXICON → Lexicon.rb} +0 -0
- data/lib/brill/tagger.rb +1 -1
- data/lib/rbtagger.rb +0 -3
- data/lib/rbtagger/version.rb +2 -2
- data/lib/word/tagger.rb +2 -1
- metadata +38 -101
- data/COPYING +0 -21
- data/History.txt +0 -4
- data/License.txt +0 -20
- data/Manifest.txt +0 -82
- data/PostInstall.txt +0 -1
- data/README.txt +0 -51
- data/config/hoe.rb +0 -74
- data/config/requirements.rb +0 -15
- data/ext/rule_tagger/mkmf.log +0 -46
- data/ext/word_tagger/mkmf.log +0 -24
- data/ext/word_tagger/test/Makefile +0 -22
- data/ext/word_tagger/test/doc.txt +0 -87
- data/lib/brill/brown/CONTEXTUALRULEFILE +0 -284
- data/lib/brill/brown/LEXICALRULEFILE +0 -148
- data/script/console +0 -10
- data/script/destroy +0 -14
- data/script/generate +0 -14
- data/script/txt2html +0 -82
- data/setup.rb +0 -1585
- data/tasks/deployment.rake +0 -34
- data/tasks/environment.rake +0 -7
- data/tasks/extconf.rake +0 -18
- data/tasks/extconf/rule_tagger.rake +0 -43
- data/tasks/extconf/word_tagger.rake +0 -43
- data/tasks/website.rake +0 -17
- data/test/docs/doc0.txt +0 -20
- data/test/docs/doc1.txt +0 -11
- data/test/docs/doc2.txt +0 -52
- data/test/docs/doc3.txt +0 -128
- data/test/docs/doc4.txt +0 -337
- data/test/docs/doc5.txt +0 -497
- data/test/docs/doc6.txt +0 -116
- data/test/docs/doc7.txt +0 -101
- data/test/docs/doc8.txt +0 -25
- data/test/docs/doc9.txt +0 -84
- data/test/fixtures/tags.txt +0 -976
- data/test/test_helper.rb +0 -5
- data/test/test_rule_tagger.rb +0 -151
- data/test/test_word_tagger.rb +0 -47
- data/tools/rakehelp.rb +0 -113
- data/website/index.html +0 -231
- data/website/index.txt +0 -70
- data/website/javascripts/rounded_corners_lite.inc.js +0 -285
- data/website/stylesheets/screen.css +0 -138
- data/website/template.html.erb +0 -184
data/test/test_helper.rb
DELETED
data/test/test_rule_tagger.rb
DELETED
@@ -1,151 +0,0 @@
-require File.dirname(__FILE__) + '/test_helper'
-
-
-class TestRuleTagger< Test::Unit::TestCase
-  SAMPLE_DOC=%q(
-Take an active role in your care
-When it comes to making decisions about the goals and direction of treatment, don't sit back. Work closely and actively with your oncologist and the rest of your medical team.
-Dont overlook clinical trials
-If youre eligible to enroll in clinical trials, select an oncologist who participates in them. Patients who enroll in clinical studies receive closer follow-up, the highest standard-of-care treatment and access to experimental therapies at no extra cost.
-Maximize your nutrition strategy
-Doing your best to eat a healthy, well-balanced diet is vital to prompt healing after surgery and for recovery from radiation or chemotherapy. Many oncology practices employ registered dieticians who can help you optimize your nutrition.
-Steer clear of "natural cures"
-Before trying nutritional supplements or herbal remedies, be sure to discuss your plans with a doctor. Most have not been tested in clinical studies, and some may actually interfere with your treatment.
-Build a stronger body
-Even walking regularly is can help you minimize long-term muscle weakness caused by illness or de-conditioning.
-Focus on overall health
-Patients may be cured of cancer but still face life-threatening medical problems that are underemphasized during cancer treatments, such as diabetes, high blood pressure and heart disease. Continue to monitor your overall health.
-Put the fire out for good
-Smoking impairs healing after surgery and radiation and increases your risk of cardiovascular disease and many types of cancers. Ask your doctor for help identifying and obtaining the most appropriate cessation aids.
-Map a healthy future
-Once youve completed treatment, discuss appropriate follow-up plans with your doctor and keep track of them yourself. Intensified screening over many years is frequently recommended to identify and treat a recurrence early on.
-Share your feelings
-Allow yourself time to discuss the emotional consequences of your illness and treatment with family, friends, your doctor and, if necessary, a professional therapist. Many patients also find antidepressants helpful during treatment.
-Stay connected
-Although many newly diagnosed patients fear they will not be able to keep working during treatment, this is usually not the case. Working, even at a reduced schedule, helps you maintain valuable social connections and weekly structure.
-)
-  SAMPLE_DOC2=%q(
-Britney Spears was granted a change in her visitation schedule with her sons Sean Preston and Jayden James at a hearing Tuesday.
-"There was a change in visitation status that was ordered by Commissioner Gordon this morning," Los Angeles Superior Court spokesperson Alan Parachini confirmed after the hearing, which both Kevin Federline and her father (and co-conservator) Jamie Spears attended. (Britney and Kevin did not address each other during the hearing.)
-The details of her visitation, however, are unclear.
-"I'm not at liberty to answer any questions about the nature of that change," Parachini said. (TMZ.com had reported that Spears wanted overnight visits.)
-Asked by Us if she were happy with the court outcome, Spears (clutching an Ed Hardy purse) smiled and told Us, "Yes."
-Next up: A status hearing set for July 15.
-The couple last appeared in court May 6. Spears was granted extended visitation — three days a week from 9 a.m. to 5 p.m. — of Sean Preston, 2, and Jayden James, 20 months.
-)
-  SAMPLE_DOC3=%q(
-TMZ.com: Britney celebrated getting overnights with her kids by going on a wild shopping trip for herself.With L.A.'s finest at her service, it was a total clusterf**k outside of Fred Segal as Brit Brit made her way out. The scene was crazy -- and it was all... Read more
-)
-  def setup
-    if !defined?($tagger)
-      $rtagger = Brill::Tagger.new
-    end
-  end
-
-  def test_simple_tagger
-    pairs = tagger.tag( SAMPLE_DOC )
-    tags = [["", ")"], ["", ")"], ["Take", "VB"], ["an", "DT"], ["active", "JJ"], ["role", "NN"], ["in", "IN"],
-            ["your", "PRP$"], ["care", "NN"], ["When", "WRB"], ["it", "PRP"], ["comes", "VBZ"], ["to", "TO"],
-            ["making", "VBG"], ["decisions", "NNS"], ["about", "IN"], ["the", "DT"], ["goals", "NNS"], ["and", "CC"],
-            ["direction", "NN"], ["of", "IN"], ["treatment", "NN"], [",", ","], ["", ")"], ["do", "VBP"], ["", ")"],
-            ["n't", "RB"], ["sit", "VB"], ["back.", "CD"], ["Work", "NN"], ["closely", "RB"], ["and", "CC"],
-            ["actively", "RB"], ["with", "IN"], ["your", "PRP$"], ["oncologist", "NN"], ["and", "CC"], ["the", "DT"],
-            ["rest", "NN"], ["of", "IN"], ["your", "PRP$"], ["medical", "JJ"], ["team.", "JJ"], ["Do", "VBP"],
-            ["", ")"], ["n't", "RB"], ["overlook", "VB"], ["clinical", "JJ"], ["trials", "NNS"], ["If", "IN"],
-            ["you", "PRP"], ["'re", "VBP"], ["eligible", "JJ"], ["to", "TO"], ["enroll", "VB"], ["in", "IN"],
-            ["clinical", "JJ"], ["trials", "NNS"], [",", ","], ["", ")"], ["select", "VB"], ["an", "DT"],
-            ["oncologist", "NN"], ["who", "WP"], ["participates", "VBZ"], ["in", "IN"], ["them.", "JJ"],
-            ["Patients", "NNS"], ["who", "WP"], ["enroll", "VBP"], ["in", "IN"], ["clinical", "JJ"],
-            ["studies", "NNS"], ["receive", "VBP"], ["closer", "JJR"], ["follow-up", "NN"], [",", ","], ["", ")"],
-            ["the", "DT"], ["highest", "JJS"], ["standard-of-care", "JJ"], ["treatment", "NN"], ["and", "CC"],
-            ["access", "NN"], ["to", "TO"], ["experimental", "JJ"], ["therapies", "NNS"], ["at", "IN"], ["no", "DT"],
-            ["extra", "JJ"], ["cost.", "NNP"], ["Maximize", "NNP"], ["your", "PRP$"], ["nutrition", "NN"],
-            ["strategy", "NN"], ["Doing", "NNP"], ["your", "PRP$"], ["best", "JJS"], ["to", "TO"], ["eat", "VB"],
-            ["a", "DT"], ["healthy", "JJ"], [",", ","], ["", ")"], ["well-balanced", "JJ"], ["diet", "NN"],
-            ["is", "VBZ"], ["vital", "JJ"], ["to", "TO"], ["prompt", "VB"], ["healing", "NN"], ["after", "IN"],
-            ["surgery", "NN"], ["and", "CC"], ["for", "IN"], ["recovery", "NN"], ["from", "IN"], ["radiation", "NN"],
-            ["or", "CC"], ["chemotherapy.", "JJ"], ["Many", "JJ"], ["oncology", "NN"], ["practices", "NNS"],
-            ["employ", "VBP"], ["registered", "VBN"], ["dieticians", "NNS"], ["who", "WP"], ["can", "MD"],
-            ["help", "VB"], ["you", "PRP"], ["optimize", "VB"], ["your", "PRP$"], ["nutrition.", "JJ"],
-            ["Steer", "VB"], ["clear", "JJ"], ["of", "IN"], ["", ")"], ["``", "``"], ["natural", "JJ"],
-            ["cures", "NNS"], ["''", "''"], ["", ")"], ["Before", "IN"], ["trying", "VBG"], ["nutritional", "JJ"],
-            ["supplements", "NNS"], ["or", "CC"], ["herbal", "JJ"], ["remedies", "NNS"], [",", ","], ["", ")"],
-            ["be", "VB"], ["sure", "JJ"], ["to", "TO"], ["discuss", "VB"], ["your", "PRP$"], ["plans", "NNS"],
-            ["with", "IN"], ["a", "DT"], ["doctor.", "JJ"], ["Most", "JJS"], ["have", "VBP"], ["not", "RB"],
-            ["been", "VBN"], ["tested", "VBN"], ["in", "IN"], ["clinical", "JJ"], ["studies", "NNS"], [",", ","],
-            ["", ")"], ["and", "CC"], ["some", "DT"], ["may", "MD"], ["actually", "RB"], ["interfere", "VB"],
-            ["with", "IN"], ["your", "PRP$"], ["treatment.", "JJ"], ["Build", "VB"], ["a", "DT"], ["stronger", "JJR"],
-            ["body", "NN"], ["Even", "RB"], ["walking", "VBG"], ["regularly", "RB"], ["is", "VBZ"], ["can", "MD"],
-            ["help", "VB"], ["you", "PRP"], ["minimize", "VB"], ["long-term", "JJ"], ["muscle", "NN"],
-            ["weakness", "NN"], ["caused", "VBN"], ["by", "IN"], ["illness", "NN"], ["or", "CC"],
-            ["de-conditioning.", "NNP"], ["Focus", "NNP"], ["on", "IN"], ["overall", "JJ"], ["health", "NN"],
-            ["Patients", "NNS"], ["may", "MD"], ["be", "VB"], ["cured", "VBN"], ["of", "IN"], ["cancer", "NN"],
-            ["but", "CC"], ["still", "JJ"], ["face", "NN"], ["life-threatening", "JJ"], ["medical", "JJ"],
-            ["problems", "NNS"], ["that", "WDT"], ["are", "VBP"], ["underemphasized", "JJ"], ["during", "IN"],
-            ["cancer", "NN"], ["treatments", "NNS"], [",", ","], ["", ")"], ["such", "JJ"], ["as", "IN"],
-            ["diabetes", "NN"], [",", ","], ["", ")"], ["high", "JJ"], ["blood", "NN"], ["pressure", "NN"],
-            ["and", "CC"], ["heart", "NN"], ["disease.", "JJ"], ["Continue", "VB"], ["to", "TO"], ["monitor", "VB"],
-            ["your", "PRP$"], ["overall", "JJ"], ["health.", "JJ"], ["Put", "NN"], ["the", "DT"], ["fire", "NN"],
-            ["out", "IN"], ["for", "IN"], ["good", "JJ"], ["Smoking", "NNP"], ["impairs", "NNS"], ["healing", "NN"],
-            ["after", "IN"], ["surgery", "NN"], ["and", "CC"], ["radiation", "NN"], ["and", "CC"], ["increases", "NNS"],
-            ["your", "PRP$"], ["risk", "NN"], ["of", "IN"], ["cardiovascular", "JJ"], ["disease", "NN"], ["and", "CC"],
-            ["many", "JJ"], ["types", "NNS"], ["of", "IN"], ["cancers.", "CD"], ["Ask", "VB"], ["your", "PRP$"],
-            ["doctor", "NN"], ["for", "IN"], ["help", "NN"], ["identifying", "VBG"], ["and", "CC"], ["obtaining", "VBG"],
-            ["the", "DT"], ["most", "RBS"], ["appropriate", "JJ"], ["cessation", "NN"], ["aids.", "NNP"], ["Map", "NNP"],
-            ["a", "DT"], ["healthy", "JJ"], ["future", "NN"], ["Once", "RB"], ["youve", "VBP"], ["completed", "VBN"],
-            ["treatment", "NN"], [",", ","], ["", ")"], ["discuss", "VB"], ["appropriate", "JJ"], ["follow-up", "NN"],
-            ["plans", "NNS"], ["with", "IN"], ["your", "PRP$"], ["doctor", "NN"], ["and", "CC"], ["keep", "VB"],
-            ["track", "NN"], ["of", "IN"], ["them", "PRP"], ["yourself.", "CD"], ["Intensified", "JJ"], ["screening", "NN"],
-            ["over", "IN"], ["many", "JJ"], ["years", "NNS"], ["is", "VBZ"], ["frequently", "RB"], ["recommended", "VBN"],
-            ["to", "TO"], ["identify", "VB"], ["and", "CC"], ["treat", "VB"], ["a", "DT"], ["recurrence", "NN"], ["early", "JJ"],
-            ["on.", "CD"], ["Share", "VB"], ["your", "PRP$"], ["feelings", "NNS"], ["Allow", "VB"], ["yourself", "PRP"],
-            ["time", "NN"], ["to", "TO"], ["discuss", "VB"], ["the", "DT"], ["emotional", "JJ"], ["consequences", "NNS"],
-            ["of", "IN"], ["your", "PRP$"], ["illness", "NN"], ["and", "CC"], ["treatment", "NN"], ["with", "IN"],
-            ["family", "NN"], [",", ","], ["", ")"], ["friends", "NNS"], [",", ","], ["", ")"], ["your", "PRP$"],
-            ["doctor", "NN"], ["and", "CC"], [",", ","], ["", ")"], ["if", "IN"], ["necessary", "JJ"], [",", ","],
-            ["", ")"], ["a", "DT"], ["professional", "JJ"], ["therapist.", "JJ"], ["Many", "JJ"], ["patients", "NNS"],
-            ["also", "RB"], ["find", "VBP"], ["antidepressants", "NNS"], ["helpful", "JJ"], ["during", "IN"],
-            ["treatment.", "JJ"], ["Stay", "VB"], ["connected", "VBN"], ["Although", "IN"], ["many", "JJ"],
-            ["newly", "RB"], ["diagnosed", "VBN"], ["patients", "NNS"], ["fear", "VBP"], ["they", "PRP"], ["will", "MD"],
-            ["not", "RB"], ["be", "VB"], ["able", "JJ"], ["to", "TO"], ["keep", "VB"], ["working", "VBG"], ["during", "IN"],
-            ["treatment", "NN"], [",", ","], ["", ")"], ["this", "DT"], ["is", "VBZ"], ["usually", "RB"], ["not", "RB"],
-            ["the", "DT"], ["case.", "CD"], ["Working", "NNP"], [",", ","], ["", ")"], ["even", "RB"], ["at", "IN"],
-            ["a", "DT"], ["reduced", "VBN"], ["schedule", "NN"], [",", ","], ["", ")"], ["helps", "VBZ"], ["you", "PRP"],
-            ["maintain", "VBP"], ["valuable", "JJ"], ["social", "JJ"], ["connections", "NNS"], ["and", "CC"],
-            ["weekly", "JJ"], ["structure", "NN"], [".", "."]]
-    assert_equal tags, pairs
-  end
-
-  def test_multiple_docs
-    #timer = Time.now
-    count = 0
-    Dir["#{File.dirname(__FILE__)}/docs/doc*"].each do|doc|
-      tagger.tag( File.read( doc ) )
-      count += 1
-    end
-    #duration = Time.now - timer
-    #puts "time: #{duration} sec #{count.to_f/duration} docs/sec"
-  end
-
-  def test_suggest
-    results = tagger.suggest( SAMPLE_DOC )
-    # puts results.inspect
-    assert results.include?(["treatment", "NN", 5])
-    results = tagger.suggest( SAMPLE_DOC2 )
-    assert results.include?(["Britney Spears", "NNP", 6])
-    assert results.include?(["Jamie Spears", "NNP", 12])
-    # puts results.inspect
-    results = tagger.suggest( SAMPLE_DOC3, 5 )
-    #puts results.inspect
-  end
-
-  def test_adjectives
-    results = tagger.adjectives("So happy i get to bring my baby boy home tomorrow. Hospital tv is horrible, ten channels no one watches")
-    assert_equal [["happy", "JJ"], ["horrible", "JJ"]], results
-  end
-
-private
-  def tagger
-    $rtagger
-  end
-end
data/test/test_word_tagger.rb
DELETED
@@ -1,47 +0,0 @@
-require File.dirname(__FILE__) + '/test_helper'
-
-class TestWordTagger < Test::Unit::TestCase
-
-  def setup
-    if !defined?($wtagger)
-      $wtagger = Word::Tagger.new( File.join(File.dirname(__FILE__),'fixtures','tags.txt'), :words => 4 )
-    end
-  end
-
-  def test_basic
-    #timer = Time.now
-    text = "This is a sa'mple doc[]ument lets see how cancer ngrams 4 works out for this interesting text!"
-    tags = $wtagger.execute( text )
-    assert_equal ['cancer','work'], tags
-    #puts "Duration: #{Time.now - timer} sec"
-  end
-
-  def test_sample_bug
-    tags = ["foo", "bar", "baz", "squishy", "yummy"]
-    txt = 'This is some sample text. Foo walked into a bar. The bartender said "What can I get you?" Foo said he wanted something yummy - like a baz.'
-    tagger = Word::Tagger.new tags, :words => 4
-    result_tags = tagger.execute( txt )
-    assert_equal ["bar", "baz", "foo", "yummy"], result_tags
-  end
-
-  def test_ngram_size3
-    #timer = Time.now
-    text = "This body of text contains something like ventricular septal defect"
-    tags = $wtagger.execute( text )
-    assert_equal ['ventricular septal defect'], tags
-    #puts "Duration: #{Time.now - timer} sec"
-  end
-
-  def test_cat_and_the_hat
-    tagger = Word::Tagger.new( ['Cat','hat'], :words => 4 )
-    tags = tagger.execute( 'the cAt and the hat' )
-    assert_equal( ["Cat", "hat"], tags )
-  end
-
-  def test_freq_counts
-    tagger = Word::Tagger.new( ['Cat','hat'], :words => 4 )
-    tags = tagger.freq( 'the cAt and the hat the cAt and the hat the cAt and the hat the cAt and the hat' )
-    assert_equal( {"Cat"=>4, "hat"=>4}, tags )
-  end
-
-end
data/tools/rakehelp.rb
DELETED
@@ -1,113 +0,0 @@
-# This final came directly from mongrel 1.0.1 source
-# with a few modifications to support some of my network tests
-# Also, i have figured out yet if this should remain so much a clone of the mongrel tree
-# or become a plugin, need to review more closely how that works
-
-def make(makedir)
-  Dir.chdir(makedir) do
-    sh(PLATFORM =~ /win32/ ? 'nmake' : 'make')
-  end
-end
-
-def extconf(dir)
-  Dir.chdir(dir) do ruby "extconf.rb" end
-end
-
-def setup_tests
-  Rake::TestTask.new do |t|
-    t.test_files = FileList["test/*_test.rb"]
-    t.verbose = true
-  end
-end
-
-
-def setup_clean otherfiles
-  files = ['build/*', '**/*.o', '**/*.so', '**/*.a', 'lib/*-*', '**/*.log'] + otherfiles
-  CLEAN.include(files)
-end
-
-
-def setup_rdoc files
-  Rake::RDocTask.new do |rdoc|
-    rdoc.rdoc_dir = 'doc/rdoc'
-    rdoc.options << '--line-numbers'
-    rdoc.rdoc_files.add(files)
-  end
-end
-
-
-def setup_extension(dir, extension)
-  ext = "ext/#{dir}"
-  ext_so = "#{ext}/#{extension}.#{Config::CONFIG['DLEXT']}"
-  ext_files = FileList[
-    "#{ext}/*.c",
-    "#{ext}/*.h",
-    "#{ext}/extconf.rb",
-    "#{ext}/Makefile",
-    "lib"
-  ]
-
-  task "lib" do
-    directory "lib"
-  end
-
-  desc "Builds just the #{extension} extension"
-  task extension.to_sym => ["#{ext}/Makefile", ext_so ]
-
-  file "#{ext}/Makefile" => ["#{ext}/extconf.rb"] do
-    extconf "#{ext}"
-  end
-
-  file ext_so => ext_files do
-    make "#{ext}"
-    cp ext_so, "lib"
-  end
-end
-
-
-def base_gem_spec(pkg_name, pkg_version)
-  rm_rf "test/coverage"
-  pkg_version = pkg_version
-  pkg_name = pkg_name
-  pkg_file_name = "#{pkg_name}-#{pkg_version}"
-  Gem::Specification.new do |s|
-    s.name = pkg_name
-    s.version = pkg_version
-    s.platform = Gem::Platform::RUBY
-    s.has_rdoc = true
-    s.extra_rdoc_files = [ "README" ]
-
-    s.files = %w(COPYING LICENSE README Rakefile) +
-      Dir.glob("{bin,doc/rdoc,test}/**/*") +
-      Dir.glob("ext/**/*.{h,c,rb,rl}") +
-      Dir.glob("{examples,tools,lib}/**/*.rb")
-
-    s.require_path = "lib"
-    s.extensions = FileList["ext/**/extconf.rb"].to_a
-    s.bindir = "bin"
-  end
-end
-
-def setup_gem(pkg_name, pkg_version)
-  spec = base_gem_spec(pkg_name, pkg_version)
-  yield spec if block_given?
-
-  Rake::GemPackageTask.new(spec) do |p|
-    p.gem_spec = spec
-    p.need_tar = true if RUBY_PLATFORM !~ /mswin/
-  end
-end
-
-# Conditional require rcov/rcovtask if present
-begin
-  require 'rcov/rcovtask'
-
-  Rcov::RcovTask.new do |t|
-    t.test_files = FileList['test/unit/*_test.rb'] + FileList["test/integration/*_test.rb"]
-    t.rcov_opts << "-x /usr"
-    t.output_dir = "test/coverage"
-    t.verbose = true
-  end
-rescue Object => e
-  puts e.message
-end
data/website/index.html
DELETED
@@ -1,231 +0,0 @@
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
-<head>
-<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
-<title>rbtagger</title>
-<style type="text/css">
-body {
-background-color: #F1F1F1;
-font-family: "Georgia", sans-serif;
-font-size: 16px;
-line-height: 1.6em;
-padding: 1.6em 0 0 0;
-color: #333;
-}
-h1, h2, h3, h4, h5, h6 {
-color: #444;
-}
-h1 {
-font-family: sans-serif;
-font-weight: normal;
-font-size: 4em;
-line-height: 0.8em;
-letter-spacing: -0.1ex;
-margin: 5px;
-}
-li {
-padding: 0;
-margin: 0;
-list-style-type: square;
-}
-a {
-color: #5E5AFF;
-background-color: #DAC;
-font-weight: normal;
-text-decoration: underline;
-}
-blockquote {
-font-size: 90%;
-font-style: italic;
-border-left: 1px solid #111;
-padding-left: 1em;
-}
-.caps {
-font-size: 80%;
-}
-
-#main {
-width: 45em;
-padding: 0;
-margin: 0 auto;
-}
-.coda {
-text-align: right;
-color: #77f;
-font-size: smaller;
-}
-
-table {
-font-size: 90%;
-line-height: 1.4em;
-color: #ff8;
-background-color: #111;
-padding: 2px 10px 2px 10px;
-border-style: dashed;
-}
-
-th {
-color: #fff;
-}
-
-td {
-padding: 2px 10px 2px 10px;
-}
-
-.success {
-color: #0CC52B;
-}
-
-.failed {
-color: #E90A1B;
-}
-
-.unknown {
-color: #995000;
-}
-pre, code {
-font-family: monospace;
-font-size: 90%;
-line-height: 1.4em;
-color: #ff8;
-background-color: #111;
-padding: 2px 10px 2px 10px;
-}
-.comment { color: #aaa; font-style: italic; }
-.keyword { color: #eff; font-weight: bold; }
-.punct { color: #eee; font-weight: bold; }
-.symbol { color: #0bb; }
-.string { color: #6b4; }
-.ident { color: #ff8; }
-.constant { color: #66f; }
-.regex { color: #ec6; }
-.number { color: #F99; }
-.expr { color: #227; }
-
-#version {
-float: right;
-text-align: right;
-font-family: sans-serif;
-font-weight: normal;
-background-color: #B3ABFF;
-color: #141331;
-padding: 15px 20px 10px 20px;
-margin: 0 auto;
-margin-top: 15px;
-border: 3px solid #141331;
-display:block;
--moz-border-radius-bottomleft:10px;
--moz-border-radius-bottomright:10px;
--moz-border-radius-topleft:10px;
--moz-border-radius-topright:10px;
--webkit-border-bottom-left-radius:10px;
--webkit-border-bottom-right-radius:10px;
--webkit-border-top-left-radius:10px;
--webkit-border-top-right-radius:10px;
-}
-
-#version .numbers {
-display: block;
-font-size: 4em;
-line-height: 0.8em;
-letter-spacing: -0.1ex;
-margin-bottom: 15px;
-}
-
-#version p {
-text-decoration: none;
-color: #141331;
-background-color: #B3ABFF;
-margin: 0;
-padding: 0;
-}
-
-#version a {
-text-decoration: none;
-color: #141331;
-background-color: #B3ABFF;
-}
-
-.clickable {
-cursor: pointer;
-cursor: hand;
-}
-
-</style>
-</head>
-<body>
-<div id="main">
-
-<h1>rbtagger</h1>
-<div id="version" class="clickable" onclick='document.location = "http://rubyforge.org/projects/rbtagger"; return false'>
-<p>Get Version</p>
-<a href="http://rubyforge.org/projects/rbtagger" class="numbers">0.3.1</a>
-</div>
-<h4 style="float:right;padding-right:10px;"> &#x2192; ‘rbtagger’</h4>
-<h2>What</h2>
-<p>A Simple Ruby Rule-Based Part of Speech Tagger</p>
-<p>This work is based on the work of Eric Brill</p>
-<h2>Installing</h2>
-<p><pre class='syntax'>
-gem install rbtagger
-</pre></p>
-<h2>The basics</h2>
-<h4>Using the rule tagger</h4>
-<p><pre class='syntax'>
-<span class="ident">require</span> <span class="punct">'</span><span class="string">rbtagger</span><span class="punct">'</span>
-
-<span class="ident">tagger</span> <span class="punct">=</span> <span class="constant">Brill</span><span class="punct">::</span><span class="constant">Tagger</span><span class="punct">.</span><span class="ident">new</span>
-<span class="ident">docs</span><span class="punct">.</span><span class="ident">each</span> <span class="keyword">do</span><span class="punct">|</span><span class="ident">doc</span><span class="punct">|</span>
-<span class="ident">tagger</span><span class="punct">.</span><span class="ident">tag</span><span class="punct">(</span> <span class="constant">File</span><span class="punct">.</span><span class="ident">read</span><span class="punct">(</span> <span class="ident">doc</span> <span class="punct">)</span> <span class="punct">)</span>
-<span class="keyword">end</span>
-
-<span class="ident">tagger</span><span class="punct">.</span><span class="ident">suggest</span><span class="punct">(</span> <span class="constant">File</span><span class="punct">.</span><span class="ident">read</span><span class="punct">("</span><span class="string">sample.txt</span><span class="punct">")</span> <span class="punct">)</span>
-<span class="punct">=></span> <span class="punct">[["</span><span class="string">doctor</span><span class="punct">",</span> <span class="punct">"</span><span class="string">NN</span><span class="punct">",</span> <span class="number">3</span><span class="punct">],</span> <span class="punct">["</span><span class="string">treatment</span><span class="punct">",</span> <span class="punct">"</span><span class="string">NN</span><span class="punct">",</span> <span class="number">5</span><span class="punct">]]</span>
-
-<span class="ident">tagger</span><span class="punct">.</span><span class="ident">nouns</span>
-<span class="ident">tagger</span><span class="punct">.</span><span class="ident">adjectives</span>
-</pre></p>
-<h4>Using the word tagger</h4>
-<p><pre class='syntax'>
-<span class="ident">require</span> <span class="punct">'</span><span class="string">rbtagger</span><span class="punct">'</span>
-
-<span class="ident">tagger</span> <span class="punct">=</span> <span class="constant">Word</span><span class="punct">::</span><span class="constant">Tagger</span><span class="punct">.</span><span class="ident">new</span><span class="punct">(</span> <span class="punct">['</span><span class="string">cat</span><span class="punct">','</span><span class="string">hat</span><span class="punct">'],</span> <span class="symbol">:words</span> <span class="punct">=></span> <span class="number">4</span> <span class="punct">)</span>
-<span class="ident">tags</span> <span class="punct">=</span> <span class="ident">tagger</span><span class="punct">.</span><span class="ident">execute</span><span class="punct">(</span> <span class="punct">'</span><span class="string">the cat and the hat</span><span class="punct">'</span> <span class="punct">)</span>
-<span class="ident">assert_equal</span><span class="punct">(</span> <span class="punct">["</span><span class="string">cat</span><span class="punct">",</span> <span class="punct">"</span><span class="string">hat</span><span class="punct">"],</span> <span class="ident">tags</span> <span class="punct">)</span>
-</pre></p>
-<h2>Forum</h2>
-<p><a href="http://groups.google.com/group/rb-brill-tagger">http://groups.google.com/group/rb-brill-tagger</a></p>
-<h2>How to submit patches</h2>
-<p>Read the <a href="http://drnicwilliams.com/2007/06/01/8-steps-for-fixing-other-peoples-code/">8 steps for fixing other people’s code</a> and for section <a href="http://drnicwilliams.com/2007/06/01/8-steps-for-fixing-other-peoples-code/#8b-google-groups">8b: Submit patch to Google Groups</a>, use the Google Group above.</p>
-<ul>
-<li>github: <a href="http://github.com/taf2/rb-brill-tagger/tree/master">http://github.com/taf2/rb-brill-tagger/tree/master</a></li>
-</ul>
-<pre>git clone git://github.com/taf2/rb-brill-tagger.git</pre>
-<h3>Build and test instructions</h3>
-<pre>cd rb-brill-tagger
-rake test
-rake install_gem</pre>
-<h2>License</h2>
-<p>This code is free to use under the terms of the <span class="caps">MIT</span> license.</p>
-<h2>Contact</h2>
-<p>Comments are welcome. Send an email to <a href="mailto:rb-brill-tagger@googlegroups.com">Todd A. Fisher</a> email via the <a href="http://groups.google.com/group/rb-brill-tagger">forum</a></p>
-<p class="coda">
-<a href="http://xullicious.blogspot.com/">Todd A. Fisher</a>, 21st May 2009<br>
-Theme extended from <a href="http://rb2js.rubyforge.org/">Paul Battley</a>
-</p>
-</div>
-
-<!-- insert site tracking codes here, like Google Urchin -->
-<script type="text/javascript">
-var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
-document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
-</script>
-<script type="text/javascript">
-var pageTracker = _gat._getTracker("UA-246931-6");
-pageTracker._initData();
-pageTracker._trackPageview();
-</script>
-
-</body>
-</html>