engtagger 0.1.1 → 0.1.2

@@ -1,9 +1,5 @@
- #! /local/ruby/bin/ruby
- #
- # $Id: stemmable.rb,v 1.2 2003/02/01 02:07:30 condit Exp $
- #
- # See example usage at the end of this file.
- #
+ #!/usr/bin/env ruby
+ # -*- coding: utf-8 -*-
  
  module Stemmable
  
Binary file
Binary file
@@ -0,0 +1,3 @@
+ module EngTagger
+   VERSION = "0.1.2"
+ end
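The new file keeps the release number in a single constant, `EngTagger::VERSION`. As a hedged illustration using RubyGems' standard `Gem::Version` class (the comparison itself is not something engtagger does), such a constant can be compared with other releases segment by segment instead of as a plain string:

```ruby
require "rubygems" # provides Gem::Version (loaded by default on Ruby 1.9+)

# Mirrors the constant added in this hunk.
module EngTagger
  VERSION = "0.1.2"
end

current  = Gem::Version.new(EngTagger::VERSION)
previous = Gem::Version.new("0.1.1")

# Gem::Version compares each dot-separated segment numerically...
puts current > previous                                      # prints "true"
puts Gem::Version.new("0.1.10") > Gem::Version.new("0.1.9")  # prints "true"

# ...whereas String#> compares character by character.
puts "0.1.10" > "0.1.9"                                      # prints "false"
```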
metadata CHANGED
@@ -1,86 +1,63 @@
- --- !ruby/object:Gem::Specification
+ --- !ruby/object:Gem::Specification
  name: engtagger
- version: !ruby/object:Gem::Version
-   version: 0.1.1
+ version: !ruby/object:Gem::Version
+   version: 0.1.2
+   prerelease:
  platform: ruby
- authors:
+ authors:
  - Yoichiro Hasebe
  autorequire:
  bindir: bin
  cert_chain: []
-
- date: 2008-05-15 00:00:00 +09:00
- default_executable:
- dependencies:
- - !ruby/object:Gem::Dependency
-   name: hpricot
-   version_requirement:
-   version_requirements: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: "0"
-     version:
- - !ruby/object:Gem::Dependency
-   name: hoe
-   version_requirement:
-   version_requirements: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: 1.5.1
-     version:
- description: A Ruby port of Perl Lingua::EN::Tagger, a probability based, corpus-trained tagger that assigns POS tags to English text based on a lookup dictionary and a set of probability values. The tagger assigns appropriate tags based on conditional probabilities--it examines the preceding tag to determine the appropriate tag for the current word. Unknown words are classified according to word morphology or can be set to be treated as nouns or other parts of speech. The tagger also extracts as many nouns and noun phrases as it can, using a set of regular expressions.
- email: yohasebe@gmail.com
+ date: 2012-06-05 00:00:00.000000000 Z
+ dependencies: []
+ description: A Ruby port of Perl Lingua::EN::Tagger, a probability based, corpus-trained
+   tagger that assigns POS tags to English text based on a lookup dictionary and a
+   set of probability values.
+ email:
+ - yohasebe@gmail.com
  executables: []
-
  extensions: []
-
- extra_rdoc_files:
- - History.txt
- - LICENSE.txt
- - Manifest.txt
- - README.txt
- files:
- - History.txt
- - LICENSE.txt
- - Manifest.txt
- - README.txt
+ extra_rdoc_files: []
+ files:
+ - .gitignore
+ - Gemfile
+ - LICENSE
+ - README.md
  - Rakefile
+ - engtagger.gemspec
  - lib/engtagger.rb
  - lib/engtagger/porter.rb
  - lib/engtagger/pos_tags.hash
  - lib/engtagger/pos_words.hash
  - lib/engtagger/tags.yml
  - lib/engtagger/unknown.yml
+ - lib/engtagger/version.rb
  - lib/engtagger/words.yml
  - test/test_engtagger.rb
- has_rdoc: true
- homepage: http://engtagger.rubyforge.org
+ homepage: http://github.com/yohasebe/engtagger
+ licenses: []
  post_install_message:
- rdoc_options:
- - --main
- - README.txt
- require_paths:
+ rdoc_options: []
+ require_paths:
  - lib
- required_ruby_version: !ruby/object:Gem::Requirement
-   requirements:
-   - - ">="
-     - !ruby/object:Gem::Version
-       version: "0"
-   version:
- required_rubygems_version: !ruby/object:Gem::Requirement
-   requirements:
-   - - ">="
-     - !ruby/object:Gem::Version
-       version: "0"
-   version:
+ required_ruby_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
  requirements: []
-
- rubyforge_project: engtagger
- rubygems_version: 1.1.1
+ rubyforge_project:
+ rubygems_version: 1.8.24
  signing_key:
- specification_version: 2
- summary: English Part-of-Speech Tagger Library; a Ruby port of Lingua::EN::Tagger
- test_files:
+ specification_version: 3
+ summary: A probability based, corpus-trained English POS tagger
+ test_files:
  - test/test_engtagger.rb
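Version 0.1.2 drops the hpricot and hoe dependencies, the `has_rdoc` flag and the RDoc options, and adds `engtagger.gemspec` to the packaged files. That gemspec's source is not part of this diff, so the following is a hypothetical reconstruction built only from values visible in the 0.1.2 metadata; it uses RubyGems' `Gem::Specification.new` to create the specification in memory:

```ruby
require "rubygems"

# Hypothetical sketch: the real engtagger.gemspec is not shown in this diff.
# Values are copied from the 0.1.2 metadata. A real gemspec would usually
# `require "engtagger/version"` instead of redefining the constant here.
module EngTagger
  VERSION = "0.1.2"
end

spec = Gem::Specification.new do |s|
  s.name          = "engtagger"
  s.version       = EngTagger::VERSION
  s.authors       = ["Yoichiro Hasebe"]
  s.email         = ["yohasebe@gmail.com"]
  s.homepage      = "http://github.com/yohasebe/engtagger"
  s.summary       = "A probability based, corpus-trained English POS tagger"
  s.description   = "A Ruby port of Perl Lingua::EN::Tagger, a probability based, " \
                    "corpus-trained tagger that assigns POS tags to English text " \
                    "based on a lookup dictionary and a set of probability values."
  s.files         = %w[
    .gitignore Gemfile LICENSE README.md Rakefile engtagger.gemspec
    lib/engtagger.rb lib/engtagger/porter.rb
    lib/engtagger/pos_tags.hash lib/engtagger/pos_words.hash
    lib/engtagger/tags.yml lib/engtagger/unknown.yml
    lib/engtagger/version.rb lib/engtagger/words.yml
    test/test_engtagger.rb
  ]
  s.require_paths = ["lib"]
  # No add_dependency calls: the 0.1.2 metadata lists `dependencies: []`.
end

puts spec.full_name            # prints "engtagger-0.1.2"
puts spec.dependencies.empty?  # prints "true"
```

Building a gem from such a file serializes the specification into YAML metadata like the hunk above; which fields appear (for example `none: false` or `prerelease:`) depends on the RubyGems version, here 1.8.24.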
@@ -1,10 +0,0 @@
- === 0.1.0 / 2008-05-14
-
- * Modified Synopsis section of Readme.txt
- * Created a description of tag set in Readme.txt
- * Fixed a few minor bugs
-
- === 0.1.0 / 2008-05-06
-
- * Initial release
- * Functionalities are basically the same as those of Perl Lingua::EN::Tagger.
@@ -1,13 +0,0 @@
- History.txt
- LICENSE.txt
- Manifest.txt
- README.txt
- Rakefile
- lib/engtagger.rb
- lib/engtagger/porter.rb
- lib/engtagger/pos_tags.hash
- lib/engtagger/pos_words.hash
- lib/engtagger/tags.yml
- lib/engtagger/unknown.yml
- lib/engtagger/words.yml
- test/test_engtagger.rb
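Version 0.1.1 declared a dependency on hoe, which builds a gem's file list from `Manifest.txt`; with that file deleted, 0.1.2 lists its files only in the gemspec metadata. A short script comparing the two lists as they appear in this diff summarizes what was removed and added:

```ruby
# File lists copied from the deleted Manifest.txt (0.1.1) and from the
# `files:` entry of the 0.1.2 metadata in this diff.
old_files = %w[
  History.txt LICENSE.txt Manifest.txt README.txt Rakefile
  lib/engtagger.rb lib/engtagger/porter.rb lib/engtagger/pos_tags.hash
  lib/engtagger/pos_words.hash lib/engtagger/tags.yml
  lib/engtagger/unknown.yml lib/engtagger/words.yml test/test_engtagger.rb
]
new_files = %w[
  .gitignore Gemfile LICENSE README.md Rakefile engtagger.gemspec
  lib/engtagger.rb lib/engtagger/porter.rb lib/engtagger/pos_tags.hash
  lib/engtagger/pos_words.hash lib/engtagger/tags.yml
  lib/engtagger/unknown.yml lib/engtagger/version.rb
  lib/engtagger/words.yml test/test_engtagger.rb
]

# Array#- keeps the receiver's order and drops every element found in the argument.
removed = old_files - new_files
added   = new_files - old_files

puts "removed: #{removed.join(', ')}"
# prints "removed: History.txt, LICENSE.txt, Manifest.txt, README.txt"
puts "added:   #{added.join(', ')}"
# prints "added:   .gitignore, Gemfile, LICENSE, README.md, engtagger.gemspec, lib/engtagger/version.rb"
```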
data/README.txt DELETED
@@ -1,140 +0,0 @@
- = EngTagger
-
- English Part-of-Speech Tagger Library; a Ruby port of Lingua::EN::Tagger
-
- === Description
-
- A Ruby port of Perl Lingua::EN::Tagger, a probability based, corpus-trained
- tagger that assigns POS tags to English text based on a lookup dictionary and
- a set of probability values. The tagger assigns appropriate tags based on
- conditional probabilities--it examines the preceding tag to determine the
- appropriate tag for the current word. Unknown words are classified according to
- word morphology or can be set to be treated as nouns or other parts of speech.
- The tagger also extracts as many nouns and noun phrases as it can, using a set
- of regular expressions.
-
- === Features
-
- * Assigns POS tags to English text
- * Extract noun phrases from tagged text
- * etc.
-
- === Synopsis:
-
-   require 'rubygems'
-   require 'engtagger'
-
-   # Create a parser object
-   tgr = EngTagger.new
-
-   # Sample text
-   text = "Alice chased the big fat cat."
-
-   # Add part-of-speech tags to text
-   tagged = tgr.add_tags(text)
-
-   #=> "<nnp>Alice</nnp> <vbd>chased</vbd> <det>the</det> <jj>big</jj> <jj>fat</jj><nn>cat</nn> <pp>.</pp>"
-
-   # Get a list of all nouns and noun phrases with occurrence counts
-   word_list = tgr.get_words(text)
-
-   #=> {"Alice"=>1, "cat"=>1, "fat cat"=>1, "big fat cat"=>1}
-
-   # Get a readable version of the tagged text
-   readable = tgr.get_readable(text)
-
-   #=> "Alice/NNP chased/VBD the/DET big/JJ fat/JJ cat/NN ./PP"
-
-   # Get all nouns from a tagged output
-   nouns = tgr.get_nouns(tagged)
-
-   #=> {"cat"=>1, "Alice"=>1}
-
-   # Get all proper nouns
-   proper = tgr.get_proper_nouns(tagged)
-
-   #=> {"Alice"=>1}
-
-
-   # Get all noun phrases of any syntactic level
-   # (same as word_list but take a tagged input)
-   nps = tgr.get_noun_phrases(tagged)
-
-   #=> {"Alice"=>1, "cat"=>1, "fat cat"=>1, "big fat cat"=>1}
-
- === Tag Set
-
- The set of POS tags used here is a modified version of the Penn Treebank tagset. Tags with non-letter characters have been redefined to work better in our data structures. Also, the "Determiner" tag (DET) has been changed from 'DT', in order to avoid confusion with the HTML tag, <DT>.
-
-   CC      Conjunction, coordinating               and, or
-   CD      Adjective, cardinal number              3, fifteen
-   DET     Determiner                              this, each, some
-   EX      Pronoun, existential there              there
-   FW      Foreign words
-   IN      Preposition / Conjunction               for, of, although, that
-   JJ      Adjective                               happy, bad
-   JJR     Adjective, comparative                  happier, worse
-   JJS     Adjective, superlative                  happiest, worst
-   LS      Symbol, list item                       A, A.
-   MD      Verb, modal                             can, could, 'll
-   NN      Noun                                    aircraft, data
-   NNP     Noun, proper                            London, Michael
-   NNPS    Noun, proper, plural                    Australians, Methodists
-   NNS     Noun, plural                            women, books
-   PDT     Determiner, prequalifier                quite, all, half
-   POS     Possessive                              's, '
-   PRP     Determiner, possessive second           mine, yours
-   PRPS    Determiner, possessive                  their, your
-   RB      Adverb                                  often, not, very, here
-   RBR     Adverb, comparative                     faster
-   RBS     Adverb, superlative                     fastest
-   RP      Adverb, particle                        up, off, out
-   SYM     Symbol                                  *
-   TO      Preposition                             to
-   UH      Interjection                            oh, yes, mmm
-   VB      Verb, infinitive                        take, live
-   VBD     Verb, past tense                        took, lived
-   VBG     Verb, gerund                            taking, living
-   VBN     Verb, past/passive participle           taken, lived
-   VBP     Verb, base present form                 take, live
-   VBZ     Verb, present 3SG -s form               takes, lives
-   WDT     Determiner, question                    which, whatever
-   WP      Pronoun, question                       who, whoever
-   WPS     Determiner, possessive & question       whose
-   WRB     Adverb, question                        when, how, however
-
-   PP      Punctuation, sentence ender             ., !, ?
-   PPC     Punctuation, comma                      ,
-   PPD     Punctuation, dollar sign                $
-   PPL     Punctuation, quotation mark left        ``
-   PPR     Punctuation, quotation mark right       ''
-   PPS     Punctuation, colon, semicolon, elipsis  :, ..., -
-   LRB     Punctuation, left bracket               (, {, [
-   RRB     Punctuation, right bracket              ), }, ]
-
- === Requirements
-
- * Ruby 1.8.6
- * Hpricot[http://code.whytheluckystiff.net/hpricot/] (optional)
-
- === Install
-
- (sudo) gem install engtagger
-
- === Author
-
- of this Ruby library
- * Yoichiro Hasebe (yohasebe [at] gmail.com)
-
- of the original Perl module
- * Aaron Coburn (acoburn [at] middlebury.edu)
-
- === Acknowledgement
-
- This Ruby library is a direct port of Lingua::EN::Tagger available at CPAN.
- The credit for the crucial part of its algorithm/design therefore goes to
- Aaron Coburn, the author of the original Perl version.
-
- === License
-
- This library is distributed under the GPL. Please see the LICENSE file.
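The deleted README describes the core idea of the tagger: each word's tag is chosen from conditional probabilities that depend on the preceding tag. The following is a hypothetical, greedy sketch of that idea, not the gem's implementation. Its probability tables are invented, whereas EngTagger presumably loads corpus-trained values from its bundled data (the metadata lists `lib/engtagger/tags.yml`, `words.yml` and `unknown.yml`), and the gem also tokenizes raw text and classifies unknown words by morphology. The sketch scores each candidate tag as P(tag | previous tag) * P(word | tag) and prints the result in the tagged and readable shapes shown in the Synopsis:

```ruby
# Hypothetical toy tagger: every probability below is invented for illustration.

# P(tag | previous tag). In this sketch "pp" (sentence ender) doubles as the
# start state, so the first word is scored as if it followed a period.
TRANSITIONS = {
  "pp"  => { "nnp" => 0.4, "det" => 0.4, "nn" => 0.2 },
  "nnp" => { "vbd" => 0.6, "vbz" => 0.3, "pp" => 0.1 },
  "vbd" => { "det" => 0.5, "nnp" => 0.3, "pp" => 0.2 },
  "det" => { "jj" => 0.4, "nn" => 0.6 },
  "jj"  => { "jj" => 0.2, "nn" => 0.8 },
  "nn"  => { "pp" => 0.5, "vbd" => 0.3, "nn" => 0.2 }
}.freeze

# P(word | tag) for a tiny lexicon. Unknown words simply fall back to "nn".
LEXICON = {
  "Alice"  => { "nnp" => 0.01 },
  "chased" => { "vbd" => 0.002, "vbn" => 0.001 },
  "the"    => { "det" => 0.3 },
  "big"    => { "jj" => 0.01 },
  "fat"    => { "jj" => 0.005, "nn" => 0.0001 },
  "cat"    => { "nn" => 0.003 },
  "."      => { "pp" => 0.9 }
}.freeze

# Greedy left-to-right tagging: each decision looks only at the previous tag.
def tag_words(words)
  prev = "pp"
  words.map do |word|
    candidates = LEXICON.fetch(word, { "nn" => 1.0 })
    tag = candidates.max_by do |t, p_word|
      TRANSITIONS.fetch(prev, {}).fetch(t, 0.0) * p_word
    end.first
    prev = tag
    [word, tag]
  end
end

# Same shapes as the add_tags and get_readable outputs in the Synopsis.
def to_tagged(pairs)
  pairs.map { |word, tag| "<#{tag}>#{word}</#{tag}>" }.join(" ")
end

def to_readable(pairs)
  pairs.map { |word, tag| "#{word}/#{tag.upcase}" }.join(" ")
end

# The sketch expects pre-split tokens; the gem tokenizes raw text itself.
pairs = tag_words(%w[Alice chased the big fat cat .])
puts to_tagged(pairs)
puts to_readable(pairs)
# prints "Alice/NNP chased/VBD the/DET big/JJ fat/JJ cat/NN ./PP"

# Every noun tag in the tag set (NN, NNP, NNPS, NNS) starts with "nn".
nouns = pairs.select { |_word, tag| tag.start_with?("nn") }.map(&:first)
puts nouns.join(", ")  # prints "Alice, cat"
```

A greedy pass commits to one tag per word; a Viterbi search over the same two tables would instead keep the best path to every tag at each position, at a higher cost.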