stanford-core-nlp 0.1.2 → 0.1.3
- data/bin/INFO +1 -0
- data/lib/stanford-core-nlp.rb +1 -1
- data/lib/stanford-core-nlp/stanford_annotations.rb +4 -3
- metadata +6 -32
- data/bin/bridge.jar +0 -0
- data/bin/classifiers/all.3class.distsim.crf.ser.gz +0 -0
- data/bin/classifiers/all.3class.distsim.prop +0 -52
- data/bin/classifiers/conll.4class.distsim.crf.ser.gz +0 -0
- data/bin/classifiers/conll.4class.distsim.prop +0 -58
- data/bin/classifiers/muc.7class.distsim.crf.ser.gz +0 -0
- data/bin/classifiers/muc.7class.distsim.prop +0 -50
- data/bin/dcoref/animate.unigrams.txt +0 -35302
- data/bin/dcoref/demonyms.txt +0 -250
- data/bin/dcoref/female.unigrams.txt +0 -5467
- data/bin/dcoref/inanimate.unigrams.txt +0 -80533
- data/bin/dcoref/male.unigrams.txt +0 -42445
- data/bin/dcoref/namegender.combine.txt +0 -14607
- data/bin/dcoref/neutral.unigrams.txt +0 -30896
- data/bin/dcoref/plural.unigrams.txt +0 -9618
- data/bin/dcoref/singular.unigrams.txt +0 -69190
- data/bin/dcoref/state-abbreviations.txt +0 -50
- data/bin/dcoref/unknown.txt +0 -0
- data/bin/grammar/englishFactored.ser.gz +0 -0
- data/bin/grammar/englishPCFG.ser.gz +0 -0
- data/bin/joda-time.jar +0 -0
- data/bin/stanford-corenlp.jar +0 -0
- data/bin/taggers/README-Models.txt +0 -102
- data/bin/taggers/english-bidirectional-distsim.tagger +0 -0
- data/bin/taggers/english-bidirectional-distsim.tagger.props +0 -33
- data/bin/taggers/english-left3words-distsim.tagger +0 -0
- data/bin/taggers/english-left3words-distsim.tagger.props +0 -33
- data/bin/xom.jar +0 -0
data/bin/dcoref/state-abbreviations.txt
DELETED
@@ -1,50 +0,0 @@
-Alabama Ala. AL
-Alaska Alaska AK
-Arizona Ariz. AZ
-Arkansas Ark. AR
-California Calif. CA
-Colorado Colo. CO
-Connecticut Conn. CT
-Delaware Del. DE
-Florida Fla. FL
-Georgia Ga. GA
-Hawaii Hawaii HI
-Idaho Idaho ID
-Illinois Ill. IL
-Indiana Ind. IN
-Iowa Iowa IA
-Kansas Kans. KS
-Kentucky Ky. KY
-Louisiana La. LA
-Maine Maine ME
-Maryland Md. MD
-Massachusetts Mass. MA
-Michigan Mich. MI
-Minnesota Minn. MN
-Mississippi Miss. MS
-Missouri Mo. MO
-Montana Mont. MT
-Nebraska Nebr. NE
-Nevada Nev. NV
-New Hampshire N.H. NH
-New Jersey N.J. NJ
-New Mexico N.M. NM
-New York N.Y. NY
-North Carolina N.C. NC
-North Dakota N.D. ND
-Ohio Ohio OH
-Oklahoma Okla. OK
-Oregon Ore. OR
-Pennsylvania Pa. PA
-Rhode Island R.I. RI
-South Carolina S.C. SC
-South Dakota S.D. SD
-Tennessee Tenn. TN
-Texas Tex. TX
-Utah Utah UT
-Vermont Vt. VT
-Virginia Va. VA
-Washington Wash. WA
-West Virginia W.Va. WV
-Wisconsin Wis. WI
-Wyoming Wyo. WY

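The deleted state-abbreviations.txt pairs each state name with its abbreviation and its postal code. As a rough illustration of turning such a table into a lookup, here is a minimal Python sketch. `parse_state_abbreviations` is a hypothetical helper, not part of this gem or of CoreNLP, and it assumes the last two whitespace-separated fields on each line are the abbreviation and the postal code, with everything before them being the (possibly multi-word) name.

```python
# Hypothetical helper for lines shaped like state-abbreviations.txt:
# "<name> <abbreviation> <postal code>".
# Assumption: the last two whitespace-separated fields are the abbreviation
# and the postal code; everything before them is the state name.

def parse_state_abbreviations(lines):
    """Map every surface form (name, abbreviation, postal code) to the full name."""
    lookup = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # rsplit from the right keeps "New Hampshire" together as one field
        name, abbrev, postal = line.rsplit(maxsplit=2)
        for form in (name, abbrev, postal):
            lookup[form] = name
    return lookup


sample = [
    "Alabama Ala. AL",
    "New Hampshire N.H. NH",
    "West Virginia W.Va. WV",
    "Iowa Iowa IA",
]
table = parse_state_abbreviations(sample)
print(table["N.H."])
print(table["WV"])
```

Splitting from the right keeps multi-word names intact whether the columns are separated by tabs or spaces. The resulting dictionary is case-sensitive, which matters here because several postal codes (IN, OR, ME) are also ordinary English words.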
data/bin/dcoref/unknown.txt
DELETED
Empty file
data/bin/grammar/englishFactored.ser.gz
DELETED
Binary file

data/bin/grammar/englishPCFG.ser.gz
DELETED
Binary file

data/bin/joda-time.jar
DELETED
Binary file

data/bin/stanford-corenlp.jar
DELETED
Binary file

data/bin/taggers/README-Models.txt
DELETED
@@ -1,102 +0,0 @@
-Stanford POS Tagger, v. 3.1.0 - 2011-12-16
-Copyright (c) 2002-2011 The Board of Trustees of
-The Leland Stanford Junior University. All Rights Reserved.
-
-This document contains (some) information about the models included in
-this release and that may be downloaded for the POS tagger website at
-http://nlp.stanford.edu/software/tagger.shtml . If you have downloaded
-the full tagger, all of the models mentioned in this document are in the
-downloaded package in the same directory as this readme. Otherwise,
-included in the download are two
-English taggers, and the other taggers may be downloaded from the
-website. All taggers are accompanied by the props files used to create
-them; please examine these files for more detailed information about the
-creation of the taggers.
-
-For English, the bidirectional taggers are slightly more accurate, but
-tag much more slowly; choose the appropriate tagger based on your
-speed/performance needs.
-
-English taggers
----------------------------
-bidirectional-distsim-wsj-0-18.tagger
-Trained on WSJ sections 0-18 using a bidirectional architecture and
-including word shape and distributional similarity features.
-Penn Treebank tagset.
-Performance:
-97.28% correct on WSJ 19-21
-(90.46% correct on unknown words)
-
-left3words-wsj-0-18.tagger
-Trained on WSJ sections 0-18 using the left3words architecture and
-includes word shape features. Penn tagset.
-Performance:
-96.97% correct on WSJ 19-21
-(88.85% correct on unknown words)
-
-left3words-distsim-wsj-0-18.tagger
-Trained on WSJ sections 0-18 using the left3words architecture and
-includes word shape and distributional similarity features. Penn tagset.
-Performance:
-97.01% correct on WSJ 19-21
-(89.81% correct on unknown words)
-
-
-Chinese tagger
----------------------------
-chinese.tagger
-Trained on a combination of Chinese Treebank texts from Chinese and Hong
-Kong sources.
-LDC Chinese Treebank POS tag set.
-Performance:
-94.13% on a combination of Chinese and Hong Kong texts
-(78.92% on unknown words)
-
-Arabic tagger
----------------------------
-arabic-accurate.tagger
-Trained on the *entire* ATB p1-3.
-When trained on the train part of the ATB p1-3 split done for the 2005
-JHU Summer Workshop (Diab split), using (augmented) Bies tags, it gets
-the following performance:
-Performance:
-96.50% on dev portion according to Diab split
-(80.59% on unknown words)
-
-arabic-fast.tagger
-4x speed improvement over "accurate".
-Performance:
-96.34% on dev portion according to Diab split
-(80.28% on unknown words)
-
-
-French tagger
----------------------------
-french.tagger
-Trained on the French treebank.
-
-German tagger
----------------------------
-german-hgc.tagger
-Trained on the first 80% of the Negra corpus, which uses the STTS tagset.
-The Stuttgart-Tübingen Tagset (STTS) is a set of 54 tags for annotating
-German text corpora with part-of-speech labels, which was jointly
-developed by the Institut für maschinelle Sprachverarbeitung of the
-University of Stuttgart and the Seminar für Sprachwissenschaft of the
-University of Tübingen. See:
-http://www.ims.uni-stuttgart.de/projekte/CQPDemos/Bundestag/help-tagset.html
-This model uses features from the distributional similarity clusters
-built over the HGC.
-Performance:
-96.90% on the first half of the remaining 20% of the Negra corpus (dev set)
-(90.33% on unknown words)
-
-german-dewac.tagger
-This model uses features from the distributional similarity clusters
-built from the deWac web corpus.
-
-german-fast.tagger
-Lacks distributional similarity features, but is several times faster
-than the other alternatives.
-Performance:
-96.61% overall / 86.72% unknown.

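The README describes the bidirectional English tagger as "slightly more accurate". Expressing the listed percentages as error rates makes that gap easier to judge. A small arithmetic sketch in Python using only the figures quoted in the README; the helper names are illustrative.

```python
# Accuracy figures (overall %, unknown-word %) copied from README-Models.txt,
# all measured on WSJ sections 19-21.
ACCURACY = {
    "bidirectional-distsim-wsj-0-18": (97.28, 90.46),
    "left3words-wsj-0-18": (96.97, 88.85),
    "left3words-distsim-wsj-0-18": (97.01, 89.81),
}


def error_rate(accuracy_pct):
    """Percentage of tokens tagged incorrectly."""
    return 100.0 - accuracy_pct


def relative_error_reduction(baseline_pct, improved_pct):
    """Fraction of the baseline's errors that the improved model avoids."""
    baseline_err = error_rate(baseline_pct)
    return (baseline_err - error_rate(improved_pct)) / baseline_err


bi = ACCURACY["bidirectional-distsim-wsj-0-18"][0]
l3 = ACCURACY["left3words-distsim-wsj-0-18"][0]
print(f"token error: {error_rate(bi):.2f}% vs {error_rate(l3):.2f}%")
print(f"relative error reduction: {relative_error_reduction(l3, bi):.1%}")
```

With these numbers the token error rate is 2.72% for the bidirectional model against 2.99% for left3words-distsim, roughly a 9% relative reduction in errors. The README gives no speed figures for the English models, so the cost side of the trade-off is not quantified. These percentages also describe the WSJ-only models; according to the props files in this diff, the bundled english-*-distsim models were trained with an additional train-extra-english file, so their accuracy may differ.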
data/bin/taggers/english-bidirectional-distsim.tagger
DELETED
Binary file

data/bin/taggers/english-bidirectional-distsim.tagger.props
DELETED
@@ -1,33 +0,0 @@
-## tagger training invoked at Thu Dec 15 01:17:19 PST 2011 with arguments:
-model = english-bidirectional-distsim.tagger
-arch = bidirectional5words,naacl2003unknowns,allwordshapes(-1,1),distsim(/u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters,-1,1),distsimconjunction(/u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters,-1,1)
-trainFile = /u/nlp/data/pos-tagger/english/train-wsj-0-18;/u/nlp/data/pos-tagger/english/train-extra-english
-closedClassTags =
-closedClassTagThreshold = 40
-curWordMinFeatureThresh = 2
-debug = false
-debugPrefix =
-tagSeparator = _
-encoding = UTF-8
-iterations = 100
-lang = english
-learnClosedClassTags = false
-minFeatureThresh = 2
-openClassTags =
-rareWordMinFeatureThresh = 5
-rareWordThresh = 5
-search = owlqn
-sgml = false
-sigmaSquared = 0.5
-regL1 = 0.75
-tagInside =
-tokenize = true
-tokenizerFactory =
-tokenizerOptions =
-verbose = false
-verboseResults = true
-veryCommonWordThresh = 250
-xmlInput =
-outputFile =
-outputFormat = slashTags
-outputFormatOptions =

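The props file above is a flat list of `key = value` lines, and its `arch` value packs several feature extractors into one comma-separated string whose arguments also contain commas. A simplified Python sketch for reading it, assuming only the syntax visible in this file (`#` comment lines, `=` separators, possibly empty values, no line continuations). It is not a full java.util.Properties parser, and `parse_props` and `split_arch` are hypothetical names.

```python
# Simplified reader for the "key = value" props format shown above.
# Assumptions: '#' starts a comment line, '=' separates key and value,
# values may be empty, and there are no line continuations.

def parse_props(text):
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments such as "## tagger training ..."
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()  # empty values become ""
    return props


def split_arch(arch):
    """Split an arch string on commas that sit outside parentheses."""
    parts, current, depth = [], [], 0
    for ch in arch:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


# Excerpt of english-bidirectional-distsim.tagger.props as listed above.
PROPS = """\
## tagger training invoked at Thu Dec 15 01:17:19 PST 2011 with arguments:
model = english-bidirectional-distsim.tagger
arch = bidirectional5words,naacl2003unknowns,allwordshapes(-1,1),distsim(/u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters,-1,1),distsimconjunction(/u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters,-1,1)
closedClassTags =
sigmaSquared = 0.5
"""

props = parse_props(PROPS)
for feature in split_arch(props["arch"]):
    print(feature)
```

Splitting only at parenthesis depth zero keeps an entry such as `distsim(/u/nlp/...,-1,1)` together, so this arch string yields five entries: bidirectional5words, naacl2003unknowns, allwordshapes(-1,1), distsim(...) and distsimconjunction(...).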
data/bin/taggers/english-left3words-distsim.tagger
DELETED
Binary file

data/bin/taggers/english-left3words-distsim.tagger.props
DELETED
@@ -1,33 +0,0 @@
-## tagger training invoked at Thu Dec 15 01:17:21 PST 2011 with arguments:
-model = english-left3words-distsim.tagger
-arch = left3words,naacl2003unknowns,wordshapes(-1,1),distsim(/u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters,-1,1),distsimconjunction(/u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters,-1,1)
-trainFile = /u/nlp/data/pos-tagger/english/train-wsj-0-18;/u/nlp/data/pos-tagger/english/train-extra-english
-closedClassTags =
-closedClassTagThreshold = 40
-curWordMinFeatureThresh = 2
-debug = false
-debugPrefix =
-tagSeparator = _
-encoding = UTF-8
-iterations = 100
-lang = english
-learnClosedClassTags = false
-minFeatureThresh = 2
-openClassTags =
-rareWordMinFeatureThresh = 10
-rareWordThresh = 5
-search = owlqn
-sgml = false
-sigmaSquared = 0.0
-regL1 = 0.75
-tagInside =
-tokenize = true
-tokenizerFactory =
-tokenizerOptions =
-verbose = false
-verboseResults = true
-veryCommonWordThresh = 250
-xmlInput =
-outputFile =
-outputFormat = slashTags
-outputFormatOptions =

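Apart from the timestamp in the comment line, the two props files differ in only four keys: `model`, `arch`, `rareWordMinFeatureThresh` (5 vs. 10) and `sigmaSquared` (0.5 vs. 0.0). A minimal Python sketch of such a key-by-key comparison, run on short excerpts of the two files (the long `arch` lines are left out for brevity); `parse_props` and `diff_props` are hypothetical helpers that assume the simple `key = value` syntax these files use.

```python
# Hypothetical helpers for comparing two "key = value" props files.

def parse_props(text):
    """Minimal key = value reader: skips blank and '#' comment lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def diff_props(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Excerpts copied from the two deleted files (arch lines omitted for brevity).
BIDIRECTIONAL = """\
## tagger training invoked at Thu Dec 15 01:17:19 PST 2011 with arguments:
model = english-bidirectional-distsim.tagger
rareWordMinFeatureThresh = 5
sigmaSquared = 0.5
regL1 = 0.75
search = owlqn
"""

LEFT3WORDS = """\
## tagger training invoked at Thu Dec 15 01:17:21 PST 2011 with arguments:
model = english-left3words-distsim.tagger
rareWordMinFeatureThresh = 10
sigmaSquared = 0.0
regL1 = 0.75
search = owlqn
"""

changes = diff_props(parse_props(BIDIRECTIONAL), parse_props(LEFT3WORDS))
for key, (old, new) in changes.items():
    print(f"{key}: {old} -> {new}")
```

Because comment lines are skipped, the differing timestamps do not show up as a change; pasting the full file contents instead of the excerpts would also report `arch`.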
data/bin/xom.jar
DELETED
Binary file