stanford-core-nlp 0.1.2 → 0.1.3

@@ -1,50 +0,0 @@
- Alabama Ala. AL
- Alaska Alaska AK
- Arizona Ariz. AZ
- Arkansas Ark. AR
- California Calif. CA
- Colorado Colo. CO
- Connecticut Conn. CT
- Delaware Del. DE
- Florida Fla. FL
- Georgia Ga. GA
- Hawaii Hawaii HI
- Idaho Idaho ID
- Illinois Ill. IL
- Indiana Ind. IN
- Iowa Iowa IA
- Kansas Kans. KS
- Kentucky Ky. KY
- Louisiana La. LA
- Maine Maine ME
- Maryland Md. MD
- Massachusetts Mass. MA
- Michigan Mich. MI
- Minnesota Minn. MN
- Mississippi Miss. MS
- Missouri Mo. MO
- Montana Mont. MT
- Nebraska Nebr. NE
- Nevada Nev. NV
- New Hampshire N.H. NH
- New Jersey N.J. NJ
- New Mexico N.M. NM
- New York N.Y. NY
- North Carolina N.C. NC
- North Dakota N.D. ND
- Ohio Ohio OH
- Oklahoma Okla. OK
- Oregon Ore. OR
- Pennsylvania Pa. PA
- Rhode Island R.I. RI
- South Carolina S.C. SC
- South Dakota S.D. SD
- Tennessee Tenn. TN
- Texas Tex. TX
- Utah Utah UT
- Vermont Vt. VT
- Virginia Va. VA
- Washington Wash. WA
- West Virginia W.Va. WV
- Wisconsin Wis. WI
- Wyoming Wyo. WY
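Each row of the deleted table packs three whitespace-separated fields: the state's full name, a traditional abbreviation, and the two-letter USPS postal code. Because names such as "New Hampshire" contain spaces, splitting from the right is the simplest way to recover the fields. The sketch below assumes every abbreviation is a single token with no internal space, which holds for all 50 rows shown; `parse_state_row` and `build_lookup` are hypothetical helpers, not part of the gem.

```python
from typing import Dict, Iterable, Tuple


def parse_state_row(row: str) -> Tuple[str, str, str]:
    """Split a row such as 'West Virginia W.Va. WV' into (name, abbreviation, code).

    Assumes the abbreviation and the postal code are each a single
    whitespace-free token, which holds for every row in the deleted file.
    """
    tokens = row.split()
    if len(tokens) < 3:
        raise ValueError(f"expected at least three fields, got {row!r}")
    # The last two tokens are single-token fields; everything before them is the name.
    name = " ".join(tokens[:-2])
    return name, tokens[-2], tokens[-1]


def build_lookup(rows: Iterable[str]) -> Dict[str, str]:
    """Map both full names and abbreviations to postal codes (hypothetical helper)."""
    lookup: Dict[str, str] = {}
    for row in rows:
        if not row.strip():
            continue  # tolerate blank lines
        name, abbrev, code = parse_state_row(row)
        lookup[name] = code
        lookup[abbrev] = code
    return lookup


if __name__ == "__main__":
    sample = [
        "Alaska Alaska AK",
        "New Hampshire N.H. NH",
        "West Virginia W.Va. WV",
    ]
    table = build_lookup(sample)
    print(table["N.H."], table["West Virginia"], table["Alaska"])  # NH WV AK
```

For seven states (Alaska, Hawaii, Idaho, Iowa, Maine, Ohio, Utah) the name and the abbreviation are identical, so the second assignment in `build_lookup` simply rewrites the same key with the same code.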
File without changes
Binary file
Binary file
data/bin/joda-time.jar DELETED
Binary file
Binary file
@@ -1,102 +0,0 @@
- Stanford POS Tagger, v. 3.1.0 - 2011-12-16
- Copyright (c) 2002-2011 The Board of Trustees of
- The Leland Stanford Junior University. All Rights Reserved.
-
- This document contains (some) information about the models included in
- this release and that may be downloaded from the POS tagger website at
- http://nlp.stanford.edu/software/tagger.shtml . If you have downloaded
- the full tagger, all of the models mentioned in this document are in the
- downloaded package in the same directory as this readme. Otherwise,
- included in the download are two
- English taggers, and the other taggers may be downloaded from the
- website. All taggers are accompanied by the props files used to create
- them; please examine these files for more detailed information about the
- creation of the taggers.
-
- For English, the bidirectional taggers are slightly more accurate, but
- tag much more slowly; choose the appropriate tagger based on your
- speed/performance needs.
-
- English taggers
- ---------------------------
- bidirectional-distsim-wsj-0-18.tagger
- Trained on WSJ sections 0-18 using a bidirectional architecture and
- including word shape and distributional similarity features.
- Penn Treebank tagset.
- Performance:
- 97.28% correct on WSJ 19-21
- (90.46% correct on unknown words)
-
- left3words-wsj-0-18.tagger
- Trained on WSJ sections 0-18 using the left3words architecture and
- includes word shape features. Penn tagset.
- Performance:
- 96.97% correct on WSJ 19-21
- (88.85% correct on unknown words)
-
- left3words-distsim-wsj-0-18.tagger
- Trained on WSJ sections 0-18 using the left3words architecture and
- includes word shape and distributional similarity features. Penn tagset.
- Performance:
- 97.01% correct on WSJ 19-21
- (89.81% correct on unknown words)
-
-
- Chinese tagger
- ---------------------------
- chinese.tagger
- Trained on a combination of Chinese Treebank texts from Chinese and Hong
- Kong sources.
- LDC Chinese Treebank POS tag set.
- Performance:
- 94.13% on a combination of Chinese and Hong Kong texts
- (78.92% on unknown words)
-
- Arabic tagger
- ---------------------------
- arabic-accurate.tagger
- Trained on the *entire* ATB p1-3.
- When trained on the train part of the ATB p1-3 split done for the 2005
- JHU Summer Workshop (Diab split), using (augmented) Bies tags, it gets
- the following performance:
- Performance:
- 96.50% on dev portion according to Diab split
- (80.59% on unknown words)
-
- arabic-fast.tagger
- 4x speed improvement over "accurate".
- Performance:
- 96.34% on dev portion according to Diab split
- (80.28% on unknown words)
-
-
- French tagger
- ---------------------------
- french.tagger
- Trained on the French treebank.
-
- German tagger
- ---------------------------
- german-hgc.tagger
- Trained on the first 80% of the Negra corpus, which uses the STTS tagset.
- The Stuttgart-Tübingen Tagset (STTS) is a set of 54 tags for annotating
- German text corpora with part-of-speech labels, which was jointly
- developed by the Institut für maschinelle Sprachverarbeitung of the
- University of Stuttgart and the Seminar für Sprachwissenschaft of the
- University of Tübingen. See:
- http://www.ims.uni-stuttgart.de/projekte/CQPDemos/Bundestag/help-tagset.html
- This model uses features from the distributional similarity clusters
- built over the HGC.
- Performance:
- 96.90% on the first half of the remaining 20% of the Negra corpus (dev set)
- (90.33% on unknown words)
-
- german-dewac.tagger
- This model uses features from the distributional similarity clusters
- built from the deWac web corpus.
-
- german-fast.tagger
- Lacks distributional similarity features, but is several times faster
- than the other alternatives.
- Performance:
- 96.61% overall / 86.72% unknown.
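The accuracy figures in the deleted README are easier to compare as error rates. This small sketch uses only the English WSJ 19-21 numbers quoted above (copied verbatim, not re-measured) to compute each model's error rate and the relative error reduction of the bidirectional model over each left3words variant; the dictionary and function names are illustrative assumptions.

```python
from typing import Dict, Tuple

# (overall accuracy %, unknown-word accuracy %) on WSJ 19-21, copied from the README above.
ENGLISH_TAGGERS: Dict[str, Tuple[float, float]] = {
    "bidirectional-distsim-wsj-0-18.tagger": (97.28, 90.46),
    "left3words-wsj-0-18.tagger": (96.97, 88.85),
    "left3words-distsim-wsj-0-18.tagger": (97.01, 89.81),
}


def error_rate(accuracy_pct: float) -> float:
    """Convert a percentage accuracy into a percentage error rate."""
    return 100.0 - accuracy_pct


def relative_error_reduction(better_acc: float, worse_acc: float) -> float:
    """Fraction of the worse model's errors that the better model avoids."""
    worse_err = error_rate(worse_acc)
    return (worse_err - error_rate(better_acc)) / worse_err


if __name__ == "__main__":
    best = "bidirectional-distsim-wsj-0-18.tagger"
    best_overall, _ = ENGLISH_TAGGERS[best]
    for name, (overall, unknown) in ENGLISH_TAGGERS.items():
        print(f"{name}: {error_rate(overall):.2f}% errors overall, "
              f"{error_rate(unknown):.2f}% on unknown words")
        if name != best:
            rer = relative_error_reduction(best_overall, overall)
            print(f"  bidirectional avoids {rer:.1%} of these errors")
```

With these figures the 0.27-point accuracy gap over left3words-distsim corresponds to the bidirectional model avoiding about 9.0% of that tagger's errors (2.72% vs 2.99% error), and about 10.2% relative to plain left3words; this is the accuracy side of the speed trade-off the README describes.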
@@ -1,33 +0,0 @@
- ## tagger training invoked at Thu Dec 15 01:17:19 PST 2011 with arguments:
- model = english-bidirectional-distsim.tagger
- arch = bidirectional5words,naacl2003unknowns,allwordshapes(-1,1),distsim(/u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters,-1,1),distsimconjunction(/u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters,-1,1)
- trainFile = /u/nlp/data/pos-tagger/english/train-wsj-0-18;/u/nlp/data/pos-tagger/english/train-extra-english
- closedClassTags =
- closedClassTagThreshold = 40
- curWordMinFeatureThresh = 2
- debug = false
- debugPrefix =
- tagSeparator = _
- encoding = UTF-8
- iterations = 100
- lang = english
- learnClosedClassTags = false
- minFeatureThresh = 2
- openClassTags =
- rareWordMinFeatureThresh = 5
- rareWordThresh = 5
- search = owlqn
- sgml = false
- sigmaSquared = 0.5
- regL1 = 0.75
- tagInside =
- tokenize = true
- tokenizerFactory =
- tokenizerOptions =
- verbose = false
- verboseResults = true
- veryCommonWordThresh = 250
- xmlInput =
- outputFile =
- outputFormat = slashTags
- outputFormatOptions =
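The props file above is a flat `key = value` listing under a `##` comment header, and its `arch` value packs several feature extractors into one comma-separated string whose arguments are themselves comma-separated inside parentheses. A minimal sketch of reading it, assuming the simple one-pair-per-line layout shown here (it ignores Java properties features such as line continuations and escapes), splits `arch` only on commas at parenthesis depth zero. `parse_props` and `split_arch` are hypothetical helpers, not how the tagger itself loads these files.

```python
from typing import Dict, List

# A subset of the deleted props file, copied verbatim.
PROPS = """\
## tagger training invoked at Thu Dec 15 01:17:19 PST 2011 with arguments:
model = english-bidirectional-distsim.tagger
arch = bidirectional5words,naacl2003unknowns,allwordshapes(-1,1),distsim(/u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters,-1,1),distsimconjunction(/u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters,-1,1)
closedClassTags =
iterations = 100
"""


def parse_props(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"no '=' in line: {raw!r}")
        # Empty values such as closedClassTags are kept as "".
        props[key.strip()] = value.strip()
    return props


def split_arch(arch: str) -> List[str]:
    """Split an arch string on commas that are not inside parentheses."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in arch:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))  # top-level comma ends one extractor
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


if __name__ == "__main__":
    props = parse_props(PROPS)
    for extractor in split_arch(props["arch"]):
        print(extractor)
```

A naive `arch.split(",")` would break `distsim(...,-1,1)` into three fragments; tracking parenthesis depth keeps each extractor and its arguments together.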
@@ -1,33 +0,0 @@
- ## tagger training invoked at Thu Dec 15 01:17:21 PST 2011 with arguments:
- model = english-left3words-distsim.tagger
- arch = left3words,naacl2003unknowns,wordshapes(-1,1),distsim(/u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters,-1,1),distsimconjunction(/u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters,-1,1)
- trainFile = /u/nlp/data/pos-tagger/english/train-wsj-0-18;/u/nlp/data/pos-tagger/english/train-extra-english
- closedClassTags =
- closedClassTagThreshold = 40
- curWordMinFeatureThresh = 2
- debug = false
- debugPrefix =
- tagSeparator = _
- encoding = UTF-8
- iterations = 100
- lang = english
- learnClosedClassTags = false
- minFeatureThresh = 2
- openClassTags =
- rareWordMinFeatureThresh = 10
- rareWordThresh = 5
- search = owlqn
- sgml = false
- sigmaSquared = 0.0
- regL1 = 0.75
- tagInside =
- tokenize = true
- tokenizerFactory =
- tokenizerOptions =
- verbose = false
- verboseResults = true
- veryCommonWordThresh = 250
- xmlInput =
- outputFile =
- outputFormat = slashTags
- outputFormatOptions =
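The two deleted props files differ in only a few settings: apart from `model` and `arch`, the left3words-distsim model was trained with `rareWordMinFeatureThresh = 10` instead of 5 and `sigmaSquared = 0.0` instead of 0.5. A sketch that parses both (under the same simple `key = value` assumption, with a hypothetical `parse_props` helper) and reports differing keys makes the contrast explicit. The excerpts below copy a subset of lines from each file rather than the full listings.

```python
from typing import Dict, Optional, Tuple

# Subsets of the two deleted props files, copied verbatim.
BIDIRECTIONAL = """\
## tagger training invoked at Thu Dec 15 01:17:19 PST 2011 with arguments:
model = english-bidirectional-distsim.tagger
arch = bidirectional5words,naacl2003unknowns,allwordshapes(-1,1),distsim(/u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters,-1,1),distsimconjunction(/u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters,-1,1)
iterations = 100
rareWordMinFeatureThresh = 5
rareWordThresh = 5
search = owlqn
sigmaSquared = 0.5
regL1 = 0.75
"""

LEFT3WORDS = """\
## tagger training invoked at Thu Dec 15 01:17:21 PST 2011 with arguments:
model = english-left3words-distsim.tagger
arch = left3words,naacl2003unknowns,wordshapes(-1,1),distsim(/u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters,-1,1),distsimconjunction(/u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters,-1,1)
iterations = 100
rareWordMinFeatureThresh = 10
rareWordThresh = 5
search = owlqn
sigmaSquared = 0.0
regL1 = 0.75
"""


def parse_props(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines, ignoring blank lines and '#' comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def diff_props(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ.

    A key missing from one side is reported with None on that side.
    """
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    changes = diff_props(parse_props(BIDIRECTIONAL), parse_props(LEFT3WORDS))
    for key, (left, right) in changes.items():
        print(f"{key}: {left!r} -> {right!r}")
```

Because the comment headers are skipped, the differing training timestamps do not show up as changes; only configuration keys are compared.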
data/bin/xom.jar DELETED
Binary file