RubyGems: stanford-core-nlp 0.1.2 → 0.1.3

Files changed (32)

data/bin/INFO +1 -0
data/lib/stanford-core-nlp.rb +1 -1
data/lib/stanford-core-nlp/stanford_annotations.rb +4 -3
metadata +6 -32
data/bin/bridge.jar +0 -0
data/bin/classifiers/all.3class.distsim.crf.ser.gz +0 -0
data/bin/classifiers/all.3class.distsim.prop +0 -52
data/bin/classifiers/conll.4class.distsim.crf.ser.gz +0 -0
data/bin/classifiers/conll.4class.distsim.prop +0 -58
data/bin/classifiers/muc.7class.distsim.crf.ser.gz +0 -0
data/bin/classifiers/muc.7class.distsim.prop +0 -50
data/bin/dcoref/animate.unigrams.txt +0 -35302
data/bin/dcoref/demonyms.txt +0 -250
data/bin/dcoref/female.unigrams.txt +0 -5467
data/bin/dcoref/inanimate.unigrams.txt +0 -80533
data/bin/dcoref/male.unigrams.txt +0 -42445
data/bin/dcoref/namegender.combine.txt +0 -14607
data/bin/dcoref/neutral.unigrams.txt +0 -30896
data/bin/dcoref/plural.unigrams.txt +0 -9618
data/bin/dcoref/singular.unigrams.txt +0 -69190
data/bin/dcoref/state-abbreviations.txt +0 -50
data/bin/dcoref/unknown.txt +0 -0
data/bin/grammar/englishFactored.ser.gz +0 -0
data/bin/grammar/englishPCFG.ser.gz +0 -0
data/bin/joda-time.jar +0 -0
data/bin/stanford-corenlp.jar +0 -0
data/bin/taggers/README-Models.txt +0 -102
data/bin/taggers/english-bidirectional-distsim.tagger +0 -0
data/bin/taggers/english-bidirectional-distsim.tagger.props +0 -33
data/bin/taggers/english-left3words-distsim.tagger +0 -0
data/bin/taggers/english-left3words-distsim.tagger.props +0 -33
data/bin/xom.jar +0 -0

data/bin/dcoref/state-abbreviations.txt DELETED

@@ -1,50 +0,0 @@
-Alabama	Ala.	AL
-Alaska	Alaska	AK
-Arizona	Ariz.	AZ
-Arkansas	Ark.	AR
-California	Calif.	CA
-Colorado	Colo.	CO
-Connecticut	Conn.	CT
-Delaware	Del.	DE
-Florida	Fla.	FL
-Georgia	Ga.	GA
-Hawaii	Hawaii	HI
-Idaho	Idaho	ID
-Illinois	Ill.	IL
-Indiana	Ind.	IN
-Iowa	Iowa	IA
-Kansas	Kans.	KS
-Kentucky	Ky.	KY
-Louisiana	La.	LA
-Maine	Maine	ME
-Maryland	Md.	MD
-Massachusetts	Mass.	MA
-Michigan	Mich.	MI
-Minnesota	Minn.	MN
-Mississippi	Miss.	MS
-Missouri	Mo.	MO
-Montana	Mont.	MT
-Nebraska	Nebr.	NE
-Nevada	Nev.	NV
-New Hampshire	N.H.	NH
-New Jersey	N.J.	NJ
-New Mexico	N.M.	NM
-New York	N.Y.	NY
-North Carolina	N.C.	NC
-North Dakota	N.D.	ND
-Ohio	Ohio	OH
-Oklahoma	Okla.	OK
-Oregon	Ore.	OR
-Pennsylvania	Pa.	PA
-Rhode Island	R.I.	RI
-South Carolina	S.C.	SC
-South Dakota	S.D.	SD
-Tennessee	Tenn.	TN
-Texas	Tex.	TX
-Utah	Utah	UT
-Vermont	Vt.	VT
-Virginia	Va.	VA
-Washington	Wash.	WA
-West Virginia	W.Va.	WV
-Wisconsin	Wis.	WI
-Wyoming	Wyo.	WY
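
The removed file lists one state per line in three columns: full name, traditional abbreviation, and two-letter postal code. As a minimal sketch of how such a table could be read, assuming the columns are tab-separated as they appear in the diff, the following Ruby builds a lookup by postal code. `StateEntry` and `parse_state_abbreviations` are hypothetical names for illustration and are not part of the gem or of Stanford CoreNLP.

```ruby
# Hypothetical helper (not part of stanford-core-nlp): parse text laid out like
# the removed state-abbreviations.txt, i.e. name, abbreviation, postal code.
# Assumption: the three columns are separated by tab characters.
StateEntry = Struct.new(:name, :abbreviation, :postal_code)

def parse_state_abbreviations(text)
  text.each_line.map(&:chomp).reject(&:empty?).map do |line|
    name, abbreviation, postal_code = line.split("\t", 3)
    StateEntry.new(name, abbreviation, postal_code)
  end
end

# Two rows copied from the diff above, without the leading "-" diff marker.
sample = "Alabama\tAla.\tAL\nNew Hampshire\tN.H.\tNH\n"

entries = parse_state_abbreviations(sample)

# Index the entries by postal code for quick lookup.
by_postal_code = entries.each_with_object({}) { |e, h| h[e.postal_code] = e }

puts by_postal_code["NH"].name          # prints "New Hampshire"
puts by_postal_code["AL"].abbreviation  # prints "Ala."
```

Splitting with a limit of 3 keeps multi-word names such as "New Hampshire" intact, since only tabs are treated as separators.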

data/bin/dcoref/unknown.txt DELETED

File without changes

data/bin/grammar/englishFactored.ser.gz DELETED

Binary file

data/bin/grammar/englishPCFG.ser.gz DELETED

Binary file

data/bin/joda-time.jar DELETED

Binary file

data/bin/stanford-corenlp.jar DELETED

Binary file

data/bin/taggers/README-Models.txt DELETED

@@ -1,102 +0,0 @@
-Stanford POS Tagger, v. 3.1.0 - 2011-12-16
-Copyright (c) 2002-2011 The Board of Trustees of
-The Leland Stanford Junior University. All Rights Reserved.
-This document contains (some) information about the models included in
-this release and that may be downloaded for the POS tagger website at
-http://nlp.stanford.edu/software/tagger.shtml .  If you have downloaded
-the full tagger, all of the models mentioned in this document are in the
-downloaded package in the same directory as this readme.  Otherwise,
-included in the download are two
-English taggers, and the other taggers may be downloaded from the
-website.  All taggers are accompanied by the props files used to create
-them; please examine these files for more detailed information about the
-creation of the taggers.
-For English, the bidirectional taggers are slightly more accurate, but
-tag much more slowly; choose the appropriate tagger based on your
-speed/performance needs.
-English taggers
----------------------------
-bidirectional-distsim-wsj-0-18.tagger
-Trained on WSJ sections 0-18 using a bidirectional architecture and
-including word shape and distributional similarity features.
-Penn Treebank tagset.
-Performance:
-97.28% correct on WSJ 19-21
-(90.46% correct on unknown words)
-left3words-wsj-0-18.tagger
-Trained on WSJ sections 0-18 using the left3words architecture and
-includes word shape features.  Penn tagset.
-Performance:
-96.97% correct on WSJ 19-21
-(88.85% correct on unknown words)
-left3words-distsim-wsj-0-18.tagger
-Trained on WSJ sections 0-18 using the left3words architecture and
-includes word shape and distributional similarity features. Penn tagset.
-Performance:
-97.01% correct on WSJ 19-21
-(89.81% correct on unknown words)
-Chinese tagger
----------------------------
-chinese.tagger
-Trained on a combination of Chinese Treebank texts from Chinese and Hong
-Kong sources.
-LDC Chinese Treebank POS tag set.
-Performance:
-94.13% on a combination of Chinese and Hong Kong texts
-(78.92% on unknown words)
-Arabic tagger
----------------------------
-arabic-accurate.tagger
-Trained on the *entire* ATB p1-3.
-When trained on the train part of the ATB p1-3 split done for the 2005
-JHU Summer Workshop (Diab split), using (augmented) Bies tags, it gets
-the following performance:
-Performance:
-96.50% on dev portion according to Diab split
-(80.59% on unknown words)
-arabic-fast.tagger
-4x speed improvement over "accurate".
-Performance:
-96.34% on dev portion according to Diab split
-(80.28% on unknown words)
-French tagger
----------------------------
-french.tagger
-Trained on the French treebank.
-German tagger
----------------------------
-german-hgc.tagger
-Trained on the first 80% of the Negra corpus, which uses the STTS tagset.
-The Stuttgart-Tübingen Tagset (STTS) is a set of 54 tags for annotating
-German text corpora with part-of-speech labels, which was jointly
-developed by the Institut für maschinelle Sprachverarbeitung of the
-University of Stuttgart and the Seminar für Sprachwissenschaft of the
-University of Tübingen. See:
-http://www.ims.uni-stuttgart.de/projekte/CQPDemos/Bundestag/help-tagset.html
-This model uses features from the distributional similarity clusters
-built over the HGC.
-Performance:
-96.90% on the first half of the remaining 20% of the Negra corpus (dev set)
-(90.33% on unknown words)
-german-dewac.tagger
-This model uses features from the distributional similarity clusters
-built from the deWac web corpus.
-german-fast.tagger
-Lacks distributional similarity features, but is several times faster
-than the other alternatives.
-Performance:
-96.61% overall / 86.72% unknown.

data/bin/taggers/english-bidirectional-distsim.tagger DELETED

Binary file

data/bin/taggers/english-bidirectional-distsim.tagger.props DELETED

@@ -1,33 +0,0 @@
-## tagger training invoked at Thu Dec 15 01:17:19 PST 2011 with arguments:
-                   model = english-bidirectional-distsim.tagger
-                    arch = bidirectional5words,naacl2003unknowns,allwordshapes(-1,1),distsim(/u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters,-1,1),distsimconjunction(/u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters,-1,1)
-               trainFile = /u/nlp/data/pos-tagger/english/train-wsj-0-18;/u/nlp/data/pos-tagger/english/train-extra-english
-         closedClassTags =
- closedClassTagThreshold = 40
- curWordMinFeatureThresh = 2
-                   debug = false
-             debugPrefix =
-            tagSeparator = _
-                encoding = UTF-8
-              iterations = 100
-                    lang = english
-    learnClosedClassTags = false
-        minFeatureThresh = 2
-           openClassTags =
-rareWordMinFeatureThresh = 5
-          rareWordThresh = 5
-                  search = owlqn
-                    sgml = false
-            sigmaSquared = 0.5
-                   regL1 = 0.75
-               tagInside =
-                tokenize = true
-        tokenizerFactory =
-        tokenizerOptions =
-                 verbose = false
-          verboseResults = true
-    veryCommonWordThresh = 250
-                xmlInput =
-              outputFile =
-            outputFormat = slashTags
-     outputFormatOptions =

data/bin/taggers/english-left3words-distsim.tagger DELETED

Binary file

data/bin/taggers/english-left3words-distsim.tagger.props DELETED

@@ -1,33 +0,0 @@
-## tagger training invoked at Thu Dec 15 01:17:21 PST 2011 with arguments:
-                   model = english-left3words-distsim.tagger
-                    arch = left3words,naacl2003unknowns,wordshapes(-1,1),distsim(/u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters,-1,1),distsimconjunction(/u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters,-1,1)
-               trainFile = /u/nlp/data/pos-tagger/english/train-wsj-0-18;/u/nlp/data/pos-tagger/english/train-extra-english
-         closedClassTags =
- closedClassTagThreshold = 40
- curWordMinFeatureThresh = 2
-                   debug = false
-             debugPrefix =
-            tagSeparator = _
-                encoding = UTF-8
-              iterations = 100
-                    lang = english
-    learnClosedClassTags = false
-        minFeatureThresh = 2
-           openClassTags =
-rareWordMinFeatureThresh = 10
-          rareWordThresh = 5
-                  search = owlqn
-                    sgml = false
-            sigmaSquared = 0.0
-                   regL1 = 0.75
-               tagInside =
-                tokenize = true
-        tokenizerFactory =
-        tokenizerOptions =
-                 verbose = false
-          verboseResults = true
-    veryCommonWordThresh = 250
-                xmlInput =
-              outputFile =
-            outputFormat = slashTags
-     outputFormatOptions =
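
The two removed .props files share the same `key = value` layout and differ in only a few settings. As a rough sketch of how that could be checked, the Ruby below parses props-style text into a hash and reports the keys whose values differ. `parse_props` and `changed_settings` are hypothetical helpers written for this illustration, not functions of the gem or of the Stanford tagger, and the sample text is a small subset of the values shown in the hunks above with the leading "-" diff marker stripped.

```ruby
# Hypothetical helper: read "key = value" lines into a Hash of strings.
# Lines starting with "#" are treated as comments; empty values become "".
def parse_props(text)
  text.each_line.with_object({}) do |line, props|
    line = line.strip
    next if line.empty? || line.start_with?("#")
    key, value = line.split("=", 2)
    props[key.strip] = value.to_s.strip
  end
end

# Hypothetical helper: map each differing key to its [first, second] values.
def changed_settings(a, b)
  (a.keys | b.keys).select { |k| a[k] != b[k] }.map { |k| [k, [a[k], b[k]]] }.to_h
end

# Subset of english-bidirectional-distsim.tagger.props from the diff above.
bidirectional = parse_props(<<~PROPS)
  ## tagger training invoked with arguments:
  closedClassTags =
  rareWordMinFeatureThresh = 5
  sigmaSquared = 0.5
  regL1 = 0.75
PROPS

# Subset of english-left3words-distsim.tagger.props from the diff above.
left3words = parse_props(<<~PROPS)
  ## tagger training invoked with arguments:
  closedClassTags =
  rareWordMinFeatureThresh = 10
  sigmaSquared = 0.0
  regL1 = 0.75
PROPS

changed_settings(bidirectional, left3words).each do |key, (first, second)|
  puts "#{key}: #{first} -> #{second}"
end
# prints:
#   rareWordMinFeatureThresh: 5 -> 10
#   sigmaSquared: 0.5 -> 0.0
```

Values are kept as strings here because the files mix numbers, booleans, paths, and empty settings; any type conversion would be a further assumption about how the tagger interprets them.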

data/bin/xom.jar DELETED

Binary file