anystyle 1.3.0 → 1.3.1
- checksums.yaml +4 -4
- data/README.md +74 -12
- data/lib/anystyle/normalizer/volume.rb +1 -1
- data/lib/anystyle/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: '082895ba5f17e070ac5c85afe9849b27097497622ac25f81eacbba205af1fec8'
+  data.tar.gz: b3f288e9cb22ce9a4dc74970cabf866f5349c60d52b4e8b1c6624b6fd6dfbd1a
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 47ea4876f749891b2f305456404e585dcaee27690caa80091836f4b81ee191e548a7a799db001bfdcda58ed9784485416e8abd379b3d712804054cf78a5b4417
+  data.tar.gz: b7ce8f5116f2ca211ab5f42ae345aa112954ae895aaa0668ef93218c66b4696bfeba9da1f513fa28669b3dfb668033cf94c7857f1450a8320e685d5d16f50d37
data/README.md
CHANGED
@@ -20,6 +20,35 @@ Using AnyStyle CLI
 
 See [anystyle-cli](https://github.com/inukshuk/anystyle-cli) for more details.
 
+Using AnyStyle in Ruby
+----------------------
+Install the `anystyle` gem.
+
+    $ [sudo] gem install anystyle
+
+Once installed, you can use the static Parser and Finder instances
+by calling the `AnyStyle.parse` or `AnyStyle.find` methods. For example:
+
+```ruby
+require 'anystyle'
+
+pp AnyStyle.parse 'Derrida, J. (1967). L’écriture et la différence (1 éd.). Paris: Éditions du Seuil.'
+#-> [{
+# :author=>[{:family=>"Derrida", :given=>"J."}],
+# :date=>["1967"],
+# :title=>["L’écriture et la différence"],
+# :edition=>["1"],
+# :location=>["Paris"],
+# :publisher=>["Éditions du Seuil"],
+# :language=>"fr",
+# :scripts=>["Common", "Latin"],
+# :type=>"book"
+#}]
+```
+
+Alternatively, you can create your own `AnyStyle::Parser` or
+`AnyStyle::Finder` with custom options.
+
 
 Web Application and Web Service
 -------------------------------
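The hunk above documents that `AnyStyle.parse` returns an array of hashes keyed by field name. As a rough sketch of consuming that structure, the snippet below hard-codes the hash from the README's sample output (so it runs without the gem installed) and joins a few fields into a short label. `short_label` is a hypothetical helper introduced here for illustration; it is not part of AnyStyle.

```ruby
# Sample result copied from the README output in the hunk above.
# Hard-coded so the snippet runs without the anystyle gem.
REFERENCE = {
  author: [{ family: 'Derrida', given: 'J.' }],
  date: ['1967'],
  title: ['L’écriture et la différence'],
  edition: ['1'],
  location: ['Paris'],
  publisher: ['Éditions du Seuil'],
  language: 'fr',
  scripts: %w[Common Latin],
  type: 'book'
}.freeze

# Hypothetical helper (not an AnyStyle method): builds "Family (Year). Title"
# from the first value of each field, skipping fields that are missing.
def short_label(ref)
  family = ref.fetch(:author, []).map { |a| a[:family] }.compact.join(', ')
  year   = ref.fetch(:date, []).first
  title  = ref.fetch(:title, []).first

  label = family.dup
  label << " (#{year})" if year
  label << '.' unless label.empty?
  [label, title].reject { |s| s.nil? || s.empty? }.join(' ')
end

puts short_label(REFERENCE)
```

Most fields arrive as arrays even when they hold a single value, which is why the helper takes `.first` rather than using the values directly.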
@@ -30,20 +59,53 @@ Please note that the web service is currently based on the legacy
 [0.x branch](https://github.com/inukshuk/anystyle/tree/0.x).
 
 
-Using AnyStyle in Ruby
-----------------------
-
-    $ [sudo] gem install anystyle
-
-
-Reference Parsing
------------------
-
-Document Parsing
-----------------
-
 Training
 --------
+You can train custom Finder and Parser models. To do this, you need
+to prepare your own data sets for training. You can create your own
+data from scratch or build on AnyStyle's default sets. The default
+parser model is based on the
+[core](https://github.com/inukshuk/anystyle/blob/master/res/parser/core.xml)
+data set; the default finder model source data is not publicly
+available in its entirety, but you can find a number of tagged
+documents
+[here](https://github.com/inukshuk/anystyle/blob/master/res/finder).
+
+When you have compiled a data set for training, you will be ready
+to create your own model:
+
+    $ anystyle train training-data.xml custom.mod
+
+This will save your new model as `custom.mod`. To use your model
+instead of AnyStyle's default, use the `-P` or `--parser-model` flag
+and, respectively, `-F` or `--finder-model` to use a custom Finder
+model. For instance, the command below would parse all references
+in `bib.txt` using the custom model we just trained and print the
+result to STDOUT using the JSON output format:
+
+    $ anystyle -P custom.mod -f json parse bib.txt -
+
+When training your own models, it is good practice to check the
+quality using a second data set. For example, using AnyStyle's own
+[gold](https://github.com/inukshuk/anystyle/blob/master/res/parser/gold.xml)
+data set (a large, manually curated data set) we could check our
+custom model like this:
+
+    $ anystyle -P x.mod check ./res/parser/gold.xml
+    Checking gold.xml................. 1 seq 0.06% 3 tok 0.01% 3s
+
+This command will print the sequence and token error rates; in
+the case of AnyStyle the number of sequence errors is the number
+of references which were tagged differently by the parser than they
+were in the input; the number of token errors is the total number of
+words across all the references which were tagged differently. In the
+example above, we got one reference wrong (out of 1700 at the time);
+but even this one reference was mostly tagged correctly, because only
+a total of 3 words were tagged differently.
+
+When working with training data, it is a good idea to use the
+`Wapiti::Dataset` API in Ruby: it supports all the standard set
+operators and makes it very easy to combine or compare data sets.
 
 Dictionary Adapters
 -------------------
data/lib/anystyle/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: anystyle
 version: !ruby/object:Gem::Version
-  version: 1.3.
+  version: 1.3.1
 platform: ruby
 authors:
 - Sylvester Keil
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-09-
+date: 2018-09-21 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bibtex-ruby