anystyle 1.3.0 → 1.3.1
- checksums.yaml +4 -4
- data/README.md +74 -12
- data/lib/anystyle/normalizer/volume.rb +1 -1
- data/lib/anystyle/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: '082895ba5f17e070ac5c85afe9849b27097497622ac25f81eacbba205af1fec8'
+data.tar.gz: b3f288e9cb22ce9a4dc74970cabf866f5349c60d52b4e8b1c6624b6fd6dfbd1a
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 47ea4876f749891b2f305456404e585dcaee27690caa80091836f4b81ee191e548a7a799db001bfdcda58ed9784485416e8abd379b3d712804054cf78a5b4417
+data.tar.gz: b7ce8f5116f2ca211ab5f42ae345aa112954ae895aaa0668ef93218c66b4696bfeba9da1f513fa28669b3dfb668033cf94c7857f1450a8320e685d5d16f50d37
data/README.md
CHANGED
@@ -20,6 +20,35 @@ Using AnyStyle CLI
 
 See [anystyle-cli](https://github.com/inukshuk/anystyle-cli) for more details.
 
+Using AnyStyle in Ruby
+----------------------
+Install the `anystyle` gem.
+
+$ [sudo] gem install anystyle
+
+Once installed, you can use the static Parser and Finder instances
+by calling the `AnyStyle.parse` or `AnyStyle.find` methods. For example:
+
+```ruby
+require 'anystyle'
+
+pp AnyStyle.parse 'Derrida, J. (1967). L’écriture et la différence (1 éd.). Paris: Éditions du Seuil.'
+#-> [{
+# :author=>[{:family=>"Derrida", :given=>"J."}],
+# :date=>["1967"],
+# :title=>["L’écriture et la différence"],
+# :edition=>["1"],
+# :location=>["Paris"],
+# :publisher=>["Éditions du Seuil"],
+# :language=>"fr",
+# :scripts=>["Common", "Latin"],
+# :type=>"book"
+#}]
+```
+
+Alternatively, you can create your own `AnyStyle::Parser` or
+`AnyStyle::Finder` with custom options.
+
 
 Web Application and Web Service
 -------------------------------
@@ -30,20 +59,53 @@ Please note that the web service is currently based on the legacy
 [0.x branch](https://github.com/inukshuk/anystyle/tree/0.x).
 
 
-Using AnyStyle in Ruby
-----------------------
-
-$ [sudo] gem install anystyle
-
-
-Reference Parsing
------------------
-
-Document Parsing
-----------------
-
 Training
 --------
+You can train custom Finder and Parser models. To do this, you need
+to prepare your own data sets for training. You can create your own
+data from scratch or build on AnyStyle's default sets. The default
+parser model is based on the
+[core](https://github.com/inukshuk/anystyle/blob/master/res/parser/core.xml)
+data set; the default finder model source data is not publicly
+available in its entirety, but you can find a number of tagged
+documents
+[here](https://github.com/inukshuk/anystyle/blob/master/res/finder).
+
+When you have compiled a data set for training, you will be ready
+to create your own model:
+
+$ anystyle train training-data.xml custom.mod
+
+This will save your new model as `custom.mod`. To use your model
+instead of AnyStyle's default, use the `-P` or `--parser-model` flag
+and, respectively, `-F` or `--finder-model` to use a custom Finder
+model. For instance, the command below would parse all references
+in `bib.txt` using the custom model we just trained and print the
+result to STDOUT using the JSON output format:
+
+$ anystyle -P custom.mod -f json parse bib.txt -
+
+When training your own models, it is good practice to check the
+quality using a second data set. For example, using AnyStyle's own
+[gold](https://github.com/inukshuk/anystyle/blob/master/res/parser/gold.xml)
+data set (a large, manually curated data set) we could check our
+custom model like this:
+
+$ anystyle -P x.mod check ./res/parser/gold.xml
+Checking gold.xml................. 1 seq 0.06% 3 tok 0.01% 3s
+
+This command will print the sequence and token error rates; in
+the case of AnyStyle a the number of sequence errors is the number
+of references which were tagged differently by the parser than they
+were in the input; the number of token errors is the total number of
+words across all the references which were tagged differently. In the
+example above, we got one reference wrong (out of 1700 at the time);
+but even this one reference was mostly tagged correctly, because only
+a total of 3 words were tagged differently.
+
+When working with training data, it is a good idea to use the
+`Wapiti::Dataset` API in Ruby: it supports all the standard set
+operators and makes it very easy to combine or compare data sets.
 
 Dictionary Adapters
 -------------------
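The new Training text defines a sequence error as a reference whose tagging differs from the input, and a token error as a single word tagged differently. The sketch below only illustrates that counting on plain arrays of labels. It is not AnyStyle's or Wapiti's implementation; the `error_rates` helper and the sample labels are hypothetical.

```ruby
# Hypothetical helper, not part of AnyStyle or Wapiti.
# expected and predicted are arrays of references; each reference is an
# array holding one label per word.
def error_rates(expected, predicted)
  raise ArgumentError, 'reference counts differ' unless expected.size == predicted.size

  seq_errors = 0
  tok_errors = 0
  tok_total = 0

  expected.zip(predicted).each do |gold, guess|
    raise ArgumentError, 'token counts differ' unless gold.size == guess.size

    # Words whose predicted label differs from the expected label.
    wrong = gold.zip(guess).count { |a, b| a != b }
    tok_errors += wrong
    tok_total += gold.size
    # A reference counts as one sequence error if any of its words is wrong.
    seq_errors += 1 if wrong.positive?
  end

  {
    seq_errors: seq_errors,
    seq_rate: expected.empty? ? 0.0 : seq_errors.fdiv(expected.size),
    tok_errors: tok_errors,
    tok_rate: tok_total.zero? ? 0.0 : tok_errors.fdiv(tok_total)
  }
end

# Made-up labels for two references (illustration only).
gold = [
  %i[author author date title title],
  %i[author date title publisher]
]
guess = [
  %i[author author date title title],
  %i[author title title publisher]
]

result = error_rates(gold, guess)
puts format('%d seq %.2f%% %d tok %.2f%%',
            result[:seq_errors], result[:seq_rate] * 100,
            result[:tok_errors], result[:tok_rate] * 100)
```

By the same arithmetic, one wrong reference out of 1700 is about 0.06%, which agrees with the `1 seq 0.06%` figure in the README's sample output.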
data/lib/anystyle/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: anystyle
 version: !ruby/object:Gem::Version
-version: 1.3.
+version: 1.3.1
 platform: ruby
 authors:
 - Sylvester Keil
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-09-
+date: 2018-09-21 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: bibtex-ruby