relaton-bipm 1.14.1 → 1.14.3
- checksums.yaml +4 -4
- data/.gitignore +1 -0
- data/Gemfile +6 -0
- data/README.adoc +32 -12
- data/grammars/basicdoc.rng +0 -1
- data/grammars/biblio.rng +12 -2
- data/lib/relaton_bipm/bipm_bibliography.rb +12 -310
- data/lib/relaton_bipm/bipm_si_brochure_parser.rb +8 -4
- data/lib/relaton_bipm/comment_periond.rb +1 -1
- data/lib/relaton_bipm/data_fetcher.rb +17 -5
- data/lib/relaton_bipm/data_outcomes_parser.rb +68 -29
- data/lib/relaton_bipm/id_parser.rb +134 -0
- data/lib/relaton_bipm/processor.rb +5 -4
- data/lib/relaton_bipm/rawdata_bipm_metrologia/article_parser.rb +311 -0
- data/lib/relaton_bipm/rawdata_bipm_metrologia/fetcher.rb +176 -0
- data/lib/relaton_bipm/version.rb +1 -1
- data/lib/relaton_bipm.rb +5 -1
- data/relaton_bipm.gemspec +2 -6
- metadata +26 -80
- data/lib/relaton_bipm/index.rb +0 -68
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 34d720b316dbd942e2c5d630d2ae0f07b74331e4ef07f68715e84304bad0fb13
+  data.tar.gz: 38d36e34b998db6e4fa9e9f1a6e5306fadacea45b9a878cd14629eaaca2ef50d
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a22261617d5c3de8aad7ed410091331698630f958332c5feb0b215b9fafe9167015530e9e6ecb71046307a6775aa7a2a0a71dd4c12e4f982cb0bd259e021a267
+  data.tar.gz: 376bb090dd4d273b8039357d78280c9bc4f1555918920a55b22ec72cb8be87c4ce0a8469254ffc000256fa95069a271b65b978dc767d735acd4b3e141c5dea24
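`checksums.yaml` records SHA256 and SHA512 digests of the two members of the packaged gem, `metadata.gz` and `data.tar.gz`. A minimal sketch of recomputing digests in the same shape with Ruby's standard `digest` library, assuming both members have already been extracted from the `.gem` file (a plain tar archive) into local files; the `gem_digests` helper name is hypothetical.

```ruby
require "digest"

# Hypothetical helper: compute the digests that checksums.yaml lists.
# Returns { "SHA256" => { basename => hex }, "SHA512" => { basename => hex } }.
def gem_digests(paths)
  { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }.transform_values do |algo|
    paths.to_h { |path| [File.basename(path), algo.file(path).hexdigest] }
  end
end
```

Comparing the result with `YAML.load_file("checksums.yaml")` from the same gem should show identical hex strings when the archive is intact.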
data/.gitignore
CHANGED
data/Gemfile
CHANGED
data/README.adoc
CHANGED
@@ -70,22 +70,35 @@ Allowed document names are:

 ==== Reference structure for Metrologia documents

-`BIPM Metrologia {JOURNAL} {VOLUME} {ISSUE}
+`BIPM Metrologia {JOURNAL} {VOLUME} {ISSUE}`

-- `{JOURNAL}` - number
-- `{VOLUME}` - number
-- `{ISSUE}` - number
-- `{PAGE}` - number of page, optional
+- `{JOURNAL}` - journal number, required
+- `{VOLUME}` - volume number, optional
+- `{ISSUE}` - issue number, optional

 ==== Reference structures for CCTF (CCDS), CGPM, CIPM documents

 ===== Basic pattern

 ----
-Long:
-
+Long:
+{group name} -- {type} {number} ({year})
+{group name} {type} {number} ({year})
+{group name} {type} {year}-{zero_leading_number}
+
+Short:
+{group name} -- {type-abbrev} {number} ({year}, {lang})
+{group name} {type-abbrev} {number} ({year}, {lang})
 ----

+`group name` - a name of the group, required. A full list of group names is available https://github.com/metanorma/bipm-editor-guides/blob/main/sources/bipm-outcomes-en.adoc#appendix-a-bipm-groups-and-codes[here].
+`type` - a type of document, required. A list of types is: Resolution (Résolution), Recommendation (Recommandation), Decision (Décision), Meeting (Réunion), Declaration (Déclaration).
+`type-abbrev` - an abbreviation of the type, required. A list of abbreviations: RES (Resolution), REC (Recommendation), DECN (Decision).
+`number` - a number of the document, optional. Can be with part, e.g. `1-2`.
+`zero_leading_number` - a number of the document with a leading zero, required. Can be used when a document has a 1 or 2 digits number. It's `00` for documents without a number.
+`year` - a year of the document, optional.
+`lang` - a language of the document, optional. Can be `EN` or `FR`.
+
 ===== Special case pattern

 The basic pattern works fine for all, except for these 2 cases:
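The basic pattern for CCTF (CCDS), CGPM and CIPM references is regular enough to match with a few regular expressions. The sketch below is not the gem's parser (that appears to be the new `id_parser.rb`, whose contents are not shown here); `parse_outcome_ref`, the constants and the regexes are hypothetical, and the special case patterns are ignored. The `{year}-{zero_leading_number}` form is tried first because `2019-05` would otherwise also be accepted as a number with a part.

```ruby
# Hypothetical sketch of the README's "basic pattern"; not the gem's own parser.
TYPES = %w[Resolution Résolution Recommendation Recommandation Decision Décision
           Meeting Réunion Declaration Déclaration].freeze
TYPE_RE = TYPES.join("|")
NUM_RE = '\d+(?:-\d+)?' # a number, optionally with a part, e.g. "1-2"

PATTERNS = {
  # {group name} {type} {year}-{zero_leading_number}
  year_number: /\A(?<group>.+?)\s+(?<type>#{TYPE_RE})\s+(?<year>\d{4})-(?<number>\d{2,})\z/,
  # {group name} [--] {type} {number} ({year}); number and year optional
  long: /\A(?<group>.+?)\s+(?:--\s+)?(?<type>#{TYPE_RE})(?:\s+(?<number>#{NUM_RE}))?(?:\s+\((?<year>\d{4})\))?\z/,
  # {group name} [--] {type-abbrev} {number} ({year}, {lang})
  short: /\A(?<group>.+?)\s+(?:--\s+)?(?<type>RES|REC|DECN)(?:\s+(?<number>#{NUM_RE}))?(?:\s+\((?<year>\d{4})(?:,\s*(?<lang>EN|FR))?\))?\z/
}.freeze

# Returns a hash of the named parts plus :form, or nil when nothing matches.
def parse_outcome_ref(ref)
  PATTERNS.each do |form, re|
    m = re.match(ref)
    next unless m

    return m.named_captures.transform_keys(&:to_sym).merge(form: form)
  end
  nil
end
```

For example, `parse_outcome_ref("CIPM REC 1-2 (2019, FR)")` yields the group `CIPM`, type `REC`, number `1-2`, year `2019` and language `FR`.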
@@ -189,9 +202,9 @@ item = RelatonBipm::BipmBibliography.get "BIPM SI Brochure"
 ...

 # get BIPM Metrologia page
-bib = RelatonBipm::BipmBibliography.get "BIPM Metrologia 29 6
-[relaton-bipm] ("BIPM Metrologia 29 6
-[relaton-bipm] ("BIPM Metrologia 29 6
+bib = RelatonBipm::BipmBibliography.get "BIPM Metrologia 29 6 001"
+[relaton-bipm] ("BIPM Metrologia 29 6 001") fetching...
+[relaton-bipm] ("BIPM Metrologia 29 6 001") found Metrologia 29 6 001
 => #<RelatonBipm::BipmBibliographicItem:0x007f8857f94d40
 ...

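In the example, `BIPM Metrologia 29 6 001` carries a volume (`29`), an issue (`6`) and an article number (`001`). The removed `get_metrologia` method (further down in this diff) dispatched on how many tokens followed `Metrologia`: journal, volume, issue or article. A pure-Ruby sketch of that splitting, assuming the same token order; `parse_metrologia_ref` and the returned keys are hypothetical and make no network calls.

```ruby
# Hypothetical helper: split a Metrologia reference into its parts, mirroring
# the token-count dispatch of the removed get_metrologia method.
def parse_metrologia_ref(text)
  tokens = text.sub(/\ABIPM\s+/, "").split
  return nil unless tokens.first == "Metrologia" && tokens.size.between?(1, 4)

  # zip pads missing positions with nil; compact drops them again
  parts = %i[volume issue article].zip(tokens.drop(1)).to_h.compact
  parts.merge(level: %i[journal volume issue article][tokens.size - 1])
end
```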
@@ -295,7 +308,7 @@ bib.link
 #<RelatonBib::TypedUri:0x00007fa6d6a29250 @content=#<Addressable::URI:0xc2b0 URI:https://doi.org/10.1088/0026-1394/29/6/001>, @type="doi">]
 ----

-=== Create bibliographic item from XML
+=== Create a bibliographic item from XML

 [source,ruby]
 ----
@@ -304,7 +317,7 @@ RelatonBipm::XMLParser.from_xml File.read('spec/fixtures/bipm_item.xml')
 ...
 ----

-=== Create bibliographic item from YAML
+=== Create a bibliographic item from YAML
 [source,ruby]
 ----
 hash = YAML.load_file 'spec/fixtures/bipm_item.yml'
@@ -321,6 +334,7 @@ RelatonBipm::BipmBibliographicItem.from_hash hash
 This gem uses the following datasets as data sources:
 - `bipm-data-outcomes` - looking for a local directory with the repository https://github.com/metanorma/bipm-data-outcomes
 - `bipm-si-brochute` - looking for a local directory with the repository https://github.com/metanorma/bipm-si-brochure
+- `rawdata-bipm-metrologia` - looking for a local directory with the repository https://github.com/relaton/rawdata-bipm-metrologia

 The method `RelatonBipm::DataFetcher.fetch(source, output: "data", format: "yaml")` fetches all the documents from the dataset and saves them to the `./data` folder in YAML format.
 Arguments:
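Each dataset is read from a local checkout, so a missing clone is easy to miss. A hedged pre-flight sketch to run before `RelatonBipm::DataFetcher.fetch`: `SOURCES` lists the source names accepted by `DataFetcher#fetch` in this version, while `missing_sources` is a hypothetical helper that assumes each repository is cloned into a directory named after its source (the brochure parser globs `bipm-si-brochure/_site/documents/*.rxl`, which fits that layout).

```ruby
# Source names handled by the case statement in DataFetcher#fetch.
SOURCES = %w[bipm-data-outcomes bipm-si-brochure rawdata-bipm-metrologia].freeze

# Hypothetical helper: return the sources whose local checkout is missing
# under +root+ (assumed layout: one directory per source, named after it).
def missing_sources(sources = SOURCES, root: Dir.pwd)
  sources.reject { |name| Dir.exist?(File.join(root, name)) }
end
```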
@@ -342,6 +356,12 @@ Started at: 2022-06-23 09:37:12 +0200
 Stopped at: 2022-06-23 09:37:12 +0200
 Done in: 0 sec.
 => nil
+
+RelatonBipm::DataFetcher.fetch "rawdata-bipm-metrologia"
+Started at: 2022-06-23 09:39:12 +0200
+Stopped at: 2022-06-23 09:40:34 +0200
+Done in: 82 sec.
+=> nil
 ----

 == Development
data/grammars/basicdoc.rng
CHANGED
data/grammars/biblio.rng
CHANGED
@@ -216,6 +216,9 @@
 <optional>
   <ref name="fullname"/>
 </optional>
+<zeroOrMore>
+  <ref name="credential"/>
+</zeroOrMore>
 <zeroOrMore>
   <ref name="affiliation"/>
 </zeroOrMore>
@@ -232,6 +235,11 @@
     <ref name="FullNameType"/>
   </element>
 </define>
+<define name="credential">
+  <element name="credential">
+    <text/>
+  </element>
+</define>
 <define name="FullNameType">
   <choice>
     <group>
@@ -305,7 +313,9 @@
     <zeroOrMore>
       <ref name="affiliationdescription"/>
     </zeroOrMore>
-    <
+    <optional>
+      <ref name="organization"/>
+    </optional>
   </element>
 </define>
 <define name="affiliationname">
@@ -1316,7 +1326,7 @@
 <value>commentaryOf</value>
 <value>hasCommentary</value>
 <value>related</value>
-<value>
+<value>hasComplement</value>
 <value>complementOf</value>
 <value>obsoletes</value>
 <value>obsoletedBy</value>
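The first grammar hunk places zero or more `credential` elements between the optional `fullname` reference and the `affiliation` list, so order matters. A sketch that checks only this three-part content model with Ruby's REXML; it assumes the `fullname` pattern produces an element called `name` and ignores any other children the full `person` definition may allow outside the hunk. `person_children_ok?` is a hypothetical helper.

```ruby
require "rexml/document"

# Content model from the hunk: fullname?, credential*, affiliation*.
# Assumption: the fullname pattern produces an element named <name>.
PERSON_ORDER = /\A(?:name )?(?:credential )*(?:affiliation )*\z/

# True when the children of the root <person> element follow that order.
def person_children_ok?(xml)
  person = REXML::Document.new(xml).root
  names = person.elements.map { |e| "#{e.name} " }.join
  PERSON_ORDER.match?(names)
end
```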
data/lib/relaton_bipm/bipm_bibliography.rb
CHANGED
@@ -3,14 +3,6 @@ require "mechanize"
 module RelatonBipm
   class BipmBibliography
     GH_ENDPOINT = "https://raw.githubusercontent.com/relaton/relaton-data-bipm/master/".freeze
-    IOP_DOMAIN = "https://iopscience.iop.org".freeze
-    TRANSLATIONS = {
-      "Déclaration" => "Declaration",
-      "Réunion" => "Meeting",
-      "Recommandation" => "Recommendation",
-      "Résolution" => "Resolution",
-      "Décision" => "Decision",
-    }.freeze

     class << self
       # @param text [String]
@@ -18,14 +10,13 @@ module RelatonBipm
       def search(text, _year = nil, _opts = {}) # rubocop:disable Metrics/AbcSize, Metrics/MethodLength
         warn "[relaton-bipm] (\"#{text}\") fetching..."
         ref = text.sub(/^BIPM\s/, "")
-        item =
+        item = get_bipm(ref, magent)
         unless item
           warn "[relaton-bipm] (\"#{text}\") not found."
           return
         end

         warn("[relaton-bipm] (\"#{text}\") found #{item.docidentifier[0].id}")
-        item.fetched = Date.today.to_s
         item
       rescue Mechanize::ResponseCodeError => e
         raise RelatonBib::RequestError, e.message unless e.response_code == "404"
@@ -48,295 +39,28 @@ module RelatonBipm
         a
       end

-      # @param
+      # @param reference [String]
       # @param agent [Mechanize]
       # @return [RelatonBipm::BipmBibliographicItem]
-      def get_bipm(
-
-
-
-
-        # TRANSLATIONS.each { |fr, en| rf.sub! fr, en }
-        path = Index.new.search ref
-        return unless path
+      def get_bipm(reference, agent) # rubocop:disable Metrics/AbcSize, Metrics/MethodLength
+        ref_id = Id.new reference
+        index = Relaton::Index.find_or_create :BIPM, url: "#{GH_ENDPOINT}index2.zip"
+        rows = index.search { |r| ref_id == r[:id] }
+        return unless rows.any?

-        url = "#{GH_ENDPOINT}#{
+        url = "#{GH_ENDPOINT}#{rows.first[:file]}"
         resp = agent.get url
-        check_response resp
         return unless resp.code == "200"

         yaml = RelatonBib.parse_yaml resp.body, [Date]
-
+        yaml["fetched"] = Date.today.to_s
         bib_hash = HashConverter.hash_to_bib yaml
         BipmBibliographicItem.new(**bib_hash)
       end

-      #
-      #
-      #
-      def get_metrologia(ref, agent)
-        agent.redirect_ok = false
-        ref_arr = ref.split
-        case ref_arr.size
-        when 1 then get_journal agent
-        when 2 then get_volume ref_arr[1], agent
-        when 3 then get_issue(*ref_arr[1..2], agent)
-        when 4 then get_article_from_issue(*ref_arr[1..3], agent)
-        end
-      end
-
-      # @param agent [Mechanize]
-      # @return [RelatonBipm::BipmBibliographicItem]
-      def get_journal(agent)
-        url = "#{IOP_DOMAIN}/journal/0026-1394"
-        rsp = agent.get url
-        check_response rsp
-        rel = rsp.xpath('//select[@id="allVolumesSelector"]/option').map do |v|
-          { type: "partOf", bibitem: journal_rel(v) }
-        end
-        did = doc_id []
-        bibitem(formattedref: fref(did.id), docid: [did], link: blink(url), relation: rel)
-      end
-
-      # @param elm [Nokogiri::XML::Element]
-      def journal_rel(elm)
-        vol = elm[:value].split("/").last
-        did = doc_id [vol]
-        url = IOP_DOMAIN + elm[:value]
-        BipmBibliographicItem.new(formattedref: fref(did.id), docid: [did], link: blink(url))
-      end
-
-      # @param vol [String]
-      # @param agent [Mechanize]
-      # @return [RelatonBipm::BipmBibliographicItem]
-      def get_volume(vol, agent)
-        url = "#{IOP_DOMAIN}/volume/0026-1394/#{vol}"
-        rsp = agent.get url
-        check_response rsp
-        rel = rsp.xpath('//li[@itemprop="hasPart"]').map do |i|
-          { type: "partOf", bibitem: volume_rel(i, vol) }
-        end
-        did = doc_id [vol]
-        bibitem(formattedref: fref(did.id), docid: [did], link: blink(url), date: bdate(rsp), relation: rel,
-                extent: btextent(vol), series: series)
-      end
-
-      def volume_rel(elm, vol) # rubocop:disable Metrics/AbcSize
-        a = elm.at 'a[@itemprop="issueNumber"]'
-        ish = a[:href].split("/").last
-        url = IOP_DOMAIN + a[:href]
-        docid = doc_id [vol, ish]
-        t = elm.at "p"
-        title_fref = t ? { title: titles(t.text) } : { formattedref: fref(docid.id) }
-        BipmBibliographicItem.new(**title_fref, docid: [docid], link: blink(url))
-      end
-
-      # @param title [String]
-      # @return [RelatonBib::TypedTitleStringCollection]
-      def titles(title)
-        RelatonBib::TypedTitleString.from_string title, "en", "Latn", "text/html"
-      end
-
-      # @param vol [String]
-      # @param ish [String]
-      # @param agent [Mechanize]
-      # @return [RelatonBipm::BipmBibliographicItem]
-      def get_issue(vol, ish, agent) # rubocop:disable Metrics/AbcSize, Metrics/MethodLength
-        url = issue_url vol, ish
-        rsp = agent.get url
-        check_response rsp
-        rel = rsp.xpath('//div[@class="art-list-item-body"]').map do |a|
-          { type: "partOf", bibitem: issue_rel(a, vol, ish) }
-        end
-        did = doc_id [vol, ish]
-        title_fref = { title: issue_title(rsp) }
-        title_fref[:formattedref] = fref did.id unless title_fref[:title].any?
-        bibitem(**title_fref, link: blink(url), relation: rel, docid: [did],
-                date: bdate(rsp), extent: btextent(vol, ish), series: series)
-      end
-
-      # @param ref [String]
-      # @return [RelatonBib::FormattedRef]
-      def fref(ref)
-        RelatonBib::FormattedRef.new content: ref, language: "en", script: "Latn"
-      end
-
-      # @param rsp [Mechanize::Page]
-      # @return [RelatonBib::TypedTitleStringCollection]
-      def issue_title(rsp)
-        t = rsp.at('//div[@id="wd-jnl-issue-title"]/h4')
-        return RelatonBib::TypedTitleStringCollection.new [] unless t
-
-        titles(t.text)
-      end
-
-      # @oaran vol [String]
-      # @param ish [String]
-      # @return [String]
-      def issue_url(vol, ish)
-        "#{IOP_DOMAIN}/issue/0026-1394/#{vol}/#{ish}"
-      end
-
-      # @param elm [Nokogiri::XML::Element]
-      # @param vol [String]
-      # @param ish [String]
-      # @return [RelatonBipm::BipmBibliographicItem]
-      def issue_rel(elm, vol, ish)
-        art = elm.at('div[@class="indexer"]').text
-        ref = elm.at('div/a[@class="art-list-item-title"]')
-        title = titles ref.text.strip
-        docid = doc_id [vol, ish, art]
-        link = blink IOP_DOMAIN + ref[:href]
-        BipmBibliographicItem.new(title: title, docid: [docid], link: link)
-      end
-
-      # @param content [RelatonBib::TypedTitleString]
-      # @return [RelatonBib::TypedTitleString]
-      def btitle(content)
-        RelatonBib::TypedTitleString.new type: "main", content: content, language: "en", script: "Latn"
-      end
-
-      # @param url [String]
-      # @return [String]
-      def blink(url)
-        [RelatonBib::TypedUri.new(type: "src", content: url)]
-      end
-
-      # @param rsp [Mechanize::Page]
-      # @return [Array<RelatonBib::BibliographicDate>]
-      def bdate(rsp)
-        date = rsp.at('//p[@itemprop="issueNumber"]|//h2[@itemprop="volumeNumber"]').text.split(", ").last
-        on = date.match?(/^\d{4}$/) ? date : Date.parse(date).strftime("%Y-%m")
-        [RelatonBib::BibliographicDate.new(type: "published", on: on)]
-      end
-
-      # @param args [Array<String>]
-      # @return [RelatonBib::DocumentIdentifier]
-      def doc_id(args)
-        id = args.clone.unshift "Metrologia"
-        RelatonBib::DocumentIdentifier.new(type: "BIPM", id: id.join(" "), primary: true)
-      end
-
-      # @param vol [String]
-      # @param ish [String]
-      # @param art [String]
-      # @param agent [Mechanize]
-      # @return [RelatonBipm::BipmBibliographicItem]
-      def get_article_from_issue(vol, ish, art, agent) # rubocop:disable Metrics/MethodLength
-        url = issue_url vol, ish
-        rsp = agent.get url
-        check_response rsp
-        link = rsp.at("//div[@class='indexer'][.='#{art}']/../div/a")
-        unless link
-          arts = rsp.xpath("//div[@class='indexer']").map(&:text)
-          warn "[relaton-bipm] No article is available at the specified start page \"#{art}\" in issue \"BIPM Metrologia #{vol} #{ish}\"."
-          warn "[relaton-bipm] Available articles in the issue start at the following pages: (#{arts.join(', ')})"
-          return
-        end
-
-        get_article link[:href], vol, ish, agent
-      end
-
-      # @param path [String]
-      # @param vol [String]
-      # @param ish [String]
-      # @param agent [Mechanize]
-      # @return [RelatonBipm::BipmBibliographicItem]
-      def get_article(path, vol, ish, agent) # rubocop:disable Metrics/AbcSize, Metrics/MethodLength
-        agent.agent.allowed_error_codes = [403]
-        rsp = agent.get path
-        check_response rsp
-        title = rsp.at("//h1[@itemprop='headline']").children.to_xml
-        url = rsp.uri
-        bib = rsp.link_with(text: "BibTeX").href
-        rsp = agent.get bib
-        check_response rsp
-        bt = BibTeX.parse(rsp.body).first
-        bibitem(
-          docid: btdocid(bt), title: titles(title), date: btdate(bt),
-          abstract: btabstract(bt), doctype: bt.type.to_s, series: series,
-          link: btlink(bt, url), contributor: btcontrib(bt),
-          extent: btextent(vol, ish, bt.pages.to_s)
-        )
-      end
-
-      # @param args [Hash]
-      # @return [RelatonBipm::BipmBibliographicItem]
-      def bibitem(**args)
-        BipmBibliographicItem.new(
-          type: "article", language: ["en"], script: ["Latn"], **args,
-        )
-      end
-
-      # @return [Array<RelatonBib::Series>]
-      def series
-        [RelatonBib::Series.new(title: btitle("Metrologia"))]
-      end
-
-      # @param bibtex [BibTeX::Entry]
-      # @return [Array<RelatonBib::DocumentIdentifier>]
-      def btdocid(bibtex)
-        id = "#{bibtex.journal} #{bibtex.volume} #{bibtex.number} #{bibtex.pages.match(/^\d+/)}"
-        [
-          RelatonBib::DocumentIdentifier.new(type: "BIPM", id: id, primary: true),
-          RelatonBib::DocumentIdentifier.new(type: "DOI", id: bibtex.doi),
-        ]
-      end
-
-      # @param bibtex [BibTeX::Entry]
-      # @return [Array<RelatonBib::FormattedString>]
-      def btabstract(bibtex)
-        [RelatonBib::FormattedString.new(content: bibtex.abstract.to_s, language: "en", script: "Latn")]
-      end
-
-      # @param bibtex [BibTeX::Entry]
-      # @param ref [URI]
-      # @return [Array<RelatonBib::TypedUri>]
-      def btlink(bibtex, ref)
-        [
-          RelatonBib::TypedUri.new(type: "src", content: ref.to_s),
-          RelatonBib::TypedUri.new(type: "doi", content: bibtex.url.to_s),
-        ]
-      end
-
-      # @param bibtex [BibTeX::Entry]
-      # @return [Array<RelatonBib::BibliographicDate>]
-      def btdate(bibtex)
-        on = Date.new(bibtex.year.to_i, bibtex.month_numeric)
-        [RelatonBib::BibliographicDate.new(type: "published", on: on)]
-      end
-
-      # @param bibtex [BibTeX::Entry]
-      # @return [Array<Hash>]
-      def btcontrib(bibtex) # rubocop:disable Metrics/MethodLength, Metrics/AbcSize
-        contribs = []
-        if bibtex.publisher && !bibtex.publisher.empty?
-          org = RelatonBib::Organization.new name: bibtex.publisher.to_s
-          contribs << { entity: org, role: [{ type: "publisher" }] }
-        end
-        return contribs unless bibtex.author && !bibtex.author.empty?
-
-        bibtex.author.split(" and ").inject(contribs) do |mem, name|
-          cname = RelatonBib::LocalizedString.new name, "en", "Latn"
-          name = RelatonBib::FullName.new completename: cname
-          author = RelatonBib::Person.new name: name
-          mem << { entity: author, role: [{ type: "author" }] }
-        end
-      end
-
-      #
-      # @param vol [String] volume
-      # @param ish [String] issue
-      # @param pgs [String] pages
-      #
-      # @return [Array<RelatonBib::BibItemLocality>]
-      #
-      def btextent(vol, ish = nil, pgs = nil)
-        ext = [RelatonBib::Locality.new("volume", vol)]
-        ext << RelatonBib::Locality.new("issue", ish) if ish
-        ext << RelatonBib::Locality.new("page", *pgs.split("--")) if pgs
-        ext
-      end
+      # def match_item(ids, ref_id)
+      #   ids.find { |id| Id.new(id) == ref_id }
+      # end

       # @param ref [String] the BIPM standard Code to look up (e..g "BIPM B-11")
       # @param year [String] not used
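The rewritten `get_bipm` no longer scrapes IOP Science: it searches an index fetched from `#{GH_ENDPOINT}index2.zip`, compares a parsed `Id` with each row's `:id`, and downloads the YAML file named in the first matching row. A self-contained sketch of that lookup flow without network access. `SimpleId` (case- and whitespace-insensitive, ignoring a leading `BIPM`), `ROWS` with its file paths, and `lookup_url` are hypothetical stand-ins for the gem's `Id` class and `Relaton::Index`, whose internals this diff does not show; in the real index the `:id` column holds `Id#normalized_hash` values rather than strings.

```ruby
# Hypothetical stand-in for RelatonBipm::Id: equality on a normalised form.
class SimpleId
  attr_reader :normalized

  def initialize(ref)
    @normalized = ref.sub(/\ABIPM\s+/i, "").split.join(" ").downcase
  end

  # Compare against another SimpleId or a raw identifier string.
  def ==(other)
    other = SimpleId.new(other) if other.is_a?(String)
    other.is_a?(SimpleId) && normalized == other.normalized
  end
end

GH_ENDPOINT = "https://raw.githubusercontent.com/relaton/relaton-data-bipm/master/".freeze

# Hypothetical index rows (ids and file names made up for illustration).
ROWS = [
  { id: "Metrologia 29 6 001", file: "data/metrologia-29-6-001.yaml" },
  { id: "SI Brochure", file: "data/si-brochure.yaml" },
].freeze

# Same shape as get_bipm: match rows against the parsed id, use the first file.
def lookup_url(reference, rows = ROWS)
  ref_id = SimpleId.new(reference)
  match = rows.find { |r| ref_id == r[:id] }
  return unless match

  "#{GH_ENDPOINT}#{match[:file]}"
end
```

Normalising both sides before comparison is what lets slightly different spellings of one reference resolve to a single index row.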
@@ -345,28 +69,6 @@ module RelatonBipm
       def get(ref, year = nil, opts = {})
         search(ref, year, opts)
       end
-
-      private
-
-      #
-      # Check HTTP response. Warn and rise error if response is not 200
-      # or redirect to CAPTCHA.
-      #
-      # @param [Mechanize] rsp response
-      #
-      # @raise [RelatonBib::RequestError] if response is not 200
-      #
-      def check_response(rsp) # rubocop:disable Metrics/AbcSize
-        if rsp.code == "302"
-          warn "[relaton-bipm] This source employs anti-DDoS measures that unfortunately affects automated requests."
-          warn "[relaton-bipm] Please visit this link in your browser to resolve the CAPTCHA, then retry: #{rsp.uri}"
-          # warn "[relaton-bipm] #{rsp.uri} is redirected to #{rsp.header['location']}"
-          raise RelatonBib::RequestError, "cannot access #{rsp.uri}"
-        elsif rsp.code != "200" && rsp.code != "403"
-          warn "[read_bipm] can't acces #{rsp.uri} #{rsp.code}"
-          raise RelatonBib::RequestError, "cannot acces #{rsp.uri} #{rsp.code}"
-        end
-      end
     end
   end
 end
data/lib/relaton_bipm/bipm_si_brochure_parser.rb
CHANGED
@@ -6,7 +6,7 @@ module RelatonBipm
     # @param [RelatonBipm::DataFetcher] data_fetcher data fetcher
     #
     def initialize(data_fetcher)
-      @data_fetcher = data_fetcher
+      @data_fetcher = WeakRef.new data_fetcher
     end

     #
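`BipmSiBrochureParser` now holds the fetcher through a `WeakRef` from Ruby's standard `weakref` library: method calls are delegated to the wrapped object, but the reference alone does not keep that object alive. A minimal sketch of the delegation with a hypothetical `Fetcher` struct standing in for `DataFetcher`; where the gem itself requires `weakref` is not visible in this diff, so the `require` here is an assumption of the sketch.

```ruby
require "weakref"

# Hypothetical stand-in for RelatonBipm::DataFetcher.
Fetcher = Struct.new(:output, :ext)

class Parser
  def initialize(data_fetcher)
    # Same pattern as the diff: keep only a weak reference to the fetcher.
    @data_fetcher = WeakRef.new data_fetcher
  end

  # Calls such as #output and #ext are forwarded to the wrapped object.
  def outfile(name)
    "#{File.join(@data_fetcher.output, name)}.#{@data_fetcher.ext}"
  end

  def fetcher_alive?
    @data_fetcher.weakref_alive?
  end
end
```

The trade-off is that if nothing else references the fetcher and it is garbage-collected, the next delegated call raises `WeakRef::RefError`.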
@@ -27,14 +27,18 @@ module RelatonBipm
       # puts "Ls #{Dir['bipm-si-brochure/*']}"
       # puts "Ls #{Dir['bipm-si-brochure/site/*']}"
       # puts "Ls #{Dir['bipm-si-brochure/site/documents/*']}"
-      Dir["bipm-si-brochure/
+      Dir["bipm-si-brochure/_site/documents/*.rxl"].each do |f|
         puts "Parsing #{f}"
         docstd = Nokogiri::XML File.read f
         doc = docstd.at "/bibdata"
         hash1 = RelatonBipm::XMLParser.from_xml(doc.to_xml).to_hash
         fix_si_brochure_id hash1
-
-
+        basename = File.join @data_fetcher.output, File.basename(f).sub(/(?:-(?:en|fr))?\.rxl$/, "")
+        outfile = "#{basename}.#{@data_fetcher.ext}"
+        key = hash1["docnumber"] || basename
+        @data_fetcher.index[[key]] = outfile
+        @data_fetcher.index_new.add_or_update [key], outfile
+        @data_fetcher.index2.add_or_update Id.new(key).normalized_hash, outfile
         hash = if File.exist? outfile
                  warn_duplicate = false
                  hash2 = YAML.load_file outfile
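The new code derives one output path per SI Brochure document by removing an optional `-en`/`-fr` suffix together with the `.rxl` extension, so both language versions land in the same file (hence the `File.exist? outfile` branch that merges into an existing file without a duplicate warning). The snippet replays the same `sub` call; `si_outfile` is a hypothetical wrapper and the file names are made up for illustration.

```ruby
# Same regex as in BipmSiBrochureParser: drop an optional "-en"/"-fr" suffix and ".rxl".
LANG_SUFFIX = /(?:-(?:en|fr))?\.rxl$/

# Hypothetical wrapper building the output path the way the diff does.
def si_outfile(path, output: "data", ext: "yaml")
  basename = File.join output, File.basename(path).sub(LANG_SUFFIX, "")
  "#{basename}.#{ext}"
end
```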
data/lib/relaton_bipm/data_fetcher.rb
CHANGED
@@ -1,6 +1,6 @@
 module RelatonBipm
   class DataFetcher
-    attr_reader :output, :format, :ext, :files, :index
+    attr_reader :output, :format, :ext, :files, :index, :index_new, :index2

     #
     # Initialize fetcher
@@ -15,6 +15,8 @@ module RelatonBipm
       @files = []
       @index_path = "index.yaml"
       @index = File.exist?(@index_path) ? YAML.load_file(@index_path) : {}
+      @index_new = Relaton::Index.find_or_create :BIPM, file: "index-bipm.yaml"
+      @index2 = Relaton::Index.find_or_create :BIPM, file: "index2.yaml"
     end

     #
@@ -43,8 +45,11 @@ module RelatonBipm
       case source
       when "bipm-data-outcomes" then DataOutcomesParser.parse(self)
       when "bipm-si-brochure" then BipmSiBrochureParser.parse(self)
+      when "rawdata-bipm-metrologia" then RawdataBipmMetrologia::Fetcher.fetch(self)
       end
-      File.write @index_path,
+      File.write @index_path, index.to_yaml, encoding: "UTF-8"
+      index_new.save
+      index2.save
     end

     #
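In the lines shown, `DataFetcher#fetch` picks a parser with a `case` on the source name; a name that matches no branch (for instance the README's misspelt `bipm-si-brochute`) falls through silently, and the three indexes are still saved. A hedged alternative sketch using a lookup table that rejects unknown sources up front; the lambdas are hypothetical placeholders that only return a description of the call they stand for (`DataOutcomesParser.parse`, `BipmSiBrochureParser.parse`, `RawdataBipmMetrologia::Fetcher.fetch`).

```ruby
# Hypothetical dispatch table; each lambda stands in for the real parser call.
PARSERS = {
  "bipm-data-outcomes" => ->(fetcher) { "DataOutcomesParser.parse(#{fetcher})" },
  "bipm-si-brochure" => ->(fetcher) { "BipmSiBrochureParser.parse(#{fetcher})" },
  "rawdata-bipm-metrologia" => ->(fetcher) { "RawdataBipmMetrologia::Fetcher.fetch(#{fetcher})" },
}.freeze

# Raise on an unknown source instead of silently doing nothing.
def dispatch(source, fetcher)
  parser = PARSERS.fetch(source) do
    raise ArgumentError, "unknown source #{source.inspect}; expected one of #{PARSERS.keys.join(', ')}"
  end
  parser.call(fetcher)
end
```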
@@ -54,15 +59,22 @@ module RelatonBipm
     # @param [RelatonBipm::BipmBibliographicItem] item document to save
     # @param [Boolean, nil] warn_duplicate Warn if document already exists
     #
-    # @return [<Type>] <description>
-    #
     def write_file(path, item, warn_duplicate: true)
+      content = serialize item
       if @files.include?(path)
         warn "File #{path} already exists" if warn_duplicate
       else
         @files << path
       end
-      File.write path,
+      File.write path, content, encoding: "UTF-8"
+    end
+
+    def serialize(item)
+      case @format
+      when "xml" then item.to_xml bibdata: true
+      when "yaml" then item.to_hash.to_yaml
+      when "bibxml" then item.to_bibxml
+      end
     end
   end
 end
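`write_file` now serializes the item before checking `@files`, and `serialize` picks the output by `@format`. In the code shown, any other `@format` value makes `serialize` return `nil`, which `File.write` writes as an empty string. A minimal sketch of the same `case` dispatch with an explicit error as a design choice; `Item` is a hypothetical stand-in for `BipmBibliographicItem`, its `to_xml` is a toy placeholder, and the `bibxml` branch is left out because the stand-in has no `to_bibxml`.

```ruby
require "yaml"

# Hypothetical stand-in for BipmBibliographicItem.
Item = Struct.new(:docid, :title) do
  def to_hash
    { "docid" => docid, "title" => title }
  end

  # Toy placeholder for the real XML serializer.
  def to_xml(bibdata: false)
    tag = bibdata ? "bibdata" : "bibitem"
    "<#{tag}><docidentifier>#{docid}</docidentifier></#{tag}>"
  end
end

# Same case/when shape as DataFetcher#serialize, but failing loudly on unknown formats.
def serialize(item, format)
  case format
  when "xml" then item.to_xml bibdata: true
  when "yaml" then item.to_hash.to_yaml
  else raise ArgumentError, "unsupported format #{format.inspect}"
  end
end
```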