relaton-bipm 1.14.1 → 1.14.3
- checksums.yaml +4 -4
- data/.gitignore +1 -0
- data/Gemfile +6 -0
- data/README.adoc +32 -12
- data/grammars/basicdoc.rng +0 -1
- data/grammars/biblio.rng +12 -2
- data/lib/relaton_bipm/bipm_bibliography.rb +12 -310
- data/lib/relaton_bipm/bipm_si_brochure_parser.rb +8 -4
- data/lib/relaton_bipm/comment_periond.rb +1 -1
- data/lib/relaton_bipm/data_fetcher.rb +17 -5
- data/lib/relaton_bipm/data_outcomes_parser.rb +68 -29
- data/lib/relaton_bipm/id_parser.rb +134 -0
- data/lib/relaton_bipm/processor.rb +5 -4
- data/lib/relaton_bipm/rawdata_bipm_metrologia/article_parser.rb +311 -0
- data/lib/relaton_bipm/rawdata_bipm_metrologia/fetcher.rb +176 -0
- data/lib/relaton_bipm/version.rb +1 -1
- data/lib/relaton_bipm.rb +5 -1
- data/relaton_bipm.gemspec +2 -6
- metadata +26 -80
- data/lib/relaton_bipm/index.rb +0 -68
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 34d720b316dbd942e2c5d630d2ae0f07b74331e4ef07f68715e84304bad0fb13
|
4
|
+
data.tar.gz: 38d36e34b998db6e4fa9e9f1a6e5306fadacea45b9a878cd14629eaaca2ef50d
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: a22261617d5c3de8aad7ed410091331698630f958332c5feb0b215b9fafe9167015530e9e6ecb71046307a6775aa7a2a0a71dd4c12e4f982cb0bd259e021a267
|
7
|
+
data.tar.gz: 376bb090dd4d273b8039357d78280c9bc4f1555918920a55b22ec72cb8be87c4ce0a8469254ffc000256fa95069a271b65b978dc767d735acd4b3e141c5dea24
|
data/.gitignore
CHANGED
data/Gemfile
CHANGED
data/README.adoc
CHANGED
@@ -70,22 +70,35 @@ Allowed document names are:
|
|
70
70
|
|
71
71
|
==== Reference structure for Metrologia documents
|
72
72
|
|
73
|
-
`BIPM Metrologia {JOURNAL} {VOLUME} {ISSUE}
|
73
|
+
`BIPM Metrologia {JOURNAL} {VOLUME} {ISSUE}`
|
74
74
|
|
75
|
-
- `{JOURNAL}` - number
|
76
|
-
- `{VOLUME}` - number
|
77
|
-
- `{ISSUE}` - number
|
78
|
-
- `{PAGE}` - number of page, optional
|
75
|
+
- `{JOURNAL}` - journal number, required
|
76
|
+
- `{VOLUME}` - volume number, optional
|
77
|
+
- `{ISSUE}` - issue number, optional
|
79
78
|
|
80
79
|
==== Reference structures for CCTF (CCDS), CGPM, CIPM documents
|
81
80
|
|
82
81
|
===== Basic pattern
|
83
82
|
|
84
83
|
----
|
85
|
-
Long:
|
86
|
-
|
84
|
+
Long:
|
85
|
+
{group name} -- {type} {number} ({year})
|
86
|
+
{group name} {type} {number} ({year})
|
87
|
+
{group name} {type} {year}-{zero_leading_number}
|
88
|
+
|
89
|
+
Short:
|
90
|
+
{group name} -- {type-abbrev} {number} ({year}, {lang})
|
91
|
+
{group name} {type-abbrev} {number} ({year}, {lang})
|
87
92
|
----
|
88
93
|
|
94
|
+
`group name` - a name of the group, required. A full list of group names is available https://github.com/metanorma/bipm-editor-guides/blob/main/sources/bipm-outcomes-en.adoc#appendix-a-bipm-groups-and-codes[here].
|
95
|
+
`type` - a type of document, required. A list of types is: Resolution (Résolution), Recommendation (Recommandation), Decision (Décision), Meeting (Réunion), Declaration (Déclaration).
|
96
|
+
`type-abbrev` - an abbreviation of the type, required. A list of abbreviations: RES (Resolution), REC (Recommendation), DECN (Decision).
|
97
|
+
`number` - a number of the document, optional. Can be with part, e.g. `1-2`.
|
98
|
+
`zero_leading_number` - a number of the document with a leading zero, required. Can be used when a document has a 1 or 2 digits number. It's `00` for documents without a number.
|
99
|
+
`year` - a year of the document, optional.
|
100
|
+
`lang` - a language of the document, optional. Can be `EN` or `FR`.
|
101
|
+
|
89
102
|
===== Special case pattern
|
90
103
|
|
91
104
|
The basic pattern works fine for all, except for these 2 cases:
|
@@ -189,9 +202,9 @@ item = RelatonBipm::BipmBibliography.get "BIPM SI Brochure"
|
|
189
202
|
...
|
190
203
|
|
191
204
|
# get BIPM Metrologia page
|
192
|
-
bib = RelatonBipm::BipmBibliography.get "BIPM Metrologia 29 6
|
193
|
-
[relaton-bipm] ("BIPM Metrologia 29 6
|
194
|
-
[relaton-bipm] ("BIPM Metrologia 29 6
|
205
|
+
bib = RelatonBipm::BipmBibliography.get "BIPM Metrologia 29 6 001"
|
206
|
+
[relaton-bipm] ("BIPM Metrologia 29 6 001") fetching...
|
207
|
+
[relaton-bipm] ("BIPM Metrologia 29 6 001") found Metrologia 29 6 001
|
195
208
|
=> #<RelatonBipm::BipmBibliographicItem:0x007f8857f94d40
|
196
209
|
...
|
197
210
|
|
@@ -295,7 +308,7 @@ bib.link
|
|
295
308
|
#<RelatonBib::TypedUri:0x00007fa6d6a29250 @content=#<Addressable::URI:0xc2b0 URI:https://doi.org/10.1088/0026-1394/29/6/001>, @type="doi">]
|
296
309
|
----
|
297
310
|
|
298
|
-
=== Create bibliographic item from XML
|
311
|
+
=== Create a bibliographic item from XML
|
299
312
|
|
300
313
|
[source,ruby]
|
301
314
|
----
|
@@ -304,7 +317,7 @@ RelatonBipm::XMLParser.from_xml File.read('spec/fixtures/bipm_item.xml')
|
|
304
317
|
...
|
305
318
|
----
|
306
319
|
|
307
|
-
=== Create bibliographic item from YAML
|
320
|
+
=== Create a bibliographic item from YAML
|
308
321
|
[source,ruby]
|
309
322
|
----
|
310
323
|
hash = YAML.load_file 'spec/fixtures/bipm_item.yml'
|
@@ -321,6 +334,7 @@ RelatonBipm::BipmBibliographicItem.from_hash hash
|
|
321
334
|
This gem uses the following datasets as data sources:
|
322
335
|
- `bipm-data-outcomes` - looking for a local directory with the repository https://github.com/metanorma/bipm-data-outcomes
|
323
336
|
- `bipm-si-brochute` - looking for a local directory with the repository https://github.com/metanorma/bipm-si-brochure
|
337
|
+
- `rawdata-bipm-metrologia` - looking for a local directory with the repository https://github.com/relaton/rawdata-bipm-metrologia
|
324
338
|
|
325
339
|
The method `RelatonBipm::DataFetcher.fetch(source, output: "data", format: "yaml")` fetches all the documents from the dataset and saves them to the `./data` folder in YAML format.
|
326
340
|
Arguments:
|
@@ -342,6 +356,12 @@ Started at: 2022-06-23 09:37:12 +0200
|
|
342
356
|
Stopped at: 2022-06-23 09:37:12 +0200
|
343
357
|
Done in: 0 sec.
|
344
358
|
=> nil
|
359
|
+
|
360
|
+
RelatonBipm::DataFetcher.fetch "rawdata-bipm-metrologia"
|
361
|
+
Started at: 2022-06-23 09:39:12 +0200
|
362
|
+
Stopped at: 2022-06-23 09:40:34 +0200
|
363
|
+
Done in: 82 sec.
|
364
|
+
=> nil
|
345
365
|
----
|
346
366
|
|
347
367
|
== Development
|
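The README hunk above documents two long-form outcome patterns, `{group name} {type} {number} ({year})` (with an optional `--` separator) and `{group name} {type} {year}-{zero_leading_number}`. As a rough illustration only, the sketch below splits such strings with plain regular expressions. All names here (`TYPES`, `parse_outcome_ref`, the two pattern constants) are hypothetical, and the assumption that a group name is a run of uppercase letters is a simplification. The gem's own parser lives in `id_parser.rb`, which this diff lists but does not show, and may work quite differently.

```ruby
# Hypothetical sketch, NOT the gem's parser (id_parser.rb is not shown in this diff).
TYPES = %w[Resolution Recommendation Decision Meeting Declaration].freeze
TYPE_RE = "(?<type>#{TYPES.join('|')})".freeze

# {group name} [--] {type} {number} ({year}); assumes an all-uppercase group name
LONG_WITH_PARENS = /\A(?<group>[A-Z]+) (?:-- )?#{TYPE_RE} (?<number>\d+(?:-\d+)?) \((?<year>\d{4})\)\z/

# {group name} {type} {year}-{zero_leading_number}
LONG_WITH_DASH = /\A(?<group>[A-Z]+) #{TYPE_RE} (?<year>\d{4})-(?<number>\d{2,})\z/

# Returns a Hash of the reference parts, or nil when neither pattern matches.
def parse_outcome_ref(ref)
  m = LONG_WITH_PARENS.match(ref) || LONG_WITH_DASH.match(ref)
  return unless m

  # The README says "00" stands for a document without a number.
  number = m[:number].to_i.zero? ? nil : m[:number].sub(/\A0+/, "")
  { group: m[:group], type: m[:type], number: number, year: m[:year] }
end

parse_outcome_ref("CGPM -- Resolution 1 (1889)") # number "1", year "1889"
parse_outcome_ref("CIPM Decision 2012-01")       # number "1", year "2012"
```

The short forms with `{type-abbrev}` and `{lang}` are left out to keep the sketch small; they would need a third pattern.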
data/grammars/basicdoc.rng
CHANGED
data/grammars/biblio.rng
CHANGED
@@ -216,6 +216,9 @@
|
|
216
216
|
<optional>
|
217
217
|
<ref name="fullname"/>
|
218
218
|
</optional>
|
219
|
+
<zeroOrMore>
|
220
|
+
<ref name="credential"/>
|
221
|
+
</zeroOrMore>
|
219
222
|
<zeroOrMore>
|
220
223
|
<ref name="affiliation"/>
|
221
224
|
</zeroOrMore>
|
@@ -232,6 +235,11 @@
|
|
232
235
|
<ref name="FullNameType"/>
|
233
236
|
</element>
|
234
237
|
</define>
|
238
|
+
<define name="credential">
|
239
|
+
<element name="credential">
|
240
|
+
<text/>
|
241
|
+
</element>
|
242
|
+
</define>
|
235
243
|
<define name="FullNameType">
|
236
244
|
<choice>
|
237
245
|
<group>
|
@@ -305,7 +313,9 @@
|
|
305
313
|
<zeroOrMore>
|
306
314
|
<ref name="affiliationdescription"/>
|
307
315
|
</zeroOrMore>
|
308
|
-
<
|
316
|
+
<optional>
|
317
|
+
<ref name="organization"/>
|
318
|
+
</optional>
|
309
319
|
</element>
|
310
320
|
</define>
|
311
321
|
<define name="affiliationname">
|
@@ -1316,7 +1326,7 @@
|
|
1316
1326
|
<value>commentaryOf</value>
|
1317
1327
|
<value>hasCommentary</value>
|
1318
1328
|
<value>related</value>
|
1319
|
-
<value>
|
1329
|
+
<value>hasComplement</value>
|
1320
1330
|
<value>complementOf</value>
|
1321
1331
|
<value>obsoletes</value>
|
1322
1332
|
<value>obsoletedBy</value>
|
data/lib/relaton_bipm/bipm_bibliography.rb
CHANGED
@@ -3,14 +3,6 @@ require "mechanize"
|
|
3
3
|
module RelatonBipm
|
4
4
|
class BipmBibliography
|
5
5
|
GH_ENDPOINT = "https://raw.githubusercontent.com/relaton/relaton-data-bipm/master/".freeze
|
6
|
-
IOP_DOMAIN = "https://iopscience.iop.org".freeze
|
7
|
-
TRANSLATIONS = {
|
8
|
-
"Déclaration" => "Declaration",
|
9
|
-
"Réunion" => "Meeting",
|
10
|
-
"Recommandation" => "Recommendation",
|
11
|
-
"Résolution" => "Resolution",
|
12
|
-
"Décision" => "Decision",
|
13
|
-
}.freeze
|
14
6
|
|
15
7
|
class << self
|
16
8
|
# @param text [String]
|
@@ -18,14 +10,13 @@ module RelatonBipm
|
|
18
10
|
def search(text, _year = nil, _opts = {}) # rubocop:disable Metrics/AbcSize, Metrics/MethodLength
|
19
11
|
warn "[relaton-bipm] (\"#{text}\") fetching..."
|
20
12
|
ref = text.sub(/^BIPM\s/, "")
|
21
|
-
item =
|
13
|
+
item = get_bipm(ref, magent)
|
22
14
|
unless item
|
23
15
|
warn "[relaton-bipm] (\"#{text}\") not found."
|
24
16
|
return
|
25
17
|
end
|
26
18
|
|
27
19
|
warn("[relaton-bipm] (\"#{text}\") found #{item.docidentifier[0].id}")
|
28
|
-
item.fetched = Date.today.to_s
|
29
20
|
item
|
30
21
|
rescue Mechanize::ResponseCodeError => e
|
31
22
|
raise RelatonBib::RequestError, e.message unless e.response_code == "404"
|
@@ -48,295 +39,28 @@ module RelatonBipm
|
|
48
39
|
a
|
49
40
|
end
|
50
41
|
|
51
|
-
# @param
|
42
|
+
# @param reference [String]
|
52
43
|
# @param agent [Mechanize]
|
53
44
|
# @return [RelatonBipm::BipmBibliographicItem]
|
54
|
-
def get_bipm(
|
55
|
-
|
56
|
-
|
57
|
-
|
58
|
-
|
59
|
-
# TRANSLATIONS.each { |fr, en| rf.sub! fr, en }
|
60
|
-
path = Index.new.search ref
|
61
|
-
return unless path
|
45
|
+
def get_bipm(reference, agent) # rubocop:disable Metrics/AbcSize, Metrics/MethodLength
|
46
|
+
ref_id = Id.new reference
|
47
|
+
index = Relaton::Index.find_or_create :BIPM, url: "#{GH_ENDPOINT}index2.zip"
|
48
|
+
rows = index.search { |r| ref_id == r[:id] }
|
49
|
+
return unless rows.any?
|
62
50
|
|
63
|
-
url = "#{GH_ENDPOINT}#{
|
51
|
+
url = "#{GH_ENDPOINT}#{rows.first[:file]}"
|
64
52
|
resp = agent.get url
|
65
|
-
check_response resp
|
66
53
|
return unless resp.code == "200"
|
67
54
|
|
68
55
|
yaml = RelatonBib.parse_yaml resp.body, [Date]
|
69
|
-
|
56
|
+
yaml["fetched"] = Date.today.to_s
|
70
57
|
bib_hash = HashConverter.hash_to_bib yaml
|
71
58
|
BipmBibliographicItem.new(**bib_hash)
|
72
59
|
end
|
73
60
|
|
74
|
-
#
|
75
|
-
#
|
76
|
-
#
|
77
|
-
def get_metrologia(ref, agent)
|
78
|
-
agent.redirect_ok = false
|
79
|
-
ref_arr = ref.split
|
80
|
-
case ref_arr.size
|
81
|
-
when 1 then get_journal agent
|
82
|
-
when 2 then get_volume ref_arr[1], agent
|
83
|
-
when 3 then get_issue(*ref_arr[1..2], agent)
|
84
|
-
when 4 then get_article_from_issue(*ref_arr[1..3], agent)
|
85
|
-
end
|
86
|
-
end
|
87
|
-
|
88
|
-
# @param agent [Mechanize]
|
89
|
-
# @return [RelatonBipm::BipmBibliographicItem]
|
90
|
-
def get_journal(agent)
|
91
|
-
url = "#{IOP_DOMAIN}/journal/0026-1394"
|
92
|
-
rsp = agent.get url
|
93
|
-
check_response rsp
|
94
|
-
rel = rsp.xpath('//select[@id="allVolumesSelector"]/option').map do |v|
|
95
|
-
{ type: "partOf", bibitem: journal_rel(v) }
|
96
|
-
end
|
97
|
-
did = doc_id []
|
98
|
-
bibitem(formattedref: fref(did.id), docid: [did], link: blink(url), relation: rel)
|
99
|
-
end
|
100
|
-
|
101
|
-
# @param elm [Nokogiri::XML::Element]
|
102
|
-
def journal_rel(elm)
|
103
|
-
vol = elm[:value].split("/").last
|
104
|
-
did = doc_id [vol]
|
105
|
-
url = IOP_DOMAIN + elm[:value]
|
106
|
-
BipmBibliographicItem.new(formattedref: fref(did.id), docid: [did], link: blink(url))
|
107
|
-
end
|
108
|
-
|
109
|
-
# @param vol [String]
|
110
|
-
# @param agent [Mechanize]
|
111
|
-
# @return [RelatonBipm::BipmBibliographicItem]
|
112
|
-
def get_volume(vol, agent)
|
113
|
-
url = "#{IOP_DOMAIN}/volume/0026-1394/#{vol}"
|
114
|
-
rsp = agent.get url
|
115
|
-
check_response rsp
|
116
|
-
rel = rsp.xpath('//li[@itemprop="hasPart"]').map do |i|
|
117
|
-
{ type: "partOf", bibitem: volume_rel(i, vol) }
|
118
|
-
end
|
119
|
-
did = doc_id [vol]
|
120
|
-
bibitem(formattedref: fref(did.id), docid: [did], link: blink(url), date: bdate(rsp), relation: rel,
|
121
|
-
extent: btextent(vol), series: series)
|
122
|
-
end
|
123
|
-
|
124
|
-
def volume_rel(elm, vol) # rubocop:disable Metrics/AbcSize
|
125
|
-
a = elm.at 'a[@itemprop="issueNumber"]'
|
126
|
-
ish = a[:href].split("/").last
|
127
|
-
url = IOP_DOMAIN + a[:href]
|
128
|
-
docid = doc_id [vol, ish]
|
129
|
-
t = elm.at "p"
|
130
|
-
title_fref = t ? { title: titles(t.text) } : { formattedref: fref(docid.id) }
|
131
|
-
BipmBibliographicItem.new(**title_fref, docid: [docid], link: blink(url))
|
132
|
-
end
|
133
|
-
|
134
|
-
# @param title [String]
|
135
|
-
# @return [RelatonBib::TypedTitleStringCollection]
|
136
|
-
def titles(title)
|
137
|
-
RelatonBib::TypedTitleString.from_string title, "en", "Latn", "text/html"
|
138
|
-
end
|
139
|
-
|
140
|
-
# @param vol [String]
|
141
|
-
# @param ish [String]
|
142
|
-
# @param agent [Mechanize]
|
143
|
-
# @return [RelatonBipm::BipmBibliographicItem]
|
144
|
-
def get_issue(vol, ish, agent) # rubocop:disable Metrics/AbcSize, Metrics/MethodLength
|
145
|
-
url = issue_url vol, ish
|
146
|
-
rsp = agent.get url
|
147
|
-
check_response rsp
|
148
|
-
rel = rsp.xpath('//div[@class="art-list-item-body"]').map do |a|
|
149
|
-
{ type: "partOf", bibitem: issue_rel(a, vol, ish) }
|
150
|
-
end
|
151
|
-
did = doc_id [vol, ish]
|
152
|
-
title_fref = { title: issue_title(rsp) }
|
153
|
-
title_fref[:formattedref] = fref did.id unless title_fref[:title].any?
|
154
|
-
bibitem(**title_fref, link: blink(url), relation: rel, docid: [did],
|
155
|
-
date: bdate(rsp), extent: btextent(vol, ish), series: series)
|
156
|
-
end
|
157
|
-
|
158
|
-
# @param ref [String]
|
159
|
-
# @return [RelatonBib::FormattedRef]
|
160
|
-
def fref(ref)
|
161
|
-
RelatonBib::FormattedRef.new content: ref, language: "en", script: "Latn"
|
162
|
-
end
|
163
|
-
|
164
|
-
# @param rsp [Mechanize::Page]
|
165
|
-
# @return [RelatonBib::TypedTitleStringCollection]
|
166
|
-
def issue_title(rsp)
|
167
|
-
t = rsp.at('//div[@id="wd-jnl-issue-title"]/h4')
|
168
|
-
return RelatonBib::TypedTitleStringCollection.new [] unless t
|
169
|
-
|
170
|
-
titles(t.text)
|
171
|
-
end
|
172
|
-
|
173
|
-
# @oaran vol [String]
|
174
|
-
# @param ish [String]
|
175
|
-
# @return [String]
|
176
|
-
def issue_url(vol, ish)
|
177
|
-
"#{IOP_DOMAIN}/issue/0026-1394/#{vol}/#{ish}"
|
178
|
-
end
|
179
|
-
|
180
|
-
# @param elm [Nokogiri::XML::Element]
|
181
|
-
# @param vol [String]
|
182
|
-
# @param ish [String]
|
183
|
-
# @return [RelatonBipm::BipmBibliographicItem]
|
184
|
-
def issue_rel(elm, vol, ish)
|
185
|
-
art = elm.at('div[@class="indexer"]').text
|
186
|
-
ref = elm.at('div/a[@class="art-list-item-title"]')
|
187
|
-
title = titles ref.text.strip
|
188
|
-
docid = doc_id [vol, ish, art]
|
189
|
-
link = blink IOP_DOMAIN + ref[:href]
|
190
|
-
BipmBibliographicItem.new(title: title, docid: [docid], link: link)
|
191
|
-
end
|
192
|
-
|
193
|
-
# @param content [RelatonBib::TypedTitleString]
|
194
|
-
# @return [RelatonBib::TypedTitleString]
|
195
|
-
def btitle(content)
|
196
|
-
RelatonBib::TypedTitleString.new type: "main", content: content, language: "en", script: "Latn"
|
197
|
-
end
|
198
|
-
|
199
|
-
# @param url [String]
|
200
|
-
# @return [String]
|
201
|
-
def blink(url)
|
202
|
-
[RelatonBib::TypedUri.new(type: "src", content: url)]
|
203
|
-
end
|
204
|
-
|
205
|
-
# @param rsp [Mechanize::Page]
|
206
|
-
# @return [Array<RelatonBib::BibliographicDate>]
|
207
|
-
def bdate(rsp)
|
208
|
-
date = rsp.at('//p[@itemprop="issueNumber"]|//h2[@itemprop="volumeNumber"]').text.split(", ").last
|
209
|
-
on = date.match?(/^\d{4}$/) ? date : Date.parse(date).strftime("%Y-%m")
|
210
|
-
[RelatonBib::BibliographicDate.new(type: "published", on: on)]
|
211
|
-
end
|
212
|
-
|
213
|
-
# @param args [Array<String>]
|
214
|
-
# @return [RelatonBib::DocumentIdentifier]
|
215
|
-
def doc_id(args)
|
216
|
-
id = args.clone.unshift "Metrologia"
|
217
|
-
RelatonBib::DocumentIdentifier.new(type: "BIPM", id: id.join(" "), primary: true)
|
218
|
-
end
|
219
|
-
|
220
|
-
# @param vol [String]
|
221
|
-
# @param ish [String]
|
222
|
-
# @param art [String]
|
223
|
-
# @param agent [Mechanize]
|
224
|
-
# @return [RelatonBipm::BipmBibliographicItem]
|
225
|
-
def get_article_from_issue(vol, ish, art, agent) # rubocop:disable Metrics/MethodLength
|
226
|
-
url = issue_url vol, ish
|
227
|
-
rsp = agent.get url
|
228
|
-
check_response rsp
|
229
|
-
link = rsp.at("//div[@class='indexer'][.='#{art}']/../div/a")
|
230
|
-
unless link
|
231
|
-
arts = rsp.xpath("//div[@class='indexer']").map(&:text)
|
232
|
-
warn "[relaton-bipm] No article is available at the specified start page \"#{art}\" in issue \"BIPM Metrologia #{vol} #{ish}\"."
|
233
|
-
warn "[relaton-bipm] Available articles in the issue start at the following pages: (#{arts.join(', ')})"
|
234
|
-
return
|
235
|
-
end
|
236
|
-
|
237
|
-
get_article link[:href], vol, ish, agent
|
238
|
-
end
|
239
|
-
|
240
|
-
# @param path [String]
|
241
|
-
# @param vol [String]
|
242
|
-
# @param ish [String]
|
243
|
-
# @param agent [Mechanize]
|
244
|
-
# @return [RelatonBipm::BipmBibliographicItem]
|
245
|
-
def get_article(path, vol, ish, agent) # rubocop:disable Metrics/AbcSize, Metrics/MethodLength
|
246
|
-
agent.agent.allowed_error_codes = [403]
|
247
|
-
rsp = agent.get path
|
248
|
-
check_response rsp
|
249
|
-
title = rsp.at("//h1[@itemprop='headline']").children.to_xml
|
250
|
-
url = rsp.uri
|
251
|
-
bib = rsp.link_with(text: "BibTeX").href
|
252
|
-
rsp = agent.get bib
|
253
|
-
check_response rsp
|
254
|
-
bt = BibTeX.parse(rsp.body).first
|
255
|
-
bibitem(
|
256
|
-
docid: btdocid(bt), title: titles(title), date: btdate(bt),
|
257
|
-
abstract: btabstract(bt), doctype: bt.type.to_s, series: series,
|
258
|
-
link: btlink(bt, url), contributor: btcontrib(bt),
|
259
|
-
extent: btextent(vol, ish, bt.pages.to_s)
|
260
|
-
)
|
261
|
-
end
|
262
|
-
|
263
|
-
# @param args [Hash]
|
264
|
-
# @return [RelatonBipm::BipmBibliographicItem]
|
265
|
-
def bibitem(**args)
|
266
|
-
BipmBibliographicItem.new(
|
267
|
-
type: "article", language: ["en"], script: ["Latn"], **args,
|
268
|
-
)
|
269
|
-
end
|
270
|
-
|
271
|
-
# @return [Array<RelatonBib::Series>]
|
272
|
-
def series
|
273
|
-
[RelatonBib::Series.new(title: btitle("Metrologia"))]
|
274
|
-
end
|
275
|
-
|
276
|
-
# @param bibtex [BibTeX::Entry]
|
277
|
-
# @return [Array<RelatonBib::DocumentIdentifier>]
|
278
|
-
def btdocid(bibtex)
|
279
|
-
id = "#{bibtex.journal} #{bibtex.volume} #{bibtex.number} #{bibtex.pages.match(/^\d+/)}"
|
280
|
-
[
|
281
|
-
RelatonBib::DocumentIdentifier.new(type: "BIPM", id: id, primary: true),
|
282
|
-
RelatonBib::DocumentIdentifier.new(type: "DOI", id: bibtex.doi),
|
283
|
-
]
|
284
|
-
end
|
285
|
-
|
286
|
-
# @param bibtex [BibTeX::Entry]
|
287
|
-
# @return [Array<RelatonBib::FormattedString>]
|
288
|
-
def btabstract(bibtex)
|
289
|
-
[RelatonBib::FormattedString.new(content: bibtex.abstract.to_s, language: "en", script: "Latn")]
|
290
|
-
end
|
291
|
-
|
292
|
-
# @param bibtex [BibTeX::Entry]
|
293
|
-
# @param ref [URI]
|
294
|
-
# @return [Array<RelatonBib::TypedUri>]
|
295
|
-
def btlink(bibtex, ref)
|
296
|
-
[
|
297
|
-
RelatonBib::TypedUri.new(type: "src", content: ref.to_s),
|
298
|
-
RelatonBib::TypedUri.new(type: "doi", content: bibtex.url.to_s),
|
299
|
-
]
|
300
|
-
end
|
301
|
-
|
302
|
-
# @param bibtex [BibTeX::Entry]
|
303
|
-
# @return [Array<RelatonBib::BibliographicDate>]
|
304
|
-
def btdate(bibtex)
|
305
|
-
on = Date.new(bibtex.year.to_i, bibtex.month_numeric)
|
306
|
-
[RelatonBib::BibliographicDate.new(type: "published", on: on)]
|
307
|
-
end
|
308
|
-
|
309
|
-
# @param bibtex [BibTeX::Entry]
|
310
|
-
# @return [Array<Hash>]
|
311
|
-
def btcontrib(bibtex) # rubocop:disable Metrics/MethodLength, Metrics/AbcSize
|
312
|
-
contribs = []
|
313
|
-
if bibtex.publisher && !bibtex.publisher.empty?
|
314
|
-
org = RelatonBib::Organization.new name: bibtex.publisher.to_s
|
315
|
-
contribs << { entity: org, role: [{ type: "publisher" }] }
|
316
|
-
end
|
317
|
-
return contribs unless bibtex.author && !bibtex.author.empty?
|
318
|
-
|
319
|
-
bibtex.author.split(" and ").inject(contribs) do |mem, name|
|
320
|
-
cname = RelatonBib::LocalizedString.new name, "en", "Latn"
|
321
|
-
name = RelatonBib::FullName.new completename: cname
|
322
|
-
author = RelatonBib::Person.new name: name
|
323
|
-
mem << { entity: author, role: [{ type: "author" }] }
|
324
|
-
end
|
325
|
-
end
|
326
|
-
|
327
|
-
#
|
328
|
-
# @param vol [String] volume
|
329
|
-
# @param ish [String] issue
|
330
|
-
# @param pgs [String] pages
|
331
|
-
#
|
332
|
-
# @return [Array<RelatonBib::BibItemLocality>]
|
333
|
-
#
|
334
|
-
def btextent(vol, ish = nil, pgs = nil)
|
335
|
-
ext = [RelatonBib::Locality.new("volume", vol)]
|
336
|
-
ext << RelatonBib::Locality.new("issue", ish) if ish
|
337
|
-
ext << RelatonBib::Locality.new("page", *pgs.split("--")) if pgs
|
338
|
-
ext
|
339
|
-
end
|
61
|
+
# def match_item(ids, ref_id)
|
62
|
+
# ids.find { |id| Id.new(id) == ref_id }
|
63
|
+
# end
|
340
64
|
|
341
65
|
# @param ref [String] the BIPM standard Code to look up (e..g "BIPM B-11")
|
342
66
|
# @param year [String] not used
|
@@ -345,28 +69,6 @@ module RelatonBipm
|
|
345
69
|
def get(ref, year = nil, opts = {})
|
346
70
|
search(ref, year, opts)
|
347
71
|
end
|
348
|
-
|
349
|
-
private
|
350
|
-
|
351
|
-
#
|
352
|
-
# Check HTTP response. Warn and rise error if response is not 200
|
353
|
-
# or redirect to CAPTCHA.
|
354
|
-
#
|
355
|
-
# @param [Mechanize] rsp response
|
356
|
-
#
|
357
|
-
# @raise [RelatonBib::RequestError] if response is not 200
|
358
|
-
#
|
359
|
-
def check_response(rsp) # rubocop:disable Metrics/AbcSize
|
360
|
-
if rsp.code == "302"
|
361
|
-
warn "[relaton-bipm] This source employs anti-DDoS measures that unfortunately affects automated requests."
|
362
|
-
warn "[relaton-bipm] Please visit this link in your browser to resolve the CAPTCHA, then retry: #{rsp.uri}"
|
363
|
-
# warn "[relaton-bipm] #{rsp.uri} is redirected to #{rsp.header['location']}"
|
364
|
-
raise RelatonBib::RequestError, "cannot access #{rsp.uri}"
|
365
|
-
elsif rsp.code != "200" && rsp.code != "403"
|
366
|
-
warn "[read_bipm] can't acces #{rsp.uri} #{rsp.code}"
|
367
|
-
raise RelatonBib::RequestError, "cannot acces #{rsp.uri} #{rsp.code}"
|
368
|
-
end
|
369
|
-
end
|
370
72
|
end
|
371
73
|
end
|
372
74
|
end
|
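The rewritten `get_bipm` above replaces the IOP page scraping with an index lookup: the reference is turned into an `Id`, the index rows are searched for a matching `:id`, and the YAML file named by `rows.first[:file]` is fetched from `GH_ENDPOINT`. The sketch below mimics only that flow. It uses an in-memory array instead of `Relaton::Index`, a hypothetical `normalize` helper instead of `Id#==`, and made-up row contents, so treat it as an illustration of the control flow rather than of the gem's matching rules.

```ruby
GH_ENDPOINT = "https://raw.githubusercontent.com/relaton/relaton-data-bipm/master/".freeze

# Hypothetical rows; the real ones are loaded from index2.zip and may look different.
ROWS = [
  { id: "Metrologia 29 6 001", file: "data/metrologia-29-6-001.yaml" },
  { id: "SI Brochure", file: "data/si-brochure.yaml" },
].freeze

# Hypothetical normalisation: drop the "BIPM " prefix, ignore case and extra spaces.
def normalize(ref)
  ref.sub(/\ABIPM\s/, "").downcase.split.join(" ")
end

# Returns the URL of the first matching row, or nil when nothing matches,
# mirroring the `return unless rows.any?` guard in the hunk.
def lookup_url(reference, rows = ROWS)
  key = normalize(reference)
  row = rows.find { |r| normalize(r[:id]) == key }
  return unless row

  "#{GH_ENDPOINT}#{row[:file]}"
end

lookup_url("BIPM Metrologia 29 6 001")
```

A lookup against a prebuilt index needs one HTTP request for the document itself, which is presumably why the CAPTCHA handling in the removed `check_response` is no longer needed.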
data/lib/relaton_bipm/bipm_si_brochure_parser.rb
CHANGED
@@ -6,7 +6,7 @@ module RelatonBipm
|
|
6
6
|
# @param [RelatonBipm::DataFetcher] data_fetcher data fetcher
|
7
7
|
#
|
8
8
|
def initialize(data_fetcher)
|
9
|
-
@data_fetcher = data_fetcher
|
9
|
+
@data_fetcher = WeakRef.new data_fetcher
|
10
10
|
end
|
11
11
|
|
12
12
|
#
|
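The hunk above wraps the fetcher in a `WeakRef`, so the parser no longer holds a strong reference back to the object that created it. A minimal sketch of how such a wrapper behaves, assuming a hypothetical `Fetcher` struct and `Parser` class as stand-ins for the gem's classes: method calls are delegated to the wrapped object for as long as something else keeps it alive.

```ruby
require "weakref"

# Hypothetical stand-in for RelatonBipm::DataFetcher (only two readers).
Fetcher = Struct.new(:output, :ext)

# Hypothetical parser that keeps only a weak reference to its fetcher.
class Parser
  def initialize(data_fetcher)
    @data_fetcher = WeakRef.new data_fetcher
  end

  # Calls on @data_fetcher are forwarded to the wrapped object.
  def outfile(basename)
    "#{File.join(@data_fetcher.output, basename)}.#{@data_fetcher.ext}"
  end

  def fetcher_alive?
    @data_fetcher.weakref_alive?
  end
end

fetcher = Fetcher.new("data", "yaml") # strong reference held by the caller
parser = Parser.new(fetcher)
parser.outfile("si-brochure") # => "data/si-brochure.yaml"
```

If the wrapped object has been garbage-collected, a delegated call raises `WeakRef::RefError`, so this pattern suits cases where the owner is expected to outlive the parser.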
@@ -27,14 +27,18 @@ module RelatonBipm
|
|
27
27
|
# puts "Ls #{Dir['bipm-si-brochure/*']}"
|
28
28
|
# puts "Ls #{Dir['bipm-si-brochure/site/*']}"
|
29
29
|
# puts "Ls #{Dir['bipm-si-brochure/site/documents/*']}"
|
30
|
-
Dir["bipm-si-brochure/
|
30
|
+
Dir["bipm-si-brochure/_site/documents/*.rxl"].each do |f|
|
31
31
|
puts "Parsing #{f}"
|
32
32
|
docstd = Nokogiri::XML File.read f
|
33
33
|
doc = docstd.at "/bibdata"
|
34
34
|
hash1 = RelatonBipm::XMLParser.from_xml(doc.to_xml).to_hash
|
35
35
|
fix_si_brochure_id hash1
|
36
|
-
|
37
|
-
|
36
|
+
basename = File.join @data_fetcher.output, File.basename(f).sub(/(?:-(?:en|fr))?\.rxl$/, "")
|
37
|
+
outfile = "#{basename}.#{@data_fetcher.ext}"
|
38
|
+
key = hash1["docnumber"] || basename
|
39
|
+
@data_fetcher.index[[key]] = outfile
|
40
|
+
@data_fetcher.index_new.add_or_update [key], outfile
|
41
|
+
@data_fetcher.index2.add_or_update Id.new(key).normalized_hash, outfile
|
38
42
|
hash = if File.exist? outfile
|
39
43
|
warn_duplicate = false
|
40
44
|
hash2 = YAML.load_file outfile
|
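The new `basename` line in the hunk above strips an optional `-en` or `-fr` language suffix together with the `.rxl` extension, so both language versions of a document map to the same output file. The sketch below wraps the same expression in a hypothetical helper, `output_basename`, with made-up input paths, to show the effect.

```ruby
# Same expression as in the hunk, wrapped in a hypothetical helper for illustration.
def output_basename(output_dir, rxl_path)
  File.join output_dir, File.basename(rxl_path).sub(/(?:-(?:en|fr))?\.rxl$/, "")
end

# Hypothetical file names; both language variants give the same basename.
output_basename("data", "bipm-si-brochure/_site/documents/si-brochure-en.rxl") # => "data/si-brochure"
output_basename("data", "bipm-si-brochure/_site/documents/si-brochure-fr.rxl") # => "data/si-brochure"
```

Because the two variants collide on purpose, the code that follows in the hunk checks `File.exist? outfile` before writing, apparently so that the second language can be merged into the first rather than overwrite it.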
data/lib/relaton_bipm/data_fetcher.rb
CHANGED
@@ -1,6 +1,6 @@
|
|
1
1
|
module RelatonBipm
|
2
2
|
class DataFetcher
|
3
|
-
attr_reader :output, :format, :ext, :files, :index
|
3
|
+
attr_reader :output, :format, :ext, :files, :index, :index_new, :index2
|
4
4
|
|
5
5
|
#
|
6
6
|
# Initialize fetcher
|
@@ -15,6 +15,8 @@ module RelatonBipm
|
|
15
15
|
@files = []
|
16
16
|
@index_path = "index.yaml"
|
17
17
|
@index = File.exist?(@index_path) ? YAML.load_file(@index_path) : {}
|
18
|
+
@index_new = Relaton::Index.find_or_create :BIPM, file: "index-bipm.yaml"
|
19
|
+
@index2 = Relaton::Index.find_or_create :BIPM, file: "index2.yaml"
|
18
20
|
end
|
19
21
|
|
20
22
|
#
|
@@ -43,8 +45,11 @@ module RelatonBipm
|
|
43
45
|
case source
|
44
46
|
when "bipm-data-outcomes" then DataOutcomesParser.parse(self)
|
45
47
|
when "bipm-si-brochure" then BipmSiBrochureParser.parse(self)
|
48
|
+
when "rawdata-bipm-metrologia" then RawdataBipmMetrologia::Fetcher.fetch(self)
|
46
49
|
end
|
47
|
-
File.write @index_path,
|
50
|
+
File.write @index_path, index.to_yaml, encoding: "UTF-8"
|
51
|
+
index_new.save
|
52
|
+
index2.save
|
48
53
|
end
|
49
54
|
|
50
55
|
#
|
@@ -54,15 +59,22 @@ module RelatonBipm
|
|
54
59
|
# @param [RelatonBipm::BipmBibliographicItem] item document to save
|
55
60
|
# @param [Boolean, nil] warn_duplicate Warn if document already exists
|
56
61
|
#
|
57
|
-
# @return [<Type>] <description>
|
58
|
-
#
|
59
62
|
def write_file(path, item, warn_duplicate: true)
|
63
|
+
content = serialize item
|
60
64
|
if @files.include?(path)
|
61
65
|
warn "File #{path} already exists" if warn_duplicate
|
62
66
|
else
|
63
67
|
@files << path
|
64
68
|
end
|
65
|
-
File.write path,
|
69
|
+
File.write path, content, encoding: "UTF-8"
|
70
|
+
end
|
71
|
+
|
72
|
+
def serialize(item)
|
73
|
+
case @format
|
74
|
+
when "xml" then item.to_xml bibdata: true
|
75
|
+
when "yaml" then item.to_hash.to_yaml
|
76
|
+
when "bibxml" then item.to_bibxml
|
77
|
+
end
|
66
78
|
end
|
67
79
|
end
|
68
80
|
end
|
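The last hunk extracts the format dispatch into a `serialize` method that picks `to_xml`, `to_hash.to_yaml` or `to_bibxml` according to `@format`. A self-contained sketch of the same dispatch follows, using a hypothetical `FakeItem` in place of `BipmBibliographicItem` and passing the format as an argument. The `else` branch that raises is an addition for illustration only: the method in the hunk has no `else`, so it returns `nil` for an unrecognised format.

```ruby
require "yaml"

# Hypothetical stand-in exposing the three methods that serialize calls.
FakeItem = Struct.new(:id) do
  def to_xml(bibdata: false)
    tag = bibdata ? "bibdata" : "bibitem"
    "<#{tag}><docidentifier>#{id}</docidentifier></#{tag}>"
  end

  def to_hash
    { "id" => id }
  end

  def to_bibxml
    "<reference anchor=\"#{id}\"/>"
  end
end

# Same case expression as in the hunk, plus an explicit failure for unknown formats.
def serialize(item, format)
  case format
  when "xml" then item.to_xml bibdata: true
  when "yaml" then item.to_hash.to_yaml
  when "bibxml" then item.to_bibxml
  else raise ArgumentError, "unsupported format: #{format.inspect}" # not in the gem
  end
end

item = FakeItem.new("BIPM SI Brochure")
serialize(item, "yaml")
```

Serializing before the duplicate check, as the hunk does with `content = serialize item`, also means a serialization error surfaces before `@files` is modified.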