relaton-bipm 1.14.1 → 1.14.3
- checksums.yaml +4 -4
- data/.gitignore +1 -0
- data/Gemfile +6 -0
- data/README.adoc +32 -12
- data/grammars/basicdoc.rng +0 -1
- data/grammars/biblio.rng +12 -2
- data/lib/relaton_bipm/bipm_bibliography.rb +12 -310
- data/lib/relaton_bipm/bipm_si_brochure_parser.rb +8 -4
- data/lib/relaton_bipm/comment_periond.rb +1 -1
- data/lib/relaton_bipm/data_fetcher.rb +17 -5
- data/lib/relaton_bipm/data_outcomes_parser.rb +68 -29
- data/lib/relaton_bipm/id_parser.rb +134 -0
- data/lib/relaton_bipm/processor.rb +5 -4
- data/lib/relaton_bipm/rawdata_bipm_metrologia/article_parser.rb +311 -0
- data/lib/relaton_bipm/rawdata_bipm_metrologia/fetcher.rb +176 -0
- data/lib/relaton_bipm/version.rb +1 -1
- data/lib/relaton_bipm.rb +5 -1
- data/relaton_bipm.gemspec +2 -6
- metadata +26 -80
- data/lib/relaton_bipm/index.rb +0 -68
    
        checksums.yaml
    CHANGED
    
    | @@ -1,7 +1,7 @@ | |
| 1 1 | 
             
            ---
         | 
| 2 2 | 
             
            SHA256:
         | 
| 3 | 
            -
              metadata.gz:  | 
| 4 | 
            -
              data.tar.gz:  | 
| 3 | 
            +
              metadata.gz: 34d720b316dbd942e2c5d630d2ae0f07b74331e4ef07f68715e84304bad0fb13
         | 
| 4 | 
            +
              data.tar.gz: 38d36e34b998db6e4fa9e9f1a6e5306fadacea45b9a878cd14629eaaca2ef50d
         | 
| 5 5 | 
             
            SHA512:
         | 
| 6 | 
            -
              metadata.gz:  | 
| 7 | 
            -
              data.tar.gz:  | 
| 6 | 
            +
              metadata.gz: a22261617d5c3de8aad7ed410091331698630f958332c5feb0b215b9fafe9167015530e9e6ecb71046307a6775aa7a2a0a71dd4c12e4f982cb0bd259e021a267
         | 
| 7 | 
            +
              data.tar.gz: 376bb090dd4d273b8039357d78280c9bc4f1555918920a55b22ec72cb8be87c4ce0a8469254ffc000256fa95069a271b65b978dc767d735acd4b3e141c5dea24
         | 
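
The SHA256 entries listed above can be compared against a digest computed locally. The following is a minimal sketch using Ruby's standard `digest` library. The `checksum_matches?` helper is a hypothetical name introduced for this illustration, and the demonstration hashes a temporary file as a stand-in for a real `data.tar.gz`.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper (not part of relaton-bipm or RubyGems):
# returns true when the file's SHA256 hex digest equals `expected`.
def checksum_matches?(path, expected)
  Digest::SHA256.file(path).hexdigest == expected.strip.downcase
end

# Demonstration on a temporary file standing in for data.tar.gz.
Tempfile.create("demo") do |f|
  f.write("hello")
  f.flush
  expected = Digest::SHA256.hexdigest("hello")
  puts checksum_matches?(f.path, expected)
end
```

The same comparison would apply to the SHA512 entries by swapping in `Digest::SHA512`.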
    
        data/.gitignore
    CHANGED
    
    
    
        data/Gemfile
    CHANGED
    
    
    
        data/README.adoc
    CHANGED
    
    | @@ -70,22 +70,35 @@ Allowed document names are: | |
| 70 70 |  | 
| 71 71 | 
             
            ==== Reference structure for Metrologia documents
         | 
| 72 72 |  | 
| 73 | 
            -
            `BIPM Metrologia {JOURNAL} {VOLUME} {ISSUE} | 
| 73 | 
            +
            `BIPM Metrologia {JOURNAL} {VOLUME} {ISSUE}`
         | 
| 74 74 |  | 
| 75 | 
            -
            - `{JOURNAL}` - number | 
| 76 | 
            -
            - `{VOLUME}` - number | 
| 77 | 
            -
            - `{ISSUE}` - number | 
| 78 | 
            -
            - `{PAGE}` - number of page, optional
         | 
| 75 | 
            +
            - `{JOURNAL}` - journal number, required
         | 
| 76 | 
            +
            - `{VOLUME}` - volume number, optional
         | 
| 77 | 
            +
            - `{ISSUE}` - issue number, optional
         | 
| 79 78 |  | 
| 80 79 | 
             
            ==== Reference structures for CCTF (CCDS), CGPM, CIPM documents
         | 
| 81 80 |  | 
| 82 81 | 
             
            ===== Basic pattern
         | 
| 83 82 |  | 
| 84 83 | 
             
            ----
         | 
| 85 | 
            -
            Long: | 
| 86 | 
            -
             | 
| 84 | 
            +
            Long:
         | 
| 85 | 
            +
            {group name} -- {type} {number} ({year})
         | 
| 86 | 
            +
            {group name} {type} {number} ({year})
         | 
| 87 | 
            +
            {group name} {type} {year}-{zero_leading_number}
         | 
| 88 | 
            +
             | 
| 89 | 
            +
            Short:
         | 
| 90 | 
            +
            {group name} -- {type-abbrev} {number} ({year}, {lang})
         | 
| 91 | 
            +
            {group name} {type-abbrev} {number} ({year}, {lang})
         | 
| 87 92 | 
             
            ----
         | 
| 88 93 |  | 
| 94 | 
            +
            `group name` - a name of the group, required. A full list of group names is available https://github.com/metanorma/bipm-editor-guides/blob/main/sources/bipm-outcomes-en.adoc#appendix-a-bipm-groups-and-codes[here].
         | 
| 95 | 
            +
            `type` - a type of document, required. A list of types is: Resolution (Résolution), Recommendation (Recommandation), Decision (Décision), Meeting (Réunion), Declaration (Déclaration).
         | 
| 96 | 
            +
            `type-abbrev` - an abbreviation of the type, required. A list of abbreviations: RES (Resolution), REC (Recommendation), DECN (Decision).
         | 
| 97 | 
            +
            `number` - a number of the document, optional. Can be with part, e.g. `1-2`.
         | 
| 98 | 
            +
            `zero_leading_number` - a number of the document with a leading zero, required. Can be used when a document has a 1 or 2 digits number. It's `00` for documents without a number.
         | 
| 99 | 
            +
            `year` - a year of the document, optional.
         | 
| 100 | 
            +
            `lang` - a language of the document, optional. Can be `EN` or `FR`.
         | 
| 101 | 
            +
             | 
| 89 102 | 
             
            ===== Special case pattern
         | 
| 90 103 |  | 
| 91 104 | 
             
            The basic pattern works fine for all, except for these 2 cases:
         | 
| @@ -189,9 +202,9 @@ item = RelatonBipm::BipmBibliography.get "BIPM SI Brochure" | |
| 189 202 | 
             
            ...
         | 
| 190 203 |  | 
| 191 204 | 
             
            # get BIPM Metrologia page
         | 
| 192 | 
            -
            bib = RelatonBipm::BipmBibliography.get "BIPM Metrologia 29 6  | 
| 193 | 
            -
            [relaton-bipm] ("BIPM Metrologia 29 6  | 
| 194 | 
            -
            [relaton-bipm] ("BIPM Metrologia 29 6  | 
| 205 | 
            +
            bib = RelatonBipm::BipmBibliography.get "BIPM Metrologia 29 6 001"
         | 
| 206 | 
            +
            [relaton-bipm] ("BIPM Metrologia 29 6 001") fetching...
         | 
| 207 | 
            +
            [relaton-bipm] ("BIPM Metrologia 29 6 001") found Metrologia 29 6 001
         | 
| 195 208 | 
             
            => #<RelatonBipm::BipmBibliographicItem:0x007f8857f94d40
         | 
| 196 209 | 
             
            ...
         | 
| 197 210 |  | 
| @@ -295,7 +308,7 @@ bib.link | |
| 295 308 | 
             
             #<RelatonBib::TypedUri:0x00007fa6d6a29250 @content=#<Addressable::URI:0xc2b0 URI:https://doi.org/10.1088/0026-1394/29/6/001>, @type="doi">]
         | 
| 296 309 | 
             
            ----
         | 
| 297 310 |  | 
| 298 | 
            -
            === Create bibliographic item from XML
         | 
| 311 | 
            +
            === Create a bibliographic item from XML
         | 
| 299 312 |  | 
| 300 313 | 
             
            [source,ruby]
         | 
| 301 314 | 
             
            ----
         | 
| @@ -304,7 +317,7 @@ RelatonBipm::XMLParser.from_xml File.read('spec/fixtures/bipm_item.xml') | |
| 304 317 | 
             
            ...
         | 
| 305 318 | 
             
            ----
         | 
| 306 319 |  | 
| 307 | 
            -
            === Create bibliographic item from YAML
         | 
| 320 | 
            +
            === Create a bibliographic item from YAML
         | 
| 308 321 | 
             
            [source,ruby]
         | 
| 309 322 | 
             
            ----
         | 
| 310 323 | 
             
            hash = YAML.load_file 'spec/fixtures/bipm_item.yml'
         | 
| @@ -321,6 +334,7 @@ RelatonBipm::BipmBibliographicItem.from_hash hash | |
| 321 334 | 
             
            This gem uses the following datasets as data sources:
         | 
| 322 335 | 
             
            - `bipm-data-outcomes` - looking for a local directory with the repository https://github.com/metanorma/bipm-data-outcomes
         | 
| 323 336 | 
             
            - `bipm-si-brochute` - looking for a local directory with the repository https://github.com/metanorma/bipm-si-brochure
         | 
| 337 | 
            +
            - `rawdata-bipm-metrologia` - looking for a local directory with the repository https://github.com/relaton/rawdata-bipm-metrologia
         | 
| 324 338 |  | 
| 325 339 | 
             
            The method `RelatonBipm::DataFetcher.fetch(source, output: "data", format: "yaml")` fetches all the documents from the dataset and saves them to the `./data` folder in YAML format.
         | 
| 326 340 | 
             
            Arguments:
         | 
| @@ -342,6 +356,12 @@ Started at: 2022-06-23 09:37:12 +0200 | |
| 342 356 | 
             
            Stopped at: 2022-06-23 09:37:12 +0200
         | 
| 343 357 | 
             
            Done in: 0 sec.
         | 
| 344 358 | 
             
            => nil
         | 
| 359 | 
            +
             | 
| 360 | 
            +
            RelatonBipm::DataFetcher.fetch "rawdata-bipm-metrologia"
         | 
| 361 | 
            +
            Started at: 2022-06-23 09:39:12 +0200
         | 
| 362 | 
            +
            Stopped at: 2022-06-23 09:40:34 +0200
         | 
| 363 | 
            +
            Done in: 82 sec.
         | 
| 364 | 
            +
            => nil
         | 
| 345 365 | 
             
            ----
         | 
| 346 366 |  | 
| 347 367 | 
             
            == Development
         | 
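
As a rough illustration of the long-form reference patterns documented in the README hunk above, the sketch below splits a reference into its parts with plain regular expressions. It is not the gem's `id_parser.rb`. The `parse_outcome_ref` helper, the pattern constant names, and the sample references are assumptions made up for this example, and the sketch handles only English type names and single-word group names.

```ruby
# Hypothetical sketch; none of these names come from relaton-bipm.
TYPES = %w[Resolution Recommendation Decision Meeting Declaration].freeze
TYPE_RE = Regexp.union(TYPES)

# {group name} [--] {type} {number} ({year}); number and year are optional.
LONG_PATTERN = /\A(?<group>\S+)\s+(?:--\s+)?(?<type>#{TYPE_RE})(?:\s+(?<number>\d+(?:-\d+)?))?(?:\s+\((?<year>\d{4})\))?\z/

# {group name} {type} {year}-{zero_leading_number}
YEAR_FIRST_PATTERN = /\A(?<group>\S+)\s+(?<type>#{TYPE_RE})\s+(?<year>\d{4})-(?<number>\d{2,})\z/

# Returns a hash of reference parts, or nil when nothing matches.
def parse_outcome_ref(ref)
  # Try the year-first form first: "2012-01" would otherwise be read
  # as a number with a part by LONG_PATTERN.
  if (m = YEAR_FIRST_PATTERN.match(ref))
    number = m[:number].sub(/\A0+/, "") # "01" -> "1", "00" -> ""
    return { group: m[:group], type: m[:type], year: m[:year],
             number: number.empty? ? nil : number }
  end

  m = LONG_PATTERN.match(ref)
  return nil unless m

  { group: m[:group], type: m[:type], year: m[:year], number: m[:number] }
end

p parse_outcome_ref("CGPM -- Resolution 1 (1889)")
p parse_outcome_ref("CIPM Decision 2012-01")
```

Checking the year-first form before the general one is a design choice of this sketch: it keeps `{year}-{zero_leading_number}` from being mistaken for a number with a part such as `1-2`.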
    
        data/grammars/basicdoc.rng
    CHANGED
    
    
    
        data/grammars/biblio.rng
    CHANGED
    
    | @@ -216,6 +216,9 @@ | |
| 216 216 | 
             
                  <optional>
         | 
| 217 217 | 
             
                    <ref name="fullname"/>
         | 
| 218 218 | 
             
                  </optional>
         | 
| 219 | 
            +
                  <zeroOrMore>
         | 
| 220 | 
            +
                    <ref name="credential"/>
         | 
| 221 | 
            +
                  </zeroOrMore>
         | 
| 219 222 | 
             
                  <zeroOrMore>
         | 
| 220 223 | 
             
                    <ref name="affiliation"/>
         | 
| 221 224 | 
             
                  </zeroOrMore>
         | 
| @@ -232,6 +235,11 @@ | |
| 232 235 | 
             
                  <ref name="FullNameType"/>
         | 
| 233 236 | 
             
                </element>
         | 
| 234 237 | 
             
              </define>
         | 
| 238 | 
            +
              <define name="credential">
         | 
| 239 | 
            +
                <element name="credential">
         | 
| 240 | 
            +
                  <text/>
         | 
| 241 | 
            +
                </element>
         | 
| 242 | 
            +
              </define>
         | 
| 235 243 | 
             
              <define name="FullNameType">
         | 
| 236 244 | 
             
                <choice>
         | 
| 237 245 | 
             
                  <group>
         | 
| @@ -305,7 +313,9 @@ | |
| 305 313 | 
             
                  <zeroOrMore>
         | 
| 306 314 | 
             
                    <ref name="affiliationdescription"/>
         | 
| 307 315 | 
             
                  </zeroOrMore>
         | 
| 308 | 
            -
                  < | 
| 316 | 
            +
                  <optional>
         | 
| 317 | 
            +
                    <ref name="organization"/>
         | 
| 318 | 
            +
                  </optional>
         | 
| 309 319 | 
             
                </element>
         | 
| 310 320 | 
             
              </define>
         | 
| 311 321 | 
             
              <define name="affiliationname">
         | 
| @@ -1316,7 +1326,7 @@ | |
| 1316 1326 | 
             
                  <value>commentaryOf</value>
         | 
| 1317 1327 | 
             
                  <value>hasCommentary</value>
         | 
| 1318 1328 | 
             
                  <value>related</value>
         | 
| 1319 | 
            -
                  <value> | 
| 1329 | 
            +
                  <value>hasComplement</value>
         | 
| 1320 1330 | 
             
                  <value>complementOf</value>
         | 
| 1321 1331 | 
             
                  <value>obsoletes</value>
         | 
| 1322 1332 | 
             
                  <value>obsoletedBy</value>
         | 
    
        data/lib/relaton_bipm/bipm_bibliography.rb
    CHANGED
    
| @@ -3,14 +3,6 @@ require "mechanize" | |
| 3 3 | 
             
            module RelatonBipm
         | 
| 4 4 | 
             
              class BipmBibliography
         | 
| 5 5 | 
             
                GH_ENDPOINT = "https://raw.githubusercontent.com/relaton/relaton-data-bipm/master/".freeze
         | 
| 6 | 
            -
                IOP_DOMAIN = "https://iopscience.iop.org".freeze
         | 
| 7 | 
            -
                TRANSLATIONS = {
         | 
| 8 | 
            -
                  "Déclaration" => "Declaration",
         | 
| 9 | 
            -
                  "Réunion" => "Meeting",
         | 
| 10 | 
            -
                  "Recommandation" => "Recommendation",
         | 
| 11 | 
            -
                  "Résolution" => "Resolution",
         | 
| 12 | 
            -
                  "Décision" => "Decision",
         | 
| 13 | 
            -
                }.freeze
         | 
| 14 6 |  | 
| 15 7 | 
             
                class << self
         | 
| 16 8 | 
             
                  # @param text [String]
         | 
| @@ -18,14 +10,13 @@ module RelatonBipm | |
| 18 10 | 
             
                  def search(text, _year = nil, _opts = {}) # rubocop:disable Metrics/AbcSize, Metrics/MethodLength
         | 
| 19 11 | 
             
                    warn "[relaton-bipm] (\"#{text}\") fetching..."
         | 
| 20 12 | 
             
                    ref = text.sub(/^BIPM\s/, "")
         | 
| 21 | 
            -
                    item =  | 
| 13 | 
            +
                    item = get_bipm(ref, magent)
         | 
| 22 14 | 
             
                    unless item
         | 
| 23 15 | 
             
                      warn "[relaton-bipm] (\"#{text}\") not found."
         | 
| 24 16 | 
             
                      return
         | 
| 25 17 | 
             
                    end
         | 
| 26 18 |  | 
| 27 19 | 
             
                    warn("[relaton-bipm] (\"#{text}\") found #{item.docidentifier[0].id}")
         | 
| 28 | 
            -
                    item.fetched = Date.today.to_s
         | 
| 29 20 | 
             
                    item
         | 
| 30 21 | 
             
                  rescue Mechanize::ResponseCodeError => e
         | 
| 31 22 | 
             
                    raise RelatonBib::RequestError, e.message unless e.response_code == "404"
         | 
| @@ -48,295 +39,28 @@ module RelatonBipm | |
| 48 39 | 
             
                    a
         | 
| 49 40 | 
             
                  end
         | 
| 50 41 |  | 
| 51 | 
            -
                  # @param  | 
| 42 | 
            +
                  # @param reference [String]
         | 
| 52 43 | 
             
                  # @param agent [Mechanize]
         | 
| 53 44 | 
             
                  # @return [RelatonBipm::BipmBibliographicItem]
         | 
| 54 | 
            -
                  def get_bipm( | 
| 55 | 
            -
                     | 
| 56 | 
            -
                     | 
| 57 | 
            -
                     | 
| 58 | 
            -
                     | 
| 59 | 
            -
                    # TRANSLATIONS.each { |fr, en| rf.sub! fr, en }
         | 
| 60 | 
            -
                    path = Index.new.search ref
         | 
| 61 | 
            -
                    return unless path
         | 
| 45 | 
            +
                  def get_bipm(reference, agent) # rubocop:disable Metrics/AbcSize, Metrics/MethodLength
         | 
| 46 | 
            +
                    ref_id = Id.new reference
         | 
| 47 | 
            +
                    index = Relaton::Index.find_or_create :BIPM, url: "#{GH_ENDPOINT}index2.zip"
         | 
| 48 | 
            +
                    rows = index.search { |r| ref_id == r[:id] }
         | 
| 49 | 
            +
                    return unless rows.any?
         | 
| 62 50 |  | 
| 63 | 
            -
                    url = "#{GH_ENDPOINT}#{ | 
| 51 | 
            +
                    url = "#{GH_ENDPOINT}#{rows.first[:file]}"
         | 
| 64 52 | 
             
                    resp = agent.get url
         | 
| 65 | 
            -
                    check_response resp
         | 
| 66 53 | 
             
                    return unless resp.code == "200"
         | 
| 67 54 |  | 
| 68 55 | 
             
                    yaml = RelatonBib.parse_yaml resp.body, [Date]
         | 
| 69 | 
            -
                     | 
| 56 | 
            +
                    yaml["fetched"] = Date.today.to_s
         | 
| 70 57 | 
             
                    bib_hash = HashConverter.hash_to_bib yaml
         | 
| 71 58 | 
             
                    BipmBibliographicItem.new(**bib_hash)
         | 
| 72 59 | 
             
                  end
         | 
| 73 60 |  | 
| 74 | 
            -
                  #  | 
| 75 | 
            -
                  #  | 
| 76 | 
            -
                  #  | 
| 77 | 
            -
                  def get_metrologia(ref, agent)
         | 
| 78 | 
            -
                    agent.redirect_ok = false
         | 
| 79 | 
            -
                    ref_arr = ref.split
         | 
| 80 | 
            -
                    case ref_arr.size
         | 
| 81 | 
            -
                    when 1 then get_journal agent
         | 
| 82 | 
            -
                    when 2 then get_volume ref_arr[1], agent
         | 
| 83 | 
            -
                    when 3 then get_issue(*ref_arr[1..2], agent)
         | 
| 84 | 
            -
                    when 4 then get_article_from_issue(*ref_arr[1..3], agent)
         | 
| 85 | 
            -
                    end
         | 
| 86 | 
            -
                  end
         | 
| 87 | 
            -
             | 
| 88 | 
            -
                  # @param agent [Mechanize]
         | 
| 89 | 
            -
                  # @return [RelatonBipm::BipmBibliographicItem]
         | 
| 90 | 
            -
                  def get_journal(agent)
         | 
| 91 | 
            -
                    url = "#{IOP_DOMAIN}/journal/0026-1394"
         | 
| 92 | 
            -
                    rsp = agent.get url
         | 
| 93 | 
            -
                    check_response rsp
         | 
| 94 | 
            -
                    rel = rsp.xpath('//select[@id="allVolumesSelector"]/option').map do |v|
         | 
| 95 | 
            -
                      { type: "partOf", bibitem: journal_rel(v) }
         | 
| 96 | 
            -
                    end
         | 
| 97 | 
            -
                    did = doc_id []
         | 
| 98 | 
            -
                    bibitem(formattedref: fref(did.id), docid: [did], link: blink(url), relation: rel)
         | 
| 99 | 
            -
                  end
         | 
| 100 | 
            -
             | 
| 101 | 
            -
                  # @param elm [Nokogiri::XML::Element]
         | 
| 102 | 
            -
                  def journal_rel(elm)
         | 
| 103 | 
            -
                    vol = elm[:value].split("/").last
         | 
| 104 | 
            -
                    did = doc_id [vol]
         | 
| 105 | 
            -
                    url = IOP_DOMAIN + elm[:value]
         | 
| 106 | 
            -
                    BipmBibliographicItem.new(formattedref: fref(did.id), docid: [did], link: blink(url))
         | 
| 107 | 
            -
                  end
         | 
| 108 | 
            -
             | 
| 109 | 
            -
                  # @param vol [String]
         | 
| 110 | 
            -
                  # @param agent [Mechanize]
         | 
| 111 | 
            -
                  # @return [RelatonBipm::BipmBibliographicItem]
         | 
| 112 | 
            -
                  def get_volume(vol, agent)
         | 
| 113 | 
            -
                    url = "#{IOP_DOMAIN}/volume/0026-1394/#{vol}"
         | 
| 114 | 
            -
                    rsp = agent.get url
         | 
| 115 | 
            -
                    check_response rsp
         | 
| 116 | 
            -
                    rel = rsp.xpath('//li[@itemprop="hasPart"]').map do |i|
         | 
| 117 | 
            -
                      { type: "partOf", bibitem: volume_rel(i, vol) }
         | 
| 118 | 
            -
                    end
         | 
| 119 | 
            -
                    did = doc_id [vol]
         | 
| 120 | 
            -
                    bibitem(formattedref: fref(did.id), docid: [did], link: blink(url), date: bdate(rsp), relation: rel,
         | 
| 121 | 
            -
                            extent: btextent(vol), series: series)
         | 
| 122 | 
            -
                  end
         | 
| 123 | 
            -
             | 
| 124 | 
            -
                  def volume_rel(elm, vol) # rubocop:disable Metrics/AbcSize
         | 
| 125 | 
            -
                    a = elm.at 'a[@itemprop="issueNumber"]'
         | 
| 126 | 
            -
                    ish = a[:href].split("/").last
         | 
| 127 | 
            -
                    url = IOP_DOMAIN + a[:href]
         | 
| 128 | 
            -
                    docid = doc_id [vol, ish]
         | 
| 129 | 
            -
                    t = elm.at "p"
         | 
| 130 | 
            -
                    title_fref = t ? { title: titles(t.text) } : { formattedref: fref(docid.id) }
         | 
| 131 | 
            -
                    BipmBibliographicItem.new(**title_fref, docid: [docid], link: blink(url))
         | 
| 132 | 
            -
                  end
         | 
| 133 | 
            -
             | 
| 134 | 
            -
                  # @param title [String]
         | 
| 135 | 
            -
                  # @return [RelatonBib::TypedTitleStringCollection]
         | 
| 136 | 
            -
                  def titles(title)
         | 
| 137 | 
            -
                    RelatonBib::TypedTitleString.from_string title, "en", "Latn", "text/html"
         | 
| 138 | 
            -
                  end
         | 
| 139 | 
            -
             | 
| 140 | 
            -
                  # @param vol [String]
         | 
| 141 | 
            -
                  # @param ish [String]
         | 
| 142 | 
            -
                  # @param agent [Mechanize]
         | 
| 143 | 
            -
                  # @return [RelatonBipm::BipmBibliographicItem]
         | 
| 144 | 
            -
                  def get_issue(vol, ish, agent) # rubocop:disable Metrics/AbcSize, Metrics/MethodLength
         | 
| 145 | 
            -
                    url = issue_url vol, ish
         | 
| 146 | 
            -
                    rsp = agent.get url
         | 
| 147 | 
            -
                    check_response rsp
         | 
| 148 | 
            -
                    rel = rsp.xpath('//div[@class="art-list-item-body"]').map do |a|
         | 
| 149 | 
            -
                      { type: "partOf", bibitem: issue_rel(a, vol, ish) }
         | 
| 150 | 
            -
                    end
         | 
| 151 | 
            -
                    did = doc_id [vol, ish]
         | 
| 152 | 
            -
                    title_fref = { title: issue_title(rsp) }
         | 
| 153 | 
            -
                    title_fref[:formattedref] = fref did.id unless title_fref[:title].any?
         | 
| 154 | 
            -
                    bibitem(**title_fref, link: blink(url), relation: rel, docid: [did],
         | 
| 155 | 
            -
                                          date: bdate(rsp), extent: btextent(vol, ish), series: series)
         | 
| 156 | 
            -
                  end
         | 
| 157 | 
            -
             | 
| 158 | 
            -
                  # @param ref [String]
         | 
| 159 | 
            -
                  # @return [RelatonBib::FormattedRef]
         | 
| 160 | 
            -
                  def fref(ref)
         | 
| 161 | 
            -
                    RelatonBib::FormattedRef.new content: ref, language: "en", script: "Latn"
         | 
| 162 | 
            -
                  end
         | 
| 163 | 
            -
             | 
| 164 | 
            -
                  # @param rsp [Mechanize::Page]
         | 
| 165 | 
            -
                  # @return [RelatonBib::TypedTitleStringCollection]
         | 
| 166 | 
            -
                  def issue_title(rsp)
         | 
| 167 | 
            -
                    t = rsp.at('//div[@id="wd-jnl-issue-title"]/h4')
         | 
| 168 | 
            -
                    return RelatonBib::TypedTitleStringCollection.new [] unless t
         | 
| 169 | 
            -
             | 
| 170 | 
            -
                    titles(t.text)
         | 
| 171 | 
            -
                  end
         | 
| 172 | 
            -
             | 
| 173 | 
            -
                  # @oaran vol [String]
         | 
| 174 | 
            -
                  # @param ish [String]
         | 
| 175 | 
            -
                  # @return [String]
         | 
| 176 | 
            -
                  def issue_url(vol, ish)
         | 
| 177 | 
            -
                    "#{IOP_DOMAIN}/issue/0026-1394/#{vol}/#{ish}"
         | 
| 178 | 
            -
                  end
         | 
| 179 | 
            -
             | 
| 180 | 
            -
                  # @param elm [Nokogiri::XML::Element]
         | 
| 181 | 
            -
                  # @param vol [String]
         | 
| 182 | 
            -
                  # @param ish [String]
         | 
| 183 | 
            -
                  # @return [RelatonBipm::BipmBibliographicItem]
         | 
| 184 | 
            -
                  def issue_rel(elm, vol, ish)
         | 
| 185 | 
            -
                    art = elm.at('div[@class="indexer"]').text
         | 
| 186 | 
            -
                    ref = elm.at('div/a[@class="art-list-item-title"]')
         | 
| 187 | 
            -
                    title = titles ref.text.strip
         | 
| 188 | 
            -
                    docid = doc_id [vol, ish, art]
         | 
| 189 | 
            -
                    link = blink IOP_DOMAIN + ref[:href]
         | 
| 190 | 
            -
                    BipmBibliographicItem.new(title: title, docid: [docid], link: link)
         | 
| 191 | 
            -
                  end
         | 
| 192 | 
            -
             | 
| 193 | 
            -
                  # @param content [RelatonBib::TypedTitleString]
         | 
| 194 | 
            -
                  # @return [RelatonBib::TypedTitleString]
         | 
| 195 | 
            -
                  def btitle(content)
         | 
| 196 | 
            -
                    RelatonBib::TypedTitleString.new type: "main", content: content, language: "en", script: "Latn"
         | 
| 197 | 
            -
                  end
         | 
| 198 | 
            -
             | 
| 199 | 
            -
                  # @param url [String]
         | 
| 200 | 
            -
                  # @return [String]
         | 
| 201 | 
            -
                  def blink(url)
         | 
| 202 | 
            -
                    [RelatonBib::TypedUri.new(type: "src", content: url)]
         | 
| 203 | 
            -
                  end
         | 
| 204 | 
            -
             | 
| 205 | 
            -
                  # @param rsp [Mechanize::Page]
         | 
| 206 | 
            -
                  # @return [Array<RelatonBib::BibliographicDate>]
         | 
| 207 | 
            -
                  def bdate(rsp)
         | 
| 208 | 
            -
                    date = rsp.at('//p[@itemprop="issueNumber"]|//h2[@itemprop="volumeNumber"]').text.split(", ").last
         | 
| 209 | 
            -
                    on = date.match?(/^\d{4}$/) ? date : Date.parse(date).strftime("%Y-%m")
         | 
| 210 | 
            -
                    [RelatonBib::BibliographicDate.new(type: "published", on: on)]
         | 
| 211 | 
            -
                  end
         | 
| 212 | 
            -
             | 
| 213 | 
            -
                  # @param args [Array<String>]
         | 
| 214 | 
            -
                  # @return [RelatonBib::DocumentIdentifier]
         | 
| 215 | 
            -
                  def doc_id(args)
         | 
| 216 | 
            -
                    id = args.clone.unshift "Metrologia"
         | 
| 217 | 
            -
                    RelatonBib::DocumentIdentifier.new(type: "BIPM", id: id.join(" "), primary: true)
         | 
| 218 | 
            -
                  end
         | 
| 219 | 
            -
             | 
| 220 | 
            -
                  # @param vol [String]
         | 
| 221 | 
            -
                  # @param ish [String]
         | 
| 222 | 
            -
                  # @param art [String]
         | 
| 223 | 
            -
                  # @param agent [Mechanize]
         | 
| 224 | 
            -
                  # @return [RelatonBipm::BipmBibliographicItem]
         | 
| 225 | 
            -
                  def get_article_from_issue(vol, ish, art, agent) # rubocop:disable Metrics/MethodLength
         | 
| 226 | 
            -
                    url = issue_url vol, ish
         | 
| 227 | 
            -
                    rsp = agent.get url
         | 
| 228 | 
            -
                    check_response rsp
         | 
| 229 | 
            -
                    link = rsp.at("//div[@class='indexer'][.='#{art}']/../div/a")
         | 
| 230 | 
            -
                    unless link
         | 
| 231 | 
            -
                      arts = rsp.xpath("//div[@class='indexer']").map(&:text)
         | 
| 232 | 
            -
                      warn "[relaton-bipm] No article is available at the specified start page \"#{art}\" in issue \"BIPM Metrologia #{vol} #{ish}\"."
         | 
| 233 | 
            -
                      warn "[relaton-bipm] Available articles in the issue start at the following pages: (#{arts.join(', ')})"
         | 
| 234 | 
            -
                      return
         | 
| 235 | 
            -
                    end
         | 
| 236 | 
            -
             | 
| 237 | 
            -
                    get_article link[:href], vol, ish, agent
         | 
| 238 | 
            -
                  end
         | 
| 239 | 
            -
             | 
| 240 | 
            -
                  # @param path [String]
         | 
| 241 | 
            -
                  # @param vol [String]
         | 
| 242 | 
            -
                  # @param ish [String]
         | 
| 243 | 
            -
                  # @param agent [Mechanize]
         | 
| 244 | 
            -
                  # @return [RelatonBipm::BipmBibliographicItem]
         | 
| 245 | 
            -
                  def get_article(path, vol, ish, agent) # rubocop:disable Metrics/AbcSize, Metrics/MethodLength
         | 
| 246 | 
            -
                    agent.agent.allowed_error_codes = [403]
         | 
| 247 | 
            -
                    rsp = agent.get path
         | 
| 248 | 
            -
                    check_response rsp
         | 
| 249 | 
            -
                    title = rsp.at("//h1[@itemprop='headline']").children.to_xml
         | 
| 250 | 
            -
                    url = rsp.uri
         | 
| 251 | 
            -
                    bib = rsp.link_with(text: "BibTeX").href
         | 
| 252 | 
            -
                    rsp = agent.get bib
         | 
| 253 | 
            -
                    check_response rsp
         | 
| 254 | 
            -
                    bt = BibTeX.parse(rsp.body).first
         | 
| 255 | 
            -
                    bibitem(
         | 
| 256 | 
            -
                      docid: btdocid(bt), title: titles(title), date: btdate(bt),
         | 
| 257 | 
            -
                      abstract: btabstract(bt), doctype: bt.type.to_s, series: series,
         | 
| 258 | 
            -
                      link: btlink(bt, url), contributor: btcontrib(bt),
         | 
| 259 | 
            -
                      extent: btextent(vol, ish, bt.pages.to_s)
         | 
| 260 | 
            -
                    )
         | 
| 261 | 
            -
                  end
         | 
| 262 | 
            -
             | 
| 263 | 
            -
                  # @param args [Hash]
         | 
| 264 | 
            -
                  # @return [RelatonBipm::BipmBibliographicItem]
         | 
| 265 | 
            -
                  def bibitem(**args)
         | 
| 266 | 
            -
                    BipmBibliographicItem.new(
         | 
| 267 | 
            -
                      type: "article", language: ["en"], script: ["Latn"], **args,
         | 
| 268 | 
            -
                    )
         | 
| 269 | 
            -
                  end
         | 
| 270 | 
            -
             | 
| 271 | 
            -
                  # @return [Array<RelatonBib::Series>]
         | 
| 272 | 
            -
                  def series
         | 
| 273 | 
            -
                    [RelatonBib::Series.new(title: btitle("Metrologia"))]
         | 
| 274 | 
            -
                  end
         | 
| 275 | 
            -
             | 
| 276 | 
            -
                  # @param bibtex [BibTeX::Entry]
         | 
| 277 | 
            -
                  # @return [Array<RelatonBib::DocumentIdentifier>]
         | 
| 278 | 
            -
                  def btdocid(bibtex)
         | 
| 279 | 
            -
                    id = "#{bibtex.journal} #{bibtex.volume} #{bibtex.number} #{bibtex.pages.match(/^\d+/)}"
         | 
| 280 | 
            -
                    [
         | 
| 281 | 
            -
                      RelatonBib::DocumentIdentifier.new(type: "BIPM", id: id, primary: true),
         | 
| 282 | 
            -
                      RelatonBib::DocumentIdentifier.new(type: "DOI", id: bibtex.doi),
         | 
| 283 | 
            -
                    ]
         | 
| 284 | 
            -
                  end
         | 
| 285 | 
            -
             | 
| 286 | 
            -
                  # @param bibtex [BibTeX::Entry]
         | 
| 287 | 
            -
                  # @return [Array<RelatonBib::FormattedString>]
         | 
| 288 | 
            -
                  def btabstract(bibtex)
         | 
| 289 | 
            -
                    [RelatonBib::FormattedString.new(content: bibtex.abstract.to_s, language: "en", script: "Latn")]
         | 
| 290 | 
            -
                  end
         | 
| 291 | 
            -
             | 
| 292 | 
            -
                  # @param bibtex [BibTeX::Entry]
         | 
| 293 | 
            -
                  # @param ref [URI]
         | 
| 294 | 
            -
                  # @return [Array<RelatonBib::TypedUri>]
         | 
| 295 | 
            -
                  def btlink(bibtex, ref)
         | 
| 296 | 
            -
                    [
         | 
| 297 | 
            -
                      RelatonBib::TypedUri.new(type: "src", content: ref.to_s),
         | 
| 298 | 
            -
                      RelatonBib::TypedUri.new(type: "doi", content: bibtex.url.to_s),
         | 
| 299 | 
            -
                    ]
         | 
| 300 | 
            -
                  end
         | 
| 301 | 
            -
             | 
| 302 | 
            -
                  # @param bibtex [BibTeX::Entry]
         | 
| 303 | 
            -
                  # @return [Array<RelatonBib::BibliographicDate>]
         | 
| 304 | 
            -
                  def btdate(bibtex)
         | 
| 305 | 
            -
                    on = Date.new(bibtex.year.to_i, bibtex.month_numeric)
         | 
| 306 | 
            -
                    [RelatonBib::BibliographicDate.new(type: "published", on: on)]
         | 
| 307 | 
            -
                  end
         | 
| 308 | 
            -
             | 
| 309 | 
            -
                  # @param bibtex [BibTeX::Entry]
         | 
| 310 | 
            -
                  # @return [Array<Hash>]
         | 
| 311 | 
            -
                  def btcontrib(bibtex) # rubocop:disable Metrics/MethodLength, Metrics/AbcSize
         | 
| 312 | 
            -
                    contribs = []
         | 
| 313 | 
            -
                    if bibtex.publisher && !bibtex.publisher.empty?
         | 
| 314 | 
            -
                      org = RelatonBib::Organization.new name: bibtex.publisher.to_s
         | 
| 315 | 
            -
                      contribs << { entity: org, role: [{ type: "publisher" }] }
         | 
| 316 | 
            -
                    end
         | 
| 317 | 
            -
                    return contribs unless bibtex.author && !bibtex.author.empty?
         | 
| 318 | 
            -
             | 
| 319 | 
            -
                    bibtex.author.split(" and ").inject(contribs) do |mem, name|
         | 
| 320 | 
            -
                      cname = RelatonBib::LocalizedString.new name, "en", "Latn"
         | 
| 321 | 
            -
                      name = RelatonBib::FullName.new completename: cname
         | 
| 322 | 
            -
                      author = RelatonBib::Person.new name: name
         | 
| 323 | 
            -
                      mem << { entity: author, role: [{ type: "author" }] }
         | 
| 324 | 
            -
                    end
         | 
| 325 | 
            -
                  end
         | 
| 326 | 
            -
             | 
| 327 | 
            -
                  #
         | 
| 328 | 
            -
                  # @param vol [String] volume
         | 
| 329 | 
            -
                  # @param ish [String] issue
         | 
| 330 | 
            -
                  # @param pgs [String] pages
         | 
| 331 | 
            -
                  #
         | 
| 332 | 
            -
                  # @return [Array<RelatonBib::BibItemLocality>]
         | 
| 333 | 
            -
                  #
         | 
| 334 | 
            -
                  def btextent(vol, ish = nil, pgs = nil)
         | 
| 335 | 
            -
                    ext = [RelatonBib::Locality.new("volume", vol)]
         | 
| 336 | 
            -
                    ext << RelatonBib::Locality.new("issue", ish) if ish
         | 
| 337 | 
            -
                    ext << RelatonBib::Locality.new("page", *pgs.split("--")) if pgs
         | 
| 338 | 
            -
                    ext
         | 
| 339 | 
            -
                  end
         | 
| 61 | 
            +
                  # def match_item(ids, ref_id)
         | 
| 62 | 
            +
                  #   ids.find { |id| Id.new(id) == ref_id }
         | 
| 63 | 
            +
                  # end
         | 
| 340 64 |  | 
| 341 65 | 
             
                  # @param ref [String] the BIPM standard Code to look up (e.g. "BIPM B-11")
         | 
| 342 66 | 
             
                  # @param year [String] not used
         | 
| @@ -345,28 +69,6 @@ module RelatonBipm | |
| 345 69 | 
             
                  def get(ref, year = nil, opts = {})
         | 
| 346 70 | 
             
                    search(ref, year, opts)
         | 
| 347 71 | 
             
                  end
         | 
| 348 | 
            -
             | 
| 349 | 
            -
                  private
         | 
| 350 | 
            -
             | 
| 351 | 
            -
                  #
         | 
| 352 | 
            -
                  # Check HTTP response. Warn and raise an error if the response is not 200
         | 
| 353 | 
            -
                  #   or redirect to CAPTCHA.
         | 
| 354 | 
            -
                  #
         | 
| 355 | 
            -
                  # @param [Mechanize] rsp response
         | 
| 356 | 
            -
                  #
         | 
| 357 | 
            -
                  # @raise [RelatonBib::RequestError] if response is not 200
         | 
| 358 | 
            -
                  #
         | 
| 359 | 
            -
                  def check_response(rsp) # rubocop:disable Metrics/AbcSize
         | 
| 360 | 
            -
                    if rsp.code == "302"
         | 
| 361 | 
            -
                      warn "[relaton-bipm] This source employs anti-DDoS measures that unfortunately affects automated requests."
         | 
| 362 | 
            -
                      warn "[relaton-bipm] Please visit this link in your browser to resolve the CAPTCHA, then retry: #{rsp.uri}"
         | 
| 363 | 
            -
                      # warn "[relaton-bipm] #{rsp.uri} is redirected to #{rsp.header['location']}"
         | 
| 364 | 
            -
                      raise RelatonBib::RequestError, "cannot access #{rsp.uri}"
         | 
| 365 | 
            -
                    elsif rsp.code != "200" && rsp.code != "403"
         | 
| 366 | 
            -
                      warn "[read_bipm] can't acces #{rsp.uri} #{rsp.code}"
         | 
| 367 | 
            -
                      raise RelatonBib::RequestError, "cannot acces #{rsp.uri} #{rsp.code}"
         | 
| 368 | 
            -
                    end
         | 
| 369 | 
            -
                  end
         | 
| 370 72 | 
             
                end
         | 
| 371 73 | 
             
              end
         | 
| 372 74 | 
             
            end
         | 
| @@ -6,7 +6,7 @@ module RelatonBipm | |
| 6 6 | 
             
                # @param [RelatonBipm::DataFetcher] data_fetcher data fetcher
         | 
| 7 7 | 
             
                #
         | 
| 8 8 | 
             
                def initialize(data_fetcher)
         | 
| 9 | 
            -
                  @data_fetcher = data_fetcher
         | 
| 9 | 
            +
                  @data_fetcher = WeakRef.new data_fetcher
         | 
| 10 10 | 
             
                end
         | 
| 11 11 |  | 
| 12 12 | 
             
                #
         | 
| @@ -27,14 +27,18 @@ module RelatonBipm | |
| 27 27 | 
             
                  # puts "Ls #{Dir['bipm-si-brochure/*']}"
         | 
| 28 28 | 
             
                  # puts "Ls #{Dir['bipm-si-brochure/site/*']}"
         | 
| 29 29 | 
             
                  # puts "Ls #{Dir['bipm-si-brochure/site/documents/*']}"
         | 
| 30 | 
            -
                  Dir["bipm-si-brochure/ | 
| 30 | 
            +
                  Dir["bipm-si-brochure/_site/documents/*.rxl"].each do |f|
         | 
| 31 31 | 
             
                    puts "Parsing #{f}"
         | 
| 32 32 | 
             
                    docstd = Nokogiri::XML File.read f
         | 
| 33 33 | 
             
                    doc = docstd.at "/bibdata"
         | 
| 34 34 | 
             
                    hash1 = RelatonBipm::XMLParser.from_xml(doc.to_xml).to_hash
         | 
| 35 35 | 
             
                    fix_si_brochure_id hash1
         | 
| 36 | 
            -
                     | 
| 37 | 
            -
                     | 
| 36 | 
            +
                    basename = File.join @data_fetcher.output, File.basename(f).sub(/(?:-(?:en|fr))?\.rxl$/, "")
         | 
| 37 | 
            +
                    outfile = "#{basename}.#{@data_fetcher.ext}"
         | 
| 38 | 
            +
                    key = hash1["docnumber"] || basename
         | 
| 39 | 
            +
                    @data_fetcher.index[[key]] = outfile
         | 
| 40 | 
            +
                    @data_fetcher.index_new.add_or_update [key], outfile
         | 
| 41 | 
            +
                    @data_fetcher.index2.add_or_update Id.new(key).normalized_hash, outfile
         | 
| 38 42 | 
             
                    hash = if File.exist? outfile
         | 
| 39 43 | 
             
                             warn_duplicate = false
         | 
| 40 44 | 
             
                             hash2 = YAML.load_file outfile
         | 
| @@ -1,6 +1,6 @@ | |
| 1 1 | 
             
            module RelatonBipm
         | 
| 2 2 | 
             
              class DataFetcher
         | 
| 3 | 
            -
                attr_reader :output, :format, :ext, :files, :index
         | 
| 3 | 
            +
                attr_reader :output, :format, :ext, :files, :index, :index_new, :index2
         | 
| 4 4 |  | 
| 5 5 | 
             
                #
         | 
| 6 6 | 
             
                # Initialize fetcher
         | 
| @@ -15,6 +15,8 @@ module RelatonBipm | |
| 15 15 | 
             
                  @files = []
         | 
| 16 16 | 
             
                  @index_path = "index.yaml"
         | 
| 17 17 | 
             
                  @index = File.exist?(@index_path) ? YAML.load_file(@index_path) : {}
         | 
| 18 | 
            +
                  @index_new = Relaton::Index.find_or_create :BIPM, file: "index-bipm.yaml"
         | 
| 19 | 
            +
                  @index2 = Relaton::Index.find_or_create :BIPM, file: "index2.yaml"
         | 
| 18 20 | 
             
                end
         | 
| 19 21 |  | 
| 20 22 | 
             
                #
         | 
| @@ -43,8 +45,11 @@ module RelatonBipm | |
| 43 45 | 
             
                  case source
         | 
| 44 46 | 
             
                  when "bipm-data-outcomes" then DataOutcomesParser.parse(self)
         | 
| 45 47 | 
             
                  when "bipm-si-brochure" then BipmSiBrochureParser.parse(self)
         | 
| 48 | 
            +
                  when "rawdata-bipm-metrologia" then RawdataBipmMetrologia::Fetcher.fetch(self)
         | 
| 46 49 | 
             
                  end
         | 
| 47 | 
            -
                  File.write @index_path,  | 
| 50 | 
            +
                  File.write @index_path, index.to_yaml, encoding: "UTF-8"
         | 
| 51 | 
            +
                  index_new.save
         | 
| 52 | 
            +
                  index2.save
         | 
| 48 53 | 
             
                end
         | 
| 49 54 |  | 
| 50 55 | 
             
                #
         | 
| @@ -54,15 +59,22 @@ module RelatonBipm | |
| 54 59 | 
             
                # @param [RelatonBipm::BipmBibliographicItem] item document to save
         | 
| 55 60 | 
             
                # @param [Boolean, nil] warn_duplicate Warn if document already exists
         | 
| 56 61 | 
             
                #
         | 
| 57 | 
            -
                # @return [<Type>] <description>
         | 
| 58 | 
            -
                #
         | 
| 59 62 | 
             
                def write_file(path, item, warn_duplicate: true)
         | 
| 63 | 
            +
                  content = serialize item
         | 
| 60 64 | 
             
                  if @files.include?(path)
         | 
| 61 65 | 
             
                    warn "File #{path} already exists" if warn_duplicate
         | 
| 62 66 | 
             
                  else
         | 
| 63 67 | 
             
                    @files << path
         | 
| 64 68 | 
             
                  end
         | 
| 65 | 
            -
                  File.write path,  | 
| 69 | 
            +
                  File.write path, content, encoding: "UTF-8"
         | 
| 70 | 
            +
                end
         | 
| 71 | 
            +
             | 
| 72 | 
            +
                def serialize(item)
         | 
| 73 | 
            +
                  case @format
         | 
| 74 | 
            +
                  when "xml" then item.to_xml bibdata: true
         | 
| 75 | 
            +
                  when "yaml" then item.to_hash.to_yaml
         | 
| 76 | 
            +
                  when "bibxml" then item.to_bibxml
         | 
| 77 | 
            +
                  end
         | 
| 66 78 | 
             
                end
         | 
| 67 79 | 
             
              end
         | 
| 68 80 | 
             
            end
         |