relaton-iso 2.0.0.pre.alpha.2 → 2.0.0.pre.alpha.3
- checksums.yaml +4 -4
- data/.github/workflows/rake.yml +1 -2
- data/CLAUDE.md +33 -0
- data/README.adoc +2 -0
- data/Rakefile +22 -0
- data/lib/relaton/iso/bibliography.rb +83 -2
- data/lib/relaton/iso/hit_collection.rb +20 -5
- data/lib/relaton/iso/version.rb +1 -1
- data/relaton_iso.gemspec +1 -1
- metadata +4 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a4e9325119df51d5aea288a6d612d7353107a25ac3030715dcf910ffaea13c26
+  data.tar.gz: e8167236cafc25b54a3a4d8d4bcee86dca7326103aec873e3534bf896e9c7498
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9166abf7a46ec9604872543c9e572eebb03c34146635c156393cdd4e27c684a94e657daec4f55a889e928e38453fa1c690b6bc13549c7a67f80c6d7a9de447ae
+  data.tar.gz: 1a57e63a6faca62cf6ed1b6a6d0e346ae783f30d9d2bf48ef833cdeb6282f04b07e2af468183cedd0bf5367b12d71daeb9e5aad0574207c746d6c1cd6bcb0ca0
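The `checksums.yaml` entries are hex digests of the `metadata.gz` and `data.tar.gz` archives packed inside the `.gem` file. As a rough illustration of how such a value can be checked with Ruby's standard `digest` library, here is a minimal sketch. The helper name `digest_matches?` and the sample payload are hypothetical, and extracting the archives from a real gem is not shown.

```ruby
require "digest"

# Hypothetical helper: true when the SHA256 hex digest of `bytes`
# equals the recorded value.
def digest_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

payload = "example archive bytes"            # stand-in for the bytes of data.tar.gz
recorded = Digest::SHA256.hexdigest(payload) # stand-in for the checksums.yaml entry

puts digest_matches?(payload, recorded)       # prints true
puts digest_matches?("#{payload}x", recorded) # prints false
```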
data/.github/workflows/rake.yml
CHANGED
data/CLAUDE.md
ADDED
@@ -0,0 +1,33 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Development
+
+- `bin/setup` — install dependencies
+- `rake spec` — run tests
+- `rspec spec/relaton/iso/bibliography_spec.rb` — run a single test file
+- `rspec spec/relaton/iso/bibliography_spec.rb -e "some description"` — run a specific example
+- `rake spec:update_index` — download latest ISO index fixture (`spec/fixtures/index-v1.zip`) from relaton-data-iso
+- `bin/console` — interactive prompt with the gem loaded
+- `rubocop` — lint (Ribose OSS style guide, Ruby 3.2 target)
+
+## Architecture
+
+relaton-iso retrieves ISO standard bibliographic data. The core retrieval flow:
+
+1. **Bibliography** (`lib/relaton/iso/bibliography.rb`) — entry point via `search(pubid)` and `get(ref, year, opts)`. Handles year filtering, part matching, and type/stage validation.
+2. **HitCollection** (`lib/relaton/iso/hit_collection.rb`) — searches a pre-built YAML index (`index-v1.zip` from relaton-data-iso) using `Relaton::Index`. Matches on `id_keys`: publisher, number, copublisher, part, year, edition, type, stage, iteration. Returns sorted Hit array.
+3. **Hit** (`lib/relaton/iso/hit.rb`) — wraps an index result. The `item` attribute lazy-loads the full document from GitHub raw content (relaton-data-iso repo). `sort_weight` prioritizes published over withdrawn/deleted.
+4. **ItemData** / **Model::Item** — ISO-specific bibliographic item extending `Relaton::Bib::ItemData`.
+5. **Scraper** (`lib/relaton/iso/scraper.rb`) — parses ISO website pages for metadata (used by DataFetcher for bulk operations, not the normal lookup path).
+6. **DataFetcher** (`lib/relaton/iso/data_fetcher.rb`) — bulk fetches from ISO.org ICS pages using 3 threads with a persistent queue for resumability.
+
+Key dependency: `pubid-iso` gem handles ISO publication identifier parsing and comparison.
+
+## Testing
+
+- **Framework:** RSpec with VCR cassettes and WebMock
+- **Network access:** fully blocked via `WebMock.disable_net_connect!`
+- **Index fixture:** `spec/fixtures/index-v1.zip` is served by WebMock stub (configured in `spec/support/webmock.rb`). Run `rake spec:update_index` to refresh when upstream data changes.
+- **VCR:** cassettes in `spec/vcr_cassettes/`, record mode `:once`, re-record interval 7 days. Index download requests are ignored by VCR (handled by WebMock stub instead).
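The added CLAUDE.md says a Hit's `item` attribute lazy-loads the full document. The sketch below shows one common way to get that behaviour, memoizing the result of the first load. It is only an illustration: `LazyHit`, its `loads` counter and the block-based loader are hypothetical names, and the real `Relaton::Iso::Hit` may be implemented differently.

```ruby
# Hypothetical stand-in for a hit whose document is loaded on first access.
class LazyHit
  attr_reader :loads

  def initialize(row, &loader)
    @row = row
    @loader = loader # stands in for the fetch from the data repository
    @loads = 0
  end

  # Run the loader once, then return the cached result on later calls.
  def item
    @item ||= begin
      @loads += 1
      @loader.call(@row)
    end
  end
end

hit = LazyHit.new({ id: "ISO 9001" }) { |row| "document for #{row[:id]}" }
hit.item
hit.item
puts hit.loads # prints 1
```

Building a collection of hits therefore costs nothing until a caller actually asks for a document.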
data/README.adoc
CHANGED
@@ -360,6 +360,8 @@ RelatonIso uses the relaton-logger gem for logging. By default, it logs to STDOU

 After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.

+To update the ISO index test fixture (used by WebMock in tests), run `rake spec:update_index`. This downloads the latest `index-v1.zip` from the https://github.com/relaton/relaton-data-iso[relaton-data-iso] repository.
+
 To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).

data/Rakefile
CHANGED
@@ -4,3 +4,25 @@ require "rspec/core/rake_task"
 RSpec::Core::RakeTask.new(:spec)

 task :default => :spec
+
+namespace :spec do
+  desc "Download latest ISO index fixture from relaton-data-iso"
+  task :update_index do
+    require "net/http"
+    require "uri"
+
+    url = "https://raw.githubusercontent.com/relaton/relaton-data-iso/data-v2/index-v1.zip"
+    dest = File.join(__dir__, "spec", "fixtures", "index-v1.zip")
+
+    puts "Downloading #{url} ..."
+    uri = URI.parse(url)
+    response = Net::HTTP.get_response(uri)
+
+    if response.is_a?(Net::HTTPSuccess)
+      File.binwrite(dest, response.body)
+      puts "Updated #{dest} (#{response.body.bytesize} bytes)"
+    else
+      abort "Failed to download: HTTP #{response.code}"
+    end
+  end
+end
data/lib/relaton/iso/bibliography.rb
CHANGED
@@ -1,5 +1,7 @@
 # frozen_string_literal: true

+require "date"
+
 # require 'relaton_iso/iso_bibliographic_item'
 # require "relaton_iso/scrapper"
 # require "relaton_iso/hit_collection"
@@ -44,12 +46,19 @@ module Relaton

 hits, missed_year_ids = isobib_search_filter(query_pubid, opts)
 tip_ids = look_up_with_any_types_stages(hits, ref, opts)
-
+
+date_filter = opts[:publication_date_before] || opts[:publication_date_after]
+if date_filter && !query_pubid.root.all_parts
+  ret = find_match_by_date(hits, query_pubid, opts)
+else
+  ret = hits.fetch_doc(date_filter ? opts : {})
+end
 return fetch_ref_err(query_pubid, missed_year_ids, tip_ids) unless ret

 response_pubid = ret.docidentifier.find(&:primary) # .sub(" (all parts)", "")
 Util.info "Found: `#{response_pubid}`", key: query_pubid.to_s
-get_all = (query_pubid.root.year && opts[:keep_year].nil?) || opts[:keep_year] || opts[:all_parts]
+get_all = (query_pubid.root.year && opts[:keep_year].nil?) || opts[:keep_year] || opts[:all_parts] ||
+  opts[:publication_date_before] || opts[:publication_date_after]
 return ret if get_all

 ret.to_most_recent_reference
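The hunk above makes two decisions from the options: which lookup path runs, and whether the dated document is returned as-is rather than converted with `to_most_recent_reference`. The following sketch restates both as plain functions so the combinations are easier to see. The method names `lookup_path` and `keep_dated?` are hypothetical, `year` stands in for `query_pubid.root.year`, and `all_parts` for `query_pubid.root.all_parts`.

```ruby
require "date"

# Which path the lookup takes: per-hit date matching or the regular fetch_doc.
def lookup_path(opts, all_parts:)
  date_filter = opts[:publication_date_before] || opts[:publication_date_after]
  date_filter && !all_parts ? :find_match_by_date : :fetch_doc
end

# Mirrors the `get_all` expression: true means the dated document is kept.
def keep_dated?(year, opts)
  !!((year && opts[:keep_year].nil?) || opts[:keep_year] || opts[:all_parts] ||
     opts[:publication_date_before] || opts[:publication_date_after])
end

before = { publication_date_before: Date.new(2015, 1, 1) }
puts lookup_path(before, all_parts: false) # prints find_match_by_date
puts lookup_path(before, all_parts: true)  # prints fetch_doc
puts keep_dated?(nil, before)              # prints true
```

Under this reading, supplying either date option always yields the dated document, even when the reference itself carries no year.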
@@ -109,6 +118,78 @@ module Relaton

 private

+# Find the best match among hits using date filters.
+# @param hits [Relaton::Iso::HitCollection]
+# @param pubid [Pubid::Iso::Identifier]
+# @param opts [Hash]
+# @return [Relaton::Iso::ItemData, nil]
+def find_match_by_date(hits, pubid, opts) # rubocop:disable Metrics/AbcSize
+  candidates = []
+  hits.each { |h| candidates << h if year_in_range?(hit_year(h), opts) }
+  candidates.sort_by { |h| -hit_year(h) }.each do |h|
+    ret = fetch_and_check_date(h, pubid, opts)
+    return ret if ret
+  end
+  nil
+end
+
+# Extract year from a hit as an integer.
+# @param hit [Relaton::Iso::Hit]
+# @return [Integer]
+def hit_year(hit)
+  yr = hit.pubid&.year || hit.hit[:year]
+  yr.to_i
+end
+
+# Check if a year falls within the date filter range.
+# @param year [Integer]
+# @param opts [Hash]
+# @return [Boolean]
+def year_in_range?(year, opts)
+  return false if year.zero?
+
+  if opts[:publication_date_before]
+    return false if year > opts[:publication_date_before].year
+  end
+  if opts[:publication_date_after]
+    return false if year < opts[:publication_date_after].year
+  end
+  true
+end
+
+# Check if the item's published date falls within the filter range.
+# @param item [Relaton::Iso::ItemData]
+# @param opts [Hash]
+# @return [Boolean]
+def publication_date_in_range?(item, opts)
+  pub_date_entry = item.date.find { |d| d.type == "published" }
+  return true unless pub_date_entry&.at
+
+  pub_date = pub_date_entry.at.to_date
+  return true unless pub_date
+
+  if opts[:publication_date_before]
+    return false if pub_date >= opts[:publication_date_before]
+  end
+  if opts[:publication_date_after]
+    return false if pub_date < opts[:publication_date_after]
+  end
+  true
+end
+
+# Fetch the item for a hit and check if its publication date is in range.
+# @param hit [Relaton::Iso::Hit]
+# @param pubid [Pubid::Iso::Identifier]
+# @param opts [Hash]
+# @return [Relaton::Iso::ItemData, nil]
+def fetch_and_check_date(hit, pubid, opts)
+  ret = hit.item
+  if publication_date_in_range?(ret, opts)
+    Util.info "Found: `#{ret.docidentifier.first.content}`", key: pubid.to_s
+    ret
+  end
+end
+
 def check_year(year, hit) # rubocop:disable Metrics/AbcSize
   (hit.pubid.base.nil? && hit.pubid.year.to_s == year.to_s) ||
     (!hit.pubid.base.nil? && hit.pubid.base.year.to_s == year.to_s) ||
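Read together, the helpers added above filter in two stages: a coarse check on the year taken from the index entry, then an exact check on the published date once the item has been loaded. The bounds also differ: the year check is inclusive on both sides, while the exact check treats `publication_date_before` as exclusive and `publication_date_after` as inclusive. The sketch below copies that logic onto plain values to make the boundary cases visible. `date_in_range?` is a hypothetical simplification that takes a `Date` directly instead of an item.

```ruby
require "date"

# Coarse stage: compares only the year, inclusive on both bounds.
def year_in_range?(year, opts)
  return false if year.zero?
  return false if opts[:publication_date_before] && year > opts[:publication_date_before].year
  return false if opts[:publication_date_after] && year < opts[:publication_date_after].year

  true
end

# Exact stage: `before` is exclusive, `after` is inclusive.
def date_in_range?(pub_date, opts)
  return true unless pub_date # no published date recorded: the item is accepted
  return false if opts[:publication_date_before] && pub_date >= opts[:publication_date_before]
  return false if opts[:publication_date_after] && pub_date < opts[:publication_date_after]

  true
end

opts = { publication_date_before: Date.new(2014, 6, 1) }
puts year_in_range?(2014, opts)                 # prints true: the 2014 hit stays a candidate
puts date_in_range?(Date.new(2014, 9, 1), opts) # prints false: the exact date rejects it
```

The year stage needs no network access, so it narrows the candidates before any document is fetched.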
data/lib/relaton/iso/hit_collection.rb
CHANGED
@@ -31,7 +31,7 @@ module Relaton
 #
 def find # rubocop:disable Metrics/AbcSize
   @array = index.search do |row|
-    row[:id].is_a?(Hash) ? pubid_match?(row[:id]) : ref.to_s(with_prf: true) == row[:id]
+    row[:id].is_a?(Hash) || row[:id].is_a?(::Pubid::Core::Identifier::Base) ? pubid_match?(row[:id]) : ref.to_s(with_prf: true) == row[:id]
   end.map { |row| Hit.new row, self }
     .sort_by! { |h| h.pubid.to_s }
     .reverse!
@@ -49,6 +49,8 @@ module Relaton
 end

 def create_pubid(id)
+  return id if id.is_a?(::Pubid::Core::Identifier::Base)
+
   ::Pubid::Iso::Identifier.create(**id)
 rescue StandardError => e
   Util.warn e.message, key: ref.to_s
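With this guard, `create_pubid` accepts an index row ID that is either a Hash of attributes or an identifier object that has already been built. A minimal sketch of that pass-through pattern follows. `FakeIdentifier` and `to_identifier` are hypothetical stand-ins, since the real classes come from the pubid gems and are not loaded here.

```ruby
# Hypothetical stand-in for ::Pubid::Core::Identifier::Base.
class FakeIdentifier
  attr_reader :number, :year

  def initialize(number:, year: nil)
    @number = number
    @year = year
  end
end

# Return an existing identifier untouched; build one from a Hash otherwise.
def to_identifier(id)
  return id if id.is_a?(FakeIdentifier)

  FakeIdentifier.new(**id)
end

built = FakeIdentifier.new(number: "9001", year: 2015)
puts to_identifier(built).equal?(built)                   # prints true
puts to_identifier({ number: "9001", year: 2015 }).number # prints 9001
```

Returning the object early also avoids the `**id` splat, which only works on a Hash.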
@@ -74,15 +76,24 @@ module Relaton
     excl_attrs << :iteration
   end
   # excl_parts << :edition if ref.root.edition.nil? || all_parts
-  @
+  @excludings = excl_attrs
 end

 def index
-  @index ||= Relaton::Index.find_or_create
+  @index ||= Relaton::Index.find_or_create(
+    :iso,
+    url: "#{ENDPOINT}#{INDEXFILE}.zip",
+    file: "#{INDEXFILE}.yaml",
+    id_keys: %i[publisher number copublisher part year edition type stage
+                iteration joint_document tctype sctype wgtype tcnumber
+                scnumber wgnumber dirtype base supplements addendum
+                jtc_dir month amendments corrigendums language],
+    pubid_class: ::Pubid::Iso::Identifier,
+  )
 end

 def fetch_doc(options = {})
-  @excludings = nil if options != opts
+  @
+  @excludings = nil if options != opts
   @opts = options

   if !ref.root.all_parts || size == 1
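The new first line of `fetch_doc` appears to reset a cached `@excludings` value whenever the method is called with options that differ from the previous ones. The sketch below illustrates that invalidate-on-change pattern in isolation. `OptionCache`, its `computations` counter and the attribute lists it returns are hypothetical, and the real list of excluded attributes is computed differently.

```ruby
# Hypothetical collection that memoizes a value derived from its options
# and drops the memo when the options change.
class OptionCache
  attr_reader :opts, :computations

  def initialize
    @opts = {}
    @computations = 0
  end

  # Memoized value that depends on the current options.
  def excludings
    @excludings ||= begin
      @computations += 1
      opts[:all_parts] ? %i[part year] : %i[year]
    end
  end

  def fetch_doc(options = {})
    @excludings = nil if options != opts # invalidate before swapping options
    @opts = options
    excludings
  end
end

cache = OptionCache.new
cache.fetch_doc({})
cache.fetch_doc({})
cache.fetch_doc({ all_parts: true })
puts cache.computations # prints 2
```

Without the reset, the second set of options would silently reuse the value computed for the first.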
@@ -94,7 +105,11 @@ module Relaton

 # @return [RelatonIsoBib::IsoBibliographicItem, nil]
 def to_all_parts # rubocop:disable Metrics/AbcSize,Metrics/CyclomaticComplexity
-
+  parts = @array.select { |h| h.pubid.part }
+  if opts[:publication_date_before] || opts[:publication_date_after]
+    parts = parts.select { |h| Bibliography.send(:year_in_range?, (h.pubid.year || h.hit[:year]).to_i, opts) }
+  end
+  hit = parts.min_by { |h| h.pubid.part.to_i }
   return @array.first&.item unless hit

   bibitem = hit.item
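The rewritten `to_all_parts` picks its base hit in three steps: keep hits that have a part, optionally keep only those whose year passes the date filter, then take the lowest part number. A minimal sketch of that selection on plain structs is shown below. `PartHit` and `first_part` are hypothetical, and a `Range` of years stands in for the `year_in_range?` check against the options.

```ruby
# Hypothetical stand-in for a hit, reduced to the two fields used here.
PartHit = Struct.new(:part, :year)

# Pick the lowest-numbered part, optionally limited to a range of years.
def first_part(hits, year_range = nil)
  parts = hits.select(&:part)
  parts = parts.select { |h| year_range.cover?(h.year.to_i) } if year_range
  parts.min_by { |h| h.part.to_i }
end

hits = [
  PartHit.new(nil, 2010), # no part: never a candidate
  PartHit.new("2", 2012),
  PartHit.new("1", 2018),
  PartHit.new("3", 2011),
]

puts first_part(hits).part             # prints 1
puts first_part(hits, 2010..2015).part # prints 2
```

When nothing survives the filters, the method in the diff falls back to `@array.first&.item`; the sketch simply returns `nil` in that case.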
data/lib/relaton/iso/version.rb
CHANGED
data/relaton_iso.gemspec
CHANGED
@@ -28,7 +28,7 @@ Gem::Specification.new do |spec|

   spec.add_dependency "isoics", "~> 0.1.6"
   spec.add_dependency "openssl", "~> 3.3.2" # 3.3.0 raised an error on Ruby 3.4.7
-  spec.add_dependency "pubid-iso", "~> 1.15.
+  spec.add_dependency "pubid-iso", "~> 1.15.8"
   spec.add_dependency "relaton-bib", "~> 2.0.0-alpha.4"
   spec.add_dependency "relaton-core", "~> 0.0.9"
   spec.add_dependency "relaton-index", "~> 0.2.12"
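The new constraint `~> 1.15.8` is a pessimistic version requirement: it accepts 1.15.8 and later patch releases but not 1.16.0. The sketch below checks a few versions against it using the RubyGems classes that ship with Ruby; the candidate version numbers are chosen only for illustration.

```ruby
require "rubygems"

req = Gem::Requirement.new("~> 1.15.8")

puts req.satisfied_by?(Gem::Version.new("1.15.8"))  # prints true
puts req.satisfied_by?(Gem::Version.new("1.15.12")) # prints true
puts req.satisfied_by?(Gem::Version.new("1.15.7"))  # prints false
puts req.satisfied_by?(Gem::Version.new("1.16.0"))  # prints false

# The two releases compared in this diff are both prereleases, ordered as expected.
old_version = Gem::Version.new("2.0.0.pre.alpha.2")
new_version = Gem::Version.new("2.0.0.pre.alpha.3")
puts new_version > old_version # prints true
puts new_version.prerelease?   # prints true
```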
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: relaton-iso
 version: !ruby/object:Gem::Version
-  version: 2.0.0.pre.alpha.
+  version: 2.0.0.pre.alpha.3
 platform: ruby
 authors:
 - Ribose Inc.
@@ -43,14 +43,14 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 1.15.
+        version: 1.15.8
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 1.15.
+        version: 1.15.8
 - !ruby/object:Gem::Dependency
   name: relaton-bib
   requirement: !ruby/object:Gem::Requirement
@@ -107,6 +107,7 @@ files:
 - ".hound.yml"
 - ".rspec"
 - ".rubocop.yml"
+- CLAUDE.md
 - CODE_OF_CONDUCT.md
 - Gemfile
 - LICENSE.txt