relaton-iso 2.0.0.pre.alpha.2 → 2.0.0.pre.alpha.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 52a7ad701da999890a9e5947a408ed028c14a23c1beb268f2be990299f5c44c6
-  data.tar.gz: e80e496c6198a01d2ea9b5ed07a9511184adfc4ca6963d619af9e2de28a1a2e1
+  metadata.gz: a4e9325119df51d5aea288a6d612d7353107a25ac3030715dcf910ffaea13c26
+  data.tar.gz: e8167236cafc25b54a3a4d8d4bcee86dca7326103aec873e3534bf896e9c7498
 SHA512:
-  metadata.gz: 35a4b249729a7ff7f1df79088e4cb780f0dd86c25145bea9201a16f86f3fed88ca19f65eab16df7d47179382dc53d2c6b302136eb6c63015e6c485d8dea87c93
-  data.tar.gz: 54053f7fb0344f53c69679c5c14dd1fd2452fbbbf5c9d1b8b69ed8842222e50769f633107b3ffaef5a040d87a73e047a7edd948836a4a793cbfafd565db5ed13
+  metadata.gz: 9166abf7a46ec9604872543c9e572eebb03c34146635c156393cdd4e27c684a94e657daec4f55a889e928e38453fa1c690b6bc13549c7a67f80c6d7a9de447ae
+  data.tar.gz: 1a57e63a6faca62cf6ed1b6a6d0e346ae783f30d9d2bf48ef833cdeb6282f04b07e2af468183cedd0bf5367b12d71daeb9e5aad0574207c746d6c1cd6bcb0ca0
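Note on the checksums above: they can be compared against locally computed digests. The sketch below is a minimal illustration using Ruby's standard `digest` library. The helper name `checksum_matches?` and the in-memory sample are assumptions for illustration; a real check would read the `metadata.gz` and `data.tar.gz` members extracted from the `.gem` archive.

```ruby
require "digest"

# Hypothetical helper: compare data against an expected hex digest.
# algorithm is :sha256 or :sha512, matching the keys in checksums.yaml.
def checksum_matches?(data, expected_hex, algorithm: :sha256)
  digest_class = { sha256: Digest::SHA256, sha512: Digest::SHA512 }.fetch(algorithm)
  digest_class.hexdigest(data) == expected_hex.downcase
end

# Illustrative input only (the SHA-256 of an empty string), not package content.
EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
puts checksum_matches?("", EMPTY_SHA256)
```

For a file on disk, `Digest::SHA256.file(path).hexdigest` avoids loading the whole file into a string first.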
@@ -7,8 +7,7 @@ on:
     branches: [ master, main ]
     tags: [ v* ]
   pull_request:
-  schedule:
-    - cron: '0 0 * * *'
+  workflow_dispatch:
 
 jobs:
   rake:
data/CLAUDE.md ADDED
@@ -0,0 +1,33 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Development
+
+- `bin/setup` — install dependencies
+- `rake spec` — run tests
+- `rspec spec/relaton/iso/bibliography_spec.rb` — run a single test file
+- `rspec spec/relaton/iso/bibliography_spec.rb -e "some description"` — run a specific example
+- `rake spec:update_index` — download latest ISO index fixture (`spec/fixtures/index-v1.zip`) from relaton-data-iso
+- `bin/console` — interactive prompt with the gem loaded
+- `rubocop` — lint (Ribose OSS style guide, Ruby 3.2 target)
+
+## Architecture
+
+relaton-iso retrieves ISO standard bibliographic data. The core retrieval flow:
+
+1. **Bibliography** (`lib/relaton/iso/bibliography.rb`) — entry point via `search(pubid)` and `get(ref, year, opts)`. Handles year filtering, part matching, and type/stage validation.
+2. **HitCollection** (`lib/relaton/iso/hit_collection.rb`) — searches a pre-built YAML index (`index-v1.zip` from relaton-data-iso) using `Relaton::Index`. Matches on `id_keys`: publisher, number, copublisher, part, year, edition, type, stage, iteration. Returns sorted Hit array.
+3. **Hit** (`lib/relaton/iso/hit.rb`) — wraps an index result. The `item` attribute lazy-loads the full document from GitHub raw content (relaton-data-iso repo). `sort_weight` prioritizes published over withdrawn/deleted.
+4. **ItemData** / **Model::Item** — ISO-specific bibliographic item extending `Relaton::Bib::ItemData`.
+5. **Scraper** (`lib/relaton/iso/scraper.rb`) — parses ISO website pages for metadata (used by DataFetcher for bulk operations, not the normal lookup path).
+6. **DataFetcher** (`lib/relaton/iso/data_fetcher.rb`) — bulk fetches from ISO.org ICS pages using 3 threads with a persistent queue for resumability.
+
+Key dependency: `pubid-iso` gem handles ISO publication identifier parsing and comparison.
+
+## Testing
+
+- **Framework:** RSpec with VCR cassettes and WebMock
+- **Network access:** fully blocked via `WebMock.disable_net_connect!`
+- **Index fixture:** `spec/fixtures/index-v1.zip` is served by WebMock stub (configured in `spec/support/webmock.rb`). Run `rake spec:update_index` to refresh when upstream data changes.
+- **VCR:** cassettes in `spec/vcr_cassettes/`, record mode `:once`, re-record interval 7 days. Index download requests are ignored by VCR (handled by WebMock stub instead).
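Note on the architecture section of CLAUDE.md above: it describes `Hit#item` as lazy-loading the full document and `sort_weight` as preferring published documents. The sketch below illustrates both ideas with a hypothetical `SketchHit` class. The class name, the weight values, the status strings, and the placeholder identifiers are all assumptions for illustration and are not taken from the gem.

```ruby
# Hypothetical stand-in for Relaton::Iso::Hit; weights are illustrative.
class SketchHit
  WEIGHTS = { "Published" => 0, "Withdrawn" => 1, "Deleted" => 2 }.freeze

  attr_reader :id, :status, :fetch_count

  def initialize(id, status, &loader)
    @id = id
    @status = status
    @loader = loader
    @fetch_count = 0
  end

  # Lazy-load: the loader block runs once, on first access.
  def item
    @item ||= begin
      @fetch_count += 1
      @loader.call(id)
    end
  end

  # Unknown statuses sort after all known ones.
  def sort_weight
    WEIGHTS.fetch(status, WEIGHTS.size)
  end
end

hits = [
  SketchHit.new("DOC-A:2001", "Withdrawn") { |id| "document for #{id}" },
  SketchHit.new("DOC-A:2015", "Published") { |id| "document for #{id}" },
]
first = hits.min_by(&:sort_weight)
puts first.id
```

Deferring the load means that sorting and filtering hits costs nothing beyond the index lookup; only the hit that is finally chosen triggers a fetch.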
data/README.adoc CHANGED
@@ -360,6 +360,8 @@ RelatonIso uses the relaton-logger gem for logging. By default, it logs to STDOU
 
 After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
 
+To update the ISO index test fixture (used by WebMock in tests), run `rake spec:update_index`. This downloads the latest `index-v1.zip` from the https://github.com/relaton/relaton-data-iso[relaton-data-iso] repository.
+
 To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
 
 
data/Rakefile CHANGED
@@ -4,3 +4,25 @@ require "rspec/core/rake_task"
 RSpec::Core::RakeTask.new(:spec)
 
 task :default => :spec
+
+namespace :spec do
+  desc "Download latest ISO index fixture from relaton-data-iso"
+  task :update_index do
+    require "net/http"
+    require "uri"
+
+    url = "https://raw.githubusercontent.com/relaton/relaton-data-iso/data-v2/index-v1.zip"
+    dest = File.join(__dir__, "spec", "fixtures", "index-v1.zip")
+
+    puts "Downloading #{url} ..."
+    uri = URI.parse(url)
+    response = Net::HTTP.get_response(uri)
+
+    if response.is_a?(Net::HTTPSuccess)
+      File.binwrite(dest, response.body)
+      puts "Updated #{dest} (#{response.body.bytesize} bytes)"
+    else
+      abort "Failed to download: HTTP #{response.code}"
+    end
+  end
+end
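Note on the Rake task above: it writes whatever body a successful response carries. As a hedged sketch only, and not part of this release, a caller could confirm that the payload looks like a ZIP archive before overwriting the fixture. The helper name `zip_payload?` is hypothetical; the check relies on the ZIP local file header signature `PK\x03\x04`, which a non-empty archive starts with.

```ruby
# Hypothetical helper, not in the released Rakefile.
# A non-empty ZIP archive begins with the local file header signature.
ZIP_SIGNATURE = "PK\x03\x04".b

def zip_payload?(body)
  # Compare as raw bytes so the check does not depend on string encoding.
  body.to_s.b.start_with?(ZIP_SIGNATURE)
end

puts zip_payload?("PK\x03\x04...".b)
puts zip_payload?("<html>Not Found</html>")
```

In the task, this could guard the write, for example `File.binwrite(dest, response.body) if zip_payload?(response.body)`, so that an HTML error page served with a 200 status does not replace a working fixture.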
@@ -1,5 +1,7 @@
 # frozen_string_literal: true
 
+require "date"
+
 # require 'relaton_iso/iso_bibliographic_item'
 # require "relaton_iso/scrapper"
 # require "relaton_iso/hit_collection"
@@ -44,12 +46,19 @@ module Relaton
 
         hits, missed_year_ids = isobib_search_filter(query_pubid, opts)
         tip_ids = look_up_with_any_types_stages(hits, ref, opts)
-        ret = hits.fetch_doc
+
+        date_filter = opts[:publication_date_before] || opts[:publication_date_after]
+        if date_filter && !query_pubid.root.all_parts
+          ret = find_match_by_date(hits, query_pubid, opts)
+        else
+          ret = hits.fetch_doc(date_filter ? opts : {})
+        end
         return fetch_ref_err(query_pubid, missed_year_ids, tip_ids) unless ret
 
         response_pubid = ret.docidentifier.find(&:primary) # .sub(" (all parts)", "")
         Util.info "Found: `#{response_pubid}`", key: query_pubid.to_s
-        get_all = (query_pubid.root.year && opts[:keep_year].nil?) || opts[:keep_year] || opts[:all_parts]
+        get_all = (query_pubid.root.year && opts[:keep_year].nil?) || opts[:keep_year] || opts[:all_parts] ||
+          opts[:publication_date_before] || opts[:publication_date_after]
         return ret if get_all
 
         ret.to_most_recent_reference
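Note on the hunk above: the new branch decides between a date-aware search and the existing `fetch_doc` path. The sketch below summarizes that decision in isolation. It is a simplified illustration, not the gem's code: `lookup_path` and the returned symbols are hypothetical names, and the `all_parts` keyword stands in for `query_pubid.root.all_parts`.

```ruby
require "date"

# Hypothetical summary of which lookup path `get` takes for given options.
def lookup_path(opts, all_parts: false)
  date_filter = opts[:publication_date_before] || opts[:publication_date_after]
  if date_filter && !all_parts
    :find_match_by_date        # date-aware search over individual hits
  elsif date_filter
    :fetch_doc_with_date_opts  # all-parts lookup, options passed through
  else
    :fetch_doc_without_opts    # no date option: fetch_doc receives {}
  end
end

puts lookup_path({ publication_date_before: Date.new(2020, 1, 1) })
puts lookup_path({ publication_date_after: Date.new(2020, 1, 1) }, all_parts: true)
puts lookup_path({})
```

The same two options also feed `get_all`, so a date-filtered result is returned as found rather than being converted with `to_most_recent_reference`.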
@@ -109,6 +118,78 @@ module Relaton
 
       private
 
+      # Find the best match among hits using date filters.
+      # @param hits [Relaton::Iso::HitCollection]
+      # @param pubid [Pubid::Iso::Identifier]
+      # @param opts [Hash]
+      # @return [Relaton::Iso::ItemData, nil]
+      def find_match_by_date(hits, pubid, opts) # rubocop:disable Metrics/AbcSize
+        candidates = []
+        hits.each { |h| candidates << h if year_in_range?(hit_year(h), opts) }
+        candidates.sort_by { |h| -hit_year(h) }.each do |h|
+          ret = fetch_and_check_date(h, pubid, opts)
+          return ret if ret
+        end
+        nil
+      end
+
+      # Extract year from a hit as an integer.
+      # @param hit [Relaton::Iso::Hit]
+      # @return [Integer]
+      def hit_year(hit)
+        yr = hit.pubid&.year || hit.hit[:year]
+        yr.to_i
+      end
+
+      # Check if a year falls within the date filter range.
+      # @param year [Integer]
+      # @param opts [Hash]
+      # @return [Boolean]
+      def year_in_range?(year, opts)
+        return false if year.zero?
+
+        if opts[:publication_date_before]
+          return false if year > opts[:publication_date_before].year
+        end
+        if opts[:publication_date_after]
+          return false if year < opts[:publication_date_after].year
+        end
+        true
+      end
+
+      # Check if the item's published date falls within the filter range.
+      # @param item [Relaton::Iso::ItemData]
+      # @param opts [Hash]
+      # @return [Boolean]
+      def publication_date_in_range?(item, opts)
+        pub_date_entry = item.date.find { |d| d.type == "published" }
+        return true unless pub_date_entry&.at
+
+        pub_date = pub_date_entry.at.to_date
+        return true unless pub_date
+
+        if opts[:publication_date_before]
+          return false if pub_date >= opts[:publication_date_before]
+        end
+        if opts[:publication_date_after]
+          return false if pub_date < opts[:publication_date_after]
+        end
+        true
+      end
+
+      # Fetch the item for a hit and check if its publication date is in range.
+      # @param hit [Relaton::Iso::Hit]
+      # @param pubid [Pubid::Iso::Identifier]
+      # @param opts [Hash]
+      # @return [Relaton::Iso::ItemData, nil]
+      def fetch_and_check_date(hit, pubid, opts)
+        ret = hit.item
+        if publication_date_in_range?(ret, opts)
+          Util.info "Found: `#{ret.docidentifier.first.content}`", key: pubid.to_s
+          ret
+        end
+      end
+
       def check_year(year, hit) # rubocop:disable Metrics/AbcSize
         (hit.pubid.base.nil? && hit.pubid.year.to_s == year.to_s) ||
           (!hit.pubid.base.nil? && hit.pubid.base.year.to_s == year.to_s) ||
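Note on the date helpers added above: they form a two-stage filter, with a coarse check on the identifier year followed by an exact check on the published date, trying the newest candidate first. The sketch below reproduces that logic on plain data so the boundary behavior is easy to see. `Candidate` is a hypothetical stand-in for a hit plus its fetched item, and the dates are made up for illustration.

```ruby
require "date"

# Hypothetical stand-in for an index hit and the item fetched for it.
Candidate = Struct.new(:year, :published_on)

# Coarse check on the identifier year; both bounds are inclusive by year.
def year_in_range?(year, opts)
  return false if year.zero?
  return false if opts[:publication_date_before] && year > opts[:publication_date_before].year
  return false if opts[:publication_date_after] && year < opts[:publication_date_after].year

  true
end

# Exact check on the published date: "before" is exclusive, "after" is
# inclusive. A missing published date passes, as in the diff.
def publication_date_in_range?(date, opts)
  return true unless date
  return false if opts[:publication_date_before] && date >= opts[:publication_date_before]
  return false if opts[:publication_date_after] && date < opts[:publication_date_after]

  true
end

# Newest candidate first; return the first one whose exact date fits.
def find_match_by_date(candidates, opts)
  candidates
    .select { |c| year_in_range?(c.year, opts) }
    .sort_by { |c| -c.year }
    .find { |c| publication_date_in_range?(c.published_on, opts) }
end

editions = [
  Candidate.new(2003, Date.new(2003, 5, 8)),
  Candidate.new(2014, Date.new(2014, 4, 1)),
  Candidate.new(2019, Date.new(2019, 2, 1)),
]
match = find_match_by_date(editions, publication_date_before: Date.new(2014, 4, 1))
puts match.year
```

In this sample the 2014 candidate passes the year check but fails the exact check, because its date equals the `publication_date_before` bound, so the 2003 candidate is selected. In the gem, each exact check requires fetching the item, which is presumably why the year check runs first.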
@@ -31,7 +31,7 @@ module Relaton
       #
       def find # rubocop:disable Metrics/AbcSize
         @array = index.search do |row|
-          row[:id].is_a?(Hash) ? pubid_match?(row[:id]) : ref.to_s(with_prf: true) == row[:id]
+          row[:id].is_a?(Hash) || row[:id].is_a?(::Pubid::Core::Identifier::Base) ? pubid_match?(row[:id]) : ref.to_s(with_prf: true) == row[:id]
         end.map { |row| Hit.new row, self }
           .sort_by! { |h| h.pubid.to_s }
           .reverse!
@@ -49,6 +49,8 @@ module Relaton
       end
 
       def create_pubid(id)
+        return id if id.is_a?(::Pubid::Core::Identifier::Base)
+
         ::Pubid::Iso::Identifier.create(**id)
       rescue StandardError => e
         Util.warn e.message, key: ref.to_s
@@ -74,15 +76,24 @@ module Relaton
           excl_attrs << :iteration
         end
         # excl_parts << :edition if ref.root.edition.nil? || all_parts
-        @escludings = excl_attrs
+        @excludings = excl_attrs
       end
 
       def index
-        @index ||= Relaton::Index.find_or_create :iso, url: "#{ENDPOINT}#{INDEXFILE}.zip", file: "#{INDEXFILE}.yaml"
+        @index ||= Relaton::Index.find_or_create(
+          :iso,
+          url: "#{ENDPOINT}#{INDEXFILE}.zip",
+          file: "#{INDEXFILE}.yaml",
+          id_keys: %i[publisher number copublisher part year edition type stage
+                      iteration joint_document tctype sctype wgtype tcnumber
+                      scnumber wgnumber dirtype base supplements addendum
+                      jtc_dir month amendments corrigendums language],
+          pubid_class: ::Pubid::Iso::Identifier,
+        )
       end
 
       def fetch_doc(options = {})
-        @excludeingds = nil if options != opts
+        @excludings = nil if options != opts
         @opts = options
 
         if !ref.root.all_parts || size == 1
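Note on the renames in the hunk above: `@escludings` and `@excludeingds` both become `@excludings`. This matters because Ruby treats each spelling as a separate instance variable, and reading one that was never assigned returns `nil` without raising. The sketch below shows that general pitfall with a hypothetical `Cache` class. It is not the gem's code, and how `HitCollection` reads the cached value is not visible in this diff.

```ruby
# Hypothetical class illustrating a misspelled instance variable.
class Cache
  attr_reader :computations

  def initialize
    @computations = 0
  end

  # Memoized reader: computes once, then reuses @value.
  def value
    @value ||= begin
      @computations += 1
      :computed
    end
  end

  # Bug: assigns a differently spelled variable, so the cache survives.
  def broken_reset
    @valeu = nil
  end

  # Fix: same spelling as the reader, so the next read recomputes.
  def reset
    @value = nil
  end
end

cache = Cache.new
cache.value
cache.broken_reset
cache.value
puts cache.computations
```

Because neither the misspelled write nor the misspelled reset raises, this kind of bug tends to surface only as stale or repeatedly recomputed state.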
@@ -94,7 +105,11 @@ module Relaton
 
       # @return [RelatonIsoBib::IsoBibliographicItem, nil]
       def to_all_parts # rubocop:disable Metrics/AbcSize,Metrics/CyclomaticComplexity
-        hit = @array.select { |h| h.pubid.part }.min_by { |h| h.pubid.part.to_i }
+        parts = @array.select { |h| h.pubid.part }
+        if opts[:publication_date_before] || opts[:publication_date_after]
+          parts = parts.select { |h| Bibliography.send(:year_in_range?, (h.pubid.year || h.hit[:year]).to_i, opts) }
+        end
+        hit = parts.min_by { |h| h.pubid.part.to_i }
         return @array.first&.item unless hit
 
         bibitem = hit.item
@@ -2,6 +2,6 @@
 
 module Relaton
   module Iso
-    VERSION = "2.0.0-alpha.2"
+    VERSION = "2.0.0-alpha.3"
   end
 end
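Note on the version string above: the source constant reads `2.0.0-alpha.3`, while the gem metadata records `2.0.0.pre.alpha.3`. The difference comes from how RubyGems normalizes a hyphen in a version string, which can be checked with `Gem::Version`:

```ruby
require "rubygems"

version = Gem::Version.new("2.0.0-alpha.3")
puts version.to_s         # normalized form, as stored in the gem metadata
puts version.prerelease?  # alphabetic segments mark a prerelease
puts Gem::Version.new("2.0.0-alpha.2") < version
```

Both spellings therefore refer to the same release, and the new version sorts after `2.0.0-alpha.2` as expected.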
data/relaton_iso.gemspec CHANGED
@@ -28,7 +28,7 @@ Gem::Specification.new do |spec|
 
   spec.add_dependency "isoics", "~> 0.1.6"
   spec.add_dependency "openssl", "~> 3.3.2" # 3.3.0 raised an error on Ruby 3.4.7
-  spec.add_dependency "pubid-iso", "~> 1.15.0"
+  spec.add_dependency "pubid-iso", "~> 1.15.8"
   spec.add_dependency "relaton-bib", "~> 2.0.0-alpha.4"
   spec.add_dependency "relaton-core", "~> 0.0.9"
   spec.add_dependency "relaton-index", "~> 0.2.12"
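Note on the `pubid-iso` constraint above: moving from `~> 1.15.0` to `~> 1.15.8` raises the minimum patch release while keeping the same upper bound below 1.16. The sketch below checks a few sample version numbers against both constraints with `Gem::Requirement`; the sample versions are arbitrary and are not a list of actual `pubid-iso` releases.

```ruby
require "rubygems"

old_req = Gem::Requirement.new("~> 1.15.0")
new_req = Gem::Requirement.new("~> 1.15.8")

# Arbitrary sample versions, chosen to straddle both bounds.
%w[1.15.0 1.15.7 1.15.8 1.15.12 1.16.0].each do |v|
  version = Gem::Version.new(v)
  puts format("%-8s old=%-5s new=%s", v,
              old_req.satisfied_by?(version), new_req.satisfied_by?(version))
end
```

Only versions from 1.15.8 up to, but not including, 1.16.0 satisfy the new constraint.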
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: relaton-iso
 version: !ruby/object:Gem::Version
-  version: 2.0.0.pre.alpha.2
+  version: 2.0.0.pre.alpha.3
 platform: ruby
 authors:
 - Ribose Inc.
@@ -43,14 +43,14 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 1.15.0
+        version: 1.15.8
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 1.15.0
+        version: 1.15.8
 - !ruby/object:Gem::Dependency
   name: relaton-bib
   requirement: !ruby/object:Gem::Requirement
@@ -107,6 +107,7 @@ files:
 - ".hound.yml"
 - ".rspec"
 - ".rubocop.yml"
+- CLAUDE.md
 - CODE_OF_CONDUCT.md
 - Gemfile
 - LICENSE.txt