relaton-iso 2.1.1 → 2.1.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 1996226d20bb1e528b2a5a2cabce6446bcddc1036b0c00f40da8b56d39b57142
- data.tar.gz: abe31e6602e8846f4154ced3900cd289c3b11fa2ce1ece03cb1cc325590736c6
+ metadata.gz: 767dfed024aec3fc3c96c2322ef0fe9514fdd99d7ea5602c9caa878ba9ff95f6
+ data.tar.gz: 427f2b4fb8c58791acd025fcaa7bc6ba7d1928314c6ca9e04ea583d3063247e8
  SHA512:
- metadata.gz: fb51cc0479e8a79c2e395d98a119d19aa75c40ce046c80c64151ac3620560dd408d0a78902de1e8b8be5c6531da6649284fc6e0cb97fb5ab7860132713559716
- data.tar.gz: 4b70e291c451f833abc4c97346398638c09df0658c6fa7b7a7eadf318f0d0b80d5e08c0dd1254b17837af5195e1f3029fd3efff2d35a8b3207a1de0275320dad
+ metadata.gz: a99c1fb6fd7ed6cd9f11784d1851353224f9af378705ed3cfcbafd4a20d1eb40f33036521bdcabebe4ab720a6d628ab01a41af914652e3effbe0ba4d5176882f
+ data.tar.gz: 52c952079694c2f0b43a23a9eb767442c53c2fd32183317616eafff960c14e8b997a6819b407933c1d584e15f690fc531b4c8fea7a4dc7add4bac8eb6a237e90
data/CLAUDE.md CHANGED
@@ -20,8 +20,11 @@ relaton-iso retrieves ISO standard bibliographic data. The core retrieval flow:
  2. **HitCollection** (`lib/relaton/iso/hit_collection.rb`) — searches a pre-built YAML index (`index-v1.zip` from relaton-data-iso) using `Relaton::Index`. Matches on `id_keys`: publisher, number, copublisher, part, year, edition, type, stage, iteration. Returns sorted Hit array.
  3. **Hit** (`lib/relaton/iso/hit.rb`) — wraps an index result. The `item` attribute lazy-loads the full document from GitHub raw content (relaton-data-iso repo). `sort_weight` prioritizes published over withdrawn/deleted.
  4. **ItemData** / **Model::Item** — ISO-specific bibliographic item extending `Relaton::Bib::ItemData`.
- 5. **Scraper** (`lib/relaton/iso/scraper.rb`) — parses ISO website pages for metadata (used by DataFetcher for bulk operations, not the normal lookup path).
- 6. **DataFetcher** (`lib/relaton/iso/data_fetcher.rb`) — bulk fetches from ISO.org ICS pages using 3 threads with a persistent queue for resumability.
+ 5. **Scraper** (`lib/relaton/iso/scraper.rb`) — parses individual ISO website pages. Used only by `Bibliography.get` as a fallback when an item is missing from the curated index; no longer drives bulk ingest.
+ 6. **DataFetcher** (`lib/relaton/iso/data_fetcher.rb`) — streams the ISO Open Data programme JSONL feeds (`iso_deliverables_metadata.jsonl` for documents, `iso_technical_committees.jsonl` for committee titles) and writes one YAML per primary docid into `@output`. Short-circuits on upstream `Last-Modified`; falls back to a full pass when `data/` or `index-v1.yaml` is missing. Two source modes:
+ - `iso-open-data` (default) — incremental, skip when upstream is unchanged.
+ - `iso-open-data-all` — wipe `@output` and re-emit every record.
+ 7. **DataParser** (`lib/relaton/iso/data_parser.rb`) — converts one Open Data record (`Hash`) into a `Relaton::Iso::ItemData`. Takes a `ref_index` (id → reference) for resolving `replaces`/`replacedBy` and a `tc_index` (reference → `{ "en"/"fr" => title }`) for resolving committee labels.

  Key dependency: `pubid-iso` gem handles ISO publication identifier parsing and comparison.
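
The two lookup tables named in the DataParser entry above can be pictured as plain hashes. The following is a minimal sketch under stated assumptions: the sample ids, references and titles are placeholders, and `resolve_references` and `committee_title` are hypothetical helpers written for illustration, not methods of `DataParser`.

```ruby
# Illustrative shapes only; every id, reference and title here is a placeholder.
REF_INDEX = { 1 => "ISO 0001:2000", 2 => "ISO 0001:2010" }.freeze
TC_INDEX = {
  "ISO/TC 000" => { "en" => "Example committee", "fr" => "Commission exemple" },
}.freeze

# Map record ids (such as those listed under `replaces`) to references,
# dropping ids that are absent from the index. Hypothetical helper.
def resolve_references(ids, ref_index)
  Array(ids).filter_map { |id| ref_index[id] }
end

# Look up a committee title by reference and language. Hypothetical helper.
def committee_title(reference, tc_index, lang = "en")
  tc_index.dig(reference, lang)
end
```

Keeping both indexes as in-memory hashes means each record can be resolved in a single streaming pass once the tables are built.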
data/README.adoc CHANGED
@@ -352,6 +352,26 @@ item.source
  @type="rss">]
  ----

+ [[bulk-data-ingest]]
+ === Bulk data ingest
+
+ The curated dataset under https://github.com/relaton/relaton-data-iso[relaton-data-iso] is rebuilt daily by `Relaton::Iso::DataFetcher`, which streams the
+ https://www.iso.org/open-data.html[ISO Open Data programme] JSONL feeds — `iso_deliverables_metadata.jsonl` for documents (~80,000 records) and `iso_technical_committees.jsonl` for committee titles — and writes one YAML per primary docid.
+
+ Two source modes are exposed (also reachable via `relaton-cli`'s dataset list and the GitHub Actions workflow input):
+
+ [source,ruby]
+ ----
+ # Incremental: skip the run if upstream `Last-Modified` matches the local
+ # `last_modified.txt`. Falls back to a full pass when `data/` or
+ # `index-v1.yaml` is missing.
+ Relaton::Iso::DataFetcher.fetch("iso-open-data", output: "data", format: "yaml")
+
+ # Full refresh: wipe `output` and re-emit every record. Use when the local
+ # tree is suspect or after a parser change that affects emitted YAML.
+ Relaton::Iso::DataFetcher.fetch("iso-open-data-all", output: "data", format: "yaml")
+ ----
+
  === Logging

  RelatonIso uses the relaton-logger gem for logging. By default, it logs to STDOUT. To change the log levels and add other loggers, read the https://github.com/relaton/relaton-logger#usage[relaton-logger] documentation.
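
The incremental mode described in the Bulk data ingest hunk above boils down to one decision: skip the run only when the stored stamp matches upstream and the local tree still looks intact. The sketch below restates that decision as a standalone function; the keyword parameters are assumptions introduced so the logic can run outside the gem (the gem itself reads `@output`, `INDEXFILE` and `LAST_MODIFIED_FILE`).

```ruby
# Standalone restatement of the skip decision. The keyword arguments are
# hypothetical; the gem reads the equivalent values from its own state.
def up_to_date?(last_modified, stamp_file:, output_dir:, index_file:, full_refresh: false)
  # A forced refresh or a missing upstream header never short-circuits.
  return false if full_refresh || last_modified.nil?
  return false unless File.exist?(stamp_file)
  # A wiped tree or missing index forces a full pass despite the stamp.
  return false unless Dir.exist?(output_dir) && File.exist?(index_file)
  return false unless Dir.children(output_dir).any? { |f| f.end_with?(".yaml") }

  File.read(stamp_file, encoding: "UTF-8").strip == last_modified.strip
end
```

Checking the output tree before trusting the stamp is what makes a fresh checkout rebuild everything rather than silently doing nothing.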
@@ -367,7 +387,7 @@ To install this gem onto your local machine, run `bundle exec rake install`. To

  == Exceptional Citations

- This gem retrieves bibliographic descriptions of ISO documents by doing searches on the ISO website, http://www.iso.org, and screenscraping the document that matches the queried document identifier. The following documents are not returned as search results from the ISO website, and the gem returns manually generated references to them.
+ Single-document lookups via `Bibliography.get` first consult the curated relaton-data-iso index (regenerated daily from the ISO Open Data programme — see <<bulk-data-ingest>>) and fall back to scraping individual pages on http://www.iso.org for items not yet indexed. The following documents are not returned as search results from the ISO website, and the gem returns manually generated references to them.

  * `IEV`: used in the metanorma-iso gem to reference Electropedia entries generically. Is resolved to an "all parts" reference to IEC 60050, which in turn is resolved into the specific documents cited by their top-level clause.
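
The index-first, scrape-second lookup order described in the changed paragraph above can be sketched as a simple fallback chain. This is an illustration under assumptions only: `lookup_with_fallback`, `index_lookup` and `page_scraper` are hypothetical names, not relaton-iso APIs.

```ruby
# Hypothetical stand-ins: both callables are supplied by the caller and
# return a bibliographic item or nil.
def lookup_with_fallback(ref, index_lookup:, page_scraper:)
  # The scraper runs only when the curated index has no entry.
  index_lookup.call(ref) || page_scraper.call(ref)
end
```

Because `||` short-circuits, the scraper is never invoked for references the index already resolves.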
@@ -186,10 +186,16 @@ module Relaton
  end

  # Extract year from a hit as an integer.
+ #
+ # Amendments, corrigendums and supplements carry no year on their own
+ # identifier; the year lives on the underlying standard reachable via
+ # `root` (which walks the full base chain, however deeply nested). Fall
+ # back to it so a date filter does not drop such references (issue #181).
+ #
  # @param hit [Relaton::Iso::Hit]
  # @return [Integer]
  def hit_year(hit)
- yr = hit.pubid&.year || hit.hit[:year]
+ yr = hit.pubid&.year || hit.hit[:year] || hit.pubid&.root&.year
  yr.to_i
  end
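
To see why the extra `root` fallback in the hunk above matters, the lookup order can be exercised with stand-in objects. This is a sketch under assumptions: `FakePubid` and `FakeHit` are hypothetical structs that mimic only the attributes the method touches, and `root` here simply follows `base` until it runs out, as the added comment describes.

```ruby
# Hypothetical stand-ins for a pubid identifier and a hit.
FakePubid = Struct.new(:year, :base) do
  # Walk the base chain to the outermost underlying standard.
  def root
    base ? base.root : self
  end
end
FakeHit = Struct.new(:pubid, :hit)

# Same lookup order as the patched method: own year, index year, root year.
def hit_year(hit)
  yr = hit.pubid&.year || hit.hit[:year] || hit.pubid&.root&.year
  yr.to_i
end

standard = FakePubid.new(2015, nil)
amendment = FakePubid.new(nil, standard)
corrigendum = FakePubid.new(nil, amendment)
puts hit_year(FakeHit.new(corrigendum, {}))
```

Without the third operand, the nested corrigendum in this sketch would resolve to `nil.to_i`, which is 0, and a date filter would drop it.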
@@ -236,6 +242,10 @@ module Relaton
  # @return [Relaton::Iso::ItemData, nil]
  def fetch_and_check_date(hit, pubid, opts)
  ret = hit.item
+ # A data file that fails to load (e.g. the index references a file that
+ # 404s) yields an item with no docidentifier; skip it rather than crash.
+ return unless ret&.docidentifier&.first
+
  if publication_date_in_range?(ret, opts)
  Util.info "Found: `#{ret.docidentifier.first.content}`", key: pubid.to_s
  ret
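
The guard added in the hunk above relies on chained safe navigation. A minimal sketch of its behaviour, assuming hypothetical `FakeItem` and `FakeDocId` structs in place of the gem's item classes:

```ruby
# Hypothetical stand-ins for an item and its document identifier.
FakeDocId = Struct.new(:content)
FakeItem = Struct.new(:docidentifier)

# Mirrors `return unless ret&.docidentifier&.first`: an item is usable only
# when it exists and carries at least one document identifier.
def usable_item?(item)
  item&.docidentifier&.first ? true : false
end
```

Each `&.` guards one call only, so both the missing item and the missing identifier list need their own operator.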
@@ -1,185 +1,251 @@
+ require "fileutils"
+ require "json"
+ require "net/http"
+ require "tmpdir"
  require_relative "../iso"
- require_relative "queue"
- require_relative "scraper"
+ require_relative "data_parser"

  module Relaton
  module Iso
- # Fetch all the documents from ISO website.
+ #
+ # Fetch ISO documents from the ISO Open Data programme bulk JSONL
+ # (see https://www.iso.org/open-data.html) and write each one as a YAML
+ # file under `@output`.
+ #
+ # `source` modes (matching the `Relaton::Core::DataFetcher.fetch` arg):
+ #
+ # * `"iso-open-data"` (default) - skip the run if the upstream
+ # `Last-Modified` header matches `LAST_MODIFIED_FILE`.
+ # * `"iso-open-data-all"` - clear `@output` and re-emit every record.
+ #
  class DataFetcher < Core::DataFetcher
- #
- # The queue is used to store the ICS page paths beeing fetching in the current run.
- #
- # @return [Queue] queue
- #
- def queue
- @queue ||= ::Queue.new
- end
-
- def mutex
- @mutex ||= Mutex.new
- end
+ OPEN_DATA_URL = "https://isopublicstorageprod.blob.core.windows.net/" \
+ "opendata/_latest/iso_deliverables_metadata/json/" \
+ "iso_deliverables_metadata.jsonl".freeze
+ TC_DATA_URL = "https://isopublicstorageprod.blob.core.windows.net/" \
+ "opendata/_latest/iso_technical_committees/json/" \
+ "iso_technical_committees.jsonl".freeze
+ LAST_MODIFIED_FILE = "last_modified.txt".freeze
+ MAX_DOWNLOAD_RETRIES = 4
+ RETRY_BACKOFF_BASE = 30

  def log_error(msg)
  Util.error msg
  end

  def index
- @index ||= Relaton::Index.find_or_create :iso, file: "#{INDEXFILE}.yaml"
- end
-
- #
- # ISO has too many docs. GHA can't get them all in one run.
- # So, we need to split the process into several runs.
- # The iso_queue is used to store the doc paths that have not been fetched.
- #
- # @return [Relaton::Iso::Queue] queue
- #
- def iso_queue
- @iso_queue ||= Relaton::Iso::Queue.new
- end
-
- #
- # Go through all ICS and fetch all documents.
- #
- # @return [void]
- #
- def fetch # rubocop:disable Metrics/AbcSize
- Util.info "Scrapping ICS pages..."
- fetch_ics
- Util.info "(#{Time.now}) Scrapping documents..."
- fetch_docs
- iso_queue.save
- # index.sort! { |a, b| compare_docids a, b }
+ @index ||= Relaton::Index.find_or_create(
+ :iso, file: "#{INDEXFILE}.yaml", pubid_class: ::Pubid::Iso::Identifier,
+ )
+ end
+
+ def fetch(source = nil)
+ @source = source || "iso-open-data"
+ @full_refresh = @source == "iso-open-data-all"
+
+ Util.info "Fetching ISO Open Data (mode: #{@source})..."
+ last_modified = fetch_last_modified
+ return if up_to_date?(last_modified)
+
+ prepare_output
+ jsonl_path = download_dataset
+ ref_index, amend_index, date_index = build_ref_index(jsonl_path)
+ tc_index = build_tc_index
+ ingest_records(jsonl_path, ref_index, tc_index, amend_index, date_index)
+ merge_static_files
+
  index.save
+ save_last_modified(last_modified)
  report_errors
+ rescue StandardError => e
+ Util.error "#{e.message}\n#{e.backtrace.join("\n")}"
+ raise
  end

  private

- #
- # Fetch ICS page recursively and store all the links to documents in the iso_queue.
- #
- # @param [String] path path to ICS page
- #
- def fetch_ics
- threads = Array.new(3) { thread { |path| fetch_ics_page(path) } }
- fetch_ics_page "/standards-catalogue/browse-by-ics.html"
- sleep(1) until queue.empty?
- threads.size.times { queue << :END }
- threads.each(&:join)
- end
-
- def fetch_ics_page(path)
- resp = get_redirection path
- unless resp
- Util.error "Failed fetching ICS page #{url(path)}"
- return
+ # --- HTTP / state -----------------------------------------------------
+
+ def fetch_last_modified
+ uri = URI(OPEN_DATA_URL)
+ resp = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
+ http.request(Net::HTTP::Head.new(uri.request_uri))
+ end
+ resp["last-modified"]
+ end
+
+ def up_to_date?(last_modified)
+ return false if @full_refresh || last_modified.nil?
+ return false unless File.exist?(LAST_MODIFIED_FILE)
+ return false unless output_populated?
+
+ if File.read(LAST_MODIFIED_FILE, encoding: "UTF-8").strip == last_modified.strip
+ Util.info "ISO Open Data is up to date (Last-Modified: #{last_modified}); nothing to do."
+ true
+ else
+ false
  end
+ end
+
+ # Guard against an external wipe (or a fresh checkout) — if the YAML tree
+ # or the index file is gone, force a refresh instead of trusting
+ # `LAST_MODIFIED_FILE`.
+ def output_populated?
+ return false unless Dir.exist?(@output)
+ return false unless File.exist?("#{INDEXFILE}.yaml")

- page = Nokogiri::HTML(resp.body)
- parse_doc_links page
- parse_ics_links page
+ Dir.children(@output).any? { |f| f.end_with?(".yaml") }
  end

- def parse_doc_links(page)
- doc_links = page.xpath "//td[@data-title='Standard and/or project']/div/div/a"
- @errors[:doc_links] &&= doc_links.empty?
- doc_links.each { |item| iso_queue.add_first item[:href].split("?").first }
+ def save_last_modified(last_modified)
+ return unless last_modified
+
+ File.write(LAST_MODIFIED_FILE, last_modified, encoding: "UTF-8")
+ end
+
+ def prepare_output
+ FileUtils.rm_rf(@output) if @full_refresh
+ FileUtils.mkdir_p(@output)
  end

- def parse_ics_links(page)
- ics_links = page.xpath("//td[@data-title='ICS']/a")
- @errors[:ics_links] &&= ics_links.empty?
- ics_links.each { |item| queue << item[:href] }
+ def download_dataset
+ download_jsonl(OPEN_DATA_URL, "iso_deliverables_metadata.jsonl")
  end

- def url(path)
- Scraper::DOMAIN + path
+ def download_tc_dataset
+ download_jsonl(TC_DATA_URL, "iso_technical_committees.jsonl")
  end

- #
- # Get the page from the given path. If the page is redirected, get the
- # page from the new path.
- #
- # @param [String] path path to the page
- #
- # @return [Net::HTTPOK, nil] HTTP response
- #
- def get_redirection(path) # rubocop:disable Metrics/MethodLength
- try = 0
- uri = URI url(path)
+ def download_jsonl(url, filename)
+ path = File.join(Dir.tmpdir, filename)
+ Util.info "Downloading #{url}..."
+ uri = URI(url)
+ attempt = 0
  begin
- get_response uri
- rescue Net::OpenTimeout, Net::ReadTimeout, Errno::ECONNREFUSED => e
- try += 1
- retry if check_try try, uri
+ File.open(path, "wb") do |f|
+ Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
+ http.request_get(uri.request_uri) do |resp|
+ raise "Open Data download failed: HTTP #{resp.code}" unless resp.code == "200"

- Util.warn "Failed fetching #{uri}, #{e.message}"
+ resp.read_body { |chunk| f.write(chunk) }
+ end
+ end
+ end
+ rescue StandardError => e
+ attempt += 1
+ raise if attempt > MAX_DOWNLOAD_RETRIES
+
+ delay = RETRY_BACKOFF_BASE * (2**(attempt - 1))
+ Util.warn "Download attempt #{attempt}/#{MAX_DOWNLOAD_RETRIES} failed (#{e.message}). Retrying in #{delay}s..."
+ sleep delay
+ retry
  end
+ Util.info "Downloaded #{File.size(path) / 1024 / 1024} MB to #{path}."
+ path
  end

- def get_response(uri)
- resp = Net::HTTP.get_response(uri)
- resp.code == "302" ? get_redirection(resp["location"]) : resp
- end
+ # --- ingestion --------------------------------------------------------

- def check_try(try, uri)
- if try < 3
- Util.warn "Timeout fetching #{uri}, retrying..."
- sleep 1
- true
+ def build_ref_index(path)
+ Util.info "Indexing references and amendments..."
+ ref_map = {}
+ amend_map = Hash.new { |h, k| h[k] = [] }
+ date_map = {}
+ File.foreach(path, encoding: "UTF-8") do |line|
+ rec = JSON.parse(line)
+ id = rec["id"]
+ ref = normalize_reference(rec["reference"])
+ next unless ref
+
+ ref_map[id] = ref if id
+ pub_date = rec["publicationDate"]
+ date_map[ref] = pub_date if pub_date && !pub_date.empty?
+ if rec["supplementType"] && (base = amend_base(ref))
+ amend_map[base] << ref
+ end
+ rescue JSON::ParserError
+ next
  end
+ Util.info "Indexed #{ref_map.size} references; " \
+ "#{amend_map.values.sum(&:size)} amendments across #{amend_map.size} bases; " \
+ "#{date_map.size} publication dates."
+ [ref_map, amend_map, date_map]
  end

- def fetch_docs
- threads = Array.new(3) { thread { |path| fetch_doc(path) } }
- iso_queue[0..10_000].each { |docpath| queue << docpath }
- threads.size.times { queue << :END }
- threads.each(&:join)
+ def amend_base(ref)
+ pubid = ::Pubid::Iso::Identifier.parse(ref)
+ return nil unless pubid.respond_to?(:base) && pubid.base
+
+ pubid.base.to_s
+ rescue StandardError
+ nil
  end

- #
- # Fetch document from ISO website.
- #
- # @param [String] docpath document page path
- #
- # @return [void]
- #
- def fetch_doc(docpath)
- doc = Scraper.parse_page docpath, errors: @errors
- mutex.synchronize { save_doc doc, docpath }
- rescue StandardError => e
- Util.warn "Fail fetching document: #{url(docpath)}\n#{e.message}\n#{e.backtrace}"
+ # Open Data emits stub records for deleted/abandoned projects with a
+ # "Withdrawn" publisher prefix. They have no publicationDate, no edition,
+ # and sit on stage *.98 (deleted). Skip them entirely.
+ def normalize_reference(ref)
+ return nil if ref.nil? || ref.empty?
+ return nil if ref.start_with?("Withdrawn ")
+
+ ref
  end

- # def compare_docids(id1, id2)
- # Pubid::Iso::Identifier.create(**id1).to_s <=> Pubid::Iso::Identifier.create(**id2).to_s
- # end
+ def ingestable?(ref)
+ !ref.nil? && !ref.empty? && !ref.start_with?("Withdrawn ")
+ end
+
+ def build_tc_index
+ Util.info "Indexing technical committees..."
+ path = download_tc_dataset
+ map = {}
+ File.foreach(path, encoding: "UTF-8") do |line|
+ rec = JSON.parse(line)
+ ref = rec["reference"]
+ title = rec["title"]
+ map[ref] = title if ref && title.is_a?(Hash)
+ rescue JSON::ParserError
+ next
+ end
+ Util.info "Indexed #{map.size} committees."
+ map
+ end
+
+ def ingest_records(path, ref_index, tc_index, amend_index = {}, date_index = {})
+ Util.info "Parsing records..."
+ count = 0
+ File.foreach(path, encoding: "UTF-8") do |line|
+ rec = JSON.parse(line)
+ next unless ingestable?(rec["reference"])
+
+ fetch_pub(rec, ref_index, tc_index, amend_index, date_index)
+ count += 1
+ Util.info "Processed #{count} records..." if (count % 5_000).zero?
+ rescue StandardError => e
+ Util.warn "Failed record `#{rec && rec['reference']}`: #{e.message}"
+ end
+ Util.info "Finished: #{count} records."
+ end

- #
- # save document to file.
- #
- # @param [RelatonIsoBib::IsoBibliographicItem] doc document
- #
- # @return [void]
- #
- def save_doc(doc, docpath) # rubocop:disable Metrics/AbcSize,Metrics/MethodLength
+ def fetch_pub(rec, ref_index, tc_index = {}, amend_index = {}, date_index = {})
+ doc = DataParser.new(rec, ref_index, @errors, tc_index, amend_index, date_index).parse
  docid = doc.docidentifier.detect(&:primary)
- file = output_file docid.content.to_s
+ return unless docid
+
+ file = output_file(docid.content.to_s)
  if File.exist?(file)
- rewrite_with_same_or_newer doc, docid, file, docpath
+ rewrite_with_same_or_newer(doc, docid, file)
  else
- write_file file, doc, docid
+ write_file(file, doc, docid)
  end
- iso_queue.move_last docpath
  end

- def rewrite_with_same_or_newer(doc, docid, file, docpath)
- bib = Item.from_yaml File.read(file, encoding: "UTF-8")
- if edition_greater?(doc, bib) || replace_substage98?(doc, bib)
- write_file file, doc, docid
- elsif @files.include?(file) && !edition_greater?(bib, doc)
- Util.warn "Duplicate file `#{file}` for `#{docid.content}` from #{url(docpath)}"
+ def rewrite_with_same_or_newer(doc, docid, file)
+ existing = Item.from_yaml(File.read(file, encoding: "UTF-8"))
+ if edition_greater?(doc, existing) || replace_substage98?(doc, existing)
+ write_file(file, doc, docid)
+ elsif @files.include?(file) && !edition_greater?(existing, doc)
+ Util.warn "Duplicate file `#{file}` for `#{docid.content}`"
  end
  end
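
The retry loop in `download_jsonl` above doubles its wait after each failure. The sketch below isolates that schedule so it can be checked without touching the network; `backoff_delay`, `with_retries` and the injectable `sleeper` are hypothetical names added for illustration, while the two constants repeat the values from the diff.

```ruby
MAX_DOWNLOAD_RETRIES = 4
RETRY_BACKOFF_BASE = 30 # seconds

# Delay before retry number `attempt` (1-based): base * 2^(attempt - 1).
def backoff_delay(attempt, base = RETRY_BACKOFF_BASE)
  base * (2**(attempt - 1))
end

# Run the block, retrying on StandardError up to `max_retries` times.
# `sleeper` is injectable so the schedule can be exercised without waiting.
def with_retries(max_retries: MAX_DOWNLOAD_RETRIES, sleeper: method(:sleep))
  attempt = 0
  begin
    yield
  rescue StandardError
    attempt += 1
    raise if attempt > max_retries

    sleeper.call(backoff_delay(attempt))
    retry
  end
end
```

With these constants the waits are 30, 60, 120 and 240 seconds, so a download that keeps failing is attempted five times and gives up after 450 seconds of sleeping in total.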
@@ -187,35 +253,38 @@ module Relaton
  doc.edition && bib.edition && doc.edition.content.to_i > bib.edition.content.to_i
  end

- def replace_substage98?(doc, bib) # rubocop:disable Metrics/CyclomaticComplexity,Metrics/PerceivedComplexity
+ def replace_substage98?(doc, bib)
  doc.edition&.content == bib.edition&.content &&
  (doc.status&.substage&.content != "98" || bib.status&.substage&.content == "98")
  end

  def write_file(file, doc, docid)
  @files << file
- index.add_or_update docid.pubid.to_h, file
- File.write file, serialize(doc), encoding: "UTF-8"
+ index.add_or_update(docid.pubid || docid.content.to_s, file)
+ File.write(file, serialize(doc), encoding: "UTF-8")
  end

- def to_yaml(doc) = doc.to_yaml
+ # --- static merge -----------------------------------------------------

- def to_xml(doc) = doc.to_xml bibxml: true
+ def merge_static_files
+ return unless Dir.exist?("static")

- def to_bibxml(doc) = doc.to_rfcxml
+ Dir["static/**/*.yaml"].each do |f|
+ item = Item.from_yaml(File.read(f, encoding: "UTF-8"))
+ did = item.docidentifier.detect(&:primary)
+ next unless did

- #
- # Create thread worker
- #
- # @return [Thread] thread
- #
- def thread
- Thread.new do
- while (path = queue.pop) != :END
- yield path
- end
+ index.add_or_update(did.pubid || did.content.to_s, f)
  end
  end
+
+ # --- serialization ---------------------------------------------------
+
+ def to_yaml(doc) = doc.to_yaml
+
+ def to_xml(doc) = doc.to_xml(bibxml: true)
+
+ def to_bibxml(doc) = doc.to_rfcxml
  end
  end
  end
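
The overwrite rule that `rewrite_with_same_or_newer` applies through `edition_greater?` and `replace_substage98?` is easier to follow with flattened data. In this sketch `FakeDoc` is a hypothetical struct holding the edition and substage as plain strings (the gem reads them via `edition.content` and `status.substage.content`), and `overwrite?` is a name introduced here to combine the two predicates.

```ruby
# Hypothetical flattened stand-in for a bibliographic item.
FakeDoc = Struct.new(:edition, :substage)

def edition_greater?(doc, existing)
  doc.edition && existing.edition && doc.edition.to_i > existing.edition.to_i
end

# Same edition: replace unless the incoming record is a deleted (substage 98)
# stub and the stored one is not.
def replace_substage98?(doc, existing)
  doc.edition == existing.edition &&
    (doc.substage != "98" || existing.substage == "98")
end

def overwrite?(doc, existing)
  edition_greater?(doc, existing) || replace_substage98?(doc, existing) ? true : false
end
```

The effect is that a newer edition always wins, and a deleted stub never displaces a live record of the same edition.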