fair_champion_harvester 0.1.13 → 0.1.14

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 98197689724c7efe3d1d725d74b96d841ef1e9b4da0ab6068a721109192cd901
- data.tar.gz: eab9a398f94b2770d60aa9bde1a81ca035bc21ee69bc22d2b0be0fc46a148e5e
+ metadata.gz: ecc509041da6bb1b6e39fec571b281b8b78fc3726fe886e6720e7ca8f505bf6d
+ data.tar.gz: 82bb48606f99990171318569dfa9cd22c593967401b0373771c05b0f93bf7c1b
  SHA512:
- metadata.gz: ac9ca34832e5045adc5761f4579d69eec86e7ee87e54aeeb6ccb6ae87a3d04b31d3c03bcbbcf730d4274ce2066fbdd4b5e7e28afc5528544d8a84078bf4d1d61
- data.tar.gz: 3d1f6868db870bcd874f3797a93160f5e330b7614c7de0aaa514d3d914ee8fc17a5f6cc2c99ecba0386c030419af446052faf6c7428b2969dd9ce06a1ce95507
+ metadata.gz: 7b9746031f2bde80c3487a06612ec6b588f0ad3098b622ec7e8a41c8d816f70b3c8dd1b742e0bedfe1c77f1dbfd58f2e4e4855da886717958b2ac66aaacae7d1
+ data.tar.gz: 1e4e0a02c9ff6b502a56de0b76d27e501b4904975554e91f9891aed3df95db7a9dc8a7ff540de720b7c30be3c1567c8e5914521c6b5d5f773567fe13849c663b
data/CHANGELOG.md CHANGED
@@ -2,6 +2,12 @@
 
  ## [Unreleased]
 
+ ## [0.1.14] - 2026-06-30
+
+ ### Fixed
+
+ - Critical cache collision bug in `Cache.checkRDFCache`: the lookup was comparing only `File.size == body.bytesize` (byte count) against every `*_graphbody` file in `/tmp`, while `writeRDFCache` had always written files keyed by `MD5(body)`. As `/tmp` accumulated files over days of running, any two metadata bodies with identical byte counts would collide — the wrong RDF graph would be returned for a request. Symptom: a completely unrelated dataset's metadata (e.g. "Tata Motors NSE OHLCV Dataset") returned for a different resource (e.g. Glasgow University ORDA record). Problem disappeared on restart (clears `/tmp`) but returned after ~1 week. Fixed by rewriting `checkRDFCache` to look up directly by `MD5(body)` — O(1), no glob scan, and consistent with the write path. Also eliminates the parallel-access race window where a thread could match a partially-written file from another thread.
+
  ## [0.1.13] - 2026-06-30
 
  ### Fixed
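The changelog entry describes a lookup that matched on byte count sitting on top of a write path that keyed files by `MD5(body)`. The minimal Ruby sketch below reproduces that failure mode using only the standard library; it is an illustration, not the harvester's code. Assumptions: a temporary directory stands in for `/tmp`, a plain Array stands in for the marshalled `RDF::Graph`, and `write_cache`, `old_lookup`, and `new_lookup` are hypothetical names.

```ruby
require "digest"
require "tmpdir"

# Hypothetical write path, keyed by MD5(body) as the changelog says
# writeRDFCache does. The Array "graph" stands in for an RDF::Graph.
def write_cache(dir, body, graph)
  key = Digest::MD5.hexdigest(body)
  File.write(File.join(dir, "#{key}_graphbody"), body)
  File.binwrite(File.join(dir, "#{key}_graph"), Marshal.dump(graph))
end

# 0.1.13-style lookup: the first *_graphbody file with the same byte size
# as the incoming body is treated as a hit, whatever its contents.
def old_lookup(dir, body)
  Dir.glob(File.join(dir, "*_graphbody")).each do |bodyfile|
    next unless File.size(bodyfile) == body.bytesize
    graph_file = bodyfile.sub(/_graphbody\z/, "_graph")
    return Marshal.load(File.binread(graph_file)) if File.exist?(graph_file)
  end
  [] # cache miss
end

# 0.1.14-style lookup: go straight to the files named by MD5(body).
def new_lookup(dir, body)
  key = Digest::MD5.hexdigest(body)
  graph_file = File.join(dir, "#{key}_graph")
  body_file = File.join(dir, "#{key}_graphbody")
  return [] unless File.exist?(graph_file) && File.exist?(body_file)
  Marshal.load(File.binread(graph_file))
end

Dir.mktmpdir do |dir|
  # Two hypothetical metadata bodies with identical byte counts (21 bytes).
  body_a = '{"title":"Dataset A"}'
  body_b = '{"title":"Dataset B"}'
  write_cache(dir, body_a, ["triple from A"])

  p old_lookup(dir, body_b) # => ["triple from A"] (wrong graph returned)
  p new_lookup(dir, body_b) # => [] (correct cache miss)
  p new_lookup(dir, body_a) # => ["triple from A"] (correct hit)
end
```

Only one cached entry is needed to trigger the collision; in the real service the odds grow as more bodies accumulate in `/tmp`, which matches the reported "fine after restart, broken after about a week" pattern.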
data/lib/cache.rb CHANGED
@@ -5,26 +5,17 @@ module FAIRChampionHarvester
  ################### #####################################
 
  def self.checkRDFCache(body)
- fs = File.join("/tmp/", "*_graphbody")
- bodies = Dir.glob(fs)
  g = RDF::Graph.new
- bodies.each do |bodyfile|
- next unless File.size(bodyfile) == body.bytesize # compare body size
- next unless bodyfile.match(/(.*)_graphbody$/) # continue if there's no match
+ key = Digest::MD5.hexdigest body
+ graph_file = "/tmp/#{key}_graph"
+ body_file = "/tmp/#{key}_graphbody"
 
- filename = ::Regexp.last_match(1)
- warn "Regexp match for #{filename} FOUND"
- next unless File.exist?("#{filename}_graph") # @ get the associated graph file
+ return g unless File.exist?(graph_file) && File.exist?(body_file)
 
- warn "RDF Cache File #{filename} FOUND"
- graph = Marshal.load(File.read("#{filename}_graph")) # unmarshal it
- graph.each do |statement|
- g << statement # need to do this because the unmarshalled object isn't entirely functional as an RDF::Graph object
- end
- warn "returning a graph of #{g.size}"
- break
- end
- # return an empty graph otherwise
+ warn "RDF Cache File #{key} FOUND"
+ graph = Marshal.load(File.read(graph_file))
+ graph.each { |statement| g << statement }
+ warn "returning a graph of #{g.size}"
  g
  end
 
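`writeRDFCache` itself is not part of this diff. If it writes the `_graphbody` and `_graph` files in place, the new `checkRDFCache` (which only tests `File.exist?` before unmarshalling) could in principle still open a `_graph` file that another thread has not finished writing. A minimal sketch of a write side that closes that window, assuming POSIX `rename` semantics and a private cache directory; `atomic_write` and `write_rdf_cache` are hypothetical helpers, not the gem's API:

```ruby
require "digest"
require "fileutils"
require "tmpdir"

# Hypothetical helper: write to a uniquely named temporary sibling, then
# rename it into place. On POSIX filesystems rename(2) is atomic, so a
# reader sees either no file or the complete file, never a partial one.
def atomic_write(path, data)
  tmp = "#{path}.#{Process.pid}.#{Thread.current.object_id}.tmp"
  File.binwrite(tmp, data)
  File.rename(tmp, path)
end

# Hypothetical write path keyed by MD5(body), producing the file names the
# 0.1.14 checkRDFCache looks for. Returns the cache key.
def write_rdf_cache(dir, body, graph)
  key = Digest::MD5.hexdigest(body)
  atomic_write(File.join(dir, "#{key}_graphbody"), body)
  atomic_write(File.join(dir, "#{key}_graph"), Marshal.dump(graph))
  key
end

cache_dir = Dir.mktmpdir("rdf_cache") # private directory, not the shared /tmp
begin
  body = "<html>example metadata</html>"
  key = write_rdf_cache(cache_dir, body, [%w[s p o]])
  graph_file = File.join(cache_dir, "#{key}_graph")
  p Marshal.load(File.binread(graph_file))  # => [["s", "p", "o"]]
  p Dir.children(cache_dir).grep(/\.tmp\z/) # => [] (no leftover temp files)
ensure
  FileUtils.remove_entry(cache_dir)
end
```

Using a private directory rather than the world-writable `/tmp` also matters because `Marshal.load` should only be given data the process itself wrote; the Ruby documentation warns against unmarshalling untrusted input.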
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module FairChampionHarvester
- VERSION = "0.1.13"
+ VERSION = "0.1.14"
  end
data/lib/harvester.rb CHANGED
@@ -1,4 +1,4 @@
- HARVESTER_VERSION = "Hvst-1.5.0".freeze
+ HARVESTER_VERSION = "Hvst-0.1.14".freeze
  # better output,
  # different dealing with DataCite (they have a unique type header)
  # handle large extruct output,
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: fair_champion_harvester
  version: !ruby/object:Gem::Version
- version: 0.1.13
+ version: 0.1.14
  platform: ruby
  authors:
  - markwilkinson