fair_champion_harvester 0.1.13 → 0.1.14
- checksums.yaml +4 -4
- data/CHANGELOG.md +6 -0
- data/lib/cache.rb +8 -17
- data/lib/fair_champion_harvester/version.rb +1 -1
- data/lib/harvester.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ecc509041da6bb1b6e39fec571b281b8b78fc3726fe886e6720e7ca8f505bf6d
+  data.tar.gz: 82bb48606f99990171318569dfa9cd22c593967401b0373771c05b0f93bf7c1b
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7b9746031f2bde80c3487a06612ec6b588f0ad3098b622ec7e8a41c8d816f70b3c8dd1b742e0bedfe1c77f1dbfd58f2e4e4855da886717958b2ac66aaacae7d1
+  data.tar.gz: 1e4e0a02c9ff6b502a56de0b76d27e501b4904975554e91f9891aed3df95db7a9dc8a7ff540de720b7c30be3c1567c8e5914521c6b5d5f773567fe13849c663b
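Review note: the entries in checksums.yaml are SHA256 and SHA512 digests of the two archives inside the packaged gem. A minimal sketch of how such digests can be recomputed with Ruby's standard `digest` library follows. The helper name `gem_file_digests` and the temporary sample file are hypothetical and used only for illustration; they are not part of the gem.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: compute the two digest kinds that checksums.yaml
# records for a single file (for example metadata.gz or data.tar.gz).
def gem_file_digests(path)
  {
    "SHA256" => Digest::SHA256.file(path).hexdigest,
    "SHA512" => Digest::SHA512.file(path).hexdigest
  }
end

# A throwaway file stands in for a real archive from an unpacked gem.
Tempfile.create("data.tar.gz") do |file|
  file.write("example payload")
  file.flush
  gem_file_digests(file.path).each { |algo, hex| puts "#{algo}: #{hex}" }
end
```

Comparing the printed values against the `+` lines of the hunk is one way to confirm that a downloaded archive matches the published release.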
data/CHANGELOG.md
CHANGED
@@ -2,6 +2,12 @@
 
 ## [Unreleased]
 
+## [0.1.14] - 2026-06-30
+
+### Fixed
+
+- Critical cache collision bug in `Cache.checkRDFCache`: the lookup was comparing only `File.size == body.bytesize` (byte count) against every `*_graphbody` file in `/tmp`, while `writeRDFCache` had always written files keyed by `MD5(body)`. As `/tmp` accumulated files over days of running, any two metadata bodies with identical byte counts would collide — the wrong RDF graph would be returned for a request. Symptom: a completely unrelated dataset's metadata (e.g. "Tata Motors NSE OHLCV Dataset") returned for a different resource (e.g. Glasgow University ORDA record). Problem disappeared on restart (clears `/tmp`) but returned after ~1 week. Fixed by rewriting `checkRDFCache` to look up directly by `MD5(body)` — O(1), no glob scan, and consistent with the write path. Also eliminates the parallel-access race window where a thread could match a partially-written file from another thread.
+
 ## [0.1.13] - 2026-06-30
 
 ### Fixed
data/lib/cache.rb
CHANGED
@@ -5,26 +5,17 @@ module FAIRChampionHarvester
     ################### #####################################
 
     def self.checkRDFCache(body)
-      fs = File.join("/tmp/", "*_graphbody")
-      bodies = Dir.glob(fs)
       g = RDF::Graph.new
-
-
-
+      key = Digest::MD5.hexdigest body
+      graph_file = "/tmp/#{key}_graph"
+      body_file = "/tmp/#{key}_graphbody"
 
-
-        warn "Regexp match for #{filename} FOUND"
-        next unless File.exist?("#{filename}_graph") # @ get the associated graph file
+      return g unless File.exist?(graph_file) && File.exist?(body_file)
 
-
-
-
-
-        end
-        warn "returning a graph of #{g.size}"
-        break
-      end
-      # return an empty graph otherwise
+      warn "RDF Cache File #{key} FOUND"
+      graph = Marshal.load(File.read(graph_file))
+      graph.each { |statement| g << statement }
+      warn "returning a graph of #{g.size}"
       g
     end
 
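Review note: `writeRDFCache` is not part of this diff, so the following is only a sketch of how a write path and the new read path can agree on an `MD5(body)` key. All helper names (`cache_paths`, `atomic_write`, `write_cache`, `read_cache`) and the `dir` parameter are hypothetical, and plain arrays stand in for `RDF::Graph` so the example runs without the rdf gem. The write-to-temp-then-rename step is an assumption added here as one common way to keep a reader from seeing a half-written file; the diff does not show whether the gem does this.

```ruby
require "digest"
require "tmpdir"

# Hypothetical helper: derive both cache file paths from MD5(body).
def cache_paths(dir, body)
  key = Digest::MD5.hexdigest(body)
  [File.join(dir, "#{key}_graph"), File.join(dir, "#{key}_graphbody")]
end

# Assumption: write to a temporary name, then rename into place, so a
# concurrent reader sees either no file or a complete one.
def atomic_write(path, bytes)
  tmp = "#{path}.#{Process.pid}.#{Thread.current.object_id}.tmp"
  File.binwrite(tmp, bytes)
  File.rename(tmp, path)
end

# Hypothetical write path: store the serialized statements and the body.
def write_cache(dir, body, statements)
  graph_file, body_file = cache_paths(dir, body)
  atomic_write(graph_file, Marshal.dump(statements))
  atomic_write(body_file, body)
end

# Read path mirroring the new checkRDFCache: direct lookup by key,
# empty result on a miss.
def read_cache(dir, body)
  graph_file, body_file = cache_paths(dir, body)
  return [] unless File.exist?(graph_file) && File.exist?(body_file)

  Marshal.load(File.binread(graph_file))
end

Dir.mktmpdir do |dir|
  write_cache(dir, "dataset-A metadata", [%w[subject predicate object]])
  p read_cache(dir, "dataset-A metadata") # hit: same body, same key
  p read_cache(dir, "dataset-B metadata") # miss: same size, different key
end
```

Because `Marshal.load` can instantiate arbitrary objects, a cache directory that other users can write to (such as a shared `/tmp`) is worth keeping in mind when reviewing this approach.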
data/lib/harvester.rb
CHANGED