oddb2xml 3.0.22 → 3.0.24

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: b116a974b4b6f80e5dba15a710f53fe2c4b0b44bcd094216eb0f46678d703beb
- data.tar.gz: e994a68fedddc58377421d5883322578ce5b6b119deaae1ba793acb4be190a9a
+ metadata.gz: 6c5dd43eea5580b800dcb7c1ede8f2b1b55ded474a7608471185bd0fab81f5c0
+ data.tar.gz: 0f2b55c4a6706b4e723efbcb77bd3c40b321eb3cf2893b58230572e03b6936a5
  SHA512:
- metadata.gz: baf0927ae2293672d4e1db0e61ace759fc8fe11cb05e936a556eafd3740d4355d9b6cf5b5e049aa9205125760d6d363bcecaa1cb98c93d4646981502b087f6b4
- data.tar.gz: 03217fb23800e5745aef09b836860e4f6f021b641e2ff7076d382f5eb4235f2ce618360966d0a59fdaad53647b7bd8b558cd131b15ed320ee2706ac9a98a2c97
+ metadata.gz: a55b5366832bd577609b70baa32934119ef953a441296c76848f183565c70709f7a24541f1927c23c78ded28b6b515b802ea287f34f58b5a50228ed6bba071ba
+ data.tar.gz: d4591b9b3e812a134e0598865d5385f0fcbe3546cc1b799e62c2c4b8d6c22ce8ee8347e32c7c967a3bd6051aa5e1d45ad8a262942a841b67a4ef22afd6a92579
data/.ruby-version CHANGED
@@ -1 +1 @@
- 3.4.5
+ 3.3.6
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- oddb2xml (3.0.22)
+ oddb2xml (3.0.24)
  csv
  htmlentities
  httpi
data/History.txt CHANGED
@@ -1,3 +1,10 @@
+ === 3.0.24 / 16.06.2026
+ * Bugfix (--skip-download): cached files fetched with a write mode were silently emptied on every --skip-download run. DownloadMethod#download_as restored the file from the ./downloads cache and then re-opened it with the caller's mode (e.g. "w+"), which truncated it to zero bytes before the read — so the returned data was empty. This blanked epha_interactions.csv (oddb_interaction.xml came out with NBR_RECORD=0) and any other source pulled with a "w+"-style mode (LPPV, Weleda/WALA SL, BAG SL group prices) whenever its file was present in the cache. It surfaced in deploy scripts that download once and rebuild several price increments from the shared cache (e.g. -b -I 45/50/55): the first build was correct, every --skip-download rebuild lost the interactions. The skip branch now opens the restored file read-only, preserving any encoding suffix ("w+:iso-8859-1:utf-8" -> "r:iso-8859-1:utf-8").
+
+ === 3.0.23 / 12.06.2026
+ * Bugfix (--proxy-check): the connectivity check now follows HTTP redirects to other hosts and reports the real "forwarder" target an allow-list proxy must permit, instead of stopping at the first hop. GS1 Switzerland turned id.gs1.ch into a 301 redirect to the global resolver id.gs1.org, which 307-redirects again to apitools.gs1.ch — so allowing only id.gs1.ch is no longer enough, and the firstbase download dies on the blocked target. Previously any 3xx answer was reported as "OK", so the check was falsely green; it now shows e.g. "[BLOCKED] id.gs1.ch -> apitools.gs1.ch" plus a "must be on the proxy allow-list too" note for every cross-host redirect. id.gs1.org is also probed explicitly (added to --firstbase's host set and to the full --proxy-check report).
+ * Improvement (--proxy-check): each host is now probed with the actual resource path the downloader fetches (e.g. raw.githubusercontent.com/zdavatz/…, www.spezialitaetenliste.ch/File.axd, files.refdata.ch/…/Refdata.Articles.zip) rather than "/". Probing "/" produced misleading host redirects (raw.githubusercontent.com/ -> github.com) for hosts whose real download path returns 200 directly; the genuine paths also reveal real forwarders such as www.spezialitaetenliste.ch -> sl.bag.admin.ch.
+
  === 3.0.22 / 11.06.2026
  * New (Artikelstamm / -e / -b): extend the 3.0.21 "Kapitel 70" SL recovery to WALA products (GTIN prefix 7640187…), in addition to Weleda. A third runtime CSV in github.com/zdavatz/oddb2xml_files, wala_arzneimittel.csv (bundled fallback under data/), is mapped exactly like the Weleda file: for any GTIN absent from the FHIR NDJSON, oddb2xml fills <SL_ENTRY>true</SL_ENTRY> and the BAG SL public price (<PPUB> in Artikelstamm, the standard <ARTPRI><PTYP>PPUB</PTYP> entry in oddb_article.xml for -e/-b). The WALA layout differs from Weleda: it is ";"-separated with a BOM, has no "/ SL" column (a row is SL when it carries a CSL-Code = Kapitel-70.01 group code), and the public package price is given inline in the "CSL 70.01." column -- already multiplied for the pack size (the multiplier appears only in the galenic-form text, e.g. "Solutio ad inj. 10 x 1 ml"), so it is taken verbatim instead of being re-joined against bag_sl_group_prices.csv (which would yield the 1/10 per-unit price). The FHIR/ZurRose price still always wins; this only fills a gap. Live file: 320 WALA SL products with prices.
 
@@ -18,7 +18,13 @@ module Oddb2xml
  data = nil
  FileUtils.makedirs(File.dirname(file), verbose: true)
  if Oddb2xml.skip_download(file)
- io = File.open(file, option)
+ # The file has just been restored from the download cache. Open it
+ # read-only: a write mode like "w+" would truncate the cached file to
+ # zero bytes before the read, silently emptying it (e.g. it blanked
+ # epha_interactions.csv on every --skip-download run). Preserve any
+ # encoding suffix (e.g. "w+:iso-8859-1:utf-8" -> "r:iso-8859-1:utf-8").
+ read_option = option.sub(/\A[wa]\+?/, "r")
+ io = File.open(file, read_option)
  data = io.read
  else
  begin
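The mode rewrite in the hunk above can be tried in isolation. The following is a minimal sketch, not the gem's code: `read_only_mode` is a hypothetical helper name that wraps the same substitution, and the temporary file stands in for a file restored from the download cache.

```ruby
require "tempfile"

# Hypothetical helper wrapping the substitution from the hunk above:
# a leading write/append mode becomes "r", any encoding suffix is kept.
def read_only_mode(option)
  option.sub(/\A[wa]\+?/, "r")
end

Tempfile.create("cached") do |tmp|
  tmp.write("a;b\n1;2\n")
  tmp.close

  # Re-opening with the rewritten mode leaves the cached bytes intact.
  kept = File.open(tmp.path, read_only_mode("w+")) { |io| io.read }

  # Re-opening with "w+" truncates the file before anything is read.
  lost = File.open(tmp.path, "w+") { |io| io.read }

  puts "read-only: #{kept.bytesize} bytes, w+: #{lost.bytesize} bytes"
end
```

The order matters in this sketch: the read-only open runs first, because the "w+" open destroys the content for every later reader, which is what happened to the shared cache across rebuilds.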
@@ -25,6 +25,25 @@ module Oddb2xml
 
  TIMEOUT = 6 # seconds, per host (open + read); checks run concurrently
 
+ # Representative resource path per host -- the actual file the downloader
+ # fetches, NOT "/". Probing "/" gives misleading host redirects (e.g.
+ # raw.githubusercontent.com/ -> github.com, while the real raw file path
+ # returns 200), whereas the genuine download paths reveal the real
+ # forwarder chain the proxy must allow (id.gs1.ch -> id.gs1.org ->
+ # apitools.gs1.ch; www.spezialitaetenliste.ch/File.axd -> sl.bag.admin.ch).
+ PROBE_PATHS = {
+ "files.refdata.ch" => "/simis-public-prod/Articles/1.0/Refdata.Articles.zip",
+ "raw.githubusercontent.com" => "/zdavatz/oddb2xml_files/master/LPPV.txt",
+ "id.gs1.ch" => "/01/07612345000961",
+ "id.gs1.org" => "/01/07612345000961",
+ "www.spezialitaetenliste.ch" => "/File.axd?file=XMLPublications.zip",
+ "www.medregbm.admin.ch" => "/Publikation/"
+ }.freeze
+
+ def probe_path(host)
+ PROBE_PATHS[host] || "/"
+ end
+
  def proxy_uri
  env = ENV["https_proxy"] || ENV["HTTPS_PROXY"] || ENV["http_proxy"] || ENV["HTTP_PROXY"]
  return nil if env.nil? || env.empty?
@@ -34,10 +53,25 @@ module Oddb2xml
  nil
  end
 
+ # Redirect targets ("forwarders") that an allow-list proxy must permit in
+ # addition to the host we actually request. id.gs1.ch 301-redirects every
+ # path to the global resolver id.gs1.org, so allowing only id.gs1.ch is not
+ # enough -- the firstbase download follows the redirect and dies on the
+ # blocked target. The real firstbase chain is id.gs1.ch -> id.gs1.org ->
+ # apitools.gs1.ch, so the redirect is followed dynamically too (see
+ # check_host); this list just guarantees the known target is probed even
+ # when the redirect probe is short-circuited.
+ FORWARDERS = {
+ "id.gs1.org" => "GS1 global resolver (id.gs1.ch redirect target, --firstbase / -b)"
+ }.freeze
+
  def hosts_for(options = {})
  hosts = BASE_HOSTS.dup
  hosts["epl.bag.admin.ch"] = "BAG FHIR data (--fhir)" if options[:fhir]
- hosts["id.gs1.ch"] = "GS1 NONPHARMA (--firstbase / -b)" if options[:firstbase]
+ if options[:firstbase]
+ hosts["id.gs1.ch"] = "GS1 NONPHARMA (--firstbase / -b)"
+ hosts["id.gs1.org"] = FORWARDERS["id.gs1.org"]
+ end
  hosts["www.spezialitaetenliste.ch"] = "BAG Spezialitätenliste" unless options[:fhir]
  hosts["www.medregbm.admin.ch"] = "Medizinalberuferegister (-x address)" if options[:address]
  hosts
@@ -51,7 +85,7 @@ module Oddb2xml
  "id.gs1.ch" => "GS1 NONPHARMA (--firstbase / -b)",
  "www.spezialitaetenliste.ch" => "BAG Spezialitätenliste",
  "www.medregbm.admin.ch" => "Medizinalberuferegister (-x address)"
- )
+ ).merge(FORWARDERS)
  end
 
  # Probe every host and print a full OK/BLOCKED/UNREACHABLE table.
@@ -59,32 +93,40 @@ module Oddb2xml
  def report(_options = {})
  proxy = proxy_uri
  results = all_hosts.map do |host, desc|
- Thread.new { [host, desc, check_host(host, proxy)] }
+ Thread.new { [host, desc, check_host(host, proxy, probe_path(host))] }
  end.map(&:value).sort_by { |(host, _desc, _status)| host }
 
  header = "oddb2xml connectivity check"
  header += proxy ? " (via proxy #{proxy.host}:#{proxy.port})" : " (no proxy configured)"
  puts header
  results.each do |(host, desc, status)|
- tag = case status
+ tag = case status[:result]
  when :ok then "OK "
  when :blocked then "BLOCKED" # proxy returned 407
  else "UNREACH"
  end
- puts format(" [%s] %-28s %s", tag, host, desc)
+ label = status[:via] ? "#{host} -> #{status[:via]}" : host
+ puts format(" [%s] %-36s %s", tag, label, desc)
  end
- unreachable = results.reject { |(_host, _desc, status)| status == :ok }
+ unreachable = results.reject { |(_host, _desc, status)| status[:result] == :ok }
  if unreachable.empty?
  puts "All #{results.size} hosts reachable."
  true
  else
  puts "#{unreachable.size} of #{results.size} host(s) NOT reachable -- downloads using them will fail."
+ results.select { |(_host, _desc, status)| status[:via] }.each do |(host, _desc, status)|
+ puts " note: #{host} redirects to #{status[:via]} -- that host must be on the proxy allow-list too."
+ end
  false
  end
  end
 
- # Returns :ok, :blocked (proxy 407) or :unreachable for a single host.
- def check_host(host, proxy)
+ # Probe a host (following HTTP redirects to other hosts) and return a Hash:
+ # { result: :ok | :blocked | :unreachable, via: "final.host" | nil }
+ # `:via` is set only when the host redirected to a *different* host, so the
+ # caller can surface that the redirect target (e.g. id.gs1.ch -> id.gs1.org)
+ # must be reachable too -- a 301 to a blocked host used to be reported as OK.
+ def check_host(host, proxy, path = "/", hops = 4, origin = nil)
  http =
  if proxy
  Net::HTTP.new(host, 443, proxy.host, proxy.port, proxy.user, proxy.password)
@@ -96,14 +138,27 @@ module Oddb2xml
  http.open_timeout = TIMEOUT
  http.read_timeout = TIMEOUT
  http.start do |h|
- res = h.head("/")
- return :blocked if res.code.to_s == "407"
- return :ok # any HTTP answer (200/301/403/404/...) means the host is reachable
+ res = h.head(path)
+ return {result: :blocked, via: via_for(origin, host)} if res.code.to_s == "407"
+ if res.code.to_s.start_with?("3") && res["location"] && hops > 0
+ loc = URI.parse(res["location"])
+ if loc.host && loc.host != host
+ next_path = (loc.respond_to?(:request_uri) && loc.request_uri) ? loc.request_uri : "/"
+ return check_host(loc.host, proxy, next_path, hops - 1, origin || host)
+ end
+ end
+ # any other HTTP answer (200/403/404/...) means this host is reachable
+ return {result: :ok, via: via_for(origin, host)}
  end
  rescue => error
  msg = error.message.to_s.downcase
- return :blocked if msg.include?("407") || msg.include?("authenticationrequired") || msg.include?("proxy")
- :unreachable
+ blocked = msg.include?("407") || msg.include?("authenticationrequired") || msg.include?("proxy")
+ {result: blocked ? :blocked : :unreachable, via: via_for(origin, host)}
+ end
+
+ # The final host reached, but only when it differs from where we started.
+ def via_for(origin, host)
+ (origin && origin != host) ? host : nil
  end
 
  # Probe all relevant hosts concurrently and warn about any that fail.
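The redirect-following logic in the hunk above can be exercised without a network. The following is a simplified offline sketch under stated assumptions: `FAKE_RESPONSES` and `trace` are hypothetical names, the response table is made up to mirror the id.gs1.ch chain named in the changelog (the `/lookup` path and the 407 on the last hop are illustrative, not observed data), and only `via_for` is copied from the diff.

```ruby
require "uri"

# Hypothetical stand-in for the network: "host + path" => [status, Location].
# Illustrative values only; unknown keys answer 200 with no redirect.
FAKE_RESPONSES = {
  "id.gs1.ch/01/07612345000961" => [301, "https://id.gs1.org/01/07612345000961"],
  "id.gs1.org/01/07612345000961" => [307, "https://apitools.gs1.ch/lookup"],
  "apitools.gs1.ch/lookup" => [407, nil]
}.freeze

# Same rule as via_for in the diff: name the final host only if it differs
# from the host the probe started at.
def via_for(origin, host)
  (origin && origin != host) ? host : nil
end

# Simplified trace: follows cross-host redirects, at most `hops` times,
# and reports the last host reached together with its result.
def trace(host, path = "/", hops = 4, origin = nil)
  code, location = FAKE_RESPONSES.fetch(host + path, [200, nil])
  return {result: :blocked, via: via_for(origin, host)} if code == 407
  if code.between?(300, 399) && location && hops > 0
    loc = URI.parse(location)
    if loc.host && loc.host != host
      return trace(loc.host, loc.request_uri, hops - 1, origin || host)
    end
  end
  {result: :ok, via: via_for(origin, host)}
end

p trace("id.gs1.ch", "/01/07612345000961")
p trace("files.refdata.ch")
```

With this table the first probe ends as blocked with `via` set to apitools.gs1.ch, while the second stays on its own host and reports `via` as nil. Once the hop budget is exhausted, a remaining 3xx answer is treated as reachable, which matches the fall-through in the diff.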
@@ -114,10 +169,10 @@ module Oddb2xml
  proxy = proxy_uri
  hosts = hosts_for(options)
  results = hosts.map do |host, desc|
- Thread.new { [host, desc, check_host(host, proxy)] }
+ Thread.new { [host, desc, check_host(host, proxy, probe_path(host))] }
  end.map(&:value)
 
- problems = results.reject { |(_host, _desc, status)| status == :ok }
+ problems = results.reject { |(_host, _desc, status)| status[:result] == :ok }
  return if problems.empty?
 
  warn_about(problems, proxy)
@@ -130,8 +185,9 @@ module Oddb2xml
  warn " The following hosts could not be reached -- the corresponding"
  warn " downloads will FAIL or produce incomplete data:"
  problems.each do |(host, desc, status)|
- tag = (status == :blocked) ? "BLOCKED by proxy (407)" : "UNREACHABLE "
- warn format(" [%s] %-26s %s", tag, host, desc)
+ tag = (status[:result] == :blocked) ? "BLOCKED by proxy (407)" : "UNREACHABLE "
+ label = status[:via] ? "#{host} -> #{status[:via]}" : host
+ warn format(" [%s] %-34s %s", tag, label, desc)
  end
  if proxy
  warn ""
@@ -1,3 +1,3 @@
  module Oddb2xml
- VERSION = "3.0.22"
+ VERSION = "3.0.24"
  end
@@ -0,0 +1,99 @@
+ #!/usr/bin/env bash
+ #
+ # run_oddb2xml — build the firstbase (-b) oddb2xml feed at several price
+ # increments and stage the results for transfer.
+ #
+ # The upstream sources are downloaded ONCE: the first build fetches them, and
+ # every subsequent increment re-uses the cached ./downloads via --skip-download.
+ # All builds therefore run in a single shared working directory ($BUILD_DIR) —
+ # the original deploy script cd'd into a separate dir per increment, where
+ # --skip-download could not find downloads/ (DOWNLOADS is cwd-relative) and
+ # silently re-downloaded everything each time.
+ #
+ # Output layout (under $OUT_DIR, default /home/zdavatz/oddb2xml):
+ # <OUT_DIR>/45/ oddb_*.xml built with +45 %
+ # <OUT_DIR>/50/ oddb_*.xml built with +50 %
+ # <OUT_DIR>/55/ oddb_*.xml built with +55 %
+ # <OUT_DIR>/default/ oddb_*.xml built with no increment
+ # Each destination dir also keeps the source archive as oddb2xml.zip.
+ # The working dir ($BUILD_DIR, default <OUT_DIR>-build) holds the shared
+ # downloads/ cache and the transient zip; it lives OUTSIDE $OUT_DIR so the
+ # transfer's `scp -r $OUT_DIR/*` never uploads the multi-hundred-MB cache.
+ #
+ # Configurable via environment:
+ # OUT_DIR destination root (default /home/zdavatz/oddb2xml)
+ # BUILD_DIR working dir (default <OUT_DIR>-build)
+ # INCREMENTS space-separated percents (default "45 50 55")
+ # ODDB2XML_BIN oddb2xml executable (default oddb2xml)
+ # SKIP_GEM_INSTALL set to 1 to skip `gem install oddb2xml`
+ # RUN_TRANSFER set to 1 to run the transfer (scripts/transfer.sh) at the end
+ # TRANSFER_CMD transfer command (default: sudo, preserving
+ # ODDB2XML_TRANSFER_DIR, scripts/transfer.sh next to this file)
+ #
+ set -euo pipefail
+
+ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+ OUT_DIR="${OUT_DIR:-/home/zdavatz/oddb2xml}"
+ BUILD_DIR="${BUILD_DIR:-${OUT_DIR%/}-build}"
+ INCREMENTS="${INCREMENTS:-45 50 55}"
+ ODDB2XML_BIN="${ODDB2XML_BIN:-oddb2xml}"
+ TRANSFER_CMD="${TRANSFER_CMD:-$SCRIPT_DIR/transfer.sh}"
+
+ log() { printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"; }
+
+ # 1. Install / update the published gem unless told otherwise.
+ if [[ "${SKIP_GEM_INSTALL:-0}" != "1" ]]; then
+ log "Installing oddb2xml gem"
+ gem install oddb2xml
+ fi
+
+ # 2. Fresh working dir (keeps a shared downloads/ cache across increments).
+ log "Preparing build dir $BUILD_DIR"
+ rm -rf "$BUILD_DIR"
+ mkdir -p "$BUILD_DIR"
+ cd "$BUILD_DIR"
+
+ first=1
+
+ # build_one <increment-percent|""> <destination-subdir>
+ build_one() {
+ local inc="$1" name="$2" dest="$OUT_DIR/$2"
+ local inc_opt=() dl_opt=()
+ [[ -n "$inc" ]] && inc_opt=(-I "$inc")
+
+ if [[ $first -eq 1 ]]; then
+ first=0 # first build downloads the sources
+ else
+ dl_opt=(--skip-download) # the rest re-use the cached downloads/
+ fi
+
+ log "Building increment '${inc:-none}' -> $dest"
+ rm -f oddb*.zip
+ "$ODDB2XML_BIN" "${dl_opt[@]}" -b "${inc_opt[@]}" -c zip
+
+ shopt -s nullglob
+ local zips=(oddb*.zip)
+ shopt -u nullglob
+ [[ ${#zips[@]} -ge 1 ]] || { log "ERROR: no zip produced for increment '${inc:-none}'"; exit 1; }
+ local zip="${zips[0]}"
+
+ rm -rf "$dest"
+ mkdir -p "$dest"
+ unzip -o -q -d "$dest" "$zip"
+ mv "$zip" "$dest/oddb2xml.zip"
+ log "Staged $dest"
+ }
+
+ for inc in $INCREMENTS; do
+ build_one "$inc" "$inc"
+ done
+ build_one "" "default" # final run with no increment
+
+ # 3. Optional hand-off to the transfer step (scripts/transfer.sh).
+ if [[ "${RUN_TRANSFER:-0}" == "1" ]]; then
+ log "Running transfer: $TRANSFER_CMD"
+ export ODDB2XML_TRANSFER_DIR="$OUT_DIR" # keep transfer.sh in sync with OUT_DIR
+ $TRANSFER_CMD
+ fi
+
+ log "Done. Output under $OUT_DIR"
@@ -0,0 +1,45 @@
+ #!/bin/bash
+ #
+ # transfer.sh — push the generated oddb2xml feeds (plus the aips2sqlite
+ # Fachinformation XML and the swissmedic-sequences CSV) to the HIN download
+ # server via scp. Runs on the ywesee host; everything is user-owned, so no sudo.
+ #
+ # Paths default to this host's layout and can be overridden via environment:
+ # ODDB2XML_TRANSFER_DIR dir whose contents go to .../download/oddb2xml/
+ # (default /home/zdavatz/oddb2xml)
+ # AIPS2SQLITE_DIR aips2sqlite output dir
+ # (default /home/zdavatz/software/aips2sqlite/jars/output)
+ # SSH_KEY scp identity file (default ~/.ssh/id_ed25519)
+ # SCP_DEST scp destination base, e.g. user@host:/path/download
+ # (REQUIRED — no default yet; set the new download server)
+
+ ODDB2XML_TRANSFER_DIR="${ODDB2XML_TRANSFER_DIR:-/home/zdavatz/oddb2xml}"
+ AIPS2SQLITE_DIR="${AIPS2SQLITE_DIR:-/home/zdavatz/software/aips2sqlite/jars/output}"
+ SSH_KEY="${SSH_KEY:-$HOME/.ssh/id_ed25519}"
+ # TODO: set the new download-server destination.
+ SCP_DEST="${SCP_DEST:?set SCP_DEST to the scp target, e.g. user@host:/var/www/.../download}"
+
+ ###
+ ### ODDB2XML
+ ###
+
+ find "$ODDB2XML_TRANSFER_DIR/" -type d -exec chmod 755 {} \;
+ find "$ODDB2XML_TRANSFER_DIR/" -type f -exec chmod 644 {} \;
+
+ scp -r -i "$SSH_KEY" "$ODDB2XML_TRANSFER_DIR"/* "$SCP_DEST/oddb2xml/"
+
+ ###
+ ### aips2sqlite
+ ###
+
+ if [ -d "$AIPS2SQLITE_DIR/fis" ]; then
+ find "$AIPS2SQLITE_DIR/fis" -name '*.xml' -type f -exec chmod 644 {} \;
+ scp -r -i "$SSH_KEY" "$AIPS2SQLITE_DIR"/fis/*.xml "$SCP_DEST/mediupdate-xml/"
+ fi
+
+ if [ -f "$AIPS2SQLITE_DIR/oddb2xml_swissmedic_sequences.csv" ]; then
+ chmod 644 "$AIPS2SQLITE_DIR/oddb2xml_swissmedic_sequences.csv"
+ scp -r -i "$SSH_KEY" "$AIPS2SQLITE_DIR/oddb2xml_swissmedic_sequences.csv" "$SCP_DEST/oddb2xml/"
+ fi
+
+ exit 0