oddb2xml 3.0.28 → 3.0.29

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 62af20b2609637da38197880075560b49d0f810142ca162623980490abab4f6e
- data.tar.gz: a5ac57d7d4864331ff34fa0f60f37ccf497c42b186f5d1a0a7ed0ee3322b3178
+ metadata.gz: 4602599ed50439225ff6ce76000f8441a9b4963b02538adf3f46d6728beff6cc
+ data.tar.gz: 55f34b946cc7a0e75435459aa90169e23719ac44c4bbfdf1ba3d63c3162f894d
  SHA512:
- metadata.gz: 5bc3e6e4bb7bf221deb8feee038a59e723b491af3ebfa40a91d4c26d4a6fc9fc5c9417801ddac922b5a3be981310225b4b84e5f0c98a1daa06e2983251c80302
- data.tar.gz: 434a73b313996af5a2f0ee11a7d404a9cd86bcd0c853abf913867fc0a0d7b85b1bf12dcdd5075f9574229d8c6f0d7b5610d2aa2c3062adde35a7f5bc303e086c
+ metadata.gz: cf98f9144a53cd62be8599168423c4c6d560b8474dc015d5281eccb84b2f80de6d46e3765b7ab96cc6271822361ea0a4a7b1f00bae1908e16f94f8fb77006908
+ data.tar.gz: 7ed225d5f4cd6f3c3293913cb40e01430839968d34e15551a2192a1f9069343dd98f7448179a0e433bc6485d8810510904add62632e6e5ddf6d03d50b8f41dc8
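The checksums.yaml entries are the SHA256 and SHA512 digests of the two members packed inside the .gem archive (`metadata.gz` and `data.tar.gz`). A minimal Ruby sketch for checking one extracted member against those values, assuming the member has already been unpacked from the .gem (a plain tar archive, e.g. `tar -xf oddb2xml-3.0.29.gem`); the helper name `digests_match?` is hypothetical:

```ruby
require "digest"

# Hypothetical helper: true when the file's SHA256 and SHA512 hex digests
# both equal the expected values (e.g. copied from checksums.yaml).
def digests_match?(path, sha256:, sha512:)
  data = File.binread(path)
  Digest::SHA256.hexdigest(data) == sha256.downcase &&
    Digest::SHA512.hexdigest(data) == sha512.downcase
end

# Usage (assumes data.tar.gz was unpacked from the .gem beforehand):
#   digests_match?("data.tar.gz", sha256: "<SHA256 from checksums.yaml>",
#                                 sha512: "<SHA512 from checksums.yaml>")
```

Pasting the full `+` values from the hunk above for the 3.0.29 `data.tar.gz` is expected to return true; any tampering or a truncated download makes at least one digest differ.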
data/CLAUDE.md CHANGED
@@ -68,7 +68,7 @@ YAML files in `data/` provide manual overrides and mappings: `article_overrides.
 
  These scripts run the public download server at `https://mediupdatexml.oddb.org` (Apache on this host) and are **not** part of the gem itself.
 
- - **`run_oddb2xml.sh`** — nightly build driver (cron: `0 1 * * * zdavatz`). Downloads the upstream sources **once**, then builds the `-b`/firstbase feed at price increments `45/50/55` plus `default` (no increment) into `$OUT_DIR` (`/home/zdavatz/oddb2xml`, one subdir each). The shared `downloads/` cache and transient zip live in `$BUILD_DIR` (`<OUT_DIR>-build`), **outside** `$OUT_DIR` so the transfer never uploads the multi-hundred-MB cache. Final step ("2b") regenerates the landing page. Each `oddb2xml` invocation is wrapped in `run_with_retry` (default **3 attempts, 120 s apart**, tunable via `ODDB2XML_RETRIES`/`ODDB2XML_RETRY_DELAY`): a transient upstream download failure (e.g. Swissmedic resetting the connection, `Errno::ECONNRESET`) previously aborted the whole `set -e` run 14 s in and left the feeds a day stale, so it now retries before giving up; a genuine repeated failure still stops the run.
+ - **`run_oddb2xml.sh`** — nightly build driver (cron: `0 1 * * * zdavatz`). Downloads the upstream sources **once**, then builds the `-b`/firstbase feed at price increments `45/50/55` plus `default` (no increment) into `$OUT_DIR` (`/home/zdavatz/oddb2xml`, one subdir each). The shared `downloads/` cache and transient zip live in `$BUILD_DIR` (`<OUT_DIR>-build`), **outside** `$OUT_DIR` so the transfer never uploads the multi-hundred-MB cache. Final step ("2b") regenerates the landing page. Each `oddb2xml` invocation is wrapped in `run_with_retry` (default **3 attempts, 120 s apart**, tunable via `ODDB2XML_RETRIES`/`ODDB2XML_RETRY_DELAY`): a transient upstream download failure (e.g. Swissmedic resetting the connection, `Errno::ECONNRESET`) previously aborted the whole `set -e` run 14 s in and left the feeds a day stale, so it now retries before giving up; a genuine repeated failure still stops the run. **Firstbase (GS1 NONPHARMA) last-good fallback (3.0.29 onwards):** the GS1 `GetFirstbaseHealthcare` export (`id.gs1.ch/01/07612345000961` → `apitools.gs1.ch`) has been answering `403 - Forbidden`, which blanked `firstbase.csv` and dropped **every** NONPHARMA article from the `-b` feed (landing page then showed `NONPHARMA = 0 − 1 = −1`). The script keeps the last successful `firstbase.csv` in a persistent cache `$FIRSTBASE_CACHE` (default `<OUT_DIR>-state/firstbase.csv`) **outside** `$BUILD_DIR` so it survives the nightly `rm -rf`, seeds it into `downloads/` before the build, and refreshes it after a successful download. The gem side (`FirstbaseDownloader#download`, rewritten in 3.0.29) makes the seed usable: it still attempts the live GS1 fetch on the first (downloading) build, but only overwrites `firstbase.csv` when the response is a real non-empty CSV (`firstbase_csv?` rejects HTML/`403 - Forbidden`/empty bodies and open-uri exceptions), otherwise it **keeps the existing seeded file** instead of the old `"w+"` truncate-to-zero. A recovered GS1 therefore refreshes the data automatically; while GS1 is down the feed serves yesterday's (last-good) NONPHARMA rather than nothing. `generate_index_html.sh` also guards the NONPHARMA count so an empty CSV renders `—` (not `−1`).
  - **`generate_index_html.sh DOCROOT [FIRSTBASE_CSV]`** — single source of truth for the landing page. Writes `index.html` + a self-contained `logo.svg` **atomically** (temp + `mv`, so either owner — root from setup, `zdavatz` from cron — can refresh it). Computes live counts: PHARMA = `<SMNO>` count in `default/oddb_article.xml`, NONPHARMA = firstbase CSV rows − 1, total ART = `<ART ` count. Also runs **`visitor_stats.py`** and embeds its graph. Re-run standalone any time (it only reads already-built files); a separate cron line refreshes it **hourly** (`5 * * * * zdavatz`) so counts + graph stay current between nightly builds.
  - **`visitor_stats.py LOG_GLOB CACHE_DIR [DAYS]`** — emits the visitors/sessions/region graph as an inline-SVG HTML **fragment** (last `DAYS`, default 14): Besucher = distinct IPs/day, Sitzungen = 30-min-inactivity sessions per `(IP, User-Agent)`, plus a top-6 country breakdown by IP. Bots are filtered by User-Agent. Region lookup is **fully self-contained** — pure Python stdlib + the free **DB-IP country-lite CSV** (CC-BY, no licence key) cached in the build `downloads/` dir and refreshed monthly; **no apt package, no gem, no system GeoIP DB**. Prints nothing (page degrades to omitting the section) when the Apache log is unreadable or empty. Reading `/var/log/apache2` requires the cron user to be in the **`adm`** group (`sudo usermod -aG adm zdavatz`).
  - **`swissmedic_watch.sh`** — outage/block auto-recovery (cron: `*/30 * * * * zdavatz`). Since the Swissmedic platform migration (~2026-06-23, now a Swisscom-operated gateway), `www.swissmedic.ch` intermittently resets this host's automated connections **after the TLS handshake** (TCP RST), which aborts `run_oddb2xml.sh` under `set -e` and leaves the feeds stale (the block is host/IP- and client-fingerprint-sensitive: a real browser works, `curl`/`wget`/Ruby get reset, while other admin.ch hosts answer fine — so it is a WAF/bot rule, not an outage). The watcher polls Swissmedic with **oddb2xml's own client** (a Ruby `open-uri` canary on `listen_neu.html`); while blocked it is a silent no-op, and the moment it gets HTTP 200 it launches **one** build and emails. It fires **at most once per day** (stamp in `$STATE_DIR`, default `<OUT_DIR>-watch`, kept **outside** the wiped `$BUILD_DIR`), and skips when a build is already running or today's `default/oddb_article.xml` is already fresh. It exports `RBENV_VERSION=3.4.5` + the rbenv-shims PATH to match the nightly cron (the repo `.ruby-version` pins an uninstalled Ruby).
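The `visitor_stats.py` bullet defines two counters: Besucher (visitors, distinct IPs per day) and Sitzungen (sessions keyed by `(IP, User-Agent)` that end after 30 minutes of inactivity). The script itself is Python and is not part of this diff, so the following Ruby sketch only illustrates those two definitions. The `Hit` struct, the helper names, and the rule that a gap of strictly more than 30 minutes starts a new session are assumptions; bot filtering and per-day bucketing of sessions are left out.

```ruby
require "date"

# Assumption: an idle gap strictly longer than this starts a new session.
SESSION_GAP = 30 * 60 # seconds

# Hypothetical record for one parsed access-log line.
Hit = Struct.new(:ip, :user_agent, :time)

# Sitzungen: sessions per (IP, User-Agent) pair, split on inactivity gaps.
def count_sessions(hits)
  hits.group_by { |h| [h.ip, h.user_agent] }.values.sum do |group|
    times = group.map(&:time).sort
    # The first hit opens a session; every long gap opens another one.
    1 + times.each_cons(2).count { |a, b| b - a > SESSION_GAP }
  end
end

# Besucher: number of distinct IPs per calendar day.
def visitors_per_day(hits)
  hits.group_by { |h| h.time.to_date }.transform_values { |g| g.map(&:ip).uniq.size }
end
```

Keying sessions on the `(IP, User-Agent)` pair means two browsers behind the same NAT address count as separate sessions but as a single visitor.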
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- oddb2xml (3.0.28)
+ oddb2xml (3.0.29)
  csv
  htmlentities
  httpi
@@ -396,14 +396,53 @@ module Oddb2xml
  @url = BASE_URL
  end
 
+ # A valid firstbase export is a non-empty CSV. When GS1 is unavailable it
+ # answers with an HTML error page (the GetFirstbaseHealthcare endpoint has
+ # been returning "403 - Forbidden: Access is denied") or open-uri raises.
+ # The old "w+" download truncated firstbase.csv to zero bytes on any such
+ # failure, silently dropping every NONPHARMA article. Reject non-CSV bodies
+ # so the caller can keep the previous good firstbase.csv instead.
+ def firstbase_csv?(text)
+ return false if text.nil?
+ head = text[0, 512].to_s.strip.downcase
+ return false if head.empty?
+ return false if head.start_with?("<!doctype", "<html", "<?xml")
+ return false if head.include?("403 - forbidden") || head.include?("access is denied")
+ true
+ end
+
  def download
  @file2save = File.join(DOWNLOADS, "firstbase.csv")
  report_download(@url, @file2save)
- begin
- download_as(@file2save, "w+")
+ # Price-increment / Artikelstamm runs (--skip-download) reuse the cached
+ # firstbase.csv. Do NOT skip merely because the file exists: the nightly
+ # deploy seeds a last-good copy so a GS1 outage does not blank the feed,
+ # and we still want a fresh download attempt on the first (downloading)
+ # build so a recovered GS1 refreshes the data.
+ if Oddb2xml.skip_download? && File.size?(@file2save)
+ Oddb2xml.log "FirstbaseDownloader: --skip-download, reusing cached #{@file2save} (#{File.size(@file2save)} bytes)"
  return File.expand_path(@file2save)
+ end
+ begin
+ data = Oddb2xml.uri_open(@url).read
+ if firstbase_csv?(data)
+ File.write(@file2save, data)
+ Oddb2xml.log "FirstbaseDownloader: fetched fresh firstbase.csv (#{data.bytesize} bytes)"
+ elsif File.size?(@file2save)
+ Oddb2xml.log "FirstbaseDownloader: GS1 returned no CSV (#{data.to_s.bytesize} bytes); keeping existing #{@file2save} (#{File.size(@file2save)} bytes)"
+ else
+ Oddb2xml.log "FirstbaseDownloader: GS1 returned no CSV and there is no cached firstbase.csv to fall back to"
+ end
  rescue Timeout::Error, Errno::ETIMEDOUT
  retrievable? ? retry : raise
+ rescue => error
+ # 403 / blocked / unreachable: keep any existing firstbase.csv (e.g. the
+ # last-good copy the deploy script seeds) rather than truncating it.
+ if File.size?(@file2save)
+ Oddb2xml.log "FirstbaseDownloader: download failed (#{error.class}: #{error}); keeping existing #{@file2save} (#{File.size(@file2save)} bytes)"
+ else
+ Oddb2xml.log "FirstbaseDownloader: download failed (#{error.class}: #{error}) and no cached firstbase.csv to fall back to"
+ end
  ensure
  Oddb2xml.download_finished(@file2save, false)
  end
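The core of the 3.0.29 change is the `firstbase_csv?` check combined with the rule "never overwrite a good file with a bad response". The sketch below reuses `firstbase_csv?` exactly as it appears in the hunk above and wraps it in a hypothetical `refresh_firstbase(path) { fetch }` helper standing in for `download`. The block replaces `Oddb2xml.uri_open(@url).read`; the gem's `Timeout::Error` retry path, `skip_download?` handling, and logging are deliberately omitted.

```ruby
# firstbase_csv? as added in 3.0.29 (copied from the diff).
def firstbase_csv?(text)
  return false if text.nil?
  head = text[0, 512].to_s.strip.downcase
  return false if head.empty?
  return false if head.start_with?("<!doctype", "<html", "<?xml")
  return false if head.include?("403 - forbidden") || head.include?("access is denied")
  true
end

# Hypothetical stand-in for FirstbaseDownloader#download: the block plays the
# role of the HTTP fetch. Returns true only if the file on disk was replaced.
def refresh_firstbase(path)
  body = yield
  return false unless firstbase_csv?(body) # error page or empty body: keep old file
  File.write(path, body)
  true
rescue StandardError
  false # fetch (or write) raised, e.g. 403 or connection reset: keep old file
end
```

A block that returns an HTML 403 page, or one that raises, leaves the existing file byte-for-byte unchanged; only a body that passes the check replaces it. This mirrors the difference from the removed `download_as(@file2save, "w+")`, which opened the target for writing before knowing whether the response was usable.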
@@ -1,3 +1,3 @@
  module Oddb2xml
- VERSION = "3.0.28"
+ VERSION = "3.0.29"
  end
@@ -35,7 +35,13 @@ pharma="—"
  [[ -f "$ARTICLE_XML" ]] && pharma=$(grep -c '<SMNO>' "$ARTICLE_XML" || true)
 
  nonpharma="—"
- [[ -f "$FIRSTBASE_CSV" ]] && nonpharma=$(( $(wc -l < "$FIRSTBASE_CSV") - 1 ))
+ # Only count when the CSV actually has data rows. An empty firstbase.csv (the
+ # GS1 firstbase dump upstream returns 403/empty from time to time) would make a
+ # naive `rows - 1` render "-1"; keep the "—" fallback instead.
+ if [[ -s "$FIRSTBASE_CSV" ]]; then
+ fb_rows=$(wc -l < "$FIRSTBASE_CSV")
+ (( fb_rows > 1 )) && nonpharma=$(( fb_rows - 1 ))
+ fi
 
  stand=$(date '+%d.%m.%Y %H:%M')
 
@@ -150,7 +156,8 @@ cat > "$tmp" <<HTML
 
  <h2>Elexis Artikelstamm</h2>
  <ul>
- <li><a href="artikelstamm/">artikelstamm/</a> <span class="desc">— Elexis Artikelstamm v6 (mit BAG-Indikationscodes) und Legacy-v5, je als XML und CSV, täglich aktualisiert</span></li>
+ <li><a href="artikelstamm/">artikelstamm/</a> <span class="desc">— Elexis Artikelstamm v6 (mit BAG-Indikationscodes) und Legacy-v5, je als XML und CSV, täglich aktualisiert (erzeugt mit oddb2xml)</span></li>
+ <li><a href="artikelstamm/rust2xml/">artikelstamm/rust2xml/</a> <span class="desc">— derselbe Artikelstamm v6 + v5, erzeugt mit <a href="https://github.com/zdavatz/rust2xml">rust2xml</a> (Rust-Port), täglich um 03:00 aus denselben Live-Quellen</span></li>
  </ul>
 
  <h2>aips2sqlite — Fachinformationen &amp; AmiKo-Datenbanken</h2>
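The NONPHARMA guard in `generate_index_html.sh` can be mirrored in Ruby for testing. This hypothetical `nonpharma_count` helper follows the shell semantics: `[[ -s ]]` becomes `File.size?` (nil for a missing or zero-byte file), and rows are counted as newline characters because that is what `wc -l` reports. The page's dash placeholder is written as `"\u2014"`.

```ruby
# Hypothetical Ruby mirror of the guarded NONPHARMA count.
PLACEHOLDER = "\u2014" # the dash the landing page shows when there is no count

def nonpharma_count(path)
  return PLACEHOLDER unless File.size?(path) # [[ -s "$FIRSTBASE_CSV" ]]
  rows = File.binread(path).count("\n")     # wc -l counts newline characters
  rows > 1 ? rows - 1 : PLACEHOLDER          # subtract the CSV header row
end
```

One consequence of the `wc -l` semantics shows up with a file whose last data row has no trailing newline: `"gtin;name\n1;a\n2;b"` yields 1, not 2, because the final row is never counted.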
@@ -81,6 +81,41 @@ rm -rf "$BUILD_DIR"
  mkdir -p "$BUILD_DIR"
  cd "$BUILD_DIR"
 
+ # 3. Seed the ZurRose transfer.zip from the local get_transfer mirror.
+ # get_transfer.sh (crontab 00:30) downloads transfer.dat straight from
+ # zurrose.ch on THIS host and uploads the zip to pillbox.oddb.org — so the
+ # pillbox HTTP fetch is a needless detour back to our own file. Placing the
+ # zip in downloads/ makes oddb2xml's skip_download reuse it and the build no
+ # longer depends on pillbox being up (2026-07-02: pillbox refused connections
+ # during the 01:00 run and all three retries died, killing the whole nightly
+ # build). If the seed file is missing, oddb2xml falls back to the normal
+ # pillbox download as before.
+ GET_TRANSFER_ZIP="${GET_TRANSFER_ZIP:-/home/zdavatz/software/get_transfer/TRANSFER.ZIP}"
+ if [[ -s "$GET_TRANSFER_ZIP" ]]; then
+ mkdir -p "$BUILD_DIR/downloads"
+ cp -p "$GET_TRANSFER_ZIP" "$BUILD_DIR/downloads/transfer.zip"
+ log "Seeded ZurRose transfer.zip from $GET_TRANSFER_ZIP ($(date -r "$GET_TRANSFER_ZIP" '+%Y-%m-%d %H:%M'), no pillbox fetch needed)"
+ else
+ log "WARNING: $GET_TRANSFER_ZIP missing - falling back to pillbox.oddb.org download"
+ fi
+
+ # 3b. Firstbase (GS1 NONPHARMA) fallback. The GS1 GetFirstbaseHealthcare
+ # endpoint has been answering "403 - Forbidden", which blanked firstbase.csv and
+ # dropped every NONPHARMA article from the -b feed. Keep the last successful
+ # firstbase.csv in a persistent cache OUTSIDE $BUILD_DIR (it survives the nightly
+ # `rm -rf`) and seed it into downloads/ so the gem's FirstbaseDownloader falls
+ # back to yesterday's file when today's download fails. A recovered GS1 still
+ # refreshes the data: the first (downloading) build always retries the live
+ # fetch and only keeps the seed when that fetch yields no CSV.
+ FIRSTBASE_CACHE="${FIRSTBASE_CACHE:-${OUT_DIR%/}-state/firstbase.csv}"
+ if [[ -s "$FIRSTBASE_CACHE" ]]; then
+ mkdir -p "$BUILD_DIR/downloads"
+ cp -p "$FIRSTBASE_CACHE" "$BUILD_DIR/downloads/firstbase.csv"
+ log "Seeded firstbase.csv from last-good cache $FIRSTBASE_CACHE ($(($(wc -l < "$FIRSTBASE_CACHE") - 1)) rows, $(date -r "$FIRSTBASE_CACHE" '+%Y-%m-%d %H:%M'))"
+ else
+ log "No firstbase last-good cache at $FIRSTBASE_CACHE yet - relying on live GS1 download"
+ fi
+
  first=1
 
  # build_one <increment-percent|""> <destination-subdir>
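Steps 3 and 3b share one pattern: copy a non-empty file into `downloads/` before the build so the gem reuses it, and (for firstbase) copy the result back to the persistent cache after the downloading build. A Ruby sketch of the firstbase half, with hypothetical helper names: `chomp("/")` plays the role of `${OUT_DIR%/}` (drop one trailing slash), `File.size?` the role of `[[ -s ]]`, and `FileUtils.cp(..., preserve: true)` the role of `cp -p`.

```ruby
require "fileutils"

# ${OUT_DIR%/}-state/firstbase.csv: strip one trailing slash, append the suffix.
def default_firstbase_cache(out_dir)
  "#{out_dir.chomp('/')}-state/firstbase.csv"
end

# Step 3b: seed downloads/firstbase.csv from a non-empty last-good cache.
def seed_from_cache(cache, downloads_dir)
  return false unless File.size?(cache) # [[ -s "$FIRSTBASE_CACHE" ]]
  FileUtils.mkdir_p(downloads_dir)
  FileUtils.cp(cache, File.join(downloads_dir, "firstbase.csv"), preserve: true)
  true
end

# After the downloading build: refresh the cache only from a non-empty file,
# otherwise leave the previous cache untouched.
def refresh_cache(live, cache)
  return false unless File.size?(live)
  FileUtils.mkdir_p(File.dirname(cache))
  FileUtils.cp(live, cache, preserve: true)
  true
end
```

Because the cache lives beside `$OUT_DIR` rather than under `$BUILD_DIR`, the nightly `rm -rf "$BUILD_DIR"` cannot remove it, which is what lets a multi-day GS1 outage keep serving the last good export.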
@@ -132,17 +167,43 @@ build_artikelstamm() {
  shopt -u nullglob
  [[ ${#out[@]} -ge 1 ]] || { log "ERROR: no artikelstamm output produced"; exit 1; }
 
- rm -rf "$dest"
  mkdir -p "$dest"
- cp -p "${out[@]}" "$dest/"
+ # Remove only oddb2xml's own top-level files; keep sub-directories such as
+ # rust2xml/ (published independently by rust2xml's own cron at 03:00) intact.
+ # A plain `rm -rf "$dest"` used to wipe that sibling output every night.
+ rm -f "$dest"/artikelstamm_*.xml "$dest"/artikelstamm_*.csv
+ # Publish under date-less, stable names so the download URLs never change:
+ # artikelstamm_01072026_v6.xml -> artikelstamm_v6.xml (same for _v5 / .csv).
+ local f base
+ for f in "${out[@]}"; do
+ base="$(basename "$f" | sed -E 's/_[0-9]{8}_/_/')"
+ cp -p "$f" "$dest/$base"
+ done
  log "Staged ${#out[@]} file(s) to $dest"
  }
 
+ # Build order: default first (it downloads the shared sources), then the
+ # Artikelstamm right after so it is published early, then the price increments.
+ build_one "" "default" # first run: downloads sources, no increment
+
+ # Refresh the last-good firstbase.csv cache after the downloading build. When
+ # GS1 answered, downloads/firstbase.csv now holds fresh data; when it 403'd, the
+ # gem kept the seeded copy - either way a non-empty file is worth caching so the
+ # next run can fall back to it. An empty file means both today's download AND the
+ # seed were missing, so leave the previous cache untouched.
+ FIRSTBASE_LIVE="$BUILD_DIR/downloads/firstbase.csv"
+ if [[ -s "$FIRSTBASE_LIVE" ]]; then
+ mkdir -p "$(dirname "$FIRSTBASE_CACHE")"
+ cp -p "$FIRSTBASE_LIVE" "$FIRSTBASE_CACHE"
+ log "Cached firstbase.csv as last-good ($(($(wc -l < "$FIRSTBASE_LIVE") - 1)) rows) -> $FIRSTBASE_CACHE"
+ else
+ log "WARNING: firstbase.csv is empty after the build (GS1 403 and no cache) - NONPHARMA missing this run"
+ fi
+
+ build_artikelstamm # Elexis Artikelstamm (v6 + legacy v5)
  for inc in $INCREMENTS; do
- build_one "$inc" "$inc"
+ build_one "$inc" "$inc" # price increments re-use the cached downloads/
  done
- build_one "" "default" # final run with no increment
- build_artikelstamm # Elexis Artikelstamm (v6 + legacy v5)
 
  # 2b. Refresh the download landing page with the live PHARMA/NONPHARMA counts
  # (PHARMA from default/oddb_article.xml, NONPHARMA from the GS1 firstbase CSV).
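The two publishing rules in `build_artikelstamm` (delete only the top-level `artikelstamm_*.xml`/`.csv` files, then copy each output under a date-less name) can be sketched in Ruby. `String#sub` replaces only the first match, like the `sed -E 's/_[0-9]{8}_/_/'` expression without a `g` flag. The `stable_name` and `publish` helpers are hypothetical.

```ruby
require "fileutils"

# Like sed -E 's/_[0-9]{8}_/_/': drop the first 8-digit date stamp,
# e.g. artikelstamm_01072026_v6.xml -> artikelstamm_v6.xml.
def stable_name(file)
  File.basename(file).sub(/_[0-9]{8}_/, "_")
end

# Delete only top-level artikelstamm_*.xml / *.csv in dest (sub-directories
# such as rust2xml/ are left alone), then copy outputs under stable names.
def publish(outputs, dest)
  FileUtils.mkdir_p(dest)
  FileUtils.rm_f(Dir.glob(File.join(dest, "artikelstamm_*.{xml,csv}")))
  outputs.each { |f| FileUtils.cp(f, File.join(dest, stable_name(f)), preserve: true) }
end
```

Because only matching files are deleted, a sibling directory such as `rust2xml/` survives a republish, which is exactly what the old `rm -rf "$dest"` broke every night.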
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: oddb2xml
  version: !ruby/object:Gem::Version
- version: 3.0.28
+ version: 3.0.29
  platform: ruby
  authors:
  - Yasuhiro Asaka, Zeno R.R. Davatz, Niklaus Giger