oddb2xml 3.0.13 → 3.0.14

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: a7683d46d314f40d96aee89b278969119fd9fe0b7736512e5e291eb127f57ab1
-   data.tar.gz: a4246838014fcc77f8a2560db7bf223ac5da3469171eed8873aaf82b81c20877
+   metadata.gz: 48a674b29560c021e581fdbcd060a0bcf58f618ce853a93a54e8a7504bbb3f0a
+   data.tar.gz: ac3fc397a257b03a51f760299241c9865c325b22c3bb1b57ba76a0dcfa5fcb59
  SHA512:
-   metadata.gz: a9eca4b589da6ffba2995a8f781a05f84529c376f234d9810bbfa3ff48d3c7bfa6126a1829c7560af676d085a6ff12bcf5556468cacaa63f64f70d794998541d
-   data.tar.gz: 36c76b18c0282ce169cfd152e0ff00bec9a2a670e7eb53c15a3c7e5a6014370e0884b0e8d64263847bd8a4219e5041aa6f8d257df354c42f2916d66fd9a59470
+   metadata.gz: bd0da98d7f153c459f0f8548f56c1ac416752ee9e05aafe2cb2bf5965e1f88c7bab1272237210b7ead8079b6f8d564eeca187801348d26763439dc6133a8c490
+   data.tar.gz: 7b53c6f9ed2f363bbae51c2f8d6cbce456774534f08045def0248a13a22a8e17c2e723f093936e6d607c02d7e3da95e74eb00945bdae6fd393f0aac6fa1da8fc
data/CLAUDE.md CHANGED
@@ -49,7 +49,7 @@ The system follows a **download → extract → build → compress** pipeline:
 
  7. **FHIR support** (`lib/oddb2xml/fhir_support.rb`) — Self-contained module providing `FhirDownloader` and FHIR NDJSON parsing. Activated via `--fhir` (or `--fhir-url=<URL>`). Downloads per-language NDJSON files (`foph-sl-export-latest-{de,fr,it}.ndjson`) from `epl.bag.admin.ch` to populate French and Italian product names/descriptions. Maps legal status codes `756005022007` and `756005022008` to Swissmedic category D. Reads the BAG **Indikationscode** (`XXXXX.NN`) from the explicit `indicationCode` extension on each `RegulatedAuthorization.indication[].extension[regulatedAuthorization-limitation]` (BAG SL FHIR export >= v2.0.5; handled from 3.0.10). The BAG changelog states the limitation code (`ClinicalUseDefinition.id`) and the indication code are **independent** fields, so the older derivation — combining each indication CUD's `.NN` id-suffix with the reimbursement RA's `FOPHDossierNumber` — is kept only as a fallback for feeds lacking the extension. Exposed as `item[:indication_codes]` and per-package `:indication_codes` (each entry a `{code:, cud_id:, text:}` hash, where `cud_id` is the `limitationIndication` CUD reference used to resolve the text). From 3.0.7 onwards, `Builder#build_product` emits one `<INDICATION_CODE code="XXXXX.NN" cud_id="DRUG.NN">limitation text</INDICATION_CODE>` child per indication on every `<PRD>` in `oddb_product.xml`; live feed numbers: 539 products / 1,293 codes / 100 % with non-empty indication text. Mandatory on prescriptions/invoices for SL price-model drugs from 2026-07-01 — see issue [#113](https://github.com/zdavatz/oddb2xml/issues/113). **Limitation texts** (3.0.8 onwards): the `regulatedAuthorization-limitation` extension has no inline `limitationText` in the live BAG feed — it carries a `limitationIndication` reference to a `ClinicalUseDefinition` whose `indication.diseaseSymptomProcedure.concept.text` is the actual text. 
The parser stores the ref as `cud_ref` on each Limitation, `Bundle#cud_text_by_id` resolves DE, and `merge_language` propagates FR/IT from the per-language NDJSON files via the same CUD id. Coverage on the live feed jumped from 0 / 9'108 to 9'108 / 9'108 (issue [#116](https://github.com/zdavatz/oddb2xml/issues/116)). **Limitation code / LIMNAMEBAG** (3.0.12 onwards): FHIR has no native BAG limitation code (LIMCD), so `create_limitations_for_package` sets `LimitationCode = cud_ref` (the `limitationIndication` CUD id) instead of `""`. Without this, every FHIR limitation shared an empty `:code`; `Builder#build_artikelstamm` groups its `<LIMITATIONS>` section by code, so all of them collapsed into a single `<LIMITATION>` with an empty `<LIMNAMEBAG>` and only one text survived. Using the CUD id as the key makes each distinct limitation emit and be referenced from its `<PRODUCT>`. The downstream `bin/check_artikelstamm` (`semantic_check.rb`) also crashed on the lone-element output because Ox `:hash_no_attrs` collapses a one-child section into a Hash (and an empty one into nil) — `SemanticCheckXML#get_items` now normalises every section to an Array.
 
- 8. **Refdata cleanup** (`lib/oddb2xml/refdata_cleanup.rb`) — Compensates for known data-quality issues in upstream Refdata.Articles.xml before they reach the output. Each fix is guarded by a Swissmedic-side heuristic (e.g. comma in `substance_swissmedic` to distinguish mono products from real combinations). Currently fixes the doubled-dose template bug (`X mg / X mg / Stk`). Called from `Builder#apply_refdata_description_cleanups!` at the start of `prepare_articles`. See GitHub issue #112 for the catalogue.
+ 8. **Refdata cleanup** (`lib/oddb2xml/refdata_cleanup.rb`) — Compensates for known data-quality issues in upstream Refdata.Articles.xml before they reach the output. Each fix is guarded by a Swissmedic-side heuristic (e.g. comma in `substance_swissmedic` to distinguish mono products from real combinations). Currently fixes (a) the doubled-dose template bug (`X mg / X mg / Stk`, `fix_double_dose`, guarded by `single_substance?`) and (b) the spelled-out German galenic form `Retardtabletten` → house-style abbreviation `Ret Tabl` (`normalize_galenic_form` / `GALENIC_NORMALISATIONS`, issue #112 case #13, e.g. RINVOQ — a narrow word-boundary substitution that leaves legitimate brand suffixes like `TRAMAL retard` and Mepha's `Lactab` untouched). Called from `Builder#apply_refdata_description_cleanups!` at the start of `prepare_articles`. See GitHub issue #112 for the catalogue.
 
  9. **Chapter-70 hack** (`lib/oddb2xml/chapter_70_hack.rb`) — Legacy scraper for the SL "Komplementärarzneimittel" products (homeopathic/anthroposophic/phytotherapeutic), called only from `Builder#build_artikelstamm`. **Deprecated / non-FHIR only (3.0.11 onwards):** the source page `varia_De.htm` was rebuilt as a JavaScript SPA with no static data table, so the scraper now returns nothing there. These products + limitations now come through the FHIR feed (SL classification `20. KOMPLEMENTÄRARZNEIMITTEL`, 221 products on the live DE feed with real GTINs and limitation texts), so `build_artikelstamm` **skips the scraper entirely when `@options[:fhir]`** (the default for `--artikelstamm` since 3.0.9). In `--no-fhir` mode the scraper degrades gracefully (skips non-row/`<script>` nodes and empty tables, warns, returns `[]`) instead of raising `NoMethodError`. See GitHub issue #118.
 
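Item 7 above notes that a parsed section can arrive as nil (empty), as a single Hash (one child), or as an Array (several children), and that `SemanticCheckXML#get_items` now normalises every section to an Array. The gem's actual implementation is not part of this diff. The following is only a sketch of one way such a normalisation could be written; the helper name `section_to_array` and the sample data are hypothetical.

```ruby
# Hypothetical helper (the name is an assumption, not taken from oddb2xml):
# coerce a parsed XML section into an Array, however many children it had.
def section_to_array(section)
  case section
  when nil then []          # empty section
  when Array then section   # two or more children, already a list
  else [section]            # exactly one child, e.g. a lone Hash
  end
end

# Sample inputs are made up for illustration.
EMPTY = section_to_array(nil)
SINGLE = section_to_array({"LIMNAMEBAG" => "ABC.01"})
MANY = section_to_array([{"LIMNAMEBAG" => "A"}, {"LIMNAMEBAG" => "B"}])

puts EMPTY.length, SINGLE.length, MANY.length
```

The explicit `case` is used here because `Kernel#Array` applied to a Hash yields an array of key/value pairs rather than a one-element array holding the Hash.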
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
    remote: .
    specs:
-     oddb2xml (3.0.13)
+     oddb2xml (3.0.14)
        csv
        htmlentities
        httpi
data/History.txt CHANGED
@@ -1,3 +1,6 @@
+ === 3.0.14 / 02.06.2026
+ * Refdata cleanup: normalise the spelled-out German galenic form "Retardtabletten" to the Refdata house-style abbreviation "Ret Tabl" (issue #112 case #13, e.g. RINVOQ). Only 3 of the ~940 retard-tablet descriptions deviated from the convention; the fix is a narrow, German-only word-boundary substitution (RefdataCleanup.normalize_galenic_form, wired into Builder#apply_refdata_description_cleanups!). Legitimate brand suffixes ("TRAMAL retard", "XANAX retard", "MELATONIN Spirig HC retard") and Mepha's "Lactab" are left untouched.
+
  === 3.0.13 / 02.06.2026
  * Security: bump the nokogiri floor to >= 1.19.3. Fixes two GitHub/Dependabot advisories against nokogiri < 1.19.3: a high-severity ReDoS (regular expression backtracking) in the CSS selector tokenizer, and a medium-severity memory leak in the XSLT transform.
 
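The 3.0.14 changelog entry above describes a narrow, case-sensitive word-boundary substitution. A minimal Ruby sketch of that idea, mirroring the single rule shown in this diff, might look like the following. The names `GALENIC_RULES`, `normalize_galenic`, `SAMPLES` and `RESULTS` are local to this sketch and are assumptions, not identifiers from the gem.

```ruby
# Sketch only: one pattern => replacement rule, as described in the changelog.
# \b anchors the match to whole words, and the match is case-sensitive.
GALENIC_RULES = {
  /\bRetardtabletten\b/ => "Ret Tabl"
}.freeze

# Applies every rule in turn; nil and empty strings are returned unchanged.
def normalize_galenic(desc)
  return desc if desc.nil? || desc.empty?
  GALENIC_RULES.reduce(desc) { |result, (pattern, replacement)| result.gsub(pattern, replacement) }
end

SAMPLES = [
  "RINVOQ Retardtabletten 30 mg 28 Stk",   # spelled-out form, should be abbreviated
  "TRAMAL retard Ret Tabl 100 mg 30 Stk",  # lowercase brand suffix, should stay as-is
  "FOO Retardtablettenspender 1 Stk"       # longer word, no boundary after "tabletten"
].freeze

RESULTS = SAMPLES.map { |desc| normalize_galenic(desc) }
RESULTS.each { |desc| puts desc }
```

Keeping the rules in a frozen Hash means further spelled-out forms could be added as extra entries without touching the method body.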
data/README.md CHANGED
@@ -51,7 +51,7 @@ HIN (http://hin.ch) creates daily the actual file. They can be downloaded from `
  see `--help`.
 
  ```
- /opt/src/oddb2xml/bin/oddb2xml version 3.0.13
+ /opt/src/oddb2xml/bin/oddb2xml version 3.0.14
  Usage:
  oddb2xml [option]
  produced files are found under data
@@ -96,6 +96,7 @@ module Oddb2xml
        @refdata_descriptions_cleaned = true
        return if @refdata.nil? || @refdata.empty?
        double_dose_fixed = 0
+       galenic_fixed = 0
        @refdata.each_value do |item|
          next unless item.is_a?(Hash)
          no8 = item[:no8]
@@ -110,11 +111,20 @@ module Oddb2xml
              item[key] = cleaned
              double_dose_fixed += 1
            end
+           original = item[key]
+           cleaned = RefdataCleanup.normalize_galenic_form(original)
+           if cleaned != original
+             item[key] = cleaned
+             galenic_fixed += 1
+           end
          end
        end
        if double_dose_fixed > 0
          Oddb2xml.log("Refdata cleanup: fixed double-dose pattern in #{double_dose_fixed} description(s)")
        end
+       if galenic_fixed > 0
+         Oddb2xml.log("Refdata cleanup: normalised galenic form in #{galenic_fixed} description(s)")
+       end
      end
 
      private_class_method
@@ -30,5 +30,23 @@ module Oddb2xml
        return desc unless single_substance?(swissmedic_substance)
        desc.sub(DOUBLE_DOSE_RE, '\1 / ')
      end
+
+     # Case #13 (issue #112): a handful of products spell the galenic form out
+     # in full ("RINVOQ Retardtabletten 30 mg 28 Stk") while the Refdata house
+     # style abbreviates it everywhere else ("Ret Tabl", 940 other DE names).
+     # Normalise the spelled-out form to the abbreviation so the outliers match
+     # the convention. The keys are German-only words (FR/IT use "comprimé …" /
+     # "compresse …"), so applying this to FR/IT descriptions is a safe no-op.
+     GALENIC_NORMALISATIONS = {
+       /\bRetardtabletten\b/ => "Ret Tabl"
+     }.freeze
+
+     # Normalises spelled-out German galenic forms to the Refdata house-style
+     # abbreviation. Returns the cleaned description, or the original string if
+     # no rule applies.
+     def self.normalize_galenic_form(desc)
+       return desc if desc.nil? || desc.empty?
+       GALENIC_NORMALISATIONS.reduce(desc) { |result, (re, repl)| result.gsub(re, repl) }
+     end
    end
  end
@@ -1,3 +1,3 @@
  module Oddb2xml
-   VERSION = "3.0.13"
+   VERSION = "3.0.14"
  end
@@ -62,6 +62,36 @@ describe Oddb2xml::RefdataCleanup do
        expect(described_class.fix_double_dose(input, combo)).to eq input
      end
    end
+
+   describe ".normalize_galenic_form" do
+     it "abbreviates the spelled-out 'Retardtabletten' to 'Ret Tabl' (issue #112 #13)" do
+       input = "RINVOQ Retardtabletten 30 mg 28 Stk"
+       expected = "RINVOQ Ret Tabl 30 mg 28 Stk"
+       expect(described_class.normalize_galenic_form(input)).to eq expected
+     end
+
+     it "leaves the already-abbreviated house style untouched" do
+       input = "TRAMAL retard Ret Tabl 100 mg 30 Stk"
+       expect(described_class.normalize_galenic_form(input)).to eq input
+     end
+
+     it "is a no-op for FR/IT names (different galenic words)" do
+       fr = "RINVOQ comprimé à libération prolong. 30 mg 28 pce"
+       it_ = "RINVOQ compresse a rilascio prolungato 30 mg 28 pz"
+       expect(described_class.normalize_galenic_form(fr)).to eq fr
+       expect(described_class.normalize_galenic_form(it_)).to eq it_
+     end
+
+     it "is a no-op for nil or empty descriptions" do
+       expect(described_class.normalize_galenic_form(nil)).to be_nil
+       expect(described_class.normalize_galenic_form("")).to eq ""
+     end
+
+     it "does not touch 'Retardtabletten' embedded in a longer word" do
+       input = "FOO Retardtablettenspender 1 Stk"
+       expect(described_class.normalize_galenic_form(input)).to eq input
+     end
+   end
  end
 
  describe Oddb2xml::Builder do
@@ -147,5 +177,27 @@ describe Oddb2xml::Builder do
 
        expect(builder.refdata["7680694750066"][:desc_de]).to eq input
      end
+
+     it "normalises the galenic form on the German name only (RINVOQ, issue #112 #13)" do
+       builder.packs = {
+         "67257003" => {substance_swissmedic: "upadacitinibum"}
+       }
+       builder.refdata = {
+         "7680672570037" => {
+           ean13: "7680672570037",
+           no8: "67257003",
+           desc_de: "RINVOQ Retardtabletten 30 mg 28 Stk",
+           desc_fr: "RINVOQ comprimé à libération prolong. 30 mg 28 pce",
+           desc_it: "RINVOQ compresse a rilascio prolungato 30 mg 28 pz"
+         }
+       }
+
+       builder.apply_refdata_description_cleanups!
+
+       item = builder.refdata["7680672570037"]
+       expect(item[:desc_de]).to eq "RINVOQ Ret Tabl 30 mg 28 Stk"
+       expect(item[:desc_fr]).to eq "RINVOQ comprimé à libération prolong. 30 mg 28 pce"
+       expect(item[:desc_it]).to eq "RINVOQ compresse a rilascio prolungato 30 mg 28 pz"
+     end
    end
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: oddb2xml
  version: !ruby/object:Gem::Version
-   version: 3.0.13
+   version: 3.0.14
  platform: ruby
  authors:
  - Yasuhiro Asaka, Zeno R.R. Davatz, Niklaus Giger