oddb2xml 3.0.13 → 3.0.14
- checksums.yaml +4 -4
- data/CLAUDE.md +1 -1
- data/Gemfile.lock +1 -1
- data/History.txt +3 -0
- data/README.md +1 -1
- data/lib/oddb2xml/builder.rb +10 -0
- data/lib/oddb2xml/refdata_cleanup.rb +18 -0
- data/lib/oddb2xml/version.rb +1 -1
- data/spec/refdata_cleanup_spec.rb +52 -0
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 48a674b29560c021e581fdbcd060a0bcf58f618ce853a93a54e8a7504bbb3f0a
+  data.tar.gz: ac3fc397a257b03a51f760299241c9865c325b22c3bb1b57ba76a0dcfa5fcb59
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: bd0da98d7f153c459f0f8548f56c1ac416752ee9e05aafe2cb2bf5965e1f88c7bab1272237210b7ead8079b6f8d564eeca187801348d26763439dc6133a8c490
+  data.tar.gz: 7b53c6f9ed2f363bbae51c2f8d6cbce456774534f08045def0248a13a22a8e17c2e723f093936e6d607c02d7e3da95e74eb00945bdae6fd393f0aac6fa1da8fc
data/CLAUDE.md
CHANGED
@@ -49,7 +49,7 @@ The system follows a **download → extract → build → compress** pipeline:
 
 7. **FHIR support** (`lib/oddb2xml/fhir_support.rb`) — Self-contained module providing `FhirDownloader` and FHIR NDJSON parsing. Activated via `--fhir` (or `--fhir-url=<URL>`). Downloads per-language NDJSON files (`foph-sl-export-latest-{de,fr,it}.ndjson`) from `epl.bag.admin.ch` to populate French and Italian product names/descriptions. Maps legal status codes `756005022007` and `756005022008` to Swissmedic category D. Reads the BAG **Indikationscode** (`XXXXX.NN`) from the explicit `indicationCode` extension on each `RegulatedAuthorization.indication[].extension[regulatedAuthorization-limitation]` (BAG SL FHIR export >= v2.0.5; handled from 3.0.10). The BAG changelog states the limitation code (`ClinicalUseDefinition.id`) and the indication code are **independent** fields, so the older derivation — combining each indication CUD's `.NN` id-suffix with the reimbursement RA's `FOPHDossierNumber` — is kept only as a fallback for feeds lacking the extension. Exposed as `item[:indication_codes]` and per-package `:indication_codes` (each entry a `{code:, cud_id:, text:}` hash, where `cud_id` is the `limitationIndication` CUD reference used to resolve the text). From 3.0.7 onwards, `Builder#build_product` emits one `<INDICATION_CODE code="XXXXX.NN" cud_id="DRUG.NN">limitation text</INDICATION_CODE>` child per indication on every `<PRD>` in `oddb_product.xml`; live feed numbers: 539 products / 1,293 codes / 100 % with non-empty indication text. Mandatory on prescriptions/invoices for SL price-model drugs from 2026-07-01 — see issue [#113](https://github.com/zdavatz/oddb2xml/issues/113). **Limitation texts** (3.0.8 onwards): the `regulatedAuthorization-limitation` extension has no inline `limitationText` in the live BAG feed — it carries a `limitationIndication` reference to a `ClinicalUseDefinition` whose `indication.diseaseSymptomProcedure.concept.text` is the actual text. The parser stores the ref as `cud_ref` on each Limitation, `Bundle#cud_text_by_id` resolves DE, and `merge_language` propagates FR/IT from the per-language NDJSON files via the same CUD id. Coverage on the live feed jumped from 0 / 9'108 to 9'108 / 9'108 (issue [#116](https://github.com/zdavatz/oddb2xml/issues/116)). **Limitation code / LIMNAMEBAG** (3.0.12 onwards): FHIR has no native BAG limitation code (LIMCD), so `create_limitations_for_package` sets `LimitationCode = cud_ref` (the `limitationIndication` CUD id) instead of `""`. Without this, every FHIR limitation shared an empty `:code`; `Builder#build_artikelstamm` groups its `<LIMITATIONS>` section by code, so all of them collapsed into a single `<LIMITATION>` with an empty `<LIMNAMEBAG>` and only one text survived. Using the CUD id as the key makes each distinct limitation emit and be referenced from its `<PRODUCT>`. The downstream `bin/check_artikelstamm` (`semantic_check.rb`) also crashed on the lone-element output because Ox `:hash_no_attrs` collapses a one-child section into a Hash (and an empty one into nil) — `SemanticCheckXML#get_items` now normalises every section to an Array.
 
-8. **Refdata cleanup** (`lib/oddb2xml/refdata_cleanup.rb`) — Compensates for known data-quality issues in upstream Refdata.Articles.xml before they reach the output. Each fix is guarded by a Swissmedic-side heuristic (e.g. comma in `substance_swissmedic` to distinguish mono products from real combinations). Currently fixes the doubled-dose template bug (`X mg / X mg / Stk`). Called from `Builder#apply_refdata_description_cleanups!` at the start of `prepare_articles`. See GitHub issue #112 for the catalogue.
+8. **Refdata cleanup** (`lib/oddb2xml/refdata_cleanup.rb`) — Compensates for known data-quality issues in upstream Refdata.Articles.xml before they reach the output. Each fix is guarded by a Swissmedic-side heuristic (e.g. comma in `substance_swissmedic` to distinguish mono products from real combinations). Currently fixes (a) the doubled-dose template bug (`X mg / X mg / Stk`, `fix_double_dose`, guarded by `single_substance?`) and (b) the spelled-out German galenic form `Retardtabletten` → house-style abbreviation `Ret Tabl` (`normalize_galenic_form` / `GALENIC_NORMALISATIONS`, issue #112 case #13, e.g. RINVOQ — a narrow word-boundary substitution that leaves legitimate brand suffixes like `TRAMAL retard` and Mepha's `Lactab` untouched). Called from `Builder#apply_refdata_description_cleanups!` at the start of `prepare_articles`. See GitHub issue #112 for the catalogue.
 
 9. **Chapter-70 hack** (`lib/oddb2xml/chapter_70_hack.rb`) — Legacy scraper for the SL "Komplementärarzneimittel" products (homeopathic/anthroposophic/phytotherapeutic), called only from `Builder#build_artikelstamm`. **Deprecated / non-FHIR only (3.0.11 onwards):** the source page `varia_De.htm` was rebuilt as a JavaScript SPA with no static data table, so the scraper now returns nothing there. These products + limitations now come through the FHIR feed (SL classification `20. KOMPLEMENTÄRARZNEIMITTEL`, 221 products on the live DE feed with real GTINs and limitation texts), so `build_artikelstamm` **skips the scraper entirely when `@options[:fhir]`** (the default for `--artikelstamm` since 3.0.9). In `--no-fhir` mode the scraper degrades gracefully (skips non-row/`<script>` nodes and empty tables, warns, returns `[]`) instead of raising `NoMethodError`. See GitHub issue #118.
 
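The Array normalisation described at the end of item 7 can be sketched in a few lines of Ruby. This is a minimal illustration under stated assumptions, not the gem's code: `section_to_array` is a hypothetical helper name, and the real logic lives in `SemanticCheckXML#get_items`, which this diff does not show.

```ruby
# Hypothetical helper (name invented for illustration): coerce a parsed
# XML section into an Array regardless of how many children it had.
def section_to_array(section)
  case section
  when nil then []          # empty section was parsed as nil
  when Array then section   # two or more children: already an Array
  else [section]            # exactly one child: a lone Hash, wrap it
  end
end

p section_to_array(nil)
p section_to_array({"LIMNAMEBAG" => "X.01"})
p section_to_array([{"LIMNAMEBAG" => "X.01"}, {"LIMNAMEBAG" => "X.02"}])
```

`Kernel#Array` is not a drop-in replacement here, because `Array(hash)` converts a Hash into an array of key/value pairs instead of wrapping it.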
data/Gemfile.lock
CHANGED
data/History.txt
CHANGED
@@ -1,3 +1,6 @@
+=== 3.0.14 / 02.06.2026
+* Refdata cleanup: normalise the spelled-out German galenic form "Retardtabletten" to the Refdata house-style abbreviation "Ret Tabl" (issue #112 case #13, e.g. RINVOQ). Only 3 of the ~940 retard-tablet descriptions deviated from the convention; the fix is a narrow, German-only word-boundary substitution (RefdataCleanup.normalize_galenic_form, wired into Builder#apply_refdata_description_cleanups!). Legitimate brand suffixes ("TRAMAL retard", "XANAX retard", "MELATONIN Spirig HC retard") and Mepha's "Lactab" are left untouched.
+
 === 3.0.13 / 02.06.2026
 * Security: bump the nokogiri floor to >= 1.19.3. Fixes two GitHub/Dependabot advisories against nokogiri < 1.19.3: a high-severity ReDoS (regular expression backtracking) in the CSS selector tokenizer, and a medium-severity memory leak in the XSLT transform.
 
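The word-boundary behaviour described in the 3.0.14 entry can be tried out with a small standalone Ruby sketch. It copies the single rule from this release into a hypothetical top-level constant (`RULES`) and method (`normalise_galenic`), which are illustrative names rather than the gem's API. The sample strings are taken from the specs in this diff.

```ruby
# Standalone sketch; RULES and normalise_galenic are illustrative names,
# not part of Oddb2xml. The rule itself is the one added in 3.0.14.
RULES = {
  /\bRetardtabletten\b/ => "Ret Tabl"
}.freeze

def normalise_galenic(desc)
  return desc if desc.nil? || desc.empty?
  # Thread the description through every rule in turn.
  RULES.reduce(desc) { |result, (re, repl)| result.gsub(re, repl) }
end

# Spelled-out form is abbreviated.
puts normalise_galenic("RINVOQ Retardtabletten 30 mg 28 Stk")
# Brand suffix "retard" and the existing abbreviation are left alone.
puts normalise_galenic("TRAMAL retard Ret Tabl 100 mg 30 Stk")
# No trailing word boundary inside a longer compound, so nothing changes.
puts normalise_galenic("FOO Retardtablettenspender 1 Stk")
```

Because the pattern is case-sensitive and anchored with `\b` on both sides, only the exact standalone word is rewritten.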
data/README.md
CHANGED
@@ -51,7 +51,7 @@ HIN (http://hin.ch) creates daily the actual file. They can be downloaded from `
 see `--help`.
 
 ```
-/opt/src/oddb2xml/bin/oddb2xml version 3.0.
+/opt/src/oddb2xml/bin/oddb2xml version 3.0.14
 Usage:
 oddb2xml [option]
 produced files are found under data
data/lib/oddb2xml/builder.rb
CHANGED
@@ -96,6 +96,7 @@ module Oddb2xml
       @refdata_descriptions_cleaned = true
       return if @refdata.nil? || @refdata.empty?
       double_dose_fixed = 0
+      galenic_fixed = 0
       @refdata.each_value do |item|
         next unless item.is_a?(Hash)
         no8 = item[:no8]
@@ -110,11 +111,20 @@ module Oddb2xml
             item[key] = cleaned
             double_dose_fixed += 1
           end
+          original = item[key]
+          cleaned = RefdataCleanup.normalize_galenic_form(original)
+          if cleaned != original
+            item[key] = cleaned
+            galenic_fixed += 1
+          end
         end
       end
       if double_dose_fixed > 0
         Oddb2xml.log("Refdata cleanup: fixed double-dose pattern in #{double_dose_fixed} description(s)")
       end
+      if galenic_fixed > 0
+        Oddb2xml.log("Refdata cleanup: normalised galenic form in #{galenic_fixed} description(s)")
+      end
     end
 
     private_class_method
data/lib/oddb2xml/refdata_cleanup.rb
CHANGED
@@ -30,5 +30,23 @@ module Oddb2xml
       return desc unless single_substance?(swissmedic_substance)
       desc.sub(DOUBLE_DOSE_RE, '\1 / ')
     end
+
+    # Case #13 (issue #112): a handful of products spell the galenic form out
+    # in full ("RINVOQ Retardtabletten 30 mg 28 Stk") while the Refdata house
+    # style abbreviates it everywhere else ("Ret Tabl", 940 other DE names).
+    # Normalise the spelled-out form to the abbreviation so the outliers match
+    # the convention. The keys are German-only words (FR/IT use "comprimé …" /
+    # "compresse …"), so applying this to FR/IT descriptions is a safe no-op.
+    GALENIC_NORMALISATIONS = {
+      /\bRetardtabletten\b/ => "Ret Tabl"
+    }.freeze
+
+    # Normalises spelled-out German galenic forms to the Refdata house-style
+    # abbreviation. Returns the cleaned description, or the original string if
+    # no rule applies.
+    def self.normalize_galenic_form(desc)
+      return desc if desc.nil? || desc.empty?
+      GALENIC_NORMALISATIONS.reduce(desc) { |result, (re, repl)| result.gsub(re, repl) }
+    end
   end
 end
data/lib/oddb2xml/version.rb
CHANGED
data/spec/refdata_cleanup_spec.rb
CHANGED
@@ -62,6 +62,36 @@ describe Oddb2xml::RefdataCleanup do
       expect(described_class.fix_double_dose(input, combo)).to eq input
     end
   end
+
+  describe ".normalize_galenic_form" do
+    it "abbreviates the spelled-out 'Retardtabletten' to 'Ret Tabl' (issue #112 #13)" do
+      input = "RINVOQ Retardtabletten 30 mg 28 Stk"
+      expected = "RINVOQ Ret Tabl 30 mg 28 Stk"
+      expect(described_class.normalize_galenic_form(input)).to eq expected
+    end
+
+    it "leaves the already-abbreviated house style untouched" do
+      input = "TRAMAL retard Ret Tabl 100 mg 30 Stk"
+      expect(described_class.normalize_galenic_form(input)).to eq input
+    end
+
+    it "is a no-op for FR/IT names (different galenic words)" do
+      fr = "RINVOQ comprimé à libération prolong. 30 mg 28 pce"
+      it_ = "RINVOQ compresse a rilascio prolungato 30 mg 28 pz"
+      expect(described_class.normalize_galenic_form(fr)).to eq fr
+      expect(described_class.normalize_galenic_form(it_)).to eq it_
+    end
+
+    it "is a no-op for nil or empty descriptions" do
+      expect(described_class.normalize_galenic_form(nil)).to be_nil
+      expect(described_class.normalize_galenic_form("")).to eq ""
+    end
+
+    it "does not touch 'Retardtabletten' embedded in a longer word" do
+      input = "FOO Retardtablettenspender 1 Stk"
+      expect(described_class.normalize_galenic_form(input)).to eq input
+    end
+  end
 end
 
 describe Oddb2xml::Builder do
@@ -147,5 +177,27 @@ describe Oddb2xml::Builder do
 
       expect(builder.refdata["7680694750066"][:desc_de]).to eq input
     end
+
+    it "normalises the galenic form on the German name only (RINVOQ, issue #112 #13)" do
+      builder.packs = {
+        "67257003" => {substance_swissmedic: "upadacitinibum"}
+      }
+      builder.refdata = {
+        "7680672570037" => {
+          ean13: "7680672570037",
+          no8: "67257003",
+          desc_de: "RINVOQ Retardtabletten 30 mg 28 Stk",
+          desc_fr: "RINVOQ comprimé à libération prolong. 30 mg 28 pce",
+          desc_it: "RINVOQ compresse a rilascio prolungato 30 mg 28 pz"
+        }
+      }
+
+      builder.apply_refdata_description_cleanups!
+
+      item = builder.refdata["7680672570037"]
+      expect(item[:desc_de]).to eq "RINVOQ Ret Tabl 30 mg 28 Stk"
+      expect(item[:desc_fr]).to eq "RINVOQ comprimé à libération prolong. 30 mg 28 pce"
+      expect(item[:desc_it]).to eq "RINVOQ compresse a rilascio prolungato 30 mg 28 pz"
+    end
   end
 end