oddb2xml 3.0.14 → 3.0.16
- checksums.yaml +4 -4
- data/CLAUDE.md +1 -1
- data/Gemfile.lock +1 -1
- data/History.txt +6 -0
- data/README.md +1 -1
- data/lib/oddb2xml/builder.rb +28 -17
- data/lib/oddb2xml/refdata_cleanup.rb +146 -0
- data/lib/oddb2xml/version.rb +1 -1
- data/spec/refdata_cleanup_spec.rb +184 -0
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 5cba5ea3e5414179cae06dc6b1fd3e30882ebf93d158cebbbf8c384789c6f437
+  data.tar.gz: 96e18c2cfbd07cacb6266659b4995e2fdddc04b1e1dafea9410df0423ae91ae8
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 32bc1b4476bd33f7f3c04aae1f63bb7ddca649d7ca7b73942d321bda32102629ec879f3e3ac414073f72f11687752a4d113e1688b52f0c6466b45f866fddcc7c
+  data.tar.gz: fbecd040c7a10958075a6754a6b7a492d88181a0530a73112db6cbf0e51f13060a849b53689575515c98c72e93a20bc1744afd1d87eb376aeeacf0282e1592fe
data/CLAUDE.md
CHANGED
|
@@ -49,7 +49,7 @@ The system follows a **download → extract → build → compress** pipeline:
7. **FHIR support** (`lib/oddb2xml/fhir_support.rb`) — Self-contained module providing `FhirDownloader` and FHIR NDJSON parsing. Activated via `--fhir` (or `--fhir-url=<URL>`). Downloads per-language NDJSON files (`foph-sl-export-latest-{de,fr,it}.ndjson`) from `epl.bag.admin.ch` to populate French and Italian product names/descriptions. Maps legal status codes `756005022007` and `756005022008` to Swissmedic category D. Reads the BAG **Indikationscode** (`XXXXX.NN`) from the explicit `indicationCode` extension on each `RegulatedAuthorization.indication[].extension[regulatedAuthorization-limitation]` (BAG SL FHIR export >= v2.0.5; handled from 3.0.10). The BAG changelog states the limitation code (`ClinicalUseDefinition.id`) and the indication code are **independent** fields, so the older derivation — combining each indication CUD's `.NN` id-suffix with the reimbursement RA's `FOPHDossierNumber` — is kept only as a fallback for feeds lacking the extension. Exposed as `item[:indication_codes]` and per-package `:indication_codes` (each entry a `{code:, cud_id:, text:}` hash, where `cud_id` is the `limitationIndication` CUD reference used to resolve the text). From 3.0.7 onwards, `Builder#build_product` emits one `<INDICATION_CODE code="XXXXX.NN" cud_id="DRUG.NN">limitation text</INDICATION_CODE>` child per indication on every `<PRD>` in `oddb_product.xml`; live feed numbers: 539 products / 1,293 codes / 100 % with non-empty indication text. Mandatory on prescriptions/invoices for SL price-model drugs from 2026-07-01 — see issue [#113](https://github.com/zdavatz/oddb2xml/issues/113). **Limitation texts** (3.0.8 onwards): the `regulatedAuthorization-limitation` extension has no inline `limitationText` in the live BAG feed — it carries a `limitationIndication` reference to a `ClinicalUseDefinition` whose `indication.diseaseSymptomProcedure.concept.text` is the actual text. 
The parser stores the ref as `cud_ref` on each Limitation, `Bundle#cud_text_by_id` resolves DE, and `merge_language` propagates FR/IT from the per-language NDJSON files via the same CUD id. Coverage on the live feed jumped from 0 / 9'108 to 9'108 / 9'108 (issue [#116](https://github.com/zdavatz/oddb2xml/issues/116)). **Limitation code / LIMNAMEBAG** (3.0.12 onwards): FHIR has no native BAG limitation code (LIMCD), so `create_limitations_for_package` sets `LimitationCode = cud_ref` (the `limitationIndication` CUD id) instead of `""`. Without this, every FHIR limitation shared an empty `:code`; `Builder#build_artikelstamm` groups its `<LIMITATIONS>` section by code, so all of them collapsed into a single `<LIMITATION>` with an empty `<LIMNAMEBAG>` and only one text survived. Using the CUD id as the key makes each distinct limitation emit and be referenced from its `<PRODUCT>`. The downstream `bin/check_artikelstamm` (`semantic_check.rb`) also crashed on the lone-element output because Ox `:hash_no_attrs` collapses a one-child section into a Hash (and an empty one into nil) — `SemanticCheckXML#get_items` now normalises every section to an Array.

-8. **Refdata cleanup** (`lib/oddb2xml/refdata_cleanup.rb`) — Compensates for known data-quality issues in upstream Refdata.Articles.xml before they reach the output. Each fix is guarded by a Swissmedic-side heuristic (e.g. comma in `substance_swissmedic` to distinguish mono products from real combinations). Currently fixes (a) the doubled-dose template bug (`X mg / X mg / Stk`, `fix_double_dose`, guarded by `single_substance?`)
+8. **Refdata cleanup** (`lib/oddb2xml/refdata_cleanup.rb`) — Compensates for known data-quality issues in upstream Refdata.Articles.xml before they reach the output. Each fix is guarded by a Swissmedic-side heuristic (e.g. comma in `substance_swissmedic` to distinguish mono products from real combinations). Currently fixes (a) the doubled-dose template bug (`X mg / X mg / Stk`, `fix_double_dose`, guarded by `single_substance?`); (b) the spelled-out German galenic form `Retardtabletten` → house-style abbreviation `Ret Tabl` (`normalize_galenic_form` / `GALENIC_NORMALISATIONS`, issue #112 case #13, e.g. RINVOQ — a narrow word-boundary substitution that leaves legitimate brand suffixes like `TRAMAL retard` and Mepha's `Lactab` untouched); and (c) dose info Refdata dropped from `<FullName>`, sourced from the Swissmedic composition string `pack[:composition_swissmedic]` — `fix_missing_combo_dose` (#6, appends a combination's 2nd component strength), `fix_missing_dose` (#4, inserts a mono product's missing strength before the pack count), `fix_missing_volume` (#7, appends an injectable's per-pen volume); and (d) 50-char-truncation repairs — `fix_truncated_metoject` (#1, rebuilds METOJECT Autoinjektor names from the intact `<brand> Autoinjektor <dose>/<vol>` prefix + Swissmedic `size`, localised DE/FR/IT) and `fix_truncated_volume_unit` (#3, restores the cut `ml` of the VERACTIV Vitamin D3 drops). The (c) and (d) fixes are scoped to explicit IKSNR allow-lists (`COMBO_DOSE_IKSNR`/`MISSING_DOSE_IKSNR`/`MISSING_VOLUME_IKSNR`/`METOJECT_IKSNR`/`VERACTIV_VITD3_IKSNR`): a dry run proved a blanket heuristic mis-fires on hundreds of legitimate names (sodium counter-ion doses, strength-less phyto/powder products, concentration names like `CIMZIA 200 mg/ml`), so only catalogued registrations are touched — add an IKSNR to grow coverage. Called from `Builder#apply_refdata_description_cleanups!` at the start of `prepare_articles`. See GitHub issue #112 for the catalogue.

9. **Chapter-70 hack** (`lib/oddb2xml/chapter_70_hack.rb`) — Legacy scraper for the SL "Komplementärarzneimittel" products (homeopathic/anthroposophic/phytotherapeutic), called only from `Builder#build_artikelstamm`. **Deprecated / non-FHIR only (3.0.11 onwards):** the source page `varia_De.htm` was rebuilt as a JavaScript SPA with no static data table, so the scraper now returns nothing there. These products + limitations now come through the FHIR feed (SL classification `20. KOMPLEMENTÄRARZNEIMITTEL`, 221 products on the live DE feed with real GTINs and limitation texts), so `build_artikelstamm` **skips the scraper entirely when `@options[:fhir]`** (the default for `--artikelstamm` since 3.0.9). In `--no-fhir` mode the scraper degrades gracefully (skips non-row/`<script>` nodes and empty tables, warns, returns `[]`) instead of raising `NoMethodError`. See GitHub issue #118.
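
The allow-list scoping described in item 8 can be illustrated with a minimal Ruby sketch. The constant `EXAMPLE_IKSNR` and the method `guarded_fix` are hypothetical names used only for illustration; what comes from the diff is the idea that the IKSNR is the first five digits of the no8 and that a fix runs only for catalogued registrations.

```ruby
# Hypothetical allow-list for illustration; 62568 is the CETIRIZIN
# registration named in the changelog.
EXAMPLE_IKSNR = %w[62568].freeze

# The IKSNR is the first five digits of the eight-digit Swissmedic number.
def iksnr_of(no8)
  no8.to_s[0, 5]
end

# Hypothetical wrapper: yields the description to the given block only for
# catalogued registrations, otherwise returns it unchanged.
def guarded_fix(desc, no8)
  return desc unless EXAMPLE_IKSNR.include?(iksnr_of(no8))
  yield desc
end
```

With this shape, growing coverage means adding one registration number to the list rather than loosening a pattern.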
data/Gemfile.lock
CHANGED
data/History.txt
CHANGED
@@ -1,3 +1,9 @@
+=== 3.0.16 / 02.06.2026
+* Refdata cleanup: two more Swissmedic-sourced, IKSNR-scoped fixes for 50-char truncation (issue #112 cases #1 and #3). fix_truncated_metoject rebuilds every truncated METOJECT Autoinjektor name ("METOJECT Autoinjektor 10 mg/0.2 ml Inj Lös 10 mg 1") from its intact "<brand> Autoinjektor <dose>/<vol>" prefix plus the authoritative Swissmedic pack size -> "METOJECT Autoinjektor 10 mg/0.2 ml Fertpen 1 Stk", localised for FR ("stylo pré … pce") and IT ("penna preriempita … pz"); it recovers the correct count even where the truncation cut the count digits. fix_truncated_volume_unit restores the lost final "l" of the VERACTIV Vitamin D3 drops volume ("… 20'000 U.I. 10m" -> "10ml"). Both are scoped to their registration (METOJECT 65672, VERACTIV Vitamin D3 57690) and idempotent. The French wording in the VERACTIV German name remains an upstream issue.
+
+=== 3.0.15 / 02.06.2026
+* Refdata cleanup: reconstruct dose information that Refdata dropped from <FullName>, sourcing the authoritative values from the Swissmedic "Zugelassene Packungen" composition string (issue #112 cases #4/#6/#7). Three new guarded fixes in RefdataCleanup: fix_missing_combo_dose appends a combination's 2nd component strength ("ATOVAQUON PLUS … 250 mg" -> "250 mg / 100 mg"), fix_missing_dose inserts a mono product's missing strength before the pack count ("CETIRIZIN Spirig HC Filmtabl 30 Stk" -> "… 10 mg 30 Stk"), and fix_missing_volume appends an injectable's per-pen volume ("MOUNJARO KwikPen … 7.5 mg" -> "7.5 mg/0.6 ml"). Each fix is scoped to an explicit IKSNR allow-list: a dry run showed a blanket heuristic mis-fires on hundreds of legitimate names (sodium counter-ion doses, strength-less phyto/powder products, concentration names), so only the catalogued registrations are touched. Add an IKSNR to grow coverage.
+
 === 3.0.14 / 02.06.2026
 * Refdata cleanup: normalise the spelled-out German galenic form "Retardtabletten" to the Refdata house-style abbreviation "Ret Tabl" (issue #112 case #13, e.g. RINVOQ). Only 3 of the ~940 retard-tablet descriptions deviated from the convention; the fix is a narrow, German-only word-boundary substitution (RefdataCleanup.normalize_galenic_form, wired into Builder#apply_refdata_description_cleanups!). Legitimate brand suffixes ("TRAMAL retard", "XANAX retard", "MELATONIN Spirig HC retard") and Mepha's "Lactab" are left untouched.

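
The changelog calls the truncated-unit repair idempotent. A minimal sketch of why that holds, using the same end-anchored substitution but without the registration guard; `restore_ml` is a hypothetical name for this illustration only.

```ruby
# Restores a trailing "ml" when the final "l" was cut off.
# The pattern is anchored at the end of the string, so a name that already
# ends in "ml" no longer matches and is returned unchanged.
def restore_ml(desc)
  desc.sub(/(\d)\s*m\z/, '\1ml')
end

once = restore_ml("VERACTIV Vitamin D3 Wild Huile Trp 20'000 U.I. 10m")
twice = restore_ml(once)
```

Because the second call finds nothing to replace, running the cleanup repeatedly over the same data should be safe.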
data/README.md
CHANGED
@@ -51,7 +51,7 @@ HIN (http://hin.ch) creates daily the actual file. They can be downloaded from `
 see `--help`.

 ```
-/opt/src/oddb2xml/bin/oddb2xml version 3.0.
+/opt/src/oddb2xml/bin/oddb2xml version 3.0.16
 Usage:
 oddb2xml [option]
 produced files are found under data
data/lib/oddb2xml/builder.rb
CHANGED
@@ -95,8 +95,7 @@ module Oddb2xml
       return if @refdata_descriptions_cleaned
       @refdata_descriptions_cleaned = true
       return if @refdata.nil? || @refdata.empty?
-
-      galenic_fixed = 0
+      counts = Hash.new(0)
       @refdata.each_value do |item|
         next unless item.is_a?(Hash)
         no8 = item[:no8]
@@ -104,26 +103,38 @@ module Oddb2xml
         pack = @packs[no8]
         next unless pack
         substance = pack[:substance_swissmedic]
+        composition = pack[:composition_swissmedic]
+        size = pack[:size]
         [:desc_de, :desc_fr, :desc_it].each do |key|
           original = item[key]
-
-
-
-
-
-
-
-
-
-
+          next if original.nil? || original.empty?
+          desc = original
+          # Applied in order so later fixes see the output of earlier ones.
+          pipeline = [
+            [:metoject, ->(d) { RefdataCleanup.fix_truncated_metoject(d, no8, size) }],
+            [:veractiv_vol, ->(d) { RefdataCleanup.fix_truncated_volume_unit(d, no8) }],
+            [:double_dose, ->(d) { RefdataCleanup.fix_double_dose(d, substance) }],
+            [:galenic, ->(d) { RefdataCleanup.normalize_galenic_form(d) }],
+            [:combo_dose, ->(d) { RefdataCleanup.fix_missing_combo_dose(d, substance, composition, no8) }],
+            [:missing_dose, ->(d) { RefdataCleanup.fix_missing_dose(d, substance, composition, no8) }],
+            [:volume, ->(d) { RefdataCleanup.fix_missing_volume(d, composition, no8) }]
+          ]
+          pipeline.each do |rule, fix|
+            cleaned = fix.call(desc)
+            if cleaned != desc
+              desc = cleaned
+              counts[rule] += 1
+            end
           end
+          item[key] = desc if desc != original
         end
       end
-
-
-
-
-
+      labels = {metoject: "truncated METOJECT name", veractiv_vol: "truncated volume unit",
+        double_dose: "double-dose pattern", galenic: "galenic form",
+        combo_dose: "missing combo 2nd dose", missing_dose: "missing strength",
+        volume: "missing injection volume"}
+      counts.each do |rule, n|
+        Oddb2xml.log("Refdata cleanup: fixed #{labels[rule]} in #{n} description(s)") if n > 0
       end
     end

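
The builder change replaces a single counter with an ordered list of named fixes and a per-rule tally. A minimal sketch of that pattern follows; the two rules and the names `PIPELINE` and `run_pipeline` are hypothetical stand-ins, not the gem's fixes.

```ruby
# Hypothetical rules for illustration. Each entry pairs a rule name with a
# lambda that returns a (possibly unchanged) description.
PIPELINE = [
  [:squeeze, ->(d) { d.gsub(/\s+/, " ") }],
  [:unit, ->(d) { d.sub(/(\d)\s*m\z/, '\1ml') }]
].freeze

# Applies the rules in order, so each rule sees the previous rule's output,
# and counts a rule only when it actually changed the text.
def run_pipeline(desc, counts)
  PIPELINE.reduce(desc) do |current, (rule, fix)|
    cleaned = fix.call(current)
    counts[rule] += 1 if cleaned != current
    cleaned
  end
end

counts = Hash.new(0)
result = run_pipeline("FOO  Tropfen 10m", counts)
untouched = run_pipeline("BAR 5 mg", counts)
```

Keeping the rule name next to its lambda is what lets the log line report one count per fix instead of a single total.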
data/lib/oddb2xml/refdata_cleanup.rb
CHANGED
@@ -48,5 +48,151 @@ module Oddb2xml
       return desc if desc.nil? || desc.empty?
       GALENIC_NORMALISATIONS.reduce(desc) { |result, (re, repl)| result.gsub(re, repl) }
     end
+
+    # The following three fixes reconstruct dose information that Refdata
+    # dropped from <FullName>, sourcing the authoritative values from the
+    # Swissmedic "Zugelassene Packungen" composition string (already loaded as
+    # pack[:composition_swissmedic], keyed by the same SwissmedicNo8). See
+    # issue #112 cases #4 (missing strength), #6 (missing 2nd combo component)
+    # and #7 (missing injection volume).
+    #
+    # Each fix is scoped to an explicit allow-list of Swissmedic registration
+    # numbers (IKSNR, the first 5 digits of the no8). A blanket heuristic is
+    # NOT safe: a dry run over the full Refdata feed mis-fired on hundreds of
+    # legitimate names — combination detection grabbed sodium counter-ion doses
+    # ("KEPPRA … / 2.8 mg"), the missing-strength rule fired on strength-less
+    # phyto/powder products ("IMPORTAL Pulver"), and the volume rule corrupted
+    # concentration names ("CIMZIA 200 mg/ml"). Restricting to the catalogued
+    # registrations keeps the Swissmedic-derived value while touching only the
+    # known-bad products. Add an IKSNR here once a new case is confirmed.
+    COMBO_DOSE_IKSNR = %w[65280].freeze # #6 ATOVAQUON PLUS Spirig HC
+    MISSING_DOSE_IKSNR = %w[62568].freeze # #4 CETIRIZIN Spirig HC
+    MISSING_VOLUME_IKSNR = %w[69696].freeze # #7 MOUNJARO KwikPen
+
+    def self.iksnr_of(no8)
+      no8.to_s[0, 5]
+    end
+
+    # Builds a whitespace-tolerant matcher for a normalised dose value like
+    # "250 mg" so it also matches "250mg" in a description.
+    def self.dose_regex(dose)
+      m = dose.to_s.match(/\A([\d.,]+)\s*(.+?)\s*\z/)
+      return /#{Regexp.escape(dose.to_s)}/i unless m
+      /(?<![\d.,])#{Regexp.escape(m[1])}\s*#{Regexp.escape(m[2])}/i
+    end
+
+    # Returns the dose token that belongs to a named active substance in the
+    # Swissmedic composition, normalised to "<number> <unit>" (e.g.
+    # dose_for_substance(comp, "atovaquonum") => "250 mg"). Matches within the
+    # comma-delimited segment that names the substance so excipient doses are
+    # never picked up. Returns nil if absent.
+    def self.dose_for_substance(composition, substance)
+      return nil if composition.nil? || substance.nil?
+      key = substance.to_s.strip[/\A[A-Za-zÀ-ÿ]+/]
+      return nil if key.nil? || key.empty?
+      composition.split(",").each do |segment|
+        next unless /\b#{Regexp.escape(key)}/i.match?(segment)
+        m = segment.match(DOSE_TOKEN)
+        next unless m
+        parts = m[0].match(/\A([\d.,]+)\s*(.+?)\s*\z/)
+        return parts ? "#{parts[1]} #{parts[2]}" : m[0].strip
+      end
+      nil
+    end
+
+    # Case #6: a real combination product whose Refdata description carries
+    # only the first component's strength (e.g. "ATOVAQUON PLUS … 250 mg …").
+    # Appends the second active's strength from Swissmedic, producing
+    # "… 250 mg / 100 mg …". No-op for mono products, 3+ component combos, or
+    # when the second strength is already present.
+    def self.fix_missing_combo_dose(desc, swissmedic_substance, composition, no8)
+      return desc if desc.nil? || desc.empty?
+      return desc unless COMBO_DOSE_IKSNR.include?(iksnr_of(no8))
+      return desc if single_substance?(swissmedic_substance)
+      subs = swissmedic_substance.to_s.split(",").map(&:strip)
+      return desc unless subs.size == 2
+      d1 = dose_for_substance(composition, subs[0])
+      d2 = dose_for_substance(composition, subs[1])
+      return desc unless d1 && d2
+      return desc unless dose_regex(d1).match?(desc)
+      return desc if dose_regex(d2).match?(desc)
+      desc.sub(dose_regex(d1)) { |hit| "#{hit} / #{d2}" }
+    end
+
+    # Case #4: a mono product whose Refdata description carries NO strength at
+    # all (e.g. "CETIRIZIN Spirig HC Filmtabl 10 Stk"). Inserts the single
+    # active's strength from Swissmedic before the trailing "<count> <unit>"
+    # group → "CETIRIZIN Spirig HC Filmtabl 10 mg 10 Stk". No-op when a
+    # strength is already present or no trailing pack count exists.
+    def self.fix_missing_dose(desc, swissmedic_substance, composition, no8)
+      return desc if desc.nil? || desc.empty?
+      return desc unless MISSING_DOSE_IKSNR.include?(iksnr_of(no8))
+      return desc unless single_substance?(swissmedic_substance)
+      return desc if DOSE_TOKEN.match?(desc)
+      dose = dose_for_substance(composition, swissmedic_substance)
+      return desc unless dose
+      return desc unless /\s\d[\d.,']*\s+\S+\s*\z/.match?(desc)
+      desc.sub(/(\s)(\d[\d.,']*\s+\S+\s*)\z/, "\\1#{dose} \\2")
+    end
+
+    # Case #7: an injectable pen/solution whose Refdata description gives the
+    # strength but not the per-pen volume (e.g. "MOUNJARO KwikPen Inj Lös
+    # 7.5 mg 1 Stk"). Appends "/<vol> ml" taken from the Swissmedic
+    # composition ("… pro 0.6 ml …") → "… 7.5 mg/0.6 ml 1 Stk". Only fires for
+    # injectable forms that have no volume anywhere in the name yet.
+    def self.fix_missing_volume(desc, composition, no8)
+      return desc if desc.nil? || desc.empty?
+      return desc unless MISSING_VOLUME_IKSNR.include?(iksnr_of(no8))
+      return desc unless /\b(?:Inj|Fertpen|Injektor|stylo|sol\b)/i.match?(desc)
+      return desc if /\d\s*ml\b/i.match?(desc)
+      vol = composition.to_s[/\bpro\s+([\d.,]+)\s*ml\b/i, 1]
+      return desc unless vol
+      m = desc.match(/\d+(?:[.,]\d+)?\s*mg/i)
+      return desc unless m
+      desc.sub(m[0], "#{m[0]}/#{vol} ml")
+    end
+
+    METOJECT_IKSNR = %w[65672].freeze # #1 METOJECT Autoinjektor
+
+    # Localised "<pen> … <count> <unit>" suffix, selected by the galenic-form
+    # token Refdata uses per language. The "<brand> Autoinjektor <dose>/<vol>"
+    # prefix is identical across DE/FR/IT, so only the suffix is localised.
+    METOJECT_SUFFIX = {
+      /\bInj Lös\b/ => ["Fertpen", "Stk"], # DE
+      /\binj sol\b/ => ["stylo pré", "pce"], # FR
+      /\bsol inj\b/ => ["penna preriempita", "pz"] # IT
+    }.freeze
+
+    # Case #1: every METOJECT Autoinjektor name is truncated at Refdata's
+    # 50-char limit, carrying a redundant strength in the (often cut) tail
+    # ("METOJECT Autoinjektor 10 mg/0.2 ml Inj Lös 10 mg 1"). Rebuild from the
+    # intact prefix plus the authoritative Swissmedic pack size →
+    # "METOJECT Autoinjektor 10 mg/0.2 ml Fertpen 1 Stk" (localised for FR/IT).
+    # Scoped to the METOJECT registration; idempotent once Refdata stops
+    # truncating (the rebuilt name no longer carries the redundant tail).
+    def self.fix_truncated_metoject(desc, no8, size)
+      return desc if desc.nil? || desc.empty?
+      return desc unless METOJECT_IKSNR.include?(iksnr_of(no8))
+      return desc if size.nil? || size.to_s.empty?
+      m = desc.match(%r{\A(METOJECT Autoinjektor \d[\d.]* mg/\d[\d.]* ml)\b})
+      return desc unless m
+      suffix = METOJECT_SUFFIX.find { |re, _| re.match?(desc) }
+      return desc unless suffix
+      pen, unit = suffix.last
+      "#{m[1]} #{pen} #{size} #{unit}"
+    end
+
+    VERACTIV_VITD3_IKSNR = %w[57690].freeze # #3 VERACTIV Vitamin D3 Wild
+
+    # Case #3 (partial): the VERACTIV Vitamin D3 drops are truncated at 50
+    # chars, losing the final "l" of the volume ("… 20'000 U.I. 10m" → "10ml").
+    # Restore it. The French wording ("Huile", drop-form codes) in the German
+    # name is a separate upstream issue and is left untouched. Scoped to the
+    # registration; a no-op once the volume already ends in "ml".
+    def self.fix_truncated_volume_unit(desc, no8)
+      return desc if desc.nil? || desc.empty?
+      return desc unless VERACTIV_VITD3_IKSNR.include?(iksnr_of(no8))
+      desc.sub(/(\d)\s*m\z/, '\1ml')
+    end
   end
 end
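
The negative lookbehind in `dose_regex` is what keeps a short dose from matching inside a longer number. The following standalone copy of that helper, written as a top-level method purely for illustration, shows the behaviour on a few made-up descriptions.

```ruby
# Standalone copy of the dose matcher for illustration. The lookbehind
# (?<![\d.,]) rejects a match that starts in the middle of a number, and
# \s* makes the space between number and unit optional.
def dose_regex(dose)
  m = dose.to_s.match(/\A([\d.,]+)\s*(.+?)\s*\z/)
  return /#{Regexp.escape(dose.to_s)}/i unless m
  /(?<![\d.,])#{Regexp.escape(m[1])}\s*#{Regexp.escape(m[2])}/i
end

fifty = dose_regex("50 mg")
inside_longer_number = fifty.match?("X Filmtabl 250 mg 12 Stk")
without_space = fifty.match?("X Filmtabl 50mg 12 Stk")
after_decimal_point = fifty.match?("X Filmtabl 2.50 mg 12 Stk")
```

Without the lookbehind, a check such as "is the second strength already present" could report a strength as present when it only appears as the tail of a different number.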
data/lib/oddb2xml/version.rb
CHANGED
data/spec/refdata_cleanup_spec.rb
CHANGED
@@ -92,6 +92,142 @@ describe Oddb2xml::RefdataCleanup do
       expect(described_class.normalize_galenic_form(input)).to eq input
     end
   end
+
+  describe ".dose_for_substance" do
+    let(:comp) { "atovaquonum 250 mg, proguanili hydrochloridum 100 mg, cellulosum microcristallinum" }
+
+    it "returns the dose of the named active, normalised to '<n> <unit>'" do
+      expect(described_class.dose_for_substance(comp, "atovaquonum")).to eq "250 mg"
+      expect(described_class.dose_for_substance(comp, "proguanili hydrochloridum")).to eq "100 mg"
+    end
+
+    it "ignores excipient doses outside the substance's segment" do
+      cetirizin = "cetirizini dihydrochloridum 10 mg, lactosum monohydricum 65 mg"
+      expect(described_class.dose_for_substance(cetirizin, "cetirizini dihydrochloridum")).to eq "10 mg"
+    end
+
+    it "returns nil when the substance is absent" do
+      expect(described_class.dose_for_substance(comp, "ibuprofenum")).to be_nil
+      expect(described_class.dose_for_substance(nil, "atovaquonum")).to be_nil
+    end
+  end
+
+  describe ".fix_missing_combo_dose (issue #112 #6)" do
+    let(:combo) { "atovaquonum, proguanili hydrochloridum" }
+    let(:comp) { "atovaquonum 250 mg, proguanili hydrochloridum 100 mg, cellulosum microcristallinum" }
+
+    it "appends the 2nd combo component dose for the catalogued IKSNR (ATOVAQUON, 65280)" do
+      input = "ATOVAQUON PLUS Spirig HC Filmtabl 250 mg 12 Stk"
+      expect(described_class.fix_missing_combo_dose(input, combo, comp, "65280001"))
+        .to eq "ATOVAQUON PLUS Spirig HC Filmtabl 250 mg / 100 mg 12 Stk"
+    end
+
+    it "is a no-op for non-catalogued registrations (avoids the KEPPRA sodium misfire)" do
+      input = "KEPPRA Filmtabl 1000 mg 100 Stk"
+      keppra_comp = "levetiracetamum 1000 mg, ... corresp. natrium 2.8 mg"
+      expect(described_class.fix_missing_combo_dose(input, "levetiracetamum, natrium", keppra_comp, "29152001"))
+        .to eq input
+    end
+
+    it "is a no-op when the 2nd dose is already present" do
+      input = "ATOVAQUON PLUS Spirig HC Filmtabl 250 mg / 100 mg 12 Stk"
+      expect(described_class.fix_missing_combo_dose(input, combo, comp, "65280001")).to eq input
+    end
+
+    it "is a no-op for mono products" do
+      input = "X 250 mg 12 Stk"
+      expect(described_class.fix_missing_combo_dose(input, "atovaquonum", comp, "65280001")).to eq input
+    end
+  end
+
+  describe ".fix_missing_dose (issue #112 #4)" do
+    let(:comp) { "cetirizini dihydrochloridum 10 mg, lactosum monohydricum 65 mg, talcum" }
+    let(:sub) { "cetirizini dihydrochloridum" }
+
+    it "inserts the strength before the pack count for the catalogued IKSNR (CETIRIZIN, 62568)" do
+      expect(described_class.fix_missing_dose("CETIRIZIN Spirig HC Filmtabl 30 Stk", sub, comp, "62568007"))
+        .to eq "CETIRIZIN Spirig HC Filmtabl 10 mg 30 Stk"
+    end
+
+    it "works on French/Italian pack-count units" do
+      expect(described_class.fix_missing_dose("CETIRIZINE Spirig HC cpr pellic 30 pce", sub, comp, "62568007"))
+        .to eq "CETIRIZINE Spirig HC cpr pellic 10 mg 30 pce"
+      expect(described_class.fix_missing_dose("CETIRIZINA Spirig HC cpr riv 30 pz", sub, comp, "62568007"))
+        .to eq "CETIRIZINA Spirig HC cpr riv 10 mg 30 pz"
+    end
+
+    it "is a no-op for non-catalogued registrations (avoids the IMPORTAL powder misfire)" do
+      expect(described_class.fix_missing_dose("IMPORTAL Pulver Btl 50 Stk", "lactitolum", "lactitolum monohydricum 10 g", "43414001"))
+        .to eq "IMPORTAL Pulver Btl 50 Stk"
+    end
+
+    it "is a no-op when a strength is already present" do
+      input = "CETIRIZIN Spirig HC Filmtabl 10 mg 30 Stk"
+      expect(described_class.fix_missing_dose(input, sub, comp, "62568007")).to eq input
+    end
+  end
+
+  describe ".fix_missing_volume (issue #112 #7)" do
+    let(:comp) { "tirzepatidum 7.5 mg, ... ad solutionem pro 0.6 ml corresp. natrium 0.6 mg." }
+
+    it "appends the per-pen volume for the catalogued IKSNR (MOUNJARO, 69696)" do
+      expect(described_class.fix_missing_volume("MOUNJARO KwikPen Inj Lös 7.5 mg 1 Stk", comp, "69696003"))
+        .to eq "MOUNJARO KwikPen Inj Lös 7.5 mg/0.6 ml 1 Stk"
+    end
+
+    it "is a no-op for non-catalogued registrations (avoids the CIMZIA concentration misfire)" do
+      input = "CIMZIA AutoClicks 200 mg/ml Fertpen 2 Stk"
+      expect(described_class.fix_missing_volume(input, "... pro 1 ml ...", "58277001")).to eq input
+    end
+
+    it "never double-appends a volume that is already present" do
+      input = "MOUNJARO KwikPen Inj Lös 7.5 mg/0.6 ml 1 Stk"
+      expect(described_class.fix_missing_volume(input, comp, "69696003")).to eq input
+    end
+  end
+
+  describe ".fix_truncated_metoject (issue #112 #1)" do
+    it "rebuilds the DE name from the prefix plus the Swissmedic size" do
+      expect(described_class.fix_truncated_metoject("METOJECT Autoinjektor 10 mg/0.2 ml Inj Lös 10 mg 1", "65672106", "1"))
+        .to eq "METOJECT Autoinjektor 10 mg/0.2 ml Fertpen 1 Stk"
+    end
+
+    it "localises French (stylo pré/pce) and Italian (penna preriempita/pz)" do
+      expect(described_class.fix_truncated_metoject("METOJECT Autoinjektor 10 mg/0.2 ml inj sol 10 mg 1", "65672106", "1"))
+        .to eq "METOJECT Autoinjektor 10 mg/0.2 ml stylo pré 1 pce"
+      expect(described_class.fix_truncated_metoject("METOJECT Autoinjektor 10 mg/0.2 ml sol inj 10 mg 1", "65672106", "1"))
+        .to eq "METOJECT Autoinjektor 10 mg/0.2 ml penna preriempita 1 pz"
+    end
+
+    it "uses the Swissmedic size even when the truncated count was cut off" do
+      expect(described_class.fix_truncated_metoject("METOJECT Autoinjektor 12.5 mg/0.25 ml Inj Lös 12.5", "65672111", "12"))
+        .to eq "METOJECT Autoinjektor 12.5 mg/0.25 ml Fertpen 12 Stk"
+    end
+
+    it "is a no-op for other registrations and without a size" do
+      other = "FOO Autoinjektor 10 mg/0.2 ml Inj Lös 10 mg 1"
+      expect(described_class.fix_truncated_metoject(other, "99999001", "1")).to eq other
+      keep = "METOJECT Autoinjektor 10 mg/0.2 ml Inj Lös 10 mg 1"
+      expect(described_class.fix_truncated_metoject(keep, "65672106", nil)).to eq keep
+    end
+  end
+
+  describe ".fix_truncated_volume_unit (issue #112 #3)" do
+    it "restores the truncated 'ml' for the VERACTIV Vitamin D3 registration" do
+      expect(described_class.fix_truncated_volume_unit("VERACTIV Vitamin D3 Wild Huile Trp 20'000 U.I. 10m", "57690004"))
+        .to eq "VERACTIV Vitamin D3 Wild Huile Trp 20'000 U.I. 10ml"
+    end
+
+    it "is a no-op when the volume already ends in 'ml'" do
+      input = "VERACTIV Vitamin D3 Wild Huile Trp 20'000 U.I. 10ml"
+      expect(described_class.fix_truncated_volume_unit(input, "57690004")).to eq input
+    end
+
+    it "is a no-op for other registrations" do
+      input = "FOO Tropfen 10m"
+      expect(described_class.fix_truncated_volume_unit(input, "12345001")).to eq input
+    end
+  end
 end

 describe Oddb2xml::Builder do
@@ -199,5 +335,53 @@ describe Oddb2xml::Builder do
       expect(item[:desc_fr]).to eq "RINVOQ comprimé à libération prolong. 30 mg 28 pce"
       expect(item[:desc_it]).to eq "RINVOQ compresse a rilascio prolungato 30 mg 28 pz"
     end
+
+    it "reconstructs missing dose info from Swissmedic for catalogued articles (issue #112 #4/#6/#7)" do
+      builder.packs = {
+        "65280001" => {substance_swissmedic: "atovaquonum, proguanili hydrochloridum",
+          composition_swissmedic: "atovaquonum 250 mg, proguanili hydrochloridum 100 mg, cellulosum"},
+        "62568007" => {substance_swissmedic: "cetirizini dihydrochloridum",
+          composition_swissmedic: "cetirizini dihydrochloridum 10 mg, lactosum monohydricum 65 mg"},
+        "69696003" => {substance_swissmedic: "tirzepatidum",
+          composition_swissmedic: "tirzepatidum 7.5 mg, ad solutionem pro 0.6 ml corresp. natrium"}
+      }
+      builder.refdata = {
+        "7680652800017" => {ean13: "7680652800017", no8: "65280001",
+          desc_de: "ATOVAQUON PLUS Spirig HC Filmtabl 250 mg 12 Stk", desc_fr: "", desc_it: ""},
+        "7680625680073" => {ean13: "7680625680073", no8: "62568007",
+          desc_de: "CETIRIZIN Spirig HC Filmtabl 30 Stk", desc_fr: "", desc_it: ""},
+        "7680696960036" => {ean13: "7680696960036", no8: "69696003",
+          desc_de: "MOUNJARO KwikPen Inj Lös 7.5 mg 1 Stk", desc_fr: "", desc_it: ""}
+      }
+
+      builder.apply_refdata_description_cleanups!
+
+      expect(builder.refdata["7680652800017"][:desc_de]).to eq "ATOVAQUON PLUS Spirig HC Filmtabl 250 mg / 100 mg 12 Stk"
+      expect(builder.refdata["7680625680073"][:desc_de]).to eq "CETIRIZIN Spirig HC Filmtabl 10 mg 30 Stk"
+      expect(builder.refdata["7680696960036"][:desc_de]).to eq "MOUNJARO KwikPen Inj Lös 7.5 mg/0.6 ml 1 Stk"
+    end
+
+    it "rebuilds truncated names from Swissmedic for catalogued articles (issue #112 #1/#3)" do
+      builder.packs = {
+        "65672106" => {substance_swissmedic: "methotrexatum", composition_swissmedic: "", size: "1"},
+        "57690004" => {substance_swissmedic: "colecalciferolum", composition_swissmedic: "", size: "1"}
+      }
+      builder.refdata = {
+        "7680656721066" => {ean13: "7680656721066", no8: "65672106",
+          desc_de: "METOJECT Autoinjektor 10 mg/0.2 ml Inj Lös 10 mg 1",
+          desc_fr: "METOJECT Autoinjektor 10 mg/0.2 ml inj sol 10 mg 1",
+          desc_it: "METOJECT Autoinjektor 10 mg/0.2 ml sol inj 10 mg 1"},
+        "7680576900046" => {ean13: "7680576900046", no8: "57690004",
+          desc_de: "VERACTIV Vitamin D3 Wild Huile Trp 20'000 U.I. 10m", desc_fr: "", desc_it: ""}
+      }
+
+      builder.apply_refdata_description_cleanups!
+
+      met = builder.refdata["7680656721066"]
+      expect(met[:desc_de]).to eq "METOJECT Autoinjektor 10 mg/0.2 ml Fertpen 1 Stk"
+      expect(met[:desc_fr]).to eq "METOJECT Autoinjektor 10 mg/0.2 ml stylo pré 1 pce"
+      expect(met[:desc_it]).to eq "METOJECT Autoinjektor 10 mg/0.2 ml penna preriempita 1 pz"
+      expect(builder.refdata["7680576900046"][:desc_de]).to eq "VERACTIV Vitamin D3 Wild Huile Trp 20'000 U.I. 10ml"
+    end
   end
 end
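
The `.dose_for_substance` specs rely on the lookup staying inside the comma-delimited segment that names the substance. A minimal sketch of that idea follows. `ASSUMED_DOSE_TOKEN` and `dose_in_segment` are hypothetical: the gem's real `DOSE_TOKEN` is defined outside this diff, so the pattern here is only an assumption that covers the sample strings.

```ruby
# Assumed token pattern for illustration only; the gem's DOSE_TOKEN is not
# shown in this diff and may differ.
ASSUMED_DOSE_TOKEN = /\d[\d.,]*\s*(?:mg|ml|g)\b/i

# Returns the first dose token found in the segment that names the
# substance, or nil when no segment mentions it.
def dose_in_segment(composition, substance)
  key = substance.to_s.strip[/\A[[:alpha:]]+/]
  return nil if key.nil?
  composition.to_s.split(",").each do |segment|
    next unless /\b#{Regexp.escape(key)}/i.match?(segment)
    m = segment.match(ASSUMED_DOSE_TOKEN)
    return m[0] if m
  end
  nil
end

comp = "cetirizini dihydrochloridum 10 mg, lactosum monohydricum 65 mg"
active = dose_in_segment(comp, "cetirizini dihydrochloridum")
excipient = dose_in_segment(comp, "lactosum monohydricum")
missing = dose_in_segment(comp, "ibuprofenum")
```

Splitting on commas keeps the excipient's 65 mg away from the active's lookup. The same split would also cut a decimal comma, so a composition written as "2,5 mg" would likely need separate handling.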