oddb2xml 3.0.14 → 3.0.16

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 48a674b29560c021e581fdbcd060a0bcf58f618ce853a93a54e8a7504bbb3f0a
- data.tar.gz: ac3fc397a257b03a51f760299241c9865c325b22c3bb1b57ba76a0dcfa5fcb59
+ metadata.gz: 5cba5ea3e5414179cae06dc6b1fd3e30882ebf93d158cebbbf8c384789c6f437
+ data.tar.gz: 96e18c2cfbd07cacb6266659b4995e2fdddc04b1e1dafea9410df0423ae91ae8
  SHA512:
- metadata.gz: bd0da98d7f153c459f0f8548f56c1ac416752ee9e05aafe2cb2bf5965e1f88c7bab1272237210b7ead8079b6f8d564eeca187801348d26763439dc6133a8c490
- data.tar.gz: 7b53c6f9ed2f363bbae51c2f8d6cbce456774534f08045def0248a13a22a8e17c2e723f093936e6d607c02d7e3da95e74eb00945bdae6fd393f0aac6fa1da8fc
+ metadata.gz: 32bc1b4476bd33f7f3c04aae1f63bb7ddca649d7ca7b73942d321bda32102629ec879f3e3ac414073f72f11687752a4d113e1688b52f0c6466b45f866fddcc7c
+ data.tar.gz: fbecd040c7a10958075a6754a6b7a492d88181a0530a73112db6cbf0e51f13060a849b53689575515c98c72e93a20bc1744afd1d87eb376aeeacf0282e1592fe
data/CLAUDE.md CHANGED
@@ -49,7 +49,7 @@ The system follows a **download → extract → build → compress** pipeline:
 
  7. **FHIR support** (`lib/oddb2xml/fhir_support.rb`) — Self-contained module providing `FhirDownloader` and FHIR NDJSON parsing. Activated via `--fhir` (or `--fhir-url=<URL>`). Downloads per-language NDJSON files (`foph-sl-export-latest-{de,fr,it}.ndjson`) from `epl.bag.admin.ch` to populate French and Italian product names/descriptions. Maps legal status codes `756005022007` and `756005022008` to Swissmedic category D. Reads the BAG **Indikationscode** (`XXXXX.NN`) from the explicit `indicationCode` extension on each `RegulatedAuthorization.indication[].extension[regulatedAuthorization-limitation]` (BAG SL FHIR export >= v2.0.5; handled from 3.0.10). The BAG changelog states the limitation code (`ClinicalUseDefinition.id`) and the indication code are **independent** fields, so the older derivation — combining each indication CUD's `.NN` id-suffix with the reimbursement RA's `FOPHDossierNumber` — is kept only as a fallback for feeds lacking the extension. Exposed as `item[:indication_codes]` and per-package `:indication_codes` (each entry a `{code:, cud_id:, text:}` hash, where `cud_id` is the `limitationIndication` CUD reference used to resolve the text). From 3.0.7 onwards, `Builder#build_product` emits one `<INDICATION_CODE code="XXXXX.NN" cud_id="DRUG.NN">limitation text</INDICATION_CODE>` child per indication on every `<PRD>` in `oddb_product.xml`; live feed numbers: 539 products / 1,293 codes / 100 % with non-empty indication text. Mandatory on prescriptions/invoices for SL price-model drugs from 2026-07-01 — see issue [#113](https://github.com/zdavatz/oddb2xml/issues/113). **Limitation texts** (3.0.8 onwards): the `regulatedAuthorization-limitation` extension has no inline `limitationText` in the live BAG feed — it carries a `limitationIndication` reference to a `ClinicalUseDefinition` whose `indication.diseaseSymptomProcedure.concept.text` is the actual text. 
The parser stores the ref as `cud_ref` on each Limitation, `Bundle#cud_text_by_id` resolves DE, and `merge_language` propagates FR/IT from the per-language NDJSON files via the same CUD id. Coverage on the live feed jumped from 0 / 9'108 to 9'108 / 9'108 (issue [#116](https://github.com/zdavatz/oddb2xml/issues/116)). **Limitation code / LIMNAMEBAG** (3.0.12 onwards): FHIR has no native BAG limitation code (LIMCD), so `create_limitations_for_package` sets `LimitationCode = cud_ref` (the `limitationIndication` CUD id) instead of `""`. Without this, every FHIR limitation shared an empty `:code`; `Builder#build_artikelstamm` groups its `<LIMITATIONS>` section by code, so all of them collapsed into a single `<LIMITATION>` with an empty `<LIMNAMEBAG>` and only one text survived. Using the CUD id as the key makes each distinct limitation emit and be referenced from its `<PRODUCT>`. The downstream `bin/check_artikelstamm` (`semantic_check.rb`) also crashed on the lone-element output because Ox `:hash_no_attrs` collapses a one-child section into a Hash (and an empty one into nil) — `SemanticCheckXML#get_items` now normalises every section to an Array.
 
- 8. **Refdata cleanup** (`lib/oddb2xml/refdata_cleanup.rb`) — Compensates for known data-quality issues in upstream Refdata.Articles.xml before they reach the output. Each fix is guarded by a Swissmedic-side heuristic (e.g. comma in `substance_swissmedic` to distinguish mono products from real combinations). Currently fixes (a) the doubled-dose template bug (`X mg / X mg / Stk`, `fix_double_dose`, guarded by `single_substance?`) and (b) the spelled-out German galenic form `Retardtabletten` → house-style abbreviation `Ret Tabl` (`normalize_galenic_form` / `GALENIC_NORMALISATIONS`, issue #112 case #13, e.g. RINVOQ — a narrow word-boundary substitution that leaves legitimate brand suffixes like `TRAMAL retard` and Mepha's `Lactab` untouched). Called from `Builder#apply_refdata_description_cleanups!` at the start of `prepare_articles`. See GitHub issue #112 for the catalogue.
+ 8. **Refdata cleanup** (`lib/oddb2xml/refdata_cleanup.rb`) — Compensates for known data-quality issues in upstream Refdata.Articles.xml before they reach the output. Each fix is guarded by a Swissmedic-side heuristic (e.g. comma in `substance_swissmedic` to distinguish mono products from real combinations). Currently fixes (a) the doubled-dose template bug (`X mg / X mg / Stk`, `fix_double_dose`, guarded by `single_substance?`); (b) the spelled-out German galenic form `Retardtabletten` → house-style abbreviation `Ret Tabl` (`normalize_galenic_form` / `GALENIC_NORMALISATIONS`, issue #112 case #13, e.g. RINVOQ — a narrow word-boundary substitution that leaves legitimate brand suffixes like `TRAMAL retard` and Mepha's `Lactab` untouched); and (c) dose info Refdata dropped from `<FullName>`, sourced from the Swissmedic composition string `pack[:composition_swissmedic]` — `fix_missing_combo_dose` (#6, appends a combination's 2nd component strength), `fix_missing_dose` (#4, inserts a mono product's missing strength before the pack count), `fix_missing_volume` (#7, appends an injectable's per-pen volume); and (d) 50-char-truncation repairs — `fix_truncated_metoject` (#1, rebuilds METOJECT Autoinjektor names from the intact `<brand> Autoinjektor <dose>/<vol>` prefix + Swissmedic `size`, localised DE/FR/IT) and `fix_truncated_volume_unit` (#3, restores the cut `ml` of the VERACTIV Vitamin D3 drops). The (c) and (d) fixes are scoped to explicit IKSNR allow-lists (`COMBO_DOSE_IKSNR`/`MISSING_DOSE_IKSNR`/`MISSING_VOLUME_IKSNR`/`METOJECT_IKSNR`/`VERACTIV_VITD3_IKSNR`): a dry run proved a blanket heuristic mis-fires on hundreds of legitimate names (sodium counter-ion doses, strength-less phyto/powder products, concentration names like `CIMZIA 200 mg/ml`), so only catalogued registrations are touched — add an IKSNR to grow coverage. Called from `Builder#apply_refdata_description_cleanups!` at the start of `prepare_articles`. See GitHub issue #112 for the catalogue.
 
  9. **Chapter-70 hack** (`lib/oddb2xml/chapter_70_hack.rb`) — Legacy scraper for the SL "Komplementärarzneimittel" products (homeopathic/anthroposophic/phytotherapeutic), called only from `Builder#build_artikelstamm`. **Deprecated / non-FHIR only (3.0.11 onwards):** the source page `varia_De.htm` was rebuilt as a JavaScript SPA with no static data table, so the scraper now returns nothing there. These products + limitations now come through the FHIR feed (SL classification `20. KOMPLEMENTÄRARZNEIMITTEL`, 221 products on the live DE feed with real GTINs and limitation texts), so `build_artikelstamm` **skips the scraper entirely when `@options[:fhir]`** (the default for `--artikelstamm` since 3.0.9). In `--no-fhir` mode the scraper degrades gracefully (skips non-row/`<script>` nodes and empty tables, warns, returns `[]`) instead of raising `NoMethodError`. See GitHub issue #118.
 
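The LIMNAMEBAG paragraph above describes two failure modes: grouping limitations by a shared empty code collapses them into one entry, and Ox's `:hash_no_attrs` mode yields nil, a single Hash, or an Array depending on how many children a section has. A minimal Ruby sketch of both, under stated assumptions: the records and the `CUD-1`/`CUD-2` ids are made up, and `as_item_array` is a hypothetical helper, not the actual `SemanticCheckXML#get_items` code.

```ruby
# Made-up limitation records. Before 3.0.12 every FHIR limitation carried an
# empty :code; afterwards the code is the limitationIndication CUD id.
# "CUD-1"/"CUD-2" are placeholders, not real BAG identifiers.
legacy = [{code: "", text: "Limitation A"}, {code: "", text: "Limitation B"}]
keyed = [{code: "CUD-1", text: "Limitation A"}, {code: "CUD-2", text: "Limitation B"}]

# Grouping by code: one shared empty key yields a single group, so only
# one limitation would survive.
puts legacy.group_by { |l| l[:code] }.size # prints 1
puts keyed.group_by { |l| l[:code] }.size  # prints 2

# Hypothetical helper: normalise a parsed section (nil, one Hash, or an
# Array of Hashes) to an Array. Kernel#Array is not a safe shortcut here,
# because Array(hash) returns the hash's key/value pairs instead of [hash].
def as_item_array(section)
  case section
  when nil then []
  when Array then section
  else [section]
  end
end

p as_item_array(nil) # prints []
p as_item_array({"GTIN" => "7680"})
```

The `case` keeps a lone Hash intact as a one-element Array, which is the shape downstream code iterating over items expects.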
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
    remote: .
    specs:
-     oddb2xml (3.0.14)
+     oddb2xml (3.0.16)
        csv
        htmlentities
        httpi
data/History.txt CHANGED
@@ -1,3 +1,9 @@
+ === 3.0.16 / 02.06.2026
+ * Refdata cleanup: two more Swissmedic-sourced, IKSNR-scoped fixes for 50-char truncation (issue #112 cases #1 and #3). fix_truncated_metoject rebuilds every truncated METOJECT Autoinjektor name ("METOJECT Autoinjektor 10 mg/0.2 ml Inj Lös 10 mg 1") from its intact "<brand> Autoinjektor <dose>/<vol>" prefix plus the authoritative Swissmedic pack size -> "METOJECT Autoinjektor 10 mg/0.2 ml Fertpen 1 Stk", localised for FR ("stylo pré … pce") and IT ("penna preriempita … pz"); it recovers the correct count even where the truncation cut the count digits. fix_truncated_volume_unit restores the lost final "l" of the VERACTIV Vitamin D3 drops volume ("… 20'000 U.I. 10m" -> "10ml"). Both are scoped to their registration (METOJECT 65672, VERACTIV Vitamin D3 57690) and idempotent. The French wording in the VERACTIV German name remains an upstream issue.
+
+ === 3.0.15 / 02.06.2026
+ * Refdata cleanup: reconstruct dose information that Refdata dropped from <FullName>, sourcing the authoritative values from the Swissmedic "Zugelassene Packungen" composition string (issue #112 cases #4/#6/#7). Three new guarded fixes in RefdataCleanup: fix_missing_combo_dose appends a combination's 2nd component strength ("ATOVAQUON PLUS … 250 mg" -> "250 mg / 100 mg"), fix_missing_dose inserts a mono product's missing strength before the pack count ("CETIRIZIN Spirig HC Filmtabl 30 Stk" -> "… 10 mg 30 Stk"), and fix_missing_volume appends an injectable's per-pen volume ("MOUNJARO KwikPen … 7.5 mg" -> "7.5 mg/0.6 ml"). Each fix is scoped to an explicit IKSNR allow-list: a dry run showed a blanket heuristic mis-fires on hundreds of legitimate names (sodium counter-ion doses, strength-less phyto/powder products, concentration names), so only the catalogued registrations are touched. Add an IKSNR to grow coverage.
+
  === 3.0.14 / 02.06.2026
  * Refdata cleanup: normalise the spelled-out German galenic form "Retardtabletten" to the Refdata house-style abbreviation "Ret Tabl" (issue #112 case #13, e.g. RINVOQ). Only 3 of the ~940 retard-tablet descriptions deviated from the convention; the fix is a narrow, German-only word-boundary substitution (RefdataCleanup.normalize_galenic_form, wired into Builder#apply_refdata_description_cleanups!). Legitimate brand suffixes ("TRAMAL retard", "XANAX retard", "MELATONIN Spirig HC retard") and Mepha's "Lactab" are left untouched.
 
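The 3.0.16 entry describes rebuilding a truncated METOJECT name from its intact prefix plus the Swissmedic pack size. The sketch below lifts the prefix regex and the per-language suffix table from the `RefdataCleanup` hunk further down into a standalone method. `rebuild_metoject` and the constant names are hypothetical, and the IKSNR allow-list check and nil guards are deliberately omitted. It also shows why the fix is idempotent: the rebuilt name no longer contains the galenic token that the suffix table looks for.

```ruby
# Prefix regex and suffix table as they appear in the RefdataCleanup diff;
# the IKSNR allow-list check and nil/empty guards are omitted in this sketch.
METOJECT_PREFIX = %r{\A(METOJECT Autoinjektor \d[\d.]* mg/\d[\d.]* ml)\b}
METOJECT_SUFFIXES = {
  /\bInj Lös\b/ => ["Fertpen", "Stk"],         # DE
  /\binj sol\b/ => ["stylo pré", "pce"],       # FR
  /\bsol inj\b/ => ["penna preriempita", "pz"] # IT
}.freeze

# Hypothetical standalone rebuild: keep the intact
# "<brand> Autoinjektor <dose>/<vol>" prefix and append the localised pen
# wording plus the authoritative pack size.
def rebuild_metoject(desc, size)
  m = desc.match(METOJECT_PREFIX)
  return desc unless m
  hit = METOJECT_SUFFIXES.find { |re, _| re.match?(desc) }
  return desc unless hit
  pen, unit = hit.last
  "#{m[1]} #{pen} #{size} #{unit}"
end

fixed = rebuild_metoject("METOJECT Autoinjektor 12.5 mg/0.25 ml Inj Lös 12.5", "12")
puts fixed # prints METOJECT Autoinjektor 12.5 mg/0.25 ml Fertpen 12 Stk
# A second pass is a no-op: the rebuilt name has no "Inj Lös" token left.
puts rebuild_metoject(fixed, "12") == fixed # prints true
```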
data/README.md CHANGED
@@ -51,7 +51,7 @@ HIN (http://hin.ch) creates daily the actual file. They can be downloaded from `
  see `--help`.

  ```
- /opt/src/oddb2xml/bin/oddb2xml version 3.0.14
+ /opt/src/oddb2xml/bin/oddb2xml version 3.0.16
  Usage:
  oddb2xml [option]
  produced files are found under data
@@ -95,8 +95,7 @@ module Oddb2xml
        return if @refdata_descriptions_cleaned
        @refdata_descriptions_cleaned = true
        return if @refdata.nil? || @refdata.empty?
-       double_dose_fixed = 0
-       galenic_fixed = 0
+       counts = Hash.new(0)
        @refdata.each_value do |item|
          next unless item.is_a?(Hash)
          no8 = item[:no8]
@@ -104,26 +103,38 @@ module Oddb2xml
          pack = @packs[no8]
          next unless pack
          substance = pack[:substance_swissmedic]
+         composition = pack[:composition_swissmedic]
+         size = pack[:size]
          [:desc_de, :desc_fr, :desc_it].each do |key|
            original = item[key]
-           cleaned = RefdataCleanup.fix_double_dose(original, substance)
-           if cleaned != original
-             item[key] = cleaned
-             double_dose_fixed += 1
-           end
-           original = item[key]
-           cleaned = RefdataCleanup.normalize_galenic_form(original)
-           if cleaned != original
-             item[key] = cleaned
-             galenic_fixed += 1
+           next if original.nil? || original.empty?
+           desc = original
+           # Applied in order so later fixes see the output of earlier ones.
+           pipeline = [
+             [:metoject, ->(d) { RefdataCleanup.fix_truncated_metoject(d, no8, size) }],
+             [:veractiv_vol, ->(d) { RefdataCleanup.fix_truncated_volume_unit(d, no8) }],
+             [:double_dose, ->(d) { RefdataCleanup.fix_double_dose(d, substance) }],
+             [:galenic, ->(d) { RefdataCleanup.normalize_galenic_form(d) }],
+             [:combo_dose, ->(d) { RefdataCleanup.fix_missing_combo_dose(d, substance, composition, no8) }],
+             [:missing_dose, ->(d) { RefdataCleanup.fix_missing_dose(d, substance, composition, no8) }],
+             [:volume, ->(d) { RefdataCleanup.fix_missing_volume(d, composition, no8) }]
+           ]
+           pipeline.each do |rule, fix|
+             cleaned = fix.call(desc)
+             if cleaned != desc
+               desc = cleaned
+               counts[rule] += 1
+             end
            end
+           item[key] = desc if desc != original
          end
        end
-       if double_dose_fixed > 0
-         Oddb2xml.log("Refdata cleanup: fixed double-dose pattern in #{double_dose_fixed} description(s)")
-       end
-       if galenic_fixed > 0
-         Oddb2xml.log("Refdata cleanup: normalised galenic form in #{galenic_fixed} description(s)")
+       labels = {metoject: "truncated METOJECT name", veractiv_vol: "truncated volume unit",
+                 double_dose: "double-dose pattern", galenic: "galenic form",
+                 combo_dose: "missing combo 2nd dose", missing_dose: "missing strength",
+                 volume: "missing injection volume"}
+       counts.each do |rule, n|
+         Oddb2xml.log("Refdata cleanup: fixed #{labels[rule]} in #{n} description(s)") if n > 0
        end
      end

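The rewritten `apply_refdata_description_cleanups!` threads each description through an ordered list of `[rule, lambda]` pairs and counts every rule that actually changed something. A minimal standalone sketch of that pattern follows; `run_pipeline` and the two rules are hypothetical stand-ins, not oddb2xml fixes, chosen so that running them in the opposite order gives a different result.

```ruby
# Ordered rewrite pipeline: each rule sees the output of the previous one,
# and a rule is counted only when it actually changed the string.
def run_pipeline(desc, rules, counts)
  rules.reduce(desc) do |current, (name, fix)|
    cleaned = fix.call(current)
    counts[name] += 1 if cleaned != current
    cleaned
  end
end

# Hypothetical rules for illustration only.
RULES = [
  [:strip, ->(d) { d.strip }],
  [:upcase_brand, ->(d) { d.sub(/\A(\w+)/) { Regexp.last_match(1).upcase } }]
].freeze

counts = Hash.new(0)
puts run_pipeline("  foo Tabl 10 Stk", RULES, counts) # prints FOO Tabl 10 Stk
p counts # both rules counted once

# Reversed, :upcase_brand runs on the untrimmed string, where \A is followed
# by a space, so it never fires and only :strip is counted.
reversed = Hash.new(0)
puts run_pipeline("  foo Tabl 10 Stk", RULES.reverse, reversed) # prints foo Tabl 10 Stk
```

The real method additionally skips nil or empty descriptions and only writes the result back when the final string differs from the original.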
@@ -48,5 +48,151 @@ module Oddb2xml
        return desc if desc.nil? || desc.empty?
        GALENIC_NORMALISATIONS.reduce(desc) { |result, (re, repl)| result.gsub(re, repl) }
      end
+
+     # The following three fixes reconstruct dose information that Refdata
+     # dropped from <FullName>, sourcing the authoritative values from the
+     # Swissmedic "Zugelassene Packungen" composition string (already loaded as
+     # pack[:composition_swissmedic], keyed by the same SwissmedicNo8). See
+     # issue #112 cases #4 (missing strength), #6 (missing 2nd combo component)
+     # and #7 (missing injection volume).
+     #
+     # Each fix is scoped to an explicit allow-list of Swissmedic registration
+     # numbers (IKSNR, the first 5 digits of the no8). A blanket heuristic is
+     # NOT safe: a dry run over the full Refdata feed mis-fired on hundreds of
+     # legitimate names — combination detection grabbed sodium counter-ion doses
+     # ("KEPPRA … / 2.8 mg"), the missing-strength rule fired on strength-less
+     # phyto/powder products ("IMPORTAL Pulver"), and the volume rule corrupted
+     # concentration names ("CIMZIA 200 mg/ml"). Restricting to the catalogued
+     # registrations keeps the Swissmedic-derived value while touching only the
+     # known-bad products. Add an IKSNR here once a new case is confirmed.
+     COMBO_DOSE_IKSNR = %w[65280].freeze # #6 ATOVAQUON PLUS Spirig HC
+     MISSING_DOSE_IKSNR = %w[62568].freeze # #4 CETIRIZIN Spirig HC
+     MISSING_VOLUME_IKSNR = %w[69696].freeze # #7 MOUNJARO KwikPen
+
+     def self.iksnr_of(no8)
+       no8.to_s[0, 5]
+     end
+
+     # Builds a whitespace-tolerant matcher for a normalised dose value like
+     # "250 mg" so it also matches "250mg" in a description.
+     def self.dose_regex(dose)
+       m = dose.to_s.match(/\A([\d.,]+)\s*(.+?)\s*\z/)
+       return /#{Regexp.escape(dose.to_s)}/i unless m
+       /(?<![\d.,])#{Regexp.escape(m[1])}\s*#{Regexp.escape(m[2])}/i
+     end
+
+     # Returns the dose token that belongs to a named active substance in the
+     # Swissmedic composition, normalised to "<number> <unit>" (e.g.
+     # dose_for_substance(comp, "atovaquonum") => "250 mg"). Matches within the
+     # comma-delimited segment that names the substance so excipient doses are
+     # never picked up. Returns nil if absent.
+     def self.dose_for_substance(composition, substance)
+       return nil if composition.nil? || substance.nil?
+       key = substance.to_s.strip[/\A[A-Za-zÀ-ÿ]+/]
+       return nil if key.nil? || key.empty?
+       composition.split(",").each do |segment|
+         next unless /\b#{Regexp.escape(key)}/i.match?(segment)
+         m = segment.match(DOSE_TOKEN)
+         next unless m
+         parts = m[0].match(/\A([\d.,]+)\s*(.+?)\s*\z/)
+         return parts ? "#{parts[1]} #{parts[2]}" : m[0].strip
+       end
+       nil
+     end
+
+     # Case #6: a real combination product whose Refdata description carries
+     # only the first component's strength (e.g. "ATOVAQUON PLUS … 250 mg …").
+     # Appends the second active's strength from Swissmedic, producing
+     # "… 250 mg / 100 mg …". No-op for mono products, 3+ component combos, or
+     # when the second strength is already present.
+     def self.fix_missing_combo_dose(desc, swissmedic_substance, composition, no8)
+       return desc if desc.nil? || desc.empty?
+       return desc unless COMBO_DOSE_IKSNR.include?(iksnr_of(no8))
+       return desc if single_substance?(swissmedic_substance)
+       subs = swissmedic_substance.to_s.split(",").map(&:strip)
+       return desc unless subs.size == 2
+       d1 = dose_for_substance(composition, subs[0])
+       d2 = dose_for_substance(composition, subs[1])
+       return desc unless d1 && d2
+       return desc unless dose_regex(d1).match?(desc)
+       return desc if dose_regex(d2).match?(desc)
+       desc.sub(dose_regex(d1)) { |hit| "#{hit} / #{d2}" }
+     end
+
+     # Case #4: a mono product whose Refdata description carries NO strength at
+     # all (e.g. "CETIRIZIN Spirig HC Filmtabl 10 Stk"). Inserts the single
+     # active's strength from Swissmedic before the trailing "<count> <unit>"
+     # group → "CETIRIZIN Spirig HC Filmtabl 10 mg 10 Stk". No-op when a
+     # strength is already present or no trailing pack count exists.
+     def self.fix_missing_dose(desc, swissmedic_substance, composition, no8)
+       return desc if desc.nil? || desc.empty?
+       return desc unless MISSING_DOSE_IKSNR.include?(iksnr_of(no8))
+       return desc unless single_substance?(swissmedic_substance)
+       return desc if DOSE_TOKEN.match?(desc)
+       dose = dose_for_substance(composition, swissmedic_substance)
+       return desc unless dose
+       return desc unless /\s\d[\d.,']*\s+\S+\s*\z/.match?(desc)
+       desc.sub(/(\s)(\d[\d.,']*\s+\S+\s*)\z/, "\\1#{dose} \\2")
+     end
+
+     # Case #7: an injectable pen/solution whose Refdata description gives the
+     # strength but not the per-pen volume (e.g. "MOUNJARO KwikPen Inj Lös
+     # 7.5 mg 1 Stk"). Appends "/<vol> ml" taken from the Swissmedic
+     # composition ("… pro 0.6 ml …") → "… 7.5 mg/0.6 ml 1 Stk". Only fires for
+     # injectable forms that have no volume anywhere in the name yet.
+     def self.fix_missing_volume(desc, composition, no8)
+       return desc if desc.nil? || desc.empty?
+       return desc unless MISSING_VOLUME_IKSNR.include?(iksnr_of(no8))
+       return desc unless /\b(?:Inj|Fertpen|Injektor|stylo|sol\b)/i.match?(desc)
+       return desc if /\d\s*ml\b/i.match?(desc)
+       vol = composition.to_s[/\bpro\s+([\d.,]+)\s*ml\b/i, 1]
+       return desc unless vol
+       m = desc.match(/\d+(?:[.,]\d+)?\s*mg/i)
+       return desc unless m
+       desc.sub(m[0], "#{m[0]}/#{vol} ml")
+     end
+
+     METOJECT_IKSNR = %w[65672].freeze # #1 METOJECT Autoinjektor
+
+     # Localised "<pen> … <count> <unit>" suffix, selected by the galenic-form
+     # token Refdata uses per language. The "<brand> Autoinjektor <dose>/<vol>"
+     # prefix is identical across DE/FR/IT, so only the suffix is localised.
+     METOJECT_SUFFIX = {
+       /\bInj Lös\b/ => ["Fertpen", "Stk"], # DE
+       /\binj sol\b/ => ["stylo pré", "pce"], # FR
+       /\bsol inj\b/ => ["penna preriempita", "pz"] # IT
+     }.freeze
+
+     # Case #1: every METOJECT Autoinjektor name is truncated at Refdata's
+     # 50-char limit, carrying a redundant strength in the (often cut) tail
+     # ("METOJECT Autoinjektor 10 mg/0.2 ml Inj Lös 10 mg 1"). Rebuild from the
+     # intact prefix plus the authoritative Swissmedic pack size →
+     # "METOJECT Autoinjektor 10 mg/0.2 ml Fertpen 1 Stk" (localised for FR/IT).
+     # Scoped to the METOJECT registration; idempotent once Refdata stops
+     # truncating (the rebuilt name no longer carries the redundant tail).
+     def self.fix_truncated_metoject(desc, no8, size)
+       return desc if desc.nil? || desc.empty?
+       return desc unless METOJECT_IKSNR.include?(iksnr_of(no8))
+       return desc if size.nil? || size.to_s.empty?
+       m = desc.match(%r{\A(METOJECT Autoinjektor \d[\d.]* mg/\d[\d.]* ml)\b})
+       return desc unless m
+       suffix = METOJECT_SUFFIX.find { |re, _| re.match?(desc) }
+       return desc unless suffix
+       pen, unit = suffix.last
+       "#{m[1]} #{pen} #{size} #{unit}"
+     end
+
+     VERACTIV_VITD3_IKSNR = %w[57690].freeze # #3 VERACTIV Vitamin D3 Wild
+
+     # Case #3 (partial): the VERACTIV Vitamin D3 drops are truncated at 50
+     # chars, losing the final "l" of the volume ("… 20'000 U.I. 10m" → "10ml").
+     # Restore it. The French wording ("Huile", drop-form codes) in the German
+     # name is a separate upstream issue and is left untouched. Scoped to the
+     # registration; a no-op once the volume already ends in "ml".
+     def self.fix_truncated_volume_unit(desc, no8)
+       return desc if desc.nil? || desc.empty?
+       return desc unless VERACTIV_VITD3_IKSNR.include?(iksnr_of(no8))
+       desc.sub(/(\d)\s*m\z/, '\1ml')
+     end
    end
  end
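Two of the helpers above depend on regex details that are easy to miss: `dose_regex` tolerates a missing space between number and unit but refuses to match inside a longer number (the negative lookbehind), and `fix_missing_dose` inserts the strength immediately before the trailing `<count> <unit>` group. The sketch below copies those two regexes into standalone methods so they can be exercised without the gem. `insert_before_pack_count` is a hypothetical name, and it deliberately skips the IKSNR, single-substance and existing-strength guards of the real fix.

```ruby
# Copied from RefdataCleanup.dose_regex: whitespace-tolerant, and the
# negative lookbehind stops "250 mg" from matching inside "1250 mg".
def dose_regex(dose)
  m = dose.to_s.match(/\A([\d.,]+)\s*(.+?)\s*\z/)
  return /#{Regexp.escape(dose.to_s)}/i unless m
  /(?<![\d.,])#{Regexp.escape(m[1])}\s*#{Regexp.escape(m[2])}/i
end

# Hypothetical helper: the trailing-pack-count substitution from
# fix_missing_dose, without any of its guards.
def insert_before_pack_count(desc, dose)
  return desc unless /\s\d[\d.,']*\s+\S+\s*\z/.match?(desc)
  desc.sub(/(\s)(\d[\d.,']*\s+\S+\s*)\z/, "\\1#{dose} \\2")
end

r = dose_regex("250 mg")
puts r.match?("Filmtabl 250 mg 12 Stk")  # prints true
puts r.match?("Filmtabl 250mg 12 Stk")   # prints true
puts r.match?("Filmtabl 1250 mg 12 Stk") # prints false

puts insert_before_pack_count("CETIRIZIN Spirig HC Filmtabl 30 Stk", "10 mg")
# prints CETIRIZIN Spirig HC Filmtabl 10 mg 30 Stk
```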
@@ -1,3 +1,3 @@
  module Oddb2xml
-   VERSION = "3.0.14"
+   VERSION = "3.0.16"
  end
@@ -92,6 +92,142 @@ describe Oddb2xml::RefdataCleanup do
        expect(described_class.normalize_galenic_form(input)).to eq input
      end
    end
+
+   describe ".dose_for_substance" do
+     let(:comp) { "atovaquonum 250 mg, proguanili hydrochloridum 100 mg, cellulosum microcristallinum" }
+
+     it "returns the dose of the named active, normalised to '<n> <unit>'" do
+       expect(described_class.dose_for_substance(comp, "atovaquonum")).to eq "250 mg"
+       expect(described_class.dose_for_substance(comp, "proguanili hydrochloridum")).to eq "100 mg"
+     end
+
+     it "ignores excipient doses outside the substance's segment" do
+       cetirizin = "cetirizini dihydrochloridum 10 mg, lactosum monohydricum 65 mg"
+       expect(described_class.dose_for_substance(cetirizin, "cetirizini dihydrochloridum")).to eq "10 mg"
+     end
+
+     it "returns nil when the substance is absent" do
+       expect(described_class.dose_for_substance(comp, "ibuprofenum")).to be_nil
+       expect(described_class.dose_for_substance(nil, "atovaquonum")).to be_nil
+     end
+   end
+
+   describe ".fix_missing_combo_dose (issue #112 #6)" do
+     let(:combo) { "atovaquonum, proguanili hydrochloridum" }
+     let(:comp) { "atovaquonum 250 mg, proguanili hydrochloridum 100 mg, cellulosum microcristallinum" }
+
+     it "appends the 2nd combo component dose for the catalogued IKSNR (ATOVAQUON, 65280)" do
+       input = "ATOVAQUON PLUS Spirig HC Filmtabl 250 mg 12 Stk"
+       expect(described_class.fix_missing_combo_dose(input, combo, comp, "65280001"))
+         .to eq "ATOVAQUON PLUS Spirig HC Filmtabl 250 mg / 100 mg 12 Stk"
+     end
+
+     it "is a no-op for non-catalogued registrations (avoids the KEPPRA sodium misfire)" do
+       input = "KEPPRA Filmtabl 1000 mg 100 Stk"
+       keppra_comp = "levetiracetamum 1000 mg, ... corresp. natrium 2.8 mg"
+       expect(described_class.fix_missing_combo_dose(input, "levetiracetamum, natrium", keppra_comp, "29152001"))
+         .to eq input
+     end
+
+     it "is a no-op when the 2nd dose is already present" do
+       input = "ATOVAQUON PLUS Spirig HC Filmtabl 250 mg / 100 mg 12 Stk"
+       expect(described_class.fix_missing_combo_dose(input, combo, comp, "65280001")).to eq input
+     end
+
+     it "is a no-op for mono products" do
+       input = "X 250 mg 12 Stk"
+       expect(described_class.fix_missing_combo_dose(input, "atovaquonum", comp, "65280001")).to eq input
+     end
+   end
+
+   describe ".fix_missing_dose (issue #112 #4)" do
+     let(:comp) { "cetirizini dihydrochloridum 10 mg, lactosum monohydricum 65 mg, talcum" }
+     let(:sub) { "cetirizini dihydrochloridum" }
+
+     it "inserts the strength before the pack count for the catalogued IKSNR (CETIRIZIN, 62568)" do
+       expect(described_class.fix_missing_dose("CETIRIZIN Spirig HC Filmtabl 30 Stk", sub, comp, "62568007"))
+         .to eq "CETIRIZIN Spirig HC Filmtabl 10 mg 30 Stk"
+     end
+
+     it "works on French/Italian pack-count units" do
+       expect(described_class.fix_missing_dose("CETIRIZINE Spirig HC cpr pellic 30 pce", sub, comp, "62568007"))
+         .to eq "CETIRIZINE Spirig HC cpr pellic 10 mg 30 pce"
+       expect(described_class.fix_missing_dose("CETIRIZINA Spirig HC cpr riv 30 pz", sub, comp, "62568007"))
+         .to eq "CETIRIZINA Spirig HC cpr riv 10 mg 30 pz"
+     end
+
+     it "is a no-op for non-catalogued registrations (avoids the IMPORTAL powder misfire)" do
+       expect(described_class.fix_missing_dose("IMPORTAL Pulver Btl 50 Stk", "lactitolum", "lactitolum monohydricum 10 g", "43414001"))
+         .to eq "IMPORTAL Pulver Btl 50 Stk"
+     end
+
+     it "is a no-op when a strength is already present" do
+       input = "CETIRIZIN Spirig HC Filmtabl 10 mg 30 Stk"
+       expect(described_class.fix_missing_dose(input, sub, comp, "62568007")).to eq input
+     end
+   end
+
+   describe ".fix_missing_volume (issue #112 #7)" do
+     let(:comp) { "tirzepatidum 7.5 mg, ... ad solutionem pro 0.6 ml corresp. natrium 0.6 mg." }
+
+     it "appends the per-pen volume for the catalogued IKSNR (MOUNJARO, 69696)" do
+       expect(described_class.fix_missing_volume("MOUNJARO KwikPen Inj Lös 7.5 mg 1 Stk", comp, "69696003"))
+         .to eq "MOUNJARO KwikPen Inj Lös 7.5 mg/0.6 ml 1 Stk"
+     end
+
+     it "is a no-op for non-catalogued registrations (avoids the CIMZIA concentration misfire)" do
+       input = "CIMZIA AutoClicks 200 mg/ml Fertpen 2 Stk"
+       expect(described_class.fix_missing_volume(input, "... pro 1 ml ...", "58277001")).to eq input
+     end
+
+     it "never double-appends a volume that is already present" do
+       input = "MOUNJARO KwikPen Inj Lös 7.5 mg/0.6 ml 1 Stk"
+       expect(described_class.fix_missing_volume(input, comp, "69696003")).to eq input
+     end
+   end
+
+   describe ".fix_truncated_metoject (issue #112 #1)" do
+     it "rebuilds the DE name from the prefix plus the Swissmedic size" do
+       expect(described_class.fix_truncated_metoject("METOJECT Autoinjektor 10 mg/0.2 ml Inj Lös 10 mg 1", "65672106", "1"))
+         .to eq "METOJECT Autoinjektor 10 mg/0.2 ml Fertpen 1 Stk"
+     end
+
+     it "localises French (stylo pré/pce) and Italian (penna preriempita/pz)" do
+       expect(described_class.fix_truncated_metoject("METOJECT Autoinjektor 10 mg/0.2 ml inj sol 10 mg 1", "65672106", "1"))
+         .to eq "METOJECT Autoinjektor 10 mg/0.2 ml stylo pré 1 pce"
+       expect(described_class.fix_truncated_metoject("METOJECT Autoinjektor 10 mg/0.2 ml sol inj 10 mg 1", "65672106", "1"))
+         .to eq "METOJECT Autoinjektor 10 mg/0.2 ml penna preriempita 1 pz"
+     end
+
+     it "uses the Swissmedic size even when the truncated count was cut off" do
+       expect(described_class.fix_truncated_metoject("METOJECT Autoinjektor 12.5 mg/0.25 ml Inj Lös 12.5", "65672111", "12"))
+         .to eq "METOJECT Autoinjektor 12.5 mg/0.25 ml Fertpen 12 Stk"
+     end
+
+     it "is a no-op for other registrations and without a size" do
+       other = "FOO Autoinjektor 10 mg/0.2 ml Inj Lös 10 mg 1"
+       expect(described_class.fix_truncated_metoject(other, "99999001", "1")).to eq other
+       keep = "METOJECT Autoinjektor 10 mg/0.2 ml Inj Lös 10 mg 1"
+       expect(described_class.fix_truncated_metoject(keep, "65672106", nil)).to eq keep
+     end
+   end
+
+   describe ".fix_truncated_volume_unit (issue #112 #3)" do
+     it "restores the truncated 'ml' for the VERACTIV Vitamin D3 registration" do
+       expect(described_class.fix_truncated_volume_unit("VERACTIV Vitamin D3 Wild Huile Trp 20'000 U.I. 10m", "57690004"))
+         .to eq "VERACTIV Vitamin D3 Wild Huile Trp 20'000 U.I. 10ml"
+     end
+
+     it "is a no-op when the volume already ends in 'ml'" do
+       input = "VERACTIV Vitamin D3 Wild Huile Trp 20'000 U.I. 10ml"
+       expect(described_class.fix_truncated_volume_unit(input, "57690004")).to eq input
+     end
+
+     it "is a no-op for other registrations" do
+       input = "FOO Tropfen 10m"
+       expect(described_class.fix_truncated_volume_unit(input, "12345001")).to eq input
+     end
+   end
  end

  describe Oddb2xml::Builder do
@@ -199,5 +335,53 @@ describe Oddb2xml::Builder do
        expect(item[:desc_fr]).to eq "RINVOQ comprimé à libération prolong. 30 mg 28 pce"
        expect(item[:desc_it]).to eq "RINVOQ compresse a rilascio prolungato 30 mg 28 pz"
      end
+
+     it "reconstructs missing dose info from Swissmedic for catalogued articles (issue #112 #4/#6/#7)" do
+       builder.packs = {
+         "65280001" => {substance_swissmedic: "atovaquonum, proguanili hydrochloridum",
+           composition_swissmedic: "atovaquonum 250 mg, proguanili hydrochloridum 100 mg, cellulosum"},
+         "62568007" => {substance_swissmedic: "cetirizini dihydrochloridum",
+           composition_swissmedic: "cetirizini dihydrochloridum 10 mg, lactosum monohydricum 65 mg"},
+         "69696003" => {substance_swissmedic: "tirzepatidum",
+           composition_swissmedic: "tirzepatidum 7.5 mg, ad solutionem pro 0.6 ml corresp. natrium"}
+       }
+       builder.refdata = {
+         "7680652800017" => {ean13: "7680652800017", no8: "65280001",
+           desc_de: "ATOVAQUON PLUS Spirig HC Filmtabl 250 mg 12 Stk", desc_fr: "", desc_it: ""},
+         "7680625680073" => {ean13: "7680625680073", no8: "62568007",
+           desc_de: "CETIRIZIN Spirig HC Filmtabl 30 Stk", desc_fr: "", desc_it: ""},
+         "7680696960036" => {ean13: "7680696960036", no8: "69696003",
+           desc_de: "MOUNJARO KwikPen Inj Lös 7.5 mg 1 Stk", desc_fr: "", desc_it: ""}
+       }
+
+       builder.apply_refdata_description_cleanups!
+
+       expect(builder.refdata["7680652800017"][:desc_de]).to eq "ATOVAQUON PLUS Spirig HC Filmtabl 250 mg / 100 mg 12 Stk"
+       expect(builder.refdata["7680625680073"][:desc_de]).to eq "CETIRIZIN Spirig HC Filmtabl 10 mg 30 Stk"
+       expect(builder.refdata["7680696960036"][:desc_de]).to eq "MOUNJARO KwikPen Inj Lös 7.5 mg/0.6 ml 1 Stk"
+     end
+
+     it "rebuilds truncated names from Swissmedic for catalogued articles (issue #112 #1/#3)" do
+       builder.packs = {
+         "65672106" => {substance_swissmedic: "methotrexatum", composition_swissmedic: "", size: "1"},
+         "57690004" => {substance_swissmedic: "colecalciferolum", composition_swissmedic: "", size: "1"}
+       }
+       builder.refdata = {
+         "7680656721066" => {ean13: "7680656721066", no8: "65672106",
+           desc_de: "METOJECT Autoinjektor 10 mg/0.2 ml Inj Lös 10 mg 1",
+           desc_fr: "METOJECT Autoinjektor 10 mg/0.2 ml inj sol 10 mg 1",
+           desc_it: "METOJECT Autoinjektor 10 mg/0.2 ml sol inj 10 mg 1"},
+         "7680576900046" => {ean13: "7680576900046", no8: "57690004",
+           desc_de: "VERACTIV Vitamin D3 Wild Huile Trp 20'000 U.I. 10m", desc_fr: "", desc_it: ""}
+       }
+
+       builder.apply_refdata_description_cleanups!
+
+       met = builder.refdata["7680656721066"]
+       expect(met[:desc_de]).to eq "METOJECT Autoinjektor 10 mg/0.2 ml Fertpen 1 Stk"
+       expect(met[:desc_fr]).to eq "METOJECT Autoinjektor 10 mg/0.2 ml stylo pré 1 pce"
+       expect(met[:desc_it]).to eq "METOJECT Autoinjektor 10 mg/0.2 ml penna preriempita 1 pz"
+       expect(builder.refdata["7680576900046"][:desc_de]).to eq "VERACTIV Vitamin D3 Wild Huile Trp 20'000 U.I. 10ml"
+     end
    end
  end
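Several of the specs above assert that a fix is a no-op on already-clean input, which is what keeps the cleanup safe to rerun once Refdata repairs the upstream data. That property can be checked generically: applying a fix twice must give the same result as applying it once. A small sketch under stated assumptions: `idempotent?` and `restore_ml` are hypothetical names, and `restore_ml` is a standalone copy of the `fix_truncated_volume_unit` substitution with the IKSNR check omitted.

```ruby
# Generic idempotence check: f(f(x)) == f(x).
def idempotent?(fix, input)
  once = fix.call(input)
  fix.call(once) == once
end

# Standalone copy of the substitution in fix_truncated_volume_unit.
restore_ml = ->(desc) { desc.sub(/(\d)\s*m\z/, '\1ml') }

[
  "VERACTIV Vitamin D3 Wild Huile Trp 20'000 U.I. 10m",
  "VERACTIV Vitamin D3 Wild Huile Trp 20'000 U.I. 10ml"
].each do |desc|
  # Both inputs end up as "...10ml" and report idempotent=true.
  puts "#{restore_ml.call(desc)} idempotent=#{idempotent?(restore_ml, desc)}"
end
```

A fix that appended text unconditionally would fail this check, which is why each of the catalogued fixes first tests whether its target value is already present.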
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: oddb2xml
  version: !ruby/object:Gem::Version
-   version: 3.0.14
+   version: 3.0.16
  platform: ruby
  authors:
  - Yasuhiro Asaka, Zeno R.R. Davatz, Niklaus Giger