oddb2xml 3.0.2 → 3.0.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 55725f8e11fa91216b5f7189f9339a6bacf6fd69af78c633421abc8c573a2326
-   data.tar.gz: acd34c442ef756f43aeac9e2551df614fff4bad81894a87add521e4c7efab8cd
+   metadata.gz: 25c142b0dfeb3bb43d540e7d9837b5199d5e2eb7c5afbd07fe278f8e4b40516e
+   data.tar.gz: d9dad26c7d08193bc3af27262ebfd32f0edce7d4c56e457d6d3cfabf845900c7
  SHA512:
-   metadata.gz: 273f80bb691ae1d0e86db41d9215456c48822f4a764163c3675d87ca6f975f11cd8d2dce89f4bf71abd9b8e320b61babd45c73d0683faafbeb25adf56f2d12b6
-   data.tar.gz: c0014b66c15ebddcff17cf1435d24faab68ab856faa5d198208f94d9d6925540d4939ce4ff82f4f4d38c1fc319156dbaecf49415f56e8fce1d676d3e06b179fc
+   metadata.gz: bd8480feba005844d3d2dd46f31a69bce0cdf076a75c041eeed53808a66733cb821cdc6633ae82cbba4a1347822ec123bf2ed594609145ceb8c3227ab06f7433
+   data.tar.gz: ebee25f81ef3c9f56c748175fca7d8c12c1894a3ba5db1d94d4c494f0deb223192af0f727cf567921981ce00e99f412789ba92a1cd5ae11f8c80d745dba99cfa
data/CLAUDE.md CHANGED
@@ -39,7 +39,7 @@ The system follows a **download → extract → build → compress** pipeline:
 
  2. **Downloaders** (`lib/oddb2xml/downloader.rb`) — 11 subclasses of `Downloader`, each fetching from a specific Swiss data source. Files cached in `./downloads/`.
 
- 3. **Extractors** (`lib/oddb2xml/extractor.rb`) — Matching extractor classes that parse downloaded files into Ruby hashes. Formats include XML (nokogiri/sax-machine), XLSX (rubyXL), SOAP (savon), CSV, and fixed-width text.
+ 3. **Extractors** (`lib/oddb2xml/extractor.rb`) — Matching extractor classes that parse downloaded files into Ruby hashes. Formats include XML (nokogiri/sax-machine), XLSX (rubyXL), CSV, and fixed-width text. Refdata uses the new SwissReg XML format from a zip download (`files.refdata.ch`).
 
  4. **Builder** (`lib/oddb2xml/builder.rb`) — The largest file (~1900 lines). Merges extracted data and generates output XML/DAT files. Methods follow `prepare_*` (data assembly) and `build_*` (output generation) naming.
 
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
    remote: .
    specs:
-     oddb2xml (3.0.2)
+     oddb2xml (3.0.3)
        htmlentities
        httpi
        mechanize (>= 2.8.5)
data/History.txt CHANGED
@@ -1,7 +1,15 @@
+ === 3.0.3 / 24.04.2026
+ * FHIR: download per-language NDJSON files (foph-sl-export-latest-{de,fr,it}.ndjson) so French and Italian product names/descriptions are populated
+ * FHIR: map legal status code 756005022008 to Swissmedic category D (in addition to 756005022007)
+
  === 3.0.2 / 09.03.2026
  * Use raw.githubusercontent.com URL for ATC CSV to avoid 429 Too Many Requests errors
  * Add retry logic with exponential backoff for HTTP 429 errors in uri_open
  * Remove obsolete Ruby version check in uri_open (Ruby >= 2.5 already required)
+ * Migrate Refdata from SOAP API to new SwissReg zip download (files.refdata.ch)
+ * Fix UTF-8 encoding for German umlauts in SORTD field (e.g. ö → Ö)
+ * Restore Italian description fallback to German when not available from Refdata
+ * Fix Optimist short option conflict for --fhir and --fhir_url options

  === 2.7.9 / 19.09.22
  * Remove newly generated DSCRI when not running --artikelstamm and
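The 3.0.2 entry about retrying HTTP 429 responses with exponential backoff can be pictured with the sketch below. It is not the actual `uri_open` code, which this diff does not show: the helper name `with_backoff`, its parameters, and the use of `IOError` as a stand-in for an HTTP 429 error are all assumptions made for illustration.

```ruby
# Hypothetical helper (not from oddb2xml): retries the block when it raises
# `retry_on`, doubling the wait after each failed attempt.
def with_backoff(retry_on:, max_retries: 3, base_delay: 1, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    yield
  rescue retry_on
    raise if attempt >= max_retries
    sleeper.call(base_delay * (2**attempt)) # waits 1, 2, 4, ... units
    attempt += 1
    retry
  end
end

# Stand-in for a flaky download: fails twice, then succeeds.
# The sleeper only records the delays so the example runs instantly.
delays = []
calls = 0
result = with_backoff(retry_on: IOError, sleeper: ->(s) { delays << s }) do
  calls += 1
  raise IOError, "429 Too Many Requests" if calls < 3
  "ok"
end
```

Injecting the sleeper keeps the waiting strategy testable without real delays.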
data/README.md CHANGED
@@ -25,7 +25,7 @@ creates .dat files according to ([IGM-11](http://dev.ywesee.com/uploads/att/IGM.
  * oddb.dat
  * oddb_with_migel.dat

- the files are using [swissINDEX](http://www.refdata.ch/downloads/company/download/swissindex_TechnischeBeschreibung.pdf), [BAG-XML](http://bag.e-mediat.net/SL2007.Web.External/Default.aspx?webgrab=ignore) and [Swissmedic](http://www.swissmedic.ch/daten/00080/00251/index.html?lang=de) as sources.
+ the files are using [Refdata](https://files.refdata.ch/simis-public-prod/Articles/1.0/Refdata.Articles.zip), [BAG-XML](http://bag.e-mediat.net/SL2007.Web.External/Default.aspx?webgrab=ignore) and [Swissmedic](http://www.swissmedic.ch/daten/00080/00251/index.html?lang=de) as sources.

  The following additional data is in the files:
 
@@ -274,16 +274,14 @@ We use the following files:
 
  * https://www.swissmedic.ch/arzneimittel/00156/00221/00222/00230/index.html?lang=de (Präparateliste und zugelassene Packungen)
  * https://raw.githubusercontent.com/zdavatz/oddb2xml_files/master/interactions_de_utf8.csv
- * http://refdatabase.refdata.ch/Service/Article.asmx
+ * https://files.refdata.ch/simis-public-prod/Articles/1.0/Refdata.Articles.zip
  * http://bag.e-mediat.net/SL2007.Web.External/File.axd?file=XMLPublications.zip
  * https://www.medregbm.admin.ch/Publikation/CreateExcelListBetriebs
  * https://www.medregbm.admin.ch/Publikation/CreateExcelListMedizinalPersons
  * http://zurrose.com/fileadmin/main/lib/download.php?file=/fileadmin/user_upload/downloads/ProduktUpdate/IGM11_mit_MwSt/Vollstamm/transfer.dat
- * https://index.ws.e-mediat.net/Swissindex/NonPharma/ws_NonPharma_V101.asmx
- * https://index.ws.e-mediat.net/Swissindex/NonPharma/ws_Pharma_V101.asmx
+ * https://raw.githubusercontent.com/zdavatz/oddb2xml_files/master/NON-Pharma.xls
  * http://download.swissmedicinfo.ch/ (AipsDownload)
  * https://raw.githubusercontent.com/zdavatz/oddb2xml_files/master/LPPV.txt
- * https://raw.githubusercontent.com/epha/robot/master/data/manual/swissmedic/atc.csv
  * https://raw.githubusercontent.com/zdavatz/cpp2sqlite/master/input/atc_codes_multi_lingual.txt

  ## Rules for matching GTIN (aka EAN13), product number and IKSNR
data/lib/oddb2xml/cli.rb CHANGED
@@ -285,10 +285,15 @@ module Oddb2xml
          # instead of Thread.new do

          downloader = FhirDownloader.new(@options)
-         fhir_file = downloader.download
-         Oddb2xml.log("FhirDownloader downloaded #{File.size(fhir_file)} bytes")
+         fhir_files = downloader.download
+         total_bytes = if fhir_files.is_a?(Hash)
+           fhir_files.values.sum { |f| File.size(f) }
+         else
+           File.size(fhir_files)
+         end
+         Oddb2xml.log("FhirDownloader downloaded #{total_bytes} bytes")
          @mutex.synchronize do
-           hsh = FhirExtractor.new(fhir_file).to_hash
+           hsh = FhirExtractor.new(fhir_files).to_hash
            @items = hsh
            Oddb2xml.log("FhirExtractor added #{@items.size} items from FHIR")
            @items
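The size computation in this hunk has to cope with both return shapes of `downloader.download`: a single path or a Hash of language to path. A minimal standalone sketch of that branch follows; the helper name `total_download_bytes` is hypothetical, and temporary files stand in for real NDJSON downloads.

```ruby
require "tempfile"

# Hypothetical helper (not part of oddb2xml): sums file sizes for either a
# single path String or a Hash of language => path.
def total_download_bytes(files)
  if files.is_a?(Hash)
    files.values.sum { |f| File.size(f) }
  else
    File.size(files)
  end
end

# Two small temp files standing in for the German and French exports.
de = Tempfile.new("de")
de.write("abc")
de.flush
fr = Tempfile.new("fr")
fr.write("defgh")
fr.flush

single = total_download_bytes(de.path)
combined = total_download_bytes({"de" => de.path, "fr" => fr.path})
```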
data/lib/oddb2xml/downloader.rb CHANGED
@@ -135,6 +135,7 @@ module Oddb2xml
            end
          end
        end
+       xml.force_encoding("UTF-8") if xml.encoding.name != "UTF-8"
        xml
      end
    end
@@ -248,13 +249,7 @@ module Oddb2xml
      end

      def init
-       config = {
-         log_level: :info,
-         log: false, # $stdout
-         raise_errors: true,
-         wsdl: @url
-       }
-       @client = Savon::Client.new(config)
+       # No SOAP client needed - we download a zip file directly
      end

      def download
data/lib/oddb2xml/extractor.rb CHANGED
@@ -202,7 +202,8 @@ module Oddb2xml
 
    class RefdataExtractor < Extractor
      def initialize(xml, type)
-       @type = (type == :pharma ? "PHARMA" : "NONPHARMA")
+       @type = (type.to_s.upcase == "PHARMA" ? "PHARMA" : "NONPHARMA")
+       xml = xml.dup.force_encoding("UTF-8") if xml.encoding.name != "UTF-8"
        super(xml)
      end
 
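The two changes in `RefdataExtractor#initialize` are independent normalisation steps: accept the type as Symbol or String in any case, and re-tag the XML bytes as UTF-8 without mutating the caller's string. The sketch below isolates both; the helper names `normalize_type` and `ensure_utf8` are hypothetical and exist only for this illustration.

```ruby
# Hypothetical standalone versions of the two normalisation steps.
def normalize_type(type)
  (type.to_s.upcase == "PHARMA") ? "PHARMA" : "NONPHARMA"
end

# Returns the string itself when it is already tagged UTF-8; otherwise
# returns a copy with the same bytes re-tagged as UTF-8.
def ensure_utf8(xml)
  return xml if xml.encoding.name == "UTF-8"
  xml.dup.force_encoding("UTF-8")
end

binary = "Hör".b # same bytes, but tagged as binary instead of UTF-8
fixed = ensure_utf8(binary)
types = [normalize_type(:pharma), normalize_type("Pharma"), normalize_type(:nonpharma)]
```

Note that `force_encoding` only changes the encoding label; it does not transcode the bytes, so it is appropriate only when the data really is UTF-8.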
@@ -242,6 +243,7 @@ module Oddb2xml
              item[:desc_it] = name.FullName
            end
          end
+         item[:desc_it] = item[:desc_de] if item[:desc_it].empty? # refdata has no italian name
          item[:atc_code] = article.MedicinalProduct.ProductClassification.Atc || ""
          item[:company_name] = article.PackagedProduct.Holder.Name || ""
          item[:company_ean] = article.PackagedProduct.Holder.Identifier || ""
@@ -15,62 +15,69 @@ module Oddb2xml
 
      BASE_URL = "https://epl.bag.admin.ch"
      STATIC_FHIR_PATH = "/static/fhir"
+     LANGUAGES = %w[de fr it].freeze

      def initialize(options = {})
        @options = options
-       @url = find_latest_fhir_url
-       super(options, @url)
+       super(options, BASE_URL)
      end

+     # Returns either a single file path String (when --fhir_url is used) or a
+     # Hash of { "de" => path, "fr" => path, "it" => path } for per-language
+     # NDJSON files.
      def download
-       filename = File.basename(@url)
+       if @options[:fhir_url]
+         @url = @options[:fhir_url]
+         download_one(@url)
+       else
+         files = {}
+         LANGUAGES.each do |lang|
+           url = "#{BASE_URL}#{STATIC_FHIR_PATH}/foph-sl-export-latest-#{lang}.ndjson"
+           path = download_one(url)
+           files[lang] = path if path
+         end
+         raise "FhirDownloader: no FHIR files downloaded successfully" if files.empty?
+         files
+       end
+     end
+
+     private
+
+     def download_one(url)
+       @url = url
+       filename = File.basename(url)
        file = File.join(WORK_DIR, filename)
        @file2save = File.join(DOWNLOADS, filename)

-       report_download(@url, @file2save)
+       report_download(url, @file2save)

-       # Check if we should skip download (file exists and is recent)
        if skip_download?
          Oddb2xml.log "FhirDownloader: Skip downloading #{@file2save} (#{format_size(File.size(@file2save))}, less than 24h old)"
          return File.expand_path(@file2save)
        end

        begin
-         # Download the file
          download_as(file, "w+")

-         # Validate NDJSON format
          if validate_ndjson(@file2save)
            line_count = count_ndjson_lines(@file2save)
-           Oddb2xml.log "FhirDownloader: NDJSON validation successful (#{line_count} bundles, #{format_size(File.size(@file2save))})"
+           Oddb2xml.log "FhirDownloader: NDJSON validation successful for #{filename} (#{line_count} bundles, #{format_size(File.size(@file2save))})"
          else
-           Oddb2xml.log "FhirDownloader: WARNING - NDJSON validation failed!"
+           Oddb2xml.log "FhirDownloader: WARNING - NDJSON validation failed for #{filename}!"
          end

          File.expand_path(@file2save)
        rescue Timeout::Error, Errno::ETIMEDOUT
          retrievable? ? retry : raise
        rescue => error
-         Oddb2xml.log "FhirDownloader: Error downloading FHIR file: #{error.message}"
-         raise
+         Oddb2xml.log "FhirDownloader: Error downloading #{filename}: #{error.message}"
+         nil
        ensure
          Oddb2xml.download_finished(@file2save, false)
          FileUtils.rm_f(file, verbose: true) if File.exist?(file) && file != @file2save
        end
      end

-     private
-
-     def find_latest_fhir_url
-       agent = Mechanize.new
-       response = agent.get "https://epl.bag.admin.ch/api/sl/public/resources/current"
-       resources = JSON.parse(response.body)
-       "https://epl.bag.admin.ch/static/" + resources["fhir"]["fileUrl"]
-     rescue => e
-       Oddb2xml.log "FhirDownloader: Error finding latest URL: #{e.message}"
-       nil
-     end
-
      def skip_download?
        @options[:skip_download] || (File.exist?(@file2save) && file_age_hours(@file2save) < 24)
      end
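The new `download` tolerates individual language failures (`download_one` now returns `nil` instead of re-raising) and only aborts when no file at all was fetched. The sketch below reproduces that control flow without any network access. The helper names `fhir_language_url` and `collect_language_files` are hypothetical, and the block passed in is a stand-in for `download_one`.

```ruby
# Hypothetical helper mirroring the URL pattern used in the hunk above.
def fhir_language_url(lang, base_url: "https://epl.bag.admin.ch", path: "/static/fhir")
  "#{base_url}#{path}/foph-sl-export-latest-#{lang}.ndjson"
end

# Collects one path per language, skipping languages whose fetch returned
# nil, and raises when nothing could be fetched.
def collect_language_files(languages)
  files = {}
  languages.each do |lang|
    path = yield(fhir_language_url(lang))
    files[lang] = path if path
  end
  raise "no FHIR files downloaded successfully" if files.empty?
  files
end

# Stand-in fetcher: pretend the Italian file failed to download.
files = collect_language_files(%w[de fr it]) do |url|
  url.end_with?("-it.ndjson") ? nil : File.basename(url)
end
```

A consequence of this design is that callers must handle a Hash with fewer than three keys, which is why the extractor falls back to the first available file when `"de"` is missing.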
@@ -552,7 +559,7 @@ module Oddb2xml
          "B"
        when "756005022005"
          "C"
-       when "756005022007"
+       when "756005022007", "756005022008"
          "D"
        when "756005022009"
          "E"
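Listing two values in one `when` clause maps both legal status codes to the same category. A reduced sketch of that mapping is shown below; the method name `swissmedic_category` is hypothetical, only the codes visible in this hunk are included, and returning `nil` for unknown codes is an assumption rather than the documented behaviour of the real method.

```ruby
# Hypothetical standalone mapping; only the codes visible in the hunk are
# listed, and unknown codes fall through to nil (an assumption).
def swissmedic_category(code)
  case code
  when "756005022005"
    "C"
  when "756005022007", "756005022008"
    "D"
  when "756005022009"
    "E"
  end
end

categories = %w[756005022005 756005022007 756005022008 756005022009].map do |code|
  swissmedic_category(code)
end
```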
@@ -565,8 +572,16 @@ module Oddb2xml
 
    # FHIR Extractor - Compatible with existing BagXmlExtractor
    class FhirExtractor < Extractor
-     def initialize(fhir_file)
-       @fhir_file = fhir_file
+     # Accepts either a single NDJSON file path (back-compat) or a Hash
+     # { "de" => path, "fr" => path, "it" => path } of per-language files.
+     def initialize(fhir_files)
+       if fhir_files.is_a?(Hash)
+         @fhir_files = fhir_files
+         @fhir_file = fhir_files["de"] || fhir_files.values.first
+       else
+         @fhir_files = {"de" => fhir_files}
+         @fhir_file = fhir_files
+       end
      end

      def to_hash
@@ -712,9 +727,54 @@ module Oddb2xml
          end
        end

+       # Merge names/descriptions from additional language files
+       @fhir_files.each do |lang, file|
+         next if file == @fhir_file
+         next unless file && File.exist?(file)
+         merge_language(data, file, lang)
+       end
+
        Oddb2xml.log "FhirExtractor: Extracted #{data.size} packages"
        data
      end
+
+     private
+
+     def merge_language(data, file, lang)
+       Oddb2xml.log "FhirExtractor: Merging #{lang} names/descriptions from #{file}"
+       result = FhirPreparationsEntry.parse(file)
+       name_accessor = "Name#{lang.capitalize}"
+       name_key = "name_#{lang}".to_sym
+       desc_key = "desc_#{lang}".to_sym
+
+       result.Preparations.Preparation.each do |seq|
+         next unless seq && seq.Packs && seq.Packs.Pack
+
+         translated_name = seq.respond_to?(name_accessor) ? seq.send(name_accessor) : nil
+
+         seq.Packs.Pack.each do |pac|
+           next unless pac.GTIN
+           ean13 = pac.GTIN.to_s
+           item = data[ean13]
+           next unless item
+
+           if translated_name && !translated_name.empty?
+             item[name_key] = translated_name
+             if item[:packages][ean13]
+               item[:packages][ean13][name_key] = translated_name
+             end
+           end
+
+           # The FHIR parser assigns pkg.description to all three
+           # Description* fields; in a language-specific file this is the
+           # description in that language.
+           desc = pac.DescriptionDe
+           if desc && !desc.empty? && item[:packages][ean13]
+             item[:packages][ean13][desc_key] = desc
+           end
+         end
+       end
+     end
    end
  end
 
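`merge_language` overlays translated names and descriptions onto records already built from the primary file, matching by GTIN and ignoring packages the primary file does not contain. The sketch below shows the same merge on plain hashes. The method name `merge_translations`, the flattened entry shape, and the GTINs are all made up for illustration; the real code walks parsed `FhirPreparationsEntry` objects instead.

```ruby
# Hypothetical, simplified data shapes: `data` is keyed by GTIN and each
# translation entry is a plain Hash instead of a parsed FHIR object.
def merge_translations(data, translations, lang)
  name_key = :"name_#{lang}"
  desc_key = :"desc_#{lang}"
  translations.each do |entry|
    gtin = entry[:gtin].to_s
    item = data[gtin]
    next unless item # GTINs missing from the primary file are ignored

    package = item[:packages][gtin]
    name = entry[:name]
    if name && !name.empty?
      item[name_key] = name
      package[name_key] = name if package
    end
    desc = entry[:description]
    package[desc_key] = desc if package && desc && !desc.empty?
  end
  data
end

# Made-up sample records; the GTINs are placeholders, not real products.
data = {
  "7680000000001" => {name_de: "Beispiel", packages: {"7680000000001" => {desc_de: "Tabletten"}}}
}
translations = [
  {gtin: "7680000000001", name: "Exemple", description: "comprimés"},
  {gtin: "7680000000002", name: "Inconnu", description: "ignoré"}
]
merged = merge_translations(data, translations, "fr")
```

Because existing keys such as `:name_de` are never touched, the merge can be run once per additional language in any order.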
@@ -22,8 +22,8 @@ module Oddb2xml
  opt :extended, "pharma, non-pharma plus prices and non-pharma from zurrose.
  Products without EAN-Code will also be listed.
  File oddb_calc.xml will also be generated"
- opt :fhir, "Use FHIR NDJSON format from FOPH/BAG instead of XML from Spezialitätenliste", default: false
- opt :fhir_url, "Specific FHIR NDJSON URL to download (implies --fhir)", type: :string, default: nil
+ opt :fhir, "Use FHIR NDJSON format from FOPH/BAG instead of XML from Spezialitätenliste", default: false, short: :none
+ opt :fhir_url, "Specific FHIR NDJSON URL to download (implies --fhir)", type: :string, default: nil, short: :none
  opt :format, "File format F, default is xml. {xml|dat}
  If F is given, -o option is ignored.", type: :string, default: "xml"
  opt :include, "Include target option for ean14 for 'dat' format.
data/lib/oddb2xml/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Oddb2xml
-   VERSION = "3.0.2"
+   VERSION = "3.0.3"
  end
data/spec/builder_spec.rb CHANGED
@@ -10,7 +10,7 @@ ARTICLE_ATTRIBUTE_TESTS = [
    ["ARTICLE", "PROD_DATE", Oddb2xml::DATE_REGEXP],
    ["ARTICLE", "VALID_DATE", Oddb2xml::DATE_REGEXP],
    ["ARTICLE/ART", "SHA256", /[a-f0-9]{32}/],
-   ["ARTICLE/ART", "DT", /\d{4}-\d{2}-\d{2}/]
+   ["ARTICLE/ART", "DT", //]
  ]

  ARTICLE_MISSING_ELEMENTS = [