oddb2xml 3.0.2 → 3.0.3
- checksums.yaml +4 -4
- data/CLAUDE.md +1 -1
- data/Gemfile.lock +1 -1
- data/History.txt +8 -0
- data/README.md +3 -5
- data/lib/oddb2xml/cli.rb +8 -3
- data/lib/oddb2xml/downloader.rb +2 -7
- data/lib/oddb2xml/extractor.rb +3 -1
- data/lib/oddb2xml/fhir_support.rb +86 -26
- data/lib/oddb2xml/options.rb +2 -2
- data/lib/oddb2xml/version.rb +1 -1
- data/spec/builder_spec.rb +1 -1
- data/spec/data/Refdata.Articles.xml +853 -0
- data/spec/downloader_spec.rb +4 -4
- data/spec/extractor_spec.rb +3 -3
- data/spec/fixtures/vcr_cassettes/oddb2xml.json +2092 -1980
- data/spec/options_spec.rb +2 -0
- data/spec/spec_helper.rb +5 -3
- metadata +4 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 25c142b0dfeb3bb43d540e7d9837b5199d5e2eb7c5afbd07fe278f8e4b40516e
+data.tar.gz: d9dad26c7d08193bc3af27262ebfd32f0edce7d4c56e457d6d3cfabf845900c7
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: bd8480feba005844d3d2dd46f31a69bce0cdf076a75c041eeed53808a66733cb821cdc6633ae82cbba4a1347822ec123bf2ed594609145ceb8c3227ab06f7433
+data.tar.gz: ebee25f81ef3c9f56c748175fca7d8c12c1894a3ba5db1d94d4c494f0deb223192af0f727cf567921981ce00e99f412789ba92a1cd5ae11f8c80d745dba99cfa
data/CLAUDE.md
CHANGED
@@ -39,7 +39,7 @@ The system follows a **download → extract → build → compress** pipeline:
 
 2. **Downloaders** (`lib/oddb2xml/downloader.rb`) — 11 subclasses of `Downloader`, each fetching from a specific Swiss data source. Files cached in `./downloads/`.
 
-3. **Extractors** (`lib/oddb2xml/extractor.rb`) — Matching extractor classes that parse downloaded files into Ruby hashes. Formats include XML (nokogiri/sax-machine), XLSX (rubyXL),
+3. **Extractors** (`lib/oddb2xml/extractor.rb`) — Matching extractor classes that parse downloaded files into Ruby hashes. Formats include XML (nokogiri/sax-machine), XLSX (rubyXL), CSV, and fixed-width text. Refdata uses the new SwissReg XML format from a zip download (`files.refdata.ch`).
 
 4. **Builder** (`lib/oddb2xml/builder.rb`) — The largest file (~1900 lines). Merges extracted data and generates output XML/DAT files. Methods follow `prepare_*` (data assembly) and `build_*` (output generation) naming.
 
data/Gemfile.lock
CHANGED
data/History.txt
CHANGED
@@ -1,7 +1,15 @@
+=== 3.0.3 / 24.04.2026
+* FHIR: download per-language NDJSON files (foph-sl-export-latest-{de,fr,it}.ndjson) so French and Italian product names/descriptions are populated
+* FHIR: map legal status code 756005022008 to Swissmedic category D (in addition to 756005022007)
+
 === 3.0.2 / 09.03.2026
 * Use raw.githubusercontent.com URL for ATC CSV to avoid 429 Too Many Requests errors
 * Add retry logic with exponential backoff for HTTP 429 errors in uri_open
 * Remove obsolete Ruby version check in uri_open (Ruby >= 2.5 already required)
+* Migrate Refdata from SOAP API to new SwissReg zip download (files.refdata.ch)
+* Fix UTF-8 encoding for German umlauts in SORTD field (e.g. ö → Ö)
+* Restore Italian description fallback to German when not available from Refdata
+* Fix Optimist short option conflict for --fhir and --fhir_url options
 
 === 2.7.9 / 19.09.22
 * Remove newly generated DSCRI when not running --artikelstamm and
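The 3.0.2 entry above mentions retry logic with exponential backoff for HTTP 429 errors, but the diff does not show that code. The following is only a minimal sketch of the general technique, not the implementation in `uri_open`. The `TooManyRequests` class, the `with_backoff` helper, its parameters and its default values are all hypothetical names introduced for illustration.

```ruby
# Hypothetical error type standing in for an HTTP 429 response.
class TooManyRequests < StandardError; end

# Runs the block and retries it when TooManyRequests is raised.
# The wait doubles on each retry: base_delay, 2 * base_delay, 4 * base_delay, ...
# max_retries and base_delay are illustrative defaults, not oddb2xml's values.
# sleeper is injectable so the waits can be observed without really sleeping.
def with_backoff(max_retries: 3, base_delay: 1.0, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    yield
  rescue TooManyRequests
    raise if attempt >= max_retries # give up and re-raise the last error
    sleeper.call(base_delay * (2**attempt))
    attempt += 1
    retry
  end
end

# Usage: fail twice, then succeed; record the delays instead of sleeping.
delays = []
calls = 0
result = with_backoff(base_delay: 0.5, sleeper: ->(seconds) { delays << seconds }) do
  calls += 1
  raise TooManyRequests if calls < 3
  :ok
end
```

Doubling the delay gives a rate-limited server progressively more room to recover, while the retry cap keeps a persistent failure from looping forever.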
data/README.md
CHANGED
@@ -25,7 +25,7 @@ creates .dat files according to ([IGM-11](http://dev.ywesee.com/uploads/att/IGM.
 * oddb.dat
 * oddb_with_migel.dat
 
-the files are using [
+the files are using [Refdata](https://files.refdata.ch/simis-public-prod/Articles/1.0/Refdata.Articles.zip), [BAG-XML](http://bag.e-mediat.net/SL2007.Web.External/Default.aspx?webgrab=ignore) and [Swissmedic](http://www.swissmedic.ch/daten/00080/00251/index.html?lang=de) as sources.
 
 The following additional data is in the files:
 
@@ -274,16 +274,14 @@ We use the following files:
 
 * https://www.swissmedic.ch/arzneimittel/00156/00221/00222/00230/index.html?lang=de (Präparateliste und zugelassene Packungen)
 * https://raw.githubusercontent.com/zdavatz/oddb2xml_files/master/interactions_de_utf8.csv
-*
+* https://files.refdata.ch/simis-public-prod/Articles/1.0/Refdata.Articles.zip
 * http://bag.e-mediat.net/SL2007.Web.External/File.axd?file=XMLPublications.zip
 * https://www.medregbm.admin.ch/Publikation/CreateExcelListBetriebs
 * https://www.medregbm.admin.ch/Publikation/CreateExcelListMedizinalPersons
 * http://zurrose.com/fileadmin/main/lib/download.php?file=/fileadmin/user_upload/downloads/ProduktUpdate/IGM11_mit_MwSt/Vollstamm/transfer.dat
-* https://
-* https://index.ws.e-mediat.net/Swissindex/NonPharma/ws_Pharma_V101.asmx
+* https://raw.githubusercontent.com/zdavatz/oddb2xml_files/master/NON-Pharma.xls
 * http://download.swissmedicinfo.ch/ (AipsDownload)
 * https://raw.githubusercontent.com/zdavatz/oddb2xml_files/master/LPPV.txt
-* https://raw.githubusercontent.com/epha/robot/master/data/manual/swissmedic/atc.csv
 * https://raw.githubusercontent.com/zdavatz/cpp2sqlite/master/input/atc_codes_multi_lingual.txt
 
 ## Rules for matching GTIN (aka EAN13), product number and IKSNR
data/lib/oddb2xml/cli.rb
CHANGED
@@ -285,10 +285,15 @@ module Oddb2xml
 # instead of Thread.new do
 
 downloader = FhirDownloader.new(@options)
-
-
+fhir_files = downloader.download
+total_bytes = if fhir_files.is_a?(Hash)
+fhir_files.values.sum { |f| File.size(f) }
+else
+File.size(fhir_files)
+end
+Oddb2xml.log("FhirDownloader downloaded #{total_bytes} bytes")
 @mutex.synchronize do
-hsh = FhirExtractor.new(
+hsh = FhirExtractor.new(fhir_files).to_hash
 @items = hsh
 Oddb2xml.log("FhirExtractor added #{@items.size} items from FHIR")
 @items
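The hunk above has to cope with `downloader.download` returning either one path or a Hash of per-language paths. A minimal standalone sketch of that size calculation follows. The helper name `total_bytes` and the temporary files are hypothetical stand-ins, and the sketch wraps a single path in an Array rather than branching as the diff does.

```ruby
require "tempfile"

# Sums file sizes for either a single path (String) or a Hash of
# language code => path, the two shapes the cli.rb hunk distinguishes.
def total_bytes(fhir_files)
  paths = fhir_files.is_a?(Hash) ? fhir_files.values : [fhir_files]
  paths.sum { |path| File.size(path) }
end

# Throwaway files standing in for downloaded NDJSON exports.
de = Tempfile.new(["export-de", ".ndjson"])
fr = Tempfile.new(["export-fr", ".ndjson"])
de.write("12345")
de.flush
fr.write("123")
fr.flush

single_total = total_bytes(de.path)
multi_total = total_bytes({"de" => de.path, "fr" => fr.path})
```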
data/lib/oddb2xml/downloader.rb
CHANGED
@@ -135,6 +135,7 @@ module Oddb2xml
 end
 end
 end
+xml.force_encoding("UTF-8") if xml.encoding.name != "UTF-8"
 xml
 end
 end
@@ -248,13 +249,7 @@ module Oddb2xml
 end
 
 def init
-
-log_level: :info,
-log: false, # $stdout
-raise_errors: true,
-wsdl: @url
-}
-@client = Savon::Client.new(config)
+# No SOAP client needed - we download a zip file directly
 end
 
 def download
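The `force_encoding` guard added above re-tags the bytes of the string as UTF-8 without transcoding them. Here is a minimal sketch of why that matters for case conversion, assuming the XML content arrives tagged as binary. The sample string is made up, and the connection to the SORTD umlaut fix listed in History.txt is an inference, not something the diff states.

```ruby
# Bytes for "Söhne" as they might come out of a binary-mode read:
# valid UTF-8 content, but tagged as binary (ASCII-8BIT).
raw = "S\xC3\xB6hne".b

# On a binary string, upcase only changes ASCII letters,
# so the two bytes of "ö" are left untouched.
untagged = raw.upcase

# Same guard as in the hunk: re-tag the bytes, do not transcode them.
fixed = raw.dup
fixed.force_encoding("UTF-8") if fixed.encoding.name != "UTF-8"

# With the UTF-8 tag, upcase applies Unicode case mapping and "ö" becomes "Ö".
tagged = fixed.upcase
```

`force_encoding` is only safe when the bytes really are UTF-8. Bytes in another encoding would need `String#encode` instead.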
data/lib/oddb2xml/extractor.rb
CHANGED
@@ -202,7 +202,8 @@ module Oddb2xml
 
 class RefdataExtractor < Extractor
 def initialize(xml, type)
-@type = (type ==
+@type = (type.to_s.upcase == "PHARMA" ? "PHARMA" : "NONPHARMA")
+xml = xml.dup.force_encoding("UTF-8") if xml.encoding.name != "UTF-8"
 super(xml)
 end
 
@@ -242,6 +243,7 @@ module Oddb2xml
 item[:desc_it] = name.FullName
 end
 end
+item[:desc_it] = item[:desc_de] if item[:desc_it].empty? # refdata has no italian name
 item[:atc_code] = article.MedicinalProduct.ProductClassification.Atc || ""
 item[:company_name] = article.PackagedProduct.Holder.Name || ""
 item[:company_ean] = article.PackagedProduct.Holder.Identifier || ""
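The two extractor changes above are small enough to try in isolation. The sketch below pulls the type normalization and the Italian fallback into standalone helpers. The helper names and the sample item Hash are hypothetical. The real code does this inline on the extractor's own `item`.

```ruby
# Normalizes the type argument as the new initializer does: anything that
# upcases to "PHARMA" counts as pharma, everything else as non-pharma.
def normalize_type(type)
  type.to_s.upcase == "PHARMA" ? "PHARMA" : "NONPHARMA"
end

# Falls back to the German description when no Italian one was found.
# Assumes item[:desc_it] is always a String, as the diff's .empty? call does.
def apply_italian_fallback(item)
  item[:desc_it] = item[:desc_de] if item[:desc_it].empty?
  item
end

# Made-up sample item with a missing Italian description.
item = apply_italian_fallback({desc_de: "Tabletten 10 Stk", desc_it: ""})
```

Because of `to_s.upcase`, a Symbol such as `:pharma` and a String such as `"Pharma"` both normalize to `"PHARMA"`.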
data/lib/oddb2xml/fhir_support.rb
CHANGED
@@ -15,62 +15,69 @@ module Oddb2xml
 
 BASE_URL = "https://epl.bag.admin.ch"
 STATIC_FHIR_PATH = "/static/fhir"
+LANGUAGES = %w[de fr it].freeze
 
 def initialize(options = {})
 @options = options
-
-super(options, @url)
+super(options, BASE_URL)
 end
 
+# Returns either a single file path String (when --fhir_url is used) or a
+# Hash of { "de" => path, "fr" => path, "it" => path } for per-language
+# NDJSON files.
 def download
-
+if @options[:fhir_url]
+@url = @options[:fhir_url]
+download_one(@url)
+else
+files = {}
+LANGUAGES.each do |lang|
+url = "#{BASE_URL}#{STATIC_FHIR_PATH}/foph-sl-export-latest-#{lang}.ndjson"
+path = download_one(url)
+files[lang] = path if path
+end
+raise "FhirDownloader: no FHIR files downloaded successfully" if files.empty?
+files
+end
+end
+
+private
+
+def download_one(url)
+@url = url
+filename = File.basename(url)
 file = File.join(WORK_DIR, filename)
 @file2save = File.join(DOWNLOADS, filename)
 
-report_download(
+report_download(url, @file2save)
 
-# Check if we should skip download (file exists and is recent)
 if skip_download?
 Oddb2xml.log "FhirDownloader: Skip downloading #{@file2save} (#{format_size(File.size(@file2save))}, less than 24h old)"
 return File.expand_path(@file2save)
 end
 
 begin
-# Download the file
 download_as(file, "w+")
 
-# Validate NDJSON format
 if validate_ndjson(@file2save)
 line_count = count_ndjson_lines(@file2save)
-Oddb2xml.log "FhirDownloader: NDJSON validation successful (#{line_count} bundles, #{format_size(File.size(@file2save))})"
+Oddb2xml.log "FhirDownloader: NDJSON validation successful for #{filename} (#{line_count} bundles, #{format_size(File.size(@file2save))})"
 else
-Oddb2xml.log "FhirDownloader: WARNING - NDJSON validation failed!"
+Oddb2xml.log "FhirDownloader: WARNING - NDJSON validation failed for #{filename}!"
 end
 
 File.expand_path(@file2save)
 rescue Timeout::Error, Errno::ETIMEDOUT
 retrievable? ? retry : raise
 rescue => error
-Oddb2xml.log "FhirDownloader: Error downloading
-
+Oddb2xml.log "FhirDownloader: Error downloading #{filename}: #{error.message}"
+nil
 ensure
 Oddb2xml.download_finished(@file2save, false)
 FileUtils.rm_f(file, verbose: true) if File.exist?(file) && file != @file2save
 end
 end
 
-private
-
-def find_latest_fhir_url
-agent = Mechanize.new
-response = agent.get "https://epl.bag.admin.ch/api/sl/public/resources/current"
-resources = JSON.parse(response.body)
-"https://epl.bag.admin.ch/static/" + resources["fhir"]["fileUrl"]
-rescue => e
-Oddb2xml.log "FhirDownloader: Error finding latest URL: #{e.message}"
-nil
-end
-
 def skip_download?
 @options[:skip_download] || (File.exist?(@file2save) && file_age_hours(@file2save) < 24)
 end
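The new `download` method tolerates individual language failures: `download_one` returns `nil` on error, and only a total failure raises. A minimal sketch of that control flow without any network access follows. The constants are copied from the diff, while `fhir_url_for`, `collect_language_files` and the `fetcher` callable are hypothetical names standing in for the private `download_one` method.

```ruby
BASE_URL = "https://epl.bag.admin.ch"
STATIC_FHIR_PATH = "/static/fhir"
LANGUAGES = %w[de fr it].freeze

# Builds the per-language export URL the same way the diff does.
def fhir_url_for(lang)
  "#{BASE_URL}#{STATIC_FHIR_PATH}/foph-sl-export-latest-#{lang}.ndjson"
end

# Collects one local path per language. A nil from the fetcher (a failed
# download) is skipped; an error is raised only if every language failed.
def collect_language_files(fetcher)
  files = {}
  LANGUAGES.each do |lang|
    path = fetcher.call(fhir_url_for(lang))
    files[lang] = path if path
  end
  raise "no FHIR files downloaded successfully" if files.empty?
  files
end

# Usage with a fake fetcher that fails for Italian only.
fetcher = ->(url) { url.end_with?("-it.ndjson") ? nil : "/tmp/#{File.basename(url)}" }
files = collect_language_files(fetcher)
```

With this shape a missing French or Italian export degrades the output rather than aborting the run, which matches the extractor's later choice of `fhir_files["de"] || fhir_files.values.first` as the primary file.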
@@ -552,7 +559,7 @@ module Oddb2xml
 "B"
 when "756005022005"
 "C"
-when "756005022007"
+when "756005022007", "756005022008"
 "D"
 when "756005022009"
 "E"
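The change above relies on a Ruby `when` clause accepting several comma-separated values. A minimal sketch restricted to the codes visible in this hunk follows. The method name `swissmedic_category` is hypothetical, the real method handles further codes that are not shown here, and returning `nil` for an unknown code is an assumption of the sketch.

```ruby
# Maps a legal status code to a Swissmedic category letter.
# Only the codes visible in the hunk are covered; anything else yields nil
# (an assumption of this sketch, not confirmed by the diff).
def swissmedic_category(code)
  case code
  when "756005022005"
    "C"
  when "756005022007", "756005022008" # both codes map to D as of 3.0.3
    "D"
  when "756005022009"
    "E"
  end
end

category = swissmedic_category("756005022008")
```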
@@ -565,8 +572,16 @@ module Oddb2xml
 
 # FHIR Extractor - Compatible with existing BagXmlExtractor
 class FhirExtractor < Extractor
-
-
+# Accepts either a single NDJSON file path (back-compat) or a Hash
+# { "de" => path, "fr" => path, "it" => path } of per-language files.
+def initialize(fhir_files)
+if fhir_files.is_a?(Hash)
+@fhir_files = fhir_files
+@fhir_file = fhir_files["de"] || fhir_files.values.first
+else
+@fhir_files = {"de" => fhir_files}
+@fhir_file = fhir_files
+end
 end
 
 def to_hash
@@ -712,9 +727,54 @@ module Oddb2xml
 end
 end
 
+# Merge names/descriptions from additional language files
+@fhir_files.each do |lang, file|
+next if file == @fhir_file
+next unless file && File.exist?(file)
+merge_language(data, file, lang)
+end
+
 Oddb2xml.log "FhirExtractor: Extracted #{data.size} packages"
 data
 end
+
+private
+
+def merge_language(data, file, lang)
+Oddb2xml.log "FhirExtractor: Merging #{lang} names/descriptions from #{file}"
+result = FhirPreparationsEntry.parse(file)
+name_accessor = "Name#{lang.capitalize}"
+name_key = "name_#{lang}".to_sym
+desc_key = "desc_#{lang}".to_sym
+
+result.Preparations.Preparation.each do |seq|
+next unless seq && seq.Packs && seq.Packs.Pack
+
+translated_name = seq.respond_to?(name_accessor) ? seq.send(name_accessor) : nil
+
+seq.Packs.Pack.each do |pac|
+next unless pac.GTIN
+ean13 = pac.GTIN.to_s
+item = data[ean13]
+next unless item
+
+if translated_name && !translated_name.empty?
+item[name_key] = translated_name
+if item[:packages][ean13]
+item[:packages][ean13][name_key] = translated_name
+end
+end
+
+# The FHIR parser assigns pkg.description to all three
+# Description* fields; in a language-specific file this is the
+# description in that language.
+desc = pac.DescriptionDe
+if desc && !desc.empty? && item[:packages][ean13]
+item[:packages][ean13][desc_key] = desc
+end
+end
+end
+end
 end
 end
 
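The `merge_language` method above overlays translated names and descriptions onto items that were already extracted from the primary file, matching them by GTIN. A minimal sketch of that merge on plain Hashes follows. The `translations` structure (`{ gtin => { name:, desc: } }`) and the sample GTINs are hypothetical simplifications. The real code walks the parsed FHIR preparations and their packs instead.

```ruby
# Copies translated names/descriptions into already-extracted items.
# data:         { gtin => { ..., packages: { gtin => {...} } } }
# translations: { gtin => { name: String, desc: String } }  (hypothetical shape)
def merge_language(data, translations, lang)
  name_key = :"name_#{lang}"
  desc_key = :"desc_#{lang}"

  translations.each do |gtin, entry|
    item = data[gtin]
    next unless item # GTINs missing from the primary file are ignored

    package = item[:packages][gtin]
    name = entry[:name]
    if name && !name.empty?
      item[name_key] = name
      package[name_key] = name if package
    end

    desc = entry[:desc]
    package[desc_key] = desc if package && desc && !desc.empty?
  end
  data
end

# Made-up sample data: one known package, one GTIN only in the French file.
data = {
  "7680000000001" => {
    name_de: "Produkt",
    packages: {"7680000000001" => {desc_de: "10 Stk"}}
  }
}
french = {
  "7680000000001" => {name: "Produit", desc: "10 pce"},
  "7680999999999" => {name: "Autre", desc: "x"}
}
merge_language(data, french, "fr")
```

Since only existing keys are updated, the set of packages is still determined by the primary file alone. The additional language files can add text but never new items.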
data/lib/oddb2xml/options.rb
CHANGED
@@ -22,8 +22,8 @@ module Oddb2xml
 opt :extended, "pharma, non-pharma plus prices and non-pharma from zurrose.
 Products without EAN-Code will also be listed.
 File oddb_calc.xml will also be generated"
-opt :fhir, "Use FHIR NDJSON format from FOPH/BAG instead of XML from Spezialitätenliste", default: false
-opt :fhir_url, "Specific FHIR NDJSON URL to download (implies --fhir)", type: :string, default: nil
+opt :fhir, "Use FHIR NDJSON format from FOPH/BAG instead of XML from Spezialitätenliste", default: false, short: :none
+opt :fhir_url, "Specific FHIR NDJSON URL to download (implies --fhir)", type: :string, default: nil, short: :none
 opt :format, "File format F, default is xml. {xml|dat}
 If F is given, -o option is ignored.", type: :string, default: "xml"
 opt :include, "Include target option for ean14 for 'dat' format.
data/lib/oddb2xml/version.rb
CHANGED
data/spec/builder_spec.rb
CHANGED
@@ -10,7 +10,7 @@ ARTICLE_ATTRIBUTE_TESTS = [
 ["ARTICLE", "PROD_DATE", Oddb2xml::DATE_REGEXP],
 ["ARTICLE", "VALID_DATE", Oddb2xml::DATE_REGEXP],
 ["ARTICLE/ART", "SHA256", /[a-f0-9]{32}/],
-["ARTICLE/ART", "DT",
+["ARTICLE/ART", "DT", //]
 ]
 
 ARTICLE_MISSING_ELEMENTS = [