oddb2xml 3.0.0 → 3.0.2
- checksums.yaml +4 -4
- data/CLAUDE.md +69 -0
- data/Gemfile +3 -2
- data/Gemfile.lock +49 -44
- data/History.txt +5 -0
- data/README.md +1 -0
- data/lib/oddb2xml/cli.rb +23 -3
- data/lib/oddb2xml/compressor.rb +0 -2
- data/lib/oddb2xml/fhir_support.rb +752 -0
- data/lib/oddb2xml/options.rb +6 -0
- data/lib/oddb2xml/util.rb +14 -6
- data/lib/oddb2xml/version.rb +1 -1
- data/lib/oddb2xml.rb +10 -0
- data/oddb2xml.gemspec +8 -8
- metadata +25 -23
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 55725f8e11fa91216b5f7189f9339a6bacf6fd69af78c633421abc8c573a2326
+  data.tar.gz: acd34c442ef756f43aeac9e2551df614fff4bad81894a87add521e4c7efab8cd
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 273f80bb691ae1d0e86db41d9215456c48822f4a764163c3675d87ca6f975f11cd8d2dce89f4bf71abd9b8e320b61babd45c73d0683faafbeb25adf56f2d12b6
+  data.tar.gz: c0014b66c15ebddcff17cf1435d24faab68ab856faa5d198208f94d9d6925540d4939ce4ff82f4f4d38c1fc319156dbaecf49415f56e8fce1d676d3e06b179fc
data/CLAUDE.md
ADDED
@@ -0,0 +1,69 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+oddb2xml is a Ruby gem that downloads Swiss pharmaceutical data from 10+ sources (Swissmedic, BAG, Refdata, ZurRose, EPha, etc.), parses multiple formats (XML, XLSX, CSV, SOAP, fixed-width DAT), merges/deduplicates them, and generates standardized XML/DAT output files for healthcare systems. It also supports the Elexis EHR Artikelstamm format.
+
+## Common Commands
+
+```bash
+# Install dependencies
+bundle install
+
+# Run full test suite
+bundle exec rake spec
+
+# Run a single test file
+bundle exec rspec spec/builder_spec.rb
+
+# Run a single test by line number
+bundle exec rspec spec/builder_spec.rb:42
+
+# Lint with StandardRB
+bundle exec standardrb
+
+# Auto-fix lint issues
+bundle exec standardrb --fix
+
+# Build the gem
+bundle exec rake build
+```
+
+## Architecture
+
+The system follows a **download → extract → build → compress** pipeline:
+
+1. **CLI** (`lib/oddb2xml/cli.rb`) — Entry point. Parses options via Optimist (`options.rb`), orchestrates the pipeline, manages multi-threaded downloads.
+
+2. **Downloaders** (`lib/oddb2xml/downloader.rb`) — 11 subclasses of `Downloader`, each fetching from a specific Swiss data source. Files cached in `./downloads/`.
+
+3. **Extractors** (`lib/oddb2xml/extractor.rb`) — Matching extractor classes that parse downloaded files into Ruby hashes. Formats include XML (nokogiri/sax-machine), XLSX (rubyXL), SOAP (savon), CSV, and fixed-width text.
+
+4. **Builder** (`lib/oddb2xml/builder.rb`) — The largest file (~1900 lines). Merges extracted data and generates output XML/DAT files. Methods follow `prepare_*` (data assembly) and `build_*` (output generation) naming.
+
+5. **Calc** (`lib/oddb2xml/calc.rb`) — Composition calculation logic, works with `parslet_compositions.rb` and `compositions_syntax.rb` (Parslet-based PEG parser for drug composition strings).
+
+6. **Compressor** (`lib/oddb2xml/compressor.rb`) — Optional ZIP/TAR.GZ output compression.
+
+### Key data identifiers
+- **GTIN/EAN13**: Primary article identifier (13-digit barcode)
+- **Pharmacode**: Swiss pharmacy code
+- **IKSNR**: Swissmedic registration number (5-digit)
+- **Swissmedic sequence/pack numbers**: Combined with IKSNR to form full identifiers
+
+### Static data overrides
+YAML files in `data/` provide manual overrides and mappings: `article_overrides.yaml`, `product_overrides.yaml`, `gtin2ignore.yaml`, `gal_forms.yaml`, `gal_groups.yaml`.
+
+## Testing
+
+- Framework: RSpec with flexmock (mocking), webmock + VCR (HTTP recording/playback)
+- Test fixtures: `spec/data/` (sample files), `spec/fixtures/vcr_cassettes/` (recorded HTTP responses)
+- `spec/spec_helper.rb` defines test constants (GTINs) and configures VCR to avoid real HTTP calls during tests
+- CI runs on Ruby 3.0, 3.1, 3.2
+
+## Ruby Version
+
+- Minimum: Ruby >= 2.5.0 (gemspec)
+- Current development: Ruby 3.2.0 (`.ruby-version`)
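Note on the identifiers listed in CLAUDE.md: a GTIN/EAN13 is a 13-digit code whose last digit is a check digit computed from the first twelve. The following is a minimal sketch of that standard GS1 check, not code taken from oddb2xml; the helper names `ean13_check_digit` and `valid_ean13?` are hypothetical and exist only for illustration.

```ruby
# Hypothetical helpers (not part of oddb2xml) illustrating the standard
# EAN-13 check digit: digits are weighted 1, 3, 1, 3, ... from the left.
def ean13_check_digit(first12)
  raise ArgumentError, "expected exactly 12 digits" unless first12.match?(/\A\d{12}\z/)

  sum = first12.each_char.with_index.sum do |char, index|
    char.to_i * (index.even? ? 1 : 3)
  end
  # Distance from the weighted sum to the next multiple of ten.
  (10 - sum % 10) % 10
end

# True when the string is 13 digits and its last digit matches the check digit.
def valid_ean13?(gtin)
  gtin.match?(/\A\d{13}\z/) && ean13_check_digit(gtin[0, 12]) == gtin[-1].to_i
end
```

A check like this could be used to reject mistyped GTINs before they are used as lookup keys; whether oddb2xml performs such a validation is not visible in this diff.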
data/Gemfile
CHANGED
data/Gemfile.lock
CHANGED
@@ -1,53 +1,53 @@
 PATH
   remote: .
   specs:
-    oddb2xml (3.0.
+    oddb2xml (3.0.2)
       htmlentities
       httpi
-      mechanize
+      mechanize (>= 2.8.5)
       minitar
       multi_json
-      nokogiri (>= 1.
+      nokogiri (>= 1.19.1)
       optimist
       ox
       parslet
-      rack (
-      rexml
+      rack (>= 3.1.20)
+      rexml (>= 3.3.9)
       rubyXL (~> 3.4.0)
-      rubyntlm (
+      rubyntlm (>= 0.6.3)
       rubyzip (~> 3.0.1)
       savon (~> 2.12.0)
       sax-machine
       spreadsheet
       standardrb
-      webrick
+      webrick (>= 1.8.2)
       xml-simple
 
 GEM
   remote: https://rubygems.org/
   specs:
-    addressable (2.8.
-      public_suffix (>= 2.0.2, <
+    addressable (2.8.8)
+      public_suffix (>= 2.0.2, < 8.0)
     akami (1.3.1)
       gyoku (>= 0.4.0)
       nokogiri
     ast (2.4.2)
+    base64 (0.3.0)
     builder (3.2.4)
     byebug (11.1.3)
     coderay (1.1.3)
-    connection_pool (
+    connection_pool (3.0.2)
     crack (0.4.5)
       rexml
     diff-lcs (1.5.0)
-    domain_name (0.
-      unf (>= 0.0.5, < 1.0.0)
+    domain_name (0.6.20240107)
     flexmock (2.3.8)
     gyoku (1.4.0)
       builder (>= 2.1.2)
       rexml (~> 3.0)
     hashdiff (1.0.1)
     htmlentities (4.3.4)
-    http-cookie (1.0
+    http-cookie (1.1.0)
       domain_name (~> 0.5)
     httpi (2.5.0)
       rack
@@ -55,31 +55,37 @@ GEM
     json (2.6.3)
     language_server-protocol (3.17.0.3)
     lint_roller (1.1.0)
-
-
-
-
-
-
-
-
+    logger (1.7.0)
+    mechanize (2.14.0)
+      addressable (~> 2.8)
+      base64
+      domain_name (~> 0.5, >= 0.5.20190701)
+      http-cookie (~> 1.0, >= 1.0.3)
+      mime-types (~> 3.3)
+      net-http-digest_auth (~> 1.4, >= 1.4.1)
+      net-http-persistent (>= 2.5.2, < 5.0.dev)
+      nkf
+      nokogiri (~> 1.11, >= 1.11.2)
+      rubyntlm (~> 0.6, >= 0.6.3)
       webrick (~> 1.7)
-      webrobots (
+      webrobots (~> 0.1.2)
     method_source (1.0.0)
-    mime-types (3.
-
-
-
+    mime-types (3.7.0)
+      logger
+      mime-types-data (~> 3.2025, >= 3.2025.0507)
+    mime-types-data (3.2026.0203)
+    mini_portile2 (2.8.9)
     minitar (0.9)
     multi_json (1.15.0)
+    mutex_m (0.3.0)
     net-http-digest_auth (1.4.1)
-    net-http-persistent (4.0.
-      connection_pool (
-
-
+    net-http-persistent (4.0.8)
+      connection_pool (>= 2.2.4, < 4)
+    nkf (0.2.0)
+    nokogiri (1.19.1)
+      mini_portile2 (~> 2.8.2)
       racc (~> 1.4)
     nori (2.6.0)
-    ntlm-http (0.1.1)
     optimist (3.1.0)
     ox (2.14.14)
     parallel (1.23.0)
@@ -97,14 +103,14 @@ GEM
       pry (~> 0.11)
       yard (~> 0.9.11)
     psych (3.3.4)
-    public_suffix (
-    racc (1.
-    rack (3.
+    public_suffix (7.0.2)
+    racc (1.8.1)
+    rack (3.2.5)
     rainbow (3.1.1)
     rake (13.0.6)
-    rdoc (6.3.
+    rdoc (6.3.4.1)
     regexp_parser (2.8.1)
-    rexml (3.
+    rexml (3.4.4)
     rspec (3.12.0)
       rspec-core (~> 3.12.0)
       rspec-expectations (~> 3.12.0)
@@ -138,7 +144,8 @@ GEM
     rubyXL (3.4.25)
       nokogiri (>= 1.10.8)
       rubyzip (>= 1.3.0)
-    rubyntlm (0.5
+    rubyntlm (0.6.5)
+      base64
     rubyzip (3.0.1)
     savon (2.12.1)
       akami (~> 1.2)
@@ -167,9 +174,6 @@ GEM
     standardrb (1.0.1)
       standard
     timecop (0.9.8)
-    unf (0.1.4)
-      unf_ext
-    unf_ext (0.0.8.2)
     unicode-display_width (2.5.0)
     vcr (6.1.0)
     wasabi (3.7.0)
@@ -180,7 +184,7 @@ GEM
       addressable (>= 2.8.0)
       crack (>= 0.3.2)
       hashdiff (>= 0.4.0, < 2.0.0)
-    webrick (1.
+    webrick (1.9.2)
     webrobots (0.1.2)
     xml-simple (1.1.9)
       rexml
@@ -192,14 +196,15 @@ PLATFORMS
 DEPENDENCIES
   bundler
   flexmock
-
+  mutex_m
+  nokogiri (>= 1.19.1)
   oddb2xml!
   pry-byebug
   pry-doc
   psych (< 4.0.0)
-  rack (
+  rack (>= 3.1.20)
   rake
-  rdoc (
+  rdoc (>= 6.3.4.1)
   rspec
   timecop
   vcr
data/History.txt
CHANGED
@@ -1,3 +1,8 @@
+=== 3.0.2 / 09.03.2026
+* Use raw.githubusercontent.com URL for ATC CSV to avoid 429 Too Many Requests errors
+* Add retry logic with exponential backoff for HTTP 429 errors in uri_open
+* Remove obsolete Ruby version check in uri_open (Ruby >= 2.5 already required)
+
 === 2.7.9 / 19.09.22
 * Remove newly generated DSCRI when not running --artikelstamm and
   generate always (as before 2.7.8) a DSCRF fiele
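Note on the 3.0.2 entry: it mentions retry logic with exponential backoff for HTTP 429 errors in `uri_open`, but the corresponding `util.rb` change is not shown in this part of the diff. The following is a minimal sketch of the general technique only. The names `with_backoff`, `TooManyRequests`, and the `sleeper` parameter are assumptions made for illustration and are not taken from oddb2xml.

```ruby
# Hypothetical sketch of retry with exponential backoff; not oddb2xml's code.
# TooManyRequests stands in for whatever error signals an HTTP 429 response.
class TooManyRequests < StandardError; end

# Runs the block, retrying on TooManyRequests. The delay doubles on each
# retry: base_delay, 2 * base_delay, 4 * base_delay, ...
# sleeper is injectable so the wait can be skipped or recorded in tests.
def with_backoff(max_retries: 3, base_delay: 1.0, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    yield
  rescue TooManyRequests
    attempt += 1
    raise if attempt > max_retries # give up and re-raise the last error
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end
```

With Ruby's open-uri, a 429 response typically surfaces as an `OpenURI::HTTPError`; a caller could inspect `error.io.status.first` and raise the retryable error only when it equals `"429"`. How `uri_open` actually detects the status in 3.0.2 is not visible here.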
data/README.md
CHANGED
@@ -284,6 +284,7 @@ We use the following files:
 * http://download.swissmedicinfo.ch/ (AipsDownload)
 * https://raw.githubusercontent.com/zdavatz/oddb2xml_files/master/LPPV.txt
 * https://raw.githubusercontent.com/epha/robot/master/data/manual/swissmedic/atc.csv
+* https://raw.githubusercontent.com/zdavatz/cpp2sqlite/master/input/atc_codes_multi_lingual.txt
 
 ## Rules for matching GTIN (aka EAN13), product number and IKSNR
 
data/lib/oddb2xml/cli.rb
CHANGED
@@ -60,14 +60,21 @@ module Oddb2xml
           threads << download(:zurrose)
           threads << download(:package) # swissmedic
           threads << download(:lppv) # oddb2xml_files
-
+
+          # Use FHIR or XML for BAG data
+          if @options[:fhir]
+            threads << download(:fhir) # FHIR from FOPH/BAG
+          else
+            threads << download(:bag) # XML from bag.e-mediat
+          end
+
           if @options[:firstbase]
             threads << download(:firstbase) # https://github.com/zdavatz/oddb2xml/issues/63
           end
           types.each do |type|
             begin
               threads << download(:refdata, type) # refdata
-            rescue error
+            rescue => error
               # Should continue even when error #102
               Oddb2xml.log("Error in downloading refdata #{error}")
             end
@@ -274,6 +281,19 @@ module Oddb2xml
           @lppvs
         end
 
+      when :fhir
+        # instead of Thread.new do
+
+        downloader = FhirDownloader.new(@options)
+        fhir_file = downloader.download
+        Oddb2xml.log("FhirDownloader downloaded #{File.size(fhir_file)} bytes")
+        @mutex.synchronize do
+          hsh = FhirExtractor.new(fhir_file).to_hash
+          @items = hsh
+          Oddb2xml.log("FhirExtractor added #{@items.size} items from FHIR")
+          @items
+        end
+
       when :bag
         # instead of Thread.new do
 
@@ -321,7 +341,7 @@ module Oddb2xml
           @refdata_types[type] = hsh
           Oddb2xml.log("RefdataExtractor #{type} added #{hsh.size} keys now #{@refdata_types.keys} items from xml with #{xml.size} bytes")
           @refdata_types[type]
-        rescue error
+        rescue => error
           # Should continue even when error https://github.com/zdavatz/oddb2xml/issues/102
           Oddb2xml.log("Error in RefdataExtractor #{error}")
         end
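Note on the `rescue => error` lines added in cli.rb: a bare `rescue => error` binds any `StandardError` (or subclass) to `error`, while exceptions outside that hierarchy, such as `ScriptError` subclasses or `Interrupt`, still propagate. The sketch below illustrates this scoping under assumed names; the `guarded` helper is hypothetical and only mirrors the log message format seen in the diff.

```ruby
# Hypothetical helper (not from oddb2xml) showing what a bare
# `rescue => error` does and does not catch.
def guarded
  yield
  :ok
rescue => error # same as `rescue StandardError => error`
  "Error in downloading refdata #{error}"
end

# IOError is a StandardError subclass, so it is rescued and reported.
handled = guarded { raise IOError, "connection reset" }

# NotImplementedError descends from ScriptError, not StandardError,
# so it passes through the bare rescue untouched.
propagated = begin
  guarded { raise NotImplementedError, "not supported" }
  false
rescue NotImplementedError
  true
end
```

This is why the "should continue even when error" comments hold for ordinary download and parse failures, while an interrupt from the user would still stop the run.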