oddb2xml 3.0.0 → 3.0.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: ab4100aa6c522cd66ca2a69af3959c576e82cfdaeb41cffd399cb6d451ba7ff2
-   data.tar.gz: 495bcb8a97881ba89f5a1bb9fd8acb264a56cdd8f262e1b7669579bffc9c77be
+   metadata.gz: 55725f8e11fa91216b5f7189f9339a6bacf6fd69af78c633421abc8c573a2326
+   data.tar.gz: acd34c442ef756f43aeac9e2551df614fff4bad81894a87add521e4c7efab8cd
  SHA512:
-   metadata.gz: f9b839faf528860fdf2ab6e319b1db88812d1aa2c7e628df45ea6de1a932c78b51b93e84b6a30355ed8a40ca4f9782d73edce2ef3742934d8b1cff99c6d19ad8
-   data.tar.gz: a2878be3ff52ef0ee26d366f56bed4d1e443083730aa2253cbba6a1fecfcb40e01140910108649d7065dcbec3a2326208e2a8aca1918b60151df7aaea2983767
+   metadata.gz: 273f80bb691ae1d0e86db41d9215456c48822f4a764163c3675d87ca6f975f11cd8d2dce89f4bf71abd9b8e320b61babd45c73d0683faafbeb25adf56f2d12b6
+   data.tar.gz: c0014b66c15ebddcff17cf1435d24faab68ab856faa5d198208f94d9d6925540d4939ce4ff82f4f4d38c1fc319156dbaecf49415f56e8fce1d676d3e06b179fc
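The values in checksums.yaml are hex digests of the two archives packed inside the gem. A minimal sketch of how such a digest could be compared locally, assuming the archive has already been unpacked to disk; `sha256_matches?` is a hypothetical helper, not part of RubyGems or oddb2xml:

```ruby
require "digest"
require "tempfile"

# Hypothetical helper: true when the file's SHA256 hex digest equals the expected value.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# Demonstration on a throwaway file rather than a real data.tar.gz.
Tempfile.create("payload") do |file|
  file.write("example payload")
  file.flush
  expected = Digest::SHA256.hexdigest("example payload")
  puts sha256_matches?(file.path, expected)
end
```

The same comparison with `Digest::SHA512` would cover the second group of entries.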
data/CLAUDE.md ADDED
@@ -0,0 +1,69 @@
+ # CLAUDE.md
+
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+ ## Project Overview
+
+ oddb2xml is a Ruby gem that downloads Swiss pharmaceutical data from 10+ sources (Swissmedic, BAG, Refdata, ZurRose, EPha, etc.), parses multiple formats (XML, XLSX, CSV, SOAP, fixed-width DAT), merges/deduplicates them, and generates standardized XML/DAT output files for healthcare systems. It also supports the Elexis EHR Artikelstamm format.
+
+ ## Common Commands
+
+ ```bash
+ # Install dependencies
+ bundle install
+
+ # Run full test suite
+ bundle exec rake spec
+
+ # Run a single test file
+ bundle exec rspec spec/builder_spec.rb
+
+ # Run a single test by line number
+ bundle exec rspec spec/builder_spec.rb:42
+
+ # Lint with StandardRB
+ bundle exec standardrb
+
+ # Auto-fix lint issues
+ bundle exec standardrb --fix
+
+ # Build the gem
+ bundle exec rake build
+ ```
+
+ ## Architecture
+
+ The system follows a **download → extract → build → compress** pipeline:
+
+ 1. **CLI** (`lib/oddb2xml/cli.rb`) — Entry point. Parses options via Optimist (`options.rb`), orchestrates the pipeline, manages multi-threaded downloads.
+
+ 2. **Downloaders** (`lib/oddb2xml/downloader.rb`) — 11 subclasses of `Downloader`, each fetching from a specific Swiss data source. Files cached in `./downloads/`.
+
+ 3. **Extractors** (`lib/oddb2xml/extractor.rb`) — Matching extractor classes that parse downloaded files into Ruby hashes. Formats include XML (nokogiri/sax-machine), XLSX (rubyXL), SOAP (savon), CSV, and fixed-width text.
+
+ 4. **Builder** (`lib/oddb2xml/builder.rb`) — The largest file (~1900 lines). Merges extracted data and generates output XML/DAT files. Methods follow `prepare_*` (data assembly) and `build_*` (output generation) naming.
+
+ 5. **Calc** (`lib/oddb2xml/calc.rb`) — Composition calculation logic, works with `parslet_compositions.rb` and `compositions_syntax.rb` (Parslet-based PEG parser for drug composition strings).
+
+ 6. **Compressor** (`lib/oddb2xml/compressor.rb`) — Optional ZIP/TAR.GZ output compression.
+
+ ### Key data identifiers
+ - **GTIN/EAN13**: Primary article identifier (13-digit barcode)
+ - **Pharmacode**: Swiss pharmacy code
+ - **IKSNR**: Swissmedic registration number (5-digit)
+ - **Swissmedic sequence/pack numbers**: Combined with IKSNR to form full identifiers
+
+ ### Static data overrides
+ YAML files in `data/` provide manual overrides and mappings: `article_overrides.yaml`, `product_overrides.yaml`, `gtin2ignore.yaml`, `gal_forms.yaml`, `gal_groups.yaml`.
+
+ ## Testing
+
+ - Framework: RSpec with flexmock (mocking), webmock + VCR (HTTP recording/playback)
+ - Test fixtures: `spec/data/` (sample files), `spec/fixtures/vcr_cassettes/` (recorded HTTP responses)
+ - `spec/spec_helper.rb` defines test constants (GTINs) and configures VCR to avoid real HTTP calls during tests
+ - CI runs on Ruby 3.0, 3.1, 3.2
+
+ ## Ruby Version
+
+ - Minimum: Ruby >= 2.5.0 (gemspec)
+ - Current development: Ruby 3.2.0 (`.ruby-version`)
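The "Key data identifiers" list in the added CLAUDE.md names GTIN/EAN13 as the primary article identifier. As an illustration of what a 13-digit barcode implies, here is a minimal sketch of the standard EAN-13 check-digit rule (weights 1 and 3 alternating over the first twelve digits). The helper names are hypothetical and nothing here is taken from the oddb2xml source:

```ruby
# Hypothetical helpers illustrating the generic EAN-13 rule; not oddb2xml code.

# Computes the check digit for the first 12 digits of an EAN-13 code.
def ean13_check_digit(first12)
  raise ArgumentError, "expected 12 digits" unless first12.match?(/\A\d{12}\z/)

  # Positions are weighted 1, 3, 1, 3, ... from the left.
  sum = first12.each_char.with_index.sum do |char, index|
    char.to_i * (index.even? ? 1 : 3)
  end
  (10 - sum % 10) % 10
end

# True when the string is 13 digits and its last digit is the expected check digit.
def valid_ean13?(gtin)
  return false unless gtin.match?(/\A\d{13}\z/)

  ean13_check_digit(gtin[0, 12]) == gtin[12].to_i
end

puts valid_ean13?("4006381333931") # true
puts valid_ean13?("4006381333932") # false
```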
data/Gemfile CHANGED
@@ -7,5 +7,6 @@ group :debugger do
    gem "pry-doc"
  end

- gem "nokogiri", "1.13.9"
- gem "rack", "3.0.11"
+ gem "nokogiri", ">= 1.19.1"
+ gem "rack", ">= 3.1.20"
+ gem "mutex_m"
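The Gemfile change replaces exact pins with lower bounds. A small sketch using RubyGems' own `Gem::Requirement` and `Gem::Version` classes shows what that means for the versions that appear in this diff; the variable names are illustrative only:

```ruby
require "rubygems"

# A bare version string in a Gemfile is an exact pin, i.e. "= 1.13.9".
old_pin = Gem::Requirement.new("= 1.13.9")
new_floor = Gem::Requirement.new(">= 1.19.1")
locked = Gem::Version.new("1.19.1")

puts old_pin.satisfied_by?(locked)                        # false
puts new_floor.satisfied_by?(locked)                      # true
puts new_floor.satisfied_by?(Gem::Version.new("1.13.9"))  # false

# The same check for rack: 3.2.5 satisfies the new floor, 3.0.11 does not.
rack_floor = Gem::Requirement.new(">= 3.1.20")
puts rack_floor.satisfied_by?(Gem::Version.new("3.2.5"))  # true
puts rack_floor.satisfied_by?(Gem::Version.new("3.0.11")) # false
```

With a lower bound instead of a pin, the resolver is free to pick newer releases, which is consistent with the lockfile moving to nokogiri 1.19.1 and rack 3.2.5.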
data/Gemfile.lock CHANGED
@@ -1,53 +1,53 @@
  PATH
    remote: .
    specs:
-     oddb2xml (3.0.0)
+     oddb2xml (3.0.2)
        htmlentities
        httpi
-       mechanize
+       mechanize (>= 2.8.5)
        minitar
        multi_json
-       nokogiri (>= 1.8.2)
+       nokogiri (>= 1.19.1)
        optimist
        ox
        parslet
-       rack (= 3.0.11)
-       rexml
+       rack (>= 3.1.20)
+       rexml (>= 3.3.9)
        rubyXL (~> 3.4.0)
-       rubyntlm (= 0.5.1)
+       rubyntlm (>= 0.6.3)
        rubyzip (~> 3.0.1)
        savon (~> 2.12.0)
        sax-machine
        spreadsheet
        standardrb
-       webrick
+       webrick (>= 1.8.2)
        xml-simple

  GEM
    remote: https://rubygems.org/
    specs:
-     addressable (2.8.5)
-       public_suffix (>= 2.0.2, < 6.0)
+     addressable (2.8.8)
+       public_suffix (>= 2.0.2, < 8.0)
      akami (1.3.1)
        gyoku (>= 0.4.0)
        nokogiri
      ast (2.4.2)
+     base64 (0.3.0)
      builder (3.2.4)
      byebug (11.1.3)
      coderay (1.1.3)
-     connection_pool (2.4.1)
+     connection_pool (3.0.2)
      crack (0.4.5)
        rexml
      diff-lcs (1.5.0)
-     domain_name (0.5.20190701)
-       unf (>= 0.0.5, < 1.0.0)
+     domain_name (0.6.20240107)
      flexmock (2.3.8)
      gyoku (1.4.0)
        builder (>= 2.1.2)
        rexml (~> 3.0)
      hashdiff (1.0.1)
      htmlentities (4.3.4)
-     http-cookie (1.0.5)
+     http-cookie (1.1.0)
        domain_name (~> 0.5)
      httpi (2.5.0)
        rack
@@ -55,31 +55,37 @@ GEM
      json (2.6.3)
      language_server-protocol (3.17.0.3)
      lint_roller (1.1.0)
-     mechanize (2.7.7)
-       domain_name (~> 0.5, >= 0.5.1)
-       http-cookie (~> 1.0)
-       mime-types (>= 1.17.2)
-       net-http-digest_auth (~> 1.1, >= 1.1.1)
-       net-http-persistent (>= 2.5.2)
-       nokogiri (~> 1.6)
-       ntlm-http (~> 0.1, >= 0.1.1)
+     logger (1.7.0)
+     mechanize (2.14.0)
+       addressable (~> 2.8)
+       base64
+       domain_name (~> 0.5, >= 0.5.20190701)
+       http-cookie (~> 1.0, >= 1.0.3)
+       mime-types (~> 3.3)
+       net-http-digest_auth (~> 1.4, >= 1.4.1)
+       net-http-persistent (>= 2.5.2, < 5.0.dev)
+       nkf
+       nokogiri (~> 1.11, >= 1.11.2)
+       rubyntlm (~> 0.6, >= 0.6.3)
        webrick (~> 1.7)
-       webrobots (>= 0.0.9, < 0.2)
+       webrobots (~> 0.1.2)
      method_source (1.0.0)
-     mime-types (3.5.1)
-       mime-types-data (~> 3.2015)
-     mime-types-data (3.2023.0808)
-     mini_portile2 (2.8.4)
+     mime-types (3.7.0)
+       logger
+       mime-types-data (~> 3.2025, >= 3.2025.0507)
+     mime-types-data (3.2026.0203)
+     mini_portile2 (2.8.9)
      minitar (0.9)
      multi_json (1.15.0)
+     mutex_m (0.3.0)
      net-http-digest_auth (1.4.1)
-     net-http-persistent (4.0.2)
-       connection_pool (~> 2.2)
-     nokogiri (1.13.9)
-       mini_portile2 (~> 2.8.0)
+     net-http-persistent (4.0.8)
+       connection_pool (>= 2.2.4, < 4)
+     nkf (0.2.0)
+     nokogiri (1.19.1)
+       mini_portile2 (~> 2.8.2)
        racc (~> 1.4)
      nori (2.6.0)
-     ntlm-http (0.1.1)
      optimist (3.1.0)
      ox (2.14.14)
      parallel (1.23.0)
@@ -97,14 +103,14 @@ GEM
        pry (~> 0.11)
        yard (~> 0.9.11)
      psych (3.3.4)
-     public_suffix (5.0.3)
-     racc (1.7.1)
-     rack (3.0.11)
+     public_suffix (7.0.2)
+     racc (1.8.1)
+     rack (3.2.5)
      rainbow (3.1.1)
      rake (13.0.6)
-     rdoc (6.3.3)
+     rdoc (6.3.4.1)
      regexp_parser (2.8.1)
-     rexml (3.2.6)
+     rexml (3.4.4)
      rspec (3.12.0)
        rspec-core (~> 3.12.0)
        rspec-expectations (~> 3.12.0)
@@ -138,7 +144,8 @@ GEM
      rubyXL (3.4.25)
        nokogiri (>= 1.10.8)
        rubyzip (>= 1.3.0)
-     rubyntlm (0.5.1)
+     rubyntlm (0.6.5)
+       base64
      rubyzip (3.0.1)
      savon (2.12.1)
        akami (~> 1.2)
@@ -167,9 +174,6 @@ GEM
      standardrb (1.0.1)
        standard
      timecop (0.9.8)
-     unf (0.1.4)
-       unf_ext
-     unf_ext (0.0.8.2)
      unicode-display_width (2.5.0)
      vcr (6.1.0)
      wasabi (3.7.0)
@@ -180,7 +184,7 @@ GEM
        addressable (>= 2.8.0)
        crack (>= 0.3.2)
        hashdiff (>= 0.4.0, < 2.0.0)
-     webrick (1.8.1)
+     webrick (1.9.2)
      webrobots (0.1.2)
      xml-simple (1.1.9)
        rexml
@@ -192,14 +196,15 @@ PLATFORMS
  DEPENDENCIES
    bundler
    flexmock
-   nokogiri (= 1.13.9)
+   mutex_m
+   nokogiri (>= 1.19.1)
    oddb2xml!
    pry-byebug
    pry-doc
    psych (< 4.0.0)
-   rack (= 3.0.11)
+   rack (>= 3.1.20)
    rake
-   rdoc (~> 6.3.3)
+   rdoc (>= 6.3.4.1)
    rspec
    timecop
    vcr
data/History.txt CHANGED
@@ -1,3 +1,8 @@
+ === 3.0.2 / 09.03.2026
+ * Use raw.githubusercontent.com URL for ATC CSV to avoid 429 Too Many Requests errors
+ * Add retry logic with exponential backoff for HTTP 429 errors in uri_open
+ * Remove obsolete Ruby version check in uri_open (Ruby >= 2.5 already required)
+
  === 2.7.9 / 19.09.22
  * Remove newly generated DSCRI when not running --artikelstamm and
    generate always (as before 2.7.8) a DSCRF fiele
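The 3.0.2 entry mentions retry logic with exponential backoff for HTTP 429 responses in `uri_open`, but the implementation is not part of the hunks shown here. The following is only a generic sketch of that technique under stated assumptions: `TooManyRequests` is a hypothetical stand-in for whatever error the HTTP client raises on a 429, and `with_backoff` and its parameters are invented for illustration.

```ruby
# Hypothetical error class standing in for the client's 429 failure.
class TooManyRequests < StandardError; end

# Runs the block and retries it when TooManyRequests is raised,
# waiting base_delay, then 2x, then 4x ... between attempts.
# The sleeper argument is injectable so the wait can be observed in tests.
def with_backoff(max_retries: 3, base_delay: 1.0, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    yield
  rescue TooManyRequests
    raise if attempt >= max_retries # give up and re-raise the last error

    sleeper.call(base_delay * (2**attempt))
    attempt += 1
    retry
  end
end

# Usage: the block fails twice, then succeeds on the third call.
calls = 0
waits = []
result = with_backoff(sleeper: ->(seconds) { waits << seconds }) do
  calls += 1
  raise TooManyRequests if calls < 3
  "fetched"
end
puts result        # fetched
puts waits.inspect # [1.0, 2.0]
```

Doubling the delay keeps the first retry quick while backing off rapidly if the server keeps refusing requests.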
data/README.md CHANGED
@@ -284,6 +284,7 @@ We use the following files:
  * http://download.swissmedicinfo.ch/ (AipsDownload)
  * https://raw.githubusercontent.com/zdavatz/oddb2xml_files/master/LPPV.txt
  * https://raw.githubusercontent.com/epha/robot/master/data/manual/swissmedic/atc.csv
+ * https://raw.githubusercontent.com/zdavatz/cpp2sqlite/master/input/atc_codes_multi_lingual.txt

  ## Rules for matching GTIN (aka EAN13), product number and IKSNR

data/lib/oddb2xml/cli.rb CHANGED
@@ -60,14 +60,21 @@ module Oddb2xml
  threads << download(:zurrose)
  threads << download(:package) # swissmedic
  threads << download(:lppv) # oddb2xml_files
- threads << download(:bag) # bag.e-mediat
+
+ # Use FHIR or XML for BAG data
+ if @options[:fhir]
+   threads << download(:fhir) # FHIR from FOPH/BAG
+ else
+   threads << download(:bag) # XML from bag.e-mediat
+ end
+
  if @options[:firstbase]
    threads << download(:firstbase) # https://github.com/zdavatz/oddb2xml/issues/63
  end
  types.each do |type|
    begin
      threads << download(:refdata, type) # refdata
-   rescue error
+   rescue => error
      # Should continue even when error #102
      Oddb2xml.log("Error in downloading refdata #{error}")
    end
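The hunk above collects the results of `download(...)` in a `threads` array, and the later `when :fhir` branch stores its result inside `@mutex.synchronize`. As a generic illustration of that pattern (worker threads writing into shared state under a mutex), here is a self-contained sketch. It is an assumption about the general shape only: the source names, sizes, and the fake "download" are all hypothetical, and the in-code comment `# instead of Thread.new do` suggests the real branches may not spawn threads at all.

```ruby
# Shared state written by several workers, guarded by one mutex.
results = {}
mutex = Mutex.new

# Hypothetical sources mapped to a fake item count.
sources = {fhir: 3, lppv: 5}

threads = sources.map do |name, size|
  Thread.new do
    # Stand-in for the download and extract steps.
    items = Array.new(size) { |i| "#{name}-#{i}" }
    # Only the write to shared state happens inside the lock.
    mutex.synchronize { results[name] = items }
  end
end

# Wait for every worker before reading the results.
threads.each(&:join)
puts results.keys.sort.inspect
```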
@@ -274,6 +281,19 @@ module Oddb2xml
      @lppvs
    end

+ when :fhir
+   # instead of Thread.new do
+
+   downloader = FhirDownloader.new(@options)
+   fhir_file = downloader.download
+   Oddb2xml.log("FhirDownloader downloaded #{File.size(fhir_file)} bytes")
+   @mutex.synchronize do
+     hsh = FhirExtractor.new(fhir_file).to_hash
+     @items = hsh
+     Oddb2xml.log("FhirExtractor added #{@items.size} items from FHIR")
+     @items
+   end
+
  when :bag
    # instead of Thread.new do

@@ -321,7 +341,7 @@ module Oddb2xml
    @refdata_types[type] = hsh
    Oddb2xml.log("RefdataExtractor #{type} added #{hsh.size} keys now #{@refdata_types.keys} items from xml with #{xml.size} bytes")
    @refdata_types[type]
- rescue error
+ rescue => error
    # Should continue even when error https://github.com/zdavatz/oddb2xml/issues/102
    Oddb2xml.log("Error in RefdataExtractor #{error}")
  end
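Two hunks in cli.rb change `rescue error` to `rescue => error`. The difference matters: `rescue error` treats `error` as an expression naming the exception class to match, while `rescue => error` binds the raised exception to a local variable. A minimal sketch, assuming no method or variable called `error` is in scope (which is what the surrounding diff context suggests); the method names are hypothetical:

```ruby
# Old form: `error` is evaluated as an expression when an exception occurs.
# With nothing named `error` in scope, that evaluation raises NameError
# and the intended handler body never runs.
def old_style
  raise "download failed"
rescue error
  "logged"
end

# New form: rescues StandardError and binds it to the local `error`.
def new_style
  raise "download failed"
rescue => error
  "logged: #{error.message}"
end

puts new_style # logged: download failed

begin
  old_style
rescue NameError => name_error
  puts name_error.class # NameError
end
```

So the old code would have replaced the original download error with a `NameError` instead of logging it and continuing.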
data/lib/oddb2xml/compressor.rb CHANGED
@@ -1,10 +1,8 @@
  require "zlib"
  require "minitar"
  require "zip"
-
  module Oddb2xml
    class Compressor
-     include Archive::Tar
      attr_accessor :contents
      def initialize(prefix = "oddb", options = {})
        @options = options