oddb2xml 3.0.10 → 3.0.11

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 371c2ba25fb99bf0255bc0d336b0ae8a8a0cee64e9cd869e249e7fe7cd1d9da0
-   data.tar.gz: b85bd319a9179098af8c97bd6c1d8671a1e1cf656c0e83fdc7a0ede20ccf60fc
+   metadata.gz: 36ede269e855765b80eda67f9926c44dbc78c3eb4130b2e75ee780f635d0dbe3
+   data.tar.gz: 81ecf15485597e9a9eea503e63a7cc7aaf40dc3417f34ef2c3ca768092db9262
  SHA512:
-   metadata.gz: bab282fd20949a78607099dba3d64288cc63809b9e25f3e95015cd7f826a274a4010f9654117710976391561524c8f57b2c0727ed8b9ef7e8511f5fe9cd31c72
-   data.tar.gz: b4eef9e9aa441a567ac050c0ff27170ef8f9d64f8f9993b7c014ee9b7e79241ba6d5f917b956649cb194fcb59e8b06694db4d8e4500f113adb3f1212236d8430
+   metadata.gz: a305b713ab53201fa6c52d18db0446aa853df4f3986dbd53bcdb26cada8ad46770765800b7db5135bec1d0d8e9c58b4d793ca6f864dd203e5187912014a04103
+   data.tar.gz: 77abfc938a47bafc4579a429d4ff65dcace489566673754e4dedb74834fd0332f902c3aa80155b8a5aaf8140abc191fd6d4f7892092ab36ea541bf5ddff1bdcc
data/CLAUDE.md CHANGED
@@ -51,6 +51,8 @@ The system follows a **download → extract → build → compress** pipeline:
 
  8. **Refdata cleanup** (`lib/oddb2xml/refdata_cleanup.rb`) — Compensates for known data-quality issues in upstream Refdata.Articles.xml before they reach the output. Each fix is guarded by a Swissmedic-side heuristic (e.g. comma in `substance_swissmedic` to distinguish mono products from real combinations). Currently fixes the doubled-dose template bug (`X mg / X mg / Stk`). Called from `Builder#apply_refdata_description_cleanups!` at the start of `prepare_articles`. See GitHub issue #112 for the catalogue.
 
+ 9. **Chapter-70 hack** (`lib/oddb2xml/chapter_70_hack.rb`) — Legacy scraper for the SL "Komplementärarzneimittel" products (homeopathic/anthroposophic/phytotherapeutic), called only from `Builder#build_artikelstamm`. **Deprecated / non-FHIR only (3.0.11 onwards):** the source page `varia_De.htm` was rebuilt as a JavaScript SPA with no static data table, so the scraper now returns nothing there. These products + limitations now come through the FHIR feed (SL classification `20. KOMPLEMENTÄRARZNEIMITTEL`, 221 products on the live DE feed with real GTINs and limitation texts), so `build_artikelstamm` **skips the scraper entirely when `@options[:fhir]`** (the default for `--artikelstamm` since 3.0.9). In `--no-fhir` mode the scraper degrades gracefully (skips non-row/`<script>` nodes and empty tables, warns, returns `[]`) instead of raising `NoMethodError`. See GitHub issue #118.
+
  ### Key data identifiers
  - **GTIN/EAN13**: Primary article identifier (13-digit barcode)
  - **Pharmacode**: Swiss pharmacy code
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
    remote: .
    specs:
-     oddb2xml (3.0.10)
+     oddb2xml (3.0.11)
        csv
        htmlentities
        httpi
data/History.txt CHANGED
@@ -1,3 +1,6 @@
+ === 3.0.11 / 02.06.2026
+ * Artikelstamm: fix crash in the chapter-70 hack. The upstream source page http://www.spezialitaetenliste.ch/varia_De.htm has been rebuilt as a JavaScript single-page app and no longer ships the chapter-70 data table as static HTML, so Chapter70xtractor.parse raised NoMethodError. The chapter-70 (Komplementärarzneimittel) products and their limitations now arrive through the FHIR feed (SL classification "20. KOMPLEMENTÄRARZNEIMITTEL"), so the legacy scraper is now skipped entirely in FHIR mode (the default for --artikelstamm since 3.0.9). For --no-fhir runs the scraper degrades gracefully: it skips non-row nodes and an empty/missing table, logs a warning and returns no items instead of crashing (issue #118).
+
  === 3.0.10 / 01.06.2026
  * FHIR: read the BAG Indikationscode (XXXXX.NN) from the explicit `indicationCode` extension now carried on each limitation (BAG SL FHIR export >= v2.0.5) instead of reconstructing it from FOPHDossierNumber + the ClinicalUseDefinition id suffix. The changelog states the limitation code (CUD id) and the indication code are independent fields, so the old derivation was no longer guaranteed correct. Falls back to the previous derivation for older feeds that lack the extension. Output is identical on the current live feed. Other changelog items (limitation text moved to ClinicalUseDefinition, sanitized CUD ids, un-truncated `<` in texts, deduplicated/ordered Ingredients) were already handled or are transparent upstream fixes.
 
data/README.md CHANGED
@@ -51,7 +51,7 @@ HIN (http://hin.ch) creates daily the actual file. They can be downloaded from `
  see `--help`.
 
  ```
- /opt/src/oddb2xml/bin/oddb2xml version 3.0.10
+ /opt/src/oddb2xml/bin/oddb2xml version 3.0.11
  Usage:
  oddb2xml [option]
  produced files are found under data
@@ -1742,7 +1742,7 @@ module Oddb2xml
  # Set the pharmatype to 'Y' for outdated products, which are no longer found
  # in refdata/packungen
  chap70 = nil
- if @chapter70items.values.find { |x| x[:pharmacode]&.eql?(obj[:pharmacode]) }
+ if @chapter70items&.values&.find { |x| x[:pharmacode]&.eql?(obj[:pharmacode]) }
    Oddb2xml.log "found chapter #{obj[:pharmacode]}" if $VERBOSE
    chap70 = true
  end
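The `&.` change in the hunk above matters because `@chapter70items` is no longer assigned when the scraper is skipped. A minimal sketch of that behaviour, assuming a hypothetical helper `chapter70_match?` that stands in for the builder's inline condition (the helper name and sample data are illustrative, not part of oddb2xml):

```ruby
# Hypothetical stand-in for the builder's inline lookup; not part of oddb2xml.
def chapter70_match?(chapter70items, pharmacode)
  # `&.` short-circuits to nil when chapter70items is nil (scraper skipped),
  # so the whole expression is falsy instead of raising NoMethodError.
  !!chapter70items&.values&.find { |x| x[:pharmacode]&.eql?(pharmacode) }
end

items = {"2069562" => {pharmacode: "2069562"}}

puts chapter70_match?(items, "2069562") # true
puts chapter70_match?(items, "0000000") # false
puts chapter70_match?(nil, "2069562")   # false
```

Without the safe-navigation operator, the third call would raise `NoMethodError` on `nil.values`, which is the failure mode a skipped scraper would otherwise cause.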
@@ -1804,9 +1804,17 @@ module Oddb2xml
      end
    end
    unless @prepared
-     require "oddb2xml/chapter_70_hack"
-     Oddb2xml::Chapter70xtractor.parse
-     @chapter70items = Oddb2xml::Chapter70xtractor.items
+     # The chapter-70 ("Komplementärarzneimittel") products and their
+     # limitations now arrive through the FHIR feed (SL classification
+     # "20. KOMPLEMENTÄRARZNEIMITTEL"), so the legacy varia_De.htm scraper
+     # is redundant in FHIR mode. The varia page itself has been rebuilt as
+     # a JavaScript SPA that no longer ships the data table, so the scraper
+     # can only fail there now. See GitHub issue #118.
+     unless @options[:fhir]
+       require "oddb2xml/chapter_70_hack"
+       Oddb2xml::Chapter70xtractor.parse
+       @chapter70items = Oddb2xml::Chapter70xtractor.items
+     end
      prepare_limitations
      prepare_articles
      prepare_products
@@ -39,12 +39,31 @@ module Oddb2xml
    effort: :tolerant,
    smart: true
  }
- res = Ox.load(Oddb2xml.uri_open(html_file).read, mode: :hash_no_attrs).values.first["body"]
+ parsed = Ox.load(Oddb2xml.uri_open(html_file).read, mode: :hash_no_attrs)
+ res = parsed.values.first["body"] if parsed.respond_to?(:values) && parsed.values.first.is_a?(Hash)
  result = []
  idx = 0
  @@items = {}
- res.values.last.each do |item|
-   item.values.first.each do |sub_elem|
+ unless res.respond_to?(:values)
+   warn "Chapter70: varia page has no <body> to parse (got #{res.class}); skipping"
+   return []
+ end
+ # The varia page used to expose the chapter-70 table as static HTML. It
+ # is now a JavaScript single-page app whose <body> only contains an empty
+ # <sl-root> shell, so there is no data table to walk. Each entry yielded
+ # by iterating a Hash is a [tag, content] pair; a real row carries a Hash
+ # of cells, while stray nodes (e.g. <script>) carry an Array. Skip the
+ # latter so a redesigned/empty page degrades to "no items" instead of
+ # raising NoMethodError. See GitHub issue #118.
+ rows = res.values.last
+ unless rows.respond_to?(:each)
+   warn "Chapter70: varia page has no parseable rows (got #{rows.class}); skipping"
+   return []
+ end
+ rows.each do |item|
+   cells = item.is_a?(Hash) ? item.values.first : nil
+   next unless cells.respond_to?(:each)
+   cells.each do |sub_elem|
      what = Chapter70xtractor.parse_td(sub_elem)
      idx += 1
      puts "#{idx}: xx #{what}" if $VERBOSE
@@ -52,6 +71,7 @@ module Oddb2xml
    end
  end
  result2 = result.find_all { |x| (x.is_a?(Array) && x.first.is_a?(String)) && x.first.to_i > 100 }
+ warn "Chapter70: varia page yielded no chapter-70 products; skipping" if result2.empty?
  result2.each do |entry|
    data = {}
    pharma_code = entry.first
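The guarded walk introduced in the scraper hunks above can be illustrated in isolation. The following is a sketch only: `collect_cells` is a hypothetical helper, and the sample hashes merely approximate what a hash-style parse of the legacy page and of the SPA shell might look like; neither is taken from the real Ox output.

```ruby
# Hypothetical helper mirroring the defensive checks in the diff; the real
# code lives in Chapter70xtractor.parse and additionally calls parse_td.
def collect_cells(body)
  unless body.respond_to?(:values)
    warn "no <body> to parse (got #{body.class}); skipping"
    return []
  end
  rows = body.values.last
  unless rows.respond_to?(:each)
    warn "no parseable rows (got #{rows.class}); skipping"
    return []
  end
  collected = []
  rows.each do |item|
    # Only Hash items are treated as table rows; anything else is skipped.
    cells = item.is_a?(Hash) ? item.values.first : nil
    next unless cells.respond_to?(:each)
    cells.each { |cell| collected << cell }
  end
  collected
end

# Assumed shapes, for illustration only.
legacy_body = {"table" => [{"tr" => ["2069562", "70.01.10"]}, {"tr" => ["6516727"]}]}
spa_body = {"noscript" => {"div" => {}}, "sl-root" => nil, "script" => [nil, nil, nil]}

p collect_cells(legacy_body) # ["2069562", "70.01.10", "6516727"]
p collect_cells(spa_body)    # []
p collect_cells(nil)         # []
```

Each guard corresponds to one way the redesigned page can differ from the old table layout: no body at all, a body whose last child is not iterable, or children that are not row hashes.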
@@ -1,3 +1,3 @@
  module Oddb2xml
-   VERSION = "3.0.10"
+   VERSION = "3.0.11"
  end
@@ -38,7 +38,11 @@ describe Oddb2xml::Builder do
  context "when artikelstamm option is given" do
    before(:all) do
      common_run_init
-     options = Oddb2xml::Options.parse(["--artikelstamm"]) # , '--log'])
+     # --no-fhir: these fixtures (chapter-70 fake GTINs, BAG-XML preparations)
+     # exercise the legacy pipeline. Since 3.0.9 --artikelstamm defaults to the
+     # FHIR feed, which has no VCR cassette here and skips the chapter-70 hack
+     # (issue #118), so the suite must opt out to keep testing legacy output.
+     options = Oddb2xml::Options.parse(["--artikelstamm", "--no-fhir"]) # , '--log'])
      # @res = buildr_capture(:stdout){ Oddb2xml::Cli.new(options).run }
      Oddb2xml::Cli.new(options).run # to debug
      @artikelstamm_name = File.join(Oddb2xml::WORK_DIR, "artikelstamm_#{Date.today.strftime("%d%m%Y")}_v5.xml")
@@ -503,10 +507,24 @@ Der behandelnde Arzt ist verpflichtet, die erforderlichen Daten laufend im vorge
      end
      it "parsing" do
        require "oddb2xml/chapter_70_hack"
+       # Stub explicitly so this test is independent of example ordering
+       # (the SPA test below re-stubs the same URL).
+       url = "http://www.spezialitaetenliste.ch/varia_De.htm"
+       stub_request(:any, url).to_return(body: File.read(File.join(Oddb2xml::SpecData, "varia_De.htm")))
        result = Oddb2xml::Chapter70xtractor.parse
        expect(result.class).to eq Array
        expect(result.first).to eq ["2069562", "70.01.10", "Urtinktur", "1--10 g/ml", "13.40", ""]
        expect(result.last).to eq ["6516727", "70.02", "Allergenorum extractum varium / Inj. Susp. \n\tFortsetzungsbehandlung", "1 Durchstfl 1.5 ml", "311.85", "L"]
      end
+     it "degrades gracefully when varia page is a JavaScript SPA (issue #118)" do
+       require "oddb2xml/chapter_70_hack"
+       spa = File.read(File.join(Oddb2xml::SpecData, "varia_De_spa.htm"))
+       url = "http://www.spezialitaetenliste.ch/varia_De.htm"
+       stub_request(:any, url).to_return(body: spa)
+       result = nil
+       expect { result = Oddb2xml::Chapter70xtractor.parse }.not_to raise_error
+       expect(result).to eq []
+       expect(Oddb2xml::Chapter70xtractor.items).to eq({})
+     end
    end
  end
@@ -0,0 +1,16 @@
+ <!doctype html>
+ <html lang="en">
+ <head>
+ <meta charset="utf-8">
+ <title>BAG SL</title>
+ <base href="/">
+ </head>
+ <body>
+ <noscript><div><section><strong>JavaScript ist erforderlich!</strong></section></div></noscript>
+ <div><div><section><strong>Nicht unterstützter Browser!</strong></section></div></div>
+ <sl-root></sl-root>
+ <script src="polyfills-HE33VVSG.js" type="module"></script>
+ <script src="scripts-ALGZMJC4.js" defer></script>
+ <script src="main-DRAKRIKO.js" type="module"></script>
+ </body>
+ </html>
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: oddb2xml
  version: !ruby/object:Gem::Version
-   version: 3.0.10
+   version: 3.0.11
  platform: ruby
  authors:
  - Yasuhiro Asaka, Zeno R.R. Davatz, Niklaus Giger
@@ -547,6 +547,7 @@ files:
  - spec/data/v5_first.xml
  - spec/data/v5_second.xml
  - spec/data/varia_De.htm
+ - spec/data/varia_De_spa.htm
  - spec/data/vcr/transfer.dat
  - spec/data/vcr/transfer.zip
  - spec/data/wsdl_nonpharma.xml
@@ -644,6 +645,7 @@ test_files:
  - spec/data/v5_first.xml
  - spec/data/v5_second.xml
  - spec/data/varia_De.htm
+ - spec/data/varia_De_spa.htm
  - spec/data/vcr/transfer.dat
  - spec/data/vcr/transfer.zip
  - spec/data/wsdl_nonpharma.xml