traject 3.5.0 → 3.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 01ca968682bb3fc2a8313131ef6344bfc9e5418b767b2900c3d799caa356d016
4
- data.tar.gz: a3fd6c9a3bec88c6ba592500ea170357b059533f28c5ba3fb2fe72de39702a2a
3
+ metadata.gz: 2e47e6648ed9fc963d18e10c9be48a30273147c4920cb4b7e448d078fd2398ac
4
+ data.tar.gz: efa549ebcbd87e599b56b955b4bd26422dfe7de67697aed6b39cb421c3b80677
5
5
  SHA512:
6
- metadata.gz: 93547927e90b7947588c1983bd37de4651722bebaff7aaad2d3965ec46eed8c647971f1ce093beb86a5c47c2efa999516d00fb6956682c98913cd54ea5a1a2b8
7
- data.tar.gz: 128ea6e2517711f2324541215f814430f963b0042ae321ebc3f29646398990dc09d2a307acc32fb89a70142d74763fdfee6a61d220043622b5b8b11fbcd645d8
6
+ metadata.gz: 6acdd2b8cfc888b221a1f19cd5197127006be81d0525169d531fc9bf43fe02cc9ec87401e6b2442c57ff0cd483d9884504ac75be92e3718cbbc49208dc97024f
7
+ data.tar.gz: 30abefa7af9e1c170ae8570aa59b6c571a9acc1eb7b0abf6efd64d97550b678c21c72d76a8156ef3844ab01154111fd3747f96d6346ee8a8d76e747b2cf92e1f
@@ -12,14 +12,23 @@ jobs:
12
12
  strategy:
13
13
  fail-fast: false
14
14
  matrix:
15
- ruby: [ '2.4', '2.5', '2.6', '2.7', 'jruby-9.1', 'jruby-9.2' ]
15
+ ruby: [ '2.4', '2.5', '2.6', '2.7', '3.0', 'jruby-9.1', 'jruby-9.2' ]
16
16
  name: Ruby ${{ matrix.ruby }}
17
17
  steps:
18
18
  - uses: actions/checkout@v2
19
+
19
20
  - name: Set up Ruby
20
21
  uses: ruby/setup-ruby@v1
21
22
  with:
22
23
  ruby-version: ${{ matrix.ruby }}
24
+
25
+ - name: set JAVA_OPTS for jruby-9.1
26
+ run: echo 'JAVA_OPTS="--add-opens java.base/java.security.cert=ALL-UNNAMED --add-opens java.base/java.security=ALL-UNNAMED --add-opens java.base/java.util.zip=ALL-UNNAMED"' >> $GITHUB_ENV
27
+ if: ${{ matrix.ruby == 'jruby-9.1' }}
28
+ # https://github.com/jruby/jruby/issues/4834
29
+ # Still seems to be an issue in jruby-9.1, but not 9.2
30
+ # https://github.community/t/conditional-setting-of-env-variables-in-gh-actions/179650
31
+
23
32
  - name: Install dependencies
24
33
  run: bundle install --jobs 4 --retry 3
25
34
  - name: Run tests
data/CHANGES.md CHANGED
@@ -6,7 +6,11 @@
6
6
 
7
7
  *
8
8
 
9
- *
9
+ ## 3.6.0
10
+
11
+ * Tiny backward compat changes for ruby 3.0 compat. https://github.com/traject/traject/pull/263
12
+
13
+ * Allow gem `http` 5.x in gemspec. https://github.com/traject/traject/pull/269
10
14
 
11
15
  ## 3.5.0
12
16
 
data/README.md CHANGED
@@ -8,7 +8,7 @@ Traject can also be generalized to a set of tools for getting structured data fr
8
8
 
9
9
  **Traject is stable, mature software, that is already being used in production by its authors and several other institutions.**
10
10
 
11
- [![Gem Version](https://badge.fury.io/rb/traject.png)](http://badge.fury.io/rb/traject)
11
+ [![Gem Version](https://badge.fury.io/rb/traject.svg)](http://badge.fury.io/rb/traject)
12
12
  [![CI Status](https://github.com/traject/traject/workflows/CI/badge.svg?branch=master)](https://github.com/traject/traject/actions?query=workflow%3ACI+branch%3Amaster)
13
13
 
14
14
 
@@ -468,6 +468,22 @@ Also see `-I load_path` option and suggestions for Bundler use under Extending W
468
468
  See also [Hints for batch and cronjob use](./doc/batch_execution.md) of traject.
469
469
 
470
470
 
471
+ ## A small but complete example
472
+
473
+ To process a MARC XML file with the data shown in [./examples/marc/tiny.xml](./examples/marc/tiny.xml) you can use save the following configuration as `config.rb`:
474
+
475
+ ```
476
+ to_field 'title', extract_marc('245a', first: true)
477
+ ```
478
+
479
+ and run Traject as follows:
480
+
481
+ ```
482
+ traject -t xml -c config.rb -w Traject::DebugWriter tiny.xml
483
+ ```
484
+
485
+ `-t xml` indicates that the file is a MARC XML file. `-w Traject::DebugWriter` outputs the results to the console (e.g. without saving to Solr).
486
+
471
487
  ## Extending With Your Own Code
472
488
 
473
489
  Traject config files are full live ruby files, where you can do anything,
data/doc/xml.md CHANGED
@@ -4,6 +4,8 @@ The [NokogiriIndexer](../lib/traject/nokogiri_indexer.md) is a Traject::Indexer
4
4
 
5
5
  It by default uses the NokogiriReader to read XML and read Nokogiri::XML::Documents, and includes the NokogiriMacros mix-in, with some macros for operating on Nokogiri::XML::Documents.
6
6
 
7
+ Plese notice that the recommened mechanism to parse MARC XML files with Traject is via the `-t` parameter (or the via the `provide "marc_source.type", "xml"` setting). The documentation in this page is for those parsing other (non MARC) XML files.
8
+
7
9
  ## On the command-line
8
10
 
9
11
  You can tell the traject command-line to use the NokogiriIndexer with the `-i xml` flag:
@@ -0,0 +1,35 @@
1
+ <?xml version="1.0" encoding="UTF-8"?>
2
+ <collection xmlns="http://www.loc.gov/MARC21/slim" xmlns:marc="http://www.loc.gov/MARC21/slim">
3
+ <record>
4
+ <leader>01352cam a2200349 a 4500</leader>
5
+ <datafield tag="245" ind1="0" ind2="0">
6
+ <subfield code="6">880-01</subfield>
7
+ <subfield code="a">Kazoku kankei no shakai shinrigaku /</subfield>
8
+ <subfield code="c">Osada Masayoshi hen.</subfield>
9
+ </datafield>
10
+ </record>
11
+ <record>
12
+ <leader>01121ccm a2200289z 4500</leader>
13
+ <datafield tag="245" ind1="1" ind2="0">
14
+ <subfield code="a">Powhatan&#39;s daughter :</subfield>
15
+ <subfield code="b">march</subfield>
16
+ </datafield>
17
+ <datafield tag="100" ind1="1" ind2=" ">
18
+ <subfield code="a">Sousa, John Philip,</subfield>
19
+ <subfield code="d">1854-1932,</subfield>
20
+ <subfield code="e">composer.</subfield>
21
+ </datafield>
22
+ </record>
23
+ <record>
24
+ <leader>01137cam a2200301 a 4500</leader>
25
+ <datafield tag="245" ind1="1" ind2="0">
26
+ <subfield code="a">Two pieces /</subfield>
27
+ <subfield code="c">by Frank O&#39;Hara.</subfield>
28
+ </datafield>
29
+ <datafield tag="100" ind1="1" ind2=" ">
30
+ <subfield code="a">O&#39;Hara, Frank,</subfield>
31
+ <subfield code="d">1926-1966.</subfield>
32
+ <subfield code="0">http://id.loc.gov/authorities/names/n79042130</subfield>
33
+ </datafield>
34
+ </record>
35
+ </collection>
@@ -62,7 +62,7 @@ All records are assumed to have a unique id. You can set which field to look in
62
62
  def serialize(context)
63
63
  h = context.output_hash
64
64
  rec_key = record_number(context)
65
- lines = h.keys.sort.map { |k| @format % [rec_key, k, h[k].join(' | ')] }
65
+ lines = h.keys.sort.map { |k| @format % [rec_key, k, (h[k] || []).join(' | ')] }
66
66
  lines.push "\n"
67
67
  lines.join("\n")
68
68
  end
@@ -1,3 +1,5 @@
1
+ require 'nokogiri'
2
+
1
3
  module Traject
2
4
  # A Trajet reader which reads XML, and yields zero to many Nokogiri::XML::Document
3
5
  # objects as source records in the traject pipeline.
@@ -1,3 +1,3 @@
1
1
  module Traject
2
- VERSION = "3.5.0"
2
+ VERSION = "3.6.0"
3
3
  end
@@ -22,15 +22,16 @@ describe "Shell out to command line" do
22
22
 
23
23
  it "can display version" do
24
24
  out, err, result = execute_with_args("-v")
25
+
26
+ assert result.success?, "Expected subprocess exit code to be success.\nSTDERR:\n#{err}\n\nSTDOUT:\n#{out}"
25
27
  assert_equal err, "traject version #{Traject::VERSION}\n"
26
- assert result.success?
27
28
  end
28
29
 
29
30
  it "can display help text" do
30
31
  out, err, result = execute_with_args("-h")
31
32
 
33
+ assert result.success?, "Expected subprocess exit code to be success.\nSTDERR:\n#{err}\n\nSTDOUT:\n#{out}"
32
34
  assert err.start_with?("traject [options] -c configuration.rb [-c config2.rb] file.mrc")
33
- assert result.success?
34
35
  end
35
36
 
36
37
  it "handles bad argument" do
@@ -43,7 +44,7 @@ describe "Shell out to command line" do
43
44
  it "does basic dry run" do
44
45
  out, err, result = execute_with_args("--debug-mode -s one=two -s three=four -c test/test_support/demo_config.rb test/test_support/emptyish_record.marc")
45
46
 
46
- assert result.success?
47
+ assert result.success?, "Expected subprocess exit code to be success.\nSTDERR:\n#{err}\n\nSTDOUT:\n#{out}"
47
48
  assert_includes err, "executing with: `--debug-mode -s one=two -s three=four"
48
49
  assert_match /bib_1000165 +author_sort +Collection la/, out
49
50
  end
@@ -73,6 +73,19 @@ describe 'Simple output' do
73
73
 
74
74
  end
75
75
 
76
+ it "deals ok with nil values" do
77
+ record_with_nil_value = {"id"=>["2710183"], "title"=>["Manufacturing consent : the political economy of the mass media /"], "xyz"=>nil}
78
+ @writer.put Traject::Indexer::Context.new(:output_hash => record_with_nil_value)
79
+ expected = [
80
+ "#{@id} id #{@id}",
81
+ "#{@id} title #{@title}",
82
+ "#{@id} xyz",
83
+ "\n"
84
+ ]
85
+ assert_equal expected.join("\n").gsub(/\s/, ''), @io.string.gsub(/\s/, '')
86
+ @writer.close
87
+
88
+ end
76
89
  end
77
90
 
78
91
 
@@ -7,7 +7,8 @@ memory_writer_class = Class.new do
7
7
  # store them in a class variable so we can test em later
8
8
  # Supress the warning message
9
9
  original_verbose, $VERBOSE = $VERBOSE, nil
10
- @@last_writer_settings = @settings = settings
10
+ @settings = settings
11
+ self.class.store_last_writer_settings(@settings)
11
12
  # Activate warning messages again.
12
13
  $VERBOSE = original_verbose
13
14
  @settings["memory_writer.added"] = []
@@ -20,6 +21,16 @@ memory_writer_class = Class.new do
20
21
  def close
21
22
  @settings["memory_writer.closed"] = true
22
23
  end
24
+
25
+ private
26
+
27
+ def self.store_last_writer_settings(settings)
28
+ @last_writer_settings = settings
29
+ end
30
+
31
+ def self.last_writer_settings
32
+ @last_writer_settings
33
+ end
23
34
  end
24
35
 
25
36
  describe "Traject::Indexer#process" do
@@ -53,7 +64,7 @@ describe "Traject::Indexer#process" do
53
64
 
54
65
  # Grab the settings out of a class variable where we left em,
55
66
  # as a convenient place to store outcomes so we can test em.
56
- writer_settings = memory_writer_class.class_variable_get("@@last_writer_settings")
67
+ writer_settings = memory_writer_class.last_writer_settings
57
68
 
58
69
  assert writer_settings["memory_writer.added"]
59
70
  assert_equal 30, writer_settings["memory_writer.added"].length
@@ -146,7 +157,7 @@ describe "Traject::Indexer#process" do
146
157
  it "parses and loads" do
147
158
  @indexer.process([@file1, @file2])
148
159
  # kinda ridic, yeah.
149
- output_hashes = memory_writer_class.class_variable_get("@@last_writer_settings")["memory_writer.added"].collect(&:output_hash)
160
+ output_hashes = memory_writer_class.last_writer_settings["memory_writer.added"].collect(&:output_hash)
150
161
 
151
162
  assert_length 2, output_hashes
152
163
  assert output_hashes.all? { |hash| hash["title"].length > 0 }
data/traject.gemspec CHANGED
@@ -29,7 +29,7 @@ Gem::Specification.new do |spec|
29
29
  spec.add_dependency "yell" # logging
30
30
  spec.add_dependency "dot-properties", ">= 0.1.1" # reading java style .properties
31
31
  spec.add_dependency "httpclient", "~> 2.5"
32
- spec.add_dependency "http", ">= 3.0", "< 5" # used in oai_pmh_reader, may use more extensively in future instead of httpclient
32
+ spec.add_dependency "http", ">= 3.0", "< 6" # used in oai_pmh_reader, may use more extensively in future instead of httpclient
33
33
  spec.add_dependency 'marc-fastxmlwriter', '~>1.0' # fast marc->xml
34
34
  spec.add_dependency "nokogiri", "~> 1.9" # NokogiriIndexer
35
35
 
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: traject
3
3
  version: !ruby/object:Gem::Version
4
- version: 3.5.0
4
+ version: 3.6.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jonathan Rochkind
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2020-12-14 00:00:00.000000000 Z
12
+ date: 2021-06-21 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: concurrent-ruby
@@ -124,7 +124,7 @@ dependencies:
124
124
  version: '3.0'
125
125
  - - "<"
126
126
  - !ruby/object:Gem::Version
127
- version: '5'
127
+ version: '6'
128
128
  type: :runtime
129
129
  prerelease: false
130
130
  version_requirements: !ruby/object:Gem::Requirement
@@ -134,7 +134,7 @@ dependencies:
134
134
  version: '3.0'
135
135
  - - "<"
136
136
  - !ruby/object:Gem::Version
137
- version: '5'
137
+ version: '6'
138
138
  - !ruby/object:Gem::Dependency
139
139
  name: marc-fastxmlwriter
140
140
  requirement: !ruby/object:Gem::Requirement
@@ -257,6 +257,7 @@ files:
257
257
  - doc/programmatic_use.md
258
258
  - doc/settings.md
259
259
  - doc/xml.md
260
+ - examples/marc/tiny.xml
260
261
  - lib/tasks/load_maps.rake
261
262
  - lib/traject.rb
262
263
  - lib/traject/array_writer.rb