traject-marc4j_reader 1.0.2-java → 1.1.0-java

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 209b0ffa0955679de3c33a051bc872d60adb1981
4
- data.tar.gz: 3ef2f390a73a6714c84286af700e14f4bfb0ba61
3
+ metadata.gz: 731f7a7338beafc60639283c73a4ac0f0ced73cc
4
+ data.tar.gz: b9c6733b4a14aead04cbbba9221dc7edd56ac294
5
5
  SHA512:
6
- metadata.gz: ca446d5ab5180a42b9d553d2e2e4ef9d707b09b6422cb9e1ec7303b84dd5f865b42f4e23436c59e04e36a7bc4bc1fa9eea04dc2d14e98cbd00f035f222d2b5b9
7
- data.tar.gz: d1bd3535b23938038a24a79a52292c36457054bbdd4fcc5d025b7f9c78d96b8c60926d212038be4fcd119714245ba703d2f018b84a760f0034d0d0810016eaf2
6
+ metadata.gz: f4fe9c0eb1f72492777b96778259a1a74e58a0e2633d36b8be0527da51c723209377a0542259629628081ad4695d8280d1e747b2eb9a151fa41a2c4932fac14a
7
+ data.tar.gz: e1ad15787f3e51ac71e58c83ea074b904979f35f3337989e643344361c14b318539f78923cf29b6b2cba0fa33153df6f23a519532f949739e75e322e9c3c52f7
@@ -0,0 +1,27 @@
1
+ Adds a new setting to allow finer-grained control over which `marc4j` reader
2
+ is used to process binary MARC.
3
+
4
+ `traject` uses a broadly permissive set of defaults to read binary MARC
5
+ records, which may not always be what you want. The `permissive` setting sets
6
+ the flag of the same name on the
7
+ ['permissive'
8
+ reader class](https://github.com/marc4j/marc4j/blob/master/src/org/marc4j/MarcPermissiveStreamReader.java#L164) provided by marc4j (which, at the time of writing, controls how
9
+ that reader guesses the encoding of input records).
10
+
11
+ ## Use the "strict" `org.marc4j.MarcStreamReader` class to read MARC21
12
+
13
+ In situations where you want stricter record processing -- or in case your records can't be processed by the permissive stream reader (paradoxically, `MarcStreamReader` is more forgiving of certain non-standard MARC, e.g. uppercase subfields), you can specify that `traject` should use the `org.marc4j.MarcStreamReader` class:
14
+
15
+ ```ruby
16
+ settings do
17
+ provide 'marc4j_reader.class', 'MarcStreamReader'
18
+ end
19
+ ```
20
+
21
+ ## A note about `permissive`
22
+
23
+ The `marc4j_reader.permissive` setting, which previously existed, is passed
24
+ through to the constructor of the `MarcPermissiveStreamReader` class, and does
25
+ not affect which class is used to read MARC21 input. If you set both this parameter and the `marc4j_reader.class` parameter, the `permissive` setting will be ignored.
26
+
27
+
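The interaction between `marc4j_reader.class` and `marc4j_reader.permissive` described in the note above can be sketched in plain Ruby. This is an illustration only: `choose_binary_reader` is a hypothetical helper, not part of the gem, and it returns a description of the choice instead of constructing Java reader objects, so it runs without JRuby or marc4j.

```ruby
# Hypothetical helper (not part of traject-marc4j_reader) that mirrors the
# selection rule for binary MARC input and reports which reader would be used.
def choose_binary_reader(settings)
  if settings['marc4j_reader.class'] == 'MarcStreamReader'
    # The strict reader is chosen; the permissive setting is never consulted.
    { reader: :MarcStreamReader }
  else
    # Both true and the string "true" enable the flag; anything else,
    # including an unset value, leaves it off.
    permissive = settings['marc4j_reader.permissive'].to_s == 'true'
    { reader: :MarcPermissiveStreamReader, permissive: permissive }
  end
end

both_set = choose_binary_reader({
  'marc4j_reader.class'      => 'MarcStreamReader',
  'marc4j_reader.permissive' => true
})
default_choice = choose_binary_reader({})
```

With both settings supplied, `both_set` names only `MarcStreamReader`, which matches the statement that `permissive` is ignored in that case. With no settings at all, the sketch falls back to the permissive reader class with the flag off.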
data/README.md CHANGED
@@ -26,17 +26,28 @@ of the workload in a traject run, you'll almost certainly see performance gains.
26
26
 
27
27
  ## Installation
28
28
 
29
- Add this line to your application's Gemfile:
29
+ Traject prior to 3.0 included this as a dependency on JRuby, and defaulted to using it.
30
30
 
31
- gem 'traject-marc4j_reader'
31
+ In Traject 3.0+, you need to manually add this gem and configure to use it.
32
32
 
33
- And then execute:
33
+ If you are using bundler and a `Gemfile`, add `gem "traject-marc4j_reader", "~> 1.0"` to your `Gemfile`. Otherwise, just `gem install traject-marc4j_reader`.
34
34
 
35
- $ bundle
35
+ Then, in your traject config file:
36
36
 
37
- Or install it yourself as:
37
+ # Instead of require in config file, you could use the `-r` traject
38
+ # command-line option.
39
+ require 'traject/marc4j_reader'
38
40
 
39
- $ gem install traject-marc4j_reader
41
+ settings do
42
+ provide "reader_class_name", "Traject::Marc4JReader"
43
+
44
+ # Recommend marc4j_reader.permissive true unless you have reason not to.
45
+ # true was default provided by core traject gem in Traject pre-3.0, but isn't
46
+ # anymore in traject 3.0 -- so set to true explicitly to maintain behavior
47
+ #
48
+ # Only relevant for binary MARC source data.
49
+ provide "marc4j_reader.permissive", true
50
+ end
40
51
 
41
52
  ## Traject::Marc4jReader settings
42
53
 
@@ -49,7 +60,7 @@ so output will always reflect that conversion.
49
60
  * `marc4j.jar_dir`: Path to a directory containing Marc4J jar file to use. All .jar's in dir will
50
61
  be loaded. If unset, uses marc4j.jar bundled with traject.
51
62
 
52
- * `marc4j_reader.permissive`: Used by Marc4JReader only when marc.source_type is 'binary', boolean, argument to the underlying MarcPermissiveStreamReader. Default true.
63
+ * `marc4j_reader.permissive`: Used by Marc4JReader only when marc.source_type is 'binary', boolean, argument to the underlying MarcPermissiveStreamReader. Default false, but recommend true for most uses.
53
64
 
54
65
  * `marc4j_reader.source_encoding`: Used by Marc4JReader only when marc.source_type is 'binary', encoding strings accepted
55
66
  by marc4j MarcPermissiveStreamReader. Default "BESTGUESS", also "UTF-8", "MARC"
@@ -57,6 +68,8 @@ so output will always reflect that conversion.
57
68
  * `marc4j_reader.keep_marc4j`: After translating the marc4j record into a normal ruby-marc object,
58
69
  provides access to the former via `record#original_marc4j`.
59
70
 
71
+ * 'marc4j_reader.class': Set to eg 'MarcStreamReader' to use that more strict Marc4J reader class, instead of the default Marc4J `MarcPermissiveStreamReader`.
72
+
60
73
 
61
74
  ## Sample use
62
75
 
@@ -19,9 +19,9 @@ require 'marc/marc4j'
19
19
  #
20
20
  # * marc_source.type: serialization type. default 'binary', also 'xml' (TODO: json/marc-in-json)
21
21
  #
22
- # * marc4j_reader.permissive: default true, false to turn off permissive reading. Used as
22
+ # * marc4j_reader.permissive: Used as
23
23
  # value to 'permissive' arg of MarcPermissiveStreamReader constructor.
24
- # Only used for 'binary'
24
+ # Only used for 'binary'. Default false, but recommend true for most uses.
25
25
  #
26
26
  # * marc_source.encoding: Only used for 'binary', otherwise always UTF-8.
27
27
  # String of the values MarcPermissiveStreamReader accepts:
@@ -81,6 +81,7 @@ class Traject::Marc4JReader
81
81
 
82
82
  # Convenience
83
83
  java_import org.marc4j.MarcPermissiveStreamReader
84
+ java_import org.marc4j.MarcStreamReader
84
85
  java_import org.marc4j.MarcXmlReader
85
86
 
86
87
  end
@@ -112,19 +113,26 @@ class Traject::Marc4JReader
112
113
  def create_marc_reader!
113
114
  case input_type
114
115
  when "binary"
115
- permissive = settings["marc4j_reader.permissive"].to_s == "true"
116
-
117
- # #to_inputstream turns our ruby IO into a Java InputStream
118
- # third arg means 'convert to UTF-8, yes'
119
- MarcPermissiveStreamReader.new(input_stream.to_inputstream, permissive, true, specified_source_encoding)
116
+ the_stream = input_stream.to_inputstream
117
+ if settings['marc4j_reader.class'] == 'MarcStreamReader'
118
+ MarcStreamReader.new(the_stream, specified_source_encoding)
119
+ else
120
+ permissive = settings["marc4j_reader.permissive"].to_s == "true"
121
+
122
+ # #to_inputstream turns our ruby IO into a Java InputStream
123
+ # third arg means 'convert to UTF-8, yes'
124
+ MarcPermissiveStreamReader.new(the_stream, permissive, true, specified_source_encoding)
125
+ end
120
126
  when "xml"
121
127
  MarcXmlReader.new(input_stream.to_inputstream)
122
128
  else
123
- raise IllegalArgument.new("Unrecgonized marc_source.type: #{input_type}")
129
+ raise ArgumentError.new("Unrecgonized marc_source.type: #{input_type}")
124
130
  end
125
131
  end
126
132
 
127
133
  def each
134
+ return to_enum(:each) unless block_given?
135
+
128
136
  while (internal_reader.hasNext)
129
137
  begin
130
138
  marc4j = internal_reader.next
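The `return to_enum(:each) unless block_given?` guard added to `each` in this hunk is a common Ruby idiom: calling `each` without a block returns an `Enumerator` instead of failing. A minimal sketch of the effect, using a hypothetical `SketchReader` class backed by an array in place of the real marc4j-backed reader:

```ruby
# Hypothetical stand-in for a record reader; the real class pulls records
# from a marc4j reader, not from an in-memory array.
class SketchReader
  def initialize(records)
    @records = records
  end

  def each
    # Without a block, hand back an Enumerator over this same method.
    return to_enum(:each) unless block_given?

    @records.each { |record| yield record }
  end
end

reader = SketchReader.new(%w[rec1 rec2 rec3])
enumerator = reader.each            # no block given, so an Enumerator
first_two  = enumerator.first(2)    # pulls only as many records as needed
```

Under this pattern, callers can chain enumerator methods such as `first`, `take`, or `with_index` on the result of `each` without supplying a block up front.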
@@ -1,5 +1,5 @@
1
1
  module Traject
2
2
  class Marc4JReader
3
- VERSION = "1.0.2"
3
+ VERSION = "1.1.0"
4
4
  end
5
5
  end
metadata CHANGED
@@ -1,83 +1,83 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: traject-marc4j_reader
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.0.2
4
+ version: 1.1.0
5
5
  platform: java
6
6
  authors:
7
7
  - Bill Dueber
8
- autorequire:
8
+ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2015-02-16 00:00:00.000000000 Z
11
+ date: 2018-10-01 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
+ name: marc
14
15
  requirement: !ruby/object:Gem::Requirement
15
16
  requirements:
16
- - - ~>
17
+ - - "~>"
17
18
  - !ruby/object:Gem::Version
18
19
  version: '1.0'
19
- name: marc
20
- prerelease: false
21
20
  type: :runtime
21
+ prerelease: false
22
22
  version_requirements: !ruby/object:Gem::Requirement
23
23
  requirements:
24
- - - ~>
24
+ - - "~>"
25
25
  - !ruby/object:Gem::Version
26
26
  version: '1.0'
27
27
  - !ruby/object:Gem::Dependency
28
+ name: marc-marc4j
28
29
  requirement: !ruby/object:Gem::Requirement
29
30
  requirements:
30
- - - ~>
31
+ - - "~>"
31
32
  - !ruby/object:Gem::Version
32
33
  version: '1.0'
33
- name: marc-marc4j
34
- prerelease: false
35
34
  type: :runtime
35
+ prerelease: false
36
36
  version_requirements: !ruby/object:Gem::Requirement
37
37
  requirements:
38
- - - ~>
38
+ - - "~>"
39
39
  - !ruby/object:Gem::Version
40
40
  version: '1.0'
41
41
  - !ruby/object:Gem::Dependency
42
+ name: bundler
42
43
  requirement: !ruby/object:Gem::Requirement
43
44
  requirements:
44
- - - ~>
45
+ - - "~>"
45
46
  - !ruby/object:Gem::Version
46
47
  version: '1.6'
47
- name: bundler
48
- prerelease: false
49
48
  type: :development
49
+ prerelease: false
50
50
  version_requirements: !ruby/object:Gem::Requirement
51
51
  requirements:
52
- - - ~>
52
+ - - "~>"
53
53
  - !ruby/object:Gem::Version
54
54
  version: '1.6'
55
55
  - !ruby/object:Gem::Dependency
56
+ name: rake
56
57
  requirement: !ruby/object:Gem::Requirement
57
58
  requirements:
58
- - - '>='
59
+ - - ">="
59
60
  - !ruby/object:Gem::Version
60
61
  version: '0'
61
- name: rake
62
- prerelease: false
63
62
  type: :development
63
+ prerelease: false
64
64
  version_requirements: !ruby/object:Gem::Requirement
65
65
  requirements:
66
- - - '>='
66
+ - - ">="
67
67
  - !ruby/object:Gem::Version
68
68
  version: '0'
69
69
  - !ruby/object:Gem::Dependency
70
+ name: minitest
70
71
  requirement: !ruby/object:Gem::Requirement
71
72
  requirements:
72
- - - '>='
73
+ - - ">="
73
74
  - !ruby/object:Gem::Version
74
75
  version: '0'
75
- name: minitest
76
- prerelease: false
77
76
  type: :development
77
+ prerelease: false
78
78
  version_requirements: !ruby/object:Gem::Requirement
79
79
  requirements:
80
- - - '>='
80
+ - - ">="
81
81
  - !ruby/object:Gem::Version
82
82
  version: '0'
83
83
  description: 'Allows jruby users to leverage marc-marc4j to use marc4j as a reader
@@ -88,8 +88,9 @@ executables: []
88
88
  extensions: []
89
89
  extra_rdoc_files: []
90
90
  files:
91
- - .gitignore
92
- - .travis.yml
91
+ - ".gitignore"
92
+ - ".travis.yml"
93
+ - CHANGES.md
93
94
  - Gemfile
94
95
  - LICENSE.txt
95
96
  - README.md
@@ -110,24 +111,24 @@ homepage: ''
110
111
  licenses:
111
112
  - MIT
112
113
  metadata: {}
113
- post_install_message:
114
+ post_install_message:
114
115
  rdoc_options: []
115
116
  require_paths:
116
117
  - lib
117
118
  required_ruby_version: !ruby/object:Gem::Requirement
118
119
  requirements:
119
- - - '>='
120
+ - - ">="
120
121
  - !ruby/object:Gem::Version
121
122
  version: '0'
122
123
  required_rubygems_version: !ruby/object:Gem::Requirement
123
124
  requirements:
124
- - - '>='
125
+ - - ">="
125
126
  - !ruby/object:Gem::Version
126
127
  version: '0'
127
128
  requirements: []
128
- rubyforge_project:
129
- rubygems_version: 2.1.9
130
- signing_key:
129
+ rubyforge_project:
130
+ rubygems_version: 2.5.2.3
131
+ signing_key:
131
132
  specification_version: 4
132
133
  summary: Use marc4j (java) library under traject
133
134
  test_files:
@@ -140,4 +141,3 @@ test_files:
140
141
  - test/test_support/test_data.utf8.marc.xml
141
142
  - test/test_support/test_data.utf8.mrc
142
143
  - test/test_traject_marc4j_reader.rb
143
- has_rdoc: