traject 2.1.0-java → 2.2.0-java
- checksums.yaml +4 -4
- data/.gitignore +2 -0
- data/.travis.yml +8 -20
- data/CHANGES.md +14 -0
- data/README.md +35 -56
- data/doc/extending.md +20 -27
- data/doc/indexing_rules.md +46 -57
- data/doc/settings.md +17 -48
- data/lib/traject/debug_writer.rb +31 -5
- data/lib/traject/indexer.rb +6 -4
- data/lib/traject/marc_extractor.rb +37 -157
- data/lib/traject/marc_extractor_spec.rb +229 -0
- data/lib/traject/version.rb +1 -1
- data/test/debug_writer_test.rb +41 -0
- data/test/marc_extractor_test.rb +24 -24
- data/test/test_support/demo_config.rb +1 -1
- data/traject.gemspec +5 -5
- metadata +74 -73
data/doc/settings.md
CHANGED
@@ -25,80 +25,49 @@ settings are applied first of all. It's recommended you use `provide`.
 
 ## Known settings
 
-* `debug_ascii_progress`: true/'true' to print ascii characters to STDERR indicating progress.
-
-
-
-
-* `%` for completing of a Solr 'add'
-* `!` when threadpool for solr add has a full queue, so solr add is
-going to happen in calling queue -- means solr adding can't
-keep up with production.
+* `debug_ascii_progress`: true/'true' to print ascii characters to STDERR indicating progress. Yes, this is fixed to STDERR, regardless of your logging setup.
+* `.` for every batch of records read and parsed
+* `^` for every batch of records batched and queued for adding to solr (possibly in thread pool)
+* `%` for completing of a Solr 'add'
+* `!` when threadpool for solr add has a full queue, so solr add is going to happen in calling queue -- means solr adding can't keep up with production.
 
 * `json_writer.pretty_print`: used by the JsonWriter, if set to true, will output pretty printed json (with added whitespace) for easier human readability. Default false.
 
 * `log.file`: filename to send logging, or 'STDOUT' or 'STDERR' for those streams. Default STDERR
 
-* `log.error_file`: Default nil, if set then all log lines of ERROR and higher will be _additionally_
-sent to error file named.
+* `log.error_file`: Default nil, if set then all log lines of ERROR and higher will be _additionally_ sent to error file named.
 
 * `log.format`: Formatting string used by Yell logger. https://github.com/rudionrails/yell/wiki/101-formatting-log-messages
 
-* `log.level`: Log this level and above. Default 'info', set to eg 'debug' to get potentially more logging info,
-or 'error' to get less. https://github.com/rudionrails/yell/wiki/101-setting-the-log-level
+* `log.level`: Log this level and above. Default 'info', set to eg 'debug' to get potentially more logging info, or 'error' to get less. https://github.com/rudionrails/yell/wiki/101-setting-the-log-level
 
-* `log.batch_size`: If set to a number N (or string representation), will output a progress line to
-log. (by default as INFO, but see log.batch_size.severity)
+* `log.batch_size`: If set to a number N (or string representation), will output a progress line to log. (by default as INFO, but see log.batch_size.severity)
 
 * `log.batch_size.severity`: If `log.batch_size` is set, what logger severity level to log to. Default "INFO", set to "DEBUG" etc if desired.
 
 * `marc_source.type`: default 'binary'. Can also set to 'xml' or (not yet implemented todo) 'json'. Command line shortcut `-t`
 
-* `marcout.allow_oversized`: Used with `-x marcout` command to output marc when outputting
-as ISO 2709 binary, set to true or string "true", and the MARC::Writer will have
-allow_oversized=true set, allowing oversized records to be serialized with length
-bytes zero'd out -- technically illegal, but can be read by MARC::Reader in permissive mode.
+* `marcout.allow_oversized`: Used with `-x marcout` command to output marc when outputting as ISO 2709 binary, set to true or string "true", and the MARC::Writer will have allow_oversized=true set, allowing oversized records to be serialized with length bytes zero'd out -- technically illegal, but can be read by MARC::Reader in permissive mode.
 
-* `output_file`: Output file to write to for operations that write to files: For instance the `marcout` command,
-or Writer classes that write to files, like Traject::JsonWriter. Has an shortcut
-`-o` on command line.
+* `output_file`: Output file to write to for operations that write to files: For instance the `marcout` command, or Writer classes that write to files, like Traject::JsonWriter. Has an shortcut `-o` on command line.
 
-* `processing_thread_pool` Number of threads in the main thread pool used for processing
-records with input rules. On JRuby or Rubinius, defaults to 1 less than the number of processors detected on your machine. On other ruby platforms, defaults to 1. Set to 0 or nil
-to disable thread pool, and do all processing in main thread.
+* `processing_thread_pool` Number of threads in the main thread pool used for processing records with input rules. On JRuby or Rubinius, defaults to 1 less than the number of processors detected on your machine. On other ruby platforms, defaults to 1. Set to 0 or nil to disable thread pool, and do all processing in main thread.
 
-
-might want to try different sizes and measure which works best for you.
-Probably no reason for it ever to be more than number of cores on indexing machine.
+Choose a pool size based on size of your machine, and complexity of your indexing rules, you might want to try different sizes and measure which works best for you. Probably no reason for it ever to be more than number of cores on indexing machine.
 
 
-* `reader_class_name`: a Traject Reader class, used by the indexer as a source
-of records. Defaults to Traject::Marc4JReader (using the Java Marc4J
-library) on JRuby; Traject::MarcReader (using the ruby marc gem) otherwise.
-Command-line shortcut `-r`
+* `reader_class_name`: a Traject Reader class, used by the indexer as a source of records. Defaults to Traject::Marc4JReader (using the Java Marc4J library) on JRuby; Traject::MarcReader (using the ruby marc gem) otherwise. Command-line shortcut `-r`
 
 * `solr.url`: URL to connect to a solr instance for indexing, eg http://example.org:8983/solr . Command-line short-cut `-u`.
 
-* `solr.version`: Set to eg "1.4.0", "4.3.0"; currently un-used, but in the future will control
-change some default settings, and/or sanity check and warn you if you're doing something
-that might not work with that version of solr. Set now for help in the future.
+* `solr.version`: Set to eg "1.4.0", "4.3.0"; currently un-used, but in the future will control some default settings, and/or sanity check and warn you if you're doing something that might not work with that version of solr. Set now for help in the future.
 
-* `solr_writer.batch_size`: size of batches that SolrJsonWriter will send docs to Solr in. Default 100. Set to nil,
-0, or 1, and SolrJsonWriter will do one http transaction per document, no batching.
+* `solr_writer.batch_size`: size of batches that SolrJsonWriter will send docs to Solr in. Default 100. Set to nil, 0, or 1, and SolrJsonWriter will do one http transaction per document, no batching.
 
 * `solr_writer.commit_on_close`: default false, set to true to have the solr writer send an explicit commit message to Solr after indexing.
 
+* `solr_writer.thread_pool`: defaults to 1 (single bg thread). A thread pool is used for submitting docs to solr. Set to 0 or nil to disable threading. Set to 1, there will still be a single bg thread doing the adds. May make sense to set higher than number of cores on your indexing machine, as these threads will mostly be waiting on Solr. Speed/capacity of your solr might be more relevant. Note that processing_thread_pool threads can end up submitting to solr too, if solr_json_writer.thread_pool is full.
 
-* `
-to solr. Set to 0 or nil to disable threading. Set to 1,
-there will still be a single bg thread doing the adds.
-May make sense to set higher than number of cores on your
-indexing machine, as these threads will mostly be waiting
-on Solr. Speed/capacity of your solr might be more relevant.
-Note that processing_thread_pool threads can end up submitting
-to solr too, if solr_json_writer.thread_pool is full.
-
-* `writer`: An object that implements the Traject Writer interface. If set, takes precedence
-over `writer_class_name`.
+* `writer`: An object that implements the Traject Writer interface. If set, takes precedence over `writer_class_name`.
 
 * `writer_class_name`: a Traject Writer class, used by indexer to send processed dictionaries off. Will be used if no explicit `writer` setting or `#writer=` is set. Default Traject::SolrJsonWriter, other writers for debugging or writing to files are also available. See Traject::Indexer for more info. Command line shortcut `-w`
|
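Reviewer note: the settings in this hunk are plain key/value pairs. The sketch below gathers a few of them in a Ruby Hash to make the documented value shapes concrete. The values are invented for illustration, and `threading_disabled?` is a hypothetical helper, not part of traject. It only mirrors the documented rule that a pool size of 0 or nil disables threading.

```ruby
# Illustrative values only; the keys are setting names documented in this hunk.
settings = {
  "debug_ascii_progress"    => true,      # progress characters go to STDERR
  "log.level"               => "debug",
  "log.batch_size"          => 1000,      # a number or its string form
  "processing_thread_pool"  => 3,
  "solr.url"                => "http://example.org:8983/solr",
  "solr_writer.batch_size"  => 100,
  "solr_writer.thread_pool" => 1          # one background thread does the adds
}

# Hypothetical helper (not traject API): per the docs above, a pool size
# of 0 or nil turns threading off.
def threading_disabled?(value)
  value.nil? || value.to_i.zero?
end

puts threading_disabled?(settings["processing_thread_pool"])  # false
puts threading_disabled?(nil)                                 # true
```

Accepting both numbers and their string forms in the helper follows the docs, which allow "a number N (or string representation)" for at least some settings.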
data/lib/traject/debug_writer.rb
CHANGED
@@ -32,14 +32,40 @@ require 'traject/line_writer'
 # provide "output_file", "out.txt"
 # end
 class Traject::DebugWriter < Traject::LineWriter
-  DEFAULT_FORMAT = '%-12s %-25s %s'
   DEFAULT_IDFIELD = 'id'
+  DEFAULT_FORMAT = '%-12s %-25s %s'
+
+  def initialize(*)
+    super
+    @idfield = settings["debug_writer.idfield"] || DEFAULT_IDFIELD
+    @format = settings['debug_writer.format'] || DEFAULT_FORMAT
+
+    if @idfield == 'record_position' then
+      @use_position = true
+    end
+
+    @already_threw_warning_about_missing_id = false
+
+  end
+
+  def record_number(context)
+    return context.position if @use_position
+    if context.output_hash.has_key?(@idfield)
+      context.output_hash[@idfield].first
+    else
+      unless @already_threw_warning_about_missing_id
+        context.logger.warn "At least one record (##{context.position}) doesn't define field '#{@idfield}'.
+All records are assumed to have a unique id. You can set which field to look in via the setting 'debug_writer.idfield'"
+        @already_threw_warning_about_missing_id = true
+      end
+      "record_num_#{context.position}"
+    end
+  end
 
   def serialize(context)
-
-
-    h
-    lines = h.keys.sort.map {|k| format % [h[idfield].first, k, h[k].join(' | ')] }
+    h = context.output_hash
+    rec_key = record_number(context)
+    lines = h.keys.sort.map { |k| @format % [rec_key, k, h[k].join(' | ')] }
     lines.push "\n"
     lines.join("\n")
   end
|
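Reviewer note: a self-contained sketch of what the new `record_number` and `serialize` logic produces, using plain hashes in place of a traject context. `record_key` and `debug_lines` are hypothetical stand-ins written for this note. The one-time warning from the real method is left out, and the sample records are invented.

```ruby
# Stand-in for DebugWriter#record_number: use the first value of the id field
# when present, otherwise fall back to a position-based placeholder.
def record_key(output_hash, position, idfield = 'id')
  return position if idfield == 'record_position'
  if output_hash.key?(idfield)
    output_hash[idfield].first
  else
    "record_num_#{position}"
  end
end

# Stand-in for DebugWriter#serialize: one formatted line per field, sorted by
# field name, with multiple values joined by ' | '.
def debug_lines(output_hash, position, format = '%-12s %-25s %s')
  key = record_key(output_hash, position)
  output_hash.keys.sort.map { |k| format % [key, k, output_hash[k].join(' | ')] }
end

with_id    = { 'id' => ['2710183'], 'title' => ['First title', 'Second title'] }
without_id = { 'title' => ['Untitled'] }

puts debug_lines(with_id, 1)     # lines keyed by the id value
puts debug_lines(without_id, 2)  # lines keyed by "record_num_2"
```

The fallback key is what lets a record without the id field still be printed, where the 2.1.0 line `h[idfield].first` would have failed on a missing key.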
data/lib/traject/indexer.rb
CHANGED
@@ -8,6 +8,8 @@ require 'traject/indexer/settings'
 require 'traject/marc_reader'
 require 'traject/json_writer'
 require 'traject/solr_json_writer'
+require 'traject/debug_writer'
+
 
 require 'traject/macros/marc21'
 require 'traject/macros/basic'
@@ -98,7 +100,7 @@ end
   #
   # This may raise if the file is not readable. Or if the config file
   # can't be evaluated, it will raise a Traject::Indexer::ConfigLoadError
-  # with a bunch of contextual information useful to reporting to developer.
+  # with a bunch of contextual information useful to reporting to developer.
   #
   # You can also instead, or in addition, write configuration inline using
   # standard ruby `instance_eval`:
@@ -704,15 +706,15 @@ class Traject::Indexer
   end
 
   # Raised by #load_config_file when config file can not
-  # be processed.
+  # be processed.
   #
   # The exception #message includes an error message formatted
-  # for good display to the developer, in the console.
+  # for good display to the developer, in the console.
   #
   # Original exception raised when processing config file
   # can be found in #original. Original exception should ordinarily
   # have a good stack trace, including the file path of the config
-  # file in question.
+  # file in question.
   #
   # Original config path in #config_file, and line number in config
   # file that triggered the exception in #config_file_lineno (may be nil)
data/lib/traject/marc_extractor.rb
CHANGED
@@ -1,3 +1,5 @@
+require 'traject/marc_extractor_spec'
+
 module Traject
   # MarcExtractor is a class for extracting lists of strings from a MARC::Record,
   # according to specifications. See #parse_string_spec for description of string
@@ -36,7 +38,7 @@ module Traject
   # and includes a tag and a a byte slice specification.
   #
   # "008[35-37]:007[5]""
-  # => bytes 35-37 inclusive of any field 008, and byte 5 of any field 007
+  # => bytes 35-37 inclusive of any field 008, and byte 5 of any field 007
   #
   # * subfields and indicators can only be provided for marc data/variable fields
   # * byte slice can only be provided for marc control fields (generally tags less than 010)
|
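Reviewer note: a minimal sketch of how a spec string such as "245abc:700a" or "008[35-37]:007[5]" breaks down into tag, indicators, subfields and byte slices. `parse_spec_string` is a hypothetical function written for this note and returns plain hashes rather than traject's Spec objects. The variable-field pattern follows the one in the removed `parse_string_spec` further down this diff; the control-field pattern is an assumption, because that line is elided in the diff.

```ruby
# Hypothetical parser for illustration; not traject's implementation.
def parse_spec_string(spec_string)
  spec_string.split(/\s*:\s*/).map do |part|
    if (m = /\A([a-zA-Z0-9]{3})(\|([a-z0-9 *]{2})\|)?([a-z0-9]*)\z/.match(part))
      # Variable field: tag, optional |ii| indicators, optional subfield codes.
      spec = { tag: m[1] }
      spec[:subfields] = m[4].chars unless m[4].empty?
      if m[3]
        # '*' means "any indicator", so the key is simply left out.
        spec[:indicator1] = m[3][0] unless m[3][0] == '*'
        spec[:indicator2] = m[3][1] unless m[3][1] == '*'
      end
      spec
    elsif (m = /\A([a-zA-Z0-9]{3})\[(\d+)(?:-(\d+))?\]\z/.match(part))
      # Control field: tag plus a single byte or an inclusive byte range.
      # This pattern is assumed, not copied from traject.
      bytes = m[3] ? (m[2].to_i..m[3].to_i) : m[2].to_i
      { tag: m[1], bytes: bytes }
    else
      raise ArgumentError, "Unrecognized marc extract specification: #{part}"
    end
  end
end

puts parse_spec_string("245abc:700a").inspect
puts parse_spec_string("008[35-37]:007[5]").inspect
```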
@@ -105,7 +107,9 @@ module Traject
   # lazily create and then re-use a MarcExtractor object with
   # particular initialization arguments.
   class MarcExtractor
-    attr_accessor :options, :
+    attr_accessor :options, :spec_set
+
+    ALTERNATE_SCRIPT_TAG = '880'
 
     # First arg is a specification for extraction of data from a MARC record.
     # Specification can be given in two forms:
@@ -126,30 +130,48 @@ module Traject
     # * :only => only include linked 880s, not original
     def initialize(spec, options = {})
       self.options = {
-
-
+        :separator => ' ',
+        :alternate_script => :include
       }.merge(options)
 
-      self.
+      self.spec_set = SpecSet.new(spec)
 
 
       # Tags are "interesting" if we have a spec that might cover it
-      @interesting_tags_hash = {}
-
       # By default, interesting tags are those represented by keys in spec_hash.
       # Add them unless we only care about alternate scripts.
       unless options[:alternate_script] == :only
-        self.
+        self.spec_set.tags.each { |tag| show_interest_in_tag(tag) }
       end
 
       # If we *are* interested in alternate scripts, add the 880
       if options[:alternate_script] != false
-        @
+        @fetch_alternate_script = true
+        show_interest_in_tag(ALTERNATE_SCRIPT_TAG)
       end
 
       self.freeze
     end
 
+
+    # Declare that we're interested in a tag
+    def show_interest_in_tag(tag)
+      @interesting_tags_hash ||= {}
+      @interesting_tags_hash[tag] = true
+    end
+
+    # Check to see if a tag is interesting (meaning it may be covered by a spec
+    # and the passed-in options about alternate scripts)
+    def interesting_tag?(tag)
+      return @interesting_tags_hash.include?(tag)
+    end
+
+    # All the "interesting" tags
+    def interesting_tags
+      @interesting_tags_hash.keys
+    end
+
+
     # Takes the same arguments as MarcExtractor.new, but will re-use an existing
     # cached MarcExtractor already created with given initialization arguments,
     # if available.
|
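Reviewer note: the `:alternate_script` option decides which tags count as "interesting". The sketch below restates the branching from `initialize` as a standalone function so the three cases are easy to compare. `interesting_tags_for` is a hypothetical name, and the tag list stands in for `spec_set.tags`.

```ruby
# Hypothetical restatement of the tag-selection logic in this hunk.
def interesting_tags_for(spec_tags, alternate_script = :include)
  tags = {}
  # Tags named in the spec count unless only linked 880s are wanted.
  spec_tags.each { |tag| tags[tag] = true } unless alternate_script == :only
  # The 880 tag counts unless alternate scripts are switched off.
  tags['880'] = true if alternate_script != false
  tags.keys
end

puts interesting_tags_for(%w[245 700]).inspect         # ["245", "700", "880"]
puts interesting_tags_for(%w[245 700], :only).inspect  # ["880"]
puts interesting_tags_for(%w[245 700], false).inspect  # ["245", "700"]
```

Collecting the tags up front is what allows `each_matching_line` to ask the record only for those fields instead of walking every field.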
@@ -169,80 +191,11 @@ module Traject
     # extractor = MarcExtractor.cached("245abc:700a", :separator => nil)
     def self.cached(*args)
       cache = (Thread.current[:marc_extractor_cached] ||= Hash.new)
-      return (
+      return (cache[args] ||= Traject::MarcExtractor.new(*args).freeze)
     end
 
-    # Check to see if a tag is interesting (meaning it may be covered by a spec
-    # and the passed-in options about alternate scripts)
-    def interesting_tag?(tag)
-      return @interesting_tags_hash.include?(tag)
-    end
-
-
-    # Converts from a string marc spec like "008[35]:245abc:700a" to a hash used internally
-    # to represent the specification. See comments at head of class for
-    # documentation of string specification format.
-    #
-    #
-    # ## Return value
-    #
-    # The hash returned is keyed by tag, and has as values an array of 0 or
-    # or more MarcExtractor::Spec objects representing the specified extraction
-    # operations for that tag.
-    #
-    # It's an array of possibly more than one, because you can specify
-    # multiple extractions on the same tag: for instance "245a:245abc"
-    #
-    # See tests for more examples.
-    def self.parse_string_spec(spec_string)
-      # hash defaults to []
-      hash = Hash.new
-
-      spec_strings = spec_string.is_a?(Array) ? spec_string.map{|s| s.split(/\s*:\s*/)}.flatten : spec_string.split(/s*:\s*/)
-
-      spec_strings.each do |part|
-        if (part =~ /\A([a-zA-Z0-9]{3})(\|([a-z0-9\ \*]{2})\|)?([a-z0-9]*)?\Z/)
-          # variable field
-          tag, indicators, subfields = $1, $3, $4
-
-          spec = Spec.new(:tag => tag)
-
-          if subfields and !subfields.empty?
-            spec.subfields = subfields.split('')
-          end
-
-          if indicators
-            # if specified as '*', leave nil
-            spec.indicator1 = indicators[0] if indicators[0] != "*"
-            spec.indicator2 = indicators[1] if indicators[1] != "*"
-          end
-
-          hash[spec.tag] ||= []
-          hash[spec.tag] << spec
 
-
-          tag, byte1, byte2 = $1, $3, $5
-
-          spec = Spec.new(:tag => tag)
-
-          if byte1 && byte2
-            spec.bytes = ((byte1.to_i)..(byte2.to_i))
-          elsif byte1
-            spec.bytes = byte1.to_i
-          end
-
-          hash[spec.tag] ||= []
-          hash[spec.tag] << spec
-        else
-          raise ArgumentError.new("Unrecognized marc extract specification: #{part}")
-        end
-      end
-
-      return hash
-    end
-
-
-    # Returns array of strings, extracted values. Maybe empty array.
+    # Returns array of strings from a MARC::Record, extracted values. May be empty array.
     def extract(marc_record)
       results = []
 
|
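Reviewer note: `MarcExtractor.cached` memoizes one frozen extractor per argument list, per thread. The sketch below shows the same pattern with a plain block in place of `MarcExtractor.new`. `cached_build` and the `:sketch_extractor_cache` key are hypothetical names chosen for this note.

```ruby
# Per-thread memoization keyed by the full argument list. The block is only
# called the first time a given argument list is seen in the current thread.
def cached_build(*args)
  cache = (Thread.current[:sketch_extractor_cache] ||= {})
  cache[args] ||= yield(*args).freeze
end

first  = cached_build("245abc:700a", { separator: nil }) { |*a| "extractor for #{a.inspect}" }
second = cached_build("245abc:700a", { separator: nil }) { |*a| "extractor for #{a.inspect}" }
puts first.equal?(second)  # true: the second call returned the cached object

# A new thread starts with an empty cache and builds its own copy.
other = Thread.new do
  cached_build("245abc:700a", { separator: nil }) { |*a| "extractor for #{a.inspect}" }
end.value
puts first.equal?(other)   # false
```

Keeping the cache in `Thread.current` avoids locking around the Hash, at the cost of one extractor per thread; freezing the cached object guards against accidental mutation of a shared instance.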
@@ -265,14 +218,10 @@ module Traject
     # Third (optional) arg to block is self, the MarcExtractor object, useful for custom
     # implementations.
     def each_matching_line(marc_record)
-      marc_record.fields(
+      marc_record.fields(interesting_tags).each do |field|
 
-        # Make sure it matches indicators too, specs_covering_field
-        # doesn't check that.
         specs_covering_field(field).each do |spec|
-          if spec.matches_indicators?(field)
           yield(field, spec, self)
-          end
         end
 
       end
@@ -314,29 +263,13 @@ module Traject
     end
 
 
-
     # Find Spec objects, if any, covering extraction from this field.
     # Returns an array of 0 or more MarcExtractor::Spec objects
     #
-    # When given an 880, will return the spec (if any) for the linked tag iff
-    # we have a $6 and we want the alternate script.
-    #
     # Returns an empty array in case of no matching extraction specs.
     def specs_covering_field(field)
-
-
-      # Short-circuit the unintersting stuff
-      return [] unless interesting_tag?(tag)
-
-      # Due to bug in jruby https://github.com/jruby/jruby/issues/886 , we need
-      # to do this weird encode gymnastics, which fixes it for mysterious reasons.
-
-      if tag == "880" && field['6']
-        tag = field["6"].encode(field["6"].encoding).byteslice(0,3)
-      end
-
-      # Take the resulting tag and get the spec from it (or the default nil if there isn't a spec for this tag)
-      spec = self.spec_hash[tag] || []
+      return [] unless interesting_tag?(field.tag)
+      self.spec_set.specs_matching_field(field, @fetch_alternate_script)
     end
 
 
@@ -348,63 +281,10 @@ module Traject
 
     def freeze
       self.options.freeze
-      self.
+      self.spec_set.freeze
       super
     end
 
 
-    # Represents a single specification for extracting data
-    # from a marc field, like "600abc" or "600|1*|x".
-    #
-    # Includes the tag for reference, although this is redundant and not actually used
-    # in logic, since the tag is also implicit in the overall spec_hash
-    # with tag => [spec1, spec2]
-    class Spec
-      attr_accessor :tag, :subfields, :indicator1, :indicator2, :bytes
-
-      def initialize(hash = {})
-        hash.each_pair do |key, value|
-          self.send("#{key}=", value)
-        end
-      end
-
-
-      # Should subfields extracted by joined, if we have a seperator?
-      # * '630' no subfields specified => join all subfields
-      # * '630abc' multiple subfields specified = join all subfields
-      # * '633a' one subfield => do not join, return one value for each $a in the field
-      # * '633aa' one subfield, doubled => do join after all, will return a single string joining all the values of all the $a's.
-      #
-      # Last case is handled implicitly at the moment when subfields == ['a', 'a']
-      def joinable?
-        (self.subfields.nil? || self.subfields.size != 1)
-      end
-
-      # Pass in a MARC field, do it's indicators match indicators
-      # in this spec? nil indicators in spec mean we don't care, everything
-      # matches.
-      def matches_indicators?(field)
-        return (self.indicator1.nil? || self.indicator1 == field.indicator1) &&
-          (self.indicator2.nil? || self.indicator2 == field.indicator2)
-      end
-
-      # Pass in a string subfield code like 'a'; does this
-      # spec include it?
-      def includes_subfield_code?(code)
-        # subfields nil means include them all
-        self.subfields.nil? || self.subfields.include?(code)
-      end
-
-      def ==(spec)
-        return false unless spec.kind_of?(Spec)
-
-        return (self.tag == spec.tag) &&
-          (self.subfields == spec.subfields) &&
-          (self.indicator1 == spec.indicator1) &&
-          (self.indicator1 == spec.indicator2) &&
-          (self.bytes == spec.bytes)
-      end
-    end
-
   end
 end
|
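Reviewer note: the `Spec` class removed from this file documents two small rules, the join rule for subfields and the indicator match. The sketch below restates both as standalone functions that take plain values instead of Spec and MARC field objects; the function signatures are hypothetical. This diff does not show `marc_extractor_spec.rb`, so whether 2.2.0 keeps exactly this behavior there is an assumption.

```ruby
# Join rule from the removed comments: join when no subfields are named or
# when more than one code is listed; a single code yields one value per
# occurrence. A doubled code such as '633aa' gives ['a', 'a'], so it joins.
def joinable?(subfields)
  subfields.nil? || subfields.size != 1
end

# Indicator rule: a nil indicator in the spec matches anything.
def matches_indicators?(spec_ind1, spec_ind2, field_ind1, field_ind2)
  (spec_ind1.nil? || spec_ind1 == field_ind1) &&
    (spec_ind2.nil? || spec_ind2 == field_ind2)
end

puts joinable?(nil)          # true  ('630')
puts joinable?(%w[a b c])    # true  ('630abc')
puts joinable?(%w[a])        # false ('633a')
puts joinable?(%w[a a])      # true  ('633aa')
puts matches_indicators?('1', nil, '1', '4')  # true
puts matches_indicators?('1', nil, '2', '4')  # false
```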