traject 0.13.1 → 0.13.2

data/README.md CHANGED
@@ -20,7 +20,7 @@ Existing tools for indexing Marc to Solr exist, and have served us well for many
  logic, should be very easy. More sophisticated and even complex customization use cases should still be possible,
  changing just the parts of traject you want to change.
  * *Maintainable local logic*, including supporting sharing of reusable logic via ruby gems.
- * *Maintainable understandable internal logic*; well-covered by tests, well-factored seperation of concerns,
+ * *Maintainable understandable internal logic*; well-covered by tests, well-factored separation of concerns,
  easy for newcomer developers who know ruby to understand the codebase.
  * *High performance*, using multi-threaded concurrency where appropriate to maximize throughput.
  While it depends on your configuration and the size of your server(s), traject is likely higher
@@ -164,8 +164,12 @@ Other examples of the specification string, which can include multiple tag menti

  # Instead of joining subfields from the same field
  # into one string, joined by spaces, leave them
- # each in seperate strings:
- to_field "isbn", extract_marc("020az", :seperator => nil)
+ # each in separate strings:
+ to_field "isbn", extract_marc("020az", :separator => nil)
+
+ # Make sure that you don't get any duplicates
+ # by passing in ":deduplicate => true"
+ to_field 'language008', extract_marc('008[35-37]', :deduplicate=>true)
  ~~~

  The `extract_marc` function *by default* includes any linked
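Reviewer note: the README hunk above describes two behaviors, joining subfields with a separator versus leaving them as separate strings, and dropping duplicate values. The sketch below illustrates both on plain arrays. It does not call traject; `join_subfields` is a hypothetical helper modeled on the `collect_subfields` return line shown later in this diff, and the ISBN values are made up.

```ruby
# Hypothetical helper, not part of traject: joins subfield values into one
# string when a separator is given, or leaves them separate when it is nil.
def join_subfields(subfields, separator)
  return subfields if subfields.empty?
  separator ? [subfields.join(separator)] : subfields
end

isbn_subfields = ["0123456789", "9780123456786"] # made-up values

joined   = join_subfields(isbn_subfields, " ")  # default-style behavior
separate = join_subfields(isbn_subfields, nil)  # like :separator => nil

# Deduplication as described for :deduplicate => true
deduped = ["eng", "eng"].uniq

p joined
p separate
p deduped
```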
@@ -347,18 +351,22 @@ checking.
  Use `-u` as a shortcut for `s solr.url=X`

  traject -c conf_file.rb -u http://example.com/solr marc_file.mrc
+
+ Run `traject -h` to see the command line help screen listing all available options.

  Also see `-I load_path` and `-G Gemfile` options under Extending With Your Own Code.

  See also [Hints for batch and cronjob use](./doc/batch_execution.md) of traject.

+
+
  ## Extending With Your Own Code

  Traject config files are full live ruby files, where you can do anything,
  including declaring new classes, etc.

  However, beyond limited trivial logic, you'll want to organize your
- code reasonably into seperate files, not jam everything into config
+ code reasonably into separate files, not jam everything into config
  files.

  Traject wants to make sure it makes it convenient for you to do so,
@@ -132,9 +132,9 @@ the command-line:

  Or in a traject configuration file, setting the `log.file` configuration setting.

- ### Seperate error log
+ ### separate error log

- You can also seperately have a duplicate log file created with ONLY log messages of
+ You can also separately have a duplicate log file created with ONLY log messages of
  level ERROR and higher (meaning ERROR and FATAL), with the `log.error_file` setting.
  Then, if there's any lines in this error log file at all, you know something bad
  happened, maybe your batch process needs to notify someone, or abort further
@@ -41,8 +41,8 @@ for commonly used settings, see `traject -h`.
  * `log.level`: Log this level and above. Default 'info', set to eg 'debug' to get potentially more logging info,
  or 'error' to get less. https://github.com/rudionrails/yell/wiki/101-setting-the-log-level

- * `log.batch_progress`: If set to a number N (or string representation), will output a progress line to INFO
- log, every N records.
+ * `log.batch_size`: If set to a number N (or string representation), will output a progress line to INFO
+ log, every N records.

  * `marc_source.type`: default 'binary'. Can also set to 'xml' or (not yet implemented todo) 'json'. Command line shortcut `-t`

@@ -78,9 +78,13 @@ module Traject
  result =
  case options[:command]
  when "process"
- indexer.process get_input_io(self.remaining_argv)
+ (io, filename) = get_input_io(self.remaining_argv)
+ indexer.settings['command_line.filename'] = filename if filename
+ indexer.process(io)
  when "marcout"
- command_marcout! get_input_io(self.remaining_argv)
+ (io, filename) = get_input_io(self.remaining_argv)
+ indexer.settings['command_line.filename'] = filename if filename
+ command_marcout!(io)
  when "commit"
  command_commit!
  else
@@ -155,20 +159,23 @@ module Traject
  #
  # So for now we do just one file, or stdin if specified. Sorry!

+ filename = nil
  if options[:stdin]
- indexer.logger.info "Reading from STDIN..."
+ indexer.logger.info("Reading from standard input")
  io = $stdin
  elsif argv.length > 1
  self.console.puts "Sorry, traject can only handle one input file at a time right now. `#{argv}` Exiting..."
  exit 1
  elsif argv.length == 0
- indexer.logger.warn "Warning, no file input given..."
  io = File.open(File::NULL, 'r')
+ indexer.logger.info("Warning, no file input given. Use command-line argument '--stdin' to use standard input ")
  else
- indexer.logger.info "Reading from #{argv.first}"
  io = File.open(argv.first, 'r')
+ filename = argv.first
+ indexer.logger.info "Reading from #{filename}"
  end
- return io
+
+ return io, filename
  end

  def load_configuration_files!(my_indexer, conf_files)
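Reviewer note: the hunk above changes `get_input_io` to return two values (`return io, filename`), which callers unpack with `(io, filename) = ...`. The sketch below shows that Ruby pattern in isolation. `pick_input` is a hypothetical stand-in that uses `StringIO` instead of real files, and the file name `records.mrc` is an assumption for illustration.

```ruby
require "stringio"

# Hypothetical stand-in for get_input_io: returns an IO-like object plus the
# filename, which stays nil when there is no named input file.
def pick_input(argv, use_stdin: false)
  filename = nil
  if use_stdin
    io = $stdin
  elsif argv.empty?
    io = StringIO.new("")            # plays the role of File::NULL
  else
    filename = argv.first
    io = StringIO.new("pretend contents of #{filename}")
  end
  return io, filename                # Ruby packs these into a two-element array
end

# Destructuring assignment unpacks the array, as in the diff.
io, filename = pick_input(["records.mrc"])
settings = {}
settings["command_line.filename"] = filename if filename

empty_io, no_name = pick_input([])

p settings
p no_name
```

Because `filename` is only set in the named-file branch, the `if filename` guard keeps the setting absent for standard input and for the no-input case.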
@@ -246,6 +253,12 @@ module Traject
  if options[:debug]
  settings["log.level"] = "debug"
  end
+ if options[:'debug-mode']
+ require 'traject/debug_writer'
+ settings["writer_class_name"] = "Traject::DebugWriter"
+ settings["log.level"] = "debug"
+ settings["processing_thread_pool"] = 0
+ end
  if options[:writer]
  settings["writer_class_name"] = options[:writer]
  end
@@ -291,6 +304,7 @@ module Traject
  on :x, "command", "alternate traject command: process (default); marcout", :argument => true, :default => "process"

  on "stdin", "read input from stdin"
+ on "debug-mode", "debug logging, single threaded, output human readable hashes"
  end
  end

@@ -318,4 +332,4 @@ module Traject


  end
- end
+ end
@@ -226,6 +226,7 @@ class Traject::Indexer
  end
  end
  end
+ accumulator.compact!
  (context.output_hash[context.field_name] ||= []).concat accumulator unless accumulator.empty?
  context.field_name = nil

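Reviewer note: the added `accumulator.compact!` strips `nil` values before the accumulator is merged into the output hash. A minimal sketch of the effect on plain arrays follows; the field names and values are assumptions for illustration, and `output_hash` stands in for `context.output_hash`.

```ruby
output_hash = {}

# A mapping step that produced some nils alongside real values.
accumulator = ["eng", nil, "fre", nil]
accumulator.compact!   # removes nils in place
(output_hash["language"] ||= []).concat(accumulator) unless accumulator.empty?

# A step that produced only nils: after compact! the accumulator is empty,
# so the `unless accumulator.empty?` guard skips the field entirely.
all_nil = [nil, nil]
all_nil.compact!
(output_hash["empty_field"] ||= []).concat(all_nil) unless all_nil.empty?

p output_hash
```

Since the `unless` modifier guards the whole statement, the `||= []` never runs for the all-nil case and no empty key is created.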
@@ -264,7 +265,7 @@ class Traject::Indexer
  def log_mapping_errors(context, index_step, aProc)
  begin
  yield
- rescue Exception => e
+ rescue Exception => e
  msg = "Unexpected error on record id `#{id_string(context.source_record)}` at file position #{context.position}\n"

  conf = context.field_name ? "to_field '#{context.field_name}'" : "each_record"
@@ -272,10 +273,14 @@ class Traject::Indexer
  msg += " while executing #{conf} defined at #{index_step[:source_location]}\n"
  msg += Traject::Util.exception_to_log_message(e)

- logger.error msg
- logger.debug "Record: " + context.source_record.to_s
+ logger.error msg
+ begin
+ logger.debug "Record: " + context.source_record.to_s
+ rescue Exception => marc_to_s_exception
+ logger.debug "(Could not log record, #{marc_to_s_exception})"
+ end

- raise e
+ raise e
  end
  end

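Reviewer note: the hunk above wraps the debug logging of the record in its own `begin`/`rescue`, so a failure in `to_s` cannot mask the original mapping error. The sketch below reproduces that shape with a plain array as the log. `BrokenRecord` and `with_error_logging` are hypothetical names, and the sketch rescues `StandardError` where the diff rescues `Exception`.

```ruby
# Hypothetical record whose to_s itself fails, as a damaged MARC record might.
class BrokenRecord
  def to_s
    raise ArgumentError, "cannot serialize"
  end
end

# Logs the original error, then tries to log the record; a failure while
# serializing the record is noted but never replaces the original error.
def with_error_logging(record, log)
  yield
rescue StandardError => e
  log << "error: #{e.message}"
  begin
    log << "Record: " + record.to_s
  rescue StandardError => to_s_error
    log << "(Could not log record, #{to_s_error})"
  end
  raise e
end

log = []
raised = begin
  with_error_logging(BrokenRecord.new, log) { raise "mapping failed" }
  nil
rescue RuntimeError => e
  e
end

p log
p raised.message
```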
@@ -293,14 +298,16 @@ class Traject::Indexer

  count = 0
  start_time = batch_start_time = Time.now
- logger.info "beginning Indexer#process with settings: #{settings.inspect}"
+ logger.debug "beginning Indexer#process with settings: #{settings.inspect}"

  reader = self.reader!(io_stream)
  writer = self.writer!

  thread_pool = Traject::ThreadPool.new(settings["processing_thread_pool"].to_i)

- logger.info " with reader: #{reader.class.name} and writer: #{writer.class.name}"
+ logger.info " Indexer with reader: #{reader.class.name} and writer: #{writer.class.name}"
+
+ log_batch_size = settings["log.batch_size"] && settings["log.batch_size"].to_i

  reader.each do |record; position|
  count += 1
@@ -315,8 +322,8 @@ class Traject::Indexer
  $stderr.write "." if count % settings["solrj_writer.batch_size"] == 0
  end

- if settings["log.batch_progress"] && (count % settings["log.batch_progress"].to_i == 0)
- batch_rps = settings["log.batch_progress"].to_i / (Time.now - batch_start_time)
+ if log_batch_size && (count % log_batch_size == 0)
+ batch_rps = log_batch_size / (Time.now - batch_start_time)
  overall_rps = count / (Time.now - start_time)
  logger.info "Traject::Indexer#process, read #{count} records at id:#{id_string(record)}; #{'%.0f' % batch_rps}/s this batch, #{'%.0f' % overall_rps}/s overall"
  batch_start_time = Time.now
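Reviewer note: the two indexer hunks above compute `log_batch_size` once, as `settings["log.batch_size"] && settings["log.batch_size"].to_i`, and then test `count % log_batch_size == 0`. The sketch below exercises that expression on plain hashes. `log_batch_size_for` is a hypothetical wrapper and the setting values are assumptions. It also shows an edge case that follows from Ruby's `String#to_i`: a non-numeric value becomes `0`, and an integer modulo zero raises `ZeroDivisionError`.

```ruby
# Hypothetical wrapper around the expression used in the diff.
def log_batch_size_for(settings)
  settings["log.batch_size"] && settings["log.batch_size"].to_i
end

unset   = log_batch_size_for({})                           # setting absent
numeric = log_batch_size_for({ "log.batch_size" => "1000" })
zero    = log_batch_size_for({ "log.batch_size" => "abc" }) # "abc".to_i is 0

# Record counts at which a progress line would be logged.
progress_points = (1..3000).select { |count| numeric && (count % numeric == 0) }

# With a zero batch size the modulo check itself fails.
zero_raises = begin
  5 % zero
  false
rescue ZeroDivisionError
  true
end

p unset
p numeric
p progress_points
p zero_raises
```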
@@ -29,11 +29,12 @@ module Traject::Macros
  #
  # to_field("title"), extract_marc("245abcd", :trim_punctuation => true)
  # to_field("id"), extract_marc("001", :first => true)
- # to_field("geo"), extract_marc("040a", :seperator => nil, :translation_map => "marc040")
+ # to_field("geo"), extract_marc("040a", :separator => nil, :translation_map => "marc040")
  def extract_marc(spec, options = {})
  only_first = options.delete(:first)
  trim_punctuation = options.delete(:trim_punctuation)
  default_value = options.delete(:default)
+ deduplicate = options.delete(:deduplicate) || options.delete(:uniq)

  # We create the TranslationMap and the MarcExtractor here
  # on load, so the lambda can just refer to already created
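Reviewer note: the new line `options.delete(:deduplicate) || options.delete(:uniq)` accepts either option name. The sketch below models that option handling, plus the `uniq!`-then-default ordering from the next hunk, on plain arrays. `post_process` is a hypothetical stand-in, not traject's `extract_marc`. One consequence of `||` short-circuiting: when both keys are passed and `:deduplicate` is truthy, `:uniq` is never deleted and stays in the leftover options. Whether that matters depends on how the remaining options are consumed, which this diff does not show.

```ruby
# Hypothetical stand-in for the option handling and post-processing steps.
# Returns the processed values and whatever options were left unconsumed.
def post_process(values, options = {})
  options = options.dup
  default_value = options.delete(:default)
  deduplicate = options.delete(:deduplicate) || options.delete(:uniq)

  accumulator = values.dup
  accumulator.uniq! if deduplicate
  accumulator << default_value if default_value && accumulator.empty?
  [accumulator, options]
end

plain, _    = post_process(["eng", "eng"])
deduped, _  = post_process(["eng", "eng"], :deduplicate => true)
aliased, _  = post_process(["eng", "eng"], :uniq => true)
_, leftover = post_process(["eng"], :deduplicate => true, :uniq => true)

p plain
p deduped
p aliased
p leftover
```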
@@ -62,10 +63,15 @@ module Traject::Macros
  if trim_punctuation
  accumulator.collect! {|s| Marc21.trim_punctuation(s)}
  end
+
+ if deduplicate
+ accumulator.uniq!
+ end

  if default_value && accumulator.empty?
  accumulator << default_value
  end
+
  end
  end

@@ -117,14 +123,14 @@ module Traject::Macros
  # options
  # [:from] default 100, only tags >= lexicographically
  # [:to] default 899, only tags <= lexicographically
- # [:seperator] how to join subfields, default space, nil means don't join
+ # [:separator] how to join subfields, default space, nil means don't join
  #
  # All fields in from-to must be marc DATA (not control fields), or weirdness
  #
  # Can always run this thing multiple times on the same field if you need
  # non-contiguous ranges of fields.
  def extract_all_marc_values(options = {})
- options = {:from => "100", :to => "899", :seperator => ' '}.merge(options)
+ options = {:from => "100", :to => "899", :separator => ' '}.merge(options)

  lambda do |record, accumulator, context|
  record.each do |field|
@@ -132,8 +138,8 @@ module Traject::Macros
  subfield_values = field.subfields.collect {|sf| sf.value}
  next unless subfield_values.length > 0

- if options[:seperator]
- accumulator << subfield_values.join( options[:seperator])
+ if options[:separator]
+ accumulator << subfield_values.join( options[:separator])
  else
  accumulator.concat subfield_values
  end
@@ -14,7 +14,7 @@ module Traject::Macros
  # Extract OCLC numbers from, by default 035a's by known prefixes, then stripped
  # just the num, and de-dup.
  def oclcnum(extract_fields = "035a")
- extractor = MarcExtractor.new(extract_fields, :seperator => nil)
+ extractor = MarcExtractor.new(extract_fields, :separator => nil)

  lambda do |record, accumulator|
  list = extractor.extract(record).collect! do |o|
@@ -118,7 +118,7 @@ module Traject::Macros
  def marc_languages(spec = "008[35-37]:041a:041d")
  translation_map = Traject::TranslationMap.new("marc_languages")

- extractor = MarcExtractor.new(spec, :seperator => nil)
+ extractor = MarcExtractor.new(spec, :separator => nil)

  lambda do |record, accumulator|
  codes = extractor.collect_matching_lines(record) do |field, spec, extractor|
@@ -127,7 +127,7 @@ module Traject::Macros
  else
  extractor.collect_subfields(field, spec).collect do |value|
  # sometimes multiple language codes are jammed together in one subfield, and
- # we need to seperate ourselves. sigh.
+ # we need to separate ourselves. sigh.
  unless value.length == 3
  value = value.scan(/.{1,3}/) # split into an array of 3-length substrs
  end
@@ -162,11 +162,11 @@ module Traject::Macros
  # Takes marc 048ab instrument code, and translates it to human-displayable
  # string. Takes first two chars of 048a or b, to translate (ignores numeric code)
  #
- # Pass in custom spec if you want just a or b, to seperate soloists or whatever.
+ # Pass in custom spec if you want just a or b, to separate soloists or whatever.
  def marc_instrumentation_humanized(spec = "048ab", options = {})
  translation_map = Traject::TranslationMap.new(options[:translation_map] || "marc_instruments")

- extractor = MarcExtractor.new(spec, :seperator => nil)
+ extractor = MarcExtractor.new(spec, :separator => nil)

  lambda do |record, accumulator|
  values = extractor.extract(record)
@@ -189,7 +189,7 @@ module Traject::Macros
  def marc_instrument_codes_normalized(spec = "048")
  soloist_suffix = ".s"

- extractor = MarcExtractor.new("048", :seperator => nil)
+ extractor = MarcExtractor.new("048", :separator => nil)

  return lambda do |record, accumulator|
  accumulator.concat(
@@ -286,7 +286,7 @@ module Traject::Macros
  end
  # Okay, nothing from 008, try 260
  if found_date.nil?
- v260c = MarcExtractor.cached("260c", :seperator => nil).extract(record).first
+ v260c = MarcExtractor.cached("260c", :separator => nil).extract(record).first
  # just try to take the first four digits out of there, we're not going to try
  # anything crazy.
  if v260c =~ /(\d{4})/
@@ -320,7 +320,7 @@ module Traject::Macros
  default_value = options.has_key?(:default) ? options[:default] : "Unknown"
  translation_map = Traject::TranslationMap.new("lcc_top_level")

- extractor = MarcExtractor.new(spec, :seperator => nil)
+ extractor = MarcExtractor.new(spec, :separator => nil)

  lambda do |record, accumulator|
  candidates = extractor.extract(record)
@@ -352,8 +352,8 @@ module Traject::Macros
  a_fields_spec = options[:geo_a_fields] || "651a:691a"
  z_fields_spec = options[:geo_z_fields] || "600:610:611:630:648:650:654:655:656:690:651:691"

- extractor_043a = MarcExtractor.new("043a", :seperator => nil)
- extractor_a_fields = MarcExtractor.new(a_fields_spec, :seperator => nil)
+ extractor_043a = MarcExtractor.new("043a", :separator => nil)
+ extractor_a_fields = MarcExtractor.new(a_fields_spec, :separator => nil)
  extractor_z_fields = MarcExtractor.new(z_fields_spec)

  lambda do |record, accumulator|
@@ -403,7 +403,7 @@ module Traject::Macros
  def marc_era_facet
  ordinary_fields_spec = "600y:610y:611y:630y:648ay:650y:654y:656y:690y"
  special_fields_spec = "651:691"
- seperator = ": "
+ separator = ": "

  extractor_ordinary_fields = MarcExtractor.new(ordinary_fields_spec)
  extractor_special_fields = MarcExtractor.new(special_fields_spec)
@@ -423,7 +423,7 @@ module Traject::Macros
  next unless sf.code == 'y'
  if sf.value =~ /\A\s*.+,\s+(ca.\s+)?\d\d\d\d?(-\d\d\d\d?)?( B\.C\.)?[.,; ]*\Z/
  # it's our pattern, add the $a in please
- accumulator << "#{field['a']}#{seperator}#{sf.value.sub(/\. *\Z/, '')}"
+ accumulator << "#{field['a']}#{separator}#{sf.value.sub(/\. *\Z/, '')}"
  else
  accumulator << sf.value.sub(/\. *\Z/, '')
  end
@@ -7,7 +7,7 @@ module Traject
  # Examples:
  #
  # array_of_stuff = MarcExtractor.new("001:245abc:700a").extract(marc_record)
- # values = MarcExtractor.new("040a", :seperator => nil).extract(marc_record)
+ # values = MarcExtractor.new("040a", :separator => nil).extract(marc_record)
  #
  #
  # == Note on Performance and MarcExtractor creation and reuse
@@ -46,7 +46,7 @@ module Traject
  #
  # options:
  #
- # [:seperator] default ' ' (space), what to use to seperate
+ # [:separator] default ' ' (space), what to use to separate
  # subfield values when joining strings
  #
  # [:alternate_script] default :include, include linked 880s for tags
@@ -55,7 +55,7 @@ module Traject
  # * :only => only include linked 880s, not original
  def initialize(spec, options = {})
  self.options = {
- :seperator => ' ',
+ :separator => ' ',
  :alternate_script => :include
  }.merge(options)

@@ -93,7 +93,7 @@ module Traject
  # although if you try hard enough you can surely find a way to do something
  # you shouldn't.
  #
- # extractor = MarcExtractor.cached("245abc:700a", :seperator => nil)
+ # extractor = MarcExtractor.cached("245abc:700a", :separator => nil)
  def self.cached(*args)
  cache = (Thread.current[:marc_extractor_cached] ||= Hash.new)
  extractor = (cache[args] ||= begin
@@ -118,7 +118,7 @@ module Traject
  # to represent the specification.
  #
  # a String specification is a string (or array of strings) of form:
- # {tag}{|indicators|}{subfields} seperated by colons
+ # {tag}{|indicators|}{subfields} separated by colons
  # tag is three chars (usually but not neccesarily numeric),
  # indicators are optional two chars prefixed by hyphen,
  # subfields are optional list of chars (alphanumeric)
@@ -239,7 +239,7 @@ module Traject
  # Pass in a marc data field and a hash spec, returns
  # an ARRAY of one or more strings, subfields extracted
  # and processed per spec. Takes account of options such
- # as :seperator
+ # as :separator
  #
  # Always returns array, sometimes empty array.
  def collect_subfields(field, spec)
@@ -249,7 +249,7 @@ module Traject

  return subfields if subfields.empty? # empty array, just return it.

- return options[:seperator] ? [ subfields.join( options[:seperator]) ] : subfields
+ return options[:separator] ? [ subfields.join( options[:separator]) ] : subfields
  end


@@ -16,8 +16,8 @@ require 'marc'
  # ["marc_source.type"] serialization type. default 'binary'
  # * "binary". Actual marc.
  # * "xml", MarcXML
- # * "json". (NOT YET IMPLEMENTED) The "marc-in-json" format, encoded as newline-seperated
- # json. A simplistic newline-seperated json, with no comments
+ # * "json". (NOT YET IMPLEMENTED) The "marc-in-json" format, encoded as newline-separated
+ # json. A simplistic newline-separated json, with no comments
  # allowed, and no unescpaed internal newlines allowed in the json
  # objects -- we just read line by line, and assume each line is a
  # marc-in-json. http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/
@@ -1,6 +1,6 @@
  # A Null writer that does absolutely nothing with records given to it,
  # just drops em on the floor.
- class Traject::MockWriter
+ class Traject::NullWriter
  attr_reader :settings

  def initialize(argSettings)
@@ -108,6 +108,8 @@ class Traject::SolrJWriter
  @thread_pool = Traject::ThreadPool.new( @settings["solrj_writer.thread_pool"].to_i )

  @debug_ascii_progress = (@settings["debug_ascii_progress"].to_s == "true")
+
+ logger.info(" SolrJWriter writing to '#{settings['solr.url']}'")
  end

  # Loads solrj if not already loaded. By loading all jars found
@@ -1,3 +1,3 @@
  module Traject
- VERSION = "0.13.1"
+ VERSION = "0.13.2"
  end
@@ -56,6 +56,26 @@ describe "Traject::Macros::Marc21" do

  assert_equal ["DEFAULT VALUE"], output["only_default"]
  end
+
+ it "respects the :deduplicate option (and its alias 'uniq')" do
+ # Add a second 008
+ f = @record.fields('008').first
+ @record.append(f)
+
+ @indexer.instance_eval do
+ to_field "lang1", extract_marc('008[35-37]')
+ to_field "lang2", extract_marc('008[35-37]', :deduplicate=>true)
+ to_field "lang3", extract_marc('008[35-37]', :uniq=>true)
+ end
+
+ output = @indexer.map_record(@record)
+ assert_equal ["eng", "eng"], output['lang1']
+ assert_equal ["eng"], output['lang2']
+ assert_equal ["eng"], output['lang3']
+
+ end
+
+

  it "Marc21::trim_punctuation class method" do
  assert_equal "one two three", Marc21.trim_punctuation("one two three")
@@ -75,7 +95,7 @@ describe "Traject::Macros::Marc21" do

  it "uses :translation_map" do
  @indexer.instance_eval do
- to_field "cataloging_agency", extract_marc("040a", :seperator => nil, :translation_map => "marc_040a_translate_test")
+ to_field "cataloging_agency", extract_marc("040a", :separator => nil, :translation_map => "marc_040a_translate_test")
  end
  output = @indexer.map_record(@record)

@@ -173,17 +173,17 @@ describe "Traject::MarcExtractor" do
  end
  end

- describe "seperator argument" do
+ describe "separator argument" do
  it "causes non-join when nil" do
  parsed_spec = Traject::MarcExtractor.parse_string_spec("245")
- values = Traject::MarcExtractor.new(parsed_spec, :seperator => nil).extract(@record)
+ values = Traject::MarcExtractor.new(parsed_spec, :separator => nil).extract(@record)

  assert_length 3, values
  end

  it "can be non-default" do
  parsed_spec = Traject::MarcExtractor.parse_string_spec("245")
- values = Traject::MarcExtractor.new(parsed_spec, :seperator => "!! ").extract(@record)
+ values = Traject::MarcExtractor.new(parsed_spec, :separator => "!! ").extract(@record)

  assert_length 1, values
  assert_equal "Manufacturing consent :!! the political economy of the mass media /!! Edward S. Herman and Noam Chomsky ; with a new introduction by the authors.", values.first
@@ -288,13 +288,13 @@ describe "Traject::MarcExtractor" do

  describe "MarcExtractor.cached" do
  it "creates" do
- ext = Traject::MarcExtractor.cached("245abc", :seperator => nil)
+ ext = Traject::MarcExtractor.cached("245abc", :separator => nil)
  assert_equal({"245"=>{:subfields=>["a", "b", "c"]}}, ext.spec_hash)
- assert ext.options[:seperator].nil?, "extractor options[:seperator] is nil"
+ assert ext.options[:separator].nil?, "extractor options[:separator] is nil"
  end
  it "caches" do
- ext1 = Traject::MarcExtractor.cached("245abc", :seperator => nil)
- ext2 = Traject::MarcExtractor.cached("245abc", :seperator => nil)
+ ext1 = Traject::MarcExtractor.cached("245abc", :separator => nil)
+ ext2 = Traject::MarcExtractor.cached("245abc", :separator => nil)

  assert_same ext1, ext2
  end
@@ -50,7 +50,7 @@ to_field "format", marc_formats
  to_field "isbn_t", extract_marc("020a:773z:776z:534z:556z")
  to_field "lccn", extract_marc("010a")

- to_field "material_type_display", extract_marc("300a", :seperator => nil, :trim_punctuation => true)
+ to_field "material_type_display", extract_marc("300a", :separator => nil, :trim_punctuation => true)

  to_field "title_t", extract_marc("245ak")
  to_field "title1_t", extract_marc("245abk")
@@ -107,7 +107,7 @@ to_field "pub_date", marc_publication_date
  # call numbers.
  lcc_map = Traject::TranslationMap.new("lcc_top_level")
  holdings_extractor = Traject::MarcExtractor.new("991:937")
- sudoc_extractor = Traject::MarcExtractor.new("086a", :seperator =>nil)
+ sudoc_extractor = Traject::MarcExtractor.new("086a", :separator =>nil)

  to_field "discipline_facet", marc_lcc_to_broad_category(:default => nil) do |record, accumulator|
  # add in our local call numbers
@@ -147,8 +147,8 @@ end
  to_field "instrumentation_facet", marc_instrumentation_humanized
  to_field "instrumentation_code_unstem", marc_instrument_codes_normalized

- to_field "issn", extract_marc("022a:022l:022y:773x:774x:776x", :seperator => nil)
- to_field "issn_related", extract_marc("490x:440x:800x:400x:410x:411x:810x:811x:830x:700x:710x:711x:730x:780x:785x:777x:543x:760x:762x:765x:767x:770x:772x:775x:786x:787x", :seperator => nil)
+ to_field "issn", extract_marc("022a:022l:022y:773x:774x:776x", :separator => nil)
+ to_field "issn_related", extract_marc("490x:440x:800x:400x:410x:411x:810x:811x:830x:700x:710x:711x:730x:780x:785x:777x:543x:760x:762x:765x:767x:770x:772x:775x:786x:787x", :separator => nil)

  to_field "oclcnum_t", oclcnum

metadata CHANGED
@@ -2,14 +2,14 @@
  name: traject
  version: !ruby/object:Gem::Version
  prerelease:
- version: 0.13.1
+ version: 0.13.2
  platform: ruby
  authors:
  - Jonathan Rochkind
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2013-09-16 00:00:00.000000000 Z
+ date: 2013-09-23 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: marc
@@ -194,7 +194,7 @@ files:
  - lib/traject/marc_extractor.rb
  - lib/traject/marc_reader.rb
  - lib/traject/mock_reader.rb
- - lib/traject/mock_writer.rb
+ - lib/traject/null_writer.rb
  - lib/traject/qualified_const_get.rb
  - lib/traject/solrj_writer.rb
  - lib/traject/thread_pool.rb
@@ -244,6 +244,7 @@ files:
  - test/test_support/packed_041a_lang.marc
  - test/test_support/test_data.utf8.marc.xml
  - test/test_support/test_data.utf8.mrc
+ - test/test_support/test_data.utf8.mrc.gz
  - test/test_support/the_business_ren.marc
  - test/translation_map_test.rb
  - test/translation_maps/bad_ruby.rb
@@ -345,6 +346,7 @@ test_files:
  - test/test_support/packed_041a_lang.marc
  - test/test_support/test_data.utf8.marc.xml
  - test/test_support/test_data.utf8.mrc
+ - test/test_support/test_data.utf8.mrc.gz
  - test/test_support/the_business_ren.marc
  - test/translation_map_test.rb
  - test/translation_maps/bad_ruby.rb