traject 3.0.0 → 3.1.0.rc1
- checksums.yaml +4 -4
- data/.travis.yml +2 -4
- data/CHANGES.md +30 -0
- data/README.md +7 -4
- data/doc/indexing_rules.md +5 -6
- data/doc/programmatic_use.md +25 -1
- data/doc/settings.md +2 -0
- data/doc/xml.md +2 -0
- data/lib/traject/indexer.rb +32 -4
- data/lib/traject/indexer/context.rb +45 -0
- data/lib/traject/indexer/step.rb +8 -12
- data/lib/traject/line_writer.rb +36 -4
- data/lib/traject/nokogiri_reader.rb +9 -18
- data/lib/traject/solr_json_writer.rb +136 -21
- data/lib/traject/version.rb +1 -1
- data/test/indexer/class_level_configuration_test.rb +104 -0
- data/test/indexer/context_test.rb +64 -1
- data/test/indexer/error_handler_test.rb +18 -0
- data/test/nokogiri_reader_test.rb +56 -3
- data/test/solr_json_writer_test.rb +145 -7
- data/traject.gemspec +2 -2
- metadata +17 -9
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 06c28d37f9aafafe709a146c7612e5b5d8a5c58a61fd1502823a38dc52b9d05b
+  data.tar.gz: 2e38b2b8c4030456f3757ae6062231268110d68ef07e10cab722b4074ccd570c
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 04561a77a3e6f2073198983b5bf7d4e35cc9f52bccc1211487cc4c850b0f0b0fc9395a7c87e6ed90061f4a15af57516434d260c649fbc43ea65a0c6435194818
+  data.tar.gz: c7312156c3be556218e319e35ae76aa97fbae5fad6720dbce2e4a046ec90603f5de34fe2cb055425fb3da499922fba50c7d4a6445858793bb0a4fb26cf8f7b29
data/.travis.yml
CHANGED
@@ -6,13 +6,11 @@ sudo: true
 rvm:
   - 2.4.4
   - 2.5.1
-  - 2.6.0-preview2
+  - 2.6.1
 # avoid having travis install jdk on MRI builds where we don't need it.
 matrix:
   include:
     - jdk: openjdk8
       rvm: jruby-9.1.17.0
     - jdk: openjdk8
-      rvm: jruby-9.2.
-  allow_failures:
-    - rvm: "2.6.0-preview2"
+      rvm: jruby-9.2.6.0
data/CHANGES.md
CHANGED
@@ -1,5 +1,35 @@
 # Changes
 
+## 3.1.0
+
+### Added
+
+* Context#add_output is added, convenient for custom ruby code.
+
+      each_record do |record, context|
+        context.add_output "key", something_from(record)
+      end
+
+  https://github.com/traject/traject/pull/220
+
+* SolrJsonWriter
+
+  * Class-level indexer configuration, for custom indexer subclasses, now available with a class-level `configure` method. Warning: Indexers are still expensive to instantiate though. https://github.com/traject/traject/pull/213
+
+  * SolrJsonWriter has new settings to control commit semantics: `solr_writer.solr_update_args` and `solr_writer.commit_solr_update_args`, both with hash values that are Solr update handler query params. https://github.com/traject/traject/pull/215
+
+  * SolrJsonWriter has a `delete(solr-unique-key)` method. Does not currently use any batching or threading. https://github.com/traject/traject/pull/214
+
+  * SolrJsonWriter, when MaxSkippedRecordsExceeded is raised, it will have a #cause that is the last error which resulted in MaxSkippedRecordsExceeded. Some error reporting systems, including Rails, will automatically log #cause, so that's helpful. https://github.com/traject/traject/pull/216
+
+  * SolrJsonWriter now respects a `solr_writer.http_timeout` setting, in seconds, to be passed to the HTTPClient instance. https://github.com/traject/traject/pull/219
+
+* Nokogiri dependency for the NokogiriReader increased to `~> 1.9`. When using JRuby `each_record_xpath`, resulting yielded documents may have xmlns declarations on different nodes than in MRI (and previous versions of Nokogiri), but we could find no way around this with Nokogiri >= 1.9.0. The documents should still be semantically equivalent for namespace use. This was necessary to keep JRuby Nokogiri XML working with recent Nokogiri releases. https://github.com/traject/traject/pull/209
+
+* LineWriter guesses better about when to auto-close, and provides an optional explicit setting in case it guesses wrong. (thanks @justinlittman) https://github.com/traject/traject/pull/211
+
+* Traject::Indexer will now use a Logger(-compatible) instance passed in via the setting 'logger'. https://github.com/traject/traject/pull/217
+
 ## 3.0.0
 
 ### Changed/Backwards Incompatibilities
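For orientation, here is a minimal sketch (not taken from the gem's own docs) combining several of the 3.1.0 additions above in one configuration; the solr URL and logger target are placeholders.

```ruby
require 'logger'

# Hypothetical traject configuration exercising new 3.1.0 settings.
settings do
  provide "solr.url", "http://localhost:8983/solr/my_core"        # placeholder URL
  provide "solr_writer.solr_update_args", { commitWithin: 1000 }  # sent with every update request
  provide "solr_writer.http_timeout", 30                          # seconds, set on the HTTPClient
  provide "logger", Logger.new($stderr)                           # bypasses other log.* settings
end
```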
data/README.md
CHANGED
@@ -19,7 +19,7 @@ Initially by Jonathan Rochkind (Johns Hopkins Libraries) and Bill Dueber (Univer
 * Basic configuration files can be easily written even by non-rubyists, with a few simple directives traject provides. But config files are 'ruby all the way down', so we can provide a gradual slope to more complex needs, with the full power of ruby.
 * Easy to program, easy to read, easy to modify.
 * Fast. Traject by default indexes using multiple threads, on multiple cpu cores, when the underlying ruby implementation (i.e., JRuby) allows it, and can use a separate thread for communication with solr even under MRI. Traject is intended to be usable to process millions of records.
-* Composed of decoupled components, for flexibility and extensibility.
+* Composed of decoupled components, for flexibility and extensibility.f?
 * Designed to support local code and configuration that's maintainable and testable, and can be shared between projects as ruby gems.
 * Easy to split configuration between multiple files, for simple "pick-and-choose" command line options that can combine to deal with any of your local needs.
 
@@ -135,7 +135,7 @@ For the syntax and complete possibilities of the specification string argument t
 
 To see all options for `extract_marc`, see the [extract_marc](http://rdoc.info/gems/traject/Traject/Macros/Marc21:extract_marc) method documentation.
 
-### XML mode,
+### XML mode, extract_xpath
 
 See our [xml guide](./doc/xml.md) for more XML examples, but you will usually use extract_xpath.
 
@@ -311,12 +311,15 @@ like `to_field`, is executed for every record, but without being tied
 to a specific output field.
 
 `each_record` can be used for logging or notifying, computing intermediate
-results, or
+results, or more complex ruby logic.
 
 ~~~ruby
 each_record do |record|
   some_custom_logging(record)
 end
+each_record do |record, context|
+  context.add_output(:some_value, extract_some_value_from_record(record))
+end
 ~~~
 
 For more on `each_record`, see [Indexing Rules: Macros and Custom Logic](./doc/indexing_rules.md).
@@ -405,7 +408,7 @@ writer class in question.
 
 ## The traject command Line
 
-(If you are interested in running traject in an embedded/programmatic context instead of as a standalone command-line batch process, please see docs on [Programmatic Use](./
+(If you are interested in running traject in an embedded/programmatic context instead of as a standalone command-line batch process, please see docs on [Programmatic Use](./doc/programmatic_use.md) )
 
 The simplest invocation is:
 
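The renamed README heading above points at `extract_xpath`. As a quick illustration, a minimal XML-mode rule might look like the following sketch; the field name, xpath, and namespace mapping are invented for illustration and are not from the diff.

```ruby
# Hypothetical XML-mode indexing rule using extract_xpath with a namespace prefix.
to_field "title",
  extract_xpath("//dc:title", ns: { "dc" => "http://purl.org/dc/elements/1.1/" })
```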
data/doc/indexing_rules.md
CHANGED
@@ -247,13 +247,12 @@ each_record do |record, context|
 end
 
 each_record do |record, context|
-
+  if eligible_for_things?(record)
+    (val1, val2) = calculate_two_things_from(record)
 
-
-
-
-    context.output_hash["second_field"] ||= []
-    context.output_hash["second_field"] << val2
+    context.add_output("first_field", val1)
+    context.add_output("second_field", val2)
+  end
 end
 ~~~
 
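The rewrite above replaces manual `output_hash` bookkeeping with `Context#add_output`. A minimal sketch of the equivalence (field name and value are invented):

```ruby
each_record do |record, context|
  # manual style: create the array yourself, then append
  context.output_hash["subject"] ||= []
  context.output_hash["subject"] << "History"

  # add_output style: creates the array if needed, appends, and uniqs by default
  context.add_output("subject", "History")
end
```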
data/doc/programmatic_use.md
CHANGED
@@ -48,6 +48,30 @@ indexer = Traject::Indexer.new(settings) do
 end
 ```
 
+### Configuring indexer subclasses
+
+Indexing step configuration is historically done in traject at the indexer _instance_ level. Either programmatically or by applying a "configuration file" to an indexer instance.
+
+But you can also define your own indexer sub-class with indexing steps built-in, using the class-level `configure` method.
+
+This is an EXPERIMENTAL feature, implementation may change. https://github.com/traject/traject/pull/213
+
+```ruby
+class MyIndexer < Traject::Indexer
+  configure do
+    settings do
+      provide "solr.url", Rails.application.config.my_solr_url
+    end
+
+    to_field "our_name", literal("University of Whatever")
+  end
+end
+```
+
+These settings and indexing steps are now "hard-coded" into that subclass. You can still provide additional configuration at the instance level, as normal. You can also make a subclass of that `MyIndexer` class, which will inherit configuration from MyIndexer, and can supply its own additional class-level configuration too.
+
+Note that due to how implementation is done, instantiating an indexer is still _relatively_ expensive. (Class-level configuration is only actually executed on instantiation.) You will still get better performance by re-using a global instance of your indexer subclass, instead of, say, instantiating one per object to be indexed.
+
 ## Running the indexer
 
 ### process: probably not what you want
@@ -157,7 +181,7 @@ You may want to consider instead creating one or more configured "global" indexers...
 
 * Readers, and the Indexer#process method, are not thread-safe. Which is why using Indexer#process, which uses a fixed reader, is not thread-safe, and why when sharing a global indexer we want to use `process_record`, `map_record`, or `process_with` as above.
 
-It ought to be safe to use a global Indexer concurrently in several threads, with the `map_record`, `process_record` or `process_with` methods -- so long as your indexing rules and writers are thread-safe, as they usually will be and always ought to be.
+It ought to be safe to use a global Indexer concurrently in several threads, with the `map_record`, `process_record` or `process_with` methods -- so long as your indexing rules and writers are thread-safe, as they usually will be and always ought to be.
 
 ### An example
 
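Following the thread-safety notes above, a sketch of the "global indexer" pattern; the class name and record source are placeholders:

```ruby
# Hypothetical: instantiate the (expensive) indexer once, then reuse it.
MY_INDEXER = MyIndexer.new

records.each do |record|       # `records` is a placeholder enumerable
  # process_record/map_record are described above as safe for concurrent use,
  # unlike #process, which owns a fixed, non-thread-safe reader.
  MY_INDEXER.process_record(record)
end
```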
data/doc/settings.md
CHANGED
@@ -119,6 +119,8 @@ settings are applied first of all. It's recommended you use `provide`.
 
 * `log.batch_size.severity`: If `log.batch_size` is set, what logger severity level to log to. Default "INFO", set to "DEBUG" etc if desired.
 
+* 'logger': Ignore all the other logger settings, just pass a `Logger`-compatible logger instance in directly.
+
 
 
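A one-line sketch of the new `logger` setting described above (log file path is a placeholder):

```ruby
settings do
  provide "logger", Logger.new("traject.log")  # all other log.* settings are then ignored
end
```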
data/doc/xml.md
CHANGED
@@ -133,6 +133,8 @@ The NokogiriReader parser should be relatively performant though, allowing you t
 
 (There is a half-finished `ExperimentalStreamingNokogiriReader` available, but it is experimental, half-finished, may disappear or change in backwards compat at any time, problematic, not recommended for production use, etc.)
 
+Note also that in Jruby, when using `each_record_xpath` with the NokogiriReader, the extracted individual documents may have xmlns declarations in different places than you may expect, although they will still be semantically equivalent for namespace processing. This is due to the Nokogiri JRuby implementation, and we could find no good way to ensure consistent behavior with MRI. See: https://github.com/sparklemotion/nokogiri/issues/1875
+
 ### Jruby
 
 It may be that nokogiri JRuby is just much slower than nokogiri MRI (at least when namespaces are involved?) It may be that our workaround to a [JRuby bug involving namespaces on moving nodes](https://github.com/sparklemotion/nokogiri/issues/1774) doesn't help.
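For context, a sketch of a NokogiriReader configuration that triggers the behavior described in this note. The setting names follow traject's xml guide from memory and the xpath/namespace values are placeholders, so treat all of them as assumptions:

```ruby
settings do
  provide "reader_class_name", "Traject::NokogiriReader"
  provide "nokogiri.each_record_xpath", "//t:record"                 # assumed setting name
  provide "nokogiri.namespaces", { "t" => "http://example.org/top" } # assumed setting name
end
```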
data/lib/traject/indexer.rb
CHANGED
@@ -180,6 +180,7 @@ class Traject::Indexer
     @index_steps = []
     @after_processing_steps = []
 
+    self.class.apply_class_configure_block(self)
     instance_eval(&block) if block
   end
 
@@ -189,6 +190,30 @@
     instance_eval(&block)
   end
 
+  ## Class level configure block accepted too, and applied at instantiation
+  # before instance-level configuration.
+  #
+  # EXPERIMENTAL, implementation may change in ways that affect some uses.
+  # https://github.com/traject/traject/pull/213
+  #
+  # Note that settings set by 'provide' in subclass can not really be overridden
+  # by 'provide' in a next level subclass. Use self.default_settings instead, with
+  # call to super.
+  def self.configure(&block)
+    @class_configure_block = block
+  end
+
+  def self.apply_class_configure_block(instance)
+    # Make sure we inherit from superclass that has a class-level ivar @class_configure_block
+    if self.superclass.respond_to?(:apply_class_configure_block)
+      self.superclass.apply_class_configure_block(instance)
+    end
+    if @class_configure_block
+      instance.configure(&@class_configure_block)
+    end
+  end
+
+
 
   # Pass a string file path, a Pathname, or a File object, for
   # a config file to load into indexer.
@@ -258,10 +283,9 @@
       "log.batch_size.severity" => "info",
 
       # how to post-process the accumulator
-
-
-
-      "allow_empty_fields" => false
+      Traject::Indexer::ToFieldStep::ALLOW_NIL_VALUES => false,
+      Traject::Indexer::ToFieldStep::ALLOW_DUPLICATE_VALUES => true,
+      Traject::Indexer::ToFieldStep::ALLOW_EMPTY_FIELDS => false
     }.freeze
   end
 
@@ -349,6 +373,10 @@
 
   # Create logger according to settings
   def create_logger
+    if settings["logger"]
+      # none of the other settings matter, we just got a logger
+      return settings["logger"]
+    end
 
     logger_level = settings["log.level"] || "info"
 
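Since `apply_class_configure_block` above recurses into the superclass first, class-level configuration applies ancestor-first, before any instance-level configuration. A sketch (class names invented) of the resulting ordering:

```ruby
# Hypothetical subclasses illustrating the application order of configure blocks.
class BaseIndexer < Traject::Indexer
  configure { to_field "who", literal("base") }
end

class ChildIndexer < BaseIndexer
  configure { to_field "who", literal("child") }
end

ChildIndexer.new.map_record(Object.new)
# => {"who" => ["base", "child"]}   # superclass block applied first
```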
data/lib/traject/indexer/context.rb
CHANGED
@@ -82,6 +82,51 @@
     str
   end
 
+  # Add values to an array in context.output_hash with the specified key/field_name(s).
+  # Creates array in output_hash if currently nil.
+  #
+  # Post-processing/filtering:
+  #
+  # * uniqs accumulator, unless settings["allow_duplicate_values"] is set.
+  # * Removes nil values unless settings["allow_nil_values"] is set.
+  # * Will not add an empty array to output_hash (will leave it nil instead)
+  #   unless settings["allow_empty_fields"] is set.
+  #
+  # Multiple values can be added with multiple arguments (we avoid an array argument meaning
+  # multiple values to accommodate odd use cases where array itself is desired in output_hash value)
+  #
+  # @param field_name [String,Symbol,Array<String>,Array<Symbol>] A key to set in output_hash, or
+  #   an array of such keys.
+  #
+  # @example add one value
+  #   context.add_output(:additional_title, "a title")
+  #
+  # @example add multiple values as multiple params
+  #   context.add_output("additional_title", "a title", "another title")
+  #
+  # @example add multiple values as multiple params from array using ruby spread operator
+  #   context.add_output(:some_key, *array_of_values)
+  #
+  # @example add to multiple keys in output hash
+  #   context.add_output(["key1", "key2"], "value")
+  #
+  # @return [Traject::Context] self
+  #
+  # Note for historical reasons relevant settings key *names* are in constants in Traject::Indexer::ToFieldStep,
+  # but the settings don't just apply to ToFieldSteps
+  def add_output(field_name, *values)
+    values.compact! unless self.settings && self.settings[Traject::Indexer::ToFieldStep::ALLOW_NIL_VALUES]
+
+    return self if values.empty? and not (self.settings && self.settings[Traject::Indexer::ToFieldStep::ALLOW_EMPTY_FIELDS])
+
+    Array(field_name).each do |key|
+      accumulator = (self.output_hash[key.to_s] ||= [])
+      accumulator.concat values
+      accumulator.uniq! unless self.settings && self.settings[Traject::Indexer::ToFieldStep::ALLOW_DUPLICATE_VALUES]
+    end
+
+    return self
+  end
 end
 
 
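A usage sketch of `Context#add_output` with the settings documented above (keys and values invented):

```ruby
ctx = Traject::Indexer::Context.new
ctx.add_output(:key, "v", "v")
# => {"key" => ["v"]}  -- uniq'd by default

ctx = Traject::Indexer::Context.new
ctx.settings = { Traject::Indexer::ToFieldStep::ALLOW_DUPLICATE_VALUES => true }
ctx.add_output(:key, "v", "v")
# => {"key" => ["v", "v"]}
```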
data/lib/traject/indexer/step.rb
CHANGED
@@ -145,24 +145,20 @@ class Traject::Indexer
       return accumulator
     end
 
-
-    #
-    #
+
+    # These constants here for historical/legacy reasons, they really oughta
+    # live in Traject::Context, but in case anyone is referring to them
+    # we'll leave them here for now.
     ALLOW_NIL_VALUES = "allow_nil_values".freeze
     ALLOW_EMPTY_FIELDS = "allow_empty_fields".freeze
     ALLOW_DUPLICATE_VALUES = "allow_duplicate_values".freeze
 
+    # Add the accumulator to the context with the correct field name(s).
+    # Do post-processing on the accumulator (remove nil values, allow empty
+    # fields, etc)
     def add_accumulator_to_context!(accumulator, context)
-      accumulator.compact! unless context.settings[ALLOW_NIL_VALUES]
-      return if accumulator.empty? and not (context.settings[ALLOW_EMPTY_FIELDS])
-
       # field_name can actually be an array of field names
-
-      context.output_hash[a_field_name] ||= []
-
-      existing_accumulator = context.output_hash[a_field_name].concat(accumulator)
-      existing_accumulator.uniq! unless context.settings[ALLOW_DUPLICATE_VALUES]
-    end
+      context.add_output(field_name, *accumulator)
     end
   end
 
data/lib/traject/line_writer.rb
CHANGED
@@ -8,12 +8,35 @@ require 'thread'
 # This does not seem to effect performance much, as far as I could tell
 # benchmarking.
 #
-# Output will be sent to `settings["output_file"]` string path, or else
-# `settings["output_stream"]` (ruby IO object), or else stdout.
-#
 # This class can be sub-classed to write out different serialized
 # representations -- subclasses will just override the #serialize
 # method. For instance, see JsonWriter.
+#
+# ## Output
+#
+# The main functionality this class provides is logic for choosing based on
+# settings what file or bytestream to send output to.
+#
+# You can supply `settings["output_file"]` with a _file path_. LineWriter
+# will open up a `File` to write to.
+#
+# Or you can supply `settings["output_stream"]` with any ruby IO object, such as an
+# open `File` object or anything else.
+#
+# If neither are supplied, will write to `$stdout`.
+#
+# ## Closing the output stream
+#
+# The LineWriter tries to guess on whether it should call `close` on the output
+# stream it's writing to, when the LineWriter instance is closed. For instance,
+# if you passed in a `settings["output_file"]` with a path, and the LineWriter
+# opened up a `File` object for you, it should close it for you.
+#
+# But for historical reasons, LineWriter doesn't just use that signal, but tries
+# to guess generally on when to call close. If for some reason it gets it wrong,
+# just use `settings["close_output_on_close"]` set to `true` or `false`.
+# (String `"true"` or `"false"` are also acceptable, for convenience in setting
+# options on command line)
 class Traject::LineWriter
   attr_reader :settings
   attr_reader :write_mutex, :output_file
@@ -57,7 +80,16 @@
   end
 
   def close
-    @output_file.close
+    @output_file.close if should_close_stream?
+  end
+
+  def should_close_stream?
+    if settings["close_output_on_close"].nil?
+      (@output_file.nil? || @output_file.tty? || @output_file == $stdout || $output_file == $stderr)
+    else
+      settings["close_output_on_close"].to_s == "true"
+    end
   end
 
+
 end
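A sketch of overriding the auto-close guess described above; the file name is a placeholder:

```ruby
settings do
  provide "output_file", "records.json"
  provide "close_output_on_close", "false"  # string "false"/"true" accepted for command-line use
end
```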
data/lib/traject/nokogiri_reader.rb
CHANGED
@@ -118,35 +118,26 @@ module Traject
     private
 
 
-    #
+    # We simply do `new_parent_doc.root = node`
     # It seemed maybe safer to dup the node as well as remove the original from the original doc,
     # but I believe this will result in double memory usage, as unlinked nodes aren't GC'd until
     # their doc is. I am hoping this pattern results in less memory usage.
     # https://github.com/sparklemotion/nokogiri/issues/1703
     #
-    #
-    # when re-parenting a node.
+    # We used to have to do something different in Jruby to work around bug:
     # https://github.com/sparklemotion/nokogiri/issues/1774
     #
-    #
-    #
-    #
-    #
+    # But as of nokogiri 1.9, that does not work, and is not necessary if we accept
+    # that Jruby nokogiri may put xmlns declarations on different elements than MRI,
+    # although it should be semantically equivalent for a namespace-aware parser.
+    # https://github.com/sparklemotion/nokogiri/issues/1875
+    #
+    # This as a separate method now exists largely as a historical artifact, and for this
+    # documentation.
    def reparent_node_to_root(new_parent_doc, node)
-      if Traject::Util.is_jruby?
-        original_ns_scopes = node.namespace_scopes
-      end
 
       new_parent_doc.root = node
 
-      if Traject::Util.is_jruby?
-        original_ns_scopes.each do |ns|
-          if new_parent_doc.at_xpath("//#{ns.prefix}:*", ns.prefix => ns.href)
-            new_parent_doc.root.add_namespace(ns.prefix, ns.href)
-          end
-        end
-      end
-
       return new_parent_doc
     end
 
data/lib/traject/solr_json_writer.rb
CHANGED
@@ -16,7 +16,30 @@ require 'concurrent' # for atomic_fixnum
 # This should work under both MRI and JRuby, with JRuby getting much
 # better performance due to the threading model.
 #
-#
+# Solr updates are by default sent with no commit params. This will definitely
+# maximize your performance, and *especially* for bulk/batch indexing is recommended --
+# use Solr auto commit in your Solr configuration instead, possibly with `commit_on_close`
+# setting here.
+#
+# However, if you want the writer to send `commitWithin=true`, `commit=true`,
+# `softCommit=true`, or any other URL parameters valid for Solr update handlers,
+# you can configure this with `solr_writer.solr_update_args` setting. See:
+# https://lucene.apache.org/solr/guide/7_0/near-real-time-searching.html#passing-commit-and-commitwithin-parameters-as-part-of-the-url
+# Eg:
+#
+#     settings do
+#       provide "solr_writer.solr_update_args", { commitWithin: 1000 }
+#     end
+#
+# (That it's a hash makes it infeasible to set/override on command line, if this is
+# annoying for you let us know)
+#
+# `solr_update_args` will apply to batch and individual update requests, but
+# not to commit sent if `commit_on_close`. You can also instead set
+# `solr_writer.solr_commit_args` for that (or pass in an arg to #commit if calling
+# manually)
+#
+# ## Relevant settings
 #
 # * solr.url (optional if solr.update_url is set) The URL to the solr core to index into
 #
@@ -35,19 +58,32 @@ require 'concurrent' # for atomic_fixnum
 #
 # * solr_writer.skippable_exceptions: List of classes that will be rescued internal to
 #   SolrJsonWriter, and handled with max_skipped logic. Defaults to
-#   `[HTTPClient::TimeoutError, SocketError, Errno::ECONNREFUSED]`
+#   `[HTTPClient::TimeoutError, SocketError, Errno::ECONNREFUSED, Traject::SolrJsonWriter::BadHttpResponse]`
+#
+# * solr_writer.solr_update_args: A _hash_ of query params to send to solr update url.
+#   Will be sent with every update request. Eg `{ softCommit: true }` or `{ commitWithin: 1000 }`.
+#   See also `solr_writer.solr_commit_args`
 #
 # * solr_writer.commit_on_close: Set to true (or "true") if you want to commit at the
 #   end of the indexing run. (Old "solrj_writer.commit_on_close" supported for backwards
 #   compat only.)
 #
+# * solr_writer.commit_solr_update_args: A hash of query params to send when committing.
+#   Will be used for automatic `close_on_commit`, as well as any manual calls to #commit.
+#   If set, must include {"commit" => "true"} or { "softCommit" => "true" } if you actually
+#   want commits to happen when SolrJsonWriter tries to commit! But can be used to switch to softCommits
+#   (hard commits default), or specify additional params like optimize etc.
+#
+# * solr_writer.http_timeout: Value in seconds, will be set on the httpclient as connect/receive/send
+#   timeout. No way to set them individually at present. Default nil, use HTTPClient defaults
+#   (60 for connect/receive, 120 for send).
+#
 # * solr_writer.commit_timeout: If commit_on_close, how long to wait for Solr before
-#   giving up as a timeout. Default 10 minutes. Solr can be slow.
+#   giving up as a timeout (http client receive_timeout). Default 10 minutes. Solr can be slow at commits. Overrides solr_writer.timeout
 #
 # * solr_json_writer.http_client Mainly intended for testing, set your own HTTPClient
 #   or mock object to be used for HTTP.
-
-
+#
 class Traject::SolrJsonWriter
   include Traject::QualifiedConstGet
 
@@ -71,7 +107,15 @@ class Traject::SolrJsonWriter
       @max_skipped = nil
     end
 
-    @http_client = @settings["solr_json_writer.http_client"]
+    @http_client = if @settings["solr_json_writer.http_client"]
+      @settings["solr_json_writer.http_client"]
+    else
+      client = HTTPClient.new
+      if @settings["solr_writer.http_timeout"]
+        client.connect_timeout = client.receive_timeout = client.send_timeout = @settings["solr_writer.http_timeout"]
+      end
+      client
+    end
 
     @batch_size = (settings["solr_writer.batch_size"] || DEFAULT_BATCH_SIZE).to_i
     @batch_size = 1 if @batch_size < 1
@@ -96,6 +140,9 @@ class Traject::SolrJsonWriter
     # Figure out where to send updates
     @solr_update_url = self.determine_solr_update_url
 
+    @solr_update_args = settings["solr_writer.solr_update_args"]
+    @commit_solr_update_args = settings["solr_writer.commit_solr_update_args"]
+
     logger.info("   #{self.class.name} writing to '#{@solr_update_url}' in batches of #{@batch_size} with #{@thread_pool_size} bg threads")
   end
 
@@ -123,14 +170,25 @@ class Traject::SolrJsonWriter
     send_batch( Traject::Util.drain_queue(@batched_queue) )
   end
 
+  # configured update url, with either settings @solr_update_args or passed in
+  # query_params added to it
+  def solr_update_url_with_query(query_params)
+    if query_params
+      @solr_update_url + '?' + URI.encode_www_form(query_params)
+    else
+      @solr_update_url
+    end
+  end
+
   # Send the given batch of contexts. If something goes wrong, send
   # them one at a time.
   # @param [Array<Traject::Indexer::Context>] an array of contexts
   def send_batch(batch)
     return if batch.empty?
+
     json_package = JSON.generate(batch.map { |c| c.output_hash })
+
     begin
-      resp = @http_client.post @
+      resp = @http_client.post solr_update_url_with_query(@solr_update_args), json_package, "Content-type" => "application/json"
     rescue StandardError => exception
     end
 
@@ -153,30 +211,55 @@ class Traject::SolrJsonWriter
   def send_single(c)
     json_package = JSON.generate([c.output_hash])
     begin
-      resp = @http_client.post @
-      # Catch Timeouts and network errors as skipped records, but otherwise
-      # allow unexpected errors to propagate up.
-    rescue *skippable_exceptions => exception
-      # no body, local variable exception set above will be used below
-    end
+      resp = @http_client.post solr_update_url_with_query(@solr_update_args), json_package, "Content-type" => "application/json"
 
-
-
-
+      unless resp.status == 200
+        raise BadHttpResponse.new("Unexpected HTTP response status #{resp.status}", resp)
+      end
+
+    # Catch Timeouts and network errors -- as well as non-200 http responses --
+    # as skipped records, but otherwise allow unexpected errors to propagate up.
+    rescue *skippable_exceptions => exception
+      msg = if exception.kind_of?(BadHttpResponse)
+        "Solr error response: #{exception.response.status}: #{exception.response.body}"
       else
-
+        Traject::Util.exception_to_log_message(exception)
       end
+
       logger.error "Could not add record #{c.record_inspect}: #{msg}"
       logger.debug("\t" + exception.backtrace.join("\n\t")) if exception
       logger.debug(c.source_record.to_s) if c.source_record
 
       @skipped_record_incrementer.increment
       if @max_skipped and skipped_record_count > @max_skipped
+        # re-raising in rescue means the last encountered error will be available as #cause
+        # on raised exception, a feature in ruby 2.1+.
         raise MaxSkippedRecordsExceeded.new("#{self.class.name}: Exceeded maximum number of skipped records (#{@max_skipped}): aborting")
       end
-
     end
+  end
 
+  # Very beginning of a delete implementation. POSTs a delete request to solr
+  # for id in arg (value of Solr UniqueID field, usually `id` field).
+  #
+  # Right now, does it inline and immediately, no use of background threads or batching.
+  # This could change.
+  #
+  # Right now, if unsuccessful for any reason, will raise immediately out of here.
+  # Could raise any of the `skippable_exceptions` (timeouts, network errors), an
+  # exception will be raised right out of here.
+  #
+  # Will use `solr_writer.solr_update_args` settings.
+  #
+  # There is no built-in way to direct a record to be deleted from an indexing config
+  # file at the moment, this is just a loose method on the writer.
+  def delete(id)
+    json_package = {delete: id}
+    resp = @http_client.post solr_update_url_with_query(@solr_update_args), JSON.generate(json_package), "Content-type" => "application/json"
+    if resp.status != 200
+      raise RuntimeError.new("Could not delete #{id.inspect}, http response #{resp.status}: #{resp.body}")
+    end
   end
 
@@ -220,14 +303,32 @@ class Traject::SolrJsonWriter
 
 
   # Send a commit
-
+  #
+  # Called automatically by `close_on_commit` setting, but also can be called manually.
+  #
+  # If settings `solr_writer.commit_solr_update_args` is set, will be used by default.
+  # That setting needs `{ commit: true }` or `{softCommit: true}` if you want it to
+  # actually do a commit!
+  #
+  # Optional query_params argument is the actual args to send, you must be sure
+  # to make it include "commit: true" or "softCommit: true" for it to actually commit!
+  # But you may want to include other params too, like optimize etc. query_param
+  # argument replaces setting `solr_writer.commit_solr_update_args`, they are not merged.
+  #
+  # @param [Hash] query_params optional query params to send to solr update. Default {"commit" => "true"}
+  #
+  # @example @writer.commit
+  # @example @writer.commit(softCommit: true)
+  # @example @writer.commit(commit: true, optimize: true, waitFlush: false)
+  def commit(query_params = nil)
+    query_params ||= @commit_solr_update_args || {"commit" => "true"}
     logger.info "#{self.class.name} sending commit to solr at url #{@solr_update_url}..."
 
     original_timeout = @http_client.receive_timeout
 
     @http_client.receive_timeout = (settings["commit_timeout"] || (10 * 60)).to_i
 
-    resp = @http_client.get(
+    resp = @http_client.get(solr_update_url_with_query(query_params))
     unless resp.status == 200
       raise RuntimeError.new("Could not commit to Solr: #{resp.status} #{resp.body}")
     end
@@ -279,10 +380,24 @@ class Traject::SolrJsonWriter
 
   class MaxSkippedRecordsExceeded < RuntimeError ; end
 
+  # Adapted from HTTPClient::BadResponseError.
+  # It's got a #response accessor that will give you the HTTPClient
+  # Response object that had a bad status, although relying on that
+  # would tie you to our HTTPClient implementation that maybe should
+  # be considered an implementation detail, so I dunno.
+  class BadHttpResponse < RuntimeError
+    # HTTP::Message:: a response
+    attr_reader :response
+
+    def initialize(msg, response = nil) # :nodoc:
+      super(msg)
+      @response = response
+    end
+  end
 
   private
 
   def skippable_exceptions
-    @skippable_exceptions ||= (settings["solr_writer.skippable_exceptions"] || [HTTPClient::TimeoutError, SocketError, Errno::ECONNREFUSED])
+    @skippable_exceptions ||= (settings["solr_writer.skippable_exceptions"] || [HTTPClient::TimeoutError, SocketError, Errno::ECONNREFUSED, Traject::SolrJsonWriter::BadHttpResponse])
   end
 end
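A usage sketch of the new writer methods shown above; construction of the writer is elided and the id is a placeholder:

```ruby
writer.delete("some-unique-id")   # immediate, unbatched delete by Solr unique key
writer.commit(softCommit: true)   # manual commit; arg replaces commit_solr_update_args
```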
data/lib/traject/version.rb
CHANGED
@@ -1,3 +1,3 @@
 module Traject
-  VERSION = "3.0.0"
+  VERSION = "3.1.0.rc1"
 end

data/test/indexer/class_level_configuration_test.rb
ADDED
@@ -0,0 +1,104 @@
+require 'test_helper'
+
+describe "Class-level configuration of Indexer sub-class" do
+  # Declaring a class inline in minitest isn't great, this really is a globally
+  # available class now, other tests shouldn't re-use this class name. But it works
+  # for testing for now.
+  class TestIndexerSubclass < Traject::Indexer
+    configure do
+      settings do
+        provide "class_level", "TestIndexerSubclass"
+      end
+
+      to_field "field", literal("value")
+      each_record do |rec, context|
+        context.output_hash["from_each_record"] ||= []
+        context.output_hash["from_each_record"] << "value"
+      end
+    end
+
+    def self.default_settings
+      @default_settings ||= super.merge(
+        "set_by_default_setting_no_override" => "TestIndexerSubclass",
+        "set_by_default_setting" => "TestIndexerSubclass"
+      )
+    end
+  end
+
+  before do
+    @indexer = TestIndexerSubclass.new
+  end
+
+  it "uses class-level configuration" do
+    result = @indexer.map_record(Object.new)
+
+    assert_equal ['value'], result['field']
+    assert_equal ['value'], result['from_each_record']
+  end
+
+  it "uses class-level configuration and instance-level configuration" do
+    @indexer.configure do
+      to_field "field", literal("from-instance-config")
+      to_field "instance_field", literal("from-instance-config")
+    end
+
+    result = @indexer.map_record(Object.new)
+    assert_equal ['value', 'from-instance-config'], result['field']
+    assert_equal ['from-instance-config'], result["instance_field"]
+  end
+
+  describe "with multi-level subclass" do
+    class TestIndexerSubclassSubclass < TestIndexerSubclass
+      configure do
+        settings do
+          provide "class_level", "TestIndexerSubclassSubclass"
+        end
+
+        to_field "field", literal("from-sub-subclass")
+        to_field "subclass_field", literal("from-sub-subclass")
+      end
+
+      def self.default_settings
+        @default_settings ||= super.merge(
+          "set_by_default_setting" => "TestIndexerSubclassSubclass"
+        )
+      end
+    end
+
+    before do
+      @indexer = TestIndexerSubclassSubclass.new
+    end
+
+    it "lets subclass override settings 'provide'" do
+      skip("This would be nice but is currently architecturally hard")
+      assert_equal "TestIndexerSubclassSubclass", @indexer.settings["class_level"]
+    end
+
+    it "lets subclass override default settings" do
+      assert_equal "TestIndexerSubclassSubclass", @indexer.settings["set_by_default_setting"]
+      assert_equal "TestIndexerSubclass", @indexer.settings["set_by_default_setting_no_override"]
+    end
+
+    it "uses configuration from all inheritance" do
+      result = @indexer.map_record(Object.new)
+
+      assert_equal ['value', 'from-sub-subclass'], result['field']
+      assert_equal ['value'], result['from_each_record']
+      assert_equal ['from-sub-subclass'], result['subclass_field']
+    end
+
+    it "uses configuration from all inheritance plus instance" do
+      @indexer.configure do
+        to_field "field", literal("from-instance")
+        to_field "instance_field", literal("from-instance")
+      end
+
+      result = @indexer.map_record(Object.new)
+
+      assert_equal ['value', 'from-sub-subclass', 'from-instance'], result['field']
+      assert_equal ['from-instance'], result['instance_field']
+    end
+  end
+
+end
data/test/indexer/context_test.rb
CHANGED
@@ -38,8 +38,71 @@ describe "Traject::Indexer::Context" do
 
     assert_equal "<record ##{@position} (#{@input_name} ##{@position_in_input}), source_id:#{@record_001} output_id:output_id>", @context.record_inspect
   end
-
 end
 
+  describe "#add_output" do
+    before do
+      @context = Traject::Indexer::Context.new
+    end
+
+    it "adds one value to nil" do
+      @context.add_output(:key, "value")
+      assert_equal @context.output_hash, { "key" => ["value"] }
+    end
+
+    it "adds multiple values to nil" do
+      @context.add_output(:key, "value1", "value2")
+      assert_equal @context.output_hash, { "key" => ["value1", "value2"] }
+    end
+
+    it "adds one value to existing accumulator" do
+      @context.output_hash["key"] = ["value1"]
+      @context.add_output(:key, "value2")
+      assert_equal @context.output_hash, { "key" => ["value1", "value2"] }
+    end
+
+    it "uniqs by default" do
+      @context.output_hash["key"] = ["value1"]
+      @context.add_output(:key, "value1")
+      assert_equal @context.output_hash, { "key" => ["value1"] }
+    end
+
+    it "does not unique if allow_duplicate_values" do
+      @context.settings = { Traject::Indexer::ToFieldStep::ALLOW_DUPLICATE_VALUES => true }
+      @context.output_hash["key"] = ["value1"]
+
+      @context.add_output(:key, "value1")
+      assert_equal @context.output_hash, { "key" => ["value1", "value1"] }
+    end
+
+    it "ignores nil values by default" do
+      @context.add_output(:key, "value1", nil, "value2")
+      assert_equal @context.output_hash, { "key" => ["value1", "value2"] }
+    end
+
+    it "allows nil values if allow_nil_values" do
+      @context.settings = { Traject::Indexer::ToFieldStep::ALLOW_NIL_VALUES => true }
 
+      @context.add_output(:key, "value1", nil, "value2")
+      assert_equal @context.output_hash, { "key" => ["value1", nil, "value2"] }
+    end
+
+    it "ignores empty array by default" do
+      @context.add_output(:key)
+      @context.add_output(:key, nil)
+
+      assert_nil @context.output_hash["key"]
+    end
+
+    it "allows empty field if allow_empty_fields" do
+      @context.settings = { Traject::Indexer::ToFieldStep::ALLOW_EMPTY_FIELDS => true }
+
+      @context.add_output(:key, nil)
+      assert_equal @context.output_hash, { "key" => [] }
+    end
+
+    it "can add to multiple fields" do
+      @context.add_output(["field1", "field2"], "value1", "value2")
+      assert_equal @context.output_hash, { "field1" => ["value1", "value2"], "field2" => ["value1", "value2"] }
+    end
+  end
 end
data/test/indexer/error_handler_test.rb
CHANGED
@@ -56,4 +56,22 @@ describe 'Custom mapping error handler'
 
     assert_nil indexer.map_record({})
   end
+
+  it "uses logger from settings" do
+    desired_logger = Logger.new("/dev/null")
+    set_logger = nil
+    indexer.configure do
+      settings do
+        provide "logger", desired_logger
+        provide "mapping_rescue", -> (ctx, e) {
+          set_logger = ctx.logger
+        }
+      end
+      to_field 'id' do |_context , _exception|
+        raise 'this was always going to fail'
+      end
+    end
+    indexer.map_record({})
+    assert_equal desired_logger.object_id, set_logger.object_id
+  end
 end
data/test/nokogiri_reader_test.rb
CHANGED
@@ -1,6 +1,12 @@
 require 'test_helper'
 require 'traject/nokogiri_reader'
 
+# Note that JRuby Nokogiri can treat namespaces differently than MRI nokogiri.
+# Particularly when we extract elements from a larger document with `each_record_xpath`,
+# and put them in their own document, in JRuby nokogiri the xmlns declarations
+# can end up on different elements than expected, although the document should
+# be semantically equivalent to an XML-namespace-aware processor. See:
+# https://github.com/sparklemotion/nokogiri/issues/1875
 describe "Traject::NokogiriReader" do
   describe "with namespaces" do
     before do
@@ -80,8 +86,22 @@ describe "Traject::NokogiriReader" do
       assert yielded_records.length > 0
 
       expected_namespaces = {"xmlns"=>"http://example.org/top", "xmlns:a"=>"http://example.org/a", "xmlns:b"=>"http://example.org/b"}
-
-
+
+      if !Traject::Util.is_jruby?
+        yielded_records.each do |rec|
+          assert_equal expected_namespaces, rec.namespaces
+        end
+      else
+        # jruby nokogiri shuffles things around, all we can really do is test that the namespaces
+        # are somewhere in the doc :( We rely on other tests to test semantic equivalence.
+        yielded_records.each do |rec|
+          assert_equal expected_namespaces, rec.collect_namespaces
+        end
+
+        whole_doc = Nokogiri::XML.parse(File.open(support_file_path("namespace-test.xml")))
+        whole_doc.xpath("//mytop:record", mytop: "http://example.org/top").each_with_index do |original_el, i|
+          assert ns_semantic_equivalent_xml?(original_el, yielded_records[i])
+        end
       end
     end
   end
@@ -139,7 +159,40 @@ describe "Traject::NokogiriReader" do
 
     assert_length manually_extracted.size, yielded_records
     assert yielded_records.all? {|r| r.kind_of? Nokogiri::XML::Document }
-
+
+    expected_xml = manually_extracted
+    actual_xml = yielded_records.collect(&:root)
+
+    expected_xml.size.times do |i|
+      if !Traject::Util.is_jruby?
+        assert_equal expected_xml[i-1].to_xml, actual_xml[i-1].to_xml
+      else
+        # jruby shuffles the xmlns declarations around, but they should
+        # be semantically equivalent to a namespace-aware processor
+        assert ns_semantic_equivalent_xml?(expected_xml[i-1], actual_xml[i-1])
+      end
+    end
+  end
+
+  # Jruby nokogiri can shuffle around where the `xmlns:ns` declarations appear, although it
+  # _ought_ not to be semantically different for a namespace-aware parser -- nodes are still in
+  # same namespaces. JRuby may differ from what MRI does with same code, and may differ from
+  # the way an element appeared in input when extracting records from a larger input doc.
+  # There isn't much we can do about this, but we can write a recursive method
+  # that hopefully compares XML to make sure it really is semantically equivalent to
+  # a namespace-aware parser, and hope we got that right.
+  def ns_semantic_equivalent_xml?(noko_a, noko_b)
+    noko_a = noko_a.root if noko_a.kind_of?(Nokogiri::XML::Document)
+    noko_b = noko_b.root if noko_b.kind_of?(Nokogiri::XML::Document)
+
+    noko_a.name == noko_b.name &&
+      noko_a.namespace&.prefix == noko_b.namespace&.prefix &&
+      noko_a.namespace&.href == noko_b.namespace&.href &&
+      noko_a.attributes == noko_b.attributes &&
+      noko_a.children.length == noko_b.children.length &&
+      noko_a.children.each_with_index.all? do |a_child, index|
+        ns_semantic_equivalent_xml?(a_child, noko_b.children[index])
+      end
   end
 
   describe "without each_record_xpath" do
data/test/solr_json_writer_test.rb
CHANGED
@@ -137,6 +137,26 @@ describe "Traject::SolrJsonWriter" do
     assert_length 1, JSON.parse(post_args[1][1]), "second batch posted with last remaining doc"
   end
 
+  it "retries batch as individual records on failure" do
+    @writer = create_writer("solr_writer.batch_size" => 2, "solr_writer.max_skipped" => 10)
+    @fake_http_client.response_status = 500
+
+    2.times do |i|
+      @writer.put context_with({"id" => "doc_#{i}", "key" => "value"})
+    end
+    @writer.close
+
+    # 1 batch, then 2 for re-trying each individually
+    assert_length 3, @fake_http_client.post_args
+
+    batch_update = @fake_http_client.post_args.first
+    assert_length 2, JSON.parse(batch_update[1])
+
+    individual_update1, individual_update2 = @fake_http_client.post_args[1], @fake_http_client.post_args[2]
+    assert_length 1, JSON.parse(individual_update1[1])
+    assert_length 1, JSON.parse(individual_update2[1])
+  end
+
   it "can #flush" do
     2.times do |i|
       doc = {"id" => "doc_#{i}", "key" => "value"}
@@ -150,15 +170,116 @@ describe "Traject::SolrJsonWriter" do
     assert_length 1, @fake_http_client.post_args, "Has flushed to solr"
   end
 
-
-
-
-
+  describe "commit" do
+    it "commits on close when set" do
+      @writer = create_writer("solr.url" => "http://example.com", "solr_writer.commit_on_close" => "true")
+      @writer.put context_with({"id" => "one", "key" => ["value1", "value2"]})
+      @writer.close
+
+      last_solr_get = @fake_http_client.get_args.last
+
+      assert_equal "http://example.com/update/json?commit=true", last_solr_get[0]
+    end
+
+    it "commits on close with commit_solr_update_args" do
+      @writer = create_writer(
+        "solr.url" => "http://example.com",
+        "solr_writer.commit_on_close" => "true",
+        "solr_writer.commit_solr_update_args" => { softCommit: true }
+      )
+      @writer.put context_with({"id" => "one", "key" => ["value1", "value2"]})
+      @writer.close
+
+      last_solr_get = @fake_http_client.get_args.last
+
+      assert_equal "http://example.com/update/json?softCommit=true", last_solr_get[0]
+    end
 
-
+    it "can manually send commit" do
+      @writer = create_writer("solr.url" => "http://example.com")
+      @writer.commit
+
+      last_solr_get = @fake_http_client.get_args.last
+      assert_equal "http://example.com/update/json?commit=true", last_solr_get[0]
+    end
+
+    it "can manually send commit with specified args" do
+      @writer = create_writer("solr.url" => "http://example.com", "solr_writer.commit_solr_update_args" => { softCommit: true })
+      @writer.commit(commit: true, optimize: true, waitFlush: false)
+      last_solr_get = @fake_http_client.get_args.last
+      assert_equal "http://example.com/update/json?commit=true&optimize=true&waitFlush=false", last_solr_get[0]
+    end
+
+    it "uses commit_solr_update_args settings by default" do
+      @writer = create_writer(
+        "solr.url" => "http://example.com",
+        "solr_writer.commit_solr_update_args" => { softCommit: true }
+      )
+      @writer.commit
+
+      last_solr_get = @fake_http_client.get_args.last
+      assert_equal "http://example.com/update/json?softCommit=true", last_solr_get[0]
+    end
+
+    it "overrides commit_solr_update_args with method arg" do
+      @writer = create_writer(
+        "solr.url" => "http://example.com",
+        "solr_writer.commit_solr_update_args" => { softCommit: true, foo: "bar" }
+      )
+      @writer.commit(commit: true)
 
-
-
+      last_solr_get = @fake_http_client.get_args.last
+      assert_equal "http://example.com/update/json?commit=true", last_solr_get[0]
+    end
+  end
+
+  describe "solr_writer.solr_update_args" do
+    before do
+      @writer = create_writer("solr_writer.solr_update_args" => { softCommit: true } )
+    end
+
+    it "sends update args" do
+      @writer.put context_with({"id" => "one", "key" => ["value1", "value2"]})
+      @writer.close
+
+      assert_equal 1, @fake_http_client.post_args.count
+
+      post_args = @fake_http_client.post_args.first
+
+      assert_equal "http://example.com/solr/update/json?softCommit=true", post_args[0]
+    end
+
+    it "sends update args with delete" do
+      @writer.delete("test-id")
+      @writer.close
+
+      assert_equal 1, @fake_http_client.post_args.count
+
+      post_args = @fake_http_client.post_args.first
+
+      assert_equal "http://example.com/solr/update/json?softCommit=true", post_args[0]
+    end
+
+    it "sends update args on individual-retry after batch failure" do
+      @writer = create_writer(
+        "solr_writer.batch_size" => 2,
+        "solr_writer.max_skipped" => 10,
+        "solr_writer.solr_update_args" => { softCommit: true }
+      )
+      @fake_http_client.response_status = 500
+
+      2.times do |i|
+        @writer.put context_with({"id" => "doc_#{i}", "key" => "value"})
+      end
+      @writer.close
+
+      # 1 batch, then 2 for re-trying each individually
+      assert_length 3, @fake_http_client.post_args
+
+      individual_update1, individual_update2 = @fake_http_client.post_args[1], @fake_http_client.post_args[2]
+      assert_equal "http://example.com/solr/update/json?softCommit=true", individual_update1[0]
+      assert_equal "http://example.com/solr/update/json?softCommit=true", individual_update2[0]
+    end
   end
 
   describe "skipped records" do
@@ -225,6 +346,23 @@ describe "Traject::SolrJsonWriter" do
       logged = strio.string
       assert_includes logged, 'ArgumentError: bad stuff'
     end
+  end
+
+  describe "#delete" do
+    it "deletes" do
+      id = "123456"
+      @writer.delete(id)
+
+      post_args = @fake_http_client.post_args.first
+      assert_equal "http://example.com/solr/update/json", post_args[0]
+      assert_equal JSON.generate({"delete" => id}), post_args[1]
+    end
 
+    it "raises on non-200 http response" do
+      @fake_http_client.response_status = 500
+      assert_raises(RuntimeError) do
+        @writer.delete("12345")
+      end
+    end
   end
 end
data/traject.gemspec
CHANGED
@@ -31,9 +31,9 @@ Gem::Specification.new do |spec|
   spec.add_dependency "httpclient", "~> 2.5"
   spec.add_dependency "http", "~> 3.0" # used in oai_pmh_reader, may use more extensively in future instead of httpclient
   spec.add_dependency 'marc-fastxmlwriter', '~>1.0' # fast marc->xml
-  spec.add_dependency "nokogiri", "~> 1.
+  spec.add_dependency "nokogiri", "~> 1.9" # NokogiriIndexer
 
-  spec.add_development_dependency
+  spec.add_development_dependency 'bundler', '>= 1.7', '< 3'
 
   spec.add_development_dependency "rake"
   spec.add_development_dependency "minitest"
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: traject
 version: !ruby/object:Gem::Version
-  version: 3.0.
+  version: 3.1.0.rc1
 platform: ruby
 authors:
 - Jonathan Rochkind
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2019-04-10 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: concurrent-ruby
@@ -149,28 +149,34 @@ dependencies:
   requirements:
   - - "~>"
     - !ruby/object:Gem::Version
-      version: '1.
+      version: '1.9'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.
+        version: '1.9'
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - "
+    - - ">="
       - !ruby/object:Gem::Version
         version: '1.7'
+    - - "<"
+      - !ruby/object:Gem::Version
+        version: '3'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - "
+    - - ">="
       - !ruby/object:Gem::Version
         version: '1.7'
+    - - "<"
+      - !ruby/object:Gem::Version
+        version: '3'
 - !ruby/object:Gem::Dependency
   name: rake
   requirement: !ruby/object:Gem::Requirement
@@ -292,6 +298,7 @@ files:
 - test/debug_writer_test.rb
 - test/delimited_writer_test.rb
 - test/experimental_nokogiri_streaming_reader_test.rb
+- test/indexer/class_level_configuration_test.rb
 - test/indexer/context_test.rb
 - test/indexer/each_record_test.rb
 - test/indexer/error_handler_test.rb
@@ -381,12 +388,12 @@ required_ruby_version: !ruby/object:Gem::Requirement
     version: '0'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
-  - - "
+  - - ">"
     - !ruby/object:Gem::Version
-      version:
+      version: 1.3.1
 requirements: []
 rubyforge_project:
-rubygems_version: 2.7.
+rubygems_version: 2.7.6
 signing_key:
 specification_version: 4
 summary: An easy to use, high-performance, flexible and extensible metadata transformation
@@ -395,6 +402,7 @@ test_files:
 - test/debug_writer_test.rb
 - test/delimited_writer_test.rb
 - test/experimental_nokogiri_streaming_reader_test.rb
+- test/indexer/class_level_configuration_test.rb
 - test/indexer/context_test.rb
 - test/indexer/each_record_test.rb
 - test/indexer/error_handler_test.rb