RubyGems - traject - Versions diffs - 2.3.1 → 2.3.2 - Mend

traject 2.3.1 → 2.3.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

checksums.yaml +4 -4
data/CHANGES.md +8 -0
data/README.md +7 -6
data/doc/indexing_rules.md +3 -3
data/lib/traject/macros/marc21.rb +1 -1
data/lib/traject/macros/marc21_semantics.rb +3 -1
data/lib/traject/version.rb +1 -1
data/test/indexer/macros_marc21_test.rb +2 -0
metadata +2 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: dee0f24c94b13285ce7e8408c7d8ccbd796e0c7a
-  data.tar.gz: 9ff0d8523063acaec87aaba83fae96ae5ce46217
+  metadata.gz: 1536be14599f2f0777b79a6bc27717ad0350223f
+  data.tar.gz: 8cc6327ca07889c69526f3a19b4e3b91b5512c65
 SHA512:
-  metadata.gz: 557a5f61241a4dc3f02a0ab0029a75b429690ac7b0f2901ffdfbf9d6312ce27f3b8744a44f9f87c1ef18367b66a081051d8b0671b560173446643971a467f7d0
-  data.tar.gz: d4bb78212b5861447df4f42de5099a03e654eeb37e671e753743cb737a1a159b03e14d8b8c75f5b2259d3c50ee1ffa9d8ff541db9cdff740fdc8d42fe39781e4
+  metadata.gz: 05126d1932a31c7fb97f571619139c287b71afe4f3638ec7e72e73518df8c42f765cdaed7e646b64f4c13ad724b95d8f99e1dc61243b07aac0d2ab7d38bf9241
+  data.tar.gz: 2035e5bc42067a3c0ac598f894ac59c1309244d47afceaaa7b66a7dd4bfd034e8395c1e71b90d2f6d5b5d68efa78ea98c0bfc0681dff23b39276fdb7bed6b5b2

data/CHANGES.md CHANGED

@@ -1,5 +1,13 @@
 # Changes
+## 2.3.2
+  * Change to `extract_marc` to work around a thread-safety problem in JRuby where
+    regexps were unsafely shared between threads. (@codeforkjeff)
+  * Make trim-punctuation safe for non-just-ASCII text (thanks to @dunn and @redlibrarian)
+## 2.3.1
+  * Update README with more info about new nil-related options
 ## 2.3.0
   * Allow nil values, empty fields, and deduplication

data/README.md CHANGED

@@ -2,11 +2,13 @@
 An easy to use, high-performance, flexible and extensible MARC to Solr indexer.
+(Questions about use are welcome here or on the [google group](https://groups.google.com/forum/#!forum/traject-users))
 You might use [traject](https://github.com/traject/traject) to index MARC data for a Solr-based discovery product like [Blacklight](https://github.com/projectblacklight/blacklight) or [VUFind](http://vufind.org/).
 Traject can also be generalized to a set of tools for getting structured data from a source, and transforming it to a hash-like object to send to a destination. In addition to sending data to Solr, Traject can produce json or yaml files, tab-delimited files, CSV files, and output suitable for debugging by a human.
-**Traject is stable, mature software, that is already being used in production by its authors.**
+**Traject is stable, mature software, that is already being used in production by its authors and several other institutions.**
 [![Gem Version](https://badge.fury.io/rb/traject.png)](http://badge.fury.io/rb/traject)
 [![Build Status](https://travis-ci.org/traject/traject.png)](https://travis-ci.org/traject/traject)
@@ -113,7 +115,7 @@ data out of a MARC record according to a tag/subfield specification.
     # 245 subfields a, p, and s. 130, all subfields.
     # built-in punctuation trimming routine.
-    to_field "title_t", extract_marc("245nps:130", :trim_punctuation => true)
+    to_field "title_t", extract_marc("245aps:130", :trim_punctuation => true)
     # Can limit to certain indicators with || chars.
     # "*" is a wildcard in indicator spec.  So this is
@@ -129,7 +131,7 @@ data out of a MARC record according to a tag/subfield specification.
     to_field "language_code", extract_marc("008[35-37]")
 ~~~
-`extract_marc` by default includes all 'alternate script' linked fields correspoinding to matched specifications, but you can turn that off, or extract *only* corresponding 880s.
+`extract_marc` by default includes all 'alternate script' linked fields corresponding to matched specifications, but you can turn that off, or extract *only* corresponding 880s.
 ~~~ruby
     to_field "title", extract_marc("245abc", :alternate_script => false)
@@ -140,7 +142,7 @@ By default, specifications with multiple subfields (e.g. "240abc") will produce
 For the syntax and complete possibilities of the specification string argument to extract_marc, see docs at the [MarcExtractor class](./lib/traject/marc_extractor.rb) ([rdoc](http://rdoc.info/gems/traject/Traject/MarcExtractor)).
-`extract_marc` also supports `translation maps` similar to SolrMarc's. There are some translation maps provided by traject, and you can also define your own, in yaml or ruby. Translation maps are especially useful for mapping form MARC codes to user-displayable strings:
+`extract_marc` also supports `translation maps` similar to SolrMarc's. There are some translation maps provided by traject, and you can also define your own, in yaml or ruby. Translation maps are especially useful for mapping from MARC codes to user-displayable strings:
 ~~~ruby
     # "translation_map" will be passed to Traject::TranslationMap.new
@@ -278,7 +280,7 @@ results, or writing to more than one field at once.
 For more on `each_record`, see [Indexing Rules: Macros and Custom Logic](./doc/indexing_rules.md).
-There is also an `after_processing` method that can be used to register logic that will be called after the entire has been processed. You can use it for whatever custom ruby code you might want for your app (send an email? Clean up a log file? Trigger a Solr replication?)
+There is also an `after_processing` method that can be used to register logic that will be called after the entire input has been processed. You can use it for whatever custom ruby code you might want for your app (send an email? Clean up a log file? Trigger a Solr replication?)
 ~~~ruby
 after_processing do
@@ -305,7 +307,6 @@ The [SolrJWriter](https://github.com/traject/traject-solrj_writer) is packaged s
 and will be useful if you need to index to Solr's older than version 3.2. It requires Jruby.
 You can easily write your own Readers and Writers if you'd like, see comments at top
 of [Traject::Indexer](lib/traject/indexer.rb).

data/doc/indexing_rules.md CHANGED

@@ -67,7 +67,7 @@ created." In ruby, lambdas and blocks are closures. Method definitions
 are not, which most of us have run across much to our chagrin.
 Within the context of `traject`, this means you can define a variable
-outside of a `to_field` or `each_record` block and it will be avaiable
+outside of a `to_field` or `each_record` block and it will be available
 inside those blocks. And you only have to define it once.
 That's useful to do for any object that is even a bit expensive
@@ -190,7 +190,7 @@ to_field('foo'), macro_returning_dup_values do |rec, acc|
 end
 ```
-## Maniuplating `context.output_hash` directly
+## Manipulating `context.output_hash` directly
 If you ask for the context argument, a [Traject::Indexer::Context](./lib/traject/indexer/context.rb) ([rdoc](http://rdoc.info/gems/traject/Traject/Indexer/Context)), you have access to `context.output_hash`, which is
 the hash of already transformed output that will be sent to Solr (or any other Writer).
@@ -218,7 +218,7 @@ context.output_hash['fieldname'] = ['fuzzy_wuzzies']
 Thus, `each_record` blocks have no `accumulator` argument: instead they either take a single `record` argument; or both a `record` and a `context`.
-`each_record` is useful for logging or notifiying, computing intermediate
+`each_record` is useful for logging or notifying, computing intermediate
 results, or writing to more than one field at once.
 ~~~ruby

data/lib/traject/macros/marc21.rb CHANGED

@@ -233,7 +233,7 @@ module Traject::Macros
       str = str.sub(/ *[ ,\/;:] *\Z/, '')
       # trailing period if it is preceded by at least three letters (possibly preceded and followed by whitespace)
-      str = str.sub(/( *\w\w\w)\. *\Z/, '\1')
+      str = str.sub(/( *[[:word:][:word:][:word:]])\. *\Z/, '\1')
       # single square bracket characters if they are the start and/or end
       #   chars and there are no internal square brackets.
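
The hunk above swaps `\w` for the POSIX bracket class `[[:word:]]`. A minimal sketch of why that matters, assuming UTF-8 strings: in Ruby, `\w` matches only `[a-zA-Z0-9_]`, while `[[:word:]]` also matches non-ASCII letters such as "é". The helper names below are hypothetical and are not traject's API. The `{3}` quantifier is likewise an assumption about the intent stated in the code comment ("at least three letters"), not the line shown in the diff: a single bracket expression that lists `[:word:]` three times still matches one character.

```ruby
# Hypothetical helpers for illustration only; not traject's own code.

# ASCII-only variant: \w matches just [a-zA-Z0-9_] in Ruby.
def trim_period_ascii(str)
  str.sub(/( *\w\w\w)\. *\Z/, '\1')
end

# Unicode-aware variant: [[:word:]] also matches letters like "é".
# The {3} quantifier is an assumption about the intended behaviour.
def trim_period_unicode(str)
  str.sub(/( *[[:word:]]{3})\. *\Z/, '\1')
end

puts trim_period_ascii("Feminism and art.")  # => Feminism and art
puts trim_period_ascii("Le réve.")           # => Le réve.  (period kept: "é" is not \w)
puts trim_period_unicode("Le réve.")         # => Le réve
puts trim_period_unicode("Smith, Jr.")       # => Smith, Jr.  (fewer than three letters)
```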

data/lib/traject/macros/marc21_semantics.rb CHANGED

@@ -200,7 +200,9 @@ module Traject::Macros
               # sometimes multiple language codes are jammed together in one subfield, and
               # we need to separate ourselves. sigh.
               unless value.length == 3
-                value = value.scan(/.{1,3}/) # split into an array of 3-length substrs
+                # split into an array of 3-length substrs; JRuby has problems with regexes
+                # across threads, which is why we don't use String#scan here.
+                value = value.chars.each_slice(3).map(&:join)
               end
               value
             end.flatten
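
The replacement in this hunk avoids a regexp entirely. A small sketch of the expression, wrapped in a hypothetical helper (`split_language_codes` is not a traject method, and unlike the original code it always returns an array): for strings without newlines, slicing the characters in groups of three gives the same result as `String#scan` with `/.{1,3}/`.

```ruby
# Hypothetical helper for illustration; mirrors the expression added in the diff.
def split_language_codes(value)
  # Split into 3-character chunks without using a regexp.
  value.chars.each_slice(3).map(&:join)
end

p split_language_codes("engfreger")  # => ["eng", "fre", "ger"]
p split_language_codes("engfr")      # => ["eng", "fr"]

# The regexp-based form that the diff removes gives the same chunks here.
p "engfreger".scan(/.{1,3}/)         # => ["eng", "fre", "ger"]
```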

data/lib/traject/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Traject
-  VERSION = "2.3.1"
+  VERSION = "2.3.2"
 end

data/test/indexer/macros_marc21_test.rb CHANGED

@@ -128,6 +128,8 @@ describe "Traject::Macros::Marc21" do
       # This one was a bug before
       assert_equal "Feminism and art", Marc21.trim_punctuation("Feminism and art.")
+      assert_equal "Le réve", Marc21.trim_punctuation("Le réve.")
     end
     it "uses :translation_map" do

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: traject
 version: !ruby/object:Gem::Version
-  version: 2.3.1
+  version: 2.3.2
 platform: ruby
 authors:
 - Jonathan Rochkind
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-04-04 00:00:00.000000000 Z
+date: 2016-11-03 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: concurrent-ruby