traject 2.3.1 → 2.3.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: dee0f24c94b13285ce7e8408c7d8ccbd796e0c7a
- data.tar.gz: 9ff0d8523063acaec87aaba83fae96ae5ce46217
+ metadata.gz: 1536be14599f2f0777b79a6bc27717ad0350223f
+ data.tar.gz: 8cc6327ca07889c69526f3a19b4e3b91b5512c65
  SHA512:
- metadata.gz: 557a5f61241a4dc3f02a0ab0029a75b429690ac7b0f2901ffdfbf9d6312ce27f3b8744a44f9f87c1ef18367b66a081051d8b0671b560173446643971a467f7d0
- data.tar.gz: d4bb78212b5861447df4f42de5099a03e654eeb37e671e753743cb737a1a159b03e14d8b8c75f5b2259d3c50ee1ffa9d8ff541db9cdff740fdc8d42fe39781e4
+ metadata.gz: 05126d1932a31c7fb97f571619139c287b71afe4f3638ec7e72e73518df8c42f765cdaed7e646b64f4c13ad724b95d8f99e1dc61243b07aac0d2ab7d38bf9241
+ data.tar.gz: 2035e5bc42067a3c0ac598f894ac59c1309244d47afceaaa7b66a7dd4bfd034e8395c1e71b90d2f6d5b5d68efa78ea98c0bfc0681dff23b39276fdb7bed6b5b2
data/CHANGES.md CHANGED
@@ -1,5 +1,13 @@
  # Changes

+ ## 2.3.2
+ * Change to `extract_marc` to work around a threadsafe problem in JRuby/MRI where
+ regexps were unsafely shared between threads. (@codeforkjeff)
+ * Make trim-punctuation safe for non-just-ASCII text (thanks to @dunn and @redlibrarian)
+
+ ## 2.3.1
+ * Update README with more info aout new nil-related options
+
  ## 2.3.0
  * Allow nil values, empty fields, and deduplication

data/README.md CHANGED
@@ -2,11 +2,13 @@

  An easy to use, high-performance, flexible and extensible MARC to Solr indexer.

+ (Questions about use are welcome here or on the [google group](https://groups.google.com/forum/#!forum/traject-users))
+
  You might use [traject](https://github.com/traject/traject) to index MARC data for a Solr-based discovery product like [Blacklight](https://github.com/projectblacklight/blacklight) or [VUFind](http://vufind.org/).

  Traject can also be generalized to a set of tools for getting structured data from a source, and transforming it to a hash-like object to send to a destination. In addition to sending data to Solr, Traject can produce json or yaml files, tab-delimited files, CSV files, and output suitable for debugging by a human.

- **Traject is stable, mature software, that is already being used in production by its authors.**
+ **Traject is stable, mature software, that is already being used in production by its authors and several other institutions.**

  [![Gem Version](https://badge.fury.io/rb/traject.png)](http://badge.fury.io/rb/traject)
  [![Build Status](https://travis-ci.org/traject/traject.png)](https://travis-ci.org/traject/traject)
@@ -113,7 +115,7 @@ data out of a MARC record according to a tag/subfield specification.

  # 245 subfields a, p, and s. 130, all subfields.
  # built-in punctuation trimming routine.
- to_field "title_t", extract_marc("245nps:130", :trim_punctuation => true)
+ to_field "title_t", extract_marc("245aps:130", :trim_punctuation => true)

  # Can limit to certain indicators with || chars.
  # "*" is a wildcard in indicator spec. So this is
@@ -129,7 +131,7 @@ data out of a MARC record according to a tag/subfield specification.
  to_field "language_code", extract_marc("008[35-37]")
  ~~~

- `extract_marc` by default includes all 'alternate script' linked fields correspoinding to matched specifications, but you can turn that off, or extract *only* corresponding 880s.
+ `extract_marc` by default includes all 'alternate script' linked fields corresponding to matched specifications, but you can turn that off, or extract *only* corresponding 880s.

  ~~~ruby
  to_field "title", extract_marc("245abc", :alternate_script => false)
@@ -140,7 +142,7 @@ By default, specifications with multiple subfields (e.g. "240abc") will produce

  For the syntax and complete possibilities of the specification string argument to extract_marc, see docs at the [MarcExtractor class](./lib/traject/marc_extractor.rb) ([rdoc](http://rdoc.info/gems/traject/Traject/MarcExtractor)).

- `extract_marc` also supports `translation maps` similar to SolrMarc's. There are some translation maps provided by traject, and you can also define your own, in yaml or ruby. Translation maps are especially useful for mapping form MARC codes to user-displayable strings:
+ `extract_marc` also supports `translation maps` similar to SolrMarc's. There are some translation maps provided by traject, and you can also define your own, in yaml or ruby. Translation maps are especially useful for mapping from MARC codes to user-displayable strings:

  ~~~ruby
  # "translation_map" will be passed to Traject::TranslationMap.new
@@ -278,7 +280,7 @@ results, or writing to more than one field at once.

  For more on `each_record`, see [Indexing Rules: Macros and Custom Logic](./doc/indexing_rules.md).

- There is also an `after_processing` method that can be used to register logic that will be called after the entire has been processed. You can use it for whatever custom ruby code you might want for your app (send an email? Clean up a log file? Trigger a Solr replication?)
+ There is also an `after_processing` method that can be used to register logic that will be called after the entire input has been processed. You can use it for whatever custom ruby code you might want for your app (send an email? Clean up a log file? Trigger a Solr replication?)

  ~~~ruby
  after_processing do
@@ -305,7 +307,6 @@ The [SolrJWriter](https://github.com/traject/traject-solrj_writer) is packaged s
  and will be useful if you need to index to Solr's older than version 3.2. It requires Jruby.

  You can easily write your own Readers and Writers if you'd like, see comments at top
-
  of [Traject::Indexer](lib/traject/indexer.rb).


@@ -67,7 +67,7 @@ created." In ruby, lambdas and blocks are closures. Method definitions
  are not, which most of us have run across much to our chagrin.

  Within the context of `traject`, this means you can define a variable
- outside of a `to_field` or `each_record` block and it will be avaiable
+ outside of a `to_field` or `each_record` block and it will be available
  inside those blocks. And you only have to define it once.

  That's useful to do for any object that is even a bit expensive
@@ -190,7 +190,7 @@ to_field('foo'), macro_returning_dup_values do |rec, acc|
  end
  ```

- ## Maniuplating `context.output_hash` directly
+ ## Manipulating `context.output_hash` directly

  If you ask for the context argument, a [Traject::Indexer::Context](./lib/traject/indexer/context.rb) ([rdoc](http://rdoc.info/gems/traject/Traject/Indexer/Context)), you have access to `context.output_hash`, which is
  the hash of already transformed output that will be sent to Solr (or any other Writer).
@@ -218,7 +218,7 @@ context.output_hash['fieldname'] = ['fuzzy_wuzzies']

  Thus, `each_record` blocks have no `accumulator` argument: instead they either take a single `record` argument; or both a `record` and a `context`.

- `each_record` is useful for logging or notifiying, computing intermediate
+ `each_record` is useful for logging or notifying, computing intermediate
  results, or writing to more than one field at once.

  ~~~ruby
@@ -233,7 +233,7 @@ module Traject::Macros
  str = str.sub(/ *[ ,\/;:] *\Z/, '')

  # trailing period if it is preceded by at least three letters (possibly preceded and followed by whitespace)
- str = str.sub(/( *\w\w\w)\. *\Z/, '\1')
+ str = str.sub(/( *[[:word:][:word:][:word:]])\. *\Z/, '\1')

  # single square bracket characters if they are the start and/or end
  # chars and there are no internal square brackets.
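
A note on the replacement pattern in this hunk: repeating `[:word:]` inside one bracket expression does not require three characters. `[[:word:][:word:][:word:]]` is the same character class as `[[:word:]]` and matches exactly one character. The sketch below uses only core `String#sub` to compare the old ASCII-only pattern, the pattern as committed, and a hypothetical `[[:word:]]{3}` variant (an assumption for illustration, not part of this release) that keeps the three-letter requirement while accepting non-ASCII letters. The sample strings and the `trim` lambda are made up for the comparison.

```ruby
old_ascii    = /( *\w\w\w)\. *\Z/                      # 2.3.1: \w is ASCII-only in Ruby
as_committed = /( *[[:word:][:word:][:word:]])\. *\Z/  # 2.3.2: one Unicode word character
three_word   = /( *[[:word:]]{3})\. *\Z/               # hypothetical: three Unicode word characters

# Illustrative helper, not traject's trim_punctuation.
trim = ->(str, pattern) { str.sub(pattern, '\1') }

accented = "Le r\u00e9ve."  # "Le réve." written with an escape to avoid encoding surprises
initial  = "Smith, J."      # trailing initial, where the period is meaningful

accented_results = [old_ascii, as_committed, three_word].map { |re| trim.call(accented, re) }
initial_results  = [old_ascii, as_committed, three_word].map { |re| trim.call(initial, re) }

puts accented_results.inspect
puts initial_results.inspect
```

Under these assumptions, the old pattern leaves the accented title untrimmed, the committed pattern trims it but also strips the period from a trailing initial, and the `{3}` variant trims the title while leaving the initial alone.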
@@ -200,7 +200,9 @@ module Traject::Macros
  # sometimes multiple language codes are jammed together in one subfield, and
  # we need to separate ourselves. sigh.
  unless value.length == 3
- value = value.scan(/.{1,3}/) # split into an array of 3-length substrs
+ # split into an array of 3-length substrs; JRuby has problems with regexes
+ # across threads, which is why we don't use String#scan here.
+ value = value.chars.each_slice(3).map(&:join)
  end
  value
  end.flatten
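
For reference, the two splitting approaches in this hunk give the same chunks for typical language-code input. A minimal sketch using only core Ruby (`String#scan`, `String#chars`, `Enumerable#each_slice`); the sample values are made up for illustration:

```ruby
value = "engfreger"  # hypothetical subfield with three codes jammed together

with_scan  = value.scan(/.{1,3}/)                   # regex-based split (2.3.1)
with_slice = value.chars.each_slice(3).map(&:join)  # regex-free split (2.3.2)

# A value whose length is not a multiple of three leaves a shorter final chunk.
short = "engfr".chars.each_slice(3).map(&:join)

puts with_scan.inspect
puts with_slice.inspect
puts short.inspect
```

The two differ only in edge cases such as embedded newlines, which `.` does not match without the `m` flag.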
@@ -1,3 +1,3 @@
  module Traject
- VERSION = "2.3.1"
+ VERSION = "2.3.2"
  end
@@ -128,6 +128,8 @@ describe "Traject::Macros::Marc21" do

  # This one was a bug before
  assert_equal "Feminism and art", Marc21.trim_punctuation("Feminism and art.")
+
+ assert_equal "Le réve", Marc21.trim_punctuation("Le réve.") # this assertion currently fails
  end

  it "uses :translation_map" do
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: traject
  version: !ruby/object:Gem::Version
- version: 2.3.1
+ version: 2.3.2
  platform: ruby
  authors:
  - Jonathan Rochkind
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-04-04 00:00:00.000000000 Z
+ date: 2016-11-03 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: concurrent-ruby