traject 3.0.0.alpha.2 → 3.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 8d8fb43f0b2c71754358c324beb828ab3fddd77b2bafeee7b7ff525ee85260ad
- data.tar.gz: ba2556368b6b30559c110b752d99e837387f4f257d473b72518375b129eb9557
+ metadata.gz: cf92e5467d32d37b681a36ae1ffbd2995bbf3e0def938b13d74831a939b68632
+ data.tar.gz: 7c4693ded4a9a8b0e9c599e7489aaefdf9806dfffce6b20ae6054def9ba8c156
  SHA512:
- metadata.gz: 60f7adf990eb03991b6fea938a729eeab076af76f5de4af54e7a1933d0ffb422925144180d4769c5b9af4a50cd12f60e19479c2d0ae31f26b7440512084f3a9a
- data.tar.gz: 2e6c120bc9cd820cc708c9bd71dc5d0e91c681341dd29e8c169aaaa0e14cee2282b3362fb4b4d99f71b85a7e642b354e01f727f2c79d6cbcbd07913affd789b1
+ metadata.gz: 9e12113a6f53aa9c7629c072df80b1e347f432d069bd30dbb35d73373fccc3fa341682b281a65c778aa2a3eae9fb7b2d52c81c2f39aa17d348074ecb8b9c2512
+ data.tar.gz: 6f2294bce5deb181a20db0977f8ab7e73e8e1cda6e86d8ee562fabd7a8cce2c683011be8f3955ccafd0165787dbaf774e7c3571220f5d6e797eaf6fe8a02577d
@@ -4,15 +4,14 @@ cache: bundler
  # at downloading jruby, and
  sudo: true
  rvm:
- - 2.3.6
- - 2.4.3
+ - 2.4.4
  - 2.5.1
  - "2.6.0-preview2"
  # avoid having travis install jdk on MRI builds where we don't need it.
  matrix:
  include:
  - jdk: openjdk8
- rvm: jruby-9.0.5.0
+ rvm: jruby-9.1.17.0
  - jdk: openjdk8
  rvm: jruby-9.2.0.0
  allow_failures:
data/README.md CHANGED
@@ -26,7 +26,7 @@ Initially by Jonathan Rochkind (Johns Hopkins Libraries) and Bill Dueber (Univer
 
  ## Installation
 
- Traject runs under jruby (9.0.x or higher), MRI ruby (2.3.x or higher), or probably any other ruby platform.
+ Traject runs under jruby (9.1.x or higher), MRI ruby (2.3.x or higher), or probably any other ruby platform.
 
  Once you have ruby installed, just `$ gem install traject`.
 
@@ -135,12 +135,6 @@ For the syntax and complete possibilities of the specification string argument t
 
  To see all options for `extract_marc`, see the [extract_marc](http://rdoc.info/gems/traject/Traject/Macros/Marc21:extract_marc) method documentation.
 
- There is one special MARC-specific transformation macro, that strips punctuation from beginning and end of values using heuristics designed for AACR2 in MARC:
-
- ```ruby
- to_field "title", extract_marc("245abc"), trim_punctuation
- ```
-
  ### XML mode, extract_xml
 
  See our [xml guide](./doc/xml.md) for more XML examples, but you will usually use extract_xpath.
@@ -190,15 +184,15 @@ Example:
  to_field "something", extract_xpath("//value"), strip, default("no value"), prepend("Extracted value: ")
  ```
 
- ### Other built-in utility macros
+ ### Some more MARC-specific utility methods
 
- Other built-in methods that can be used with `to_field` include:
+ Other built-in methods that can be used with `to_field` for MARC specifically include:
 
- a hard-coded literal string:
+ Strip punctuation from beginning and end of values using heuristics designed for AACR2 in MARC:
 
- ~~~ruby
- to_field "source", literal("LIB_CATALOG")
- ~~~
+ ```ruby
+ to_field "title", extract_marc("245abc"), trim_punctuation
+ ```
 
  the current record serialized back out as MARC, in binary, XML, or json:
 
@@ -218,7 +212,7 @@ text of all fields in a range:
 
  All of these methods are defined at [Traject::Macros::Marc21](./lib/traject/macros/marc21.rb) ([rdoc](http://rdoc.info/gems/traject/Traject/Macros/Marc21))
 
- ## More complex canned MARC semantic logic
+ ### More complex canned MARC semantic logic
 
  Some more complex (and opinionated/subjective) algorithms for deriving semantics from Marc are also packaged with Traject, but not available by default. To make them available to your indexing, you just need to use ruby `require` and `extend`.
 
@@ -265,7 +259,7 @@ in a configuration file, using a ruby block, which looks like this:
  ~~~
 
  `do |record, accumulator| ... ` is the definition of a ruby block taking
- two arguments. The first one passed in will be a MARC record. The
+ two arguments. The first one passed in will be a source record (eg MARC or XML). The
  second is an array, you add values to the array to send them to
  output.
 
@@ -296,6 +290,17 @@ use ruby methods like `map!` to modify it:
  If you find yourself repeating boilerplate code in your custom logic, you can
  even create your own 'macros' (like `extract_marc`). `extract_marc`, `translation_map`, `first_only` and other macros are nothing more than methods that return ruby lambda objects of the same format as the blocks you write for custom logic.
 
+ In fact, in addition to a literal block on the end, you can pass as many `proc` objects as you want to transform data.
+
+ ```ruby
+ to_field( "something", extract_xpath("//title"),
+ ->(record, acc) { acc << "extra value" },
+ method_that_returns_a_proc
+ ) do |rec, acc|
+ whatever_to(acc)
+ end
+ ```
+
  For tips, gotchas, and a more complete explanation of how this works, see
  additional documentation page on [Indexing Rules: Macros and Custom Logic](./doc/indexing_rules.md)
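How several procs and a trailing block can share one accumulator may be easier to see outside traject. The following is a minimal plain-Ruby sketch, not traject's implementation: `run_steps` and `upcase_all` are hypothetical names invented for illustration, and a Hash stands in for a real MARC or XML source record.

```ruby
# Hypothetical helper (not part of traject): call each step in order with the
# same record and the same accumulator array, then return the accumulator.
def run_steps(record, *procs, &block)
  accumulator = []
  steps = block ? procs + [block] : procs
  steps.each { |step| step.call(record, accumulator) }
  accumulator
end

# A "macro" in this sketch is just a method returning a lambda of that shape.
def upcase_all
  ->(_record, acc) { acc.map!(&:upcase) }
end

record = { "title" => "a title" } # stand-in for a real source record

result = run_steps(
  record,
  ->(rec, acc) { acc << rec["title"] },
  ->(_rec, acc) { acc << "extra value" },
  upcase_all
) do |_rec, acc|
  acc << "from block"
end

p result # prints ["A TITLE", "EXTRA VALUE", "from block"]
```

Each step sees whatever the earlier steps left in the array, which is why the value added by the trailing block is not upcased.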
 
data/doc/xml.md CHANGED
@@ -58,6 +58,14 @@ to_field "title", extract_xpath("/oai:record/oai:metadata/oai:dc/dc:title", ns:
  })
  ```
 
+ If you are accessing a nokogiri method directly, like in `some_record.xpath`, the registered default namespaces aren't known by nokogiri -- but they are available in the indexer as `default_namespaces`, so can be referenced and passed into the nokogiri method:
+
+ ```ruby
+ each_record do |record|
+ log( record.xpath("//dc:title"), default_namespaces )
+ end
+ ```
+
  You can use all the standard transforation macros in Traject::Macros::Transformation:
 
  ```ruby
@@ -9,12 +9,12 @@ require 'traject/ndj_reader'
  # the gem traject-marc4j_reader.
  #
  # By default assumes binary MARC encoding, please set marc_source.type setting
- # for XML or json. If binary, please set marc_source.encoding with char encoding.
+ # for XML or json. If binary, please set marc_source.encoding with char encoding.
  #
  # ## Settings
 
  # * "marc_source.type": serialization type. default 'binary'
- # * "binary". standard ISO 2709 "binary" MARC format,
+ # * "binary". standard ISO 2709 "binary" MARC format,
  # will use ruby-marc MARC::Reader (Note, if you are using
  # type 'binary', you probably want to also set 'marc_source.encoding')
  # * "xml", MarcXML, will use ruby-marc MARC::XMLReader
@@ -23,15 +23,16 @@ require 'traject/ndj_reader'
  # allowed, and no unescpaed internal newlines allowed in the json
  # objects -- we just read line by line, and assume each line is a
  # marc-in-json. http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/
- # will use Traject::NDJReader which uses MARC::Record.new_from_hash.
+ # will use Traject::NDJReader which uses MARC::Record.new_from_hash.
  # * "marc_source.encoding": Only used for marc_source.type 'binary', character encoding
  # of the source marc records. Can be any
  # encoding recognized by ruby, OR 'MARC-8'. For 'MARC-8', content will
- # be transcoded (by ruby-marc) to UTF-8 in internal MARC::Record Strings.
+ # be transcoded (by ruby-marc) to UTF-8 in internal MARC::Record Strings.
  # Default nil, meaning let MARC::Reader use it's default, which will
- # probably be Encoding.default_internal, which will probably be UTF-8.
+ # be your system's Encoding.default_external, which will probably be UTF-8.
+ # (but may be something unexpected/undesired on Windows, where you may want to set this explicitly.)
  # Right now Traject::MarcReader is hard-coded to transcode to UTF-8 as
- # an internal encoding.
+ # an internal encoding.
  # * "marc_reader.xml_parser": For XML type, which XML parser to tell Marc::Reader
  # to use. Anything recognized by [Marc::Reader :parser
  # argument](http://rdoc.info/github/ruby-marc/ruby-marc/MARC/XMLReader).
@@ -75,7 +76,7 @@ class Traject::MarcReader
  Traject::NDJReader.new(self.input_stream, settings)
  else
  args = { :invalid => :replace }
- args[:external_encoding] = settings["marc_source.encoding"]
+ args[:external_encoding] = settings["marc_source.encoding"]
  MARC::Reader.new(self.input_stream, args)
  end
  end
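The `marc_source.encoding` setting matters because the same bytes decode differently depending on the encoding Ruby assumes for them. Below is a minimal plain-Ruby sketch of that effect, independent of ruby-marc and traject; the sample bytes are made up for illustration.

```ruby
# What Ruby assumes for external data when nothing is set explicitly.
# This varies by platform and locale.
puts Encoding.default_external.name

# The word "café" as stored in ISO-8859-1, where the byte 0xE9 is "é".
raw = "caf\xE9".b

# Declaring the correct source encoding lets Ruby transcode to UTF-8.
latin1 = raw.dup.force_encoding(Encoding::ISO_8859_1)
utf8 = latin1.encode(Encoding::UTF_8)
puts utf8 == "caf\u00E9"            # prints "true"

# Tagging the same bytes as UTF-8 yields an invalid string instead.
wrongly_tagged = raw.dup.force_encoding(Encoding::UTF_8)
puts wrongly_tagged.valid_encoding? # prints "false"
```

If the platform default does not match the source records, setting the encoding explicitly avoids the second outcome.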
@@ -1,3 +1,3 @@
  module Traject
- VERSION = "3.0.0.alpha.2"
+ VERSION = "3.0.0"
  end
@@ -56,6 +56,17 @@ describe "Traject::NokogiriIndexer" do
  refute_empty results.last["rights"]
  end
 
+ it "exposes nokogiri.namespaces setting in default_namespaces" do
+ namespaces = @namespaces
+ @indexer.configure do
+ settings do
+ provide "nokogiri.namespaces", namespaces
+ end
+ end
+ @indexer.settings.fill_in_defaults!
+ assert_equal namespaces, @indexer.default_namespaces
+ end
+
  describe "xpath to non-terminal element" do
  before do
  @xml = <<-EOS
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: traject
  version: !ruby/object:Gem::Version
- version: 3.0.0.alpha.2
+ version: 3.0.0
  platform: ruby
  authors:
  - Jonathan Rochkind
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2018-08-30 00:00:00.000000000 Z
+ date: 2018-10-12 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: concurrent-ruby
@@ -381,9 +381,9 @@ required_ruby_version: !ruby/object:Gem::Requirement
  version: '0'
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
- - - ">"
+ - - ">="
  - !ruby/object:Gem::Version
- version: 1.3.1
+ version: '0'
  requirements: []
  rubyforge_project:
  rubygems_version: 2.7.7
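The `required_rubygems_version` change appears to follow from the version no longer being a prerelease (RubyGems, as far as I know, adds the `> 1.3.1` requirement itself for prerelease versions). A short sketch using RubyGems' own classes to compare the version strings and requirements that appear in this diff:

```ruby
require "rubygems"

old_version = Gem::Version.new("3.0.0.alpha.2")
new_version = Gem::Version.new("3.0.0")

puts old_version.prerelease?   # prints "true"
puts new_version.prerelease?   # prints "false"
puts old_version < new_version # prints "true"

# The RubyGems version recorded in the metadata satisfies both the old
# and the new requirement.
rubygems = Gem::Version.new("2.7.7")
old_requirement = Gem::Requirement.new("> 1.3.1")
new_requirement = Gem::Requirement.new(">= 0")
puts old_requirement.satisfied_by?(rubygems) # prints "true"
puts new_requirement.satisfied_by?(rubygems) # prints "true"
```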