traject 3.0.0.alpha.2 → 3.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 8d8fb43f0b2c71754358c324beb828ab3fddd77b2bafeee7b7ff525ee85260ad
- data.tar.gz: ba2556368b6b30559c110b752d99e837387f4f257d473b72518375b129eb9557
+ metadata.gz: cf92e5467d32d37b681a36ae1ffbd2995bbf3e0def938b13d74831a939b68632
+ data.tar.gz: 7c4693ded4a9a8b0e9c599e7489aaefdf9806dfffce6b20ae6054def9ba8c156
  SHA512:
- metadata.gz: 60f7adf990eb03991b6fea938a729eeab076af76f5de4af54e7a1933d0ffb422925144180d4769c5b9af4a50cd12f60e19479c2d0ae31f26b7440512084f3a9a
- data.tar.gz: 2e6c120bc9cd820cc708c9bd71dc5d0e91c681341dd29e8c169aaaa0e14cee2282b3362fb4b4d99f71b85a7e642b354e01f727f2c79d6cbcbd07913affd789b1
+ metadata.gz: 9e12113a6f53aa9c7629c072df80b1e347f432d069bd30dbb35d73373fccc3fa341682b281a65c778aa2a3eae9fb7b2d52c81c2f39aa17d348074ecb8b9c2512
+ data.tar.gz: 6f2294bce5deb181a20db0977f8ab7e73e8e1cda6e86d8ee562fabd7a8cce2c683011be8f3955ccafd0165787dbaf774e7c3571220f5d6e797eaf6fe8a02577d
@@ -4,15 +4,14 @@ cache: bundler
  # at downloading jruby, and
  sudo: true
  rvm:
- - 2.3.6
- - 2.4.3
+ - 2.4.4
  - 2.5.1
  - "2.6.0-preview2"
  # avoid having travis install jdk on MRI builds where we don't need it.
  matrix:
  include:
  - jdk: openjdk8
- rvm: jruby-9.0.5.0
+ rvm: jruby-9.1.17.0
  - jdk: openjdk8
  rvm: jruby-9.2.0.0
  allow_failures:
data/README.md CHANGED
@@ -26,7 +26,7 @@ Initially by Jonathan Rochkind (Johns Hopkins Libraries) and Bill Dueber (Univer
 
  ## Installation
 
- Traject runs under jruby (9.0.x or higher), MRI ruby (2.3.x or higher), or probably any other ruby platform.
+ Traject runs under jruby (9.1.x or higher), MRI ruby (2.3.x or higher), or probably any other ruby platform.
 
  Once you have ruby installed, just `$ gem install traject`.
 
@@ -135,12 +135,6 @@ For the syntax and complete possibilities of the specification string argument t
 
  To see all options for `extract_marc`, see the [extract_marc](http://rdoc.info/gems/traject/Traject/Macros/Marc21:extract_marc) method documentation.
 
- There is one special MARC-specific transformation macro, that strips punctuation from beginning and end of values using heuristics designed for AACR2 in MARC:
-
- ```ruby
- to_field "title", extract_marc("245abc"), trim_punctuation
- ```
-
  ### XML mode, extract_xml
 
  See our [xml guide](./doc/xml.md) for more XML examples, but you will usually use extract_xpath.
@@ -190,15 +184,15 @@ Example:
  to_field "something", extract_xpath("//value"), strip, default("no value"), prepend("Extracted value: ")
  ```
 
- ### Other built-in utility macros
+ ### Some more MARC-specific utility methods
 
- Other built-in methods that can be used with `to_field` include:
+ Other built-in methods that can be used with `to_field` for MARC specifically include:
 
- a hard-coded literal string:
+ Strip punctuation from beginning and end of values using heuristics designed for AACR2 in MARC:
 
- ~~~ruby
- to_field "source", literal("LIB_CATALOG")
- ~~~
+ ```ruby
+ to_field "title", extract_marc("245abc"), trim_punctuation
+ ```
 
  the current record serialized back out as MARC, in binary, XML, or json:
 
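The AACR2 punctuation-stripping heuristics mentioned above can be approximated in plain ruby. This is a simplified, dependency-free sketch and not traject's own code: the method name `simple_trim_punctuation` is hypothetical, and the specific rules are assumptions chosen for illustration (drop a trailing comma, slash, semicolon, or colon; drop a trailing period only after three or more word characters, so initials like "J." survive; drop a leading `[` and/or trailing `]` when the value has no other brackets).

```ruby
# Hypothetical helper approximating AACR2-style punctuation trimming.
# NOT traject's implementation; the rules below are illustrative assumptions.
def simple_trim_punctuation(str)
  return str if str.nil?

  # Trailing comma, slash, semicolon, or colon (with surrounding spaces).
  str = str.sub(%r{ *[ ,/;:] *\z}, "")
  # Trailing period, but only after at least three word characters,
  # so abbreviations and initials such as "J." are left alone.
  str = str.sub(/( *\w\w\w)\. *\z/, '\1')
  # A leading "[" and/or trailing "]" when no other brackets appear inside.
  str = str.sub(/\A\[?([^\[\]]+)\]?\z/, '\1')
  str.strip
end

["Hamlet /", "The history of Rome.", "Smith, J.", "[Boston]"].each do |value|
  puts simple_trim_punctuation(value).inspect
end
```

Run as-is, this prints `"Hamlet"`, `"The history of Rome"`, `"Smith, J."`, and `"Boston"`, one per line. The three-letter rule is a cheap way to tell a sentence-ending period from an abbreviation; real catalog data will have cases it gets wrong.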
@@ -218,7 +212,7 @@ text of all fields in a range:
 
  All of these methods are defined at [Traject::Macros::Marc21](./lib/traject/macros/marc21.rb) ([rdoc](http://rdoc.info/gems/traject/Traject/Macros/Marc21))
 
- ## More complex canned MARC semantic logic
+ ### More complex canned MARC semantic logic
 
  Some more complex (and opinionated/subjective) algorithms for deriving semantics from Marc are also packaged with Traject, but not available by default. To make them available to your indexing, you just need to use ruby `require` and `extend`.
 
@@ -265,7 +259,7 @@ in a configuration file, using a ruby block, which looks like this:
  ~~~
 
  `do |record, accumulator| ... ` is the definition of a ruby block taking
- two arguments. The first one passed in will be a MARC record. The
+ two arguments. The first one passed in will be a source record (eg MARC or XML). The
  second is an array, you add values to the array to send them to
  output.
 
@@ -296,6 +290,17 @@ use ruby methods like `map!` to modify it:
  If you find yourself repeating boilerplate code in your custom logic, you can
  even create your own 'macros' (like `extract_marc`). `extract_marc`, `translation_map`, `first_only` and other macros are nothing more than methods that return ruby lambda objects of the same format as the blocks you write for custom logic.
 
+ In fact, in addition to a literal block on the end, you can pass as many `proc` objects as you want to transform data.
+
+ ```ruby
+ to_field( "something", extract_xpath("//title"),
+ ->(record, acc) { acc << "extra value" },
+ method_that_returns_a_proc
+ ) do |rec, acc|
+ whatever_to(acc)
+ end
+ ```
+
  For tips, gotchas, and a more complete explanation of how this works, see
  additional documentation page on [Indexing Rules: Macros and Custom Logic](./doc/indexing_rules.md)
 
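To make the "many procs plus a literal block" calling pattern concrete, here is a dependency-free mock. `MiniIndexer`, its `map_record` method, and the `upcase_values` macro are all hypothetical names invented for this sketch, not traject's API; it also assumes, for illustration, that the procs run in the order given and the trailing block runs last, each receiving the same record and the same accumulator array.

```ruby
# A hypothetical, dependency-free stand-in for traject's `to_field`,
# only to illustrate the calling pattern; this is NOT traject's code.
class MiniIndexer
  def initialize
    @rules = []
  end

  # Accept any number of procs, plus an optional literal block at the end.
  def to_field(field_name, *procs, &block)
    procs << block if block
    @rules << [field_name, procs]
  end

  # Apply every rule to one source record, returning a field => values hash.
  def map_record(record)
    @rules.each_with_object({}) do |(field_name, procs), output|
      accumulator = []
      # Each step receives the same record and the same accumulator array.
      procs.each { |step| step.call(record, accumulator) }
      output[field_name] = accumulator unless accumulator.empty?
    end
  end
end

# A hypothetical "macro": a method returning a lambda in the block format.
def upcase_values
  ->(_record, acc) { acc.map!(&:upcase) }
end

indexer = MiniIndexer.new
indexer.to_field("title",
                 ->(record, acc) { acc << record[:title] },
                 ->(_record, acc) { acc << "extra value" },
                 upcase_values) do |_rec, acc|
  acc.uniq!
end

p indexer.map_record({ title: "Hamlet" })
```

Here the "title" field ends up as `["HAMLET", "EXTRA VALUE"]`. Because every step mutates one shared array, a later step such as `upcase_values` can rewrite values that earlier steps added, which is why in-place methods like `map!` fit this style.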
data/doc/xml.md CHANGED
@@ -58,6 +58,14 @@ to_field "title", extract_xpath("/oai:record/oai:metadata/oai:dc/dc:title", ns:
  })
  ```
 
+ If you are accessing a nokogiri method directly, like in `some_record.xpath`, the registered default namespaces aren't known by nokogiri -- but they are available in the indexer as `default_namespaces`, so can be referenced and passed into the nokogiri method:
+
+ ```ruby
+ each_record do |record|
+ log( record.xpath("//dc:title"), default_namespaces )
+ end
+ ```
+
  You can use all the standard transforation macros in Traject::Macros::Transformation:
 
  ```ruby
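The xml.md change is about handing a prefix-to-URI hash to an XPath call made directly on the record. Nokogiri's `Node#xpath` accepts such a hash after the path string (for example `record.xpath("//dc:title", default_namespaces)`). To stay runnable without Nokogiri, the sketch below swaps in REXML from ruby's standard library, whose `REXML::XPath.match` takes the bindings as its third argument; the mechanism is the same, but the namespace values and the sample document are assumptions made up for illustration.

```ruby
require "rexml/document"

# Prefix => URI bindings, playing the role of the hash the indexer exposes
# as `default_namespaces` (values assumed for this sketch).
namespaces = { "dc" => "http://purl.org/dc/elements/1.1/" }

# The document binds the Dublin Core URI to a different prefix ("d");
# the "dc:" prefix in the XPath is resolved through the bindings hash.
xml = <<~XML
  <record xmlns:d="http://purl.org/dc/elements/1.1/">
    <d:title>Hamlet</d:title>
  </record>
XML

doc = REXML::Document.new(xml)

# Pass the bindings hash as the namespaces argument of the XPath call.
titles = REXML::XPath.match(doc, "//dc:title", namespaces).map(&:text)
puts titles.inspect
```

Matching is by namespace URI, not by the literal prefix, so the query prefix and the document prefix do not have to agree as long as the bindings map them to the same URI.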
@@ -9,12 +9,12 @@ require 'traject/ndj_reader'
  # the gem traject-marc4j_reader.
  #
  # By default assumes binary MARC encoding, please set marc_source.type setting
- # for XML or json. If binary, please set marc_source.encoding with char encoding.
+ # for XML or json. If binary, please set marc_source.encoding with char encoding.
  #
  # ## Settings
 
  # * "marc_source.type": serialization type. default 'binary'
- # * "binary". standard ISO 2709 "binary" MARC format,
+ # * "binary". standard ISO 2709 "binary" MARC format,
  # will use ruby-marc MARC::Reader (Note, if you are using
  # type 'binary', you probably want to also set 'marc_source.encoding')
  # * "xml", MarcXML, will use ruby-marc MARC::XMLReader
@@ -23,15 +23,16 @@ require 'traject/ndj_reader'
  # allowed, and no unescpaed internal newlines allowed in the json
  # objects -- we just read line by line, and assume each line is a
  # marc-in-json. http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/
- # will use Traject::NDJReader which uses MARC::Record.new_from_hash.
+ # will use Traject::NDJReader which uses MARC::Record.new_from_hash.
  # * "marc_source.encoding": Only used for marc_source.type 'binary', character encoding
  # of the source marc records. Can be any
  # encoding recognized by ruby, OR 'MARC-8'. For 'MARC-8', content will
- # be transcoded (by ruby-marc) to UTF-8 in internal MARC::Record Strings.
+ # be transcoded (by ruby-marc) to UTF-8 in internal MARC::Record Strings.
  # Default nil, meaning let MARC::Reader use it's default, which will
- # probably be Encoding.default_internal, which will probably be UTF-8.
+ # be your system's Encoding.default_external, which will probably be UTF-8.
+ # (but may be something unexpected/undesired on Windows, where you may want to set this explicitly.)
  # Right now Traject::MarcReader is hard-coded to transcode to UTF-8 as
- # an internal encoding.
+ # an internal encoding.
  # * "marc_reader.xml_parser": For XML type, which XML parser to tell Marc::Reader
  # to use. Anything recognized by [Marc::Reader :parser
  # argument](http://rdoc.info/github/ruby-marc/ruby-marc/MARC/XMLReader).
@@ -75,7 +76,7 @@ class Traject::MarcReader
  Traject::NDJReader.new(self.input_stream, settings)
  else
  args = { :invalid => :replace }
- args[:external_encoding] = settings["marc_source.encoding"]
+ args[:external_encoding] = settings["marc_source.encoding"]
  MARC::Reader.new(self.input_stream, args)
  end
  end
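The revised `marc_source.encoding` comment recommends setting the encoding explicitly where `Encoding.default_external` may not be UTF-8, for example on Windows. A minimal traject configuration fragment doing that might look like the following, using the same `settings`/`provide` form as the test added in this release. It assumes binary MARC input that is actually UTF-8 encoded; substitute `"MARC-8"` or another encoding ruby recognizes to match your records.

```ruby
# Traject configuration file fragment (assumes binary MARC, UTF-8 encoded).
settings do
  provide "marc_source.type", "binary"
  # Set explicitly rather than relying on Encoding.default_external.
  provide "marc_source.encoding", "UTF-8"
end
```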
@@ -1,3 +1,3 @@
  module Traject
- VERSION = "3.0.0.alpha.2"
+ VERSION = "3.0.0"
  end
@@ -56,6 +56,17 @@ describe "Traject::NokogiriIndexer" do
  refute_empty results.last["rights"]
  end
 
+ it "exposes nokogiri.namespaces setting in default_namespaces" do
+ namespaces = @namespaces
+ @indexer.configure do
+ settings do
+ provide "nokogiri.namespaces", namespaces
+ end
+ end
+ @indexer.settings.fill_in_defaults!
+ assert_equal namespaces, @indexer.default_namespaces
+ end
+
  describe "xpath to non-terminal element" do
  before do
  @xml = <<-EOS
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: traject
  version: !ruby/object:Gem::Version
- version: 3.0.0.alpha.2
+ version: 3.0.0
  platform: ruby
  authors:
  - Jonathan Rochkind
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2018-08-30 00:00:00.000000000 Z
+ date: 2018-10-12 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: concurrent-ruby
@@ -381,9 +381,9 @@ required_ruby_version: !ruby/object:Gem::Requirement
  version: '0'
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
- - - ">"
+ - - ">="
  - !ruby/object:Gem::Version
- version: 1.3.1
+ version: '0'
  requirements: []
  rubyforge_project:
  rubygems_version: 2.7.7