traject 3.1.0 → 3.6.0
- checksums.yaml +4 -4
- data/.github/workflows/ruby.yml +35 -0
- data/CHANGES.md +46 -0
- data/README.md +18 -2
- data/doc/settings.md +5 -1
- data/doc/xml.md +12 -0
- data/examples/marc/tiny.xml +35 -0
- data/lib/traject/command_line.rb +34 -43
- data/lib/traject/debug_writer.rb +1 -1
- data/lib/traject/indexer.rb +12 -4
- data/lib/traject/macros/marc21.rb +3 -3
- data/lib/traject/macros/marc21_semantics.rb +15 -12
- data/lib/traject/macros/nokogiri_macros.rb +9 -3
- data/lib/traject/marc_extractor.rb +3 -3
- data/lib/traject/nokogiri_reader.rb +10 -1
- data/lib/traject/oai_pmh_nokogiri_reader.rb +9 -3
- data/lib/traject/solr_json_writer.rb +38 -7
- data/lib/traject/version.rb +1 -1
- data/lib/translation_maps/marc_languages.yaml +77 -48
- data/test/command_line_test.rb +52 -0
- data/test/debug_writer_test.rb +13 -0
- data/test/delimited_writer_test.rb +14 -16
- data/test/indexer/class_level_configuration_test.rb +23 -0
- data/test/indexer/macros/macros_marc21_semantics_test.rb +4 -0
- data/test/indexer/nokogiri_indexer_test.rb +35 -0
- data/test/indexer/read_write_test.rb +14 -3
- data/test/nokogiri_reader_test.rb +10 -0
- data/test/solr_json_writer_test.rb +65 -0
- data/test/test_support/date_resort_to_264.marc +1 -0
- data/traject.gemspec +3 -3
- metadata +31 -21
- data/.travis.yml +0 -16
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 2e47e6648ed9fc963d18e10c9be48a30273147c4920cb4b7e448d078fd2398ac
|
4
|
+
data.tar.gz: efa549ebcbd87e599b56b955b4bd26422dfe7de67697aed6b39cb421c3b80677
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 6acdd2b8cfc888b221a1f19cd5197127006be81d0525169d531fc9bf43fe02cc9ec87401e6b2442c57ff0cd483d9884504ac75be92e3718cbbc49208dc97024f
|
7
|
+
data.tar.gz: 30abefa7af9e1c170ae8570aa59b6c571a9acc1eb7b0abf6efd64d97550b678c21c72d76a8156ef3844ab01154111fd3747f96d6346ee8a8d76e747b2cf92e1f
|
data/.github/workflows/ruby.yml
ADDED
@@ -0,0 +1,35 @@
|
|
1
|
+
name: CI
|
2
|
+
|
3
|
+
on:
|
4
|
+
push:
|
5
|
+
branches: [ master ]
|
6
|
+
pull_request:
|
7
|
+
branches: ['**']
|
8
|
+
|
9
|
+
jobs:
|
10
|
+
tests:
|
11
|
+
runs-on: ubuntu-latest
|
12
|
+
strategy:
|
13
|
+
fail-fast: false
|
14
|
+
matrix:
|
15
|
+
ruby: [ '2.4', '2.5', '2.6', '2.7', '3.0', 'jruby-9.1', 'jruby-9.2' ]
|
16
|
+
name: Ruby ${{ matrix.ruby }}
|
17
|
+
steps:
|
18
|
+
- uses: actions/checkout@v2
|
19
|
+
|
20
|
+
- name: Set up Ruby
|
21
|
+
uses: ruby/setup-ruby@v1
|
22
|
+
with:
|
23
|
+
ruby-version: ${{ matrix.ruby }}
|
24
|
+
|
25
|
+
- name: set JAVA_OPTS for jruby-9.1
|
26
|
+
run: echo 'JAVA_OPTS="--add-opens java.base/java.security.cert=ALL-UNNAMED --add-opens java.base/java.security=ALL-UNNAMED --add-opens java.base/java.util.zip=ALL-UNNAMED"' >> $GITHUB_ENV
|
27
|
+
if: ${{ matrix.ruby == 'jruby-9.1' }}
|
28
|
+
# https://github.com/jruby/jruby/issues/4834
|
29
|
+
# Still seems to be an issue in jruby-9.1, but not 9.2
|
30
|
+
# https://github.community/t/conditional-setting-of-env-variables-in-gh-actions/179650
|
31
|
+
|
32
|
+
- name: Install dependencies
|
33
|
+
run: bundle install --jobs 4 --retry 3
|
34
|
+
- name: Run tests
|
35
|
+
run: bundle exec rake
|
data/CHANGES.md
CHANGED
@@ -1,5 +1,51 @@
|
|
1
1
|
# Changes
|
2
2
|
|
3
|
+
## Next
|
4
|
+
|
5
|
+
*
|
6
|
+
|
7
|
+
*
|
8
|
+
|
9
|
+
## 3.6.0
|
10
|
+
|
11
|
+
* Tiny backward compat changes for ruby 3.0 compat. https://github.com/traject/traject/pull/263
|
12
|
+
|
13
|
+
* Allow gem `http` 5.x in gemspec. https://github.com/traject/traject/pull/269
|
14
|
+
|
15
|
+
## 3.5.0
|
16
|
+
|
17
|
+
* `traject -v` and `traject -h` correctly return 0 exit code indicating success.
|
18
|
+
|
19
|
+
* upgrade to slop gem 4.x, which carries with it a slightly different format of human-readable command-line arg errors, should be otherwise invisible.
|
20
|
+
|
21
|
+
* the SolrJsonWriter now supports HTTP basic auth credentials embedded in `solr.url` or `solr.update_url`, eg `http://user:pass@example.org/solr` https://github.com/traject/traject/pull/262
|
22
|
+
|
23
|
+
|
24
|
+
## 3.4.0
|
25
|
+
|
26
|
+
* XML-mode `extract_xpath` now supports extracting attribute values with xpath @attr syntax.
|
27
|
+
|
28
|
+
## 3.3.0
|
29
|
+
|
30
|
+
* `Traject::Macros::Marc21Semantics.publication_date` now gets date from 264 before 260. https://github.com/traject/traject/pull/233
|
31
|
+
|
32
|
+
* Allow hashie 4.x in gemspec https://github.com/traject/traject/pull/234
|
33
|
+
|
34
|
+
* Allow `http` gem 4.x versions. https://github.com/traject/traject/pull/236
|
35
|
+
|
36
|
+
* Can now call class-level Indexer.configure multiple times https://github.com/sciencehistory/scihist_digicoll/pull/525
|
37
|
+
|
38
|
+
## 3.2.0
|
39
|
+
|
40
|
+
* NokogiriReader has a "nokogiri.strict_mode" setting. Set to true or string 'true' to ask Nokogiri to parse in strict mode, so it will immediately raise on ill-formed XML, instead of nokogiri's default of doing what it can with it. https://github.com/traject/traject/pull/226
|
41
|
+
|
42
|
+
* SolrJsonWriter
|
43
|
+
|
44
|
+
* Utility method `delete_all!` sends a delete all query to the Solr URL endpoint. https://github.com/traject/traject/pull/227
|
45
|
+
|
46
|
+
* Allow basic auth configuration of the default http client via `solr_writer.basic_auth_user` and `solr_writer.basic_auth_password`. https://github.com/traject/traject/pull/231
|
47
|
+
|
48
|
+
|
3
49
|
## 3.1.0
|
4
50
|
|
5
51
|
### Added
|
data/README.md
CHANGED
@@ -8,8 +8,8 @@ Traject can also be generalized to a set of tools for getting structured data fr
|
|
8
8
|
|
9
9
|
**Traject is stable, mature software, that is already being used in production by its authors and several other institutions.**
|
10
10
|
|
11
|
-
[](http://badge.fury.io/rb/traject)
|
12
|
+
[](https://github.com/traject/traject/actions?query=workflow%3ACI+branch%3Amaster)
|
13
13
|
|
14
14
|
|
15
15
|
## Background/Goals
|
@@ -468,6 +468,22 @@ Also see `-I load_path` option and suggestions for Bundler use under Extending W
|
|
468
468
|
See also [Hints for batch and cronjob use](./doc/batch_execution.md) of traject.
|
469
469
|
|
470
470
|
|
471
|
+
## A small but complete example
|
472
|
+
|
473
|
+
To process a MARC XML file with the data shown in [./examples/marc/tiny.xml](./examples/marc/tiny.xml) you can save the following configuration as `config.rb`:
|
474
|
+
|
475
|
+
```
|
476
|
+
to_field 'title', extract_marc('245a', first: true)
|
477
|
+
```
|
478
|
+
|
479
|
+
and run Traject as follows:
|
480
|
+
|
481
|
+
```
|
482
|
+
traject -t xml -c config.rb -w Traject::DebugWriter tiny.xml
|
483
|
+
```
|
484
|
+
|
485
|
+
`-t xml` indicates that the file is a MARC XML file. `-w Traject::DebugWriter` outputs the results to the console (i.e. without saving to Solr).
|
486
|
+
|
471
487
|
## Extending With Your Own Code
|
472
488
|
|
473
489
|
Traject config files are full live ruby files, where you can do anything,
|
data/doc/settings.md
CHANGED
@@ -83,7 +83,8 @@ settings are applied first of all. It's recommended you use `provide`.
|
|
83
83
|
### Writing to solr
|
84
84
|
|
85
85
|
* `json_writer.pretty_print`: used by the JsonWriter, if set to true, will output pretty printed json (with added whitespace) for easier human readability. Default false.
|
86
|
-
|
86
|
+
|
87
|
+
* `solr.url`: URL to connect to a solr instance for indexing, eg http://example.org:8983/solr . Command-line short-cut `-u`. (Can include embedded HTTP basic auth as eg `http://user:pass@example.org/solr`)
|
87
88
|
|
88
89
|
* `solr.version`: Set to eg "1.4.0", "4.3.0"; currently un-used, but in the future will control some default settings, and/or sanity check and warn you if you're doing something that might not work with that version of solr. Set now for help in the future.
|
89
90
|
|
@@ -93,6 +94,9 @@ settings are applied first of all. It's recommended you use `provide`.
|
|
93
94
|
|
94
95
|
* `solr_writer.thread_pool`: defaults to 1 (single bg thread). A thread pool is used for submitting docs to solr. Set to 0 or nil to disable threading. Set to 1, there will still be a single bg thread doing the adds. May make sense to set higher than number of cores on your indexing machine, as these threads will mostly be waiting on Solr. Speed/capacity of your solr might be more relevant. Note that processing_thread_pool threads can end up submitting to solr too, if solr_json_writer.thread_pool is full.
|
95
96
|
|
97
|
+
* `solr_writer.basic_auth_user`, `solr_writer.basic_auth_password`: Not set by default but when both are set the default writer is configured with basic auth. You can also just embed basic
|
98
|
+
auth credentials in `solr.url` using standard URI syntax.
|
99
|
+
|
96
100
|
|
97
101
|
### Dealing with MARC data
|
98
102
|
|
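The embedded-credentials form of `solr.url` mentioned above relies on standard URI syntax. The following is a minimal sketch of what that syntax carries, using only Ruby's standard `uri` library. It is an illustration, not traject's own implementation, and the `basic_auth_from` helper is a hypothetical name introduced here.

```ruby
require "uri"

# Hypothetical helper, not part of traject: returns [user, password] when a
# URL embeds HTTP basic auth credentials as user:pass@host, otherwise nil.
def basic_auth_from(url)
  uri = URI.parse(url)
  return nil unless uri.user
  [uri.user, uri.password]
end

user, password = basic_auth_from("http://user:pass@example.org/solr")
puts "user=#{user} password=#{password}"
```

A URL without a `user:pass@` part, such as `http://example.org:8983/solr`, yields `nil` in this sketch, which a caller could treat as "fall back to `solr_writer.basic_auth_user` and `solr_writer.basic_auth_password`, if set".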
data/doc/xml.md
CHANGED
@@ -4,6 +4,8 @@ The [NokogiriIndexer](../lib/traject/nokogiri_indexer.md) is a Traject::Indexer
|
|
4
4
|
|
5
5
|
It by default uses the NokogiriReader to read XML and read Nokogiri::XML::Documents, and includes the NokogiriMacros mix-in, with some macros for operating on Nokogiri::XML::Documents.
|
6
6
|
|
7
|
+
Please note that the recommended mechanism to parse MARC XML files with Traject is via the `-t` parameter (or via the `provide "marc_source.type", "xml"` setting). The documentation on this page is for those parsing other (non-MARC) XML files.
|
8
|
+
|
7
9
|
## On the command-line
|
8
10
|
|
9
11
|
You can tell the traject command-line to use the NokogiriIndexer with the `-i xml` flag:
|
@@ -72,6 +74,16 @@ You can use all the standard transforation macros in Traject::Macros::Transforma
|
|
72
74
|
to_field "something", extract_xpath("//value"), first_only, translation_map("some_map"), default("no value")
|
73
75
|
```
|
74
76
|
|
77
|
+
### selecting attribute values
|
78
|
+
|
79
|
+
Just works, using xpath syntax for selecting an attribute:
|
80
|
+
|
81
|
+
|
82
|
+
```ruby
|
83
|
+
# gets status value in: <oai:header status="something">
|
84
|
+
to_field "status", extract_xpath("//oai:record/oai:header/@status")
|
85
|
+
```
|
86
|
+
|
75
87
|
|
76
88
|
### selecting non-text nodes
|
77
89
|
|
data/examples/marc/tiny.xml
ADDED
@@ -0,0 +1,35 @@
|
|
1
|
+
<?xml version="1.0" encoding="UTF-8"?>
|
2
|
+
<collection xmlns="http://www.loc.gov/MARC21/slim" xmlns:marc="http://www.loc.gov/MARC21/slim">
|
3
|
+
<record>
|
4
|
+
<leader>01352cam a2200349 a 4500</leader>
|
5
|
+
<datafield tag="245" ind1="0" ind2="0">
|
6
|
+
<subfield code="6">880-01</subfield>
|
7
|
+
<subfield code="a">Kazoku kankei no shakai shinrigaku /</subfield>
|
8
|
+
<subfield code="c">Osada Masayoshi hen.</subfield>
|
9
|
+
</datafield>
|
10
|
+
</record>
|
11
|
+
<record>
|
12
|
+
<leader>01121ccm a2200289z 4500</leader>
|
13
|
+
<datafield tag="245" ind1="1" ind2="0">
|
14
|
+
<subfield code="a">Powhatan's daughter :</subfield>
|
15
|
+
<subfield code="b">march</subfield>
|
16
|
+
</datafield>
|
17
|
+
<datafield tag="100" ind1="1" ind2=" ">
|
18
|
+
<subfield code="a">Sousa, John Philip,</subfield>
|
19
|
+
<subfield code="d">1854-1932,</subfield>
|
20
|
+
<subfield code="e">composer.</subfield>
|
21
|
+
</datafield>
|
22
|
+
</record>
|
23
|
+
<record>
|
24
|
+
<leader>01137cam a2200301 a 4500</leader>
|
25
|
+
<datafield tag="245" ind1="1" ind2="0">
|
26
|
+
<subfield code="a">Two pieces /</subfield>
|
27
|
+
<subfield code="c">by Frank O'Hara.</subfield>
|
28
|
+
</datafield>
|
29
|
+
<datafield tag="100" ind1="1" ind2=" ">
|
30
|
+
<subfield code="a">O'Hara, Frank,</subfield>
|
31
|
+
<subfield code="d">1926-1966.</subfield>
|
32
|
+
<subfield code="0">http://id.loc.gov/authorities/names/n79042130</subfield>
|
33
|
+
</datafield>
|
34
|
+
</record>
|
35
|
+
</collection>
|
data/lib/traject/command_line.rb
CHANGED
@@ -29,10 +29,10 @@ module Traject
|
|
29
29
|
self.console = $stderr
|
30
30
|
|
31
31
|
self.orig_argv = argv.dup
|
32
|
-
self.remaining_argv = argv
|
33
32
|
|
34
|
-
self.slop = create_slop!
|
35
|
-
self.options =
|
33
|
+
self.slop = create_slop!(argv)
|
34
|
+
self.options = self.slop
|
35
|
+
self.remaining_argv = self.slop.arguments
|
36
36
|
end
|
37
37
|
|
38
38
|
# Returns true on success or false on failure; may also raise exceptions;
|
@@ -40,11 +40,11 @@ module Traject
|
|
40
40
|
def execute
|
41
41
|
if options[:version]
|
42
42
|
self.console.puts "traject version #{Traject::VERSION}"
|
43
|
-
return
|
43
|
+
return true
|
44
44
|
end
|
45
45
|
if options[:help]
|
46
|
-
self.console.puts slop.
|
47
|
-
return
|
46
|
+
self.console.puts slop.to_s
|
47
|
+
return true
|
48
48
|
end
|
49
49
|
|
50
50
|
|
@@ -179,11 +179,11 @@ module Traject
|
|
179
179
|
end
|
180
180
|
|
181
181
|
def arg_check!
|
182
|
-
if options[:command] == "process" && (options[:conf]
|
182
|
+
if options[:command] == "process" && (!options[:conf] || options[:conf].length == 0)
|
183
183
|
self.console.puts "Error: Missing required configuration file"
|
184
184
|
self.console.puts "Exiting..."
|
185
185
|
self.console.puts
|
186
|
-
self.console.puts self.slop.
|
186
|
+
self.console.puts self.slop.to_s
|
187
187
|
exit 2
|
188
188
|
end
|
189
189
|
end
|
@@ -234,28 +234,36 @@ module Traject
|
|
234
234
|
end
|
235
235
|
|
236
236
|
|
237
|
-
def create_slop!
|
238
|
-
|
239
|
-
banner "traject [options] -c configuration.rb [-c config2.rb] file.mrc"
|
237
|
+
def create_slop!(argv)
|
238
|
+
options = Slop::Options.new do |o|
|
239
|
+
o.banner = "traject [options] -c configuration.rb [-c config2.rb] file.mrc"
|
240
240
|
|
241
|
-
on 'v', 'version', "print version information to stderr"
|
242
|
-
on 'd', 'debug', "Include debug log, -s log.level=debug"
|
243
|
-
on 'h', 'help', "print usage information to stderr"
|
244
|
-
|
245
|
-
|
246
|
-
|
247
|
-
|
248
|
-
|
249
|
-
|
250
|
-
|
251
|
-
|
252
|
-
|
241
|
+
o.on '-v', '--version', "print version information to stderr"
|
242
|
+
o.on '-d', '--debug', "Include debug log, -s log.level=debug"
|
243
|
+
o.on '-h', '--help', "print usage information to stderr"
|
244
|
+
o.array '-c', '--conf', 'configuration file path (repeatable)', :delimiter => nil
|
245
|
+
o.string "-i", '--indexer', "Traject indexer class name or shortcut", :default => "marc"
|
246
|
+
o.array "-s", "--setting", "settings: `-s key=value` (repeatable)", :delimiter => nil
|
247
|
+
o.string "-r", "--reader", "Set reader class, shortcut for -s reader_class_name="
|
248
|
+
o.string "-o", "--output_file", "output file for Writer classes that write to files"
|
249
|
+
o.string "-w", "--writer", "Set writer class, shortcut for -s writer_class_name="
|
250
|
+
o.string "-u", "--solr", "Set solr url, shortcut for -s solr.url="
|
251
|
+
o.string "-t", "--marc_type", "xml, json or binary. shortcut for -s marc_source.type="
|
252
|
+
o.array "-I", "--load_path", "append paths to ruby $LOAD_PATH", :delimiter => ":"
|
253
253
|
|
254
|
-
|
254
|
+
o.string "-x", "--command", "alternate traject command: process (default); marcout; commit", :default => "process"
|
255
255
|
|
256
|
-
on "stdin", "read input from stdin"
|
257
|
-
on "debug-mode", "debug logging, single threaded, output human readable hashes"
|
256
|
+
o.on "--stdin", "read input from stdin"
|
257
|
+
o.on "--debug-mode", "debug logging, single threaded, output human readable hashes"
|
258
258
|
end
|
259
|
+
|
260
|
+
options.parse(argv)
|
261
|
+
rescue Slop::Error => e
|
262
|
+
self.console.puts "Error: #{e.message}"
|
263
|
+
self.console.puts "Exiting..."
|
264
|
+
self.console.puts
|
265
|
+
self.console.puts options.to_s
|
266
|
+
exit 1
|
259
267
|
end
|
260
268
|
|
261
269
|
def initialize_indexer!
|
@@ -267,22 +275,5 @@ module Traject
|
|
267
275
|
|
268
276
|
return indexer
|
269
277
|
end
|
270
|
-
|
271
|
-
def parse_options(argv)
|
272
|
-
|
273
|
-
begin
|
274
|
-
self.slop.parse!(argv)
|
275
|
-
rescue Slop::Error => e
|
276
|
-
self.console.puts "Error: #{e.message}"
|
277
|
-
self.console.puts "Exiting..."
|
278
|
-
self.console.puts
|
279
|
-
self.console.puts slop.help
|
280
|
-
exit 1
|
281
|
-
end
|
282
|
-
|
283
|
-
return self.slop.to_hash
|
284
|
-
end
|
285
|
-
|
286
|
-
|
287
278
|
end
|
288
279
|
end
|
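The command-line changes above parse `argv` in one step, keep repeatable options such as `-c` as arrays, and treat whatever is left over as input files. The sketch below shows the same shape with Ruby's standard `OptionParser` rather than the slop gem, so that it runs without extra dependencies. The `parse_cli` and `missing_conf?` names are hypothetical, and only three of the options from the diff are reproduced.

```ruby
require "optparse"

# Hypothetical stand-in for the parsing step, written with stdlib
# OptionParser instead of slop. Returns [options_hash, remaining_args].
def parse_cli(argv)
  options = { conf: [], command: "process" }
  parser = OptionParser.new do |o|
    o.banner = "traject [options] -c configuration.rb [-c config2.rb] file.mrc"
    # Repeatable: each -c appends one more configuration file path.
    o.on("-c", "--conf PATH", "configuration file path (repeatable)") { |v| options[:conf] << v }
    o.on("-t", "--marc_type TYPE", "xml, json or binary") { |v| options[:marc_type] = v }
    o.on("-x", "--command NAME", "process (default); marcout; commit") { |v| options[:command] = v }
  end
  # parse (without the bang) leaves argv untouched and returns the
  # non-option arguments, here the input file names.
  remaining = parser.parse(argv)
  [options, remaining]
end

# Mirrors the arg_check! condition in the diff: "process" needs a config file.
def missing_conf?(options)
  options[:command] == "process" && (!options[:conf] || options[:conf].length == 0)
end

options, files = parse_cli(["-t", "xml", "-c", "config.rb", "tiny.xml"])
puts "conf=#{options[:conf].inspect} files=#{files.inspect}"
```

Keeping the original `argv` untouched is what allows the real code to store `orig_argv` separately from the remaining arguments.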
data/lib/traject/debug_writer.rb
CHANGED
@@ -62,7 +62,7 @@ All records are assumed to have a unique id. You can set which field to look in
|
|
62
62
|
def serialize(context)
|
63
63
|
h = context.output_hash
|
64
64
|
rec_key = record_number(context)
|
65
|
-
lines = h.keys.sort.map { |k| @format % [rec_key, k, h[k].join(' | ')] }
|
65
|
+
lines = h.keys.sort.map { |k| @format % [rec_key, k, (h[k] || []).join(' | ')] }
|
66
66
|
lines.push "\n"
|
67
67
|
lines.join("\n")
|
68
68
|
end
|
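The one-line change above guards against a field whose value is `nil` in the output hash. A minimal sketch of the same guard in plain Ruby follows; the `serialize_lines` helper and its format string are assumptions made for illustration, not the actual `Traject::DebugWriter` internals.

```ruby
# Hypothetical stand-in for the serialize step. The format string is an
# assumption; the real one is defined inside Traject::DebugWriter.
def serialize_lines(rec_key, output_hash, format = "%-10s %-15s %s")
  output_hash.keys.sort.map do |k|
    # A nil value falls back to an empty array, so join never raises
    # NoMethodError for a field that was set but holds no values.
    format % [rec_key, k, (output_hash[k] || []).join(" | ")]
  end
end

puts serialize_lines("rec1", { "title" => ["A", "B"], "author" => nil })
```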
data/lib/traject/indexer.rb
CHANGED
@@ -190,7 +190,7 @@ class Traject::Indexer
|
|
190
190
|
instance_eval(&block)
|
191
191
|
end
|
192
192
|
|
193
|
-
## Class level configure block accepted too, and applied at instantiation
|
193
|
+
## Class level configure block(s) accepted too, and applied at instantiation
|
194
194
|
# before instance-level configuration.
|
195
195
|
#
|
196
196
|
# EXPERIMENTAL, implementation may change in ways that affect some uses.
|
@@ -199,8 +199,14 @@ class Traject::Indexer
|
|
199
199
|
# Note that settings set by 'provide' in subclass can not really be overridden
|
200
200
|
# by 'provide' in a next level subclass. Use self.default_settings instead, with
|
201
201
|
# call to super.
|
202
|
+
#
|
203
|
+
# You can call this .configure multiple times, blocks are added to a list, and
|
204
|
+
# will be used to initialize an instance in order.
|
205
|
+
#
|
206
|
+
# The main downside of this workaround implementation is performance, even though
|
207
|
+
# defined at load-time on class level, blocks are all executed on every instantiation.
|
202
208
|
def self.configure(&block)
|
203
|
-
@
|
209
|
+
(@class_configure_blocks ||= []) << block
|
204
210
|
end
|
205
211
|
|
206
212
|
def self.apply_class_configure_block(instance)
|
@@ -208,8 +214,10 @@ class Traject::Indexer
|
|
208
214
|
if self.superclass.respond_to?(:apply_class_configure_block)
|
209
215
|
self.superclass.apply_class_configure_block(instance)
|
210
216
|
end
|
211
|
-
if @
|
212
|
-
|
217
|
+
if @class_configure_blocks && !@class_configure_blocks.empty?
|
218
|
+
@class_configure_blocks.each do |block|
|
219
|
+
instance.configure(&block)
|
220
|
+
end
|
213
221
|
end
|
214
222
|
end
|
215
223
|
|
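The indexer change above turns the single class-level configure block into a list that is replayed on every instantiation, superclass blocks first. The pattern can be sketched in isolation as follows. `MiniIndexer`, `BaseIndexer`, `ChildIndexer` and the `steps` accessor are hypothetical names for this illustration and are not part of traject.

```ruby
# Hypothetical miniature of the pattern, not Traject::Indexer itself.
class MiniIndexer
  attr_reader :steps

  def initialize
    @steps = []
    # Class-level blocks run first, before any instance-level configuration.
    self.class.apply_class_configure_block(self)
  end

  def configure(&block)
    instance_eval(&block)
  end

  # May be called several times; each block is appended to a per-class list.
  def self.configure(&block)
    (@class_configure_blocks ||= []) << block
  end

  def self.apply_class_configure_block(instance)
    # Superclass blocks are applied before this class's own blocks.
    if superclass.respond_to?(:apply_class_configure_block)
      superclass.apply_class_configure_block(instance)
    end
    (@class_configure_blocks || []).each { |block| instance.configure(&block) }
  end
end

class BaseIndexer < MiniIndexer
  configure { steps << :base_one }
  configure { steps << :base_two }
end

class ChildIndexer < BaseIndexer
  configure { steps << :child }
end

p ChildIndexer.new.steps
```

As the comment in the diff notes, the cost of this design is that every stored block is executed again for each new instance.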
data/lib/traject/macros/marc21.rb
CHANGED
@@ -15,7 +15,7 @@ module Traject::Macros
|
|
15
15
|
# field/substring specification.
|
16
16
|
#
|
17
17
|
# First argument is a string spec suitable for the MarcExtractor, see
|
18
|
-
# MarcExtractor::
|
18
|
+
# Traject::MarcExtractor::Spec.
|
19
19
|
#
|
20
20
|
# Second arg is optional options, including options valid on MarcExtractor.new,
|
21
21
|
# and others. By default, will de-duplicate results, but see :allow_duplicates
|
@@ -42,11 +42,11 @@ module Traject::Macros
|
|
42
42
|
#
|
43
43
|
# * :translation_map => String: translate with named translation map looked up in load
|
44
44
|
# path, uses Traject::TranslationMap.new(translation_map_arg).
|
45
|
-
# **Instead**, use `extract_marc(whatever), translation_map(translation_map_arg)
|
45
|
+
# **Instead**, use `extract_marc(whatever), translation_map(translation_map_arg)`
|
46
46
|
#
|
47
47
|
# * :trim_punctuation => true; trims leading/trailing punctuation using standard algorithms that
|
48
48
|
# have shown themselves useful with Marc, using Marc21.trim_punctuation. **Instead**, use
|
49
|
-
# `extract_marc(whatever), trim_punctuation
|
49
|
+
# `extract_marc(whatever), trim_punctuation`
|
50
50
|
#
|
51
51
|
# * :default => String: if otherwise empty, add default value. **Instead**, use `extract_marc(whatever), default("default value")`
|
52
52
|
#
|
data/lib/traject/macros/marc21_semantics.rb
CHANGED
@@ -26,19 +26,19 @@ module Traject::Macros
|
|
26
26
|
accumulator.concat list.uniq if list
|
27
27
|
end
|
28
28
|
end
|
29
|
-
|
29
|
+
|
30
30
|
# If a num begins with a known OCLC prefix, return it without the prefix.
|
31
31
|
# otherwise nil.
|
32
32
|
#
|
33
|
-
# Allow (OCoLC) and/or ocn/ocm/on
|
34
|
-
|
33
|
+
# Allow (OCoLC) and/or ocn/ocm/on
|
34
|
+
|
35
35
|
OCLCPAT = /
|
36
36
|
\A\s*
|
37
37
|
(?:(?:\(OCoLC\)) |
|
38
38
|
(?:\(OCoLC\))?(?:(?:ocm)|(?:ocn)|(?:on))
|
39
39
|
)(\d+)
|
40
40
|
/x
|
41
|
-
|
41
|
+
|
42
42
|
def self.oclcnum_extract(num)
|
43
43
|
if m = OCLCPAT.match(num)
|
44
44
|
return m[1]
|
@@ -364,13 +364,16 @@ module Traject::Macros
|
|
364
364
|
end
|
365
365
|
end
|
366
366
|
end
|
367
|
-
# Okay, nothing from 008, try 260
|
367
|
+
# Okay, nothing from 008, first try 264, then try 260
|
368
368
|
if found_date.nil?
|
369
|
+
v264c = MarcExtractor.cached("264c", :separator => nil).extract(record).first
|
369
370
|
v260c = MarcExtractor.cached("260c", :separator => nil).extract(record).first
|
370
371
|
# just try to take the first four digits out of there, we're not going to try
|
371
372
|
# anything crazy.
|
372
|
-
if m = /(\d{4})/.match(
|
373
|
+
if m = /(\d{4})/.match(v264c)
|
373
374
|
found_date = m[1].to_i
|
375
|
+
elsif m = /(\d{4})/.match(v260c)
|
376
|
+
found_date = m[1].to_i
|
374
377
|
end
|
375
378
|
end
|
376
379
|
|
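The hunk above changes the fallback order for the publication date: when the 008 field yields nothing, the first four-digit run in 264$c is tried before 260$c. A minimal sketch of that ordering on plain strings is shown below. The `year_from_264_or_260` helper is hypothetical; the real macro obtains both values from a MARC record through `MarcExtractor`.

```ruby
# Hypothetical helper working on plain strings rather than MARC records.
def year_from_264_or_260(v264c, v260c)
  # Take the first run of four digits, preferring 264$c over 260$c.
  # to_s guards against a missing (nil) subfield.
  [v264c, v260c].each do |value|
    m = /(\d{4})/.match(value.to_s)
    return m[1].to_i if m
  end
  nil
end

p year_from_264_or_260("c2011.", "c1999.")
p year_from_264_or_260(nil, "[1987]")
```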
@@ -519,11 +522,11 @@ module Traject::Macros
|
|
519
522
|
|
520
523
|
# Extracts LCSH-carrying fields, and formatting them
|
521
524
|
# as a pre-coordinated LCSH string, for instance suitable for including
|
522
|
-
# in a facet.
|
525
|
+
# in a facet.
|
523
526
|
#
|
524
527
|
# You can supply your own list of fields as a spec, but for significant
|
525
528
|
# customization you probably just want to write your own method in
|
526
|
-
# terms of the Marc21Semantics.assemble_lcsh method.
|
529
|
+
# terms of the Marc21Semantics.assemble_lcsh method.
|
527
530
|
def marc_lcsh_formatted(options = {})
|
528
531
|
spec = options[:spec] || "600:610:611:630:648:650:651:654:662"
|
529
532
|
subd_separator = options[:subdivison_separator] || " — "
|
@@ -540,17 +543,17 @@ module Traject::Macros
|
|
540
543
|
end
|
541
544
|
|
542
545
|
# Takes a MARC::Field and formats it into a pre-coordinated LCSH string
|
543
|
-
# with subdivision seperators in the right place.
|
546
|
+
# with subdivision seperators in the right place.
|
544
547
|
#
|
545
548
|
# For 600 fields especially, need to not just join with subdivision seperator
|
546
549
|
# to take acount of $a$d$t -- for other fields, might be able to just
|
547
|
-
# join subfields, not sure.
|
550
|
+
# join subfields, not sure.
|
548
551
|
#
|
549
552
|
# WILL strip trailing period from generated string, contrary to some LCSH practice.
|
550
553
|
# Our data is inconsistent on whether it has period or not, this was
|
551
|
-
# the easiest way to standardize.
|
554
|
+
# the easiest way to standardize.
|
552
555
|
#
|
553
|
-
# Default subdivision seperator is em-dash with spaces, set to '--' if you want.
|
556
|
+
# Default subdivision seperator is em-dash with spaces, set to '--' if you want.
|
554
557
|
#
|
555
558
|
# Cite: "Dash (-) that precedes a subdivision in an extended 600 subject heading
|
556
559
|
# is not carried in the MARC record. It may be system generated as a display constant
|