traject 1.0.0.beta.2 → 1.0.0.beta.3
- checksums.yaml +4 -4
- data/doc/batch_execution.md +60 -8
- data/doc/extending.md +3 -1
- data/doc/settings.md +2 -2
- data/lib/traject/indexer.rb +29 -26
- data/lib/traject/macros/marc_format_classifier.rb +7 -4
- data/lib/traject/version.rb +1 -1
- data/test/indexer/macros_marc21_semantics_test.rb +50 -4
- data/test/indexer/macros_marc21_test.rb +6 -0
- data/test/marc_format_classifier_test.rb +5 -1
- data/test/test_helper.rb +10 -0
- metadata +3 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: fba53ed999c0449a6de13e4e1455399431c4e052
|
4
|
+
data.tar.gz: 5af370ada3fd4f779607bd7e3f294607a7a20208
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: f062f33724bdf1d11260edb675a239ebfcd834238dc98806c638d1a9c955acb5819fd2c6fbf15f023b683b2acd68913d100c8ede7f75ca62382e783ecba429da
|
7
|
+
data.tar.gz: d0efcb96f544f7cdc42517cb08a067f8d1f20de1fcbf34a53af1adbdca4b127105f7e659fed11625966605bdef79d5f23433dc0e209b159a447686fa887ab75e
|
data/doc/batch_execution.md
CHANGED
@@ -18,7 +18,9 @@ with jruby 1.7.x or later, this should be default, recommend
|
|
18
18
|
you use jruby 1.7.x.
|
19
19
|
|
20
20
|
Especially when running under a cron job, it can be difficult to
|
21
|
-
set things up so traject runs under jruby
|
21
|
+
set things up so traject runs under jruby -- and then when you add
|
22
|
+
bundler into it, things can get positively byzantine. It's not you,
|
23
|
+
this gets confusing.
|
22
24
|
|
23
25
|
It can sometimes be useful to create a wrapper script for traject
|
24
26
|
that takes care of making sure it's running under the right ruby
|
@@ -31,8 +33,11 @@ Simply run with:
|
|
31
33
|
chruby-exec jruby -- traject {other arguments}
|
32
34
|
|
33
35
|
Whether specifying that directly in a crontab, or in a shell script
|
34
|
-
that needs to call traject, etc.
|
35
|
-
|
36
|
+
that needs to call traject, etc. In a crontab environment, it'll actually need
|
37
|
+
you to set PATH and SHELL variables, as specified in the [chruby docs](https://github.com/postmodern/chruby/wiki/Cron)
|
38
|
+
|
39
|
+
|
40
|
+
So simple you might not need a wrapper script, but it might still be convenient to create one. Say
|
36
41
|
you put a `jruby-traject` at `/usr/local/bin/jruby-traject`, that
|
37
42
|
looks like this:
|
38
43
|
|
@@ -40,9 +45,55 @@ looks like this:
|
|
40
45
|
|
41
46
|
chruby-exec jruby -- traject "$@"
|
42
47
|
|
43
|
-
Now
|
44
|
-
|
45
|
-
|
48
|
+
Now you can can just execute `jruby-traject {arguments}`, and execute traject
|
49
|
+
in a jruby environment. (In a crontab, you'll still need to fix your
|
50
|
+
PATH and SHELL env variables for `chruby-exec` to work, either in the
|
51
|
+
crontab or in this wrapper script)
|
52
|
+
|
53
|
+
### chruby monster wrapper script
|
54
|
+
|
55
|
+
I am still not sure if this is a good idea, but here's an example of
|
56
|
+
a wrapper script for chruby that will take care of the ENV even
|
57
|
+
when running in a crontab, use chruby-exec only if jruby isn't
|
58
|
+
already the default ruby, and add in `bundle exec` too.
|
59
|
+
|
60
|
+
~~~bash
|
61
|
+
#!/usr/bin/env bash
|
62
|
+
|
63
|
+
# A wrapper for traject that uses chruby to make sure jruby
|
64
|
+
# is being used before calling traject, and then calls
|
65
|
+
# traject with bundle exec from within our traject project
|
66
|
+
# dir.
|
67
|
+
|
68
|
+
# Make sure /usr/local/bin is in PATH for chruby-exec,
|
69
|
+
# which it's not ordinarily in a cronjob.
|
70
|
+
if [[ ":$PATH:" != *":/usr/local/bin:"* ]]
|
71
|
+
then
|
72
|
+
export PATH=$PATH:/usr/local/bin
|
73
|
+
fi
|
74
|
+
# chruby needs SHELL set, which it won't be from a crontab
|
75
|
+
export SHELL=/bin/bash
|
76
|
+
|
77
|
+
# Find the dir based on location of this wrapper script,
|
78
|
+
# then use that dir to cd to for the bundle exec to find
|
79
|
+
# the right Gemfile.
|
80
|
+
traject_dir=$(cd `dirname "${BASH_SOURCE[0]}"` && pwd)
|
81
|
+
|
82
|
+
# do we need to use chruby to switch to jruby?
|
83
|
+
if [[ "$(ruby -v)" == *jruby* ]]
|
84
|
+
then
|
85
|
+
ruby_picker="" # nothing needed "
|
86
|
+
else
|
87
|
+
ruby_picker="chruby-exec jruby --"
|
88
|
+
fi
|
89
|
+
|
90
|
+
cmd="BUNDLE_GEMFILE=$traject_dir/Gemfile $ruby_picker bundle exec traject $@"
|
91
|
+
|
92
|
+
echo $cmd
|
93
|
+
eval $cmd
|
94
|
+
~~~
|
95
|
+
|
96
|
+
This monster script can perhaps be adapted for rbenv or rvm.
|
46
97
|
|
47
98
|
### for rbenv
|
48
99
|
|
@@ -62,7 +113,7 @@ If you're running inside a cronjob, things get a bit trickier,
|
|
62
113
|
because rbenv isn't normally set up in the limited environment
|
63
114
|
of cron tasks. One way to deal with this is to have your
|
64
115
|
cronjob explicitly execute in a bash login shell, that
|
65
|
-
will then have rbenv set up so long as it's running
|
116
|
+
will then have rbenv set up -- so long as it's running
|
66
117
|
under an account with rbenv set up properly!
|
67
118
|
|
68
119
|
# in a cronfile
|
@@ -99,6 +150,7 @@ Now any account, in a crontab, in an interactive shell, wherever,
|
|
99
150
|
can just execute `jruby-traject {arguments}`, and execute traject
|
100
151
|
in a jruby environment.
|
101
152
|
|
153
|
+
|
102
154
|
### Bundler too?
|
103
155
|
|
104
156
|
If you're running with bundler too, you could make a wrapper file specific to
|
@@ -188,4 +240,4 @@ do whatever you can make yell, just write ruby.
|
|
188
240
|
For automated batch execution, we recommend you consider using
|
189
241
|
bundler to manage any gem dependencies. See the [Extending
|
190
242
|
With Your Own Code](./extending.md) traject docs for
|
191
|
-
information on how traject integrates with bundler.
|
243
|
+
information on how traject integrates with bundler.
|
data/doc/extending.md
CHANGED
@@ -16,6 +16,7 @@ of a couple traject features meant to make it easier.
|
|
16
16
|
* Traject `-I` argument command line can be used to list directories to
|
17
17
|
add to the load path, similar to the `ruby -I` argument. You
|
18
18
|
can then 'require' local project files from the load path.
|
19
|
+
* Or modify the ruby `$LOAD_PATH` manually at the top of a traject config file you are loading.
|
19
20
|
* translation map files found in a
|
20
21
|
"./translation_maps" subdir on the load path will be found
|
21
22
|
for Traject translation maps.
|
@@ -155,7 +156,8 @@ by running `bundler init`, probably in the directory
|
|
155
156
|
right next to your traject config files.
|
156
157
|
|
157
158
|
Then specify what gems your traject project will use,
|
158
|
-
possibly with version restrictions, in the [Gemfile](http://bundler.io/v1.3/gemfile.html)
|
159
|
+
possibly with version restrictions, in the [Gemfile](http://bundler.io/v1.3/gemfile.html) --
|
160
|
+
**do** include `gem 'traject'` in the Gemfile.
|
159
161
|
|
160
162
|
Run `bundle install` from the directory with the Gemfile, on any system
|
161
163
|
at any time, to make sure specified gems are installed.
|
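The `$LOAD_PATH` bullet added to extending.md in the hunk above could look roughly like this at the top of a traject config file. This is a minimal sketch, not taken from the traject docs: the `lib` directory name and the `my_local_macros` file are hypothetical, and it assumes `__FILE__` resolves to the config file's own path when traject evaluates it.

```ruby
# Hypothetical layout: local helper code lives in ./lib next to this config file.
local_lib = File.expand_path("lib", File.dirname(__FILE__))

# Put the directory at the front of the load path, but only once.
$LOAD_PATH.unshift(local_lib) unless $LOAD_PATH.include?(local_lib)

# After this, project files can be required by name, e.g.:
# require 'my_local_macros'   # hypothetical file at ./lib/my_local_macros.rb
```

The effect is intended to be the same as passing the directory with `-I` on the command line; keeping it in the config file means a cron entry or wrapper script does not have to repeat the flag.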
data/doc/settings.md
CHANGED
@@ -47,8 +47,8 @@ settings are applied first of all. It's recommended you use `provide`.
|
|
47
47
|
* `log.level`: Log this level and above. Default 'info', set to eg 'debug' to get potentially more logging info,
|
48
48
|
or 'error' to get less. https://github.com/rudionrails/yell/wiki/101-setting-the-log-level
|
49
49
|
|
50
|
-
* `log.batch_size`: If set to a number N (or string representation), will output a progress line to
|
51
|
-
log, every N records.
|
50
|
+
* `log.batch_size`: If set to a number N (or string representation), will output a progress line to DEBUG
|
51
|
+
log, every N records. (use -d to turn logging to DEBUG to see.)
|
52
52
|
|
53
53
|
* `marc_source.type`: default 'binary'. Can also set to 'xml' or (not yet implemented todo) 'json'. Command line shortcut `-t`
|
54
54
|
|
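The settings.md change above says the `log.batch_size` progress line now goes to the DEBUG log. The sketch below illustrates the practical consequence: at the default 'info' level the progress lines are not shown. It is an approximation only. It uses Ruby's standard-library `Logger` in place of the Yell logger traject actually uses, and `log_progress` is a hypothetical stand-in for the indexer's read loop.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for the indexer's read loop: emits one progress
# line at DEBUG level every `batch_size` records.
def log_progress(logger, total_records, batch_size)
  start_time = batch_start_time = Time.now
  (1..total_records).each do |count|
    next unless count % batch_size == 0
    batch_rps   = batch_size / (Time.now - batch_start_time)
    overall_rps = count / (Time.now - start_time)
    logger.debug "read #{count} records; #{'%.0f' % batch_rps}/s this batch, #{'%.0f' % overall_rps}/s overall"
    batch_start_time = Time.now
  end
end

# At INFO level the progress lines are filtered out.
info_out = StringIO.new
info_logger = Logger.new(info_out)
info_logger.level = Logger::INFO
log_progress(info_logger, 10, 5)

# At DEBUG level one line is written per batch of 5 records.
debug_out = StringIO.new
debug_logger = Logger.new(debug_out)
debug_logger.level = Logger::DEBUG
log_progress(debug_logger, 10, 5)
```

With traject itself, the doc text above suggests passing `-d` to switch logging to DEBUG so the progress lines appear.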
data/lib/traject/indexer.rb
CHANGED
@@ -2,11 +2,13 @@ require 'yell'
|
|
2
2
|
|
3
3
|
require 'traject'
|
4
4
|
require 'traject/qualified_const_get'
|
5
|
+
require 'traject/thread_pool'
|
5
6
|
|
6
7
|
require 'traject/indexer/settings'
|
7
8
|
require 'traject/marc_reader'
|
8
9
|
require 'traject/marc4j_reader'
|
9
10
|
require 'traject/json_writer'
|
11
|
+
require 'traject/solrj_writer'
|
10
12
|
|
11
13
|
require 'traject/macros/marc21'
|
12
14
|
require 'traject/macros/basic'
|
@@ -73,7 +75,7 @@ require 'traject/macros/basic'
|
|
73
75
|
# The default writer is the SolrJWriter, using Java SolrJ to
|
74
76
|
# write to a Solr. A few other built-in writers are available,
|
75
77
|
# but it's anticipated more will be created as plugins or local
|
76
|
-
# code for special purposes.
|
78
|
+
# code for special purposes.
|
77
79
|
#
|
78
80
|
# You can set alternate writers by setting a Class object directly
|
79
81
|
# with the #writer_class method, or by the 'writer_class_name' Setting,
|
@@ -167,38 +169,39 @@ class Traject::Indexer
|
|
167
169
|
attr_writer :logger
|
168
170
|
|
169
171
|
|
170
|
-
|
171
|
-
# or SomeLogger.new
|
172
|
-
def logger_argument
|
173
|
-
specified = settings["log.file"] || "STDERR"
|
174
|
-
|
175
|
-
case specified
|
176
|
-
when "STDOUT" then STDOUT
|
177
|
-
when "STDERR" then STDERR
|
178
|
-
else specified
|
179
|
-
end
|
180
|
-
end
|
181
|
-
|
182
|
-
# Second arg to Yell.new, options hash, calculated from
|
183
|
-
# settings
|
184
|
-
def logger_options
|
185
|
-
# formatter, default is fairly basic
|
172
|
+
def logger_format
|
186
173
|
format = settings["log.format"] || "%d %5L %m"
|
187
174
|
format = case format
|
188
|
-
|
189
|
-
|
190
|
-
|
175
|
+
when "false" then false
|
176
|
+
when "" then nil
|
177
|
+
else format
|
191
178
|
end
|
192
|
-
|
193
|
-
level = settings["log.level"] || "info"
|
194
|
-
|
195
|
-
{:format => format, :level => level}
|
196
179
|
end
|
197
180
|
|
198
181
|
# Create logger according to settings
|
199
182
|
def create_logger
|
183
|
+
|
184
|
+
logger_level = settings["log.level"] || "info"
|
185
|
+
|
200
186
|
# log everything to STDERR or specified logfile
|
201
|
-
logger = Yell.new
|
187
|
+
logger = Yell.new
|
188
|
+
logger.format = logger_format
|
189
|
+
logger.level = logger_level
|
190
|
+
|
191
|
+
logger_destination = settings["log.file"] || "STDERR"
|
192
|
+
# We intentionally repeat the logger_level
|
193
|
+
# on the adapter, so it will stay there if overall level
|
194
|
+
# is changed.
|
195
|
+
case logger_destination
|
196
|
+
when "STDERR"
|
197
|
+
logger.adapter :stderr, level: logger_level, format: logger_format
|
198
|
+
when "STDOUT"
|
199
|
+
logger.adapter :stdout, level: logger_level, format: logger_format
|
200
|
+
else
|
201
|
+
logger.adapter :file, logger_destination, level: logger_level, format: logger_format
|
202
|
+
end
|
203
|
+
|
204
|
+
|
202
205
|
# ADDITIONALLY log error and higher to....
|
203
206
|
if settings["log.error_file"]
|
204
207
|
logger.adapter :file, settings["log.error_file"], :level => 'gte.error'
|
@@ -329,7 +332,7 @@ class Traject::Indexer
|
|
329
332
|
if log_batch_size && (count % log_batch_size == 0)
|
330
333
|
batch_rps = log_batch_size / (Time.now - batch_start_time)
|
331
334
|
overall_rps = count / (Time.now - start_time)
|
332
|
-
logger.
|
335
|
+
logger.debug "Traject::Indexer#process, read #{count} records at id:#{id_string(record)}; #{'%.0f' % batch_rps}/s this batch, #{'%.0f' % overall_rps}/s overall"
|
333
336
|
batch_start_time = Time.now
|
334
337
|
end
|
335
338
|
|
data/lib/traject/macros/marc_format_classifier.rb
CHANGED
@@ -114,12 +114,15 @@ module Traject
|
|
114
114
|
# * If it has any RDA 338, then it's print if it has a value of
|
115
115
|
# volume, sheet, or card.
|
116
116
|
# * If it does not have an RDA 338, it's print if and only if it has
|
117
|
-
#
|
117
|
+
# no 245$h GMD.
|
118
118
|
#
|
119
119
|
# * Here at JH, for legacy reasons we also choose to not
|
120
120
|
# call it print if it's already been marked audio, but
|
121
121
|
# we do that in a different method.
|
122
122
|
#
|
123
|
+
# Note that any record that has neither a 245 nor a 338rda is going
|
124
|
+
# to be marked print
|
125
|
+
#
|
123
126
|
# This algorithm is definitely going to get some things wrong in
|
124
127
|
# both directions, with real world data. But seems to be good enough.
|
125
128
|
def print?
|
@@ -137,7 +140,7 @@ module Traject
|
|
137
140
|
end
|
138
141
|
end
|
139
142
|
else
|
140
|
-
normalized_gmd.length == 0
|
143
|
+
normalized_gmd.length == 0
|
141
144
|
end
|
142
145
|
end
|
143
146
|
|
@@ -145,8 +148,8 @@ module Traject
|
|
145
148
|
# resource. But sometimes resort to 245$h GMD too.
|
146
149
|
def online?
|
147
150
|
# field 007, byte 0 c="electronic" byte 1 r="remote" ==> sure Online
|
148
|
-
found_007 = record.find do |field|
|
149
|
-
field.
|
151
|
+
found_007 = record.fields('007').find do |field|
|
152
|
+
field.value.slice(0) == "c" && field.value.slice(1) == "r"
|
150
153
|
end
|
151
154
|
|
152
155
|
return true if found_007
|
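The `online?` hunk above restricts the check to 007 fields and tests bytes 0 and 1 of the field value. A minimal sketch of that byte test follows. `ControlField` here is a hypothetical `Struct` standing in for a MARC control field so the example runs without the marc gem, and `online_by_007?` is an illustrative name, not a traject method.

```ruby
# Hypothetical stand-in for a MARC control field: just a tag and a value.
ControlField = Struct.new(:tag, :value)

# True if any 007 field has byte 0 "c" (electronic) and byte 1 "r" (remote).
def online_by_007?(fields)
  fields.select { |f| f.tag == '007' }.any? do |f|
    f.value.slice(0) == "c" && f.value.slice(1) == "r"
  end
end

remote_electronic = [ControlField.new('001', '000000000'), ControlField.new('007', 'cr |||')]
print_text        = [ControlField.new('007', 'ta')]
no_007_at_all     = [ControlField.new('001', 'cr123')]
```

Because only fields tagged 007 are inspected, a different control field whose value happens to begin with "cr" does not count as online in this sketch.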
data/lib/traject/version.rb
CHANGED
data/test/indexer/macros_marc21_semantics_test.rb
CHANGED
@@ -28,6 +28,8 @@ describe "Traject::Macros::Marc21Semantics" do
|
|
28
28
|
output = @indexer.map_record(@record)
|
29
29
|
|
30
30
|
assert_equal %w{47971712}, output["oclcnum"]
|
31
|
+
|
32
|
+
assert_equal({}, @indexer.map_record(empty_record))
|
31
33
|
end
|
32
34
|
|
33
35
|
it "#marc_series_facet" do
|
@@ -40,6 +42,8 @@ describe "Traject::Macros::Marc21Semantics" do
|
|
40
42
|
|
41
43
|
# trims punctuation too
|
42
44
|
assert_equal ["Big bands"], output["series_facet"]
|
45
|
+
assert_equal({}, @indexer.map_record(empty_record))
|
46
|
+
|
43
47
|
end
|
44
48
|
|
45
49
|
describe "marc_sortable_author" do
|
@@ -54,6 +58,8 @@ describe "Traject::Macros::Marc21Semantics" do
|
|
54
58
|
output = @indexer.map_record(@record)
|
55
59
|
|
56
60
|
assert_equal ["Herman, Edward S. Manufacturing consent the political economy of the mass media Edward S. Herman and Noam Chomsky ; with a new introduction by the authors"], output["author_sort"]
|
61
|
+
assert_equal [""], @indexer.map_record(empty_record)['author_sort']
|
62
|
+
|
57
63
|
end
|
58
64
|
it "respects non-filing" do
|
59
65
|
@record = MARC::Reader.new(support_file_path "the_business_ren.marc").to_a.first
|
@@ -61,6 +67,8 @@ describe "Traject::Macros::Marc21Semantics" do
|
|
61
67
|
output = @indexer.map_record(@record)
|
62
68
|
|
63
69
|
assert_equal ["Business renaissance quarterly [electronic resource]."], output["author_sort"]
|
70
|
+
assert_equal [""], @indexer.map_record(empty_record)['author_sort']
|
71
|
+
|
64
72
|
end
|
65
73
|
end
|
66
74
|
|
@@ -71,6 +79,8 @@ describe "Traject::Macros::Marc21Semantics" do
|
|
71
79
|
it "works" do
|
72
80
|
output = @indexer.map_record(@record)
|
73
81
|
assert_equal ["Manufacturing consent : the political economy of the mass media"], output["title_sort"]
|
82
|
+
assert_equal({}, @indexer.map_record(empty_record))
|
83
|
+
|
74
84
|
end
|
75
85
|
it "respects non-filing" do
|
76
86
|
@record = MARC::Reader.new(support_file_path "the_business_ren.marc").to_a.first
|
@@ -95,6 +105,8 @@ describe "Traject::Macros::Marc21Semantics" do
|
|
95
105
|
output = @indexer.map_record(@record)
|
96
106
|
|
97
107
|
assert_equal ["English", "French", "German", "Italian", "Spanish", "Russian"], output["languages"]
|
108
|
+
assert_equal({}, @indexer.map_record(empty_record))
|
109
|
+
|
98
110
|
end
|
99
111
|
end
|
100
112
|
|
@@ -108,6 +120,8 @@ describe "Traject::Macros::Marc21Semantics" do
|
|
108
120
|
output = @indexer.map_record(@record)
|
109
121
|
|
110
122
|
assert_equal ["Larger ensemble, Unspecified", "Piano", "Soprano voice", "Tenor voice", "Violin", "Larger ensemble, Ethnic", "Guitar", "Voices, Unspecified"], output["instrumentation"]
|
123
|
+
assert_equal({}, @indexer.map_record(empty_record))
|
124
|
+
|
111
125
|
end
|
112
126
|
end
|
113
127
|
|
@@ -126,16 +140,29 @@ describe "Traject::Macros::Marc21Semantics" do
|
|
126
140
|
@record = MARC::Reader.new(support_file_path "louis_armstrong.marc").to_a.first
|
127
141
|
output = @indexer.map_record(@record)
|
128
142
|
|
129
|
-
assert_equal ["bb01", "bb01.s", "bb", "bb.s", "oe"],
|
130
|
-
|
143
|
+
assert_equal ["bb01", "bb01.s", "bb", "bb.s", "oe"], output["instrument_codes"]
|
144
|
+
assert_equal({}, @indexer.map_record(empty_record))
|
145
|
+
|
131
146
|
end
|
132
147
|
end
|
133
148
|
|
134
149
|
describe "publication_date" do
|
135
150
|
# there are way too many edge cases for us to test em all, but we'll test some of em.
|
151
|
+
|
152
|
+
it "works when there's no date information" do
|
153
|
+
assert_equal nil, Marc21Semantics.publication_date(empty_record)
|
154
|
+
end
|
155
|
+
|
156
|
+
it "uses macro correctly with no date info" do
|
157
|
+
@indexer.instance_eval {to_field "date", marc_publication_date }
|
158
|
+
assert_equal({}, @indexer.map_record(empty_record))
|
159
|
+
end
|
160
|
+
|
161
|
+
|
136
162
|
it "pulls out 008 date_type s" do
|
137
163
|
@record = MARC::Reader.new(support_file_path "manufacturing_consent.marc").to_a.first
|
138
164
|
assert_equal 2002, Marc21Semantics.publication_date(@record)
|
165
|
+
|
139
166
|
end
|
140
167
|
it "uses start date for date_type c continuing resource" do
|
141
168
|
@record = MARC::Reader.new(support_file_path "the_business_ren.marc").to_a.first
|
@@ -182,18 +209,24 @@ describe "Traject::Macros::Marc21Semantics" do
|
|
182
209
|
output = @indexer.map_record(@record)
|
183
210
|
|
184
211
|
assert_equal ["Language & Literature"], output["discipline_facet"]
|
212
|
+
|
185
213
|
end
|
186
214
|
it "maps to default" do
|
187
215
|
@record = MARC::Reader.new(support_file_path "musical_cage.marc").to_a.first
|
188
216
|
output = @indexer.map_record(@record)
|
189
217
|
assert_equal ["Unknown"], output["discipline_facet"]
|
218
|
+
assert_equal(["Unknown"], @indexer.map_record(empty_record)['discipline_facet'])
|
190
219
|
end
|
220
|
+
|
191
221
|
it "maps to nothing if none and no default" do
|
192
222
|
@indexer.instance_eval {to_field "discipline_no_default", marc_lcc_to_broad_category(:default => nil)}
|
193
223
|
@record = MARC::Reader.new(support_file_path "musical_cage.marc").to_a.first
|
194
224
|
output = @indexer.map_record(@record)
|
195
225
|
|
196
226
|
assert_nil output["discipline_no_default"]
|
227
|
+
|
228
|
+
assert_nil @indexer.map_record(empty_record)["discipline_no_default"]
|
229
|
+
|
197
230
|
end
|
198
231
|
|
199
232
|
describe "LCC_REGEX" do
|
@@ -212,13 +245,15 @@ describe "Traject::Macros::Marc21Semantics" do
|
|
212
245
|
@record = MARC::Reader.new(support_file_path "multi_geo.marc").to_a.first
|
213
246
|
output = @indexer.map_record(@record)
|
214
247
|
|
215
|
-
assert_equal ["Europe", "Middle East", "Africa, North", "Agora (Athens, Greece)", "Rome (Italy)", "Italy"],
|
216
|
-
|
248
|
+
assert_equal ["Europe", "Middle East", "Africa, North", "Agora (Athens, Greece)", "Rome (Italy)", "Italy"], output["geo_facet"]
|
249
|
+
assert_equal({}, @indexer.map_record(empty_record))
|
217
250
|
end
|
218
251
|
it "maps nothing on a record with no geo" do
|
219
252
|
@record = MARC::Reader.new(support_file_path "manufacturing_consent.marc").to_a.first
|
220
253
|
output = @indexer.map_record(@record)
|
221
254
|
assert_nil output["geo_facet"]
|
255
|
+
assert_equal({}, @indexer.map_record(empty_record))
|
256
|
+
|
222
257
|
end
|
223
258
|
end
|
224
259
|
|
@@ -232,6 +267,8 @@ describe "Traject::Macros::Marc21Semantics" do
|
|
232
267
|
|
233
268
|
assert_equal ["Early modern, 1500-1700", "17th century", "Great Britain: Puritan Revolution, 1642-1660", "Great Britain: Civil War, 1642-1649", "1642-1660"],
|
234
269
|
output["era_facet"]
|
270
|
+
assert_equal({}, @indexer.map_record(empty_record))
|
271
|
+
|
235
272
|
end
|
236
273
|
end
|
237
274
|
|
@@ -241,6 +278,7 @@ describe "Traject::Macros::Marc21Semantics" do
|
|
241
278
|
str = Marc21Semantics.assemble_lcsh(field)
|
242
279
|
|
243
280
|
assert_equal "Psychoanalysis and literature — England — History — 19th century", str
|
281
|
+
|
244
282
|
end
|
245
283
|
|
246
284
|
it "ignores numeric subfields" do
|
@@ -277,6 +315,9 @@ describe "Traject::Macros::Marc21Semantics" do
|
|
277
315
|
|
278
316
|
assert output["lcsh"].length > 0, "outputs data"
|
279
317
|
assert output["lcsh"].include?("Eliot, George, 1819-1880 — Characters"), "includes a string its supposed to"
|
318
|
+
|
319
|
+
assert_equal({}, @indexer.map_record(empty_record))
|
320
|
+
|
280
321
|
end
|
281
322
|
end
|
282
323
|
end
|
@@ -292,6 +333,8 @@ describe "Traject::Macros::Marc21Semantics" do
|
|
292
333
|
end
|
293
334
|
output = @indexer.map_record(@record)
|
294
335
|
assert_equal ['Business renaissance quarterly'], output['title_phrase']
|
336
|
+
assert_equal({}, @indexer.map_record(empty_record))
|
337
|
+
|
295
338
|
end
|
296
339
|
|
297
340
|
it "works with :include_original" do
|
@@ -300,6 +343,7 @@ describe "Traject::Macros::Marc21Semantics" do
|
|
300
343
|
end
|
301
344
|
output = @indexer.map_record(@record)
|
302
345
|
assert_equal ['The Business renaissance quarterly', 'Business renaissance quarterly'], output['title_phrase']
|
346
|
+
assert_equal({}, @indexer.map_record(empty_record))
|
303
347
|
end
|
304
348
|
|
305
349
|
it "doesn't do anything if you don't include the first subfield" do
|
@@ -308,6 +352,8 @@ describe "Traject::Macros::Marc21Semantics" do
|
|
308
352
|
end
|
309
353
|
output = @indexer.map_record(@record)
|
310
354
|
assert_equal ['[electronic resource].'], output['title_phrase']
|
355
|
+
assert_equal({}, @indexer.map_record(empty_record))
|
356
|
+
|
311
357
|
end
|
312
358
|
|
313
359
|
|
data/test/indexer/macros_marc21_test.rb
CHANGED
@@ -26,6 +26,8 @@ describe "Traject::Macros::Marc21" do
|
|
26
26
|
output = @indexer.map_record(@record)
|
27
27
|
|
28
28
|
assert_equal ["Manufacturing consent : the political economy of the mass media /"], output["title"]
|
29
|
+
assert_equal({}, @indexer.map_record(empty_record))
|
30
|
+
|
29
31
|
end
|
30
32
|
|
31
33
|
it "respects :first=>true option" do
|
@@ -36,6 +38,7 @@ describe "Traject::Macros::Marc21" do
|
|
36
38
|
output = @indexer.map_record(@record)
|
37
39
|
|
38
40
|
assert_length 1, output["other_id"]
|
41
|
+
|
39
42
|
end
|
40
43
|
|
41
44
|
it "trims punctuation with :trim_punctuation => true" do
|
@@ -46,6 +49,8 @@ describe "Traject::Macros::Marc21" do
|
|
46
49
|
output = @indexer.map_record(@record)
|
47
50
|
|
48
51
|
assert_equal ["Manufacturing consent : the political economy of the mass media"], output["title"]
|
52
|
+
assert_equal({}, @indexer.map_record(empty_record))
|
53
|
+
|
49
54
|
end
|
50
55
|
|
51
56
|
it "respects :default option" do
|
@@ -70,6 +75,7 @@ describe "Traject::Macros::Marc21" do
|
|
70
75
|
output = @indexer.map_record(@record)
|
71
76
|
assert_equal ["eng"], output['lang1']
|
72
77
|
assert_equal ["eng", "eng"], output['lang2']
|
78
|
+
assert_equal({}, @indexer.map_record(empty_record))
|
73
79
|
end
|
74
80
|
|
75
81
|
it "fails on an extra/misspelled argument to extract_marc" do
|
data/test/marc_format_classifier_test.rb
CHANGED
@@ -10,7 +10,11 @@ def classifier_for(filename)
|
|
10
10
|
end
|
11
11
|
|
12
12
|
describe "MarcFormatClassifier" do
|
13
|
-
|
13
|
+
|
14
|
+
it "returns 'Print' when there's no other data" do
|
15
|
+
assert_equal ['Print'], MarcFormatClassifier.new( empty_record ).formats
|
16
|
+
end
|
17
|
+
|
14
18
|
describe "genre" do
|
15
19
|
# We don't have the patience to test every case, just a sampling
|
16
20
|
it "says book" do
|
data/test/test_helper.rb
CHANGED
@@ -37,6 +37,16 @@ def assert_start_with(start_with, obj, msg = nil)
|
|
37
37
|
assert obj.start_with?(start_with), msg
|
38
38
|
end
|
39
39
|
|
40
|
+
|
41
|
+
# An empty record, for making sure extractors and macros work when
|
42
|
+
# the fields they're looking for aren't there
|
43
|
+
|
44
|
+
def empty_record
|
45
|
+
rec = MARC::Record.new
|
46
|
+
rec.append(MARC::ControlField.new('001', '000000000'))
|
47
|
+
rec
|
48
|
+
end
|
49
|
+
|
40
50
|
# pretends to be a SolrJ HTTPServer-like thing, just kind of mocks it up
|
41
51
|
# and records what happens and simulates errors in some cases.
|
42
52
|
class MockSolrServer
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: traject
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 1.0.0.beta.
|
4
|
+
version: 1.0.0.beta.3
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Jonathan Rochkind
|
@@ -9,7 +9,7 @@ authors:
|
|
9
9
|
autorequire:
|
10
10
|
bindir: bin
|
11
11
|
cert_chain: []
|
12
|
-
date: 2013-10-
|
12
|
+
date: 2013-10-28 00:00:00.000000000 Z
|
13
13
|
dependencies:
|
14
14
|
- !ruby/object:Gem::Dependency
|
15
15
|
name: marc
|
@@ -286,7 +286,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
286
286
|
version: 1.3.1
|
287
287
|
requirements: []
|
288
288
|
rubyforge_project:
|
289
|
-
rubygems_version: 2.1.
|
289
|
+
rubygems_version: 2.1.9
|
290
290
|
signing_key:
|
291
291
|
specification_version: 4
|
292
292
|
summary: Index MARC to Solr; or generally process source records to hash-like structures
|