traject 1.0.0.beta.2 → 1.0.0.beta.3
- checksums.yaml +4 -4
- data/doc/batch_execution.md +60 -8
- data/doc/extending.md +3 -1
- data/doc/settings.md +2 -2
- data/lib/traject/indexer.rb +29 -26
- data/lib/traject/macros/marc_format_classifier.rb +7 -4
- data/lib/traject/version.rb +1 -1
- data/test/indexer/macros_marc21_semantics_test.rb +50 -4
- data/test/indexer/macros_marc21_test.rb +6 -0
- data/test/marc_format_classifier_test.rb +5 -1
- data/test/test_helper.rb +10 -0
- metadata +3 -3
checksums.yaml
CHANGED
```diff
@@ -1,7 +1,7 @@
 ---
 SHA1:
-metadata.gz:
-data.tar.gz:
+metadata.gz: fba53ed999c0449a6de13e4e1455399431c4e052
+data.tar.gz: 5af370ada3fd4f779607bd7e3f294607a7a20208
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: f062f33724bdf1d11260edb675a239ebfcd834238dc98806c638d1a9c955acb5819fd2c6fbf15f023b683b2acd68913d100c8ede7f75ca62382e783ecba429da
+data.tar.gz: d0efcb96f544f7cdc42517cb08a067f8d1f20de1fcbf34a53af1adbdca4b127105f7e659fed11625966605bdef79d5f23433dc0e209b159a447686fa887ab75e
```
data/doc/batch_execution.md
CHANGED
```diff
@@ -18,7 +18,9 @@ with jruby 1.7.x or later, this should be default, recommend
 you use jruby 1.7.x.
 
 Especially when running under a cron job, it can be difficult to
-set things up so traject runs under jruby
+set things up so traject runs under jruby -- and then when you add
+bundler into it, things can get positively byzantine. It's not you,
+this gets confusing.
 
 It can sometimes be useful to create a wrapper script for traject
 that takes care of making sure it's running under the right ruby
```
```diff
@@ -31,8 +33,11 @@ Simply run with:
 chruby-exec jruby -- traject {other arguments}
 
 Whether specifying that directly in a crontab, or in a shell script
-that needs to call traject, etc.
-
+that needs to call traject, etc. In a crontab environment, it'll actually need
+you to set PATH and SHELL variables, as specified in the [chruby docs](https://github.com/postmodern/chruby/wiki/Cron)
+
+
+So simple you might not need a wrapper script, but it might still be convenient to create one. Say
 you put a `jruby-traject` at `/usr/local/bin/jruby-traject`, that
 looks like this:
 
```
```diff
@@ -40,9 +45,55 @@ looks like this:
 
 chruby-exec jruby -- traject "$@"
 
-Now
-
-
+Now you can can just execute `jruby-traject {arguments}`, and execute traject
+in a jruby environment. (In a crontab, you'll still need to fix your
+PATH and SHELL env variables for `chruby-exec` to work, either in the
+crontab or in this wrapper script)
+
+### chruby monster wrapper script
+
+I am still not sure if this is a good idea, but here's an example of
+a wrapper script for chruby that will take care of the ENV even
+when running in a crontab, use chruby-exec only if jruby isn't
+already the default ruby, and add in `bundle exec` too.
+
+~~~bash
+#!/usr/bin/env bash
+
+# A wrapper for traject that uses chruby to make sure jruby
+# is being used before calling traject, and then calls
+# traject with bundle exec from within our traject project
+# dir.
+
+# Make sure /usr/local/bin is in PATH for chruby-exec,
+# which it's not ordinarily in a cronjob.
+if [[ ":$PATH:" != *":/usr/local/bin:"* ]]
+then
+export PATH=$PATH:/usr/local/bin
+fi
+# chruby needs SHELL set, which it won't be from a crontab
+export SHELL=/bin/bash
+
+# Find the dir based on location of this wrapper script,
+# then use that dir to cd to for the bundle exec to find
+# the right Gemfile.
+traject_dir=$(cd `dirname "${BASH_SOURCE[0]}"` && pwd)
+
+# do we need to use chruby to switch to jruby?
+if [[ "$(ruby -v)" == *jruby* ]]
+then
+ruby_picker="" # nothing needed "
+else
+ruby_picker="chruby-exec jruby --"
+fi
+
+cmd="BUNDLE_GEMFILE=$traject_dir/Gemfile $ruby_picker bundle exec traject $@"
+
+echo $cmd
+eval $cmd
+~~~
+
+This monster script can perhaps be adapted for rbenv or rvm.
 
 ### for rbenv
 
```
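The monster script's decisions (add `chruby-exec jruby --` only when `ruby -v` doesn't mention jruby, point `BUNDLE_GEMFILE` at the project's Gemfile, then `bundle exec traject`) can be sketched in Ruby as below. This is a hypothetical helper, not part of traject or chruby. It returns an `[env, *argv]` list for `Kernel#exec` or `Kernel#system`, which keeps each traject argument as a separate word; the script's `eval $cmd` over an unquoted `$@` re-splits an argument such as `my config.rb` into two.

```ruby
# Hypothetical helper (not part of traject or chruby): mirrors the
# wrapper script's logic and returns [env, *argv] for Kernel#exec/system.
def traject_command(ruby_description, traject_dir, args)
  # Point bundler at the project's Gemfile, like BUNDLE_GEMFILE=... in the script.
  env = { "BUNDLE_GEMFILE" => File.join(traject_dir, "Gemfile") }

  # Only switch rubies with chruby-exec when `ruby -v` doesn't mention jruby.
  picker = ruby_description.include?("jruby") ? [] : ["chruby-exec", "jruby", "--"]

  # Each traject argument stays its own array element, so it is not
  # re-split by the wrapper the way `eval $cmd` would re-split it.
  [env, *picker, "bundle", "exec", "traject", *args]
end
```

A Ruby wrapper could then run `exec(*traject_command(%x(ruby -v), traject_dir, ARGV))`. Whether chruby-exec itself preserves argument boundaries is a separate question this sketch does not address.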
```diff
@@ -62,7 +113,7 @@ If you're running inside a cronjob, things get a bit trickier,
 because rbenv isn't normally set up in the limited environment
 of cron tasks. One way to deal with this is to have your
 cronjob explicitly execute in a bash login shell, that
-will then have rbenv set up so long as it's running
+will then have rbenv set up -- so long as it's running
 under an account with rbenv set up properly!
 
 # in a cronfile
```
```diff
@@ -99,6 +150,7 @@ Now any account, in a crontab, in an interactive shell, wherever,
 can just execute `jruby-traject {arguments}`, and execute traject
 in a jruby environment.
 
+
 ### Bundler too?
 
 If you're running with bundler too, you could make a wrapper file specific to
```
```diff
@@ -188,4 +240,4 @@ do whatever you can make yell, just write ruby.
 For automated batch execution, we recommend you consider using
 bundler to manage any gem dependencies. See the [Extending
 With Your Own Code](./extending.md) traject docs for
-information on how traject integrates with bundler.
+information on how traject integrates with bundler.
```
data/doc/extending.md
CHANGED
```diff
@@ -16,6 +16,7 @@ of a couple traject features meant to make it easier.
 * Traject `-I` argument command line can be used to list directories to
 add to the load path, similar to the `ruby -I` argument. You
 can then 'require' local project files from the load path.
+* Or modify the ruby `$LOAD_PATH` manually at the top of a traject config file you are loading.
 * translation map files found in a
 "./translation_maps" subdir on the load path will be found
 for Traject translation maps.
```
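The new bullet suggests editing `$LOAD_PATH` at the top of a traject config file. A minimal sketch, assuming the config file is evaluated with its own path as `__FILE__` and that local code sits in a `lib` directory next to it (both assumptions; adjust to your project):

```ruby
# Near the top of a traject config file: put a project-local "lib"
# directory on Ruby's load path, similar in effect to `traject -I lib`.
# Assumes __FILE__ is this config file's path and "lib" sits beside it.
local_lib = File.expand_path("lib", File.dirname(__FILE__))

# Guard so that loading the config twice doesn't add a duplicate entry.
$LOAD_PATH.unshift(local_lib) unless $LOAD_PATH.include?(local_lib)

# Files in that directory can then be required by name, for example:
#   require "my_local_macros"   # hypothetical lib/my_local_macros.rb
```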
```diff
@@ -155,7 +156,8 @@ by running `bundler init`, probably in the directory
 right next to your traject config files.
 
 Then specify what gems your traject project will use,
-possibly with version restrictions, in the [Gemfile](http://bundler.io/v1.3/gemfile.html)
+possibly with version restrictions, in the [Gemfile](http://bundler.io/v1.3/gemfile.html) --
+**do** include `gem 'traject'` in the Gemfile.
 
 Run `bundle install` from the directory with the Gemfile, on any system
 at any time, to make sure specified gems are installed.
```
data/doc/settings.md
CHANGED
```diff
@@ -47,8 +47,8 @@ settings are applied first of all. It's recommended you use `provide`.
 * `log.level`: Log this level and above. Default 'info', set to eg 'debug' to get potentially more logging info,
 or 'error' to get less. https://github.com/rudionrails/yell/wiki/101-setting-the-log-level
 
-* `log.batch_size`: If set to a number N (or string representation), will output a progress line to
-log, every N records.
+* `log.batch_size`: If set to a number N (or string representation), will output a progress line to DEBUG
+log, every N records. (use -d to turn logging to DEBUG to see.)
 
 * `marc_source.type`: default 'binary'. Can also set to 'xml' or (not yet implemented todo) 'json'. Command line shortcut `-t`
 
```
data/lib/traject/indexer.rb
CHANGED
```diff
@@ -2,11 +2,13 @@ require 'yell'
 
 require 'traject'
 require 'traject/qualified_const_get'
+require 'traject/thread_pool'
 
 require 'traject/indexer/settings'
 require 'traject/marc_reader'
 require 'traject/marc4j_reader'
 require 'traject/json_writer'
+require 'traject/solrj_writer'
 
 require 'traject/macros/marc21'
 require 'traject/macros/basic'
```
```diff
@@ -73,7 +75,7 @@ require 'traject/macros/basic'
 # The default writer is the SolrJWriter, using Java SolrJ to
 # write to a Solr. A few other built-in writers are available,
 # but it's anticipated more will be created as plugins or local
-# code for special purposes.
+# code for special purposes.
 #
 # You can set alternate writers by setting a Class object directly
 # with the #writer_class method, or by the 'writer_class_name' Setting,
```
```diff
@@ -167,38 +169,39 @@ class Traject::Indexer
 attr_writer :logger
 
 
-
-# or SomeLogger.new
-def logger_argument
-specified = settings["log.file"] || "STDERR"
-
-case specified
-when "STDOUT" then STDOUT
-when "STDERR" then STDERR
-else specified
-end
-end
-
-# Second arg to Yell.new, options hash, calculated from
-# settings
-def logger_options
-# formatter, default is fairly basic
+def logger_format
 format = settings["log.format"] || "%d %5L %m"
 format = case format
-
-
-
+when "false" then false
+when "" then nil
+else format
 end
-
-level = settings["log.level"] || "info"
-
-{:format => format, :level => level}
 end
 
 # Create logger according to settings
 def create_logger
+
+logger_level = settings["log.level"] || "info"
+
 # log everything to STDERR or specified logfile
-logger = Yell.new
+logger = Yell.new
+logger.format = logger_format
+logger.level = logger_level
+
+logger_destination = settings["log.file"] || "STDERR"
+# We intentionally repeat the logger_level
+# on the adapter, so it will stay there if overall level
+# is changed.
+case logger_destination
+when "STDERR"
+logger.adapter :stderr, level: logger_level, format: logger_format
+when "STDOUT"
+logger.adapter :stdout, level: logger_level, format: logger_format
+else
+logger.adapter :file, logger_destination, level: logger_level, format: logger_format
+end
+
+
 # ADDITIONALLY log error and higher to....
 if settings["log.error_file"]
 logger.adapter :file, settings["log.error_file"], :level => 'gte.error'
```
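The rewritten `create_logger` derives three things from settings: a format (with the strings `"false"` and `""` special-cased), a level defaulting to `"info"`, and an adapter chosen from `log.file`. A dependency-free sketch of that resolution, with a plain Hash standing in for traject's settings object; the method names are hypothetical, and this does not build a real Yell logger:

```ruby
# Hypothetical mirror of the settings handling in the new create_logger.
# A plain Hash stands in for traject's Settings object.
DEFAULT_LOG_FORMAT = "%d %5L %m"

def resolve_log_format(settings)
  fmt = settings["log.format"] || DEFAULT_LOG_FORMAT
  case fmt
  when "false" then false # the string "false" becomes boolean false
  when ""      then nil   # an empty string becomes nil
  else fmt
  end
end

def resolve_log_level(settings)
  settings["log.level"] || "info"
end

# Which Yell adapter the diff selects: :stderr, :stdout, or :file plus a path.
def resolve_log_adapter(settings)
  destination = settings["log.file"] || "STDERR"
  case destination
  when "STDERR" then [:stderr]
  when "STDOUT" then [:stdout]
  else [:file, destination]
  end
end
```

In the diff, the resolved level and format are set on the logger and also passed to the chosen adapter. Per its code comment, this keeps the adapter's level in place if the overall logger level is changed later.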
```diff
@@ -329,7 +332,7 @@ class Traject::Indexer
 if log_batch_size && (count % log_batch_size == 0)
 batch_rps = log_batch_size / (Time.now - batch_start_time)
 overall_rps = count / (Time.now - start_time)
-logger.
+logger.debug "Traject::Indexer#process, read #{count} records at id:#{id_string(record)}; #{'%.0f' % batch_rps}/s this batch, #{'%.0f' % overall_rps}/s overall"
 batch_start_time = Time.now
 end
 
```
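The progress line computes two rates: records in the last batch divided by that batch's elapsed time, and total records divided by total elapsed time. Both are printed with `'%.0f'`. The sketch below mirrors this with hypothetical helpers that take elapsed seconds as arguments instead of reading `Time.now`, so the arithmetic is easy to check:

```ruby
# Hypothetical helpers mirroring the batch progress logic in
# Traject::Indexer#process; elapsed times are passed in as seconds.
def log_progress?(count, log_batch_size)
  # Report only when a batch size is configured and a batch just finished.
  !!(log_batch_size && count % log_batch_size == 0)
end

def progress_line(count, id, log_batch_size, batch_seconds, overall_seconds)
  batch_rps   = log_batch_size / batch_seconds.to_f
  overall_rps = count / overall_seconds.to_f
  "Traject::Indexer#process, read #{count} records at id:#{id}; " \
    "#{'%.0f' % batch_rps}/s this batch, #{'%.0f' % overall_rps}/s overall"
end
```

For example, with `log.batch_size` 10000, a batch that took 4 seconds and 20000 records read in 10 seconds overall reports 2500/s for the batch and 2000/s overall. Because the line is now logged with `logger.debug`, it only appears when logging is at DEBUG (for example with `-d`, as the settings.md change notes).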
data/lib/traject/macros/marc_format_classifier.rb
CHANGED
```diff
@@ -114,12 +114,15 @@ module Traject
 # * If it has any RDA 338, then it's print if it has a value of
 # volume, sheet, or card.
 # * If it does not have an RDA 338, it's print if and only if it has
-#
+# no 245$h GMD.
 #
 # * Here at JH, for legacy reasons we also choose to not
 # call it print if it's already been marked audio, but
 # we do that in a different method.
 #
+# Note that any record that has neither a 245 nor a 338rda is going
+# to be marked print
+#
 # This algorithm is definitely going to get some things wrong in
 # both directions, with real world data. But seems to be good enough.
 def print?
```
```diff
@@ -137,7 +140,7 @@ module Traject
 end
 end
 else
-normalized_gmd.length == 0
+normalized_gmd.length == 0
 end
 end
 
```
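The `print?` rules documented in the comments above can be summarized in a simplified, hypothetical sketch. It takes plain values rather than a MARC record. How the real method extracts RDA 338 terms and normalizes the GMD is not shown in this diff, so the `strip`/`downcase` handling below is an assumption. The separate "already marked audio" rule is not modeled.

```ruby
# Hypothetical, simplified mirror of the print? rules described above.
#   rda_338_values - carrier-type terms taken from RDA 338 fields (assumed form)
#   gmd            - the 245$h GMD text, or nil when the record has none
PRINT_CARRIER_TERMS = %w[volume sheet card].freeze

def print_by_rules?(rda_338_values, gmd)
  if rda_338_values.any?
    # With RDA 338 data: print if any value is volume, sheet, or card.
    rda_338_values.any? { |v| PRINT_CARRIER_TERMS.include?(v.strip.downcase) }
  else
    # Without RDA 338: print if and only if there is no GMD, so a record
    # with neither a 245 nor a 338 comes out as print.
    gmd.to_s.strip.empty?
  end
end
```

The last branch is why the new classifier test expects `['Print']` for `empty_record`.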
```diff
@@ -145,8 +148,8 @@ module Traject
 # resource. But sometimes resort to 245$h GMD too.
 def online?
 # field 007, byte 0 c="electronic" byte 1 r="remote" ==> sure Online
-found_007 = record.find do |field|
-field.
+found_007 = record.fields('007').find do |field|
+field.value.slice(0) == "c" && field.value.slice(1) == "r"
 end
 
 return true if found_007
```
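The updated `online?` check restricts the search to 007 fields with `record.fields('007')`, then tests byte 0 for `c` (electronic) and byte 1 for `r` (remote). To run without the marc gem, the sketch below uses a hypothetical `FakeControlField` struct in place of MARC control fields and does the tag filtering itself:

```ruby
# Hypothetical stand-in for a MARC control field: just a tag and a raw value.
FakeControlField = Struct.new(:tag, :value)

# Mirrors the updated online? test: consider only 007 fields, and report
# true when byte 0 is "c" (electronic) and byte 1 is "r" (remote).
def remote_electronic_007?(fields)
  fields.select { |f| f.tag == "007" }
        .any? { |f| f.value.slice(0) == "c" && f.value.slice(1) == "r" }
end
```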
data/lib/traject/version.rb
CHANGED
data/test/indexer/macros_marc21_semantics_test.rb
CHANGED
```diff
@@ -28,6 +28,8 @@ describe "Traject::Macros::Marc21Semantics" do
 output = @indexer.map_record(@record)
 
 assert_equal %w{47971712}, output["oclcnum"]
+
+assert_equal({}, @indexer.map_record(empty_record))
 end
 
 it "#marc_series_facet" do
```
```diff
@@ -40,6 +42,8 @@ describe "Traject::Macros::Marc21Semantics" do
 
 # trims punctuation too
 assert_equal ["Big bands"], output["series_facet"]
+assert_equal({}, @indexer.map_record(empty_record))
+
 end
 
 describe "marc_sortable_author" do
```
```diff
@@ -54,6 +58,8 @@ describe "Traject::Macros::Marc21Semantics" do
 output = @indexer.map_record(@record)
 
 assert_equal ["Herman, Edward S. Manufacturing consent the political economy of the mass media Edward S. Herman and Noam Chomsky ; with a new introduction by the authors"], output["author_sort"]
+assert_equal [""], @indexer.map_record(empty_record)['author_sort']
+
 end
 it "respects non-filing" do
 @record = MARC::Reader.new(support_file_path "the_business_ren.marc").to_a.first
```
```diff
@@ -61,6 +67,8 @@ describe "Traject::Macros::Marc21Semantics" do
 output = @indexer.map_record(@record)
 
 assert_equal ["Business renaissance quarterly [electronic resource]."], output["author_sort"]
+assert_equal [""], @indexer.map_record(empty_record)['author_sort']
+
 end
 end
 
```
```diff
@@ -71,6 +79,8 @@ describe "Traject::Macros::Marc21Semantics" do
 it "works" do
 output = @indexer.map_record(@record)
 assert_equal ["Manufacturing consent : the political economy of the mass media"], output["title_sort"]
+assert_equal({}, @indexer.map_record(empty_record))
+
 end
 it "respects non-filing" do
 @record = MARC::Reader.new(support_file_path "the_business_ren.marc").to_a.first
```
```diff
@@ -95,6 +105,8 @@ describe "Traject::Macros::Marc21Semantics" do
 output = @indexer.map_record(@record)
 
 assert_equal ["English", "French", "German", "Italian", "Spanish", "Russian"], output["languages"]
+assert_equal({}, @indexer.map_record(empty_record))
+
 end
 end
 
```
```diff
@@ -108,6 +120,8 @@ describe "Traject::Macros::Marc21Semantics" do
 output = @indexer.map_record(@record)
 
 assert_equal ["Larger ensemble, Unspecified", "Piano", "Soprano voice", "Tenor voice", "Violin", "Larger ensemble, Ethnic", "Guitar", "Voices, Unspecified"], output["instrumentation"]
+assert_equal({}, @indexer.map_record(empty_record))
+
 end
 end
 
```
```diff
@@ -126,16 +140,29 @@ describe "Traject::Macros::Marc21Semantics" do
 @record = MARC::Reader.new(support_file_path "louis_armstrong.marc").to_a.first
 output = @indexer.map_record(@record)
 
-assert_equal ["bb01", "bb01.s", "bb", "bb.s", "oe"],
-
+assert_equal ["bb01", "bb01.s", "bb", "bb.s", "oe"], output["instrument_codes"]
+assert_equal({}, @indexer.map_record(empty_record))
+
 end
 end
 
 describe "publication_date" do
 # there are way too many edge cases for us to test em all, but we'll test some of em.
+
+it "works when there's no date information" do
+assert_equal nil, Marc21Semantics.publication_date(empty_record)
+end
+
+it "uses macro correctly with no date info" do
+@indexer.instance_eval {to_field "date", marc_publication_date }
+assert_equal({}, @indexer.map_record(empty_record))
+end
+
+
 it "pulls out 008 date_type s" do
 @record = MARC::Reader.new(support_file_path "manufacturing_consent.marc").to_a.first
 assert_equal 2002, Marc21Semantics.publication_date(@record)
+
 end
 it "uses start date for date_type c continuing resource" do
 @record = MARC::Reader.new(support_file_path "the_business_ren.marc").to_a.first
```
```diff
@@ -182,18 +209,24 @@ describe "Traject::Macros::Marc21Semantics" do
 output = @indexer.map_record(@record)
 
 assert_equal ["Language & Literature"], output["discipline_facet"]
+
 end
 it "maps to default" do
 @record = MARC::Reader.new(support_file_path "musical_cage.marc").to_a.first
 output = @indexer.map_record(@record)
 assert_equal ["Unknown"], output["discipline_facet"]
+assert_equal(["Unknown"], @indexer.map_record(empty_record)['discipline_facet'])
 end
+
 it "maps to nothing if none and no default" do
 @indexer.instance_eval {to_field "discipline_no_default", marc_lcc_to_broad_category(:default => nil)}
 @record = MARC::Reader.new(support_file_path "musical_cage.marc").to_a.first
 output = @indexer.map_record(@record)
 
 assert_nil output["discipline_no_default"]
+
+assert_nil @indexer.map_record(empty_record)["discipline_no_default"]
+
 end
 
 describe "LCC_REGEX" do
```
```diff
@@ -212,13 +245,15 @@ describe "Traject::Macros::Marc21Semantics" do
 @record = MARC::Reader.new(support_file_path "multi_geo.marc").to_a.first
 output = @indexer.map_record(@record)
 
-assert_equal ["Europe", "Middle East", "Africa, North", "Agora (Athens, Greece)", "Rome (Italy)", "Italy"],
-
+assert_equal ["Europe", "Middle East", "Africa, North", "Agora (Athens, Greece)", "Rome (Italy)", "Italy"], output["geo_facet"]
+assert_equal({}, @indexer.map_record(empty_record))
 end
 it "maps nothing on a record with no geo" do
 @record = MARC::Reader.new(support_file_path "manufacturing_consent.marc").to_a.first
 output = @indexer.map_record(@record)
 assert_nil output["geo_facet"]
+assert_equal({}, @indexer.map_record(empty_record))
+
 end
 end
 
```
```diff
@@ -232,6 +267,8 @@ describe "Traject::Macros::Marc21Semantics" do
 
 assert_equal ["Early modern, 1500-1700", "17th century", "Great Britain: Puritan Revolution, 1642-1660", "Great Britain: Civil War, 1642-1649", "1642-1660"],
 output["era_facet"]
+assert_equal({}, @indexer.map_record(empty_record))
+
 end
 end
 
```
```diff
@@ -241,6 +278,7 @@ describe "Traject::Macros::Marc21Semantics" do
 str = Marc21Semantics.assemble_lcsh(field)
 
 assert_equal "Psychoanalysis and literature — England — History — 19th century", str
+
 end
 
 it "ignores numeric subfields" do
```
```diff
@@ -277,6 +315,9 @@ describe "Traject::Macros::Marc21Semantics" do
 
 assert output["lcsh"].length > 0, "outputs data"
 assert output["lcsh"].include?("Eliot, George, 1819-1880 — Characters"), "includes a string its supposed to"
+
+assert_equal({}, @indexer.map_record(empty_record))
+
 end
 end
 end
```
```diff
@@ -292,6 +333,8 @@ describe "Traject::Macros::Marc21Semantics" do
 end
 output = @indexer.map_record(@record)
 assert_equal ['Business renaissance quarterly'], output['title_phrase']
+assert_equal({}, @indexer.map_record(empty_record))
+
 end
 
 it "works with :include_original" do
```
```diff
@@ -300,6 +343,7 @@ describe "Traject::Macros::Marc21Semantics" do
 end
 output = @indexer.map_record(@record)
 assert_equal ['The Business renaissance quarterly', 'Business renaissance quarterly'], output['title_phrase']
+assert_equal({}, @indexer.map_record(empty_record))
 end
 
 it "doesn't do anything if you don't include the first subfield" do
```
```diff
@@ -308,6 +352,8 @@ describe "Traject::Macros::Marc21Semantics" do
 end
 output = @indexer.map_record(@record)
 assert_equal ['[electronic resource].'], output['title_phrase']
+assert_equal({}, @indexer.map_record(empty_record))
+
 end
 
 
```
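The new assertions run each macro against `empty_record`, a bare record containing only an 001 control field (added in test_helper.rb), and mostly expect `map_record` to return `{}`. The toy model below illustrates that contract; it is not traject's implementation. Each extractor returns an array of values, and a field whose array comes back empty is left out of the output. Records are modeled as Hashes keyed by MARC tag, and the extractor names are hypothetical.

```ruby
# Toy model (not traject's implementation) of the behavior these tests
# check: a field whose extractor returns no values is omitted entirely.
def toy_map_record(record, extractors)
  extractors.each_with_object({}) do |(field_name, extractor), output|
    values = extractor.call(record)
    output[field_name] = values unless values.empty?
  end
end

# Stand-in "record": a Hash keyed by MARC tag, like empty_record,
# which holds only an 001 control field.
EMPTY_RECORD = { "001" => "000000000" }.freeze

# Hypothetical extractor: the 245 value if present, otherwise nothing.
TITLE_EXTRACTOR = ->(record) { record.key?("245") ? [record["245"]] : [] }
```

An extractor that yields a single empty string still produces a key. This is consistent with the `author_sort` tests above, which expect `[""]` rather than a missing field for `empty_record`.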
data/test/indexer/macros_marc21_test.rb
CHANGED
```diff
@@ -26,6 +26,8 @@ describe "Traject::Macros::Marc21" do
 output = @indexer.map_record(@record)
 
 assert_equal ["Manufacturing consent : the political economy of the mass media /"], output["title"]
+assert_equal({}, @indexer.map_record(empty_record))
+
 end
 
 it "respects :first=>true option" do
```
```diff
@@ -36,6 +38,7 @@ describe "Traject::Macros::Marc21" do
 output = @indexer.map_record(@record)
 
 assert_length 1, output["other_id"]
+
 end
 
 it "trims punctuation with :trim_punctuation => true" do
```
```diff
@@ -46,6 +49,8 @@ describe "Traject::Macros::Marc21" do
 output = @indexer.map_record(@record)
 
 assert_equal ["Manufacturing consent : the political economy of the mass media"], output["title"]
+assert_equal({}, @indexer.map_record(empty_record))
+
 end
 
 it "respects :default option" do
```
```diff
@@ -70,6 +75,7 @@ describe "Traject::Macros::Marc21" do
 output = @indexer.map_record(@record)
 assert_equal ["eng"], output['lang1']
 assert_equal ["eng", "eng"], output['lang2']
+assert_equal({}, @indexer.map_record(empty_record))
 end
 
 it "fails on an extra/misspelled argument to extract_marc" do
```
data/test/marc_format_classifier_test.rb
CHANGED
```diff
@@ -10,7 +10,11 @@ def classifier_for(filename)
 end
 
 describe "MarcFormatClassifier" do
-
+
+it "returns 'Print' when there's no other data" do
+assert_equal ['Print'], MarcFormatClassifier.new( empty_record ).formats
+end
+
 describe "genre" do
 # We don't have the patience to test every case, just a sampling
 it "says book" do
```
data/test/test_helper.rb
CHANGED
```diff
@@ -37,6 +37,16 @@ def assert_start_with(start_with, obj, msg = nil)
 assert obj.start_with?(start_with), msg
 end
 
+
+# An empty record, for making sure extractors and macros work when
+# the fields they're looking for aren't there
+
+def empty_record
+rec = MARC::Record.new
+rec.append(MARC::ControlField.new('001', '000000000'))
+rec
+end
+
 # pretends to be a SolrJ HTTPServer-like thing, just kind of mocks it up
 # and records what happens and simulates errors in some cases.
 class MockSolrServer
```
metadata
CHANGED
```diff
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: traject
 version: !ruby/object:Gem::Version
-version: 1.0.0.beta.
+version: 1.0.0.beta.3
 platform: ruby
 authors:
 - Jonathan Rochkind
```
```diff
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-10-
+date: 2013-10-28 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: marc
```
```diff
@@ -286,7 +286,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 version: 1.3.1
 requirements: []
 rubyforge_project:
-rubygems_version: 2.1.
+rubygems_version: 2.1.9
 signing_key:
 specification_version: 4
 summary: Index MARC to Solr; or generally process source records to hash-like structures
```