traject 0.16.0 → 0.17.0
- checksums.yaml +7 -0
- data/.yardopts +1 -0
- data/README.md +183 -191
- data/bench/bench.rb +1 -1
- data/doc/batch_execution.md +14 -0
- data/doc/extending.md +14 -12
- data/doc/indexing_rules.md +265 -0
- data/lib/traject/command_line.rb +12 -41
- data/lib/traject/debug_writer.rb +32 -13
- data/lib/traject/indexer.rb +101 -24
- data/lib/traject/indexer/settings.rb +18 -17
- data/lib/traject/json_writer.rb +32 -11
- data/lib/traject/line_writer.rb +6 -6
- data/lib/traject/macros/basic.rb +1 -1
- data/lib/traject/macros/marc21.rb +17 -13
- data/lib/traject/macros/marc21_semantics.rb +27 -25
- data/lib/traject/macros/marc_format_classifier.rb +39 -25
- data/lib/traject/marc4j_reader.rb +36 -22
- data/lib/traject/marc_extractor.rb +79 -75
- data/lib/traject/marc_reader.rb +33 -25
- data/lib/traject/mock_reader.rb +9 -10
- data/lib/traject/ndj_reader.rb +7 -7
- data/lib/traject/null_writer.rb +1 -1
- data/lib/traject/qualified_const_get.rb +12 -2
- data/lib/traject/solrj_writer.rb +61 -52
- data/lib/traject/thread_pool.rb +45 -45
- data/lib/traject/translation_map.rb +59 -27
- data/lib/traject/util.rb +3 -3
- data/lib/traject/version.rb +1 -1
- data/lib/traject/yaml_writer.rb +1 -1
- data/test/debug_writer_test.rb +7 -7
- data/test/indexer/each_record_test.rb +4 -4
- data/test/indexer/macros_marc21_semantics_test.rb +12 -12
- data/test/indexer/macros_marc21_test.rb +10 -10
- data/test/indexer/macros_test.rb +1 -1
- data/test/indexer/map_record_test.rb +6 -6
- data/test/indexer/read_write_test.rb +43 -4
- data/test/indexer/settings_test.rb +2 -2
- data/test/indexer/to_field_test.rb +8 -8
- data/test/marc4j_reader_test.rb +4 -4
- data/test/marc_extractor_test.rb +33 -25
- data/test/marc_format_classifier_test.rb +3 -3
- data/test/marc_reader_test.rb +2 -2
- data/test/test_helper.rb +3 -3
- data/test/test_support/demo_config.rb +52 -48
- data/test/translation_map_test.rb +22 -4
- data/test/translation_maps/bad_ruby.rb +2 -2
- data/test/translation_maps/both_map.rb +1 -1
- data/test/translation_maps/default_literal.rb +1 -1
- data/test/translation_maps/default_passthrough.rb +1 -1
- data/test/translation_maps/ruby_map.rb +1 -1
- metadata +7 -31
- data/doc/macros.md +0 -103
data/bench/bench.rb
CHANGED
data/doc/batch_execution.md
CHANGED
@@ -99,6 +99,20 @@ Now any account, in a crontab, in an interactive shell, wherever,
 can just execute `jruby-traject {arguments}`, and execute traject
 in a jruby environment.
 
+### Bundler too?
+
+If you're running with bundler too, you could make a wrapper file specific to
+a particular traject project and its Gemfile, by including the `bundle exec` in
+your wrapper file. For instance, for chruby, this works:
+
+    #!/usr/bin/env bash
+
+    chruby-exec jruby -- BUNDLE_GEMFILE=/path/to/Gemfile bundle exec traject "$@"
+
+Now you can call your wrapper script from anywhere and with any active ruby,
+and execute it in jruby and with the dependencies specified in the Gemfile
+for your project.
+
 ## Exit codes
 
 Traject tries to always return a well-behaved unix exit code -- 0 for success,
data/doc/extending.md
CHANGED
@@ -19,9 +19,9 @@ of a couple traject features meant to make it easier.
 * translation map files found in a
 "./translation_maps" subdir on the load path will be found
 for Traject translation maps.
-*
-
-
+* You can use Bundler with traject simply by creating a Gemfile with `bundle init`,
+and then running the command line with `bundle exec traject` or
+even `BUNDLE_GEMFILE=path/to/Gemfile bundle exec traject`
 
 ## Custom code local to your project
 
@@ -160,19 +160,21 @@ possibly with version restrictions, in the [Gemfile](http://bundler.io/v1.3/gemf
 Run `bundle install` from the directory with the Gemfile, on any system
 at any time, to make sure specified gems are installed.
 
-**Run traject** with
-your
+**Run traject** with `bundle exec` to have bundler set up the environment
+from your Gemfile. You can `cd` into the directory containing the Gemfile,
+so bundler can find it:
 
-
+    $ cd /some/where
+    $ bundle exec traject -c some_traject_config.rb ...
 
-Or
+Or you can use the BUNDLE_GEMFILE environment variable to tell bundler where
+to find the Gemfile, and run from any directory at all:
 
-
+    $ BUNDLE_GEMFILE=/path/to/Gemfile bundle exec traject -c /path/to/some_config.rb ...
 
-
-
-
-the program).
+Bundler will make sure the specified versions of all gems are used by
+traject, and also make sure no gems except those specified in the Gemfile
+are available to the program, for a reliable, reproducible environment.
 
 You should still `require` the gem in your traject config file,
 then just refer to what it provides in your config code as usual.
data/doc/indexing_rules.md
@@ -0,0 +1,265 @@
+# Details on Traject Indexing: from custom logic to Macros
+
+Traject macros are a way of providing reusable index mapping rules. Before we discuss how they work, we need to remind ourselves of the basic/direct Traject `to_field` indexing method.
+
+## How direct indexing logic works
+
+Here's the simplest possible direct Traject mapping logic, duplicating the effects of the `literal` macro:
+
+~~~ruby
+to_field("title") do |record, accumulator, context|
+  accumulator << "FIXED LITERAL"
+end
+~~~
+
+That `do` is just ruby `block` syntax, whereby we can pass a block of ruby code as an argument to a ruby method. We pass a block taking three arguments, labeled `record`, `accumulator`, and `context`, to the `to_field` method. The third `context` argument is optional: you can declare it in your block or not, depending on whether you want to use it.
+
+The block is then stored by the `Traject::Indexer` and called once for each record indexed, with those arguments provided.
+
+### record argument
+
+The record that gets passed to your block is a `MARC::Record` object (or, theoretically, any object that gets returned by a traject Reader). Your logic will usually examine the record to calculate the desired output.
+
+### accumulator argument
+
+The accumulator argument is an array. At the end of your custom code, the accumulator
+array should hold the output you want to send to the field named in the `to_field` call.
+
+The accumulator is a reference to a ruby array, and you need to **modify** that array,
+manipulating it in place with Array methods that mutate the array, like `concat`, `<<`,
+`map!` or even `replace`.
+
+You can't simply assign a different array to the accumulator variable; that won't work.
+You need to modify the array in place.
+
+    # Won't work, assigning variable
+    to_field('foo') do |rec, acc|
+      acc = ["some constant"] # WRONG!
+    end
+
+    # Won't work either: `map` returns a new array, and assigning it to `acc` changes nothing
+    to_field('foo') do |rec, acc|
+      acc << 'bill'
+      acc << 'dueber'
+      acc = acc.map{|str| str.upcase}
+    end # WRONG! WRONG! WRONG! WRONG! WRONG!
+
+
+    # Instead, modify the array in place
+    to_field('foo') {|rec, acc| acc << "some constant" }
+    to_field('foo') do |rec, acc|
+      acc << 'bill'
+      acc << 'dueber'
+      acc.map!{|str| str.upcase} # notice: "map!", not just "map"
+    end
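To see why reassigning the accumulator fails while mutating it works, here is a minimal plain-Ruby sketch. It assumes, as the text above says, that the indexer passes your block an array and afterwards reads back that same array object. `run_step` is a hypothetical stand-in for that behavior, not traject's actual code, and records are plain strings.

```ruby
# Hypothetical stand-in for how an indexer runs a to_field block:
# it creates one array, passes it in, then reads back that same object.
def run_step(record, &block)
  accumulator = []
  block.call(record, accumulator)
  accumulator
end

# Reassigning the block parameter only rebinds a local name,
# so the indexer's array stays empty.
reassigned = run_step("rec") { |_rec, acc| acc = ["some constant"] }

# `<<` mutates the shared array in place.
appended = run_step("rec") { |_rec, acc| acc << "some constant" }

# `map` builds a new array, which is lost when assigned to the local `acc`...
with_map = run_step("rec") do |_rec, acc|
  acc << "bill" << "dueber"
  acc = acc.map(&:upcase)
end

# ...while `map!` rewrites the shared array in place.
with_map_bang = run_step("rec") do |_rec, acc|
  acc << "bill" << "dueber"
  acc.map!(&:upcase)
end

p reassigned    # => []
p appended      # => ["some constant"]
p with_map      # => ["bill", "dueber"]
p with_map_bang # => ["BILL", "DUEBER"]
```

A block parameter is just a local name: `acc = ...` rebinds that name, while `<<`, `concat` and `map!` change the one array object that both the block and the indexer hold.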
+
+### context argument
+
+The third, optional argument is a
+[Traject::Indexer::Context](./lib/traject/indexer/context.rb) ([rdoc](http://rdoc.info/github/jrochkind/traject/Traject/Indexer/Context))
+object. Most of the time you don't need it, but you can use it for
+some sophisticated functionality, for example using these Context methods:
+
+* `context.clipboard` A hash into which you can stuff values that you want to pass from one indexing step to another. For example, if you go through a bunch of work to query a database and get a result you'll need more than once, stick the results somewhere in the clipboard.
+* `context.position` The position of the record in the input file (e.g., was it the first record, second, etc.). Useful for error reporting.
+* `context.output_hash` A hash mapping the field names (generally defined in `to_field` calls) to an array of values to be sent to the writer associated with that field. This allows you to modify what goes to the writer without going through a `to_field` call -- you can just set `context.output_hash['myfield'] = ['my', 'values']` and you're set. See below for more examples.
+* `context.skip!(msg)` An assertion that this record should be ignored. No more indexing steps will be called, no results will be sent to the writer, and a `debug`-level log message will be written stating that the record was skipped.
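Here is a rough sketch of using `clipboard` to share work between steps and `output_hash` to write results, in plain Ruby so it runs without traject. `FakeContext` and the step lambdas are hypothetical stand-ins; the real `Traject::Indexer::Context` does more than this two-field struct.

```ruby
# Hypothetical stand-in exposing only the two hashes discussed above.
FakeContext = Struct.new(:clipboard, :output_hash)

# First step: do some "expensive" work once and stash the result.
lookup_step = lambda do |record, context|
  context.clipboard[:words] = record.split
end

# Later steps reuse the stashed result instead of recomputing it,
# always writing arrays (never bare values) into output_hash.
count_step = lambda do |_record, context|
  context.output_hash["word_count"] = [context.clipboard[:words].length]
end
first_word_step = lambda do |_record, context|
  context.output_hash["first_word"] = [context.clipboard[:words].first]
end

context = FakeContext.new({}, {})
record  = "Moby Dick or The Whale"
[lookup_step, count_step, first_word_step].each { |step| step.call(record, context) }

p context.output_hash["word_count"] # => [5]
p context.output_hash["first_word"] # => ["Moby"]
```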
+
+
+## Gotcha: Use closures to make your code more efficient
+
+A _closure_ is a computer-science term that means "a piece of code
+that remembers all the variables that were in scope when it was
+created." In ruby, lambdas and blocks are closures. Method definitions
+are not, which most of us have run across much to our chagrin.
+
+Within the context of `traject`, this means you can define a variable
+outside of a `to_field` or `each_record` block and it will be available
+inside those blocks. And you only have to define it once.
+
+That's useful to do for any object that is even a bit expensive
+to create -- we can maximize the performance of our traject
+indexing by creating those objects once outside the block,
+instead of inside the block where they will be created
+once per record (every time the block is executed).
+
+Compare:
+
+```ruby
+# Create the transformer for every single record
+to_field 'normalized_title' do |rec, acc|
+  transformer = My::Custom::Format::Transformer.new # Oh no! I'm doing this for each of my 10M records!
+  acc << transformer.transform(rec['245'].value)
+end
+
+# Create the transformer exactly once
+transformer = My::Custom::Format::Transformer.new # Ahhh. Do it once.
+to_field 'normalized_title' do |rec, acc|
+  acc << transformer.transform(rec['245'].value)
+end
+```
+
+Certain built-in traject calls, though, have been optimized for high performance,
+so it's safe to call them inside 'inner loop' blocks.
+That includes `Traject::TranslationMap.new` and `Traject::MarcExtractor.cached("xxx")`
+(note `#cached` rather than `#new` there).
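The cost difference can be made concrete by counting constructor calls. In this plain-Ruby sketch, `Transformer` is a hypothetical class standing in for `My::Custom::Format::Transformer`, the lambdas stand in for `to_field` blocks, and records are plain strings rather than MARC records.

```ruby
# Hypothetical "expensive" class that counts how many instances get built.
class Transformer
  @instances = 0
  class << self
    attr_accessor :instances
  end

  def initialize
    self.class.instances += 1
  end

  def transform(str)
    str.strip.downcase
  end
end

records = ["  Moby Dick ", " Emma", "Ulysses  "]

# Object created inside the block: one new Transformer per record.
per_record = lambda do |rec, acc|
  acc << Transformer.new.transform(rec)
end
records.each { |rec| per_record.call(rec, []) }
built_inside = Transformer.instances

# Object created once outside; the lambda closes over the local variable.
Transformer.instances = 0
transformer = Transformer.new
once = lambda do |rec, acc|
  acc << transformer.transform(rec)
end
titles = records.map { |rec| once.call(rec, []).first }
built_outside = Transformer.instances

puts "inside block: #{built_inside}, outside block: #{built_outside}"
# prints "inside block: 3, outside block: 1"
p titles # => ["moby dick", "emma", "ulysses"]
```

Because indexing runs multi-threaded by default (see the tips at the end of this document), an object shared this way may be used by several threads at once, so it should be thread-safe.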
+
+
+## From block to lambda
+
+In the ruby language, in addition to creating a code block as an argument
+to a method with `do |args| ... end` or `{|arg| ... }`, we can also create
+a code block to hold in a variable, with `lambda`:
+
+    always_output_foo = lambda do |record, accumulator|
+      accumulator << "FOO"
+    end
+
+traject's `to_field` is written so that, as a convenience, it can take a lambda expression
+stored in a variable as an alternative to a block:
+
+    to_field("always_has_foo", always_output_foo)
+
+Why is this a convenience? Ordinarily it's not something we
+need, but in fact it's what makes traject 'macros' possible as reusable
+code templates.
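A rough sketch of what accepting a lambda "as an alternative to a block" can look like. `MiniIndexer` is a hypothetical, heavily simplified stand-in, not `Traject::Indexer`: it stores each step as a callable and later calls it with a record and a fresh accumulator.

```ruby
# Hypothetical, much-simplified indexer: stores steps, then runs them per record.
class MiniIndexer
  def initialize
    @steps = []
  end

  # Accepts either a lambda argument or a block; both are stored as callables.
  def to_field(field_name, a_lambda = nil, &block)
    @steps << [field_name, a_lambda || block]
  end

  def map_record(record)
    @steps.each_with_object({}) do |(field_name, step), output|
      accumulator = []
      step.call(record, accumulator)
      (output[field_name] ||= []).concat(accumulator) unless accumulator.empty?
    end
  end
end

always_output_foo = lambda do |record, accumulator|
  accumulator << "FOO"
end

indexer = MiniIndexer.new
indexer.to_field("always_has_foo", always_output_foo)           # lambda argument
indexer.to_field("title") { |record, acc| acc << record.upcase } # ordinary block

p indexer.map_record("emma")["always_has_foo"] # => ["FOO"]
p indexer.map_record("emma")["title"]          # => ["EMMA"]
```

Either way the step ends up as a callable object, so the indexer can treat both forms the same; that uniformity is what lets a method hand back a ready-made lambda.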
+
+
+## Macros
+
+A Traject macro is a way to automatically create indexing rules via reusable "templates".
+
+Traject macros are simply methods that return ruby lambda/proc objects, possibly creating
+them based on parameters passed in.
+
+Here is, in fact, how the `literal` macro is implemented:
+
+~~~ruby
+def literal(value)
+  return lambda do |record, accumulator, context|
+    # because a lambda is a closure, we can define it in terms
+    # of the 'value' from the scope it's defined in!
+    accumulator << value
+  end
+end
+to_field("something", literal("something"))
+~~~
+
+It's really as simple as that; that's all a Traject macro is: a method that takes parameters and, based on those parameters, returns a lambda. The lambda is then passed to the `to_field` indexing method, or similar methods.
+
+How do you make these methods available to the indexer?
+
+Define them in a module:
+
+~~~ruby
+# in a file literal_macro.rb
+module LiteralMacro
+  def literal(value)
+    return lambda do |record, accumulator, context|
+      # because a lambda is a closure, we can define it in terms
+      # of the 'value' from the scope it's defined in!
+      accumulator << value
+    end
+  end
+end
+~~~
+
+And then use ordinary ruby `require` and `extend` to add it to the current Indexer, by simply including this
+in one of your config files:
+
+~~~ruby
+require 'literal_macro'
+extend LiteralMacro
+
+to_field ...
+~~~
+
+That's it. You can use the traject command line `-I` option to append to the ruby load path, so your file will be findable via `require`. Or you can distribute it in a gem, and use straight rubygems and the `gem` command in your configuration file, or Bundler by running `bundle exec traject`.
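The module-plus-`extend` pattern can be tried in plain Ruby. In this sketch an ordinary object stands in for the indexer that a config file's `extend` adds methods to (an assumption for illustration); once extended, `literal` is callable on it and returns a lambda that remembers its argument.

```ruby
module LiteralMacro
  def literal(value)
    lambda do |_record, accumulator, _context|
      # The lambda closes over `value` from the enclosing method call.
      accumulator << value
    end
  end
end

# Hypothetical stand-in for the indexer object a config file runs against.
config_host = Object.new
config_host.extend(LiteralMacro)

step = config_host.literal("something")
accumulator = []
step.call(:any_record, accumulator, nil)

p step.lambda? # => true
p accumulator  # => ["something"]
```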
+
+## Using a lambda _and_ a block
+
+Traject macros (such as `extract_marc`) create and return a lambda. If
+you include a lambda _and_ a block on a `to_field` call, the latter
+gets the accumulator as it was filled in by the former.
+
+```ruby
+# Get the titles and lowercase them
+to_field 'lc_title', extract_marc('245') do |rec, acc, context|
+  acc.map!{|title| title.downcase}
+end
+
+# Build my own lambda and use it
+mylam = lambda {|rec, acc| acc << 'one'} # just add a constant
+to_field('foo', mylam) do |rec, acc, context|
+  acc << 'two'
+end #=> context.output_hash['foo'] == ['one', 'two']
+
+
+# You might also want to do something like this, de-duplicating the output of
+# a macro that may emit duplicates (my_macro_that_doesnt_dedup is a placeholder name)
+to_field('foo', my_macro_that_doesnt_dedup) do |rec, acc|
+  acc.uniq!
+end
+```
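A hypothetical sketch of the lambda-then-block ordering described above. `run_to_field` is illustrative only, not traject's implementation: it runs the lambda and then the block against one shared accumulator, and passes `context` only to callables declaring three parameters, matching this document's statement that the context argument is optional.

```ruby
# Hypothetical stand-in: run the lambda, then the block, on one shared accumulator.
def run_to_field(record, context, a_lambda = nil, &block)
  accumulator = []
  [a_lambda, block].compact.each do |callable|
    if callable.arity == 2
      callable.call(record, accumulator)
    else
      callable.call(record, accumulator, context)
    end
  end
  accumulator
end

mylam = lambda { |_rec, acc| acc << "one" }
both = run_to_field(:rec, {}, mylam) { |_rec, acc, _context| acc << "two" }

# A stand-in "macro" that may emit duplicates, cleaned up by a block.
dupes = lambda { |_rec, acc| acc.concat(%w[a b a c b]) }
deduped = run_to_field(:rec, {}, dupes) { |_rec, acc| acc.uniq! }

p both    # => ["one", "two"]
p deduped # => ["a", "b", "c"]
```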
+
+## Manipulating `context.output_hash` directly
+
+If you ask for the context argument, a [Traject::Indexer::Context](./lib/traject/indexer/context.rb) ([rdoc](http://rdoc.info/gems/traject/Traject/Indexer/Context)), you have access to `context.output_hash`, which is
+the hash of transformed output that will be sent to Solr (or any other Writer).
+
+You can look in there to see any already-transformed output and use it as the source
+for new output. You can actually *write* to it manually, which can be useful
+for routines that affect more than one output field at once.
+
+**Note**: Make sure you always assign an _array_ to, e.g., `context.output_hash['foo']`, not a single value!
+
+
+
+## each_record
+
+All the previous discussion was in terms of `to_field` -- `each_record` is a similar
+routine for defining logic that is executed for each record, but isn't tied to writing
+a single output field.
+
+So `each_record` blocks have no `accumulator` argument; instead they take either a single
+`record` argument, or both a `record` and a `context`.
+
+`each_record` can be used for logging or notifying, computing intermediate
+results, or writing to more than one field at once.
+
+~~~ruby
+# is_it_bad?, calculate_expensive_thing and calculate_two_things_from are placeholders for your own logic
+each_record do |record, context|
+  if is_it_bad?(record)
+    context.skip!("Skipping bad record")
+  else
+    context.clipboard[:expensive_result] = calculate_expensive_thing(record)
+  end
+end
+
+each_record do |record, context|
+  one, two = calculate_two_things_from(record)
+
+  context.output_hash["first_field"] ||= []
+  context.output_hash["first_field"] << one
+
+  context.output_hash["second_field"] ||= []
+  context.output_hash["second_field"] << two
+end
+~~~
+
+traject doesn't come with any macros written for use with
+`each_record`, but they could be created if useful --
+just methods that return lambdas taking the right
+args for `each_record`.
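As one example of such a macro, here is a hypothetical `copy_first_value` method (not part of traject) that returns a lambda taking `record` and `context`, the arguments an `each_record` block receives. It writes an array into `context.output_hash`, following the note above. `FakeContext` is a plain-Ruby stand-in for the real context.

```ruby
# Plain-Ruby stand-in for Traject::Indexer::Context (only output_hash is modeled).
FakeContext = Struct.new(:output_hash)

# Hypothetical each_record macro: copy the first value of one output field
# into another field, always storing an array.
def copy_first_value(from_field, dest_field)
  lambda do |_record, context|
    values = context.output_hash[from_field]
    next if values.nil? || values.empty?

    context.output_hash[dest_field] = [values.first]
  end
end

context = FakeContext.new({ "title" => ["Emma", "Emma: a novel"] })
step = copy_first_value("title", "title_sort")
step.call(:any_record, context)

p context.output_hash["title_sort"] # => ["Emma"]
```

In a config file it might then be passed to `each_record` the way macros are passed to `to_field`; whether `each_record` accepts a lambda argument in this traject version is an assumption to check against `Traject::Indexer` before relying on it.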
+
+## More tips and gotchas about indexing steps
+
+* **All your `to_field` and `each_record` steps are run _in the order in which they were initially evaluated_**. That means the order in which you load your config files can make a difference, for example if one step stores something in the context clipboard for a later step to use.
+
+* **`to_field` can be called multiple times on the same field name.** If you call the same field name multiple times, all the values will be sent to the writer.
+
+* **Once you call `context.skip!(msg)`, no more index steps will be run for that record**. So if you have any cleanup code, you'll need to make sure to call it yourself.
+
+* **By default, `traject` indexing runs multi-threaded**. In the current implementation, the indexing steps for one record are *not* split across threads, but different records can be processed simultaneously by more than one thread. That means you need to make sure your code is thread-safe (or always set `processing_thread_pool` to 0).
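The ordering and `skip!` tips can be illustrated with a tiny single-threaded runner. `run_steps`, `SkipRecord` and `FakeContext` are hypothetical names for this sketch only; modeling `skip!` as an exception is just one way to stop the remaining steps and not necessarily how traject does it.

```ruby
# Hypothetical stand-ins for illustration only.
class SkipRecord < StandardError; end

FakeContext = Struct.new(:clipboard, :output_hash, :skipped) do
  def skip!(msg)
    self.skipped = msg
    raise SkipRecord, msg
  end
end

# Run steps strictly in the order given; stop as soon as a step calls skip!.
def run_steps(steps, record)
  context = FakeContext.new({}, {}, nil)
  steps.each { |step| step.call(record, context) }
  context
rescue SkipRecord
  context
end

steps = [
  ->(rec, ctx) { ctx.skip!("empty record") if rec.empty? },
  ->(rec, ctx) { ctx.clipboard[:normalized] = rec.strip.downcase },
  ->(_rec, ctx) { ctx.output_hash["title"] = [ctx.clipboard[:normalized]] }
]

ok        = run_steps(steps, "  Emma ")
skipped   = run_steps(steps, "")
reordered = run_steps([steps[0], steps[2], steps[1]], "  Emma ")

p ok.output_hash["title"]        # => ["emma"]
p skipped.skipped                # => "empty record"
p skipped.output_hash.empty?     # => true
p reordered.output_hash["title"] # => [nil]
```

The `reordered` run shows why order matters: the title step reads the clipboard before anything has been stored in it.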
data/lib/traject/command_line.rb
CHANGED
@@ -1,7 +1,6 @@
-# Require as little as possible at top, so we can bundle require later
-# if needed, before requiring anything from the bundle. Can't avoid slop
-# though, to get our bundle arg out, sorry.
 require 'slop'
+require 'traject'
+require 'traject/indexer'
 
 module Traject
 # The class that executes for the Traject command line utility.
@@ -33,21 +32,6 @@ module Traject
 # Returns true on success or false on failure; may also raise exceptions;
 # may also exit program directly itself (yeah, could use some normalization)
 def execute
-# Do bundler setup FIRST to try and initialize all gems from gemfile
-# if requested.
-
-# have to use Slop object to tell diff between
-# no arg supplied and no option -g given at all
-if slop.present? :Gemfile
-require_bundler_setup(options[:Gemfile])
-end
-
-
-# We require them here instead of top of file,
-# so we have done bundler require before we require these.
-require 'traject'
-require 'traject/indexer'
-
 if options[:version]
 self.console.puts "traject version #{Traject::VERSION}"
 return
@@ -92,6 +76,10 @@ module Traject
 end
 
 return result
+rescue Exception => e
+# Try to log unexpected exceptions if possible
+indexer && indexer.logger && indexer.logger.fatal("Traject::CommandLine: Unexpected exception, terminating execution: #{e.inspect}") rescue nil
+raise e
 end
 
 def command_commit!
@@ -117,19 +105,21 @@ module Traject
 $stdout
 end
 
+indexer.logger.info(" marcout writing type:#{output_type} to file:#{output_arg}")
+
 case output_type
 when "binary"
 writer = MARC::Writer.new(output_arg)
 
 allow_oversized = indexer.settings["marcout.allow_oversized"]
 if allow_oversized
-allow_oversized = (allow_oversized.to_s == "true")
+allow_oversized = (allow_oversized.to_s == "true")
 writer.allow_oversized = allow_oversized
 end
 when "xml"
 writer = MARC::XMLWriter.new(output_arg)
 when "human"
-writer = output_arg.kind_of?(String) ? File.open(output_arg, "w:binary") : output_arg
+writer = output_arg.kind_of?(String) ? File.open(output_arg, "w:binary") : output_arg
 else
 raise ArgumentError.new("traject marcout unrecognized marcout.type: #{output_type}")
 end
@@ -174,7 +164,7 @@ module Traject
 filename = argv.first
 indexer.logger.info "Reading from #{filename}"
 end
-
+
 return io, filename
 end
 
@@ -215,24 +205,6 @@ module Traject
 end
 end
 
-# requires bundler/setup, optionally first setting ENV["BUNDLE_GEMFILE"]
-# to tell bundler to use a specific gemfile. Gemfile arg can be relative
-# to current working directory.
-def require_bundler_setup(gemfile=nil)
-if gemfile
-# tell bundler what gemfile to use
-gem_path = File.expand_path( gemfile )
-# bundler not good at error reporting, we check ourselves
-unless File.exists? gem_path
-self.console.puts "Gemfile `#{gemfile}` does not exist, exiting..."
-self.console.puts
-self.console.puts slop.help
-exit 2
-end
-ENV["BUNDLE_GEMFILE"] = gem_path
-end
-require 'bundler/setup'
-end
 
 def assemble_settings_hash(options)
 settings = {}
@@ -256,7 +228,7 @@ module Traject
 if options[:'debug-mode']
 require 'traject/debug_writer'
 settings["writer_class_name"] = "Traject::DebugWriter"
-settings["log.level"] = "debug"
+settings["log.level"] = "debug"
 settings["processing_thread_pool"] = 0
 end
 if options[:writer]
@@ -294,7 +266,6 @@ module Traject
 on :u, :solr, "Set solr url, shortcut for -s solr.url=", :argument => true
 on :t, :marc_type, "xml, json or binary. shortcut for -s marc_source.type=", :argument => true
 on :I, "load_path", "append paths to ruby $LOAD_PATH", :argument => true, :as => Array, :delimiter => ":"
-on :G, "Gemfile", "run with bundler and optionally specified Gemfile", :argument => :optional, :default => nil
 
 on :x, "command", "alternate traject command: process (default); marcout; commit", :argument => true, :default => "process"
 