traject 0.16.0 → 0.17.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.yardopts +1 -0
- data/README.md +183 -191
- data/bench/bench.rb +1 -1
- data/doc/batch_execution.md +14 -0
- data/doc/extending.md +14 -12
- data/doc/indexing_rules.md +265 -0
- data/lib/traject/command_line.rb +12 -41
- data/lib/traject/debug_writer.rb +32 -13
- data/lib/traject/indexer.rb +101 -24
- data/lib/traject/indexer/settings.rb +18 -17
- data/lib/traject/json_writer.rb +32 -11
- data/lib/traject/line_writer.rb +6 -6
- data/lib/traject/macros/basic.rb +1 -1
- data/lib/traject/macros/marc21.rb +17 -13
- data/lib/traject/macros/marc21_semantics.rb +27 -25
- data/lib/traject/macros/marc_format_classifier.rb +39 -25
- data/lib/traject/marc4j_reader.rb +36 -22
- data/lib/traject/marc_extractor.rb +79 -75
- data/lib/traject/marc_reader.rb +33 -25
- data/lib/traject/mock_reader.rb +9 -10
- data/lib/traject/ndj_reader.rb +7 -7
- data/lib/traject/null_writer.rb +1 -1
- data/lib/traject/qualified_const_get.rb +12 -2
- data/lib/traject/solrj_writer.rb +61 -52
- data/lib/traject/thread_pool.rb +45 -45
- data/lib/traject/translation_map.rb +59 -27
- data/lib/traject/util.rb +3 -3
- data/lib/traject/version.rb +1 -1
- data/lib/traject/yaml_writer.rb +1 -1
- data/test/debug_writer_test.rb +7 -7
- data/test/indexer/each_record_test.rb +4 -4
- data/test/indexer/macros_marc21_semantics_test.rb +12 -12
- data/test/indexer/macros_marc21_test.rb +10 -10
- data/test/indexer/macros_test.rb +1 -1
- data/test/indexer/map_record_test.rb +6 -6
- data/test/indexer/read_write_test.rb +43 -4
- data/test/indexer/settings_test.rb +2 -2
- data/test/indexer/to_field_test.rb +8 -8
- data/test/marc4j_reader_test.rb +4 -4
- data/test/marc_extractor_test.rb +33 -25
- data/test/marc_format_classifier_test.rb +3 -3
- data/test/marc_reader_test.rb +2 -2
- data/test/test_helper.rb +3 -3
- data/test/test_support/demo_config.rb +52 -48
- data/test/translation_map_test.rb +22 -4
- data/test/translation_maps/bad_ruby.rb +2 -2
- data/test/translation_maps/both_map.rb +1 -1
- data/test/translation_maps/default_literal.rb +1 -1
- data/test/translation_maps/default_passthrough.rb +1 -1
- data/test/translation_maps/ruby_map.rb +1 -1
- metadata +7 -31
- data/doc/macros.md +0 -103
data/bench/bench.rb
CHANGED
data/doc/batch_execution.md
CHANGED
@@ -99,6 +99,20 @@ Now any account, in a crontab, in an interactive shell, wherever,
 can just execute `jruby-traject {arguments}`, and execute traject
 in a jruby environment.
 
+### Bundler too?
+
+If you're running with bundler too, you could make a wrapper file specific to
+a particular traject project and its Gemfile, by folding the `bundle exec` into
+your wrapper file. For instance, for chruby, this works:
+
+    #!/usr/bin/env bash
+
+    chruby-exec jruby -- BUNDLE_GEMFILE=/path/to/Gemfile bundle exec traject "$@"
+
+Now you can call your wrapper script from anywhere and with any active ruby,
+and execute it in jruby and with the dependencies specified in the Gemfile
+for your project.
+
 ## Exit codes
 
 Traject tries to always return a well-behaved unix exit code -- 0 for success,
data/doc/extending.md
CHANGED
@@ -19,9 +19,9 @@ of a couple traject features meant to make it easier.
 * translation map files found in a
 "./translation_maps" subdir on the load path will be found
 for Traject translation maps.
-*
-
-
+* You can use Bundler with traject simply by creating a Gemfile with `bundler init`,
+and then running traject with `bundle exec traject` or
+even `BUNDLE_GEMFILE=path/to/Gemfile bundle exec traject`.
 
 ## Custom code local to your project
 
@@ -160,19 +160,21 @@ possibly with version restrictions, in the [Gemfile](http://bundler.io/v1.3/gemf
 Run `bundle install` from the directory with the Gemfile, on any system
 at any time, to make sure specified gems are installed.
 
-**Run traject** with
-your
+**Run traject** with `bundle exec` to have bundler set up the environment
+from your Gemfile. You can `cd` into the directory containing the Gemfile,
+so bundler can find it:
 
-
+    $ cd /some/where
+    $ bundle exec traject -c some_traject_config.rb ...
 
-Or
+Or you can use the BUNDLE_GEMFILE environment variable to tell bundler where
+to find the Gemfile, and run from any directory at all:
 
-
+    $ BUNDLE_GEMFILE=/path/to/Gemfile bundle exec traject -c /path/to/some_config.rb ...
 
-
-
-
-the program).
+Bundler will make sure the specified versions of all gems are used by
+traject, and also make sure no gems except those specified in the Gemfile
+are available to the program, for a reliable, reproducible environment.
 
 You should still `require` the gem in your traject config file,
 then just refer to what it provides in your config code as usual.
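As a point of reference for the Bundler instructions above, a project Gemfile can be very small. This is only a sketch: the version pin simply matches the release this diff describes, and the commented-out local gem is a hypothetical placeholder for your own macro code.

```ruby
# Gemfile (sketch)
source "https://rubygems.org"

gem "traject", "0.17.0"

# Hypothetical: a gem holding your own macros, loaded from a local path
# gem "my_traject_macros", path: "../my_traject_macros"
```

After `bundle install`, `bundle exec traject ...` (or `BUNDLE_GEMFILE=/path/to/Gemfile bundle exec traject ...`) runs traject with exactly these gems available.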
data/doc/indexing_rules.md
@@ -0,0 +1,265 @@
+# Details on Traject Indexing: from custom logic to Macros
+
+Traject macros are a way of providing re-usable index mapping rules. Before we discuss how they work, we need to remind ourselves of the basic/direct Traject `to_field` indexing method.
+
+## How direct indexing logic works
+
+Here's the simplest possible direct Traject mapping logic, duplicating the effects of the `literal` macro:
+
+~~~ruby
+to_field("title") do |record, accumulator, context|
+  accumulator << "FIXED LITERAL"
+end
+~~~
+
+That `do` is just ruby `block` syntax, whereby we can pass a block of ruby code as an argument to a ruby method. We pass a block taking three arguments, labeled `record`, `accumulator`, and `context`, to the `to_field` method. The third `context` argument is optional; you can declare it in your block or not, depending on whether you want to use it.
+
+The block is then stored by the `Traject::Indexer`, and called for each record indexed, with three arguments provided.
+
+### record argument
+
+The record that gets passed to your block is a MARC::Record object (or, theoretically, any object that gets returned by a traject Reader). Your logic will usually examine the record to calculate the desired output.
+
+### accumulator argument
+
+The accumulator argument is an array. At the end of your custom code, the accumulator
+array should hold the output you want to send to the field specified in the `to_field` call.
+
+The accumulator is a reference to a ruby array, and you need to **modify** that array,
+manipulating it in place with Array methods that mutate the array, like `concat`, `<<`,
+`map!` or even `replace`.
+
+You can't simply assign a different array to the accumulator variable; that won't work,
+because it only rebinds your local variable. You need to modify the array in place.
+
+    # Won't work, assigning variable
+    to_field('foo') do |rec, acc|
+      acc = ["some constant"] # WRONG!
+    end
+
+    # Won't work, assigning variable
+    to_field('foo') do |rec, acc|
+      acc << 'bill'
+      acc << 'dueber'
+      acc = acc.map{|str| str.upcase}
+    end # WRONG! WRONG! WRONG! WRONG! WRONG!
+
+
+    # Instead, modify the array in place
+    to_field('foo') {|rec, acc| acc << "some constant" }
+    to_field('foo') do |rec, acc|
+      acc << 'bill'
+      acc << 'dueber'
+      acc.map!{|str| str.upcase} # notice using "map!", not just "map"
+    end
+
+### context argument
+
+
+
+The third, optional, argument is a
+[Traject::Indexer::Context](../lib/traject/indexer/context.rb) ([rdoc](http://rdoc.info/github/jrochkind/traject/Traject/Indexer/Context))
+object. Most of the time you don't need it, but you can use it for
+some sophisticated functionality, for example using these Context methods:
+
+* `context.clipboard` A hash into which you can stuff values that you want to pass from one indexing step to another. For example, if you go through a bunch of work to query a database and get a result you'll need more than once, stick the results somewhere in the clipboard.
+* `context.position` The position of the record in the input file (e.g., was it the first record, second, etc.). Useful for error reporting.
+* `context.output_hash` A hash mapping the field names (generally defined in `to_field` calls) to an array of values to be sent to the writer associated with that field. This allows you to modify what goes to the writer without going through a `to_field` call -- you can just set `context.output_hash['myfield'] = ['my', 'values']` and you're set. See below for more examples.
+* `context.skip!(msg)` An assertion that this record should be ignored. No more indexing steps will be called, no results will be sent to the writer, and a `debug`-level log message will be written stating that the record was skipped.
+
+
+## Gotcha: Use closures to make your code more efficient
+
+A _closure_ is a computer-science term that means "a piece of code
+that remembers all the variables that were in scope when it was
+created." In ruby, lambdas and blocks are closures. Method definitions
+are not, which most of us have run across much to our chagrin.
+
+Within the context of `traject`, this means you can define a variable
+outside of a `to_field` or `each_record` block and it will be available
+inside those blocks. And you only have to define it once.
+
+That's useful to do for any object that is even a bit expensive
+to create -- we can maximize the performance of our traject
+indexing by creating those objects once outside the block,
+instead of inside the block where it will be created
+once per record (every time the block is executed).
+
+Compare:
+
+```ruby
+# Create the transformer for every single record
+to_field 'normalized_title' do |rec, acc|
+  transformer = My::Custom::Format::Transformer.new # Oh no! I'm doing this for each of my 10M records!
+  acc << transformer.transform(rec['245'].value)
+end
+
+# Create the transformer exactly once
+transformer = My::Custom::Format::Transformer.new # Ahhh. Do it once.
+to_field 'normalized_title' do |rec, acc|
+  acc << transformer.transform(rec['245'].value)
+end
+```
+
+Certain built-in traject calls have been optimized for high performance,
+so it's safe to call them inside 'inner loop' blocks.
+That includes `Traject::TranslationMap.new` and `Traject::MarcExtractor.cached("xxx")`
+(note `#cached` rather than `#new` there).
+
+
+## From block to lambda
+
+In the ruby language, in addition to creating a code block as an argument
+to a method with `do |args| ... end` or `{|arg| ... }`, we can also create
+a code block to hold in a variable, with the `lambda` method:
+
+    always_output_foo = lambda do |record, accumulator|
+      accumulator << "FOO"
+    end
+
+Traject's `to_field` is written so that, as a convenience, it can take a lambda expression
+stored in a variable as an alternative to a block:
+
+    to_field "always_has_foo", always_output_foo
+
+Why is this a convenience? Well, ordinarily it's not something we
+need, but in fact it's what makes traject 'macros' possible as re-usable
+code templates.
+
+
+## Macros
+
+A Traject macro is a way to automatically create indexing rules via re-usable "templates".
+
+Traject macros are simply methods that return ruby lambda/proc objects, possibly creating
+them based on parameters passed in.
+
+Here is in fact how the `literal` macro is implemented:
+
+~~~ruby
+def literal(value)
+  return lambda do |record, accumulator, context|
+    # because a lambda is a closure, we can define it in terms
+    # of the 'value' from the scope it's defined in!
+    accumulator << value
+  end
+end
+to_field "something", literal("something")
+~~~
+
+It's really as simple as that. A Traject macro is just a function that takes parameters and, based on those parameters, returns a lambda; the lambda is then passed to the `to_field` indexing method, or similar methods.
+
+How do you make these methods available to the indexer?
+
+Define it in a module:
+
+~~~ruby
+# in a file literal_macro.rb
+module LiteralMacro
+  def literal(value)
+    return lambda do |record, accumulator, context|
+      # because a lambda is a closure, we can define it in terms
+      # of the 'value' from the scope it's defined in!
+      accumulator << value
+    end
+  end
+end
+~~~
+
+Then use ordinary ruby `require` and `extend` to add it to the current Indexer, by simply including this
+in one of your config files:
+
+~~~ruby
+require 'literal_macro'
+extend LiteralMacro
+
+to_field ...
+~~~
+
+That's it. You can use the traject command line `-I` option to set the ruby load path, so your file will be findable via `require`. Or you can distribute it in a gem, and use straight rubygems and the `gem` command in your configuration file, or Bundler with `bundle exec traject`.
+
+## Using a lambda _and_ a block
+
+Traject macros (such as `extract_marc`) create and return a lambda. If
+you include a lambda _and_ a block on a `to_field` call, the latter
+gets the accumulator as it was filled in by the former.
+
+```ruby
+# Get the titles and lowercase them
+to_field 'lc_title', extract_marc('245') do |rec, acc, context|
+  acc.map!{|title| title.downcase}
+end
+
+# Build my own lambda and use it
+mylam = lambda {|rec, acc| acc << 'one'} # just add a constant
+to_field 'foo', mylam do |rec, acc, context|
+  acc << 'two'
+end #=> context.output_hash['foo'] == ['one', 'two']
+
+
+# You might also want to do something like this, to remove duplicates
+
+to_field 'foo', my_macro_that_does_not_dedup do |rec, acc|
+  acc.uniq!
+end
+```
+
+## Manipulating `context.output_hash` directly
+
+If you ask for the context argument, a [Traject::Indexer::Context](../lib/traject/indexer/context.rb) ([rdoc](http://rdoc.info/gems/traject/Traject/Indexer/Context)), you have access to `context.output_hash`, which is
+the hash of transformed output that will be sent to Solr (or any other Writer).
+
+You can look in there to see any already transformed output and use it as the source
+for new output. You can actually *write* to there manually, which can be useful
+to write routines that affect more than one output field at once.
+
+**Note**: Make sure you always assign an _array_ to, e.g., `context.output_hash['foo']`, not a single value!
+
+
+
+## each_record
+
+All the previous discussion was in terms of `to_field`; `each_record` is a similar
+routine, to define logic that is executed for each record, but isn't fixed to write
+to a single output field.
+
+So `each_record` blocks have no `accumulator` argument; instead they take either a single
+`record` argument, or both a `record` and a `context`.
+
+`each_record` can be used for logging or notifying; computing intermediate
+results; or writing to more than one field at once.
+
+~~~ruby
+each_record do |record, context|
+  if is_it_bad?(record)
+    context.skip!("Skipping bad record")
+  else
+    context.clipboard[:expensive_result] = calculate_expensive_thing(record)
+  end
+end
+
+each_record do |record, context|
+  one, two = calculate_two_things_from(record)
+
+  context.output_hash["first_field"] ||= []
+  context.output_hash["first_field"] << one
+
+  context.output_hash["second_field"] ||= []
+  context.output_hash["second_field"] << two
+end
+~~~
+
+Traject doesn't come with any macros written for use with
+`each_record`, but they could be created if useful --
+just methods that return lambdas taking the right
+args for `each_record`.
+
+## More tips and gotchas about indexing steps
+
+* **All your `to_field` and `each_record` steps are run _in the order in which they were initially evaluated_**. That means that the order in which you load your config files can make a difference if you're stuffing things into the context clipboard or whatnot.
+
+* **`to_field` can be called multiple times on the same field name.** If you call the same field name multiple times, all the values will be sent to the writer.
+
+* **Once you call `context.skip!(msg)` no more index steps will be run for that record**. So if you have any cleanup code, you'll need to make sure to call it yourself.
+
+* **By default, `traject` indexing runs multi-threaded**. In the current implementation, the indexing steps for one record are *not* split across threads, but different records can be processed simultaneously by more than one thread. That means you need to make sure your code is thread-safe (or always set `processing_thread_pool` to 0).
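The rules in the new `indexing_rules.md` (in-place accumulator mutation, closures, macros as methods returning lambdas, a lambda combined with a block, and `each_record` writing to `output_hash`) can be exercised without traject or JRuby using a deliberately tiny stand-in. `ToyIndexer`, `ToyContext` and `ToyMacros` are hypothetical names, the record is a plain Hash rather than a `MARC::Record`, and the arity handling is a simplification; this is a sketch of the behaviour the document describes, not traject's implementation.

```ruby
# Hypothetical stand-ins, written only to illustrate the rules above.
# This is NOT traject's implementation.
class ToyContext
  attr_reader :output_hash

  def initialize
    @output_hash = {}
  end
end

class ToyIndexer
  def initialize
    @steps = []
  end

  # Like to_field: takes an optional lambda and/or a block. Both receive
  # the same accumulator array, lambda first, then block.
  def to_field(field, lambda_step = nil, &block)
    step = lambda do |record, context|
      accumulator = []
      [lambda_step, block].compact.each do |callable|
        call_with_arity(callable, record, accumulator, context)
      end
      unless accumulator.empty?
        (context.output_hash[field] ||= []).concat(accumulator)
      end
    end
    @steps << step
  end

  # Like each_record: the block gets the record and the context.
  def each_record(&block)
    @steps << block
  end

  # Runs every step, in the order defined, against one record.
  def map_record(record)
    context = ToyContext.new
    @steps.each { |step| step.call(record, context) }
    context.output_hash
  end

  private

  # Lambdas are strict about argument count, so pass only as many
  # arguments as the callable declares (a simplification).
  def call_with_arity(callable, *args)
    if callable.arity >= 0
      callable.call(*args.first(callable.arity))
    else
      callable.call(*args)
    end
  end
end

# A macro: a method that returns a lambda, like `literal` above.
module ToyMacros
  def literal(value)
    lambda { |_record, accumulator, _context| accumulator << value }
  end
end

indexer = ToyIndexer.new
suffix = "!" # created once, outside the blocks; captured by closure

indexer.instance_eval do
  extend ToyMacros

  to_field "source", literal("toy")

  add_one = lambda { |_rec, acc| acc << "one" }
  to_field "foo", add_one do |_rec, acc|
    acc << "two" # sees what the lambda already added
  end

  to_field "shout" do |rec, acc|
    acc << rec[:title]
    acc.map! { |s| s.upcase + suffix } # mutates the shared array
  end

  to_field "lost" do |rec, acc|
    acc = [rec[:title]] # WRONG: only rebinds the local name
  end

  each_record do |rec, context|
    first, rest = rec[:title].split(" ", 2)
    context.output_hash["first_word"] = [first]
    context.output_hash["rest"] = [rest]
  end
end

result = indexer.map_record({ title: "hello world" })
p result
```

In the result there should be no `"lost"` key, because reassigning `acc` only rebinds the block's local variable, while `"shout"` holds the upcased title because `map!` edits the array the indexer actually collects.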
data/lib/traject/command_line.rb
CHANGED
@@ -1,7 +1,6 @@
-# Require as little as possible at top, so we can bundle require later
-# if needed, before requiring anything from the bundle. Can't avoid slop
-# though, to get our bundle arg out, sorry.
 require 'slop'
+require 'traject'
+require 'traject/indexer'
 
 module Traject
 # The class that executes for the Traject command line utility.
@@ -33,21 +32,6 @@ module Traject
 # Returns true on success or false on failure; may also raise exceptions;
 # may also exit program directly itself (yeah, could use some normalization)
 def execute
-# Do bundler setup FIRST to try and initialize all gems from gemfile
-# if requested.
-
-# have to use Slop object to tell diff between
-# no arg supplied and no option -g given at all
-if slop.present? :Gemfile
-require_bundler_setup(options[:Gemfile])
-end
-
-
-# We require them here instead of top of file,
-# so we have done bundler require before we require these.
-require 'traject'
-require 'traject/indexer'
-
 if options[:version]
 self.console.puts "traject version #{Traject::VERSION}"
 return
@@ -92,6 +76,10 @@ module Traject
 end
 
 return result
+rescue Exception => e
+# Try to log unexpected exceptions if possible
+indexer && indexer.logger && indexer.logger.fatal("Traject::CommandLine: Unexpected exception, terminating execution: #{e.inspect}") rescue nil
+raise e
 end
 
 def command_commit!
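The new `rescue Exception => e` clause in `execute` follows a log-then-re-raise pattern: it tries to record a fatal message, swallows any error raised by the logging itself (`rescue nil`), and then re-raises the original exception so callers still see the failure. Below is a minimal standalone sketch of that shape, using Ruby's standard `Logger` writing to a `StringIO`; `ToyRunner` is a hypothetical name, not part of traject.

```ruby
require "logger"
require "stringio"

# Hypothetical stand-in; only the rescue/log/re-raise shape mirrors the diff.
class ToyRunner
  attr_reader :logger

  def initialize(logger)
    @logger = logger
  end

  def execute
    yield
  rescue Exception => e
    # Logging is best-effort: a failure while logging must not hide `e`.
    (logger && logger.fatal("Unexpected exception, terminating execution: #{e.inspect}")) rescue nil
    raise e
  end
end

log = StringIO.new
runner = ToyRunner.new(Logger.new(log))

begin
  runner.execute { raise ArgumentError, "bad marc" }
rescue ArgumentError => e
  puts "caller still sees: #{e.message}"
end

puts log.string
```

Because the clause rescues `Exception` rather than `StandardError`, even things like `Interrupt` get a log line; since it always re-raises, the program's failure behaviour is otherwise unchanged.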
@@ -117,19 +105,21 @@ module Traject
 $stdout
 end
 
+indexer.logger.info(" marcout writing type:#{output_type} to file:#{output_arg}")
+
 case output_type
 when "binary"
 writer = MARC::Writer.new(output_arg)
 
 allow_oversized = indexer.settings["marcout.allow_oversized"]
 if allow_oversized
-allow_oversized = (allow_oversized.to_s == "true")
+allow_oversized = (allow_oversized.to_s == "true")
 writer.allow_oversized = allow_oversized
 end
 when "xml"
 writer = MARC::XMLWriter.new(output_arg)
 when "human"
-writer = output_arg.kind_of?(String) ? File.open(output_arg, "w:binary") : output_arg
+writer = output_arg.kind_of?(String) ? File.open(output_arg, "w:binary") : output_arg
 else
 raise ArgumentError.new("traject marcout unrecognized marcout.type: #{output_type}")
 end
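The `allow_oversized.to_s == "true"` line in this hunk normalizes a setting that may arrive either as a real boolean or as a string. That command-line `-s key=value` settings are the source of the string form is an assumption here. The same normalization, written as a hypothetical helper:

```ruby
# Hypothetical helper mirroring `allow_oversized.to_s == "true"`.
def true_setting?(value)
  value.to_s == "true"
end

# Assumed: a value set on the command line arrives as a String,
# while one set in a Ruby config file may be a real boolean.
[true, "true", false, "false", nil, "yes"].each do |value|
  puts "#{value.inspect} -> #{true_setting?(value)}"
end
```

Only the exact string `"true"` (or `true` itself) counts; `"yes"` or `"1"` are treated as false. In the diff, the conversion only runs when the setting is present at all, so an unset value leaves the writer's default alone.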
@@ -174,7 +164,7 @@ module Traject
 filename = argv.first
 indexer.logger.info "Reading from #{filename}"
 end
-
+
 return io, filename
 end
 
@@ -215,24 +205,6 @@ module Traject
 end
 end
 
-# requires bundler/setup, optionally first setting ENV["BUNDLE_GEMFILE"]
-# to tell bundler to use a specific gemfile. Gemfile arg can be relative
-# to current working directory.
-def require_bundler_setup(gemfile=nil)
-if gemfile
-# tell bundler what gemfile to use
-gem_path = File.expand_path( gemfile )
-# bundler not good at error reporting, we check ourselves
-unless File.exists? gem_path
-self.console.puts "Gemfile `#{gemfile}` does not exist, exiting..."
-self.console.puts
-self.console.puts slop.help
-exit 2
-end
-ENV["BUNDLE_GEMFILE"] = gem_path
-end
-require 'bundler/setup'
-end
 
 def assemble_settings_hash(options)
 settings = {}
@@ -256,7 +228,7 @@ module Traject
 if options[:'debug-mode']
 require 'traject/debug_writer'
 settings["writer_class_name"] = "Traject::DebugWriter"
-settings["log.level"] = "debug"
+settings["log.level"] = "debug"
 settings["processing_thread_pool"] = 0
 end
 if options[:writer]
@@ -294,7 +266,6 @@ module Traject
 on :u, :solr, "Set solr url, shortcut for -s solr.url=", :argument => true
 on :t, :marc_type, "xml, json or binary. shortcut for -s marc_source.type=", :argument => true
 on :I, "load_path", "append paths to ruby $LOAD_PATH", :argument => true, :as => Array, :delimiter => ":"
-on :G, "Gemfile", "run with bundler and optionally specified Gemfile", :argument => :optional, :default => nil
 
 on :x, "command", "alternate traject command: process (default); marcout; commit", :argument => true, :default => "process"
 
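With the `-G`/`Gemfile` option and `require_bundler_setup` removed in this release, choosing a Gemfile becomes Bundler's own job, via `BUNDLE_GEMFILE` and `bundle exec`, as the updated docs describe. If you launch traject from Ruby, a sketch like the following builds the environment and argument list. `bundled_traject_command` is a hypothetical helper that keeps the removed code's up-front existence check; it uses `File.exist?`, since the `File.exists?` spelling in the removed code is deprecated and was dropped in Ruby 3.2.

```ruby
require "tempfile"

# Hypothetical helper: builds what `BUNDLE_GEMFILE=... bundle exec traject`
# needs, now that traject's own -G/Gemfile option is gone.
def bundled_traject_command(gemfile, config, *extra_args)
  gemfile_path = File.expand_path(gemfile)
  # Check ourselves, as the removed traject code did, for a clearer error.
  unless File.exist?(gemfile_path)
    raise ArgumentError, "Gemfile `#{gemfile}` does not exist"
  end
  env  = { "BUNDLE_GEMFILE" => gemfile_path }
  argv = ["bundle", "exec", "traject", "-c", config, *extra_args]
  [env, argv]
end

# Demo with a throwaway file standing in for a real Gemfile.
gemfile = Tempfile.new("Gemfile")
env, argv = bundled_traject_command(gemfile.path, "some_config.rb", "input.mrc")
p env, argv
# exec(env, *argv) # would replace this process with the bundled traject run
```

Returning `[env, argv]` rather than running the command keeps the helper easy to inspect; `Kernel#exec` (or `system`) accepts the env hash as its first argument.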