traject 0.13.0 → 0.13.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/.yardopts +2 -0
- data/README.md +13 -9
- data/doc/extending.md +16 -12
- data/lib/traject/command_line.rb +13 -9
- data/lib/traject/translation_map.rb +11 -1
- data/lib/traject/version.rb +1 -1
- data/test/translation_map_test.rb +26 -0
- data/traject.gemspec +2 -0
- metadata +9 -3
data/.yardopts
ADDED
data/README.md
CHANGED
@@ -15,15 +15,19 @@ them somewhere.
 
 Existing tools for indexing Marc to Solr exist, and have served us well for many years, and have many useful things about them -- which I've tried to preserve in traject. But I was having more and more difficulty working with the existing tools, including difficulty providing the custom logic I needed in a maintainable way. I realized that for me, to create a tool with the flexibility, maintainability, and performance I wanted, I would need to do it in jruby (ruby on the JVM).
 
-
-
-
-
-*
-*
-
-* High performance
-
+* *Easy to use*, getting started with standard use cases should be easy, even for non-rubyists.
+* *Support customization and flexiblity*, common customization use cases, including simple local
+logic, should be very easy. More sophisticated and even complex customization use cases should still be possible,
+changing just the parts of traject you want to change.
+* *Maintainable local logic*, including supporting sharing of reusable logic via ruby gems.
+* *Maintainable understandable internal logic*; well-covered by tests, well-factored seperation of concerns,
+easy for newcomer developers who know ruby to understand the codebase.
+* *High performance*, using multi-threaded concurrency where appropriate to maximize throughput.
+While it depends on your configuration and the size of your server(s), traject is likely higher
+performance than other similar solutions.
+* *Well-behaved shell script*, for painless integration in batch processes and cronjobs, with
+exit codes, sufficiently flexible control of logging, proper use of stderr, etc.
+
 
 
 ## Installation
data/doc/extending.md
CHANGED
@@ -5,7 +5,7 @@ organize it in files other than traject config files, but then
 use it in traject config files.
 
 You might want to have code local to your traject project; or you
-might want to use ruby gems
+might want to use ruby gems to share code between projects and developers.
 A given project may use both of these techniques.
 
 Here are some suggestions for how to do this, along with mention
@@ -16,7 +16,7 @@ of a couple traject features meant to make it easier.
 * Traject `-I` argument command line can be used to list directories to
 add to the load path, similar to the `ruby -I` argument. You
 can then 'require' local project files from the load path.
-* translation map files found
+* translation map files found in a
 "./translation_maps" subdir on the load path will be found
 for Traject translation maps.
 * Traject `-G` command line can be used to tell traject to use
@@ -26,7 +26,7 @@ of a couple traject features meant to make it easier.
 ## Custom code local to your project
 
 You might want local translation maps, or local ruby
-code. Here's a standard way you might lay out
+code. Here's a standard recommended way you might lay out
 this extra code in the file system, using a 'lib'
 directory kept next to your traject config files:
 
@@ -97,8 +97,8 @@ That's pretty much it!
 
 What about that translation map? The `$LOAD_PATH` modification
 took care of that too, the Traject::TranslationMap will look
-up translation map definition files
-in a `./translation_maps` subdir on the load path.
+up translation map definition files
+in a `./translation_maps` subdir on the load path, as in `./lib/translation_maps` in this case.
 
 
 ## Using gems in your traject project
@@ -128,11 +128,10 @@ require 'some_gem'
 SomeGem.whatever!
 ~~~
 
-
-in
-sub-directory, and traject will be able to find those
+A gem can provide traject translation map definitions
+in a `lib/translation_maps` sub-directory, and traject will be able to find those
 translation maps when the gem is loaded. (Because gems'
-`./lib` directories are added to the ruby load path.)
+`./lib` directories are by default added to the ruby load path.)
 
 ### Or, with bundler:
 
@@ -161,9 +160,14 @@ possibly with version restrictions, in the [Gemfile](http://bundler.io/v1.3/gemf
 Run `bundle install` from the directory with the Gemfile, on any system
 at any time, to make sure specified gems are installed.
 
-**Run traject** with the `-G` flag to tell it to use the Gemfile
+**Run traject** with the `-G` flag to tell it to use the Gemfile, for instance if
+your working directory is the one that includes your Gemfile:
 
-
+traject -G -c some_traject_config.rb ...
+
+Or explicitly specify a Gemfile somewhere else:
+
+traject -G /some/path/Gemfile -c some_config.rb ...
 
 Traject will use bundler to setup with the Gemfile, making sure
 the specified versions of all gems are used (and also making sure
@@ -179,4 +183,4 @@ that bundler creates into your source control repo. The
 gem dependencies are currently being used, so you can get the exact
 same dependency environment on different servers.
 
-See the [bundler documentation](http://bundler.io/#getting-started), or google, for more information.
+See the [bundler documentation](http://bundler.io/#getting-started), or google, for more information.
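The extending guide says traject finds translation map files in a `./translation_maps` subdirectory of any load-path entry, for example `./lib/translation_maps` once `-I lib` has put `./lib` on `$LOAD_PATH`. A minimal sketch of that kind of search follows. It assumes a hypothetical helper, `find_translation_map_file`, which is not part of traject's API, and assumes only `.rb` and `.yaml` files matter, checking `.rb` first (following the test named "finds .rb over .yaml"). traject's real lookup may support other formats or order things differently.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper, not traject's API: return the first
# "translation_maps/<name>.rb" or ".yaml" file found under any load-path
# directory, checking .rb before .yaml within each directory.
def find_translation_map_file(name, load_path = $LOAD_PATH)
  load_path.each do |dir|
    %w[.rb .yaml].each do |ext|
      candidate = File.join(dir, "translation_maps", name + ext)
      return candidate if File.exist?(candidate)
    end
  end
  nil
end

# Demo: build a throwaway project with lib/translation_maps/local_map.yaml,
# then search it with lib passed explicitly as the load path.
Dir.mktmpdir do |root|
  lib = File.join(root, "lib")
  FileUtils.mkdir_p(File.join(lib, "translation_maps"))
  File.write(File.join(lib, "translation_maps", "local_map.yaml"), "key1: value1\n")
  puts find_translation_map_file("local_map", [lib])
end
```

Passing the directory list as a parameter keeps the sketch testable. Called with no second argument, it searches the real `$LOAD_PATH`, which is what the `-I` flag and gem loading both extend.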
data/lib/traject/command_line.rb
CHANGED
@@ -33,14 +33,8 @@ module Traject
 # Returns true on success or false on failure; may also raise exceptions;
 # may also exit program directly itself (yeah, could use some normalization)
 def execute
-
-
-return
-end
-if options[:help]
-self.console.puts slop.help
-return
-end
+# Do bundler setup FIRST to try and initialize all gems from gemfile
+# if requested.
 
 # have to use Slop object to tell diff between
 # no arg supplied and no option -g given at all
@@ -48,11 +42,21 @@ module Traject
 require_bundler_setup(options[:Gemfile])
 end
 
+
 # We require them here instead of top of file,
 # so we have done bundler require before we require these.
 require 'traject'
 require 'traject/indexer'
 
+if options[:version]
+self.console.puts "traject version #{Traject::VERSION}"
+return
+end
+if options[:help]
+self.console.puts slop.help
+return
+end
+
 
 (options[:load_path] || []).each do |path|
 $LOAD_PATH << path unless $LOAD_PATH.include? path
@@ -282,7 +286,7 @@ module Traject
 on :j, "output as pretty printed json, shortcut for -s writer_class_name=JsonWriter -s json_writer.pretty_print=true"
 on :t, :marc_type, "xml, json or binary. shortcut for -s marc_source.type=", :argument => true
 on :I, "load_path", "append paths to ruby $LOAD_PATH", :argument => true, :as => Array, :delimiter => ":"
-on :G, "Gemfile", "run with bundler and optionally specified Gemfile", :argument => :optional, :default =>
+on :G, "Gemfile", "run with bundler and optionally specified Gemfile", :argument => :optional, :default => nil
 
 on :x, "command", "alternate traject command: process (default); marcout", :argument => true, :default => "process"
 
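The `-G` change (`:default => nil`) and the comment about telling "no arg supplied" apart from "no option -g given at all" both concern an option whose argument is optional. traject declares its options with Slop. The following is a swapped-in illustration only: it shows the same three-way distinction with Ruby's standard-library `OptionParser`. The method name `parse_gemfile_option` and the simplified `-c` option are hypothetical.

```ruby
require 'optparse'

# Hypothetical stand-in for traject's Slop-based parsing. Distinguishes:
#   no -G at all        -> use_bundler false, gemfile nil
#   -G with no path     -> use_bundler true,  gemfile nil
#   -G /path/Gemfile    -> use_bundler true,  gemfile "/path/Gemfile"
def parse_gemfile_option(argv)
  options = { use_bundler: false, gemfile: nil, conf: nil }
  parser = OptionParser.new do |opts|
    # The space before [GEMFILE] makes the argument optional but lets
    # OptionParser take the next word as the path, unless it starts with "-".
    opts.on("-G", "--Gemfile [GEMFILE]", "run with bundler and optionally specified Gemfile") do |path|
      options[:use_bundler] = true
      options[:gemfile] = path
    end
    # Simplified stand-in for traject's config-file option
    opts.on("-c", "--conf CONFIG", "config file") do |conf|
      options[:conf] = conf
    end
  end
  parser.parse!(argv.dup)
  options
end

p parse_gemfile_option(["-G", "-c", "some_traject_config.rb"])
p parse_gemfile_option(["-G", "/some/path/Gemfile", "-c", "some_config.rb"])
```

A bare `-G` followed by `-c` leaves `gemfile` as `nil` while still recording that bundler was requested, which is the distinction the traject comment describes.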
data/lib/traject/translation_map.rb
CHANGED
@@ -109,6 +109,10 @@ module Traject
 end
 end
 
+# Cached hash can't be mutated without weird consequences, let's
+# freeze it!
+found.freeze if found
+
 return found
 end
 
@@ -141,7 +145,7 @@ module Traject
 if options[:default]
 @default = options[:default]
 elsif @hash.has_key? "__default__"
-@default = @hash
+@default = @hash["__default__"]
 end
 end
 
@@ -158,6 +162,12 @@ module Traject
 end
 alias_method :map, :[]
 
+# Returns a dup of internal hash, dup so you can modify it
+# if you like.
+def to_hash
+@hash.dup
+end
+
 # Run every element of an array through this translation map,
 # return the resulting array. If translation map returns nil,
 # original element will be missing from output.
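0.13.1 freezes hashes returned from the translation map cache, reads an in-file default with `@hash["__default__"]`, and adds `#to_hash` returning a `dup`. The new test "respects in-file default, even on second load" points at the reason. Any code that mutates a shared cached hash, such as a hypothetical `delete("__default__")`, would corrupt every later load of the same map. The following is a minimal sketch of the pattern using hypothetical `TinyMapCache` and `TinyMap` classes, not traject's own:

```ruby
# Hypothetical cache (not traject's API): loads each map once, freezes it,
# and returns the same frozen Hash on every lookup.
class TinyMapCache
  def initialize(&loader)
    @loader = loader
    @cache = {}
  end

  def lookup(name)
    return @cache[name] if @cache.key?(name)
    found = @loader.call(name)
    # Freeze so no caller can mutate the shared, cached copy
    found.freeze if found
    @cache[name] = found
  end
end

# Hypothetical map wrapper showing non-mutating default handling and to_hash.
class TinyMap
  attr_reader :default

  def initialize(cache, name)
    @hash = cache.lookup(name) || {}
    # Read the default rather than deleting it: delete would raise on the
    # frozen hash, and on an unfrozen shared hash would lose it for later loads
    @default = @hash["__default__"] if @hash.key?("__default__")
  end

  def [](key)
    @hash.key?(key) ? @hash[key] : @default
  end

  # Unfrozen copy for callers who want to modify the map
  def to_hash
    @hash.dup
  end
end
```

`dup` is the right call for `#to_hash` rather than `clone`, because `clone` copies the frozen state while `dup` returns an unfrozen copy. That is exactly what the new `#to_hash` test asserts.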
data/lib/traject/version.rb
CHANGED
data/test/translation_map_test.rb
CHANGED
@@ -27,6 +27,19 @@ describe "TranslationMap" do
 assert_equal "value1", found["key1"]
 end
 
+it "freezes the hash" do
+found = @cache.lookup("yaml_map")
+
+assert found.frozen?
+end
+
+it "respects in-file default, even on second load" do
+map = Traject::TranslationMap.new("default_literal")
+map = Traject::TranslationMap.new("default_literal")
+
+assert_equal "DEFAULT LITERAL", map["not in the map"]
+end
+
 it "finds .rb over .yaml" do
 found = @cache.lookup("both_map")
 
@@ -103,4 +116,17 @@ describe "TranslationMap" do
 
 assert_equal ["hola", "first", "second", "last thing", "buenas noches", "hola", "everything else"], arr
 end
+
+it "#to_hash" do
+map = Traject::TranslationMap.new("yaml_map")
+
+hash = map.to_hash
+
+assert_kind_of Hash, hash
+
+assert ! hash.frozen?, "#to_hash result is not frozen"
+
+refute_same hash, map.to_hash, "each #to_hash result is a copy"
+end
+
 end
data/traject.gemspec
CHANGED
@@ -17,6 +17,8 @@ Gem::Specification.new do |spec|
 spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
 spec.require_paths = ["lib"]
 
+spec.extra_rdoc_files = spec.files.grep(%r{^doc/})
+
 
 spec.add_dependency "marc", ">= 0.7.1"
 spec.add_dependency "marc-marc4j", ">=0.1.1"
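The new `spec.extra_rdoc_files = spec.files.grep(%r{^doc/})` line selects only paths under `doc/`. This is why the gem metadata now lists the `doc/*.md` files as extra rdoc files. The following sketch shows that filter applied to a hypothetical file list standing in for `spec.files`. The method name `doc_files` is illustrative only.

```ruby
# Select paths under doc/, as the gemspec line does with spec.files.
def doc_files(files)
  files.grep(%r{^doc/})
end

# Hypothetical stand-in for spec.files
files = %w[.gitignore .yardopts README.md doc/extending.md doc/macros.md
           lib/traject.rb test/translation_map_test.rb]
puts doc_files(files)
```

Ruby's `^` anchors at the start of any line, so `%r{\Adoc/}` would be the stricter spelling. For single-line file paths the two behave the same.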
metadata
CHANGED
@@ -2,14 +2,14 @@
 name: traject
 version: !ruby/object:Gem::Version
 prerelease:
-version: 0.13.
+version: 0.13.1
 platform: ruby
 authors:
 - Jonathan Rochkind
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-09-
+date: 2013-09-16 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: marc
@@ -157,10 +157,16 @@ email:
 executables:
 - traject
 extensions: []
-extra_rdoc_files:
+extra_rdoc_files:
+- doc/batch_execution.md
+- doc/extending.md
+- doc/macros.md
+- doc/other_commands.md
+- doc/settings.md
 files:
 - .gitignore
 - .travis.yml
+- .yardopts
 - Gemfile
 - LICENSE.txt
 - README.md