traject 0.13.0 → 0.13.1
- data/.yardopts +2 -0
- data/README.md +13 -9
- data/doc/extending.md +16 -12
- data/lib/traject/command_line.rb +13 -9
- data/lib/traject/translation_map.rb +11 -1
- data/lib/traject/version.rb +1 -1
- data/test/translation_map_test.rb +26 -0
- data/traject.gemspec +2 -0
- metadata +9 -3
data/.yardopts
ADDED
data/README.md
CHANGED
@@ -15,15 +15,19 @@ them somewhere.
 
 Existing tools for indexing Marc to Solr exist, and have served us well for many years, and have many useful things about them -- which I've tried to preserve in traject. But I was having more and more difficulty working with the existing tools, including difficulty providing the custom logic I needed in a maintainable way. I realized that for me, to create a tool with the flexibility, maintainability, and performance I wanted, I would need to do it in jruby (ruby on the JVM).
 
-
-
-
-
-*
-*
-
-* High performance
-
+* *Easy to use*, getting started with standard use cases should be easy, even for non-rubyists.
+* *Support customization and flexiblity*, common customization use cases, including simple local
+logic, should be very easy. More sophisticated and even complex customization use cases should still be possible,
+changing just the parts of traject you want to change.
+* *Maintainable local logic*, including supporting sharing of reusable logic via ruby gems.
+* *Maintainable understandable internal logic*; well-covered by tests, well-factored seperation of concerns,
+easy for newcomer developers who know ruby to understand the codebase.
+* *High performance*, using multi-threaded concurrency where appropriate to maximize throughput.
+While it depends on your configuration and the size of your server(s), traject is likely higher
+performance than other similar solutions.
+* *Well-behaved shell script*, for painless integration in batch processes and cronjobs, with
+exit codes, sufficiently flexible control of logging, proper use of stderr, etc.
+
 
 
 ## Installation
data/doc/extending.md
CHANGED
@@ -5,7 +5,7 @@ organize it in files other than traject config files, but then
 use it in traject config files.
 
 You might want to have code local to your traject project; or you
-might want to use ruby gems
+might want to use ruby gems to share code between projects and developers.
 A given project may use both of these techniques.
 
 Here are some suggestions for how to do this, along with mention
@@ -16,7 +16,7 @@ of a couple traject features meant to make it easier.
 * Traject `-I` argument command line can be used to list directories to
 add to the load path, similar to the `ruby -I` argument. You
 can then 'require' local project files from the load path.
-* translation map files found
+* translation map files found in a
 "./translation_maps" subdir on the load path will be found
 for Traject translation maps.
 * Traject `-G` command line can be used to tell traject to use
@@ -26,7 +26,7 @@ of a couple traject features meant to make it easier.
 ## Custom code local to your project
 
 You might want local translation maps, or local ruby
-code. Here's a standard way you might lay out
+code. Here's a standard recommended way you might lay out
 this extra code in the file system, using a 'lib'
 directory kept next to your traject config files:
 
@@ -97,8 +97,8 @@ That's pretty much it!
 
 What about that translation map? The `$LOAD_PATH` modification
 took care of that too, the Traject::TranslationMap will look
-up translation map definition files
-in a `./translation_maps` subdir on the load path.
+up translation map definition files
+in a `./translation_maps` subdir on the load path, as in `./lib/translation_maps` in this case.
 
 
 ## Using gems in your traject project
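The lookup described in this hunk can be sketched roughly as follows. This is only an illustration, not traject's actual code: `find_translation_map_file` is a hypothetical helper, and the choice to check `.rb` before `.yaml` within each directory is an assumption drawn from the "finds .rb over .yaml" test name elsewhere in this diff.

```ruby
# Hypothetical helper (not part of traject): walk the load path and
# return the first translation_maps/<name>.rb or .yaml file found.
def find_translation_map_file(name, load_path = $LOAD_PATH)
  load_path.each do |dir|
    # Assumption: within one directory, a .rb definition wins over .yaml
    %w[rb yaml].each do |ext|
      candidate = File.join(dir.to_s, "translation_maps", "#{name}.#{ext}")
      return candidate if File.exist?(candidate)
    end
  end
  nil # nothing found anywhere on the load path
end

# With "./lib" on the load path, this would look for
# ./lib/translation_maps/my_map.rb and then ./lib/translation_maps/my_map.yaml
find_translation_map_file("my_map", ["./lib"])
```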
@@ -128,11 +128,10 @@ require 'some_gem'
 SomeGem.whatever!
 ~~~
 
-
-in
-sub-directory, and traject will be able to find those
+A gem can provide traject translation map definitions
+in a `lib/translation_maps` sub-directory, and traject will be able to find those
 translation maps when the gem is loaded. (Because gems'
-`./lib` directories are added to the ruby load path.)
+`./lib` directories are by default added to the ruby load path.)
 
 ### Or, with bundler:
 
@@ -161,9 +160,14 @@ possibly with version restrictions, in the [Gemfile](http://bundler.io/v1.3/gemf
 Run `bundle install` from the directory with the Gemfile, on any system
 at any time, to make sure specified gems are installed.
 
-**Run traject** with the `-G` flag to tell it to use the Gemfile
+**Run traject** with the `-G` flag to tell it to use the Gemfile, for instance if
+your working directory is the one that includes your Gemfile:
 
-
+traject -G -c some_traject_config.rb ...
+
+Or explicitly specify a Gemfile somewhere else:
+
+traject -G /some/path/Gemfile -c some_config.rb ...
 
 Traject will use bundler to setup with the Gemfile, making sure
 the specified versions of all gems are used (and also making sure
@@ -179,4 +183,4 @@ that bundler creates into your source control repo. The
 gem dependencies are currently being used, so you can get the exact
 same dependency environment on different servers.
 
-See the [bundler documentation](http://bundler.io/#getting-started), or google, for more information.
+See the [bundler documentation](http://bundler.io/#getting-started), or google, for more information.
data/lib/traject/command_line.rb
CHANGED
@@ -33,14 +33,8 @@ module Traject
 # Returns true on success or false on failure; may also raise exceptions;
 # may also exit program directly itself (yeah, could use some normalization)
 def execute
-
-
-return
-end
-if options[:help]
-self.console.puts slop.help
-return
-end
+# Do bundler setup FIRST to try and initialize all gems from gemfile
+# if requested.
 
 # have to use Slop object to tell diff between
 # no arg supplied and no option -g given at all
@@ -48,11 +42,21 @@ module Traject
 require_bundler_setup(options[:Gemfile])
 end
 
+
 # We require them here instead of top of file,
 # so we have done bundler require before we require these.
 require 'traject'
 require 'traject/indexer'
 
+if options[:version]
+self.console.puts "traject version #{Traject::VERSION}"
+return
+end
+if options[:help]
+self.console.puts slop.help
+return
+end
+
 
 (options[:load_path] || []).each do |path|
 $LOAD_PATH << path unless $LOAD_PATH.include? path
@@ -282,7 +286,7 @@ module Traject
 on :j, "output as pretty printed json, shortcut for -s writer_class_name=JsonWriter -s json_writer.pretty_print=true"
 on :t, :marc_type, "xml, json or binary. shortcut for -s marc_source.type=", :argument => true
 on :I, "load_path", "append paths to ruby $LOAD_PATH", :argument => true, :as => Array, :delimiter => ":"
-on :G, "Gemfile", "run with bundler and optionally specified Gemfile", :argument => :optional, :default =>
+on :G, "Gemfile", "run with bundler and optionally specified Gemfile", :argument => :optional, :default => nil
 
 on :x, "command", "alternate traject command: process (default); marcout", :argument => true, :default => "process"
 
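The `-I` handling shown in these hunks (a colon-delimited list, each entry appended to the load path only if not already present) can be illustrated with a small standalone sketch. `append_load_paths` is a hypothetical name, and the sketch takes a plain string and array instead of going through the option parser, so it only approximates what the command line does.

```ruby
# Hypothetical stand-in for the `-I` handling: split a colon-delimited
# value and append each path unless it is already on the list.
def append_load_paths(raw, load_path)
  raw.to_s.split(":").each do |path|
    load_path << path unless load_path.include?(path)
  end
  load_path
end

# "./lib" is already present and also repeated in the argument,
# so only "./vendor/lib" gets appended.
paths = append_load_paths("./lib:./vendor/lib:./lib", ["./lib"])
puts paths.inspect
```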
data/lib/traject/translation_map.rb
CHANGED
@@ -109,6 +109,10 @@ module Traject
 end
 end
 
+# Cached hash can't be mutated without weird consequences, let's
+# freeze it!
+found.freeze if found
+
 return found
 end
 
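To show why freezing the cached hash matters, here is a minimal sketch using a hypothetical `TinyMapCache` class in place of traject's real cache. Every caller receives the same hash object, so an accidental write would leak to all later lookups; freezing turns that into an immediate error.

```ruby
# Hypothetical cache, standing in for traject's translation map cache.
class TinyMapCache
  def initialize(definitions)
    @definitions = definitions
    @cached = {}
  end

  # Returns the same hash object on every call for a given name.
  def lookup(name)
    found = (@cached[name] ||= @definitions[name])
    # Same idea as the diff: the shared hash must not be mutated
    found.freeze if found
    found
  end
end

cache  = TinyMapCache.new("yaml_map" => { "key1" => "value1" })
shared = cache.lookup("yaml_map")

error = nil
begin
  shared["key2"] = "oops" # attempt to mutate the shared, cached hash
rescue RuntimeError => e  # FrozenError is a RuntimeError subclass
  error = e
end
```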
@@ -141,7 +145,7 @@ module Traject
 if options[:default]
 @default = options[:default]
 elsif @hash.has_key? "__default__"
-@default = @hash
+@default = @hash["__default__"]
 end
 end
 
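The added line reads the in-file default by key and leaves the shared hash untouched. A rough sketch of that behaviour, using a hypothetical `TinyTranslationMap` class with a simplified `[]` (the real `Traject::TranslationMap` does more than this):

```ruby
# Hypothetical minimal map, not the real Traject::TranslationMap.
class TinyTranslationMap
  def initialize(hash, options = {})
    @hash = hash
    if options[:default]
      @default = options[:default]
    elsif @hash.has_key?("__default__")
      # Read the default by key; the shared (frozen) hash is not modified
      @default = @hash["__default__"]
    end
  end

  # Simplified lookup: fall back to the default for unknown keys.
  def [](key)
    @hash.fetch(key, @default)
  end
end

# One frozen hash shared by two map instances, as with a cached definition.
shared = { "a" => "Apple", "__default__" => "DEFAULT LITERAL" }.freeze
first  = TinyTranslationMap.new(shared)
second = TinyTranslationMap.new(shared)
```

Because nothing is removed from the shared hash, the second instance still sees the `__default__` entry, which is the situation the "even on second load" test in this diff exercises.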
@@ -158,6 +162,12 @@ module Traject
 end
 alias_method :map, :[]
 
+# Returns a dup of internal hash, dup so you can modify it
+# if you like.
+def to_hash
+@hash.dup
+end
+
 # Run every element of an array through this translation map,
 # return the resulting array. If translation map returns nil,
 # original element will be missing from output.
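As a small illustration of the `to_hash` contract, the sketch below uses a hypothetical `TinyMap` class whose internal hash is frozen. `dup` returns an unfrozen copy, so callers can modify the result without affecting the map itself.

```ruby
# Hypothetical class mirroring the to_hash idea from the diff.
class TinyMap
  def initialize(hash)
    @hash = hash.freeze # internal hash is frozen, as with the cached one
  end

  # Hand out a copy; Hash#dup does not carry over the frozen state.
  def to_hash
    @hash.dup
  end
end

map  = TinyMap.new("key1" => "value1")
copy = map.to_hash
copy["key2"] = "value2" # allowed: only the copy changes
```

Note that `dup` is a shallow copy: the keys and values themselves are still shared with the internal hash.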
data/lib/traject/version.rb
CHANGED
data/test/translation_map_test.rb
CHANGED
@@ -27,6 +27,19 @@ describe "TranslationMap" do
 assert_equal "value1", found["key1"]
 end
 
+it "freezes the hash" do
+found = @cache.lookup("yaml_map")
+
+assert found.frozen?
+end
+
+it "respects in-file default, even on second load" do
+map = Traject::TranslationMap.new("default_literal")
+map = Traject::TranslationMap.new("default_literal")
+
+assert_equal "DEFAULT LITERAL", map["not in the map"]
+end
+
 it "finds .rb over .yaml" do
 found = @cache.lookup("both_map")
 
@@ -103,4 +116,17 @@ describe "TranslationMap" do
 
 assert_equal ["hola", "first", "second", "last thing", "buenas noches", "hola", "everything else"], arr
 end
+
+it "#to_hash" do
+map = Traject::TranslationMap.new("yaml_map")
+
+hash = map.to_hash
+
+assert_kind_of Hash, hash
+
+assert ! hash.frozen?, "#to_hash result is not frozen"
+
+refute_same hash, map.to_hash, "each #to_hash result is a copy"
+end
+
 end
data/traject.gemspec
CHANGED
@@ -17,6 +17,8 @@ Gem::Specification.new do |spec|
 spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
 spec.require_paths = ["lib"]
 
+spec.extra_rdoc_files = spec.files.grep(%r{^doc/})
+
 
 spec.add_dependency "marc", ">= 0.7.1"
 spec.add_dependency "marc-marc4j", ">=0.1.1"
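The `grep` call added here selects every packaged file whose path starts with `doc/`. A standalone sketch with a made-up file list (the names below are illustrative, not the gem's real contents):

```ruby
# Hypothetical file list standing in for spec.files.
files = [
  "README.md",
  "doc/extending.md",
  "doc/settings.md",
  "lib/traject.rb",
  "test/doc/fixture.md" # contains "doc/" but does not start with it
]

# Same pattern as the gemspec line: anchored at the start of the path.
extra_rdoc_files = files.grep(%r{^doc/})
puts extra_rdoc_files.inspect
```

Only the two top-level `doc/` entries are selected, which matches the kind of list that appears under `extra_rdoc_files` in the gem metadata diff.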
metadata
CHANGED
@@ -2,14 +2,14 @@
 name: traject
 version: !ruby/object:Gem::Version
 prerelease:
-version: 0.13.
+version: 0.13.1
 platform: ruby
 authors:
 - Jonathan Rochkind
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-09-
+date: 2013-09-16 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: marc
@@ -157,10 +157,16 @@ email:
 executables:
 - traject
 extensions: []
-extra_rdoc_files:
+extra_rdoc_files:
+- doc/batch_execution.md
+- doc/extending.md
+- doc/macros.md
+- doc/other_commands.md
+- doc/settings.md
 files:
 - .gitignore
 - .travis.yml
+- .yardopts
 - Gemfile
 - LICENSE.txt
 - README.md