traject 1.0.0 → 1.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: 31243d453f43fbc8f8634c2511340d8fb96c606f
-   data.tar.gz: a843a583f235920931304fc6baa7ff75ce91dfab
+   metadata.gz: 4ae9c6a2d87868021cae1b48637592238387d8a1
+   data.tar.gz: 578c645162da3560ff5e01a28cf43b36e82734f8
  SHA512:
-   metadata.gz: 5a0e6f3c695a5fe497f4cadd14fba7a2a10a3ea1ff40e42a6625949227d1eb3f995dc4d0ecb228f5e98c95f737df1c1e0e8089b59e17fcb0fedca8de5208b535
-   data.tar.gz: 3598e91ed10b039c4382d56ddc69939713425c4f059eb27adfddfdc272e25f2c52e8f900a31b94ceda6576d716c2155188dc03d4697156314e71aadf99a92471
+   metadata.gz: e0bf13c4ff3cab492b6be8922ae22e33311f701756910526eb3206522774b1519db07324531ebd2c366d5907854c78bb6cde0d65eeff78258513d37fae1a3a57
+   data.tar.gz: dd251b15afafe2a11cbefe8493e145d9b9b883183b5f884a5b61557046c74b699427fe2766d5df587e27729f6c22052b2f672b756aba8fee33149f6abbcc4f40
data/Gemfile CHANGED
@@ -8,5 +8,5 @@ group :development do
  end
 
  group :debug do
-   gem "ruby-debug"
+   gem "ruby-debug", :platform => "jruby"
  end
@@ -62,7 +62,7 @@ The third optional argument is a
  object. Most of the time you don't need it, but you can use it for
  some sophisticated functionality, for example using these Context methods:
 
- * `context.clipboard` A hash into which you can stuff values that you want to pass from one indexing step to another. For example, if you go through a bunch of work to query a database and get a result you'll need more than once, stick the results somewhere in the clipboard.
+ * `context.clipboard` A hash into which you can stuff values that you want to pass from one indexing step to another. For example, if you go through a bunch of work to query a database and get a result you'll need more than once, stick the results somewhere in the clipboard. This clipboard is record-specific, and won't persist between records.
  * `context.position` The position of the record in the input file (e.g., was it the first record, seoncd, etc.). Useful for error reporting
  * `context.output_hash` A hash mapping the field names (generally defined in `to_field` calls) to an array of values to be sent to the writer associated with that field. This allows you to modify what goes to the writer without going through a `to_field` call -- you can just set `context.output_hash['myfield'] = ['my', 'values']` and you're set. See below for more examples
  * `context.skip!(msg)` An assertion that this record should be ignored. No more indexing steps will be called, no results will be sent to the writer, and a `debug`-level log message will be written stating that the record was skipped.
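The record-specific clipboard described in the added README sentence can be illustrated with a small standalone sketch. It does not use traject itself: `Context`, `STEPS`, and `index_all` are hypothetical stand-ins chosen for illustration, and the only behaviour taken from the diff is that each record gets its own clipboard.

```ruby
# Hypothetical stand-in for traject's context object; not the real class.
Context = Struct.new(:position, :clipboard, :output_hash)

# Two indexing steps: the first stashes a computed value in the
# clipboard, the second reuses it instead of recomputing it.
STEPS = [
  lambda do |record, context|
    context.clipboard[:normalized] = record.strip.downcase
  end,
  lambda do |record, context|
    context.output_hash["title"] = [context.clipboard[:normalized]]
  end
]

# Run every step over every record and return the per-record contexts.
def index_all(records)
  records.each_with_index.map do |record, i|
    # A fresh clipboard (and output hash) per record: nothing carries over.
    context = Context.new(i + 1, {}, {})
    STEPS.each { |step| step.call(record, context) }
    context
  end
end

contexts = index_all(["  First Title ", "Second"])
contexts.each { |c| puts "#{c.position}: #{c.output_hash.inspect}" }
```

Values that must survive across records would have to live somewhere other than the clipboard in this model, for example in a variable captured by the step's closure.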
@@ -262,4 +262,4 @@ args for `each_record`.
 
  * **Once you call `context.skip!(msg)` no more index steps will be run for that record**. So if you have any cleanup code, you'll need to make sure to call it yourself.
 
- * **By default, `trajcet` indexing runs multi-threaded**. In the current implementation, the indexing steps for one record are *not* split across threads, but different records can be processed simultaneously by more than one thread. That means you need to make sure your code is thread-safe (or always set `processing_thread_pool` to 0).
+ * **By default, `trajcet` indexing runs multi-threaded**. In the current implementation, the indexing steps for one record are *not* split across threads, but different records can be processed simultaneously by more than one thread. That means you need to make sure your code is thread-safe (or always set `processing_thread_pool` to 0).
@@ -26,15 +26,23 @@ module Traject::Macros
          accumulator.concat list.uniq if list
        end
      end
+
      # If a num begins with a known OCLC prefix, return it without the prefix.
      # otherwise nil.
+     #
+     # Allow (OCoLC) and/or ocn/ocm/on
+
+     OCLCPAT = /
+       \A\s*
+       (?:(?:\(OCoLC\)) |
+          (?:\(OCoLC\))?(?:(?:ocm)|(?:ocn)|(?:on))
+       )(\d+)
+     /x
+
      def self.oclcnum_extract(num)
-       stripped = num.gsub(/\A(ocm)|(ocn)|(on)|(\(OCoLC\))/, '')
-       if num != stripped
-         # it had the prefix, which we've now stripped
-         return stripped
+       if OCLCPAT.match(num)
+         return $1
        else
-         # it didn't have the prefix
          return nil
        end
      end
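The behavioural difference in this hunk can be checked with a standalone sketch. The two patterns are copied from the removed and added lines above; the top-level method names `oclcnum_extract` and `old_oclcnum_extract` and the sample values are illustrative only, and the new version here uses the match object rather than `$1`.

```ruby
# Pattern copied from the 1.1.0 side of the hunk.
OCLCPAT = /
  \A\s*
  (?:(?:\(OCoLC\)) |
     (?:\(OCoLC\))?(?:(?:ocm)|(?:ocn)|(?:on))
  )(\d+)
/x

# Stand-in for the 1.1.0 method: digits after a known prefix, else nil.
def oclcnum_extract(num)
  match = OCLCPAT.match(num)
  match ? match[1] : nil
end

# Stand-in for the 1.0.0 method, using the removed gsub pattern.
# Note that \A binds only to the first alternative, (ocm).
def old_oclcnum_extract(num)
  stripped = num.gsub(/\A(ocm)|(ocn)|(on)|(\(OCoLC\))/, '')
  num != stripped ? stripped : nil
end

samples = ["(OCoLC)ocm111111111", "(OCoLC)222222222", "ocn444444444", "777777777"]
samples.each do |s|
  puts "#{s}: old=#{old_oclcnum_extract(s).inspect} new=#{oclcnum_extract(s).inspect}"
end
```

For the first sample the old pattern strips `(OCoLC)` but leaves `ocm` in place, because `ocm` is only removed at the very start of the string, so it returns `"ocm111111111"`. The new pattern returns `"111111111"`. Both versions return `nil` for the unprefixed `"777777777"`.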
@@ -1,6 +1,7 @@
  require 'traject'
 
  require 'yaml'
+ require 'dot-properties'
 
 
  module Traject
@@ -14,7 +15,8 @@ module Traject
    #
    # What makes it more useful than a stunted hash is it's ability to load
    # the hash definitions from configuration files, either pure ruby,
-   # yaml, or (limited subset of) java .properties file.
+   # yaml, or java .properties file (not all .properties features may
+   # be supported, we use dot-properties gem for reading)
    #
    # traject's `extract_marc` macro allows you to specify a :translation_map=>filename argument
    # that will automatically find and use a translation map on the resulting data:
@@ -197,7 +199,9 @@ module Traject
      # Returns a dup of internal hash, dup so you can modify it
      # if you like.
      def to_hash
-       @hash.dup
+       dup = @hash.dup
+       dup.delete("__default__")
+       dup
      end
 
      # Run every element of an array through this translation map,
@@ -224,6 +228,24 @@ module Traject
        array.replace( self.translate_array(array))
      end
 
+     # Return a new TranslationMap that results from merging argument on top of self.
+     # Can be useful for taking an existing translation map, but merging a few
+     # overrides on top.
+     #
+     # merged_map = TranslationMap.new(something).merge TranslationMap.new(else)
+     # #...
+     # merged_map.translate_array(something) # etc
+     #
+     # If a default is set in the second map, it will merge over the first too.
+     #
+     # You can also pass in a plain hash as an arg, instead of an existing TranslationMap:
+     #
+     # TranslationMap.new(something).merge("overridden_key" => "value", "a" => "")
+     def merge(other_map)
+       default = other_map.default || self.default
+       TranslationMap.new(self.to_hash.merge(other_map.to_hash), :default => default)
+     end
+
      class NotFound < Exception
        def initialize(path)
          super("No translation map definition file found at 'translation_maps/#{path}.[rb|yaml|properties]' in load path: #{$LOAD_PATH}")
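The `to_hash` and `merge` changes in the two hunks above interact, which a standalone sketch can make concrete. `MiniMap` is a hypothetical stand-in, not `Traject::TranslationMap`. Its constructor and its storage of the default under a `"__default__"` key are assumptions made for illustration. Only the bodies of `to_hash` and `merge` follow the added lines.

```ruby
# Hypothetical stand-in for Traject::TranslationMap (illustration only).
class MiniMap
  attr_reader :default

  def initialize(hash, options = {})
    @hash = hash.dup
    # Assumption: a default may come from an option or a "__default__" key.
    @default = options[:default] || @hash["__default__"]
    @hash["__default__"] = @default if @default
  end

  # Look up a key, falling back to the default (nil if none is set).
  def [](key)
    @hash.fetch(key, @default)
  end

  # As in 1.1.0: a copy of the data without the internal "__default__" entry.
  def to_hash
    copy = @hash.dup
    copy.delete("__default__")
    copy
  end

  # As in 1.1.0: keys from other_map win, and so does its default if set.
  def merge(other_map)
    default = other_map.default || self.default
    MiniMap.new(to_hash.merge(other_map.to_hash), :default => default)
  end
end

base = MiniMap.new({ "key1" => "value1", "other" => "value2" }, :default => "DEFAULT_VALUE")

# A plain Hash works as the argument: Hash#default is nil here and
# Hash#to_hash returns the hash itself.
merged = base.merge({ "other" => "OVERRIDE", "new" => "NEW" })
puts merged.to_hash.inspect
puts merged["not in the map"]
```

In this sketch, stripping `"__default__"` in `to_hash` is what keeps the first map's default from being copied into the merged data as an ordinary key. The default is carried separately through the `:default` option.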
@@ -232,36 +254,10 @@ module Traject
 
      protected
 
-     # No built-in way to read java-style .properties, we hack it.
-     # inspired by various hacky things found google ruby java properties parse
-     # .properties spec seems to be:
-     # http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load%28java.io.Reader%29
-     #
-     # We do NOT handle split lines, don't do that!
-     def self.read_properties(file_name)
-       hash = {}
-       i = 0
-       f = File.open(file_name)
-       f.each_line do |line|
-         i += 1
-
-         line.strip!
-
-         # skip blank lines
-         next if line.empty?
-
-         # skip comment lines
-         next if line =~ /^\s*[!\#].*$/
-
-         if line =~ /\A([^:=]+)[\:\=]\s*(.*)\s*\Z/
-           hash[$1.strip] = $2
-         else
-           raise IOError.new("Can't parse from #{file_name} line #{i}: #{line}")
-         end
-       end
-       f.close
-
-       return hash
+     # We use dot-properties gem for reading .properties files,
+     # return a hash.
+     def self.read_properties(file_name)
+       return DotProperties.load(file_name).to_h
      end
 
    end
@@ -1,3 +1,3 @@
  module Traject
-   VERSION = "1.0.0"
+   VERSION = "1.1.0"
  end
@@ -31,6 +31,25 @@ describe "Traject::Macros::Marc21Semantics" do
 
      assert_equal({}, @indexer.map_record(empty_record))
    end
+
+   it "deals with all prefixed OCLC nunbers" do
+     @record.append(MARC::DataField.new('035', ' ', ' ', ['a', '(OCoLC)ocm111111111']))
+     @record.append(MARC::DataField.new('035', ' ', ' ', ['a', '(OCoLC)222222222']))
+     @record.append(MARC::DataField.new('035', ' ', ' ', ['a', 'ocm333333333']))
+     @record.append(MARC::DataField.new('035', ' ', ' ', ['a', 'ocn444444444']))
+     @record.append(MARC::DataField.new('035', ' ', ' ', ['a', '(OCoLC)ocn555555555']))
+     @record.append(MARC::DataField.new('035', ' ', ' ', ['a', '(OCoLC)on666666666']))
+     @record.append(MARC::DataField.new('035', ' ', ' ', ['a', '777777777'])) # not OCLC number
+
+     @indexer.instance_eval do
+       to_field "oclcnum", oclcnum
+     end
+     output = @indexer.map_record(@record)
+
+     assert_equal %w{47971712 111111111 222222222 333333333 444444444 555555555 666666666}, output["oclcnum"]
+   end
+
+
 
    it "#marc_series_facet" do
      @record = MARC::Reader.new(support_file_path "louis_armstrong.marc").to_a.first
@@ -109,7 +109,7 @@ describe "TranslationMap" do
 
      assert_equal "DEFAULT LITERAL", map["not in the map"]
    end
-
+
    it "respects __default__ __passthrough__" do
      map = Traject::TranslationMap.new("default_passthrough")
 
@@ -135,16 +135,82 @@ describe "TranslationMap" do
      assert_equal ["one"], values
    end
 
-   it "#to_hash" do
-     map = Traject::TranslationMap.new("yaml_map")
+   describe "#to_hash" do
+     it "produces a hash" do
+       map = Traject::TranslationMap.new("yaml_map")
+
+       hash = map.to_hash
+
+       assert_kind_of Hash, hash
+
+       assert ! hash.frozen?, "#to_hash result is not frozen"
+
+       refute_same hash, map.to_hash, "each #to_hash result is a copy"
+     end
+
+     it "does not include __default__ key" do
+       map = Traject::TranslationMap.new("default_passthrough")
+
+       refute map.to_hash.has_key?("__default__")
+       assert_nil map.to_hash["__default__"]
+     end
+
+   end
+
+   describe "#merge" do
+     it "merges" do
+       original = Traject::TranslationMap.new("yaml_map")
+       override = Traject::TranslationMap.new("other" => "OVERRIDE", "new" => "NEW")
+
+       merged = original.merge(override)
 
-     hash = map.to_hash
+       assert_equal "value1", merged["key1"]
+       assert_equal "OVERRIDE", merged["other"]
+       assert_equal "NEW", merged["new"]
+     end
+
+     it "passes through default from first map when no default in second" do
+       original = Traject::TranslationMap.new("yaml_map", :default => "DEFAULT_VALUE")
+       override = Traject::TranslationMap.new("other" => "OVERRIDE")
+
+       merged = original.merge(override)
+
+       assert_equal "DEFAULT_VALUE", merged.default
+       assert_equal "DEFAULT_VALUE", merged["SOME_KEY_NOT_MATCHED"]
+     end
+
+     it "passes through default from second map when no default in first" do
+       original = Traject::TranslationMap.new("yaml_map")
+       override = Traject::TranslationMap.new({"other" => "OVERRIDE"}, :default => "DEFAULT_VALUE")
+
+       merged = original.merge(override)
 
-     assert_kind_of Hash, hash
+       assert_equal "DEFAULT_VALUE", merged.default
+       assert_equal "DEFAULT_VALUE", merged["SOME_KEY_NOT_MATCHED"]
+     end
+
+     it "merges second default on top of first" do
+       original = Traject::TranslationMap.new("yaml_map", :default => "DEFAULT_VALUE")
+       override = Traject::TranslationMap.new({"other" => "OVERRIDE"}, :default => "NEW_DEFAULT_VALUE")
 
-     assert ! hash.frozen?, "#to_hash result is not frozen"
+       merged = original.merge(override)
+
+       assert_equal "NEW_DEFAULT_VALUE", merged.default
+       assert_equal "NEW_DEFAULT_VALUE", merged["SOME_KEY_NOT_MATCHED"]
+     end
+
+     it "merges in a plain hash too" do
+       original = Traject::TranslationMap.new("yaml_map")
+       merged = original.merge(
+         "other" => "OVERRIDE",
+         "new" => "NEW"
+       )
+
+       assert_equal "value1", merged["key1"]
+       assert_equal "OVERRIDE", merged["other"]
+       assert_equal "NEW", merged["new"]
+     end
 
-     refute_same hash, map.to_hash, "each #to_hash result is a copy"
    end
 
  end
@@ -25,6 +25,7 @@ Gem::Specification.new do |spec|
    spec.add_dependency "hashie", ">= 2.0.5", "< 2.1" # used for Indexer#settings
    spec.add_dependency "slop", ">= 3.4.5", "< 4.0" # command line parsing
    spec.add_dependency "yell" # logging
+   spec.add_dependency "dot-properties", ">= 0.1.1" # reading java style .properties
 
    spec.add_development_dependency "bundler", "~> 1.3"
    spec.add_development_dependency "rake"
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: traject
  version: !ruby/object:Gem::Version
-   version: 1.0.0
+   version: 1.1.0
  platform: ruby
  authors:
  - Jonathan Rochkind
@@ -9,38 +9,48 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2013-11-25 00:00:00.000000000 Z
+ date: 2014-04-07 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: 0.8.0
    name: marc
+   prerelease: false
+   type: :runtime
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - '>='
        - !ruby/object:Gem::Version
          version: 0.8.0
+ - !ruby/object:Gem::Dependency
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - '>='
        - !ruby/object:Gem::Version
-         version: 0.8.0
+         version: 0.1.1
+   name: marc-marc4j
    prerelease: false
    type: :runtime
- - !ruby/object:Gem::Dependency
-   name: marc-marc4j
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - '>='
        - !ruby/object:Gem::Version
          version: 0.1.1
+ - !ruby/object:Gem::Dependency
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - '>='
        - !ruby/object:Gem::Version
-         version: 0.1.1
+         version: 2.0.5
+     - - <
+       - !ruby/object:Gem::Version
+         version: '2.1'
+   name: hashie
    prerelease: false
    type: :runtime
- - !ruby/object:Gem::Dependency
-   name: hashie
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - '>='
@@ -49,18 +59,18 @@ dependencies:
      - - <
        - !ruby/object:Gem::Version
          version: '2.1'
+ - !ruby/object:Gem::Dependency
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - '>='
        - !ruby/object:Gem::Version
-         version: 2.0.5
+         version: 3.4.5
      - - <
        - !ruby/object:Gem::Version
-         version: '2.1'
+         version: '4.0'
+   name: slop
    prerelease: false
    type: :runtime
- - !ruby/object:Gem::Dependency
-   name: slop
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - '>='
@@ -69,72 +79,76 @@ dependencies:
      - - <
        - !ruby/object:Gem::Version
          version: '4.0'
+ - !ruby/object:Gem::Dependency
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - '>='
        - !ruby/object:Gem::Version
-         version: 3.4.5
-     - - <
-       - !ruby/object:Gem::Version
-         version: '4.0'
+         version: '0'
+   name: yell
    prerelease: false
    type: :runtime
- - !ruby/object:Gem::Dependency
-   name: yell
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - '>='
        - !ruby/object:Gem::Version
          version: '0'
+ - !ruby/object:Gem::Dependency
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - '>='
        - !ruby/object:Gem::Version
-         version: '0'
+         version: 0.1.1
+   name: dot-properties
    prerelease: false
    type: :runtime
- - !ruby/object:Gem::Dependency
-   name: bundler
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
-     - - ~>
+     - - '>='
        - !ruby/object:Gem::Version
-         version: '1.3'
+         version: 0.1.1
+ - !ruby/object:Gem::Dependency
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - ~>
        - !ruby/object:Gem::Version
          version: '1.3'
+   name: bundler
    prerelease: false
    type: :development
- - !ruby/object:Gem::Dependency
-   name: rake
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
-     - - '>='
+     - - ~>
        - !ruby/object:Gem::Version
-         version: '0'
+         version: '1.3'
+ - !ruby/object:Gem::Dependency
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - '>='
        - !ruby/object:Gem::Version
          version: '0'
+   name: rake
    prerelease: false
    type: :development
- - !ruby/object:Gem::Dependency
-   name: minitest
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - '>='
        - !ruby/object:Gem::Version
          version: '0'
+ - !ruby/object:Gem::Dependency
    requirement: !ruby/object:Gem::Requirement
      requirements:
      - - '>='
        - !ruby/object:Gem::Version
          version: '0'
+   name: minitest
    prerelease: false
    type: :development
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: '0'
  description:
  email:
  - none@nowhere.org
@@ -288,7 +302,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
        version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.1.11
+ rubygems_version: 2.1.9
  signing_key:
  specification_version: 4
  summary: Index MARC to Solr; or generally process source records to hash-like structures