traject 1.0.0 → 1.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 31243d453f43fbc8f8634c2511340d8fb96c606f
4
- data.tar.gz: a843a583f235920931304fc6baa7ff75ce91dfab
3
+ metadata.gz: 4ae9c6a2d87868021cae1b48637592238387d8a1
4
+ data.tar.gz: 578c645162da3560ff5e01a28cf43b36e82734f8
5
5
  SHA512:
6
- metadata.gz: 5a0e6f3c695a5fe497f4cadd14fba7a2a10a3ea1ff40e42a6625949227d1eb3f995dc4d0ecb228f5e98c95f737df1c1e0e8089b59e17fcb0fedca8de5208b535
7
- data.tar.gz: 3598e91ed10b039c4382d56ddc69939713425c4f059eb27adfddfdc272e25f2c52e8f900a31b94ceda6576d716c2155188dc03d4697156314e71aadf99a92471
6
+ metadata.gz: e0bf13c4ff3cab492b6be8922ae22e33311f701756910526eb3206522774b1519db07324531ebd2c366d5907854c78bb6cde0d65eeff78258513d37fae1a3a57
7
+ data.tar.gz: dd251b15afafe2a11cbefe8493e145d9b9b883183b5f884a5b61557046c74b699427fe2766d5df587e27729f6c22052b2f672b756aba8fee33149f6abbcc4f40
data/Gemfile CHANGED
@@ -8,5 +8,5 @@ group :development do
8
8
  end
9
9
 
10
10
  group :debug do
11
- gem "ruby-debug"
11
+ gem "ruby-debug", :platform => "jruby"
12
12
  end
@@ -62,7 +62,7 @@ The third optional argument is a
62
62
  object. Most of the time you don't need it, but you can use it for
63
63
  some sophisticated functionality, for example using these Context methods:
64
64
 
65
- * `context.clipboard` A hash into which you can stuff values that you want to pass from one indexing step to another. For example, if you go through a bunch of work to query a database and get a result you'll need more than once, stick the results somewhere in the clipboard.
65
+ * `context.clipboard` A hash into which you can stuff values that you want to pass from one indexing step to another. For example, if you go through a bunch of work to query a database and get a result you'll need more than once, stick the results somewhere in the clipboard. This clipboard is record-specific, and won't persist between records.
66
66
  * `context.position` The position of the record in the input file (e.g., was it the first record, second, etc.). Useful for error reporting
67
67
  * `context.output_hash` A hash mapping the field names (generally defined in `to_field` calls) to an array of values to be sent to the writer associated with that field. This allows you to modify what goes to the writer without going through a `to_field` call -- you can just set `context.output_hash['myfield'] = ['my', 'values']` and you're set. See below for more examples
68
68
  * `context.skip!(msg)` An assertion that this record should be ignored. No more indexing steps will be called, no results will be sent to the writer, and a `debug`-level log message will be written stating that the record was skipped.
@@ -262,4 +262,4 @@ args for `each_record`.
262
262
 
263
263
  * **Once you call `context.skip!(msg)` no more index steps will be run for that record**. So if you have any cleanup code, you'll need to make sure to call it yourself.
264
264
 
265
- * **By default, `traject` indexing runs multi-threaded**. In the current implementation, the indexing steps for one record are *not* split across threads, but different records can be processed simultaneously by more than one thread. That means you need to make sure your code is thread-safe (or always set `processing_thread_pool` to 0).
265
+ * **By default, `traject` indexing runs multi-threaded**. In the current implementation, the indexing steps for one record are *not* split across threads, but different records can be processed simultaneously by more than one thread. That means you need to make sure your code is thread-safe (or always set `processing_thread_pool` to 0).
@@ -26,15 +26,23 @@ module Traject::Macros
26
26
  accumulator.concat list.uniq if list
27
27
  end
28
28
  end
29
+
29
30
  # If a num begins with a known OCLC prefix, return it without the prefix.
30
31
  # otherwise nil.
32
+ #
33
+ # Allow (OCoLC) and/or ocn/ocm/on
34
+
35
+ OCLCPAT = /
36
+ \A\s*
37
+ (?:(?:\(OCoLC\)) |
38
+ (?:\(OCoLC\))?(?:(?:ocm)|(?:ocn)|(?:on))
39
+ )(\d+)
40
+ /x
41
+
31
42
  def self.oclcnum_extract(num)
32
- stripped = num.gsub(/\A(ocm)|(ocn)|(on)|(\(OCoLC\))/, '')
33
- if num != stripped
34
- # it had the prefix, which we've now stripped
35
- return stripped
43
+ if OCLCPAT.match(num)
44
+ return $1
36
45
  else
37
- # it didn't have the prefix
38
46
  return nil
39
47
  end
40
48
  end
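The effect of the `oclcnum_extract` change in the hunk above can be checked with a small standalone sketch. `OCLCPAT` and the old `gsub` pattern are copied from the diff; the top-level methods `old_extract` and `new_extract` are hypothetical wrappers written only for this comparison and are not part of traject.

```ruby
# Pattern copied from the 1.1.0 side of the diff. With /x, whitespace and
# line breaks inside the pattern are ignored.
OCLCPAT = /
  \A\s*
  (?:(?:\(OCoLC\)) |
     (?:\(OCoLC\))?(?:(?:ocm)|(?:ocn)|(?:on))
  )(\d+)
/x

# 1.0.0 behaviour (hypothetical wrapper name): strip any known prefix with gsub.
# Note that \A binds only to the first alternative, (ocm).
def old_extract(num)
  stripped = num.gsub(/\A(ocm)|(ocn)|(on)|(\(OCoLC\))/, '')
  num != stripped ? stripped : nil
end

# 1.1.0 behaviour (hypothetical wrapper name): return only the captured digits.
def new_extract(num)
  m = OCLCPAT.match(num)
  m ? m[1] : nil
end

p old_extract("(OCoLC)ocm111111111") # => "ocm111111111" (inner prefix survives)
p new_extract("(OCoLC)ocm111111111") # => "111111111"
p new_extract("(OCoLC)222222222")    # => "222222222"
p new_extract("777777777")           # => nil (no OCLC prefix)
```

The doubled prefix `(OCoLC)ocm…` is the case where the two versions differ in this sketch: the old pattern removes `(OCoLC)` but leaves `ocm` in place because that alternative is anchored to the start of the string.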
@@ -1,6 +1,7 @@
1
1
  require 'traject'
2
2
 
3
3
  require 'yaml'
4
+ require 'dot-properties'
4
5
 
5
6
 
6
7
  module Traject
@@ -14,7 +15,8 @@ module Traject
14
15
  #
15
16
  # What makes it more useful than a stunted hash is its ability to load
16
17
  # the hash definitions from configuration files, either pure ruby,
17
- # yaml, or (limited subset of) java .properties file.
18
+ # yaml, or java .properties file (not all .properties features may
19
+ # be supported, we use dot-properties gem for reading)
18
20
  #
19
21
  # traject's `extract_marc` macro allows you to specify a :translation_map=>filename argument
20
22
  # that will automatically find and use a translation map on the resulting data:
@@ -197,7 +199,9 @@ module Traject
197
199
  # Returns a dup of internal hash, dup so you can modify it
198
200
  # if you like.
199
201
  def to_hash
200
- @hash.dup
202
+ dup = @hash.dup
203
+ dup.delete("__default__")
204
+ dup
201
205
  end
202
206
 
203
207
  # Run every element of an array through this translation map,
@@ -224,6 +228,24 @@ module Traject
224
228
  array.replace( self.translate_array(array))
225
229
  end
226
230
 
231
+ # Return a new TranslationMap that results from merging argument on top of self.
232
+ # Can be useful for taking an existing translation map, but merging a few
233
+ # overrides on top.
234
+ #
235
+ # merged_map = TranslationMap.new(something).merge TranslationMap.new(else)
236
+ # #...
237
+ # merged_map.translate_array(something) # etc
238
+ #
239
+ # If a default is set in the second map, it will merge over the first too.
240
+ #
241
+ # You can also pass in a plain hash as an arg, instead of an existing TranslationMap:
242
+ #
243
+ # TranslationMap.new(something).merge("overridden_key" => "value", "a" => "")
244
+ def merge(other_map)
245
+ default = other_map.default || self.default
246
+ TranslationMap.new(self.to_hash.merge(other_map.to_hash), :default => default)
247
+ end
248
+
227
249
  class NotFound < Exception
228
250
  def initialize(path)
229
251
  super("No translation map definition file found at 'translation_maps/#{path}.[rb|yaml|properties]' in load path: #{$LOAD_PATH}")
@@ -232,36 +254,10 @@ module Traject
232
254
 
233
255
  protected
234
256
 
235
- # No built-in way to read java-style .properties, we hack it.
236
- # inspired by various hacky things found google ruby java properties parse
237
- # .properties spec seems to be:
238
- # http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load%28java.io.Reader%29
239
- #
240
- # We do NOT handle split lines, don't do that!
241
- def self.read_properties(file_name)
242
- hash = {}
243
- i = 0
244
- f = File.open(file_name)
245
- f.each_line do |line|
246
- i += 1
247
-
248
- line.strip!
249
-
250
- # skip blank lines
251
- next if line.empty?
252
-
253
- # skip comment lines
254
- next if line =~ /^\s*[!\#].*$/
255
-
256
- if line =~ /\A([^:=]+)[\:\=]\s*(.*)\s*\Z/
257
- hash[$1.strip] = $2
258
- else
259
- raise IOError.new("Can't parse from #{file_name} line #{i}: #{line}")
260
- end
261
- end
262
- f.close
263
-
264
- return hash
257
+ # We use dot-properties gem for reading .properties files,
258
+ # return a hash.
259
+ def self.read_properties(file_name)
260
+ return DotProperties.load(file_name).to_h
265
261
  end
266
262
 
267
263
  end
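The `merge` and `to_hash` changes in the hunks above can be tried without installing traject, using a stand-in class. `MiniMap` below is hypothetical: it copies only the `to_hash` and `merge` bodies from the diff and assumes a simplified constructor and lookup (no file loading, no `__passthrough__` handling), so it is a sketch of the merge semantics rather than the real `Traject::TranslationMap`.

```ruby
# Hypothetical stand-in for Traject::TranslationMap, for illustration only.
class MiniMap
  attr_reader :default

  # Assumed simplified constructor: a plain hash plus an optional :default.
  def initialize(hash, options = {})
    @hash = hash.dup
    @default = options[:default]
  end

  # Assumed simplified lookup: fall back to the default for unknown keys.
  def [](key)
    @hash.fetch(key) { @default }
  end

  # As in the diff: return a copy without the "__default__" key.
  def to_hash
    copy = @hash.dup
    copy.delete("__default__")
    copy
  end

  # As in the diff: the argument's entries and default win over self's.
  def merge(other_map)
    default = other_map.default || self.default
    MiniMap.new(to_hash.merge(other_map.to_hash), :default => default)
  end
end

base = MiniMap.new({ "key1" => "value1", "other" => "value2" }, :default => "DEFAULT_VALUE")

# A plain Hash works as the argument because Hash responds to both
# #to_hash and #default (which is nil unless one was set).
merged = base.merge("other" => "OVERRIDE", "new" => "NEW")

p merged["key1"]      # => "value1"
p merged["other"]     # => "OVERRIDE"
p merged["new"]       # => "NEW"
p merged["not there"] # => "DEFAULT_VALUE"
```

Because `merge` builds a new object from copies, `base` is left unchanged in this sketch, which matches the "Return a new TranslationMap" wording in the added comment.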
@@ -1,3 +1,3 @@
1
1
  module Traject
2
- VERSION = "1.0.0"
2
+ VERSION = "1.1.0"
3
3
  end
@@ -31,6 +31,25 @@ describe "Traject::Macros::Marc21Semantics" do
31
31
 
32
32
  assert_equal({}, @indexer.map_record(empty_record))
33
33
  end
34
+
35
+ it "deals with all prefixed OCLC nunbers" do
36
+ @record.append(MARC::DataField.new('035', ' ', ' ', ['a', '(OCoLC)ocm111111111']))
37
+ @record.append(MARC::DataField.new('035', ' ', ' ', ['a', '(OCoLC)222222222']))
38
+ @record.append(MARC::DataField.new('035', ' ', ' ', ['a', 'ocm333333333']))
39
+ @record.append(MARC::DataField.new('035', ' ', ' ', ['a', 'ocn444444444']))
40
+ @record.append(MARC::DataField.new('035', ' ', ' ', ['a', '(OCoLC)ocn555555555']))
41
+ @record.append(MARC::DataField.new('035', ' ', ' ', ['a', '(OCoLC)on666666666']))
42
+ @record.append(MARC::DataField.new('035', ' ', ' ', ['a', '777777777'])) # not OCLC number
43
+
44
+ @indexer.instance_eval do
45
+ to_field "oclcnum", oclcnum
46
+ end
47
+ output = @indexer.map_record(@record)
48
+
49
+ assert_equal %w{47971712 111111111 222222222 333333333 444444444 555555555 666666666}, output["oclcnum"]
50
+ end
51
+
52
+
34
53
 
35
54
  it "#marc_series_facet" do
36
55
  @record = MARC::Reader.new(support_file_path "louis_armstrong.marc").to_a.first
@@ -109,7 +109,7 @@ describe "TranslationMap" do
109
109
 
110
110
  assert_equal "DEFAULT LITERAL", map["not in the map"]
111
111
  end
112
-
112
+
113
113
  it "respects __default__ __passthrough__" do
114
114
  map = Traject::TranslationMap.new("default_passthrough")
115
115
 
@@ -135,16 +135,82 @@ describe "TranslationMap" do
135
135
  assert_equal ["one"], values
136
136
  end
137
137
 
138
- it "#to_hash" do
139
- map = Traject::TranslationMap.new("yaml_map")
138
+ describe "#to_hash" do
139
+ it "produces a hash" do
140
+ map = Traject::TranslationMap.new("yaml_map")
141
+
142
+ hash = map.to_hash
143
+
144
+ assert_kind_of Hash, hash
145
+
146
+ assert ! hash.frozen?, "#to_hash result is not frozen"
147
+
148
+ refute_same hash, map.to_hash, "each #to_hash result is a copy"
149
+ end
150
+
151
+ it "does not include __default__ key" do
152
+ map = Traject::TranslationMap.new("default_passthrough")
153
+
154
+ refute map.to_hash.has_key?("__default__")
155
+ assert_nil map.to_hash["__default__"]
156
+ end
157
+
158
+ end
159
+
160
+ describe "#merge" do
161
+ it "merges" do
162
+ original = Traject::TranslationMap.new("yaml_map")
163
+ override = Traject::TranslationMap.new("other" => "OVERRIDE", "new" => "NEW")
164
+
165
+ merged = original.merge(override)
140
166
 
141
- hash = map.to_hash
167
+ assert_equal "value1", merged["key1"]
168
+ assert_equal "OVERRIDE", merged["other"]
169
+ assert_equal "NEW", merged["new"]
170
+ end
171
+
172
+ it "passes through default from first map when no default in second" do
173
+ original = Traject::TranslationMap.new("yaml_map", :default => "DEFAULT_VALUE")
174
+ override = Traject::TranslationMap.new("other" => "OVERRIDE")
175
+
176
+ merged = original.merge(override)
177
+
178
+ assert_equal "DEFAULT_VALUE", merged.default
179
+ assert_equal "DEFAULT_VALUE", merged["SOME_KEY_NOT_MATCHED"]
180
+ end
181
+
182
+ it "passes through default from second map when no default in first" do
183
+ original = Traject::TranslationMap.new("yaml_map")
184
+ override = Traject::TranslationMap.new({"other" => "OVERRIDE"}, :default => "DEFAULT_VALUE")
185
+
186
+ merged = original.merge(override)
142
187
 
143
- assert_kind_of Hash, hash
188
+ assert_equal "DEFAULT_VALUE", merged.default
189
+ assert_equal "DEFAULT_VALUE", merged["SOME_KEY_NOT_MATCHED"]
190
+ end
191
+
192
+ it "merges second default on top of first" do
193
+ original = Traject::TranslationMap.new("yaml_map", :default => "DEFAULT_VALUE")
194
+ override = Traject::TranslationMap.new({"other" => "OVERRIDE"}, :default => "NEW_DEFAULT_VALUE")
144
195
 
145
- assert ! hash.frozen?, "#to_hash result is not frozen"
196
+ merged = original.merge(override)
197
+
198
+ assert_equal "NEW_DEFAULT_VALUE", merged.default
199
+ assert_equal "NEW_DEFAULT_VALUE", merged["SOME_KEY_NOT_MATCHED"]
200
+ end
201
+
202
+ it "merges in a plain hash too" do
203
+ original = Traject::TranslationMap.new("yaml_map")
204
+ merged = original.merge(
205
+ "other" => "OVERRIDE",
206
+ "new" => "NEW"
207
+ )
208
+
209
+ assert_equal "value1", merged["key1"]
210
+ assert_equal "OVERRIDE", merged["other"]
211
+ assert_equal "NEW", merged["new"]
212
+ end
146
213
 
147
- refute_same hash, map.to_hash, "each #to_hash result is a copy"
148
214
  end
149
215
 
150
216
  end
@@ -25,6 +25,7 @@ Gem::Specification.new do |spec|
25
25
  spec.add_dependency "hashie", ">= 2.0.5", "< 2.1" # used for Indexer#settings
26
26
  spec.add_dependency "slop", ">= 3.4.5", "< 4.0" # command line parsing
27
27
  spec.add_dependency "yell" # logging
28
+ spec.add_dependency "dot-properties", ">= 0.1.1" # reading java style .properties
28
29
 
29
30
  spec.add_development_dependency "bundler", "~> 1.3"
30
31
  spec.add_development_dependency "rake"
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: traject
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.0.0
4
+ version: 1.1.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jonathan Rochkind
@@ -9,38 +9,48 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2013-11-25 00:00:00.000000000 Z
12
+ date: 2014-04-07 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - '>='
18
+ - !ruby/object:Gem::Version
19
+ version: 0.8.0
15
20
  name: marc
21
+ prerelease: false
22
+ type: :runtime
16
23
  version_requirements: !ruby/object:Gem::Requirement
17
24
  requirements:
18
25
  - - '>='
19
26
  - !ruby/object:Gem::Version
20
27
  version: 0.8.0
28
+ - !ruby/object:Gem::Dependency
21
29
  requirement: !ruby/object:Gem::Requirement
22
30
  requirements:
23
31
  - - '>='
24
32
  - !ruby/object:Gem::Version
25
- version: 0.8.0
33
+ version: 0.1.1
34
+ name: marc-marc4j
26
35
  prerelease: false
27
36
  type: :runtime
28
- - !ruby/object:Gem::Dependency
29
- name: marc-marc4j
30
37
  version_requirements: !ruby/object:Gem::Requirement
31
38
  requirements:
32
39
  - - '>='
33
40
  - !ruby/object:Gem::Version
34
41
  version: 0.1.1
42
+ - !ruby/object:Gem::Dependency
35
43
  requirement: !ruby/object:Gem::Requirement
36
44
  requirements:
37
45
  - - '>='
38
46
  - !ruby/object:Gem::Version
39
- version: 0.1.1
47
+ version: 2.0.5
48
+ - - <
49
+ - !ruby/object:Gem::Version
50
+ version: '2.1'
51
+ name: hashie
40
52
  prerelease: false
41
53
  type: :runtime
42
- - !ruby/object:Gem::Dependency
43
- name: hashie
44
54
  version_requirements: !ruby/object:Gem::Requirement
45
55
  requirements:
46
56
  - - '>='
@@ -49,18 +59,18 @@ dependencies:
49
59
  - - <
50
60
  - !ruby/object:Gem::Version
51
61
  version: '2.1'
62
+ - !ruby/object:Gem::Dependency
52
63
  requirement: !ruby/object:Gem::Requirement
53
64
  requirements:
54
65
  - - '>='
55
66
  - !ruby/object:Gem::Version
56
- version: 2.0.5
67
+ version: 3.4.5
57
68
  - - <
58
69
  - !ruby/object:Gem::Version
59
- version: '2.1'
70
+ version: '4.0'
71
+ name: slop
60
72
  prerelease: false
61
73
  type: :runtime
62
- - !ruby/object:Gem::Dependency
63
- name: slop
64
74
  version_requirements: !ruby/object:Gem::Requirement
65
75
  requirements:
66
76
  - - '>='
@@ -69,72 +79,76 @@ dependencies:
69
79
  - - <
70
80
  - !ruby/object:Gem::Version
71
81
  version: '4.0'
82
+ - !ruby/object:Gem::Dependency
72
83
  requirement: !ruby/object:Gem::Requirement
73
84
  requirements:
74
85
  - - '>='
75
86
  - !ruby/object:Gem::Version
76
- version: 3.4.5
77
- - - <
78
- - !ruby/object:Gem::Version
79
- version: '4.0'
87
+ version: '0'
88
+ name: yell
80
89
  prerelease: false
81
90
  type: :runtime
82
- - !ruby/object:Gem::Dependency
83
- name: yell
84
91
  version_requirements: !ruby/object:Gem::Requirement
85
92
  requirements:
86
93
  - - '>='
87
94
  - !ruby/object:Gem::Version
88
95
  version: '0'
96
+ - !ruby/object:Gem::Dependency
89
97
  requirement: !ruby/object:Gem::Requirement
90
98
  requirements:
91
99
  - - '>='
92
100
  - !ruby/object:Gem::Version
93
- version: '0'
101
+ version: 0.1.1
102
+ name: dot-properties
94
103
  prerelease: false
95
104
  type: :runtime
96
- - !ruby/object:Gem::Dependency
97
- name: bundler
98
105
  version_requirements: !ruby/object:Gem::Requirement
99
106
  requirements:
100
- - - ~>
107
+ - - '>='
101
108
  - !ruby/object:Gem::Version
102
- version: '1.3'
109
+ version: 0.1.1
110
+ - !ruby/object:Gem::Dependency
103
111
  requirement: !ruby/object:Gem::Requirement
104
112
  requirements:
105
113
  - - ~>
106
114
  - !ruby/object:Gem::Version
107
115
  version: '1.3'
116
+ name: bundler
108
117
  prerelease: false
109
118
  type: :development
110
- - !ruby/object:Gem::Dependency
111
- name: rake
112
119
  version_requirements: !ruby/object:Gem::Requirement
113
120
  requirements:
114
- - - '>='
121
+ - - ~>
115
122
  - !ruby/object:Gem::Version
116
- version: '0'
123
+ version: '1.3'
124
+ - !ruby/object:Gem::Dependency
117
125
  requirement: !ruby/object:Gem::Requirement
118
126
  requirements:
119
127
  - - '>='
120
128
  - !ruby/object:Gem::Version
121
129
  version: '0'
130
+ name: rake
122
131
  prerelease: false
123
132
  type: :development
124
- - !ruby/object:Gem::Dependency
125
- name: minitest
126
133
  version_requirements: !ruby/object:Gem::Requirement
127
134
  requirements:
128
135
  - - '>='
129
136
  - !ruby/object:Gem::Version
130
137
  version: '0'
138
+ - !ruby/object:Gem::Dependency
131
139
  requirement: !ruby/object:Gem::Requirement
132
140
  requirements:
133
141
  - - '>='
134
142
  - !ruby/object:Gem::Version
135
143
  version: '0'
144
+ name: minitest
136
145
  prerelease: false
137
146
  type: :development
147
+ version_requirements: !ruby/object:Gem::Requirement
148
+ requirements:
149
+ - - '>='
150
+ - !ruby/object:Gem::Version
151
+ version: '0'
138
152
  description:
139
153
  email:
140
154
  - none@nowhere.org
@@ -288,7 +302,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
288
302
  version: '0'
289
303
  requirements: []
290
304
  rubyforge_project:
291
- rubygems_version: 2.1.11
305
+ rubygems_version: 2.1.9
292
306
  signing_key:
293
307
  specification_version: 4
294
308
  summary: Index MARC to Solr; or generally process source records to hash-like structures