openurl 0.3.1 → 0.4.0

data/README.md CHANGED
@@ -38,22 +38,32 @@ response format, so parsing the returned value is up to you.
38
38
 
39
39
  ## Ruby 1.9 and encodings
40
40
 
41
- Gem does run and all tests pass under ruby 1.9. There is very limited
42
- support for handling character encodings in the proper 1.9 way.
41
+ Gem is currently developed under 1.9.3, although it should work under 1.8.7.
42
+ There is some limited support for handling character encodings in the
43
+ proper ruby 1.9 way.
43
44
 
44
- CTX or XML context objects will be assumed utf-8 even if the ruby string
45
- they are held in has an ascii-8bit encoding. They will forced into a utf-8 encoding.
46
- This seems to be a side effect of the REXML and CGI libraries we use to parse,
47
- but there are runnable tests that assert it is true. (see test/encoding_test.rb)
45
+ ### new_from_kev, new_from_form_vars
48
46
 
49
- Incoming context objects with a non-utf8 ctx_enc value will *not* be handled
50
- properly, they'll still be forced to utf8.
47
+ When using ContextObject.new_from_kev or .new_from_form_vars, input
48
+ will be assumed to be UTF8, unless a ctx_enc value is present specifying
49
+ ISO-8859-1. The actual ruby #encoding of the input string/stream is ignored;
50
+ data will be force_encoded on read. If input is declared ISO-8859-1 via
51
+ ctx_enc, data _will_ be _transcoded_ to UTF8 on read.
52
+
53
+ Any illegal bytes for the input character encoding _will_ be replaced by
54
+ the unicode replacement symbol ("\uFFFD") on read.
55
+
56
+ ### new_from_xml
57
+
58
+ Input will be assumed UTF8, and force_encoded to UTF8. Illegal bytes in input
59
+ for UTF8 will be replaced by unicode replacement char ("\uFFFD").
60
+
61
+ ### Programmatic creation of context objects
51
62
 
52
63
  Programmatically created context objects, you must ensure all strings are
53
- represented as utf8 encoded yourself.
64
+ represented and tagged as utf8 encoded yourself; no detection, transcoding,
65
+ or correction will be done for you.
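
Because nothing is detected or corrected for programmatically built context objects, a caller may want to check strings before handing them over. The following is a minimal sketch using core String methods only. The helper name `utf8_ready?` and the sample values are hypothetical, and the commented-out call assumes the `set_metadata` API shown in the class documentation.

```ruby
# Hypothetical caller-side check, not part of the openurl gem.
# A string is safe to hand over only if it is tagged UTF-8 AND its bytes
# are actually valid UTF-8.
def utf8_ready?(str)
  str.encoding == Encoding::UTF_8 && str.valid_encoding?
end

good   = "M\u00E9xico"                                # tagged and valid UTF-8
latin1 = "M\xE9xico".b.force_encoding("ISO-8859-1")   # right bytes, wrong tag
fixed  = latin1.encode("UTF-8")                       # transcode before use

utf8_ready?(good)    # => true
utf8_ready?(latin1)  # => false
utf8_ready?(fixed)   # => true

# ctx.referent.set_metadata('pub', fixed) if utf8_ready?(fixed)
```

Checking both the tag and the bytes matters: a string can be tagged UTF-8 while still holding illegal byte sequences.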
54
66
 
55
- More sophisticated encoding handling can theoretically be added, but it's
56
- somewhat non-trivial, and it's not clear anyone needs it.
57
67
 
58
68
  ## INSTALLATION
59
69
 
@@ -0,0 +1,663 @@
1
+ # encoding: UTF-8
2
+
3
+ require 'ensure_valid_encoding'
4
+
5
+ module OpenURL
6
+
7
+ if RUBY_VERSION < '1.9'
8
+ require 'jcode'
9
+ $KCODE='UTF-8'
10
+ end
11
+
12
+ ##
13
+ # The ContextObject class is intended to both create new OpenURL 1.0 context
14
+ # objects or parse existing ones, either from Key-Encoded Values (KEVs) or
15
+ # XML.
16
+ # == Create a new ContextObject programmatically
17
+ # require 'openurl/context_object'
18
+ # include OpenURL
19
+ #
20
+ # ctx = ContextObject.new
21
+ # ctx.referent.set_format('journal') # important to do this FIRST.
22
+ #
23
+ # ctx.referent.add_identifier('info:doi/10.1016/j.ipm.2005.03.024')
24
+ # ctx.referent.set_metadata('issn', '0306-4573')
25
+ # ctx.referent.set_metadata('aulast', 'Bollen')
26
+ # ctx.referrer.add_identifier('info:sid/google')
27
+ # puts ctx.kev
28
+ # # url_ver=Z39.88-2004&ctx_tim=2007-10-29T12%3A18%3A53-0400&ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&ctx_id=&rft.issn=0306-4573&rft.aulast=Bollen&rft_val_fmt=info%3Aofi%2Ffmt%3Axml%3Axsd%3Ajournal&rft_id=info%3Adoi%2F10.1016%2Fj.ipm.2005.03.024&rfr_id=info%3Asid%2Fgoogle
29
+ #
30
+ # == Create a new ContextObject from an existing kev or XML serialization:
31
+ #
32
+ # ContextObject.new_from_kev( kev_context_object )
33
+ # ContextObject.new_from_xml( xml_context_object ) # Can be String or REXML::Document
34
+ #
35
+ # == Serialize a ContextObject to kev or XML :
36
+ # ctx.kev
37
+ # ctx.xml
38
+ class ContextObject
39
+ include EnsureValidEncoding
40
+
41
+ attr_reader :admin, :referent, :referringEntity, :requestor, :referrer,
42
+ :serviceType, :resolver
43
+ attr_accessor :foreign_keys, :openurl_ver
44
+
45
+ @@defined_entities = {"rft"=>"referent", "rfr"=>"referrer", "rfe"=>"referring-entity", "req"=>"requestor", "svc"=>"service-type", "res"=>"resolver"}
46
+
47
+ # Creates a new ContextObject object and initializes the ContextObjectEntities.
48
+
49
+ def initialize()
50
+ @referent = ContextObjectEntity.new
51
+ @referrer = ContextObjectEntity.new
52
+ @referringEntity = ContextObjectEntity.new
53
+ @requestor = ContextObjectEntity.new
54
+ @serviceType = []
55
+ @resolver = []
56
+ @foreign_keys = {}
57
+ @openurl_ver = "Z39.88-2004"
58
+ @admin = {"ctx_ver"=>{"label"=>"version", "value"=>@openurl_ver}, "ctx_tim"=>{"label"=>"timestamp", "value"=>DateTime.now().to_s}, "ctx_id"=>{"label"=>"identifier", "value"=>""}, "ctx_enc"=>{"label"=>"encoding", "value"=>"info:ofi/enc:UTF-8"}}
59
+ end
60
+
61
+ # Any legal OpenURL 1.0 sends url_ver=Z39.88-2004, and usually
62
+ # ctx_ver=Z39.88-2004 too. However, sometimes we need to send
63
+ # an illegal OpenURL with a different openurl ver string, to deal
64
+ # with weird agents, for instance to trick SFX into doing the right thing.
65
+ def openurl_ver=(ver)
66
+ @openurl_ver = ver
67
+ @admin["ctx_ver"]["value"] = ver
68
+ end
69
+
70
+ def deep_copy
71
+ cloned = ContextObject.new
72
+ cloned.import_context_object( self )
73
+ return cloned
74
+ end
75
+
76
+ # Serialize the ContextObject to XML.
77
+
78
+ def xml
79
+ doc = REXML::Document.new()
80
+ coContainer = doc.add_element "ctx:context-objects"
81
+ coContainer.add_namespace("ctx","info:ofi/fmt:xml:xsd:ctx")
82
+ coContainer.add_namespace("xsi", "http://www.w3.org/2001/XMLSchema-instance")
83
+ coContainer.add_attribute("xsi:schemaLocation", "info:ofi/fmt:xml:xsd:ctx http://www.openurl.info/registry/docs/info:ofi/fmt:xml:xsd:ctx")
84
+ co = coContainer.add_element "ctx:context-object"
85
+ @admin.each_key do |k|
86
+ next if k == "ctx_enc"
87
+ co.add_attribute(@admin[k]["label"], @admin[k]["value"])
88
+ end
89
+
90
+ [{@referent=>"rft"},
91
+ {@referringEntity=>"rfe"}, {@requestor=>"req"},
92
+ {@referrer=>"rfr"}].each do | entity |
93
+
94
+ entity.each do | ent, label |
95
+ ent.xml(co, label) unless ent.empty?
96
+ end
97
+ end
98
+
99
+ [{@serviceType=>"svc"}, {@resolver=>"res"}].each do |entity|
100
+ entity.each do | entCont, label |
101
+ entCont.each do |ent|
102
+ ent.xml(co, label) unless ent.empty?
103
+ end
104
+ end
105
+ end
106
+
107
+ return doc.to_s
108
+ end
109
+
110
+
111
+ # Output the ContextObject as a Key-encoded value string. Pass a boolean
112
+ # true argument if you do not want the ctx_tim key included.
113
+
114
+ def kev(no_date=false)
115
+ kevs = ["url_ver=#{self.openurl_ver}", "url_ctx_fmt=#{CGI.escape("info:ofi/fmt:kev:mtx:ctx")}"]
116
+
117
+ # Loop through the administrative metadata
118
+ @admin.each_key do |k|
119
+ next if k == "ctx_tim" && no_date
120
+ kevs.push(k+"="+CGI.escape(@admin[k]["value"].to_s)) if @admin[k]["value"]
121
+ end
122
+
123
+ {@referent=>"rft", @referringEntity=>"rfe", @requestor=>"req", @referrer=>"rfr"}.each do | ent, abbr |
124
+ kevs.push(ent.kev(abbr)) unless ent.empty?
125
+ end
126
+
127
+ {@serviceType=>"svc", @resolver=>"res"}.each do |entCont, abbr|
128
+ entCont.each do |ent|
129
+ next if ent.empty?
130
+ kevs.push(ent.kev(abbr))
131
+ end
132
+ end
133
+ return kevs.join("&")
134
+ end
135
+
136
+ # Outputs the ContextObject as a ruby hash---hash version of the kev format.
137
+ # Outputting a context object as a hash
138
+ # is imperfect, because context objects can have multiple elements
139
+ # with the same key--and because some keys depend on SAP1 vs SAP2.
140
+ # So this function is really deprecated, but here because we have so much
141
+ # code dependent on it.
142
+ def to_hash
143
+ co_hash = {"url_ver"=>self.openurl_ver, "url_ctx_fmt"=>"info:ofi/fmt:kev:mtx:ctx"}
144
+
145
+ @admin.each_key do |k|
146
+ co_hash[k]=@admin[k]["value"] if @admin[k]["value"]
147
+ end
148
+
149
+ {@referent=>"rft", @referringEntity=>"rfe", @requestor=>"req", @referrer=>"rfr"}.each do | ent, abbr |
150
+ co_hash.merge!(ent.to_hash(abbr)) unless ent.empty?
151
+ end
152
+
153
+ # svc and res are arrays of ContextObjectEntity
154
+ {@serviceType=>"svc", @resolver=>"res"}.each do |ent_list, abbr|
155
+ ent_list.each do |ent|
156
+ co_hash.merge!(ent.to_hash(abbr)) unless ent.empty?
157
+ end
158
+ end
159
+ return co_hash
160
+ end
161
+
162
+
163
+ # Outputs a COinS (ContextObject in SPANS) span tag for the ContextObject.
164
+ # Arguments are any other CSS classes you want included and the innerHTML
165
+ # content.
166
+
167
+ def coins (classnames=nil, innerHTML=nil)
168
+ return "<span class='Z3988 #{classnames}' title='"+CGI.escapeHTML(self.kev(true))+"'>#{innerHTML}</span>"
169
+ end
170
+
171
+
172
+ # Sets a ContextObject administration field.
173
+
174
+ def set_administration_key(key, val)
175
+ raise ArgumentError, "#{key} is not a valid admin key!" unless @admin.keys.index(key)
176
+ @admin[key]["value"] = val
177
+ end
178
+
179
+ # Imports an existing Key-encoded value string and sets the appropriate
180
+ # entities.
181
+
182
+ def import_kev(kev)
183
+ co = CGI::parse(kev)
184
+ co2 = {}
185
+ co.each do |key, val|
186
+ if val.is_a?(Array)
187
+ if val.length == 1
188
+ co2[key] = val[0]
189
+ else
190
+ co2[key] = val
191
+ end
192
+ end
193
+ end
194
+ self.import_hash(co2)
195
+ end
196
+
197
+ # Initialize a new ContextObject object from an existing KEV
198
+
199
+ def self.new_from_kev(kev)
200
+ co = self.new
201
+ co.import_kev(kev)
202
+ return co
203
+ end
204
+
205
+ # Initialize a new ContextObject object from a CGI.params style hash
206
+ # Expects a hash with default value being nil though, not [] as CGI.params
207
+ # actually returns, beware. Can also accept a Rails-style params hash
208
+ # (single string values, not array values), although this may lose
209
+ # some context object information.
210
+ def self.new_from_form_vars(params)
211
+ co = self.new
212
+ if ctx_val = (params[:url_ctx_val]||params["url_ctx_val"]) and not ctx_val.empty? # this is where the context object stuff will be
213
+ co.admin.keys.each do | adm |
214
+ if params[adm.to_s]
215
+ if params[adm.to_s].is_a?(Array)
216
+ co.set_administration_key(adm, params[adm.to_s].first)
217
+ else
218
+ co.set_administration_key(adm, params[adm.to_s])
219
+ end
220
+ end
221
+ end
222
+
223
+ if ctx_format = (params["url_ctx_fmt"]||params[:url_ctx_fmt])
224
+ ctx_format = ctx_format.first if ctx_format.is_a?(Array)
225
+ ctx_val = ctx_val.first if ctx_val.is_a?(Array)
226
+ if ctx_format == "info:ofi/fmt:xml:xsd:ctx"
227
+ co.import_xml(ctx_val)
228
+ elsif ctx_format == "info:ofi/fmt:kev:mtx:ctx"
229
+ co.import_kev(ctx_val)
230
+ end
231
+ end
232
+ else # we'll assume this is standard inline kev
233
+ co.import_hash(params)
234
+ end
235
+ return co
236
+ end
237
+
238
+ # Imports an existing XML encoded context object and sets the appropriate
239
+ # entities
240
+
241
+ def import_xml(xml)
242
+ if xml.is_a?(String)
243
+ xml.force_encoding("UTF-8")
244
+ ensure_valid_encoding!(xml, :invalid => :replace)
245
+ doc = REXML::Document.new xml.gsub(/>[\s\t]*\n*[\s\t]*</, '><').strip
246
+ elsif xml.is_a?(REXML::Document)
247
+ doc = xml
248
+ else
249
+ raise ArgumentError, "Argument must be an REXML::Document or well-formed XML string"
250
+ end
251
+
252
+ # Cut to the context object
253
+ ctx = REXML::XPath.first(doc, ".//ctx:context-object", {"ctx"=>"info:ofi/fmt:xml:xsd:ctx"})
254
+
255
+
256
+
257
+
258
+
259
+
260
+ ctx.attributes.each do |attr, val|
261
+ @admin.each do |adm, vals|
262
+ self.set_administration_key(adm, val) if vals["label"] == attr
263
+ end
264
+ end
265
+ ctx.to_a.each do | ent |
266
+ if @@defined_entities.value?(ent.name())
267
+ self.import_entity(ent)
268
+ else
269
+ self.import_custom_node(ent)
270
+ end
271
+ end
272
+ end
273
+
274
+ # Initialize a new ContextObject object from an existing XML ContextObject
275
+
276
+ def self.new_from_xml(xml)
277
+ co = self.new
278
+ co.import_xml(xml)
279
+ return co
280
+ end
281
+
282
+
283
+ # Takes a hash of openurl key/values, as output by CGI.parse
284
+ # from a query string for example (values can be strings or arrays
285
+ # of string). Mutates hash in place.
286
+ #
287
+ # Force encodes to UTF8 or 8859-1, depending on ctx_enc
288
+ # presence and value.
289
+ #
290
+ # Replaces any illegal bytes with replacement chars,
291
+ # transcodes to UTF-8 if needed to ensure UTF8 on way out.
292
+ def clean_char_encoding!(hash)
293
+ source_encoding = if hash["ctx_enc"] == "info:ofi/enc:ISO-8859-1"
294
+ "ISO-8859-1"
295
+ else
296
+ "UTF-8"
297
+ end
298
+
299
+ hash.each_pair do |key, values|
300
+ # get a list of all terminal values, whether wrapped
301
+ # in arrays or not. We're going to mutate them.
302
+ [values].flatten.each do | v |
303
+ v.force_encoding(source_encoding)
304
+ if source_encoding == "UTF-8"
305
+ ensure_valid_encoding!(v, :invalid => :replace)
306
+ else
307
+ # transcode, replacing any bad chars.
308
+ v.encode!("UTF-8", :invalid => :replace)
309
+ end
310
+ end
311
+ end
312
+
313
+ end
314
+
315
+ # Imports an existing hash of ContextObject values and sets the appropriate
316
+ # entities.
317
+ def import_hash(hash)
318
+ clean_char_encoding!(hash)
319
+
320
+ ref = {}
321
+ {"@referent"=>"rft", "@referrer"=>"rfr", "@referringEntity"=>"rfe",
322
+ "@requestor"=>"req"}.each do | ent, abbr |
323
+ next unless hash["#{abbr}_val_fmt"]
324
+ val = hash["#{abbr}_val_fmt"]
325
+ val = val[0] if val.is_a?(Array)
326
+ self.instance_variable_set(ent.to_sym, ContextObjectEntityFactory.format(val))
327
+ end
328
+
329
+ {"@serviceType"=>"svc","@resolver"=>"res"}.each do | ent, abbr |
330
+ next unless hash["#{abbr}_val_fmt"]
331
+ val = hash["#{abbr}_val_fmt"]
332
+ val = val[0] if val.is_a?(Array)
333
+ self.instance_variable_set(ent.to_sym, [ContextObjectEntityFactory.format(val)])
334
+ end
335
+
336
+ openurl_keys = ["url_ver", "url_tim", "url_ctx_fmt"]
337
+ hash.each do |key, value|
338
+ val = value
339
+ val = value[0] if value.is_a?(Array)
340
+
341
+ next if value.nil? || value.empty?
342
+
343
+ if openurl_keys.include?(key)
344
+ next # None of these matter much for our purposes
345
+ elsif @admin.has_key?(key)
346
+ self.set_administration_key(key, val)
347
+ elsif key.match(/^[a-z]{3}_val_fmt/)
348
+ next
349
+ elsif key.match(/^[a-z]{3}_ref/)
350
+ # determines if we have a by-reference context object
351
+ (entity, v, fmt) = key.split("_")
352
+ ent = self.translate_abbr(entity)
353
+ unless ent
354
+ self.foreign_keys[key] = val
355
+ next
356
+ end
357
+ # by-reference requires two fields, format and location, if this is
358
+ # the first field we've run across, set a place holder until we get
359
+ # the other value
360
+ unless ref[entity]
361
+ if fmt
362
+ ref_key = "format"
363
+ else
364
+ ref_key = "location"
365
+ end
366
+ ref[entity] = [ref_key, val]
367
+ else
368
+ if ref[entity][0] == "format"
369
+ eval("@"+ent).set_reference(val, ref[entity][1])
370
+ else
371
+ eval("@"+ent).set_reference(ref[entity][1], val)
372
+ end
373
+ end
374
+ elsif key.match(/^[a-z]{3}_id$/)
375
+ # Get the entity identifier
376
+ (entity, v) = key.split("_")
377
+ ent = self.translate_abbr(entity)
378
+ unless ent
379
+ self.foreign_keys[key] = val
380
+ next
381
+ end
382
+ # May or may not be an array, turn it into one.
383
+ [value].flatten.each do | id |
384
+ eval("@"+ent).add_identifier(id)
385
+ end
386
+
387
+ elsif key.match(/^[a-z]{3}_dat$/)
388
+ # Get any private data
389
+ (entity, v) = key.split("_")
390
+ ent = self.translate_abbr(entity)
391
+ unless ent
392
+ self.foreign_keys[key] = val
393
+ next
394
+ end
395
+ eval("@"+ent).set_private_data(val)
396
+ else
397
+ # collect the entity metadata
398
+ keyparts = key.split(".")
399
+ if keyparts.length > 1
400
+ # This is 1.0 OpenURL
401
+ ent = self.translate_abbr(keyparts[0])
402
+ unless ent
403
+ self.foreign_keys[key] = val
404
+ next
405
+ end
406
+ eval("@"+ent).set_metadata(keyparts[1], val)
407
+ else
408
+ # This is a 0.1 OpenURL. Your mileage may vary on how accurately
409
+ # this maps.
410
+ if key == 'id'
411
+ if value.is_a?(Array)
412
+ value.each do | id |
413
+ @referent.add_identifier(id)
414
+ end
415
+ else
416
+ @referent.add_identifier(val)
417
+ end
418
+ elsif key == 'sid'
419
+ @referrer.set_identifier("info:sid/"+val.to_s)
420
+ elsif key == 'pid'
421
+ @referent.set_private_data(val.to_s)
422
+ else
423
+ @referent.set_metadata(key, val)
424
+ end
425
+ end
426
+ end
427
+ end
428
+
429
+
430
+
431
+ # Initialize a new ContextObject object from an existing key/value hash
+ def self.new_from_hash(hash)
432
+ co = self.new
433
+ co.import_hash(hash)
434
+ return co
435
+ end
436
+
437
+ # if we don't have a referent format (most likely because we have a 0.1
438
+ # OpenURL), try to determine something from the genre. If that doesn't
439
+ # exist, just call it a journal since most 0.1 OpenURLs would be one,
440
+ # anyway.
441
+ unless @referent.format
442
+ fmt = case @referent.metadata['genre']
443
+ when /article|journal|issue|proceeding|conference|preprint/ then 'journal'
444
+ when /book|bookitem|report|document/ then 'book'
445
+ else 'journal'
446
+ end
447
+ @referent.set_format(fmt)
448
+ end
449
+ end
450
+
451
+ # Translates the abbreviated entity (rft, rfr, etc.) to the associated class
452
+ # name. For repeatable entities, uses the first object in the array. Returns
453
+ # a string of the object name which would then be eval'ed to call a method
454
+ # upon.
455
+
456
+ def translate_abbr(abbr)
457
+ if @@defined_entities.has_key?(abbr)
458
+ ent = @@defined_entities[abbr]
459
+ if ent == "service-type"
460
+ ent = "serviceType[0]"
461
+ elsif ent == "resolver"
462
+ ent = "resolver[0]"
463
+ elsif ent == "referring-entity"
464
+ ent = "referringEntity"
465
+ end
466
+ else
467
+ return nil
468
+ end
469
+ return ent
470
+ end
471
+
472
+ def self.entities(term)
473
+ return @@defined_entities[term] if @@defined_entities.keys.index(term)
474
+ return @@defined_entities.keys[@@defined_entities.values.index(term)] if @@defined_entities.values.index(term)
475
+ return nil
476
+
477
+ end
478
+
479
+ # Imports an existing OpenURL::ContextObject object and sets the appropriate
480
+ # entity values.
481
+
482
+ def import_context_object(context_object)
483
+ @admin.each_key { |k|
484
+ self.set_administration_key(k, context_object.admin[k]["value"])
485
+ }
486
+ ["@referent", "@referringEntity", "@requestor", "@referrer"].each do | ent |
487
+ self.instance_variable_set(ent.to_sym, Marshal::load(Marshal.dump(context_object.instance_variable_get(ent.to_sym))))
488
+ end
489
+ context_object.serviceType.each { |svc|
490
+ @serviceType << Marshal::load(Marshal.dump(svc))
491
+ }
492
+ context_object.resolver.each { |res|
493
+ @resolver << Marshal::load(Marshal.dump(res))
494
+ }
495
+ context_object.foreign_keys.each do | key, val |
496
+ self.foreign_keys[key] = val
497
+ end
498
+ end
499
+
500
+ # Initialize a new ContextObject object from an existing
501
+ # OpenURL::ContextObject
502
+
503
+ def self.new_from_context_object(context_object)
504
+ co = self.new
505
+ co.import_context_object(context_object)
506
+ return co
507
+ end
508
+
509
+ def referent=(entity)
510
+ raise ArgumentError, "Referent must be an OpenURL::ContextObjectEntity" unless entity.is_a?(OpenURL::ContextObjectEntity)
511
+ @referent=entity
512
+ end
513
+
514
+ def referrer=(entity)
515
+ raise ArgumentError, "Referrer must be an OpenURL::ContextObjectEntity" unless entity.is_a?(OpenURL::ContextObjectEntity)
516
+ @referrer=entity
517
+ end
518
+
519
+ def referringEntity=(entity)
520
+ raise ArgumentError, "Referring-Entity must be an OpenURL::ContextObjectEntity" unless entity.is_a?(OpenURL::ContextObjectEntity)
521
+ @referringEntity=entity
522
+ end
523
+
524
+ def requestor=(entity)
525
+ raise ArgumentError, "Requestor must be an OpenURL::ContextObjectEntity" unless entity.is_a?(OpenURL::ContextObjectEntity)
526
+ @requestor=entity
527
+ end
528
+
529
+ protected
530
+
531
+ def import_entity(node)
532
+ entities = {"rft"=>:@referent, "rfr"=>:@referrer, "rfe"=>:@referringEntity,"req"=>:@requestor,
533
+ "svc"=>:@serviceType,"res"=>:@resolver}
534
+
535
+ ent = @@defined_entities.keys[@@defined_entities.values.index(node.name())]
536
+
537
+
538
+ metalib_workaround(node)
539
+
540
+ unless ["svc","res"].index(ent)
541
+ self.instance_variable_set(entities[ent], self.set_typed_entity(node))
542
+ entity = self.instance_variable_get(entities[ent])
543
+
544
+
545
+
546
+ self.import_xml_common(entity, node)
547
+ entity.import_xml_metadata(node)
548
+ end
549
+ end
550
+
551
+ def import_svc_node(node)
552
+ if @serviceType[0].empty?
553
+ key = 0
554
+ else
555
+ key = self.add_service_type_entity
556
+ end
557
+ self.import_xml_common(@serviceType[key], node)
558
+ self.import_xml_mbv(@serviceType[key], node)
559
+ end
560
+
561
+ def import_res_node(node)
562
+ if @resolver[0].empty?
563
+ key = 0
564
+ else
565
+ key = self.add_resolver_entity
566
+ end
567
+ self.import_xml_common(@resolver[key], node)
568
+ self.import_xml_mbv(@resolver[key], node)
569
+ end
570
+
571
+ # Determines the proper subclass of ContextObjectEntity to use
572
+ # for given format. Input is an REXML node representing a ctx:referent.
573
+ # Returns ContextObjectEntity.
574
+ def set_typed_entity(node)
575
+ fmt = REXML::XPath.first(node, "./ctx:metadata-by-val/ctx:format", {"ctx"=>"info:ofi/fmt:xml:xsd:ctx"})
576
+
577
+ fmt_val = fmt.get_text.value if fmt && fmt.has_text?
578
+
579
+ # Special weird workaround for info sent from metalib.
580
+ # "info:ofi/fmt:xml:xsd" is not actually a legal format
581
+ # identifier, it should have more on the end.
582
+ # XPath should really end in "rft:*" for maximal generality, but
583
+ # REXML doesn't like that.
584
+ if (false && fmt_val && fmt_val == "info:ofi/fmt:xml:xsd")
585
+ metalib_evidence = REXML::XPath.first( node, "./ctx:metadata-by-val/ctx:metadata/rft:journal", {"ctx"=>"info:ofi/fmt:xml:xsd:ctx", "rft"=>"info:ofi/fmt:xml:xsd:journal"})
586
+
587
+ # Okay, even if we don't have that one, do we have a REALLY bad one
588
+ # where Metalib puts an illegal namespace identifier in too?
589
+ metalib_evidence = REXML::XPath.first( node, "./ctx:metadata-by-val/ctx:metadata/rft:journal", {"ctx"=>"info:ofi/fmt:xml:xsd:ctx", "rft"=>"info:ofi/fmt:xml:xsd"}) unless metalib_evidence
590
+
591
+ # metalib didn't advertise it properly, but it's really
592
+ # journal format.
593
+ fmt_val = "info:ofi/fmt:xml:xsd:journal" if metalib_evidence
594
+ end
595
+
596
+ if fmt_val
597
+ return OpenURL::ContextObjectEntityFactory.format(fmt_val)
598
+ else
599
+ return OpenURL::ContextObjectEntity.new
600
+ end
601
+ end
602
+
603
+ # Parses the data that should apply to all XML context objects
604
+ def import_xml_common(ent, node)
605
+
606
+ REXML::XPath.each(node, "./ctx:identifier", {"ctx"=>"info:ofi/fmt:xml:xsd:ctx"}) do | id |
607
+ ent.add_identifier(id.get_text.value) if id and id.has_text?
608
+ end
609
+
610
+ priv = REXML::XPath.first(node, "./ctx:private-data", {"ctx"=>"info:ofi/fmt:xml:xsd:ctx"})
611
+ ent.set_private_data(priv.get_text.value) if priv and priv.has_text?
612
+
613
+ ref = REXML::XPath.first(node, "./ctx:metadata-by-ref", {"ctx"=>"info:ofi/fmt:xml:xsd:ctx"})
614
+ if ref
615
+ reference = {}
616
+ ref.to_a.each do |r|
617
+ if r.name() == "format"
618
+ reference[:format] = r.get_text.value if r.get_text
619
+ else
620
+ reference[:location] = r.get_text.value
621
+ end
622
+ end
623
+ ent.set_reference(reference[:location], reference[:format])
624
+ end
625
+ end
626
+
627
+ # Pass in a REXML element representing an entity.
628
+ # Special weird workaround for info sent from metalib.
629
+ # Metalib uses "info:ofi/fmt:xml:xsd" as a format identifier, and
630
+ # sometimes even as a namespace identifier for a <journal> element.
631
+ # It's not legal for either. It messes up our parsing. The identifier
632
+ # should have something else on the end ":journal", ":book", etc.
633
+ # We tack ":journal" on the end if we find this unspecified
634
+ # but it contains a <journal> element.
635
+ # XPath should really end in "rft:*" for maximal generality, but
636
+ # REXML doesn't like that.
637
+ def metalib_workaround(node)
638
+ # Metalib fix
639
+ # Fix awful illegal Metalib XML
640
+ fmt = REXML::XPath.first(node, "./ctx:metadata-by-val/ctx:format", {"ctx"=>"info:ofi/fmt:xml:xsd:ctx"})
641
+ if ( fmt && fmt.text == "info:ofi/fmt:xml:xsd")
642
+ metadata_by_val = node.children.find {|e| e.respond_to?(:name) && e.name == 'metadata-by-val' }
643
+
644
+ # Find a "journal" element to make sure forcing to ":journal" is a good
645
+ # idea, and to later
646
+ # fix the journal namespace if needed
647
+ metadata = metadata_by_val.children.find {|e| e.respond_to?(:name) && e.name == 'metadata' } if metadata_by_val
648
+ journal = metadata.find {|e| e.respond_to?(:name) && e.name == 'journal' } if metadata
649
+
650
+ # Fix the format only if there's a <journal> element in there.
651
+ fmt = metadata_by_val.children.find {|e| e.respond_to?(:name) && e.name == 'format' } if metadata_by_val && journal
652
+ fmt.text = "info:ofi/fmt:xml:xsd:journal" if fmt
653
+
654
+ if (journal && journal.namespace == "info:ofi/fmt:xml:xsd")
655
+ journal.add_namespace("xmlns:rft", "info:ofi/fmt:xml:xsd:journal")
656
+ end
657
+ end
658
+ end
659
+
660
+ end
661
+
662
+
663
+ end
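
The `import_kev` method in the file above collapses the one-element arrays returned by `CGI.parse` into bare strings and keeps arrays only for repeated keys. A rough standalone sketch of that shape follows. It is an illustration under assumptions, not the gem's code: it uses `URI.decode_www_form` from the standard library instead of `CGI.parse`, the method name `flatten_kev` is made up, and the sample identifiers are borrowed from elsewhere in this diff.

```ruby
require 'uri'

# Hypothetical stand-in for the flattening step in import_kev.
def flatten_kev(kev)
  # Collect every value under its key, preserving order of appearance.
  grouped = Hash.new { |h, k| h[k] = [] }
  URI.decode_www_form(kev).each { |key, val| grouped[key] << val }

  # One value: bare string. Repeated key: keep the array.
  grouped.each_with_object({}) do |(key, vals), out|
    out[key] = vals.length == 1 ? vals[0] : vals
  end
end

kev = "url_ver=Z39.88-2004" \
      "&rft_id=info%3Adoi%2F10.1016%2Fj.ipm.2005.03.024" \
      "&rft_id=urn%3Aisbn%3A0262531283" \
      "&rft.aulast=Bollen"

params = flatten_kev(kev)
params["rft.aulast"]  # => "Bollen"
params["rft_id"]      # => ["info:doi/10.1016/j.ipm.2005.03.024", "urn:isbn:0262531283"]
```

Downstream code then has to handle both shapes, which is why `import_hash` repeatedly checks `is_a?(Array)` or wraps values with `[value].flatten`.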
@@ -1,5 +1,7 @@
1
1
  # encoding: UTF-8
2
2
 
3
+ require 'ensure_valid_encoding'
4
+
3
5
  module OpenURL
4
6
 
5
7
  if RUBY_VERSION < '1.9'
@@ -34,7 +36,8 @@ module OpenURL
34
36
  # ctx.kev
35
37
  # ctx.xml
36
38
  class ContextObject
37
-
39
+ include EnsureValidEncoding
40
+
38
41
  attr_reader :admin, :referent, :referringEntity, :requestor, :referrer,
39
42
  :serviceType, :resolver
40
43
  attr_accessor :foreign_keys, :openurl_ver
@@ -236,7 +239,9 @@ module OpenURL
236
239
  # entities
237
240
 
238
241
  def import_xml(xml)
239
- if xml.is_a?(String)
242
+ if xml.is_a?(String)
243
+ xml.force_encoding("UTF-8") if xml.respond_to? :force_encoding
244
+ ensure_valid_encoding!(xml, :invalid => :replace)
240
245
  doc = REXML::Document.new xml.gsub(/>[\s\t]*\n*[\s\t]*</, '><').strip
241
246
  elsif xml.is_a?(REXML::Document)
242
247
  doc = xml
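
The string branch of `import_xml` in this hunk does three things before parsing: it tags the input as UTF-8, replaces illegal bytes, and removes whitespace between adjacent tags. A minimal sketch of those steps without REXML or the `ensure_valid_encoding` gem is shown below. `String#scrub` (Ruby 2.1+) is swapped in for `ensure_valid_encoding!`, and the helper name `prepare_xml_string` is hypothetical.

```ruby
# Hypothetical helper mirroring the string branch of import_xml.
# Works on a copy, whereas the gem's force_encoding call mutates the
# caller's string.
def prepare_xml_string(xml)
  cleaned = xml.dup.force_encoding("UTF-8").scrub("\uFFFD")
  # />\s*</ matches the same gaps as the original />[\s\t]*\n*[\s\t]*</,
  # because \s already covers tabs and newlines.
  cleaned.gsub(/>\s*</, '><').strip
end

raw   = "<a>\n  <b>D\xE9p</b>\n</a>\n".b   # binary-tagged, one bad byte
clean = prepare_xml_string(raw)             # => "<a><b>D\uFFFDp</b></a>"
```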
@@ -274,10 +279,47 @@ module OpenURL
274
279
  return co
275
280
  end
276
281
 
277
- # Imports an existing hash of ContextObject values and sets the appropriate
278
- # entities.
279
282
 
280
- def import_hash(hash)
283
+ # Takes a hash of openurl key/values, as output by CGI.parse
284
+ # from a query string for example (values can be strings or arrays
285
+ # of string). Mutates hash in place.
286
+ #
287
+ # Force encodes to UTF8 or 8859-1, depending on ctx_enc
288
+ # presence and value.
289
+ #
290
+ # Replaces any illegal bytes with replacement chars,
291
+ # transcodes to UTF-8 if needed to ensure UTF8 on way out.
292
+ def clean_char_encoding!(hash)
293
+ # Bail if we're not in ruby 1.9
294
+ return unless "".respond_to? :encoding
295
+
296
+ source_encoding = "UTF-8"
297
+ if hash["ctx_enc"] == "info:ofi/enc:ISO-8859-1"
298
+ hash.delete("ctx_enc")
299
+ source_encoding = "ISO-8859-1"
300
+ end
301
+
302
+ hash.each_pair do |key, values|
303
+ # get a list of all terminal values, whether wrapped
304
+ # in arrays or not. We're going to mutate them.
305
+ [values].flatten.each do | v |
306
+ v.force_encoding(source_encoding)
307
+ if source_encoding == "UTF-8"
308
+ ensure_valid_encoding!(v, :invalid => :replace, :undef => :replace )
309
+ else
310
+ # transcode, replacing any bad chars.
311
+ v.encode!("UTF-8", :invalid => :replace)
312
+ end
313
+ end
314
+ end
315
+
316
+ end
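
For readers without the `ensure_valid_encoding` gem at hand, the method added above can be approximated with core String methods. This is a sketch under assumptions rather than the gem's implementation: `String#scrub!` (Ruby 2.1+) stands in for `ensure_valid_encoding!`, the name `clean_hash_encoding!` is hypothetical, and the Ruby 1.8 guard is omitted.

```ruby
# Hypothetical standalone approximation of clean_char_encoding!.
def clean_hash_encoding!(hash)
  source = "UTF-8"
  if hash["ctx_enc"] == "info:ofi/enc:ISO-8859-1"
    # Values leave this method as UTF-8, so the old label would be stale.
    hash.delete("ctx_enc")
    source = "ISO-8859-1"
  end

  hash.each_value do |values|
    # Values may be bare strings or arrays of strings; mutate each in place.
    [values].flatten.each do |v|
      v.force_encoding(source)
      if source == "UTF-8"
        v.scrub!("\uFFFD")
      else
        v.encode!("UTF-8", invalid: :replace, undef: :replace)
      end
    end
  end
  hash
end

params = {
  "ctx_enc"    => "info:ofi/enc:ISO-8859-1".dup,
  "rft.btitle" => ["M\xE9xico".b],
  "rft.pub"    => "Benjamins".dup
}
clean_hash_encoding!(params)
params["rft.btitle"]  # => ["M\u00E9xico"]
```

Deleting `ctx_enc` once the data has been transcoded leaves the context object's default `info:ofi/enc:UTF-8` in place, which is what `test_8859_kev` asserts.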
317
+
318
+ # Imports an existing hash of ContextObject values and sets the appropriate
319
+ # entities.
320
+ def import_hash(hash)
321
+ clean_char_encoding!(hash)
322
+
281
323
  ref = {}
282
324
  {"@referent"=>"rft", "@referrer"=>"rfr", "@referringEntity"=>"rfe",
283
325
  "@requestor"=>"req"}.each do | ent, abbr |
@@ -387,9 +429,10 @@ module OpenURL
387
429
  end
388
430
  end
389
431
 
390
- # Initialize a new ContextObject object from an existing key/value hash
432
+
391
433
 
392
- def self.new_from_hash(hash)
434
+ # Initialize a new ContextObject object from an existing key/value hash
435
+ def self.new_from_hash(hash)
393
436
  co = self.new
394
437
  co.import_hash(hash)
395
438
  return co
@@ -563,8 +606,7 @@ module OpenURL
563
606
 
564
607
  # Parses the data that should apply to all XML context objects
565
608
  def import_xml_common(ent, node)
566
-
567
-
609
+
568
610
  REXML::XPath.each(node, "./ctx:identifier", {"ctx"=>"info:ofi/fmt:xml:xsd:ctx"}) do | id |
569
611
  ent.add_identifier(id.get_text.value) if id and id.has_text?
570
612
  end
@@ -1,6 +1,5 @@
1
1
  # encoding: UTF-8
2
2
 
3
-
4
3
  require 'yaml'
5
4
 
6
5
  unless "".respond_to?(:encoding)
@@ -46,6 +45,67 @@ else
46
45
  assert_equal("UTF-8", ctx.xml.encoding.name)
47
46
  end
48
47
 
48
+ # includes bytes that are bad for UTF8; OpenURL auto-replaces them.
49
+ def test_bad_kev
50
+ raw_kev = "&rft.pub=M\xE9xico".force_encoding("ascii-8bit")
51
+
52
+ ctx = OpenURL::ContextObject.new_from_kev(raw_kev)
53
+
54
+
55
+ assert_equal("UTF-8", ctx.referent.metadata['pub'].encoding.name)
56
+ # replacement char
57
+ assert_equal "M\uFFFDxico", ctx.referent.metadata['pub']
58
+
59
+ # serialized as utf-8
60
+ assert_equal("UTF-8", ctx.kev.encoding.name)
61
+
62
+ end
63
+
64
+ def test_bad_xml
65
+ ctx = OpenURL::ContextObject.new_from_xml(@@xml_with_bad_utf8)
66
+
67
+ assert_equal("UTF-8", ctx.referent.metadata['btitle'].encoding.name)
68
+ assert_equal("D\uFFFDpendances et niveaux de représentation en syntaxe", ctx.referent.metadata["btitle"])
69
+
70
+ # serialized as utf-8
71
+ assert_equal("UTF-8", ctx.xml.encoding.name)
72
+
73
+ end
74
+
75
+ def test_bad_from_form_vars
76
+ ctx = OpenURL::ContextObject.new_from_form_vars("btitle" => "M\xE9xico".force_encoding("binary"))
77
+
78
+ assert_equal("UTF-8", ctx.referent.metadata['btitle'].encoding.name)
79
+ assert_equal("M\uFFFDxico", ctx.referent.metadata["btitle"])
80
+
81
+ assert_equal("UTF-8", ctx.kev.encoding.name)
82
+ end
83
+
84
+ def test_8859_kev
85
+ # specify 8859-1 encoding.
86
+ raw_kev = "ctx_enc=info%3Aofi%2Fenc%3AISO-8859-1"
87
+ raw_kev += "&url_ver=Z39.88-2004&rft.btitle=M%E9xico".force_encoding("ascii-8bit")
88
+
89
+ ctx = OpenURL::ContextObject.new_from_kev(raw_kev)
90
+
91
+ # properly transcoded to UTF8
92
+ assert_equal("UTF-8", ctx.referent.metadata['btitle'].encoding.name)
93
+ assert_equal("México".force_encoding("UTF-8"), ctx.referent.metadata["btitle"])
94
+
95
+ # serialized as utf-8
96
+ assert_equal("UTF-8", ctx.kev.encoding.name)
97
+ # with proper ctx_enc, not the one previously specifying ISO-8859-1!
98
+
99
+ assert_not_equal "info:ofi/enc:ISO-8859-1", CGI.parse(ctx.kev)["ctx_enc"].first
100
+ end
101
+
102
+ def test_8859_form_vars
103
+ ctx = OpenURL::ContextObject.new_from_form_vars("btitle" => "M\xE9xico", "ctx_enc" => "info:ofi/enc:ISO-8859-1")
104
+
105
+ assert_equal("UTF-8", ctx.referent.metadata['btitle'].encoding.name)
106
+ assert_equal("México".force_encoding("UTF-8"), ctx.referent.metadata["btitle"])
107
+ end
108
+
49
109
 
50
110
 
51
111
  @@xml_with_utf8 = <<eos
@@ -53,6 +113,12 @@ else
53
113
  eos
54
114
  # Make sure it's got a raw encoding, so we can test it winds up utf-8 anyhow
55
115
  @@xml_with_utf8.force_encoding("ascii-8bit")
116
+
117
+ @@xml_with_bad_utf8 = <<eos
118
+ <ctx:context-objects xmlns:ctx='info:ofi/fmt:xml:xsd:ctx' xsi:schemaLocation='info:ofi/fmt:xml:xsd:ctx http://www.openurl.info/registry/docs/info:ofi/fmt:xml:xsd:ctx' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'><ctx:context-object identifier='10_8' timestamp='2003-04-11T10:08:30TZD' version='Z39.88-2004'><ctx:referent><ctx:metadata-by-val><ctx:format>info:ofi/fmt:xml:xsd:book</ctx:format><ctx:metadata><rft:book xmlns:rft='info:ofi/fmt:xml:xsd:book' xsi:schemaLocation='info:ofi/fmt:xml:xsd:book http://www.openurl.info/registry/docs/info:ofi/fmt:xml:xsd:book'><rft:genre>book</rft:genre><rft:btitle>D\xE9pendances et niveaux de représentation en syntaxe</rft:btitle><rft:date>1985</rft:date><rft:pub>Benjamins</rft:pub><rft:place>Amsterdam, Philadelphia</rft:place><rft:authors><rft:author><rft:aulast>Vergnaud</rft:aulast><rft:auinit>J.-R</rft:auinit></rft:author></rft:authors></rft:book></ctx:metadata></ctx:metadata-by-val></ctx:referent><ctx:referring-entity><ctx:metadata-by-val><ctx:format>info:ofi/fmt:xml:xsd:book</ctx:format><ctx:metadata><rfe:book xmlns:rfe='info:ofi/fmt:xml:xsd:book' xsi:schemaLocation='info:ofi/fmt:xml:xsd:book http://www.openurl.info/registry/docs/info:ofi/fmt:xml:xsd:book'><rfe:genre>book</rfe:genre><rfe:btitle>Minimalist Program</rfe:btitle><rfe:isbn>0262531283</rfe:isbn><rfe:date>1995</rfe:date><rfe:pub>The MIT Press</rfe:pub><rfe:place>Cambridge, Mass</rfe:place><rfe:authors><rfe:author><rfe:aulast>Chomsky</rfe:aulast><rfe:auinit>N</rfe:auinit></rfe:author></rfe:authors></rfe:book></ctx:metadata></ctx:metadata-by-val><ctx:identifier>urn:isbn:0262531283</ctx:identifier></ctx:referring-entity><ctx:referrer><ctx:identifier>info:sid/ebookco.com:bookreader</ctx:identifier></ctx:referrer><ctx:service-type><ctx:metadata-by-val><ctx:format>info:ofi/fmt:xml:xsd:sch_svc</ctx:format><ctx:metadata><svc:abstract 
xmlns:svc='info:ofi/fmt:xml:xsd:sch_svc'>yes</svc:abstract></ctx:metadata></ctx:metadata-by-val></ctx:service-type></ctx:context-object></ctx:context-objects>
119
+ eos
120
+ @@xml_with_bad_utf8.force_encoding("ascii-8bit")
121
+
56
122
 
57
123
  end
58
124
  end
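
The tests above feed the same byte, 0xE9, through two paths: as a raw `\xE9` in a string and as `%E9` in a KEV. The sketch below shows why one input yields `"M\uFFFDxico"` when read as UTF-8 and `"México"` when declared ISO-8859-1. It uses only the standard library (`URI.decode_www_form_component` and `String#scrub`) and is not how the gem itself decodes input. The variable names are illustrative.

```ruby
require 'uri'

# Percent-decode to raw bytes. BINARY avoids claiming any charset yet.
raw = URI.decode_www_form_component("M%E9xico", Encoding::BINARY)
second_byte = raw.bytes[1]                 # 0xE9

as_utf8   = raw.dup.force_encoding("UTF-8")
as_latin1 = raw.dup.force_encoding("ISO-8859-1")

valid      = as_utf8.valid_encoding?       # false: lone 0xE9 is not UTF-8
replaced   = as_utf8.scrub("\uFFFD")       # "M\uFFFDxico"
transcoded = as_latin1.encode("UTF-8")     # "M\u00E9xico", i.e. México
```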
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: openurl
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.1
4
+ version: 0.4.0
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -10,7 +10,7 @@ authors:
10
10
  autorequire:
11
11
  bindir: bin
12
12
  cert_chain: []
13
- date: 2011-10-10 00:00:00.000000000 Z
13
+ date: 2012-07-10 00:00:00.000000000 Z
14
14
  dependencies:
15
15
  - !ruby/object:Gem::Dependency
16
16
  name: marc
@@ -28,6 +28,22 @@ dependencies:
28
28
  - - ! '>='
29
29
  - !ruby/object:Gem::Version
30
30
  version: '0'
31
+ - !ruby/object:Gem::Dependency
32
+ name: ensure_valid_encoding
33
+ requirement: !ruby/object:Gem::Requirement
34
+ none: false
35
+ requirements:
36
+ - - ! '>='
37
+ - !ruby/object:Gem::Version
38
+ version: '0'
39
+ type: :runtime
40
+ prerelease: false
41
+ version_requirements: !ruby/object:Gem::Requirement
42
+ none: false
43
+ requirements:
44
+ - - ! '>='
45
+ - !ruby/object:Gem::Version
46
+ version: '0'
31
47
  description:
32
48
  email:
33
49
  - rochkind@jhu.edu
@@ -47,6 +63,7 @@ files:
47
63
  - lib/openurl/metadata_formats/marc.rb
48
64
  - lib/openurl/metadata_formats/patent.rb
49
65
  - lib/openurl/metadata_formats/dissertation.rb
66
+ - lib/openurl/context_object-SAVED
50
67
  - lib/openurl/transport.rb
51
68
  - lib/openurl/context_object.rb
52
69
  - test/data/metalib_sap2_post_params.yml
@@ -84,4 +101,14 @@ rubygems_version: 1.8.24
84
101
  signing_key:
85
102
  specification_version: 3
86
103
  summary: a Ruby library to create, parse and use NISO Z39.88 OpenURLs
87
- test_files: []
104
+ test_files:
105
+ - test/data/metalib_sap2_post_params.yml
106
+ - test/data/dc_ctx.xml
107
+ - test/data/yu.xml
108
+ - test/data/scholarly_au_ctx.xml
109
+ - test/data/marc_ctx.xml
110
+ - test/context_object_entity_test.rb
111
+ - test/test.yml
112
+ - test/context_object_test.rb
113
+ - test/encoding_test.rb
114
+ - test/scholarly_common_test.rb