traject 1.0.0.beta.1 → 1.0.0.beta.2

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 19fae7c4d428bc40fd5c568f13daa7583fcebb2a
4
- data.tar.gz: 9c71c57d86f6b9585451e958596a757cc75c432d
3
+ metadata.gz: cb35b4c5ba302cb865b459bfac6859ef6be68927
4
+ data.tar.gz: 14964b88428d0a827932cbf17194a77c56de1091
5
5
  SHA512:
6
- metadata.gz: 6730cd527d1ea285a5d60ebddeb95950dae2fdb5c548b905627eddf7a4a985e1891a039f409577903f181c60613861a3798fede6d5ecc8a942140cbb7a2d63d4
7
- data.tar.gz: ee66b43f7bed57fcb21d82e7cd82c6e49c973bf7780b7a9f9c95fb5afc28744ffdb03e63d9d9b28bf567123571fa760ea9fed30fa62d20cf0ebfd0ef95d5880a
6
+ metadata.gz: b3ae114fe4a11baaf6f6470d35d5df339a32b8f7e747f012310ee90ed55b982754d196abcc25ec0375d8bf988ca5e3ebd2c71ef7df159bc7007e6cdae9c69643
7
+ data.tar.gz: 74c6c6170860cf1cd4883d644f9c23151a703ff996592abb8668a655dcff885c108c7e04e13419f6c4ba727384b3bab3dff896c177382284b8d12b98cc711225
data/README.md CHANGED
@@ -3,7 +3,7 @@
3
3
  Tools for reading MARC records, transforming them with indexing rules, and indexing to Solr.
4
4
  Might be used to index MARC data for a Solr-based discovery product like [Blacklight](https://github.com/projectblacklight/blacklight) or [VUFind](http://vufind.org/).
5
5
 
6
- Traject might also be generalized to a set of tools for getting structured data from a source, and sending it to a destination.
6
+ Traject might also be generalized to a set of tools for getting structured data from a source, and transforming it to a hash-like object to send to a destination.
7
7
 
8
8
 
9
9
  **Traject is nearing 1.0, it is robust, feature-rich and being used in production by authors -- feedback invited**
@@ -7,7 +7,6 @@ require 'traject/indexer/settings'
7
7
  require 'traject/marc_reader'
8
8
  require 'traject/marc4j_reader'
9
9
  require 'traject/json_writer'
10
- require 'traject/solrj_writer'
11
10
 
12
11
  require 'traject/macros/marc21'
13
12
  require 'traject/macros/basic'
@@ -71,9 +70,10 @@ require 'traject/macros/basic'
71
70
  # 4) Optionally implements a #skipped_record_count method, returning int count of records
72
71
  # that were skipped due to errors (and presumably logged)
73
72
  #
74
- # The default writer (will be) the SolrWriter , which is configured
75
- # through additional Settings as well. A JsonWriter is also available,
76
- # which can be useful for debugging your index mappings.
73
+ # The default writer is the SolrJWriter, using Java SolrJ to
74
+ # write to a Solr. A few other built-in writers are available,
75
+ # but it's anticipated more will be created as plugins or local
76
+ # code for special purposes.
77
77
  #
78
78
  # You can set alternate writers by setting a Class object directly
79
79
  # with the #writer_class method, or by the 'writer_class_name' Setting,
@@ -191,8 +191,18 @@ module Traject::Macros
191
191
  #
192
192
  # Returns altered string, doesn't change original arg.
193
193
  def self.trim_punctuation(str)
194
+
195
+ # If something went wrong and we got a nil, just return it
196
+ return str unless str
197
+
198
+ # trailing: comma, slash, semicolon, colon (possibly preceded and followed by whitespace)
194
199
  str = str.sub(/ *[ ,\/;:] *\Z/, '')
195
- str = str.sub(/ *(\w\w\w)\. *\Z/, '\1')
200
+
201
+ # trailing period if it is preceded by at least three letters (possibly preceded and followed by whitespace)
202
+ str = str.sub(/( *\w\w\w)\. *\Z/, '\1')
203
+
204
+ # single square bracket characters if they are the start and/or end
205
+ # chars and there are no internal square brackets.
196
206
  str = str.sub(/\A\[?([^\[\]]+)\]?\Z/, '\1')
197
207
  return str
198
208
  end
@@ -1,3 +1,5 @@
1
+ # Encoding: UTF-8
2
+
1
3
  require 'traject/marc_extractor'
2
4
 
3
5
  module Traject::Macros
@@ -81,7 +83,8 @@ module Traject::Macros
81
83
  # 245 a and b, with non-filing characters stripped off
82
84
  def marc_sortable_title
83
85
  lambda do |record, accumulator|
84
- accumulator << Marc21Semantics.get_sortable_title(record)
86
+ st = Marc21Semantics.get_sortable_title(record)
87
+ accumulator << st if st
85
88
  end
86
89
  end
87
90
 
@@ -503,6 +506,71 @@ module Traject::Macros
503
506
  end
504
507
  end
505
508
 
509
+ # Extracts LCSH-carrying fields, and formatting them
510
+ # as a pre-coordinated LCSH string, for instance suitable for including
511
+ # in a facet.
512
+ #
513
+ # You can supply your own list of fields as a spec, but for significant
514
+ # customization you probably just want to write your own method in
515
+ # terms of the Marc21Semantics.assemble_lcsh method.
516
+ def marc_lcsh_formatted(options = {})
517
+ spec = options[:spec] || "600:610:611:630:648:650:651:654:6662"
518
+ subd_separator = options[:subdivison_separator] || " — "
519
+ other_separator = options[:other_separator] || " "
520
+
521
+ extractor = MarcExtractor.new(spec)
522
+
523
+ return lambda do |record, accumulator|
524
+ accumulator.concat( extractor.collect_matching_lines(record) do |field, spec|
525
+ Marc21Semantics.assemble_lcsh(field, subd_separator, other_separator)
526
+ end)
527
+ end
528
+
529
+ end
530
+
531
+ # Takes a MARC::Field and formats it into a pre-coordinated LCSH string
532
+ # with subdivision seperators in the right place.
533
+ #
534
+ # For 600 fields especially, need to not just join with subdivision seperator
535
+ # to take acount of $a$d$t -- for other fields, might be able to just
536
+ # join subfields, not sure.
537
+ #
538
+ # WILL strip trailing period from generated string, contrary to some LCSH practice.
539
+ # Our data is inconsistent on whether it has period or not, this was
540
+ # the easiest way to standardize.
541
+ #
542
+ # Default subdivision seperator is em-dash with spaces, set to '--' if you want.
543
+ #
544
+ # Cite: "Dash (-) that precedes a subdivision in an extended 600 subject heading
545
+ # is not carried in the MARC record. It may be system generated as a display constant
546
+ # associated with the content of subfield $v, $x, $y, and $z."
547
+ # http://www.loc.gov/marc/bibliographic/bd600.html
548
+ def self.assemble_lcsh(marc_field, subd_separator = " — ", other_separator = " ")
549
+ str = ""
550
+ subd_prefix_codes = %w{v x y z}
551
+
552
+
553
+ marc_field.subfields.each_with_index do |sf, i|
554
+ # ignore non-alphabetic, like numeric control subfields
555
+ next unless sf.code =~ /\A[a-z]\Z/
556
+
557
+ prefix = if subd_prefix_codes.include? sf.code
558
+ subd_separator
559
+ elsif i == 0
560
+ ""
561
+ else
562
+ other_separator
563
+ end
564
+ str << prefix << sf.value
565
+ end
566
+
567
+ str.gsub!(/\.\Z/, '')
568
+
569
+ return nil if str == ""
570
+
571
+ return str
572
+ end
573
+
506
574
 
507
575
  end
508
576
  end
@@ -1,3 +1,3 @@
1
1
  module Traject
2
- VERSION = "1.0.0.beta.1"
2
+ VERSION = "1.0.0.beta.2"
3
3
  end
@@ -1,3 +1,5 @@
1
+ # Encoding: UTF-8
2
+
1
3
  require 'test_helper'
2
4
 
3
5
  require 'traject/indexer'
@@ -231,7 +233,52 @@ describe "Traject::Macros::Marc21Semantics" do
231
233
  assert_equal ["Early modern, 1500-1700", "17th century", "Great Britain: Puritan Revolution, 1642-1660", "Great Britain: Civil War, 1642-1649", "1642-1660"],
232
234
  output["era_facet"]
233
235
  end
236
+ end
237
+
238
+ describe "marc_lcsh_display" do
239
+ it "formats typical field" do
240
+ field = MARC::DataField.new('650', ' ', ' ', ['a', 'Psychoanalysis and literature'], ['z', 'England'], ['x', 'History'], ['y', '19th century.'])
241
+ str = Marc21Semantics.assemble_lcsh(field)
242
+
243
+ assert_equal "Psychoanalysis and literature — England — History — 19th century", str
244
+ end
245
+
246
+ it "ignores numeric subfields" do
247
+ field = MARC::DataField.new('650', ' ', ' ', ['a', 'Psychoanalysis and literature'], ['x', 'History'], ['0', '01234'], ['3', 'Some part'])
248
+ str = Marc21Semantics.assemble_lcsh(field)
249
+
250
+ assert_equal "Psychoanalysis and literature — History", str
251
+ end
252
+
253
+ it "doesn't put subdivision in wrong place" do
254
+ field = MARC::DataField.new('600', ' ', ' ', ['a', 'Eliot, George,'],['d', '1819-1880.'], ['t', 'Middlemarch'])
255
+ str = Marc21Semantics.assemble_lcsh(field)
256
+
257
+ assert_equal "Eliot, George, 1819-1880. Middlemarch", str
258
+ end
259
+
260
+ it "mixes non-subdivisions with subdivisions" do
261
+ field = MARC::DataField.new('600', ' ', ' ', ['a', 'Eliot, George,'],['d', '1819-1880.'], ['t', 'Middlemarch'], ['x', 'Criticism.'])
262
+ str = Marc21Semantics.assemble_lcsh(field)
263
+
264
+ assert_equal "Eliot, George, 1819-1880. Middlemarch — Criticism", str
265
+ end
266
+
267
+ it "returns nil for a field with no relevant subfields" do
268
+ field = MARC::DataField.new('650', ' ', ' ')
269
+ assert_nil Marc21Semantics.assemble_lcsh(field)
270
+ end
271
+
272
+ describe "marc_lcsh_formatted macro" do
273
+ it "smoke test" do
274
+ @record = MARC::Reader.new(support_file_path "george_eliot.marc").to_a.first
275
+ @indexer.instance_eval {to_field "lcsh", marc_lcsh_formatted}
276
+ output = @indexer.map_record(@record)
234
277
 
278
+ assert output["lcsh"].length > 0, "outputs data"
279
+ assert output["lcsh"].include?("Eliot, George, 1819-1880 — Characters"), "includes a string its supposed to"
280
+ end
281
+ end
235
282
  end
236
283
 
237
284
  describe "extract_marc_filing_version" do
@@ -272,7 +319,6 @@ describe "Traject::Macros::Marc21Semantics" do
272
319
  end
273
320
  end
274
321
 
275
-
276
322
  end
277
323
 
278
324
 
@@ -97,6 +97,9 @@ describe "Traject::Macros::Marc21" do
97
97
  assert_equal "one two three", Marc21.trim_punctuation("one two three]")
98
98
  assert_equal "one two three", Marc21.trim_punctuation("[one two three")
99
99
  assert_equal "one two three", Marc21.trim_punctuation("[one two three]")
100
+
101
+ # This one was a bug before
102
+ assert_equal "Feminism and art", Marc21.trim_punctuation("Feminism and art.")
100
103
  end
101
104
 
102
105
  it "uses :translation_map" do
@@ -20,7 +20,7 @@ extend Traject::Macros::MarcFormats
20
20
  # files however you like, you can call traject with as many
21
21
  # config files as you like, `traject -c one.rb -c two.rb -c etc.rb`
22
22
  settings do
23
- provide "solr.url", "http://blacklight.mse.jhu.edu:8983/solr/prod"
23
+ provide "solr.url", "http://solr.somewhere.edu:8983/solr/corename"
24
24
 
25
25
  # Only if you need to connect to a Solr 1.x:
26
26
  provide "solrj_writer.parser_class_name", "XMLResponseParser"
@@ -0,0 +1 @@
1
+ 01359cam a2200361 a 4500001000800000005001700008008004100025010001700066020002800083020003500111035001600146040001300162043001200175049000900187050002500196082001500221100002200236245009700258260005800355300002700413440004600440504006400486600005400550600004700604600004400651600004300695650006700738650005800805650002900863910002600892994001200918991006700930232964520030805093128.0020925s2003 nyu b s001 0 eng  a 2002036483 a0791458334 (alk. paper) a0791458342 (pbk. : alk. paper) aocm50737282 aDLCcDLC ae-uk-en aJHEE00aPR4692.P74bP37 200300a823/.82211 aParis, Bernard J.10aRereading George Eliot :bchanging responses to her experiments in life /cBernard J. Paris. aAlbany :bState University of New York Press,cc2003. axiii, 220 p. ;c23 cm. 0aSUNY series in psychoanalysis and culture aIncludes bibliographical references (p. 213-215) and index.10aEliot, George,d1819-1880xKnowledgexPsychology.10aEliot, George,d1819-1880.tDaniel Deronda10aEliot, George,d1819-1880.tMiddlemarch10aEliot, George,d1819-1880xCharacters. 0aPsychoanalysis and literaturezEnglandxHistoryy19th century. 0aPsychological fiction, EnglishxHistory and criticism 0aPsychology in literature a2329645bHorizon bib# aE0bJHE aPR4692.P74 P37 2003flcbelc1cc. 1q0i3857076lembluememsel
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: traject
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.0.0.beta.1
4
+ version: 1.0.0.beta.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jonathan Rochkind
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2013-10-14 00:00:00.000000000 Z
12
+ date: 2013-10-17 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: marc
@@ -216,6 +216,7 @@ files:
216
216
  - test/test_support/date_with_u.marc
217
217
  - test/test_support/demo_config.rb
218
218
  - test/test_support/emptyish_record.marc
219
+ - test/test_support/george_eliot.marc
219
220
  - test/test_support/hebrew880s.marc
220
221
  - test/test_support/louis_armstrong.marc
221
222
  - test/test_support/manufacturing_consent.marc
@@ -313,6 +314,7 @@ test_files:
313
314
  - test/test_support/date_with_u.marc
314
315
  - test/test_support/demo_config.rb
315
316
  - test/test_support/emptyish_record.marc
317
+ - test/test_support/george_eliot.marc
316
318
  - test/test_support/hebrew880s.marc
317
319
  - test/test_support/louis_armstrong.marc
318
320
  - test/test_support/manufacturing_consent.marc