traject 0.0.1

Files changed (64)
  1. data/.gitignore +18 -0
  2. data/Gemfile +4 -0
  3. data/LICENSE.txt +22 -0
  4. data/README.md +346 -0
  5. data/Rakefile +16 -0
  6. data/bin/traject +153 -0
  7. data/doc/macros.md +103 -0
  8. data/doc/settings.md +34 -0
  9. data/lib/traject.rb +10 -0
  10. data/lib/traject/indexer.rb +196 -0
  11. data/lib/traject/json_writer.rb +51 -0
  12. data/lib/traject/macros/basic.rb +9 -0
  13. data/lib/traject/macros/marc21.rb +145 -0
  14. data/lib/traject/marc_extractor.rb +206 -0
  15. data/lib/traject/marc_reader.rb +61 -0
  16. data/lib/traject/qualified_const_get.rb +30 -0
  17. data/lib/traject/solrj_writer.rb +120 -0
  18. data/lib/traject/translation_map.rb +184 -0
  19. data/lib/traject/version.rb +3 -0
  20. data/test/indexer/macros_marc21_test.rb +146 -0
  21. data/test/indexer/macros_test.rb +40 -0
  22. data/test/indexer/map_record_test.rb +120 -0
  23. data/test/indexer/read_write_test.rb +47 -0
  24. data/test/indexer/settings_test.rb +65 -0
  25. data/test/marc_extractor_test.rb +168 -0
  26. data/test/marc_reader_test.rb +29 -0
  27. data/test/solrj_writer_test.rb +106 -0
  28. data/test/test_helper.rb +28 -0
  29. data/test/test_support/hebrew880s.marc +1 -0
  30. data/test/test_support/manufacturing_consent.marc +1 -0
  31. data/test/test_support/test_data.utf8.marc.xml +2609 -0
  32. data/test/test_support/test_data.utf8.mrc +1 -0
  33. data/test/translation_map_test.rb +98 -0
  34. data/test/translation_maps/bad_ruby.rb +8 -0
  35. data/test/translation_maps/bad_yaml.yaml +1 -0
  36. data/test/translation_maps/both_map.rb +1 -0
  37. data/test/translation_maps/both_map.yaml +1 -0
  38. data/test/translation_maps/default_literal.rb +10 -0
  39. data/test/translation_maps/default_passthrough.rb +10 -0
  40. data/test/translation_maps/marc_040a_translate_test.yaml +1 -0
  41. data/test/translation_maps/ruby_map.rb +10 -0
  42. data/test/translation_maps/translate_array_test.yaml +8 -0
  43. data/test/translation_maps/yaml_map.yaml +7 -0
  44. data/traject.gemspec +30 -0
  45. data/vendor/solrj/README +8 -0
  46. data/vendor/solrj/build.xml +39 -0
  47. data/vendor/solrj/ivy.xml +16 -0
  48. data/vendor/solrj/lib/commons-codec-1.7.jar +0 -0
  49. data/vendor/solrj/lib/commons-io-2.1.jar +0 -0
  50. data/vendor/solrj/lib/httpclient-4.2.3.jar +0 -0
  51. data/vendor/solrj/lib/httpcore-4.2.2.jar +0 -0
  52. data/vendor/solrj/lib/httpmime-4.2.3.jar +0 -0
  53. data/vendor/solrj/lib/jcl-over-slf4j-1.6.6.jar +0 -0
  54. data/vendor/solrj/lib/jul-to-slf4j-1.6.6.jar +0 -0
  55. data/vendor/solrj/lib/log4j-1.2.16.jar +0 -0
  56. data/vendor/solrj/lib/noggit-0.5.jar +0 -0
  57. data/vendor/solrj/lib/slf4j-api-1.6.6.jar +0 -0
  58. data/vendor/solrj/lib/slf4j-log4j12-1.6.6.jar +0 -0
  59. data/vendor/solrj/lib/solr-solrj-4.3.1-javadoc.jar +0 -0
  60. data/vendor/solrj/lib/solr-solrj-4.3.1-sources.jar +0 -0
  61. data/vendor/solrj/lib/solr-solrj-4.3.1.jar +0 -0
  62. data/vendor/solrj/lib/wstx-asl-3.2.7.jar +0 -0
  63. data/vendor/solrj/lib/zookeeper-3.4.5.jar +0 -0
  64. metadata +264 -0
@@ -0,0 +1,103 @@
+ # Traject Indexing 'Macros'
+
+ Traject macros are a way of providing re-usable index mapping rules. Before we discuss how they work, we need to remind ourselves of the basic/direct Traject `to_field` indexing method.
+
+ ## Review and details of direct indexing logic
+
+ Here's the simplest possible direct Traject mapping logic, duplicating the effects of the `literal` function:
+
+ ~~~ruby
+ to_field("title") do |record, accumulator, context|
+   accumulator << "FIXED LITERAL"
+ end
+ ~~~
+
+ That `do` is just ruby `block` syntax, whereby we can pass a block of ruby code as an argument to a ruby method. We pass a block taking three arguments, labelled `record`, `accumulator`, and `context`, to the `to_field` method.
+
+ The block is then stored by the Traject::Indexer, and called for each record indexed. When it's called, it's passed the particular record at hand as the first argument, an Array used as an 'accumulator' as the second argument, and a Traject::Indexer::Context as the third argument.
+
+ The code in the block can add values to the accumulator array, which the Traject::Indexer then adds to the field specified by `to_field`.
+
+ It's also worth pointing out that ruby blocks are `closures`, so they can "capture" and use values from outside the block. So this would work too:
+
+ ~~~ruby
+ my_var = "FIXED LITERAL"
+ to_field("title") do |record, accumulator, context|
+   accumulator << my_var
+ end
+ ~~~
+
+ So that's the way to provide direct logic for mapping rules.
+
+ ## Macros
+
+ A Traject macro is a way to automatically create indexing rules via re-usable "templates".
+
+ Traject macros are simply methods that return ruby lambda/proc objects. A ruby lambda is just another syntax for creating blocks of ruby logic that can be passed around as data.
+
+ So, for instance, we could capture that fixed literal block in a lambda like this:
+
+ ~~~ruby
+ always_add_black = lambda do |record, accumulator, context|
+   accumulator << "BLACK"
+ end
+ ~~~
+
+ Then, knowing that the `to_field` ruby method takes a block, we can use the ruby `&` operator to convert our lambda to a block argument. This would in fact work:
+
+ ~~~ruby
+ to_field "color", &always_add_black
+ ~~~
+
+ However, for convenience, the `to_field` method can also take a lambda directly as a second argument, without having to use `&` to convert it to a block argument. So this would work too:
+
+ ~~~ruby
+ to_field "color", always_add_black
+ ~~~
+
+ A macro is just one more step, using a method to create lambdas dynamically: a Traject macro is just a ruby method that **returns** a lambda, a three-arg lambda like `to_field` wants.
+
+ Here is in fact how the `literal` function is implemented:
+
+ ~~~ruby
+ def literal(value)
+   return lambda do |record, accumulator, context|
+     # because a lambda is a closure, we can define it in terms
+     # of the 'value' from the scope it's defined in!
+     accumulator << value
+   end
+ end
+
+ to_field "something", literal("something")
+ ~~~
+
+ It's really as simple as that; that's all a Traject macro is. A function that takes parameters and, based on those parameters, returns a lambda; the lambda is then passed to the `to_field` indexing method, or similar methods.
+
+ How do you make these methods available to the indexer?
+
+ Define them in a module:
+
+ ~~~ruby
+ # in a file literal_macro.rb
+ module LiteralMacro
+   def literal(value)
+     return lambda do |record, accumulator, context|
+       # because a lambda is a closure, we can define it in terms
+       # of the 'value' from the scope it's defined in!
+       accumulator << value
+     end
+   end
+ end
+ ~~~
+
+ And then use ordinary ruby `require` and `extend` to add it to the current Indexer, by simply including this in one of your config files:
+
+ ~~~
+ require 'literal_macro'
+ extend LiteralMacro
+
+ to_field ...
+ ~~~
+
+ That's it. You can use the traject command line `-I` option to set the ruby load path, so your file will be findable via `require`. Or you can distribute it in a gem, and use straight rubygems and the `gem` command in your configuration file, or Bundler with the traject command-line `-g` option.
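To see the whole macro mechanism end to end, here is an editor's self-contained sketch: `to_field` and `literal` are re-implemented in miniature (they are NOT traject's real classes, just the same shape the doc above describes).

```ruby
# A miniature, self-contained re-implementation of the macro pattern:
# a macro is a method returning a three-arg lambda; to_field stores it.

STEPS = []

# to_field stores a field name plus a lambda (or block) for later execution
def to_field(field_name, aLambda = nil, &block)
  STEPS << [field_name, aLambda || block]
end

# a macro: a method that returns a three-arg lambda, closing over `value`
def literal(value)
  lambda do |record, accumulator, context|
    accumulator << value
  end
end

to_field "color", literal("BLACK")

# "index" one dummy record by running every stored step
output = {}
STEPS.each do |field_name, step|
  accumulator = []
  step.call(:dummy_record, accumulator, nil)
  (output[field_name] ||= []).concat accumulator
end

output # => {"color"=>["BLACK"]}
```

The shape matters more than the details here: the macro runs once at configuration time, while the lambda it returns runs once per record.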
@@ -0,0 +1,34 @@
+ # Traject settings
+
+ Traject settings are a flat list of key/value pairs -- a single Hash, not nested. Keys are always strings, and dots (".") can be used for grouping and namespacing.
+
+ Values are usually strings, but occasionally something else.
+
+ Settings can be set in configuration files, or on the command line.
+
+ ## Known settings
+
+ * json_writer.pretty_print: used by the JsonWriter; if set to true, will output pretty-printed json (with added whitespace) for easier human readability. Default false.
+
+ * marc_source.type: default 'binary'. Can also be set to 'xml' or (not yet implemented, TODO) 'json'. Command-line shortcut `-t`.
+
+ * reader_class_name: a Traject Reader class, used by the indexer as a source of records. Default Traject::MarcReader. See Traject::Indexer for more info. Command-line shortcut `-r`.
+
+ * solr.url: URL to connect to a solr instance for indexing, eg http://example.org:8983/solr . Command-line shortcut `-u`.
+
+ * solrj.jar_dir: SolrJWriter needs to load Java .jar files with SolrJ. It will load from a packaged SolrJ, but you can load your own SolrJ (different version, etc.) by specifying a directory. All *.jar in the directory will be loaded.
+
+ * solr.version: Set to eg "1.4.0", "4.3.0"; currently un-used, but in the future will control some default settings, and/or sanity check and warn you if you're doing something that might not work with that version of solr. Set it now for help in the future.
+
+ * solrj_writer.commit_on_close: default false; set to true to have SolrJWriter send an explicit commit message to Solr after indexing.
+
+ * solrj_writer.parser_class_name: Set to "XMLResponseParser" or "BinaryResponseParser". Will be instantiated and passed to the solrj.SolrServer with setResponseParser. Default nil, use the SolrServer default. To talk to a solr 1.x, you will want to set it to "XMLResponseParser".
+
+ * solrj_writer.server_class_name: String name of a solrj.SolrServer subclass to be used by SolrJWriter. Default "HttpSolrServer".
+
+ * writer_class_name: a Traject Writer class, used by the indexer to send processed dictionaries off. Default Traject::SolrJWriter; also available: Traject::JsonWriter. See Traject::Indexer for more info. Command-line shortcut `-w`.
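The flat dotted-string key/value idea can be mimicked with a plain Hash; this editor's sketch uses a Hash stand-in and a hypothetical `apply_settings` helper (not traject's actual Settings class) to show the hash-argument and block forms side by side.

```ruby
# Editor's sketch: flat settings as a plain Hash with dotted string keys.
settings = {}

# hypothetical helper mimicking the two ways settings get applied
def apply_settings(hash, new_settings = nil, &block)
  hash.merge!(new_settings) if new_settings
  hash.instance_eval(&block) if block
  hash
end

# hash form: keys are plain dotted strings, values usually strings
apply_settings(settings, "solr.url" => "http://example.org:8983/solr")

# block form: `store` here is just Hash#store, self being the hash
apply_settings(settings) do
  store "solrj_writer.commit_on_close", "true"
end

settings["solr.url"] # => "http://example.org:8983/solr"
```

Because the structure is one flat Hash rather than nested hashes, the dots carry the namespacing and every setting stays a simple key lookup.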
@@ -0,0 +1,10 @@
+ require "traject/version"
+
+ require 'traject/indexer'
+
+ require 'traject/macros/basic'
+ require 'traject/macros/marc21'
+
+ module Traject
+   # Your code goes here...
+ end
@@ -0,0 +1,196 @@
+ require 'hashie'
+
+ require 'traject'
+ require 'traject/qualified_const_get'
+ require 'traject/marc_reader'
+ require 'traject/json_writer'
+ require 'traject/solrj_writer'
+ #
+ # == Readers and Writers
+ #
+ # The Indexer has a modularized architecture for readers and writers, for where
+ # source records come from (reader), and where output is sent to (writer).
+ #
+ # A Reader is any class that:
+ #   1) Has a two-argument initializer taking an IO stream and a Settings hash
+ #   2) Responds to the usual ruby #each, returning a source record from each #each.
+ #      (Including Enumerable is prob a good idea too)
+ #
+ # The default reader is the Traject::MarcReader, whose behavior is
+ # further customized by several settings in the Settings hash.
+ #
+ # Alternate readers can be set directly with the #reader_class= method, or
+ # with the "reader_class_name" Setting, a String name of a class
+ # meeting the reader contract.
+ #
+ #
+ # A Writer is any class that:
+ #   1) Has a one-argument initializer taking a Settings hash.
+ #   2) Responds to a one-argument #put method, where the argument is
+ #      a hash of mapped keys/values. The writer should write them
+ #      to the appropriate place.
+ #   3) Responds to a #close method, called when we're done.
+ #
+ # The default writer (will be) the SolrWriter, which is configured
+ # through additional Settings as well. A JsonWriter is also available,
+ # which can be useful for debugging your index mappings.
+ #
+ # You can set alternate writers by setting a Class object directly
+ # with the #writer_class method, or by the 'writer_class_name' Setting,
+ # with a String name of a class meeting the Writer contract.
+ #
+ class Traject::Indexer
43
+ include Traject::QualifiedConstGet
44
+
45
+ attr_writer :reader_class, :writer_class
46
+
47
+ def initialize
48
+ @settings = Settings.new(self.class.default_settings)
49
+ @index_steps = []
50
+ end
51
+
52
+ # The Indexer's settings are a hash of key/values -- not
53
+ # nested, just one level -- of configuration settings. Keys
54
+ # are strings.
55
+ #
56
+ # The settings method with no arguments returns that hash.
57
+ #
58
+ # With a hash and/or block argument, can be used to set
59
+ # new key/values. Each call merges onto the existing settings
60
+ # hash.
61
+ #
62
+ # indexer.settings("a" => "a", "b" => "b")
63
+ #
64
+ # indexer.settings do
65
+ # store "b", "new b"
66
+ # end
67
+ #
68
+ # indexer.settings #=> {"a" => "a", "b" => "new b"}
69
+ #
70
+ # even with arguments, returns settings hash too, so can
71
+ # be chained.
72
+ def settings(new_settings = nil, &block)
73
+ @settings.merge!(new_settings) if new_settings
74
+
75
+ @settings.instance_eval &block if block
76
+
77
+ return @settings
78
+ end
79
+
80
+ # Used to define an indexing mapping.
81
+ def to_field(field_name, aLambda = nil, &block)
82
+ @index_steps << {
83
+ :field_name => field_name.to_s,
84
+ :lambda => aLambda,
85
+ :block => block
86
+ }
87
+ end
88
+
89
+ # Processes a single record, according to indexing rules
90
+ # set up in this Indexer. Returns a hash whose values are
91
+ # Arrays, and keys are strings.
92
+ #
93
+ def map_record(record)
94
+ context = Context.new(:source_record => record, :settings => settings)
95
+
96
+ @index_steps.each do |index_step|
97
+ accumulator = []
98
+ field_name = index_step[:field_name]
99
+ context.field_name = field_name
100
+
101
+ # Might have a lambda arg AND a block, we execute in order,
102
+ # with same accumulator.
103
+ [index_step[:lambda], index_step[:block]].each do |aProc|
104
+ if aProc
105
+ case aProc.arity
106
+ when 1 then aProc.call(record)
107
+ when 2 then aProc.call(record, accumulator)
108
+ else aProc.call(record, accumulator, context)
109
+ end
110
+ end
111
+
112
+ end
113
+
114
+ (context.output_hash[field_name] ||= []).concat accumulator
115
+ context.field_name = nil
116
+ end
117
+
118
+ return context.output_hash
119
+ end
+
+   # Processes a stream of records, reading from the configured Reader,
+   # mapping according to configured mapping rules, and then writing
+   # to the configured Writer.
+   def process(io_stream)
+     reader = self.reader!(io_stream)
+     writer = self.writer!
+
+     reader.each do |record|
+       writer.put map_record(record)
+     end
+     writer.close if writer.respond_to?(:close)
+   end
+
+   def reader_class
+     unless defined? @reader_class
+       @reader_class = qualified_const_get(settings["reader_class_name"])
+     end
+     return @reader_class
+   end
+
+   def writer_class
+     unless defined? @writer_class
+       @writer_class = qualified_const_get(settings["writer_class_name"])
+     end
+     return @writer_class
+   end
+
+   # Instantiate a Traject Reader, using the class set
+   # in #reader_class, initialized with the io_stream passed in
+   def reader!(io_stream)
+     return reader_class.new(io_stream, settings)
+   end
+
+   # Instantiate a Traject Writer, using the class set in #writer_class
+   def writer!
+     return writer_class.new(settings)
+   end
+
+   def self.default_settings
+     {
+       "reader_class_name" => "Traject::MarcReader",
+       "writer_class_name" => "Traject::SolrJWriter"
+     }
+   end
+
+
+   # Enhanced with a few features from Hashie, to make it for
+   # instance string/symbol indifferent
+   class Settings < Hash
+     include Hashie::Extensions::MergeInitializer # can init with hash
+     include Hashie::Extensions::IndifferentAccess
+
+     # Hashie bug Issue #100 https://github.com/intridea/hashie/pull/100
+     alias_method :store, :indifferent_writer
+   end
+
+   # Represents the context of a specific record being indexed, passed
+   # to indexing logic blocks
+   #
+   class Traject::Indexer::Context
+     def initialize(hash_init = {})
+       # TODO, argument checking for required args?
+
+       self.clipboard = {}
+       self.output_hash = {}
+
+       hash_init.each_pair do |key, value|
+         self.send("#{key}=", value)
+       end
+     end
+
+     attr_accessor :clipboard, :output_hash
+     attr_accessor :field_name, :source_record, :settings
+   end
+ end
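The arity-based dispatch inside `map_record` is worth seeing in isolation: a stored proc is called with one, two, or three arguments depending on how many it accepts. This editor's sketch is standalone (no traject classes), with a hypothetical `call_step` helper mirroring that `case`.

```ruby
# Editor's sketch of the arity dispatch map_record performs:
# call a stored proc with 1, 2, or 3 args depending on what it declares.
def call_step(aProc, record, accumulator, context)
  case aProc.arity
  when 1 then aProc.call(record)
  when 2 then aProc.call(record, accumulator)
  else        aProc.call(record, accumulator, context)
  end
end

acc = []
two_arg   = lambda { |record, accumulator| accumulator << record.upcase }
three_arg = lambda { |record, accumulator, context| accumulator << "#{record}/#{context}" }

call_step(two_arg,   "marc", acc, "ctx")
call_step(three_arg, "marc", acc, "ctx")

acc # => ["MARC", "marc/ctx"]
```

This is why indexing blocks may declare only the arguments they actually use: the indexer inspects `Proc#arity` rather than always passing all three.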
@@ -0,0 +1,51 @@
+ require 'json'
+
+ # A writer for Traject::Indexer that just writes out
+ # all the output as Json. It's newline-delimited json, but
+ # right now there are no checks to make sure there are no internal
+ # newlines as whitespace in the json. TODO, add that.
+ #
+ # Not currently thread-safe (we'd have to make sure the whole object and newline
+ # get written without a context switch. Can be made so.)
+ #
+ # You can force pretty-printing with the setting 'json_writer.pretty_print', boolean
+ # true or string 'true'. Useful mostly for human checking of output.
+ #
+ # Output will be sent to the settings["output_file"] string path, or else
+ # settings["output_stream"] (ruby IO object), or else stdout.
+ class Traject::JsonWriter
+   attr_reader :settings
+
+   def initialize(argSettings)
+     @settings = argSettings
+   end
+
+   def put(hash)
+     serialized =
+       if settings["json_writer.pretty_print"]
+         JSON.pretty_generate(hash)
+       else
+         JSON.generate(hash)
+       end
+     output_file.puts(serialized)
+   end
+
+   def output_file
+     unless defined? @output_file
+       @output_file =
+         if settings["output_file"]
+           # open for writing; File.open's default mode is read-only
+           File.open(settings["output_file"], "w")
+         elsif settings["output_stream"]
+           settings["output_stream"]
+         else
+           $stdout
+         end
+     end
+     return @output_file
+   end
+
+   def close
+     @output_file.close unless (@output_file.nil? || @output_file.tty?)
+   end
+
+ end
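The output format JsonWriter produces, one JSON object per line ("newline-delimited JSON"), is easy to demonstrate without the writer itself; this editor's sketch writes to an in-memory StringIO instead of a file or stdout.

```ruby
require 'json'
require 'stringio'

# Editor's sketch: newline-delimited JSON, as JsonWriter emits --
# one JSON.generate'd hash per line, written with IO#puts.
out = StringIO.new

[{"id" => ["rec1"], "title" => ["A"]},
 {"id" => ["rec2"], "title" => ["B"]}].each do |hash|
  out.puts JSON.generate(hash)
end

# each line round-trips back to the original mapped-output hash
records = out.string.each_line.map { |line| JSON.parse(line) }
records.length # => 2
```

The one-object-per-line convention is what makes the TODO above matter: a stray newline inside a serialized object would break line-by-line consumers.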
@@ -0,0 +1,9 @@
+ module Traject::Macros
+   module Basic
+     def literal(literal)
+       lambda do |record, accumulator, context|
+         accumulator << literal
+       end
+     end
+   end
+ end
@@ -0,0 +1,145 @@
+ require 'traject/marc_extractor'
+ require 'traject/translation_map'
+ require 'base64'
+ require 'json'
+
+ module Traject::Macros
+   # Some of these may be generic for any MARC, but we haven't done
+   # the analytical work to think it through; some of this is
+   # definitely specific to Marc21.
+   module Marc21
+
+     # A combo function macro that will extract data from marc according to a string
+     # field/substring specification, then apply various optional post-processing to it too.
+     #
+     # First argument is a string spec suitable for the MarcExtractor, see
+     # MarcExtractor::parse_string_spec.
+     #
+     # Second arg is optional options, including options valid on MarcExtractor.new,
+     # and others. (TODO)
+     #
+     # Examples:
+     #
+     #    to_field "title", extract_marc("245abcd", :trim_punctuation => true)
+     #    to_field "id",    extract_marc("001", :first => true)
+     #    to_field "geo",   extract_marc("040a", :seperator => nil, :translation_map => "marc040")
+     def extract_marc(spec, options = {})
+       only_first = options.delete(:first)
+       trim_punctuation = options.delete(:trim_punctuation)
+
+       # We create the TranslationMap here on load, not inside the closure
+       # where it'll be called for every record. Since TranslationMap is supposed
+       # to cache, it prob doesn't matter, but it doesn't hurt. Also causes any syntax
+       # exceptions to raise on load.
+       if translation_map_arg = options.delete(:translation_map)
+         translation_map = Traject::TranslationMap.new(translation_map_arg)
+       end
+
+       lambda do |record, accumulator, context|
+         accumulator.concat Traject::MarcExtractor.extract_by_spec(record, spec, options)
+
+         if only_first
+           Marc21.first! accumulator
+         end
+
+         if translation_map
+           translation_map.translate_array! accumulator
+         end
+
+         if trim_punctuation
+           accumulator.collect! {|s| Marc21.trim_punctuation(s)}
+         end
+       end
+     end
+
+     # Serializes the complete marc record to a serialization format.
+     # Required param :format:
+     #
+     #    serialized_marc(:format => :binary)
+     #
+     # formats:
+     # [xml] MarcXML
+     # [json] marc-in-json (http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/)
+     # [binary] Standard ISO 2709 binary marc. By default WILL be base64-encoded,
+     #          assuming the destination is a solr 'binary' field.
+     #          Add option `:binary_escape => false` to do straight binary -- unclear
+     #          what Solr's documented behavior is when you do this, and add a string
+     #          with binary control chars to solr. May do different things in diff
+     #          Solr versions, including raising exceptions.
+     def serialized_marc(options)
+       options[:format] = options[:format].to_s
+       raise ArgumentError.new("Need :format => [binary|xml|json] arg") unless %w{binary xml json}.include?(options[:format])
+
+       lambda do |record, accumulator, context|
+         case options[:format]
+         when "binary"
+           binary = record.to_marc
+           binary = Base64.encode64(binary) unless options[:binary_escape] == false
+           accumulator << binary
+         when "xml"
+           # ruby-marc #to_xml returns a REXML object at time of this writing, bah!
+           # call #to_s on it. Hopefully that'll be forward compatible.
+           accumulator << record.to_xml.to_s
+         when "json"
+           accumulator << JSON.dump(record.to_hash)
+         end
+       end
+     end
+
+     # Takes the whole record, by default from tags 100 to 899 inclusive,
+     # all subfields, and adds them to output. Subfields in a field are all
+     # joined by space by default.
+     #
+     # options:
+     # [:from] default "100", only tags >= lexicographically
+     # [:to] default "899", only tags <= lexicographically
+     # [:seperator] how to join subfields, default space, nil means don't join
+     #
+     # All fields in from-to must be marc DATA fields (not control fields), or weirdness.
+     #
+     # You can always run this thing multiple times on the same field if you need
+     # non-contiguous ranges of fields.
+     def extract_all_marc_values(options = {})
+       options = {:from => "100", :to => "899", :seperator => ' '}.merge(options)
+
+       lambda do |record, accumulator, context|
+         record.each do |field|
+           next unless field.tag >= options[:from] && field.tag <= options[:to]
+           subfield_values = field.subfields.collect {|sf| sf.value}
+           next unless subfield_values.length > 0
+
+           if options[:seperator]
+             accumulator << subfield_values.join( options[:seperator] )
+           else
+             accumulator.concat subfield_values
+           end
+         end
+       end
+     end
+
+
+     # Trims punctuation mostly from the end, and occasionally from the beginning,
+     # of a string. Not nearly as complex logic as SolrMarc's version, just
+     # pretty simple.
+     #
+     # Removes:
+     # * trailing: comma, slash, semicolon, colon (possibly followed by whitespace)
+     # * trailing period if it is preceded by at least three letters (possibly followed by whitespace)
+     # * single square bracket characters if they are the start and/or end
+     #   chars and there are no internal square brackets.
+     #
+     # Returns the altered string, doesn't change the original arg.
+     def self.trim_punctuation(str)
+       str = str.sub(/[ ,\/;:] *\Z/, '')
+       str = str.sub(/(\w\w\w)\. *\Z/, '\1')
+       str = str.sub(/\A\[?([^\[\]]+)\]?\Z/, '\1')
+       return str
+     end
+
+     def self.first!(arr)
+       # kind of esoteric, but slice! used this way keeps only the first
+       # element, mutating the array in place, yep
+       arr.slice!(1, arr.length)
+     end
+
+   end
+ end
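The trim_punctuation rules can be exercised standalone; in this editor's sketch the three substitutions are copied out of the macro above into a bare method (not an import of the gem).

```ruby
# The three substitutions from Marc21.trim_punctuation, copied standalone.
def trim_punctuation(str)
  # trailing comma, slash, semicolon, colon (plus any trailing spaces)
  str = str.sub(/[ ,\/;:] *\Z/, '')
  # trailing period preceded by at least three word characters
  str = str.sub(/(\w\w\w)\. *\Z/, '\1')
  # single square brackets at start and/or end, no internal brackets
  str = str.sub(/\A\[?([^\[\]]+)\]?\Z/, '\1')
  str
end

trim_punctuation("Chomsky, Noam,")    # => "Chomsky, Noam"
trim_punctuation("Three essays.")     # => "Three essays"
trim_punctuation("[sound recording]") # => "sound recording"
```

Note each rule fires at most once per call, so strings with stacked punctuation (e.g. a colon after a period) keep whatever the first pass leaves behind.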