traject_plus 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 69aca3e99ad81702e279a2f87b62498bfa500d34
4
+ data.tar.gz: 381ea6164dcef901c9e8086eab86cc621aa9e6e2
5
+ SHA512:
6
+ metadata.gz: ea73af5076668db42f5c04b7ecb2840f55c8d5869f3563744b213371500761d897ad7c34aa452628d5c26d0e75097a118609832b06b0853a65fac413fb6129fc
7
+ data.tar.gz: 4d27d2554eb9204a7cbd3acba7329558ac3fadbf61591026e8d2ab2b5353e17f8a28b90868cff6d12b027c180d3ad86154ef32d97d92e5b37d065da81ce61d77
data/.gitignore ADDED
@@ -0,0 +1,12 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
10
+
11
+ # rspec failure tracking
12
+ .rspec_status
data/.rspec ADDED
@@ -0,0 +1,2 @@
1
+ --format documentation
2
+ --color
data/.travis.yml ADDED
@@ -0,0 +1,5 @@
1
+ sudo: false
2
+ language: ruby
3
+ rvm:
4
+ - 2.4.1
5
+ before_install: gem install bundler -v 1.15.4
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at cabeer@stanford.edu. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [http://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: http://contributor-covenant.org
74
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ source "https://rubygems.org"
2
+
3
+ git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }
4
+
5
+ # Specify your gem's dependencies in traject_plus.gemspec
6
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,13 @@
1
+ © 2017 The Board of Trustees of the Leland Stanford Junior University.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License");
4
+ you may not use this file except in compliance with the License.
5
+ You may obtain a copy of the License at
6
+
7
+ http://www.apache.org/licenses/LICENSE-2.0
8
+
9
+ Unless required by applicable law or agreed to in writing, software
10
+ distributed under the License is distributed on an "AS IS" BASIS,
11
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ See the License for the specific language governing permissions and
13
+ limitations under the License.
data/README.md ADDED
@@ -0,0 +1,120 @@
1
+ # TrajectPlus
2
+
3
+ TrajectPlus is a number of useful additions to [Traject](https://github.com/traject/traject)
4
+
5
+ ## Features
6
+
7
+ ### New readers:
8
+ #### TrajectPlus::JsonReader
9
+ ```ruby
10
+ provide 'reader_class_name', 'TrajectPlus::JsonReader'
11
+ to_field 'title', extract_json('$.label')
12
+ ```
13
+
14
+ #### TrajectPlus::CSVReader
15
+ ```ruby
16
+ provide 'reader_class_name', 'TrajectPlus::CSVReader'
17
+ to_field 'title', column('Record Title')
18
+ ```
19
+ #### TrajectPlus::XMLReader
20
+ ```ruby
21
+ provide 'reader_class_name', 'TrajectPlus::XMLReader'
22
+ to_field 'title', extract_xml('/*/mods:language/mods:scriptTerm',
23
+ { 'mods' => 'http://www.loc.gov/mods/v3' })
24
+ ```
25
+
26
+ There are also XML macros for specific formats (MODS, TEI, FGCD):
27
+
28
+ For example:
29
+ ```ruby
30
+ to_field 'title', extract_mods('/*/mods:language/mods:scriptTerm')
31
+ to_field 'cho_description', extract_tei("/*/tei:teiHeader/tei:fileDesc/tei:sourceDesc/tei:msDesc/tei:msContents/tei:summary")
32
+ extract_fgdc('/*/idinfo/citation/citeinfo/geoform')
33
+ ```
34
+
35
+ ### New macros:
36
+ * transform_values
37
+ * first
38
+ * conditional
39
+ * from_settings
40
+ * match
41
+ * format
42
+ * translation_map
43
+ * `accumulate` : Streamlines creation of lambdas that need no additional parsing or filtering
44
+
45
+ ```ruby
46
+ to_field 'x', accumulate { |record, context| record.values }
47
+ ```
48
+
49
+ * `copy` : Copies values from one output field to another
50
+
51
+ ```ruby
52
+ to_field 'x', copy('y')
53
+ ```
54
+
55
+ * `compose` : easy create sub-transformations
56
+
57
+ ```ruby
58
+ compose do
59
+ to_field 'x', accumulate { |record, context| record.x }
60
+ end
61
+
62
+ # => { 'x' => [1, 2, 3]}
63
+ ```
64
+
65
+ ```ruby
66
+ compose('subfield') do
67
+ to_field 'x', accumulate { |record, context| record.x }
68
+ end
69
+
70
+ # => { 'subfield' => [{ 'x' => [1, 2, 3]} ]}
71
+ ```
72
+
73
+ ```ruby
74
+ compose ->(record, accumulator, context) { record.subfield } do
75
+ to_field 'x', accumulate { |subfield, context| subfield.x }
76
+ end
77
+ # => { 'x' => [1, 2, 3]}
78
+ ```
79
+
80
+ * `transform`, supporting a variety of string methods: 'split', 'concat', 'prepend', 'gsub', 'encode', 'insert', 'strip', 'upcase', 'downcase', 'capitalize'
81
+
82
+ These can be applied to any extract function:
83
+
84
+ ```ruby
85
+ to_field 'title', extract: extract_xml('title'), transform: transform(gsub: ['|', ' - '])
86
+ ```
87
+
88
+ ## Installation
89
+
90
+ Add this line to your application's Gemfile:
91
+
92
+ ```ruby
93
+ gem 'traject_plus'
94
+ ```
95
+
96
+ And then execute:
97
+
98
+ $ bundle
99
+
100
+ Or install it yourself as:
101
+
102
+ $ gem install traject_plus
103
+
104
+ ## Usage
105
+
106
+ TODO: Write usage instructions here
107
+
108
+ ## Development
109
+
110
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
111
+
112
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
113
+
114
+ ## Contributing
115
+
116
+ Bug reports and pull requests are welcome on GitHub at https://github.com/sul-dlss/traject_plus. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
117
+
118
+ ## Code of Conduct
119
+
120
+ Everyone interacting in the TrajectPlus project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/sul-dlss/traject_plus/blob/master/CODE_OF_CONDUCT.md).
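The `compose` examples in the README depend on the selector lambda filling the accumulator, because `TrajectPlus::Indexer::ComposeStep#execute` ignores the lambda's return value. Below is a minimal plain-Ruby sketch of that flow that does not need traject. It makes simplifying assumptions: `compose_sketch` is a hypothetical helper, and a plain lambda stands in for the nested indexer's `map_record`.

```ruby
# Hypothetical, stdlib-only model of the `compose` flow.
# `mapping` stands in for the nested indexer's map_record.
def compose_sketch(record, output, field_name: nil, selector: nil, &mapping)
  sub_records = []
  if selector
    # As in ComposeStep#execute, only what the selector pushes onto the
    # accumulator is used; its return value is ignored.
    selector.call(record, sub_records, nil)
  else
    sub_records << record
  end

  sub_records.each do |sub|
    result = mapping.call(sub)
    if field_name
      # Nest each mapped hash under the field name
      (output[field_name] ||= []) << result
    else
      # Merge the mapped fields into the top-level output
      result.each { |k, v| (output[k] ||= []).concat(Array(v)) }
    end
  end
  output
end

record = { 'subfield' => { 'x' => [1, 2, 3] } }
x_of = ->(r) { { 'x' => r['x'] } }
push_subfield = ->(r, acc, _ctx) { acc << r['subfield'] }

nested = compose_sketch(record, {}, field_name: 'subfield', selector: push_subfield, &x_of)
flat = compose_sketch(record, {}, selector: push_subfield, &x_of)
broken = compose_sketch(record, {}, selector: ->(r, _acc, _ctx) { r['subfield'] }, &x_of)
```

Here `nested` is `{ 'subfield' => [{ 'x' => [1, 2, 3] }] }` and `flat` is `{ 'x' => [1, 2, 3] }`. `broken` is `{}` because its selector returns the sub-record instead of pushing it onto the accumulator.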
data/Rakefile ADDED
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "traject_plus"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
data/lib/traject_plus/csv_reader.rb ADDED
@@ -0,0 +1,22 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'csv'
4
+
5
+ # Reads in CSV records for traject
6
+ module TrajectPlus
7
+ class CsvReader
8
+ # @param input_stream [File]
9
+ # @param settings [Traject::Indexer::Settings]
10
+ def initialize(input_stream, settings)
11
+ @settings = Traject::Indexer::Settings.new settings
12
+ @input_stream = input_stream
13
+ @csv = CSV.parse(input_stream, headers: true)
14
+ end
15
+
16
+ def each(*args, &block)
17
+ csv.each(*args, &block)
18
+ end
19
+
20
+ attr_reader :csv
21
+ end
22
+ end
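`TrajectPlus::CsvReader` parses the whole input with `headers: true` and yields `CSV::Row` objects, which the `column` macro then reads by header or index. Below is a minimal stdlib-only sketch of that flow, assuming only Ruby's `csv` library. `column_lambda` is a hypothetical stand-in for `TrajectPlus::Macros::Csv#column` and does not support transformation options.

```ruby
require 'csv'
require 'stringio'

# Hypothetical stand-in for TrajectPlus::Macros::Csv#column (no transform
# options): returns a lambda with traject's (record, accumulator, context)
# signature that skips empty cells.
def column_lambda(header_or_index)
  lambda do |row, accumulator, _context|
    value = row[header_or_index].to_s
    next if value.empty? # `next` exits the lambda without adding anything
    accumulator << value
  end
end

# Like TrajectPlus::CsvReader: parse everything up front, using the header row
input = StringIO.new("Record Title,Date\nFirst,2017\n,2018\n")
rows = CSV.parse(input, headers: true)

title = column_lambda('Record Title')
titles = rows.map do |row|
  accumulator = []
  title.call(row, accumulator, nil)
  accumulator
end
```

`titles` is `[["First"], []]`. The second row's empty cell is parsed as `nil` and contributes nothing.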
data/lib/traject_plus/extraction.rb ADDED
@@ -0,0 +1,116 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'active_support/core_ext/object/blank'
4
+ module TrajectPlus
5
+ module Extraction
6
+ def self.apply_extraction_options(result, options = {})
7
+ TransformPipeline.new(options).transform(result)
8
+ end
9
+
10
+ # Pipeline for transforming extracted values into normalized values
11
+ class TransformPipeline
12
+ attr_reader :options
13
+
14
+ def initialize(options)
15
+ @options = options
16
+ end
17
+
18
+ def transform(values)
19
+ options.inject(values) do |memo, (step, params)|
20
+ if step.respond_to? :call
21
+ memo.flat_map { |v| step.call(v, params) }
22
+ else
23
+ public_send(step, memo, *params) # splat so gsub: ['a', 'b'] calls gsub('a', 'b')
24
+ end
25
+ end
26
+ end
27
+
28
+ # Examples:
29
+ #
30
+ # to_field 'x', split: '/' # 'a/b/c' => 'a', 'b', 'c'
31
+ # to_field 'x', concat: '123' # 'abc' to 'abc123'
32
+ # to_field 'x', prepend: '321' # 'abc' to '321abc'
33
+ # to_field 'x', gsub: ['a', 'b'] # 'abc' to 'bbc'
34
+ # to_field 'x', gsub: [/[abc]/, 'b'] # 'abc' to 'bbb'
35
+ # to_field 'x', encode: 'UTF-8' # 'abc' to 'abc'
36
+ # to_field 'x', insert: [1, 'x'] # 'abc' to 'axbc'
37
+ ['split', 'concat', 'prepend', 'gsub', 'encode', 'insert'].each do |method|
38
+ define_method(method) do |values, *args|
39
+ values.flat_map do |v|
40
+ v.public_send(method, *args)
41
+ end
42
+ end
43
+ end
44
+
45
+ # to_field 'x', strip: true # ' abc ' to 'abc'
46
+ # to_field 'x', upcase: true # 'abc' to 'ABC'
47
+ # to_field 'x', downcase: true # 'ABC' to 'abc'
48
+ # to_field 'x', capitalize: true # 'abc' to 'Abc'
49
+ ['strip', 'upcase', 'downcase', 'capitalize'].each do |method|
50
+ define_method(method) do |values, *args|
51
+ values.map(&(method.to_sym))
52
+ end
53
+ end
54
+
55
+ # to_field 'x', match: [/([aeiou])/, 1] # 'abc' => 'a'
56
+ def match(values, match, index)
57
+ values.flat_map do |v|
58
+ v.match(match) do |m|
59
+ m[index]
60
+ end
61
+ end.compact
62
+ end
63
+
64
+ # to_field 'x', format: '-> %s <-' # 'abc' to '-> abc <-'
65
+ def format(values, insert_string)
66
+ values.flat_map do |v|
67
+ insert_string % v
68
+ end
69
+ end
70
+
71
+ # to_field 'x', select: lambda { |x| x =~ /a/} # ['a', 'b'] => ['a']
72
+ def select(values, block)
73
+ values.select(&block)
74
+ end
75
+
76
+ # to_field 'x', reject: lambda { |x| x =~ /a/} # ['a', 'b'] => ['b']
77
+ def reject(values, block)
78
+ values.reject(&block)
79
+ end
80
+
81
+ # to_field 'x', min: 1 # ['a', 'b'] => ['a']
82
+ def min(values, count, block = nil)
83
+ if block.present?
84
+ values.min(count, &block)
85
+ else
86
+ values.min(count)
87
+ end
88
+ end
89
+
90
+ # to_field 'x', max: 1 # ['a', 'b'] => ['b']
91
+ def max(values, count, block = nil)
92
+ if block.present?
93
+ values.max(count, &block)
94
+ else
95
+ values.max(count)
96
+ end
97
+ end
98
+
99
+ # Using a named Traject translation map:
100
+ # to_field 'x', translation_map: 'types' # 'x' => 'mapped x'
101
+ def translation_map(values, *maps)
102
+ translation_map = Traject::TranslationMap.new(*maps.flatten)
103
+ translation_map.translate_array Array(values)
104
+ end
105
+
106
+ # to_field 'x', default: 'y' # nil => 'y'
107
+ def default(values, default_value)
108
+ if values.present?
109
+ values
110
+ else
111
+ Array(default_value)
112
+ end
113
+ end
114
+ end
115
+ end
116
+ end
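`TransformPipeline#transform` folds over the options hash in insertion order, so each step sees the output of the previous one. The sketch below is a hypothetical, stripped-down re-creation (`MiniPipeline` is not part of the gem) that supports only a few steps. It also shows why step parameters need to be splatted when they are forwarded, so that `gsub: ['|', ' - ']` becomes `gsub('|', ' - ')`.

```ruby
# Hypothetical, stripped-down re-creation of the TransformPipeline idea:
# each options key names a step, applied in insertion order to the whole list.
class MiniPipeline
  STRING_METHODS = %i[split gsub prepend].freeze

  def initialize(options)
    @options = options
  end

  def transform(values)
    @options.inject(values) do |memo, (step, params)|
      if STRING_METHODS.include?(step)
        # Splatting turns gsub: ['|', ' - '] into gsub('|', ' - ')
        # and split: '|' into split('|'); flat_map flattens split results.
        memo.flat_map { |v| v.public_send(step, *params) }
      elsif step == :strip
        memo.map(&:strip) # the `true` parameter is ignored
      else
        raise ArgumentError, "unsupported step: #{step}"
      end
    end
  end
end

split_then_prefix = MiniPipeline.new(split: '|', strip: true, prepend: '> ')
result = split_then_prefix.transform(['a | b'])
replaced = MiniPipeline.new(gsub: ['|', ' - ']).transform(['a|b'])
```

`result` is `["> a", "> b"]` and `replaced` is `["a - b"]`. Because steps run in hash order, moving `strip: true` after `prepend:` would give a different result.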
data/lib/traject_plus/indexer/step.rb ADDED
@@ -0,0 +1,77 @@
1
+ module TrajectPlus
2
+ module Indexer
3
+ class ToFieldStep < Traject::Indexer::ToFieldStep
4
+ def initialize(fieldname, lambda, block, source_location, single: false)
5
+ super(fieldname, lambda, block, source_location)
6
+
7
+ @single = single
8
+ end
9
+
10
+ def single?
11
+ !!@single
12
+ end
13
+
14
+ # disable to_field_step? so we can implement our own version of add_accumulator_to_context
15
+ def to_field_step?
16
+ false
17
+ end
18
+
19
+ def execute(context)
20
+ accumulator = super
21
+
22
+ add_accumulator_to_context!(accumulator, context)
23
+ end
24
+
25
+ def add_accumulator_to_context!(accumulator, context)
26
+ self.class.add_accumulator_to_context!(self, field_name, accumulator, context)
27
+ end
28
+
29
+ def self.add_accumulator_to_context!(field, field_name, accumulator, context)
30
+ accumulator.compact! unless context.settings[Traject::Indexer::ALLOW_NIL_VALUES]
31
+ return if accumulator.empty? && !context.settings[Traject::Indexer::ALLOW_EMPTY_FIELDS]
32
+
33
+ if field.single?
34
+ context.output_hash[field_name] = accumulator.first if accumulator.length > 0
35
+ else
36
+ context.output_hash[field_name] ||= []
37
+
38
+ existing_accumulator = context.output_hash[field_name].concat(accumulator)
39
+ existing_accumulator.uniq! unless context.settings[Traject::Indexer::ALLOW_DUPLICATE_VALUES]
40
+ end
41
+ end
42
+ end
43
+
44
+ class ComposeStep < ToFieldStep
45
+ attr_reader :indexer
46
+
47
+ def initialize(fieldname, lambda, block, source_location, indexer)
48
+ @indexer = indexer
49
+ self.field_name = fieldname
50
+ self.lambda = lambda
51
+ self.block = block
52
+ self.source_location = source_location
53
+ end
54
+
55
+ def execute(context)
56
+ accumulator = []
57
+ if lambda
58
+ lambda.call(context.source_record, accumulator, context)
59
+ else
60
+ accumulator << context.source_record
61
+ end
62
+
63
+ accumulator.map do |record|
64
+ result = indexer.map_record(record)
65
+
66
+ if field_name
67
+ self.class.add_accumulator_to_context! self, field_name, [result], context
68
+ else
69
+ result.each do |k, v|
70
+ self.class.add_accumulator_to_context! self, k, Array(v), context
71
+ end
72
+ end
73
+ end
74
+ end
75
+ end
76
+ end
77
+ end
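`ToFieldStep.add_accumulator_to_context!` controls how each accumulator is merged into the output hash:

- Nils are dropped unless `ALLOW_NIL_VALUES` is set.
- Empty results are skipped unless `ALLOW_EMPTY_FIELDS` is set.
- `single: true` fields keep only the first value.
- Multi-valued fields are concatenated and de-duplicated unless `ALLOW_DUPLICATE_VALUES` is set.

Below is a plain-Ruby sketch of those rules. `add_to_output` is a hypothetical helper, and its keyword flags stand in for the traject settings.

```ruby
# Hypothetical restatement of the merge rules in
# TrajectPlus::Indexer::ToFieldStep.add_accumulator_to_context!;
# keyword flags stand in for the traject ALLOW_* settings. Unlike the gem,
# this sketch does not mutate the caller's accumulator.
def add_to_output(output, field_name, accumulator, single: false,
                  allow_nil: false, allow_empty: false, allow_duplicates: false)
  values = allow_nil ? accumulator : accumulator.compact
  return output if values.empty? && !allow_empty

  if single
    # Single-valued fields keep (and overwrite with) the first value only
    output[field_name] = values.first unless values.empty?
  else
    output[field_name] ||= []
    output[field_name].concat(values)
    output[field_name].uniq! unless allow_duplicates
  end
  output
end

out = {}
add_to_output(out, 'x', ['a', nil, 'b'])
add_to_output(out, 'x', ['b', 'c'])
add_to_output(out, 'id', ['1', '2'], single: true)
add_to_output(out, 'empty', [nil])
```

`out` ends up as `{ 'x' => ['a', 'b', 'c'], 'id' => '1' }`, and the `'empty'` field is never created.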
data/lib/traject_plus/json_reader.rb ADDED
@@ -0,0 +1,26 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Reads in JSON records for traject
4
+ module TrajectPlus
5
+ class JsonReader
6
+ # @param input_stream [File]
7
+ # @param settings [Traject::Indexer::Settings]
8
+ def initialize(input_stream, settings)
9
+ @settings = Traject::Indexer::Settings.new settings
10
+ @input_stream = input_stream
11
+ @json = JSON.parse(input_stream.read)
12
+ end
13
+
14
+ attr_reader :json
15
+
16
+ def each(&block)
17
+ return to_enum(:each) unless block_given?
18
+
19
+ if json.is_a? Array
20
+ json.each(&block)
21
+ else
22
+ yield json
23
+ end
24
+ end
25
+ end
26
+ end
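`TrajectPlus::JsonReader#each` treats a top-level JSON array as a list of records and any other top-level value as a single record. Called without a block, it returns an `Enumerator`. Below is a stdlib-only sketch of that behaviour. `TinyJsonReader` is a hypothetical class, and it reads from a `StringIO` instead of a `File`.

```ruby
require 'json'
require 'stringio'

# Hypothetical class mirroring TrajectPlus::JsonReader#each
class TinyJsonReader
  def initialize(input_stream)
    @json = JSON.parse(input_stream.read)
  end

  def each(&block)
    # Without a block, hand back an Enumerator (as the gem's reader does)
    return to_enum(:each) unless block_given?

    if @json.is_a?(Array)
      @json.each(&block) # one record per array element
    else
      yield @json # a single top-level object is one record
    end
  end
end

many = TinyJsonReader.new(StringIO.new('[{"label":"a"},{"label":"b"}]'))
one = TinyJsonReader.new(StringIO.new('{"label":"c"}'))
many_labels = many.each.map { |record| record['label'] }
one_labels = one.each.map { |record| record['label'] }
```

`many_labels` is `["a", "b"]` and `one_labels` is `["c"]`.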
data/lib/traject_plus/macros/csv.rb ADDED
@@ -0,0 +1,18 @@
1
+ # frozen_string_literal: true
2
+
3
+ module TrajectPlus
4
+ module Macros
5
+ # Macros for extracting values from CSV rows
6
+ module Csv
7
+ # @param header_or_index [String, Integer] the field header or index to accumulate
8
+ def column(header_or_index, options = {})
9
+ lambda do |row, accumulator, _context|
10
+ return if row[header_or_index].to_s.empty?
11
+ result = Array(row[header_or_index].to_s)
12
+ result = TrajectPlus::Extraction.apply_extraction_options(result, options)
13
+ accumulator.concat(result)
14
+ end
15
+ end
16
+ end
17
+ end
18
+ end
data/lib/traject_plus/macros/fgdc.rb ADDED
@@ -0,0 +1,15 @@
1
+ # frozen_string_literal: true
2
+
3
+ module TrajectPlus
4
+ module Macros
5
+ # Macros for extracting FGDC values from Nokogiri documents
6
+ module FGDC
7
+ NS = { fgdc: 'http://www.fgdc.gov/metadata/fgdc-std-001-1998.dtd' }.freeze
8
+
9
+ # @param xpath [String] the xpath query expression
10
+ def extract_fgdc(xpath, options = {})
11
+ extract_xml(xpath, NS, options)
12
+ end
13
+ end
14
+ end
15
+ end
data/lib/traject_plus/macros/json.rb ADDED
@@ -0,0 +1,20 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'jsonpath'
4
+
5
+ module TrajectPlus
6
+ module Macros
7
+ # Macros for extracting values from JSON documents
8
+ module JSON
9
+ # @param path [String] the jsonpath query expression
10
+ # @param options [Hash] transformation steps to apply, e.g. :strip or :split (see TrajectPlus::Extraction::TransformPipeline)
11
+ def extract_json(path, options = {})
12
+ lambda do |json, accumulator, _context|
13
+ result = Array(JsonPath.on(json, path))
14
+ result = TrajectPlus::Extraction.apply_extraction_options(result, options)
15
+ accumulator.concat(result)
16
+ end
17
+ end
18
+ end
19
+ end
20
+ end
data/lib/traject_plus/macros/mods.rb ADDED
@@ -0,0 +1,18 @@
1
+ # frozen_string_literal: true
2
+
3
+ module TrajectPlus
4
+ module Macros
5
+ # Macros for extracting MODS values from Nokogiri documents
6
+ module Mods
7
+ NS = { mods: 'http://www.loc.gov/mods/v3',
8
+ rdf: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
9
+ dc: 'http://purl.org/dc/elements/1.1/',
10
+ xlink: 'http://www.w3.org/1999/xlink' }.freeze
11
+
12
+ # @param xpath [String] the xpath query expression
13
+ def extract_mods(xpath, options = {})
14
+ extract_xml(xpath, NS, options)
15
+ end
16
+ end
17
+ end
18
+ end
data/lib/traject_plus/macros/tei.rb ADDED
@@ -0,0 +1,14 @@
1
+ # frozen_string_literal: true
2
+ module TrajectPlus
3
+ module Macros
4
+ # Macros for extracting TEI values from Nokogiri documents
5
+ module Tei
6
+ NS = { tei: 'http://www.tei-c.org/ns/1.0' }.freeze
7
+
8
+ # @param xpath [String] the xpath query expression
9
+ def extract_tei(xpath, options = {})
10
+ extract_xml(xpath, NS, options)
11
+ end
12
+ end
13
+ end
14
+ end
data/lib/traject_plus/macros/xml.rb ADDED
@@ -0,0 +1,19 @@
1
+ # frozen_string_literal: true
2
+
3
+ module TrajectPlus
4
+ module Macros
5
+ # Macros for extracting XML values from Nokogiri documents
6
+ module Xml
7
+ # @param xpath [String] the xpath query expression
8
+ # @param namespaces [Hash<String,String>] The namespaces for the xpath query
9
+ # @param options [Hash] transformation steps to apply, e.g. :strip or :split (see TrajectPlus::Extraction::TransformPipeline)
10
+ def extract_xml(xpath, namespaces, options = {})
11
+ lambda do |xml, accumulator, _context|
12
+ result = xml.xpath(xpath, namespaces).map(&:text)
13
+ result = TrajectPlus::Extraction.apply_extraction_options(result, options)
14
+ accumulator.concat(result)
15
+ end
16
+ end
17
+ end
18
+ end
19
+ end
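`extract_xml` requires a namespace hash because the XPath query runs with explicit prefix bindings. The `extract_mods`, `extract_tei` and `extract_fgdc` macros only pre-fill that hash. The gem itself uses Nokogiri, but this sketch swaps in Ruby's REXML library to stay free of third-party gems. `extract_xml_sketch` is therefore a hypothetical analogue, not the gem's implementation.

```ruby
require 'rexml/document'

# Same MODS namespace URI as TrajectPlus::Macros::Mods::NS
MODS_NS = { 'mods' => 'http://www.loc.gov/mods/v3' }

# Hypothetical REXML-based analogue of extract_xml: returns a lambda that
# runs a namespaced XPath query and appends each matching element's text.
def extract_xml_sketch(xpath, namespaces)
  lambda do |doc, accumulator, _context|
    accumulator.concat(REXML::XPath.match(doc, xpath, namespaces).map(&:text))
  end
end

xml = <<~XML
  <mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
    <mods:language><mods:scriptTerm>Latn</mods:scriptTerm></mods:language>
  </mods:mods>
XML

doc = REXML::Document.new(xml)
script = extract_xml_sketch('/*/mods:language/mods:scriptTerm', MODS_NS)
values = []
script.call(doc, values, nil)
```

After the call, `values` holds the text of the matched `mods:scriptTerm` element.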
data/lib/traject_plus/macros.rb ADDED
@@ -0,0 +1,81 @@
1
+ # frozen_string_literal: true
2
+ module TrajectPlus
3
+ module Macros
4
+ # construct a structured hash from values extracted by traject lambdas
5
+ def transform_values(context, hash)
6
+ hash.transform_values do |lambdas|
7
+ accumulator = []
8
+ Array(lambdas).each do |lambda|
9
+ lambda.call(context.source_record, accumulator, context)
10
+ end
11
+ accumulator
12
+ end
13
+ end
14
+
15
+ # try a bunch of macros and short-circuit after one returns values
16
+ def first(*macros)
17
+ lambda do |record, accumulator, context|
18
+ macros.lazy.map do |block|
19
+ block.call(record, accumulator, context)
20
+ end.reject(&:blank?).first
21
+ end
22
+ end
23
+
24
+ def accumulate(&block)
25
+ lambda do |record, accumulator, context|
26
+ Array(block.call(record, context)).each do |v|
27
+ accumulator << v if v.present?
28
+ end
29
+ end
30
+ end
31
+
32
+ # only accumulate values if a condition is met
33
+ def conditional(condition, block)
34
+ lambda do |record, accumulator, context|
35
+ if condition.call(record, context)
36
+ block.call(record, accumulator, context)
37
+ end
38
+ end
39
+ end
40
+
41
+ def from_settings(field)
42
+ accumulate do |record, context|
43
+ context.settings.fetch(field)
44
+ end
45
+ end
46
+
47
+ def copy(field)
48
+ accumulate do |_record, context|
49
+ Array(context.output_hash[field])
50
+ end
51
+ end
52
+
53
+ def transform(options = {})
54
+ lambda do |record, accumulator, context|
55
+ results = TrajectPlus::Extraction.apply_extraction_options(accumulator, options)
56
+ accumulator.replace(results)
57
+ end
58
+ end
59
+
60
+ # apply the same mapping to multiple fields
61
+ def to_fields(fields, mapping_method)
62
+ fields.each { |field| to_field field, mapping_method }
63
+ end
64
+
65
+ def to_field(field_name, aLambda = nil, extract: nil, transform: nil, **namedArgs, &block)
66
+ @index_steps << TrajectPlus::Indexer::ToFieldStep.new(field_name, extract || aLambda, transform || block, Traject::Util.extract_caller_location(caller.first), **namedArgs)
67
+ end
68
+
69
+ def compose(fieldname = nil, aLambda = nil, extract: nil, transform: nil, &block)
70
+ if fieldname.is_a? Proc
71
+ aLambda ||= fieldname
72
+ fieldname = nil
73
+ end
74
+
75
+ indexer = self.class.new(settings)
76
+ indexer.instance_eval(&block)
77
+
78
+ @index_steps << TrajectPlus::Indexer::ComposeStep.new(fieldname, extract || aLambda, transform, Traject::Util.extract_caller_location(caller.first), indexer)
79
+ end
80
+ end
81
+ end
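The macros in `TrajectPlus::Macros` are lambda builders. Each returns a lambda with traject's `(record, accumulator, context)` signature, so they can be nested freely. Below is a stdlib-only sketch of `accumulate`, `conditional` and `first`, with these assumptions:

- `MacroSketch` is hypothetical.
- ActiveSupport's `present?` is approximated with a nil/blank-string check.
- This `first` gives each candidate its own scratch accumulator instead of inspecting return values as the gem's version does.

```ruby
# Hypothetical, stdlib-only re-creations of three TrajectPlus macros.
module MacroSketch
  module_function

  def accumulate(&block)
    lambda do |record, accumulator, context|
      Array(block.call(record, context)).each do |v|
        # Approximates ActiveSupport's `present?` for nil and strings
        accumulator << v unless v.nil? || v.to_s.strip.empty?
      end
    end
  end

  def conditional(condition, macro)
    lambda do |record, accumulator, context|
      macro.call(record, accumulator, context) if condition.call(record, context)
    end
  end

  # Variant of `first`: each candidate writes into a scratch array, so
  # "did it find anything?" does not depend on the candidate's return value.
  def first(*macros)
    lambda do |record, accumulator, context|
      macros.each do |macro|
        found = []
        macro.call(record, found, context)
        next if found.empty?

        accumulator.concat(found)
        break
      end
    end
  end
end

title = MacroSketch.first(
  MacroSketch.accumulate { |r, _ctx| r['main_title'] },
  MacroSketch.accumulate { |r, _ctx| r['alt_title'] }
)
only_maps = MacroSketch.conditional(->(r, _ctx) { r['type'] == 'map' }, title)

acc1 = []
title.call({ 'alt_title' => 'Alt' }, acc1, nil)
acc2 = []
only_maps.call({ 'type' => 'book', 'main_title' => 'T' }, acc2, nil)
acc3 = []
only_maps.call({ 'type' => 'map', 'main_title' => 'T' }, acc3, nil)
```

The first call falls through to the alternate title, so `acc1` is `["Alt"]`. `acc2` stays empty because the condition fails. `acc3` is `["T"]`.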
data/lib/traject_plus/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module TrajectPlus
2
+ VERSION = "0.0.1"
3
+ end
data/lib/traject_plus/xml_reader.rb ADDED
@@ -0,0 +1,20 @@
1
+ # frozen_string_literal: true
2
+
3
+ module TrajectPlus
4
+ # Reads in XML records for traject
5
+ class XmlReader
6
+ # @param input_stream [File]
7
+ # @param settings [Traject::Indexer::Settings]
8
+ def initialize(input_stream, settings)
9
+ @settings = Traject::Indexer::Settings.new settings
10
+ @input_stream = input_stream
11
+ @xml = Nokogiri::XML(input_stream)
12
+ end
13
+
14
+ attr_reader :xml
15
+
16
+ def each
17
+ yield(xml)
18
+ end
19
+ end
20
+ end
data/lib/traject_plus.rb ADDED
@@ -0,0 +1,20 @@
1
+ require 'traject_plus/version'
2
+ require 'traject'
3
+
4
+ module TrajectPlus
5
+ require 'traject_plus/indexer/step'
6
+
7
+ require 'traject_plus/macros'
8
+ require 'traject_plus/extraction'
9
+
10
+ require 'traject_plus/csv_reader'
11
+ require 'traject_plus/json_reader'
12
+ require 'traject_plus/xml_reader'
13
+
14
+ require 'traject_plus/macros/csv'
15
+ require 'traject_plus/macros/fgdc'
16
+ require 'traject_plus/macros/json'
17
+ require 'traject_plus/macros/mods'
18
+ require 'traject_plus/macros/tei'
19
+ require 'traject_plus/macros/xml'
20
+ end
data/traject_plus.gemspec ADDED
@@ -0,0 +1,30 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path("../lib", __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require "traject_plus/version"
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "traject_plus"
8
+ spec.version = TrajectPlus::VERSION
9
+ spec.authors = ["Chris Beer", "Christina Harlow", "Aaron Collier", "Justin Coyne"]
10
+ spec.email = ["cabeer@stanford.edu", "cmharlow@stanford.edu", "amcollie@stanford.edu", "jcoyne85@stanford.edu"]
11
+
12
+ spec.summary = "Extensions to Traject for non-MARC formats"
13
+ spec.description = "Extensions to Traject for non-MARC formats"
14
+ spec.homepage = "https://github.com/sul-dlss/traject_plus"
15
+
16
+ spec.files = `git ls-files -z`.split("\x0").reject do |f|
17
+ f.match(%r{^(test|spec|features)/})
18
+ end
19
+ spec.bindir = "exe"
20
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
21
+ spec.require_paths = ["lib"]
22
+
23
+ spec.add_dependency 'activesupport'
24
+ spec.add_dependency 'jsonpath'
25
+ spec.add_dependency 'traject'
26
+
27
+ spec.add_development_dependency "bundler", "~> 1.15"
28
+ spec.add_development_dependency "rake", "~> 10.0"
29
+ spec.add_development_dependency "rspec", "~> 3.0"
30
+ end
metadata ADDED
@@ -0,0 +1,158 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: traject_plus
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ platform: ruby
6
+ authors:
7
+ - Chris Beer
8
+ - Christina Harlow
9
+ - Aaron Collier
10
+ - Justin Coyne
11
+ autorequire:
12
+ bindir: exe
13
+ cert_chain: []
14
+ date: 2017-12-04 00:00:00.000000000 Z
15
+ dependencies:
16
+ - !ruby/object:Gem::Dependency
17
+ name: activesupport
18
+ requirement: !ruby/object:Gem::Requirement
19
+ requirements:
20
+ - - ">="
21
+ - !ruby/object:Gem::Version
22
+ version: '0'
23
+ type: :runtime
24
+ prerelease: false
25
+ version_requirements: !ruby/object:Gem::Requirement
26
+ requirements:
27
+ - - ">="
28
+ - !ruby/object:Gem::Version
29
+ version: '0'
30
+ - !ruby/object:Gem::Dependency
31
+ name: jsonpath
32
+ requirement: !ruby/object:Gem::Requirement
33
+ requirements:
34
+ - - ">="
35
+ - !ruby/object:Gem::Version
36
+ version: '0'
37
+ type: :runtime
38
+ prerelease: false
39
+ version_requirements: !ruby/object:Gem::Requirement
40
+ requirements:
41
+ - - ">="
42
+ - !ruby/object:Gem::Version
43
+ version: '0'
44
+ - !ruby/object:Gem::Dependency
45
+ name: traject
46
+ requirement: !ruby/object:Gem::Requirement
47
+ requirements:
48
+ - - ">="
49
+ - !ruby/object:Gem::Version
50
+ version: '0'
51
+ type: :runtime
52
+ prerelease: false
53
+ version_requirements: !ruby/object:Gem::Requirement
54
+ requirements:
55
+ - - ">="
56
+ - !ruby/object:Gem::Version
57
+ version: '0'
58
+ - !ruby/object:Gem::Dependency
59
+ name: bundler
60
+ requirement: !ruby/object:Gem::Requirement
61
+ requirements:
62
+ - - "~>"
63
+ - !ruby/object:Gem::Version
64
+ version: '1.15'
65
+ type: :development
66
+ prerelease: false
67
+ version_requirements: !ruby/object:Gem::Requirement
68
+ requirements:
69
+ - - "~>"
70
+ - !ruby/object:Gem::Version
71
+ version: '1.15'
72
+ - !ruby/object:Gem::Dependency
73
+ name: rake
74
+ requirement: !ruby/object:Gem::Requirement
75
+ requirements:
76
+ - - "~>"
77
+ - !ruby/object:Gem::Version
78
+ version: '10.0'
79
+ type: :development
80
+ prerelease: false
81
+ version_requirements: !ruby/object:Gem::Requirement
82
+ requirements:
83
+ - - "~>"
84
+ - !ruby/object:Gem::Version
85
+ version: '10.0'
86
+ - !ruby/object:Gem::Dependency
87
+ name: rspec
88
+ requirement: !ruby/object:Gem::Requirement
89
+ requirements:
90
+ - - "~>"
91
+ - !ruby/object:Gem::Version
92
+ version: '3.0'
93
+ type: :development
94
+ prerelease: false
95
+ version_requirements: !ruby/object:Gem::Requirement
96
+ requirements:
97
+ - - "~>"
98
+ - !ruby/object:Gem::Version
99
+ version: '3.0'
100
+ description: Extensions to Traject for non-MARC formats
101
+ email:
102
+ - cabeer@stanford.edu
103
+ - cmharlow@stanford.edu
104
+ - amcollie@stanford.edu
105
+ - jcoyne85@stanford.edu
106
+ executables: []
107
+ extensions: []
108
+ extra_rdoc_files: []
109
+ files:
110
+ - ".gitignore"
111
+ - ".rspec"
112
+ - ".travis.yml"
113
+ - CODE_OF_CONDUCT.md
114
+ - Gemfile
115
+ - LICENSE
116
+ - README.md
117
+ - Rakefile
118
+ - bin/console
119
+ - bin/setup
120
+ - lib/traject_plus.rb
121
+ - lib/traject_plus/csv_reader.rb
122
+ - lib/traject_plus/extraction.rb
123
+ - lib/traject_plus/indexer/step.rb
124
+ - lib/traject_plus/json_reader.rb
125
+ - lib/traject_plus/macros.rb
126
+ - lib/traject_plus/macros/csv.rb
127
+ - lib/traject_plus/macros/fgdc.rb
128
+ - lib/traject_plus/macros/json.rb
129
+ - lib/traject_plus/macros/mods.rb
130
+ - lib/traject_plus/macros/tei.rb
131
+ - lib/traject_plus/macros/xml.rb
132
+ - lib/traject_plus/version.rb
133
+ - lib/traject_plus/xml_reader.rb
134
+ - traject_plus.gemspec
135
+ homepage: https://github.com/sul-dlss/traject_plus
136
+ licenses: []
137
+ metadata: {}
138
+ post_install_message:
139
+ rdoc_options: []
140
+ require_paths:
141
+ - lib
142
+ required_ruby_version: !ruby/object:Gem::Requirement
143
+ requirements:
144
+ - - ">="
145
+ - !ruby/object:Gem::Version
146
+ version: '0'
147
+ required_rubygems_version: !ruby/object:Gem::Requirement
148
+ requirements:
149
+ - - ">="
150
+ - !ruby/object:Gem::Version
151
+ version: '0'
152
+ requirements: []
153
+ rubyforge_project:
154
+ rubygems_version: 2.6.11
155
+ signing_key:
156
+ specification_version: 4
157
+ summary: Extensions to Traject for non-MARC formats
158
+ test_files: []