traject_plus 0.0.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 69aca3e99ad81702e279a2f87b62498bfa500d34
4
+ data.tar.gz: 381ea6164dcef901c9e8086eab86cc621aa9e6e2
5
+ SHA512:
6
+ metadata.gz: ea73af5076668db42f5c04b7ecb2840f55c8d5869f3563744b213371500761d897ad7c34aa452628d5c26d0e75097a118609832b06b0853a65fac413fb6129fc
7
+ data.tar.gz: 4d27d2554eb9204a7cbd3acba7329558ac3fadbf61591026e8d2ab2b5353e17f8a28b90868cff6d12b027c180d3ad86154ef32d97d92e5b37d065da81ce61d77
data/.gitignore ADDED
@@ -0,0 +1,12 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
10
+
11
+ # rspec failure tracking
12
+ .rspec_status
data/.rspec ADDED
@@ -0,0 +1,2 @@
1
+ --format documentation
2
+ --color
data/.travis.yml ADDED
@@ -0,0 +1,5 @@
1
+ sudo: false
2
+ language: ruby
3
+ rvm:
4
+ - 2.4.1
5
+ before_install: gem install bundler -v 1.15.4
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at cabeer@stanford.edu. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [http://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: http://contributor-covenant.org
74
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ source "https://rubygems.org"
2
+
3
+ git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }
4
+
5
+ # Specify your gem's dependencies in traject_plus.gemspec
6
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,13 @@
1
+ © 2017 The Board of Trustees of the Leland Stanford Junior University.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License");
4
+ you may not use this file except in compliance with the License.
5
+ You may obtain a copy of the License at
6
+
7
+ http://www.apache.org/licenses/LICENSE-2.0
8
+
9
+ Unless required by applicable law or agreed to in writing, software
10
+ distributed under the License is distributed on an "AS IS" BASIS,
11
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ See the License for the specific language governing permissions and
13
+ limitations under the License.
data/README.md ADDED
@@ -0,0 +1,120 @@
1
+ # TrajectPlus
2
+
3
+ TrajectPlus provides a number of useful additions to [Traject](https://github.com/traject/traject).
4
+
5
+ ## Features
6
+
7
+ ### New readers:
8
+ #### TrajectPlus::JsonReader
9
+ ```ruby
10
+ provide 'reader_class_name', 'TrajectPlus::JsonReader'
11
+ to_field 'title', extract_json('$.label')
12
+ ```
13
+
14
+ #### TrajectPlus::CsvReader
15
+ ```ruby
16
+ provide 'reader_class_name', 'TrajectPlus::CsvReader'
17
+ to_field 'title', column('Record Title')
18
+ ```
19
+ #### TrajectPlus::XmlReader
20
+ ```ruby
21
+ provide 'reader_class_name', 'TrajectPlus::XmlReader'
22
+ to_field 'title', extract_xml('/*/mods:language/mods:scriptTerm',
23
+ { 'mods' => 'http://www.loc.gov/mods/v3' })
24
+ ```
25
+
26
+ There are also XML macros for specific formats (MODS, TEI, FGDC):
27
+
28
+ For example:
29
+ ```ruby
30
+ to_field 'title', extract_mods('/*/mods:language/mods:scriptTerm')
31
+ to_field 'cho_description', extract_tei("/*/tei:teiHeader/tei:fileDesc/tei:sourceDesc/tei:msDesc/tei:msContents/tei:summary")
32
+ extract_fgdc('/*/idinfo/citation/citeinfo/geoform')
33
+ ```
34
+
35
+ ### New macros:
36
+ * transform_values
37
+ * first
38
+ * conditional
39
+ * from_settings
40
+ * match
41
+ * format
42
+ * translation_map
43
+ * `accumulate` : Streamlines creation of lambdas that need no additional parsing or filtering
44
+
45
+ ```ruby
46
+ to_field 'x', accumulate { |record, context| record.values }
47
+ ```
48
+
49
+ * `copy` : Copies values from one output field to another
50
+
51
+ ```ruby
52
+ to_field 'x', copy('y')
53
+ ```
54
+
55
+ * `compose` : Easily creates sub-transformations
56
+
57
+ ```ruby
58
+ compose do
59
+ to_field 'x', accumulate { |record, context| record.x }
60
+ end
61
+
62
+ # => { 'x' => [1, 2, 3]}
63
+ ```
64
+
65
+ ```ruby
66
+ compose('subfield') do
67
+ to_field 'x', accumulate { |record, context| record.x }
68
+ end
69
+
70
+ # => { 'subfield' => [{ 'x' => [1, 2, 3]} ]}
71
+ ```
72
+
73
+ ```ruby
74
+ compose ->(record, accumulator, context) { record.subfield } do
75
+ to_field 'x', accumulate { |subfield, context| subfield.x }
76
+ end
77
+ # => { 'x' => [1, 2, 3]}
78
+ ```
79
+
80
+ * `transform`, supporting a variety of string methods: 'split', 'concat', 'prepend', 'gsub', 'encode', 'insert', 'strip', 'upcase', 'downcase', 'capitalize'
81
+
82
+ These can be applied to any extract function:
83
+
84
+ ```ruby
85
+ to_field 'title', extract: extract_xml('title'), transform: transform(gsub: ['|', ' - '])
86
+ ```
87
+
88
+ ## Installation
89
+
90
+ Add this line to your application's Gemfile:
91
+
92
+ ```ruby
93
+ gem 'traject_plus'
94
+ ```
95
+
96
+ And then execute:
97
+
98
+ $ bundle
99
+
100
+ Or install it yourself as:
101
+
102
+ $ gem install traject_plus
103
+
104
+ ## Usage
105
+
106
+ TODO: Write usage instructions here
107
+
108
+ ## Development
109
+
110
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
111
+
112
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
113
+
114
+ ## Contributing
115
+
116
+ Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/traject_plus. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
117
+
118
+ ## Code of Conduct
119
+
120
+ Everyone interacting in the TrajectPlus project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/[USERNAME]/traject_plus/blob/master/CODE_OF_CONDUCT.md).
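The `compose` output shapes listed under Features in the README can be imitated in plain Ruby. The following is only a sketch under stated assumptions: `sketch_map_record` and `sketch_compose` are hypothetical names, and the extractors are one-argument lambdas instead of Traject's three-argument macros. It shows the difference between composing without a field name (results merge into the parent hash) and with one (the sub-result is nested as a single hash in an array).

```ruby
# Hypothetical helper: run each extractor against the record and collect
# non-empty results per field, roughly like a tiny sub-indexer would.
def sketch_map_record(record, steps)
  steps.each_with_object({}) do |(field, extractor), output|
    values = Array(extractor.call(record))
    (output[field] ||= []).concat(values) unless values.empty?
  end
end

# Hypothetical helper: with no field name the sub-result is returned as-is;
# with a field name it is wrapped as one nested hash under that name.
def sketch_compose(record, steps, field_name: nil)
  result = sketch_map_record(record, steps)
  field_name ? { field_name => [result] } : result
end

SKETCH_RECORD = { 'x' => [1, 2, 3] }.freeze
SKETCH_STEPS = { 'x' => ->(record) { record['x'] } }.freeze

p sketch_compose(SKETCH_RECORD, SKETCH_STEPS)
# => {"x"=>[1, 2, 3]}
p sketch_compose(SKETCH_RECORD, SKETCH_STEPS, field_name: 'subfield')
```

The second call nests the same mapping under `'subfield'`, matching the shape documented for `compose('subfield')`.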
data/Rakefile ADDED
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "traject_plus"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
data/lib/traject_plus/csv_reader.rb ADDED
@@ -0,0 +1,22 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'csv'
4
+
5
+ # Reads in CSV records for traject
6
+ module TrajectPlus
7
+ class CsvReader
8
+ # @param input_stream [File]
9
+ # @param settings [Traject::Indexer::Settings]
10
+ def initialize(input_stream, settings)
11
+ @settings = Traject::Indexer::Settings.new settings
12
+ @input_stream = input_stream
13
+ @csv = CSV.parse(input_stream, headers: true)
14
+ end
15
+
16
+ def each(*args, &block)
17
+ csv.each(*args, &block)
18
+ end
19
+
20
+ attr_reader :csv
21
+ end
22
+ end
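To see what a CSV reader of this shape hands to the indexing steps, here is a minimal stand-alone sketch that leaves out the Traject settings wrapper. `SketchCsvReader` and the sample data are hypothetical; the sketch reads the stream into a string before parsing, which is an assumption of the sketch and not necessarily how the gem behaves.

```ruby
require 'csv'
require 'stringio'

# Hypothetical stand-in for a header-aware CSV reader: each yielded record
# is a CSV::Row that can be addressed by column header.
class SketchCsvReader
  include Enumerable

  def initialize(input_stream)
    @csv = CSV.parse(input_stream.read, headers: true)
  end

  def each(&block)
    @csv.each(&block)
  end
end

SAMPLE_CSV = "Record Title,Year\nFirst record,1999\nSecond record,2004\n"

SKETCH_TITLES = SketchCsvReader.new(StringIO.new(SAMPLE_CSV)).map do |row|
  row['Record Title']
end
p SKETCH_TITLES
# => ["First record", "Second record"]
```

Because the rows respond to `row['Record Title']`, a macro such as `column('Record Title')` only needs the header name to pull a value.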
data/lib/traject_plus/extraction.rb ADDED
@@ -0,0 +1,116 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'active_support/core_ext/object/blank'
4
+ module TrajectPlus
5
+ module Extraction
6
+ def self.apply_extraction_options(result, options = {})
7
+ TransformPipeline.new(options).transform(result)
8
+ end
9
+
10
+ # Pipeline for transforming extracted values into normalized values
11
+ class TransformPipeline
12
+ attr_reader :options
13
+
14
+ def initialize(options)
15
+ @options = options
16
+ end
17
+
18
+ def transform(values)
19
+ options.inject(values) do |memo, (step, params)|
20
+ if step.respond_to? :call
21
+ memo.flat_map { |v| step.call(v, params) }
22
+ else
23
+ public_send(step, memo, *params)
24
+ end
25
+ end
26
+ end
27
+
28
+ # Examples:
29
+ #
30
+ # to_field 'x', split: '/' # 'a / b / c' => 'a', 'b', 'c'
31
+ # to_field 'x', concat: '123' # 'abc' to 'abc123'
32
+ # to_field 'x', prepend: '321' # 'abc' to '321abc'
33
+ # to_field 'x', gsub: ['a', 'b'] # 'abc' to 'bbc'
34
+ # to_field 'x', gsub: [/[abc]/, 'b'] # 'abc' to 'bbb'
35
+ # to_field 'x', encode: 'UTF-8' # 'abc' to 'abc'
36
+ # to_field 'x', insert: [1, 'x'] # 'abc' to 'axbc'
37
+ ['split', 'concat', 'prepend', 'gsub', 'encode', 'insert'].each do |method|
38
+ define_method(method) do |values, *args|
39
+ values.flat_map do |v|
40
+ v.public_send(method, *args)
41
+ end
42
+ end
43
+ end
44
+
45
+ # to_field 'x', strip: true # ' abc ' to 'abc'
46
+ # to_field 'x', upcase: true # 'abc' to 'ABC'
47
+ # to_field 'x', downcase: true # 'ABC' to 'abc'
48
+ # to_field 'x', capitalize: true # 'abc' to 'Abc'
49
+ ['strip', 'upcase', 'downcase', 'capitalize'].each do |method|
50
+ define_method(method) do |values, *args|
51
+ values.map(&(method.to_sym))
52
+ end
53
+ end
54
+
55
+ # to_field 'x', match: [/([aeiou])/, 1] # 'abc' => 'a'
56
+ def match(values, match, index)
57
+ values.flat_map do |v|
58
+ v.match(match) do |m|
59
+ m[index]
60
+ end
61
+ end
62
+ end
63
+
64
+ # to_field 'x', format: '-> %s <-' # 'abc' to '-> abc <-'
65
+ def format(values, insert_string)
66
+ values.flat_map do |v|
67
+ insert_string % v
68
+ end
69
+ end
70
+
71
+ # to_field 'x', select: lambda { |x| x =~ /a/} # ['a', 'b'] => ['a']
72
+ def select(values, block)
73
+ values.select(&block)
74
+ end
75
+
76
+ # to_field 'x', reject: lambda { |x| x =~ /a/} # ['a', 'b'] => ['b']
77
+ def reject(values, block)
78
+ values.reject(&block)
79
+ end
80
+
81
+ # to_field 'x', min: 1 # ['a', 'b'] => ['a']
82
+ def min(values, count, block = nil)
83
+ if block.nil?
84
+ values.min(count)
85
+ else
86
+ values.min(count, &block)
87
+ end
88
+ end
89
+
90
+ # to_field 'x', max: 1 # ['a', 'b'] => ['b']
91
+ def max(values, count, block = nil)
92
+ if block.nil?
93
+ values.max(count)
94
+ else
95
+ values.max(count, &block)
96
+ end
97
+ end
98
+
99
+ # Using a named Traject translation map:
100
+ # to_field 'x', translation_map: 'types' # 'x' => 'mapped x',
101
+ def translation_map(values, maps)
102
+ translation_map = Traject::TranslationMap.new(*Array(maps))
103
+ translation_map.translate_array Array(values)
104
+ end
105
+
106
+ # to_field 'x', default: 'y' # nil => 'y'
107
+ def default(values, default_value)
108
+ if values.present?
109
+ values
110
+ else
111
+ default_value
112
+ end
113
+ end
114
+ end
115
+ end
116
+ end
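The pipeline idea in this file (fold an ordered options hash over a list of values, dispatching each key to a same-named method) can be tried without Traject or ActiveSupport. This is a simplified sketch, not the gem's implementation: `SketchPipeline` and `sketch_transform` are hypothetical names, only four steps are defined, and the sketch deliberately splats array parameters so that `gsub: ['a', 'b']` reaches `String#gsub` as two arguments.

```ruby
# Hypothetical, reduced version of an options-driven transform pipeline.
class SketchPipeline
  def initialize(options)
    @options = options
  end

  # Apply each step in insertion order, feeding the result of one step
  # into the next.
  def transform(values)
    @options.inject(values) do |memo, (step, params)|
      public_send(step, memo, *params)
    end
  end

  # Steps that forward their arguments to the String method of the same name.
  %w[split gsub].each do |name|
    define_method(name) do |values, *args|
      values.flat_map { |value| value.public_send(name, *args) }
    end
  end

  # Flag-style steps: the argument (usually `true`) is ignored.
  %w[strip upcase].each do |name|
    define_method(name) do |values, *_args|
      values.map(&name.to_sym)
    end
  end
end

def sketch_transform(values, options)
  SketchPipeline.new(options).transform(values)
end

p sketch_transform(['a | b'], split: '|', strip: true, upcase: true)
# => ["A", "B"]
p sketch_transform(['abc'], gsub: %w[a b])
# => ["bbc"]
```

Since Ruby hashes keep insertion order, the order in which options are written is the order in which the steps run.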
data/lib/traject_plus/indexer/step.rb ADDED
@@ -0,0 +1,77 @@
1
+ module TrajectPlus
2
+ module Indexer
3
+ class ToFieldStep < Traject::Indexer::ToFieldStep
4
+ def initialize(fieldname, lambda, block, source_location, single: false)
5
+ super(fieldname, lambda, block, source_location)
6
+
7
+ @single = single
8
+ end
9
+
10
+ def single?
11
+ !!@single
12
+ end
13
+
14
+ # disable to_field_step? so we can implement our own version of add_accumulator_to_context
15
+ def to_field_step?
16
+ false
17
+ end
18
+
19
+ def execute(context)
20
+ accumulator = super
21
+
22
+ add_accumulator_to_context!(accumulator, context)
23
+ end
24
+
25
+ def add_accumulator_to_context!(accumulator, context)
26
+ self.class.add_accumulator_to_context!(self, field_name, accumulator, context)
27
+ end
28
+
29
+ def self.add_accumulator_to_context!(field, field_name, accumulator, context)
30
+ accumulator.compact! unless context.settings[Traject::Indexer::ALLOW_NIL_VALUES]
31
+ return if accumulator.empty? and not (context.settings[Traject::Indexer::ALLOW_EMPTY_FIELDS])
32
+
33
+ if field.single?
34
+ context.output_hash[field_name] = accumulator.first if accumulator.length > 0
35
+ else
36
+ context.output_hash[field_name] ||= []
37
+
38
+ existing_accumulator = context.output_hash[field_name].concat(accumulator)
39
+ existing_accumulator.uniq! unless context.settings[Traject::Indexer::ALLOW_DUPLICATE_VALUES]
40
+ end
41
+ end
42
+ end
43
+
44
+ class ComposeStep < ToFieldStep
45
+ attr_reader :indexer
46
+
47
+ def initialize(fieldname, lambda, block, source_location, indexer)
48
+ @indexer = indexer
49
+ self.field_name = fieldname
50
+ self.lambda = lambda
51
+ self.block = block
52
+ self.source_location = source_location
53
+ end
54
+
55
+ def execute(context)
56
+ accumulator = []
57
+ if lambda
58
+ lambda.call(context.source_record, accumulator, context)
59
+ else
60
+ accumulator << context.source_record
61
+ end
62
+
63
+ accumulator.map do |record|
64
+ result = indexer.map_record(record)
65
+
66
+ if field_name
67
+ self.class.add_accumulator_to_context! self, field_name, [result], context
68
+ else
69
+ result.each do |k, v|
70
+ self.class.add_accumulator_to_context! self, k, Array(v), context
71
+ end
72
+ end
73
+ end
74
+ end
75
+ end
76
+ end
77
+ end
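The merging rules in `add_accumulator_to_context!` (drop nils, skip empty results, keep a scalar for single fields, otherwise append and de-duplicate) can be illustrated on a plain hash. This is a sketch under assumptions: `sketch_add_values` is a hypothetical helper, and the keyword flags stand in for the Traject settings consulted by the real code.

```ruby
# Hypothetical helper mirroring the merge rules on a plain output hash.
def sketch_add_values(output_hash, field_name, values, single: false, allow_duplicates: false)
  values = values.compact                 # nil values are dropped
  return output_hash if values.empty?     # empty results leave the field alone

  if single
    # Single-valued fields store a scalar, not an array.
    output_hash[field_name] = values.first
  else
    merged = (output_hash[field_name] ||= []).concat(values)
    merged.uniq! unless allow_duplicates
  end
  output_hash
end

def sketch_merge_demo
  output = {}
  sketch_add_values(output, 'title', ['a', nil, 'b'])
  sketch_add_values(output, 'title', %w[b c])
  sketch_add_values(output, 'id', %w[x y], single: true)
  sketch_add_values(output, 'empty', [nil])
  output
end

p sketch_merge_demo
# => {"title"=>["a", "b", "c"], "id"=>"x"}
```

The `'empty'` field never appears in the output because its only value was nil.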
data/lib/traject_plus/json_reader.rb ADDED
@@ -0,0 +1,26 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Reads in JSON records for traject
4
+ module TrajectPlus
5
+ class JsonReader
6
+ # @param input_stream [File]
7
+ # @param settings [Traject::Indexer::Settings]
8
+ def initialize(input_stream, settings)
9
+ @settings = Traject::Indexer::Settings.new settings
10
+ @input_stream = input_stream
11
+ @json = JSON.parse(input_stream.read)
12
+ end
13
+
14
+ attr_reader :json
15
+
16
+ def each(&block)
17
+ return to_enum(:each) unless block_given?
18
+
19
+ if json.is_a? Array
20
+ json.each(&block)
21
+ else
22
+ yield json
23
+ end
24
+ end
25
+ end
26
+ end
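The array-or-object branching in `each` means a top-level JSON array is treated as many records and any other document as one record. A stand-alone sketch of that behavior follows, with the hypothetical class name `SketchJsonReader` and without the Traject settings object.

```ruby
require 'json'
require 'stringio'

# Hypothetical stand-in: a JSON array yields one record per element,
# any other JSON document yields itself as a single record.
class SketchJsonReader
  include Enumerable

  def initialize(input_stream)
    @json = JSON.parse(input_stream.read)
  end

  def each(&block)
    return to_enum(:each) unless block_given?

    if @json.is_a?(Array)
      @json.each(&block)
    else
      yield @json
    end
  end
end

SKETCH_MANY = SketchJsonReader.new(StringIO.new('[{"label":"a"},{"label":"b"}]'))
SKETCH_ONE = SketchJsonReader.new(StringIO.new('{"label":"solo"}'))

p SKETCH_MANY.map { |record| record['label'] }
# => ["a", "b"]
p SKETCH_ONE.to_a
```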
data/lib/traject_plus/macros/csv.rb ADDED
@@ -0,0 +1,18 @@
1
+ # frozen_string_literal: true
2
+
3
+ module TrajectPlus
4
+ module Macros
5
+ # Macros for extracting values from CSV rows
6
+ module Csv
7
+ # @param header_or_index [String, Integer] the field header or index to accumulate
8
+ def column(header_or_index, options = {})
9
+ lambda do |row, accumulator, _context|
10
+ return if row[header_or_index].to_s.empty?
11
+ result = Array(row[header_or_index].to_s)
12
+ result = TrajectPlus::Extraction.apply_extraction_options(result, options)
13
+ accumulator.concat(result)
14
+ end
15
+ end
16
+ end
17
+ end
18
+ end
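A macro in this style is just a method that returns a three-argument lambda which appends to an accumulator. The sketch below shows that contract in isolation; `sketch_column` is a hypothetical name, a plain Hash stands in for a CSV row, and the transform options are omitted.

```ruby
# Hypothetical macro: returns a lambda with the (record, accumulator, context)
# signature that appends the named cell unless it is blank.
def sketch_column(header)
  lambda do |row, accumulator, _context|
    value = row[header].to_s
    next if value.empty? # blank cells contribute nothing

    accumulator << value
  end
end

def sketch_column_demo(row)
  accumulator = []
  sketch_column('Record Title').call(row, accumulator, nil)
  accumulator
end

p sketch_column_demo('Record Title' => 'A title')
# => ["A title"]
p sketch_column_demo('Record Title' => nil)
# => []
```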
data/lib/traject_plus/macros/fgdc.rb ADDED
@@ -0,0 +1,15 @@
1
+ # frozen_string_literal: true
2
+
3
+ module TrajectPlus
4
+ module Macros
5
+ # Macros for extracting FGDC values from Nokogiri documents
6
+ module FGDC
7
+ NS = { fgdc: 'http://www.fgdc.gov/metadata/fgdc-std-001-1998.dtd' }.freeze
8
+
9
+ # @param xpath [String] the xpath query expression
10
+ def extract_fgdc(xpath, options = {})
11
+ extract_xml(xpath, NS, options)
12
+ end
13
+ end
14
+ end
15
+ end
data/lib/traject_plus/macros/json.rb ADDED
@@ -0,0 +1,20 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'jsonpath'
4
+
5
+ module TrajectPlus
6
+ module Macros
7
+ # Macros for extracting values from JSON documents
8
+ module JSON
9
+ # @param path [String] the jsonpath query expression
10
+ # @param options [Hash] transform options applied via TrajectPlus::Extraction (e.g. strip: true)
11
+ def extract_json(path, options = {})
12
+ lambda do |json, accumulator, _context|
13
+ result = Array(JsonPath.on(json, path))
14
+ result = TrajectPlus::Extraction.apply_extraction_options(result, options)
15
+ accumulator.concat(result)
16
+ end
17
+ end
18
+ end
19
+ end
20
+ end
data/lib/traject_plus/macros/mods.rb ADDED
@@ -0,0 +1,18 @@
1
+ # frozen_string_literal: true
2
+
3
+ module TrajectPlus
4
+ module Macros
5
+ # Macros for extracting MODS values from Nokogiri documents
6
+ module Mods
7
+ NS = { mods: 'http://www.loc.gov/mods/v3',
8
+ rdf: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
9
+ dc: 'http://purl.org/dc/elements/1.1/',
10
+ xlink: 'http://www.w3.org/1999/xlink' }.freeze
11
+
12
+ # @param xpath [String] the xpath query expression
13
+ def extract_mods(xpath, options = {})
14
+ extract_xml(xpath, NS, options)
15
+ end
16
+ end
17
+ end
18
+ end
data/lib/traject_plus/macros/tei.rb ADDED
@@ -0,0 +1,14 @@
1
+ # frozen_string_literal: true
2
+ module TrajectPlus
3
+ module Macros
4
+ # Macros for extracting TEI values from Nokogiri documents
5
+ module Tei
6
+ NS = { tei: 'http://www.tei-c.org/ns/1.0' }.freeze
7
+
8
+ # @param xpath [String] the xpath query expression
9
+ def extract_tei(xpath, options = {})
10
+ extract_xml(xpath, NS, options)
11
+ end
12
+ end
13
+ end
14
+ end
data/lib/traject_plus/macros/xml.rb ADDED
@@ -0,0 +1,19 @@
1
+ # frozen_string_literal: true
2
+
3
+ module TrajectPlus
4
+ module Macros
5
+ # Macros for extracting values from XML (Nokogiri) documents
6
+ module Xml
7
+ # @param xpath [String] the xpath query expression
8
+ # @param namespaces [Hash<String,String>] The namespaces for the xpath query
9
+ # @param options [Hash] transform options applied via TrajectPlus::Extraction (e.g. strip: true)
10
+ def extract_xml(xpath, namespaces, options = {})
11
+ lambda do |xml, accumulator, _context|
12
+ result = xml.xpath(xpath, namespaces).map(&:text)
13
+ result = TrajectPlus::Extraction.apply_extraction_options(result, options)
14
+ accumulator.concat(result)
15
+ end
16
+ end
17
+ end
18
+ end
19
+ end
data/lib/traject_plus/macros.rb ADDED
@@ -0,0 +1,81 @@
1
+ # frozen_string_literal: true
2
+ module TrajectPlus
3
+ module Macros
4
+ # construct a structured hash using values extracted using traject
5
+ def transform_values(context, hash)
6
+ hash.transform_values do |lambdas|
7
+ accumulator = []
8
+ Array(lambdas).each do |lambda|
9
+ lambda.call(context.source_record, accumulator, context)
10
+ end
11
+ accumulator
12
+ end
13
+ end
14
+
15
+ # try a bunch of macros and short-circuit after one returns values
16
+ def first(*macros)
17
+ lambda do |record, accumulator, context|
18
+ macros.lazy.map do |block|
19
+ block.call(record, accumulator, context)
20
+ end.reject(&:blank?).first
21
+ end
22
+ end
23
+
24
+ def accumulate(&block)
25
+ lambda do |record, accumulator, context|
26
+ Array(block.call(record, context)).each do |v|
27
+ accumulator << v if v.present?
28
+ end
29
+ end
30
+ end
31
+
32
+ # only accumulate values if a condition is met
33
+ def conditional(condition, block)
34
+ lambda do |record, accumulator, context|
35
+ if condition.call(record, context)
36
+ block.call(record, accumulator, context)
37
+ end
38
+ end
39
+ end
40
+
41
+ def from_settings(field)
42
+ accumulate do |record, context|
43
+ context.settings.fetch(field)
44
+ end
45
+ end
46
+
47
+ def copy(field)
48
+ accumulate do |_record, context|
49
+ Array(context.output_hash[field])
50
+ end
51
+ end
52
+
53
+ def transform(options = {})
54
+ lambda do |record, accumulator, context|
55
+ results = TrajectPlus::Extraction.apply_extraction_options(accumulator, options)
56
+ accumulator.replace(results)
57
+ end
58
+ end
59
+
60
+ # apply the same mapping to multiple fields
61
+ def to_fields(fields, mapping_method)
62
+ fields.each { |field| to_field field, mapping_method }
63
+ end
64
+
65
+ def to_field(field_name, aLambda = nil, extract: nil, transform: nil, **namedArgs, &block)
66
+ @index_steps << TrajectPlus::Indexer::ToFieldStep.new(field_name, extract || aLambda, transform || block, Traject::Util.extract_caller_location(caller.first), **namedArgs)
67
+ end
68
+
69
+ def compose(fieldname = nil, aLambda = nil, extract: nil, transform: nil, &block)
70
+ if fieldname.is_a? Proc
71
+ aLambda ||= fieldname
72
+ fieldname = nil
73
+ end
74
+
75
+ indexer = self.class.new(settings)
76
+ indexer.instance_eval(&block)
77
+
78
+ @index_steps << TrajectPlus::Indexer::ComposeStep.new(fieldname, extract || aLambda, transform, Traject::Util.extract_caller_location(caller.first), indexer)
79
+ end
80
+ end
81
+ end
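The combinators in this file (`accumulate`, `conditional`, `first`) all build lambdas with the same `(record, accumulator, context)` signature, which is what lets them nest. Here is a dependency-free sketch of that idea. The `sketch_*` names are hypothetical, blank checks use plain Ruby instead of ActiveSupport, and `sketch_first` decides by looking at the accumulator, which is a variation chosen for the sketch and not the gem's exact logic.

```ruby
# Wrap a two-argument block so it behaves like a three-argument macro.
def sketch_accumulate(&block)
  lambda do |record, accumulator, context|
    Array(block.call(record, context)).each do |value|
      accumulator << value unless value.nil? || value.to_s.strip.empty?
    end
  end
end

# Run the wrapped macro only when the condition lambda returns a truthy value.
def sketch_conditional(condition, macro)
  lambda do |record, accumulator, context|
    macro.call(record, accumulator, context) if condition.call(record, context)
  end
end

# Try macros in order and stop once one of them has produced values.
def sketch_first(*macros)
  lambda do |record, accumulator, context|
    macros.each do |macro|
      macro.call(record, accumulator, context)
      break unless accumulator.empty?
    end
  end
end

def sketch_run(macro, record)
  accumulator = []
  macro.call(record, accumulator, nil)
  accumulator
end

SKETCH_TITLE = sketch_accumulate { |record, _context| record['title'] }
SKETCH_ALT = sketch_accumulate { |record, _context| record['alt'] }

p sketch_run(sketch_first(SKETCH_TITLE, SKETCH_ALT), 'title' => nil, 'alt' => 'Fallback')
# => ["Fallback"]
```

Because every combinator returns the same kind of lambda, `sketch_conditional` can wrap `sketch_first`, and either can be passed anywhere a macro is expected.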
data/lib/traject_plus/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module TrajectPlus
2
+ VERSION = "0.0.1"
3
+ end
data/lib/traject_plus/xml_reader.rb ADDED
@@ -0,0 +1,20 @@
1
+ # frozen_string_literal: true
2
+
3
+ module TrajectPlus
4
+ # Reads in XML records for traject
5
+ class XmlReader
6
+ # @param input_stream [File]
7
+ # @param settings [Traject::Indexer::Settings]
8
+ def initialize(input_stream, settings)
9
+ @settings = Traject::Indexer::Settings.new settings
10
+ @input_stream = input_stream
11
+ @xml = Nokogiri::XML(input_stream)
12
+ end
13
+
14
+ attr_reader :xml
15
+
16
+ def each
17
+ yield(xml)
18
+ end
19
+ end
20
+ end
data/lib/traject_plus.rb ADDED
@@ -0,0 +1,20 @@
1
+ require 'traject_plus/version'
2
+ require 'traject'
3
+
4
+ module TrajectPlus
5
+ require 'traject_plus/indexer/step'
6
+
7
+ require 'traject_plus/macros'
8
+ require 'traject_plus/extraction'
9
+
10
+ require 'traject_plus/csv_reader'
11
+ require 'traject_plus/json_reader'
12
+ require 'traject_plus/xml_reader'
13
+
14
+ require 'traject_plus/macros/csv'
15
+ require 'traject_plus/macros/fgdc'
16
+ require 'traject_plus/macros/json'
17
+ require 'traject_plus/macros/mods'
18
+ require 'traject_plus/macros/tei'
19
+ require 'traject_plus/macros/xml'
20
+ end
data/traject_plus.gemspec ADDED
@@ -0,0 +1,30 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path("../lib", __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require "traject_plus/version"
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "traject_plus"
8
+ spec.version = TrajectPlus::VERSION
9
+ spec.authors = ["Chris Beer", "Christina Harlow", "Aaron Collier", "Justin Coyne"]
10
+ spec.email = ["cabeer@stanford.edu", "cmharlow@stanford.edu", "amcollie@stanford.edu", "jcoyne85@stanford.edu"]
11
+
12
+ spec.summary = "Extensions to Traject for non-MARC formats"
13
+ spec.description = "Extensions to Traject for non-MARC formats"
14
+ spec.homepage = "https://github.com/sul-dlss/traject_plus"
15
+
16
+ spec.files = `git ls-files -z`.split("\x0").reject do |f|
17
+ f.match(%r{^(test|spec|features)/})
18
+ end
19
+ spec.bindir = "exe"
20
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
21
+ spec.require_paths = ["lib"]
22
+
23
+ spec.add_dependency 'activesupport'
24
+ spec.add_dependency 'jsonpath'
25
+ spec.add_dependency 'traject'
26
+
27
+ spec.add_development_dependency "bundler", "~> 1.15"
28
+ spec.add_development_dependency "rake", "~> 10.0"
29
+ spec.add_development_dependency "rspec", "~> 3.0"
30
+ end
metadata ADDED
@@ -0,0 +1,158 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: traject_plus
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ platform: ruby
6
+ authors:
7
+ - Chris Beer
8
+ - Christina Harlow
9
+ - Aaron Collier
10
+ - Justin Coyne
11
+ autorequire:
12
+ bindir: exe
13
+ cert_chain: []
14
+ date: 2017-12-04 00:00:00.000000000 Z
15
+ dependencies:
16
+ - !ruby/object:Gem::Dependency
17
+ name: activesupport
18
+ requirement: !ruby/object:Gem::Requirement
19
+ requirements:
20
+ - - ">="
21
+ - !ruby/object:Gem::Version
22
+ version: '0'
23
+ type: :runtime
24
+ prerelease: false
25
+ version_requirements: !ruby/object:Gem::Requirement
26
+ requirements:
27
+ - - ">="
28
+ - !ruby/object:Gem::Version
29
+ version: '0'
30
+ - !ruby/object:Gem::Dependency
31
+ name: jsonpath
32
+ requirement: !ruby/object:Gem::Requirement
33
+ requirements:
34
+ - - ">="
35
+ - !ruby/object:Gem::Version
36
+ version: '0'
37
+ type: :runtime
38
+ prerelease: false
39
+ version_requirements: !ruby/object:Gem::Requirement
40
+ requirements:
41
+ - - ">="
42
+ - !ruby/object:Gem::Version
43
+ version: '0'
44
+ - !ruby/object:Gem::Dependency
45
+ name: traject
46
+ requirement: !ruby/object:Gem::Requirement
47
+ requirements:
48
+ - - ">="
49
+ - !ruby/object:Gem::Version
50
+ version: '0'
51
+ type: :runtime
52
+ prerelease: false
53
+ version_requirements: !ruby/object:Gem::Requirement
54
+ requirements:
55
+ - - ">="
56
+ - !ruby/object:Gem::Version
57
+ version: '0'
58
+ - !ruby/object:Gem::Dependency
59
+ name: bundler
60
+ requirement: !ruby/object:Gem::Requirement
61
+ requirements:
62
+ - - "~>"
63
+ - !ruby/object:Gem::Version
64
+ version: '1.15'
65
+ type: :development
66
+ prerelease: false
67
+ version_requirements: !ruby/object:Gem::Requirement
68
+ requirements:
69
+ - - "~>"
70
+ - !ruby/object:Gem::Version
71
+ version: '1.15'
72
+ - !ruby/object:Gem::Dependency
73
+ name: rake
74
+ requirement: !ruby/object:Gem::Requirement
75
+ requirements:
76
+ - - "~>"
77
+ - !ruby/object:Gem::Version
78
+ version: '10.0'
79
+ type: :development
80
+ prerelease: false
81
+ version_requirements: !ruby/object:Gem::Requirement
82
+ requirements:
83
+ - - "~>"
84
+ - !ruby/object:Gem::Version
85
+ version: '10.0'
86
+ - !ruby/object:Gem::Dependency
87
+ name: rspec
88
+ requirement: !ruby/object:Gem::Requirement
89
+ requirements:
90
+ - - "~>"
91
+ - !ruby/object:Gem::Version
92
+ version: '3.0'
93
+ type: :development
94
+ prerelease: false
95
+ version_requirements: !ruby/object:Gem::Requirement
96
+ requirements:
97
+ - - "~>"
98
+ - !ruby/object:Gem::Version
99
+ version: '3.0'
100
+ description: Extensions to Traject for non-MARC formats
101
+ email:
102
+ - cabeer@stanford.edu
103
+ - cmharlow@stanford.edu
104
+ - amcollie@stanford.edu
105
+ - jcoyne85@stanford.edu
106
+ executables: []
107
+ extensions: []
108
+ extra_rdoc_files: []
109
+ files:
110
+ - ".gitignore"
111
+ - ".rspec"
112
+ - ".travis.yml"
113
+ - CODE_OF_CONDUCT.md
114
+ - Gemfile
115
+ - LICENSE
116
+ - README.md
117
+ - Rakefile
118
+ - bin/console
119
+ - bin/setup
120
+ - lib/traject_plus.rb
121
+ - lib/traject_plus/csv_reader.rb
122
+ - lib/traject_plus/extraction.rb
123
+ - lib/traject_plus/indexer/step.rb
124
+ - lib/traject_plus/json_reader.rb
125
+ - lib/traject_plus/macros.rb
126
+ - lib/traject_plus/macros/csv.rb
127
+ - lib/traject_plus/macros/fgdc.rb
128
+ - lib/traject_plus/macros/json.rb
129
+ - lib/traject_plus/macros/mods.rb
130
+ - lib/traject_plus/macros/tei.rb
131
+ - lib/traject_plus/macros/xml.rb
132
+ - lib/traject_plus/version.rb
133
+ - lib/traject_plus/xml_reader.rb
134
+ - traject_plus.gemspec
135
+ homepage: https://github.com/sul-dlss/traject_plus
136
+ licenses: []
137
+ metadata: {}
138
+ post_install_message:
139
+ rdoc_options: []
140
+ require_paths:
141
+ - lib
142
+ required_ruby_version: !ruby/object:Gem::Requirement
143
+ requirements:
144
+ - - ">="
145
+ - !ruby/object:Gem::Version
146
+ version: '0'
147
+ required_rubygems_version: !ruby/object:Gem::Requirement
148
+ requirements:
149
+ - - ">="
150
+ - !ruby/object:Gem::Version
151
+ version: '0'
152
+ requirements: []
153
+ rubyforge_project:
154
+ rubygems_version: 2.6.11
155
+ signing_key:
156
+ specification_version: 4
157
+ summary: Extensions to Traject for non-MARC formats
158
+ test_files: []