solrizer 3.1.1 → 3.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 2352b7c26cf55046a975d909108924a8ca9021f8
4
- data.tar.gz: c83e79545b2caa984d75f2deb6bd14d3495a057e
3
+ metadata.gz: 8d77a71925151847cbb586039e03296e1b0aa2c9
4
+ data.tar.gz: a2bbb1286b7b037dc5a83d81f84ec098edc60283
5
5
  SHA512:
6
- metadata.gz: 518b8942031a7de396ebfd51b0e3de9b933ec0f9a06e10d5f53249655dee2ebfc9da6ad65a42036dec7fbf9f6bd0614f84f2ae4ce411e036c63ddeffd5b41967
7
- data.tar.gz: d3e557a70e7f9fd80c6be842b5fb392eee8e93f786638ae691b6ee7c4c9b5bbb7ea939b61f42d2854e19c2a075e247e49d9240085ca6d08484cac57f6d6a20b4
6
+ metadata.gz: 76ad75ea81fb427b72a0b214637427f44ecd54d5e4b80cc43f49ae4c13598816d3756144b4b1081f7c357169595d97d7c8ce4e1137533e54e99807c619f9d9d6
7
+ data.tar.gz: 9294cad6b4a5661321d4a0780652af3e31fbebfeabf6b3ebe5d62577d37aaf6cc685255edb50acc5fd8782f468bb05a0d8ba1e64bb40c346c6f10202e8103250
@@ -0,0 +1,113 @@
1
+ # How to Contribute
2
+
3
+ We want your help to make Project Hydra great.
4
+ There are a few guidelines that we need contributors to follow so that we can have a chance of keeping on top of things.
5
+
6
+ ## Hydra Project Intellectual Property Licensing and Ownership
7
+
8
+ All code contributors must have an Individual Contributor License Agreement (iCLA) on file with the Hydra Project Steering Group.
9
+ If the contributor works for an institution, the institution must have a Corporate Contributor License Agreement (cCLA) on file.
10
+
11
+ https://wiki.duraspace.org/display/hydra/Hydra+Project+Intellectual+Property+Licensing+and+Ownership
12
+
13
+ You should also add yourself to the `CONTRIBUTORS.md` file in the root of the project.
14
+
15
+ ## Contribution Tasks
16
+
17
+ * Reporting Issues
18
+ * Making Changes
19
+ * Submitting Changes
20
+ * Merging Changes
21
+
22
+ ### Reporting Issues
23
+
24
+ * Make sure you have a [GitHub account](https://github.com/signup/free)
25
+ * Submit a [GitHub issue](./issues) by:
26
+ * Clearly describing the issue
27
+ * Provide a descriptive summary
28
+ * Explain the expected behavior
29
+ * Explain the actual behavior
30
+ * Provide steps to reproduce the actual behavior
31
+
32
+ ### Making Changes
33
+
34
+ * Fork the repository on GitHub
35
+ * Create a topic branch from where you want to base your work.
36
+ * This is usually the master branch.
37
+ * To quickly create a topic branch based on master: `git branch fix/master/my_contribution master`
38
+ * Then checkout the new branch with `git checkout fix/master/my_contribution`.
39
+ * Please avoid working directly on the `master` branch.
40
+ * You may find the [hub suite of commands](https://github.com/defunkt/hub) helpful
41
+ * Make commits of logical units.
42
+ * Your commit should include a high level description of your work in HISTORY.textile
43
+ * Check for unnecessary whitespace with `git diff --check` before committing.
44
+ * Make sure your commit messages are [well formed](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html).
45
+ * If you created an issue, you can close it by including "Closes #issue" in your commit message. See [Github's blog post for more details](https://github.com/blog/1386-closing-issues-via-commit-messages)
46
+
47
+ ```
48
+ Present tense short summary (50 characters or less)
49
+
50
+ More detailed description, if necessary. It should be wrapped to 72
51
+ characters. Try to be as descriptive as you can, even if you think that
52
+ the commit content is obvious, it may not be obvious to others. You
53
+ should add such description also if it's already present in bug tracker,
54
+ it should not be necessary to visit a webpage to check the history.
55
+
56
+ Include Closes #<issue-number> when relevant.
57
+
58
+ Description can have multiple paragraphs and you can use code examples
59
+ inside, just indent it with 4 spaces:
60
+
61
+ class PostsController
62
+ def index
63
+ respond_with Post.limit(10)
64
+ end
65
+ end
66
+
67
+ You can also add bullet points:
68
+
69
+ - you can use dashes or asterisks
70
+
71
+ - also, try to indent next line of a point for readability, if it's too
72
+ long to fit in 72 characters
73
+ ```
74
+
75
+ * Make sure you have added the necessary tests for your changes.
76
+ * Run _all_ the tests to ensure nothing else was accidentally broken.
77
+ * When you are ready to submit a pull request
78
+
79
+ ### Submitting Changes
80
+
81
+ [Detailed Walkthrough of One Pull Request per Commit](http://ndlib.github.io/practices/one-commit-per-pull-request/)
82
+
83
+ * Read the article ["Using Pull Requests"](https://help.github.com/articles/using-pull-requests) on GitHub.
84
+ * Make sure your branch is up to date with its parent branch (i.e. master)
85
+ * `git checkout master`
86
+ * `git pull --rebase`
87
+ * `git checkout <your-branch>`
88
+ * `git rebase master`
89
+ * It is likely a good idea to run your tests again.
90
+ * Squash the commits for your branch into one commit
91
+ * `git rebase --interactive HEAD~<number-of-commits>` ([See Github help](https://help.github.com/articles/interactive-rebase))
92
+ * To determine the number of commits on your branch: `git log master..<your-branch> --oneline | wc -l`
93
+ * Squashing your branch's changes into one commit is "good form" and helps the person merging your request to see everything that is going on.
94
+ * Push your changes to a topic branch in your fork of the repository.
95
+ * Submit a pull request from your fork to the project.
96
+
97
+ ### Merging Changes
98
+
99
+ * It is considered "poor form" to merge your own request.
100
+ * Please take the time to review the changes and get a sense of what is being changed. Things to consider:
101
+ * Does the commit message explain what is going on?
102
+ * Do the code changes have tests? _Not all changes need new tests; some changes are refactorings_
103
+ * Does the commit contain more than it should? Are two separate concerns being addressed in one commit?
104
+ * Did the Travis tests complete successfully?
105
+ * If you are uncertain, bring other contributors into the conversation by creating a comment that includes their @username.
106
+ * If you like the pull request, but want others to chime in, create a +1 comment and tag a user.
107
+
108
+ # Additional Resources
109
+
110
+ * [General GitHub documentation](http://help.github.com/)
111
+ * [GitHub pull request documentation](http://help.github.com/send-pull-requests/)
112
+ * [Pro Git](http://git-scm.com/book) is a free and excellent book about Git.
113
+ * [A Git Config for Contributing](http://ndlib.github.io/practices/my-typical-per-project-git-config/)
@@ -1,3 +1,12 @@
1
+ h2. 3.2.0
2
+ #25 Allow any field_value except nil to be inserted into a solr field
3
+ #24 Remove dependency on solrizer-fedora, use AF to update index by pid
4
+ #23 Enhance Suffix#config so it can be usefully overridden by downstream
5
+
6
+ h2. 3.1.1
7
+ #22 Support for boolean values
8
+ #21 Testing on Rails version 4
9
+
1
10
  h2. 3.1.0
2
11
  #16 Inserting non-multivalued fields should not create a solr error
3
12
  #20 Time fields should be formatted correctly when using active_support/core_ext/date_time/conversions
@@ -0,0 +1,252 @@
1
+ # solrizer
2
+
3
+ [![Build Status](https://travis-ci.org/projecthydra/solrizer.png?branch=master)](https://travis-ci.org/projecthydra/solrizer)
4
+ [![Gem Version](https://badge.fury.io/rb/solrizer.png)](http://badge.fury.io/rb/solrizer)
5
+
6
+ A lightweight, configurable tool for indexing metadata into solr. Can be triggered from within your application, from
7
+ the command line, or as a JMS listener.
8
+
9
+ Solrizer provides the baseline and structures for the process of solrizing. In order to actually read objects from a
10
+ data source and write solr documents into a solr instance, you need to use an implementation specific gem, such as
11
+ [solrizer-fedora](https://github.com/projecthydra/solrizer-fedora), which provides the mechanics for reading from a
12
+ fedora repository and writing to a solr instance.
13
+
14
+
15
+ ## Installation
16
+
17
+ The gem is hosted on [rubygems.org](http://rubygems.org/gems/solrizer). The best way to manage the gems for your project
18
+ is to use bundler. Create a Gemfile in the root of your application and include the following:
19
+
20
+
21
+ source "http://rubygems.org"
22
+ gem 'solrizer'
23
+
24
+ Then:
25
+
26
+ bundle install
27
+
28
+ ## Usage
29
+
30
+ ### Fire up the console:
31
+
32
+ The code snippets in the following sections can be cut/pasted into your console, giving you the opportunity to play with Solrizer.
33
+
34
+ Start up a console and load solrizer:
35
+
36
+ > irb
37
+ > require "rubygems"
38
+ > require "solrizer"
39
+
40
+ ### Field Mapper
41
+
42
+ The `FieldMapper` maps term names and values to Solr fields, based on the term's data type and any index_as options.
43
+ Solrizer comes with default mappings to dynamic field types defined in the Hydra Solr
44
+ [schema.xml](https://github.com/projecthydra/hydra-head/blob/master/hydra-core/lib/generators/hydra/templates/solr_conf/conf/schema.xml).
45
+
46
+ More information on the conventions followed for the dynamic solr fields is on the
47
+ [wiki page](https://github.com/projecthydra/hydra-head/wiki/Solr-Schema).
48
+
49
+ To examine all of Solrizer's field names, open up a ruby console:
50
+
51
+
52
+ > require 'solrizer'
53
+ => true
54
+ > default_mapper = Solrizer::FieldMapper.new
55
+ => #<Solrizer::FieldMapper:0x007fb47a273770 @id_field="id">
56
+ > default_mapper.solr_name("foo",:searchable, type: :string)
57
+ => "foo_teim"
58
+ > default_mapper.solr_name("foo",:searchable, type: :date)
59
+ => "foo_dtim"
60
+ > default_mapper.solr_name("foo",:searchable, type: :integer)
61
+ => "foo_iim"
62
+ > default_mapper.solr_name("foo",:facetable, type: :string)
63
+ => "foo_sim"
64
+ > default_mapper.solr_name("foo",:facetable, type: :integer)
65
+ => "foo_sim"
66
+ > default_mapper.solr_name("foo",:sortable, type: :string)
67
+ => "foo_si"
68
+ > default_mapper.solr_name("foo",:displayable, type: :string)
69
+ => "foo_ssm"
70
+
71
+ ### Default indexing strategies
72
+
73
+ > solr_doc = Hash.new
74
+ > Solrizer.insert_field(solr_doc, 'title', 'whatever', :stored_searchable)
75
+ => {"title_tesim"=>["whatever"]}
76
+
77
+ > Solrizer.insert_field(solr_doc, 'pub_date', 'Nov 2012', :sortable, :displayable)
78
+ => {"pub_date_si"=>"Nov 2012", "pub_date_ssm"=>["Nov 2012"]}
79
+
80
+ ### Indexing dates
81
+
82
+ as a date:
83
+
84
+ > solr_doc = {}
85
+ > Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :searchable)
86
+ => {"pub_date_dtim"=>["2012-11-07T00:00:00Z"]}
87
+
88
+ or as a string:
89
+
90
+ > solr_doc = {}
91
+ > Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :sortable, :displayable)
92
+ => {"pub_date_dti"=>"2012-11-07T00:00:00Z", "pub_date_ssm"=>["2012-11-07"]}
93
+
94
+ or a string that is stored as a date:
95
+
96
+ > solr_doc = {}
97
+ > Solrizer.insert_field(solr_doc, 'pub_date', 'Jan 29th 2013', :dateable)
98
+ => {"pub_date_dtsim"=>["2013-01-29T00:00:00Z"]}
99
+
100
+ ### Custom indexing strategies
101
+
102
+ #### Create your own index descriptor
103
+
104
+ > solr_doc = {}
105
+ > displearchable = Solrizer::Descriptor.new(:integer, :indexed, :stored)
106
+ > Solrizer.insert_field(solr_doc, 'some_count', 45, displearchable)
107
+ => {"some_count_isi"=>"45"}
108
+
109
+ #### Override the defaults
110
+
111
+ We can override the default indexing methods within `Solrizer::DefaultDescriptors`
112
+
113
+ Here's the default behavior:
114
+
115
+ > solr_doc = {}
116
+ > Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
117
+ => {"title_sim"=>["foobar"]}
118
+
119
+ But let's override that by redefining `:facetable`
120
+
121
+ module Solrizer
122
+ module DefaultDescriptors
123
+ def self.facetable
124
+ Descriptor.new(:string, :indexed, :stored)
125
+ end
126
+ end
127
+ end
128
+
129
+ Now, `:facetable` will return something different:
130
+
131
+ > solr_doc = {}
132
+ > Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
133
+ => {"title_ssi"=>"foobar"}
134
+
135
+ #### Creating your own indexers
136
+
137
+ module MyMappers
138
+ def self.mapper_one
139
+ Solrizer::Descriptor.new(:string, :indexed, :stored)
140
+ end
141
+ end
142
+
143
+ Now, set Solrizer's field mapper to use our new module:
144
+
145
+ > solr_doc = {}
146
+ > Solrizer::FieldMapper.descriptors = [MyMappers]
147
+ => [MyMappers]
148
+ > Solrizer.insert_field(solr_doc, 'title', 'foobar', :mapper_one)
149
+ => {"title_ssi"=>"foobar"}
150
+
151
+ ### Using OM
152
+
153
+ t.main_title(:index_as=>[:facetable],:path=>"title", :label=>"title") { ... }
154
+
155
+ But now you may also pass a Descriptor instance if that works for you:
156
+
157
+ indexer = Solrizer::Descriptor.new(:integer, :indexed, :stored)
158
+ t.main_title(:index_as=>[indexer],:path=>"title", :label=>"title") { ... }
159
+
160
+ ### Extractor and Extractor Mixins
161
+
162
+ Solrizer::Extractor provides utilities for extracting solr fields from objects or inserting solr fields into documents:
163
+
164
+ > extractor = Solrizer::Extractor.new
165
+ > solr_doc = Hash.new
166
+ > extractor.format_node_value(["foo ","\n bar"])
167
+ => "foo bar"
168
+ > extractor.insert_solr_field_value(solr_doc, "foo","bar")
169
+ => {"foo"=>"bar"}
170
+ > extractor.insert_solr_field_value(solr_doc,"foo","baz")
171
+ => {"foo"=>["bar", "baz"]}
172
+ > extractor.insert_solr_field_value(solr_doc, "boo","hoo")
173
+ => {"foo"=>["bar", "baz"], "boo"=>"hoo"}
174
+
175
+ #### Solrizer provides some default mixins:
176
+
177
+ `Solrizer::HTML::Extractor` provides the `html_to_solr` method and `Solrizer::XML::Extractor` provides the `xml_to_solr` method:
178
+
179
+ > Solrizer::XML::Extractor
180
+ > extractor = Solrizer::Extractor.new
181
+ > xml = "<fields><foo>bar</foo><bar>baz</bar></fields>"
182
+ > extractor.xml_to_solr(xml)
183
+ => {:foo_tesim=>"bar", :bar_tesim=>"baz"}
184
+
185
+ #### Solrizer::XML::TerminologyBasedSolrizer
186
+
187
+ Another powerful mixin for use with classes that include the `OM::XML::Document` module is
188
+ `Solrizer::XML::TerminologyBasedSolrizer`. This module provides a robust way of mapping
189
+ terms to solr fields via OM terminologies. A notable example can be found in `ActiveFedora::NokogiriDatastream`.
190
+
191
+ ## JMS Listener for Hydra Rails Applications
192
+
193
+ ### The executables: solrizer and solrizerd
194
+
195
+ The solrizer gem provides two executables:
196
+
197
+ * solrizer is a stomp consumer which listens for messages on fedora.apim.update and solrizes (or de-solrizes) objects accordingly.
198
+ * solrizerd is a wrapper script that spawns a daemonized version of solrizer and handles start|stop|restart|status requests.
199
+
200
+ ### Usage
201
+
202
+ The usage for solrizerd is as follows:
203
+
204
+ solrizerd command --hydra_home PATH [options]
205
+
206
+ The commands are as follows:
207
+ * start start an instance of the application
208
+ * stop stop all instances of the application
209
+ * restart stop all instances and restart them afterwards
210
+ * status show status (PID) of application instances
211
+
212
+ Required parameters:
213
+
214
+ --hydra_home: this is the path to your hydra rails application's root directory. Solrizerd needs this in order to load all your models and corresponding terminologies.
215
+
216
+ The options:
217
+ * -p, --port Stomp port (default: 61613)
218
+ * -o, --host Host to connect to (default: localhost)
219
+ * -u, --user User name for stomp listener
220
+ * -w, --password Password for stomp listener
221
+ * -d, --destination Topic to listen to (default: /topic/fedora.apim.update)
222
+ * -h, --help Display this screen
223
+
224
+ Note:
225
+
226
+ Since the solrizer script must fire up your hydra rails application, it must have all the gems installed that your hydra instance needs.
227
+
228
+ ## Note on Patches/Pull Requests
229
+
230
+ * Fork the project.
231
+ * Make your feature addition or bug fix.
232
+ * Add tests for it. This is important so I don't break it in a
233
+ future version unintentionally.
234
+ * Commit, do not mess with rake file, version, or history.
235
+ (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
236
+ * Send me a pull request. Bonus points for topic branches.
237
+
238
+ ## Acknowledgments
239
+
240
+ ### Technical Lead
241
+
242
+ Matt Zumwalt ([MediaShelf](http://yourmediashelf.com))
243
+
244
+ ### Thanks to
245
+
246
+ * Douglas Kim, who created the initial code base for Solrizer.
247
+ * Chris Fitzpatrick, who patiently ran the first prototype through its paces for weeks.
248
+ * Bess Sadler, who created the JMS integration for Solrizer, generously served as a sounding board for numerous design issues around solr indexing, and pushes the technology forward with the skill of a true engineer.
249
+
250
+ ## Copyright
251
+
252
+ Copyright (c) 2010 Matt Zumwalt. See LICENSE for details.
@@ -90,8 +90,7 @@ begin
90
90
  puts @msg.headers.inspect
91
91
  puts "\nPID: #{@msg.headers["pid"]}\n"
92
92
  if ["addDatastream", "addRelationship","ingest","modifyDatastreamByValue","modifyDatastreamByReference","modifyObject","purgeDatastream","purgeRelationship"].include? method
93
- solrizer = Solrizer::Fedora::Solrizer.new
94
- solrizer.solrize @msg.headers["pid"]
93
+ ActiveFedora::Base.find(@msg.headers["pid"], cast: true).update_index
95
94
  elsif method == "purgeObject"
96
95
  ActiveFedora::SolrService.instance.conn.delete_by_id(pid)
97
96
  else
@@ -31,7 +31,7 @@ module Solrizer
31
31
 
32
32
  # @params [Hash] doc the hash to insert the value into
33
33
  # @params [String] name the name of the field (without the suffix)
34
- # @params [String,Date] value the value to be inserted
34
+ # @params [String,Date,Array] value the value (or array of values) to be inserted
35
35
  # @params [Array,Hash] indexer_args the arguments that find the indexer
36
36
  # @returns [Hash] doc the document that was provided with the new field inserted
37
37
  def self.insert_field(doc, name, value, *indexer_args)
@@ -181,7 +181,7 @@ module Solrizer
181
181
  # mapped names and values. The values in the hash are _arrays_, and may contain multiple values.
182
182
 
183
183
  def solr_names_and_values(field_name, field_value, index_types)
184
- return {} unless field_value
184
+ return {} if field_value.nil?
185
185
 
186
186
  # Determine the set of index types
187
187
  index_types = Array(index_types)
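
Note on the hunk above: the guard in `solr_names_and_values` changes from `unless field_value` to `if field_value.nil?`, which matters for `false`. The following is a minimal sketch of that difference only; the two helper methods are hypothetical stand-ins for the guard clauses and are not part of Solrizer.

```ruby
# Hypothetical stand-ins for the old and new guard clauses in
# solr_names_and_values; each returns true when the value would be skipped.
def skipped_by_old_guard?(field_value)
  !field_value          # mirrors `return {} unless field_value`
end

def skipped_by_new_guard?(field_value)
  field_value.nil?      # mirrors `return {} if field_value.nil?`
end

# Compare how each guard treats a few representative values.
[nil, false, true, "", 0].each do |value|
  puts "#{value.inspect}: old skips=#{skipped_by_old_guard?(value)}, new skips=#{skipped_by_new_guard?(value)}"
end
```

Under the old guard both `nil` and `false` were dropped; under the new guard only `nil` is, which is consistent with the "should insert false Booleans" spec added in this release.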
@@ -1,20 +1,26 @@
1
+ require 'ostruct'
2
+
1
3
  module Solrizer
2
4
  class Suffix
3
5
 
4
- def initialize(fields)
5
- @fields = fields
6
+ def initialize(*fields)
7
+ @fields = fields.flatten
6
8
  end
7
9
 
8
10
  def multivalued?
9
- @fields.include? :multivalued
11
+ has_field? :multivalued
10
12
  end
11
13
 
12
14
  def stored?
13
- @fields.include? :stored
15
+ has_field? :stored
14
16
  end
15
17
 
16
18
  def indexed?
17
- @fields.include? :indexed
19
+ has_field? :indexed
20
+ end
21
+
22
+ def has_field? f
23
+ f.to_sym == :type or @fields.include? f.to_sym
18
24
  end
19
25
 
20
26
  def data_type
@@ -22,40 +28,52 @@ module Solrizer
22
28
  end
23
29
 
24
30
  def to_s
25
- stored_suffix = config[:stored_suffix] if stored?
26
- index_suffix = config[:index_suffix] if indexed?
27
- multivalued_suffix = config[:multivalued_suffix] if multivalued?
31
+
28
32
  raise Solrizer::InvalidIndexDescriptor, "Missing datatype for #{@fields}" unless data_type
29
- type_suffix = config[:type_suffix].call(data_type)
30
- raise Solrizer::InvalidIndexDescriptor, "Invalid datatype `#{data_type.inspect}'. Must be one of: :date, :time, :text, :text_en, :string, :integer" unless type_suffix
31
33
 
32
- [config[:suffix_delimiter], type_suffix, stored_suffix, index_suffix, multivalued_suffix].join
34
+ field_suffix = [config.suffix_delimiter]
35
+
36
+ config.fields.select { |f| has_field? f }.each do |f|
37
+ key = :"#{f}_suffix"
38
+ field_suffix << if config.send(key).is_a? Proc
39
+ config.send(key).call(@fields)
40
+ else
41
+ config.send(key)
42
+ end
43
+ end
44
+
45
+ field_suffix.join
33
46
  end
34
47
 
48
+ def self.config
49
+ @config ||= OpenStruct.new :fields => [:type, :stored, :indexed, :multivalued],
50
+ suffix_delimiter: '_',
51
+ type_suffix: (lambda do |fields|
52
+ type = fields.first
53
+ case type
54
+ when :string, :symbol # TODO `:symbol' usage ought to be deprecated
55
+ 's'
56
+ when :text
57
+ 't'
58
+ when :text_en
59
+ 'te'
60
+ when :date, :time
61
+ 'dt'
62
+ when :integer
63
+ 'i'
64
+ when :boolean
65
+ 'b'
66
+ else
67
+ raise Solrizer::InvalidIndexDescriptor, "Invalid datatype `#{type.inspect}'. Must be one of: :date, :time, :text, :text_en, :string, :symbol, :integer, :boolean"
68
+ end
69
+ end),
70
+ stored_suffix: 's',
71
+ indexed_suffix: 'i',
72
+ multivalued_suffix: 'm'
73
+ end
35
74
 
36
- private
37
75
  def config
38
- @config ||=
39
- {suffix_delimiter: '_',
40
- type_suffix: lambda do |type|
41
- case type
42
- when :string, :symbol # TODO `:symbol' usage ought to be deprecated
43
- 's'
44
- when :text
45
- 't'
46
- when :text_en
47
- 'te'
48
- when :date, :time
49
- 'dt'
50
- when :integer
51
- 'i'
52
- when :boolean
53
- 'b'
54
- end
55
- end,
56
- stored_suffix: 's',
57
- index_suffix: 'i',
58
- multivalued_suffix: 'm'}
76
+ @config ||= self.class.config.dup
59
77
  end
60
78
  end
61
79
  end
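
Note on the hunk above: `Suffix#to_s` now walks `config.fields` and appends each matching `<field>_suffix`, calling it with the field list when it is a Proc, and each instance gets its own `dup` of the class-level config. The sketch below is a trimmed, self-contained approximation of that flow. `SketchSuffix` is a hypothetical name, its type lookup covers only three data types, and it raises `KeyError` rather than `Solrizer::InvalidIndexDescriptor` for unknown types.

```ruby
require 'ostruct'

# Hypothetical, trimmed stand-in for Solrizer::Suffix (not the gem's class).
class SketchSuffix
  # Class-level defaults; each instance works on its own shallow copy.
  def self.config
    @config ||= OpenStruct.new(
      fields: [:type, :stored, :indexed, :multivalued],
      suffix_delimiter: '_',
      # Simplified type lookup; the real lambda handles more data types.
      type_suffix: lambda { |fields| { string: 's', integer: 'i', boolean: 'b' }.fetch(fields.first) },
      stored_suffix: 's',
      indexed_suffix: 'i',
      multivalued_suffix: 'm'
    )
  end

  # Accepts either a list of symbols or a single array of symbols.
  def initialize(*fields)
    @fields = fields.flatten
  end

  def config
    @config ||= self.class.config.dup
  end

  # :type always counts as present; other fields must have been passed in.
  def has_field?(f)
    f.to_sym == :type || @fields.include?(f.to_sym)
  end

  def to_s
    parts = [config.suffix_delimiter]
    config.fields.select { |f| has_field?(f) }.each do |f|
      value = config.send(:"#{f}_suffix")
      parts << (value.is_a?(Proc) ? value.call(@fields) : value)
    end
    parts.join
  end
end

puts SketchSuffix.new(:string, :stored, :indexed).to_s        # prints "_ssi"
puts SketchSuffix.new([:integer, :stored, :multivalued]).to_s # prints "_ism"

# Overriding one instance's config leaves the class-level defaults untouched.
custom = SketchSuffix.new(:boolean, :stored)
custom.config.suffix_delimiter = '-'
puts custom.to_s                                # prints "-bs"
puts SketchSuffix.new(:boolean, :stored).to_s   # prints "_bs"
```

Because the copy is shallow, reassigning an attribute (as the new spec does with `config.fields += [:b]`) affects only that instance, whereas mutating the shared `fields` array in place would not be isolated.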
@@ -1,3 +1,3 @@
1
1
  module Solrizer
2
- VERSION = "3.1.1"
2
+ VERSION = "3.2.0"
3
3
  end
@@ -28,7 +28,7 @@ Gem::Specification.new do |s|
28
28
  s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
29
29
  s.extra_rdoc_files = [
30
30
  "LICENSE",
31
- "README.textile"
31
+ "README.md"
32
32
  ]
33
33
  s.require_paths = ["lib"]
34
34
  end
@@ -25,15 +25,24 @@ describe Solrizer do
25
25
  Solrizer.insert_field(doc, 'foo', Time.parse('2013-01-13T22:45:56+06:00'))
26
26
  doc.should == {'foo_dtsim' => ["2013-01-13T16:45:56Z"]}
27
27
  end
28
- it "should insert Booleans" do
28
+ it "should insert true Booleans" do
29
29
  Solrizer.insert_field(doc, 'foo', true)
30
30
  doc.should == {'foo_bsi' => true}
31
31
  end
32
+ it "should insert false Booleans" do
33
+ Solrizer.insert_field(doc, 'foo', false)
34
+ doc.should == {'foo_bsi' => false}
35
+ end
32
36
 
33
37
  it "should insert multiple values" do
34
38
  Solrizer.insert_field(doc, 'foo', ['A name', 'B name'], :sortable, :facetable)
35
39
  doc.should == {'foo_si' => 'B name', 'foo_sim' => ['A name', 'B name']}
36
40
  end
41
+
42
+ it 'should insert nothing when passed a nil value' do
43
+ Solrizer.insert_field(doc, 'foo', nil, :sortable, :facetable)
44
+ doc.should == {}
45
+ end
37
46
  end
38
47
 
39
48
  describe "on a document with values" do
@@ -0,0 +1,80 @@
1
+ require 'spec_helper'
2
+
3
+ describe Solrizer::Suffix do
4
+
5
+ describe "#multivalued?" do
6
+ it "should be multivalued if :multivalued is among the field types" do
7
+ expect(Solrizer::Suffix.new(:multivalued)).to be_multivalued
8
+ end
9
+
10
+ it "should not be multivalued if :multivalued was not passed in a field type" do
11
+ expect(Solrizer::Suffix.new(:some_other_field_type)).to_not be_multivalued
12
+ end
13
+ end
14
+
15
+ describe "#stored?" do
16
+ it "should be stored if :stored is among the field types" do
17
+ expect(Solrizer::Suffix.new(:stored)).to be_stored
18
+ end
19
+
20
+ it "should not be stored if :stored was not passed in a field type" do
21
+ expect(Solrizer::Suffix.new(:some_other_field_type)).to_not be_stored
22
+ end
23
+ end
24
+
25
+ describe "#indexed?" do
26
+ it "should be indexed if :indexed is among the field types" do
27
+ expect(Solrizer::Suffix.new(:indexed)).to be_indexed
28
+ end
29
+
30
+ it "should not be indexed if :indexed was not passed in a field type" do
31
+ expect(Solrizer::Suffix.new(:some_other_field_type)).to_not be_indexed
32
+ end
33
+ end
34
+ describe "#has_field?" do
35
+ subject do
36
+ Solrizer::Suffix.new(:type, :a, :b, :c)
37
+ end
38
+ it "should be able to tell if a field is in the suffix or not" do
39
+ expect(subject).to have_field :a
40
+ expect(subject).to have_field :b
41
+ expect(subject).to have_field :c
42
+ expect(subject).to_not have_field :d
43
+ end
44
+ end
45
+
46
+ describe "#data_type" do
47
+ it "should always be the first argument to the suffix" do
48
+ expect(Solrizer::Suffix.new(:some_type, :a).data_type).to eq :some_type
49
+ end
50
+ end
51
+
52
+ describe "#to_s" do
53
+ it "should combine the fields into a suffix string" do
54
+ expect(Solrizer::Suffix.new(:string, :stored, :indexed).to_s).to eq '_ssi'
55
+ expect(Solrizer::Suffix.new(:integer, :stored, :multivalued).to_s).to eq '_ism'
56
+ end
57
+ end
58
+
59
+ describe "config" do
60
+ subject do
61
+ Solrizer::Suffix.new(:my_custom_type, :a, :b, :c)
62
+ end
63
+
64
+ it "should let you mess with the suffix config" do
65
+ subject.config.fields += [:b]
66
+ subject.config.suffix_delimiter = "#"
67
+ subject.config.type_suffix = lambda do |fields|
68
+ type = fields.first
69
+
70
+ if type == :my_custom_type
71
+ "custom_suffix_"
72
+ else
73
+ "nope"
74
+ end
75
+ end
76
+ subject.config.b_suffix = 'now_with_more_b'
77
+ expect(subject.to_s).to eq "#custom_suffix_now_with_more_b"
78
+ end
79
+ end
80
+ end
metadata CHANGED
@@ -1,153 +1,153 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: solrizer
3
3
  version: !ruby/object:Gem::Version
4
- version: 3.1.1
4
+ version: 3.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Matt Zumwalt
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2013-06-25 00:00:00.000000000 Z
11
+ date: 2014-05-28 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: nokogiri
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
- - - '>='
17
+ - - ">="
18
18
  - !ruby/object:Gem::Version
19
19
  version: '0'
20
20
  type: :runtime
21
21
  prerelease: false
22
22
  version_requirements: !ruby/object:Gem::Requirement
23
23
  requirements:
24
- - - '>='
24
+ - - ">="
25
25
  - !ruby/object:Gem::Version
26
26
  version: '0'
27
27
  - !ruby/object:Gem::Dependency
28
28
  name: xml-simple
29
29
  requirement: !ruby/object:Gem::Requirement
30
30
  requirements:
31
- - - '>='
31
+ - - ">="
32
32
  - !ruby/object:Gem::Version
33
33
  version: '0'
34
34
  type: :runtime
35
35
  prerelease: false
36
36
  version_requirements: !ruby/object:Gem::Requirement
37
37
  requirements:
38
- - - '>='
38
+ - - ">="
39
39
  - !ruby/object:Gem::Version
40
40
  version: '0'
41
41
  - !ruby/object:Gem::Dependency
42
42
  name: mediashelf-loggable
43
43
  requirement: !ruby/object:Gem::Requirement
44
44
  requirements:
45
- - - ~>
45
+ - - "~>"
46
46
  - !ruby/object:Gem::Version
47
47
  version: 0.4.7
48
48
  type: :runtime
49
49
  prerelease: false
50
50
  version_requirements: !ruby/object:Gem::Requirement
51
51
  requirements:
52
- - - ~>
52
+ - - "~>"
53
53
  - !ruby/object:Gem::Version
54
54
  version: 0.4.7
55
55
  - !ruby/object:Gem::Dependency
56
56
  name: stomp
57
57
  requirement: !ruby/object:Gem::Requirement
58
58
  requirements:
59
- - - '>='
59
+ - - ">="
60
60
  - !ruby/object:Gem::Version
61
61
  version: '0'
62
62
  type: :runtime
63
63
  prerelease: false
64
64
  version_requirements: !ruby/object:Gem::Requirement
65
65
  requirements:
66
- - - '>='
66
+ - - ">="
67
67
  - !ruby/object:Gem::Version
68
68
  version: '0'
69
69
  - !ruby/object:Gem::Dependency
70
70
  name: daemons
71
71
  requirement: !ruby/object:Gem::Requirement
72
72
  requirements:
73
- - - '>='
73
+ - - ">="
74
74
  - !ruby/object:Gem::Version
75
75
  version: '0'
76
76
  type: :runtime
77
77
  prerelease: false
78
78
  version_requirements: !ruby/object:Gem::Requirement
79
79
  requirements:
80
- - - '>='
80
+ - - ">="
81
81
  - !ruby/object:Gem::Version
82
82
  version: '0'
83
83
  - !ruby/object:Gem::Dependency
84
84
  name: activesupport
85
85
  requirement: !ruby/object:Gem::Requirement
86
86
  requirements:
87
- - - '>='
87
+ - - ">="
88
88
  - !ruby/object:Gem::Version
89
89
  version: '0'
90
90
  type: :runtime
91
91
  prerelease: false
92
92
  version_requirements: !ruby/object:Gem::Requirement
93
93
  requirements:
94
- - - '>='
94
+ - - ">="
95
95
  - !ruby/object:Gem::Version
96
96
  version: '0'
97
97
  - !ruby/object:Gem::Dependency
98
98
  name: rspec
99
99
  requirement: !ruby/object:Gem::Requirement
100
100
  requirements:
101
- - - '>='
101
+ - - ">="
102
102
  - !ruby/object:Gem::Version
103
103
  version: '0'
104
104
  type: :development
105
105
  prerelease: false
106
106
  version_requirements: !ruby/object:Gem::Requirement
107
107
  requirements:
108
- - - '>='
108
+ - - ">="
109
109
  - !ruby/object:Gem::Version
110
110
  version: '0'
111
111
  - !ruby/object:Gem::Dependency
112
112
  name: rake
113
113
  requirement: !ruby/object:Gem::Requirement
114
114
  requirements:
115
- - - '>='
115
+ - - ">="
116
116
  - !ruby/object:Gem::Version
117
117
  version: '0'
118
118
  type: :development
119
119
  prerelease: false
120
120
  version_requirements: !ruby/object:Gem::Requirement
121
121
  requirements:
122
- - - '>='
122
+ - - ">="
123
123
  - !ruby/object:Gem::Version
124
124
  version: '0'
125
125
  - !ruby/object:Gem::Dependency
126
126
  name: yard
127
127
  requirement: !ruby/object:Gem::Requirement
128
128
  requirements:
129
- - - '>='
129
+ - - ">="
130
130
  - !ruby/object:Gem::Version
131
131
  version: '0'
132
132
  type: :development
133
133
  prerelease: false
134
134
  version_requirements: !ruby/object:Gem::Requirement
135
135
  requirements:
136
- - - '>='
136
+ - - ">="
137
137
  - !ruby/object:Gem::Version
138
138
  version: '0'
139
139
  - !ruby/object:Gem::Dependency
140
140
  name: RedCloth
141
141
  requirement: !ruby/object:Gem::Requirement
142
142
  requirements:
143
- - - '>='
143
+ - - ">="
144
144
  - !ruby/object:Gem::Version
145
145
  version: '0'
146
146
  type: :development
147
147
  prerelease: false
148
148
  version_requirements: !ruby/object:Gem::Requirement
149
149
  requirements:
150
- - - '>='
150
+ - - ">="
151
151
  - !ruby/object:Gem::Version
152
152
  version: '0'
153
153
  description: Use solrizer to populate solr indexes. You can run solrizer from within
@@ -159,14 +159,15 @@ executables:
159
159
  extensions: []
160
160
  extra_rdoc_files:
161
161
  - LICENSE
162
- - README.textile
162
+ - README.md
163
163
  files:
164
- - .gitignore
165
- - .travis.yml
164
+ - ".gitignore"
165
+ - ".travis.yml"
166
+ - CONTRIBUTING.md
166
167
  - Gemfile
167
168
  - History.txt
168
169
  - LICENSE
169
- - README.textile
170
+ - README.md
170
171
  - Rakefile
171
172
  - bin/solrizer
172
173
  - bin/solrizerd
@@ -193,6 +194,7 @@ files:
193
194
  - spec/units/extractor_spec.rb
194
195
  - spec/units/field_mapper_spec.rb
195
196
  - spec/units/solrizer_spec.rb
197
+ - spec/units/suffix_spec.rb
196
198
  - spec/units/xml_extractor_spec.rb
197
199
  homepage: http://github.com/projecthydra/solrizer
198
200
  licenses: []
@@ -203,17 +205,17 @@ require_paths:
203
205
  - lib
204
206
  required_ruby_version: !ruby/object:Gem::Requirement
205
207
  requirements:
206
- - - '>='
208
+ - - ">="
207
209
  - !ruby/object:Gem::Version
208
210
  version: '0'
209
211
  required_rubygems_version: !ruby/object:Gem::Requirement
210
212
  requirements:
211
- - - '>='
213
+ - - ">="
212
214
  - !ruby/object:Gem::Version
213
215
  version: '0'
214
216
  requirements: []
215
217
  rubyforge_project:
216
- rubygems_version: 2.0.3
218
+ rubygems_version: 2.2.2
217
219
  signing_key:
218
220
  specification_version: 4
219
221
  summary: A utility for building solr indexes, usually from Fedora repository content
@@ -225,5 +227,6 @@ test_files:
225
227
  - spec/units/extractor_spec.rb
226
228
  - spec/units/field_mapper_spec.rb
227
229
  - spec/units/solrizer_spec.rb
230
+ - spec/units/suffix_spec.rb
228
231
  - spec/units/xml_extractor_spec.rb
229
232
  has_rdoc:
@@ -1,249 +0,0 @@
1
- h1. solrizer
2
-
3
- A lightweight, configurable tool for indexing metadata into solr. Can be triggered from within your application, from the command line, or as a JMS listener.
4
-
5
- Solrizer provides the baseline and structures for the process of solrizing. In order to actually read objects from a
6
- datasource and write solr documents into a solr instance, you need to use an implementation specific gem, such as
7
- "solrizer-fedora":https://github.com/projecthydra/solrizer-fedora, which provides the mechanics for reading from a fedora repository and writing to a solr instance.
8
-
9
-
10
- h2. Installation
11
-
12
- The gem is hosted on rubygems.org. The best way to manage the gems for your project is to use bundler. Create a Gemfile in the root of your application and include the following:
13
-
14
- <pre>
15
- source "http://rubygems.org"
16
-
17
- gem 'solrizer'
18
- </pre>
19
-
20
- Then:
21
-
22
- <pre>bundle install</pre>
23
-
24
- h2. Usage
25
-
26
- h3. Fire up the console:
27
-
28
- The code snippets in the following sections can be cut/pasted into your console, giving you the opportunity to play with Solrizer.
29
-
30
- Start up a console and load solrizer:
31
-
32
- <pre>
33
- irb
34
- require "rubygems"
35
- require "solrizer"
36
- </pre>
37
-
38
-
39
- h3. Field Mapper
40
-
41
- The FieldMapper maps term names and values to Solr fields, based on the term’s data type and any index_as options. Solrizer comes with default mappings to the dynamic field types defined in the Hydra Solr schema.xml file. A copy of that file is available at:
42
- https://github.com/projecthydra/hydra-head/blob/master/hydra-core/lib/generators/hydra/templates/solr_conf/conf/schema.xml
43
-
44
- More information on the conventions followed for the dynamic solr fields is here:
45
- https://github.com/projecthydra/hydra-head/wiki/Solr-Schema
46
-
47
- <pre>
48
- default_mapper = Solrizer::FieldMapper::Default.new
49
-
50
- # some of the default mappings in solrizer
51
- default_mapper.solr_name("foo",:string,:searchable) # returns foo_tesim
52
- default_mapper.solr_name("foo",:date,:searchable) # returns foo_dtsim
53
- default_mapper.solr_name("foo",:integer,:searchable) # returns foo_isim
54
- default_mapper.solr_name("foo",:string,:facetable) # returns foo_sim
55
- default_mapper.solr_name("foo",:integer,:facetable) # returns foo_iim
56
- default_mapper.solr_name("foo",:string,:sortable) # returns foo_si
57
- default_mapper.solr_name("foo",:string,:displayable) # returns foo_ssm
58
- </pre>
59
-
60
- ## Using default indexing strategies
61
-
62
- <pre>
63
- solr_doc = {}
64
- Solrizer.insert_field(solr_doc, 'title', 'whatever', :searchable)
65
- => {"title_tesim"=>["whatever"]}
66
-
67
- Solrizer.insert_field(solr_doc, 'pub_date', 'Nov 2012', :sortable, :displayable)
68
- => {"title_tesim"=>["whatever"], "pub_date_ssi"=>["Nov 2012"], "pub_date_ssm"=>["Nov 2012"]}
69
- </pre>
70
-
71
- #### You can also index dates
72
- <pre>
73
- # as a date
74
- solr_doc = {}
75
- Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :searchable)
76
- => {"pub_date_dtsi"=>["2012-11-07T00:00:00Z"]}
77
-
78
- # or as a string
79
- solr_doc = {}
80
- Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :sortable, :displayable)
81
- => {"pub_date_ssi"=>["2012-11-07"], "pub_date_ssm"=>["2012-11-07"]}
82
-
83
- # or a string that is stored as a date
84
- solr_doc = {}
85
- Solrizer.insert_field(solr_doc, 'pub_date', 'Jan 29th 2013', :dateable)
86
- => {"pub_date_dtsi"=>["2013-01-29T00:00:00Z"]}
87
- </pre>
88
-
89
-
90
- ## Using a custom indexing strategy
91
- All you have to do is create your own index descriptor:
92
- <pre>
93
- solr_doc = {}
94
- displearchable = Solrizer::Descriptor.new(:integer, :indexed, :stored)
95
- Solrizer.insert_field(solr_doc, 'some_count', 45, displearchable)
96
- => {"some_count_isi"=>["45"]}
97
- </pre>
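The suffixes in the examples above (`_isi`, `_ssi`, `_sim`) appear to be built from a type code followed by one letter per index option. The following is a minimal sketch of that naming convention as it can be read off the README's own examples; `TYPE_CODES` and `sketch_suffix` are hypothetical names invented for illustration and are not part of Solrizer's API, and the real library may derive suffixes differently.

```ruby
# Hypothetical illustration only; NOT Solrizer's implementation.
# Type codes are inferred from the suffixes shown in the README examples.
TYPE_CODES = { string: 's', integer: 'i', date: 'dt' }.freeze

# Build a dynamic-field suffix: type code, then 's' (stored),
# 'i' (indexed), 'm' (multivalued), in that fixed order.
def sketch_suffix(type, *flags)
  parts = [TYPE_CODES.fetch(type)]
  parts << 's' if flags.include?(:stored)
  parts << 'i' if flags.include?(:indexed)
  parts << 'm' if flags.include?(:multivalued)
  "_#{parts.join}"
end

puts "some_count#{sketch_suffix(:integer, :indexed, :stored)}" # prints some_count_isi
puts "title#{sketch_suffix(:string, :indexed, :multivalued)}"   # prints title_sim
```

Under this reading, the order in which the options are passed does not matter; only their presence changes the suffix.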
98
-
99
- ## Changing the behavior of a default descriptor
100
-
101
- Simply override the methods within Solrizer::DefaultDescriptors
102
- <pre>
103
- # before
104
- solr_doc = {}
105
- Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
106
- => {"title_sim"=>["foobar"]}
107
-
108
- # redefine facetable:
109
- module Solrizer
110
- module DefaultDescriptors
111
- def self.facetable
112
- Descriptor.new(:string, :indexed, :stored)
113
- end
114
- end
115
- end
116
-
117
- # after
118
- solr_doc = {}
119
- Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
120
- => {"title_ssi"=>["foobar"]}
121
- </pre>
122
-
123
-
124
- ## Creating your own Indexers
125
- <pre>
126
- module MyMappers
127
- def self.mapper_one
128
- Solrizer::Descriptor.new(:string, :indexed, :stored)
129
- end
130
- end
131
-
132
- solr_doc = {}
133
-
134
- Solrizer::FieldMapper.descriptors = [MyMappers]
135
- => [MyMappers]
136
-
137
- Solrizer.insert_field(solr_doc, 'title', 'foobar', :mapper_one)
138
- => {"title_ssi"=>["foobar"]}
139
- </pre>
140
-
141
- ## Using OM
142
- Same as it ever was:
143
- <pre>
144
- t.main_title(:index_as=>[:facetable],:path=>"title", :label=>"title") { ... }
145
- </pre>
146
-
147
- But now you may also pass a Descriptor instance if that works for you:
148
- <pre>
149
- indexer = Solrizer::Descriptor.new(:integer, :indexed, :stored)
150
- t.main_title(:index_as=>[indexer],:path=>"title", :label=>"title") { ... }
151
-
152
- </pre>
153
-
154
- h3. Extractor and Extractor Mixins
155
-
156
- Solrizer::Extractor provides utilities for extracting solr fields from objects or inserting solr fields into documents:
157
-
158
- <pre>
159
- extractor = Solrizer::Extractor.new
160
-
161
- extractor.format_node_value(["foo ","\n bar"]) # returns "foo bar"
162
-
163
- solr_doc = Hash.new
164
- extractor.insert_solr_field_value(solr_doc, "foo","bar") # solr_doc is now {"foo" => ["bar"]}
165
- extractor.insert_solr_field_value(solr_doc,"foo","baz") # solr_doc is now {"foo" => ["bar","baz"]}
166
- extractor.insert_solr_field_value(solr_doc, "boo","hoo") # solr_doc is now {"foo" => ["bar","baz"], "boo" => ["hoo"]}
167
- </pre>
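The comments in the example above suggest that each call appends the value to an array stored under the field name, creating that array on first use. A minimal plain-Ruby sketch of that accumulation behaviour follows; `sketch_insert_value` is a hypothetical helper written for illustration and is not Solrizer's own code.

```ruby
# Hypothetical helper for illustration; NOT Solrizer's implementation.
# Appends value to the array stored under field_name, creating the
# array the first time the field is seen. Returns the document hash.
def sketch_insert_value(doc, field_name, value)
  doc[field_name] ||= []
  doc[field_name] << value
  doc
end

solr_doc = {}
sketch_insert_value(solr_doc, "foo", "bar")
sketch_insert_value(solr_doc, "foo", "baz")
sketch_insert_value(solr_doc, "boo", "hoo")
# solr_doc now maps "foo" to ["bar", "baz"] and "boo" to ["hoo"]
p solr_doc
```

Storing every value in an array, even a single one, keeps the document shape uniform for multivalued Solr fields.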
168
-
169
- h4. Solrizer provides some default mixins:
170
-
171
- * Solrizer::HTML::Extractor -=> provides html_to_solr method
172
- * Solrizer::XML::Extractor -=> provides xml_to_solr method
173
-
174
- <pre>
175
- xml = "<fields><foo>bar</foo><bar>baz</bar></fields>"
176
-
177
- extractor.xml_to_solr(xml) # returns {:foo_tesim=>"bar", :bar_tesim=>"baz"}
178
- </pre>
179
-
180
- h4. Solrizer::XML::TerminologyBasedSolrizer
181
-
182
- Another powerful mixin for use with classes that include the OM::XML::Document module is Solrizer::XML::TerminologyBasedSolrizer.
183
- The methods provided by this module offer a robust way of mapping terms to solr fields via OM terminologies. A notable example
184
- can be found in ActiveFedora::NokogiriDatastream.
185
-
186
-
187
- h2. JMS Listener for Hydra Rails Applications
188
-
189
- h3. The executables: solrizer and solrizerd
190
-
191
- The solrizer gem provides two executables:
192
-
193
- * solrizer is a stomp consumer which listens for fedora.apim.updates and solrizes (or de-solrizes) objects accordingly.
194
- * solrizerd is a wrapper script that spawns a daemonized version of solrizer and handles start|stop|restart|status requests.
195
-
196
- h3. Usage
197
-
198
- The usage for solrizerd is as follows:
199
-
200
- <pre>
201
- solrizerd command --hydra_home PATH [options]
202
- </pre>
203
-
204
- The commands are as follows:
205
- * start start an instance of the application
206
- * stop stop all instances of the application
207
- * restart stop all instances and restart them afterwards
208
- * status show status (PID) of application instances
209
-
210
- Required parameters:
211
-
212
- --hydra_home: this is the path to your hydra rails application's root directory. Solrizerd needs this in order to load all your models and corresponding terminologies.
213
-
214
- The options:
215
- * -p, --port Stomp port 61613
216
- * -o, --host Host to connect to localhost
217
- * -u, --user User name for stomp listener
218
- * -w, --password Password for stomp listener
219
- * -d, --destination Topic to listen to (default: /topic/fedora.apim.update)
220
- * -h, --help Display this screen
221
-
222
- Note:
223
-
224
- Since the solrizer script must fire up your hydra rails application, it must have all the gems installed that your hydra instance needs.
225
-
226
-
227
- h2. Note on Patches/Pull Requests
228
-
229
- * Fork the project.
230
- * Make your feature addition or bug fix.
231
- * Add tests for it. This is important so I don't break it in a
232
- future version unintentionally.
233
- * Commit, do not mess with rake file, version, or history.
234
- (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
235
- * Send me a pull request. Bonus points for topic branches.
236
-
237
- h2. Acknowledgements
238
-
239
- Technical Lead: Matt Zumwalt ("MediaShelf":http://yourmediashelf.com)
240
-
241
- Thanks to
242
-
243
- Douglas Kim, who created the initial code base for Solrizer.
244
- Chris Fitzpatrick, who patiently ran the first prototype through its paces for weeks.
245
- Bess Sadler, who created the JMS integration for Solrizer, generously served as a sounding board for numerous design issues around solr indexing, and pushes the technology forward with the skill of a true engineer.
246
-
247
- h2. Copyright
248
-
249
- Copyright (c) 2010 Matt Zumwalt. See LICENSE for details.