solrizer 3.1.1 → 3.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 2352b7c26cf55046a975d909108924a8ca9021f8
4
- data.tar.gz: c83e79545b2caa984d75f2deb6bd14d3495a057e
3
+ metadata.gz: 8d77a71925151847cbb586039e03296e1b0aa2c9
4
+ data.tar.gz: a2bbb1286b7b037dc5a83d81f84ec098edc60283
5
5
  SHA512:
6
- metadata.gz: 518b8942031a7de396ebfd51b0e3de9b933ec0f9a06e10d5f53249655dee2ebfc9da6ad65a42036dec7fbf9f6bd0614f84f2ae4ce411e036c63ddeffd5b41967
7
- data.tar.gz: d3e557a70e7f9fd80c6be842b5fb392eee8e93f786638ae691b6ee7c4c9b5bbb7ea939b61f42d2854e19c2a075e247e49d9240085ca6d08484cac57f6d6a20b4
6
+ metadata.gz: 76ad75ea81fb427b72a0b214637427f44ecd54d5e4b80cc43f49ae4c13598816d3756144b4b1081f7c357169595d97d7c8ce4e1137533e54e99807c619f9d9d6
7
+ data.tar.gz: 9294cad6b4a5661321d4a0780652af3e31fbebfeabf6b3ebe5d62577d37aaf6cc685255edb50acc5fd8782f468bb05a0d8ba1e64bb40c346c6f10202e8103250
@@ -0,0 +1,113 @@
1
+ # How to Contribute
2
+
3
+ We want your help to make Project Hydra great.
4
+ There are a few guidelines that we need contributors to follow so that we can have a chance of keeping on top of things.
5
+
6
+ ## Hydra Project Intellectual Property Licensing and Ownership
7
+
8
+ All code contributors must have an Individual Contributor License Agreement (iCLA) on file with the Hydra Project Steering Group.
9
+ If the contributor works for an institution, the institution must have a Corporate Contributor License Agreement (cCLA) on file.
10
+
11
+ https://wiki.duraspace.org/display/hydra/Hydra+Project+Intellectual+Property+Licensing+and+Ownership
12
+
13
+ You should also add yourself to the `CONTRIBUTORS.md` file in the root of the project.
14
+
15
+ ## Contribution Tasks
16
+
17
+ * Reporting Issues
18
+ * Making Changes
19
+ * Submitting Changes
20
+ * Merging Changes
21
+
22
+ ### Reporting Issues
23
+
24
+ * Make sure you have a [GitHub account](https://github.com/signup/free)
25
+ * Submit a [Github issue](./issues) by:
26
+ * Clearly describing the issue
27
+ * Provide a descriptive summary
28
+ * Explain the expected behavior
29
+ * Explain the actual behavior
30
+ * Provide steps to reproduce the actual behavior
31
+
32
+ ### Making Changes
33
+
34
+ * Fork the repository on GitHub
35
+ * Create a topic branch from where you want to base your work.
36
+ * This is usually the master branch.
37
+ * To quickly create a topic branch based on master: `git branch fix/master/my_contribution master`
38
+ * Then checkout the new branch with `git checkout fix/master/my_contribution`.
39
+ * Please avoid working directly on the `master` branch.
40
+ * You may find the [hub suite of commands](https://github.com/defunkt/hub) helpful
41
+ * Make commits of logical units.
42
+ * Your commit should include a high level description of your work in HISTORY.textile
43
+ * Check for unnecessary whitespace with `git diff --check` before committing.
44
+ * Make sure your commit messages are [well formed](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html).
45
+ * If you created an issue, you can close it by including "Closes #issue" in your commit message. See [Github's blog post for more details](https://github.com/blog/1386-closing-issues-via-commit-messages)
46
+
47
+ ```
48
+ Present tense short summary (50 characters or less)
49
+
50
+ More detailed description, if necessary. It should be wrapped to 72
51
+ characters. Try to be as descriptive as you can, even if you think that
52
+ the commit content is obvious, it may not be obvious to others. You
53
+ should add such description also if it's already present in bug tracker,
54
+ it should not be necessary to visit a webpage to check the history.
55
+
56
+ Include Closes #<issue-number> when relevant.
57
+
58
+ Description can have multiple paragraphs and you can use code examples
59
+ inside, just indent it with 4 spaces:
60
+
61
+ class PostsController
62
+ def index
63
+ respond_with Post.limit(10)
64
+ end
65
+ end
66
+
67
+ You can also add bullet points:
68
+
69
+ - you can use dashes or asterisks
70
+
71
+ - also, try to indent next line of a point for readability, if it's too
72
+ long to fit in 72 characters
73
+ ```
74
+
75
+ * Make sure you have added the necessary tests for your changes.
76
+ * Run _all_ the tests to assure nothing else was accidentally broken.
77
+ * When you are ready to submit a pull request
78
+
79
+ ### Submitting Changes
80
+
81
+ [Detailed Walkthrough of One Pull Request per Commit](http://ndlib.github.io/practices/one-commit-per-pull-request/)
82
+
83
+ * Read the article ["Using Pull Requests"](https://help.github.com/articles/using-pull-requests) on GitHub.
84
+ * Make sure your branch is up to date with its parent branch (i.e. master)
85
+ * `git checkout master`
86
+ * `git pull --rebase`
87
+ * `git checkout <your-branch>`
88
+ * `git rebase master`
89
+ * It is likely a good idea to run your tests again.
90
+ * Squash the commits for your branch into one commit
91
+ * `git rebase --interactive HEAD~<number-of-commits>` ([See Github help](https://help.github.com/articles/interactive-rebase))
92
+ * To determine the number of commits on your branch: `git log master..<your-branch> --oneline | wc -l`
93
+ * Squashing your branch's changes into one commit is "good form" and helps the person merging your request to see everything that is going on.
94
+ * Push your changes to a topic branch in your fork of the repository.
95
+ * Submit a pull request from your fork to the project.
96
+
97
+ ### Merging Changes
98
+
99
+ * It is considered "poor form" to merge your own request.
100
+ * Please take the time to review the changes and get a sense of what is being changed. Things to consider:
101
+ * Does the commit message explain what is going on?
102
+ * Do the code changes have tests? _Not all changes need new tests; some changes are refactorings_
103
+ * Does the commit contain more than it should? Are two separate concerns being addressed in one commit?
104
+ * Did the Travis tests complete successfully?
105
+ * If you are uncertain, bring other contributors into the conversation by creating a comment that includes their @username.
106
+ * If you like the pull request, but want others to chime in, create a +1 comment and tag a user.
107
+
108
+ # Additional Resources
109
+
110
+ * [General GitHub documentation](http://help.github.com/)
111
+ * [GitHub pull request documentation](http://help.github.com/send-pull-requests/)
112
+ * [Pro Git](http://git-scm.com/book) is both a free and excellent book about Git.
113
+ * [A Git Config for Contributing](http://ndlib.github.io/practices/my-typical-per-project-git-config/)
@@ -1,3 +1,12 @@
1
+ h2. 3.2.0
2
+ #25 Allow any field_value except nil to be inserted into a solr field
3
+ #24 Remove dependency on solrizer-fedora, use AF to update index by pid
4
+ #23 Enhance Suffix#config so it can be usefully overridden by downstream
5
+
6
+ h2. 3.1.1
7
+ #22 Support for boolean values
8
+ #21 Testing on Rails version 4
9
+
1
10
  h2. 3.1.0
2
11
  #16 Inserting non-multivalued fields should not create a solr error
3
12
  #20 Time fields should be formatted correctly when using active_support/core_ext/date_time/conversions
@@ -0,0 +1,252 @@
1
+ # solrizer
2
+
3
+ [![Build Status](https://travis-ci.org/projecthydra/solrizer.png?branch=master)](https://travis-ci.org/projecthydra/solrizer)
4
+ [![Gem Version](https://badge.fury.io/rb/solrizer.png)](http://badge.fury.io/rb/solrizer)
5
+
6
+ A lightweight, configurable tool for indexing metadata into solr. Can be triggered from within your application, from
7
+ the command line, or as a JMS listener.
8
+
9
+ Solrizer provides the baseline and structures for the process of solrizing. In order to actually read objects from a
10
+ data source and write solr documents into a solr instance, you need to use an implementation specific gem, such as
11
+ "solrizer-fedora":https://github.com/projecthydra/solrizer-fedora, which provides the mechanics for reading from a
12
+ fedora repository and writing to a solr instance.
13
+
14
+
15
+ ## Installation
16
+
17
+ The gem is hosted on [rubygems.org](http://rubygems.org/gems/solrizer). The best way to manage the gems for your project
18
+ is to use bundler. Create a Gemfile in the root of your application and include the following:
19
+
20
+
21
+ source "http://rubygems.org"
22
+ gem 'solrizer'
23
+
24
+ Then:
25
+
26
+ bundle install
27
+
28
+ ## Usage
29
+
30
+ ### Fire up the console:
31
+
32
+ The code snippets in the following sections can be cut/pasted into your console, giving you the opportunity to play with Solrizer.
33
+
34
+ Start up a console and load solrizer:
35
+
36
+ > irb
37
+ > require "rubygems"
38
+ > require "solrizer"
39
+
40
+ ### Field Mapper
41
+
42
+ The `FieldMapper` maps term names and values to Solr fields, based on the term's data type and any index_as options.
43
+ Solrizer comes with default mappings to dynamic field types defined in the Hydra Solr
44
+ [schema.xml](https://github.com/projecthydra/hydra-head/blob/master/hydra-core/lib/generators/hydra/templates/solr_conf/conf/schema.xml).
45
+
46
+ More information on the conventions followed for the dynamic solr fields is on the
47
+ [wiki page](https://github.com/projecthydra/hydra-head/wiki/Solr-Schema).
48
+
49
+ To examine all of Solrizer's field names, open up a ruby console:
50
+
51
+
52
+ > require 'solrizer'
53
+ => true
54
+ > default_mapper = Solrizer::FieldMapper.new
55
+ => #<Solrizer::FieldMapper:0x007fb47a273770 @id_field="id">
56
+ > default_mapper.solr_name("foo",:searchable, type: :string)
57
+ => "foo_teim"
58
+ > default_mapper.solr_name("foo",:searchable, type: :date)
59
+ => "foo_dtim"
60
+ > default_mapper.solr_name("foo",:searchable, type: :integer)
61
+ => "foo_iim"
62
+ > default_mapper.solr_name("foo",:facetable, type: :string)
63
+ => "foo_sim"
64
+ > default_mapper.solr_name("foo",:facetable, type: :integer)
65
+ => "foo_sim"
66
+ > default_mapper.solr_name("foo",:sortable, type: :string)
67
+ => "foo_si"
68
+ > default_mapper.solr_name("foo",:displayable, type: :string)
69
+ => "foo_ssm"
70
+
71
+ ### Default indexing strategies
72
+
73
+ > solr_doc = Hash.new
74
+ > Solrizer.insert_field(solr_doc, 'title', 'whatever', :stored_searchable)
75
+ => {"title_tesim"=>["whatever"]}
76
+
77
+ > Solrizer.insert_field(solr_doc, 'pub_date', 'Nov 2012', :sortable, :displayable)
78
+ => {"pub_date_si"=>"Nov 2012", "pub_date_ssm"=>["Nov 2012"]}
79
+
80
+ ### Indexing dates
81
+
82
+ as a date:
83
+
84
+ > solr_doc = {}
85
+ > Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :searchable)
86
+ => {"pub_date_dtim"=>["2012-11-07T00:00:00Z"]}
87
+
88
+ or as a string:
89
+
90
+ > solr_doc = {}
91
+ > Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :sortable, :displayable)
92
+ => {"pub_date_dti"=>"2012-11-07T00:00:00Z", "pub_date_ssm"=>["2012-11-07"]}
93
+
94
+ or a string that is stored as a date:
95
+
96
+ > solr_doc = {}
97
+ > Solrizer.insert_field(solr_doc, 'pub_date', 'Jan 29th 2013', :dateable)
98
+ => {"pub_date_dtsim"=>["2013-01-29T00:00:00Z"]}
99
+
100
+ ### Custom indexing strategies
101
+
102
+ #### Create your own index descriptor
103
+
104
+ > solr_doc = {}
105
+ > displearchable = Solrizer::Descriptor.new(:integer, :indexed, :stored)
106
+ > Solrizer.insert_field(solr_doc, 'some_count', 45, displearchable)
107
+ => {"some_count_isi"=>"45"}
108
+
109
+ #### Override the defaults
110
+
111
+ We can override the default indexing methods within `Solrizer::DefaultDescriptors`
112
+
113
+ Here's the default behavior:
114
+
115
+ > solr_doc = {}
116
+ > Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
117
+ => {"title_sim"=>["foobar"]}
118
+
119
+ But let's override that by redefining `:facetable`
120
+
121
+ module Solrizer
122
+ module DefaultDescriptors
123
+ def self.facetable
124
+ Descriptor.new(:string, :indexed, :stored)
125
+ end
126
+ end
127
+ end
128
+
129
+ Now, `:facetable` will return something different:
130
+
131
+ > solr_doc = {}
132
+ > Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
133
+ => {"title_ssi"=>"foobar"}
134
+
135
+ #### Creating your own indexers
136
+
137
+ module MyMappers
138
+ def self.mapper_one
139
+ Solrizer::Descriptor.new(:string, :indexed, :stored)
140
+ end
141
+ end
142
+
143
+ Now, set Solrizer's field mapper to use our new module:
144
+
145
+ > solr_doc = {}
146
+ > Solrizer::FieldMapper.descriptors = [MyMappers]
147
+ => [MyMappers]
148
+ > Solrizer.insert_field(solr_doc, 'title', 'foobar', :mapper_one)
149
+ => {"title_ssi"=>"foobar"}
150
+
151
+ ### Using OM
152
+
153
+ t.main_title(:index_as=>[:facetable],:path=>"title", :label=>"title") { ... }
154
+
155
+ But now you may also pass a Descriptor instance if that works for you:
156
+
157
+ indexer = Solrizer::Descriptor.new(:integer, :indexed, :stored)
158
+ t.main_title(:index_as=>[indexer],:path=>"title", :label=>"title") { ... }
159
+
160
+ ### Extractor and Extractor Mixins
161
+
162
+ Solrizer::Extractor provides utilities for extracting solr fields from objects or inserting solr fields into documents:
163
+
164
+ > extractor = Solrizer::Extractor.new
165
+ > solr_doc = Hash.new
166
+ > extractor.format_node_value(["foo ","\n bar"])
167
+ => "foo bar"
168
+ > extractor.insert_solr_field_value(solr_doc, "foo","bar")
169
+ => {"foo"=>"bar"}
170
+ > extractor.insert_solr_field_value(solr_doc,"foo","baz")
171
+ => {"foo"=>["bar", "baz"]}
172
+ > extractor.insert_solr_field_value(solr_doc, "boo","hoo")
173
+ => {"foo"=>["bar", "baz"], "boo"=>"hoo"}
174
+
175
+ #### Solrizer provides some default mixins:
176
+
177
+ `Solrizer::HTML::Extractor` provides html_to_solr method and `Solrizer::XML::Extractor` provides xml_to_solr method:
178
+
179
+ > Solrizer::XML::Extractor
180
+ > extractor = Solrizer::Extractor.new
181
+ > xml = "<fields><foo>bar</foo><bar>baz</bar></fields>"
182
+ > extractor.xml_to_solr(xml)
183
+ => {:foo_tesim=>"bar", :bar_tesim=>"baz"}
184
+
185
+ #### Solrizer::XML::TerminologyBasedSolrizer
186
+
187
+ Another powerful mixin for use with classes that include the `OM::XML::Document` module is
188
+ `Solrizer::XML::TerminologyBasedSolrizer`. The methods provided by this module offer a robust way of mapping
189
+ terms and solr fields via OM terminologies. A notable example can be found in `ActiveFedora::NokogiriDatastream`.
190
+
191
+ ## JMS Listener for Hydra Rails Applications
192
+
193
+ ### The executables: solrizer and solrizerd
194
+
195
+ The solrizer gem provides two executables:
196
+
197
+ * solrizer is a stomp consumer which listens for fedora.apim.updates and solrizes (or de-solrizes) objects accordingly.
198
+ * solrizerd is a wrapper script that spawns a daemonized version of solrizer and handles start|stop|restart|status requests.
199
+
200
+ ### Usage
201
+
202
+ The usage for solrizerd is as follows:
203
+
204
+ solrizerd command --hydra_home PATH [options]
205
+
206
+ The commands are as follows:
207
+ * start start an instance of the application
208
+ * stop stop all instances of the application
209
+ * restart stop all instances and restart them afterwards
210
+ * status show status (PID) of application instances
211
+
212
+ Required parameters:
213
+
214
+ --hydra_home: this is the path to your hydra rails application's root directory. Solrizerd needs this in order to load all your models and corresponding terminologies.
215
+
216
+ The options:
217
+ * -p, --port Stomp port 61613
218
+ * -o, --host Host to connect to localhost
219
+ * -u, --user User name for stomp listener
220
+ * -w, --password Password for stomp listener
221
+ * -d, --destination Topic to listen to (default: /topic/fedora.apim.update)
222
+ * -h, --help Display this screen
223
+
224
+ Note:
225
+
226
+ Since the solrizer script must fire up your hydra rails application, it must have all the gems installed that your hydra instance needs.
227
+
228
+ ## Note on Patches/Pull Requests
229
+
230
+ * Fork the project.
231
+ * Make your feature addition or bug fix.
232
+ * Add tests for it. This is important so I don't break it in a
233
+ future version unintentionally.
234
+ * Commit, do not mess with rake file, version, or history.
235
+ (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
236
+ * Send me a pull request. Bonus points for topic branches.
237
+
238
+ ## Acknowledgments
239
+
240
+ ### Technical Lead
241
+
242
+ Matt Zumwalt ("MediaShelf":http://yourmediashelf.com)
243
+
244
+ ### Thanks to
245
+
246
+ * Douglas Kim, who created the initial code base for Solrizer.
247
+ * Chris Fitzpatrick, who patiently ran the first prototype through its paces for weeks.
248
+ * Bess Sadler, who created the JMS integration for Solrizer, generously served as a sounding board for numerous design issues around solr indexing, and pushes the technology forward with the skill of a true engineer.
249
+
250
+ ## Copyright
251
+
252
+ Copyright (c) 2010 Matt Zumwalt. See LICENSE for details.
@@ -90,8 +90,7 @@ begin
90
90
  puts @msg.headers.inspect
91
91
  puts "\nPID: #{@msg.headers["pid"]}\n"
92
92
  if ["addDatastream", "addRelationship","ingest","modifyDatastreamByValue","modifyDatastreamByReference","modifyObject","purgeDatastream","purgeRelationship"].include? method
93
- solrizer = Solrizer::Fedora::Solrizer.new
94
- solrizer.solrize @msg.headers["pid"]
93
+ ActiveFedora::Base.find(@msg.headers["pid"], cast: true).update_index
95
94
  elsif method == "purgeObject"
96
95
  ActiveFedora::SolrService.instance.conn.delete_by_id(pid)
97
96
  else
@@ -31,7 +31,7 @@ module Solrizer
31
31
 
32
32
  # @params [Hash] doc the hash to insert the value into
33
33
  # @params [String] name the name of the field (without the suffix)
34
- # @params [String,Date] value the value to be inserted
34
+ # @params [String,Date,Array] value the value (or array of values) to be inserted
35
35
  # @params [Array,Hash] indexer_args the arguments that find the indexer
36
36
  # @returns [Hash] doc the document that was provided with the new field inserted
37
37
  def self.insert_field(doc, name, value, *indexer_args)
@@ -181,7 +181,7 @@ module Solrizer
181
181
  # mapped names and values. The values in the hash are _arrays_, and may contain multiple values.
182
182
 
183
183
  def solr_names_and_values(field_name, field_value, index_types)
184
- return {} unless field_value
184
+ return {} if field_value.nil?
185
185
 
186
186
  # Determine the set of index types
187
187
  index_types = Array(index_types)
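
Note on the guard change in `solr_names_and_values` above: `unless field_value` treats both `nil` and `false` as "nothing to insert", whereas `field_value.nil?` only skips `nil`. That appears to be what lets `false` Booleans reach the index (changelog entry #25). The following is a minimal sketch that isolates the two conditions; `skipped_before?` and `skipped_after?` are hypothetical helper names, not part of Solrizer.

```ruby
# Hypothetical stand-ins for the guard clause before and after the change.
# Neither method exists in Solrizer; they only isolate the condition.

# Old guard: `return {} unless field_value`
def skipped_before?(field_value)
  !field_value
end

# New guard: `return {} if field_value.nil?`
def skipped_after?(field_value)
  field_value.nil?
end

# Compare how each guard treats a few representative values.
[nil, false, true, "", 0].each do |value|
  puts format("%-6s before: %-5s after: %s",
              value.inspect, skipped_before?(value), skipped_after?(value))
end
```

Only `false` changes behavior between the two guards: it was skipped before and is inserted now. Empty strings and `0` were already truthy in Ruby, so they passed the old guard as well.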
@@ -1,20 +1,26 @@
1
+ require 'ostruct'
2
+
1
3
  module Solrizer
2
4
  class Suffix
3
5
 
4
- def initialize(fields)
5
- @fields = fields
6
+ def initialize(*fields)
7
+ @fields = fields.flatten
6
8
  end
7
9
 
8
10
  def multivalued?
9
- @fields.include? :multivalued
11
+ has_field? :multivalued
10
12
  end
11
13
 
12
14
  def stored?
13
- @fields.include? :stored
15
+ has_field? :stored
14
16
  end
15
17
 
16
18
  def indexed?
17
- @fields.include? :indexed
19
+ has_field? :indexed
20
+ end
21
+
22
+ def has_field? f
23
+ f.to_sym == :type or @fields.include? f.to_sym
18
24
  end
19
25
 
20
26
  def data_type
@@ -22,40 +28,52 @@ module Solrizer
22
28
  end
23
29
 
24
30
  def to_s
25
- stored_suffix = config[:stored_suffix] if stored?
26
- index_suffix = config[:index_suffix] if indexed?
27
- multivalued_suffix = config[:multivalued_suffix] if multivalued?
31
+
28
32
  raise Solrizer::InvalidIndexDescriptor, "Missing datatype for #{@fields}" unless data_type
29
- type_suffix = config[:type_suffix].call(data_type)
30
- raise Solrizer::InvalidIndexDescriptor, "Invalid datatype `#{data_type.inspect}'. Must be one of: :date, :time, :text, :text_en, :string, :integer" unless type_suffix
31
33
 
32
- [config[:suffix_delimiter], type_suffix, stored_suffix, index_suffix, multivalued_suffix].join
34
+ field_suffix = [config.suffix_delimiter]
35
+
36
+ config.fields.select { |f| has_field? f }.each do |f|
37
+ key = :"#{f}_suffix"
38
+ field_suffix << if config.send(key).is_a? Proc
39
+ config.send(key).call(@fields)
40
+ else
41
+ config.send(key)
42
+ end
43
+ end
44
+
45
+ field_suffix.join
33
46
  end
34
47
 
48
+ def self.config
49
+ @config ||= OpenStruct.new :fields => [:type, :stored, :indexed, :multivalued],
50
+ suffix_delimiter: '_',
51
+ type_suffix: (lambda do |fields|
52
+ type = fields.first
53
+ case type
54
+ when :string, :symbol # TODO `:symbol' usage ought to be deprecated
55
+ 's'
56
+ when :text
57
+ 't'
58
+ when :text_en
59
+ 'te'
60
+ when :date, :time
61
+ 'dt'
62
+ when :integer
63
+ 'i'
64
+ when :boolean
65
+ 'b'
66
+ else
67
+ raise Solrizer::InvalidIndexDescriptor, "Invalid datatype `#{type.inspect}'. Must be one of: :date, :time, :text, :text_en, :string, :symbol, :integer, :boolean"
68
+ end
69
+ end),
70
+ stored_suffix: 's',
71
+ indexed_suffix: 'i',
72
+ multivalued_suffix: 'm'
73
+ end
35
74
 
36
- private
37
75
  def config
38
- @config ||=
39
- {suffix_delimiter: '_',
40
- type_suffix: lambda do |type|
41
- case type
42
- when :string, :symbol # TODO `:symbol' usage ought to be deprecated
43
- 's'
44
- when :text
45
- 't'
46
- when :text_en
47
- 'te'
48
- when :date, :time
49
- 'dt'
50
- when :integer
51
- 'i'
52
- when :boolean
53
- 'b'
54
- end
55
- end,
56
- stored_suffix: 's',
57
- index_suffix: 'i',
58
- multivalued_suffix: 'm'}
76
+ @config ||= self.class.config.dup
59
77
  end
60
78
  end
61
79
  end
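
Note on the `Suffix` refactoring above: the config moves from a private per-instance Hash to a class-level `OpenStruct` that each instance duplicates, and `to_s` now walks `config.fields` and looks up a `<field>_suffix` entry for each one. The following is a simplified, standalone sketch of that pattern, assuming only the Ruby standard library. `SketchSuffix` is a hypothetical class written for illustration; it supports just two data types and omits the error handling of the real code.

```ruby
require 'ostruct'

# A standalone sketch of the pattern shown in the diff, NOT Solrizer's class.
class SketchSuffix
  # Class-level defaults, shared by every instance until an instance copies them.
  def self.config
    @config ||= OpenStruct.new(
      fields: [:type, :stored, :indexed, :multivalued],
      suffix_delimiter: '_',
      # A Proc entry receives the full field list; the first element is the type.
      type_suffix: ->(fields) { { string: 's', integer: 'i' }.fetch(fields.first) },
      stored_suffix: 's',
      indexed_suffix: 'i',
      multivalued_suffix: 'm'
    )
  end

  def initialize(*fields)
    @fields = fields.flatten
  end

  # Each instance works on its own (shallow) copy of the class-level config.
  def config
    @config ||= self.class.config.dup
  end

  # :type is always considered present; other fields must have been passed in.
  def has_field?(f)
    f.to_sym == :type || @fields.include?(f.to_sym)
  end

  def to_s
    parts = config.fields.select { |f| has_field?(f) }.map do |f|
      value = config.send(:"#{f}_suffix")
      value.is_a?(Proc) ? value.call(@fields) : value
    end
    ([config.suffix_delimiter] + parts).join
  end
end

default = SketchSuffix.new(:string, :stored, :indexed)
puts default.to_s # prints "_ssi"

# Overriding the delimiter on one instance leaves the defaults untouched.
custom = SketchSuffix.new(:string, :stored, :indexed)
custom.config.suffix_delimiter = '-'
puts custom.to_s  # prints "-ssi"
puts default.to_s # prints "_ssi"
```

One caveat of this design: `dup` on an `OpenStruct` is shallow, so reassigning an attribute (as the spec does with `config.fields += [:b]`) stays local to the instance, while mutating the shared array in place (for example with `<<`) would also change the class-level defaults.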
@@ -1,3 +1,3 @@
1
1
  module Solrizer
2
- VERSION = "3.1.1"
2
+ VERSION = "3.2.0"
3
3
  end
@@ -28,7 +28,7 @@ Gem::Specification.new do |s|
28
28
  s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
29
29
  s.extra_rdoc_files = [
30
30
  "LICENSE",
31
- "README.textile"
31
+ "README.md"
32
32
  ]
33
33
  s.require_paths = ["lib"]
34
34
  end
@@ -25,15 +25,24 @@ describe Solrizer do
25
25
  Solrizer.insert_field(doc, 'foo', Time.parse('2013-01-13T22:45:56+06:00'))
26
26
  doc.should == {'foo_dtsim' => ["2013-01-13T16:45:56Z"]}
27
27
  end
28
- it "should insert Booleans" do
28
+ it "should insert true Booleans" do
29
29
  Solrizer.insert_field(doc, 'foo', true)
30
30
  doc.should == {'foo_bsi' => true}
31
31
  end
32
+ it "should insert false Booleans" do
33
+ Solrizer.insert_field(doc, 'foo', false)
34
+ doc.should == {'foo_bsi' => false}
35
+ end
32
36
 
33
37
  it "should insert multiple values" do
34
38
  Solrizer.insert_field(doc, 'foo', ['A name', 'B name'], :sortable, :facetable)
35
39
  doc.should == {'foo_si' => 'B name', 'foo_sim' => ['A name', 'B name']}
36
40
  end
41
+
42
+ it 'should insert nothing when passed a nil value' do
43
+ Solrizer.insert_field(doc, 'foo', nil, :sortable, :facetable)
44
+ doc.should == {}
45
+ end
37
46
  end
38
47
 
39
48
  describe "on a document with values" do
@@ -0,0 +1,80 @@
1
+ require 'spec_helper'
2
+
3
+ describe Solrizer::Suffix do
4
+
5
+ describe "#multivalued?" do
6
+ it "should be multivalued if :multivalued is among the field types" do
7
+ expect(Solrizer::Suffix.new(:multivalued)).to be_multivalued
8
+ end
9
+
10
+ it "should not be multivalued if :multivalued was not passed in a field type" do
11
+ expect(Solrizer::Suffix.new(:some_other_field_type)).to_not be_multivalued
12
+ end
13
+ end
14
+
15
+ describe "#stored?" do
16
+ it "should be stored if :stored is among the field types" do
17
+ expect(Solrizer::Suffix.new(:stored)).to be_stored
18
+ end
19
+
20
+ it "should not be stored if :stored was not passed in a field type" do
21
+ expect(Solrizer::Suffix.new(:some_other_field_type)).to_not be_stored
22
+ end
23
+ end
24
+
25
+ describe "#indexed?" do
26
+ it "should be indexed if :indexed is among the field types" do
27
+ expect(Solrizer::Suffix.new(:indexed)).to be_indexed
28
+ end
29
+
30
+ it "should not be indexed if :indexed was not passed in a field type" do
31
+ expect(Solrizer::Suffix.new(:some_other_field_type)).to_not be_indexed
32
+ end
33
+ end
34
+ describe "#has_field?" do
35
+ subject do
36
+ Solrizer::Suffix.new(:type, :a, :b, :c)
37
+ end
38
+ it "should be able to tell if a field is in the suffix or not" do
39
+ expect(subject).to have_field :a
40
+ expect(subject).to have_field :b
41
+ expect(subject).to have_field :c
42
+ expect(subject).to_not have_field :d
43
+ end
44
+ end
45
+
46
+ describe "#data_type" do
47
+ it "should always be the first argument to the suffix" do
48
+ expect(Solrizer::Suffix.new(:some_type, :a).data_type).to eq :some_type
49
+ end
50
+ end
51
+
52
+ describe "#to_s" do
53
+ it "should combine the fields into a suffix string" do
54
+ expect(Solrizer::Suffix.new(:string, :stored, :indexed).to_s).to eq '_ssi'
55
+ expect(Solrizer::Suffix.new(:integer, :stored, :multivalued).to_s).to eq '_ism'
56
+ end
57
+ end
58
+
59
+ describe "config" do
60
+ subject do
61
+ Solrizer::Suffix.new(:my_custom_type, :a, :b, :c)
62
+ end
63
+
64
+ it "should let you mess with the suffix config" do
65
+ subject.config.fields += [:b]
66
+ subject.config.suffix_delimiter = "#"
67
+ subject.config.type_suffix = lambda do |fields|
68
+ type = fields.first
69
+
70
+ if type == :my_custom_type
71
+ "custom_suffix_"
72
+ else
73
+ "nope"
74
+ end
75
+ end
76
+ subject.config.b_suffix = 'now_with_more_b'
77
+ expect(subject.to_s).to eq "#custom_suffix_now_with_more_b"
78
+ end
79
+ end
80
+ end
metadata CHANGED
@@ -1,153 +1,153 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: solrizer
3
3
  version: !ruby/object:Gem::Version
4
- version: 3.1.1
4
+ version: 3.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Matt Zumwalt
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2013-06-25 00:00:00.000000000 Z
11
+ date: 2014-05-28 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: nokogiri
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
- - - '>='
17
+ - - ">="
18
18
  - !ruby/object:Gem::Version
19
19
  version: '0'
20
20
  type: :runtime
21
21
  prerelease: false
22
22
  version_requirements: !ruby/object:Gem::Requirement
23
23
  requirements:
24
- - - '>='
24
+ - - ">="
25
25
  - !ruby/object:Gem::Version
26
26
  version: '0'
27
27
  - !ruby/object:Gem::Dependency
28
28
  name: xml-simple
29
29
  requirement: !ruby/object:Gem::Requirement
30
30
  requirements:
31
- - - '>='
31
+ - - ">="
32
32
  - !ruby/object:Gem::Version
33
33
  version: '0'
34
34
  type: :runtime
35
35
  prerelease: false
36
36
  version_requirements: !ruby/object:Gem::Requirement
37
37
  requirements:
38
- - - '>='
38
+ - - ">="
39
39
  - !ruby/object:Gem::Version
40
40
  version: '0'
41
41
  - !ruby/object:Gem::Dependency
42
42
  name: mediashelf-loggable
43
43
  requirement: !ruby/object:Gem::Requirement
44
44
  requirements:
45
- - - ~>
45
+ - - "~>"
46
46
  - !ruby/object:Gem::Version
47
47
  version: 0.4.7
48
48
  type: :runtime
49
49
  prerelease: false
50
50
  version_requirements: !ruby/object:Gem::Requirement
51
51
  requirements:
52
- - - ~>
52
+ - - "~>"
53
53
  - !ruby/object:Gem::Version
54
54
  version: 0.4.7
55
55
  - !ruby/object:Gem::Dependency
56
56
  name: stomp
57
57
  requirement: !ruby/object:Gem::Requirement
58
58
  requirements:
59
- - - '>='
59
+ - - ">="
60
60
  - !ruby/object:Gem::Version
61
61
  version: '0'
62
62
  type: :runtime
63
63
  prerelease: false
64
64
  version_requirements: !ruby/object:Gem::Requirement
65
65
  requirements:
66
- - - '>='
66
+ - - ">="
67
67
  - !ruby/object:Gem::Version
68
68
  version: '0'
69
69
  - !ruby/object:Gem::Dependency
70
70
  name: daemons
71
71
  requirement: !ruby/object:Gem::Requirement
72
72
  requirements:
73
- - - '>='
73
+ - - ">="
74
74
  - !ruby/object:Gem::Version
75
75
  version: '0'
76
76
  type: :runtime
77
77
  prerelease: false
78
78
  version_requirements: !ruby/object:Gem::Requirement
79
79
  requirements:
80
- - - '>='
80
+ - - ">="
81
81
  - !ruby/object:Gem::Version
82
82
  version: '0'
83
83
  - !ruby/object:Gem::Dependency
84
84
  name: activesupport
85
85
  requirement: !ruby/object:Gem::Requirement
86
86
  requirements:
87
- - - '>='
87
+ - - ">="
88
88
  - !ruby/object:Gem::Version
89
89
  version: '0'
90
90
  type: :runtime
91
91
  prerelease: false
92
92
  version_requirements: !ruby/object:Gem::Requirement
93
93
  requirements:
94
- - - '>='
94
+ - - ">="
95
95
  - !ruby/object:Gem::Version
96
96
  version: '0'
97
97
  - !ruby/object:Gem::Dependency
98
98
  name: rspec
99
99
  requirement: !ruby/object:Gem::Requirement
100
100
  requirements:
101
- - - '>='
101
+ - - ">="
102
102
  - !ruby/object:Gem::Version
103
103
  version: '0'
104
104
  type: :development
105
105
  prerelease: false
106
106
  version_requirements: !ruby/object:Gem::Requirement
107
107
  requirements:
108
- - - '>='
108
+ - - ">="
109
109
  - !ruby/object:Gem::Version
110
110
  version: '0'
111
111
  - !ruby/object:Gem::Dependency
112
112
  name: rake
113
113
  requirement: !ruby/object:Gem::Requirement
114
114
  requirements:
115
- - - '>='
115
+ - - ">="
116
116
  - !ruby/object:Gem::Version
117
117
  version: '0'
118
118
  type: :development
119
119
  prerelease: false
120
120
  version_requirements: !ruby/object:Gem::Requirement
121
121
  requirements:
122
- - - '>='
122
+ - - ">="
123
123
  - !ruby/object:Gem::Version
124
124
  version: '0'
125
125
  - !ruby/object:Gem::Dependency
126
126
  name: yard
127
127
  requirement: !ruby/object:Gem::Requirement
128
128
  requirements:
129
- - - '>='
129
+ - - ">="
130
130
  - !ruby/object:Gem::Version
131
131
  version: '0'
132
132
  type: :development
133
133
  prerelease: false
134
134
  version_requirements: !ruby/object:Gem::Requirement
135
135
  requirements:
136
- - - '>='
136
+ - - ">="
137
137
  - !ruby/object:Gem::Version
138
138
  version: '0'
139
139
  - !ruby/object:Gem::Dependency
140
140
  name: RedCloth
141
141
  requirement: !ruby/object:Gem::Requirement
142
142
  requirements:
143
- - - '>='
143
+ - - ">="
144
144
  - !ruby/object:Gem::Version
145
145
  version: '0'
146
146
  type: :development
147
147
  prerelease: false
148
148
  version_requirements: !ruby/object:Gem::Requirement
149
149
  requirements:
150
- - - '>='
150
+ - - ">="
151
151
  - !ruby/object:Gem::Version
152
152
  version: '0'
153
153
  description: Use solrizer to populate solr indexes. You can run solrizer from within
@@ -159,14 +159,15 @@ executables:
159
159
  extensions: []
160
160
  extra_rdoc_files:
161
161
  - LICENSE
162
- - README.textile
162
+ - README.md
163
163
  files:
164
- - .gitignore
165
- - .travis.yml
164
+ - ".gitignore"
165
+ - ".travis.yml"
166
+ - CONTRIBUTING.md
166
167
  - Gemfile
167
168
  - History.txt
168
169
  - LICENSE
169
- - README.textile
170
+ - README.md
170
171
  - Rakefile
171
172
  - bin/solrizer
172
173
  - bin/solrizerd
@@ -193,6 +194,7 @@ files:
193
194
  - spec/units/extractor_spec.rb
194
195
  - spec/units/field_mapper_spec.rb
195
196
  - spec/units/solrizer_spec.rb
197
+ - spec/units/suffix_spec.rb
196
198
  - spec/units/xml_extractor_spec.rb
197
199
  homepage: http://github.com/projecthydra/solrizer
198
200
  licenses: []
@@ -203,17 +205,17 @@ require_paths:
203
205
  - lib
204
206
  required_ruby_version: !ruby/object:Gem::Requirement
205
207
  requirements:
206
- - - '>='
208
+ - - ">="
207
209
  - !ruby/object:Gem::Version
208
210
  version: '0'
209
211
  required_rubygems_version: !ruby/object:Gem::Requirement
210
212
  requirements:
211
- - - '>='
213
+ - - ">="
212
214
  - !ruby/object:Gem::Version
213
215
  version: '0'
214
216
  requirements: []
215
217
  rubyforge_project:
216
- rubygems_version: 2.0.3
218
+ rubygems_version: 2.2.2
217
219
  signing_key:
218
220
  specification_version: 4
219
221
  summary: A utility for building solr indexes, usually from Fedora repository content
@@ -225,5 +227,6 @@ test_files:
225
227
  - spec/units/extractor_spec.rb
226
228
  - spec/units/field_mapper_spec.rb
227
229
  - spec/units/solrizer_spec.rb
230
+ - spec/units/suffix_spec.rb
228
231
  - spec/units/xml_extractor_spec.rb
229
232
  has_rdoc:
@@ -1,249 +0,0 @@
1
- h1. solrizer
2
-
3
- A lightweight, configurable tool for indexing metadata into solr. Can be triggered from within your application, from the command line, or as a JMS listener.
4
-
5
- Solrizer provides the baseline and structures for the process of solrizing. In order to actually read objects from a
6
- datasource and write solr documents into a solr instance, you need to use an implementation-specific gem, such as
7
- "solrizer-fedora":https://github.com/projecthydra/solrizer-fedora, which provides the mechanics for reading from a fedora repository and writing to a solr instance.
8
-
9
-
10
- h2. Installation
11
-
12
- The gem is hosted on rubygems.org. The best way to manage the gems for your project is to use bundler. Create a Gemfile in the root of your application and include the following:
13
-
14
- <pre>
15
- source "http://rubygems.org"
16
-
17
- gem 'solrizer'
18
- </pre>
19
-
20
- Then:
21
-
22
- <pre>bundle install</pre>
23
-
24
- h2. Usage
25
-
26
- h3. Fire up the console:
27
-
28
- The code snippets in the following sections can be cut/pasted into your console, giving you the opportunity to play with Solrizer.
29
-
30
- Start up a console and load solrizer:
31
-
32
- <pre>
33
- irb
34
- require "rubygems"
35
- require "solrizer"
36
- </pre>
37
-
38
-
39
- h3. Field Mapper
40
-
41
- The FieldMapper maps term names and values to Solr fields, based on the term’s data type and any index_as options. Solrizer comes with default mappings to dynamic field types defined in the Hydra Solr schema.xml file. A copy of that file is available at:
42
- https://github.com/projecthydra/hydra-head/blob/master/hydra-core/lib/generators/hydra/templates/solr_conf/conf/schema.xml
43
-
44
- More information on the conventions followed for the dynamic solr fields is here:
45
- https://github.com/projecthydra/hydra-head/wiki/Solr-Schema
46
-
47
- <pre>
48
- default_mapper = Solrizer::FieldMapper::Default.new
49
-
50
- # some of the default mappings in solrizer
51
- default_mapper.solr_name("foo",:string,:searchable) # returns foo_tesim
52
- default_mapper.solr_name("foo",:date,:searchable) # returns foo_dtsim
53
- default_mapper.solr_name("foo",:integer,:searchable) # returns foo_isim
54
- default_mapper.solr_name("foo",:string,:facetable) # returns foo_sim
55
- default_mapper.solr_name("foo",:integer,:facetable) # returns foo_iim
56
- default_mapper.solr_name("foo",:string,:sortable) # returns foo_si
57
- default_mapper.solr_name("foo",:string,:displayable) # returns foo_ssm
58
- </pre>
59
-
60
- ## Using default indexing strategies
61
-
62
- <pre>
63
- solr_doc = {}
64
- Solrizer.insert_field(solr_doc, 'title', 'whatever', :searchable)
65
- => {"title_tesim"=>["whatever"]}
66
-
67
- Solrizer.insert_field(solr_doc, 'pub_date', 'Nov 2012', :sortable, :displayable)
68
- => {"title_tesim"=>["whatever"], "pub_date_ssi"=>["Nov 2012"], "pub_date_ssm"=>["Nov 2012"]}
69
- </pre>
70
-
71
- #### You can also index dates
72
- <pre>
73
- # as a date
74
- solr_doc = {}
75
- Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :searchable)
76
- => {"pub_date_dtsi"=>["2012-11-07T00:00:00Z"]}
77
-
78
- # or as a string
79
- solr_doc = {}
80
- Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :sortable, :displayable)
81
- => {"pub_date_ssi"=>["2012-11-07"], "pub_date_ssm"=>["2012-11-07"]}
82
-
83
- # or a string that is stored as a date
84
- solr_doc = {}
85
- Solrizer.insert_field(solr_doc, 'pub_date', 'Jan 29th 2013', :dateable)
86
- => {"pub_date_dtsi"=>["2013-01-29T00:00:00Z"]}
87
- </pre>
88
-
89
-
90
- ## Using a custom indexing strategy
91
- All you have to do is create your own index descriptor:
92
- <pre>
93
- solr_doc = {}
94
- displearchable = Solrizer::Descriptor.new(:integer, :indexed, :stored)
95
- Solrizer.insert_field(solr_doc, 'some_count', 45, displearchable)
96
- => {"some_count_isi"=>["45"]}
97
- </pre>
98
-
99
- ## Changing the behavior of a default descriptor
100
-
101
- Simply override the methods within Solrizer::DefaultDescriptors
102
- <pre>
103
- # before
104
- solr_doc = {}
105
- Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
106
- => {"title_sim"=>["foobar"]}
107
-
108
- # redefine facetable:
109
- module Solrizer
110
- module DefaultDescriptors
111
- def self.facetable
112
- Descriptor.new(:string, :indexed, :stored)
113
- end
114
- end
115
- end
116
-
117
- # after
118
- solr_doc = {}
119
- Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
120
- => {"title_ssi"=>["foobar"]}
121
- </pre>
122
-
123
-
124
- ## Creating your own Indexers
125
- <pre>
126
- module MyMappers
127
- def self.mapper_one
128
- Solrizer::Descriptor.new(:string, :indexed, :stored)
129
- end
130
- end
131
-
132
- solr_doc = {}
133
-
134
- Solrizer::FieldMapper.descriptors = [MyMappers]
135
- => [MyMappers]
136
-
137
- Solrizer.insert_field(solr_doc, 'title', 'foobar', :mapper_one)
138
- => {"title_ssi"=>["foobar"]}
139
- </pre>
140
-
141
- ## Using OM
142
- Same as it ever was:
143
- <pre>
144
- t.main_title(:index_as=>[:facetable],:path=>"title", :label=>"title") { ... }
145
- </pre>
146
-
147
- But now you may also pass a Descriptor instance if that works for you:
148
- <pre>
149
- indexer = Solrizer::Descriptor.new(:integer, :indexed, :stored)
150
- t.main_title(:index_as=>[indexer],:path=>"title", :label=>"title") { ... }
151
-
152
- </pre>
153
-
154
- h3. Extractor and Extractor Mixins
155
-
156
- Solrizer::Extractor provides utilities for extracting solr fields from objects or inserting solr fields into documents:
157
-
158
- <pre>
159
- extractor = Solrizer::Extractor.new
160
-
161
- extractor.format_node_value(["foo ","\n bar"]) # returns "foo bar"
162
-
163
- solr_doc = Hash.new
164
- extractor.insert_solr_field_value(solr_doc, "foo","bar") # solr_doc is now {"foo" => ["bar"]}
165
- extractor.insert_solr_field_value(solr_doc,"foo","baz") # solr_doc is now {"foo" => ["bar","baz"]}
166
- extractor.insert_solr_field_value(solr_doc, "boo","hoo") # solr_doc is now {"foo" => ["bar","baz"], "boo" => ["hoo"]}
167
- </pre>
168
-
169
- h4. Solrizer provides some default mixins:
170
-
171
- * Solrizer::HTML::Extractor -=> provides html_to_solr method
172
- * Solrizer::XML::Extractor -=> provides xml_to_solr method
173
-
174
- <pre>
175
- xml = "<fields><foo>bar</foo><bar>baz</bar></fields>"
176
-
177
- extractor.xml_to_solr(xml) # returns {:foo_tesim=>"bar", :bar_tesim=>"baz"}
178
- </pre>
179
-
180
- h4. Solrizer::XML::TerminologyBasedSolrizer
181
-
182
- Another powerful mixin for use with classes that include the OM::XML::Document module is Solrizer::XML::TerminologyBasedSolrizer.
183
- The methods provided by this module offer a robust way of mapping terms to solr fields via OM terminologies. A notable example
184
- can be found in ActiveFedora::NokogiriDatastream.
185
-
186
-
187
- h2. JMS Listener for Hydra Rails Applications
188
-
189
- h3. The executables: solrizer and solrizerd
190
-
191
- The solrizer gem provides two executables:
192
-
193
- * solrizer is a stomp consumer which listens for fedora.apim.updates and solrizes (or de-solrizes) objects accordingly.
194
- * solrizerd is a wrapper script that spawns a daemonized version of solrizer and handles start|stop|restart|status requests.
195
-
196
- h3. Usage
197
-
198
- The usage for solrizerd is as follows:
199
-
200
- <pre>
201
- solrizerd command --hydra_home PATH [options]
202
- </pre>
203
-
204
- The commands are as follows:
205
- * start start an instance of the application
206
- * stop stop all instances of the application
207
- * restart stop all instances and restart them afterwards
208
- * status show status (PID) of application instances
209
-
210
- Required parameters:
211
-
212
- --hydra_home: this is the path to your hydra rails application's root directory. Solrizerd needs this in order to load all your models and corresponding terminologies.
213
-
214
- The options:
215
- * -p, --port Stomp port (default: 61613)
216
- * -o, --host Host to connect to (default: localhost)
217
- * -u, --user User name for stomp listener
218
- * -w, --password Password for stomp listener
219
- * -d, --destination Topic to listen to (default: /topic/fedora.apim.update)
220
- * -h, --help Display this screen
221
-
222
- Note:
223
-
224
- Since the solrizer script must fire up your hydra rails application, it must have all the gems installed that your hydra instance needs.
225
-
226
-
227
- h2. Note on Patches/Pull Requests
228
-
229
- * Fork the project.
230
- * Make your feature addition or bug fix.
231
- * Add tests for it. This is important so I don't break it in a
232
- future version unintentionally.
233
- * Commit, do not mess with rake file, version, or history.
234
- (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
235
- * Send me a pull request. Bonus points for topic branches.
236
-
237
- h2. Acknowledgements
238
-
239
- Technical Lead: Matt Zumwalt ("MediaShelf":http://yourmediashelf.com)
240
-
241
- Thanks to
242
-
243
- Douglas Kim, who created the initial code base for Solrizer.
244
- Chris Fitzpatrick, who patiently ran the first prototype through its paces for weeks.
245
- Bess Sadler, who created the JMS integration for Solrizer, generously served as a sounding board for numerous design issues around solr indexing, and pushes the technology forward with the skill of a true engineer.
246
-
247
- h2. Copyright
248
-
249
- Copyright (c) 2010 Matt Zumwalt. See LICENSE for details.