solrizer 3.1.1 → 3.2.0
- checksums.yaml +4 -4
- data/CONTRIBUTING.md +113 -0
- data/History.txt +9 -0
- data/README.md +252 -0
- data/bin/solrizer +1 -2
- data/lib/solrizer.rb +1 -1
- data/lib/solrizer/field_mapper.rb +1 -1
- data/lib/solrizer/suffix.rb +51 -33
- data/lib/solrizer/version.rb +1 -1
- data/solrizer.gemspec +1 -1
- data/spec/units/solrizer_spec.rb +10 -1
- data/spec/units/suffix_spec.rb +80 -0
- metadata +32 -29
- data/README.textile +0 -249
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 8d77a71925151847cbb586039e03296e1b0aa2c9
|
4
|
+
data.tar.gz: a2bbb1286b7b037dc5a83d81f84ec098edc60283
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 76ad75ea81fb427b72a0b214637427f44ecd54d5e4b80cc43f49ae4c13598816d3756144b4b1081f7c357169595d97d7c8ce4e1137533e54e99807c619f9d9d6
|
7
|
+
data.tar.gz: 9294cad6b4a5661321d4a0780652af3e31fbebfeabf6b3ebe5d62577d37aaf6cc685255edb50acc5fd8782f468bb05a0d8ba1e64bb40c346c6f10202e8103250
|
data/CONTRIBUTING.md
ADDED
@@ -0,0 +1,113 @@
|
|
1
|
+
# How to Contribute
|
2
|
+
|
3
|
+
We want your help to make Project Hydra great.
|
4
|
+
There are a few guidelines that we need contributors to follow so that we can have a chance of keeping on top of things.
|
5
|
+
|
6
|
+
## Hydra Project Intellectual Property Licensing and Ownership
|
7
|
+
|
8
|
+
All code contributors must have an Individual Contributor License Agreement (iCLA) on file with the Hydra Project Steering Group.
|
9
|
+
If the contributor works for an institution, the institution must have a Corporate Contributor License Agreement (cCLA) on file.
|
10
|
+
|
11
|
+
https://wiki.duraspace.org/display/hydra/Hydra+Project+Intellectual+Property+Licensing+and+Ownership
|
12
|
+
|
13
|
+
You should also add yourself to the `CONTRIBUTORS.md` file in the root of the project.
|
14
|
+
|
15
|
+
## Contribution Tasks
|
16
|
+
|
17
|
+
* Reporting Issues
|
18
|
+
* Making Changes
|
19
|
+
* Submitting Changes
|
20
|
+
* Merging Changes
|
21
|
+
|
22
|
+
### Reporting Issues
|
23
|
+
|
24
|
+
* Make sure you have a [GitHub account](https://github.com/signup/free)
|
25
|
+
* Submit a [GitHub issue](./issues) by:
|
26
|
+
* Clearly describing the issue
|
27
|
+
* Provide a descriptive summary
|
28
|
+
* Explain the expected behavior
|
29
|
+
* Explain the actual behavior
|
30
|
+
* Provide steps to reproduce the actual behavior
|
31
|
+
|
32
|
+
### Making Changes
|
33
|
+
|
34
|
+
* Fork the repository on GitHub
|
35
|
+
* Create a topic branch from where you want to base your work.
|
36
|
+
* This is usually the master branch.
|
37
|
+
* To quickly create a topic branch based on master: `git branch fix/master/my_contribution master`
|
38
|
+
* Then checkout the new branch with `git checkout fix/master/my_contribution`.
|
39
|
+
* Please avoid working directly on the `master` branch.
|
40
|
+
* You may find the [hub suite of commands](https://github.com/defunkt/hub) helpful
|
41
|
+
* Make commits of logical units.
|
42
|
+
* Your commit should include a high level description of your work in HISTORY.textile
|
43
|
+
* Check for unnecessary whitespace with `git diff --check` before committing.
|
44
|
+
* Make sure your commit messages are [well formed](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html).
|
45
|
+
* If you created an issue, you can close it by including "Closes #issue" in your commit message. See [Github's blog post for more details](https://github.com/blog/1386-closing-issues-via-commit-messages)
|
46
|
+
|
47
|
+
```
|
48
|
+
Present tense short summary (50 characters or less)
|
49
|
+
|
50
|
+
More detailed description, if necessary. It should be wrapped to 72
|
51
|
+
characters. Try to be as descriptive as you can, even if you think that
|
52
|
+
the commit content is obvious, it may not be obvious to others. You
|
53
|
+
should add such description also if it's already present in bug tracker,
|
54
|
+
it should not be necessary to visit a webpage to check the history.
|
55
|
+
|
56
|
+
Include Closes #<issue-number> when relevant.
|
57
|
+
|
58
|
+
Description can have multiple paragraphs and you can use code examples
|
59
|
+
inside, just indent it with 4 spaces:
|
60
|
+
|
61
|
+
class PostsController
|
62
|
+
def index
|
63
|
+
respond_with Post.limit(10)
|
64
|
+
end
|
65
|
+
end
|
66
|
+
|
67
|
+
You can also add bullet points:
|
68
|
+
|
69
|
+
- you can use dashes or asterisks
|
70
|
+
|
71
|
+
- also, try to indent next line of a point for readability, if it's too
|
72
|
+
long to fit in 72 characters
|
73
|
+
```
|
74
|
+
|
75
|
+
* Make sure you have added the necessary tests for your changes.
|
76
|
+
* Run _all_ the tests to assure nothing else was accidentally broken.
|
77
|
+
* When you are ready to submit a pull request
|
78
|
+
|
79
|
+
### Submitting Changes
|
80
|
+
|
81
|
+
[Detailed Walkthrough of One Pull Request per Commit](http://ndlib.github.io/practices/one-commit-per-pull-request/)
|
82
|
+
|
83
|
+
* Read the article ["Using Pull Requests"](https://help.github.com/articles/using-pull-requests) on GitHub.
|
84
|
+
* Make sure your branch is up to date with its parent branch (i.e. master)
|
85
|
+
* `git checkout master`
|
86
|
+
* `git pull --rebase`
|
87
|
+
* `git checkout <your-branch>`
|
88
|
+
* `git rebase master`
|
89
|
+
* It is likely a good idea to run your tests again.
|
90
|
+
* Squash the commits for your branch into one commit
|
91
|
+
* `git rebase --interactive HEAD~<number-of-commits>` ([See Github help](https://help.github.com/articles/interactive-rebase))
|
92
|
+
* To determine the number of commits on your branch: `git log master..<your-branch> --oneline | wc -l`
|
93
|
+
* Squashing your branch's changes into one commit is "good form" and helps the person merging your request to see everything that is going on.
|
94
|
+
* Push your changes to a topic branch in your fork of the repository.
|
95
|
+
* Submit a pull request from your fork to the project.
|
96
|
+
|
97
|
+
### Merging Changes
|
98
|
+
|
99
|
+
* It is considered "poor form" to merge your own request.
|
100
|
+
* Please take the time to review the changes and get a sense of what is being changed. Things to consider:
|
101
|
+
* Does the commit message explain what is going on?
|
102
|
+
* Do the code changes have tests? _Not all changes need new tests; some changes are refactorings_
|
103
|
+
* Does the commit contain more than it should? Are two separate concerns being addressed in one commit?
|
104
|
+
* Did the Travis tests complete successfully?
|
105
|
+
* If you are uncertain, bring other contributors into the conversation by creating a comment that includes their @username.
|
106
|
+
* If you like the pull request, but want others to chime in, create a +1 comment and tag a user.
|
107
|
+
|
108
|
+
# Additional Resources
|
109
|
+
|
110
|
+
* [General GitHub documentation](http://help.github.com/)
|
111
|
+
* [GitHub pull request documentation](http://help.github.com/send-pull-requests/)
|
112
|
+
* [Pro Git](http://git-scm.com/book) is both a free and excellent book about Git.
|
113
|
+
* [A Git Config for Contributing](http://ndlib.github.io/practices/my-typical-per-project-git-config/)
|
data/History.txt
CHANGED
@@ -1,3 +1,12 @@
|
|
1
|
+
h2. 3.2.0
|
2
|
+
#25 Allow any field_value except nil to be inserted into a solr field
|
3
|
+
#24 Remove dependency on solrizer-fedora, use AF to update index by pid
|
4
|
+
#23 Enhance Suffix#config so it can be usefully overridden by downstream
|
5
|
+
|
6
|
+
h2. 3.1.1
|
7
|
+
#22 Support for boolean values
|
8
|
+
#21 Testing on Rails version 4
|
9
|
+
|
1
10
|
h2. 3.1.0
|
2
11
|
#16 Inserting non-multivalued fields should not create a solr error
|
3
12
|
#20 Time fields should be formatted correctly when using active_support/core_ext/date_time/conversions
|
data/README.md
ADDED
@@ -0,0 +1,252 @@
|
|
1
|
+
# solrizer
|
2
|
+
|
3
|
+
[](https://travis-ci.org/projecthydra/solrizer)
|
4
|
+
[](http://badge.fury.io/rb/solrizer)
|
5
|
+
|
6
|
+
A lightweight, configurable tool for indexing metadata into solr. Can be triggered from within your application, from
|
7
|
+
the command line, or as a JMS listener.
|
8
|
+
|
9
|
+
Solrizer provides the baseline and structures for the process of solrizing. In order to actually read objects from a
|
10
|
+
data source and write solr documents into a solr instance, you need to use an implementation specific gem, such as
|
11
|
+
[solrizer-fedora](https://github.com/projecthydra/solrizer-fedora), which provides the mechanics for reading from a
|
12
|
+
fedora repository and writing to a solr instance.
|
13
|
+
|
14
|
+
|
15
|
+
## Installation
|
16
|
+
|
17
|
+
The gem is hosted on [rubygems.org](http://rubygems.org/gems/solrizer). The best way to manage the gems for your project
|
18
|
+
is to use bundler. Create a Gemfile in the root of your application and include the following:
|
19
|
+
|
20
|
+
|
21
|
+
source "http://rubygems.org"
|
22
|
+
gem 'solrizer'
|
23
|
+
|
24
|
+
Then:
|
25
|
+
|
26
|
+
bundle install
|
27
|
+
|
28
|
+
## Usage
|
29
|
+
|
30
|
+
### Fire up the console:
|
31
|
+
|
32
|
+
The code snippets in the following sections can be cut/pasted into your console, giving you the opportunity to play with Solrizer.
|
33
|
+
|
34
|
+
Start up a console and load solrizer:
|
35
|
+
|
36
|
+
> irb
|
37
|
+
> require "rubygems"
|
38
|
+
> require "solrizer"
|
39
|
+
|
40
|
+
### Field Mapper
|
41
|
+
|
42
|
+
The `FieldMapper` maps term names and values to Solr fields, based on the term's data type and any index_as options.
|
43
|
+
Solrizer comes with default mappings to dynamic field types defined in the Hydra Solr
|
44
|
+
[schema.xml](https://github.com/projecthydra/hydra-head/blob/master/hydra-core/lib/generators/hydra/templates/solr_conf/conf/schema.xml).
|
45
|
+
|
46
|
+
More information on the conventions followed for the dynamic solr fields is on the
|
47
|
+
[wiki page](https://github.com/projecthydra/hydra-head/wiki/Solr-Schema).
|
48
|
+
|
49
|
+
To examine all of Solrizer's field names, open up a ruby console:
|
50
|
+
|
51
|
+
|
52
|
+
> require 'solrizer'
|
53
|
+
=> true
|
54
|
+
> default_mapper = Solrizer::FieldMapper.new
|
55
|
+
=> #<Solrizer::FieldMapper:0x007fb47a273770 @id_field="id">
|
56
|
+
> default_mapper.solr_name("foo",:searchable, type: :string)
|
57
|
+
=> "foo_teim"
|
58
|
+
> default_mapper.solr_name("foo",:searchable, type: :date)
|
59
|
+
=> "foo_dtim"
|
60
|
+
> default_mapper.solr_name("foo",:searchable, type: :integer)
|
61
|
+
=> "foo_iim"
|
62
|
+
> default_mapper.solr_name("foo",:facetable, type: :string)
|
63
|
+
=> "foo_sim"
|
64
|
+
> default_mapper.solr_name("foo",:facetable, type: :integer)
|
65
|
+
=> "foo_sim"
|
66
|
+
> default_mapper.solr_name("foo",:sortable, type: :string)
|
67
|
+
=> "foo_si"
|
68
|
+
> default_mapper.solr_name("foo",:displayable, type: :string)
|
69
|
+
=> "foo_ssm"
|
70
|
+
|
71
|
+
### Default indexing strategies
|
72
|
+
|
73
|
+
> solr_doc = Hash.new
|
74
|
+
> Solrizer.insert_field(solr_doc, 'title', 'whatever', :stored_searchable)
|
75
|
+
=> {"title_tesim"=>["whatever"]}
|
76
|
+
|
77
|
+
> Solrizer.insert_field(solr_doc, 'pub_date', 'Nov 2012', :sortable, :displayable)
|
78
|
+
=> {"pub_date_si"=>"Nov 2012", "pub_date_ssm"=>["Nov 2012"]}
|
79
|
+
|
80
|
+
### Indexing dates
|
81
|
+
|
82
|
+
as a date:
|
83
|
+
|
84
|
+
> solr_doc = {}
|
85
|
+
> Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :searchable)
|
86
|
+
=> {"pub_date_dtim"=>["2012-11-07T00:00:00Z"]}
|
87
|
+
|
88
|
+
or as a string:
|
89
|
+
|
90
|
+
> solr_doc = {}
|
91
|
+
> Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :sortable, :displayable)
|
92
|
+
=> {"pub_date_dti"=>"2012-11-07T00:00:00Z", "pub_date_ssm"=>["2012-11-07"]}
|
93
|
+
|
94
|
+
or a string that is stored as a date:
|
95
|
+
|
96
|
+
> solr_doc = {}
|
97
|
+
> Solrizer.insert_field(solr_doc, 'pub_date', 'Jan 29th 2013', :dateable)
|
98
|
+
=> {"pub_date_dtsim"=>["2013-01-29T00:00:00Z"]}
|
99
|
+
|
100
|
+
### Custom indexing strategies
|
101
|
+
|
102
|
+
#### Create your own index descriptor
|
103
|
+
|
104
|
+
> solr_doc = {}
|
105
|
+
> displearchable = Solrizer::Descriptor.new(:integer, :indexed, :stored)
|
106
|
+
> Solrizer.insert_field(solr_doc, 'some_count', 45, displearchable)
|
107
|
+
=> {"some_count_isi"=>"45"}
|
108
|
+
|
109
|
+
#### Override the defaults
|
110
|
+
|
111
|
+
We can override the default indexing methods within `Solrizer::DefaultDescriptors`
|
112
|
+
|
113
|
+
Here's the default behavior:
|
114
|
+
|
115
|
+
> solr_doc = {}
|
116
|
+
> Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
|
117
|
+
=> {"title_sim"=>["foobar"]}
|
118
|
+
|
119
|
+
But let's override that by redefining `:facetable`
|
120
|
+
|
121
|
+
module Solrizer
|
122
|
+
module DefaultDescriptors
|
123
|
+
def self.facetable
|
124
|
+
Descriptor.new(:string, :indexed, :stored)
|
125
|
+
end
|
126
|
+
end
|
127
|
+
end
|
128
|
+
|
129
|
+
Now, `:facetable` will return something different:
|
130
|
+
|
131
|
+
> solr_doc = {}
|
132
|
+
> Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
|
133
|
+
=> {"title_ssi"=>"foobar"}
|
134
|
+
|
135
|
+
#### Creating your own indexers
|
136
|
+
|
137
|
+
module MyMappers
|
138
|
+
def self.mapper_one
|
139
|
+
Solrizer::Descriptor.new(:string, :indexed, :stored)
|
140
|
+
end
|
141
|
+
end
|
142
|
+
|
143
|
+
Now, set Solrizer's field mapper to use our new module:
|
144
|
+
|
145
|
+
> solr_doc = {}
|
146
|
+
> Solrizer::FieldMapper.descriptors = [MyMappers]
|
147
|
+
=> [MyMappers]
|
148
|
+
> Solrizer.insert_field(solr_doc, 'title', 'foobar', :mapper_one)
|
149
|
+
=> {"title_ssi"=>"foobar"}
|
150
|
+
|
151
|
+
### Using OM
|
152
|
+
|
153
|
+
t.main_title(:index_as=>[:facetable],:path=>"title", :label=>"title") { ... }
|
154
|
+
|
155
|
+
But now you may also pass a Descriptor instance if that works for you:
|
156
|
+
|
157
|
+
indexer = Solrizer::Descriptor.new(:integer, :indexed, :stored)
|
158
|
+
t.main_title(:index_as=>[indexer],:path=>"title", :label=>"title") { ... }
|
159
|
+
|
160
|
+
### Extractor and Extractor Mixins
|
161
|
+
|
162
|
+
Solrizer::Extractor provides utilities for extracting solr fields from objects or inserting solr fields into documents:
|
163
|
+
|
164
|
+
> extractor = Solrizer::Extractor.new
|
165
|
+
> solr_doc = Hash.new
|
166
|
+
> extractor.format_node_value(["foo ","\n bar"])
|
167
|
+
=> "foo bar"
|
168
|
+
> extractor.insert_solr_field_value(solr_doc, "foo","bar")
|
169
|
+
=> {"foo"=>"bar"}
|
170
|
+
> extractor.insert_solr_field_value(solr_doc,"foo","baz")
|
171
|
+
=> {"foo"=>["bar", "baz"]}
|
172
|
+
> extractor.insert_solr_field_value(solr_doc, "boo","hoo")
|
173
|
+
=> {"foo"=>["bar", "baz"], "boo"=>"hoo"}
|
174
|
+
|
175
|
+
#### Solrizer provides some default mixins:
|
176
|
+
|
177
|
+
`Solrizer::HTML::Extractor` provides the `html_to_solr` method and `Solrizer::XML::Extractor` provides the `xml_to_solr` method:
|
178
|
+
|
179
|
+
> Solrizer::XML::Extractor
|
180
|
+
> extractor = Solrizer::Extractor.new
|
181
|
+
> xml = "<fields><foo>bar</foo><bar>baz</bar></fields>"
|
182
|
+
> extractor.xml_to_solr(xml)
|
183
|
+
=> {:foo_tesim=>"bar", :bar_tesim=>"baz"}
|
184
|
+
|
185
|
+
#### Solrizer::XML::TerminologyBasedSolrizer
|
186
|
+
|
187
|
+
Another powerful mixin for use with classes that include the `OM::XML::Document` module is
|
188
|
+
`Solrizer::XML::TerminologyBasedSolrizer`. The methods provided by this module offer a robust way of mapping
|
189
|
+
terms and solr fields via om terminologies. A notable example can be found in `ActiveFedora::NokogiriDatastream`.
|
190
|
+
|
191
|
+
## JMS Listener for Hydra Rails Applications
|
192
|
+
|
193
|
+
### The executables: solrizer and solrizerd
|
194
|
+
|
195
|
+
The solrizer gem provides two executables:
|
196
|
+
|
197
|
+
* solrizer is a stomp consumer which listens for fedora.apim.updates and solrizes (or de-solrizes) objects accordingly.
|
198
|
+
* solrizerd is a wrapper script that spawns a daemonized version of solrizer and handles start|stop|restart|status requests.
|
199
|
+
|
200
|
+
### Usage
|
201
|
+
|
202
|
+
The usage for solrizerd is as follows:
|
203
|
+
|
204
|
+
solrizerd command --hydra_home PATH [options]
|
205
|
+
|
206
|
+
The commands are as follows:
|
207
|
+
* start start an instance of the application
|
208
|
+
* stop stop all instances of the application
|
209
|
+
* restart stop all instances and restart them afterwards
|
210
|
+
* status show status (PID) of application instances
|
211
|
+
|
212
|
+
Required parameters:
|
213
|
+
|
214
|
+
--hydra_home: this is the path to your hydra rails application's root directory. Solrizerd needs this in order to load all your models and corresponding terminologies.
|
215
|
+
|
216
|
+
The options:
|
217
|
+
* -p, --port Stomp port 61613
|
218
|
+
* -o, --host Host to connect to localhost
|
219
|
+
* -u, --user User name for stomp listener
|
220
|
+
* -w, --password Password for stomp listener
|
221
|
+
* -d, --destination Topic to listen to (default: /topic/fedora.apim.update)
|
222
|
+
* -h, --help Display this screen
|
223
|
+
|
224
|
+
Note:
|
225
|
+
|
226
|
+
Since the solrizer script must fire up your hydra rails application, it must have all the gems installed that your hydra instance needs.
|
227
|
+
|
228
|
+
## Note on Patches/Pull Requests
|
229
|
+
|
230
|
+
* Fork the project.
|
231
|
+
* Make your feature addition or bug fix.
|
232
|
+
* Add tests for it. This is important so I don't break it in a
|
233
|
+
future version unintentionally.
|
234
|
+
* Commit, do not mess with rake file, version, or history.
|
235
|
+
(if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
|
236
|
+
* Send me a pull request. Bonus points for topic branches.
|
237
|
+
|
238
|
+
## Acknowledgments
|
239
|
+
|
240
|
+
### Technical Lead
|
241
|
+
|
242
|
+
Matt Zumwalt ([MediaShelf](http://yourmediashelf.com))
|
243
|
+
|
244
|
+
### Thanks to
|
245
|
+
|
246
|
+
* Douglas Kim, who created the initial code base for Solrizer.
|
247
|
+
* Chris Fitzpatrick, who patiently ran the first prototype through its paces for weeks.
|
248
|
+
* Bess Sadler, who created the JMS integration for Solrizer, generously served as a sounding board for numerous design issues around solr indexing, and pushes the technology forward with the skill of a true engineer.
|
249
|
+
|
250
|
+
## Copyright
|
251
|
+
|
252
|
+
Copyright (c) 2010 Matt Zumwalt. See LICENSE for details.
|
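The `insert_solr_field_value` console session in the README's Extractor section shows a field holding a bare value on first insert and turning into an array when the same field name is inserted again. A minimal standalone sketch of that accumulate-on-repeat behaviour follows; `insert_value` is a hypothetical helper written for illustration and is not the gem's source.

```ruby
# Hypothetical re-implementation of the behaviour shown in the README
# console session; it does not load or call the solrizer gem.
def insert_value(doc, field_name, field_value)
  if doc.key?(field_name)
    # A repeated field name promotes the stored value to an array
    # (or appends to the array that is already there).
    doc[field_name] = Array(doc[field_name]) << field_value
  else
    # The first value for a field is stored as-is.
    doc[field_name] = field_value
  end
  doc
end

solr_doc = {}
insert_value(solr_doc, "foo", "bar")
insert_value(solr_doc, "foo", "baz")
insert_value(solr_doc, "boo", "hoo")
p solr_doc
```

The sketch mirrors the three calls from the README, so the resulting hash should have an array under `"foo"` and a plain string under `"boo"`.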
data/bin/solrizer
CHANGED
@@ -90,8 +90,7 @@ begin
|
|
90
90
|
puts @msg.headers.inspect
|
91
91
|
puts "\nPID: #{@msg.headers["pid"]}\n"
|
92
92
|
if ["addDatastream", "addRelationship","ingest","modifyDatastreamByValue","modifyDatastreamByReference","modifyObject","purgeDatastream","purgeRelationship"].include? method
|
93
|
-
|
94
|
-
solrizer.solrize @msg.headers["pid"]
|
93
|
+
ActiveFedora::Base.find(@msg.headers["pid"], cast: true).update_index
|
95
94
|
elsif method == "purgeObject"
|
96
95
|
ActiveFedora::SolrService.instance.conn.delete_by_id(pid)
|
97
96
|
else
|
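The branch in the `bin/solrizer` hunk above can be read as a small dispatch on the Fedora API-M method name. The following standalone sketch assumes only the method names visible in the hunk. `action_for`, `UPDATE_METHODS` and the returned symbols are hypothetical, and because the body of the `else` branch is cut off by the hunk it is represented only as `:other`.

```ruby
# Method names copied from the hunk; in 3.2.0 these lead to
# ActiveFedora::Base.find(pid, cast: true).update_index.
UPDATE_METHODS = %w[
  addDatastream addRelationship ingest modifyDatastreamByValue
  modifyDatastreamByReference modifyObject purgeDatastream purgeRelationship
].freeze

# Hypothetical helper: maps a message's method name to the kind of action
# the listener takes. It performs no Fedora or Solr calls itself.
def action_for(method)
  if UPDATE_METHODS.include?(method)
    :reindex   # object changed, so its index entry is rebuilt
  elsif method == "purgeObject"
    :delete    # object removed, so its Solr document is deleted by id
  else
    :other     # not shown in the diff
  end
end

puts action_for("ingest")
puts action_for("purgeObject")
```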
data/lib/solrizer.rb
CHANGED
@@ -31,7 +31,7 @@ module Solrizer
|
|
31
31
|
|
32
32
|
# @params [Hash] doc the hash to insert the value into
|
33
33
|
# @params [String] name the name of the field (without the suffix)
|
34
|
-
# @params [String,Date] value the value to be inserted
|
34
|
+
# @params [String,Date,Array] value the value (or array of values) to be inserted
|
35
35
|
# @params [Array,Hash] indexer_args the arguments that find the indexer
|
36
36
|
# @returns [Hash] doc the document that was provided with the new field inserted
|
37
37
|
def self.insert_field(doc, name, value, *indexer_args)
|
@@ -181,7 +181,7 @@ module Solrizer
|
|
181
181
|
# mapped names and values. The values in the hash are _arrays_, and may contain multiple values.
|
182
182
|
|
183
183
|
def solr_names_and_values(field_name, field_value, index_types)
|
184
|
-
return {}
|
184
|
+
return {} if field_value.nil?
|
185
185
|
|
186
186
|
# Determine the set of index types
|
187
187
|
index_types = Array(index_types)
|
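The new guard in `solr_names_and_values` returns early only for `nil`, which is what lets `false` (and other falsy-looking values such as `0` or `""`) reach the indexer, as History.txt entry #25 describes. A minimal sketch of the distinction, using a hypothetical `insert_unless_nil` helper rather than the gem's own method:

```ruby
# Hypothetical stand-in for a nil-only guard: nil short-circuits,
# every other value (including false) is written to the document.
def insert_unless_nil(doc, key, value)
  return doc if value.nil?
  doc[key] = value
  doc
end

doc = {}
insert_unless_nil(doc, "missing_bsi", nil)  # skipped
insert_unless_nil(doc, "flag_bsi", false)   # kept, although false is falsy
p doc
```

A guard written as a truthiness check would drop `false` as well, which is why the explicit `nil?` test matters for boolean fields.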
data/lib/solrizer/suffix.rb
CHANGED
@@ -1,20 +1,26 @@
|
|
1
|
+
require 'ostruct'
|
2
|
+
|
1
3
|
module Solrizer
|
2
4
|
class Suffix
|
3
5
|
|
4
|
-
def initialize(fields)
|
5
|
-
@fields = fields
|
6
|
+
def initialize(*fields)
|
7
|
+
@fields = fields.flatten
|
6
8
|
end
|
7
9
|
|
8
10
|
def multivalued?
|
9
|
-
|
11
|
+
has_field? :multivalued
|
10
12
|
end
|
11
13
|
|
12
14
|
def stored?
|
13
|
-
|
15
|
+
has_field? :stored
|
14
16
|
end
|
15
17
|
|
16
18
|
def indexed?
|
17
|
-
|
19
|
+
has_field? :indexed
|
20
|
+
end
|
21
|
+
|
22
|
+
def has_field? f
|
23
|
+
f.to_sym == :type or @fields.include? f.to_sym
|
18
24
|
end
|
19
25
|
|
20
26
|
def data_type
|
@@ -22,40 +28,52 @@ module Solrizer
|
|
22
28
|
end
|
23
29
|
|
24
30
|
def to_s
|
25
|
-
|
26
|
-
index_suffix = config[:index_suffix] if indexed?
|
27
|
-
multivalued_suffix = config[:multivalued_suffix] if multivalued?
|
31
|
+
|
28
32
|
raise Solrizer::InvalidIndexDescriptor, "Missing datatype for #{@fields}" unless data_type
|
29
|
-
type_suffix = config[:type_suffix].call(data_type)
|
30
|
-
raise Solrizer::InvalidIndexDescriptor, "Invalid datatype `#{data_type.inspect}'. Must be one of: :date, :time, :text, :text_en, :string, :integer" unless type_suffix
|
31
33
|
|
32
|
-
[config
|
34
|
+
field_suffix = [config.suffix_delimiter]
|
35
|
+
|
36
|
+
config.fields.select { |f| has_field? f }.each do |f|
|
37
|
+
key = :"#{f}_suffix"
|
38
|
+
field_suffix << if config.send(key).is_a? Proc
|
39
|
+
config.send(key).call(@fields)
|
40
|
+
else
|
41
|
+
config.send(key)
|
42
|
+
end
|
43
|
+
end
|
44
|
+
|
45
|
+
field_suffix.join
|
33
46
|
end
|
34
47
|
|
48
|
+
def self.config
|
49
|
+
@config ||= OpenStruct.new :fields => [:type, :stored, :indexed, :multivalued],
|
50
|
+
suffix_delimiter: '_',
|
51
|
+
type_suffix: (lambda do |fields|
|
52
|
+
type = fields.first
|
53
|
+
case type
|
54
|
+
when :string, :symbol # TODO `:symbol' usage ought to be deprecated
|
55
|
+
's'
|
56
|
+
when :text
|
57
|
+
't'
|
58
|
+
when :text_en
|
59
|
+
'te'
|
60
|
+
when :date, :time
|
61
|
+
'dt'
|
62
|
+
when :integer
|
63
|
+
'i'
|
64
|
+
when :boolean
|
65
|
+
'b'
|
66
|
+
else
|
67
|
+
raise Solrizer::InvalidIndexDescriptor, "Invalid datatype `#{type.inspect}'. Must be one of: :date, :time, :text, :text_en, :string, :symbol, :integer, :boolean"
|
68
|
+
end
|
69
|
+
end),
|
70
|
+
stored_suffix: 's',
|
71
|
+
indexed_suffix: 'i',
|
72
|
+
multivalued_suffix: 'm'
|
73
|
+
end
|
35
74
|
|
36
|
-
private
|
37
75
|
def config
|
38
|
-
@config ||=
|
39
|
-
{suffix_delimiter: '_',
|
40
|
-
type_suffix: lambda do |type|
|
41
|
-
case type
|
42
|
-
when :string, :symbol # TODO `:symbol' usage ought to be deprecated
|
43
|
-
's'
|
44
|
-
when :text
|
45
|
-
't'
|
46
|
-
when :text_en
|
47
|
-
'te'
|
48
|
-
when :date, :time
|
49
|
-
'dt'
|
50
|
-
when :integer
|
51
|
-
'i'
|
52
|
-
when :boolean
|
53
|
-
'b'
|
54
|
-
end
|
55
|
-
end,
|
56
|
-
stored_suffix: 's',
|
57
|
-
index_suffix: 'i',
|
58
|
-
multivalued_suffix: 'm'}
|
76
|
+
@config ||= self.class.config.dup
|
59
77
|
end
|
60
78
|
end
|
61
79
|
end
|
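The reworked `Suffix` builds the suffix by walking `config.fields` in order and appending each `<field>_suffix` entry, calling it first when it is a `Proc`. The sketch below restates that logic in a standalone class so it can be run without the gem. `SketchSuffix` is a hypothetical name, the type table is condensed into a hash, and the error class is swapped for `ArgumentError`; the flow otherwise follows the diff above.

```ruby
require 'ostruct'

# Standalone sketch of the 3.2.0 suffix-building logic (not the gem's class).
class SketchSuffix
  TYPE_SUFFIXES = { string: 's', symbol: 's', text: 't', text_en: 'te',
                    date: 'dt', time: 'dt', integer: 'i', boolean: 'b' }.freeze

  # Class-level defaults; each instance works on its own copy.
  def self.config
    @config ||= OpenStruct.new(
      fields: [:type, :stored, :indexed, :multivalued],
      suffix_delimiter: '_',
      type_suffix: lambda do |fields|
        TYPE_SUFFIXES.fetch(fields.first) do |t|
          raise ArgumentError, "Invalid datatype #{t.inspect}"
        end
      end,
      stored_suffix: 's',
      indexed_suffix: 'i',
      multivalued_suffix: 'm'
    )
  end

  def initialize(*fields)
    @fields = fields.flatten
  end

  def config
    @config ||= self.class.config.dup
  end

  # :type is always considered present; other fields must have been passed in.
  def has_field?(f)
    f.to_sym == :type || @fields.include?(f.to_sym)
  end

  def to_s
    parts = [config.suffix_delimiter]
    config.fields.select { |f| has_field?(f) }.each do |f|
      entry = config.send(:"#{f}_suffix")
      parts << (entry.is_a?(Proc) ? entry.call(@fields) : entry)
    end
    parts.join
  end
end

puts SketchSuffix.new(:string, :stored, :indexed).to_s        # _ssi
puts SketchSuffix.new(:text_en, :indexed, :multivalued).to_s  # _teim

# Per-instance override, following the "config" example in suffix_spec.rb.
custom = SketchSuffix.new(:my_custom_type, :a, :b, :c)
custom.config.fields += [:b]
custom.config.suffix_delimiter = '#'
custom.config.type_suffix = ->(fields) { fields.first == :my_custom_type ? 'custom_suffix_' : 'nope' }
custom.config.b_suffix = 'now_with_more_b'
puts custom.to_s
```

Because the instance-level `config` is a `dup` of the class-level one, the overrides on `custom` leave the defaults used by other instances untouched in this sketch.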
data/lib/solrizer/version.rb
CHANGED
data/solrizer.gemspec
CHANGED
data/spec/units/solrizer_spec.rb
CHANGED
@@ -25,15 +25,24 @@ describe Solrizer do
|
|
25
25
|
Solrizer.insert_field(doc, 'foo', Time.parse('2013-01-13T22:45:56+06:00'))
|
26
26
|
doc.should == {'foo_dtsim' => ["2013-01-13T16:45:56Z"]}
|
27
27
|
end
|
28
|
-
it "should insert Booleans" do
|
28
|
+
it "should insert true Booleans" do
|
29
29
|
Solrizer.insert_field(doc, 'foo', true)
|
30
30
|
doc.should == {'foo_bsi' => true}
|
31
31
|
end
|
32
|
+
it "should insert false Booleans" do
|
33
|
+
Solrizer.insert_field(doc, 'foo', false)
|
34
|
+
doc.should == {'foo_bsi' => false}
|
35
|
+
end
|
32
36
|
|
33
37
|
it "should insert multiple values" do
|
34
38
|
Solrizer.insert_field(doc, 'foo', ['A name', 'B name'], :sortable, :facetable)
|
35
39
|
doc.should == {'foo_si' => 'B name', 'foo_sim' => ['A name', 'B name']}
|
36
40
|
end
|
41
|
+
|
42
|
+
it 'should insert nothing when passed a nil value' do
|
43
|
+
Solrizer.insert_field(doc, 'foo', nil, :sortable, :facetable)
|
44
|
+
doc.should == {}
|
45
|
+
end
|
37
46
|
end
|
38
47
|
|
39
48
|
describe "on a document with values" do
|
data/spec/units/suffix_spec.rb
ADDED
@@ -0,0 +1,80 @@
|
|
1
|
+
require 'spec_helper'
|
2
|
+
|
3
|
+
describe Solrizer::Suffix do
|
4
|
+
|
5
|
+
describe "#multivalued?" do
|
6
|
+
it "should be multivalued if :multivalued is among the field types" do
|
7
|
+
expect(Solrizer::Suffix.new(:multivalued)).to be_multivalued
|
8
|
+
end
|
9
|
+
|
10
|
+
it "should not be multivalued if :multivalued was not passed in a field type" do
|
11
|
+
expect(Solrizer::Suffix.new(:some_other_field_type)).to_not be_multivalued
|
12
|
+
end
|
13
|
+
end
|
14
|
+
|
15
|
+
describe "#stored?" do
|
16
|
+
it "should be stored if :stored is among the field types" do
|
17
|
+
expect(Solrizer::Suffix.new(:stored)).to be_stored
|
18
|
+
end
|
19
|
+
|
20
|
+
it "should not be stored if :stored was not passed in a field type" do
|
21
|
+
expect(Solrizer::Suffix.new(:some_other_field_type)).to_not be_stored
|
22
|
+
end
|
23
|
+
end
|
24
|
+
|
25
|
+
describe "#indexed?" do
|
26
|
+
it "should be indexed if :indexed is among the field types" do
|
27
|
+
expect(Solrizer::Suffix.new(:indexed)).to be_indexed
|
28
|
+
end
|
29
|
+
|
30
|
+
it "should not be indexed if :indexed was not passed in a field type" do
|
31
|
+
expect(Solrizer::Suffix.new(:some_other_field_type)).to_not be_indexed
|
32
|
+
end
|
33
|
+
end
|
34
|
+
describe "#has_field?" do
|
35
|
+
subject do
|
36
|
+
Solrizer::Suffix.new(:type, :a, :b, :c)
|
37
|
+
end
|
38
|
+
it "should be able to tell if a field is in the suffix or not" do
|
39
|
+
expect(subject).to have_field :a
|
40
|
+
expect(subject).to have_field :b
|
41
|
+
expect(subject).to have_field :c
|
42
|
+
expect(subject).to_not have_field :d
|
43
|
+
end
|
44
|
+
end
|
45
|
+
|
46
|
+
describe "#data_type" do
|
47
|
+
it "should always be the first argument to the suffix" do
|
48
|
+
expect(Solrizer::Suffix.new(:some_type, :a).data_type).to eq :some_type
|
49
|
+
end
|
50
|
+
end
|
51
|
+
|
52
|
+
describe "#to_s" do
|
53
|
+
it "should combine the fields into a suffix string" do
|
54
|
+
expect(Solrizer::Suffix.new(:string, :stored, :indexed).to_s).to eq '_ssi'
|
55
|
+
expect(Solrizer::Suffix.new(:integer, :stored, :multivalued).to_s).to eq '_ism'
|
56
|
+
end
|
57
|
+
end
|
58
|
+
|
59
|
+
describe "config" do
|
60
|
+
subject do
|
61
|
+
Solrizer::Suffix.new(:my_custom_type, :a, :b, :c)
|
62
|
+
end
|
63
|
+
|
64
|
+
it "should let you mess with the suffix config" do
|
65
|
+
subject.config.fields += [:b]
|
66
|
+
subject.config.suffix_delimiter = "#"
|
67
|
+
subject.config.type_suffix = lambda do |fields|
|
68
|
+
type = fields.first
|
69
|
+
|
70
|
+
if type == :my_custom_type
|
71
|
+
"custom_suffix_"
|
72
|
+
else
|
73
|
+
"nope"
|
74
|
+
end
|
75
|
+
end
|
76
|
+
subject.config.b_suffix = 'now_with_more_b'
|
77
|
+
expect(subject.to_s).to eq "#custom_suffix_now_with_more_b"
|
78
|
+
end
|
79
|
+
end
|
80
|
+
end
|
metadata
CHANGED
@@ -1,153 +1,153 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: solrizer
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 3.
|
4
|
+
version: 3.2.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Matt Zumwalt
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date:
|
11
|
+
date: 2014-05-28 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: nokogiri
|
15
15
|
requirement: !ruby/object:Gem::Requirement
|
16
16
|
requirements:
|
17
|
-
- -
|
17
|
+
- - ">="
|
18
18
|
- !ruby/object:Gem::Version
|
19
19
|
version: '0'
|
20
20
|
type: :runtime
|
21
21
|
prerelease: false
|
22
22
|
version_requirements: !ruby/object:Gem::Requirement
|
23
23
|
requirements:
|
24
|
-
- -
|
24
|
+
- - ">="
|
25
25
|
- !ruby/object:Gem::Version
|
26
26
|
version: '0'
|
27
27
|
- !ruby/object:Gem::Dependency
|
28
28
|
name: xml-simple
|
29
29
|
requirement: !ruby/object:Gem::Requirement
|
30
30
|
requirements:
|
31
|
-
- -
|
31
|
+
- - ">="
|
32
32
|
- !ruby/object:Gem::Version
|
33
33
|
version: '0'
|
34
34
|
type: :runtime
|
35
35
|
prerelease: false
|
36
36
|
version_requirements: !ruby/object:Gem::Requirement
|
37
37
|
requirements:
|
38
|
-
- -
|
38
|
+
- - ">="
|
39
39
|
- !ruby/object:Gem::Version
|
40
40
|
version: '0'
|
41
41
|
- !ruby/object:Gem::Dependency
|
42
42
|
name: mediashelf-loggable
|
43
43
|
requirement: !ruby/object:Gem::Requirement
|
44
44
|
requirements:
|
45
|
-
- - ~>
|
45
|
+
- - "~>"
|
46
46
|
- !ruby/object:Gem::Version
|
47
47
|
version: 0.4.7
|
48
48
|
type: :runtime
|
49
49
|
prerelease: false
|
50
50
|
version_requirements: !ruby/object:Gem::Requirement
|
51
51
|
requirements:
|
52
|
-
- - ~>
|
52
|
+
- - "~>"
|
53
53
|
- !ruby/object:Gem::Version
|
54
54
|
version: 0.4.7
|
55
55
|
- !ruby/object:Gem::Dependency
|
56
56
|
name: stomp
|
57
57
|
requirement: !ruby/object:Gem::Requirement
|
58
58
|
requirements:
|
59
|
-
- -
|
59
|
+
- - ">="
|
60
60
|
- !ruby/object:Gem::Version
|
61
61
|
version: '0'
|
62
62
|
type: :runtime
|
63
63
|
prerelease: false
|
64
64
|
version_requirements: !ruby/object:Gem::Requirement
|
65
65
|
requirements:
|
66
|
-
- -
|
66
|
+
- - ">="
|
67
67
|
- !ruby/object:Gem::Version
|
68
68
|
version: '0'
|
69
69
|
- !ruby/object:Gem::Dependency
|
70
70
|
name: daemons
|
71
71
|
requirement: !ruby/object:Gem::Requirement
|
72
72
|
requirements:
|
73
|
-
- -
|
73
|
+
- - ">="
|
74
74
|
- !ruby/object:Gem::Version
|
75
75
|
version: '0'
|
76
76
|
type: :runtime
|
77
77
|
prerelease: false
|
78
78
|
version_requirements: !ruby/object:Gem::Requirement
|
79
79
|
requirements:
|
80
|
-
- -
|
80
|
+
- - ">="
|
81
81
|
- !ruby/object:Gem::Version
|
82
82
|
version: '0'
|
83
83
|
- !ruby/object:Gem::Dependency
|
84
84
|
name: activesupport
|
85
85
|
requirement: !ruby/object:Gem::Requirement
|
86
86
|
requirements:
|
87
|
-
- -
|
87
|
+
- - ">="
|
88
88
|
- !ruby/object:Gem::Version
|
89
89
|
version: '0'
|
90
90
|
type: :runtime
|
91
91
|
prerelease: false
|
92
92
|
version_requirements: !ruby/object:Gem::Requirement
|
93
93
|
requirements:
|
94
|
-
- -
|
94
|
+
- - ">="
|
95
95
|
- !ruby/object:Gem::Version
|
96
96
|
version: '0'
|
97
97
|
- !ruby/object:Gem::Dependency
|
98
98
|
name: rspec
|
99
99
|
requirement: !ruby/object:Gem::Requirement
|
100
100
|
requirements:
|
101
|
-
- -
|
101
|
+
- - ">="
|
102
102
|
- !ruby/object:Gem::Version
|
103
103
|
version: '0'
|
104
104
|
type: :development
|
105
105
|
prerelease: false
|
106
106
|
version_requirements: !ruby/object:Gem::Requirement
|
107
107
|
requirements:
|
108
|
-
- -
|
108
|
+
- - ">="
|
109
109
|
- !ruby/object:Gem::Version
|
110
110
|
version: '0'
|
111
111
|
- !ruby/object:Gem::Dependency
|
112
112
|
name: rake
|
113
113
|
requirement: !ruby/object:Gem::Requirement
|
114
114
|
requirements:
|
115
|
-
- -
|
115
|
+
- - ">="
|
116
116
|
- !ruby/object:Gem::Version
|
117
117
|
version: '0'
|
118
118
|
type: :development
|
119
119
|
prerelease: false
|
120
120
|
version_requirements: !ruby/object:Gem::Requirement
|
121
121
|
requirements:
|
122
|
-
- -
|
122
|
+
- - ">="
|
123
123
|
- !ruby/object:Gem::Version
|
124
124
|
version: '0'
|
125
125
|
- !ruby/object:Gem::Dependency
|
126
126
|
name: yard
|
127
127
|
requirement: !ruby/object:Gem::Requirement
|
128
128
|
requirements:
|
129
|
-
- -
|
129
|
+
- - ">="
|
130
130
|
- !ruby/object:Gem::Version
|
131
131
|
version: '0'
|
132
132
|
type: :development
|
133
133
|
prerelease: false
|
134
134
|
version_requirements: !ruby/object:Gem::Requirement
|
135
135
|
requirements:
|
136
|
-
- -
|
136
|
+
- - ">="
|
137
137
|
- !ruby/object:Gem::Version
|
138
138
|
version: '0'
|
139
139
|
- !ruby/object:Gem::Dependency
|
140
140
|
name: RedCloth
|
141
141
|
requirement: !ruby/object:Gem::Requirement
|
142
142
|
requirements:
|
143
|
-
- -
|
143
|
+
- - ">="
|
144
144
|
- !ruby/object:Gem::Version
|
145
145
|
version: '0'
|
146
146
|
type: :development
|
147
147
|
prerelease: false
|
148
148
|
version_requirements: !ruby/object:Gem::Requirement
|
149
149
|
requirements:
|
150
|
-
- -
|
150
|
+
- - ">="
|
151
151
|
- !ruby/object:Gem::Version
|
152
152
|
version: '0'
|
153
153
|
description: Use solrizer to populate solr indexes. You can run solrizer from within
|
@@ -159,14 +159,15 @@ executables:
|
|
159
159
|
extensions: []
|
160
160
|
extra_rdoc_files:
|
161
161
|
- LICENSE
|
162
|
-
- README.
|
162
|
+
- README.md
|
163
163
|
files:
|
164
|
-
- .gitignore
|
165
|
-
- .travis.yml
|
164
|
+
- ".gitignore"
|
165
|
+
- ".travis.yml"
|
166
|
+
- CONTRIBUTING.md
|
166
167
|
- Gemfile
|
167
168
|
- History.txt
|
168
169
|
- LICENSE
|
169
|
-
- README.
|
170
|
+
- README.md
|
170
171
|
- Rakefile
|
171
172
|
- bin/solrizer
|
172
173
|
- bin/solrizerd
|
@@ -193,6 +194,7 @@ files:
|
|
193
194
|
- spec/units/extractor_spec.rb
|
194
195
|
- spec/units/field_mapper_spec.rb
|
195
196
|
- spec/units/solrizer_spec.rb
|
197
|
+
- spec/units/suffix_spec.rb
|
196
198
|
- spec/units/xml_extractor_spec.rb
|
197
199
|
homepage: http://github.com/projecthydra/solrizer
|
198
200
|
licenses: []
|
@@ -203,17 +205,17 @@ require_paths:
|
|
203
205
|
- lib
|
204
206
|
required_ruby_version: !ruby/object:Gem::Requirement
|
205
207
|
requirements:
|
206
|
-
- -
|
208
|
+
- - ">="
|
207
209
|
- !ruby/object:Gem::Version
|
208
210
|
version: '0'
|
209
211
|
required_rubygems_version: !ruby/object:Gem::Requirement
|
210
212
|
requirements:
|
211
|
-
- -
|
213
|
+
- - ">="
|
212
214
|
- !ruby/object:Gem::Version
|
213
215
|
version: '0'
|
214
216
|
requirements: []
|
215
217
|
rubyforge_project:
|
216
|
-
rubygems_version: 2.
|
218
|
+
rubygems_version: 2.2.2
|
217
219
|
signing_key:
|
218
220
|
specification_version: 4
|
219
221
|
summary: A utility for building solr indexes, usually from Fedora repository content
|
@@ -225,5 +227,6 @@ test_files:
|
|
225
227
|
- spec/units/extractor_spec.rb
|
226
228
|
- spec/units/field_mapper_spec.rb
|
227
229
|
- spec/units/solrizer_spec.rb
|
230
|
+
- spec/units/suffix_spec.rb
|
228
231
|
- spec/units/xml_extractor_spec.rb
|
229
232
|
has_rdoc:
|
data/README.textile
DELETED
@@ -1,249 +0,0 @@
|
|
1
|
-
h1. solrizer
|
2
|
-
|
3
|
-
A lightweight, configurable tool for indexing metadata into solr. Can be triggered from within your application, from the command line, or as a JMS listener.
|
4
|
-
|
5
|
-
Solrizer provides the baseline and structures for the process of solrizing. In order to actually read objects from a
|
6
|
-
datasource and write solr documents into a solr instance, you need to use an implementation specific gem, such as
|
7
|
-
"solrizer-fedora":https://github.com/projecthydra/solrizer-fedora, which provides the mechanics for reading from a fedora repository and writing to a solr instance.
|
8
|
-
|
9
|
-
|
10
|
-
h2. Installation
|
11
|
-
|
12
|
-
The gem is hosted on rubygems.org. The best way to manage the gems for your project is to use bundler. Create a Gemfile in the root of your application and include the following:
|
13
|
-
|
14
|
-
<pre>
|
15
|
-
source "http://rubygems.org"
|
16
|
-
|
17
|
-
gem 'solrizer'
|
18
|
-
</pre>
|
19
|
-
|
20
|
-
Then:
|
21
|
-
|
22
|
-
<pre>bundle install</pre>
|
23
|
-
|
24
|
-
h2. Usage
|
25
|
-
|
26
|
-
h3. Fire up the console:
|
27
|
-
|
28
|
-
The code snippets in the following sections can be cut/pasted into your console, giving you the opportunity to play with Solrizer.
|
29
|
-
|
30
|
-
Start up a console and load solrizer:
|
31
|
-
|
32
|
-
<pre>
|
33
|
-
irb
|
34
|
-
require "rubygems"
|
35
|
-
require "solrizer"
|
36
|
-
</pre>
|
37
|
-
|
38
|
-
|
39
|
-
h3. Field Mapper
|
40
|
-
|
41
|
-
The FieldMapper maps term names and values to Solr fields, based on the term’s data type and any index_as options. Solrizer comes with default mappings to dynamic field types defined in the Hydra Solr schema.xml file. A copy of that is available :
|
42
|
-
https://github.com/projecthydra/hydra-head/blob/master/hydra-core/lib/generators/hydra/templates/solr_conf/conf/schema.xml
|
43
|
-
|
44
|
-
More information on the conventions followed for the dynamic solr fields is here:
|
45
|
-
https://github.com/projecthydra/hydra-head/wiki/Solr-Schema
|
46
|
-
|
47
|
-
<pre>
|
48
|
-
default_mapper = Solrizer::FieldMapper::Default.new
|
49
|
-
|
50
|
-
# some of the default mappings in solrizer
|
51
|
-
default_mapper.solr_name("foo",:string,:searchable) # returns foo_tesim
|
52
|
-
default_mapper.solr_name("foo",:date,:searchable) # returns foo_dtsim
|
53
|
-
default_mapper.solr_name("foo",:integer,:searchable # returns foo_isim
|
54
|
-
default_mapper.solr_name("foo",:string,:facetable) # returns foo_sim
|
55
|
-
default_mapper.solr_name("foo",:integer,:facetable) # returns foo_iim
|
56
|
-
default_mapper.solr_name("foo",:string,:sortable) # returns foo_si
|
57
|
-
default_mapper.solr_name("foo",:string,:displayable) # returns foo_ssm
|
58
|
-
</pre>
|
59
|
-
|
60
|
-
## Using default indexing strategies
|
61
|
-
|
62
|
-
<pre>
|
63
|
-
solr_doc = {}
|
64
|
-
Solrizer.insert_field(solr_doc, 'title', 'whatever', :searchable)
|
65
|
-
=> {"title_tesim"=>["whatever"]}
|
66
|
-
|
67
|
-
Solrizer.insert_field(solr_doc, 'pub_date', 'Nov 2012', :sortable, :displayable)
|
68
|
-
=> {"title_tesim"=>["whatever"], "pub_date_ssi"=>["Nov 2012"], "pub_date_ssm"=>["Nov 2012"]}
|
69
|
-
</pre>
|
70
|
-
|
71
|
-
#### You can also index dates
|
72
|
-
<pre>
|
73
|
-
# as a date
|
74
|
-
solr_doc = {}
|
75
|
-
Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :searchable)
|
76
|
-
=> {"pub_date_dtsi"=>["2012-11-07T00:00:00Z"]}
|
77
|
-
|
78
|
-
# or as a string
|
79
|
-
solr_doc = {}
|
80
|
-
Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :sortable, :displayable)
|
81
|
-
=> {"pub_date_ssi"=>["2012-11-07"], "pub_date_ssm"=>["2012-11-07"]}
|
82
|
-
|
83
|
-
# or a string that is stored as a date
|
84
|
-
solr_doc = {}
|
85
|
-
Solrizer.insert_field(solr_doc, 'pub_date', 'Jan 29th 2013', :dateable)
|
86
|
-
=> {"pub_date_dtsi"=>["2013-01-29T00:00:00Z"]}
|
87
|
-
</pre>
|
88
|
-
|
89
|
-
|
90
|
-
## Using a custom indexing strategy
|
91
|
-
All you have to do is create your own index descriptor:
|
92
|
-
<pre>
|
93
|
-
solr_doc = {}
|
94
|
-
displearchable = Solrizer::Descriptor.new(:integer, :indexed, :stored)
|
95
|
-
Solrizer.insert_field(solr_doc, 'some_count', 45, displearchable)
|
96
|
-
{"some_count_isi"=>["45"]}
|
97
|
-
</pre>
|
98
|
-
|
99
|
-
## Changing the behavior of a default descriptor
|
100
|
-
|
101
|
-
Simply override the methods within Solrizer::DefaultDescriptors
|
102
|
-
<pre>
|
103
|
-
# before
|
104
|
-
solr_doc = {}
|
105
|
-
Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
|
106
|
-
=> {"title_sim"=>["foobar"]}
|
107
|
-
|
108
|
-
# redefine facetable:
|
109
|
-
module Solrizer
|
110
|
-
module DefaultDescriptors
|
111
|
-
def self.facetable
|
112
|
-
Descriptor.new(:string, :indexed, :stored)
|
113
|
-
end
|
114
|
-
end
|
115
|
-
end
|
116
|
-
|
117
|
-
# after
|
118
|
-
solr_doc = {}
|
119
|
-
Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
|
120
|
-
=> {"title_ssi"=>["foobar"]}
|
121
|
-
</pre>
|
122
|
-
|
123
|
-
|
124
|
-
## Creating your own Indexers
|
125
|
-
<pre>
|
126
|
-
module MyMappers
|
127
|
-
def self.mapper_one
|
128
|
-
Solrizer::Descriptor.new(:string, :indexed, :stored)
|
129
|
-
end
|
130
|
-
end
|
131
|
-
|
132
|
-
solr_doc = {}
|
133
|
-
|
134
|
-
Solrizer::FieldMapper.descriptors = [MyMappers]
|
135
|
-
=> [MyMappers]
|
136
|
-
|
137
|
-
Solrizer.insert_field(solr_doc, 'title', 'foobar', :mapper_one)
|
138
|
-
=> {"title_ssi"=>["foobar"]}
|
139
|
-
</pre>
|
140
|
-
|
141
|
-
## Using OM
|
142
|
-
Same as it ever was:
|
143
|
-
<pre>
|
144
|
-
t.main_title(:index_as=>[:facetable],:path=>"title", :label=>"title") { ... }
|
145
|
-
</pre>
|
146
|
-
|
147
|
-
But now you may also pass an Descriptor instance if that works for you:
|
148
|
-
<pre>
|
149
|
-
indexer = Solrizer::Descriptor.new(:integer, :indexed, :stored)
|
150
|
-
t.main_title(:index_as=>[indexer],:path=>"title", :label=>"title") { ... }
|
151
|
-
|
152
|
-
</pre>
|
153
|
-
|
154
|
-
h3. Extractor and Extractor Mixins
|
155
|
-
|
156
|
-
Solrizer::Extractor provides utilities for extracting solr fields from objects or inserting solr fields into documents:
|
157
|
-
|
158
|
-
<pre>
|
159
|
-
extractor = Solrizer::Extractor.new
|
160
|
-
|
161
|
-
extractor.format_node_value(["foo ","\n bar"]) # returns "foo bar"
|
162
|
-
|
163
|
-
solr_doc = Hash.new
|
164
|
-
extractor.insert_solr_field_value(solr_doc, "foo","bar") # solr_doc is now {"foo" => ["bar"]}
|
165
|
-
extractor.insert_solr_field_value(solr_doc,"foo","baz") # solr_doc is now {"foo" => ["bar","baz"]}
|
166
|
-
extractor.insert_solr_field_value(solr_doc, "boo","hoo") # solr_doc is now {"foo" => ["bar","baz"], "boo" => ["hoo"]}
|
167
|
-
</pre>
|
168
|
-
|
169
|
-
h4. Solrizer provides some default mixins:
|
170
|
-
|
171
|
-
* Solrizer::HTML::Extractor -=> provides html_to_solr method
|
172
|
-
* Solrizer::XML::Extractor -=> provides xml_to_solr method
|
173
|
-
|
174
|
-
<pre>
|
175
|
-
xml = "<fields><foo>bar</foo><bar>baz</bar></fields>"
|
176
|
-
|
177
|
-
extractor.xml_to_solr(xml) # returns {:foo_tesim=>"bar", :bar_tesim=>"baz"}
|
178
|
-
</pre>
|
179
|
-
|
180
|
-
h4. Solrizer::XML::TerminologyBasedSolrizer
|
181
|
-
|
182
|
-
Another powerful mixin for use with classes that include the OM::XML::Document module is Solrizer::XML::TerminologyBasedSolrizer.
|
183
|
-
The methods provided by this module map provides a robust way of mapping terms and solr fields via om terminologies. A notable example
|
184
|
-
can be found in ActiveFedora::NokogiriDatatstream.
|
185
|
-
|
186
|
-
|
187
|
-
h2. JMS Listener for Hydra Rails Applications
|
188
|
-
|
189
|
-
h3. The executables: solrizer and solrizerd
|
190
|
-
|
191
|
-
The solrizer gem provides two executables:
|
192
|
-
|
193
|
-
* solrizer is a stomp consumer which listens for fedora.apim.updates and solrizes (or de-solrizes) objects accordingly.
|
194
|
-
* solrizerd is a wrapper script that spawns a daemonized version of solrizer and handles start|stop|restart|status requests.
|
195
|
-
|
196
|
-
h3. Usage
|
197
|
-
|
198
|
-
The usage for solrizerd is as follows:
|
199
|
-
|
200
|
-
<pre>
|
201
|
-
solrizerd command --hydra_home PATH [options]
|
202
|
-
</pre>
|
203
|
-
|
204
|
-
The commands are as follows:
|
205
|
-
* start start an instance of the application
|
206
|
-
* stop stop all instances of the application
|
207
|
-
* restart stop all instances and restart them afterwards
|
208
|
-
* status show status (PID) of application instances
|
209
|
-
|
210
|
-
Required parameters:
|
211
|
-
|
212
|
-
--hydra_home: this is the path to your hydra rails applications' root directory. Solrizerd needs this in order to load all your models and corresponding terminoligies.
|
213
|
-
|
214
|
-
The options:
|
215
|
-
* -p, --port Stomp port 61613
|
216
|
-
* -o, --host Host to connect to localhost
|
217
|
-
* -u, --user User name for stomp listener
|
218
|
-
* -w, --password Password for stomp listener
|
219
|
-
* -d, --destination Topic to listen to (default: /topic/fedora.apim.update)
|
220
|
-
* -h, --help Display this screen
|
221
|
-
|
222
|
-
Note:
|
223
|
-
|
224
|
-
Since the solrizer script must fire up your hydra rails application, it must have all the gems installed that your hydra instance needs.
|
225
|
-
|
226
|
-
|
227
|
-
h2. Note on Patches/Pull Requests
|
228
|
-
|
229
|
-
* Fork the project.
|
230
|
-
* Make your feature addition or bug fix.
|
231
|
-
* Add tests for it. This is important so I don't break it in a
|
232
|
-
future version unintentionally.
|
233
|
-
* Commit, do not mess with rake file, version, or history.
|
234
|
-
(if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
|
235
|
-
* Send me a pull request. Bonus points for topic branches.
|
236
|
-
|
237
|
-
h2. Acknowledgements
|
238
|
-
|
239
|
-
Technical Lead: Matt Zumwalt ("MediaShelf":http://yourmediashelf.com)
|
240
|
-
|
241
|
-
Thanks to
|
242
|
-
|
243
|
-
Douglas Kim, who created the initial code base for Solrizer.
|
244
|
-
Chris Fitzpatrick, who patiently ran the first prototype through its paces for weeks.
|
245
|
-
Bess Sadler, who created the JMS integration for Solrizer, generously served as a sounding board for numerous design issues around solr indexing, and pushes the technology forward with the skill of a true engineer.
|
246
|
-
|
247
|
-
h2. Copyright
|
248
|
-
|
249
|
-
Copyright (c) 2010 Matt Zumwalt. See LICENSE for details.
|
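For reference, the Extractor example in the removed README.textile shows that repeated calls to `insert_solr_field_value` accumulate values under a field name rather than overwriting them. The sketch below reproduces that append behavior in plain Ruby, assuming nothing about solrizer's internals. The helper name `insert_field_value` is hypothetical and is not part of the gem.

```ruby
# Hypothetical helper, not part of solrizer: it only mimics the append
# behavior that the removed README documents for
# Solrizer::Extractor#insert_solr_field_value.
def insert_field_value(solr_doc, field_name, field_value)
  # Create the value array on first use, then append, so repeated inserts
  # for the same field accumulate instead of overwriting.
  solr_doc[field_name] ||= []
  solr_doc[field_name] << field_value
  solr_doc
end

solr_doc = {}
insert_field_value(solr_doc, "foo", "bar")
insert_field_value(solr_doc, "foo", "baz")
insert_field_value(solr_doc, "boo", "hoo")
p solr_doc
```

Storing every value in an array, even when there is only one, matches the multi-valued results shown in the README examples.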