solrizer 3.1.1 → 3.2.0
- checksums.yaml +4 -4
- data/CONTRIBUTING.md +113 -0
- data/History.txt +9 -0
- data/README.md +252 -0
- data/bin/solrizer +1 -2
- data/lib/solrizer.rb +1 -1
- data/lib/solrizer/field_mapper.rb +1 -1
- data/lib/solrizer/suffix.rb +51 -33
- data/lib/solrizer/version.rb +1 -1
- data/solrizer.gemspec +1 -1
- data/spec/units/solrizer_spec.rb +10 -1
- data/spec/units/suffix_spec.rb +80 -0
- metadata +32 -29
- data/README.textile +0 -249
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 8d77a71925151847cbb586039e03296e1b0aa2c9
+  data.tar.gz: a2bbb1286b7b037dc5a83d81f84ec098edc60283
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 76ad75ea81fb427b72a0b214637427f44ecd54d5e4b80cc43f49ae4c13598816d3756144b4b1081f7c357169595d97d7c8ce4e1137533e54e99807c619f9d9d6
+  data.tar.gz: 9294cad6b4a5661321d4a0780652af3e31fbebfeabf6b3ebe5d62577d37aaf6cc685255edb50acc5fd8782f468bb05a0d8ba1e64bb40c346c6f10202e8103250
data/CONTRIBUTING.md
ADDED
@@ -0,0 +1,113 @@
+# How to Contribute
+
+We want your help to make Project Hydra great.
+There are a few guidelines that we need contributors to follow so that we can have a chance of keeping on top of things.
+
+## Hydra Project Intellectual Property Licensing and Ownership
+
+All code contributors must have an Individual Contributor License Agreement (iCLA) on file with the Hydra Project Steering Group.
+If the contributor works for an institution, the institution must have a Corporate Contributor License Agreement (cCLA) on file.
+
+https://wiki.duraspace.org/display/hydra/Hydra+Project+Intellectual+Property+Licensing+and+Ownership
+
+You should also add yourself to the `CONTRIBUTORS.md` file in the root of the project.
+
+## Contribution Tasks
+
+* Reporting Issues
+* Making Changes
+* Submitting Changes
+* Merging Changes
+
+### Reporting Issues
+
+* Make sure you have a [GitHub account](https://github.com/signup/free)
+* Submit a [GitHub issue](./issues) by:
+  * Clearly describing the issue
+    * Provide a descriptive summary
+    * Explain the expected behavior
+    * Explain the actual behavior
+    * Provide steps to reproduce the actual behavior
+
+### Making Changes
+
+* Fork the repository on GitHub
+* Create a topic branch from where you want to base your work.
+  * This is usually the master branch.
+  * To quickly create a topic branch based on master: `git branch fix/master/my_contribution master`
+  * Then checkout the new branch with `git checkout fix/master/my_contribution`.
+  * Please avoid working directly on the `master` branch.
+  * You may find the [hub suite of commands](https://github.com/defunkt/hub) helpful
+* Make commits of logical units.
+  * Your commit should include a high level description of your work in HISTORY.textile
+* Check for unnecessary whitespace with `git diff --check` before committing.
+* Make sure your commit messages are [well formed](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html).
+  * If you created an issue, you can close it by including "Closes #issue" in your commit message. See [GitHub's blog post for more details](https://github.com/blog/1386-closing-issues-via-commit-messages)
+
+```
+    Present tense short summary (50 characters or less)
+
+    More detailed description, if necessary. It should be wrapped to 72
+    characters. Try to be as descriptive as you can, even if you think that
+    the commit content is obvious, it may not be obvious to others. You
+    should add such description also if it's already present in bug tracker,
+    it should not be necessary to visit a webpage to check the history.
+
+    Include Closes #<issue-number> when relevant.
+
+    Description can have multiple paragraphs and you can use code examples
+    inside, just indent it with 4 spaces:
+
+        class PostsController
+          def index
+            respond_with Post.limit(10)
+          end
+        end
+
+    You can also add bullet points:
+
+    - you can use dashes or asterisks
+
+    - also, try to indent next line of a point for readability, if it's too
+      long to fit in 72 characters
+```
+
+* Make sure you have added the necessary tests for your changes.
+* Run _all_ the tests to assure nothing else was accidentally broken.
+* When you are ready to submit a pull request
+
+### Submitting Changes
+
+[Detailed Walkthrough of One Pull Request per Commit](http://ndlib.github.io/practices/one-commit-per-pull-request/)
+
+* Read the article ["Using Pull Requests"](https://help.github.com/articles/using-pull-requests) on GitHub.
+* Make sure your branch is up to date with its parent branch (i.e. master)
+  * `git checkout master`
+  * `git pull --rebase`
+  * `git checkout <your-branch>`
+  * `git rebase master`
+  * It is likely a good idea to run your tests again.
+* Squash the commits for your branch into one commit
+  * `git rebase --interactive HEAD~<number-of-commits>` ([See GitHub help](https://help.github.com/articles/interactive-rebase))
+  * To determine the number of commits on your branch: `git log master..<your-branch> --oneline | wc -l`
+  * Squashing your branch's changes into one commit is "good form" and helps the person merging your request to see everything that is going on.
+* Push your changes to a topic branch in your fork of the repository.
+* Submit a pull request from your fork to the project.
+
+### Merging Changes
+
+* It is considered "poor form" to merge your own request.
+* Please take the time to review the changes and get a sense of what is being changed. Things to consider:
+  * Does the commit message explain what is going on?
+  * Do the code changes have tests? _Not all changes need new tests, some changes are refactorings_
+  * Does the commit contain more than it should? Are two separate concerns being addressed in one commit?
+  * Did the Travis tests complete successfully?
+* If you are uncertain, bring other contributors into the conversation by creating a comment that includes their @username.
+* If you like the pull request, but want others to chime in, create a +1 comment and tag a user.
+
+# Additional Resources
+
+* [General GitHub documentation](http://help.github.com/)
+* [GitHub pull request documentation](http://help.github.com/send-pull-requests/)
+* [Pro Git](http://git-scm.com/book) is both a free and excellent book about Git.
+* [A Git Config for Contributing](http://ndlib.github.io/practices/my-typical-per-project-git-config/)
data/History.txt
CHANGED
@@ -1,3 +1,12 @@
+h2. 3.2.0
+#25 Allow any field_value except nil to be inserted into a solr field
+#24 Remove dependency on solrizer-fedora, use AF to update index by pid
+#23 Enhance Suffix#config so it can be usefully overridden by downstream
+
+h2. 3.1.1
+#22 Support for boolean values
+#21 Testing on Rails version 4
+
 h2. 3.1.0
 #16 Inserting non-multivalued fields should not create a solr error
 #20 Time fields should be formatted correctly when using active_support/core_ext/date_time/conversions
data/README.md
ADDED
@@ -0,0 +1,252 @@
+# solrizer
+
+[![Build Status](https://travis-ci.org/projecthydra/solrizer.png?branch=master)](https://travis-ci.org/projecthydra/solrizer)
+[![Gem Version](https://badge.fury.io/rb/solrizer.png)](http://badge.fury.io/rb/solrizer)
+
+A lightweight, configurable tool for indexing metadata into solr. Can be triggered from within your application, from
+the command line, or as a JMS listener.
+
+Solrizer provides the baseline and structures for the process of solrizing. In order to actually read objects from a
+data source and write solr documents into a solr instance, you need to use an implementation specific gem, such as
+[solrizer-fedora](https://github.com/projecthydra/solrizer-fedora), which provides the mechanics for reading from a
+fedora repository and writing to a solr instance.
+
+
+## Installation
+
+The gem is hosted on [rubygems.org](http://rubygems.org/gems/solrizer). The best way to manage the gems for your project
+is to use bundler. Create a Gemfile in the root of your application and include the following:
+
+
+    source "http://rubygems.org"
+    gem 'solrizer'
+
+Then:
+
+    bundle install
+
+## Usage
+
+### Fire up the console:
+
+The code snippets in the following sections can be cut/pasted into your console, giving you the opportunity to play with Solrizer.
+
+Start up a console and load solrizer:
+
+    > irb
+    > require "rubygems"
+    > require "solrizer"
+
+### Field Mapper
+
+The `FieldMapper` maps term names and values to Solr fields, based on the term's data type and any index_as options.
+Solrizer comes with default mappings to dynamic field types defined in the Hydra Solr
+[schema.xml](https://github.com/projecthydra/hydra-head/blob/master/hydra-core/lib/generators/hydra/templates/solr_conf/conf/schema.xml).
+
+More information on the conventions followed for the dynamic solr fields is on the
+[wiki page](https://github.com/projecthydra/hydra-head/wiki/Solr-Schema).
+
+To examine all of Solrizer's field names, open up a ruby console:
+
+
+    > require 'solrizer'
+    => true
+    > default_mapper = Solrizer::FieldMapper.new
+    => #<Solrizer::FieldMapper:0x007fb47a273770 @id_field="id">
+    > default_mapper.solr_name("foo",:searchable, type: :string)
+    => "foo_teim"
+    > default_mapper.solr_name("foo",:searchable, type: :date)
+    => "foo_dtim"
+    > default_mapper.solr_name("foo",:searchable, type: :integer)
+    => "foo_iim"
+    > default_mapper.solr_name("foo",:facetable, type: :string)
+    => "foo_sim"
+    > default_mapper.solr_name("foo",:facetable, type: :integer)
+    => "foo_sim"
+    > default_mapper.solr_name("foo",:sortable, type: :string)
+    => "foo_si"
+    > default_mapper.solr_name("foo",:displayable, type: :string)
+    => "foo_ssm"
+
+### Default indexing strategies
+
+    > solr_doc = Hash.new
+    > Solrizer.insert_field(solr_doc, 'title', 'whatever', :stored_searchable)
+    => {"title_tesim"=>["whatever"]}
+
+    > Solrizer.insert_field(solr_doc, 'pub_date', 'Nov 2012', :sortable, :displayable)
+    => {"pub_date_si"=>"Nov 2012", "pub_date_ssm"=>["Nov 2012"]}
+
+### Indexing dates
+
+as a date:
+
+    > solr_doc = {}
+    > Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :searchable)
+    => {"pub_date_dtim"=>["2012-11-07T00:00:00Z"]}
+
+or as a string:
+
+    > solr_doc = {}
+    > Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :sortable, :displayable)
+    => {"pub_date_dti"=>"2012-11-07T00:00:00Z", "pub_date_ssm"=>["2012-11-07"]}
+
+or a string that is stored as a date:
+
+    > solr_doc = {}
+    > Solrizer.insert_field(solr_doc, 'pub_date', 'Jan 29th 2013', :dateable)
+    => {"pub_date_dtsim"=>["2013-01-29T00:00:00Z"]}
+
+### Custom indexing strategies
+
+#### Create your own index descriptor
+
+    > solr_doc = {}
+    > displearchable = Solrizer::Descriptor.new(:integer, :indexed, :stored)
+    > Solrizer.insert_field(solr_doc, 'some_count', 45, displearchable)
+    => {"some_count_isi"=>"45"}
+
+#### Override the defaults
+
+We can override the default indexing methods within `Solrizer::DefaultDescriptors`
+
+Here's the default behavior:
+
+    > solr_doc = {}
+    > Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
+    => {"title_sim"=>["foobar"]}
+
+But let's override that by redefining `:facetable`
+
+    module Solrizer
+      module DefaultDescriptors
+        def self.facetable
+          Descriptor.new(:string, :indexed, :stored)
+        end
+      end
+    end
+
+Now, `:facetable` will return something different:
+
+    > solr_doc = {}
+    > Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
+    => {"title_ssi"=>"foobar"}
+
+#### Creating your own indexers
+
+    module MyMappers
+      def self.mapper_one
+        Solrizer::Descriptor.new(:string, :indexed, :stored)
+      end
+    end
+
+Now, set Solrizer's field mapper to use our new module:
+
+    > solr_doc = {}
+    > Solrizer::FieldMapper.descriptors = [MyMappers]
+    => [MyMappers]
+    > Solrizer.insert_field(solr_doc, 'title', 'foobar', :mapper_one)
+    => {"title_ssi"=>"foobar"}
+
+### Using OM
+
+    t.main_title(:index_as=>[:facetable],:path=>"title", :label=>"title") { ... }
+
+But now you may also pass a Descriptor instance if that works for you:
+
+    indexer = Solrizer::Descriptor.new(:integer, :indexed, :stored)
+    t.main_title(:index_as=>[indexer],:path=>"title", :label=>"title") { ... }
+
+### Extractor and Extractor Mixins
+
+Solrizer::Extractor provides utilities for extracting solr fields from objects or inserting solr fields into documents:
+
+    > extractor = Solrizer::Extractor.new
+    > solr_doc = Hash.new
+    > extractor.format_node_value(["foo ","\n bar"])
+    => "foo bar"
+    > extractor.insert_solr_field_value(solr_doc, "foo","bar")
+    => {"foo"=>"bar"}
+    > extractor.insert_solr_field_value(solr_doc,"foo","baz")
+    => {"foo"=>["bar", "baz"]}
+    > extractor.insert_solr_field_value(solr_doc, "boo","hoo")
+    => {"foo"=>["bar", "baz"], "boo"=>"hoo"}
+
+#### Solrizer provides some default mixins:
+
+`Solrizer::HTML::Extractor` provides html_to_solr method and `Solrizer::XML::Extractor` provides xml_to_solr method:
+
+    > Solrizer::XML::Extractor
+    > extractor = Solrizer::Extractor.new
+    > xml = "<fields><foo>bar</foo><bar>baz</bar></fields>"
+    > extractor.xml_to_solr(xml)
+    => {:foo_tesim=>"bar", :bar_tesim=>"baz"}
+
+#### Solrizer::XML::TerminologyBasedSolrizer
+
+Another powerful mixin for use with classes that include the `OM::XML::Document` module is
+`Solrizer::XML::TerminologyBasedSolrizer`. The methods provided by this module offer a robust way of mapping
+terms and solr fields via om terminologies. A notable example can be found in `ActiveFedora::NokogiriDatastream`.
+
+## JMS Listener for Hydra Rails Applications
+
+### The executables: solrizer and solrizerd
+
+The solrizer gem provides two executables:
+
+* solrizer is a stomp consumer which listens for fedora.apim.updates and solrizes (or de-solrizes) objects accordingly.
+* solrizerd is a wrapper script that spawns a daemonized version of solrizer and handles start|stop|restart|status requests.
+
+### Usage
+
+The usage for solrizerd is as follows:
+
+    solrizerd command --hydra_home PATH [options]
+
+The commands are as follows:
+* start start an instance of the application
+* stop stop all instances of the application
+* restart stop all instances and restart them afterwards
+* status show status (PID) of application instances
+
+Required parameters:
+
+--hydra_home: this is the path to your hydra rails application's root directory. Solrizerd needs this in order to load all your models and corresponding terminologies.
+
+The options:
+* -p, --port Stomp port 61613
+* -o, --host Host to connect to localhost
+* -u, --user User name for stomp listener
+* -w, --password Password for stomp listener
+* -d, --destination Topic to listen to (default: /topic/fedora.apim.update)
+* -h, --help Display this screen
+
+Note:
+
+Since the solrizer script must fire up your hydra rails application, it must have all the gems installed that your hydra instance needs.
+
+## Note on Patches/Pull Requests
+
+* Fork the project.
+* Make your feature addition or bug fix.
+* Add tests for it. This is important so I don't break it in a
+  future version unintentionally.
+* Commit, do not mess with rake file, version, or history.
+  (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
+* Send me a pull request. Bonus points for topic branches.
+
+## Acknowledgments
+
+### Technical Lead
+
+Matt Zumwalt ([MediaShelf](http://yourmediashelf.com))
+
+### Thanks to
+
+* Douglas Kim, who created the initial code base for Solrizer.
+* Chris Fitzpatrick, who patiently ran the first prototype through its paces for weeks.
+* Bess Sadler, who created the JMS integration for Solrizer, generously served as a sounding board for numerous design issues around solr indexing, and pushes the technology forward with the skill of a true engineer.
+
+## Copyright
+
+Copyright (c) 2010 Matt Zumwalt. See LICENSE for details.
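The field names in the README examples (`foo_teim`, `title_tesim`, `pub_date_si`, and so on) end in a type code followed by optional `s` (stored), `i` (indexed), and `m` (multivalued) letters; the codes themselves appear in the `suffix.rb` config in this release. As a rough illustration of reading such a name back, here is a small decoder. `decode_suffix` and `TYPE_CODES` are hypothetical names invented for this sketch and are not part of Solrizer.

```ruby
# Type codes copied from the Suffix config in this release. The 'dt' code is
# shared by :date and :time, and 's' by :string and :symbol; the sketch
# reports only the first of each pair.
TYPE_CODES = {
  'te' => :text_en, 'dt' => :date, 't' => :text,
  's' => :string, 'i' => :integer, 'b' => :boolean
}.freeze

# Hypothetical helper (not part of Solrizer): split a solr field name such as
# "title_tesim" into its base name, data type, and stored/indexed/multivalued flags.
def decode_suffix(solr_name)
  match = solr_name.match(/\A(.+)_(te|dt|t|s|i|b)(s?)(i?)(m?)\z/)
  return nil unless match

  {
    name: match[1],
    type: TYPE_CODES[match[2]],
    stored: !match[3].empty?,
    indexed: !match[4].empty?,
    multivalued: !match[5].empty?
  }
end

p decode_suffix("title_tesim")
p decode_suffix("pub_date_si")
```

The regular expression tries the two-letter type codes before the one-letter ones, so `tesim` is read as `te` plus flags rather than `t` plus an invalid remainder.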
data/bin/solrizer
CHANGED
@@ -90,8 +90,7 @@ begin
     puts @msg.headers.inspect
     puts "\nPID: #{@msg.headers["pid"]}\n"
     if ["addDatastream", "addRelationship","ingest","modifyDatastreamByValue","modifyDatastreamByReference","modifyObject","purgeDatastream","purgeRelationship"].include? method
-
-      solrizer.solrize @msg.headers["pid"]
+      ActiveFedora::Base.find(@msg.headers["pid"], cast: true).update_index
     elsif method == "purgeObject"
       ActiveFedora::SolrService.instance.conn.delete_by_id(pid)
     else
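The listener in this hunk branches on the Fedora API-M method name carried by each message. The branching can be sketched as a pure function, which makes the three cases easier to see. `action_for`, `REINDEX_METHODS`, and the returned symbols are hypothetical names for this sketch; the real script calls ActiveFedora directly, and its `else` branch is not shown in this diff, so `:unhandled` is only a placeholder.

```ruby
# Method names copied from the hunk above.
REINDEX_METHODS = %w[
  addDatastream addRelationship ingest modifyDatastreamByValue
  modifyDatastreamByReference modifyObject purgeDatastream purgeRelationship
].freeze

# Hypothetical helper: decide what the listener should do for a message.
def action_for(method)
  if REINDEX_METHODS.include?(method)
    :update_index      # script: ActiveFedora::Base.find(pid, cast: true).update_index
  elsif method == "purgeObject"
    :delete_from_solr  # script: ActiveFedora::SolrService.instance.conn.delete_by_id(pid)
  else
    :unhandled         # placeholder; the real else branch is not shown in this diff
  end
end

puts action_for("ingest")
puts action_for("purgeObject")
```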
data/lib/solrizer.rb
CHANGED
@@ -31,7 +31,7 @@ module Solrizer
 
   # @params [Hash] doc the hash to insert the value into
   # @params [String] name the name of the field (without the suffix)
-  # @params [String,Date] value the value to be inserted
+  # @params [String,Date,Array] value the value (or array of values) to be inserted
   # @params [Array,Hash] indexer_args the arguments that find the indexer
   # @returns [Hash] doc the document that was provided with the new field inserted
   def self.insert_field(doc, name, value, *indexer_args)
data/lib/solrizer/field_mapper.rb
CHANGED
@@ -181,7 +181,7 @@ module Solrizer
     # mapped names and values. The values in the hash are _arrays_, and may contain multiple values.
 
     def solr_names_and_values(field_name, field_value, index_types)
-      return {}
+      return {} if field_value.nil?
 
       # Determine the set of index types
       index_types = Array(index_types)
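The new guard in `solr_names_and_values` tests specifically for `nil`, which is what lets `false` reach the index (changelog entry #25 and the new "false Booleans" spec). The sketch below contrasts a nil check with a truthiness check. Both helper names are hypothetical, and the truthiness variant is shown only for comparison; the removed line is truncated in this diff, so nothing here claims to reproduce it.

```ruby
# Guard in the style of the new line: only nil is rejected.
def insertable_nil_check?(field_value)
  !field_value.nil?
end

# Truthiness-based guard, for comparison only: rejects false as well as nil.
def insertable_truthy_check?(field_value)
  field_value ? true : false
end

[true, false, nil, "", 0].each do |value|
  puts format("%-6s nil-check: %-5s truthy-check: %s",
              value.inspect, insertable_nil_check?(value), insertable_truthy_check?(value))
end
```

In Ruby only `nil` and `false` are falsy, so the two guards differ for exactly one value: `false`.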
data/lib/solrizer/suffix.rb
CHANGED
@@ -1,20 +1,26 @@
+require 'ostruct'
+
 module Solrizer
   class Suffix
 
-    def initialize(fields)
-      @fields = fields
+    def initialize(*fields)
+      @fields = fields.flatten
     end
 
     def multivalued?
-
+      has_field? :multivalued
     end
 
     def stored?
-
+      has_field? :stored
     end
 
     def indexed?
-
+      has_field? :indexed
+    end
+
+    def has_field? f
+      f.to_sym == :type or @fields.include? f.to_sym
     end
 
     def data_type
@@ -22,40 +28,52 @@ module Solrizer
     end
 
     def to_s
-
-      index_suffix = config[:index_suffix] if indexed?
-      multivalued_suffix = config[:multivalued_suffix] if multivalued?
+
       raise Solrizer::InvalidIndexDescriptor, "Missing datatype for #{@fields}" unless data_type
-      type_suffix = config[:type_suffix].call(data_type)
-      raise Solrizer::InvalidIndexDescriptor, "Invalid datatype `#{data_type.inspect}'. Must be one of: :date, :time, :text, :text_en, :string, :integer" unless type_suffix
 
-      [config
+      field_suffix = [config.suffix_delimiter]
+
+      config.fields.select { |f| has_field? f }.each do |f|
+        key = :"#{f}_suffix"
+        field_suffix << if config.send(key).is_a? Proc
+          config.send(key).call(@fields)
+        else
+          config.send(key)
+        end
+      end
+
+      field_suffix.join
     end
 
+    def self.config
+      @config ||= OpenStruct.new :fields => [:type, :stored, :indexed, :multivalued],
+        suffix_delimiter: '_',
+        type_suffix: (lambda do |fields|
+          type = fields.first
+          case type
+          when :string, :symbol # TODO `:symbol' usage ought to be deprecated
+            's'
+          when :text
+            't'
+          when :text_en
+            'te'
+          when :date, :time
+            'dt'
+          when :integer
+            'i'
+          when :boolean
+            'b'
+          else
+            raise Solrizer::InvalidIndexDescriptor, "Invalid datatype `#{type.inspect}'. Must be one of: :date, :time, :text, :text_en, :string, :symbol, :integer, :boolean"
+          end
+        end),
+        stored_suffix: 's',
+        indexed_suffix: 'i',
+        multivalued_suffix: 'm'
+    end
 
-    private
     def config
-      @config ||=
-        {suffix_delimiter: '_',
-         type_suffix: lambda do |type|
-           case type
-           when :string, :symbol # TODO `:symbol' usage ought to be deprecated
-             's'
-           when :text
-             't'
-           when :text_en
-             'te'
-           when :date, :time
-             'dt'
-           when :integer
-             'i'
-           when :boolean
-             'b'
-           end
-         end,
-         stored_suffix: 's',
-         index_suffix: 'i',
-         multivalued_suffix: 'm'}
+      @config ||= self.class.config.dup
     end
   end
 end
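The rewritten `to_s` walks `config.fields` in order and appends one suffix piece per field the instance has, calling the piece if it is a Proc. A standalone sketch of that assembly, runnable without the gem, might look like this. `SketchSuffix` is a hypothetical stand-in written for illustration, not the gem's `Solrizer::Suffix`, and its `type_suffix` lambda covers only a subset of the type codes.

```ruby
require 'ostruct'

# Standalone sketch of the suffix assembly shown in the diff; SketchSuffix is
# a hypothetical stand-in, not the gem's Solrizer::Suffix class.
class SketchSuffix
  def self.config
    @config ||= OpenStruct.new(
      fields: [:type, :stored, :indexed, :multivalued],
      suffix_delimiter: '_',
      # Only a subset of the gem's type codes, to keep the sketch short.
      type_suffix: lambda { |fields| { string: 's', integer: 'i', date: 'dt' }.fetch(fields.first) },
      stored_suffix: 's',
      indexed_suffix: 'i',
      multivalued_suffix: 'm'
    )
  end

  def initialize(*fields)
    @fields = fields.flatten
  end

  # Each instance works on its own copy of the class-level config.
  def config
    @config ||= self.class.config.dup
  end

  # :type is always considered present; the other fields must be passed in.
  def has_field?(field)
    field.to_sym == :type || @fields.include?(field.to_sym)
  end

  def to_s
    parts = [config.suffix_delimiter]
    config.fields.select { |f| has_field?(f) }.each do |f|
      value = config.send(:"#{f}_suffix")
      parts << (value.is_a?(Proc) ? value.call(@fields) : value)
    end
    parts.join
  end
end

puts SketchSuffix.new(:string, :stored, :indexed).to_s
puts SketchSuffix.new(:integer, :stored, :multivalued).to_s
```

Because the order of `config.fields` drives the order of the pieces, a downstream project can add a field name and a matching `<name>_suffix` entry to the config without touching `to_s`.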
data/lib/solrizer/version.rb
CHANGED
data/solrizer.gemspec
CHANGED
data/spec/units/solrizer_spec.rb
CHANGED
@@ -25,15 +25,24 @@ describe Solrizer do
       Solrizer.insert_field(doc, 'foo', Time.parse('2013-01-13T22:45:56+06:00'))
       doc.should == {'foo_dtsim' => ["2013-01-13T16:45:56Z"]}
     end
-    it "should insert Booleans" do
+    it "should insert true Booleans" do
       Solrizer.insert_field(doc, 'foo', true)
       doc.should == {'foo_bsi' => true}
     end
+    it "should insert false Booleans" do
+      Solrizer.insert_field(doc, 'foo', false)
+      doc.should == {'foo_bsi' => false}
+    end
 
     it "should insert multiple values" do
       Solrizer.insert_field(doc, 'foo', ['A name', 'B name'], :sortable, :facetable)
       doc.should == {'foo_si' => 'B name', 'foo_sim' => ['A name', 'B name']}
     end
+
+    it 'should insert nothing when passed a nil value' do
+      Solrizer.insert_field(doc, 'foo', nil, :sortable, :facetable)
+      doc.should == {}
+    end
   end
 
   describe "on a document with values" do
data/spec/units/suffix_spec.rb
ADDED
@@ -0,0 +1,80 @@
+require 'spec_helper'
+
+describe Solrizer::Suffix do
+
+  describe "#multivalued?" do
+    it "should be multivalued if :multivalued is among the field types" do
+      expect(Solrizer::Suffix.new(:multivalued)).to be_multivalued
+    end
+
+    it "should not be multivalued if :multivalued was not passed in a field type" do
+      expect(Solrizer::Suffix.new(:some_other_field_type)).to_not be_multivalued
+    end
+  end
+
+  describe "#stored?" do
+    it "should be stored if :stored is among the field types" do
+      expect(Solrizer::Suffix.new(:stored)).to be_stored
+    end
+
+    it "should not be stored if :stored was not passed in a field type" do
+      expect(Solrizer::Suffix.new(:some_other_field_type)).to_not be_stored
+    end
+  end
+
+  describe "#indexed?" do
+    it "should be indexed if :indexed is among the field types" do
+      expect(Solrizer::Suffix.new(:indexed)).to be_indexed
+    end
+
+    it "should not be indexed if :indexed was not passed in a field type" do
+      expect(Solrizer::Suffix.new(:some_other_field_type)).to_not be_indexed
+    end
+  end
+  describe "#has_field?" do
+    subject do
+      Solrizer::Suffix.new(:type, :a, :b, :c)
+    end
+    it "should be able to tell if a field is in the suffix or not" do
+      expect(subject).to have_field :a
+      expect(subject).to have_field :b
+      expect(subject).to have_field :c
+      expect(subject).to_not have_field :d
+    end
+  end
+
+  describe "#data_type" do
+    it "should always be the first argument to the suffix" do
+      expect(Solrizer::Suffix.new(:some_type, :a).data_type).to eq :some_type
+    end
+  end
+
+  describe "#to_s" do
+    it "should combine the fields into a suffix string" do
+      expect(Solrizer::Suffix.new(:string, :stored, :indexed).to_s).to eq '_ssi'
+      expect(Solrizer::Suffix.new(:integer, :stored, :multivalued).to_s).to eq '_ism'
+    end
+  end
+
+  describe "config" do
+    subject do
+      Solrizer::Suffix.new(:my_custom_type, :a, :b, :c)
+    end
+
+    it "should let you mess with the suffix config" do
+      subject.config.fields += [:b]
+      subject.config.suffix_delimiter = "#"
+      subject.config.type_suffix = lambda do |fields|
+        type = fields.first
+
+        if type == :my_custom_type
+          "custom_suffix_"
+        else
+          "nope"
+        end
+      end
+      subject.config.b_suffix = 'now_with_more_b'
+      expect(subject.to_s).to eq "#custom_suffix_now_with_more_b"
+    end
+  end
+end
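The "config" spec changes `subject.config` on a single instance, which relies on `Suffix#config` returning a `dup` of the class-level `OpenStruct`. The sketch below uses plain `OpenStruct` objects (no Solrizer code; the variable names are illustrative) to show what that copy does and does not isolate: reassigning an attribute on the copy is local, while mutating a shared array in place is not.

```ruby
require 'ostruct'

shared = OpenStruct.new(fields: [:type, :stored], suffix_delimiter: '_')

# Reassignment on the copy (as the spec does with `+=`) leaves the original alone.
copy = shared.dup
copy.fields += [:b]
copy.suffix_delimiter = '#'
p shared.fields            # original array is unchanged
p shared.suffix_delimiter  # original delimiter is unchanged

# In-place mutation reaches the array that both objects still share.
other = shared.dup
other.fields << :c
p shared.fields            # now includes :c
```

This is why the spec's `fields += [:b]` form is the safer pattern for per-instance overrides: `dup` copies the attribute table but not the objects stored in it.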
metadata
CHANGED
@@ -1,153 +1,153 @@
 --- !ruby/object:Gem::Specification
 name: solrizer
 version: !ruby/object:Gem::Version
-  version: 3.
+  version: 3.2.0
 platform: ruby
 authors:
 - Matt Zumwalt
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2014-05-28 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: xml-simple
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: mediashelf-loggable
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - ~>
+    - - "~>"
       - !ruby/object:Gem::Version
         version: 0.4.7
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - ~>
+    - - "~>"
       - !ruby/object:Gem::Version
         version: 0.4.7
 - !ruby/object:Gem::Dependency
   name: stomp
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: daemons
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: activesupport
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: rspec
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: rake
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: yard
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: RedCloth
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 description: Use solrizer to populate solr indexes. You can run solrizer from within
@@ -159,14 +159,15 @@ executables:
 extensions: []
 extra_rdoc_files:
 - LICENSE
-- README.
+- README.md
 files:
-- .gitignore
-- .travis.yml
+- ".gitignore"
+- ".travis.yml"
+- CONTRIBUTING.md
 - Gemfile
 - History.txt
 - LICENSE
-- README.
+- README.md
 - Rakefile
 - bin/solrizer
 - bin/solrizerd
@@ -193,6 +194,7 @@ files:
 - spec/units/extractor_spec.rb
 - spec/units/field_mapper_spec.rb
 - spec/units/solrizer_spec.rb
+- spec/units/suffix_spec.rb
 - spec/units/xml_extractor_spec.rb
 homepage: http://github.com/projecthydra/solrizer
 licenses: []
@@ -203,17 +205,17 @@ require_paths:
 - lib
 required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
-  - -
+  - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 required_rubygems_version: !ruby/object:Gem::Requirement
|
210
212
|
requirements:
|
211
|
-
- -
|
213
|
+
- - ">="
|
212
214
|
- !ruby/object:Gem::Version
|
213
215
|
version: '0'
|
214
216
|
requirements: []
|
215
217
|
rubyforge_project:
|
216
|
-
rubygems_version: 2.
|
218
|
+
rubygems_version: 2.2.2
|
217
219
|
signing_key:
|
218
220
|
specification_version: 4
|
219
221
|
summary: A utility for building solr indexes, usually from Fedora repository content
|
@@ -225,5 +227,6 @@ test_files:
|
|
225
227
|
- spec/units/extractor_spec.rb
|
226
228
|
- spec/units/field_mapper_spec.rb
|
227
229
|
- spec/units/solrizer_spec.rb
|
230
|
+
- spec/units/suffix_spec.rb
|
228
231
|
- spec/units/xml_extractor_spec.rb
|
229
232
|
has_rdoc:
|
data/README.textile
DELETED
@@ -1,249 +0,0 @@
|
|
1
|
-
h1. solrizer
|
2
|
-
|
3
|
-
A lightweight, configurable tool for indexing metadata into solr. Can be triggered from within your application, from the command line, or as a JMS listener.
|
4
|
-
|
5
|
-
Solrizer provides the baseline and structures for the process of solrizing. In order to actually read objects from a
|
6
|
-
datasource and write solr documents into a solr instance, you need to use an implementation specific gem, such as
|
7
|
-
"solrizer-fedora":https://github.com/projecthydra/solrizer-fedora, which provides the mechanics for reading from a fedora repository and writing to a solr instance.
|
8
|
-
|
9
|
-
|
10
|
-
h2. Installation
|
11
|
-
|
12
|
-
The gem is hosted on rubygems.org. The best way to manage the gems for your project is to use bundler. Create a Gemfile in the root of your application and include the following:
|
13
|
-
|
14
|
-
<pre>
|
15
|
-
source "http://rubygems.org"
|
16
|
-
|
17
|
-
gem 'solrizer'
|
18
|
-
</pre>
|
19
|
-
|
20
|
-
Then:
|
21
|
-
|
22
|
-
<pre>bundle install</pre>
|
23
|
-
|
24
|
-
h2. Usage
|
25
|
-
|
26
|
-
h3. Fire up the console:
|
27
|
-
|
28
|
-
The code snippets in the following sections can be cut/pasted into your console, giving you the opportunity to play with Solrizer.
|
29
|
-
|
30
|
-
Start up a console and load solrizer:
|
31
|
-
|
32
|
-
<pre>
|
33
|
-
irb
|
34
|
-
require "rubygems"
|
35
|
-
require "solrizer"
|
36
|
-
</pre>
|
37
|
-
|
38
|
-
|
39
|
-
h3. Field Mapper
|
40
|
-
|
41
|
-
The FieldMapper maps term names and values to Solr fields, based on the term’s data type and any index_as options. Solrizer comes with default mappings to dynamic field types defined in the Hydra Solr schema.xml file. A copy of that file is available at:
|
42
|
-
https://github.com/projecthydra/hydra-head/blob/master/hydra-core/lib/generators/hydra/templates/solr_conf/conf/schema.xml
|
43
|
-
|
44
|
-
More information on the conventions followed for the dynamic solr fields is here:
|
45
|
-
https://github.com/projecthydra/hydra-head/wiki/Solr-Schema
|
46
|
-
|
47
|
-
<pre>
|
48
|
-
default_mapper = Solrizer::FieldMapper::Default.new
|
49
|
-
|
50
|
-
# some of the default mappings in solrizer
|
51
|
-
default_mapper.solr_name("foo",:string,:searchable) # returns foo_tesim
|
52
|
-
default_mapper.solr_name("foo",:date,:searchable) # returns foo_dtsim
|
53
|
-
default_mapper.solr_name("foo",:integer,:searchable) # returns foo_isim
|
54
|
-
default_mapper.solr_name("foo",:string,:facetable) # returns foo_sim
|
55
|
-
default_mapper.solr_name("foo",:integer,:facetable) # returns foo_iim
|
56
|
-
default_mapper.solr_name("foo",:string,:sortable) # returns foo_si
|
57
|
-
default_mapper.solr_name("foo",:string,:displayable) # returns foo_ssm
|
58
|
-
</pre>
|
59
|
-
|
60
|
-
## Using default indexing strategies
|
61
|
-
|
62
|
-
<pre>
|
63
|
-
solr_doc = {}
|
64
|
-
Solrizer.insert_field(solr_doc, 'title', 'whatever', :searchable)
|
65
|
-
=> {"title_tesim"=>["whatever"]}
|
66
|
-
|
67
|
-
Solrizer.insert_field(solr_doc, 'pub_date', 'Nov 2012', :sortable, :displayable)
|
68
|
-
=> {"title_tesim"=>["whatever"], "pub_date_ssi"=>["Nov 2012"], "pub_date_ssm"=>["Nov 2012"]}
|
69
|
-
</pre>
|
70
|
-
|
71
|
-
#### You can also index dates
|
72
|
-
<pre>
|
73
|
-
# as a date
|
74
|
-
solr_doc = {}
|
75
|
-
Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :searchable)
|
76
|
-
=> {"pub_date_dtsi"=>["2012-11-07T00:00:00Z"]}
|
77
|
-
|
78
|
-
# or as a string
|
79
|
-
solr_doc = {}
|
80
|
-
Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :sortable, :displayable)
|
81
|
-
=> {"pub_date_ssi"=>["2012-11-07"], "pub_date_ssm"=>["2012-11-07"]}
|
82
|
-
|
83
|
-
# or a string that is stored as a date
|
84
|
-
solr_doc = {}
|
85
|
-
Solrizer.insert_field(solr_doc, 'pub_date', 'Jan 29th 2013', :dateable)
|
86
|
-
=> {"pub_date_dtsi"=>["2013-01-29T00:00:00Z"]}
|
87
|
-
</pre>
|
88
|
-
|
89
|
-
|
90
|
-
## Using a custom indexing strategy
|
91
|
-
All you have to do is create your own index descriptor:
|
92
|
-
<pre>
|
93
|
-
solr_doc = {}
|
94
|
-
displearchable = Solrizer::Descriptor.new(:integer, :indexed, :stored)
|
95
|
-
Solrizer.insert_field(solr_doc, 'some_count', 45, displearchable)
|
96
|
-
{"some_count_isi"=>["45"]}
|
97
|
-
</pre>
|
98
|
-
|
99
|
-
## Changing the behavior of a default descriptor
|
100
|
-
|
101
|
-
Simply override the methods within Solrizer::DefaultDescriptors
|
102
|
-
<pre>
|
103
|
-
# before
|
104
|
-
solr_doc = {}
|
105
|
-
Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
|
106
|
-
=> {"title_sim"=>["foobar"]}
|
107
|
-
|
108
|
-
# redefine facetable:
|
109
|
-
module Solrizer
|
110
|
-
module DefaultDescriptors
|
111
|
-
def self.facetable
|
112
|
-
Descriptor.new(:string, :indexed, :stored)
|
113
|
-
end
|
114
|
-
end
|
115
|
-
end
|
116
|
-
|
117
|
-
# after
|
118
|
-
solr_doc = {}
|
119
|
-
Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
|
120
|
-
=> {"title_ssi"=>["foobar"]}
|
121
|
-
</pre>
|
122
|
-
|
123
|
-
|
124
|
-
## Creating your own Indexers
|
125
|
-
<pre>
|
126
|
-
module MyMappers
|
127
|
-
def self.mapper_one
|
128
|
-
Solrizer::Descriptor.new(:string, :indexed, :stored)
|
129
|
-
end
|
130
|
-
end
|
131
|
-
|
132
|
-
solr_doc = {}
|
133
|
-
|
134
|
-
Solrizer::FieldMapper.descriptors = [MyMappers]
|
135
|
-
=> [MyMappers]
|
136
|
-
|
137
|
-
Solrizer.insert_field(solr_doc, 'title', 'foobar', :mapper_one)
|
138
|
-
=> {"title_ssi"=>["foobar"]}
|
139
|
-
</pre>
|
140
|
-
|
141
|
-
## Using OM
|
142
|
-
Same as it ever was:
|
143
|
-
<pre>
|
144
|
-
t.main_title(:index_as=>[:facetable],:path=>"title", :label=>"title") { ... }
|
145
|
-
</pre>
|
146
|
-
|
147
|
-
But now you may also pass a Descriptor instance if that works for you:
|
148
|
-
<pre>
|
149
|
-
indexer = Solrizer::Descriptor.new(:integer, :indexed, :stored)
|
150
|
-
t.main_title(:index_as=>[indexer],:path=>"title", :label=>"title") { ... }
|
151
|
-
|
152
|
-
</pre>
|
153
|
-
|
154
|
-
h3. Extractor and Extractor Mixins
|
155
|
-
|
156
|
-
Solrizer::Extractor provides utilities for extracting solr fields from objects or inserting solr fields into documents:
|
157
|
-
|
158
|
-
<pre>
|
159
|
-
extractor = Solrizer::Extractor.new
|
160
|
-
|
161
|
-
extractor.format_node_value(["foo ","\n bar"]) # returns "foo bar"
|
162
|
-
|
163
|
-
solr_doc = Hash.new
|
164
|
-
extractor.insert_solr_field_value(solr_doc, "foo","bar") # solr_doc is now {"foo" => ["bar"]}
|
165
|
-
extractor.insert_solr_field_value(solr_doc,"foo","baz") # solr_doc is now {"foo" => ["bar","baz"]}
|
166
|
-
extractor.insert_solr_field_value(solr_doc, "boo","hoo") # solr_doc is now {"foo" => ["bar","baz"], "boo" => ["hoo"]}
|
167
|
-
</pre>
|
168
|
-
|
169
|
-
h4. Solrizer provides some default mixins:
|
170
|
-
|
171
|
-
* Solrizer::HTML::Extractor -=> provides html_to_solr method
|
172
|
-
* Solrizer::XML::Extractor -=> provides xml_to_solr method
|
173
|
-
|
174
|
-
<pre>
|
175
|
-
xml = "<fields><foo>bar</foo><bar>baz</bar></fields>"
|
176
|
-
|
177
|
-
extractor.xml_to_solr(xml) # returns {:foo_tesim=>"bar", :bar_tesim=>"baz"}
|
178
|
-
</pre>
|
179
|
-
|
180
|
-
h4. Solrizer::XML::TerminologyBasedSolrizer
|
181
|
-
|
182
|
-
Another powerful mixin for use with classes that include the OM::XML::Document module is Solrizer::XML::TerminologyBasedSolrizer.
|
183
|
-
The methods provided by this module offer a robust way of mapping terms to solr fields via OM terminologies. A notable example
|
184
|
-
can be found in ActiveFedora::NokogiriDatastream.
|
185
|
-
|
186
|
-
|
187
|
-
h2. JMS Listener for Hydra Rails Applications
|
188
|
-
|
189
|
-
h3. The executables: solrizer and solrizerd
|
190
|
-
|
191
|
-
The solrizer gem provides two executables:
|
192
|
-
|
193
|
-
* solrizer is a stomp consumer which listens for fedora.apim.updates and solrizes (or de-solrizes) objects accordingly.
|
194
|
-
* solrizerd is a wrapper script that spawns a daemonized version of solrizer and handles start|stop|restart|status requests.
|
195
|
-
|
196
|
-
h3. Usage
|
197
|
-
|
198
|
-
The usage for solrizerd is as follows:
|
199
|
-
|
200
|
-
<pre>
|
201
|
-
solrizerd command --hydra_home PATH [options]
|
202
|
-
</pre>
|
203
|
-
|
204
|
-
The commands are as follows:
|
205
|
-
* start start an instance of the application
|
206
|
-
* stop stop all instances of the application
|
207
|
-
* restart stop all instances and restart them afterwards
|
208
|
-
* status show status (PID) of application instances
|
209
|
-
|
210
|
-
Required parameters:
|
211
|
-
|
212
|
-
--hydra_home: this is the path to your hydra rails application's root directory. Solrizerd needs this in order to load all your models and corresponding terminologies.
|
213
|
-
|
214
|
-
The options:
|
215
|
-
* -p, --port Stomp port 61613
|
216
|
-
* -o, --host Host to connect to localhost
|
217
|
-
* -u, --user User name for stomp listener
|
218
|
-
* -w, --password Password for stomp listener
|
219
|
-
* -d, --destination Topic to listen to (default: /topic/fedora.apim.update)
|
220
|
-
* -h, --help Display this screen
|
221
|
-
|
222
|
-
Note:
|
223
|
-
|
224
|
-
Since the solrizer script must fire up your hydra rails application, it must have all the gems installed that your hydra instance needs.
|
225
|
-
|
226
|
-
|
227
|
-
h2. Note on Patches/Pull Requests
|
228
|
-
|
229
|
-
* Fork the project.
|
230
|
-
* Make your feature addition or bug fix.
|
231
|
-
* Add tests for it. This is important so I don't break it in a
|
232
|
-
future version unintentionally.
|
233
|
-
* Commit, do not mess with rake file, version, or history.
|
234
|
-
(if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
|
235
|
-
* Send me a pull request. Bonus points for topic branches.
|
236
|
-
|
237
|
-
h2. Acknowledgements
|
238
|
-
|
239
|
-
Technical Lead: Matt Zumwalt ("MediaShelf":http://yourmediashelf.com)
|
240
|
-
|
241
|
-
Thanks to
|
242
|
-
|
243
|
-
Douglas Kim, who created the initial code base for Solrizer.
|
244
|
-
Chris Fitzpatrick, who patiently ran the first prototype through its paces for weeks.
|
245
|
-
Bess Sadler, who created the JMS integration for Solrizer, generously served as a sounding board for numerous design issues around solr indexing, and pushes the technology forward with the skill of a true engineer.
|
246
|
-
|
247
|
-
h2. Copyright
|
248
|
-
|
249
|
-
Copyright (c) 2010 Matt Zumwalt. See LICENSE for details.
|