lucid_works 0.2.0 → 0.3.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/README.rdoc ADDED
@@ -0,0 +1,285 @@
1
+ == LucidWorks-Ruby
2
+
3
+ Ruby bindings for the REST API of the LucidWorks family of search products.
4
+
5
+ The LucidWorks family of products are search engines that combine the open source search technologies Lucene and Solr with open source crawlers, a management UI and a REST API. The LucidWorks REST API provides a programmatic way to manage collections, data-sources, scheduling and many of the other objects and tasks involved in running a search engine.
6
+
7
+ == Information
8
+
9
+ You can view the LucidWorks-Ruby documentation in RDoc format here:
10
+
11
+ http://rubydoc.info/github/lucidimagination/lucidworks-ruby/master/frames
12
+
13
+ The LucidWorks REST API is documented here:
14
+
15
+ http://lucidworks.lucidimagination.com/display/LWEUG/Rest+API
16
+
17
+ === Bug reports
18
+
19
+ Where should people file bugs?
20
+ GitHub? That implies we have open sourced this already.
21
+ An email address at Lucid?
22
+
23
+ == Installation
24
+
25
+ Install the gem:
26
+
27
+ gem install lucid_works
28
+
29
+ Or add it to your Gemfile, then run bundle install:
30
+
31
+ gem "lucid_works"
32
+
33
+ == Show Me the Money
34
+
35
+ This single statement (note the trailing periods, which continue it across lines) connects to a LucidWorks server running on the local machine, creates a collection called "News" and a data-source called "cnn" for the cnn.com website, then starts a crawl. Cut and paste it into irb:
36
+
37
+ require 'lucid_works'
38
+
39
+ LucidWorks::Server.new("http://localhost:8888").
40
+ create_collection(:name => 'News').
41
+ create_datasource(:name => 'cnn',
42
+ :crawler => 'lucid.aperture', :type => 'web',
43
+ :url => 'http://cnn.com', :crawl_depth => '1').
44
+ build_schedule(:start_time => 0, :period => 0, :type => 'index', :active => true).
45
+ save
46
+
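The chain works because a Ruby line ending in `.` continues onto the next line, and each `create_*` call returns the newly created child, so the next call is made on that child. `build_*` returns an unsaved child, which the final `.save` persists. The sketch below mimics only those return values with a hypothetical stand-in class, `FakeNode`. It does not talk to a LucidWorks server.

```ruby
# Hypothetical stand-in for the server/collection/datasource objects.
# It only mimics what each call returns; nothing is sent over HTTP.
class FakeNode
  attr_reader :kind, :attrs, :parent

  def initialize(kind, attrs = {}, parent = nil)
    @kind = kind
    @attrs = attrs
    @parent = parent
    @saved = false
  end

  def saved?
    @saved
  end

  # Like an ORM #save: marks the node saved and returns true.
  def save
    @saved = true
  end

  # create_* builds a child, saves it, and returns the child itself,
  # so the next call in the chain is made on that child.
  def create_collection(attrs)
    FakeNode.new(:collection, attrs, self).tap(&:save)
  end

  def create_datasource(attrs)
    FakeNode.new(:datasource, attrs, self).tap(&:save)
  end

  # build_* returns an unsaved child; the caller saves it explicitly.
  def build_schedule(attrs)
    FakeNode.new(:schedule, attrs, self)
  end
end

# The trailing periods continue the expression onto the next line.
schedule = FakeNode.new(:server, :url => "http://localhost:8888").
  create_collection(:name => 'News').
  create_datasource(:name => 'cnn', :type => 'web').
  build_schedule(:type => 'index', :active => true)
schedule.save # the README's chain ends here, so its value is save's result
```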
47
+ Here is how it works.
48
+
49
+
50
+ == Object Model
51
+
52
+ The LucidWorks object model looks something like this:
53
+
54
+ Server -+- Collection -+- Datasource -+- Status
55
+ | | +- History
56
+ | | +- Schedule
57
+ | | +- Index
58
+ | | +- Crawldata
59
+ | +- Field
60
+ | +- Index
61
+ | +- Info
62
+ | +- Settings
63
+ |
64
+ +- Logs -+- Index -+- Summary
65
+ | +- Query -+- Summary
66
+ |
67
+ +- Crawlers
68
+
69
+ This is what has been modeled so far. The actual REST API is more extensive.
70
+
71
+ == Usage
72
+
73
+ === Server
74
+
75
+ The starting point for communicating with a LucidWorks server is a LucidWorks::Server object. For example, for a LucidWorks server running on the local machine on the standard port:
76
+
77
+ @server = LucidWorks::Server.new("http://localhost:8888")
78
+
79
+ === Collections
80
+
81
+ Collections are modeled using the LucidWorks::Collection class. Because LucidWorks::Server has_many :collections, a server provides the usual association methods.
82
+
83
+ To retrieve collections:
84
+
85
+ @server.collections -> an array of LucidWorks::Collection
86
+
87
+ puts @server.collections.map(&:name)
88
+
89
+ @server.collection("name") -> a single LucidWorks::Collection
90
+
91
+ Create a collection:
92
+
93
+ collection = @server.build_collection(:name => "MY_STUFF")
94
+ collection.save
95
+
96
+ or
97
+
98
+ collection = @server.create_collection(:name => "MY_STUFF")
99
+
100
+ Delete a collection:
101
+
102
+ collection.destroy
103
+
104
+ Wipe all indexed data from a collection:
105
+
106
+ collection.empty!
107
+
108
+ === Collection Info
109
+
110
+ The LucidWorks::Collection::Info class contains detailed data about the state of a collection.
111
+
112
+ info = @server.collection('coll1').info -> a LucidWorks::Collection::Info
113
+
114
+ info.index_num_docs -> 12345
115
+ info.index_size -> "44.3 MB"
116
+
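Models such as Collection::Info expose the fields of the server's response as reader methods, which is why `info.index_num_docs` works. The following is a minimal sketch of that idea using only the standard library. The class name `InfoSketch` and the sample JSON payload are hypothetical. The real models are built on ActiveModel rather than on this code.

```ruby
require 'json'

# Hypothetical stand-in: turns a JSON object into an object with one
# reader method per top-level key, roughly how info.index_num_docs reads.
class InfoSketch
  def initialize(json)
    @attributes = JSON.parse(json)
    @attributes.each_key do |key|
      # Define a reader on this instance only; the block runs with
      # self set to the instance, so it can see @attributes.
      define_singleton_method(key) { @attributes[key] }
    end
  end

  def attribute_names
    @attributes.keys
  end
end

# Sample payload using the field names and values shown above.
payload = '{"index_num_docs": 12345, "index_size": "44.3 MB"}'
info = InfoSketch.new(payload)
info.index_num_docs # => 12345
info.index_size     # => "44.3 MB"
```

A fuller version would need to guard against keys that clash with existing methods (for example a field named `class`).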
117
+ === Collection Settings
118
+
119
+ The Collection::Settings class contains indexing and querying settings for the collection.
120
+
121
+ settings = @server.collection('collection1').settings -> a LucidWorks::Collection::Settings
122
+
123
+ settings.query_parser -> "lucid"
124
+ settings.synonym_list -> ["Lawyer", "Attorney", "one", "1", ...]
125
+
126
+ === Field
127
+
128
+ Collection has_many :fields. The Field class models data about a collection's field.
129
+
130
+ field = @server.collection('collection1').field('body') -> a LucidWorks::Field
131
+
132
+ field.field_type -> "text_en"
133
+ field.facet -> false
134
+
135
+ === Datasources
136
+
137
+ Collection has_many :datasources. Datasources are modeled using the LucidWorks::Datasource class. They support all the standard ORM methods, e.g.
138
+
139
+ collection.datasources -> an array of LucidWorks::Datasource
140
+
141
+ collection.datasource(123) -> a single LucidWorks::Datasource
142
+
143
+ datasource = collection.create_datasource(
144
+ :crawler => LucidWorks::Datasource::CRAWLERS['web'],
145
+ :type => 'web',
146
+ :name => "example.com",
147
+ :url => "http://example.com/",
148
+ :crawl_depth => 1
149
+ )
150
+
151
+ Note that create_datasource does not start a crawl of the datasource.
152
+
153
+ To delete all the data crawled from a data-source:
154
+
155
+ datasource.empty!
156
+
157
+ === The ORM
158
+
159
+ This library implements a simple ORM (object-relational mapper) on top of the LucidWorks REST API that behaves somewhat like ActiveResource/ActiveRecord. To learn why we didn't just use ActiveResource, see the Rationale section.
160
+
161
+ === Base
162
+
163
+ LucidWorks::Base is the ORM foundation of this library. It supports many ActiveRecord-style methods. For example, given a Thing model:
164
+
165
+ class Thing < LucidWorks::Base
166
+ end
167
+
168
+ Then Thing will have the following class methods:
169
+
170
+ thing = Thing.new(:attr => value, ..., :parent => parent) -> unsaved Thing
171
+
172
+ Thing.create(:attr => value, ..., :parent => parent) -> saved Thing
173
+
174
+ Thing.find(:all, :parent => parent) -> Array of Thing
175
+
176
+ Thing.find(id, :parent => parent) -> a Thing
177
+
178
+ The 'parent' must be another LucidWorks::Base model or a LucidWorks::Server; it is only required when the class is used stand-alone. If the model is created or retrieved through an association, this value is set for you automatically. Instances also support:
179
+
180
+ thing.save -> true/false
181
+ thing.destroy
182
+
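The `:parent` option is what ties a model to a particular server and to a place in the object model. The sketch below illustrates this with an in-memory store instead of HTTP. Each model derives its resource path from its parent, so the same classes can be used against more than one server at once, which is one of the goals listed under Rationale. `FakeServer`, `SketchBase`, the `path` methods and the `/api/...` layout are assumptions for illustration. They are not the real LucidWorks::Base implementation or the real REST URLs.

```ruby
# Hypothetical, in-memory sketch of a parent-scoped ORM. The real library
# sends REST requests; here a Hash plays the part of the server.
class FakeServer
  attr_reader :url, :store

  def initialize(url)
    @url = url
    @store = Hash.new { |hash, path| hash[path] = {} } # path => { id => attrs }
  end

  def path
    "/api" # assumed API root, for illustration only
  end

  def server
    self
  end
end

class SketchBase
  attr_reader :attributes, :parent

  # Where instances live under a given parent, e.g. "/api/things".
  def self.collection_path(parent)
    "#{parent.path}/#{name.downcase}s"
  end

  def self.create(attrs)
    new(attrs).tap(&:save)
  end

  # find(:all, :parent => p) or find(id, :parent => p)
  def self.find(what, options)
    parent = options.fetch(:parent)
    rows = parent.server.store[collection_path(parent)]
    if what == :all
      rows.map { |id, attrs| new(attrs.merge(:id => id, :parent => parent)) }
    else
      new(rows.fetch(what).merge(:id => what, :parent => parent))
    end
  end

  def initialize(attrs = {})
    attrs = attrs.dup
    @parent = attrs.delete(:parent)
    @attributes = attrs
  end

  def id
    attributes[:id]
  end

  # Every model reaches its server through the parent chain.
  def server
    parent.server
  end

  # Nested resources build on their parent's path.
  def path
    "#{self.class.collection_path(parent)}/#{id}"
  end

  def save
    rows = server.store[self.class.collection_path(parent)]
    attributes[:id] ||= (rows.keys.max || 0) + 1
    rows[id] = attributes.reject { |key, _| key == :id }
    true
  end

  def destroy
    server.store[self.class.collection_path(parent)].delete(id)
  end
end

class Thing < SketchBase; end

local  = FakeServer.new("http://localhost:8888")
remote = FakeServer.new("http://search.example.com:8888") # hypothetical host

Thing.create(:name => "a", :parent => local)
Thing.create(:name => "b", :parent => local)
Thing.create(:name => "c", :parent => remote)

Thing.find(:all, :parent => local).map { |t| t.attributes[:name] } # => ["a", "b"]
Thing.find(1, :parent => remote).attributes[:name]                 # => "c"
Thing.find(2, :parent => local).path                               # => "/api/things/2"
```

Because nothing is stored at class level, the two servers never see each other's data.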
183
+ ==== Has_many associations
184
+
185
+ The has_many association is used to associate a resource with another collection resource. Given:
186
+
187
+ class Thing < LucidWorks::Base
188
+ has_many :others
189
+ end
190
+
191
+ Then
192
+
193
+ thing.others -> array of Other
194
+
195
+ thing.other(id) -> an Other
196
+
197
+ thing.new_other(:attr => val, ...) -> an unsaved Other
198
+
199
+ thing.create_other(:attr => val, ...) -> saved Other
200
+
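These methods are generated by a class-level macro. Below is a simplified sketch of that technique, modeled on the `has_many(*arguments)` and `define_has_many` code later in this diff. It uses `define_method` instead of `class_eval` with a string, and an in-memory `Other.all`/`Other.find` in place of REST calls. `Associations`, `Other` and `Thing` are hypothetical. ActiveSupport's `singularize`/`classify` are replaced by naive string handling.

```ruby
# Hypothetical has_many macro: no ActiveSupport, no HTTP.
module Associations
  # Several names plus an optional trailing options hash.
  def has_many(*arguments)
    options = arguments.last.is_a?(Hash) ? arguments.pop : {}
    arguments.each { |resources| define_has_many(resources, options) }
  end

  private

  def define_has_many(resources, options)
    resource = resources.to_s.chomp("s")                            # naive singularize
    class_name = (options[:class_name] || resource).to_s.capitalize # naive classify
    ivar = "@#{resources}"
    model = -> { Object.const_get(class_name) } # resolved lazily at call time

    # thing.others: load once, then return the cached array
    define_method(resources) do
      instance_variable_get(ivar) || send("#{resources}!")
    end
    # thing.others!: always reload and re-cache
    define_method("#{resources}!") do
      instance_variable_set(ivar, model.call.all(:parent => self))
    end
    # thing.other(id): fetch a single child
    define_method(resource) do |id|
      model.call.find(id, :parent => self)
    end
    # thing.build_other(attrs): unsaved child
    define_method("build_#{resource}") do |attrs = {}|
      model.call.new(attrs.merge(:parent => self))
    end
    # thing.create_other(attrs): saved child
    define_method("create_#{resource}") do |attrs = {}|
      send("build_#{resource}", attrs).tap(&:save)
    end
  end
end

# In-memory stand-in for a REST-backed child model.
class Other
  STORE = [] # every saved Other, for every parent

  attr_reader :attrs, :parent

  def self.all(options)
    STORE.select { |o| o.parent.equal?(options[:parent]) }
  end

  def self.find(id, options)
    all(options).find { |o| o.attrs[:id] == id }
  end

  def initialize(attrs = {})
    @parent = attrs[:parent]
    @attrs = attrs.reject { |k, _| k == :parent }
  end

  def save
    STORE << self
    true
  end
end

class Thing
  extend Associations
  has_many :others
end

thing = Thing.new
thing.create_other(:id => 1, :name => "first")
thing.build_other(:id => 2) # never saved, so never listed
thing.others.map { |o| o.attrs[:name] } # => ["first"]
thing.other(1).attrs[:name]             # => "first"

thing.create_other(:id => 3, :name => "third")
thing.others.size  # => 1, still the cached array
thing.others!.size # => 2 after reloading
```

The caching shown at the end matches the `@resources || resources!` pattern in the diff: the plain reader returns whatever was loaded first, and the `!` variant forces a reload.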
201
+ ==== Has_one associations
202
+
203
+ The has_one association is used to associate a resource with another singleton resource that is transient, i.e. can be created and destroyed.
204
+
205
+ class Thing < LucidWorks::Base
206
+ has_one :whatnot
207
+ end
208
+
209
+ class Whatnot < LucidWorks::Base
210
+ self.singleton = true
211
+ belongs_to :thing
212
+ end
213
+
214
+ Then
215
+
216
+ thing.whatnot -> a retrieved Whatnot
217
+
218
+ thing.build_whatnot -> an unsaved Whatnot
219
+
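For has_one, the `define_has_one` code later in this diff generates a cached reader, a `!` reloader and `build_<resource>`. The exception is an association declared with `:has_content => false`, whose reader simply returns a new, unsaved model. The following is a simplified sketch of both branches. It uses `define_method` and hypothetical in-memory `Whatnot` and `Action` classes in place of REST-backed ones, and it ignores `:class_name`.

```ruby
# Hypothetical has_one macro with a :has_content option (no HTTP).
module HasOne
  def has_one(resource, options = {})
    model = -> { Object.const_get(resource.to_s.capitalize) } # naive camelize

    if options[:has_content] == false
      # Pseudo-resource with nothing to GET: the reader always builds a new model.
      define_method(resource) { model.call.new(:parent => self) }
    else
      ivar = "@#{resource}"
      # Cached reader: the first call loads, later calls reuse the object.
      define_method(resource) do
        instance_variable_get(ivar) || send("#{resource}!")
      end
      # Reloader: always fetches a fresh copy and re-caches it.
      define_method("#{resource}!") do
        instance_variable_set(ivar, model.call.find(:parent => self))
      end
      # Builder: a new, unsaved model.
      define_method("build_#{resource}") do |attrs = {}|
        model.call.new(attrs.merge(:parent => self))
      end
    end
  end
end

# In-memory stand-in for a singleton resource; counts simulated GETs.
class Whatnot
  @loads = 0

  class << self
    attr_reader :loads

    def find(options)
      @loads += 1
      new(options.merge(:loaded => true))
    end
  end

  attr_reader :attrs

  def initialize(attrs = {})
    @attrs = attrs
  end
end

# Stand-in for an action-only pseudo-resource.
class Action
  attr_reader :attrs

  def initialize(attrs = {})
    @attrs = attrs
  end
end

class Thing
  extend HasOne
  has_one :whatnot
  has_one :action, :has_content => false
end

thing = Thing.new
thing.whatnot.equal?(thing.whatnot) # => true, the second call hits the cache
Whatnot.loads                       # => 1
thing.whatnot!                      # forces a reload
Whatnot.loads                       # => 2
thing.build_whatnot.attrs[:loaded]  # => nil, built rather than loaded
thing.action.equal?(thing.action)   # => false, a new Action on every call
```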
220
+ ==== Has_singleton associations
221
+
222
+ The has_singleton association is used to associate a resource with another intransient singleton resource, i.e. one that always exists and calling destroy does not remove it.
223
+
224
+ class Thing < LucidWorks::Base
225
+ has_one :whatnot
226
+ end
227
+
228
+ class Whatnot < LucidWorks::Base
229
+ self.singleton = true
230
+ belongs_to :thing
231
+ end
232
+
233
+ Then
234
+
235
+ thing.whatnot -> an unsaved Whatnot
236
+
237
+ ==== Belongs_to associations
238
+
239
+ The belongs_to association augments the model with methods to access its parent. Given:
240
+
241
+ class Whatnot < LucidWorks::Base
242
+ self.singleton = true
243
+ belongs_to :thing
244
+ end
245
+
246
+ Then:
247
+
248
+ whatnot.thing -> a Thing
249
+
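A belongs_to reader can be as simple as returning the `:parent` the model was built or retrieved with. The sketch below assumes that behavior, because the library's own belongs_to body is not shown in this diff. `BelongsTo` is a hypothetical module, and `Thing`/`Whatnot` are the README's example models reduced to plain Ruby.

```ruby
# Hypothetical sketch: belongs_to :thing defines #thing, returning the parent.
module BelongsTo
  def belongs_to(name)
    define_method(name) { @parent }
  end
end

class Thing; end

class Whatnot
  extend BelongsTo
  belongs_to :thing

  def initialize(attrs = {})
    @parent = attrs[:parent]
  end
end

thing = Thing.new
whatnot = Whatnot.new(:parent => thing)
whatnot.thing.equal?(thing) # => true
```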
250
+ === Rationale
251
+
252
+ Originally this library started out as a set of ActiveResource classes. This required a lot of hacking, because ActiveResource makes many assumptions about how a REST API should work (it is essentially designed to talk to Rails applications), and many REST APIs, including this one, don't conform to them. The changes required to ActiveResource included:
253
+
254
+ - Don't require attributes to always be nested inside :resource => on create and update.
255
+ - Allow client-side generation of a resource ID during create.
256
+ - Support has_one and has_many associations.
257
+
258
+ However, this strategy eventually hit a wall that would have been extremely expensive to get past. We needed the following features:
259
+
260
+ - The ability to talk to the same API on more than one server simultaneously.
261
+ - Support for file uploads using multipart POST.
262
+
263
+ Given the design of ActiveResource, these would have been expensive to implement, so it became simpler to write a small ORM by combining ActiveModel and RestClient.
264
+
265
+ == Maintainers
266
+
267
+ * Sam Pierson (http://github.com/sampierson)
268
+
269
+ == License
270
+
271
+ Copyright 2011 Lucid Imagination
272
+ http://lucidimagination.com
273
+
274
+ Licensed under the Apache License, Version 2.0 (the "License");
275
+ you may not use this software except in compliance with the License.
276
+ You may obtain a copy of the License at
277
+
278
+ http://www.apache.org/licenses/LICENSE-2.0
279
+
280
+ Unless required by applicable law or agreed to in writing, software
281
+ distributed under the License is distributed on an "AS IS" BASIS,
282
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
283
+ See the License for the specific language governing permissions and
284
+ limitations under the License.
285
+
data/Rakefile CHANGED
@@ -1,7 +1,13 @@
1
1
  require 'bundler'
2
+ require 'rake/rdoctask'
3
+
2
4
  Bundler::GemHelper.install_tasks
3
5
 
4
- desc "Create RDoc documentation"
5
- task :doc do
6
- system 'rdoc'
6
+ desc 'Generate documentation for the lucid_works library.'
7
+ Rake::RDocTask.new(:rdoc) do |rdoc|
8
+ rdoc.rdoc_dir = 'rdoc'
9
+ rdoc.title = 'LucidWorks-Ruby'
10
+ rdoc.main = 'README.rdoc'
11
+ rdoc.rdoc_files.include('README.rdoc')
12
+ rdoc.rdoc_files.include('lib/**/*.rb')
7
13
  end
@@ -0,0 +1,106 @@
1
+ ---
2
+ en:
3
+ activemodel:
4
+ models:
5
+ lucid_works:
6
+ collection:
7
+ one: Collection
8
+ other: Collections
9
+ settings:
10
+ one: Settings
11
+ other: Settings
12
+ de_duplication:
13
+ 'off': 'Off'
14
+ overwrite: Overwrite
15
+ tag: Tag
16
+ datasource:
17
+ one: Data-source
18
+ other: Data-sources
19
+ status:
20
+ crawl_state:
21
+ aborted: Aborted
22
+ aborting: Aborting
23
+ exception: Exception
24
+ finished: Finished
25
+ idle: Idle
26
+ running: Running
27
+ stopped: Stopped
28
+ stopping: Stopping
29
+ type:
30
+ file: Local Filesystem
31
+ jdbc: Database
32
+ sharepoint: Sharepoint
33
+ solrxml: Solr XML
34
+ web: Web Site
35
+ attributes:
36
+ lucid_works:
37
+ collection:
38
+ name: Name
39
+ info:
40
+ collection_name: Collection name
41
+ data_dir: Data directory
42
+ free_disk_bytes: Free disk bytes
43
+ free_disk_space: Free disk space
44
+ index_directory: Index directory
45
+ index_has_deletions: Index has deletions
46
+ index_is_current: Index is current
47
+ index_is_optimized: Optimized
48
+ index_last_modified: Index last modified
49
+ index_max_doc: Index max doc
50
+ index_num_docs: Documents indexed
51
+ index_size: Index size
52
+ index_size_bytes: Index size
53
+ index_version: Index version
54
+ instance_dir: Instance directory
55
+ root_dir: Root directory
56
+ total_disk_bytes: Total disk bytes
57
+ total_disk_space: Total disk space
58
+ settings:
59
+ auto_complete: Auto complete
60
+ boosts: Boosts
61
+ boost_recent: Boost recent
62
+ click_boost_data: Click boost data
63
+ click_boost_field: Click boost field
64
+ click_enabled: Click scoring enabled
65
+ default_sort: Default sort
66
+ de_duplication: De-duplication
67
+ display_facets: Display facets
68
+ elevations: Elevations
69
+ index_time_stopwords: Don't index stop words
70
+ query_parser: Query parser
71
+ query_time_stopwords: Include stop words in searches
72
+ query_time_synonyms: Use synonyms
73
+ search_server_list: Search server list
74
+ show_similar: Show similar
75
+ spellcheck: Spell-check
76
+ ssl: SSL
77
+ stopword_list: Stopword list
78
+ synonym_list: Synonym list
79
+ unknown_type_handling: Default field type
80
+ unsupervised_feedback: Unsupervised feedback
81
+ unsupervised_feedback_emphasis: Unsupervised feedback emphasis
82
+ update_server_list: Update server list
83
+ datasource:
84
+ type: Type
85
+ bounds: Constrain to
86
+ max_bytes: Skip files larger than
87
+ history:
88
+ crawlStarted: Started
89
+ crawlStopped: Stopped
90
+ crawlState: State
91
+ numNew: New
92
+ numUpdated: Updated
93
+ numDeleted: Deleted
94
+ numUnchanged: Unchanged
95
+ numFailed: Failed
96
+ status:
97
+ crawlStarted: Last crawl started
98
+ crawlStopped: Last crawl stopped
99
+ crawlState: State
100
+ doc_count: Documents indexed
101
+ jobId: Job ID
102
+ numNew: New docs
103
+ numUpdated: Updated docs
104
+ numDeleted: Deleted docs
105
+ numUnchanged: Unchanged docs
106
+ numFailed: Failed docs
@@ -8,9 +8,12 @@ module LucidWorks
8
8
  # Specifies a singleton child resource.
9
9
  #
10
10
  # In the parent resource creates methods:
11
- # child - load and cache the child. Subsequent calls will access the cached value.
12
- # child! - load and cache the child, ignoring existing cached value if present.
13
- #
11
+ # if option :has_content is true (default)
12
+ # child - load and cache the child. Subsequent calls will access the cached value.
13
+ # child! - load and cache the child, ignoring existing cached value if present.
14
+ # build_child - create a new, unsaved resource
15
+ # if option :has_content is false
16
+ # child - create a new, unsaved resource
14
17
  # === Options
15
18
  #
16
19
  # The declaration can also include an options hash to specialize the behavior of the association.
@@ -21,6 +24,13 @@ module LucidWorks
21
24
  # from the association name, e.g.
22
25
  # has_one :info, :class_name => :collection_info # use CollectionInfo class
23
26
  # has_one :foo, :class_name => :'foo/bar' # use Foo::Bar class
27
+ # [:has_content]
28
+ # Changes the behavior of the .<resource> association method:
29
+ # If set to true (default), indicates that this resource may be retrieved using a GET, and
30
+ # the .<resource> method will retrieve it.
31
+ # If set to false, this resource may not be retrieved using a GET, and the .<resource> method
32
+ # will instead build and return new, unsaved, model. This is useful for pseudo-resources that only
33
+ # provide actions, not data.
24
34
  #
25
35
  def has_one(*arguments)
26
36
  options = arguments.last.is_a?(Hash) ? arguments.pop : {}
@@ -29,7 +39,6 @@ module LucidWorks
29
39
  end
30
40
  end
31
41
 
32
- #
33
42
  # Specifies a child resource.
34
43
  #
35
44
  # e.g. for Blog has_many posts
@@ -39,31 +48,11 @@ module LucidWorks
39
48
  # posts
40
49
  # post(id)
41
50
  #
42
- def has_many(resources, options = {})
43
- resource = resources.to_s.singularize
44
- resource_class_name = (options[:class_name] || resource).to_s.classify
45
-
46
- class_eval <<-EOF, __FILE__, __LINE__ + 1
47
- def #{resources}(options={})
48
- @#{resources} || #{resources}!
49
- end
50
-
51
- def #{resources}!(options={})
52
- @#{resources} = #{resource_class_name}.all(options.merge :parent => self)
53
- end
54
-
55
- def #{resource}(id, options={})
56
- #{resource_class_name}.find(id, options.merge(:parent => self))
57
- end
58
-
59
- def create_#{resource}(options = {})
60
- #{resource_class_name}.create(options.merge :parent => self)
61
- end
62
-
63
- def build_#{resource}(options = {})
64
- #{resource_class_name}.new(options.merge :parent => self)
65
- end
66
- EOF
51
+ def has_many(*arguments)
52
+ options = arguments.last.is_a?(Hash) ? arguments.pop : {}
53
+ arguments.each do |resources|
54
+ define_has_many resources, options
55
+ end
67
56
  end
68
57
 
69
58
  # Specifies a parent resource.
@@ -99,14 +88,50 @@ module LucidWorks
99
88
 
100
89
  def define_has_one(resource, options={})
101
90
  resource_class_name = (options[:class_name] || resource).to_s.camelize
91
+
92
+ if options[:has_content] == false
93
+ class_eval <<-EOF1, __FILE__, __LINE__ + 1
94
+ def #{resource} # def resource
95
+ #{resource_class_name}.new(:parent => self) # Child.new(options.merge :parent => self)
96
+ end # end
97
+ EOF1
98
+ else
99
+ class_eval <<-EOF2, __FILE__, __LINE__ + 1
100
+ def #{resource} # def resource
101
+ @#{resource} || #{resource}! # @resource || resource!
102
+ end # end
103
+
104
+ def #{resource}! # def resource!
105
+ @#{resource} = #{resource_class_name}.find(:parent => self) # @resource = Resource.find(:parent => self)
106
+ end # end
107
+
108
+ def build_#{resource}(options = {})
109
+ #{resource_class_name}.new(options.merge :parent => self)
110
+ end
111
+ EOF2
112
+ end
113
+ end
114
+
115
+ def define_has_many(resources, options = {})
116
+ resource = resources.to_s.singularize
117
+ resource_class_name = (options[:class_name] || resource).to_s.classify
118
+
102
119
  class_eval <<-EOF, __FILE__, __LINE__ + 1
103
- def #{resource} # def resource
104
- @#{resource} || #{resource}! # @resource || resource!
105
- end # end
120
+ def #{resources}(options={})
121
+ @#{resources} || #{resources}!(options)
122
+ end
106
123
 
107
- def #{resource}! # def resource!
108
- @#{resource} = #{resource_class_name}.find(:parent => self) # @resource = Resource.find(:parent => self)
109
- end # end
124
+ def #{resources}!(options={})
125
+ @#{resources} = #{resource_class_name}.all(options.merge :parent => self)
126
+ end
127
+
128
+ def #{resource}(id, options={})
129
+ #{resource_class_name}.find(id, options.merge(:parent => self))
130
+ end
131
+
132
+ def create_#{resource}(options = {})
133
+ #{resource_class_name}.create(options.merge :parent => self)
134
+ end
110
135
 
111
136
  def build_#{resource}(options = {})
112
137
  #{resource_class_name}.new(options.merge :parent => self)