lucid_works 0.2.0 → 0.3.9

data/README.rdoc ADDED
@@ -0,0 +1,285 @@
1
+ == LucidWorks-Ruby
2
+
3
+ Ruby bindings for the REST API of the LucidWorks family of search products.
4
+
5
+ The LucidWorks family of products are search engines that combine the open source search technologies Lucene and Solr with open source crawlers, a management UI and a REST API. The LucidWorks REST API provides a programmatic way to manage collections, data-sources, scheduling and many of the other objects and tasks involved in running a search engine.
6
+
7
+ == Information
8
+
9
+ You can view the LucidWorks-Ruby documentation in RDoc format here:
10
+
11
+ http://rubydoc.info/github/lucidimagination/lucidworks-ruby/master/frames
12
+
13
+ The LucidWorks REST API is documented here:
14
+
15
+ http://lucidworks.lucidimagination.com/display/LWEUG/Rest+API
16
+
17
+ === Bug reports
18
+
19
+ Where should people file bugs?
20
+ GitHub? That implies we have open sourced this already.
21
+ An email address at Lucid?
22
+
23
+ == Installation
24
+
25
+ Install the gem:
26
+
27
+ gem install lucid_works
28
+
29
+ Or add it to your Gemfile, then run bundle install:
30
+
31
+ gem "lucid_works"
32
+
33
+ == Show Me the Money
34
+
35
+ This single statement (note the trailing periods) will connect to a LucidWorks server running on the local machine, create a collection called "News" and a data-source called "cnn" for the cnn.com website, then start a crawl. Copy and paste it into irb:
36
+
37
+ require 'lucid_works'
38
+
39
+ LucidWorks::Server.new("http://localhost:8888").
40
+ create_collection(:name => 'News').
41
+ create_datasource(:name => 'cnn',
42
+ :crawler => 'lucid.aperture', :type => 'web',
43
+ :url => 'http://cnn.com', :crawl_depth => '1').
44
+ build_schedule(:start_time => 0, :period => 0, :type => 'index', :active => true).
45
+ save
46
+
47
+ Now, how does it work:
48
+
49
+
50
+ == Object Model
51
+
52
+ The LucidWorks object model looks something like this:
53
+
54
+ Server -+- Collection -+- Datasource -+- Status
55
+ | | +- History
56
+ | | +- Schedule
57
+ | | +- Index
58
+ | | +- Crawldata
59
+ | +- Field
60
+ | +- Index
61
+ | +- Info
62
+ | +- Settings
63
+ |
64
+ +- Logs -+- Index -+- Summary
65
+ | +- Query -+- Summary
66
+ |
67
+ +- Crawlers
68
+
69
+ This is what has been modeled so far. The actual REST API is more extensive.
70
+
71
+ == Usage
72
+
73
+ === Server
74
+
75
+ The starting point for our communication with a LucidWorks server is a LucidWorks::Server object, e.g. for a LucidWorks server running on the local machine, on the standard port:
76
+
77
+ server = LucidWorks::Server.new("http://localhost:8888")
78
+
79
+ === Collections
80
+
81
+ Collections are modeled using the LucidWorks::Collection class. LucidWorks::Server has_many :collections, therefore:
82
+
83
+ To retrieve collections:
84
+
85
+ @server.collections -> an array of LucidWorks::Collection
86
+
87
+ puts @server.collections.map(&:name)
88
+
89
+ @server.collection("name") -> a single LucidWorks::Collection
90
+
91
+ Create a collection:
92
+
93
+ collection = @server.build_collection(:name => "MY_STUFF")
94
+ collection.save
95
+
96
+ or
97
+
98
+ collection = @server.create_collection(:name => "MY_STUFF")
99
+
100
+ Delete a collection:
101
+
102
+ collection.destroy
103
+
104
+ Wipe all indexed data from a collection:
105
+
106
+ collection.empty!
107
+
108
+ === Collection Info
109
+
110
+ The Collection::Info contains a lot of data about the state of a collection.
111
+
112
+ info = @server.collection('coll1').info -> a LucidWorks::Collection::Info
113
+
114
+ info.index_num_docs -> 12345
115
+ info.index_size -> "44.3 MB"
116
+
117
+ === Collection Settings
118
+
119
+ The Collection::Settings class contains indexing and querying settings for the collection.
120
+
121
+ settings = @server.collection('collection1').settings -> a LucidWorks::Collection::Settings
122
+
123
+ settings.query_parser -> "lucid"
124
+ settings.synonym_list -> ["Lawyer", "Attorney", "one", "1", ...]
125
+
126
+ === Field
127
+
128
+ Collection has_many :fields. The Field class models data about a collection's field.
129
+
130
+ field = @server.collection('collection1').field('body') -> a LucidWorks::Field
131
+
132
+ field.field_type -> "text_en"
133
+ field.facet -> false
134
+
135
+ === Datasources
136
+
137
+ Collection has_many :datasources. Datasources are modeled using the LucidWorks::Datasource class. They support all the standard ORM methods, e.g.
138
+
139
+ collection.datasources -> an array of LucidWorks::Datasource
140
+
141
+ collection.datasource(123) -> a single LucidWorks::Datasource
142
+
143
+ datasource = collection.create_datasource(
144
+ :crawler => LucidWorks::Datasource::CRAWLERS['web'],
145
+ :type => 'web',
146
+ :name => "example.com",
147
+ :url => "http://example.com/",
148
+ :crawl_depth => 1
149
+ )
150
+
151
+ Note that the latter does not start a crawl of the datasource.
152
+
153
+ To delete all the data crawled from a data-source:
154
+
155
+ datasource.empty!
156
+
157
+ === The ORM
158
+
159
+ This library implements a simple ORM (object-relational mapper) on top of the LucidWorks REST API. It behaves somewhat like ActiveResource/ActiveRecord (if you want to know why we didn't just use ActiveResource, see the Rationale section).
160
+
161
+ === Base
162
+
163
+ LucidWorks::Base is the ORM foundation of this library. It supports many ActiveRecord-style methods, e.g. given a Thing model:
164
+
165
+ class Thing < LucidWorks::Base
166
+ end
167
+
168
+ Then Thing will have the following class methods:
169
+
170
+ thing = Thing.new(:attrib => value, :parent => parent) -> unsaved Thing
171
+
172
+ Thing.create(:attr => value, ..., :parent => parent) -> saved Thing
173
+
174
+ Thing.find(:all, :parent => parent) -> Array of Thing
175
+
176
+ Thing.find(id, :parent => parent) -> a Thing
177
+
178
+ The 'parent' must be another LucidWorks::Base model or a LucidWorks::Server; this is only required when the class is used stand-alone. If the model is created/retrieved from an association, this value is set for you automatically.
179
+
180
+ thing.save -> true/false
181
+ thing.destroy
182
+
183
+ ==== Has_many associations
184
+
185
+ The has_many association is used to associate a resource with another collection resource. Given:
186
+
187
+ class Thing < LucidWorks::Base
188
+ has_many :others
189
+ end
190
+
191
+ Then
192
+
193
+ thing.others -> array of Other
194
+
195
+ thing.other(id) -> an Other
196
+
197
+ thing.build_other(:attr => val, ...) -> an unsaved Other
198
+
199
+ thing.create_other(:attr => val, ...) -> saved Other
200
+
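As a rough illustration of the has_many accessors listed above, here is a minimal, self-contained sketch. It does not use the real library: SketchBase, its in-memory store, and the three-argument has_many signature are all hypothetical stand-ins, chosen so the example runs without a LucidWorks server. The build_ prefix follows the build_collection call used in the Collections section.

```ruby
# Stand-in base class for illustration only; NOT the real LucidWorks::Base.
class SketchBase
  attr_reader :attributes, :parent

  # Per-class in-memory store standing in for the REST backend.
  def self.store
    @store ||= []
  end

  # Hypothetical signature: plural name, singular name, child class name.
  def self.has_many(plural, singular, klass_name)
    define_method(plural) { Object.const_get(klass_name).all(:parent => self) }
    define_method(singular) { |id| Object.const_get(klass_name).find(id, :parent => self) }
    define_method("build_#{singular}") do |attrs = {}|
      Object.const_get(klass_name).new(attrs.merge(:parent => self))
    end
    define_method("create_#{singular}") do |attrs = {}|
      Object.const_get(klass_name).create(attrs.merge(:parent => self))
    end
  end

  # All saved records belonging to the given parent.
  def self.all(options = {})
    store.select { |record| record.parent.equal?(options[:parent]) }
  end

  def self.find(id, options = {})
    all(options).find { |record| record.attributes[:id] == id }
  end

  def self.create(attrs = {})
    new(attrs).tap(&:save)
  end

  def initialize(attrs = {})
    attrs = attrs.dup
    @parent = attrs.delete(:parent)
    @attributes = attrs
  end

  def save
    self.class.store << self
    true
  end
end

class Other < SketchBase; end

class Thing < SketchBase
  has_many :others, :other, 'Other'
end

thing = Thing.new
thing.create_other(:id => 1, :name => 'first') # saved immediately
draft = thing.build_other(:id => 2)            # not saved yet
puts thing.others.length
puts thing.other(1).attributes[:name]
```

In this sketch the parent is merged into the attributes by the generated methods, which is the same idea as the :parent option being set automatically when a model is reached through an association.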
201
+ ==== Has_one associations
202
+
203
+ The has_one association is used to associate a resource with another singleton resource that is transient, i.e. can be created and destroyed.
204
+
205
+ class Thing < LucidWorks::Base
206
+ has_one :whatnot
207
+ end
208
+
209
+ class Whatnot < LucidWorks::Base
210
+ self.singleton = true
211
+ belongs_to :thing
212
+ end
213
+
214
+ Then
215
+
216
+ thing.whatnot -> a retrieved Whatnot
217
+
218
+ thing.build_whatnot -> an unsaved Whatnot
219
+
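The library's source comments describe the has_one accessor as loading and caching the child, with a bang variant that reloads it. The sketch below shows that caching pattern in isolation. The Whatnot and Thing classes here are stand-ins written for this example, and the fetch counter is a hypothetical device that takes the place of a real GET request.

```ruby
# Stand-in singleton child resource; NOT the real library class.
class Whatnot
  @fetches = 0

  class << self
    attr_reader :fetches

    # Pretend to GET the singleton resource for the given parent.
    def find(options = {})
      @fetches += 1
      new(options)
    end
  end

  attr_reader :parent

  def initialize(options = {})
    @parent = options[:parent]
  end
end

class Thing
  # Return the cached child, loading it on first use.
  def whatnot
    @whatnot || whatnot!
  end

  # Always reload, replacing any cached value.
  def whatnot!
    @whatnot = Whatnot.find(:parent => self)
  end

  # New, unsaved child tied to this parent; nothing is fetched.
  def build_whatnot(options = {})
    Whatnot.new(options.merge(:parent => self))
  end
end

thing = Thing.new
thing.whatnot
thing.whatnot          # served from the cache, no second fetch
puts Whatnot.fetches
thing.whatnot!         # forces a reload
puts Whatnot.fetches
```

The design trade-off is the usual one for memoization: repeated reads are cheap, but a caller who needs fresh server state has to ask for it explicitly with the bang method.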
220
+ ==== Has_singleton associations
221
+
222
+ The has_singleton association is used to associate a resource with another non-transient singleton resource, i.e. one that always exists and that calling destroy does not remove.
223
+
224
+ class Thing < LucidWorks::Base
225
+ has_singleton :whatnot
226
+ end
227
+
228
+ class Whatnot < LucidWorks::Base
229
+ self.singleton = true
230
+ belongs_to :thing
231
+ end
232
+
233
+ Then
234
+
235
+ thing.whatnot -> an unsaved Whatnot
236
+
237
+ ==== Belongs_to associations
238
+
239
+ The belongs_to association augments the model with methods to access its parent. Given:
240
+
241
+ class Whatnot < LucidWorks::Base
242
+ self.singleton = true
243
+ belongs_to :thing
244
+ end
245
+
246
+ Then:
247
+
248
+ whatnot.thing -> A Thing
249
+
250
+ === Rationale
251
+
252
+ Originally this library started out as a set of ActiveResource classes. This required a lot of hacking of ActiveResource as ActiveResource makes a lot of assumptions about the way a REST API should work - it's basically just designed to talk to Rails applications - and many REST APIs, including this one, don't conform to those rules. Among the changes required to ActiveResource were:
253
+
254
+ - Don't require attributes always be nested inside :resource => on create and update.
255
+ - Allow client-side generation of a resource ID during create.
256
+ - Support has_one and has_many associations.
257
+
258
+ However, this strategy eventually hit a brick wall that would have been extremely expensive to get past. We needed the following features:
259
+
260
+ - The ability to talk to the same API on more than one server simultaneously.
261
+ - Support file uploads using multi-part post.
262
+
263
+ Given the design of ActiveResource, these would have been expensive to implement, and it became easier to write a simple ORM by marrying ActiveModel and RestClient.
264
+
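To make the multi-server requirement concrete: ActiveResource configures its endpoint per class (through the class-level site setting), so every instance of a model talks to the same host. Keeping the endpoint on a server object and letting children reach it through their parent avoids that. The sketch below is a hypothetical illustration of the idea only. SketchServer, SketchCollection and the URL path layout are assumptions made for this example, not the library's actual classes or routes.

```ruby
# Hypothetical sketch: the endpoint lives on an instance, not on the class.
class SketchServer
  attr_reader :base_url

  def initialize(base_url)
    @base_url = base_url.chomp('/') # tolerate a trailing slash
  end

  def collection(name)
    SketchCollection.new(name, self)
  end
end

class SketchCollection
  attr_reader :name, :server

  def initialize(name, server)
    @name = name
    @server = server
  end

  # URL derived from the owning server; the path layout is illustrative only.
  def url
    "#{server.base_url}/collections/#{name}"
  end
end

local  = SketchServer.new('http://localhost:8888/')
remote = SketchServer.new('http://search.example.com:8888')

# The same model class addresses two different servers at once.
puts local.collection('News').url
puts remote.collection('News').url
```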
265
+ == Maintainers
266
+
267
+ * Sam Pierson (http://github.com/sampierson)
268
+
269
+ == License
270
+
271
+ Copyright 2011 Lucid Imagination
272
+ http://lucidimagination.com
273
+
274
+ Licensed under the Apache License, Version 2.0 (the "License");
275
+ you may not use this software except in compliance with the License.
276
+ You may obtain a copy of the License at
277
+
278
+ http://www.apache.org/licenses/LICENSE-2.0
279
+
280
+ Unless required by applicable law or agreed to in writing, software
281
+ distributed under the License is distributed on an "AS IS" BASIS,
282
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
283
+ See the License for the specific language governing permissions and
284
+ limitations under the License.
285
+
data/Rakefile CHANGED
@@ -1,7 +1,13 @@
1
1
  require 'bundler'
2
+ require 'rake/rdoctask'
3
+
2
4
  Bundler::GemHelper.install_tasks
3
5
 
4
- desc "Create RDoc documentation"
5
- task :doc do
6
- system 'rdoc'
6
+ desc 'Generate documentation for the lucid_works library.'
7
+ Rake::RDocTask.new(:rdoc) do |rdoc|
8
+ rdoc.rdoc_dir = 'rdoc'
9
+ rdoc.title = 'LucidWorks-Ruby'
10
+ rdoc.main = 'README.rdoc'
11
+ rdoc.rdoc_files.include('README.rdoc')
12
+ rdoc.rdoc_files.include('lib/**/*.rb')
7
13
  end
@@ -0,0 +1,106 @@
1
+ ---
2
+ en:
3
+ activemodel:
4
+ models:
5
+ lucid_works:
6
+ collection:
7
+ one: Collection
8
+ other: Collections
9
+ settings:
10
+ one: Settings
11
+ other: Settings
12
+ de_duplication:
13
+ 'off': 'Off'
14
+ overwrite: Overwrite
15
+ tag: Tag
16
+ datasource:
17
+ one: Data-source
18
+ other: Data-sources
19
+ status:
20
+ crawl_state:
21
+ aborted: Aborted
22
+ aborting: Aborting
23
+ exception: Exception
24
+ finished: Finished
25
+ idle: Idle
26
+ running: Running
27
+ stopped: Stopped
28
+ stopping: Stopping
29
+ type:
30
+ file: Local Filesystem
31
+ jdbc: Database
32
+ sharepoint: Sharepoint
33
+ solrxml: Solr XML
34
+ web: Web Site
35
+ attributes:
36
+ lucid_works:
37
+ collection:
38
+ name: Name
39
+ info:
40
+ collection_name: Collection name
41
+ data_dir: Data directory
42
+ free_disk_bytes: Free disk bytes
43
+ free_disk_space: Free disk space
44
+ index_directory: Index directory
45
+ index_has_deletions: Index has deletions
46
+ index_is_current: Index is current
47
+ index_is_optimized: Optimized
48
+ index_last_modified: Index last modified
49
+ index_max_doc: Index max doc
50
+ index_num_docs: Documents indexed
51
+ index_size: Index size
52
+ index_size_bytes: Index size
53
+ index_version: Index version
54
+ instance_dir: Instance directory
55
+ root_dir: Root directory
56
+ total_disk_bytes: Total disk bytes
57
+ total_disk_space: Total disk space
58
+ settings:
59
+ auto_complete: Auto complete
60
+ boosts: Boosts
61
+ boost_recent: Boost recent
62
+ click_boost_data: Click boost data
63
+ click_boost_field: Click boost field
64
+ click_enabled: Click scoring enabled
65
+ default_sort: Default sort
66
+ de_duplication: De-duplication
67
+ display_facets: Display facets
68
+ elevations: elevations
69
+ index_time_stopwords: Don't index stop words
70
+ query_parser: Query parser
71
+ query_time_stopwords: Include stop words in searches
72
+ query_time_synonyms: Use synonyms
73
+ search_server_list: Search server list
74
+ show_similar: Show similar
75
+ spellcheck: Spell-check
76
+ ssl: SSL
77
+ stopword_list: Stopword list
78
+ synonym_list: Synonym list
79
+ unknown_type_handling: Default field type
80
+ unsupervised_feedback: Unsupervised feedback
81
+ unsupervised_feedback_emphasis: Unsupervised feedback emphasis
82
+ update_server_list: Update server list
83
+ datasource:
84
+ type: Type
85
+ bounds: Constrain to
86
+ max_bytes: Skip files larger than
87
+ history:
88
+ crawlStarted: Started
89
+ crawlStopped: Stopped
90
+ crawlState: State
91
+ numNew: New
92
+ numUpdated: Updated
93
+ numDeleted: Deleted
94
+ numUnchanged: Unchanged
95
+ numFailed: Failed
96
+ status:
97
+ crawlStarted: Last crawl started
98
+ crawlStopped: Last crawl stopped
99
+ crawlState: State
100
+ doc_count: Documents indexed
101
+ jobId: Job ID
102
+ numNew: New docs
103
+ numUpdated: Updated docs
104
+ numDeleted: Deleted docs
105
+ numUnchanged: Unchanged docs
106
+ numFailed: Failed docs
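
A note on the quoted 'off': 'Off' entry under de_duplication in the locale file above: the quoting is presumably deliberate, because Ruby's standard YAML parser follows the YAML 1.1 convention of reading a bare off as a boolean. The short check below uses only the standard library and sample strings written for this example. It is a sketch of the parsing behaviour, not part of the gem.

```ruby
require 'yaml'

# Unquoted: both the key and the value are read as booleans.
unquoted = YAML.load("off: Off\n")

# Quoted: both stay strings, which is what a translation lookup needs.
quoted = YAML.load("'off': 'Off'\n")

p unquoted.keys.first.class   # a boolean class, not String
p quoted.keys.first.class     # String
p quoted['off']
```

Without the quotes, a lookup under the string key "off" would find nothing, since the entry would have been stored under a boolean key.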
@@ -8,9 +8,12 @@ module LucidWorks
8
8
  # Specifies a singleton child resource.
9
9
  #
10
10
  # In the parent resource creates methods:
11
- # child - load and cache the child. Subsequent calls will access the cached value.
12
- # child! - load and cache the child, ignoring existing cached value if present.
13
- #
11
+ # if option :has_content is true (default)
12
+ # child - load and cache the child. Subsequent calls will access the cached value.
13
+ # child! - load and cache the child, ignoring existing cached value if present.
14
+ # build_child - create a new, unsaved resource
15
+ # if option :has_content is false
16
+ # child - create a new, unsaved resource
14
17
  # === Options
15
18
  #
16
19
  # The declaration can also include an options hash to specialize the behavior of the association.
@@ -21,6 +24,13 @@ module LucidWorks
21
24
  # from the association name, e.g.
22
25
  # has_one :info, :class_name => :collection_info # use CollectionInfo class
23
26
  # has_one :foo, :class_name => :'foo/bar' # use Foo::Bar class
27
+ # [:has_content]
28
+ # Changes the behavior of the .<resource> association method:
29
+ # If set to true (default), indicates that this resource may be retrieved using a GET, and
30
+ # the .<resource> method will retrieve it.
31
+ # If set to false, this resource may not be retrieved using a GET, and the .<resource> method
32
+ # will instead build and return a new, unsaved model. This is useful for pseudo-resources that only
33
+ # provide actions, not data.
24
34
  #
25
35
  def has_one(*arguments)
26
36
  options = arguments.last.is_a?(Hash) ? arguments.pop : {}
@@ -29,7 +39,6 @@ module LucidWorks
29
39
  end
30
40
  end
31
41
 
32
- #
33
42
  # Specifies a child resource.
34
43
  #
35
44
  # e.g. for Blog has_many posts
@@ -39,31 +48,11 @@ module LucidWorks
39
48
  # posts
40
49
  # post(id)
41
50
  #
42
- def has_many(resources, options = {})
43
- resource = resources.to_s.singularize
44
- resource_class_name = (options[:class_name] || resource).to_s.classify
45
-
46
- class_eval <<-EOF, __FILE__, __LINE__ + 1
47
- def #{resources}(options={})
48
- @#{resources} || #{resources}!
49
- end
50
-
51
- def #{resources}!(options={})
52
- @#{resources} = #{resource_class_name}.all(options.merge :parent => self)
53
- end
54
-
55
- def #{resource}(id, options={})
56
- #{resource_class_name}.find(id, options.merge(:parent => self))
57
- end
58
-
59
- def create_#{resource}(options = {})
60
- #{resource_class_name}.create(options.merge :parent => self)
61
- end
62
-
63
- def build_#{resource}(options = {})
64
- #{resource_class_name}.new(options.merge :parent => self)
65
- end
66
- EOF
51
+ def has_many(*arguments)
52
+ options = arguments.last.is_a?(Hash) ? arguments.pop : {}
53
+ arguments.each do |resources|
54
+ define_has_many resources, options
55
+ end
67
56
  end
68
57
 
69
58
  # Specifies a parent resource.
@@ -99,14 +88,50 @@ module LucidWorks
99
88
 
100
89
  def define_has_one(resource, options={})
101
90
  resource_class_name = (options[:class_name] || resource).to_s.camelize
91
+
92
+ if options[:has_content] == false
93
+ class_eval <<-EOF1, __FILE__, __LINE__ + 1
94
+ def #{resource} # def resource
95
+ #{resource_class_name}.new(:parent => self) # Child.new(:parent => self)
96
+ end # end
97
+ EOF1
98
+ else
99
+ class_eval <<-EOF2, __FILE__, __LINE__ + 1
100
+ def #{resource} # def resource
101
+ @#{resource} || #{resource}! # @resource || resource!
102
+ end # end
103
+
104
+ def #{resource}! # def resource!
105
+ @#{resource} = #{resource_class_name}.find(:parent => self) # @resource = Resource.find(:parent => self)
106
+ end # end
107
+
108
+ def build_#{resource}(options = {})
109
+ #{resource_class_name}.new(options.merge :parent => self)
110
+ end
111
+ EOF2
112
+ end
113
+ end
114
+
115
+ def define_has_many(resources, options = {})
116
+ resource = resources.to_s.singularize
117
+ resource_class_name = (options[:class_name] || resource).to_s.classify
118
+
102
119
  class_eval <<-EOF, __FILE__, __LINE__ + 1
103
- def #{resource} # def resource
104
- @#{resource} || #{resource}! # @resource || resource!
105
- end # end
120
+ def #{resources}(options={})
121
+ @#{resources} || #{resources}!(options)
122
+ end
106
123
 
107
- def #{resource}! # def resource!
108
- @#{resource} = #{resource_class_name}.find(:parent => self) # @resource = Resource.find(:parent => self)
109
- end # end
124
+ def #{resources}!(options={})
125
+ @#{resources} = #{resource_class_name}.all(options.merge :parent => self)
126
+ end
127
+
128
+ def #{resource}(id, options={})
129
+ #{resource_class_name}.find(id, options.merge(:parent => self))
130
+ end
131
+
132
+ def create_#{resource}(options = {})
133
+ #{resource_class_name}.create(options.merge :parent => self)
134
+ end
110
135
 
111
136
  def build_#{resource}(options = {})
112
137
  #{resource_class_name}.new(options.merge :parent => self)