lucid_works 0.2.0 → 0.3.9

Files changed (32)

data/.gitignore +1 -0
data/LICENSE +202 -0
data/NOTICE +202 -0
data/README.rdoc +285 -0
data/Rakefile +9 -3
data/config/locales/en.yml +106 -0
data/lib/lucid_works/associations.rb +60 -35
data/lib/lucid_works/base.rb +170 -55
data/lib/lucid_works/collection/info.rb +13 -0
data/lib/lucid_works/collection/settings.rb +14 -0
data/lib/lucid_works/collection.rb +11 -3
data/lib/lucid_works/crawler.rb +8 -0
data/lib/lucid_works/datasource/crawldata.rb +9 -0
data/lib/lucid_works/datasource/history.rb +10 -2
data/lib/lucid_works/datasource/schedule.rb +7 -0
data/lib/lucid_works/datasource/status.rb +45 -0
data/lib/lucid_works/datasource.rb +48 -29
data/lib/lucid_works/field.rb +53 -0
data/lib/lucid_works/logs.rb +1 -2
data/lib/lucid_works/schema.rb +72 -0
data/lib/lucid_works/server.rb +2 -2
data/lib/lucid_works/version.rb +1 -1
data/lib/lucid_works.rb +12 -0
data/spec/lib/lucid_works/associations_spec.rb +41 -14
data/spec/lib/lucid_works/base_spec.rb +209 -75
data/spec/lib/lucid_works/collection_spec.rb +16 -4
data/spec/lib/lucid_works/datasource/history_spec.rb +47 -0
data/spec/lib/lucid_works/datasource/status_spec.rb +52 -0
data/spec/lib/lucid_works/datasource_spec.rb +33 -36
data/spec/lib/lucid_works/field_spec.rb +23 -0
data/spec/lib/lucid_works/schema_spec.rb +50 -0
metadata +18 -2

data/README.rdoc ADDED

@@ -0,0 +1,285 @@
+== LucidWorks-Ruby
+Ruby bindings for the REST API of the LucidWorks family of search products.
+The LucidWorks family of products are search engines that combine the open source search technologies Lucene and Solr with open source crawlers, a management UI and a REST API.  The LucidWorks REST API provides a programmatic way to manage collections, data-sources, scheduling and many of the other objects and tasks involved in running a search engine.
+== Information
+You can view the LucidWorks-Ruby documentation in RDoc format here:
+http://rubydoc.info/github/lucidimagination/lucidworks-ruby/master/frames
+The LucidWorks REST API is documented here:
+http://lucidworks.lucidimagination.com/display/LWEUG/Rest+API
+=== Bug reports
+Where should people file bugs?
+GitHub?  That implies we have open sourced this already.
+An email address at Lucid?
+== Installation
+Install the gem:
+  gem install lucid_works
+Or add it to your Gemfile, then run bundle install:
+  gem "lucid_works"
+== Show Me the Money
+This single statement (note the periods) will connect to a LucidWorks server running on the local machine, create a collection called "News" and a data-source called "cnn" for the cnn.com website, then start a crawl.  Cut and paste into Irb:
+  require 'lucid_works'
+  LucidWorks::Server.new("http://localhost:8888").
+    create_collection(:name => 'News').
+    create_datasource(:name => 'cnn',
+                      :crawler => 'lucid.aperture', :type => 'web',
+                      :url => 'http://cnn.com', :crawl_depth => '1').
+    build_schedule(:start_time => 0, :period => 0, :type => 'index', :active => true).
+    save
+Now, how does it work:
+== Object Model
+The LucidWorks object model looks something like this:
+  Server -+- Collection -+- Datasource -+- Status
+          |              |              +- History
+          |              |              +- Schedule
+          |              |              +- Index
+          |              |              +- Crawldata
+          |              +- Field
+          |              +- Index
+          |              +- Info
+          |              +- Settings
+          |
+          +- Logs -+- Index -+- Summary
+          |        +- Query -+- Summary
+          |
+          +- Crawlers
+This is what has been modeled so far.  The actual REST API is more extensive.
+== Usage
+=== Server
+The starting point for our communication with a LucidWorks server is a LucidWorks::Server object, e.g. for a LucidWorks server running on the local machine, on the standard port:
+  server = LucidWorks::Server.new("http://localhost:8888")
+=== Collections
+Collections are modeled using the LucidWorks::Collection class.  LucidWorks::Server has_many :collections, therefore:
+To retrieve collections:
+  @server.collections                   -> an array of LucidWorks::Collection
+  puts @server.collections.map(&:name)
+  @server.collection("name")            -> a single LucidWorks::Collection
+Create a collection:
+  collection = @server.build_collection(:name => "MY_STUFF")
+  collection.save
+  or
+  collection = @server.create_collection(:name => "MY_STUFF")
+Delete a collection:
+  collection.destroy
+Wipe all indexed data from a collection:
+  collection.empty!
+=== Collection Info
+The Collection::Info contains a lot of data about the state of a collection.
+  info = @server.collection('coll1').info -> a LucidWorks::Collection::Info
+  info.index_num_docs  ->  12345
+  info.index_size      ->  "44.3 MB"
+=== Collection Settings
+The Collection::Settings class contains indexing and querying settings for the collection.
+  settings = @server.collection('collection1').settings -> a LucidWorks::Collection::Settings
+  settings.query_parser    ->  "lucid"
+  settings.synonym_list    ->  ["Lawyer", "Attorney", "one", "1", ...]
+=== Field
+Collection has_many :fields.  The Field class models data about a collection's field.
+  field = @server.collection('collection1').field('body')  -> a LucidWorks::Field
+  field.field_type  ->  "text_en"
+  field.facet       ->  false
+=== Datasources
+Collection has_many :datasources.  Datasources are modeled using the LucidWorks::Datasource class.  They support all the standard ORM methods, e.g.
+  collection.datasources       -> an array of LucidWorks::Datasource
+  collection.datasource(123)   -> a single LucidWorks::Datasource
+  datasource = collection.create_datasource(
+    :crawler => LucidWorks::Datasource::CRAWLERS['web'],
+    :type => 'web',
+    :name => "example.com",
+    :url => "http://example.com/",
+    :crawl_depth => 1
+  )
+Note that the latter does not start a crawl of the datasource.
+To delete all the data crawled from a data-source:
+  datasource.empty!
+=== The ORM
+This library implements a simple ORM (object-relational mapper) on top of the LucidWorks REST API which behaves somewhat like ActiveResource/ActiveRecord (if you want to know why we didn't just use ActiveResource, see the Rationale section).
+=== Base
+LucidWorks::Base is the ORM foundation of this library.  It supports many of the ActiveRecord-style methods, e.g. given a Thing model:
+  class Thing < LucidWorks::Base
+  end
+Then Thing will have the following class methods:
+  thing = Thing.new(:attrib => value, :parent => parent)        -> unsaved Thing
+  Thing.create(:attr => value, ..., :parent => parent)          -> saved Thing
+  Thing.find(:all, :parent => parent)                           -> Array of Thing
+  Thing.find(id, :parent => parent)                             -> a Thing
+The 'parent' must be another LucidWorks::Base model or a LucidWorks::Server;  this is only required when the class is used stand-alone.  If the model is created/retrieved from an association, this value is set for you automatically.
+  thing.save                                                    -> true/false
+  thing.destroy
+==== Has_many associations
+The has_many association is used to associate a resource with another collection resource.  Given:
+  class Thing < LucidWorks::Base
+    has_many :others
+  end
+Then
+  thing.others                            -> array of Other
+  thing.other(id)                         -> an Other
+  thing.build_other(:attr => val, ...)    -> an unsaved Other
+  thing.create_other(:attr => val, ...)   -> saved Other
+==== Has_one associations
+The has_one association is used to associate a resource with another singleton resource that is transient, i.e. can be created and destroyed.
+  class Thing < LucidWorks::Base
+    has_one :whatnot
+  end
+  class Whatnot < LucidWorks::Base
+    self.singleton = true
+    belongs_to :thing
+  end
+Then
+  thing.whatnot                           -> a retrieved Whatnot
+  thing.build_whatnot                     -> an unsaved Whatnot
+==== Has_singleton associations
+The has_singleton association is used to associate a resource with another non-transient singleton resource, i.e. one that always exists and is not removed by calling destroy.
+  class Thing < LucidWorks::Base
+    has_one :whatnot
+  end
+  class Whatnot < LucidWorks::Base
+    self.singleton = true
+    belongs_to :thing
+  end
+Then
+  thing.whatnot                           -> an unsaved Whatnot
+==== Belongs_to associations
+The belongs_to association augments the model with methods to access its parent. Given:
+  class Whatnot < LucidWorks::Base
+    self.singleton = true
+    belongs_to :thing
+  end
+Then:
+  whatnot.thing         -> A Thing
+=== Rationale
+Originally this library started out as a set of ActiveResource classes.  This required a lot of hacking of ActiveResource as ActiveResource makes a lot of assumptions about the way a REST API should work - it's basically just designed to talk to Rails applications - and many REST APIs, including this one, don't conform to those rules.  Among the changes required to ActiveResource were:
+- Don't require attributes always be nested inside :resource => on create and update.
+- Allow client-side generation of a resource ID during create.
+- Support has_one and has_many associations.
+However eventually this strategy hit a brick wall that would have been extremely expensive to hurdle.  We needed the following features:
+- The ability to talk to the same API on more than one server simultaneously.
+- Support file uploads using multi-part post.
+Given the design of ActiveResource these would have been expensive to implement and it became simpler to just write a simple ORM by marrying ActiveModel and RestClient.
+== Maintainers
+* Sam Pierson (http://github.com/sampierson)
+== License
+Copyright 2011 Lucid Imagination
+http://lucidimagination.com
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this software except in compliance with the License.
+You may obtain a copy of the License at
+   http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.

data/Rakefile CHANGED

@@ -1,7 +1,13 @@
 require 'bundler'
+require 'rake/rdoctask'
 Bundler::GemHelper.install_tasks
-desc "Create RDoc documentation"
-task :doc do
-  system 'rdoc'
+desc 'Generate documentation for the lucid_works library.'
+Rake::RDocTask.new(:rdoc) do |rdoc|
+  rdoc.rdoc_dir = 'rdoc'
+  rdoc.title    = 'LucidWorks-Ruby'
+  rdoc.main     = 'README.rdoc'
+  rdoc.rdoc_files.include('README.rdoc')
+  rdoc.rdoc_files.include('lib/**/*.rb')
 end
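
Note on the Rakefile change: the new task relies on `rake/rdoctask`, which Rake deprecated in the 0.9 series and later removed, so this Rakefile will likely fail to load on a current Rake. A minimal sketch of the same task on a newer toolchain, assuming the `rdoc` gem's own `rdoc/task` is available (the `Bundler::GemHelper.install_tasks` line is left out here because it needs the gemspec):

```ruby
require 'rake'
require 'rdoc/task'   # replacement for the removed rake/rdoctask

# Same settings as the RDocTask block in the diff, expressed with RDoc::Task.
RDoc::Task.new(:rdoc) do |rdoc|
  rdoc.rdoc_dir = 'rdoc'
  rdoc.title    = 'LucidWorks-Ruby'
  rdoc.main     = 'README.rdoc'
  rdoc.rdoc_files.include('README.rdoc')
  rdoc.rdoc_files.include('lib/**/*.rb')
end
```

With this in place, `rake rdoc` should generate the documentation into the `rdoc` directory, as the 0.3.9 task intends.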

data/config/locales/en.yml ADDED

@@ -0,0 +1,106 @@
+---
+en:
+  activemodel:
+    models:
+      lucid_works:
+        collection:
+          one: Collection
+          other: Collections
+          settings:
+            one: Settings
+            other: Settings
+            de_duplication:
+              'off': 'Off'
+              overwrite: Overwrite
+              tag: Tag
+        datasource:
+          one: Data-source
+          other: Data-sources
+          status:
+            crawl_state:
+              aborted: Aborted
+              aborting: Aborting
+              exception: Exception
+              finished: Finished
+              idle: Idle
+              running: Running
+              stopped: Stopped
+              stopping: Stopping
+          type:
+            file: Local Filesystem
+            jdbc: Database
+            sharepoint: Sharepoint
+            solrxml: Solr XML
+            web: Web Site
+    attributes:
+      lucid_works:
+        collection:
+          name: Name
+          info:
+            collection_name: Collection name
+            data_dir: Data directory
+            free_disk_bytes: Free disk bytes
+            free_disk_space: Free disk space
+            index_directory: Index directory
+            index_has_deletions: Index has deletions
+            index_is_current: Index is current
+            index_is_optimized: Optimized
+            index_last_modified: Index last modified
+            index_max_doc: Index max doc
+            index_num_docs: Documents indexed
+            index_size: Index size
+            index_size_bytes: Index size
+            index_version: Index version
+            instance_dir: Instance directory
+            root_dir: Root directory
+            total_disk_bytes: Total disk bytes
+            total_disk_space: Total disk space
+          settings:
+            auto_complete: Auto complete
+            boosts: Boosts
+            boost_recent: Boost recent
+            click_boost_data: Click boost data
+            click_boost_field: Click boost field
+            click_enabled: Click scoring enabled
+            default_sort: Default sort
+            de_duplication: De-duplication
+            display_facets: Display facets
+            elevations: Elevations
+            index_time_stopwords: Don't index stop words
+            query_parser: Query parser
+            query_time_stopwords: Include stop words in searches
+            query_time_synonyms: Use synonyms
+            search_server_list: Search server list
+            show_similar: Show similar
+            spellcheck: Spell-check
+            ssl: SSL
+            stopword_list: Stopword list
+            synonym_list: Synonym list
+            unknown_type_handling: Default field type
+            unsupervised_feedback: Unsupervised feedback
+            unsupervised_feedback_emphasis: Unsupervised feedback emphasis
+            update_server_list: Update server list
+        datasource:
+          type: Type
+          bounds: Constrain to
+          max_bytes: Skip files larger than
+          history:
+            crawlStarted: Started
+            crawlStopped: Stopped
+            crawlState: State
+            numNew: New
+            numUpdated: Updated
+            numDeleted: Deleted
+            numUnchanged: Unchanged
+            numFailed: Failed
+          status:
+            crawlStarted: Last crawl started
+            crawlStopped: Last crawl stopped
+            crawlState: State
+            doc_count: Documents indexed
+            jobId: Job ID
+            numNew: New docs
+            numUpdated: Updated docs
+            numDeleted: Deleted docs
+            numUnchanged: Unchanged docs
+            numFailed: Failed docs
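
Note on the locale file: the keys are nested by model and attribute, so a label is addressed by a dotted path. The sketch below uses only Ruby's standard `yaml` library on a small fragment copied from the file above; the `label_for` helper is hypothetical and is not part of lucid_works or of the I18n machinery a real application would use. It also shows why the file quotes `'off'`: Ruby's YAML parser reads a bare `off` as a boolean.

```ruby
require 'yaml'

# A few keys copied from config/locales/en.yml, kept in the same nesting.
LOCALE_FRAGMENT = <<~YAML
  en:
    activemodel:
      models:
        lucid_works:
          collection:
            settings:
              de_duplication:
                'off': 'Off'
                overwrite: Overwrite
          datasource:
            status:
              crawl_state:
                idle: Idle
                running: Running
YAML

TRANSLATIONS = YAML.load(LOCALE_FRAGMENT)

# Hypothetical helper: walk the nested hash along a dotted key.
def label_for(translations, dotted_key)
  translations.dig(*dotted_key.split('.'))
end

running = label_for(TRANSLATIONS,
  'en.activemodel.models.lucid_works.datasource.status.crawl_state.running')
off_label = label_for(TRANSLATIONS,
  'en.activemodel.models.lucid_works.collection.settings.de_duplication.off')

# Without the quotes, both the key and the value would be parsed as booleans.
UNQUOTED = YAML.load('off: Off')
```

Here `running` is the string "Running" and `off_label` is "Off", whereas `UNQUOTED` has the boolean `false` as both key and value.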

data/lib/lucid_works/associations.rb CHANGED

@@ -8,9 +8,12 @@ module LucidWorks
       # Specifies a singleton child resource.
       #
       # In the parent resource creates methods:
-      #   child  - load and cache the child. Subsequent calls will access the cached value.
-      #   child! - load and cache the child, ignoring existing cached value if present.
-      #
+      #   if option :has_content is true (default)
+      #     child       - load and cache the child. Subsequent calls will access the cached value.
+      #     child!      - load and cache the child, ignoring existing cached value if present.
+      #     build_child - create a new, unsaved resource
+      #   if option :has_content is false
+      #     child       - create a new, unsaved resource
       # === Options
       #
       # The declaration can also include an options hash to specialize the behavior of the association.
@@ -21,6 +24,13 @@ module LucidWorks
       #   from the association name, e.g.
       #     has_one :info, :class_name => :collection_info # use CollectionInfo class
       #     has_one :foo,  :class_name => :'foo/bar'       # use Foo::Bar class
+      # [:has_content]
+      #   Changes the behavior of the .<resource> association method:
+      #   If set to true (default), indicates that this resource may be retrieved using a GET, and
+      #   the .<resource> method will retrieve it.
+      #   If set to false, this resource may not be retrieved using a GET, and the .<resource> method
+      #   will instead build and return a new, unsaved model. This is useful for pseudo-resources that only
+      #   provide actions, not data.
       #
       def has_one(*arguments)
         options = arguments.last.is_a?(Hash) ? arguments.pop : {}
@@ -29,7 +39,6 @@ module LucidWorks
         end
       end
-      #
       # Specifies a child resource.
       #
       # e.g. for Blog has_many posts
@@ -39,31 +48,11 @@ module LucidWorks
       #   posts
       #   post(id)
       #
-      def has_many(resources, options = {})
-        resource = resources.to_s.singularize
-        resource_class_name = (options[:class_name] || resource).to_s.classify
-        class_eval <<-EOF, __FILE__, __LINE__ + 1
-          def #{resources}(options={})
-            @#{resources} || #{resources}!
-          end
-          def #{resources}!(options={})
-            @#{resources} = #{resource_class_name}.all(options.merge :parent => self)
-          end
-          def #{resource}(id, options={})
-            #{resource_class_name}.find(id, options.merge(:parent => self))
-          end
-          def create_#{resource}(options = {})
-            #{resource_class_name}.create(options.merge :parent => self)
-          end
-          def build_#{resource}(options = {})
-            #{resource_class_name}.new(options.merge :parent => self)
-          end
-        EOF
+      def has_many(*arguments)
+        options = arguments.last.is_a?(Hash) ? arguments.pop : {}
+        arguments.each do |resources|
+          define_has_many resources, options
+        end
       end
       # Specifies a parent resource.
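
Note on this file's changes: `has_many` now takes several association names plus an optional trailing options hash, and `has_one` gains a `:has_content` option. The following is a simplified, self-contained sketch of that pattern in plain Ruby, not the library's code. `MiniAssociations`, `Record` and the model classes are hypothetical stand-ins, and `chomp('s')` / `capitalize` are naive substitutes for the ActiveSupport `singularize`, `classify` and `camelize` calls used in the real file.

```ruby
module MiniAssociations
  # Any number of names, then an optional options hash shared by all of them.
  def has_many(*arguments)
    options = arguments.last.is_a?(Hash) ? arguments.pop : {}
    arguments.each { |resources| define_has_many(resources, options) }
  end

  def has_one(*arguments)
    options = arguments.last.is_a?(Hash) ? arguments.pop : {}
    arguments.each { |resource| define_has_one(resource, options) }
  end

  private

  def define_has_many(resources, options = {})
    resource   = resources.to_s.chomp('s')                        # naive singularize
    class_name = (options[:class_name] || resource).to_s.capitalize
    class_eval <<-EOF, __FILE__, __LINE__ + 1
      def #{resources}(options = {})
        @#{resources} ||= #{class_name}.all(options.merge(:parent => self))
      end
      def build_#{resource}(options = {})
        #{class_name}.new(options.merge(:parent => self))
      end
    EOF
  end

  def define_has_one(resource, options = {})
    class_name = (options[:class_name] || resource).to_s.capitalize
    if options[:has_content] == false
      # Nothing to GET: hand back a fresh, unsaved child on every call.
      class_eval <<-EOF, __FILE__, __LINE__ + 1
        def #{resource}
          #{class_name}.new(:parent => self)
        end
      EOF
    else
      # Retrievable resource: fetch once, then reuse the cached child.
      class_eval <<-EOF, __FILE__, __LINE__ + 1
        def #{resource}
          @#{resource} ||= #{class_name}.find(:parent => self)
        end
      EOF
    end
  end
end

# Stand-in model: records what it was built with instead of calling a server.
class Record
  attr_reader :attributes
  def initialize(attributes = {})
    @attributes = attributes
  end
  def self.all(options = {})
    [new(options)]
  end
  def self.find(options = {})
    new(options.merge(:fetched => true))
  end
end

class Post < Record; end
class Comment < Record; end
class Info < Record; end
class Index < Record; end

class Blog
  extend MiniAssociations
  has_many :posts, :comments
  has_one  :info
  has_one  :index, :has_content => false
end
```

In this sketch `Blog.new.info` returns the same fetched object on repeated calls, while `Blog.new.index` builds a new unsaved object each time, which mirrors the behaviour the `:has_content` documentation describes for pseudo-resources that only provide actions.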
@@ -99,14 +88,50 @@ module LucidWorks
       def define_has_one(resource, options={})
         resource_class_name = (options[:class_name] || resource).to_s.camelize
+        if options[:has_content] == false
+          class_eval <<-EOF1, __FILE__, __LINE__ + 1
+            def #{resource}                                                # def resource
+              #{resource_class_name}.new(:parent => self)                  #   Child.new(:parent => self)
+            end                                                            # end
+          EOF1
+        else
+          class_eval <<-EOF2, __FILE__, __LINE__ + 1
+            def #{resource}                                                # def resource
+              @#{resource} || #{resource}!                                 #   @resource || resource!
+            end                                                            # end
+            def #{resource}!                                               # def resource!
+              @#{resource} = #{resource_class_name}.find(:parent => self)  #   @resource = Resource.find(:parent => self)
+            end                                                            # end
+            def build_#{resource}(options = {})
+              #{resource_class_name}.new(options.merge :parent => self)
+            end
+          EOF2
+        end
+      end
+      def define_has_many(resources, options = {})
+        resource = resources.to_s.singularize
+        resource_class_name = (options[:class_name] || resource).to_s.classify
         class_eval <<-EOF, __FILE__, __LINE__ + 1
-          def #{resource}                                                # def resource
-            @#{resource} || #{resource}!                                 #   @resource || resource!
-          end                                                            # end
+          def #{resources}(options={})
+            @#{resources} || #{resources}!(options)
+          end
-          def #{resource}!                                               # def resource!
-            @#{resource} = #{resource_class_name}.find(:parent => self)  #   @resource = Resource.find(:parent => self)
-          end                                                            # end
+          def #{resources}!(options={})
+            @#{resources} = #{resource_class_name}.all(options.merge :parent => self)
+          end
+          def #{resource}(id, options={})
+            #{resource_class_name}.find(id, options.merge(:parent => self))
+          end
+          def create_#{resource}(options = {})
+            #{resource_class_name}.create(options.merge :parent => self)
+          end
           def build_#{resource}(options = {})
             #{resource_class_name}.new(options.merge :parent => self)