RubyGems - tvdeyen-ferret - Versions diffs - 0.11.8.1 - Mend

tvdeyen-ferret 0.11.8.1

Files changed (3)

data/README +90 -0
data/TODO +109 -0
metadata +70 -0

data/README ADDED

@@ -0,0 +1,90 @@
+= Ferret
+Ferret is a Ruby search library inspired by the Apache Lucene search engine for
+Java (http://jakarta.apache.org/lucene/). In the same way as Lucene, it is not
+a standalone application, but a library you can use to index documents and
+search for things in them later.
+== Requirements
+* Ruby 1.8
+* C compiler to build the extension. Tested with gcc, VC6
+* make (or nmake on windows)
+== Installation
+  $ sudo gem install ferret
+If you don't have rubygems installed you can still install Ferret. Just
+download one of the zipped up versions of Ferret, unzip it and change into the
+unzipped directory. Then run the following set of commands:
+  $ ruby setup.rb config
+  $ ruby setup.rb setup
+  $ sudo ruby setup.rb install
+== Usage
+You can read the TUTORIAL which you'll find in the same directory as this
+README. You can also check the following modules for more specific
+documentation.
+* Ferret::Analysis: for more information on how the data is processed when it
+  is tokenized. There are a number of things you can do with your data such as
+  adding stop lists or perhaps a porter stemmer. There are also a number of
+  analyzers already available and it is almost trivial to create a new one
+  with a simple regular expression.
+* Ferret::Search: for more information on querying the index. There are a
+  number of already available queries and it's unlikely you'll need to create
+  your own. You may however want to take advantage of the sorting or filtering
+  abilities of Ferret to present your data the best way you see fit.
+* Ferret::Document: to find out how to create documents. This part of Ferret
+  is relatively straightforward. If you know how Strings, Hashes and Arrays
+  work, then you'll be able to create Documents.
+* Ferret::QueryParser: if you want to find out more about what you can do with
+  Ferret's Query Parser, this is the place to look. The query parser is one
+  area that could use a bit of work so please send your suggestions.
+* Ferret::Index: for more advanced access to the index you'll probably want to
+  use the Ferret::Index::IndexWriter and Ferret::Index::IndexReader. This is
+  the place to look for more information on them.
+* Ferret::Store: This is the module used to access the actual index storage
+  and won't be of much interest to most people.
+=== Performance
+We are unaware of any alternatives that can out-perform Ferret while still
+matching it in features.
+== Contact
+For bug reports and patches I have set up Trac here:
+  http://ferret.davebalmain.com/trac
+Queries, discussion, etc. should be addressed to the mailing lists here:
+  http://rubyforge.org/projects/ferret/
+Alternatively you could create a new page for discussion on the Ferret wiki:
+  http://ferret.davebalmain.com/trac
+Of course, since Ferret was ported from Apache Lucene, most of what you can
+do with Lucene you can also do with Ferret.
+== Authors
+[<b>David Balmain</b>] Port to Ruby
+[The Apache Software Foundation (Doug Cutting and friends)] Original Apache Lucene
+== License
+Ferret is available under an MIT-style license.
+:include: MIT-LICENSE

data/TODO ADDED

@@ -0,0 +1,109 @@
+TODO
+====
+* C
+  - IMPORTANT:
+    + FIX file descriptor overflow. See Tickets #341 and #343
+  - add .. operator to query parser. For example, [100 200] could be written as
+    100..200 or 100...201 like in Ruby Ranges
+  - remove exception handling from C code. All errors to be handled by return
+    values.
+  - Move to sqlite's locking model. Ferret should work fine in a multi-process
+    environment.
+  - Add optional logging. To be enabled at compilation time, perhaps?
+  - Add support for changing zlib and bzlib compression parameters
+  - Improve unit test coverage to 100%
+  - Add benchmark suite
+  - Add Rakefile for development purposes
+    + task to publish gcov and benchmark results to ferret wiki
+  - Index rebuilding of old versioned indexes.
+  - Add a globally accessible, thread-safe symbol table. This will be very
+    useful for storing field names so that no objects need to strdup the
+    field-names but can just store the symbol representative instead.
+    + this has been done but it can be improved using actual Symbol structs
+      instead of plain char*
+  - Make threading optional at compile time
+  - to_json should limit output to prevent memory overflow on large indexes.
+    Perhaps we could use some type of buffered read for this.
+  - Make BitVector run as fast as bitset from C++ STL. See:
+      c/benchmark/bm_bitvector.c
+  - Add a symbol table for field names. This will mean that we won't need to
+    worry about mallocing and freeing field names which happens all over the
+    place.
+  - Divide the headers into public and private (the private headers to be
+    stored in the src directory).
+  - Group-by search, i.e. you should be able to pass a field to group search
+    results by
+  - Auto-loading of documents during search, i.e. actual documents get returned
+    instead of document numbers.
+* Ruby bindings
+  - argument checking for every method. We need a new api for argument checking
+    so that the arguments get checked at the start of each method that could
+    cause a segfault.
+  - improve memory management. It is way too complex at the moment. I also need
+    to document how it works so that other developers understand what is going
+    on.
+  - Replace Data_Wrap_Struct with ferret alternative which handles rewrapping
+    of structs automatically and also knows when to release a struct by using
+    refcounting.
+* Ruby
+  - integrate rcov
+  - improve unit test coverage to 100%
+* Documentation.
+  - generate Ruby binding documentation with custom build template similar
+    to jaxdoc http://rubyforge.org/projects/jaxdoc
+  - all documentation should meet DOCUMENTATION_STANDARDS
+  - documentation in C code to be generated by doxygen
+Someday Maybe
+=============
+* apply for Google Summer of Code 2009
+* optimize read and write vint
+  - test the following outside of ferret before implementing
+  - perform a binary scan using bit-wise or to find out how many bytes need
+    to be written
+  - if the write/read will overflow the buffer, split it into two, refreshing
+    the buffer in between
+  - use Duff's device to write bytes now that we know how many we need
+* add a super fast language based dictionary compression
+* add portable stacktrace function. Perhaps implement as an external library.
+  - See http://www.nongnu.org/libunwind/
+  - See http://www.tlug.org.za/wiki/index.php/Obtaining_a_stack_trace_in_C_upon_SIGSEGV
+* investigate unscored searching
+* user defined sorting
+* Fix highlighting to work for external fields
+* investigate faster string hashing method
+Done
+====
+* add rake install task
+* FIX :create parameter so that it only deletes the files owned by Ferret.
+* fix compression. Currently nothing is happening if you set a field to
+  :compress. I guess we'll just assume zlib is installed, as I think it has to
+  be for Ruby to be installed.
+* add bzlib support
+* integrate gcov
+* add a field cache to IndexReader
+* setup email alerts for svn commits
+* Ranged, unordered searching, i.e. search through the index until you have the
+  required number of documents and then break. This will require the ability to
+  start searches from a particular doc-num.
+  + See searcher_search_unordered in the C code and Searcher#scan in Ruby
+* improve unit test code. I'd like to implement some way to print out a stack
+  trace when a test fails so that it is easy to find the source of the error.
+* catch segfaults and print stack trace so users can post helpful bug tickets.
+  Again, see the same links for adding stacktrace to unit tests.
+* Add string Sort descriptor
+* fix memory bug
+* add MultiReader interface
+* add lexicographical sort (byte sort)
+* Add highlighting
+* add field compression
+* Fix highlighting to work for compressed fields
+* Add Ferret::Index::Index
+* Fix:
+  + Working Query:  field1:value1 AND NOT field2:value2
+  + Failing Query:    field1:value1 AND ( NOT field2:value2 )
+* update benchmark suite to use getrusage
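
A note on the `..` operator item in the TODO above: it proposes that the range [100 200] could also be written as 100..200 or 100...201, "like in Ruby Ranges". The sketch below only illustrates the Ruby Range semantics the item refers to. It does not call Ferret's query parser, and the variable names are illustrative.

```ruby
# Two Ruby Range literals that cover the integers 100 through 200.
inclusive = (100..200)    # two dots: the end value is included
exclusive = (100...201)   # three dots: the end value is excluded

# Both ranges enumerate exactly the same integers.
same_members = inclusive.to_a == exclusive.to_a

puts same_members
puts inclusive.include?(200)
puts exclusive.include?(201)
```

For integer bounds the two spellings are interchangeable, which is why the TODO lists both as candidates for the same query.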

metadata ADDED

@@ -0,0 +1,70 @@
+--- !ruby/object:Gem::Specification
+name: tvdeyen-ferret
+version: !ruby/object:Gem::Version
+  hash: 53
+  prerelease: false
+  segments:
+  - 0
+  - 11
+  - 8
+  - 1
+  version: 0.11.8.1
+platform: ruby
+authors:
+- David Balmain
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2010-08-25 00:00:00 +02:00
+default_executable:
+dependencies: []
+description:
+email: dbalmain@gmail.com
+executables: []
+extensions: []
+extra_rdoc_files:
+- README
+- TODO
+files:
+- README
+- TODO
+has_rdoc: true
+homepage: http://ferret.davebalmain.com/trac
+licenses: []
+post_install_message:
+rdoc_options:
+- --charset=UTF-8
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      hash: 3
+      segments:
+      - 0
+      version: "0"
+required_rubygems_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      hash: 3
+      segments:
+      - 0
+      version: "0"
+requirements: []
+rubyforge_project: ferret
+rubygems_version: 1.3.7
+signing_key:
+specification_version: 3
+summary: Ferret is a port of the Java Lucene project. It is a powerful indexing and search library.
+test_files: []
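
The version and requirement entries in the metadata above can be reproduced with RubyGems' own classes. This is a minimal sketch that uses only values shown in the specification; the variable names are illustrative, and checking the requirement against the running interpreter is an assumption about how one might use it.

```ruby
require "rubygems"

# Version string taken from the `version:` field of the metadata.
version = Gem::Version.new("0.11.8.1")

# `segments` splits the version into its numeric parts, which mirrors
# the `segments:` list stored in the YAML.
segments = version.segments

# ">= 0" is the constraint recorded under required_ruby_version.
# It places no real restriction, so any Ruby release satisfies it.
ruby_requirement = Gem::Requirement.new(">= 0")
ruby_ok = ruby_requirement.satisfied_by?(Gem::Version.new(RUBY_VERSION))

puts segments.inspect
puts version.prerelease?
puts ruby_ok
```

The `prerelease: false` entry in the metadata corresponds to `prerelease?`, which is only true when a version contains a letter segment.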