tvdeyen-ferret 0.11.8.1

Files changed (3)
  1. data/README +90 -0
  2. data/TODO +109 -0
  3. metadata +70 -0
data/README ADDED
@@ -0,0 +1,90 @@
1
+ = Ferret
2
+
3
+ Ferret is a Ruby search library inspired by the Apache Lucene search engine for
4
+ Java (http://jakarta.apache.org/lucene/). In the same way as Lucene, it is not
5
+ a standalone application, but a library you can use to index documents and
6
+ search for things in them later.
7
+
8
+ == Requirements
9
+
10
+ * Ruby 1.8
11
+ * C compiler to build the extension. Tested with gcc, VC6
12
+ * make (or nmake on windows)
13
+
14
+ == Installation
15
+
16
+ $ sudo gem install ferret
17
+
18
+ If you don't have rubygems installed you can still install Ferret. Just
19
+ download one of the zipped up versions of Ferret, unzip it and change into the
20
+ unzipped directory. Then run the following set of commands:
21
+
22
+ $ ruby setup.rb config
23
+ $ ruby setup.rb setup
24
+ $ sudo ruby setup.rb install
25
+
26
+ == Usage
27
+
28
+ You can read the TUTORIAL which you'll find in the same directory as this
29
+ README. You can also check the following modules for more specific
30
+ documentation.
31
+
32
+ * Ferret::Analysis: for more information on how the data is processed when it
33
+ is tokenized. There are a number of things you can do with your data such as
34
+ adding stop lists or perhaps a porter stemmer. There are also a number of
35
+ analyzers already available and it is almost trivial to create a new one
36
+ with a simple regular expression.
37
+
38
+ * Ferret::Search: for more information on querying the index. There are a
39
+ number of already available queries and it's unlikely you'll need to create
40
+ your own. You may however want to take advantage of the sorting or filtering
41
+ abilities of Ferret to present your data the best way you see fit.
42
+
43
+ * Ferret::Document: to find out how to create documents. This part of Ferret
44
+ is relatively straightforward. If you know how Strings, Hashes and Arrays work
45
+ then you'll be able to create Documents.
46
+
47
+ * Ferret::QueryParser: if you want to find out more about what you can do with
48
+ Ferret's Query Parser, this is the place to look. The query parser is one
49
+ area that could use a bit of work so please send your suggestions.
50
+
51
+ * Ferret::Index: for more advanced access to the index you'll probably want to
52
+ use the Ferret::Index::IndexWriter and Ferret::Index::IndexReader. This is
53
+ the place to look for more information on them.
54
+
55
+ * Ferret::Store: This is the module used to access the actual index storage
56
+ and won't be of much interest to most people.
57
+
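The Analysis and Document notes above can be illustrated without Ferret itself. The following is a minimal plain-Ruby sketch, not Ferret's API: `STOP_WORDS`, `analyze`, `DOCS` and `build_index` are hypothetical names chosen for illustration. It shows documents expressed as ordinary Hashes, a regular-expression tokenizer with a stop list, and a tiny inverted index built from the resulting tokens.

```ruby
# Plain-Ruby illustration only; none of these names belong to Ferret.
STOP_WORDS = %w[a an and the of in to is].freeze

# Tokenize with a simple regular expression, lower-case the text,
# and drop any token that appears in the stop list.
def analyze(text)
  text.downcase.scan(/[a-z0-9]+/).reject { |token| STOP_WORDS.include?(token) }
end

# Documents are just Hashes; a field holds a String or an Array of Strings.
DOCS = [
  { :title => "The Art of Indexing", :tags => ["search", "ruby"] },
  { :title => "Searching in Ruby",   :tags => ["ruby"] }
].freeze

# Build an inverted index: token => list of document numbers containing it.
def build_index(docs)
  index = Hash.new { |hash, token| hash[token] = [] }
  docs.each_with_index do |doc, doc_num|
    doc.each_value do |value|
      Array(value).each do |text|
        analyze(text).each { |token| index[token] |= [doc_num] }
      end
    end
  end
  index
end

p analyze("The Art of Indexing")
p build_index(DOCS)["ruby"]
```

There is no stemming here, so "searching" and "search" stay distinct tokens; that is the kind of gap a porter stemmer in the analysis chain would close.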
58
+ === Performance
59
+
60
+ We are unaware of any alternatives that can out-perform Ferret while still
61
+ matching it in features.
62
+
63
+ == Contact
64
+
65
+ For bug reports and patches I have set up Trac here:
66
+
67
+ http://ferret.davebalmain.com/trac
68
+
69
+ Queries, discussion etc. should be addressed to the mailing lists here:
70
+
71
+ http://rubyforge.org/projects/ferret/
72
+
73
+ Alternatively you could create a new page for discussion on the Ferret wiki:
74
+
75
+ http://ferret.davebalmain.com/trac
76
+
77
+ Of course, since Ferret was ported from Apache Lucene, most of what you can
78
+ do with Lucene you can also do with Ferret.
79
+
80
+ == Authors
81
+
82
+ [<b>David Balmain</b>] Port to Ruby
83
+
84
+ [The Apache Software Foundation (Doug Cutting and friends)] Original Apache Lucene
85
+
86
+ == License
87
+
88
+ Ferret is available under an MIT-style license.
89
+
90
+ :include: MIT-LICENSE
data/TODO ADDED
@@ -0,0 +1,109 @@
1
+ TODO
2
+ ====
3
+ * C
4
+ - IMPORTANT:
5
+ + FIX file descriptor overflow. See Tickets #341 and #343
6
+ - add .. operator to query parser. For example, [100 200] could be written as
7
+ 100..200 or 100...201 like in Ruby Ranges
8
+ - remove exception handling from C code. All errors to be handled by return
9
+ values.
10
+ - Move to sqlite's locking model. Ferret should work fine in a multi-process
11
+ environment.
12
+ - Add optional logging. To be enabled at compilation time, perhaps?
13
+ - Add support for changing zlib and bzlib compression parameters
14
+ - Improve unit test coverage to 100%
15
+ - Add benchmark suite
16
+ - Add Rakefile for development purposes
17
+ + task to publish gcov and benchmark results to ferret wiki
18
+ - Index rebuilding of old versioned indexes.
19
+ - Add a globally accessible, threadsafe symbol table. This will be very
20
+ useful for storing field names so that no objects need to strdup the
21
+ field-names but can just store the symbol representative instead.
22
+ + this has been done but it can be improved using actual Symbol structs
23
+ instead of plain char*
24
+ - Make threading optional at compile time
25
+ - to_json should limit output to prevent memory overflow on large indexes.
26
+ Perhaps we could use some type of buffered read for this.
27
+ - Make BitVector run as fast as bitset from C++ STL. See;
28
+ c/benchmark/bm_bitvector.c
29
+ - Add a symbol table for field names. This will mean that we won't need to
30
+ worry about mallocing and freeing field names which happens all over the
31
+ place.
32
+ - Divide the headers into public and private (the private headers to be
33
+ stored in the src directory).
34
+ - Group-by search. ie you should be able to pass a field to group search
35
+ results by
36
+ - Auto-loading of documents during search. ie actual documents get returned
37
+ instead of document numbers.
38
+
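The `..` operator item in the list above relies on Ruby Range semantics: two dots include the end point and three dots exclude it, so 100..200 and 100...201 cover the same integers. A minimal sketch of how such text might be normalized to the inclusive bounds of [100 200] follows. `parse_range` is a hypothetical helper, not part of Ferret's query parser, and it assumes non-negative integer bounds only.

```ruby
# Hypothetical helper: turn "100..200" or "100...201" into an inclusive
# [lower, upper] pair, matching what the [100 200] syntax expresses.
def parse_range(text)
  match = /\A(\d+)(\.{2,3})(\d+)\z/.match(text)
  raise ArgumentError, "not a range: #{text.inspect}" unless match

  lower = Integer(match[1], 10)
  upper = Integer(match[3], 10)
  upper -= 1 if match[2] == "..."  # three dots exclude the end point
  [lower, upper]
end

p parse_range("100..200")
p parse_range("100...201")
p (100..200).to_a == (100...201).to_a
```

The decrement only makes sense for integer terms; string or date ranges would need an exclusive-upper-bound flag instead.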
39
+ * Ruby bindings
40
+ - argument checking for every method. We need a new api for argument checking
41
+ so that the arguments get checked at the start of each method that could
42
+ cause a segfault.
43
+ - improve memory management. It is way too complex at the moment. I also need
44
+ to document how it works so that other developers understand what is going
45
+ on.
46
+ - Replace Data_Wrap_Struct with ferret alternative which handles rewrapping
47
+ of structs automatically and also knows when to release a struct by using
48
+ refcounting.
49
+
50
+ * Ruby
51
+ - integrate rcov
52
+ - improve unit test coverage to 100%
53
+
54
+ * Documentation.
55
+ - generate Ruby binding documentation with custom build template similar to
56
+ jaxdoc http://rubyforge.org/projects/jaxdoc
57
+ - all documentation should meet DOCUMENTATION_STANDARDS
58
+ - documentation in C code to be generated by doxygen
59
+
60
+ Someday Maybe
61
+ =============
62
+ * apply for Google Summer of Code 2009
63
+ * optimize read and write vint
64
+ - test the following outside of ferret before implementing
65
+ - perform a binary scan using bit-wise or to find out how many bytes need
66
+ to be written
67
+ - if the write/read will overflow the buffer, split it into two, refreshing
68
+ the buffer in between
69
+ - use Duff's device to write bytes now that we know how many we need
70
+ * add a super fast language based dictionary compression
71
+ * add portable stacktrace function. Perhaps implement as an external library.
72
+ - See http://www.nongnu.org/libunwind/
73
+ - See http://www.tlug.org.za/wiki/index.php/Obtaining_a_stack_trace_in_C_upon_SIGSEGV
74
+ * investigate unscored searching
75
+ * user defined sorting
76
+ * Fix highlighting to work for external fields
77
+ * investigate faster string hashing method
78
+
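The vint item above proposes working out the byte count before writing anything. The sketch below assumes the Lucene-style variable-length integer layout (seven payload bits per byte, low-order bits first, high bit set when more bytes follow). It is written in Ruby for readability rather than in Ferret's C, and the names `vint_size`, `write_vint` and `read_vint` are illustrative only. Buffer splitting and Duff's device are not shown.

```ruby
# Number of bytes a non-negative integer occupies as a vint.
def vint_size(value)
  raise ArgumentError, "negative value" if value < 0

  size = 1
  size += 1 while (value >>= 7) > 0
  size
end

# Encode: every byte except the last carries the continuation bit 0x80.
def write_vint(value)
  bytes = []
  (vint_size(value) - 1).times do
    bytes << ((value & 0x7f) | 0x80)
    value >>= 7
  end
  bytes << value
  bytes.pack("C*")
end

# Decode: accumulate 7 bits at a time until a byte without 0x80 is seen.
def read_vint(data)
  value = 0
  shift = 0
  data.each_byte do |byte|
    value |= (byte & 0x7f) << shift
    return value if (byte & 0x80).zero?
    shift += 7
  end
  raise ArgumentError, "truncated vint"
end

p vint_size(127), vint_size(128)
p write_vint(300).bytes
p read_vint(write_vint(300))
```

Knowing the size up front is what lets a writer check once whether the value fits in the remaining buffer, instead of checking after every byte.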
79
+ Done
80
+ ====
81
+ * add rake install task
82
+ * FIX :create parameter so that it only deletes the files owned by Ferret.
83
+ * fix compression. Currently nothing is happening if you set a field to
84
+ :compress. I guess we'll just assume zlib is installed, as I think it has to
85
+ be for Ruby to be installed.
86
+ * add bzlib support
87
+ * integrate gcov
88
+ * add a field cache to IndexReader
89
+ * setup email alerts for svn commits
90
+ * Ranged, unordered searching. Ie search through the index until you have the
91
+ required number of documents and then break. This will require the ability to
92
+ start searches from a particular doc-num.
93
+ + See searcher_search_unordered in the C code and Searcher#scan in Ruby
94
+ * improve unit test code. I'd like to implement some way to print out a stack
95
+ trace when a test fails so that it is easy to find the source of the error.
96
+ * catch segfaults and print stack trace so users can post helpful bug tickets.
97
+ again, see the same links for adding stacktrace to unit tests.
98
+ * Add string Sort descriptor
99
+ * fix memory bug
100
+ * add MultiReader interface
101
+ * add lexicographical sort (byte sort)
102
+ * Add highlighting
103
+ * add field compression
104
+ * Fix highlighting to work for compressed fields
105
+ * Add Ferret::Index::Index
106
+ * Fix:
107
+ + Working Query: field1:value1 AND NOT field2:value2
108
+ + Failing Query: field1:value1 AND ( NOT field2:value2 )
109
+ * update benchmark suite to use getrusage
metadata ADDED
@@ -0,0 +1,70 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: tvdeyen-ferret
3
+ version: !ruby/object:Gem::Version
4
+ hash: 53
5
+ prerelease: false
6
+ segments:
7
+ - 0
8
+ - 11
9
+ - 8
10
+ - 1
11
+ version: 0.11.8.1
12
+ platform: ruby
13
+ authors:
14
+ - David Balmain
15
+ autorequire:
16
+ bindir: bin
17
+ cert_chain: []
18
+
19
+ date: 2010-08-25 00:00:00 +02:00
20
+ default_executable:
21
+ dependencies: []
22
+
23
+ description:
24
+ email: dbalmain@gmail.com
25
+ executables: []
26
+
27
+ extensions: []
28
+
29
+ extra_rdoc_files:
30
+ - README
31
+ - TODO
32
+ files:
33
+ - README
34
+ - TODO
35
+ has_rdoc: true
36
+ homepage: http://ferret.davebalmain.com/trac
37
+ licenses: []
38
+
39
+ post_install_message:
40
+ rdoc_options:
41
+ - --charset=UTF-8
42
+ require_paths:
43
+ - lib
44
+ required_ruby_version: !ruby/object:Gem::Requirement
45
+ none: false
46
+ requirements:
47
+ - - ">="
48
+ - !ruby/object:Gem::Version
49
+ hash: 3
50
+ segments:
51
+ - 0
52
+ version: "0"
53
+ required_rubygems_version: !ruby/object:Gem::Requirement
54
+ none: false
55
+ requirements:
56
+ - - ">="
57
+ - !ruby/object:Gem::Version
58
+ hash: 3
59
+ segments:
60
+ - 0
61
+ version: "0"
62
+ requirements: []
63
+
64
+ rubyforge_project: ferret
65
+ rubygems_version: 1.3.7
66
+ signing_key:
67
+ specification_version: 3
68
+ summary: Ferret is a port of the Java Lucene project. It is a powerful indexing and search library.
69
+ test_files: []
70
+
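The serialized specification above corresponds roughly to the following Ruby gemspec. This is a reconstruction from the YAML fields shown, not the gem's actual gemspec file; deprecated fields such as `has_rdoc` and `rubyforge_project` are left out because newer RubyGems releases may not accept them.

```ruby
require "rubygems"

# Reconstructed from the metadata fields above; illustrative only.
SPEC = Gem::Specification.new do |s|
  s.name             = "tvdeyen-ferret"
  s.version          = "0.11.8.1"
  s.authors          = ["David Balmain"]
  s.email            = "dbalmain@gmail.com"
  s.homepage         = "http://ferret.davebalmain.com/trac"
  s.summary          = "Ferret is a port of the Java Lucene project. " \
                       "It is a powerful indexing and search library."
  s.files            = %w[README TODO]
  s.extra_rdoc_files = %w[README TODO]
  s.rdoc_options     = ["--charset=UTF-8"]
  s.require_paths    = ["lib"]
end

# The version string splits into the same segments listed in the YAML.
p SPEC.version.segments
```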