tvdeyen-ferret 0.11.8.1
- data/README +90 -0
- data/TODO +109 -0
- metadata +70 -0
data/README
ADDED
@@ -0,0 +1,90 @@
+= Ferret
+
+Ferret is a Ruby search library inspired by the Apache Lucene search engine for
+Java (http://jakarta.apache.org/lucene/). In the same way as Lucene, it is not
+a standalone application, but a library you can use to index documents and
+search for things in them later.
+
+== Requirements
+
+* Ruby 1.8
+* C compiler to build the extension. Tested with gcc, VC6
+* make (or nmake on Windows)
+
+== Installation
+
+  $ sudo gem install ferret
+
+If you don't have rubygems installed you can still install Ferret. Just
+download one of the zipped up versions of Ferret, unzip it and change into the
+unzipped directory. Then run the following set of commands:
+
+  $ ruby setup.rb config
+  $ ruby setup.rb setup
+  $ sudo ruby setup.rb install
+
+== Usage
+
+You can read the TUTORIAL which you'll find in the same directory as this
+README. You can also check the following modules for more specific
+documentation.
+
+* Ferret::Analysis: for more information on how the data is processed when it
+  is tokenized. There are a number of things you can do with your data such as
+  adding stop lists or perhaps a porter stemmer. There are also a number of
+  analyzers already available and it is almost trivial to create a new one
+  with a simple regular expression.
+
+* Ferret::Search: for more information on querying the index. There are a
+  number of already available queries and it's unlikely you'll need to create
+  your own. You may however want to take advantage of the sorting or filtering
+  abilities of Ferret to present your data the best way you see fit.
+
+* Ferret::Document: to find out how to create documents. This part of Ferret
+  is relatively straightforward. If you know how Strings, Hashes and Arrays work
+  in Ruby, then you'll be able to create Documents.
+
+* Ferret::QueryParser: if you want to find out more about what you can do with
+  Ferret's Query Parser, this is the place to look. The query parser is one
+  area that could use a bit of work so please send your suggestions.
+
+* Ferret::Index: for more advanced access to the index you'll probably want to
+  use the Ferret::Index::IndexWriter and Ferret::Index::IndexReader. This is
+  the place to look for more information on them.
+
+* Ferret::Store: This is the module used to access the actual index storage
+  and won't be of much interest to most people.
+
+=== Performance
+
+We are unaware of any alternatives that can out-perform Ferret while still
+matching it in features.
+
+== Contact
+
+For bug reports and patches I have set up Trac here:
+
+  http://ferret.davebalmain.com/trac
+
+Queries, discussion etc. should be addressed to the mailing lists here:
+
+  http://rubyforge.org/projects/ferret/
+
+Alternatively you could create a new page for discussion on the Ferret wiki:
+
+  http://ferret.davebalmain.com/trac
+
+Of course, since Ferret was ported from Apache Lucene, most of what you can
+do with Lucene you can also do with Ferret.
+
+== Authors
+
+[<b>David Balmain</b>] Port to Ruby
+
+[The Apache Software Foundation (Doug Cutting and friends)] Original Apache Lucene
+
+== License
+
+Ferret is available under an MIT-style license.
+
+:include: MIT-LICENSE
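The Ferret::Analysis entry in the README says that a new analyzer can be built from a simple regular expression and that stop lists can be added. The sketch below illustrates that idea in plain Ruby only. It does not call Ferret, and the names `tokenize` and `STOP_WORDS` are hypothetical helpers invented for this example, not part of the library.

```ruby
# Hypothetical helper, not part of Ferret: split text into lowercase tokens
# with a regular expression, then drop every token found in the stop list.
STOP_WORDS = %w[a an and the of to in].freeze

def tokenize(text, pattern = /[[:alpha:]]+/, stop_words = STOP_WORDS)
  text.downcase.scan(pattern).reject { |token| stop_words.include?(token) }
end

tokens = tokenize("The quick brown fox jumps over the lazy dog")
puts tokens.join(" ")
```

Passing a different pattern, for example `/\S+/`, changes what counts as a token without touching the stop-list step. That separation is roughly what the README means by calling a new analyzer almost trivial to create.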
data/TODO
ADDED
@@ -0,0 +1,109 @@
+TODO
+====
+* C
+  - IMPORTANT:
+    + FIX file descriptor overflow. See Tickets #341 and #343
+  - add .. operator to query parser. For example, [100 200] could be written as
+    100..200 or 100...201 like in Ruby Ranges
+  - remove exception handling from C code. All errors to be handled by return
+    values.
+  - Move to sqlite's locking model. Ferret should work fine in a multi-process
+    environment.
+  - Add optional logging. To be enabled at compilation time, perhaps?
+  - Add support for changing zlib and bzlib compression parameters
+  - Improve unit test coverage to 100%
+  - Add benchmark suite
+  - Add Rakefile for development purposes
+    + task to publish gcov and benchmark results to ferret wiki
+  - Index rebuilding of old versioned indexes.
+  - Add a globally accessible, threadsafe symbol table. This will be very
+    useful for storing field names so that no objects need to strdup the
+    field-names but can just store the symbol representative instead.
+    + this has been done but it can be improved using actual Symbol structs
+      instead of plain char*
+  - Make threading optional at compile time
+  - to_json should limit output to prevent memory overflow on large indexes.
+    Perhaps we could use some type of buffered read for this.
+  - Make BitVector run as fast as bitset from C++ STL. See:
+    c/benchmark/bm_bitvector.c
+  - Add a symbol table for field names. This will mean that we won't need to
+    worry about mallocing and freeing field names which happens all over the
+    place.
+  - Divide the headers into public and private (the private headers to be
+    stored in the src directory).
+  - Group-by search. i.e. you should be able to pass a field to group search
+    results by
+  - Auto-loading of documents during search. i.e. actual documents get returned
+    instead of document numbers.
+
+* Ruby bindings
+  - argument checking for every method. We need a new api for argument checking
+    so that the arguments get checked at the start of each method that could
+    cause a segfault.
+  - improve memory management. It is way too complex at the moment. I also need
+    to document how it works so that other developers understand what is going
+    on.
+  - Replace Data_Wrap_Struct with ferret alternative which handles rewrapping
+    of structs automatically and also knows when to release a struct by using
+    refcounting.
+
+* Ruby
+  - integrate rcov
+  - improve unit test coverage to 100%
+
+* Documentation.
+  - generate Ruby binding documentation with custom build template similar to
+    jaxdoc http://rubyforge.org/projects/jaxdoc
+  - all documentation should meet DOCUMENTATION_STANDARDS
+  - documentation in C code to be generated by doxygen
+
+Someday Maybe
+=============
+* apply for Google Summer of Code 2009
+* optimize read and write vint
+  - test the following outside of ferret before implementing
+  - perform a binary scan using bit-wise or to find out how many bytes need
+    to be written
+  - if the write/read will overflow the buffer, split it into two, refreshing
+    the buffer in between
+  - use Duff's device to write bytes now that we know how many we need
+* add a super fast language based dictionary compression
+* add portable stacktrace function. Perhaps implement as an external library.
+  - See http://www.nongnu.org/libunwind/
+  - See http://www.tlug.org.za/wiki/index.php/Obtaining_a_stack_trace_in_C_upon_SIGSEGV
+* investigate unscored searching
+* user defined sorting
+* Fix highlighting to work for external fields
+* investigate faster string hashing method
+
+Done
+====
+* add rake install task
+* FIX :create parameter so that it only deletes the files owned by Ferret.
+* fix compression. Currently nothing is happening if you set a field to
+  :compress. I guess we'll just assume zlib is installed, as I think it has to
+  be for Ruby to be installed.
+* add bzlib support
+* integrate gcov
+* add a field cache to IndexReader
+* setup email alerts for svn commits
+* Ranged, unordered searching. I.e. search through the index until you have the
+  required number of documents and then break. This will require the ability to
+  start searches from a particular doc-num.
+  + See searcher_search_unordered in the C code and Searcher#scan in Ruby
+* improve unit test code. I'd like to implement some way to print out a stack
+  trace when a test fails so that it is easy to find the source of the error.
+* catch segfaults and print stack trace so users can post helpful bug tickets.
+  Again, see the same links for adding stacktrace to unit tests.
+* Add string Sort descriptor
+* fix memory bug
+* add MultiReader interface
+* add lexicographical sort (byte sort)
+* Add highlighting
+* add field compression
+* Fix highlighting to work for compressed fields
+* Add Ferret::Index::Index
+* Fix:
+  + Working Query: field1:value1 AND NOT field2:value2
+  + Failing Query: field1:value1 AND ( NOT field2:value2 )
+* update benchmark suite to use getrusage
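The "optimize read and write vint" item in the TODO proposes working out the number of bytes first and then writing exactly that many. The Ruby sketch below illustrates that two-step idea under stated assumptions: it assumes the Lucene-style vint layout (7 payload bits per byte, low-order group first, high bit set when more bytes follow), it handles only values that fit in 32 bits, and the names `vint_size`, `write_vint` and `read_vint` are hypothetical. It finds the size with plain comparisons rather than the bit-wise test the note mentions, and it leaves out buffer splitting and Duff's device, which only make sense in the C code.

```ruby
# Number of bytes a non-negative 32-bit integer needs as a vint.
# A small binary search over the 7-bit thresholds: at most three comparisons.
def vint_size(n)
  raise ArgumentError, "expected 0..2**32-1" unless (0..0xFFFF_FFFF).cover?(n)
  if n < (1 << 14)
    n < (1 << 7) ? 1 : 2
  elsif n < (1 << 28)
    n < (1 << 21) ? 3 : 4
  else
    5
  end
end

# Encode n, writing exactly vint_size(n) bytes.
def write_vint(n)
  size = vint_size(n)
  bytes = Array.new(size) do |i|
    chunk = (n >> (7 * i)) & 0x7F
    # Every byte except the last carries the continuation flag.
    i < size - 1 ? (chunk | 0x80) : chunk
  end
  bytes.pack("C*")
end

# Decode a vint from the start of a binary string.
def read_vint(str)
  value = 0
  str.each_byte.with_index do |byte, i|
    value |= (byte & 0x7F) << (7 * i)
    break if byte & 0x80 == 0
  end
  value
end
```

For example, `write_vint(300).bytes` yields `[172, 2]`: the low seven bits of 300 plus the continuation flag, followed by the remaining bits.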
metadata
ADDED
@@ -0,0 +1,70 @@
+--- !ruby/object:Gem::Specification
+name: tvdeyen-ferret
+version: !ruby/object:Gem::Version
+  hash: 53
+  prerelease: false
+  segments:
+  - 0
+  - 11
+  - 8
+  - 1
+  version: 0.11.8.1
+platform: ruby
+authors:
+- David Balmain
+autorequire:
+bindir: bin
+cert_chain: []
+
+date: 2010-08-25 00:00:00 +02:00
+default_executable:
+dependencies: []
+
+description:
+email: dbalmain@gmail.com
+executables: []
+
+extensions: []
+
+extra_rdoc_files:
+- README
+- TODO
+files:
+- README
+- TODO
+has_rdoc: true
+homepage: http://ferret.davebalmain.com/trac
+licenses: []
+
+post_install_message:
+rdoc_options:
+- --charset=UTF-8
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      hash: 3
+      segments:
+      - 0
+      version: "0"
+required_rubygems_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      hash: 3
+      segments:
+      - 0
+      version: "0"
+requirements: []
+
+rubyforge_project: ferret
+rubygems_version: 1.3.7
+signing_key:
+specification_version: 3
+summary: Ferret is a port of the Java Lucene project. It is a powerful indexing and search library.
+test_files: []
+
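The metadata stores the version both as the string 0.11.8.1 and as a list of integer segments, and it records the Ruby and RubyGems requirements as ">= 0". The short Ruby sketch below shows how those fields relate, using RubyGems' own `Gem::Version` and `Gem::Requirement` classes. The comparison against 0.11.8 is only an illustration chosen for this example, not something the metadata states.

```ruby
require "rubygems"

# The version string from the metadata, parsed by RubyGems.
version = Gem::Version.new("0.11.8.1")

# The integer segments correspond to the "segments:" list in the YAML.
segments = version.segments
puts segments.inspect

# Versions compare segment by segment, so the fourth segment makes this
# release newer than a plain 0.11.8 (illustrative comparison only).
puts version > Gem::Version.new("0.11.8")

# A ">= 0" requirement, as recorded for required_ruby_version and
# required_rubygems_version, accepts any version.
requirement = Gem::Requirement.new(">= 0")
puts requirement.satisfied_by?(version)
```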