lucene 0.5.0.beta.1

@@ -0,0 +1,147 @@
1
+ == 0.4.6 / 2010-08-31
2
+ * Bug fix: Using a has_one - should only delete the relationship to the old node, NOT delete the old node (#123)
3
+
4
+ == 0.4.5 / 2010-08-18
5
+ * Bug fix: When setting an indexed property = nil, raises an undefined method "root_class" exception (#122)
6
+
7
+ == 0.4.4 / 2010-08-01
8
+ * Fixed bug on traversing when using the RelationshipMixin (#121)
9
+ * BatchInserter and JRuby 1.6 - Fix iteration error with trying to modify in-place hash
10
+
11
+ == 0.4.3 / 2010-04-10
12
+ * Fixed .gitignore - make sure that we do not include unnecessary files like neo4j databases. Release 0.4.2 contained test data.
13
+ * Added synchronize around Index.new so that two threads can't modify the same index at the same time.
14
+
15
+ == 0.4.2 / 2010-04-08
16
+ * No index on properties for the initialize method bug (#116)
17
+ * Tidy up Thread Synchronization in Lucene wrapper - lucene indexing performance improvement (#117)
18
+ * Permission bug loading neo4j jar file (#118)
19
+ * Spike: Make NodeMixin ActiveModel compliant - experimental (#115)
20
+
21
+ == 0.4.1 / 2010-03-11
22
+ * Migrations (#108)
23
+ * BatchInserter (#111)
24
+ * Neo4j::Relationship.new should take a hash of properties (#110)
25
+ * Upgrade to neo4j-1.0 (#114)
26
+ * Bugfix: has_one should replace old relationship (#106)
27
+ * Bugfix: custom accessors for NodeMixin#update (#113)
28
+ * Bugfix: Indexed properties problem on extended ruby classes critical "properties indexer" (#112)
29
+
30
+ == 0.4.0 / 2010-02-06
31
+ * Performance improvements and Refactoring: Use and Extend Neo4j Java Classes (#97)
32
+ * Support for Index and Declaration of Properties on Relationships (#91)
33
+ * Upgrade to neo4j-1.0 rc (#100)
34
+ * All internal properties should be prefixed with a '_',0.4.0 (#105)
35
+ * Generate relationship accessor methods for declared has_n and has_one relationships (#104)
36
+ * New way of creating relationship - Neo4j::Relationship.new (#103)
37
+ * Neo4j#init_node method should take one or more args (#98)
38
+ * Namespaced relationships: has_one...from using the wrong has_n...to(#92)
39
+ * Neo4j::NodeMixin and Neo4j::Node should allow a hash for initialization (#99)
40
+
41
+ == 0.3.3 / 2009-11-25
42
+ * Support for a counter property on has_lists (#75)
43
+ * Support for Cascade delete. On has_n, has_one and has_list (#81)
44
+ * NodeMixin#all should work with inheritance - Child classes should have a relationship of their own. (#64)
45
+ * Support for lucene analyzers other than StandardAnalyzer (#87)
46
+ * NodeMixin initialize should accept block like docs (#82)
47
+ * Add incoming relationship should work as expected: n1.relationships.incoming(:foo) << n2 (#80)
48
+ * Delete node from a has_list relationship should work as expected (#79)
49
+ * Improve stacktraces (#94)
50
+ * Removed sideeffect of rspecs (#90)
51
+ * Add debug method on NodeMixin to print itself (#88)
52
+ * Removed to_a method (#73)
53
+ * Upgrade to neo4j-1.0b10 (#95)
54
+ * Upgrade to lucene 2.9.0 (#83)
55
+ * Refactoring: RSpecs (#74)
56
+ * Refactoring: aggregate each, renamed to property aggregator (#72)
57
+ * BugFix: neo4j gem cannot be built from the source (#86)
58
+ * BugFix: Neo4j::relationship should not raise Exception if there are no relationships (#78)
59
+
60
+ == 0.3.2 / 2009-09-17
61
+ * Added support for aggregating nodes (#65)
62
+ * Wrapped Neo4j GraphAlgo AllSimplePath (#70)
63
+ * Added traversal with traversal position (#71)
64
+ * Removed DynamicAccessors mixin, replaced by [] operator (#67)
65
+ * Impl Neo4j.all_nodes (#69)
66
+ * Upgraded Neo4j jar file to 1.0-b9
67
+ * The Neo4j#relationship method now allows a filter parameter (#66)
68
+ * Neo4j.rb now can read database not created by Neo4j.rb - does not require classname property (#63)
69
+ * REST - added an "all" value for the depth traversal query parameter (#62)
70
+ * REST - Performance improvements using the Rest Mixin (#60)
71
+
72
+ == 0.3.1 / 2009-07-25
73
+ * Feature, extension - find path between given pair of nodes (#58)
74
+ * Fix a messy exception on GET /nodes/UnknownClassName (#57)
75
+ * Bug - exception on GET /nodes/classname/rel if rel is a has_one relationship (#56)
76
+ * Bug: GET /nodes/classname missing out nodes with no properties (#55)
77
+ * Bug: Lucene sorting caused exception if there were no documents (#54)
78
+ * Bug: reindexer fails to connect nodes to the IndexNode (#53)
79
+
80
+ == 0.3.0 / 2009-06-25
81
+ * Neo4j should track node changes
82
+ * RESTful support for lucene queries, sorting and paging
83
+ * RESTful support for Relationships
84
+ * RESTful support for Node and properties
85
+ * Experimental support for Master-Slave Replication via REST
86
+ * RESTful Node representation should contain hyperlinks to relationships
87
+ * Added some handy methods like first and empty? on relationships
88
+ * Use new neo4j: neo-1.0-b8
89
+ * Add an event handler for create/delete nodes start/stop neo, update property/relationship
90
+ * The NodeMixin should behave like a hash, added [] and []= methods
91
+ * Support list topology - has_list and belongs_to_list Neo4j::NodeMixin Classmethods
92
+ * Should be possible to add relationships without declaring them (Neo4j#relationships.outgoing(:friends) << node)
93
+ * Neo4j extensions file structure, should be easy to create your own extensions
94
+ * Rename relation to relationship (Neo4j::Relations => Neo4j::Relationships, DynamicRelation => Relationship) [data incompatible change]
95
+ * Auto Transaction is now optional
96
+ * Setting Float properties fails under JRuby1.2.0
97
+ * Bug: Indexing relationships does not work
98
+ * Make the ReferenceNode include Neo4j::NodeMixin
99
+ * Added handy Neo4j class that simply includes the Neo4j::NodeMixin
100
+ * Neo4j::IndexNode now holds references to all nodes (Neo4j.ref_node -> Neo4j::IndexNode -> ...)
101
+
102
+
103
+ == 0.2.1 / 2009-03-15
104
+ * Refactoring of lucene indexing of the node space (28)
105
+ * Fixed bug on Neo4j::NodeMixin#property? (#22)
106
+
107
+
108
+ == 0.2.0 / 2009-01-20
109
+ * Impl. Neo4j::Node#traverse - enables traversal and filtering using TraversalPosition info (#17,#19)
110
+ * Impl. traversal to any depth (#15)
111
+ * Impl. traversal of several relationship types at the same time (#16)
112
+ * Fixed a Lucene timezone bug (#20)
113
+ * Lots of refactoring of the neo4j.rb traversal code and RSpecs
114
+
115
+ == 0.1.0 / 2008-12-18
116
+ * Property can now be of any type (and not only String, Fixnum, Float)
117
+ * Indexing and Query with Date and DateTime
118
+ * YARD documentation
119
+ * Properties can be removed
120
+ * A property can be set to nil (it will then be removed).
121
+
122
+ == 0.0.7 / 2008-12-10
123
+ * Added method to_param and methods on the value object needed for Ruby on Rails
124
+ * Impl. update from a value object/hash for a node
125
+ * Impl. generation of value object classes/instances from a node.
126
+ * Refactoring the Transaction handling (reuse PlaceboTransaction instances if possible)
127
+ * Removed the need to start and stop neo. It will be done automatically when needed.
128
+
129
+
130
+ == 0.0.6 / 2008-12-03
131
+ * Removed the configuration from the Neo4j.start method. It now exists in Neo4j::Config and Lucene::Config.
132
+ * Implemented sort_by method.
133
+ * Lazy loading of search result. Execute the query and load the nodes only if needed.
134
+ * Added support to use lucene query language, example: Person.find("name:foo AND age:42")
135
+ * All tests now use RAM based lucene indexes.
136
+
137
+ == 0.0.5 / 2008-11-17
138
+ * Supports keeping lucene index in memory instead of on disk
139
+ * Added support for lucene full text search
140
+ * Fixed so neo4j runs on JRuby 1.1.5
141
+ * Implemented support for reindex all instances of a node class. This is needed if the lucene index is kept in memory or if the index is changed.
142
+ * Added ReferenceNode. All nodes now have a relationship from this reference node.
143
+ * Lots of refactoring
144
+ * Added the IMDB example. It shows how to create a neo database, lucene queries and node traversals.
145
+
146
+ == 0.0.4 / 2008-10-23
147
+ * First release to rubyforge
@@ -0,0 +1,17 @@
1
+ Maintainer:
2
+ Andreas Ronge <andreas dot ronge at gmail dot com>
3
+
4
+ Contributors:
5
+ * Martin Kleppmann
6
+ * Peter Neubauer
7
+ * Jan-Felix Wittmann
8
+ * Marius Mårnes Mathiesen
9
+ * Bert Fitié
10
+ * Jan Berkel
11
+ * David Beckwith
12
+ * Johny Ho
13
+ * Carlo Cabanilla
14
+ * Anders Janmyr
15
+ * Nick Sieger
16
+ * Sean Bowman
17
+ * BrilliantArc
data/Gemfile ADDED
@@ -0,0 +1,9 @@
1
+ source :gemcutter
2
+
3
+ gemspec
4
+
5
+ gem "rake", ">= 0.8.7"
6
+ gem "rdoc", ">= 2.5.10"
7
+ gem "horo", ">= 1.0.2"
8
+ gem "rspec", ">= 2.0.0"
9
+
@@ -0,0 +1,274 @@
1
+ = Lucene.rb
2
+
3
+ Lucene.rb is a JRuby wrapper for the Lucene document database.
4
+
5
+ * Lucene (http://lucene.apache.org/java/docs/index.html) for querying and indexing.
6
+
7
+ == Installation
8
+
9
+ ==== Install JRuby
10
+ The easiest way to install JRuby is by using RVM, see http://rvm.beginrescueend.com. Otherwise check: http://kenai.com/projects/jruby/pages/GettingStarted#Installing_JRuby
11
+
12
+ == The Lucene Module
13
+
14
+ Lucene provides:
15
+ * Flexible Queries - Phrases, Wildcards, Compound boolean expressions etc...
16
+ * Field-specific Queries, e.g. title, artist, album
17
+ * Sorting
18
+ * Ranked Searching
19
+
20
+ The Lucene index is updated after the transaction commits. A query cannot find
21
+ something that was created inside the same transaction in which the query is performed.
22
+
23
+ === Lucene Document
24
+
25
+ In Lucene everything is a Document. A document can represent anything textual:
26
+ A Word Document, a DVD (the textual metadata only), or a Neo4j.rb node.
27
+ A document is like a record or row in a relational database.
28
+
29
+ The following example shows how a document can be created by using the ''<<'' operator
30
+ on the Lucene::Index class and found using the Lucene::Index#find method.
31
+
32
+ Example of how to write a document and find it:
33
+
34
+ require 'lucene'
35
+
36
+ include Lucene
37
+
38
+ # the var/myindex parameter is either a path where to store the index or
39
+ # just a key if index is kept in memory (see below)
40
+ index = Index.new('var/myindex')
41
+
42
+ # add one document (a document is like a record or row in a relational database)
43
+ index << {:id=>'1', :name=>'foo'}
44
+
45
+ # write to the index file
46
+ index.commit
47
+
48
+ # find a document with name foo
49
+ # hits is a ruby Enumeration of documents
50
+ hits = index.find{name == 'foo'}
51
+
52
+ # show the id of the first document (document 0) found
53
+ # (the document contains all stored fields - see below)
54
+ hits[0][:id] # => '1'
55
+
56
+ Notice that you have to call the commit method in order to update the index (both on-disk and in-memory indexes).
57
+ Performing several update and delete operations before a commit will give much
58
+ better performance than committing after each operation.
59
+
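To illustrate why batching before a commit helps, here is a minimal sketch in plain Ruby. BufferedIndex is a hypothetical stand-in, not the Lucene.rb implementation: it assumes writes are queued and only become searchable in a single write step at commit time.

```ruby
# Hypothetical stand-in for an index; not the Lucene.rb implementation.
class BufferedIndex
  attr_reader :flushes

  def initialize
    @committed = {}   # id => document, visible to searches
    @pending   = []   # documents queued since the last commit
    @flushes   = 0    # how many times the (expensive) write step ran
  end

  # Queue a document; it is not searchable yet.
  def <<(doc)
    @pending << doc
    self
  end

  # Apply all queued documents in a single write step.
  def commit
    return if @pending.empty?
    @pending.each { |doc| @committed[doc[:id]] = doc }
    @pending.clear
    @flushes += 1
  end

  # Search committed documents only.
  def find(field, value)
    @committed.values.select { |doc| doc[field] == value }
  end
end

index = BufferedIndex.new
index << { :id => '1', :name => 'foo' } << { :id => '2', :name => 'foo' }

before = index.find(:name, 'foo').size   # queried before the commit
index.commit
after = index.find(:name, 'foo').size    # queried after the commit

puts "before=#{before} after=#{after} flushes=#{index.flushes}"
```

Two documents cost one write step here; committing after each document would cost two.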
60
+ === Keeping indexes on disk
61
+
62
+ By default Lucene.rb keeps indexes in memory. That means that when the application restarts
63
+ the index will be gone and you have to reindex everything again.
64
+
65
+ To store indexes on file:
66
+
67
+ Lucene::Config[:store_on_file] = true
68
+ Lucene::Config[:storage_path] = '/home/neo/lucene-db'
69
+
70
+ When creating a new index the location of the index will be the Lucene::Config[:storage_path] + index path
71
+ Example:
72
+
73
+ Lucene::Config[:store_on_file] = true
74
+ Lucene::Config[:storage_path] = '/home/neo/lucene-db'
75
+ index = Index.new('/foo/lucene')
76
+
77
+ The example above will store the index at /home/neo/lucene-db/foo/lucene
78
+
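The resulting location can be reproduced with plain Ruby. This is only a sketch: it assumes the two parts are combined like ordinary file paths, and the use of File.join is an assumption rather than something the library is known to do.

```ruby
# Assumption: the storage path and the index path are joined like file paths.
storage_path = '/home/neo/lucene-db'
index_path   = '/foo/lucene'

# File.join collapses the duplicate separator where the two parts meet.
location = File.join(storage_path, index_path)
puts location
```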
79
+ === Indexing several values with the same key
80
+
81
+ Let's say a person can have several phone numbers. How do we index that?
82
+
83
+ index << {:id=>'1', :name=>'adam', :phone => ['987-654', '1234-5678']}
84
+
85
+
86
+ === Id field
87
+
88
+ All Documents must have one id field. If an id is not specified, the default will be: :id of type String.
89
+ A different id can be specified using the field_infos id_field property on the index:
90
+
91
+ index = Index.new('some/path/to/the/index')
92
+ index.field_infos.id_field = :my_id
93
+
94
+ To change the type of the my_id from String to a different type see below.
95
+
96
+ === Conversion of types
97
+
98
+ Lucene.rb can handle type conversion for you. (The Java Lucene library stores all
99
+ the fields as Strings)
100
+ For example, if you want the id field to be a Fixnum:
101
+
102
+ require 'lucene'
103
+ include Lucene
104
+
105
+ index = Index.new('var/myindex') # store the index at dir: var/myindex
106
+ index.field_infos[:id][:type] = Fixnum
107
+
108
+ index << {:id=>1, :name=>'foo'} # notice 1 is not a string now
109
+
110
+ index.commit
111
+
112
+ # find that document, hits is a ruby Enumeration of documents
113
+ hits = index.find(:name => 'foo')
114
+
115
+ # show the id of the first document (document 0) found
116
+ # (the document contains all stored fields - see below)
117
+ hits[0][:id] # => 1
118
+
119
+ If the field_info type parameter is not set then it has a default value of String.
120
+
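As a rough picture of what type conversion involves, the sketch below converts string values back into declared types using plain Ruby. The CONVERTERS table and field_types hash are hypothetical names, and the real library may convert values differently. Integer is used in place of Fixnum, which later Ruby versions removed.

```ruby
# Hypothetical converter table; the real library may convert differently.
CONVERTERS = {
  String  => ->(s) { s },
  Integer => ->(s) { Integer(s) },
  Float   => ->(s) { Float(s) }
}

# Field declarations, in the spirit of index.field_infos[:id][:type].
field_types = Hash.new(String)   # String is the default type
field_types[:id] = Integer

# What a string-only store would hand back.
stored = { :id => '1', :name => 'foo' }

# Convert each raw string using the declared type of its field.
doc = stored.each_with_object({}) do |(field, raw), out|
  out[field] = CONVERTERS.fetch(field_types[field]).call(raw)
end

p doc
```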
121
+ === Storage of fields
122
+
123
+ By default only the id field will be stored.
124
+ That means that in the example above the :name field will not be included in the document.
125
+
126
+ Example
127
+ hits = index.find('name' => 'foo')
128
+ hits[0][:id] # => 1
129
+ hits[0][:name] # => nil
130
+
131
+ Use the field info :store => true if you want a field to be stored in the index
132
+ (otherwise it will only be searchable).
133
+
134
+ Example
135
+
136
+ require 'lucene'
137
+ include Lucene
138
+
139
+ index = Index.new('var/myindex') # store the index at dir: var/myindex
140
+ index.field_infos[:id][:type] = Fixnum
141
+ index.field_infos[:name][:store] = true # store this field
142
+
143
+ index << {:id=>1, :name=>'foo'} # notice 1 is not a string now
144
+
145
+ index.commit
146
+
147
+ # find that document, hits is a ruby Enumeration of documents
148
+ hits = index.find('name' => 'foo')
149
+
150
+ # let's say hits only contains one document so we can use hits[0] for that one
151
+ # that document contains all stored fields (see below)
152
+ hits[0][:id] # => 1
153
+ hits[0][:name] # => 'foo'
154
+
155
+ === Setting field infos
156
+
157
+ As shown above you can set field infos like this
158
+
159
+ index.field_infos[:id][:type] = Fixnum
160
+
161
+ Or you can set several properties like this:
162
+
163
+ index.field_infos[:id] = {:type => Fixnum, :store => true}
164
+
165
+ ==== Tokenized
166
+
167
+ Field infos can be used to specify if the field should be tokenized.
168
+ If this value is not set then the entire content of the field will be considered as a single term.
169
+
170
+ Example
171
+
172
+ index.field_infos[:text][:tokenized] = true
173
+
174
+ If not specified, the default is 'false'
175
+
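The difference between a tokenized and an untokenized field can be pictured with a small plain-Ruby sketch. The terms_for helper is hypothetical and far simpler than a real Lucene analyzer (it only lowercases and splits on non-word characters), so treat it as an illustration of the idea only.

```ruby
# Simplified stand-in for an analyzer: lowercase and split into words.
def terms_for(value, tokenized)
  tokenized ? value.downcase.scan(/\w+/) : [value]
end

text = 'Graph Databases in Ruby'

untokenized = terms_for(text, false)  # one term: the whole string
tokenized   = terms_for(text, true)   # one term per word

p untokenized
p tokenized

# A search for a single word can only match the tokenized field.
puts tokenized.include?('ruby')
puts untokenized.include?('ruby')
```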
176
+ ==== Analyzer
177
+
178
+ Field infos can also be used to set which analyzer should be used.
179
+ If none is specified, the default analyzer - org.apache.lucene.analysis.standard.StandardAnalyzer (:standard) will be used.
180
+
181
+
182
+ index.field_infos[:code][:tokenized] = false
183
+ index.field_infos[:code][:analyzer] = :standard
184
+
185
+ The following analyzers are supported:
186
+ * :standard (default) - org.apache.lucene.analysis.standard.StandardAnalyzer
187
+ * :keyword - org.apache.lucene.analysis.KeywordAnalyzer
188
+ * :simple - org.apache.lucene.analysis.SimpleAnalyzer
189
+ * :whitespace - org.apache.lucene.analysis.WhitespaceAnalyzer
190
+ * :stop - org.apache.lucene.analysis.StopAnalyzer
191
+
192
+ For more info, check the Lucene documentation, http://lucene.apache.org/java/docs/
193
+
194
+
195
+ === Simple Queries
196
+
197
+ Lucene.rb supports searching in several fields.
198
+ Example:
199
+
200
+ # finds all documents having both name 'foo' and age 42
201
+ hits = index.find('name' => 'foo', :age=>42)
202
+
203
+ Range queries:
204
+
205
+ # finds all documents having both name 'foo' and age between 3 and 30
206
+ hits = index.find('name' => 'foo', :age=>3..30)
207
+
208
+ === Lucene Queries
209
+
210
+ If the query is a string, it is treated as a Lucene query.
211
+
212
+ hits = index.find('name:foo')
213
+
214
+ For more information see:
215
+ http://lucene.apache.org/java/2_4_0/queryparsersyntax.html
216
+
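To relate the hash form to the string form, the sketch below builds a Lucene query string from a hash of conditions. The to_lucene_query helper is hypothetical. Lucene.rb may well build query objects directly instead of strings, and values containing spaces or special characters would need quoting or escaping that is omitted here.

```ruby
# Hypothetical helper: turn a hash of conditions into Lucene query syntax.
def to_lucene_query(conditions)
  conditions.map do |field, value|
    if value.is_a?(Range)
      # Lucene writes an inclusive range as field:[low TO high]
      "#{field}:[#{value.begin} TO #{value.end}]"
    else
      "#{field}:#{value}"
    end
  end.join(' AND ')
end

query = to_lucene_query('name' => 'foo', :age => 3..30)
puts query
```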
217
+ === Advanced Queries (DSL)
218
+
219
+ The queries above can also be written in a lucene.rb DSL:
220
+
221
+ hits = index.find { (name == 'andreas') & (foo == 'bar')}
222
+
223
+ Expressions with OR (|) are supported. Example:
224
+
225
+ # find all documents with name 'andreas' or age between 30 and 40
226
+ hits = index.find { (name == 'andreas') | (age == 30..40)}
227
+
228
+ === Sorting
229
+
230
+ Sorting is specified by the 'sort_by' parameter
231
+ Example:
232
+
233
+ hits = index.find(:name => 'foo', :sort_by=>:category)
234
+
235
+ To sort by several fields:
236
+
237
+ hits = index.find(:name => 'foo', :sort_by=>[:category, :country])
238
+
239
+ Example of specifying the sort order:
240
+
241
+ hits = index.find(:name => 'foo', :sort_by=>[Desc[:category, :country], Asc[:city]])
242
+
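The effect of sorting by several fields with mixed directions can be shown in plain Ruby. The sort_docs helper and its [field, direction] key format are hypothetical. In the library itself, Lucene performs the sorting.

```ruby
# Hypothetical helper: each sort key is a [field, direction] pair.
def sort_docs(docs, keys)
  docs.sort do |a, b|
    result = 0
    keys.each do |field, direction|
      result = a[field] <=> b[field]
      result = -result if direction == :desc
      break unless result.zero?   # the first differing key decides
    end
    result
  end
end

docs = [
  { :id => 1, :category => 'b', :city => 'oslo' },
  { :id => 2, :category => 'a', :city => 'malmo' },
  { :id => 3, :category => 'b', :city => 'bergen' }
]

# Category descending, then city ascending within the same category.
sorted = sort_docs(docs, [[:category, :desc], [:city, :asc]])
ids = sorted.map { |doc| doc[:id] }
p ids
```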
243
+ === Thread-safety
244
+
245
+ The Lucene::Index is thread safe.
246
+ It guarantees that an index is not updated from two threads at the same time.
247
+
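One common way to obtain this kind of guarantee is to serialize updates with a lock. The sketch below uses Ruby's Mutex. SynchronizedIndex is a hypothetical class and only illustrates the idea, not how Lucene::Index is implemented.

```ruby
# Hypothetical index that serializes updates with a Mutex.
class SynchronizedIndex
  def initialize
    @docs = {}
    @lock = Mutex.new
  end

  # Only one thread at a time may modify the document table.
  def <<(doc)
    @lock.synchronize { @docs[doc[:id]] = doc }
    self
  end

  def size
    @lock.synchronize { @docs.size }
  end
end

index = SynchronizedIndex.new

# Four threads each add 100 documents with distinct ids.
threads = 4.times.map do |t|
  Thread.new do
    100.times { |i| index << { :id => "#{t}-#{i}" } }
  end
end
threads.each(&:join)

puts index.size
```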
248
+
249
+ === Lucene Transactions
250
+
251
+ Use the Lucene::Transaction in order to do atomic commits.
252
+ By using a transaction you do not need to call the Index.commit method.
253
+
254
+ Example:
255
+
256
+ Transaction.run do |t|
257
+ index = Index.new('var/index/foo')
258
+ index << {:id=>42, :name=>'andreas'}
259
+ t.failure # rollback
260
+ end
261
+
262
+ result = Index.new('var/index/foo').find('name' => 'andreas')
263
+ result.size.should == 0
264
+
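The rollback behaviour can be pictured with a small plain-Ruby sketch. SketchTransaction is a hypothetical class and a plain Hash stands in for the index. It assumes writes are collected and applied only if failure was never called, which may differ from how Lucene::Transaction works internally.

```ruby
# Hypothetical transaction: collect writes, apply them only on success.
class SketchTransaction
  def self.run
    tx = new
    yield tx
    tx.finish
  end

  def initialize
    @writes = []
    @failed = false
  end

  # Remember the write instead of applying it immediately.
  def write(store, doc)
    @writes << [store, doc]
  end

  # Mark the transaction for rollback.
  def failure
    @failed = true
  end

  # Apply the collected writes unless the transaction was marked as failed.
  def finish
    @writes.each { |store, doc| store[doc[:id]] = doc } unless @failed
    !@failed
  end
end

store = {}

SketchTransaction.run do |t|
  t.write(store, { :id => 42, :name => 'andreas' })
  t.failure # rollback
end
rolled_back = store.size

SketchTransaction.run { |t| t.write(store, { :id => 43, :name => 'bert' }) }
committed = store.size

puts "after rollback: #{rolled_back}, after commit: #{committed}"
```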
265
+ You can find uncommitted documents with the uncommitted index property.
266
+
267
+ Example:
268
+
269
+ index = Index.new('var/index/foo')
270
+ index.uncommited #=> [document1, document2]
271
+
272
+ Notice that even though it looks like a new Index instance was created, index.uncommited
273
+ may return a non-empty array. This is because Index.new is a singleton: a new instance object is not created.
274
+
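A class can make new behave this way by keeping a registry of instances. The sketch below is plain Ruby with a hypothetical PathIndex class. It assumes one shared instance per path, which is an assumption about the library and not confirmed by the text above.

```ruby
# Hypothetical sketch: one shared instance per path, so pending documents
# are still there after a second call to .new.
class PathIndex
  @instances = {}
  @lock = Mutex.new

  class << self
    # Return the existing instance for this path, or build and remember one.
    def new(path)
      @lock.synchronize { @instances[path] ||= super(path) }
    end
  end

  attr_reader :path, :pending

  def initialize(path)
    @path = path
    @pending = []   # documents added but not yet committed
  end

  def <<(doc)
    @pending << doc
    self
  end
end

a = PathIndex.new('var/index/foo')
a << { :id => 42, :name => 'andreas' }

b = PathIndex.new('var/index/foo')   # looks new, but is the same object
puts a.equal?(b)
puts b.pending.size
```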