lucene 0.5.0.beta.1

@@ -0,0 +1,147 @@
1
+ == 0.4.6 / 2010-08-31
2
+ * Bug fix: Using a has_one - should only delete the relationship to the old node, NOT delete the old node (#123)
3
+
4
+ == 0.4.5 / 2010-08-18
5
+ * Bug fix: When setting an indexed property = nil, raises an undefined method "root_class" exception (#122)
6
+
7
+ == 0.4.4 / 2010-08-01
8
+ * Fixed bug on traversing when using the RelationshipMixin (#121)
9
+ * BatchInserter and JRuby 1.6 - Fix iteration error with trying to modify in-place hash
10
+
11
+ == 0.4.3 / 2010-04-10
12
+ * Fixed .gitignore - make sure that we do not include unnecessary files like neo4j databases. Release 0.4.2 contained test data.
13
+ * Added synchronize around Index.new so that two threads can't modify the same index at the same time.
14
+
15
+ == 0.4.2 / 2010-04-08
16
+ * No index on properties for the initialize method bug (#116)
17
+ * Tidy up Thread Synchronization in Lucene wrapper - lucene indexing performance improvement (#117)
18
+ * Permission bug loading neo4j jar file (#118)
19
+ * Spike: Make NodeMixin ActiveModel compliant - experimental (#115)
20
+
21
+ == 0.4.1 / 2010-03-11
22
+ * Migrations (#108)
23
+ * BatchInserter (#111)
24
+ * Neo4j::Relationship.new should take a hash of properties (#110)
25
+ * Upgrade to neo4j-1.0 (#114)
26
+ * Bugfix: has_one should replace old relationship (#106)
27
+ * Bugfix: custom accessors for NodeMixin#update (#113)
28
+ * Bugfix: Indexed properties problem on extended ruby classes critical "properties indexer" (#112)
29
+
30
+ == 0.4.0 / 2010-02-06
31
+ * Performance improvements and Refactoring: Use and Extend Neo4j Java Classes (#97)
32
+ * Support for Index and Declaration of Properties on Relationships (#91)
33
+ * Upgrade to neo4j-1.0 rc (#100)
34
+ * All internal properties should be prefixed with a '_', 0.4.0 (#105)
35
+ * Generate relationship accessor methods for declared has_n and has_one relationships (#104)
36
+ * New way of creating relationship - Neo4j::Relationship.new (#103)
37
+ * Neo4j#init_node method should take one or more args (#98)
38
+ * Namespaced relationships: has_one...from using the wrong has_n...to (#92)
39
+ * Neo4j::NodeMixin and Neo4j::Node should allow a hash for initialization (#99)
40
+
41
+ == 0.3.3 / 2009-11-25
42
+ * Support for a counter property on has_lists (#75)
43
+ * Support for cascade delete on has_n, has_one and has_list (#81)
44
+ * NodeMixin#all should work with inheritance - Child classes should have a relationship of their own. (#64)
45
+ * Support for Lucene analyzers other than StandardAnalyzer (#87)
46
+ * NodeMixin initialize should accept block like docs (#82)
47
+ * Add incoming relationship should work as expected: n1.relationships.incoming(:foo) << n2 (#80)
48
+ * Delete node from a has_list relationship should work as expected (#79)
49
+ * Improve stacktraces (#94)
50
+ * Removed side effects of RSpecs (#90)
51
+ * Add debug method on NodeMixin to print itself (#88)
52
+ * Removed to_a method (#73)
53
+ * Upgrade to neo4j-1.0b10 (#95)
54
+ * Upgrade to lucene 2.9.0 (#83)
55
+ * Refactoring: RSpecs (#74)
56
+ * Refactoring: aggregate each, renamed to property aggregator (#72)
57
+ * BugFix: neo4j gem cannot be built from the source (#86)
58
+ * BugFix: Neo4j::relationship should not raise Exception if there are no relationships (#78)
59
+
60
+ == 0.3.2 / 2009-09-17
61
+ * Added support for aggregating nodes (#65)
62
+ * Wrapped Neo4j GraphAlgo AllSimplePath (#70)
63
+ * Added traversal with traversal position (#71)
64
+ * Removed DynamicAccessors mixin, replaced by [] operator (#67)
65
+ * Impl Neo4j.all_nodes (#69)
66
+ * Upgraded Neo4j jar file to 1.0-b9
67
+ * The Neo4j#relationship method now allows a filter parameter (#66)
68
+ * Neo4j.rb now can read database not created by Neo4j.rb - does not require classname property (#63)
69
+ * REST - added an "all" value for the depth traversal query parameter (#62)
70
+ * REST - Performance improvements using the Rest Mixin (#60)
71
+
72
+ == 0.3.1 / 2009-07-25
73
+ * Feature, extension - find path between given pair of nodes (#58)
74
+ * Fix a messy exception on GET /nodes/UnknownClassName (#57)
75
+ * Bug - exception on GET /nodes/classname/rel if rel is a has_one relationship (#56)
76
+ * Bug: GET /nodes/classname missing out nodes with no properties (#55)
77
+ * Bug: Lucene sorting caused exception if there were no documents (#54)
78
+ * Bug: reindexer fails to connect nodes to the IndexNode (#53)
79
+
80
+ == 0.3.0 / 2009-06-25
81
+ * Neo4j should track node changes
82
+ * RESTful support for lucene queries, sorting and paging
83
+ * RESTful support for Relationships
84
+ * RESTful support for Node and properties
85
+ * Experimental support for Master-Slave Replication via REST
86
+ * RESTful Node representation should contain hyperlinks to relationships
87
+ * Added some handy methods like first and empty? on relationships
88
+ * Use new neo4j: neo-1.0-b8
89
+ * Add an event handler for create/delete nodes start/stop neo, update property/relationship
90
+ * The NodeMixin should behave like a hash, added [] and []= methods
91
+ * Support list topology - has_list and belongs_to_list Neo4j::NodeMixin Classmethods
92
+ * Should be possible to add relationships without declaring them (Neo4j#relationships.outgoing(:friends) << node)
93
+ * Neo4j extensions file structure, should be easy to create your own extensions
94
+ * Rename relation to relationship (Neo4j::Relations => Neo4j::Relationships, DynamicRelation => Relationship) [data incompatible change]
95
+ * Auto Transaction is now optional
96
+ * Setting Float properties fails under JRuby1.2.0
97
+ * Bug: Indexing relationships does not work
98
+ * Make the ReferenceNode include Neo4j::NodeMixin
99
+ * Added handy Neo4j class that simply includes the Neo4j::NodeMixin
100
+ * Neo4j::IndexNode now holds references to all nodes (Neo4j.ref_node -> Neo4j::IndexNode -> ...)
101
+
102
+
103
+ == 0.2.1 / 2009-03-15
104
+ * Refactoring of lucene indexing of the node space (#28)
105
+ * Fixed bug on Neo4j::NodeMixin#property? (#22)
106
+
107
+
108
+ == 0.2.0 / 2009-01-20
109
+ * Impl. Neo4j::Node#traverse - enables traversal and filtering using TraversalPosition info (#17,#19)
110
+ * Impl. traversal to any depth (#15)
111
+ * Impl. traversal of several relationship types at the same time (#16)
112
+ * Fixed a Lucene timezone bug (#20)
113
+ * Lots of refactoring of the neo4j.rb traversal code and RSpecs
114
+
115
+ == 0.1.0 / 2008-12-18
116
+ * Property can now be of any type (and not only String, Fixnum, Float)
117
+ * Indexing and Query with Date and DateTime
118
+ * YARD documentation
119
+ * Properties can be removed
120
+ * A property can be set to nil (it will then be removed).
121
+
122
+ == 0.0.7 / 2008-12-10
123
+ * Added method to_param and methods on the value object needed for Ruby on Rails
124
+ * Impl. update from a value object/hash for a node
125
+ * Impl. generation of value object classes/instances from a node.
126
+ * Refactoring the Transaction handling (reuse PlaceboTransaction instances if possible)
127
+ * Removed the need to start and stop neo. It will be done automatically when needed.
128
+
129
+
130
+ == 0.0.6 / 2008-12-03
131
+ * Removed the configuration from the Neo4j.start method. It now exists in Neo4j::Config and Lucene::Config.
132
+ * Implemented sort_by method.
133
+ * Lazy loading of search result. Execute the query and load the nodes only if needed.
134
+ * Added support to use lucene query language, example: Person.find("name:foo AND age:42")
135
+ * All tests now use RAM based lucene indexes.
136
+
137
+ == 0.0.5 / 2008-11-17
138
+ * Supports keeping lucene index in memory instead of on disk
139
+ * Added support for lucene full text search
140
+ * Fixed so neo4j runs on JRuby 1.1.5
141
+ * Implemented support for reindexing all instances of a node class. This is needed if the lucene index is kept in memory or if the index is changed.
142
+ * Added ReferenceNode. All nodes now have a relationship from this reference node.
143
+ * Lots of refactoring
144
+ * Added the IMDB example. It shows how to create a neo database, lucene queries and node traversals.
145
+
146
+ == 0.0.4 / 2008-10-23
147
+ * First release to rubyforge
@@ -0,0 +1,17 @@
1
+ Maintainer:
2
+ Andreas Ronge <andreas dot ronge at gmail dot com>
3
+
4
+ Contributors:
5
+ * Martin Kleppmann
6
+ * Peter Neubauer
7
+ * Jan-Felix Wittmann
8
+ * Marius Mårnes Mathiesen
9
+ * Bert Fitié
10
+ * Jan Berkel
11
+ * David Beckwith
12
+ * Johny Ho
13
+ * Carlo Cabanilla
14
+ * Anders Janmyr
15
+ * Nick Sieger
16
+ * Sean Bowman
17
+ * BrilliantArc
data/Gemfile ADDED
@@ -0,0 +1,9 @@
1
+ source :gemcutter
2
+
3
+ gemspec
4
+
5
+ gem "rake", ">= 0.8.7"
6
+ gem "rdoc", ">= 2.5.10"
7
+ gem "horo", ">= 1.0.2"
8
+ gem "rspec", ">= 2.0.0"
9
+
@@ -0,0 +1,274 @@
1
+ = Lucene.rb
2
+
3
+ Lucene.rb is a JRuby wrapper for the Lucene document database.
4
+
5
+ * Lucene (http://lucene.apache.org/java/docs/index.html) for querying and indexing.
6
+
7
+ == Installation
8
+
9
+ ==== Install JRuby
10
+ The easiest way to install JRuby is by using RVM, see http://rvm.beginrescueend.com. Otherwise check: http://kenai.com/projects/jruby/pages/GettingStarted#Installing_JRuby
11
+
12
+ == The Lucene Module
13
+
14
+ Lucene provides:
15
+ * Flexible Queries - Phrases, Wildcards, Compound boolean expressions etc...
16
+ * Field-specific Queries, e.g. title, artist, album
17
+ * Sorting
18
+ * Ranked Searching
19
+
20
+ The Lucene index will be updated after the transaction commits. It is not possible to
21
+ query for something that was created inside the same transaction in which the query is performed.
22
+
23
+ === Lucene Document
24
+
25
+ In Lucene everything is a Document. A document can represent anything textual:
26
+ A Word Document, a DVD (the textual metadata only), or a Neo4j.rb node.
27
+ A document is like a record or row in a relational database.
28
+
29
+ The following example shows how a document can be created by using the ''<<'' operator
30
+ on the Lucene::Index class and found using the Lucene::Index#find method.
31
+
32
+ Example of how to write a document and find it:
33
+
34
+ require 'lucene'
35
+
36
+ include Lucene
37
+
38
+ # the var/myindex parameter is either a path where to store the index or
39
+ # just a key if index is kept in memory (see below)
40
+ index = Index.new('var/myindex')
41
+
42
+ # add one document (a document is like a record or row in a relational database)
43
+ index << {:id=>'1', :name=>'foo'}
44
+
45
+ # write to the index file
46
+ index.commit
47
+
48
+ # find a document with name foo
49
+ # hits is a ruby Enumeration of documents
50
+ hits = index.find{name == 'foo'}
51
+
52
+ # show the id of the first document (document 0) found
53
+ # (the document contains all stored fields - see below)
54
+ hits[0][:id] # => '1'
55
+
56
+ Notice that you have to call the commit method in order to update the index (both disk and in memory indexes).
57
+ Performing several update and delete operations before a commit will give much
58
+ better performance than committing after each operation.
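The commit behaviour described above can be pictured with a small pure-Ruby model. This is only a sketch: BufferedIndex and its methods are hypothetical names that neither use nor mirror Lucene.rb internals. It shows why documents added with << stay invisible to find until commit is called, and why a single commit can cover many writes.

```ruby
# Hypothetical stand-in for an index; not part of Lucene.rb.
class BufferedIndex
  attr_reader :commit_count

  def initialize
    @committed = []     # documents visible to searches
    @pending = []       # documents added since the last commit
    @commit_count = 0
  end

  # Queue a document; it is not searchable yet.
  def <<(doc)
    @pending << doc
    self
  end

  # Make all queued documents searchable in a single step.
  def commit
    @committed.concat(@pending)
    @pending.clear
    @commit_count += 1
  end

  # Return committed documents whose fields match every key/value pair.
  def find(query)
    @committed.select { |doc| query.all? { |key, value| doc[key] == value } }
  end
end

index = BufferedIndex.new
index << { id: '1', name: 'foo' } << { id: '2', name: 'foo' }
before = index.find(name: 'foo').size   # nothing has been committed yet
index.commit
after = index.find(name: 'foo').size    # both documents are visible now
puts "before commit: #{before}, after commit: #{after}"
```

Two documents were written, yet only one commit was needed, which is the batching pattern the paragraph above recommends.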
59
+
60
+ === Keep indexing on disk
61
+
62
+ By default Lucene.rb keeps indexes in memory. That means that when the application restarts
63
+ the index will be gone and you have to reindex everything again.
64
+
65
+ To store indexes on disk:
66
+
67
+ Lucene::Config[:store_on_file] = true
68
+ Lucene::Config[:storage_path] = '/home/neo/lucene-db'
69
+
70
+ When creating a new index, the location of the index will be Lucene::Config[:storage_path] + index path.
71
+ Example:
72
+
73
+ Lucene::Config[:store_on_file] = true
74
+ Lucene::Config[:storage_path] = '/home/neo/lucene-db'
75
+ index = Index.new('/foo/lucene')
76
+
77
+ The example above will store the index at /home/neo/lucene-db/foo/lucene
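The path arithmetic can be checked in plain Ruby. The snippet below is a sketch that assumes the two parts are simply joined; how Lucene.rb combines them internally is not shown here.

```ruby
# Assumption: the final location is the storage path joined with the index path.
storage_path = '/home/neo/lucene-db'
index_path   = '/foo/lucene'

# File.join collapses the duplicate separator between the two parts.
location = File.join(storage_path, index_path)
puts location
```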
78
+
79
+ === Indexing several values with the same key
80
+
81
+ Let's say a person can have several phone numbers. How do we index that? Pass an array as the value:
82
+
83
+ index << {:id=>'1', :name=>'adam', :phone => ['987-654', '1234-5678']}
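Lucene itself allows several values under one field name, and a search on that field can match any of them. The following sketch models that matching rule in plain Ruby; matches? is a hypothetical helper and not a Lucene.rb method.

```ruby
# Hypothetical matcher: a field matches if any of its values equals the term.
def matches?(doc, field, term)
  Array(doc[field]).include?(term)
end

doc = { id: '1', name: 'adam', phone: ['987-654', '1234-5678'] }
puts matches?(doc, :phone, '1234-5678')   # a value from the array
puts matches?(doc, :name, 'adam')         # a single-valued field
puts matches?(doc, :phone, '000-000')     # a value that was never indexed
```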
84
+
85
+
86
+ === Id field
87
+
88
+ Every document must have one id field. If no id field is specified, the default is :id, of type String.
89
+ A different id can be specified using the field_infos id_field property on the index:
90
+
91
+ index = Index.new('some/path/to/the/index')
92
+ index.field_infos.id_field = :my_id
93
+
94
+ To change the type of my_id from String to a different type, see below.
95
+
96
+ === Conversion of types
97
+
98
+ Lucene.rb can handle type conversion for you. (The Java Lucene library stores all
99
+ the fields as Strings)
100
+ For example, if you want the id field to be a Fixnum:
101
+
102
+ require 'lucene'
103
+ include Lucene
104
+
105
+ index = Index.new('var/myindex') # store the index at dir: var/myindex
106
+ index.field_infos[:id][:type] = Fixnum
107
+
108
+ index << {:id=>1, :name=>'foo'} # notice 1 is not a string now
109
+
110
+ index.commit
111
+
112
+ # find that document, hits is a ruby Enumeration of documents
113
+ hits = index.find(:name => 'foo')
114
+
115
+ # show the id of the first document (document 0) found
116
+ # (the document contains all stored fields - see below)
117
+ hits[0][:id] # => 1
118
+
119
+ If the field_info type parameter is not set then it has a default value of String.
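Since everything is stored as a String, type conversion amounts to a round trip: to_s on the way in, a type-specific parser on the way out. The sketch below illustrates that idea only. TypeSketch and its methods are hypothetical and are not Lucene.rb's conversion code, and it uses Integer where the README uses Fixnum, because Fixnum no longer exists in current Ruby versions.

```ruby
# Hypothetical helper module; not Lucene.rb's conversion code.
module TypeSketch
  PARSERS = {
    String  => ->(text) { text },
    Integer => ->(text) { Integer(text) },
    Float   => ->(text) { Float(text) }
  }.freeze

  # Everything is written to the index as a String.
  def self.to_stored(doc)
    doc.transform_values(&:to_s)
  end

  # Reading converts each field back using its declared type (default String).
  def self.from_stored(stored, types)
    stored.to_h do |field, text|
      [field, PARSERS.fetch(types.fetch(field, String)).call(text)]
    end
  end
end

types  = { id: Integer }   # plays the role of field_infos[:id][:type]
stored = TypeSketch.to_stored({ id: 1, name: 'foo' })
doc    = TypeSketch.from_stored(stored, types)
puts stored.inspect
puts doc.inspect
```

Fields without a declared type, such as :name here, come back as Strings, matching the default described above.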
120
+
121
+ === Storage of fields
122
+
123
+ By default only the id field will be stored.
124
+ That means that in the example above the :name field will not be included in the document.
125
+
126
+ Example
127
+ doc = index.find('name' => 'foo')[0]
128
+ doc[:id] # => 1
129
+ doc[:name] # => nil
130
+
131
+ Use the field info :store => true if you want a field to be stored in the index
132
+ (otherwise it will only be searchable).
133
+
134
+ Example
135
+
136
+ require 'lucene'
137
+ include Lucene
138
+
139
+ index = Index.new('var/myindex') # store the index at dir: var/myindex
140
+ index.field_infos[:id][:type] = Fixnum
141
+ index.field_infos[:name][:store] = true # store this field
142
+
143
+ index << {:id=>1, :name=>'foo'} # notice 1 is not a string now
144
+
145
+ index.commit
146
+
147
+ # find that document, hits is a ruby Enumeration of documents
148
+ hits = index.find('name' => 'foo')
149
+
150
+ # let's say hits contains only one document, so we can use hits[0] for that one
151
+ # that document contains all stored fields (see below)
152
+ hits[0][:id] # => 1
153
+ hits[0][:name] # => 'foo'
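The difference between a searchable field and a stored field can be modelled in a few lines of plain Ruby. StoreSketch is a hypothetical class written for illustration; it does not reflect how Lucene or Lucene.rb keep data internally.

```ruby
# Hypothetical model: every field is searchable, only flagged fields are returned.
class StoreSketch
  def initialize(stored_fields)
    @stored_fields = stored_fields   # fields returned in hits, e.g. [:id]
    @docs = []
  end

  def <<(doc)
    @docs << doc
    self
  end

  # Match on any field, but return only the stored fields of each hit.
  def find(query)
    matches = @docs.select { |doc| query.all? { |field, value| doc[field] == value } }
    matches.map { |doc| doc.select { |field, _value| @stored_fields.include?(field) } }
  end
end

default_index = StoreSketch.new([:id])
default_index << { id: 1, name: 'foo' }
hit = default_index.find(name: 'foo').first
puts hit[:name].inspect   # :name was searchable but not stored

storing_index = StoreSketch.new([:id, :name])
storing_index << { id: 1, name: 'foo' }
stored_hit = storing_index.find(name: 'foo').first
puts stored_hit[:name].inspect
```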
154
+
155
+ === Setting field infos
156
+
157
+ As shown above, you can set field infos like this:
158
+
159
+ index.field_infos[:id][:type] = Fixnum
160
+
161
+ Or you can set several properties like this:
162
+
163
+ index.field_infos[:id] = {:type => Fixnum, :store => true}
164
+
165
+ ==== Tokenized
166
+
167
+ Field infos can be used to specify whether a field should be tokenized.
168
+ If this value is not set then the entire content of the field will be considered as a single term.
169
+
170
+ Example
171
+
172
+ index.field_infos[:text][:tokenized] = true
173
+
174
+ If not specified, the default is false.
175
+
176
+ ==== Analyzer
177
+
178
+ Field infos can also be used to set which analyzer should be used.
179
+ If none is specified, the default analyzer - org.apache.lucene.analysis.standard.StandardAnalyzer (:standard) will be used.
180
+
181
+
182
+ index.field_infos[:code][:tokenized] = false
183
+ index.field_infos[:code][:analyzer] = :standard
184
+
185
+ The following analyzers are supported:
186
+ * :standard (default) - org.apache.lucene.analysis.standard.StandardAnalyzer
187
+ * :keyword - org.apache.lucene.analysis.KeywordAnalyzer
188
+ * :simple - org.apache.lucene.analysis.SimpleAnalyzer
189
+ * :whitespace - org.apache.lucene.analysis.WhitespaceAnalyzer
190
+ * :stop - org.apache.lucene.analysis.StopAnalyzer
191
+
192
+ For more info, check the Lucene documentation, http://lucene.apache.org/java/docs/
193
+
194
+
195
+ === Simple Queries
196
+
197
+ Lucene.rb supports searching in several fields at once.
198
+ Example:
199
+
200
+ # finds all documents having both name 'foo' and age 42
201
+ hits = index.find('name' => 'foo', :age=>42)
202
+
203
+ Range queries:
204
+
205
+ # finds all documents having both name 'foo' and age between 3 and 30
206
+ hits = index.find('name' => 'foo', :age=>3..30)
207
+
208
+ === Lucene Queries
209
+
210
+ If the query is a string, it is treated as a Lucene query.
211
+
212
+ hits = index.find('name:foo')
213
+
214
+ For more information see:
215
+ http://lucene.apache.org/java/2_4_0/queryparsersyntax.html
216
+
217
+ === Advanced Queries (DSL)
218
+
219
+ The queries above can also be written in a lucene.rb DSL:
220
+
221
+ hits = index.find { (name == 'andreas') & (foo == 'bar')}
222
+
223
+ Expressions with OR (|) are also supported. Example:
224
+
225
+ # find all documents with name 'andreas' or age between 30 and 40
226
+ hits = index.find { (name == 'andreas') | (age == 30..40)}
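A block-based DSL of this shape can be built with instance_eval and method_missing. The sketch below is a hypothetical re-creation that turns such blocks into Lucene query strings; Expr, Field, QueryBuilder and build_query are illustrative names, not Lucene.rb classes. In plain Ruby == binds more tightly than .., so the sketch writes the range in parentheses; how the library treats the unparenthesized form is not covered here.

```ruby
# Hypothetical mini-DSL; it mimics the block syntax but is not Lucene.rb's code.
class Expr
  attr_reader :query

  def initialize(query)
    @query = query
  end

  def &(other)
    Expr.new("(#{query} AND #{other.query})")
  end

  def |(other)
    Expr.new("(#{query} OR #{other.query})")
  end
end

class Field
  def initialize(name)
    @name = name
  end

  # A Range becomes an inclusive Lucene range query, anything else a term query.
  def ==(value)
    if value.is_a?(Range)
      Expr.new("#{@name}:[#{value.begin} TO #{value.end}]")
    else
      Expr.new("#{@name}:#{value}")
    end
  end
end

# Unknown method names inside the block are treated as field names.
class QueryBuilder < BasicObject
  def method_missing(name, *_args)
    ::Field.new(name)
  end
end

def build_query(&block)
  QueryBuilder.new.instance_eval(&block).query
end

and_query = build_query { (name == 'andreas') & (foo == 'bar') }
or_query  = build_query { (name == 'andreas') | (age == (30..40)) }
puts and_query
puts or_query
```

Inheriting from BasicObject keeps the builder almost free of predefined methods, so names such as name or age fall through to method_missing instead of hitting an existing method.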
227
+
228
+ === Sorting
229
+
230
+ Sorting is specified by the 'sort_by' parameter.
231
+ Example:
232
+
233
+ hits = index.find(:name => 'foo', :sort_by=>:category)
234
+
235
+ To sort by several fields:
236
+
237
+ hits = index.find(:name => 'foo', :sort_by=>[:category, :country])
238
+
239
+ To specify the sort order:
240
+
241
+ hits = index.find(:name => 'foo', :sort_by=>[Desc[:category, :country], Asc[:city]])
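What a multi-field sort with mixed directions means can be shown on plain hashes. This is a sketch under stated assumptions: sort_docs and the [field, direction] pairs are hypothetical and only stand in for the Desc[...] and Asc[...] helpers above. Earlier keys take priority, and later keys only break ties.

```ruby
# Hypothetical sort keys: [field, direction] pairs, direction is :asc or :desc.
def sort_docs(docs, keys)
  docs.sort do |a, b|
    result = 0
    keys.each do |field, direction|
      result = a[field] <=> b[field]
      result = -result if direction == :desc
      break unless result.zero?   # the first differing key decides the order
    end
    result
  end
end

docs = [
  { name: 'foo', category: 'a', city: 'Oslo' },
  { name: 'foo', category: 'b', city: 'Malmo' },
  { name: 'foo', category: 'b', city: 'Lund' }
]
sorted = sort_docs(docs, [[:category, :desc], [:city, :asc]])
puts sorted.map { |doc| doc[:city] }.join(', ')
```

Category 'b' sorts before 'a' because of the descending key, and within category 'b' the cities appear in ascending order.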
242
+
243
+ === Thread-safety
244
+
245
+ The Lucene::Index is thread safe.
246
+ It guarantees that an index is not updated from two threads at the same time.
247
+
248
+
249
+ === Lucene Transactions
250
+
251
+ Use the Lucene::Transaction in order to do atomic commits.
252
+ By using a transaction you do not need to call the Index.commit method.
253
+
254
+ Example:
255
+
256
+ index = Index.new('var/index/foo')
257
+ Transaction.run do |t|
258
+ index << {:id=>42, :name=>'andreas'}
259
+ t.failure # rollback
260
+ end
261
+
262
+ result = index.find('name' => 'andreas')
263
+ result.size.should == 0
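The rollback semantics can be modelled without Lucene at all. In the sketch below TxSketch is a hypothetical class whose method names merely echo the example above: writes are staged during the block and reach the store only if failure was never called.

```ruby
# Hypothetical transaction model; names only echo the README, not Lucene.rb code.
class TxSketch
  def initialize
    @failed = false
  end

  # Flag the transaction so that nothing is committed.
  def failure
    @failed = true
  end

  def failed?
    @failed
  end

  # Run the block, then commit the staged documents unless failure was flagged.
  def self.run(store)
    tx = new
    staged = []
    yield tx, staged
    store.concat(staged) unless tx.failed?
    store
  end
end

store = []
TxSketch.run(store) do |t, staged|
  staged << { id: 42, name: 'andreas' }
  t.failure   # rollback: the staged document is discarded
end
rolled_back = store.size

TxSketch.run(store) do |_t, staged|
  staged << { id: 42, name: 'andreas' }
end
committed = store.size
puts "after rollback: #{rolled_back}, after commit: #{committed}"
```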
264
+
265
+ You can find uncommitted documents with the uncommitted index property.
266
+
267
+ Example:
268
+
269
+ index = Index.new('var/index/foo')
270
+ index.uncommited #=> [document1, document2]
271
+
272
+ Notice that even though it looks as if a new Index instance was created, index.uncommited
273
+ may return a non-empty array. This is because Index.new is a singleton - a new instance object is not created.
274
+