make-text-search 0.1

data/README.rdoc ADDED
@@ -0,0 +1,78 @@
1
+ = MakeTextSearch
2
+
3
+ MakeTextSearch is a tool that lets you run full-text searches easily using the search engine built into your RDBMS.
4
+
5
+ Tools like {Sphinx}[http://sphinxsearch.com/] or {Lucene}[http://lucene.apache.org/] are very powerful and fast, but they run outside the RDBMS, so they take extra effort to configure and maintain. Some RDBMSs, like PostgreSQL or MySQL, have their own full-text search engine, so less time goes into configuration and maintenance.
6
+
7
+ In this first version we have implemented support for {PostgreSQL}[http://www.postgresql.org/docs/8.3/static/textsearch.html]. In the near future we will implement support for {MySQL}[http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html] and more. For databases without a full-text search engine, the plan is to fall back to an equivalent built on plain SQL.
8
+
9
+ MakeTextSearch works with Rails 3.
10
+
11
+ == Installation
12
+
13
+ Add this line to your +Gemfile+:
14
+
15
+ gem "make-text-search"
16
+
17
+ After running +bundle install+, generate the migration that creates the documents table:
18
+
19
+ rails generate text_search:migration
20
+
21
+ == Usage
22
+
23
+ In the models where you want to run full-text searches, declare the indexed fields using +has_text_search+:
24
+
25
+ class Post < ActiveRecord::Base
26
+ has_text_search :title, :content
27
+ end
28
+
29
+ The fields added to the index can be virtual.
30
+
31
+ class Post < ActiveRecord::Base
32
+ belongs_to :user
33
+
34
+ # Add the user name to the index
35
+ def user_name
36
+ user.try :name
37
+ end
38
+
39
+ has_text_search :title, :content, :user_name
40
+ end
41
+
42
+ === Filters
43
+
44
+ The content added to the index can be filtered. Right now there are two filters: +:substrings+ and +:strip_html+.
45
+
46
+ class Post < ActiveRecord::Base
47
+ has_text_search :title, :filter => :substrings
48
+ has_text_search :content, :intro, :filter => [:strip_html, :substrings]
49
+ end
50
+
51
+ You can apply several filters by passing an array. Order matters: if you use both +:substrings+ and +:strip_html+, +:strip_html+ should come first.
52
+
53
+ +:substrings+ lets you search inside words. For example, the word +knowledge+ can be found with +owled+ if you filter the content with +:substrings+.
54
+
55
+ +:strip_html+ removes HTML tags and translates HTML entities to their UTF-8 equivalents. For example,
57
+
58
+ Ir a <a href="http://www.google.es">Google Espa&ntilde;a</a>
59
+
60
+ will be
61
+
62
+ Ir a Google España
63
+
64
+
65
+ == Search
66
+
67
+ To perform a search, use the #search_text scope:
69
+
70
+ Post.search_text("foo")
71
+
72
+ Post.published.search_text("foo & bar").paginate(:page => params[:page])
73
+
74
+ == TODO
75
+
76
+ * Query builder. Add & and | operators
77
+ * Option :language in #search_text
78
+ * RDoc-ize methods
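The README's +:substrings+ description (finding +knowledge+ with +owled+) can be illustrated with a small standalone sketch. This is not the gem's module: the method name `all_substrings` is hypothetical, and the default minimum length of 3 is taken from the `substrings` signature in the filters source. The idea is to index every contiguous fragment of a word that is at least `min_length` characters long, so a query for any such fragment matches.

```ruby
# Standalone sketch, assuming a minimum fragment length of 3.
# Returns every contiguous substring of +word+ with at least
# +min_length+ characters. Words shorter than +min_length+ yield [].
def all_substrings(word, min_length = 3)
  results = []
  (0..word.size - min_length).each do |start|
    (min_length..word.size - start).each do |length|
      results << word[start, length]
    end
  end
  results
end

fragments = all_substrings("knowledge")
puts fragments.include?("owled") # the fragment used in the README example
puts fragments.size              # number of indexed fragments for one word
```

The number of fragments grows quadratically with word length, so this kind of filter trades index size for the ability to match inside words.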
data/Rakefile ADDED
@@ -0,0 +1,36 @@
1
+
2
+ APPLICATION_NAME = "temp_make_text_search_tests"
3
+
4
+ task :default => "test:pg"
5
+
6
+ namespace :test do
7
+
8
+ def run_tests(adapter)
9
+ Process.waitpid(fork do
10
+ ENV["MTS_GEM_PATH"] = File.dirname(__FILE__)
11
+
12
+ Dir.chdir "/tmp"
13
+ system "rm", "-fr", APPLICATION_NAME if File.directory?(APPLICATION_NAME)
14
+ system "rails", "new", APPLICATION_NAME, "-d", adapter, "-G", "-J", "-m", "#{ENV["MTS_GEM_PATH"]}/test/app_template.rb"
15
+
16
+ Dir.chdir "/tmp/#{APPLICATION_NAME}"
17
+ exec "rake", "RAILS_ENV=test", "db:create", "db:migrate", "test", "db:drop"
18
+ end)
19
+ end
20
+
21
+ desc "Run tests against PostgreSQL"
22
+ task :pg do
23
+ run_tests "postgresql"
24
+ end
25
+
26
+ #desc "Run tests against SQLite3"
27
+ #task :sqlite3 do
28
+ # run_tests "sqlite3"
29
+ #end
30
+
31
+ #desc "Run tests against MySQL"
32
+ #task :mysql do
33
+ # run_tests "mysql"
34
+ #end
35
+
36
+ end
@@ -0,0 +1,15 @@
1
+ require 'rails/generators/active_record'
2
+
3
+ module TextSearch
4
+ class MigrationGenerator < ActiveRecord::Generators::Base
5
+ argument :name, :type => :string, :default => "add_text_search_documents_table"
6
+
7
+ source_root File.join(File.dirname(__FILE__), "templates")
8
+
9
+ def create_migration_file
10
+ migration_template "migration.rb", "db/migrate/#{file_name}.rb"
11
+ end
12
+
13
+ end
14
+ end
15
+
@@ -0,0 +1,9 @@
1
+ class <%= migration_class_name %> < ActiveRecord::Migration
2
+ def self.up
3
+ create_text_search_documents_table <%= Rails.application.config.make_text_search.table_name.inspect %>
4
+ end
5
+
6
+ def self.down
7
+ drop_table <%= Rails.application.config.make_text_search.table_name.inspect %>
8
+ end
9
+ end
@@ -0,0 +1,65 @@
1
+ module MakeTextSearch
2
+ module Adapters
3
+ class PostgreSQL
4
+
5
+ attr_reader :connection
6
+
7
+ def self.is_available?(connection)
8
+ begin
9
+ connection.select_one("select to_tsquery('test') as query").has_key?("query")
10
+ rescue ActiveRecord::StatementInvalid
11
+ false
12
+ end
13
+ end
14
+
15
+ def initialize(connection)
16
+ @connection = connection
17
+ end
18
+
19
+ # Schema actions
20
+
21
+ def create_text_search_documents_table(table_name)
22
+ connection.execute %[CREATE TABLE #{table_name} (record_type varchar(300) NOT NULL, record_id integer NOT NULL, language varchar(20), document tsvector)]
23
+ connection.execute %[CREATE INDEX #{table_name}_idx ON #{table_name} USING gin(document)]
24
+ end
25
+
26
+
27
+ # Document actions
28
+
29
+ def quote(value)
30
+ @connection.quote value
31
+ end
32
+
33
+ def update_document(record)
34
+
35
+ record_class = record.class
36
+ return if record_class.text_search_fields.empty?
37
+
38
+ table_name = Rails.application.config.make_text_search.table_name
39
+ record_type = quote record_class.name
40
+ record_id = quote record.id
41
+ language = quote record.text_search_language
42
+
43
+ document = "to_tsvector(#{language}, #{quote record.text_search_build_document})"
44
+
45
+ if connection.select_value("SELECT count(*) FROM #{table_name} WHERE #{_where_record record}").to_i == 0
46
+ connection.insert(%[INSERT INTO #{table_name}
47
+ (record_type, record_id, language, document)
48
+ VALUES
49
+ (#{record_type}, #{record_id}, #{language}, #{document})])
50
+ else
51
+ connection.update(%[UPDATE #{table_name} SET document = #{document} WHERE #{_where_record record}])
52
+ end
53
+ end
54
+
55
+ def remove_document(record)
56
+ connection.delete "DELETE FROM #{Rails.application.config.make_text_search.table_name} WHERE #{_where_record record}"
57
+ end
58
+
59
+ def _where_record(record)
60
+ "record_type = #{quote record.class.name} AND record_id = #{quote record.id}"
61
+ end
62
+
63
+ end
64
+ end
65
+ end
@@ -0,0 +1,39 @@
1
+ module MakeTextSearch
2
+
3
+ module SubstringsFilter
4
+ extend self
5
+
6
+ def substrings(word, min_length = 3)
7
+ results = []
8
+ for starts in 0..word.size
9
+ started = word[starts..-1]
10
+ for ends in min_length..started.size
11
+ results << word[starts, ends]
12
+ end
13
+ end
14
+
15
+ results
16
+ end
17
+
18
+ def apply_filter(record, value)
19
+ value.gsub(/(\S+)/) { substrings($1).join(" ") }
20
+ end
21
+ end
22
+
23
+ module StripHtmlFilter
24
+ extend self
25
+
26
+ def translate_html_entities!(value)
27
+ # http://gist.github.com/582351
28
+ @entities_map ||= File.read("#{File.dirname(__FILE__)}/html_entities.dat").split("\0").inject({}) {|hash, line| line = line.split(" ", 2); hash[line[0]] = line[1]; hash };
29
+
30
+ value.gsub!(/&(\w+);/) { @entities_map[$1] || $1 } or value
31
+ end
32
+
33
+ def apply_filter(record, value)
34
+ # TODO extracts the content for some attributes like alt, title and longdesc
35
+ translate_html_entities! value.gsub(/<[^>]*>/, "")
36
+ end
37
+ end
38
+
39
+ end
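The README says +:strip_html+ should come before +:substrings+. The sketch below shows why, using simplified stand-ins for the two filters. The names `sketch_strip_html`, `sketch_substrings` and the three-entry `SKETCH_ENTITIES` table are assumptions made for illustration; the gem itself loads its entity table from `html_entities.dat`.

```ruby
# Hypothetical, trimmed-down entity table (the real one is much larger).
SKETCH_ENTITIES = { "ntilde" => "ñ", "iacute" => "í", "amp" => "&" }.freeze

# Remove tags, then translate named entities; unknown names are kept bare.
def sketch_strip_html(text)
  text.gsub(/<[^>]*>/, "").gsub(/&(\w+);/) { SKETCH_ENTITIES.fetch($1, $1) }
end

# Replace each whitespace-delimited word with all of its fragments
# of at least +min_length+ characters, joined by spaces.
def sketch_substrings(text, min_length = 3)
  text.gsub(/\S+/) do |word|
    pieces = []
    (0..word.size - min_length).each do |start|
      (min_length..word.size - start).each { |len| pieces << word[start, len] }
    end
    pieces.join(" ")
  end
end

html = "<b>bold</b>"

# Recommended order: strip the markup first, then expand the clean words.
good = sketch_substrings(sketch_strip_html(html))
puts good # => "bol bold old"

# Reversed order: tags are cut into fragments such as "b>b" before the
# tag-stripping regex runs, so pieces of markup leak into the document.
bad = sketch_strip_html(sketch_substrings(html))
puts bad.include?("b>")
```

With the reversed order, fragments that lost their opening `<` no longer look like tags, so the stripping step cannot remove them.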
@@ -0,0 +1,65 @@
1
+ module MakeTextSearch
2
+
3
+ module ModelHelpers
4
+ extend ActiveSupport::Concern
5
+
6
+ included do
7
+ class_inheritable_array :text_search_fields
8
+ self.text_search_fields = []
9
+
10
+ after_save :text_search_update_document
11
+ after_destroy :text_search_remove_document
12
+ end
13
+
14
+ module ClassMethods
15
+ def has_text_search(*fields)
16
+ options = fields.extract_options!
17
+ options.assert_valid_keys :filter
18
+
19
+ if options[:filter]
20
+ options[:filter] = [options[:filter]].flatten.map! {|filter_name| "make_text_search/#{filter_name}_filter".camelize.constantize }
21
+ end
22
+
23
+ fields.each do |field|
24
+ self.text_search_fields.push([field, options])
25
+ end
26
+ end
27
+
28
+ def search_text(query)
29
+ where MakeTextSearch.build_condition(self, query)
30
+ end
31
+ end
32
+
33
+ def text_search_update_document
34
+ if not self.class.text_search_fields.empty?
35
+ self.class.connection.text_search_adapter.update_document self
36
+ end
37
+ end
38
+
39
+ def text_search_remove_document
40
+ if not self.class.text_search_fields.empty?
41
+ self.class.connection.text_search_adapter.remove_document self
42
+ end
43
+ end
44
+
45
+ def text_search_language
46
+ Rails.application.config.make_text_search.default_language
47
+ end
48
+
49
+ def text_search_build_document
50
+ self.class.text_search_fields.map do |ts_field|
51
+ field_name, options = ts_field
52
+
53
+ if value = send(field_name)
54
+ value = value.to_s
55
+
56
+ if filters = options[:filter]
57
+ filters.each {|f| value = f.apply_filter(self, value) }
58
+ end
59
+
60
+ value
61
+ end
62
+ end.compact.join(" ")
63
+ end
64
+ end
65
+ end
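The document-building step in `text_search_build_document` can be pictured without Rails. The following is a minimal sketch under stated assumptions: `SketchPost` is a hypothetical stand-in for a model, filters are plain lambdas rather than filter modules, and `sketch_build_document` is not part of the gem. It keeps the same shape as the method above: read each field, skip missing values, run the filters in order, and join the results with spaces.

```ruby
# Hypothetical stand-in for an ActiveRecord model.
SketchPost = Struct.new(:title, :content, :intro)

# Each entry pairs a field name with the filters to apply, in order.
SKETCH_FIELDS = [
  [:title,   []],
  [:content, [->(text) { text.gsub(/<[^>]*>/, "") }]], # crude tag stripper
  [:intro,   []]
].freeze

def sketch_build_document(record, fields)
  fields.map { |name, filters|
    value = record.public_send(name)
    next if value.nil? # fields without a value add nothing to the document
    filters.inject(value.to_s) { |text, filter| filter.call(text) }
  }.compact.join(" ")
end

post = SketchPost.new("Hello", "<p>World</p>", nil)
puts sketch_build_document(post, SKETCH_FIELDS) # => "Hello World"
```

Because every field is read through a method call, virtual fields such as the README's `user_name` fit the same pipeline.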
@@ -0,0 +1,8 @@
1
+ class <<MakeTextSearch
2
+ def build_condition(model, query)
3
+ # TODO use has_text_search_for_postgresql? to implement all the backends
4
+
5
+ db_connection = model.connection
6
+ %[#{model.table_name}.id IN (SELECT record_id FROM #{Rails.application.config.make_text_search.table_name} WHERE record_type = #{db_connection.quote model.name} AND document @@ to_tsquery(#{db_connection.quote query}))]
7
+ end
8
+ end
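To make the shape of the generated condition easier to see, here is a standalone sketch that builds the same kind of SQL fragment as a plain string. `sketch_quote` is a hypothetical helper that only doubles single quotes; real code should keep using the connection's own `quote` method as `build_condition` does. The table and model names passed in are illustrative.

```ruby
# Hypothetical quoting helper for illustration only.
def sketch_quote(value)
  "'" + value.to_s.gsub("'", "''") + "'"
end

# Builds an IN (...) subquery condition against the documents table.
def sketch_condition(table, documents_table, model_name, query)
  "#{table}.id IN (SELECT record_id FROM #{documents_table} " \
  "WHERE record_type = #{sketch_quote(model_name)} " \
  "AND document @@ to_tsquery(#{sketch_quote(query)}))"
end

puts sketch_condition("posts", "text_search_documents", "Post", "foo & bar")
```

Since the query string goes straight to `to_tsquery`, it has to follow PostgreSQL's tsquery syntax: terms are combined with operators such as `&` and `|`, which is why the README example searches for `"foo & bar"`.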
@@ -0,0 +1,42 @@
1
+
2
+ require "make-text-search/adapters/postgresql_ts"
3
+
4
+ module MakeTextSearch
5
+ module ConnectionAdapterHelpers
6
+
7
+ # Returns the MakeTextSearch adapter for the current connection
8
+ def text_search_adapter
9
+ @text_search_adapter ||=
10
+ begin
11
+ # TODO Use a list adapter
12
+ if Adapters::PostgreSQL.is_available?(self)
13
+ Adapters::PostgreSQL.new(self)
14
+ else
15
+ # TODO Generic implementation (for SQLite3, etc)
16
+ raise NotImplementedError, "There is no adapter for #{self}"
17
+ end
18
+ end
19
+ end
20
+
21
+ # Create the text_search_documents table
22
+ def create_text_search_documents_table(table_name)
23
+ text_search_adapter.create_text_search_documents_table table_name
24
+ end
25
+ end
26
+
27
+ module SchemaDumperHelpers #:nodoc:
28
+ def self.included(cls)
29
+ cls.alias_method_chain :table, :make_text_search
30
+ end
31
+
32
+ def table_with_make_text_search(table, stream)
33
+ if table.to_s == Rails.application.config.make_text_search.table_name.to_s
34
+ stream.puts " create_text_search_documents_table #{table.inspect}"
35
+ else
36
+ table_without_make_text_search(table, stream)
37
+ end
38
+
39
+ stream
40
+ end
41
+ end
42
+ end
@@ -0,0 +1,23 @@
1
+ module MakeTextSearch
2
+
3
+ ActiveSupport.on_load(:active_record) do
4
+ require 'make-text-search/models'
5
+ require 'make-text-search/query'
6
+ require 'make-text-search/schema'
7
+ require 'make-text-search/filters'
8
+
9
+ include ModelHelpers
10
+ class ActiveRecord::ConnectionAdapters::AbstractAdapter; include ConnectionAdapterHelpers; end
11
+ class ActiveRecord::SchemaDumper; include SchemaDumperHelpers; end
12
+ end
13
+
14
+ class Railtie < ::Rails::Railtie
15
+ config.make_text_search = ActiveSupport::OrderedOptions.new
16
+ config.make_text_search.table_name = "text_search_documents"
17
+ config.make_text_search.default_language = "english"
18
+
19
+ generators do
20
+ load "generators/migration.rb"
21
+ end
22
+ end
23
+ end
@@ -0,0 +1,25 @@
1
+
2
+ if db_user = ENV["DB_USER"]
3
+ require 'yaml'
4
+ database = YAML.load(File.read("config/database.yml"))
5
+ database.values.each do |connection|
6
+ connection["username"] = ENV["DB_USER"]
7
+ end
8
+ File.open("config/database.yml", "w") {|f| f.write database.to_yaml }
9
+ end
10
+
11
+ gem "make-text-search", :path => ENV["MTS_GEM_PATH"]
12
+
13
+ generate "text_search:migration"
14
+ generate "model", "Post", "title:string", "content:text"
15
+
16
+ File.open("app/models/post.rb", "a") do |f|
17
+ f.puts "\nPost.has_text_search :title, :filter => :substrings"
18
+ f.puts "Post.has_text_search :content, :filter => [:substrings, :strip_html]"
19
+ end
20
+
21
+ # Copy tests from the gem
22
+ Dir["#{ENV["MTS_GEM_PATH"]}/test/*_test.rb"].each do |filename|
23
+ file "test/unit/#{File.basename filename}", File.read(filename)
24
+ end
25
+
@@ -0,0 +1,65 @@
1
+ # coding: utf-8
2
+
3
+ require 'test_helper'
4
+
5
+ class BasicsTest < ActiveSupport::TestCase
6
+
7
+ def query(words, expected_ids)
8
+ assert_equal expected_ids.sort, Post.search_text(words).map {|post| post.id }.sort, "The query #{words.inspect} expects #{expected_ids.inspect}"
9
+ end
10
+
11
+ test "the filter can be used after create some records" do
12
+ first = Post.create!(:title => "First post", :content => "111 222").id
13
+ second = Post.create!(:title => "Second post", :content => "random text").id
14
+
15
+ query "post", [first, second]
16
+ query "First", [first]
17
+ query "first", [first]
18
+ query "random", [second]
19
+ query "0", []
20
+ query "11", []
21
+
22
+ # With substrings. The filter has to be activated in the model (see test/app_template.rb)
23
+ query "econ", [second]
24
+ query "andom", [second]
25
+ end
26
+
27
+ test "search_text can be re-scoped" do
28
+ 20.times { Post.create!(:title => "The same post", :content => "somethingwithoutsense") }
29
+
30
+ assert_equal 20, Post.search_text("somethingwithoutsense").count
31
+ assert_equal 2, Post.search_text("somethingwithoutsense").limit(2).count
32
+ assert_equal "The same post", Post.search_text("somethingwithoutsense").last.title
33
+ end
34
+
35
+ test "the index is updated after change the posts" do
36
+ colors = {}
37
+ %w(red green blue black white orange).each {|color| colors[color] = Post.create!(:title => "Post #{color}") }
38
+
39
+ query "red", [colors["red"].id]
40
+ assert_equal Post.search_text("post").count, colors.size
41
+
42
+ colors["red"].destroy
43
+
44
+ query "red", []
45
+ assert_equal Post.search_text("post").count, colors.size - 1
46
+
47
+ colors["black"].update_attribute :content, "Old red"
48
+ query "red", [colors["black"].id]
49
+ end
50
+
51
+ test "html entities are translated with filters" do
52
+
53
+ begin
54
+ old_default_language = Rails.application.config.make_text_search.default_language
55
+ Rails.application.config.make_text_search.default_language = "spanish"
56
+
57
+ spain = Post.create!(:content => "Pa&iacute;s Espa&ntilde;a")
58
+ query "país", [spain.id]
59
+ query "españa", [spain.id]
60
+ ensure
61
+ Rails.application.config.make_text_search.default_language = old_default_language
62
+ end
63
+ end
64
+
65
+ end
@@ -0,0 +1,11 @@
1
+ require 'test_helper'
2
+
3
+ class SchemaTest < ActiveSupport::TestCase
4
+ test "schema has to include the call for create_text_search_documents_table" do
5
+ assert_match /^\s*create_text_search_documents_table/, Rails.root.join("db/schema.rb").read
6
+ end
7
+
8
+ test "documents table has to be created" do
9
+ assert_operator ActiveRecord::Base.connection.tables, "include?", Rails.application.config.make_text_search.table_name
10
+ end
11
+ end
metadata ADDED
@@ -0,0 +1,79 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: make-text-search
3
+ version: !ruby/object:Gem::Version
4
+ hash: 9
5
+ prerelease: false
6
+ segments:
7
+ - 0
8
+ - 1
9
+ version: "0.1"
10
+ platform: ruby
11
+ authors:
12
+ - Ayose Cazorla
13
+ autorequire:
14
+ bindir: bin
15
+ cert_chain: []
16
+
17
+ date: 2010-10-13 00:00:00 +01:00
18
+ default_executable:
19
+ dependencies: []
20
+
21
+ description: Some RDBMS (like PostgreSQL 8.3 and newer) implement full text search directly, so you don't need external tools. This Rails plugin gives that functionality in a generic way.
22
+ email: ayosec@gmail.com
23
+ executables: []
24
+
25
+ extensions: []
26
+
27
+ extra_rdoc_files: []
28
+
29
+ files:
30
+ - lib/make-text-search.rb
31
+ - lib/generators/templates/migration.rb
32
+ - lib/generators/migration.rb
33
+ - lib/make-text-search/query.rb
34
+ - lib/make-text-search/html_entities.dat
35
+ - lib/make-text-search/models.rb
36
+ - lib/make-text-search/filters.rb
37
+ - lib/make-text-search/adapters/postgresql_ts.rb
38
+ - lib/make-text-search/schema.rb
39
+ - test/schema_test.rb
40
+ - test/app_template.rb
41
+ - test/basics_test.rb
42
+ - Rakefile
43
+ - README.rdoc
44
+ has_rdoc: true
45
+ homepage: http://github.com/setepo/make_text_search
46
+ licenses: []
47
+
48
+ post_install_message:
49
+ rdoc_options: []
50
+
51
+ require_paths:
52
+ - lib
53
+ required_ruby_version: !ruby/object:Gem::Requirement
54
+ none: false
55
+ requirements:
56
+ - - ">="
57
+ - !ruby/object:Gem::Version
58
+ hash: 3
59
+ segments:
60
+ - 0
61
+ version: "0"
62
+ required_rubygems_version: !ruby/object:Gem::Requirement
63
+ none: false
64
+ requirements:
65
+ - - ">="
66
+ - !ruby/object:Gem::Version
67
+ hash: 3
68
+ segments:
69
+ - 0
70
+ version: "0"
71
+ requirements: []
72
+
73
+ rubyforge_project:
74
+ rubygems_version: 1.3.7
75
+ signing_key:
76
+ specification_version: 3
77
+ summary: Adapts the native Full Text Search of the RDBMS
78
+ test_files: []
79
+