doppel 0.0.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 3b290250de01378cd85bbfa84e6d88dea131f38e
4
+ data.tar.gz: 544c2cd1124f92bc52f5bea8bf168faadb57149f
5
+ SHA512:
6
+ metadata.gz: 4ed29bd8764e81f9b8e264e06712e965143958f104ca24ed7808e4d4e05765643106379661f11e6da638dd126821fcbe68d1fd0716c83aeed9033bbb9390e47e
7
+ data.tar.gz: a173fcd541e8d22f13d90ec35fad7e0cc2082f2ac294ad8addd0615eb5ebb13d956c76faf693af6df1a492198e9da9184f9fd782ba8adf5b7b3335b1edc784ce
data/MIT-LICENSE ADDED
@@ -0,0 +1,20 @@
1
+ Copyright 2015 Dale Stevens
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,78 @@
1
+ # Doppel [![Build Status](https://secure.travis-ci.org/TwilightCoders/doppel.png)](http://travis-ci.org/TwilightCoders/doppel) [![Gem Version](https://badge.fury.io/rb/doppel.png)](http://badge.fury.io/rb/doppel) [![Code Climate](https://codeclimate.com/github/TwilightCoders/doppel.png)](https://codeclimate.com/github/TwilightCoders/doppel) [![Dependency Status](https://gemnasium.com/TwilightCoders/doppel.svg)](https://gemnasium.com/TwilightCoders/doppel)
2
+
3
+ ## Description
4
+
5
+ This `acts_as`-style extension finds similar records by fuzzy-matching a set of specified fields.
6
+
7
+ ## Requirements
8
+
9
+ The database engine must currently implement the [`levenshtein`](http://www.postgresql.org/docs/9.4/static/fuzzystrmatch.html) function. I believe this is currently limited to PostgreSQL, where it is provided by the `fuzzystrmatch` extension.
10
+
11
+ ## Installation
12
+
13
+ In your Gemfile:
14
+
15
+ gem 'doppel'
16
+
17
+ Or, from the command line:
18
+
19
+ gem install doppel
20
+
21
+ ## Example
22
+
23
+ To use it, call the `has_many_doppels` method in the model:
24
+
25
+ ```ruby
26
+ class Company < ActiveRecord::Base
27
+ has_many_doppels [:name]
28
+ end
29
+
30
+ [1] pry(main)> duplicates = Company.with_name_doppels
31
+ Company Load (89586.2ms) SELECT DISTINCT companies.* FROM companies, companies as doppel_companies WHERE (companies.id != doppel_companies.id) AND (levenshtein(companies.name, doppel_companies.name) < 2) GROUP BY companies.id HAVING COUNT(doppel_companies.id) > 0
32
+ Company Load (89586.2ms) SELECT DISTINCT companies.* FROM companies, companies as doppel_companies WHERE (companies.id != doppel_companies.id) AND (levenshtein(companies.name, doppel_companies.name) < 2) GROUP BY companies.id HAVING COUNT(doppel_companies.id) > 0
33
+ => [#<Company:0x007fe486a84a18 id: 4332, type: "Company", name: "Basically Infinity", created_at: ..., updated_at: ... >,
34
+ #<Company:0x007fe485e70010 id: 5179, type: "Company", name: "Stucco", created_at: ..., updated_at: ... >,
35
+ #<Company:0x007fe485e6b5d8 id: 3234, type: "Company", name: "ÜSER", created_at: ..., updated_at: ... >,
36
+ #<Company:0x007fe485e6aae8 id: 8456, type: "Company", name: "Orange", created_at: ..., updated_at: ... >]
37
+
38
+ # From here you'll be able to do the following.
39
+
40
+ [2] pry(main)> duplicates.first.name_doppels
41
+ Company Load (13.1ms) SELECT "companies".* FROM "companies" WHERE (levenshtein('Basically Infinity', companies.name) < 2) AND ("companies"."id" != $1) [["id", 4332]]
42
+ Company Load (13.1ms) SELECT "companies".* FROM "companies" WHERE (levenshtein('Basically Infinity', companies.name) < 2) AND ("companies"."id" != $1) [["id", 4332]]
43
+ => [#<Company:0x007fe483464a00 id: 4057, type: "Company", name: "Interim Physicians", created_at: ..., updated_at: ... >]
44
+
45
+ ```
46
+
47
+ ## Instance Methods Added To ActiveRecord Models
48
+
49
+ A number of methods are added to each instance of the ActiveRecord model to which `doppel` is added.
50
+
51
+ These depend on how `has_many_doppels` is configured. For each field that is supplied, you'll have
52
+
53
+ `#{field_name}_doppels`
54
+
55
+ in addition to
56
+
57
+ `doppels`
58
+
59
+ which will return any matching duplicates from all fields supplied.
60
+
61
+ ## The Future
62
+
63
+ Currently `doppel` only supports `levenshtein` text searching. I have tentative plans to implement a more modular framework and support other field types such as dates or numerics.
64
+
65
+ ## Contributing to `doppel`
66
+
67
+ - Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet
68
+ - Check out the issue tracker to make sure someone hasn't already requested it and/or contributed it
69
+ - Fork the project
70
+ - Start a feature/bugfix branch
71
+ - Commit and push until you are happy with your contribution
72
+ - Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
73
+ - Please try not to mess with the Rakefile, version, or history. If you want your own version, or a change there is otherwise necessary, that is fine, but please isolate it in its own commit so I can cherry-pick around it.
74
+ - I would recommend using Rails 3.1.x and higher for testing the build before a pull request. The current test harness does not quite work with 3.0.x. The plugin itself works, but the issue lies with testing infrastructure.
75
+
76
+ ## Copyright
77
+
78
+ Copyright (c) 2015 Dale Stevens, released under the MIT license
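Reviewer note on the `levenshtein(...) < 2` condition shown in the README queries: the sketch below is a plain-Ruby illustration of the edit distance that condition compares against. It is not part of the gem, which delegates the computation to PostgreSQL's `levenshtein` function. The method names `levenshtein` and `doppel?` are hypothetical, and Ruby's per-character handling may differ from the database's for some multibyte input.

```ruby
# Classic dynamic-programming Levenshtein distance: insertions, deletions,
# and substitutions each cost 1. Illustrative only; doppel itself relies on
# the database's levenshtein() function.
def levenshtein(a, b)
  a_chars = a.chars
  b_chars = b.chars
  # Distances from the empty prefix of `a` to every prefix of `b`.
  previous = (0..b_chars.length).to_a

  a_chars.each_with_index do |a_char, i|
    current = [i + 1] # distance from a[0..i] to the empty prefix of b
    b_chars.each_with_index do |b_char, j|
      cost = a_char == b_char ? 0 : 1
      current << [
        previous[j + 1] + 1, # delete a_char
        current[j] + 1,      # insert b_char
        previous[j] + cost   # substitute (or keep when equal)
      ].min
    end
    previous = current
  end

  previous.last
end

# Hypothetical helper mirroring the "< sensitivity" comparison in the queries.
def doppel?(a, b, sensitivity = 2)
  levenshtein(a, b) < sensitivity
end

puts levenshtein('Stucco', 'Stuco')   # one deletion apart
puts doppel?('Stucco', 'Stuco')
puts doppel?('kitten', 'sitting')
```

With the default sensitivity of 2, only values at distance 0 or 1 qualify, so identical names on different rows and single-character typos are the cases the scopes would pick up.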
data/Rakefile ADDED
@@ -0,0 +1,30 @@
1
+ begin
2
+ require 'bundler/setup'
3
+ rescue LoadError
4
+ puts 'You must `gem install bundler` and `bundle install` to run rake tasks'
5
+ end
6
+
7
+ require 'rdoc/task'
8
+
9
+ RDoc::Task.new(:rdoc) do |rdoc|
10
+ rdoc.rdoc_dir = 'rdoc'
11
+ rdoc.title = 'Doppel'
12
+ rdoc.options << '--line-numbers'
13
+ rdoc.rdoc_files.include('README.md')
14
+ rdoc.rdoc_files.include('lib/**/*.rb')
15
+ end
16
+
17
+
18
+ Bundler::GemHelper.install_tasks
19
+
20
+ require 'rake/testtask'
21
+
22
+ Rake::TestTask.new(:test) do |t|
23
+ t.libs << 'lib'
24
+ t.libs << 'test'
25
+ t.pattern = 'test/**/*_test.rb'
26
+ t.verbose = false
27
+ end
28
+
29
+
30
+ task default: :test
data/lib/doppel.rb ADDED
@@ -0,0 +1,6 @@
1
+ require 'doppel/acts_as_doppel'
2
+
3
+ module Doppel
4
+ end
5
+
6
+ ActiveRecord::Base.send :include, Doppel::ActsAsDoppel
data/lib/doppel/acts_as_doppel.rb ADDED
@@ -0,0 +1,52 @@
1
+ module Doppel
2
+ module ActsAsDoppel
3
+ extend ActiveSupport::Concern
4
+
5
+ included do
6
+ end
7
+
8
+ module ClassMethods
9
+ def has_many_doppels(fields = [], options = {})
10
+ options = Doppel::ActsAsDoppel.merge_default_options(options)
11
+ other_table_name = "doppel_#{table_name}"
12
+ table_name_key = "#{table_name}.#{primary_key}"
13
+ other_table_name_key = "#{other_table_name}.#{primary_key}"
14
+ fields.each do |field|
15
+ class_eval <<-END, __FILE__, __LINE__ + 1
16
+ scope "with_#{field}_doppels", lambda { |count = #{options[:count]}, lev = #{options[:sensitivity]}|
17
+ select("DISTINCT #{table_name}.*")
18
+ .where("#{table_name_key} != #{other_table_name_key}")
19
+ .where("levenshtein(#{table_name}.#{field}, #{other_table_name}.#{field}) < \#\{lev\}")
20
+ .from("#{table_name}, #{table_name} as #{other_table_name}")
21
+ .group(table_name_key).having("COUNT(#{other_table_name_key}) > \#\{count\}")
22
+ }
23
+
24
+ def #{field}_doppels
25
+ @#{field}_doppels ||= self.class.where("levenshtein(\#\{self.class.sanitize(#{field})\}, #{table_name}.#{field}) < 2").where.not(#{primary_key}: #{primary_key})
26
+ end
27
+ END
28
+ # scope "with_#{field}_doppels", lambda { |count = options[:count], lev = options[:sensitivity]|
29
+ # select("DISTINCT #{table_name}.*")
30
+ # .where("#{table_name_key} != #{other_table_name_key}")
31
+ # .where("levenshtein(#{table_name}.#{field}, #{other_table_name}.#{field}) < #{lev}")
32
+ # .from("#{table_name}, #{table_name} as #{other_table_name}")
33
+ # .group(table_name_key).having("COUNT(#{other_table_name_key}) > #{count}")
34
+ # }
35
+ end
36
+
37
+ # scope 'with_any_doppels', lambda { |count = options[:count], lev = options[:sensitivity]|
38
+ # fields.inject(self) { |query, field| query.merge(send("with_#{field}_doppels")) }
39
+ # }
40
+
41
+ # alias_method 'with_doppels', "with_#{fields.first}_doppels" if fields.one?
42
+ end
43
+ end
44
+
45
+ def self.merge_default_options(options = {})
46
+ {
47
+ sensitivity: 2,
48
+ count: 0
49
+ }.merge(options)
50
+ end
51
+ end
52
+ end
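Reviewer note on the `class_eval` heredoc in `has_many_doppels`: it mixes two kinds of interpolation. Plain `#{...}` is resolved once, when the scope is defined, while the escaped form is left in the generated source and resolved on each call. Below is a minimal, database-free sketch of that pattern. `FakeModel` and `define_doppel_sql` are hypothetical names used only for illustration and are not part of the gem.

```ruby
# Hypothetical stand-in for an ActiveRecord model; no database is involved.
class FakeModel
  def self.table_name
    'companies'
  end

  # Generates a class method named "<field>_doppel_sql" from a source string.
  def self.define_doppel_sql(field, sensitivity: 2)
    class_eval <<-RUBY, __FILE__, __LINE__ + 1
      # \#{field}, \#{sensitivity}, and \#{table_name} were already substituted
      # when this source string was built. The escaped \#{lev} below survives
      # into the generated method and is evaluated on every call.
      def self.#{field}_doppel_sql(lev = #{sensitivity})
        "levenshtein(#{table_name}.#{field}, doppel_#{table_name}.#{field}) < \#{lev}"
      end
    RUBY
  end
end

FakeModel.define_doppel_sql(:name)

puts FakeModel.name_doppel_sql
# prints: levenshtein(companies.name, doppel_companies.name) < 2
puts FakeModel.name_doppel_sql(4)
# prints: levenshtein(companies.name, doppel_companies.name) < 4
```

The same split explains why the default `count` and `sensitivity` are baked into the lambda's parameter list, while the values actually passed by a caller reach the SQL fragment at call time.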
data/lib/doppel/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module Doppel
2
+ VERSION = '0.0.1'
3
+ end
data/lib/tasks/doppel_tasks.rake ADDED
@@ -0,0 +1,4 @@
1
+ # desc "Explaining what the task does"
2
+ # task :doppel do
3
+ # # Task goes here
4
+ # end
data/test/doppel_test.rb ADDED
@@ -0,0 +1,7 @@
1
+ require 'test_helper'
2
+
3
+ class DoppelTest < ActiveSupport::TestCase
4
+ test "truth" do
5
+ assert_kind_of Module, Doppel
6
+ end
7
+ end
data/test/test_helper.rb ADDED
@@ -0,0 +1,19 @@
1
+ # Configure Rails Environment
2
+ ENV["RAILS_ENV"] = "test"
3
+
4
+ require File.expand_path("../../test/dummy/config/environment.rb", __FILE__)
5
+ ActiveRecord::Migrator.migrations_paths = [File.expand_path("../../test/dummy/db/migrate", __FILE__)]
6
+ require "rails/test_help"
7
+
8
+ # Filter out Minitest backtrace while allowing backtrace from other libraries
9
+ # to be shown.
10
+ Minitest.backtrace_filter = Minitest::BacktraceFilter.new
11
+
12
+ # Load support files
13
+ Dir["#{File.dirname(__FILE__)}/support/**/*.rb"].each { |f| require f }
14
+
15
+ # Load fixtures from the engine
16
+ if ActiveSupport::TestCase.respond_to?(:fixture_path=)
17
+ ActiveSupport::TestCase.fixture_path = File.expand_path("../fixtures", __FILE__)
18
+ ActiveSupport::TestCase.fixtures :all
19
+ end
metadata ADDED
@@ -0,0 +1,83 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: doppel
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ platform: ruby
6
+ authors:
7
+ - Dale Stevens
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2015-08-27 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: rails
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '4.2'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '4.2'
27
+ - !ruby/object:Gem::Dependency
28
+ name: sqlite3
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
41
+ description: Provides convenient scopes for finding duplicate records.
42
+ email:
43
+ - dale@twilightcoders.net
44
+ executables: []
45
+ extensions: []
46
+ extra_rdoc_files: []
47
+ files:
48
+ - MIT-LICENSE
49
+ - README.md
50
+ - Rakefile
51
+ - lib/doppel.rb
52
+ - lib/doppel/acts_as_doppel.rb
53
+ - lib/doppel/version.rb
54
+ - lib/tasks/doppel_tasks.rake
55
+ - test/doppel_test.rb
56
+ - test/test_helper.rb
57
+ homepage: http://github.com/TwilightCoders/doppel
58
+ licenses:
59
+ - MIT
60
+ metadata: {}
61
+ post_install_message:
62
+ rdoc_options: []
63
+ require_paths:
64
+ - lib
65
+ required_ruby_version: !ruby/object:Gem::Requirement
66
+ requirements:
67
+ - - ">="
68
+ - !ruby/object:Gem::Version
69
+ version: '0'
70
+ required_rubygems_version: !ruby/object:Gem::Requirement
71
+ requirements:
72
+ - - ">="
73
+ - !ruby/object:Gem::Version
74
+ version: '0'
75
+ requirements: []
76
+ rubyforge_project:
77
+ rubygems_version: 2.4.5
78
+ signing_key:
79
+ specification_version: 4
80
+ summary: Provides convenient scopes for finding duplicate records.
81
+ test_files:
82
+ - test/doppel_test.rb
83
+ - test/test_helper.rb