data_cleansing 0.9.0 → 1.0.0
 - checksums.yaml +4 -4
 - data/README.md +3 -25
 - data/Rakefile +4 -10
 - data/lib/data_cleansing/cleaners.rb +14 -4
 - data/lib/data_cleansing/cleanse.rb +26 -44
 - data/lib/data_cleansing/data_cleansing.rb +18 -0
 - data/lib/data_cleansing/version.rb +1 -1
 - data/test/active_record_test.rb +11 -5
 - data/test/cleaners_test.rb +17 -5
 - data/test/data_cleansing_test.rb +9 -0
 - data/test/ruby_test.rb +1 -0
 - data/test/test_db.sqlite3 +0 -0
 - data/test/test_helper.rb +0 -1
 - metadata +4 -2
 
    
        checksums.yaml
    CHANGED
    
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0347620583101155e6181d7e2504ef2a4816d970
+  data.tar.gz: b80bff6ab7116bda3cef959a7eeb6c231ab3660b
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 474e0ed54427a7958358a1d645d95792af3b83f0621e48c1986d09ecfd1f8288aada4e6ee55573f88347eb7193adf0eddde1b7cb39c110c6a84a1c5f43daae19
+  data.tar.gz: deb42d04fa24cf7b3e77d8411989e2c8578bf8c23b9f8726898df640e93e271071e0cccd96850eb342780737894823ccd295032a24bb22fdc16678a71b55ddd1
    
        data/README.md
    CHANGED
    
@@ -1,7 +1,7 @@
 data_cleansing
 ==============
 
-Data Cleansing framework for Ruby
+Data Cleansing framework for Ruby.
 
 * http://github.com/reidmorrison/data_cleansing
 
@@ -12,12 +12,8 @@ or trailing blanks and even newlines.
 Similarly it would be useful to be able to attach a cleansing solution to a field
 in a model and have the data cleansed transparently when required.
 
-DataCleansing is a framework that allows
-specific attributes or fields.
-solutions themselves since they are usually straight forward, or so complex
-that they don't tend to be too useful to others. However, over time built-in
-cleansing solutions may be added. Feel free to submit any suggestions via a ticket
-or pull request.
+DataCleansing is a framework that allows data cleansing to be applied to
+specific attributes or fields.
 
 ## Features
 
@@ -297,24 +293,6 @@ Install the Gem with bundler
 
     bundle install
 
-## Architecture
-
-DataCleansing has been designed to support externalized data cleansing routines.
-In this way the data cleansing routine itself can be loaded from a datastore and
-applied dynamically at runtime.
-Although not supported out of the box, this design allows for example for the
-data cleansing routines to be stored in something like [ZooKeeper](http://zookeeper.apache.org/).
-Then any changes to the data cleansing routines can be pushed out immediately to
-every server that needs it.
-
-DataCleansing is designed to support any Ruby model. In this way it can be used
-in just about any ORM or DOM. For example, it currently easily supports both
-Rails and Mongoid models. Some extensions have been added to support these frameworks.
-
-For example, in Rails it obtains the raw data value before Rails has converted it.
-Which is useful for cleansing integer or float fields as raw strings before Rails
-tries to convert it to an integer or float.
-
 ## Dependencies
 
 DataCleansing requires the following dependencies
    
        data/Rakefile
    CHANGED
    
@@ -1,6 +1,4 @@
-require 'rake/clean'
 require 'rake/testtask'
-
 require_relative 'lib/data_cleansing/version'
 
 task :gem do
@@ -14,14 +12,10 @@ task publish: :gem do
   system "rm data_cleansing-#{DataCleansing::VERSION}.gem"
 end
 
-
-
-
-
-    t.verbose    = true
-  end
-
-  Rake::Task['functional'].invoke
+Rake::TestTask.new(:test) do |t|
+  t.pattern = 'test/**/*_test.rb'
+  t.verbose = true
+  t.warning = true
 end
 
 task default: :test
    
        data/lib/data_cleansing/cleaners.rb
    CHANGED
    
@@ -1,4 +1,4 @@
-require '
+require 'cgi'
 module Cleaners
   # Strip leading and trailing whitespace
   module Strip
@@ -20,6 +20,16 @@ module Cleaners
   end
   DataCleansing.register_cleaner(:upcase, Upcase)
 
+  # Convert to downcase
+  module Downcase
+    def self.call(string)
+      return string unless string.is_a?(String)
+
+      string.downcase! || string
+    end
+  end
+  DataCleansing.register_cleaner(:downcase, Downcase)
+
   # Remove all non-word characters, including whitespace
   module RemoveNonWord
     NOT_WORDS = Regexp.compile(/\W/)
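For readers who want to try the new `Downcase` cleaner in isolation, the following is a minimal sketch that copies the module body from the hunk above and leaves out `DataCleansing.register_cleaner`, so it runs without the gem. `DowncaseSketch` is a hypothetical name used only here.

```ruby
# Standalone copy of the Downcase cleaner body. DowncaseSketch is a
# hypothetical name; the gem's register_cleaner call is intentionally omitted.
module DowncaseSketch
  def self.call(string)
    # Non-string values (nil, integers, ...) pass through untouched
    return string unless string.is_a?(String)

    # downcase! mutates in place and returns nil when nothing changed,
    # so fall back to the original string in that case
    string.downcase! || string
  end
end

mixed   = DowncaseSketch.call('Jack SMITH'.dup)
already = DowncaseSketch.call('jack'.dup)
number  = DowncaseSketch.call(42)
```

The `|| string` fallback matters because `String#downcase!` returns `nil` for input that is already lowercase.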
@@ -44,7 +54,7 @@ module Cleaners
   end
   DataCleansing.register_cleaner(:remove_non_printable, RemoveNonPrintable)
 
-  #
+  # Unescape HTML Markup ( case-insensitive )
   module ReplaceHTMLMarkup
     HTML_MARKUP = Regexp.compile(/&(amp|quot|gt|lt|apos|nbsp);/in)
 
@@ -77,7 +87,7 @@ module Cleaners
     def self.call(string)
       return string unless string.is_a?(String)
 
-
+      CGI.unescape(string)
     end
   end
   DataCleansing.register_cleaner(:unescape_uri, UnescapeURI)
@@ -86,7 +96,7 @@ module Cleaners
     def self.call(string)
       return string unless string.is_a?(String)
 
-
+      CGI.escape(string)
     end
   end
   DataCleansing.register_cleaner(:escape_uri, EscapeURI)
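The two hunks above switch the URI cleaners to `CGI.escape` and `CGI.unescape` from Ruby's standard library. A short sketch of what those calls do on their own, outside the gem (the sample string is made up for illustration):

```ruby
require 'cgi'

raw     = 'jack smith & sons'
escaped = CGI.escape(raw)       # form-style encoding: spaces become "+", "&" becomes "%26"
decoded = CGI.unescape(escaped) # reverses the encoding
```

Note that `CGI.escape` encodes a space as `+` rather than `%20`, which may or may not be what a given caller of `:escape_uri` expects.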
        data/lib/data_cleansing/cleanse.rb
    CHANGED

@@ -7,10 +7,10 @@ module DataCleansing
     module ClassMethods
       # Define how to cleanse one or more attributes
       def cleanse(*args)
-        last
+        last       = args.last
         attributes = args.dup
-        params
-        cleaners
+        params     = (last.is_a?(Hash) && last.instance_of?(Hash)) ? attributes.pop.dup : {}
+        cleaners   = Array(params.delete(:cleaner))
         raise(ArgumentError, "Mandatory :cleaner parameter is missing: #{params.inspect}") unless cleaners
 
         cleaner = DataCleansingCleaner.new(cleaners, attributes, params)
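The argument handling in `cleanse` can be exercised without the gem. The sketch below reuses the three assignments from the hunk above inside a hypothetical helper, `split_cleanse_args`, which is not part of data_cleansing.

```ruby
# Hypothetical helper mirroring the argument handling shown in `cleanse`.
def split_cleanse_args(*args)
  last       = args.last
  attributes = args.dup
  # A trailing plain Hash is treated as the options, everything else as attribute names
  params     = last.instance_of?(Hash) ? attributes.pop.dup : {}
  # Array() wraps a single cleaner, keeps a list as-is, and turns nil into []
  cleaners   = Array(params.delete(:cleaner))
  [attributes, cleaners, params]
end

attributes, cleaners, params = split_cleanse_args(:first_name, :last_name, cleaner: [:strip, :upcase])
lone_attrs, lone_cleaners, _ = split_cleanse_args(:first_name)
```

One thing this makes visible: when no `:cleaner` is given, `Array(nil)` yields `[]`, which is truthy in Ruby. Judging only from the lines in this hunk, the `unless cleaners` guard would therefore not fire for a missing cleaner; an `empty?` check would be needed for that.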
@@ -58,7 +58,7 @@ module DataCleansing
 
         # Collect parent cleaners first, starting with the top parent
         cleaners = []
-        klass
+        klass    = self
         while klass != Object
           if klass.respond_to?(:data_cleansing_attribute_cleaners)
             cleaners += klass.data_cleansing_attribute_cleaners[:all] || []
@@ -66,8 +66,9 @@ module DataCleansing
           end
           klass = klass.superclass
         end
-
-
+        # Support Fixnum values
+        cleansed_value = value.is_a?(Fixnum) ? value : value.dup
+        cleaners.reverse_each { |cleaner| cleansed_value = data_cleansing_clean(cleaner, cleansed_value, object) if cleaner }
         cleansed_value
       end
 
@@ -90,33 +91,19 @@ module DataCleansing
 
       # Returns the supplied value cleansed using the supplied cleaner
       # Parameters
-      #
+      #   binding
       #     If supplied the cleansing will be performed within the scope of
-      #     that
-      #     of that
+      #     that binding so that cleaners can read and write to attributes
+      #     of that binding
       #
       # No logging of cleansing is performed by this method since the value
       # itself is not modified
-      def data_cleansing_clean(cleaner_struct, value,
+      def data_cleansing_clean(cleaner_struct, value, binding = nil)
         return if cleaner_struct.nil? || value.nil?
         # Duplicate value in case cleaner uses methods such as gsub!
         new_value = value.is_a?(String) ? value.dup : value
         cleaner_struct.cleaners.each do |name|
-
-          proc = name.is_a?(Proc) ? name : DataCleansing.cleaner(name.to_sym)
-          raise "No cleaner defined for #{name.inspect}" unless proc
-
-          if proc.is_a?(Proc)
-            new_value = if object
-              # Call the cleaner proc within the scope (binding) of the object
-              proc.arity == 1 ? object.instance_exec(new_value, &proc) : object.instance_exec(new_value, cleaner_struct.params, &proc)
-            else
-              proc.arity == 1 ? proc.call(new_value) : proc.call(new_value, cleaner_struct.params)
-            end
-          else
-            new_value = (proc.method(:call).arity == 1 ? proc.call(new_value) : proc.call(new_value, cleaner_struct.params))
-          end
-
+          new_value = DataCleansing.clean(name, new_value, binding)
         end
         new_value
       end
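The removed lines dispatched on the cleaner's type and arity inline; 1.0.0 delegates that to `DataCleansing.clean`, whose body is not part of this hunk. As a rough illustration of the dispatch the removed lines performed, here is a sketch under stated assumptions: `apply_cleaner`, `StripSketch`, `Person`, and the sample procs are all hypothetical names, and this is not claimed to be how `DataCleansing.clean` is implemented.

```ruby
# Hypothetical stand-in for the arity-based dispatch seen in the removed lines.
def apply_cleaner(cleaner, value, params = {}, scope = nil)
  if cleaner.is_a?(Proc)
    # One-argument procs get only the value, others also get the params
    args = cleaner.arity == 1 ? [value] : [value, params]
    # With a scope object the proc runs as if it were a method of that object
    scope ? scope.instance_exec(*args, &cleaner) : cleaner.call(*args)
  else
    # Module-style cleaners expose a `call` method
    cleaner.method(:call).arity == 1 ? cleaner.call(value) : cleaner.call(value, params)
  end
end

module StripSketch
  def self.call(string)
    string.strip
  end
end

Person = Struct.new(:prefix)
person = Person.new('Dr. ')

add_prefix = proc { |value| prefix + value } # `prefix` resolves on the scope object
pad        = proc { |value, params| value.ljust(params[:width], '*') }

stripped = apply_cleaner(StripSketch, '  jack ')
prefixed = apply_cleaner(add_prefix, 'Who', {}, person)
padded   = apply_cleaner(pad, 'ab', { width: 4 })
```

Running a proc through `instance_exec` is what lets a cleaner read and write other attributes of the object being cleansed, which is the behaviour the updated `binding` parameter comment describes.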
@@ -135,19 +122,19 @@ module DataCleansing
         changes = {}
         DataCleansing.logger.benchmark_info("#{self.class.name}#cleanse_attributes!", :payload => changes) do
           # Collect parent cleaners first, starting with the top parent
-          cleaners
+          cleaners       = [self.class.send(:data_cleansing_cleaners)]
           after_cleaners = [self.class.send(:data_cleansing_after_cleaners)]
-          klass
+          klass          = self.class.superclass
           while klass != Object
             cleaners << klass.send(:data_cleansing_cleaners) if klass.respond_to?(:data_cleansing_cleaners)
             after_cleaners << klass.send(:data_cleansing_after_cleaners) if klass.respond_to?(:data_cleansing_after_cleaners)
             klass = klass.superclass
           end
           # Capture all modified fields if log_level is :debug or :trace
-          cleaners.reverse_each {|cleaner| changes.merge!(data_cleansing_execute_cleaners(cleaner, verbose))}
+          cleaners.reverse_each { |cleaner| changes.merge!(data_cleansing_execute_cleaners(cleaner, verbose)) }
 
           # Execute the after cleaners, starting with the parent after cleanse methods
-          after_cleaners.reverse_each {|a| a.each {|method| send(method)} }
+          after_cleaners.reverse_each { |a| a.each { |method| send(method) } }
         end
         changes
       end
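The loop in this hunk walks the superclass chain so that parent cleaners run before the child's. A small sketch of the same pattern, with hypothetical classes and a hypothetical `rules` class method standing in for `data_cleansing_cleaners`:

```ruby
# Hypothetical classes; `rules` stands in for data_cleansing_cleaners.
class BaseModel
  def self.rules
    [:strip]
  end
end

class ChildModel < BaseModel
  def self.rules
    [:upcase]
  end
end

def collect_rules(klass)
  collected = []
  # Walk from the class itself up to (but not including) Object
  while klass != Object
    collected << klass.rules if klass.respond_to?(:rules)
    klass = klass.superclass
  end
  # Reverse so the top-most parent comes first, as reverse_each does in the hunk
  collected.reverse
end

order = collect_rules(ChildModel)
```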
@@ -176,15 +163,9 @@ module DataCleansing
           # Special case to include :all fields
           # Only works with ActiveRecord based models, not supported with regular Ruby models
           if attrs.include?(:all) && defined?(ActiveRecord) && respond_to?(:attributes)
-            attrs = attributes.keys.collect{|i| i.to_sym}
+            attrs = attributes.keys.collect { |i| i.to_sym }
             attrs.delete(:id)
 
-            # Remove serialized_attributes if any, from the :all condition
-            if self.class.respond_to?(:serialized_attributes)
-              serialized_attrs = self.class.serialized_attributes.keys
-              attrs -= serialized_attrs.collect{|i| i.to_sym} if serialized_attrs
-            end
-
             # Replace any encrypted attributes with their non-encrypted versions if any
             if defined?(SymmetricEncryption) && self.class.respond_to?(:encrypted_attributes)
               self.class.encrypted_attributes.each_pair do |clear, encrypted|
@@ -205,15 +186,16 @@ module DataCleansing
           attrs.each do |attr|
             # Under ActiveModel for Rails and Mongoid need to retrieve raw value
             # before data type conversion
-            value =
-              read_attribute_before_type_cast(attr.to_s)
-
-
-
+            value =
+              if respond_to?(:read_attribute_before_type_cast) && has_attribute?(attr.to_s)
+                read_attribute_before_type_cast(attr.to_s)
+              else
+                send(attr.to_sym)
+              end
 
             # No need to clean if attribute is nil
             unless value.nil?
-              new_value = self.class.send(:data_cleansing_clean,cleaner_struct, value, self)
+              new_value = self.class.send(:data_cleansing_clean, cleaner_struct, value, self)
 
               if new_value != value
                 # Update value only if it has changed
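The new conditional reads the raw, pre-cast value when the model offers one and otherwise falls back to the plain reader method. The sketch below reproduces that branch in a hypothetical helper, `raw_value`; `PlainUser` and `CastingUser` are toy classes made up for the example, and `CastingUser` only imitates the two ActiveRecord-style methods the branch checks for.

```ruby
# Hypothetical helper mirroring the conditional added in the hunk above.
def raw_value(model, attr)
  if model.respond_to?(:read_attribute_before_type_cast) && model.has_attribute?(attr.to_s)
    model.read_attribute_before_type_cast(attr.to_s)
  else
    model.send(attr.to_sym)
  end
end

# Plain Ruby object: no pre-cast API, so the reader method is used
PlainUser = Struct.new(:age)

# Toy stand-in for an ActiveRecord-like model (not the real API surface)
class CastingUser
  def initialize(raw)
    @raw = raw
  end

  def has_attribute?(name)
    @raw.key?(name)
  end

  def read_attribute_before_type_cast(name)
    @raw[name]
  end

  def age
    @raw['age'].to_i
  end
end

plain = raw_value(PlainUser.new(21), :age)
cast  = raw_value(CastingUser.new('age' => ' 21 '), :age)
```

Reading the uncast string is what allows a cleaner such as `:strip` to see `' 21 '` before it is converted to an integer.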
@@ -222,7 +204,7 @@ module DataCleansing
                 # Capture changed attributes
                 if changes
                   # Mask sensitive attributes when logging
-                  masked
+                  masked    = DataCleansing.masked_attributes.include?(attr.to_sym)
                   new_value = :masked if masked && !new_value.nil?
                   if previous = changes[attr.to_sym]
                     previous[:after] = new_value
| 
@@ -246,7 +228,7 @@ module DataCleansing
 
     def self.included(base)
       base.class_eval do
-        extend 
+        extend DataCleansing::Cleanse::ClassMethods
         include DataCleansing::Cleanse::InstanceMethods
       end
     end
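The `value =` lookup added to data/lib/data_cleansing/cleanse.rb prefers the raw, uncast attribute when the model offers `read_attribute_before_type_cast`, and otherwise falls back to the plain reader. A minimal standalone sketch of that fallback follows; `RawValueReader`, `RawModel` and `PlainModel` are hypothetical names used only for illustration and are not part of the gem or of ActiveRecord.

```ruby
# Hypothetical helper mirroring the value lookup shown in the diff.
module RawValueReader
  def raw_value(attr)
    if respond_to?(:read_attribute_before_type_cast) && has_attribute?(attr.to_s)
      read_attribute_before_type_cast(attr.to_s)
    else
      send(attr.to_sym)
    end
  end
end

# Stand-in for a model that keeps raw input separate from the cast value.
class RawModel
  include RawValueReader

  def initialize(raw)
    @raw = raw
  end

  def has_attribute?(name)
    @raw.key?(name)
  end

  def read_attribute_before_type_cast(name)
    @raw[name]
  end

  # Cast reader: anything that is not purely digits becomes nil.
  def zip_code
    raw = @raw['zip_code'].to_s.strip
    raw.match?(/\A\d+\z/) ? raw.to_i : nil
  end
end

# Plain Ruby object: no raw reader, so the accessor is used instead.
class PlainModel
  include RawValueReader
  attr_accessor :zip_code
end
```

With these stand-ins, `RawModel.new('zip_code' => "\n\t  blah\n").raw_value(:zip_code)` returns the untouched string even though the cast reader `zip_code` returns nil, so a cleaner still gets to see the original input.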
@@ -27,4 +27,22 @@ module DataCleansing
     @@masked_attributes.freeze
   end
 
+  # Run the specified cleanser against the supplied value
+  def self.clean(name, value, binding = nil)
+    # Cleaner itself could be a custom Proc, otherwise do a global lookup for it
+    proc = name.is_a?(Proc) ? name : DataCleansing.cleaner(name.to_sym)
+    raise(ArgumentError, "No cleaner defined for #{name.inspect}") unless proc
+
+    if proc.is_a?(Proc)
+      if binding
+        # Call the cleaner proc within the scope (binding) of the binding
+        proc.arity == 1 ? binding.instance_exec(value, &proc) : binding.instance_exec(value, cleaner_struct.params, &proc)
+      else
+        proc.arity == 1 ? proc.call(value) : proc.call(value, cleaner_struct.params)
+      end
+    else
+      (proc.method(:call).arity == 1 ? proc.call(value) : proc.call(value, cleaner_struct.params))
+    end
+  end
+
 end
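The new `DataCleansing.clean` entry point accepts either a Proc or a registered cleaner name, and runs the cleaner inside the supplied object when a binding is given. The sketch below reproduces only that dispatch in a self-contained form. `CleanerSketch`, its `register` method and the `Account` struct are hypothetical stand-ins, not the gem's API. It handles one-argument cleaners only, because the two-argument branches in the hunk refer to `cleaner_struct.params`, and `cleaner_struct` is not among the arguments of `self.clean` as shown.

```ruby
# Hypothetical stand-in for the gem's cleaner registry; not the gem's own code.
module CleanerSketch
  @cleaners = {}

  def self.register(name, &block)
    @cleaners[name.to_sym] = block
  end

  def self.clean(name, value, binding = nil)
    # A Proc is used directly; anything else is looked up by name.
    cleaner = name.is_a?(Proc) ? name : @cleaners[name.to_sym]
    raise(ArgumentError, "No cleaner defined for #{name.inspect}") unless cleaner

    # With a binding, evaluate the cleaner with that object as self.
    binding ? binding.instance_exec(value, &cleaner) : cleaner.call(value)
  end
end

# Illustrative cleaner: strips strings and passes other values through.
CleanerSketch.register(:strip) { |s| s.is_a?(String) ? s.strip : s }

# A cleaner run against an object can call that object's methods (here: prefix).
Account = Struct.new(:prefix)
tagged  = CleanerSketch.clean(->(v) { "#{prefix}#{v}" }, 'joe', Account.new('@'))
```

In this sketch `CleanerSketch.clean(:strip, '  joe ')` returns `'joe'`, `tagged` is `'@joe'`, and an unknown name raises `ArgumentError`, matching the error path in the hunk.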
    
data/test/active_record_test.rb CHANGED
@@ -10,7 +10,7 @@ ActiveRecord::Base.configurations = {
     'timeout'  => 5000
   }
 }
-ActiveRecord::Base.establish_connection( 
+ActiveRecord::Base.establish_connection(:test)
 
 ActiveRecord::Schema.define :version => 0 do
   create_table :users, :force => true do |t|
@@ -20,6 +20,7 @@ ActiveRecord::Schema.define :version => 0 do
     t.string  :address2
     t.string  :ssn
     t.integer :zip_code
+    t.text    :text
   end
 end
 
@@ -54,8 +55,11 @@ class User2 < ActiveRecord::Base
   # Use the same table as User above
   self.table_name = 'users'
 
+  serialize :text
+
   # Test :all cleaner. Only works with ActiveRecord Models
-  
+  # Must explicitly excelude :text since it is serialized
+  cleanse :all, :cleaner => [:strip, Proc.new{|s| "@#{s}@"}], :except => [:address1, :zip_code, :text]
 
   # Clean :first_name multiple times
   cleanse :first_name, :cleaner => Proc.new {|string| "<< #{string} >>"}
@@ -71,7 +75,7 @@ class User2 < ActiveRecord::Base
 end
 
 class ActiveRecordTest < Minitest::Test
-  describe  
+  describe 'ActiveRecord Models' do
 
     it 'have globally registered cleaner' do
       assert DataCleansing.cleaner(:strip)
@@ -118,14 +122,15 @@ class ActiveRecordTest < Minitest::Test
       end
     end
 
-    describe  
+    describe 'with user2' do
       before do
         @user = User2.new(
           :first_name => '    joe   ',
           :last_name  => "\n  black\n",
           :ssn        => "\n    123456789   \n  ",
           :address1   => "2632 Brown St   \n",
-          :zip_code   => "\n\t  blah\n"
+          :zip_code   => "\n\t  blah\n",
+          :text       => ["\n    123456789   \n  ", ' second ']
         )
       end
 
@@ -145,6 +150,7 @@ class ActiveRecordTest < Minitest::Test
         assert_equal "2632 Brown St   \n", @user.address1
         assert_equal "@123456789@", @user.ssn
         assert_equal nil, @user.zip_code, User2.send(:data_cleansing_cleaners)
+        assert_equal ["\n    123456789   \n  ", ' second '], @user.text
       end
 
     end
    
data/test/cleaners_test.rb CHANGED
@@ -8,7 +8,8 @@ class CleanersTest < Minitest::Test
     attr_accessor :first_name, :last_name, :address1, :address2,
       :make_this_upper, :clean_non_word, :clean_non_printable,
       :clean_html, :clean_from_uri, :clean_to_uri, :clean_whitespace,
-      :clean_digits_only, :clean_to_integer, :clean_to_float, :clean_end_of_day
+      :clean_digits_only, :clean_to_integer, :clean_to_float, :clean_end_of_day,
+      :clean_order
 
     cleanse :first_name, :last_name, :address1, :address2, cleaner: :strip
     cleanse :make_this_upper, cleaner: :upcase
@@ -22,6 +23,10 @@ class CleanersTest < Minitest::Test
     cleanse :clean_to_integer, cleaner: :string_to_integer
     cleanse :clean_to_float, cleaner: :string_to_float
     cleanse :clean_end_of_day, cleaner: :end_of_day
+
+    # Call cleaners in the order they are defined
+    cleanse :clean_order, cleaner: [:upcase, :strip]
+    cleanse :clean_order, cleaner: -> val { val == 'BLAH' ? ' yes ' : ' no ' }
   end
 
   describe 'Cleaners' do
@@ -140,17 +145,17 @@ class CleanersTest < Minitest::Test
     end
 
     describe '#escape_uri' do
-      it 'converts  
+      it 'converts spaces' do
         user              = User.new
         user.clean_to_uri = 'Jim  Bob '
         user.cleanse_attributes!
-        assert_equal 'Jim 
+        assert_equal 'Jim++Bob+', user.clean_to_uri
       end
-      it 'converts  
+      it 'converts space only' do
         user              = User.new
         user.clean_to_uri = ' '
         user.cleanse_attributes!
-        assert_equal ' 
+        assert_equal '+', user.clean_to_uri
       end
     end
 
@@ -205,5 +210,12 @@ class CleanersTest < Minitest::Test
       assert_equal Time.parse('2016-03-03 23:59:59 +0000').to_i, user.clean_end_of_day.to_i
     end
 
+    it 'cleans in the order defined' do
+      user             = User.new
+      user.clean_order = '  blah '
+      user.cleanse_attributes!
+      assert_equal ' yes ', user.clean_order
+    end
+
   end
 end
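The `clean_order` test above depends on cleaners running in declaration order: `:upcase`, then `:strip`, then the lambda. A short standalone sketch shows why the order matters. The `CLEANERS` table and `run_pipeline` helper are hypothetical names for illustration and do not reflect how the gem stores or invokes its cleaners.

```ruby
# Hypothetical lookup table standing in for globally registered cleaners.
CLEANERS = {
  upcase: ->(s) { s.upcase },
  strip:  ->(s) { s.strip }
}.freeze

# Feed the value through each cleaner in turn, left to right.
def run_pipeline(value, pipeline)
  pipeline.reduce(value) do |current, cleaner|
    callable = cleaner.is_a?(Proc) ? cleaner : CLEANERS.fetch(cleaner)
    callable.call(current)
  end
end

# Same sequence as the test: the array of names first, then the later lambda.
pipeline = [:upcase, :strip, ->(val) { val == 'BLAH' ? ' yes ' : ' no ' }]

result   = run_pipeline('  blah ', pipeline)          # "  BLAH " -> "BLAH" -> " yes "
reversed = run_pipeline('  blah ', pipeline.reverse)  # " no " -> "no" -> "NO"
```

Running the same cleaners in reverse yields `'NO'` instead of `' yes '`, which is the behaviour difference the added test guards against.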
    
data/test/ruby_test.rb CHANGED
@@ -75,6 +75,7 @@ class RubyTest < Minitest::Test
       assert_equal 'joe',                 RubyUserChild.cleanse_attribute(:first_name, '    joe   '), RubyUserChild.send(:data_cleansing_attribute_cleaners)
       assert_equal 'black',               RubyUserChild.cleanse_attribute(:last_name,  "\n  black\n"), RubyUserChild.send(:data_cleansing_attribute_cleaners)
       assert_equal '<< 2632 Brown St >>', RubyUserChild.cleanse_attribute(:address1,   "2632 Brown St   \n"), RubyUserChild.send(:data_cleansing_attribute_cleaners)
+      assert_equal 3,                     RubyUserChild.cleanse_attribute(:first_name, 3), RubyUserChild.send(:data_cleansing_attribute_cleaners)
     end
 
     describe "with ruby user" do
    
data/test/test_db.sqlite3 CHANGED
Binary file

data/test/test_helper.rb CHANGED
    
    
    
metadata CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: data_cleansing
 version: !ruby/object:Gem::Version
-  version: 0. 
+  version: 1.0.0
 platform: ruby
 authors:
 - Reid Morrison
 autorequire: 
 bindir: bin
 cert_chain: []
-date: 2016- 
+date: 2016-08-25 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: concurrent-ruby
@@ -56,6 +56,7 @@ files:
 - lib/data_cleansing/version.rb
 - test/active_record_test.rb
 - test/cleaners_test.rb
+- test/data_cleansing_test.rb
 - test/ruby_test.rb
 - test/test_db.sqlite3
 - test/test_helper.rb
@@ -86,6 +87,7 @@ summary: Data Cleansing framework for Ruby, Rails, Mongoid and MongoMapper.
 test_files:
 - test/active_record_test.rb
 - test/cleaners_test.rb
+- test/data_cleansing_test.rb
 - test/ruby_test.rb
 - test/test_db.sqlite3
 - test/test_helper.rb