data_cleansing 0.9.0 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 8ba846025b7441eb5a93230b7fbd8ebe2a4d88e3
- data.tar.gz: 4e209fd6ef57540a8b549d06c314ae4caeddbf59
+ metadata.gz: 0347620583101155e6181d7e2504ef2a4816d970
+ data.tar.gz: b80bff6ab7116bda3cef959a7eeb6c231ab3660b
  SHA512:
- metadata.gz: 7b464ca76d4c40f4621d86a32cd76bd4bc3e71e8b5eed18ac094ae651a8f0be58772a503fa096c6798b081cf3030363973b0d96cfd2cf45d6497e14a5b2717f1
- data.tar.gz: e6933049c6200cadb6e398e3d2af8bae641534942a201c6ed7b8a47fc991f7a843d7b2d1b6cbc1c00f14d837f2de887ac6011f23c187fe33be3c6199a1e18cdf
+ metadata.gz: 474e0ed54427a7958358a1d645d95792af3b83f0621e48c1986d09ecfd1f8288aada4e6ee55573f88347eb7193adf0eddde1b7cb39c110c6a84a1c5f43daae19
+ data.tar.gz: deb42d04fa24cf7b3e77d8411989e2c8578bf8c23b9f8726898df640e93e271071e0cccd96850eb342780737894823ccd295032a24bb22fdc16678a71b55ddd1
data/README.md CHANGED
@@ -1,7 +1,7 @@
  data_cleansing
  ==============
 
- Data Cleansing framework for Ruby, Rails, Mongoid and MongoMapper.
+ Data Cleansing framework for Ruby.
 
  * http://github.com/reidmorrison/data_cleansing
 
@@ -12,12 +12,8 @@ or trailing blanks and even newlines.
  Similarly it would be useful to be able to attach a cleansing solution to a field
  in a model and have the data cleansed transparently when required.
 
- DataCleansing is a framework that allows any data cleansing to be applied to
- specific attributes or fields. At this time it does not supply the cleaning
- solutions themselves since they are usually straight forward, or so complex
- that they don't tend to be too useful to others. However, over time built-in
- cleansing solutions may be added. Feel free to submit any suggestions via a ticket
- or pull request.
+ DataCleansing is a framework that allows data cleansing to be applied to
+ specific attributes or fields.
 
  ## Features
 
@@ -297,24 +293,6 @@ Install the Gem with bundler
 
  bundle install
 
- ## Architecture
-
- DataCleansing has been designed to support externalized data cleansing routines.
- In this way the data cleansing routine itself can be loaded from a datastore and
- applied dynamically at runtime.
- Although not supported out of the box, this design allows for example for the
- data cleansing routines to be stored in something like [ZooKeeper](http://zookeeper.apache.org/).
- Then any changes to the data cleansing routines can be pushed out immediately to
- every server that needs it.
-
- DataCleansing is designed to support any Ruby model. In this way it can be used
- in just about any ORM or DOM. For example, it currently easily supports both
- Rails and Mongoid models. Some extensions have been added to support these frameworks.
-
- For example, in Rails it obtains the raw data value before Rails has converted it.
- Which is useful for cleansing integer or float fields as raw strings before Rails
- tries to convert it to an integer or float.
-
  ## Dependencies
 
  DataCleansing requires the following dependencies
data/Rakefile CHANGED
@@ -1,6 +1,4 @@
- require 'rake/clean'
  require 'rake/testtask'
-
  require_relative 'lib/data_cleansing/version'
 
  task :gem do
@@ -14,14 +12,10 @@ task publish: :gem do
  system "rm data_cleansing-#{DataCleansing::VERSION}.gem"
  end
 
- desc 'Run Test Suite'
- task :test do
- Rake::TestTask.new(:functional) do |t|
- t.test_files = FileList['test/**/*_test.rb']
- t.verbose = true
- end
-
- Rake::Task['functional'].invoke
+ Rake::TestTask.new(:test) do |t|
+ t.pattern = 'test/**/*_test.rb'
+ t.verbose = true
+ t.warning = true
  end
 
  task default: :test
data/lib/data_cleansing/cleaners.rb CHANGED
@@ -1,4 +1,4 @@
- require 'uri'
+ require 'cgi'
  module Cleaners
  # Strip leading and trailing whitespace
  module Strip
@@ -20,6 +20,16 @@ module Cleaners
  end
  DataCleansing.register_cleaner(:upcase, Upcase)
 
+ # Convert to downcase
+ module Downcase
+ def self.call(string)
+ return string unless string.is_a?(String)
+
+ string.downcase! || string
+ end
+ end
+ DataCleansing.register_cleaner(:downcase, Downcase)
+
  # Remove all non-word characters, including whitespace
  module RemoveNonWord
  NOT_WORDS = Regexp.compile(/\W/)
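The `string.downcase! || string` expression in the new `Downcase` cleaner relies on `String#downcase!` returning `nil` when the string is already lowercase. A minimal sketch of that behavior follows; the `downcase_in_place` helper is hypothetical and stands in for the gem's module.

```ruby
# Hypothetical helper mirroring the idiom in the hunk above; not part of the gem.
def downcase_in_place(value)
  # Leave non-String values (Integer, nil, ...) untouched.
  return value unless value.is_a?(String)

  # String#downcase! mutates the receiver and returns nil when nothing changed,
  # so fall back to the original string in that case.
  value.downcase! || value
end

mixed = String.new('Jack BLACK')
lower = String.new('jack black')

p downcase_in_place(mixed) # "jack black" (same object, modified in place)
p downcase_in_place(lower) # "jack black" (downcase! returned nil)
p downcase_in_place(42)    # 42
```

Because the string is modified in place, a caller that needs the original value has to duplicate it first.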
@@ -44,7 +54,7 @@ module Cleaners
  end
  DataCleansing.register_cleaner(:remove_non_printable, RemoveNonPrintable)
 
- # Remove HTML Markup
+ # Unescape HTML Markup ( case-insensitive )
  module ReplaceHTMLMarkup
  HTML_MARKUP = Regexp.compile(/&(amp|quot|gt|lt|apos|nbsp);/in)
 
@@ -77,7 +87,7 @@ module Cleaners
  def self.call(string)
  return string unless string.is_a?(String)
 
- URI.unescape(string)
+ CGI.unescape(string)
  end
  end
  DataCleansing.register_cleaner(:unescape_uri, UnescapeURI)
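Swapping `URI.unescape` for `CGI.unescape` (and `URI.escape` for `CGI.escape` in the same file) changes how spaces are handled: the `CGI` methods use form encoding, where a space is written as `+`. The sketch below uses only the standard library; the `percent_escape` helper is hypothetical and shows one way a caller could keep `%20` output.

```ruby
require 'cgi'

# CGI.escape form-encodes its input, so a space becomes "+".
p CGI.escape('Jim Bob')        # "Jim+Bob"

# CGI.unescape decodes both "+" and "%20" to a space.
p CGI.unescape('Jim+Bob%20Jr') # "Jim Bob Jr"

# Hypothetical helper (not part of the gem): keep "%20" for spaces.
# A literal "+" in the input is already encoded as "%2B" by CGI.escape,
# so every remaining "+" stands for a space.
def percent_escape(string)
  CGI.escape(string).gsub('+', '%20')
end

p percent_escape('Jim Bob+')   # "Jim%20Bob%2B"
```

One consequence for the `:unescape_uri` cleaner is that a literal `+` in the input now decodes to a space.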
@@ -86,7 +96,7 @@ module Cleaners
  def self.call(string)
  return string unless string.is_a?(String)
 
- URI.escape(string)
+ CGI.escape(string)
  end
  end
  DataCleansing.register_cleaner(:escape_uri, EscapeURI)
data/lib/data_cleansing/cleanse.rb CHANGED
@@ -7,10 +7,10 @@ module DataCleansing
  module ClassMethods
  # Define how to cleanse one or more attributes
  def cleanse(*args)
- last = args.last
+ last = args.last
  attributes = args.dup
- params = (last.is_a?(Hash) && last.instance_of?(Hash)) ? attributes.pop.dup : {}
- cleaners = Array(params.delete(:cleaner))
+ params = (last.is_a?(Hash) && last.instance_of?(Hash)) ? attributes.pop.dup : {}
+ cleaners = Array(params.delete(:cleaner))
  raise(ArgumentError, "Mandatory :cleaner parameter is missing: #{params.inspect}") unless cleaners
 
  cleaner = DataCleansingCleaner.new(cleaners, attributes, params)
@@ -58,7 +58,7 @@ module DataCleansing
 
  # Collect parent cleaners first, starting with the top parent
  cleaners = []
- klass = self
+ klass = self
  while klass != Object
  if klass.respond_to?(:data_cleansing_attribute_cleaners)
  cleaners += klass.data_cleansing_attribute_cleaners[:all] || []
@@ -66,8 +66,9 @@ module DataCleansing
  end
  klass = klass.superclass
  end
- cleansed_value = value.dup
- cleaners.reverse_each {|cleaner| cleansed_value = data_cleansing_clean(cleaner, cleansed_value, object) if cleaner}
+ # Support Fixnum values
+ cleansed_value = value.is_a?(Fixnum) ? value : value.dup
+ cleaners.reverse_each { |cleaner| cleansed_value = data_cleansing_clean(cleaner, cleansed_value, object) if cleaner }
  cleansed_value
  end
 
@@ -90,33 +91,19 @@ module DataCleansing
 
  # Returns the supplied value cleansed using the supplied cleaner
  # Parameters
- # object
+ # binding
  # If supplied the cleansing will be performed within the scope of
- # that object so that cleaners can read and write to attributes
- # of that object
+ # that binding so that cleaners can read and write to attributes
+ # of that binding
  #
  # No logging of cleansing is performed by this method since the value
  # itself is not modified
- def data_cleansing_clean(cleaner_struct, value, object=nil)
+ def data_cleansing_clean(cleaner_struct, value, binding = nil)
  return if cleaner_struct.nil? || value.nil?
  # Duplicate value in case cleaner uses methods such as gsub!
  new_value = value.is_a?(String) ? value.dup : value
  cleaner_struct.cleaners.each do |name|
- # Cleaner itself could be a custom Proc, otherwise do a global lookup for it
- proc = name.is_a?(Proc) ? name : DataCleansing.cleaner(name.to_sym)
- raise "No cleaner defined for #{name.inspect}" unless proc
-
- if proc.is_a?(Proc)
- new_value = if object
- # Call the cleaner proc within the scope (binding) of the object
- proc.arity == 1 ? object.instance_exec(new_value, &proc) : object.instance_exec(new_value, cleaner_struct.params, &proc)
- else
- proc.arity == 1 ? proc.call(new_value) : proc.call(new_value, cleaner_struct.params)
- end
- else
- new_value = (proc.method(:call).arity == 1 ? proc.call(new_value) : proc.call(new_value, cleaner_struct.params))
- end
-
+ new_value = DataCleansing.clean(name, new_value, binding)
  end
  new_value
  end
@@ -135,19 +122,19 @@ module DataCleansing
  changes = {}
  DataCleansing.logger.benchmark_info("#{self.class.name}#cleanse_attributes!", :payload => changes) do
  # Collect parent cleaners first, starting with the top parent
- cleaners = [self.class.send(:data_cleansing_cleaners)]
+ cleaners = [self.class.send(:data_cleansing_cleaners)]
  after_cleaners = [self.class.send(:data_cleansing_after_cleaners)]
- klass = self.class.superclass
+ klass = self.class.superclass
  while klass != Object
  cleaners << klass.send(:data_cleansing_cleaners) if klass.respond_to?(:data_cleansing_cleaners)
  after_cleaners << klass.send(:data_cleansing_after_cleaners) if klass.respond_to?(:data_cleansing_after_cleaners)
  klass = klass.superclass
  end
  # Capture all modified fields if log_level is :debug or :trace
- cleaners.reverse_each {|cleaner| changes.merge!(data_cleansing_execute_cleaners(cleaner, verbose))}
+ cleaners.reverse_each { |cleaner| changes.merge!(data_cleansing_execute_cleaners(cleaner, verbose)) }
 
  # Execute the after cleaners, starting with the parent after cleanse methods
- after_cleaners.reverse_each {|a| a.each {|method| send(method)} }
+ after_cleaners.reverse_each { |a| a.each { |method| send(method) } }
  end
  changes
  end
@@ -176,15 +163,9 @@ module DataCleansing
  # Special case to include :all fields
  # Only works with ActiveRecord based models, not supported with regular Ruby models
  if attrs.include?(:all) && defined?(ActiveRecord) && respond_to?(:attributes)
- attrs = attributes.keys.collect{|i| i.to_sym}
+ attrs = attributes.keys.collect { |i| i.to_sym }
  attrs.delete(:id)
 
- # Remove serialized_attributes if any, from the :all condition
- if self.class.respond_to?(:serialized_attributes)
- serialized_attrs = self.class.serialized_attributes.keys
- attrs -= serialized_attrs.collect{|i| i.to_sym} if serialized_attrs
- end
-
  # Replace any encrypted attributes with their non-encrypted versions if any
  if defined?(SymmetricEncryption) && self.class.respond_to?(:encrypted_attributes)
  self.class.encrypted_attributes.each_pair do |clear, encrypted|
@@ -205,15 +186,16 @@ module DataCleansing
  attrs.each do |attr|
  # Under ActiveModel for Rails and Mongoid need to retrieve raw value
  # before data type conversion
- value = if respond_to?(:read_attribute_before_type_cast) && has_attribute?(attr.to_s)
- read_attribute_before_type_cast(attr.to_s)
- else
- send(attr.to_sym)
- end
+ value =
+ if respond_to?(:read_attribute_before_type_cast) && has_attribute?(attr.to_s)
+ read_attribute_before_type_cast(attr.to_s)
+ else
+ send(attr.to_sym)
+ end
 
  # No need to clean if attribute is nil
  unless value.nil?
- new_value = self.class.send(:data_cleansing_clean,cleaner_struct, value, self)
+ new_value = self.class.send(:data_cleansing_clean, cleaner_struct, value, self)
 
  if new_value != value
  # Update value only if it has changed
@@ -222,7 +204,7 @@ module DataCleansing
  # Capture changed attributes
  if changes
  # Mask sensitive attributes when logging
- masked = DataCleansing.masked_attributes.include?(attr.to_sym)
+ masked = DataCleansing.masked_attributes.include?(attr.to_sym)
  new_value = :masked if masked && !new_value.nil?
  if previous = changes[attr.to_sym]
  previous[:after] = new_value
@@ -246,7 +228,7 @@ module DataCleansing
 
  def self.included(base)
  base.class_eval do
- extend DataCleansing::Cleanse::ClassMethods
+ extend DataCleansing::Cleanse::ClassMethods
  include DataCleansing::Cleanse::InstanceMethods
  end
  end
data/lib/data_cleansing/data_cleansing.rb CHANGED
@@ -27,4 +27,22 @@ module DataCleansing
  @@masked_attributes.freeze
  end
 
+ # Run the specified cleanser against the supplied value
+ def self.clean(name, value, binding = nil)
+ # Cleaner itself could be a custom Proc, otherwise do a global lookup for it
+ proc = name.is_a?(Proc) ? name : DataCleansing.cleaner(name.to_sym)
+ raise(ArgumentError, "No cleaner defined for #{name.inspect}") unless proc
+
+ if proc.is_a?(Proc)
+ if binding
+ # Call the cleaner proc within the scope (binding) of the binding
+ proc.arity == 1 ? binding.instance_exec(value, &proc) : binding.instance_exec(value, cleaner_struct.params, &proc)
+ else
+ proc.arity == 1 ? proc.call(value) : proc.call(value, cleaner_struct.params)
+ end
+ else
+ (proc.method(:call).arity == 1 ? proc.call(value) : proc.call(value, cleaner_struct.params))
+ end
+ end
+
  end
data/lib/data_cleansing/version.rb CHANGED
@@ -1,3 +1,3 @@
  module DataCleansing
- VERSION = '0.9.0'
+ VERSION = '1.0.0'
  end
data/test/active_record_test.rb CHANGED
@@ -10,7 +10,7 @@ ActiveRecord::Base.configurations = {
  'timeout' => 5000
  }
  }
- ActiveRecord::Base.establish_connection('test')
+ ActiveRecord::Base.establish_connection(:test)
 
  ActiveRecord::Schema.define :version => 0 do
  create_table :users, :force => true do |t|
@@ -20,6 +20,7 @@ ActiveRecord::Schema.define :version => 0 do
  t.string :address2
  t.string :ssn
  t.integer :zip_code
+ t.text :text
  end
  end
 
@@ -54,8 +55,11 @@ class User2 < ActiveRecord::Base
  # Use the same table as User above
  self.table_name = 'users'
 
+ serialize :text
+
  # Test :all cleaner. Only works with ActiveRecord Models
- cleanse :all, :cleaner => [:strip, Proc.new{|s| "@#{s}@"}], :except => [:address1, :zip_code]
+ # Must explicitly excelude :text since it is serialized
+ cleanse :all, :cleaner => [:strip, Proc.new{|s| "@#{s}@"}], :except => [:address1, :zip_code, :text]
 
  # Clean :first_name multiple times
  cleanse :first_name, :cleaner => Proc.new {|string| "<< #{string} >>"}
@@ -71,7 +75,7 @@ class User2 < ActiveRecord::Base
  end
 
  class ActiveRecordTest < Minitest::Test
- describe "ActiveRecord Models" do
+ describe 'ActiveRecord Models' do
 
  it 'have globally registered cleaner' do
  assert DataCleansing.cleaner(:strip)
@@ -118,14 +122,15 @@ class ActiveRecordTest < Minitest::Test
  end
  end
 
- describe "with user2" do
+ describe 'with user2' do
  before do
  @user = User2.new(
  :first_name => ' joe ',
  :last_name => "\n black\n",
  :ssn => "\n 123456789 \n ",
  :address1 => "2632 Brown St \n",
- :zip_code => "\n\t blah\n"
+ :zip_code => "\n\t blah\n",
+ :text => ["\n 123456789 \n ", ' second ']
  )
  end
 
@@ -145,6 +150,7 @@ class ActiveRecordTest < Minitest::Test
  assert_equal "2632 Brown St \n", @user.address1
  assert_equal "@123456789@", @user.ssn
  assert_equal nil, @user.zip_code, User2.send(:data_cleansing_cleaners)
+ assert_equal ["\n 123456789 \n ", ' second '], @user.text
  end
 
  end
data/test/cleaners_test.rb CHANGED
@@ -8,7 +8,8 @@ class CleanersTest < Minitest::Test
  attr_accessor :first_name, :last_name, :address1, :address2,
  :make_this_upper, :clean_non_word, :clean_non_printable,
  :clean_html, :clean_from_uri, :clean_to_uri, :clean_whitespace,
- :clean_digits_only, :clean_to_integer, :clean_to_float, :clean_end_of_day
+ :clean_digits_only, :clean_to_integer, :clean_to_float, :clean_end_of_day,
+ :clean_order
 
  cleanse :first_name, :last_name, :address1, :address2, cleaner: :strip
  cleanse :make_this_upper, cleaner: :upcase
@@ -22,6 +23,10 @@ class CleanersTest < Minitest::Test
  cleanse :clean_to_integer, cleaner: :string_to_integer
  cleanse :clean_to_float, cleaner: :string_to_float
  cleanse :clean_end_of_day, cleaner: :end_of_day
+
+ # Call cleaners in the order they are defined
+ cleanse :clean_order, cleaner: [:upcase, :strip]
+ cleanse :clean_order, cleaner: -> val { val == 'BLAH' ? ' yes ' : ' no ' }
  end
 
  describe 'Cleaners' do
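The two `cleanse :clean_order` declarations are expected to run in definition order: `:upcase`, then `:strip`, then the lambda. The chain can be pictured as a fold over a list of callables. The sketch below is standalone and does not use the gem; the lambdas are stand-ins for the registered `:upcase` and `:strip` cleaners.

```ruby
# Stand-ins for the registered cleaners; illustrative only.
upcase = ->(value) { value.upcase }
strip  = ->(value) { value.strip }
check  = ->(value) { value == 'BLAH' ? ' yes ' : ' no ' }

# Apply each cleaner to the result of the previous one, in definition order.
pipeline = [upcase, strip, check]
result   = pipeline.reduce(' blah ') { |value, cleaner| cleaner.call(value) }
p result # " yes "

# Running the check before the other two yields a different answer,
# which is why the order matters.
reordered = [check, upcase, strip].reduce(' blah ') { |value, cleaner| cleaner.call(value) }
p reordered # "NO"
```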
@@ -140,17 +145,17 @@ class CleanersTest < Minitest::Test
  end
 
  describe '#escape_uri' do
- it 'converts %20' do
+ it 'converts spaces' do
  user = User.new
  user.clean_to_uri = 'Jim  Bob '
  user.cleanse_attributes!
- assert_equal 'Jim%20%20Bob%20', user.clean_to_uri
+ assert_equal 'Jim++Bob+', user.clean_to_uri
  end
- it 'converts %20 only' do
+ it 'converts space only' do
  user = User.new
  user.clean_to_uri = ' '
  user.cleanse_attributes!
- assert_equal '%20', user.clean_to_uri
+ assert_equal '+', user.clean_to_uri
  end
  end
 
@@ -205,5 +210,12 @@ class CleanersTest < Minitest::Test
  assert_equal Time.parse('2016-03-03 23:59:59 +0000').to_i, user.clean_end_of_day.to_i
  end
 
+ it 'cleans in the order defined' do
+ user = User.new
+ user.clean_order = ' blah '
+ user.cleanse_attributes!
+ assert_equal ' yes ', user.clean_order
+ end
+
  end
  end
data/test/data_cleansing_test.rb ADDED
@@ -0,0 +1,9 @@
+ require_relative 'test_helper'
+
+ class DataCleansingTest < Minitest::Test
+ describe '#clean' do
+ it 'can call any cleaner directly' do
+ assert_equal 'jack black', DataCleansing.clean(:strip, ' jack black ')
+ end
+ end
+ end
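The new test calls a registered cleaner directly through `DataCleansing.clean`. The lookup-and-dispatch logic behind that call can be sketched in isolation as below. `MiniCleansing`, its `params` argument, and the `Person` struct are hypothetical names for illustration only, not the gem's API. In the diff, the two-argument branches of `DataCleansing.clean` refer to `cleaner_struct`, which is not among that method's parameters, so this sketch passes `params` explicitly instead.

```ruby
# Hypothetical stand-in for the gem's registry; names are illustrative only.
module MiniCleansing
  @cleaners = {}

  # Register a callable (module, lambda, or block) under a symbolic name.
  def self.register_cleaner(name, callable = nil, &block)
    @cleaners[name.to_sym] = callable || block
  end

  # Look up the cleaner (or use the supplied Proc) and apply it to value.
  # When scope is given, Procs run via instance_exec so they can call
  # methods on that object.
  def self.clean(name, value, scope = nil, params = {})
    cleaner = name.is_a?(Proc) ? name : @cleaners[name.to_sym]
    raise(ArgumentError, "No cleaner defined for #{name.inspect}") unless cleaner

    if cleaner.is_a?(Proc)
      # One-argument cleaners receive only the value; others also get params.
      args = cleaner.arity == 1 ? [value] : [value, params]
      scope ? scope.instance_exec(*args, &cleaner) : cleaner.call(*args)
    else
      cleaner.method(:call).arity == 1 ? cleaner.call(value) : cleaner.call(value, params)
    end
  end
end

MiniCleansing.register_cleaner(:strip) { |string| string.strip }

p MiniCleansing.clean(:strip, ' jack black ')        # "jack black"
p MiniCleansing.clean(->(v) { v.reverse }, 'abc')    # "cba"

# A Proc evaluated in the scope of an object can read that object's attributes.
Person = Struct.new(:title)
with_title = proc { |value| "#{title} #{value}" }
p MiniCleansing.clean(with_title, 'Black', Person.new('Mr')) # "Mr Black"
```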
data/test/ruby_test.rb CHANGED
@@ -75,6 +75,7 @@ class RubyTest < Minitest::Test
  assert_equal 'joe', RubyUserChild.cleanse_attribute(:first_name, ' joe '), RubyUserChild.send(:data_cleansing_attribute_cleaners)
  assert_equal 'black', RubyUserChild.cleanse_attribute(:last_name, "\n black\n"), RubyUserChild.send(:data_cleansing_attribute_cleaners)
  assert_equal '<< 2632 Brown St >>', RubyUserChild.cleanse_attribute(:address1, "2632 Brown St \n"), RubyUserChild.send(:data_cleansing_attribute_cleaners)
+ assert_equal 3, RubyUserChild.cleanse_attribute(:first_name, 3), RubyUserChild.send(:data_cleansing_attribute_cleaners)
  end
 
  describe "with ruby user" do
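The new assertion passes an integer through `cleanse_attribute`, exercising the `Fixnum` guard added in this release. That guard exists because `dup` raised `TypeError` for integers on Ruby versions before 2.4. `Fixnum` itself was deprecated in Ruby 2.4 and removed in Ruby 3.2, so the guard as written raises `NameError` on recent interpreters. A minimal version-independent sketch, using a hypothetical `safe_dup` helper that is not part of the gem:

```ruby
# Hypothetical helper, not part of the gem: skip dup for integers and copy
# everything else so in-place cleaners (gsub!, strip!) cannot alter the input.
def safe_dup(value)
  value.is_a?(Integer) ? value : value.dup
end

name = 'joe'
copy = safe_dup(name)
copy << '!'

p name        # "joe" (the original is not modified)
p copy        # "joe!"
p safe_dup(3) # 3
```

Checking against `Integer` works on both old and new interpreters, since `Fixnum` was a subclass of `Integer` before the two were unified.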
data/test/test_db.sqlite3 CHANGED
Binary file
data/test/test_helper.rb CHANGED
@@ -3,7 +3,6 @@ $LOAD_PATH.unshift File.dirname(__FILE__) + '/../lib'
  require 'yaml'
  require 'minitest/autorun'
  require 'minitest/reporters'
- require 'minitest/stub_any_instance'
  require 'awesome_print'
  require 'data_cleansing'
 
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: data_cleansing
  version: !ruby/object:Gem::Version
- version: 0.9.0
+ version: 1.0.0
  platform: ruby
  authors:
  - Reid Morrison
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-03-03 00:00:00.000000000 Z
+ date: 2016-08-25 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: concurrent-ruby
@@ -56,6 +56,7 @@ files:
  - lib/data_cleansing/version.rb
  - test/active_record_test.rb
  - test/cleaners_test.rb
+ - test/data_cleansing_test.rb
  - test/ruby_test.rb
  - test/test_db.sqlite3
  - test/test_helper.rb
@@ -86,6 +87,7 @@ summary: Data Cleansing framework for Ruby, Rails, Mongoid and MongoMapper.
  test_files:
  - test/active_record_test.rb
  - test/cleaners_test.rb
+ - test/data_cleansing_test.rb
  - test/ruby_test.rb
  - test/test_db.sqlite3
  - test/test_helper.rb