data_cleansing 0.9.0 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 8ba846025b7441eb5a93230b7fbd8ebe2a4d88e3
4
- data.tar.gz: 4e209fd6ef57540a8b549d06c314ae4caeddbf59
3
+ metadata.gz: 0347620583101155e6181d7e2504ef2a4816d970
4
+ data.tar.gz: b80bff6ab7116bda3cef959a7eeb6c231ab3660b
5
5
  SHA512:
6
- metadata.gz: 7b464ca76d4c40f4621d86a32cd76bd4bc3e71e8b5eed18ac094ae651a8f0be58772a503fa096c6798b081cf3030363973b0d96cfd2cf45d6497e14a5b2717f1
7
- data.tar.gz: e6933049c6200cadb6e398e3d2af8bae641534942a201c6ed7b8a47fc991f7a843d7b2d1b6cbc1c00f14d837f2de887ac6011f23c187fe33be3c6199a1e18cdf
6
+ metadata.gz: 474e0ed54427a7958358a1d645d95792af3b83f0621e48c1986d09ecfd1f8288aada4e6ee55573f88347eb7193adf0eddde1b7cb39c110c6a84a1c5f43daae19
7
+ data.tar.gz: deb42d04fa24cf7b3e77d8411989e2c8578bf8c23b9f8726898df640e93e271071e0cccd96850eb342780737894823ccd295032a24bb22fdc16678a71b55ddd1
data/README.md CHANGED
@@ -1,7 +1,7 @@
1
1
  data_cleansing
2
2
  ==============
3
3
 
4
- Data Cleansing framework for Ruby, Rails, Mongoid and MongoMapper.
4
+ Data Cleansing framework for Ruby.
5
5
 
6
6
  * http://github.com/reidmorrison/data_cleansing
7
7
 
@@ -12,12 +12,8 @@ or trailing blanks and even newlines.
12
12
  Similarly it would be useful to be able to attach a cleansing solution to a field
13
13
  in a model and have the data cleansed transparently when required.
14
14
 
15
- DataCleansing is a framework that allows any data cleansing to be applied to
16
- specific attributes or fields. At this time it does not supply the cleaning
17
- solutions themselves since they are usually straight forward, or so complex
18
- that they don't tend to be too useful to others. However, over time built-in
19
- cleansing solutions may be added. Feel free to submit any suggestions via a ticket
20
- or pull request.
15
+ DataCleansing is a framework that allows data cleansing to be applied to
16
+ specific attributes or fields.
21
17
 
22
18
  ## Features
23
19
 
@@ -297,24 +293,6 @@ Install the Gem with bundler
297
293
 
298
294
  bundle install
299
295
 
300
- ## Architecture
301
-
302
- DataCleansing has been designed to support externalized data cleansing routines.
303
- In this way the data cleansing routine itself can be loaded from a datastore and
304
- applied dynamically at runtime.
305
- Although not supported out of the box, this design allows for example for the
306
- data cleansing routines to be stored in something like [ZooKeeper](http://zookeeper.apache.org/).
307
- Then any changes to the data cleansing routines can be pushed out immediately to
308
- every server that needs it.
309
-
310
- DataCleansing is designed to support any Ruby model. In this way it can be used
311
- in just about any ORM or DOM. For example, it currently easily supports both
312
- Rails and Mongoid models. Some extensions have been added to support these frameworks.
313
-
314
- For example, in Rails it obtains the raw data value before Rails has converted it.
315
- Which is useful for cleansing integer or float fields as raw strings before Rails
316
- tries to convert it to an integer or float.
317
-
318
296
  ## Dependencies
319
297
 
320
298
  DataCleansing requires the following dependencies
data/Rakefile CHANGED
@@ -1,6 +1,4 @@
1
- require 'rake/clean'
2
1
  require 'rake/testtask'
3
-
4
2
  require_relative 'lib/data_cleansing/version'
5
3
 
6
4
  task :gem do
@@ -14,14 +12,10 @@ task publish: :gem do
14
12
  system "rm data_cleansing-#{DataCleansing::VERSION}.gem"
15
13
  end
16
14
 
17
- desc 'Run Test Suite'
18
- task :test do
19
- Rake::TestTask.new(:functional) do |t|
20
- t.test_files = FileList['test/**/*_test.rb']
21
- t.verbose = true
22
- end
23
-
24
- Rake::Task['functional'].invoke
15
+ Rake::TestTask.new(:test) do |t|
16
+ t.pattern = 'test/**/*_test.rb'
17
+ t.verbose = true
18
+ t.warning = true
25
19
  end
26
20
 
27
21
  task default: :test
@@ -1,4 +1,4 @@
1
- require 'uri'
1
+ require 'cgi'
2
2
  module Cleaners
3
3
  # Strip leading and trailing whitespace
4
4
  module Strip
@@ -20,6 +20,16 @@ module Cleaners
20
20
  end
21
21
  DataCleansing.register_cleaner(:upcase, Upcase)
22
22
 
23
+ # Convert to downcase
24
+ module Downcase
25
+ def self.call(string)
26
+ return string unless string.is_a?(String)
27
+
28
+ string.downcase! || string
29
+ end
30
+ end
31
+ DataCleansing.register_cleaner(:downcase, Downcase)
32
+
23
33
  # Remove all non-word characters, including whitespace
24
34
  module RemoveNonWord
25
35
  NOT_WORDS = Regexp.compile(/\W/)
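The `|| string` in the new `Downcase` cleaner matters because `String#downcase!` returns `nil` when the string is already lower case. A minimal standalone sketch of the same idiom follows; the module name `DowncaseSketch` is hypothetical and nothing is registered with DataCleansing here.

```ruby
# Hypothetical stand-in for the Downcase cleaner added in this hunk.
module DowncaseSketch
  def self.call(string)
    # Non-strings pass through untouched, as in the gem's cleaners.
    return string unless string.is_a?(String)

    # downcase! mutates in place and returns nil when nothing changed,
    # so fall back to the (already lower-case) receiver.
    string.downcase! || string
  end
end

mixed   = String.new('Jack BLACK')
already = String.new('jack black')

puts DowncaseSketch.call(mixed)    # changed in place, returns the string
puts DowncaseSketch.call(already)  # downcase! returned nil, fallback used
puts DowncaseSketch.call(42)       # not a String, returned as-is
```

Without the fallback, an already lower-case value would be cleansed to `nil`.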
@@ -44,7 +54,7 @@ module Cleaners
44
54
  end
45
55
  DataCleansing.register_cleaner(:remove_non_printable, RemoveNonPrintable)
46
56
 
47
- # Remove HTML Markup
57
+ # Unescape HTML Markup ( case-insensitive )
48
58
  module ReplaceHTMLMarkup
49
59
  HTML_MARKUP = Regexp.compile(/&(amp|quot|gt|lt|apos|nbsp);/in)
50
60
 
@@ -77,7 +87,7 @@ module Cleaners
77
87
  def self.call(string)
78
88
  return string unless string.is_a?(String)
79
89
 
80
- URI.unescape(string)
90
+ CGI.unescape(string)
81
91
  end
82
92
  end
83
93
  DataCleansing.register_cleaner(:unescape_uri, UnescapeURI)
@@ -86,7 +96,7 @@ module Cleaners
86
96
  def self.call(string)
87
97
  return string unless string.is_a?(String)
88
98
 
89
- URI.escape(string)
99
+ CGI.escape(string)
90
100
  end
91
101
  end
92
102
  DataCleansing.register_cleaner(:escape_uri, EscapeURI)
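Swapping `URI.escape` for `CGI.escape` changes the output for spaces, which is why the `#escape_uri` tests in this release expect `+` rather than `%20`. A short sketch using only Ruby's standard `cgi` library:

```ruby
require 'cgi'

raw = 'Jim  Bob '

# CGI.escape encodes each space as '+', not '%20'.
escaped = CGI.escape(raw)
puts escaped                        # => Jim++Bob+

# CGI.unescape turns '+' back into a space.
puts CGI.unescape(escaped) == raw   # => true

# Percent-encoded spaces are still decoded.
puts CGI.unescape('Jim%20Bob')      # => Jim Bob
```

Callers that relied on `%20` in the 0.9.0 output may need to adjust. That is an inference from the changed tests, not something stated in the diff.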
@@ -7,10 +7,10 @@ module DataCleansing
7
7
  module ClassMethods
8
8
  # Define how to cleanse one or more attributes
9
9
  def cleanse(*args)
10
- last = args.last
10
+ last = args.last
11
11
  attributes = args.dup
12
- params = (last.is_a?(Hash) && last.instance_of?(Hash)) ? attributes.pop.dup : {}
13
- cleaners = Array(params.delete(:cleaner))
12
+ params = (last.is_a?(Hash) && last.instance_of?(Hash)) ? attributes.pop.dup : {}
13
+ cleaners = Array(params.delete(:cleaner))
14
14
  raise(ArgumentError, "Mandatory :cleaner parameter is missing: #{params.inspect}") unless cleaners
15
15
 
16
16
  cleaner = DataCleansingCleaner.new(cleaners, attributes, params)
@@ -58,7 +58,7 @@ module DataCleansing
58
58
 
59
59
  # Collect parent cleaners first, starting with the top parent
60
60
  cleaners = []
61
- klass = self
61
+ klass = self
62
62
  while klass != Object
63
63
  if klass.respond_to?(:data_cleansing_attribute_cleaners)
64
64
  cleaners += klass.data_cleansing_attribute_cleaners[:all] || []
@@ -66,8 +66,9 @@ module DataCleansing
66
66
  end
67
67
  klass = klass.superclass
68
68
  end
69
- cleansed_value = value.dup
70
- cleaners.reverse_each {|cleaner| cleansed_value = data_cleansing_clean(cleaner, cleansed_value, object) if cleaner}
69
+ # Support Fixnum values
70
+ cleansed_value = value.is_a?(Fixnum) ? value : value.dup
71
+ cleaners.reverse_each { |cleaner| cleansed_value = data_cleansing_clean(cleaner, cleansed_value, object) if cleaner }
71
72
  cleansed_value
72
73
  end
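The `Fixnum` guard exists because, on the Ruby versions current at this release, calling `dup` on a small integer raised `TypeError`. On newer Rubies the picture is different: `Integer#dup` returns the receiver since Ruby 2.4, and the `Fixnum` constant was removed in Ruby 3.2, so the check above would raise `NameError` there. A hedged sketch of an equivalent guard written against `Integer` follows; the helper name `safe_dup` is hypothetical and not part of the gem.

```ruby
# Hypothetical helper: duplicate a value only when a copy is meaningful.
def safe_dup(value)
  # Integers are immutable, so there is nothing to protect by copying.
  value.is_a?(Integer) ? value : value.dup
end

name = String.new(' joe ')
copy = safe_dup(name)
copy.strip!

puts name.inspect   # original is untouched: " joe "
puts copy.inspect   # "joe"
puts safe_dup(3)    # 3
```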
73
74
 
@@ -90,33 +91,19 @@ module DataCleansing
90
91
 
91
92
  # Returns the supplied value cleansed using the supplied cleaner
92
93
  # Parameters
93
- # object
94
+ # binding
94
95
  # If supplied the cleansing will be performed within the scope of
95
- # that object so that cleaners can read and write to attributes
96
- # of that object
96
+ # that binding so that cleaners can read and write to attributes
97
+ # of that binding
97
98
  #
98
99
  # No logging of cleansing is performed by this method since the value
99
100
  # itself is not modified
100
- def data_cleansing_clean(cleaner_struct, value, object=nil)
101
+ def data_cleansing_clean(cleaner_struct, value, binding = nil)
101
102
  return if cleaner_struct.nil? || value.nil?
102
103
  # Duplicate value in case cleaner uses methods such as gsub!
103
104
  new_value = value.is_a?(String) ? value.dup : value
104
105
  cleaner_struct.cleaners.each do |name|
105
- # Cleaner itself could be a custom Proc, otherwise do a global lookup for it
106
- proc = name.is_a?(Proc) ? name : DataCleansing.cleaner(name.to_sym)
107
- raise "No cleaner defined for #{name.inspect}" unless proc
108
-
109
- if proc.is_a?(Proc)
110
- new_value = if object
111
- # Call the cleaner proc within the scope (binding) of the object
112
- proc.arity == 1 ? object.instance_exec(new_value, &proc) : object.instance_exec(new_value, cleaner_struct.params, &proc)
113
- else
114
- proc.arity == 1 ? proc.call(new_value) : proc.call(new_value, cleaner_struct.params)
115
- end
116
- else
117
- new_value = (proc.method(:call).arity == 1 ? proc.call(new_value) : proc.call(new_value, cleaner_struct.params))
118
- end
119
-
106
+ new_value = DataCleansing.clean(name, new_value, binding)
120
107
  end
121
108
  new_value
122
109
  end
@@ -135,19 +122,19 @@ module DataCleansing
135
122
  changes = {}
136
123
  DataCleansing.logger.benchmark_info("#{self.class.name}#cleanse_attributes!", :payload => changes) do
137
124
  # Collect parent cleaners first, starting with the top parent
138
- cleaners = [self.class.send(:data_cleansing_cleaners)]
125
+ cleaners = [self.class.send(:data_cleansing_cleaners)]
139
126
  after_cleaners = [self.class.send(:data_cleansing_after_cleaners)]
140
- klass = self.class.superclass
127
+ klass = self.class.superclass
141
128
  while klass != Object
142
129
  cleaners << klass.send(:data_cleansing_cleaners) if klass.respond_to?(:data_cleansing_cleaners)
143
130
  after_cleaners << klass.send(:data_cleansing_after_cleaners) if klass.respond_to?(:data_cleansing_after_cleaners)
144
131
  klass = klass.superclass
145
132
  end
146
133
  # Capture all modified fields if log_level is :debug or :trace
147
- cleaners.reverse_each {|cleaner| changes.merge!(data_cleansing_execute_cleaners(cleaner, verbose))}
134
+ cleaners.reverse_each { |cleaner| changes.merge!(data_cleansing_execute_cleaners(cleaner, verbose)) }
148
135
 
149
136
  # Execute the after cleaners, starting with the parent after cleanse methods
150
- after_cleaners.reverse_each {|a| a.each {|method| send(method)} }
137
+ after_cleaners.reverse_each { |a| a.each { |method| send(method) } }
151
138
  end
152
139
  changes
153
140
  end
@@ -176,15 +163,9 @@ module DataCleansing
176
163
  # Special case to include :all fields
177
164
  # Only works with ActiveRecord based models, not supported with regular Ruby models
178
165
  if attrs.include?(:all) && defined?(ActiveRecord) && respond_to?(:attributes)
179
- attrs = attributes.keys.collect{|i| i.to_sym}
166
+ attrs = attributes.keys.collect { |i| i.to_sym }
180
167
  attrs.delete(:id)
181
168
 
182
- # Remove serialized_attributes if any, from the :all condition
183
- if self.class.respond_to?(:serialized_attributes)
184
- serialized_attrs = self.class.serialized_attributes.keys
185
- attrs -= serialized_attrs.collect{|i| i.to_sym} if serialized_attrs
186
- end
187
-
188
169
  # Replace any encrypted attributes with their non-encrypted versions if any
189
170
  if defined?(SymmetricEncryption) && self.class.respond_to?(:encrypted_attributes)
190
171
  self.class.encrypted_attributes.each_pair do |clear, encrypted|
@@ -205,15 +186,16 @@ module DataCleansing
205
186
  attrs.each do |attr|
206
187
  # Under ActiveModel for Rails and Mongoid need to retrieve raw value
207
188
  # before data type conversion
208
- value = if respond_to?(:read_attribute_before_type_cast) && has_attribute?(attr.to_s)
209
- read_attribute_before_type_cast(attr.to_s)
210
- else
211
- send(attr.to_sym)
212
- end
189
+ value =
190
+ if respond_to?(:read_attribute_before_type_cast) && has_attribute?(attr.to_s)
191
+ read_attribute_before_type_cast(attr.to_s)
192
+ else
193
+ send(attr.to_sym)
194
+ end
213
195
 
214
196
  # No need to clean if attribute is nil
215
197
  unless value.nil?
216
- new_value = self.class.send(:data_cleansing_clean,cleaner_struct, value, self)
198
+ new_value = self.class.send(:data_cleansing_clean, cleaner_struct, value, self)
217
199
 
218
200
  if new_value != value
219
201
  # Update value only if it has changed
@@ -222,7 +204,7 @@ module DataCleansing
222
204
  # Capture changed attributes
223
205
  if changes
224
206
  # Mask sensitive attributes when logging
225
- masked = DataCleansing.masked_attributes.include?(attr.to_sym)
207
+ masked = DataCleansing.masked_attributes.include?(attr.to_sym)
226
208
  new_value = :masked if masked && !new_value.nil?
227
209
  if previous = changes[attr.to_sym]
228
210
  previous[:after] = new_value
@@ -246,7 +228,7 @@ module DataCleansing
246
228
 
247
229
  def self.included(base)
248
230
  base.class_eval do
249
- extend DataCleansing::Cleanse::ClassMethods
231
+ extend DataCleansing::Cleanse::ClassMethods
250
232
  include DataCleansing::Cleanse::InstanceMethods
251
233
  end
252
234
  end
@@ -27,4 +27,22 @@ module DataCleansing
27
27
  @@masked_attributes.freeze
28
28
  end
29
29
 
30
+ # Run the specified cleanser against the supplied value
31
+ def self.clean(name, value, binding = nil)
32
+ # Cleaner itself could be a custom Proc, otherwise do a global lookup for it
33
+ proc = name.is_a?(Proc) ? name : DataCleansing.cleaner(name.to_sym)
34
+ raise(ArgumentError, "No cleaner defined for #{name.inspect}") unless proc
35
+
36
+ if proc.is_a?(Proc)
37
+ if binding
38
+ # Call the cleaner proc within the scope (binding) of the binding
39
+ proc.arity == 1 ? binding.instance_exec(value, &proc) : binding.instance_exec(value, cleaner_struct.params, &proc)
40
+ else
41
+ proc.arity == 1 ? proc.call(value) : proc.call(value, cleaner_struct.params)
42
+ end
43
+ else
44
+ (proc.method(:call).arity == 1 ? proc.call(value) : proc.call(value, cleaner_struct.params))
45
+ end
46
+ end
47
+
30
48
  end
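`DataCleansing.clean` makes a single cleaner callable without a model, as the new `DataCleansingTest` in this release exercises. Note that the method body shown in this hunk still refers to `cleaner_struct.params`, which is not among its arguments; those branches are only reached when a cleaner's arity is not 1. The following is a minimal standalone sketch, not the gem's code, of how the same dispatch could accept parameters explicitly. The `CleanSketch` module, its `REGISTRY`, and the `params` argument are all assumptions made for illustration.

```ruby
# Hypothetical, self-contained variant of the dispatch in DataCleansing.clean.
module CleanSketch
  # Stand-in for the gem's global cleaner registry.
  REGISTRY = {
    strip: ->(s) { s.is_a?(String) ? s.strip : s }
  }.freeze

  # name:   a Proc, or the name of a registered cleaner
  # scope:  optional object whose context a Proc cleaner runs in
  # params: passed as a second argument to cleaners that accept one
  def self.clean(name, value, scope = nil, params = {})
    cleaner = name.is_a?(Proc) ? name : REGISTRY[name.to_sym]
    raise(ArgumentError, "No cleaner defined for #{name.inspect}") unless cleaner

    args = arity_of(cleaner) == 1 ? [value] : [value, params]
    if cleaner.is_a?(Proc) && scope
      # Run the proc with self set to the supplied object.
      scope.instance_exec(*args, &cleaner)
    else
      cleaner.call(*args)
    end
  end

  def self.arity_of(cleaner)
    cleaner.is_a?(Proc) ? cleaner.arity : cleaner.method(:call).arity
  end
end

puts CleanSketch.clean(:strip, ' jack black ').inspect

repeat = ->(v, p) { v * p[:times] }
puts CleanSketch.clean(repeat, 'ab', nil, { times: 2 })

person = Struct.new(:prefix).new('Mr ')
puts CleanSketch.clean(proc { |v| "#{prefix}#{v}" }, 'Black', person)
```

Passing `params` as an argument keeps the method free of any dependency on a surrounding cleaner struct.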
@@ -1,3 +1,3 @@
1
1
  module DataCleansing
2
- VERSION = '0.9.0'
2
+ VERSION = '1.0.0'
3
3
  end
@@ -10,7 +10,7 @@ ActiveRecord::Base.configurations = {
10
10
  'timeout' => 5000
11
11
  }
12
12
  }
13
- ActiveRecord::Base.establish_connection('test')
13
+ ActiveRecord::Base.establish_connection(:test)
14
14
 
15
15
  ActiveRecord::Schema.define :version => 0 do
16
16
  create_table :users, :force => true do |t|
@@ -20,6 +20,7 @@ ActiveRecord::Schema.define :version => 0 do
20
20
  t.string :address2
21
21
  t.string :ssn
22
22
  t.integer :zip_code
23
+ t.text :text
23
24
  end
24
25
  end
25
26
 
@@ -54,8 +55,11 @@ class User2 < ActiveRecord::Base
54
55
  # Use the same table as User above
55
56
  self.table_name = 'users'
56
57
 
58
+ serialize :text
59
+
57
60
  # Test :all cleaner. Only works with ActiveRecord Models
58
- cleanse :all, :cleaner => [:strip, Proc.new{|s| "@#{s}@"}], :except => [:address1, :zip_code]
61
+ # Must explicitly exclude :text since it is serialized
62
+ cleanse :all, :cleaner => [:strip, Proc.new{|s| "@#{s}@"}], :except => [:address1, :zip_code, :text]
59
63
 
60
64
  # Clean :first_name multiple times
61
65
  cleanse :first_name, :cleaner => Proc.new {|string| "<< #{string} >>"}
@@ -71,7 +75,7 @@ class User2 < ActiveRecord::Base
71
75
  end
72
76
 
73
77
  class ActiveRecordTest < Minitest::Test
74
- describe "ActiveRecord Models" do
78
+ describe 'ActiveRecord Models' do
75
79
 
76
80
  it 'have globally registered cleaner' do
77
81
  assert DataCleansing.cleaner(:strip)
@@ -118,14 +122,15 @@ class ActiveRecordTest < Minitest::Test
118
122
  end
119
123
  end
120
124
 
121
- describe "with user2" do
125
+ describe 'with user2' do
122
126
  before do
123
127
  @user = User2.new(
124
128
  :first_name => ' joe ',
125
129
  :last_name => "\n black\n",
126
130
  :ssn => "\n 123456789 \n ",
127
131
  :address1 => "2632 Brown St \n",
128
- :zip_code => "\n\t blah\n"
132
+ :zip_code => "\n\t blah\n",
133
+ :text => ["\n 123456789 \n ", ' second ']
129
134
  )
130
135
  end
131
136
 
@@ -145,6 +150,7 @@ class ActiveRecordTest < Minitest::Test
145
150
  assert_equal "2632 Brown St \n", @user.address1
146
151
  assert_equal "@123456789@", @user.ssn
147
152
  assert_equal nil, @user.zip_code, User2.send(:data_cleansing_cleaners)
153
+ assert_equal ["\n 123456789 \n ", ' second '], @user.text
148
154
  end
149
155
 
150
156
  end
@@ -8,7 +8,8 @@ class CleanersTest < Minitest::Test
8
8
  attr_accessor :first_name, :last_name, :address1, :address2,
9
9
  :make_this_upper, :clean_non_word, :clean_non_printable,
10
10
  :clean_html, :clean_from_uri, :clean_to_uri, :clean_whitespace,
11
- :clean_digits_only, :clean_to_integer, :clean_to_float, :clean_end_of_day
11
+ :clean_digits_only, :clean_to_integer, :clean_to_float, :clean_end_of_day,
12
+ :clean_order
12
13
 
13
14
  cleanse :first_name, :last_name, :address1, :address2, cleaner: :strip
14
15
  cleanse :make_this_upper, cleaner: :upcase
@@ -22,6 +23,10 @@ class CleanersTest < Minitest::Test
22
23
  cleanse :clean_to_integer, cleaner: :string_to_integer
23
24
  cleanse :clean_to_float, cleaner: :string_to_float
24
25
  cleanse :clean_end_of_day, cleaner: :end_of_day
26
+
27
+ # Call cleaners in the order they are defined
28
+ cleanse :clean_order, cleaner: [:upcase, :strip]
29
+ cleanse :clean_order, cleaner: -> val { val == 'BLAH' ? ' yes ' : ' no ' }
25
30
  end
26
31
 
27
32
  describe 'Cleaners' do
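The two `cleanse :clean_order` declarations above are expected to run in definition order: `:upcase`, then `:strip`, then the lambda. A plain-Ruby sketch of that pipeline follows, with the gem's registered cleaners replaced by local lambdas; the names `upcase`, `strip`, `decide`, and `run_in_order` are assumptions for illustration.

```ruby
# Apply each cleaner to the result of the previous one.
def run_in_order(value, cleaners)
  cleaners.reduce(value) { |current, cleaner| cleaner.call(current) }
end

upcase = ->(s) { s.upcase }
strip  = ->(s) { s.strip }
decide = ->(val) { val == 'BLAH' ? ' yes ' : ' no ' }

# Definition order: the lambda sees the upcased, stripped value.
puts run_in_order(' blah ', [upcase, strip, decide]).inspect  # => " yes "

# A different order gives a different result.
puts run_in_order(' blah ', [decide, upcase, strip]).inspect  # => "NO"
```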
@@ -140,17 +145,17 @@ class CleanersTest < Minitest::Test
140
145
  end
141
146
 
142
147
  describe '#escape_uri' do
143
- it 'converts %20' do
148
+ it 'converts spaces' do
144
149
  user = User.new
145
150
  user.clean_to_uri = 'Jim  Bob '
146
151
  user.cleanse_attributes!
147
- assert_equal 'Jim%20%20Bob%20', user.clean_to_uri
152
+ assert_equal 'Jim++Bob+', user.clean_to_uri
148
153
  end
149
- it 'converts %20 only' do
154
+ it 'converts space only' do
150
155
  user = User.new
151
156
  user.clean_to_uri = ' '
152
157
  user.cleanse_attributes!
153
- assert_equal '%20', user.clean_to_uri
158
+ assert_equal '+', user.clean_to_uri
154
159
  end
155
160
  end
156
161
 
@@ -205,5 +210,12 @@ class CleanersTest < Minitest::Test
205
210
  assert_equal Time.parse('2016-03-03 23:59:59 +0000').to_i, user.clean_end_of_day.to_i
206
211
  end
207
212
 
213
+ it 'cleans in the order defined' do
214
+ user = User.new
215
+ user.clean_order = ' blah '
216
+ user.cleanse_attributes!
217
+ assert_equal ' yes ', user.clean_order
218
+ end
219
+
208
220
  end
209
221
  end
@@ -0,0 +1,9 @@
1
+ require_relative 'test_helper'
2
+
3
+ class DataCleansingTest < Minitest::Test
4
+ describe '#clean' do
5
+ it 'can call any cleaner directly' do
6
+ assert_equal 'jack black', DataCleansing.clean(:strip, ' jack black ')
7
+ end
8
+ end
9
+ end
data/test/ruby_test.rb CHANGED
@@ -75,6 +75,7 @@ class RubyTest < Minitest::Test
75
75
  assert_equal 'joe', RubyUserChild.cleanse_attribute(:first_name, ' joe '), RubyUserChild.send(:data_cleansing_attribute_cleaners)
76
76
  assert_equal 'black', RubyUserChild.cleanse_attribute(:last_name, "\n black\n"), RubyUserChild.send(:data_cleansing_attribute_cleaners)
77
77
  assert_equal '<< 2632 Brown St >>', RubyUserChild.cleanse_attribute(:address1, "2632 Brown St \n"), RubyUserChild.send(:data_cleansing_attribute_cleaners)
78
+ assert_equal 3, RubyUserChild.cleanse_attribute(:first_name, 3), RubyUserChild.send(:data_cleansing_attribute_cleaners)
78
79
  end
79
80
 
80
81
  describe "with ruby user" do
data/test/test_db.sqlite3 CHANGED
Binary file
data/test/test_helper.rb CHANGED
@@ -3,7 +3,6 @@ $LOAD_PATH.unshift File.dirname(__FILE__) + '/../lib'
3
3
  require 'yaml'
4
4
  require 'minitest/autorun'
5
5
  require 'minitest/reporters'
6
- require 'minitest/stub_any_instance'
7
6
  require 'awesome_print'
8
7
  require 'data_cleansing'
9
8
 
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: data_cleansing
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.9.0
4
+ version: 1.0.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Reid Morrison
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2016-03-03 00:00:00.000000000 Z
11
+ date: 2016-08-25 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: concurrent-ruby
@@ -56,6 +56,7 @@ files:
56
56
  - lib/data_cleansing/version.rb
57
57
  - test/active_record_test.rb
58
58
  - test/cleaners_test.rb
59
+ - test/data_cleansing_test.rb
59
60
  - test/ruby_test.rb
60
61
  - test/test_db.sqlite3
61
62
  - test/test_helper.rb
@@ -86,6 +87,7 @@ summary: Data Cleansing framework for Ruby, Rails, Mongoid and MongoMapper.
86
87
  test_files:
87
88
  - test/active_record_test.rb
88
89
  - test/cleaners_test.rb
90
+ - test/data_cleansing_test.rb
89
91
  - test/ruby_test.rb
90
92
  - test/test_db.sqlite3
91
93
  - test/test_helper.rb