data_janitor 0.3.4

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: f234ed8031caa839c30a2e485173c4166987bcac
4
+ data.tar.gz: f72579f69d10bb6c2a76e70c98ac37479066156d
5
+ SHA512:
6
+ metadata.gz: b16d22613630c39960ca6a5e2ec84d8a42a1f618069981994fc345ab31c746981057b19913d137f16139bc04a98483d8c287b2e88a483890f19c4b4a09e878aa
7
+ data.tar.gz: 8e0768a057fb6d1093266d26d26b99efec8c0db594850e31b3dea6a44e236ea937f422e7a2b2b9bca3b2cd4065c3f0c51b79865b49ae75470fb4f33fb3963938
@@ -0,0 +1,16 @@
1
+ ### Expected/Desired Behavior:
2
+ < Description of the expected or desired behavior including an example >
3
+
4
+ ### Current Behavior:
5
+ < Description of the current behavior being encountered including an example >
6
+
7
+ ### Impact:
8
+ - Will this introduce any externally facing changes(endpoint, fields, validation, query)?
9
+ - Was a regression introduced(if bug)?
10
+ - Urgency(low, medium, high)?
11
+ - Teams impacted?(web, ios, android, maps, etc)
12
+
13
+ ### Requirements:
14
+ - Label as "Bug Fix", "Feature" or "Optimization"(Tech Debt)
15
+
16
+ cc: @two_ @team_members
@@ -0,0 +1,19 @@
1
+ ### Issue:
2
+ [JIRA-0000](http://link_to_ticket.com)
3
+
4
+ ### Proposed Changes:
5
+ < Description of the proposed change and its value. If possible also include a usage example and relative links to provide additional context for the reviewer>
6
+
7
+ ### Impact:
8
+ - Will this introduce any externally facing changes(endpoint, fields, validation, query)?
9
+ - Any special deployment coordination needed(env changes, data/db migrations, etc.)?
10
+ - Urgency(low, medium, high)?
11
+ - Teams impacted(web, ios, android, maps, etc)?
12
+
13
+ ### Requirements:
14
+ - Add Tests to cover changes
15
+ - Update Swagger documentation if applicable
16
+ - Update repo documentation if applicable(readme, wiki)
17
+ - Label as "Bug Fix", "Feature" or "Optimization"(Tech Debt)
18
+
19
+ cc: @two_ @team_members
data/.gitignore ADDED
@@ -0,0 +1,11 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
10
+ .DS_Store
11
+ /*.gem
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --require spec_helper
2
+ --format documentation
3
+ --color
data/.travis.yml ADDED
@@ -0,0 +1,11 @@
1
+ language: ruby
2
+ rvm:
3
+ - 2.2.2
4
+ before_install: gem install bundler -v 1.10.6
5
+ deploy:
6
+ provider: rubygems
7
+ api_key: $RUBYGEMS_API_KEY
8
+ gem: data_janitor
9
+ on:
10
+ tags: true
11
+ repo: westfieldlabs/data_janitor
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in data_janitor.gemspec
4
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,201 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or
95
+ Derivative Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
110
+ pertain to any part of the Derivative Works, in at least one
111
+ of the following places: within a NOTICE text file distributed
112
+ as part of the Derivative Works; within the Source form or
113
+ documentation, if provided along with the Derivative Works; or,
114
+ within a display generated by the Derivative Works, if and
115
+ wherever such third-party notices normally appear. The contents
116
+ of the NOTICE file are for informational purposes only and
117
+ do not modify the License. You may add Your own attribution
118
+ notices within Derivative Works that You distribute, alongside
119
+ or as an addendum to the NOTICE text from the Work, provided
120
+ that such additional attribution notices cannot be construed
121
+ as modifying the License.
122
+
123
+ You may add Your own copyright statement to Your modifications and
124
+ may provide additional or different license terms and conditions
125
+ for use, reproduction, or distribution of Your modifications, or
126
+ for any such Derivative Works as a whole, provided Your use,
127
+ reproduction, and distribution of the Work otherwise complies with
128
+ the conditions stated in this License.
129
+
130
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
131
+ any Contribution intentionally submitted for inclusion in the Work
132
+ by You to the Licensor shall be under the terms and conditions of
133
+ this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify
135
+ the terms of any separate license agreement you may have executed
136
+ with Licensor regarding such Contributions.
137
+
138
+ 6. Trademarks. This License does not grant permission to use the trade
139
+ names, trademarks, service marks, or product names of the Licensor,
140
+ except as required for reasonable and customary use in describing the
141
+ origin of the Work and reproducing the content of the NOTICE file.
142
+
143
+ 7. Disclaimer of Warranty. Unless required by applicable law or
144
+ agreed to in writing, Licensor provides the Work (and each
145
+ Contributor provides its Contributions) on an "AS IS" BASIS,
146
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
+ implied, including, without limitation, any warranties or conditions
148
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
+ PARTICULAR PURPOSE. You are solely responsible for determining the
150
+ appropriateness of using or redistributing the Work and assume any
151
+ risks associated with Your exercise of permissions under this License.
152
+
153
+ 8. Limitation of Liability. In no event and under no legal theory,
154
+ whether in tort (including negligence), contract, or otherwise,
155
+ unless required by applicable law (such as deliberate and grossly
156
+ negligent acts) or agreed to in writing, shall any Contributor be
157
+ liable to You for damages, including any direct, indirect, special,
158
+ incidental, or consequential damages of any character arising as a
159
+ result of this License or out of the use or inability to use the
160
+ Work (including but not limited to damages for loss of goodwill,
161
+ work stoppage, computer failure or malfunction, or any and all
162
+ other commercial damages or losses), even if such Contributor
163
+ has been advised of the possibility of such damages.
164
+
165
+ 9. Accepting Warranty or Additional Liability. While redistributing
166
+ the Work or Derivative Works thereof, You may choose to offer,
167
+ and charge a fee for, acceptance of support, warranty, indemnity,
168
+ or other liability obligations and/or rights consistent with this
169
+ License. However, in accepting such obligations, You may act only
170
+ on Your own behalf and on Your sole responsibility, not on behalf
171
+ of any other Contributor, and only if You agree to indemnify,
172
+ defend, and hold each Contributor harmless for any liability
173
+ incurred by, or claims asserted against, such Contributor by reason
174
+ of your accepting any such warranty or additional liability.
175
+
176
+ END OF TERMS AND CONDITIONS
177
+
178
+ APPENDIX: How to apply the Apache License to your work.
179
+
180
+ To apply the Apache License to your work, attach the following
181
+ boilerplate notice, with the fields enclosed by brackets "{}"
182
+ replaced with your own identifying information. (Don't include
183
+ the brackets!) The text should be enclosed in the appropriate
184
+ comment syntax for the file format. We also recommend that a
185
+ file or class name and description of purpose be included on the
186
+ same "printed page" as the copyright notice for easier
187
+ identification within third-party archives.
188
+
189
+ Copyright {yyyy} {name of copyright owner}
190
+
191
+ Licensed under the Apache License, Version 2.0 (the "License");
192
+ you may not use this file except in compliance with the License.
193
+ You may obtain a copy of the License at
194
+
195
+ http://www.apache.org/licenses/LICENSE-2.0
196
+
197
+ Unless required by applicable law or agreed to in writing, software
198
+ distributed under the License is distributed on an "AS IS" BASIS,
199
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
+ See the License for the specific language governing permissions and
201
+ limitations under the License.
data/README.md ADDED
@@ -0,0 +1,86 @@
1
+ [![Build Status](https://travis-ci.org/westfieldlabs/data_janitor.svg?branch=master)](https://travis-ci.org/westfieldlabs/data_janitor)
2
+
3
+ # DataJanitor
4
+
5
+ DataJanitor allows you to run your in-application Active Record validations, as well as additional data audit validations, across all records in a table or database at will. This is particularly helpful when evolving validations and finding which records will no longer pass validation, as well as for periodically performing more extensive audit validations without the real-time cost. Additional validations can be written to run only during the audit, or can also be run during create. This allows time to migrate existing data to the new validation requirements while ensuring new data meets the current validation standards.
6
+
7
+ DataJanitor also augments your ActiveRecord models (at rake-task runtime) to allow for running project-wide common validations on the data. Thus, it will look at all models in your repository and tell you whether any data was stored that potentially violates the common project data formats. This can happen if a model does not have enough validations of its own or if its validations are not strict enough.
8
+ ## Installation
9
+
10
+ Add this line to your application's Gemfile:
11
+
12
+ ```ruby
13
+ gem 'data_janitor'
14
+ ```
15
+
16
+ And then execute:
17
+
18
+ $ bundle
19
+
20
+ Or install it yourself as:
21
+
22
+ $ gem install data_janitor
23
+
24
+ ## Usage
25
+
26
+ ### Custom Model Level Validations
27
+
28
+ ActiveRecord validations for **Audit Only**
29
+
30
+ ```ruby
31
+ class SomeModel < ActiveRecord::Base
32
+ extend DataJanitor::AuditValidatable
33
+
34
+ dj_audit_validations do
35
+ # Desired validations
36
+ validates :country, inclusion: { in: ['US', 'AU', 'NZ'] }
37
+ end
38
+ end
39
+ ```
40
+ These validations run only when the ActiveRecord validation context `:dj_audit` is supplied, as in `rec.invalid?(:dj_audit)`, so they are normally run only by the DJ rake tasks.
41
+
42
+ ActiveRecord validations for **Audit** and **Newly Created Records**
43
+
44
+ ```ruby
45
+ class SomeModel < ActiveRecord::Base
46
+ extend DataJanitor::AuditValidatable
47
+
48
+ dj_validations do
49
+ validates :name, length: { maximum: 25 }
50
+ end
51
+ end
52
+ ```
53
+ These validations run during create and whenever the ActiveRecord validation context `:dj_audit` is supplied, as in `rec.invalid?(:dj_audit)`, so they are run by the application as well as by the DJ rake tasks.
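To make the difference between the two blocks concrete, here is a minimal pure-Ruby sketch of context-gated rules. It is not DataJanitor or ActiveModel code: `TinyModel`, `rule`, `errors_for`, and `Place` are hypothetical names invented for illustration, and the sketch only mimics the idea that a validation runs when its context list includes the context being checked.

```ruby
# Illustrative stand-in for context-gated validations (hypothetical, not ActiveModel).
class TinyModel
  Rule = Struct.new(:contexts, :message, :check)

  # Each subclass keeps its own list of rules.
  def self.rules
    @rules ||= []
  end

  # Register a rule that runs only in the given context(s).
  def self.rule(on:, message:, &check)
    rules << Rule.new(Array(on), message, check)
  end

  attr_reader :attrs

  def initialize(attrs)
    @attrs = attrs
  end

  # Return the messages of every rule that applies to this context and fails.
  def errors_for(context)
    self.class.rules
        .select { |r| r.contexts.include?(context) }
        .reject { |r| r.check.call(attrs) }
        .map(&:message)
  end
end

class Place < TinyModel
  # Comparable in spirit to dj_audit_validations: audit only.
  rule(on: :dj_audit, message: 'country is not included in the list') do |a|
    %w[US AU NZ].include?(a[:country])
  end

  # Comparable in spirit to dj_validations: audit and create.
  rule(on: %i[dj_audit create], message: 'name is too long') do |a|
    a[:name].to_s.length <= 25
  end
end

place = Place.new(country: 'FR', name: 'x' * 30)
create_errors = place.errors_for(:create)   # only the create-and-audit rule applies
audit_errors  = place.errors_for(:dj_audit) # both rules apply
update_errors = place.errors_for(:update)   # neither rule applies
```

In this sketch an existing record that is merely updated is never checked against either rule, which is the property that leaves time to migrate old data.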
54
+
55
+ ### Rake Tasks
56
+
57
+ To audit the data defined by the ActiveRecord models in your repository:
58
+ ```
59
+ rake data_janitor:audit
60
+ rake data_janitor:audit['some/file/path.json']
61
+ rake data_janitor:audit['some/file/path.json',true]
62
+ ```
63
+
64
+ This will audit your DB for errors and output them to `tmp/data_janitor_results.json` (by default), or a specified path. The report will contain a list of errors with IDs of invalid records for each model. Including `true` will also display the output at the console.
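As a rough sketch of how such a report could be consumed, the snippet below parses a small hand-written sample and lists the failing record IDs per model. The nesting (model, then attribute, then error message, then an array of IDs) follows the structure built by the `audit` helper in `lib/tasks/data_janitor.rake`; the sample data and the `failing_ids_by_model` helper are assumptions made up for illustration.

```ruby
require 'json'

# Hypothetical report contents: model => attribute => error message => record ids.
report_json = <<~JSON
  {
    "User": {
      "name":  { "cannot be nil": [3, 8] },
      "email": { "cannot have leading/trailing whitespaces": [5, 8] }
    },
    "Order": {}
  }
JSON

# Collect the distinct failing record ids for each model in the report.
def failing_ids_by_model(report)
  report.each_with_object({}) do |(model, attributes), summary|
    ids = attributes.values.flat_map(&:values).flatten.uniq
    summary[model] = ids.sort
  end
end

summary = failing_ids_by_model(JSON.parse(report_json))
summary.each do |model, ids|
  puts "#{model}: #{ids.length} invalid record(s) #{ids.inspect}"
end
```

In a real project the string would come from reading the report file, for example `File.read('tmp/data_janitor_results.json')`.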
65
+
66
+ You can also audit a specific model rather than all models found in the repository:
67
+
68
+ ```
69
+ rake data_janitor:audit_model[SomeModel]
70
+ ```
71
+
72
+ To apply common fixes to all models in your repository:
73
+ ```
74
+ rake data_janitor:cleanse
75
+ ```
76
+
77
+ To apply common fixes to just one model:
78
+ ```
79
+ rake data_janitor:cleanse_model[SomeModel]
80
+ ```
81
+
82
+ This will apply all the fixes that do not require semantic analysis of the data (e.g. replacing `nil` values with `""` for strings).
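The choice of replacement value can be sketched without a database. The snippet below is an illustration only: `Column` is a hypothetical stand-in for an ActiveRecord column object, `nil_default_for` is an invented helper, and the rules (empty string for string and text columns, `false` for booleans, `[]` for array columns, nothing for other types) follow the comments in the gem's rake file.

```ruby
# Hypothetical stand-in for a database column description.
Column = Struct.new(:name, :type, :array)

# Pick the value that would replace nil, or nil when no safe default exists.
def nil_default_for(column)
  return [] if column.array

  case column.type
  when :string, :text then ''
  when :boolean       then false
  end
end

columns = [
  Column.new('name',   :string,  false),
  Column.new('active', :boolean, false),
  Column.new('tags',   :string,  true),
  Column.new('age',    :integer, false)
]

# Map each column that has a safe default to that default.
defaults = columns.each_with_object({}) do |column, result|
  default = nil_default_for(column)
  result[column.name] = default unless default.nil?
end
```

Integer and other numeric columns are deliberately left out, since picking a number for a missing value requires knowing what the data means.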
83
+
84
+ ## Contributing
85
+
86
+ Bug reports and pull requests are welcome on GitHub at https://github.com/westfieldlabs/data_janitor.
data/Rakefile ADDED
@@ -0,0 +1,7 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
7
+ task :test => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "data_janitor"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start
data/bin/setup ADDED
@@ -0,0 +1,7 @@
1
+ #!/bin/bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+
5
+ bundle install
6
+
7
+ # Do any other automated setup that you need to do here
data/contributing.md ADDED
@@ -0,0 +1,13 @@
1
+ ## Contributing
2
+
3
+ 0. Raise an issue in this project outlining the change you'd like to make. You'll need the `issue_number`
4
+ 1. Fork this project into your own Github account
5
+ 2. Create your feature branch (`git checkout -b {issue_number}_my-new-feature`)
6
+ 4. Push to the branch (`git push -u origin {issue_number}_my-new-feature`)
7
+ 5. Write RSpec tests to formalise the outcomes of your change
8
+ 6. Write the code to implement the changes, ensuring that the tests eventually pass
9
+ 7. Commit your changes (`git commit -am '{issue_number} Added some feature'`)
10
+ 8. Push to the branch (`git push -u origin {issue_number}_my-new-feature`)
11
+ 9. Create a new Pull Request using Github's built-in mechanisms.
12
+ 10. Wait for a code review to come back, or an approval.
13
+ 11. Either go back and fix whatever the reviewers recommend, or watch and smile as your PR gets merged in and you become part of software development history.
@@ -0,0 +1,27 @@
1
+ # coding: utf-8
2
+ require_relative 'lib/data_janitor/version'
3
+
4
+ Gem::Specification.new do |spec|
5
+ spec.name = 'data_janitor'
6
+ spec.version = DataJanitor::VERSION
7
+ spec.authors = ['Louis Tran', 'Zhenya Mirkin']
8
+ spec.email = ['tran.louis@gmail.com']
9
+
10
+ spec.summary = %q{Rake task to check validity of column types and values.}
11
+ spec.description = %q{Rake task to check validity of column types and values.}
12
+ spec.homepage = 'https://github.com/westfieldlabs/data_janitor'
13
+ spec.license = 'Apache-2.0'
14
+
15
+ spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
16
+ spec.bindir = 'bin'
17
+ spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
18
+ spec.require_paths = ['lib']
19
+
20
+ spec.required_ruby_version = '>= 2.2.2'
21
+
22
+ spec.add_dependency 'rails', '~> 4.2'
23
+
24
+ spec.add_development_dependency 'bundler', '~> 1.10'
25
+ spec.add_development_dependency 'rake', '~> 10.0'
26
+ spec.add_development_dependency 'rspec', '~> 3.3'
27
+ end
@@ -0,0 +1,15 @@
1
+ module DataJanitor
2
+ module AuditValidatable
3
+ # extended data janitor model validations that apply
4
+ # to new records and data_janitor audit purposes
5
+ def dj_validations(&block)
6
+ with_options(on: [:dj_audit, :create], &block)
7
+ end
8
+
9
+ # extended data janitor model validations that apply
10
+ # only to data_janitor audit purposes
11
+ def dj_audit_validations(&block)
12
+ with_options(on: [:dj_audit], &block)
13
+ end
14
+ end
15
+ end
@@ -0,0 +1,50 @@
1
+ module DataJanitor
2
+ module UniversalValidator
3
+ # TODO: Disabled until we decide to apply this condition as a standard validation
4
+ # TODO: Run standard validators instead of home-brewed
5
+ # validate :validate_field_values
6
+ # ACCEPTABLE_BOOLEAN_VALUES = %w(t true y yes on 1 f false n no off 0) # this list was taken from Postgres spec. TRUE FALSE, that are also there, are not listed because they are DB-native literals and have no representation in Ruby code
7
+ def validate_field_values
8
+ # selected_attributes = self.changed? ? self.changed_attributes : self.attributes
9
+ selected_attributes = self.attributes
10
+
11
+ selected_attributes.each do |field_name, field_val|
12
+ column = self.column_for_attribute field_name
13
+ report_error = lambda {|msg| errors[column.name] << msg}
14
+
15
+ if column.array
16
+ report_error.call "cannot be nil" if field_val.nil?
17
+ next
18
+ end
19
+
20
+ case column.type
21
+ when :boolean
22
+ report_error.call "cannot be nil" if field_val.nil?
23
+ # report_error.call("must be a valid boolean") unless ACCEPTABLE_BOOLEAN_VALUES.include? field_val
24
+ when :date
25
+ # Date.iso8601(field_val) rescue report_error.call("must be a date in ISO-8601")
26
+ when :time
27
+ # Time.iso8601(field_val) rescue report_error.call("must be a datetime in ISO-8601")
28
+ when :datetime
29
+ # Time.iso8601(field_val) rescue report_error.call("must be a datetime in ISO-8601")
30
+ when :decimal
31
+ report_error.call "cannot be nil" if field_val.nil?
32
+ # TODO: run numericality test
33
+ when :float
34
+ report_error.call "cannot be nil" if field_val.nil?
35
+ # TODO: run numericality test
36
+ when :integer
37
+ report_error.call "cannot be nil" if field_val.nil?
38
+ # TODO: run numericality test
39
+ when :string, :text
40
+ if field_val.nil?
41
+ report_error.call "cannot be nil. Use an empty string instead if that's what you wanted."
42
+ else
43
+ report_error.call "cannot have leading/trailing whitespaces" if field_val =~ /^\s/ || field_val =~ /\s$/
44
+ # TODO: Should we constrain to certain encoding types?
45
+ end
46
+ end
47
+ end
48
+ end
49
+ end
50
+ end
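One detail of the whitespace check above is worth illustrating: in Ruby, `^` and `$` anchor at line boundaries, not at the boundaries of the whole string, so a multi-line value can be flagged because of whitespace at the start of an inner line. The sketch below compares the two anchoring styles; both helper names are hypothetical, and whether per-line matching is the intended behaviour here is not stated in the source.

```ruby
# Same test as the validator: ^ and $ match at every line boundary.
def edge_whitespace_per_line?(value)
  !(value =~ /^\s/ || value =~ /\s$/).nil?
end

# Alternative: \A and \z match only at the very start and end of the string.
def edge_whitespace_whole_string?(value)
  !(value =~ /\A\s/ || value =~ /\s\z/).nil?
end

multi_line = "first line\n second line"

per_line_result     = edge_whitespace_per_line?(multi_line)
whole_string_result = edge_whitespace_whole_string?(multi_line)

puts "per line: #{per_line_result}, whole string: #{whole_string_result}"
```

For single-line values the two helpers agree; they differ only when the value contains embedded newlines.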
@@ -0,0 +1,3 @@
1
+ module DataJanitor
2
+ VERSION = "0.3.4"
3
+ end
@@ -0,0 +1,14 @@
1
+ require "data_janitor/version"
2
+ require 'rails'
3
+ require "data_janitor/universal_validator"
4
+ require "data_janitor/audit_validatable"
5
+
6
+ module DataJanitor
7
+
8
+ class MyRailtie < Rails::Railtie
9
+ rake_tasks do
10
+ Dir[File.join(File.dirname(__FILE__),'tasks/*.rake')].each { |f| load f }
11
+ end
12
+ end
13
+
14
+ end
@@ -0,0 +1,100 @@
1
+ namespace :data_janitor do
2
+ desc 'Summarize invalid database records'
3
+ task :audit, [:output_file, :verbose, :unscoped] => [:environment] do |_t, args|
4
+ args.with_defaults(
5
+ output_file: Rails.root.join('tmp', 'data_janitor_results.json'),
6
+ verbose: false,
7
+ unscoped: false
8
+ )
9
+
10
+ output = {}
11
+ all_models.each do |ar_model|
12
+ begin
13
+ audit ar_model, output, args[:verbose], args[:unscoped]
14
+ rescue ActiveRecord::StatementInvalid # used to catch HABTM and schema migration. Only care about real Models
15
+ puts "skipping #{ar_model}"
16
+ end
17
+ end
18
+
19
+ File.write(args[:output_file], output.to_json)
20
+
21
+ puts "Wrote results to #{args[:output_file]}"
22
+ end
23
+
24
+ desc 'Audit one model for data issues'
25
+ task :audit_model, [:model] => [:environment] do |_t, args|
26
+ Rails.application.eager_load!
27
+ audit args[:model].constantize, {}, true
28
+ end
29
+
30
+ # For each model, apply trivial data corrections (those that do not require looking at data semantics).
31
+ # This includes:
32
+ # - replace all null strings with empty strings
33
+ # - replace all null booleans with false
34
+ # - replace all null arrays with []
35
+ desc 'Apply common and safe data corrections'
36
+ task cleanse: :environment do
37
+ all_models.each do |ar_model|
38
+ cleanse_model! ar_model
39
+ end
40
+ end
41
+
42
+ desc 'Apply fixes to one model only'
43
+ task :cleanse_model, [:model] => [:environment] do |_t, args|
44
+ Rails.application.eager_load!
45
+
46
+ cleanse_model! args[:model].constantize
47
+ end
48
+
49
+ private
50
+
51
+ def all_models
52
+ Rails.application.eager_load!
53
+ ActiveRecord::Base.descendants
54
+ end
55
+
56
+ def audit(model, output = {}, verbose = false, unscoped = false)
57
+ total = 0
58
+ failed = 0
59
+ puts "Validating: #{model.name}"
60
+ output[model.to_s] = {}
61
+ # Keep `model` pointing at the class so model.to_s stays a stable key in `output`
+ scope = unscoped ? model.unscoped : model
62
+
63
+ model.include(DataJanitor::UniversalValidator)
64
+ model.validate :validate_field_values
65
+
66
+ scope.find_each do |rec|
67
+ if rec.invalid?(:dj_audit)
68
+ rec.errors.to_h.each_pair do |attribute, error_message|
69
+ output[model.to_s][attribute] ||= {}
70
+ output[model.to_s][attribute][error_message] ||= []
71
+ output[model.to_s][attribute][error_message] << rec.id
72
+ end
73
+
74
+ failed += 1
75
+ end
76
+
77
+ total += 1
78
+ end
79
+
80
+ puts output.to_json if verbose
81
+ puts "Completed #{total} records with #{failed} failures"
82
+ end
83
+
84
+ def cleanse_model!(model)
85
+ string_columns = model.columns.select{|c| (c.type == :string || c.type == :text) && c.array == false}
86
+ boolean_columns = model.columns.select{|c| c.type == :boolean && c.array == false}
87
+ array_columns = model.columns.select{|c| c.array == true}
88
+
89
+ clean_nils_from! model, string_columns, ""
90
+ clean_nils_from! model, boolean_columns, false
91
+ clean_nils_from! model, array_columns, []
92
+ end
93
+
94
+ def clean_nils_from!(model, columns, default)
95
+ columns.each do |column|
96
+ count = model.where(column.name => nil).update_all(column.name => default)
97
+ puts "Fixed #{count} #{model} records where #{column.name} was nil" if count > 0
98
+ end
99
+ end
100
+ end
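Rake passes bracketed task arguments to the task as strings, so an invocation such as `rake data_janitor:audit['out.json',false]` hands the task the string `"false"`, which Ruby treats as truthy. A task that wants real booleans would need to coerce the values itself. The helper below is a minimal sketch of one way to do that; `flag?` and the accepted spellings in `TRUTHY_STRINGS` are assumptions for illustration and are not part of the gem.

```ruby
# Spellings treated as "on"; anything else (including "false" and "") is "off".
TRUTHY_STRINGS = %w[true t yes y on 1].freeze

# Coerce a rake argument (usually a String, possibly nil or a default boolean).
def flag?(value)
  TRUTHY_STRINGS.include?(value.to_s.strip.downcase)
end

verbose_from_cli     = flag?('false') # string argument typed on the command line
verbose_from_default = flag?(false)   # default supplied by with_defaults
verbose_enabled      = flag?(' True ')

puts [verbose_from_cli, verbose_from_default, verbose_enabled].inspect
```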
metadata ADDED
@@ -0,0 +1,121 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: data_janitor
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.3.4
5
+ platform: ruby
6
+ authors:
7
+ - Louis Tran
8
+ - Zhenya Mirkin
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2017-06-09 00:00:00.000000000 Z
13
+ dependencies:
14
+ - !ruby/object:Gem::Dependency
15
+ name: rails
16
+ requirement: !ruby/object:Gem::Requirement
17
+ requirements:
18
+ - - "~>"
19
+ - !ruby/object:Gem::Version
20
+ version: '4.2'
21
+ type: :runtime
22
+ prerelease: false
23
+ version_requirements: !ruby/object:Gem::Requirement
24
+ requirements:
25
+ - - "~>"
26
+ - !ruby/object:Gem::Version
27
+ version: '4.2'
28
+ - !ruby/object:Gem::Dependency
29
+ name: bundler
30
+ requirement: !ruby/object:Gem::Requirement
31
+ requirements:
32
+ - - "~>"
33
+ - !ruby/object:Gem::Version
34
+ version: '1.10'
35
+ type: :development
36
+ prerelease: false
37
+ version_requirements: !ruby/object:Gem::Requirement
38
+ requirements:
39
+ - - "~>"
40
+ - !ruby/object:Gem::Version
41
+ version: '1.10'
42
+ - !ruby/object:Gem::Dependency
43
+ name: rake
44
+ requirement: !ruby/object:Gem::Requirement
45
+ requirements:
46
+ - - "~>"
47
+ - !ruby/object:Gem::Version
48
+ version: '10.0'
49
+ type: :development
50
+ prerelease: false
51
+ version_requirements: !ruby/object:Gem::Requirement
52
+ requirements:
53
+ - - "~>"
54
+ - !ruby/object:Gem::Version
55
+ version: '10.0'
56
+ - !ruby/object:Gem::Dependency
57
+ name: rspec
58
+ requirement: !ruby/object:Gem::Requirement
59
+ requirements:
60
+ - - "~>"
61
+ - !ruby/object:Gem::Version
62
+ version: '3.3'
63
+ type: :development
64
+ prerelease: false
65
+ version_requirements: !ruby/object:Gem::Requirement
66
+ requirements:
67
+ - - "~>"
68
+ - !ruby/object:Gem::Version
69
+ version: '3.3'
70
+ description: Rake task to check validity of column types and values.
71
+ email:
72
+ - tran.louis@gmail.com
73
+ executables:
74
+ - console
75
+ - setup
76
+ extensions: []
77
+ extra_rdoc_files: []
78
+ files:
79
+ - ".github/ISSUE_TEMPLATE"
80
+ - ".github/PULL_REQUEST_TEMPLATE"
81
+ - ".gitignore"
82
+ - ".rspec"
83
+ - ".travis.yml"
84
+ - Gemfile
85
+ - LICENSE
86
+ - README.md
87
+ - Rakefile
88
+ - bin/console
89
+ - bin/setup
90
+ - contributing.md
91
+ - data_janitor.gemspec
92
+ - lib/data_janitor.rb
93
+ - lib/data_janitor/audit_validatable.rb
94
+ - lib/data_janitor/universal_validator.rb
95
+ - lib/data_janitor/version.rb
96
+ - lib/tasks/data_janitor.rake
97
+ homepage: https://github.com/westfieldlabs/data_janitor
98
+ licenses:
99
+ - Apache-2.0
100
+ metadata: {}
101
+ post_install_message:
102
+ rdoc_options: []
103
+ require_paths:
104
+ - lib
105
+ required_ruby_version: !ruby/object:Gem::Requirement
106
+ requirements:
107
+ - - ">="
108
+ - !ruby/object:Gem::Version
109
+ version: 2.2.2
110
+ required_rubygems_version: !ruby/object:Gem::Requirement
111
+ requirements:
112
+ - - ">="
113
+ - !ruby/object:Gem::Version
114
+ version: '0'
115
+ requirements: []
116
+ rubyforge_project:
117
+ rubygems_version: 2.4.8
118
+ signing_key:
119
+ specification_version: 4
120
+ summary: Rake task to check validity of column types and values.
121
+ test_files: []