RubyGems - utf8_sanitizer - Versions diffs - 0.0.2.pre.rc.04 → 1.01 - Mend

utf8_sanitizer 0.0.2.pre.rc.04 → 1.01

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

checksums.yaml +4 -4
data/.rspec_status +4 -4
data/README.md +65 -56
data/Rakefile +6 -2
data/lib/utf8_sanitizer/csv/seeds_dirty_1.csv +2 -0
data/lib/utf8_sanitizer/utf.rb +55 -47
data/lib/utf8_sanitizer/version.rb +1 -1
data/lib/utf8_sanitizer.rb +4 -20
data/utf8_sanitizer.gemspec +5 -5
metadata +24 -21
data/lib/utf8_sanitizer/seed.rb +0 -74

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 05155a4029ddd8224888b1482972b0640b4a81ef0990d7bd8f29311a5798d309
-  data.tar.gz: 6d3bd40596087c6e8dfa41934a44ffd2459f6bf738abf77c66aa20ce4591b641
+  metadata.gz: 0de542eedb064eda2b7a85eda41c674dc7ae822e5bff19f0758bf7418ace84ee
+  data.tar.gz: b252e6c2aa92f32c2068ba59fb8f0719afedb034c47c8513c3017dda9588933b
 SHA512:
-  metadata.gz: 23d64036dec061d1290d186069adfe018825bbe16a89f733994bce7eb6793fd5f0d402d9cc00c57a590fcb24f6b2f14890cac0611ae54f73bec0cc9b39e060d4
-  data.tar.gz: 972355c619c5c6ef4753df81d7a94f18b4f0e0622555e3fb4c3d227d5a31477ebeea015a10682281ace975bb322daed2752f9cf71123c19b194429893f1b6f78
+  metadata.gz: 8bf27abb9db602ab114a0606cb83b1e16d931bec77901381e5ad851334b58027c13cd1aa5942607334cf57b66a1394ff50984d62f17c8bac21cdcafe7907e074
+  data.tar.gz: 1e8bcbb08ef7bd8db08af0a79ba9dc06d824634197f005abfe5fc8e390703be0729a0c1ce568378d4fc22266db871e635df4b2237f54bf4eb3b023235bfea5d4
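`checksums.yaml` records SHA256 and SHA512 digests of the two archives packed inside the `.gem` file, `metadata.gz` and `data.tar.gz`. That is why all four values change whenever any packaged file or the gem metadata changes. The sketch below recomputes the SHA256 values, assuming the `.gem` is a plain tar archive containing those entries. It uses RubyGems' bundled `Gem::Package::TarReader` and Ruby's `Digest`. The helper name `entry_digests` and the example file name are hypothetical.

```ruby
require 'digest'
require 'rubygems/package'
require 'stringio'

# Hypothetical helper: returns { entry_name => sha256_hex } for every regular
# file entry in a tar stream, e.g. a .gem opened with File.open(path, 'rb').
def entry_digests(io)
  digests = {}
  reader = Gem::Package::TarReader.new(io)
  reader.each do |entry|
    next unless entry.file?
    # Entry#read returns nil for an empty entry, so fall back to ''.
    digests[entry.full_name] = Digest::SHA256.hexdigest(entry.read || '')
  end
  digests
end

# Usage with a downloaded gem (file name is illustrative):
#   File.open('utf8_sanitizer-1.01.gem', 'rb') { |f| p entry_digests(f) }
# Compare the 'metadata.gz' and 'data.tar.gz' values with the SHA256 section above.

# Self-contained demo: build a tiny tar in memory and hash its entry.
io = StringIO.new(''.b)
Gem::Package::TarWriter.new(io) do |tar|
  tar.add_file('metadata.gz', 0o644) { |f| f.write('abc') }
end
io.rewind
demo_digests = entry_digests(io)
p demo_digests
```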

data/.rspec_status CHANGED

@@ -1,4 +1,4 @@
-example_id                         | status | run_time        |
----------------------------------- | ------ | --------------- |
-./spec/utf8_sanitizer_spec.rb[1:1] | passed | 0.00112 seconds |
-./spec/utf8_sanitizer_spec.rb[1:2] | failed | 0.02392 seconds |
+example_id                           | status | run_time               |
+------------------------------------ | ------ | ---------------------- |
+./spec/utf8_sanitizer_spec.rb[1:1]   | passed | 2 minutes 40.4 seconds |
+./spec/utf8_sanitizer_spec.rb[1:2:1] | passed | 0.0017 seconds         |

data/README.md CHANGED

@@ -1,6 +1,6 @@
 # Utf8Sanitizer
-Removes invalid UTF8 characters & extra whitespace (carriage returns, new lines, tabs, spaces, etc.) from csv or strings.
+Removes invalid UTF8 characters & extra whitespace (carriage returns, new lines, tabs, spaces, etc.) from csv or strings. Also provides detailed report indicating row numbers containing non-UTF8 and extra whitespace, and before and after to compare changes.
 Example:
 ```
@@ -35,7 +35,9 @@ Or install it yourself as:
 ## Usage
-You have three options for UTF8 Sanitizing your data: CSV Parsing, Data Hash of strings, or run default seed data to test.
+Options for UTF8 Sanitizing data:
+1. CSV Parsing
+2. Data Hash of strings
 #### 1. CSV Parsing
 This is a good option if you are having problems with a CSV containing non-UTF8 characters.  Pass your file_path as a hash like below.  Hash MUST be a SYMBOL and named `:file_path`.  If not, default seeds will be passed as the system detects empty user input and thinks user is trying to run built-in seed data for testing.
@@ -47,29 +49,23 @@ sanitized_data = Utf8Sanitizer.sanitize(args)
 #### 2. Hash of Strings
 This is a good option if you are scraping data or cleaning up existing databases.  Pass your data as a hash like below.  Hash MUST be a SYMBOL and named `:data`.  The value of `:data` should be an array of hashes like below and can be any size from one to many tens of thousands.  The hashes inside the data array can be named anything from crm contact data like below, stats, recipes, or any custom hashes as long as they are in an array and resemble the syntax and structure like below.
 ```
-data_hash = [ { url: 'abc_autos_example.com',
-           act_name: 'ABC Aut\x92os',
-           street: '123 E Main St\r\n',
-           city: 'Austin',
-           state: 'TX',
-           zip: '78735',
-           phone: '(888) 555-1234\r\n' },
-         { url: 'xyz_trucks_example',
-           act_name: 'XYZ Aut\xC1os',
-           street: '456 W Main St\r\n',
-           city: 'Austin',
-           state: 'TX',
-           zip: '78735',
-           phone: '(800) 555-5678\r\n' },
-      }]
-sanitized_data = Utf8Sanitizer.sanitize(data: data_hash)
-```
-#### 3. Run Seed Data to Test
-If you want to run built-in seed data to first test, simply run as below without passing args.
-```
-sanitized_data = Utf8Sanitizer.sanitize
+array_of_hashes = [ { url: 'abc_autos_example.com',
+                       act_name: 'ABC Aut\x92os',
+                       street: '123 E Main St\r\n',
+                       city: 'Austin',
+                       state: 'TX',
+                       zip: '78735',
+                       phone: '(888) 555-1234\r\n' },
+                     { url: 'xyz_trucks_example',
+                       act_name: 'XYZ Aut\xC1os',
+                       street: '456 W Main St\r\n',
+                       city: 'Austin',
+                       state: 'TX',
+                       zip: '78735',
+                       phone: '(800) 555-5678\r\n' },
+                  }]
+sanitized_data = Utf8Sanitizer.sanitize(data: array_of_hashes)
 ```
 ### Returned Sanitized Data Format
@@ -79,39 +75,52 @@ The `:stats` are a breakdown of the results. `:defective_rows` and `:error_rows`
 `:data` is broken down into the following categories: `:valid_data`, `:encoded_data`, `:defective_data`, and `:error_data`.
-`:valid_data` is the most important data and you can access it with `sanitized_data[:data][:valid_data]`.  Each non-UTF8 row will be included in its original syntax like below and can be accessed directly via `sanitized_data[:data][:encoded_data]`.  **You can change the name of `sanitized_data` to anything you like, but it must be followed with `[:data][:valid_data]` and `[:data][:encoded_data]`, etc.**
+`:valid_data` is the most important data and you can access it with `sanitized_data[:data][:valid_data]`.  Each non-UTF8 row will be included in its original syntax like below and can be accessed directly via `sanitized_data[:data][:encoded_data]`.
+**You can change the name of `sanitized_data` to anything you like, but it must be followed with `[:data][:valid_data]` and `[:data][:encoded_data]`, etc.**
-`:pollute_seeds` is only for running seed data.  It injects each row with non-UTF8 and extra whitespace for testing.  It can be ignored and will only run if your input is nil, which tells the system that you are intentionally trying to run seed data for testing.
 ```
-{:stats=>
-  {:total_rows=>2, :header_row=>1, :valid_rows=>2, :error_rows=>0, :defective_rows=>0, :perfect_rows=>0, :encoded_rows=>2, :wchar_rows=>2},
- :file_path=>nil,
- :data=>
-  {:valid_data=>
-    [{:row_id=>"1",
-      :utf_status=>"encoded, wchar",
-      :url=>"abc_autos_example.com",
-      :act_name=>"ABC Autos Example",
-      :street=>"123 E Main St",
-      :city=>"Austin",
-      :state=>"TX",
-      :zip=>"78735",
-      :phone=>"(888) 555-1234"},
-     {:row_id=>"2",
-      :utf_status=>"encoded, wchar",
-      :url=>"xyz_trucks_example",
-      :act_name=>"XYZ Trucks Example",
-      :street=>"456 W Main St",
-      :city=>"Austin",
-      :state=>"TX",
-      :zip=>"78735",
-      :phone=>"(800) 555-4321"}],
-   :encoded_data=>
-    [{:row_id=>1, :text=>"1,abc_autos_example.com,ABC Autos Example\x98_\xC0,123 E Main St,Austin,TX,78735,(888) 555-1234\r\n"},
-     {:row_id=>2, :text=>"2,xyz_trucks_example,XYZ \xC1_\xCCTrucks Example,456 W Main St,Austin,TX,78735,(800) 555-4321\r\n"}],
-   :defective_data=>[],
-   :error_data=>[]},
-  :pollute_seeds=>true}
+{ stats:
+  {
+  total_rows: 2,
+  header_row: 1,
+  valid_rows: 2,
+  error_rows: 0,
+  defective_rows: 0,
+  perfect_rows: 0,
+  encoded_rows: 2,
+  wchar_rows: 2
+  },
+  file_path: nil,
+  data:
+  {
+    valid_data:
+    [
+      { row_id: '1',
+        utf_status: 'encoded, wchar',
+        url: 'abc_autos_example.com',
+        act_name: 'ABC Autos Example',
+        street: '123 E Main St',
+        city: 'Austin',
+        state: 'TX',
+        zip: '78735',
+        phone: '(888) 555-1234' },
+      { row_id: '2',
+        utf_status: 'encoded, wchar',
+        url: 'xyz_trucks_example',
+        act_name: 'XYZ Trucks Example',
+        street: '456 W Main St',
+        city: 'Austin',
+        state: 'TX',
+        zip: '78735',
+        phone: '(800) 555-4321' }
+    ],
+    encoded_data:     [{ row_id: 1, text: "1,abc_autos_example.com,ABC Autos Example\x98_\xC0,123 E Main St,Austin,TX,78735,(888) 555-1234\r\n" },
+                       { row_id: 2, text: "2,xyz_trucks_example,XYZ \xC1_\xCCTrucks Example,456 W Main St,Austin,TX,78735,(800) 555-4321\r\n" }],
+    defective_data: [],
+    error_data: []
+  }
+}
 ```
 ## Development
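A caution about the README's `array_of_hashes` example. In Ruby, single-quoted literals do not process `\x92` or `\r\n` escapes. So `'ABC Aut\x92os'` is 13 ordinary, valid characters, and it contains no invalid byte. As published, the example also has an extra `}` before the closing `]`, which is a syntax error. Genuinely dirty input needs double quotes.

The sketch below checks this and applies the same per-character filter that `check_utf` in `utf.rb` uses (`text.chars.select(&:valid_encoding?).join`). The helper name `drop_invalid_chars` is hypothetical. The `strip` is an assumption standing in for the gem's whitespace cleanup, which this diff does not show in full.

```ruby
# Single quotes: "\x92" stays as four literal characters, so this is valid UTF-8.
single_quoted = 'ABC Aut\x92os'
# Double quotes: "\x92" becomes one raw byte, which is invalid on its own in UTF-8.
double_quoted = "ABC Aut\x92os"
street = "123 E Main St\r\n"

# Keep only characters that are valid in the string's encoding
# (the same idea as check_utf in utf.rb).
def drop_invalid_chars(text)
  text.chars.select(&:valid_encoding?).join
end

puts single_quoted.valid_encoding?      # true
puts double_quoted.valid_encoding?      # false
puts drop_invalid_chars(double_quoted)  # ABC Autos
# Assumption: strip approximates the gem's whitespace cleanup for this value.
puts drop_invalid_chars(street).strip   # 123 E Main St
```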

data/Rakefile CHANGED

@@ -10,10 +10,14 @@ task :test => :spec
 task :console do
   require 'irb'
   require 'irb/completion'
-  require 'utf8_sanitizer' # You know what to do.
+  require 'utf8_sanitizer'
   require "active_support/all"
   ARGV.clear
-  # sanitized_data = Utf8Sanitizer.sanitize(file_path: "./lib/utf8_sanitizer/csv/seeds_mini.csv")
+  orig_hashes = [{ :row_id=>"1", :url=>"stanleykaufman.com", :act_name=>"Stanley Chevrolet Kaufman\x99_\xCC", :street=>"825 E Fair St", :city=>"Kaufman", :state=>"TX", :zip=>"75142", :phone=>"(888) 457-4391\r\n" }]
+  # sanitized_data = Utf8Sanitizer.sanitize(file_path: './lib/utf8_sanitizer/csv/seeds_dirty_1.csv')
+  # sanitized_data = Utf8Sanitizer.sanitize(data: orig_hashes)
   sanitized_data = Utf8Sanitizer.sanitize
+  puts sanitized_data.inspect
   IRB.start
 end

data/lib/utf8_sanitizer/csv/seeds_dirty_1.csv ADDED

@@ -0,0 +1,2 @@
+url,act_name,street,city,state,zip,phone
+http://www.courtesyfordsales.com,Courtesy Ford,__��__��____1410 West Pine Street Hattiesburg,Wexford,MS,39401,512-555-1212
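`seeds_dirty_1.csv` puts non-UTF-8 bytes into the `street` column; the replacement characters above are presumably how the page renders them. The sketch below is a stdlib-only version of the line-by-line approach that `validate_csv` takes in `utf.rb`: filter each physical line, number it, parse it with `CSV`, and record which row ids needed cleaning.

It deliberately differs from the gem in two ways:

- It uses `File.foreach`, which closes the file. `validate_csv` calls `File.open(...).each` and never closes the handle.
- It builds a fresh hash for every row. `validate_csv`, as shown in this diff, merges each row into the single `@data_hash` object and appends that same object every time, so every entry in `@valid_rows` would reference one hash.

The method name `sanitize_csv_lines`, its return keys, and the sample rows are hypothetical.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper mirroring the per-line flow of validate_csv.
# Returns parsed rows (one new Hash per row) plus the ids of lines whose
# raw bytes contained invalid UTF-8.
def sanitize_csv_lines(path)
  headers = nil
  rows = []
  encoded_row_ids = []
  File.foreach(path, encoding: 'UTF-8').with_index(1) do |line, row_id|
    unless line.valid_encoding?
      encoded_row_ids << row_id
      line = line.chars.select(&:valid_encoding?).join
    end
    fields = CSV.parse_line(line.strip)
    next if fields.nil? # skip blank lines
    if headers.nil?
      headers = fields.map(&:to_sym)
    else
      # A new Hash per row avoids sharing one mutable object across rows.
      rows << { row_id: row_id }.merge(headers.zip(fields).to_h)
    end
  end
  { rows: rows, encoded_row_ids: encoded_row_ids }
end

# Demo with a temporary file; 0x99 and 0xCC are not valid UTF-8 on their own.
result = Tempfile.create(['seeds_dirty', '.csv']) do |file|
  file.binmode
  file.write("url,act_name,street\r\n")
  file.write("courtesyford.example,Courtesy Ford,\x99\xCC1410 West Pine Street\r\n")
  file.flush
  sanitize_csv_lines(file.path)
end
p result
```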

data/lib/utf8_sanitizer/utf.rb CHANGED

@@ -1,8 +1,10 @@
 # frozen_string_literal: false
-# require 'csv'
+require 'csv'
 module Utf8Sanitizer
   class UTF
+    attr_accessor :headers, :valid_rows, :encoded_rows, :row_id, :data_hash, :defective_rows, :error_rows
     def initialize(args={})
       @valid_rows = []
       @encoded_rows = []
@@ -13,19 +15,6 @@ module Utf8Sanitizer
       @data_hash = {}
     end
-    #################### * VALIDATE DATA * ####################
-    def validate_data(args={})
-      args = args.slice(:file_path, :data, :pollute_seeds)
-      args = args.compact
-      @seed = Seed.new if args[:pollute_seeds]
-      file_path = args[:file_path]
-      data = args[:data]
-      utf_result = validate_csv(file_path) if file_path
-      utf_result = validate_hashes(data) if data
-      utf_result
-    end
     #################### * COMPILE RESULTS * ####################
     def compile_results
       utf_status = @valid_rows.map { |hsh| hsh[:utf_status] }
@@ -35,44 +24,38 @@ module Utf8Sanitizer
       perfect = groups['perfect']
       header_row_count = @headers.any? ? 1 : 0
       utf_result = {
         stats: { total_rows: @row_id, header_row: header_row_count, valid_rows: @valid_rows.count, error_rows: @error_rows.count, defective_rows: @defective_rows.count, perfect_rows: perfect, encoded_rows: @encoded_rows.count, wchar_rows: wchar },
         data: { valid_data: @valid_rows, encoded_data: @encoded_rows, defective_data: @defective_rows, error_data: @error_rows }
       }
+      utf_result
     end
-    #################### * VALIDATE CSV * ####################
-    def validate_csv(file_path)
-      return unless file_path.present?
-      File.open(file_path).each do |file_line|
-        validated_line = utf_filter(check_utf(file_line))
-        @row_id += 1
-        if validated_line
-          CSV.parse(validated_line) do |row|
-            if @headers.empty?
-              @headers = row
-            else
-              @data_hash.merge!(row_to_hsh(row))
-              @valid_rows << @data_hash
-            end
-          end
-        end
-      rescue StandardError => error
-        @error_rows << { row_id: @row_id, text: error.message }
-      end
-      compile_results
+    #################### * VALIDATE DATA * ####################
+    def validate_data(args={})
+      args = args.slice(:file_path, :data)
+      args = args.compact
+      file_path = args[:file_path]
+      data = args[:data]
+      utf_result = validate_csv(file_path) if file_path
+      utf_result = validate_hashes(data) if data
+      utf_result
     end
     #################### * VALIDATE HASHES * ####################
     def validate_hashes(orig_hashes)
       return unless orig_hashes.present?
       begin
-        process_hash_row(orig_hashes.first) ## re keys for headers.
-        orig_hashes.each { |hsh| process_hash_row(hsh) } ## re values
+        process_hash_row(orig_hashes.first) ## keys for headers.
+        orig_hashes.each { |hsh| process_hash_row(hsh) } ## values
       rescue StandardError => error
         @error_rows << { row_id: @row_id, text: error.message }
       end
-      compile_results ## handles returns.
+      results = compile_results ## handles returns.
+      results
     end
     ### process_hash_row - helper VALIDATE HASHES ###
@@ -86,7 +69,9 @@ module Utf8Sanitizer
       end
       file_line = keys_or_values.join(',')
-      line_parse(utf_filter(check_utf(file_line)))
+      validated_line = utf_filter(check_utf(file_line))
+      res = line_parse(validated_line)
+      res
     end
     ### line_parse - helper VALIDATE HASHES ###
@@ -105,9 +90,9 @@ module Utf8Sanitizer
     #################### * CHECK UTF * ####################
     def check_utf(text)
-      return unless text.present?
-      text = @seed.pollute_seeds(text) if @seed && @headers.any?
+      return if text.nil?
       results = { text: text, encoded: nil, wchar: nil, error: nil }
       begin
         if !text.valid_encoding?
           encoded = text.chars.select(&:valid_encoding?).join
@@ -128,7 +113,7 @@ module Utf8Sanitizer
     #################### * UTF FILTER * ####################
     def utf_filter(utf)
       return unless utf.present?
-      puts utf.inspect
+      # puts utf.inspect
       utf_status = utf.except(:text).compact.keys
       utf_status = utf_status&.map(&:to_s)&.join(', ')
       utf_status = 'perfect' if utf_status.blank?
@@ -145,6 +130,30 @@ module Utf8Sanitizer
       line
     end
+    #################### * VALIDATE CSV * ####################
+    def validate_csv(file_path)
+      return unless file_path.present?
+      File.open(file_path).each do |file_line|
+        validated_line = utf_filter(check_utf(file_line))
+        @row_id += 1
+        if validated_line
+          CSV.parse(validated_line) do |row|
+            if @headers.empty?
+              @headers = row
+            else
+              @data_hash.merge!(row_to_hsh(row))
+              @valid_rows << @data_hash
+            end
+          end
+        end
+      rescue StandardError => error
+        @error_rows << { row_id: @row_id, text: error.message }
+      end
+      utf_results = compile_results
+    end
     ############# !! HELPERS BELOW !! #############
     ############# KEY VALUE CONVERTERS #############
     def row_to_hsh(row)
@@ -152,15 +161,14 @@ module Utf8Sanitizer
       h.symbolize_keys
     end
-    def val_hsh(cols, hsh)
-      keys = hsh.keys
-      keys.each { |key| hsh.delete(key) unless cols.include?(key) }
-      hsh
-    end
     def make_groups_from_array(array)
       array.each_with_object(Hash.new(0)) { |e, h| h[e] += 1; }
     end
+    # def val_hsh(cols, hsh)
+    #   keys = hsh.keys
+    #   keys.each { |key| hsh.delete(key) unless cols.include?(key) }
+    #   hsh
+    # end
   end
 end
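`process_hash_row` flattens each hash into one line with `keys_or_values.join(',')` and hands it to `line_parse`. The body of `line_parse` is not shown in this diff. If it splits that line back into fields as CSV, any value containing a comma would spill into extra columns. The act name `'Honda of Burien 15026 1st Avenue South, Burien, WA 98148'` from the deleted `seed.rb` data is one such value.

The sketch below compares a naive join with `CSV.generate_line` from Ruby's standard library, which quotes fields that contain the separator. The sample hash is trimmed from that seed row for illustration.

```ruby
require 'csv'

row = {
  url: 'burienhonda.fake.not.net.com',
  act_name: 'Honda of Burien 15026 1st Avenue South, Burien, WA 98148',
  city: 'Burien'
}

# Naive join, as in process_hash_row: embedded commas become separators.
naive_line = row.values.join(',')
# CSV.generate_line quotes any field containing the separator; chomp drops "\n".
quoted_line = CSV.generate_line(row.values).chomp

puts CSV.parse_line(naive_line).length   # 5
puts CSV.parse_line(quoted_line).length  # 3

# Rebuilding the hash from the quoted line preserves every value.
rebuilt = row.keys.zip(CSV.parse_line(quoted_line)).to_h
puts rebuilt == row                      # true
```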

data/lib/utf8_sanitizer/version.rb CHANGED

@@ -1,4 +1,4 @@
 module Utf8Sanitizer
   # VERSION = "0.0.1-rc.1"
-  VERSION = "0.0.2.pre.rc.04"
+  VERSION = '1.01'.freeze
 end
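The move from `0.0.2.pre.rc.04` to `'1.01'` also changes how RubyGems classifies the release. The old string contains letters, so RubyGems treats it as a prerelease. That is consistent with the `metadata` change further down, where `required_rubygems_version` moves from `> 1.3.1` (the value RubyGems sets for prerelease versions) to `>= 0`.

Numeric segments are compared as integers, so `1.01` sorts exactly like `1.1`, not as "one point zero one". The short check below uses `Gem::Version`, which ships with Ruby.

```ruby
require 'rubygems'

old_version = Gem::Version.new('0.0.2.pre.rc.04')
new_version = Gem::Version.new('1.01')

puts old_version.prerelease?    # true
puts new_version.prerelease?    # false
puts new_version > old_version  # true

# Segments are compared numerically, so the leading zero is not significant.
p new_version.segments                             # [1, 1]
puts(new_version <=> Gem::Version.new('1.1'))      # 0
puts new_version < Gem::Version.new('1.10')        # true
```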

data/lib/utf8_sanitizer.rb CHANGED

@@ -1,29 +1,13 @@
-require "utf8_sanitizer/version"
-require 'utf8_sanitizer/seed'
+require 'utf8_sanitizer/version'
 require 'utf8_sanitizer/utf'
 require 'pry'
 module Utf8Sanitizer
   ## Args must include :data or :file_path, else seeds will run by default.
-  def self.sanitize(args={})
-    keys = args.compact.keys
+  def self.sanitize(args = {})
     input = { stats: nil, file_path: nil, data: nil }.merge(args)
-    ## Grabs seeds if :data or :file_path empty.
-    unless (keys & [:data, :file_path]).any?
-      ## Toggle data[:file_path] & data[:data] to test csv parsing or data hashes.
-      # input[:file_path] = Seed.new.grab_seed_file_path
-      input[:data] = Seed.new.grab_seed_hashes
-      ## For Testing: Pollute_seeds adds non-utf8 chars to each line.
-      input[:pollute_seeds] = true
-    end
-    ## Sanitizes input hash, then merges results to original input hash, and returns as sanitized_data.
-    sanitized_data = input.merge!(Utf8Sanitizer::UTF.new.validate_data(input))
-    sanitized_data
+    return input unless input.compact.any?
+    sanitized_data = input.merge(Utf8Sanitizer::UTF.new.validate_data(input))
   end
 end
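In 1.01, `sanitize` no longer falls back to seed data. Called with no arguments, it returns `{ stats: nil, file_path: nil, data: nil }` unchanged. A few things about the code as shown are worth noting:

- `input.compact.any?` is true for inputs such as `data: []` or `file_path: ''`. For that blank input, `validate_hashes` and `validate_csv` return `nil`, and `Hash#merge(nil)` raises `TypeError`.
- The file still calls `require 'pry'`, although the gemspec lists `pry` only as a development dependency. Loading the gem where `pry` is not installed would therefore raise `LoadError`.
- Nothing in the lines shown requires `active_support`, which `utf.rb` relies on for `present?`, `blank?`, and `except`.

The sketch below is a hypothetical, self-contained variant of the entry point. It validates only when `:data` or `:file_path` is non-empty and treats a `nil` result as nothing to merge. The validator is passed in as a block, so the sketch runs without the gem. `sanitize_input` and `fake_validator` are illustrative names.

```ruby
# Hypothetical defensive variant of Utf8Sanitizer.sanitize.
# The block stands in for Utf8Sanitizer::UTF.new.validate_data(input).
# Assumes :data is an Array and :file_path a String when present.
def sanitize_input(args = {}, &validator)
  input = { stats: nil, file_path: nil, data: nil }.merge(args)
  sources = input.values_at(:data, :file_path)
  # Only validate when :data or :file_path is present and non-empty.
  return input if sources.all? { |v| v.nil? || (v.respond_to?(:empty?) && v.empty?) }
  raise ArgumentError, 'validator block required' unless validator

  result = validator.call(input)
  # A nil result means there was nothing to report; merge an empty hash.
  input.merge(result || {})
end

# Illustrative validator returning the same top-level keys as compile_results.
fake_validator = lambda do |input|
  { stats: { total_rows: input[:data].size }, data: { valid_data: input[:data] } }
end

p sanitize_input(&fake_validator)
p sanitize_input(data: [], &fake_validator)
p sanitize_input(data: [{ act_name: 'ABC Autos' }], &fake_validator)
```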

data/utf8_sanitizer.gemspec CHANGED

@@ -14,11 +14,11 @@ Gem::Specification.new do |spec|
   spec.homepage      = 'https://github.com/4rlm/utf8_sanitizer'
   spec.license       = 'MIT'
-  spec.summary       = "Removes invalid UTF8 characters & extra whitespace from csv or strings."
-  spec.description   = "Removes invalid UTF8 characters & extra whitespace (carriage returns, new lines, tabs, spaces, etc.) from csv or strings.\n Example: ABC Au\\xC1tos,123 E Main St,Anytown,TX,75142,(888) 555-1234\\n\\r\\n  =>  ABC Autos,123 E Main St,Anytown,TX,75142,(888) 555-1234"
+  spec.summary       = 'Removes invalid UTF8 characters & extra whitespace (carriage returns, new lines, tabs, spaces, etc.) from csv or strings. Also provides detailed report indicating row numbers containing non-UTF8 and extra whitespace, and before and after to compare changes.'
+  spec.description   = "Removes invalid UTF8 characters & extra whitespace (carriage returns, new lines, tabs, spaces, etc.) from csv or strings. Also provides detailed report indicating row numbers containing non-UTF8 and extra whitespace, and before and after to compare changes.\n Example: ABC Au\\xC1tos,123 E Main St,Anytown,TX,75142,(888) 555-1234\\n\\r\\n  =>  ABC Autos,123 E Main St,Anytown,TX,75142,(888) 555-1234"
   if spec.respond_to?(:metadata)
-    spec.metadata['allowed_push_host'] = "https://rubygems.org"
+    spec.metadata['allowed_push_host'] = 'https://rubygems.org'
   else
     raise 'RubyGems 2.0 or newer is required to protect against ' \
       'public gem pushes.'
@@ -31,7 +31,7 @@ Gem::Specification.new do |spec|
   spec.bindir        = 'exe'
   spec.executables   = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
   spec.require_paths = ['lib']
-  spec.post_install_message = "Thanks for installing utf8_sanitizer!"
+  spec.post_install_message = 'Thanks for installing utf8_sanitizer!'
   spec.required_ruby_version = '~> 2.5.1'
   spec.add_dependency 'activesupport', '~> 5.2', '>= 5.2.0'
@@ -40,11 +40,11 @@ Gem::Specification.new do |spec|
   spec.add_development_dependency 'byebug', '~> 10.0', '>= 10.0.2'
   spec.add_development_dependency 'class_indexer', '~> 0.3.0'
   spec.add_development_dependency 'irbtools', '~> 2.2', '>= 2.2.1'
+  spec.add_development_dependency 'pry', '~> 0.11.3'
   spec.add_development_dependency 'rake', '~> 12.3', '>= 12.3.1'
   spec.add_development_dependency 'rspec', '~> 3.7'
   spec.add_development_dependency 'rubocop', '~> 0.56.0'
   spec.add_development_dependency 'ruby-beautify', '~> 0.97.4'
-  spec.add_development_dependency "pry", "~> 0.11.3"
   # spec.add_runtime_dependency 'library', '~> 2.2'
   # spec.add_dependency 'activerecord', '>= 3.0'
   # spec.add_dependency 'actionpack', '>= 3.0'
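The gemspec pins its requirements with the pessimistic operator:

- `required_ruby_version '~> 2.5.1'` means `>= 2.5.1` and `< 2.6`, so as written RubyGems will not install this release on Ruby 2.6 or later.
- `'~> 5.2', '>= 5.2.0'` limits ActiveSupport to the 5.x series from 5.2 upward.

A quick check with RubyGems' `Gem::Requirement`:

```ruby
require 'rubygems'

ruby_req = Gem::Requirement.new('~> 2.5.1')
as_req = Gem::Requirement.new('~> 5.2', '>= 5.2.0')

# ~> 2.5.1 allows 2.5.x from 2.5.1 upward, but not 2.6.
puts ruby_req.satisfied_by?(Gem::Version.new('2.5.1'))  # true
puts ruby_req.satisfied_by?(Gem::Version.new('2.5.9'))  # true
puts ruby_req.satisfied_by?(Gem::Version.new('2.6.0'))  # false

# ~> 5.2 allows any 5.x from 5.2 upward, but not 6.0.
puts as_req.satisfied_by?(Gem::Version.new('5.2.3'))    # true
puts as_req.satisfied_by?(Gem::Version.new('6.0.0'))    # false
```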

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: utf8_sanitizer
 version: !ruby/object:Gem::Version
-  version: 0.0.2.pre.rc.04
+  version: '1.01'
 platform: ruby
 authors:
 - Adam Booth
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2018-06-01 00:00:00.000000000 Z
+date: 2018-06-21 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activesupport
@@ -104,6 +104,20 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: 2.2.1
+- !ruby/object:Gem::Dependency
+  name: pry
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.11.3
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.11.3
 - !ruby/object:Gem::Dependency
   name: rake
   requirement: !ruby/object:Gem::Requirement
@@ -166,22 +180,8 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: 0.97.4
-- !ruby/object:Gem::Dependency
-  name: pry
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: 0.11.3
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: 0.11.3
 description: |-
-  Removes invalid UTF8 characters & extra whitespace (carriage returns, new lines, tabs, spaces, etc.) from csv or strings.
+  Removes invalid UTF8 characters & extra whitespace (carriage returns, new lines, tabs, spaces, etc.) from csv or strings. Also provides detailed report indicating row numbers containing non-UTF8 and extra whitespace, and before and after to compare changes.
    Example: ABC Au\xC1tos,123 E Main St,Anytown,TX,75142,(888) 555-1234\n\r\n  =>  ABC Autos,123 E Main St,Anytown,TX,75142,(888) 555-1234
 email:
 - 4rlm@protonmail.ch
@@ -203,12 +203,12 @@ files:
 - lib/utf8_sanitizer/csv/extensions.csv
 - lib/utf8_sanitizer/csv/seeds_clean.csv
 - lib/utf8_sanitizer/csv/seeds_dirty.csv
+- lib/utf8_sanitizer/csv/seeds_dirty_1.csv
 - lib/utf8_sanitizer/csv/seeds_mega.csv
 - lib/utf8_sanitizer/csv/seeds_mini.csv
 - lib/utf8_sanitizer/csv/seeds_mini.csv,
 - lib/utf8_sanitizer/csv/seeds_mini_10.csv
 - lib/utf8_sanitizer/csv/seeds_mini_2_bug.csv
-- lib/utf8_sanitizer/seed.rb
 - lib/utf8_sanitizer/utf.rb
 - lib/utf8_sanitizer/version.rb
 - utf8_sanitizer.gemspec
@@ -228,13 +228,16 @@ required_ruby_version: !ruby/object:Gem::Requirement
       version: 2.5.1
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
-  - - ">"
+  - - ">="
     - !ruby/object:Gem::Version
-      version: 1.3.1
+      version: '0'
 requirements: []
 rubyforge_project:
 rubygems_version: 2.7.6
 signing_key:
 specification_version: 4
-summary: Removes invalid UTF8 characters & extra whitespace from csv or strings.
+summary: Removes invalid UTF8 characters & extra whitespace (carriage returns, new
+  lines, tabs, spaces, etc.) from csv or strings. Also provides detailed report indicating
+  row numbers containing non-UTF8 and extra whitespace, and before and after to compare
+  changes.
 test_files: []

data/lib/utf8_sanitizer/seed.rb DELETED

@@ -1,74 +0,0 @@
-require 'csv'
-module Utf8Sanitizer
-  class Seed
-    def initialize(args={})
-      # @pollute_seeds = args.fetch(:pollute_seeds, false)
-      # @seed_hashes = args.fetch(:seed_hashes, false)
-      # @seed_csv = args.fetch(:seed_csv, false)
-    end
-    def pollute_seeds(text)
-      list = ['h∑', 'lÔ', "\x92", "\x98", "\x99", "\xC0", "\xC1", "\xC2", "\xCC", "\xDD", "\xE5", "\xF8"]
-      index = text.length / 2
-      var = "#{list.sample}_#{list.sample}"
-      text.insert(index, var)
-      text.insert(-1, "\r\n")
-      text
-    end
-    def grab_seed_file_path
-      # "./lib/utf8_sanitizer/csv/seeds_clean.csv"
-      "./lib/utf8_sanitizer/csv/seeds_dirty.csv"
-      # "./lib/utf8_sanitizer/csv/seeds_mega.csv"
-      # "./lib/utf8_sanitizer/csv/seeds_mini.csv"
-      # "./lib/utf8_sanitizer/csv/seeds_mini_10.csv"
-      # './lib/utf8_sanitizer/csv/seeds_mini_2_bug.csv'
-    end
-    ### Sample Hashes for validate_data
-    def grab_seed_hashes
-      [{ row_id: 1,
-         url: 'stanleykaufman.com',
-         act_name: 'Stanley Chevrolet Kaufman',
-         street: '825 E Fair St',
-         city: 'Kaufman',
-         state: 'TX',
-         zip: '75142',
-         phone: '(888) 457-4391' },
-       { row_id: 2,
-         url: 'leepartyka',
-         act_name: 'Lee Partyka Chevrolet Mazda Isuzu Truck',
-         street: '200 Skiff St',
-         city: 'Hamden',
-         state: 'CT',
-         zip: '6518',
-         phone: '(203) 288-7761' },
-       { row_id: 3,
-         url: 'burienhonda.fake.not.net.com',
-         act_name: 'Honda of Burien 15026 1st Avenue South, Burien, WA 98148',
-         street: '15026 1st Avenue South',
-         city: 'Burien',
-         state: 'WA',
-         zip: '98148',
-         phone: '(206) 246-9700' },
-       { row_id: 4,
-         url: 'cortlandchryslerdodgejeep.com',
-         act_name: 'Cortland Chrysler Dodge Jeep RAM',
-         street: '3878 West Rd',
-         city: 'Cortland',
-         state: 'NY',
-         zip: '13045',
-         phone: '(877) 279-3113' },
-       { row_id: 5,
-         url: 'imperialmotors.net',
-         act_name: 'Imperial Motors',
-         street: '4839 Virginia Beach Blvd',
-         city: 'Virginia Beach',
-         state: 'VA',
-         zip: '23462',
-         phone: '(757) 490-3651' }]
-    end
-  end
-end