RubyGems - utf8_sanitizer - Versions diffs - 1.01 → 1.02 - Mend

utf8_sanitizer 1.01 → 1.02

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

checksums.yaml +4 -4
data/README.md +27 -8
data/lib/utf8_sanitizer/version.rb +1 -1
metadata +2 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 0de542eedb064eda2b7a85eda41c674dc7ae822e5bff19f0758bf7418ace84ee
-  data.tar.gz: b252e6c2aa92f32c2068ba59fb8f0719afedb034c47c8513c3017dda9588933b
+  metadata.gz: f0e4ff707e58e6238c6f62c2b8818b783406240a40e537663e928a914198e199
+  data.tar.gz: cff25ea7735710a5ed4d8c5fe1f779688d9f9117880e710e998c2eaef5ac8e3a
 SHA512:
-  metadata.gz: 8bf27abb9db602ab114a0606cb83b1e16d931bec77901381e5ad851334b58027c13cd1aa5942607334cf57b66a1394ff50984d62f17c8bac21cdcafe7907e074
-  data.tar.gz: 1e8bcbb08ef7bd8db08af0a79ba9dc06d824634197f005abfe5fc8e390703be0729a0c1ce568378d4fc22266db871e635df4b2237f54bf4eb3b023235bfea5d4
+  metadata.gz: a6b37f0b41b0f4340d350438580e83c78719ffc83c1fc91cf0580eb921354ba9e661215bd710ee4f7391bb9920dde812b0735c4c367ea0381b60f9304a3770ae
+  data.tar.gz: 8785995ab6dc235136a9e83f5ffb7e4d74fe5a4cee2d67539b0b0dd864542f6e959c10156e22e3ece65020e3f58c41e369b90267fdf2c6d0a86b41e0ad156254
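
The entries in this hunk are hex digests of the `metadata.gz` and `data.tar.gz` members packed inside the `.gem` archive, so they change whenever the package contents change. As a rough sketch of how such an entry could be checked by hand, the Ruby below recomputes both digests for a file's bytes and compares them with a `checksums.yaml` document. The helper name `checksums_match?` and its parameters are hypothetical and not part of utf8_sanitizer or RubyGems; only the standard `digest` and `yaml` libraries are used.

```ruby
require 'digest'
require 'yaml'

# Hypothetical helper (an assumption for illustration, not a RubyGems API):
# returns true when both the SHA256 and SHA512 digests of file_bytes equal
# the values recorded for entry_name in the given checksums.yaml text.
def checksums_match?(checksums_yaml, entry_name, file_bytes)
  recorded = YAML.safe_load(checksums_yaml) || {}
  computed = {
    'SHA256' => Digest::SHA256.hexdigest(file_bytes),
    'SHA512' => Digest::SHA512.hexdigest(file_bytes)
  }
  # A missing algorithm section or entry yields nil, which never matches.
  computed.all? do |algorithm, hex|
    section = recorded[algorithm]
    section.is_a?(Hash) && section[entry_name].to_s == hex
  end
end
```

Assuming the archive has been unpacked into the current directory, a call might look like `checksums_match?(File.read('checksums.yaml'), 'data.tar.gz', File.binread('data.tar.gz'))`.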

data/README.md CHANGED

@@ -40,14 +40,27 @@ Options for UTF8 Sanitizing data:
 2. Data Hash of strings
 #### 1. CSV Parsing
-This is a good option if you are having problems with a CSV containing non-UTF8 characters.  Pass your file_path as a hash like below.  Hash MUST be a SYMBOL and named `:file_path`.  If not, default seeds will be passed as the system detects empty user input and thinks user is trying to run built-in seed data for testing.
+To clean CSV file containing non-UTF8 characters, pass file_path as a hash like below. Hash MUST meet the following guidelines:
+a. key as a SYMBOL `:` (not key as string)
+b. named `:file_path`
+c. be an Absolute Path from root `./`
+d. be a hash `{file_path: "./path/to/your_csv.csv"}`
+e. passed to `Utf8Sanitizer.sanitize()`
+Syntax Example Below:
 ```
-args = {file_path: "./path/to/your_csv.csv"}
-sanitized_data = Utf8Sanitizer.sanitize(args)
+sanitized_data = Utf8Sanitizer.sanitize({file_path: "./path/to/your_csv.csv"})
 ```
 #### 2. Hash of Strings
-This is a good option if you are scraping data or cleaning up existing databases.  Pass your data as a hash like below.  Hash MUST be a SYMBOL and named `:data`.  The value of `:data` should be an array of hashes like below and can be any size from one to many tens of thousands.  The hashes inside the data array can be named anything from crm contact data like below, stats, recipes, or any custom hashes as long as they are in an array and resemble the syntax and structure like below.
+To clean existing databases, web form submissions, or scraped data, pass input data as a hash like below.  Hash MUST be a SYMBOL and named `:data`.  The value of `:data` should be an array of hashes like below.
+Below is just an example.  Your input hash keys inside the parent data array can be named anything (not limited to url, act_name, street, etc.), but must be hashes inside a parent array like the below structure and syntax.
 ```
 array_of_hashes = [ { url: 'abc_autos_example.com',
                        act_name: 'ABC Aut\x92os',
@@ -65,19 +78,25 @@ array_of_hashes = [ { url: 'abc_autos_example.com',
                        phone: '(800) 555-5678\r\n' },
                   }]
-sanitized_data = Utf8Sanitizer.sanitize(data: array_of_hashes)
+sanitized_data = Utf8Sanitizer.sanitize({data: array_of_hashes})
 ```
 ### Returned Sanitized Data Format
-The returned data will be in hash format with the following keys: `:stats`, `:file_path`, `:data` like below.
+The returned data will contain a detailed report of the row or line numbers where UTF8 violations and extra white space were located.  The broad categories in the returned data will be in hash format with the following keys: `:stats`, `:file_path`, `:data` like below.
+IMPORTANT: `:valid_data` is the clean, converted output from your CSV or strings input, directly accessible via `sanitized_data[:data][:valid_data]`.
+Returned data also indicates if the input data was successfully encoded. In rare cases the data is beyond repair, and will be listed in the `:error` category.
+Each non-UTF8 row will be included in its original syntax like the example below and can be accessed directly via `sanitized_data[:data][:encoded_data]`.
 The `:stats` are a breakdown of the results. `:defective_rows` and `:error_rows` will usually be the same number which refer to the rows which are beyond repair (very rare). Otherwise, the results will be `:valid_rows` if they were perfect or successfully sanitized, including `:encoded_rows` which refers to the number of rows that contained non-utf8 characters, and `:wchar_rows` which is short for 'whitespace character rows'.
 `:data` is broken down into the following categories: `:valid_data`, `:encoded_data`, `:defective_data`, and `:error_data`.
-`:valid_data` is the most important data and you can access it with `sanitized_data[:data][:valid_data]`.  Each non-UTF8 row will be included in its original syntax like below and can be accessed directly via `sanitized_data[:data][:encoded_data]`.
+Below is an example of the returned data (`:stats`, `:file_path`, `:data`)
-**You can change the name of `sanitized_data` to anything you like, but it must be followed with `[:data][:valid_data]` and `[:data][:encoded_data]`, etc.**
+**`sanitized_data` is a local variable, which you can name anything you like, but it must be assigned in the following syntax: `[:data][:valid_data]` and `[:data][:encoded_data]`, etc.**
 ```
 { stats:

data/lib/utf8_sanitizer/version.rb CHANGED

@@ -1,4 +1,4 @@
 module Utf8Sanitizer
   # VERSION = "0.0.1-rc.1"
-  VERSION = '1.01'.freeze
+  VERSION = '1.02'.freeze
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: utf8_sanitizer
 version: !ruby/object:Gem::Version
-  version: '1.01'
+  version: '1.02'
 platform: ruby
 authors:
 - Adam Booth
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2018-06-21 00:00:00.000000000 Z
+date: 2018-06-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activesupport