RubyGems - cure - Versions diffs - 0.1.2 → 0.4.0 - Mend

cure 0.1.2 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (112)

checksums.yaml +4 -4
data/.rubocop.yml +13 -3
data/.tool-versions +1 -0
data/Dockerfile +1 -1
data/Gemfile +1 -0
data/Gemfile.lock +25 -6
data/README.md +61 -93
data/docs/README.md +33 -0
data/docs/about.md +219 -0
data/docs/builder/add.md +52 -0
data/docs/builder/black_white_list.md +83 -0
data/docs/builder/copy.md +48 -0
data/docs/builder/explode.md +70 -0
data/docs/builder/main.md +43 -0
data/docs/builder/remove.md +46 -0
data/docs/examples/examples.md +164 -0
data/docs/export/main.md +37 -0
data/docs/extract/main.md +89 -0
data/docs/metadata/main.md +29 -0
data/docs/query/main.md +45 -0
data/docs/sources/main.md +36 -0
data/docs/transform/main.md +53 -0
data/docs/validate/main.md +42 -0
data/exe/cure +12 -41
data/exe/cure.old +59 -0
data/lib/cure/builder/base_builder.rb +151 -0
data/lib/cure/builder/candidate.rb +56 -0
data/lib/cure/cli/command.rb +105 -0
data/lib/cure/cli/generate_command.rb +54 -0
data/lib/cure/cli/new_command.rb +52 -0
data/lib/cure/cli/run_command.rb +19 -0
data/lib/cure/cli/templates/README.md.erb +1 -0
data/lib/cure/cli/templates/gemfile.erb +5 -0
data/lib/cure/cli/templates/gitignore.erb +181 -0
data/lib/cure/cli/templates/new_template.rb.erb +31 -0
data/lib/cure/cli/templates/tool-versions.erb +1 -0
data/lib/cure/config.rb +142 -18
data/lib/cure/coordinator.rb +61 -25
data/lib/cure/database.rb +191 -0
data/lib/cure/dsl/builder.rb +26 -0
data/lib/cure/dsl/exporters.rb +45 -0
data/lib/cure/dsl/extraction.rb +60 -0
data/lib/cure/dsl/metadata.rb +33 -0
data/lib/cure/dsl/queries.rb +36 -0
data/lib/cure/dsl/source_files.rb +36 -0
data/lib/cure/dsl/template.rb +131 -0
data/lib/cure/dsl/transformations.rb +95 -0
data/lib/cure/dsl/validator.rb +22 -0
data/lib/cure/export/base_processor.rb +194 -0
data/lib/cure/export/manager.rb +24 -0
data/lib/cure/extract/base_processor.rb +47 -0
data/lib/cure/extract/csv_lookup.rb +14 -3
data/lib/cure/extract/extractor.rb +41 -84
data/lib/cure/extract/filter.rb +118 -0
data/lib/cure/extract/named_range.rb +94 -0
data/lib/cure/extract/named_range_processor.rb +128 -0
data/lib/cure/extract/variable.rb +25 -0
data/lib/cure/extract/variable_processor.rb +57 -0
data/lib/cure/generator/base_generator.rb +14 -4
data/lib/cure/generator/case_generator.rb +10 -3
data/lib/cure/generator/character_generator.rb +9 -3
data/lib/cure/generator/erb_generator.rb +21 -0
data/lib/cure/generator/eval_generator.rb +34 -0
data/lib/cure/generator/faker_generator.rb +7 -1
data/lib/cure/generator/guid_generator.rb +7 -2
data/lib/cure/generator/hex_generator.rb +6 -1
data/lib/cure/generator/imports.rb +4 -0
data/lib/cure/generator/number_generator.rb +6 -1
data/lib/cure/generator/placeholder_generator.rb +7 -1
data/lib/cure/generator/proc_generator.rb +21 -0
data/lib/cure/generator/redact_generator.rb +9 -3
data/lib/cure/generator/static_generator.rb +21 -0
data/lib/cure/generator/variable_generator.rb +11 -5
data/lib/cure/helpers/file_helpers.rb +12 -2
data/lib/cure/helpers/object_helpers.rb +5 -17
data/lib/cure/helpers/perf_helpers.rb +30 -0
data/lib/cure/helpers/string.rb +54 -0
data/lib/cure/launcher.rb +125 -0
data/lib/cure/log.rb +7 -0
data/lib/cure/planner.rb +136 -0
data/lib/cure/strategy/append_strategy.rb +4 -0
data/lib/cure/strategy/base_strategy.rb +19 -44
data/lib/cure/strategy/contain_strategy.rb +51 -0
data/lib/cure/strategy/end_with_strategy.rb +7 -1
data/lib/cure/strategy/full_strategy.rb +4 -0
data/lib/cure/strategy/history/history_cache.rb +82 -0
data/lib/cure/strategy/imports.rb +2 -0
data/lib/cure/strategy/match_strategy.rb +7 -2
data/lib/cure/strategy/prepend_strategy.rb +28 -0
data/lib/cure/strategy/regex_strategy.rb +7 -1
data/lib/cure/strategy/split_strategy.rb +8 -3
data/lib/cure/strategy/start_with_strategy.rb +7 -1
data/lib/cure/transformation/candidate.rb +32 -35
data/lib/cure/transformation/transform.rb +22 -56
data/lib/cure/validator/base_rule.rb +78 -0
data/lib/cure/validator/candidate.rb +54 -0
data/lib/cure/validator/manager.rb +21 -0
data/lib/cure/validators.rb +3 -3
data/lib/cure/version.rb +1 -1
data/lib/cure.rb +19 -11
data/templates/dsl_example.rb +48 -0
data/templates/empty_template.rb +31 -0
metadata +132 -21
data/lib/cure/export/exporter.rb +0 -74
data/lib/cure/extract/builder.rb +0 -27
data/lib/cure/main.rb +0 -72
data/lib/cure/template/dispatch.rb +0 -30
data/lib/cure/template/extraction.rb +0 -38
data/lib/cure/template/template.rb +0 -28
data/lib/cure/template/transformations.rb +0 -26
data/templates/aws_cur_template.json +0 -145
data/templates/example_template.json +0 -54

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 29e7c915346b45c1407208f6ba1810ece68004c8d7dd9ffb8eb2b5dcca903b35
-  data.tar.gz: c356c649bc4a902fcb347e9e16f2299f4c8a7c0db576a7fc8242d080b575920d
+  metadata.gz: 78d9b4b8ba9b29e7e299811d971a41f3600cbf7df4f1e17729b083f9573f1b49
+  data.tar.gz: f9fc2bfbc4ce3dfbb7769bc4af7cd5af2ed10c72e0d236887787c4a63a6a2d34
 SHA512:
-  metadata.gz: 5999a52c084656af0a6a18ba68884aec3de825be9f84f5c67cdf771031af414789aaebf24dfc9fed52f6a76b9a9495081f0f4e1c0f872927fbe34c271fa86e62
-  data.tar.gz: 458962e9905380ef0e5832ef9e2ae2da1ede47f448992314bceba95c997c2af3bb4d6f2b92e1f23e906563c77ad1fbcd37d171d75732cf29c15b6b9b0e05e5bc
+  metadata.gz: eccf1c880d1c04653364621f3d2136c9bb049e8b8be07e31ed2f763fedb7ceb4ae4b92ecb1a86baf8314989e0bfc52d51846c5461a782bdb757c0cc13f337992
+  data.tar.gz: d84dcc4031ff49cac80877478df86f9a276743f1815ac5386a46bd408788c6d4ef29e1edfab2648ca0d15c89925282cd5b5f076e2813cf7ad863628fe418c0ef
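
Reviewer note: the new digests can be checked locally. A `.gem` file is a plain tar archive containing `metadata.gz`, `data.tar.gz` and `checksums.yaml.gz`, so after unpacking it (for example with `tar -xf cure-0.4.0.gem`) each member can be hashed and compared with the values above. The following is a minimal sketch, not part of the package; the directory name `cure-0.4.0-unpacked` and the helper `sha256_matches?` are assumptions made for illustration.

```ruby
require "digest"

# Expected SHA256 values, copied from the 0.4.0 side of the checksums.yaml diff.
EXPECTED_SHA256 = {
  "metadata.gz" => "78d9b4b8ba9b29e7e299811d971a41f3600cbf7df4f1e17729b083f9573f1b49",
  "data.tar.gz" => "f9fc2bfbc4ce3dfbb7769bc4af7cd5af2ed10c72e0d236887787c4a63a6a2d34"
}.freeze

# Returns true when the file at `path` hashes to `expected_hex`.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex
end

# Hypothetical directory holding the unpacked archive members.
unpacked_dir = "cure-0.4.0-unpacked"

EXPECTED_SHA256.each do |name, expected|
  path = File.join(unpacked_dir, name)
  next unless File.exist?(path) # skip quietly if the archive was not unpacked here

  status = sha256_matches?(path, expected) ? "ok" : "MISMATCH"
  puts "#{name}: #{status}"
end
```

The same approach works for the SHA512 entries by swapping in `Digest::SHA512`.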

data/.rubocop.yml CHANGED

@@ -1,5 +1,7 @@
 AllCops:
-  TargetRubyVersion: 2.6
+  TargetRubyVersion: 3.2
+  Exclude:
+    - 'spec/**/*'
 Style/StringLiterals:
   Enabled: true
@@ -12,10 +14,13 @@ Style/StringLiteralsInInterpolation:
 Layout/LineLength:
   Max: 120
+Metrics/BlockLength:
+  Max: 40
 # Too short methods lead to extraction of single-use methods, which can make
 # the code easier to read (by naming things), but can also clutter the class
 Metrics/MethodLength:
-  Max: 20
+  Max: 40
 # The guiding principle of classes is SRP, SRP can't be accurately measured by LoC
 Metrics/ClassLength:
@@ -132,4 +137,9 @@ Lint/Debugger:
 # Style preference
 Style/MethodDefParentheses:
-  Enabled: false
+  Enabled: false
+Layout/FirstHashElementIndentation:
+  Enabled: true
+  EnforcedStyle: consistent
+  IndentationWidth: ~

data/.tool-versions ADDED

@@ -0,0 +1 @@
+ruby 3.2.1

data/Dockerfile CHANGED

@@ -1,4 +1,4 @@
-FROM ruby:3.0.2
+FROM ruby:3.1.4
 RUN apt-get update && apt-get install -y \
   build-essential

data/Gemfile CHANGED

@@ -10,3 +10,4 @@ gem "rake", "~> 13.0"
 gem "rspec", "~> 3.0"
 gem "rubocop", "~> 1.21"
+gem "standard", group: "development", require: false

data/Gemfile.lock CHANGED

@@ -1,20 +1,26 @@
 PATH
   remote: .
   specs:
-    cure (0.1.2)
-      faker
-      rcsv
+    cure (0.4.0)
+      artii (~> 2.1.2)
+      faker (~> 3.2.2)
+      rcsv (~> 0.3.1)
+      sequel (~> 5.74.0)
+      sqlite3 (~> 1.6.8)
+      terminal-table (~> 3.0.2)
 GEM
   remote: https://rubygems.org/
   specs:
+    artii (2.1.2)
     ast (2.4.2)
-    concurrent-ruby (1.1.10)
+    bigdecimal (3.1.5)
+    concurrent-ruby (1.2.2)
     diff-lcs (1.4.4)
     docile (1.4.0)
-    faker (2.21.0)
+    faker (3.2.2)
       i18n (>= 1.8.11, < 2)
-    i18n (1.10.0)
+    i18n (1.14.1)
       concurrent-ruby (~> 1.0)
     parallel (1.21.0)
     parser (3.0.2.0)
@@ -48,16 +54,28 @@ GEM
       unicode-display_width (>= 1.4.0, < 3.0)
     rubocop-ast (1.13.0)
       parser (>= 3.0.1.1)
+    rubocop-performance (1.11.5)
+      rubocop (>= 1.7.0, < 2.0)
+      rubocop-ast (>= 0.4.0)
     ruby-progressbar (1.11.0)
+    sequel (5.74.0)
+      bigdecimal
     simplecov (0.21.2)
       docile (~> 1.1)
       simplecov-html (~> 0.11)
       simplecov_json_formatter (~> 0.1)
     simplecov-html (0.12.3)
     simplecov_json_formatter (0.1.4)
+    sqlite3 (1.6.9-x86_64-linux)
+    standard (1.4.0)
+      rubocop (= 1.22.3)
+      rubocop-performance (= 1.11.5)
+    terminal-table (3.0.2)
+      unicode-display_width (>= 1.1.1, < 3)
     unicode-display_width (2.1.0)
 PLATFORMS
+  ruby
   x86_64-linux
 DEPENDENCIES
@@ -66,6 +84,7 @@ DEPENDENCIES
   rspec (~> 3.0)
   rubocop (~> 1.21)
   simplecov
+  standard
 BUNDLED WITH
    2.3.13
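
Reviewer note: 0.4.0 moves from unconstrained runtime dependencies (`faker`, `rcsv`) to pessimistic constraints such as `sqlite3 (~> 1.6.8)`, which is why the lockfile can resolve `sqlite3` to 1.6.9 but could not resolve it to 1.7.0. The sketch below uses RubyGems' own `Gem::Requirement` to show how the `~>` operator behaves; the helper name `allowed?` is an assumption for illustration only.

```ruby
require "rubygems"

# Returns true when `version` satisfies the given constraint string.
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# "~> 1.6.8" means ">= 1.6.8" and "< 1.7": only the last segment may grow.
puts allowed?("~> 1.6.8", "1.6.9")   # the version resolved in the lockfile
puts allowed?("~> 1.6.8", "1.7.0")   # next minor release
puts allowed?("~> 5.74.0", "5.74.3") # a later sequel patch release
puts allowed?("~> 5.74.0", "5.75.0") # next sequel minor release
```

With three-segment constraints like these, consumers of the gem pick up patch releases of each dependency automatically, while a new minor release requires a new version of cure.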

data/README.md CHANGED

@@ -3,111 +3,79 @@
 ![run tests](https://github.com/williamthom-as/cure/actions/workflows/rspec.yml/badge.svg)
 [![Gem Version](https://badge.fury.io/rb/cure.svg)](https://badge.fury.io/rb/cure)
-Cure is a simple tool to **extract/clean/transform/remove/redact/anonymize** and **replace** information in a spreadsheet.
-It has been written to anonymize private cloud billing data for use in public demo environments.  Since then, it has grown to
-additional processing capabilities that can take a CSV from junk to workable data.
-It has several key features:
-- Operate on your data to build what you need.
-  - Files are taken through an `Extract -> Build -> Transform -> Export` pipeline.
-- Extract parts of your file into named ranges to remove junk.
-- Build (Add/Remove/Explode) columns - handy for files that may have JSON as a column value.
-- Transform values:
-  - Define either full or regex match groups replacements.
-  - Choose from many strategies to replace anonymous data - random number sequences, GUIDs, placeholders, multipliers amongst many others.
-  - **Existing generated values are stored and recalled** so once a replacement is defined, it is kept around for other columns to use.
-    - For example, once a replacement **Account Number** is generated, any further use of that number sequence is other columns will be used, keeping data real(ish) and functional in a relational sense.
-- Export into one (or many) files, in a selection of chosen formats (CSV at the moment, coming soon with JSON, Parquet).
-## Use Cases
-- Strip out personal data from a CSV that may be used for public demo.
-- Extract specific parts of a CSV file and junk the rest.
-- Explode JSON values into individual columns per key.
+Cure provides a low-code solution for handling a wide range of tasks for importing, validating and manipulating one or
+more CSV files. Unlike other tools, Cure doesn't assume standard CSV formatting and is designed to handle a wide range of
+challenging scenarios.
-## Usage
+The library provides optional hooks for each data processing pipeline phase in:
-Cure requires two things, a **template** (or rules) file. This is a descriptive file that provides the translations required on each column.
-A candidate column entry provides the translations to be run on each column.
-Please see example below.
-```json
-    {
-      "column" : "identity/LineItemId",
-      "translations" : [{
-        "strategy" : {
-          "name": "full",
-          "options" : {}
-        },
-        "generator" : {
-          "name" : "character",
-          "options" : {
-            "length" : 52,
-            "types" : [
-              "lowercase", "number", "uppercase"
-            ]
-          }
-        }
-      }]
-    }
-```
+`Sources -> Extract -> Validate -> Build -> Query -> Transform -> Export`
-A **translation** is made up of a strategy and generator.
-**Strategies** are the means of selecting the *value* to change. You may choose from:
-  - **Full replacement**: replaces the full entry.
-  - **Regex replacement**: can replace either the match group (partial), or full record *if* there is a match.
-  - **Includes replacement**: can replace either the matched substring, or full record *if* there is a match.
-  - **StartWith replacement**: can replace either the starts with substring, or full record *if* there is a match.
-  - **EndWith replacement**: can replace either the end with substring, or full record *if* there is a match.
-**Generators** are the way a replacement value is generated. You may choose from:
-  - Random number generator
-  - Random Hex numbers
-  - Random character strings
-  - Placeholder lookups
-  - Redaction strings
-  - Removal (empty string)
-## Example
-```json
-    {
-      "column" : "identity/ResourceId",
-      "translations" : [{
-        "strategy" : {
-          "name": "full",
-          "options" : {}
-        },
-        "generator" : {
-          "name" : "character",
-          "options" : {
-            "length" : 10,
-            "types" : [
-              "lowercase", "number"
-            ]
-          }
-        }
-      }]
-    }
-```
+See below for a simple example that loads customer data from a single CSV, redacts the email records, and stores the
+result in a new CSV.
+```ruby
+require "cure"
+handler = Cure.init do
+  sources { csv :pathname, Pathname.new("customer_data.csv") }
+  extract { named_range at: "D2:G8" }
+  transform do
+    candidate column: "email" do
+      with_translation { replace("split", token: "@", index: 0).with("redact") }
+      with_translation { replace("split", token: "@", index: -1).with("redact") }
+    end
+  end
-The above example would translate a source value from column **identity/ResourceId** as follows:
-    i-ae44e104ef1 => ddsf78ds56
+  export { csv file_name: "cust_transformed", directory: "/tmp/cure" }
+end
-    # A full replacement, with a random generated 10 character string
-    # made up of lowercase letters and numbers
+handler.run_export
-You can see more of these examples in the Examples folder.
+# Input (customer_data.csv):                Output (cust_transformed.csv):
+#
+# | id | email                  |           | id | email                  |
+# |----|------------------------|    =>     |----|------------------------|
+# | 1  | john.smith@gmail.com   |           | 1  | xxxxxxxxxx@xxxxx.com   |
+# | 2  | lean.davis@outlook.com |           | 2  | xxxxxxxxxx@xxxxxxx.com |
+```
+Click this link to view the [documentation](docs/README.md), see a real world [example](http://www.williamthom.as/csv/ruby/2023/04/06/transforming-csvs-with-cure.html),
+or see a longer list of [features](docs/about.md).
 ## Installation
+### Requirements
+- Ruby 3.0 or above
+- SQLite3
 Install it yourself as:
     $ gem install cure
-### Getting started *quickly*
+## Usage
+### CLI
+You can start a new Cure project from the CLI with the following command:
+    $ cure new [name]
+This will create a directory to house templates, input and output directories amongst others.
+To perform a one-off run, you can do it manually via the CLI using the following command:
+    $ cure run -t template.rb -s source_file.csv
+You can view help with the following command:
+    $ cure help
+### Try it out
 To quickly spin up a development environment, please use the Dockerfile provided. Run:
@@ -118,7 +86,7 @@ Please do not forget to mount any volumes which may have templates that you wish
 Once set up and connected to your container, run:
-    $ cure -t /file/path/to/template.json -s /file/path/to/source_file.csv -o /my/output/folder
+    $ cure run -t template.rb -s source_file.csv
 ## Development

data/docs/README.md ADDED

@@ -0,0 +1,33 @@
+# Cure
+### Cure Documentation
+- [Metadata](metadata/main.md)
+- [Sources](sources/main.md)
+- [Extract](extract/main.md)
+- [Validate](validate/main.md)
+- [Build](builder/main.md)
+- [Query](query/main.md)
+- [Transform](transform/main.md)
+- [Export](export/main.md)
+Cure has several key features:
+- A clean DSL to describe the operations that you want to do. This can be defined in code, or loaded from a file that
+  could be version controlled.
+- Operate on your data to build what you need.
+    - Files are taken through an `Source -> Extract -> Validate -> Build -> Query -> Transform -> Export` pipeline.
+    - Each of these steps is optional.
+- [Metadata](metadata/main.md) allows you to add some comments to your template. Will not impact functionality.
+- [Sources](sources/main.md) are where you define the file(s) that you wish to operate on.
+- [Extract](extract/main.md) parts of your file into named ranges to remove junk.
+- [Validate](validate/main.md) that data fits your expectations.
+- [Build](builder/main.md) (add, remove, rename, copy, explode) columns.
+- [Query](query/main.md) your extracted data using SQLite to further control your desired data.
+- [Transform](transform/main.md) values:
+    - Define either full, split, partials or regex match groups replacements.
+    - Choose from many strategies to replace data - random number sequences, GUIDs, placeholders, multipliers amongst many others.
+    - **Existing generated values are stored and recalled** so once a replacement is defined, it is kept around for other columns to use.
+        - For example, once a replacement **Account Number** is generated, any further occurrence of that number in other columns reuses the same replacement, keeping data real(ish) and functional in a relational sense.
+- [Export](export/main.md) into one (or many) files, in a selection of chosen formats, CSV (single or chunked files), or create a custom proc to do whatever you want.
+Please see the [Examples](examples/examples.md) article in the examples directory for more information.

data/docs/about.md ADDED

@@ -0,0 +1,219 @@
+# Cure
+Cure is a versatile tool designed to handle a wide range of tasks for importing or manipulating CSV data.
+It may take time to get familiar with all the features, but once you do, it is capable of performing a wide
+range of tasks.
+Cure can be used as an end-to-end CSV importing tool, or for when you just want to validate, extract, merge,
+clean, transform, remove, anonymize, replace, or manipulate tabular data. It operates in memory by default and
+can be integrated into existing workflows or controlled via the CLI.
+## Use Cases
+Other CSV utils or importers often make assumptions that CSV data is nicely formatted tabular data. However, in the
+real world you may get files that don't follow a standard [header,row 1,row 2,row n] format. With Cure, you can load
+specific parts of a file, or join multiple files together and treat them as one. See this
+[blog post](http://www.williamthom.as/csv/ruby/2023/04/06/transforming-csvs-with-cure.html) for a detailed example.
+Cure can be used for simple tasks like:
+- Import data from a spreadsheet into a database.
+- Split one CSV file into multiple files based on a filter (ex. M/F data in a single file into one M file and one F file).
+- Change one 10,000 line CSV file into 10 1,000 line files.
+- Extract specific parts of a CSV and discard the remaining data.
+- Validate a CSV has the expected data against a spec.
+- Fix data mistakes.
+... and more complex ones like:
+- Anonymize and transform personal data in a CSV to prepare it for public demo environments.
+- Perform complex transformations on values according to specific rules.
+- Unpack JSON values into individual columns per key.
+- Process large files sequentially while retaining variable history.
+- Merge two or more CSV files (or parts thereof) together.
+### In Code
+Cure can be used as part of your existing application. It is configured using a DSL that can either be inline,
+or as a file. Check out [docs](docs/README.md) for more information.
+## When not to use
+Cure processes CSV files as a whole. Some of its features require a complete parse of the file to extract the necessary
+data before transforming it.
+These features include:
+- Variable extraction (for example, extracting a value from A1 and adding it to each row).
+- Non-zero indexed headers (for example, taking values from rows 4 to 10 and using row 2 as the source header row).
+- Expanding JSON fields into columns (for example, if row 1 has values [{"a":1, "b":2}], and row 2 has [{"c":3}], each
+  row needs columns A, B, C, but row 1 doesn't know that until row 2).
+If you have large datasets of streamable CSV data, there are more efficient and performant tools available. However,
+Cure makes it possible to perform more aggressive transformations, which may require more memory usage. If you still
+want to use Cure to process large files, you can choose to persist the datastore to disk instead of in memory, which
+may have a slight impact on performance.
+## How it works
+The library provides designated hooks for each distinct phase in the data processing pipeline
+`Extract -> Validate -> Build -> Query -> Transform -> Export`
+You can choose to opt in to as many or as few stages as needed, no steps are mandatory.
+Cure operates by extracting complete CSV files or specific portions of them into user defined named ranges (one or more
+cells of tabular data), which are subsequently inserted into SQLite tables. This allows for the ability to join or
+manipulate rows with SQL, *if you need it*. With data segmented into separate named ranges, multiple transforms and
+exports can be performed in a single pass.
+## Examples
+### Chunk CSV files
+This is a simple example that takes a sheet and exports it into multiple sheets of 10,000 rows max.
+```ruby
+require "cure"
+handler = Cure.init do
+  export do
+    chunk_csv file_name_prefix: "my_sheet", directory: "/tmp/cure", chunk_size: 10_000
+  end
+end
+handler.process(:path, "path/to/my_sheet.csv")
+```
+### Filter single file into multiple
+This example takes in a sheet of male and female data and exports it into two files based on gender.
+```ruby
+require "cure"
+handler = Cure.init do
+  extract do
+    named_range name: "male", at: -1
+    named_range name: "female", at: -1
+  end
+  query do
+    with named_range: "female", query: <<-SQL
+      SELECT
+        *
+      FROM female
+      WHERE
+        Sex = 'F' AND strftime('%Y', Date) > '2014'
+      ORDER BY Date DESC
+    SQL
+    with named_range: "male", query: <<-SQL
+      SELECT
+        *
+      FROM male
+      WHERE
+        Sex = 'M' AND strftime('%Y', Date) > '2014'
+      ORDER BY Date DESC
+    SQL
+  end
+  export do
+    csv file_name: "male", directory: "/tmp/cure", named_range: "male"
+    csv file_name: "female", directory: "/tmp/cure", named_range: "female"
+  end
+end
+handler.process(:path, "path/to/my_sheet.csv")
+```
+### Validate data
+This example validates that a sheet has valid columns. It will throw an error if it isn't valid.
+```ruby
+require "cure"
+handler = Cure.init do
+  validate do
+    candidate column: "rating", options: { fail_on_error: true } do
+      with_rule :not_null
+      with_rule :length, { min: 0, max: 5 }
+    end
+    candidate column: "phone_number", options: { fail_on_error: true } do
+      with_rule :custom, { proc: proc { |val| val =~ /^04\d{8}$/ } }
+    end
+  end
+end
+handler.process(:path, "path/to/my_sheet.csv")
+```
+### Transform data
+This example anonymizes private data found in a cloud invoice. Note that when the existing account number is found
+in any column, it is replaced with the same value, maintaining referential integrity whilst being anonymized.
+You can see the [before](spec/cure/e2e/input/aws_billing_input.csv) and [after](spec/cure/e2e/output/aws_billing_output.csv) CSVs
+made from this template by clicking on the links.
+```ruby
+require "cure"
+handler = Cure.init do
+  build do
+    candidate do
+      whitelist options: {
+        columns: %w[
+          bill/BillingEntity
+          bill/PayerAccountId
+          bill/BillingPeriodStartDate
+          bill/BillingPeriodEndDate
+          lineItem/UsageAccountId
+          lineItem/LineItemType
+          lineItem/UsageStartDate
+          lineItem/UsageEndDate
+          lineItem/UsageType
+          lineItem/ResourceId
+          lineItem/ProductCode
+          lineItem/UsageAmount
+          lineItem/CurrencyCode
+        ]
+      }
+    end
+  end
+  rot13_proc = proc { |source, _ctx|
+    source.gsub(/[^a-zA-Z0-9]/, '').tr('A-Za-z', 'N-ZA-Mn-za-m')
+  }
+  transform do
+    candidate column: "bill/PayerAccountId" do
+      with_translation { replace("full").with("placeholder", name: :account_number) }
+    end
+    candidate column: "lineItem/UsageAccountId" do
+      with_translation { replace("full").with("number", length: 10) }
+    end
+    candidate column: "lineItem/ResourceId", options: {ignore_empty: true} do
+      # If there is a match (i-[my-group]), replace just the match group with the output of rot13_proc
+      with_translation { replace("regex", regex_cg: "^i-(.*)").with("proc", execute: rot13_proc) }
+      # If the string contains the account number, replace with the account_number placeholder.
+      with_translation { replace("contain", match: "1234567890").with("placeholder", name: :account_number) }
+      # If no match is found, replace the whole value with the output of rot13_proc
+      if_no_match { replace("full").with("proc", execute: rot13_proc) }
+    end
+    # Hardcoded values that we may wish to reference
+    place_holders({account_number: 987_654_321})
+  end
+  export do
+    terminal title: "Preview", limit_rows: 20
+    csv file_name: "aws", directory: "/tmp/cure"
+  end
+end
+handler.process(:path, "path/to/my_sheet.csv")
+```
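
Reviewer note: the `rot13_proc` in the template above is plain Ruby and can be tried without Cure. It first strips every non-alphanumeric character, then rotates letters by 13 places and leaves digits untouched. The sketch below copies the proc body into a constant (`ROT13_PROC`, a name chosen here for illustration) and calls it directly with a sample resource ID; how Cure itself passes the matched value and context to the proc is not shown in this diff, so the `nil` context argument is an assumption.

```ruby
# Same body as rot13_proc in the template: drop non-alphanumerics, then ROT13 the letters.
ROT13_PROC = proc { |source, _ctx|
  source.gsub(/[^a-zA-Z0-9]/, "").tr("A-Za-z", "N-ZA-Mn-za-m")
}

# The hyphen is removed and only the letters are rotated.
puts ROT13_PROC.call("i-ae44e104ef1", nil) # prints "vnr44r104rs1"

# Applying the proc twice restores the letters, but not the stripped characters.
puts ROT13_PROC.call(ROT13_PROC.call("i-ae44e104ef1", nil), nil) # prints "iae44e104ef1"
```

Because ROT13 is its own inverse, this transformation obscures values rather than anonymizing them irreversibly; a generator such as `number` or `hex` would be the safer choice where the original must not be recoverable.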

data/docs/builder/add.md ADDED

@@ -0,0 +1,52 @@
+[... go back to build contents](main.md)
+## Add
+### What is it?
+Add builder will add a new, empty column to the spreadsheet.
+### Why would you need it?
+As useless as a new empty column sounds, it can be used for a placeholder column to be used later. A common example
+of this may be if you want to add a variable to each row.  For example, at the top of a spreadsheet, you may have a
+date, but you want to add that to each row.
+### Full Configuration
+```ruby
+build do
+  candidate(column: "new_column", named_range: "mysheet") { add options: { default_value: "-" } }
+end
+```
+- `column`: represents the column name, mandatory.
+- `named_range`: specifies the named range holding the column; if no named range has been set, you can leave it blank.
+- `options`:
+  - `default_value`: not mandatory; if provided, it is set as the initial row value.
+### Example
+```ruby
+build do
+  candidate(column: "col_b") { add }
+end
+```
+Original input:
+```
++-------+
+| col_a |
++-------+
+| a     |
++-------+
+```
+changes to:
+```
++-------+-------+
+| col_a | col_b |
++-------+-------+
+| a     |       |
++-------+-------+
+```