cure 0.1.1 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (112)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +16 -3
  3. data/.tool-versions +1 -0
  4. data/Dockerfile +1 -1
  5. data/Gemfile +1 -0
  6. data/Gemfile.lock +25 -6
  7. data/README.md +59 -81
  8. data/docs/README.md +33 -0
  9. data/docs/about.md +219 -0
  10. data/docs/builder/add.md +52 -0
  11. data/docs/builder/black_white_list.md +83 -0
  12. data/docs/builder/copy.md +48 -0
  13. data/docs/builder/explode.md +70 -0
  14. data/docs/builder/main.md +43 -0
  15. data/docs/builder/remove.md +46 -0
  16. data/docs/examples/examples.md +164 -0
  17. data/docs/export/main.md +37 -0
  18. data/docs/extract/main.md +89 -0
  19. data/docs/metadata/main.md +29 -0
  20. data/docs/query/main.md +45 -0
  21. data/docs/sources/main.md +36 -0
  22. data/docs/transform/main.md +53 -0
  23. data/docs/validate/main.md +42 -0
  24. data/exe/cure +12 -37
  25. data/exe/cure.old +59 -0
  26. data/lib/cure/builder/base_builder.rb +151 -0
  27. data/lib/cure/builder/candidate.rb +56 -0
  28. data/lib/cure/cli/command.rb +105 -0
  29. data/lib/cure/cli/generate_command.rb +54 -0
  30. data/lib/cure/cli/new_command.rb +52 -0
  31. data/lib/cure/cli/run_command.rb +19 -0
  32. data/lib/cure/cli/templates/README.md.erb +1 -0
  33. data/lib/cure/cli/templates/gemfile.erb +5 -0
  34. data/lib/cure/cli/templates/gitignore.erb +181 -0
  35. data/lib/cure/cli/templates/new_template.rb.erb +31 -0
  36. data/lib/cure/cli/templates/tool-versions.erb +1 -0
  37. data/lib/cure/config.rb +151 -13
  38. data/lib/cure/coordinator.rb +108 -0
  39. data/lib/cure/database.rb +191 -0
  40. data/lib/cure/dsl/builder.rb +26 -0
  41. data/lib/cure/dsl/exporters.rb +45 -0
  42. data/lib/cure/dsl/extraction.rb +60 -0
  43. data/lib/cure/dsl/metadata.rb +33 -0
  44. data/lib/cure/dsl/queries.rb +36 -0
  45. data/lib/cure/dsl/source_files.rb +36 -0
  46. data/lib/cure/dsl/template.rb +131 -0
  47. data/lib/cure/dsl/transformations.rb +95 -0
  48. data/lib/cure/dsl/validator.rb +22 -0
  49. data/lib/cure/export/base_processor.rb +194 -0
  50. data/lib/cure/export/manager.rb +24 -0
  51. data/lib/cure/extract/base_processor.rb +47 -0
  52. data/lib/cure/extract/csv_lookup.rb +43 -0
  53. data/lib/cure/extract/extractor.rb +80 -0
  54. data/lib/cure/extract/filter.rb +118 -0
  55. data/lib/cure/extract/named_range.rb +94 -0
  56. data/lib/cure/extract/named_range_processor.rb +128 -0
  57. data/lib/cure/extract/variable.rb +25 -0
  58. data/lib/cure/extract/variable_processor.rb +57 -0
  59. data/lib/cure/generator/base_generator.rb +61 -0
  60. data/lib/cure/generator/case_generator.rb +32 -0
  61. data/lib/cure/generator/character_generator.rb +41 -0
  62. data/lib/cure/generator/erb_generator.rb +21 -0
  63. data/lib/cure/generator/eval_generator.rb +34 -0
  64. data/lib/cure/generator/faker_generator.rb +31 -0
  65. data/lib/cure/generator/guid_generator.rb +21 -0
  66. data/lib/cure/generator/hex_generator.rb +21 -0
  67. data/lib/cure/generator/imports.rb +16 -0
  68. data/lib/cure/generator/number_generator.rb +21 -0
  69. data/lib/cure/generator/placeholder_generator.rb +26 -0
  70. data/lib/cure/generator/proc_generator.rb +21 -0
  71. data/lib/cure/generator/redact_generator.rb +22 -0
  72. data/lib/cure/generator/static_generator.rb +21 -0
  73. data/lib/cure/generator/variable_generator.rb +26 -0
  74. data/lib/cure/helpers/file_helpers.rb +50 -0
  75. data/lib/cure/helpers/object_helpers.rb +17 -0
  76. data/lib/cure/helpers/perf_helpers.rb +30 -0
  77. data/lib/cure/helpers/string.rb +54 -0
  78. data/lib/cure/launcher.rb +125 -0
  79. data/lib/cure/log.rb +10 -3
  80. data/lib/cure/planner.rb +136 -0
  81. data/lib/cure/strategy/append_strategy.rb +28 -0
  82. data/lib/cure/strategy/base_strategy.rb +98 -0
  83. data/lib/cure/strategy/contain_strategy.rb +51 -0
  84. data/lib/cure/strategy/end_with_strategy.rb +52 -0
  85. data/lib/cure/strategy/full_strategy.rb +28 -0
  86. data/lib/cure/strategy/history/history_cache.rb +82 -0
  87. data/lib/cure/strategy/imports.rb +12 -0
  88. data/lib/cure/strategy/match_strategy.rb +48 -0
  89. data/lib/cure/strategy/prepend_strategy.rb +28 -0
  90. data/lib/cure/strategy/regex_strategy.rb +55 -0
  91. data/lib/cure/strategy/split_strategy.rb +58 -0
  92. data/lib/cure/strategy/start_with_strategy.rb +53 -0
  93. data/lib/cure/transformation/candidate.rb +47 -36
  94. data/lib/cure/transformation/transform.rb +29 -71
  95. data/lib/cure/validator/base_rule.rb +78 -0
  96. data/lib/cure/validator/candidate.rb +54 -0
  97. data/lib/cure/validator/manager.rb +21 -0
  98. data/lib/cure/validators.rb +71 -0
  99. data/lib/cure/version.rb +1 -1
  100. data/lib/cure.rb +19 -6
  101. data/templates/dsl_example.rb +48 -0
  102. data/templates/empty_template.rb +31 -0
  103. metadata +161 -23
  104. data/lib/cure/csv_helpers.rb +0 -6
  105. data/lib/cure/export/exporter.rb +0 -49
  106. data/lib/cure/file_helpers.rb +0 -38
  107. data/lib/cure/generator/base.rb +0 -148
  108. data/lib/cure/main.rb +0 -63
  109. data/lib/cure/object_helpers.rb +0 -27
  110. data/lib/cure/strategy/base.rb +0 -223
  111. data/templates/aws_cur_template.json +0 -143
  112. data/templates/example_template.json +0 -38
@@ -0,0 +1,83 @@
+ [... go back to build contents](main.md)
+
+ ## Black/White List
+
+ ### What is it?
+
+ These builders operate at sheet level, changing the listed columns in bulk.
+ - Blacklist removes every column **in** the list provided.
+ - Whitelist removes every column **not in** the list provided.
+
+ ### Why would you need it?
+
+ If you have a large spreadsheet and want to control the presence or removal of many columns at once,
+ this is the quickest way to do so.
+
+ ### Full Configuration
+
+ ```ruby
+ candidate(named_range: "mysheet") do
+   blacklist options: { columns: %w[col_a col_b] }
+   whitelist options: { columns: %w[col_c col_d] }
+ end
+ ```
+
+ - `named_range`: specifies the named range holding the columns; if no named range has been set, you can leave it blank.
+ - `options`:
+   - `columns`: mandatory; the column names to filter on.
+
+ ### Example
+
+ #### Blacklist
+
+ ```ruby
+ candidate(named_range: "mysheet") do
+   blacklist options: { columns: %w[col_a col_b] }
+ end
+ ```
+
+ Original input:
+ ```
+ +-------+-------+-------+
+ | col_a | col_b | col_c |
+ +-------+-------+-------+
+ | a     | b     | c     |
+ +-------+-------+-------+
+ ```
+
+ changes to:
+
+ ```
+ +-------+
+ | col_c |
+ +-------+
+ | c     |
+ +-------+
+ ```
+
+ #### Whitelist
+
+ ```ruby
+ candidate(named_range: "mysheet") do
+   whitelist options: { columns: %w[col_a col_b] }
+ end
+ ```
+
+ Original input:
+ ```
+ +-------+-------+-------+
+ | col_a | col_b | col_c |
+ +-------+-------+-------+
+ | a     | b     | c     |
+ +-------+-------+-------+
+ ```
+
+ changes to:
+
+ ```
+ +-------+-------+
+ | col_a | col_b |
+ +-------+-------+
+ | a     | b     |
+ +-------+-------+
+ ```
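The blacklist/whitelist behaviour described above can be sketched in plain Ruby. This is a minimal illustration, assuming a sheet is modelled as an Array of row Hashes keyed by column name. The `blacklist` and `whitelist` helpers are hypothetical and are not Cure's implementation.

```ruby
# Hypothetical sketch, not Cure's implementation: a sheet is modelled as an
# Array of row Hashes keyed by column name.

# Blacklist: drop every column that appears in `columns`.
def blacklist(rows, columns)
  rows.map { |row| row.reject { |key, _| columns.include?(key) } }
end

# Whitelist: keep only the columns that appear in `columns`,
# preserving the sheet's original column order.
def whitelist(rows, columns)
  rows.map { |row| row.select { |key, _| columns.include?(key) } }
end

rows = [{ "col_a" => "a", "col_b" => "b", "col_c" => "c" }]
blacklist(rows, %w[col_a col_b]) # => [{ "col_c" => "c" }]
whitelist(rows, %w[col_a col_b]) # => [{ "col_a" => "a", "col_b" => "b" }]
```

Using `select` rather than `Hash#slice` for the whitelist keeps columns in sheet order, not in the order they are listed in `columns`.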
@@ -0,0 +1,48 @@
+ [... go back to build contents](main.md)
+
+ ## Copy
+
+ ### What is it?
+
+ The copy builder copies an entire column of the spreadsheet into a new column.
+
+ ### Why would you need it?
+
+ Copying a column is useful if you want to transform or manipulate the copy in the transform stage while keeping the original column intact.
+
+ ### Full Configuration
+
+ ```ruby
+ build do
+   candidate(column: "col_b", named_range: "_default") { copy options: { to_column: "col_b_copy" } }
+ end
+ ```
+ - `column`: represents the column name, mandatory.
+ - `named_range`: specifies the named range holding the column; if no named range has been set, you can leave it blank.
+ - `options`:
+   - `to_column`: the name of the new column; if not set, it defaults to `<column>_copy`.
+
+ ### Example
+
+ ```ruby
+ build do
+   candidate(column: "col_a") { copy options: { to_column: "col_a_copy" } }
+ end
+ ```
+
+ Original input:
+ ```
+ +-------+
+ | col_a |
+ +-------+
+ | a     |
+ +-------+
+ ```
+ changes to:
+ ```
+ +-------+------------+
+ | col_a | col_a_copy |
+ +-------+------------+
+ | a     | a          |
+ +-------+------------+
+ ```
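A minimal sketch of the copy behaviour described above, including the `<column>_copy` default. It assumes the same hypothetical row-Hash model as the other sketches. `copy_column` is an illustrative name, not part of Cure.

```ruby
# Hypothetical sketch, not Cure's implementation: copy `column` into a new
# column named `to_column`, falling back to "<column>_copy" as described above.
def copy_column(rows, column, to_column: nil)
  target = to_column || "#{column}_copy"
  # Hash#fetch raises KeyError if the source column does not exist.
  rows.map { |row| row.merge(target => row.fetch(column)) }
end

copy_column([{ "col_a" => "a" }], "col_a")
# => [{ "col_a" => "a", "col_a_copy" => "a" }]
```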
@@ -0,0 +1,70 @@
+ [... go back to build contents](main.md)
+
+ ## Explode
+
+ ### What is it?
+
+ Explode takes a column of JSON strings and breaks each key out into its own column.
+
+ ### Why would you need it?
+
+ Use it if you have a JSON column and wish to treat the value of each key as an individual column. This is popular in technical
+
+ ### Full Configuration
+
+ ```yaml
+ build:
+   candidates:
+     - column: "column_name"
+       named_range: "_default"
+       action:
+         type: "explode"
+         options:
+           filter:
+             type: "whitelist|blacklist"
+             values:
+               - "example"
+ ```
+
+ - `column`: represents the column name, mandatory.
+ - `named_range`: specifies the named range holding the column; if no named range has been set, you can leave it blank.
+ - `action`: represents the action that will be taken on the data.
+   - `type`: specifies the type of action, in this instance `explode`.
+   - `options`:
+     - `filter`: filters the generated columns; set its `type` to either `whitelist` or `blacklist`.
+       - `values`: the names of the columns to filter on.
+
+ ### Example
+
+ ```yaml
+ build:
+   candidates:
+     - column: tags
+       action:
+         type: "explode"
+ ```
+
+ Original input:
+ ```
+ +---------------------------------+
+ | tags                            |
+ +---------------------------------+
+ | {"type":"string","name":"abcd"} |
+ +---------------------------------+
+ | {"tier":"high"}                 |
+ +---------------------------------+
+ ```
+
+ changes to:
+
+ ```
+ +---------------------------------+--------+------+------+
+ | tags                            | type   | name | tier |
+ +---------------------------------+--------+------+------+
+ | {"type":"string","name":"abcd"} | string | abcd |      |
+ | {"tier":"high"}                 |        |      | high |
+ +---------------------------------+--------+------+------+
+ ```
+
+ **Note:** if you want to remove the original column (`tags`), you can do so with the [remove](remove.md) builder.
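The explode step above can be approximated with Ruby's standard `json` library. This is a minimal sketch, assuming each cell holds a JSON object and that missing keys become `nil` (the empty cells in the tables above). The `filter:` hash shape `{ type:, values: }` mirrors the documented options but is an assumption, as is the `explode` helper name. This is not Cure's implementation.

```ruby
require "json"

# Hypothetical sketch, not Cure's implementation. Each cell in `column` holds a
# JSON object; every key found becomes a new column appended to the row.
# Rows that lack a key get nil. `filter` mirrors the whitelist/blacklist option.
def explode(rows, column, filter: nil)
  parsed = rows.map do |row|
    cell = row.fetch(column).to_s
    cell.strip.empty? ? {} : JSON.parse(cell) # blank cells yield no keys
  end

  # New columns appear in the order their keys are first seen.
  keys = parsed.flat_map(&:keys).uniq
  case filter&.fetch(:type)
  when "whitelist" then keys &= filter[:values]
  when "blacklist" then keys -= filter[:values]
  end

  rows.zip(parsed).map do |row, json|
    row.merge(keys.to_h { |key| [key, json[key]] })
  end
end
```

Collecting keys across all rows first ensures every row ends up with the same set of columns, which is what produces the blank cells in the second row of the example.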
@@ -0,0 +1,43 @@
+ Source > Extract > Validate > **Build** > Query > Transform > Export
+
+ Build
+ =======
+
+ ### About
+
+ The build step runs after the **extract** and **validate** steps and operates at column level on the spreadsheet. It
+ provides an interface for manipulations that you may wish to apply to whole columns. Individual build steps are called
+ candidates, and *multiple steps can be performed on a single column* if desired.
+
+ ---
+
+ **When you should use this**: You have a spreadsheet that requires changes to the column structure of the data. This
+ may be as trivial as adding or removing a column, or *exploding* a JSON object (Key => Val) into individual columns.
+
+ Below is an example configuration block:
+
+ ```ruby
+ build do
+   # Black/whitelist: no column needs to be passed to the candidate
+   candidate do
+     blacklist options: { columns: %w[col_a col_b] }
+     whitelist options: { columns: %w[col_c col_d] }
+   end
+
+   # Simple addition of a new column
+   candidate(column: "full_name") { add options: { default_value: "ABC" } }
+   # Simple renaming of an existing column
+   candidate(column: "Tags") { rename options: { new_name: "test" } }
+ end
+ ```
+
+ - `column`: represents the column name; mandatory, except for black/whitelist candidates.
+ - `named_range`: specifies the named range holding the column; if no named range has been set, you can leave it blank.
+
+ ### Components
+
+ The following operations are documented for this step:
+ - [add](add.md)
+ - [remove](remove.md)
+ - [copy](copy.md)
+ - [explode](explode.md)
+ - [black_white_list](black_white_list.md)
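Candidates run one after another, and several can target the same column. The sketch below illustrates this in plain Ruby, modelling each candidate as a lambda over the sheet. It mimics the `add` (`default_value`) and `rename` (`new_name`) candidates from the example block. `add_column`, `rename_column` and `run_build` are hypothetical helpers; this is not how Cure is implemented.

```ruby
# Hypothetical sketch, not Cure's API: each candidate is modelled as a lambda
# that takes the sheet (an Array of row Hashes) and returns a new sheet.

# Adds `column` to every row, filled with `default_value`.
def add_column(column, default_value)
  ->(rows) { rows.map { |row| row.merge(column => default_value) } }
end

# Renames `column` to `new_name`, keeping the column in the same position.
def rename_column(column, new_name)
  lambda do |rows|
    rows.map { |row| row.to_h { |key, value| [key == column ? new_name : key, value] } }
  end
end

# Candidates run in order; each receives the previous candidate's output.
def run_build(rows, candidates)
  candidates.reduce(rows) { |sheet, candidate| candidate.call(sheet) }
end

result = run_build([{ "Tags" => "x" }],
                   [add_column("full_name", "ABC"), rename_column("Tags", "test")])
# result == [{ "test" => "x", "full_name" => "ABC" }]
```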
@@ -0,0 +1,46 @@
+ [... go back to build contents](main.md)
+
+ ## Remove
+
+ ### What is it?
+
+ The remove builder removes an entire column from the spreadsheet.
+
+ ### Why would you need it?
+
+ Removing a column is useful when you do not want it to appear in the output.
+
+ ### Full Configuration
+
+ ```ruby
+ build do
+   candidate(column: "remove_this", named_range: "mysheet") { remove }
+ end
+ ```
+ - `column`: represents the column name, mandatory.
+ - `named_range`: specifies the named range holding the column; if no named range has been set, you can leave it blank.
+
+ ### Example
+
+ ```ruby
+ build do
+   candidate(column: "col_b") { remove }
+ end
+ ```
+
+ Original input:
+ ```
+ +-------+-------+
+ | col_a | col_b |
+ +-------+-------+
+ | a     | b     |
+ +-------+-------+
+ ```
+ changes to:
+ ```
+ +-------+
+ | col_a |
+ +-------+
+ | a     |
+ +-------+
+ ```
@@ -0,0 +1,164 @@
+ ## Examples
+
+ Below are some examples of Cure being used to transform unusual CSV formats or to carry out unusual tasks.
+
+ ### Multi row grouping
+
+ In the example below we want to:
+ 1. Group the rows on `identifier`.
+ 2. Change `gender` to a single letter.
+ 3. Create a `full_name` column that joins `first_name` and `last_name`, and capitalizes them.
+
+ | id | identifier | first_name | last_name | age | gender |
+ |----|------------|------------|-----------|-----|--------|
+ | 1 | 1 | joe | smith | 20 | |
+ | 2 | 1 | | | | male |
+ | 3 | 2 | lean | davis | 32 | |
+ | 4 | 2 | | | | female |
+
+ to
+
+ | id | identifier | first_name | last_name | age | gender | full_name |
+ |----|------------|------------|-----------|-----|--------|------------|
+ | 1 | 1 | joe | smith | 20 | M | Joe Smith |
+ | 3 | 2 | lean | davis | 32 | F | Lean Davis |
+
+ using
+
+ ```ruby
+ build do
+   candidate column: "full_name" do
+     add options: { default_value: "" }
+   end
+ end
+
+ transform do
+   from query: <<-SQL
+     SELECT
+       id as id,
+       identifier as identifier,
+       group_concat(first_name, '') as first_name,
+       group_concat(last_name, '') as last_name,
+       group_concat(gender, '') as gender,
+       group_concat(age, '') as age,
+       full_name FROM _default
+     GROUP BY identifier
+   SQL
+
+   candidate column: "gender" do
+     with_translation { replace("full").with("case",
+       statement: {
+         switch: [
+           {
+             case: "male",
+             return_value: "M"
+           }, {
+             case: "female",
+             return_value: "F"
+           }
+         ],
+         else: [
+           return_value: "<unknown gender>"
+         ]
+       })
+     }
+   end
+
+   candidate column: "full_name" do
+     with_translation { replace("full").with("erb",
+       template: "<%= first_name.capitalize %> <%= last_name.capitalize %>")
+     }
+   end
+ end
+
+ export do
+   terminal title: "Exported", limit_rows: 5
+ end
+ ```
+
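To make the three goals of the multi-row grouping example concrete, here is a plain-Ruby sketch of the same logic on in-memory rows. It assumes every cell is a String and that blank cells are empty strings. It is not how Cure runs the template, which uses the SQL query and translations above, and `group_people` is a hypothetical name. Like `group_concat(col, '')`, it joins the non-empty values in each group. It keeps the first row's `id`, matching the example output; note that SQLite makes no such guarantee for bare columns in a `GROUP BY`.

```ruby
# Hypothetical plain-Ruby sketch of the grouping example; not Cure's engine.
GENDER_CODES = { "male" => "M", "female" => "F" }.freeze

def group_people(rows)
  rows.group_by { |row| row["identifier"] }.map do |_identifier, group|
    # Like group_concat(col, ''): join the non-empty values within the group.
    merged = group.first.keys.to_h do |key|
      [key, group.map { |r| r[key] }.reject { |v| v.to_s.empty? }.join]
    end
    # Keep the first row's id and identifier, as in the example output.
    merged["id"] = group.first["id"]
    merged["identifier"] = group.first["identifier"]
    # Mirror the "case" translation, with the same fallback value.
    merged["gender"] = GENDER_CODES.fetch(merged["gender"], "<unknown gender>")
    merged["full_name"] = "#{merged["first_name"].capitalize} #{merged["last_name"].capitalize}"
    merged
  end
end
```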
+ ### AWS Cost and Usage Report Anonymization
+
+ Below is a small subset of an AWS Cost and Usage Report that holds information we want to transform.
+
+ Some observations:
+ - the **identity/LineItemId** column holds seemingly random strings, some of which repeat (see rows 9 and 10).
+ - **lineItem/ResourceId** has records that contain account numbers; we want these to match **bill/PayerAccountId**
+   and **lineItem/UsageAccountId** so the data stays consistent.
+
+ | id | identity/LineItemId | bill/PayerAccountId | lineItem/UsageAccountId | lineItem/ProductCode | lineItem/ResourceId |
+ |----|------------------------------------------------------|---------------------|-------------------------|----------------------|----------------------------------------------|
+ | 1 | mggj00y7rig8p3xjma6rpzkvtrn98q4a0ortz9ddgquu0pv3xshq | 9876543210 | 9876543210 | AmazonS3 | cloudtrail-9876543210 |
+ | 2 | t8ubihdw6ad39awf1748v98yim4uh6wyjzr59bziwwcfnyu4rxhf | 9876543210 | 9876543210 | AmazonS3 | cloudtrail-9876543210 |
+ | 3 | 8c8u2fcetmrz3f0x52coe4wgjv77ffxx2ivgitg1a1nacpo8menv | 9876543210 | 9876543210 | AmazonCloudFront | arn:aws:cloudfront::9876543210:Overhold |
+ | 4 | 9jqoasom8qnxma5rjqhawkncrhev0ocsp4ax5pngrp8l1yno03v3 | 9876543210 | 9876543210 | AmazonS3 | aws-cloudtrail-logs-9876543210 |
+ | 5 | 35znibzyuoisze9x45377jqkbd7o677w4mhgl8hyte8born5h1h3 | 9876543210 | 9876543210 | AmazonCloudFront | arn:aws:cloudfront::9876543210:Overhold |
+ | 6 | tb8qzhsrqu0z613jervo541l7p95b5pq2k80m7hcsnqjjjs6jnlx | 9876543210 | 9876543210 | awskms | arn:aws:kms:ap-southeast-2:9876543210:Zoolab |
+ | 7 | c0k9bpm5y5m1aoebsrlc2ozdgqoqjkyjy7z0hx7kv4y93gx8ioji | 9876543210 | 9876543210 | AWSLambda | arn:aws:lambda:Trippledex |
+ | 8 | ju8pmo0qqn5c2tapej4toy3c95w08ym6uar9hllyf3r0oj1hoiya | 9876543210 | 9876543210 | AmazonEC2 | vol-3ef2aece632 |
+ | 9 | f5kta3av4k5k2fve6l8g370bj41leqzkazsad28hjnu2xngn8f86 | 9876543210 | 9876543210 | AmazonS3 | cloudtrail-9876543210 |
+ | 10 | f5kta3av4k5k2fve6l8g370bj41leqzkazsad28hjnu2xngn8f86 | 9876543210 | 9876543210 | AmazonS3 | cloudtrail-9876543210 |
+
+ ##### Configuration
+ ```ruby
+ transform do
+   # Operate on the "identity/LineItemId" column
+   candidate column: "identity/LineItemId" do
+     # Replace the full record with a random 52-character string consisting only of
+     # lowercase letters and digits.
+     with_translation { replace("full").with("character", length: 52, types: %w[lowercase number]) }
+   end
+
+   candidate column: "bill/PayerAccountId" do
+     # Replace the full record with the placeholder named :account_number (placeholders are defined at the end of this block)
+     with_translation { replace("full").with("placeholder", name: :account_number) }
+   end
+
+   candidate column: "lineItem/UsageAccountId" do
+     with_translation { replace("full").with("number", length: 10) }
+   end
+
+   candidate column: "lineItem/ResourceId" do
+     # If there is a match (i-[my-group]), replace just the match group with a 10-character hex string
+     with_translation { replace("regex", regex_cg: "^i-(.*)").with("hex", length: 10) }
+     # If there is a match (vol-[my-group]), replace just the match group with a 10-character hex string
+     with_translation { replace("regex", regex_cg: "^vol-(.*)").with("hex", length: 10) }
+     # If the string contains the token ":", replace the element at index 4 (zero-based) with the account_number placeholder.
+     with_translation { replace("split", token: ":", index: 4).with("placeholder", name: :account_number) }
+     # If the string contains the token "-", replace the last element with the account_number placeholder.
+     with_translation { replace("split", token: "-", index: -1).with("placeholder", name: :account_number) }
+     # If the string contains the token ":", replace the last element with a Faker value (Faker::App.name).
+     with_translation { replace("split", token: ":", index: -1).with("faker", module: "App", method: "name") }
+
+     # If no match is found, replace the whole value with a random 10-character hex string prefixed with "hidden_"
+     if_no_match { replace("full").with("hex", prefix: "hidden_", length: 10) }
+   end
+
+   # Hardcoded values that we may wish to reference
+   place_holders({ account_number: 1_234_567_890 })
+ end
+
+ export do
+   # Export a table of at most 10 rows to the terminal.
+   terminal title: "Exported", limit_rows: 10
+ end
+ ```
+
+ With these rules, the table above becomes:
+
+ | id | identity/LineItemId | bill/PayerAccountId | lineItem/UsageAccountId | lineItem/ProductCode | lineItem/ResourceId |
+ |----|------------------------------------------------------|---------------------|-------------------------|----------------------|----------------------------------------------|
+ | 1 | ozsmh5j4oqnfgnv7k82tx1yne4h62rt2rfiilo0clt306ts9ib9g | 1234567890 | 1234567890 | AmazonS3 | cloudtrail-1234567890 |
+ | 2 | soha1946igwsaz8iju4a6q9305yd1cj9gluqwxu6lmjor1wf4yb0 | 1234567890 | 1234567890 | AmazonS3 | cloudtrail-1234567890 |
+ | 3 | k5a29qle33aqoemi74m75pwmhv5xq4sau6e6pyc9pc93g6stzk8s | 1234567890 | 1234567890 | AmazonCloudFront | arn:aws:cloudfront::1234567890:Latlux |
+ | 4 | 9i0pxzj7mgfy2nnjhalxatck9xidqt55vvmopiotv23raaol9wh1 | 1234567890 | 1234567890 | AmazonS3 | aws-cloudtrail-logs-1234567890 |
+ | 5 | uvws7h5xqc8qov8ana6arxyr0urkhpgu9a0g3wzv1emq9z19bl9m | 1234567890 | 1234567890 | AmazonCloudFront | arn:aws:cloudfront::1234567890:Latlux |
+ | 6 | lhv6swfx2ulsfs8mpfrjutgq45kixouh0xjfvfo40g42757r7mje | 1234567890 | 1234567890 | awskms | arn:aws:kms:ap-southeast-2:1234567890:Sonair |
+ | 7 | zm6gwy8c5qxbe24du6oipdls3iyjp83a3000z6p1l26xo44e0swa | 1234567890 | 1234567890 | AWSLambda | arn:aws:lambda:Biodex |
+ | 8 | xcpy7jqbash47ckhyv8bnaqrf1tvsmrqq325vbebu550v7nnhef5 | 1234567890 | 1234567890 | AmazonEC2 | vol-1234567890 |
+ | 9 | o1b4h0yvkw0jkbrhewqr1s0cd9abyqol1r90jtitu7vcr2e6qvcb | 1234567890 | 1234567890 | AmazonS3 | cloudtrail-1234567890 |
+ | 10 | o1b4h0yvkw0jkbrhewqr1s0cd9abyqol1r90jtitu7vcr2e6qvcb | 1234567890 | 1234567890 | AmazonS3 | cloudtrail-1234567890 |
+
+ Note that rows 9 and 10 still share the same **identity/LineItemId**, and **lineItem/ResourceId** now references our new made-up account number.
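The "split" rules in the AWS example split a value on a token, replace one element, and rejoin the parts. The sketch below shows that idea in plain Ruby, together with one possible way to keep replacements consistent (rows 9 and 10 above): a lookup cache keyed by the original value. Whether Cure's history cache works like this is an assumption. `split_replace` and `ReplacementCache` are hypothetical names, not Cure's API.

```ruby
# Hypothetical sketch, not Cure's implementation.
#
# split_replace splits `value` on `token`, replaces the element at `index`
# (zero-based; negative indexes count from the end) with the block's result,
# and joins the parts again. It returns nil when the value does not match,
# so the caller can try the next rule or an if_no_match fallback.
def split_replace(value, token:, index:)
  return nil unless value.include?(token)

  parts = value.split(token, -1) # limit -1 keeps empty fields, e.g. "::"
  return nil if parts[index].nil?

  parts[index] = yield(parts[index])
  parts.join(token)
end

# Remembers each original value and reuses its replacement when the same
# value appears again, so repeated inputs map to identical outputs.
class ReplacementCache
  def initialize(&generator)
    @generator = generator
    @seen = {}
  end

  def lookup(original)
    @seen[original] ||= @generator.call(original)
  end
end

split_replace("arn:aws:cloudfront::9876543210:Overhold", token: ":", index: 4) { "1234567890" }
# => "arn:aws:cloudfront::1234567890:Overhold"
```

Passing `-1` as the split limit matters for ARNs such as `arn:aws:cloudfront::…`. Without it, trailing empty fields are dropped. With it, the empty region field is kept, so index 4 still points at the account number.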
@@ -0,0 +1,37 @@
+ Source > Extract > Validate > Build > Query > Transform > **Export**
+
+ Export
+ =======
+
+ ### About
+
+ Exporting is the final step, where you receive each row produced by the previous steps. You can define multiple
+ exporters, each pointing to a different named range or to the same one.
+
+ A common pattern is to export the first 10 rows to the terminal, and export the full dataset to a CSV.
+
+ ---
+
+ **When you should use this**: You have transformed your data, and you want to save the results.
+
+ Below is an example configuration block:
+
+ ### Example
+
+ ```ruby
+ export do
+   # Export to the terminal window
+   terminal title: "Exported", limit_rows: 5, named_range: "mysheet"
+
+   # Export to a single CSV
+   csv file_name: "mysheet", directory: "/tmp/cure", named_range: "mysheet"
+
+   # Export to multiple CSVs, each with 100 rows.
+   # These will be exported as 1_mysheet.csv, 2_mysheet.csv ... n_mysheet.csv
+   chunk_csv file_name_prefix: "mysheet", directory: "/tmp/cure", chunk_size: 100, named_range: "mysheet"
+
+   # Yield each row to a custom proc. This lets the caller do whatever they want
+   # with the row, for example make an API call to insert the data into a remote system.
+   yield_row named_range: "mysheet", proc: proc { |row| puts row }
+ end
+ ```
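The chunked CSV export above (`1_mysheet.csv`, `2_mysheet.csv`, ...) can be sketched with Ruby's `csv` library and `Enumerable#each_slice`. This is a minimal illustration, assuming rows are Hashes that all share the first row's keys and that every chunk file gets a header row. The `chunk_csv` method here is a hypothetical stand-in, not Cure's exporter.

```ruby
require "csv"
require "fileutils"

# Hypothetical sketch, not Cure's exporter: write rows in slices of
# `chunk_size` to 1_<prefix>.csv, 2_<prefix>.csv, ... and return the paths.
# Assumes every row Hash has the same keys as the first row.
def chunk_csv(rows, file_name_prefix:, directory:, chunk_size:)
  FileUtils.mkdir_p(directory)
  headers = rows.first&.keys || []

  rows.each_slice(chunk_size).with_index(1).map do |chunk, n|
    path = File.join(directory, "#{n}_#{file_name_prefix}.csv")
    CSV.open(path, "w") do |csv|
      csv << headers # each chunk file carries its own header row
      chunk.each { |row| csv << row.values_at(*headers) }
    end
    path
  end
end
```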
@@ -0,0 +1,89 @@
+ Source > **Extract** > Validate > Build > Query > Transform > Export
+
+ Extract
+ =======
+
+ ### About
+
+ The extract step is the first step that is undertaken on the spreadsheet. If the spreadsheet is already in the form you
+ need (headers and rows are in the right place), this step is not necessary.
+
+ There are two main processes available in this section: named ranges and variables.
+
+ **Named ranges** are subsets of your spreadsheet's data. In some situations, spreadsheets may have more than one section
+ of data that you are interested in. Using named ranges and simple A1 notation (e.g. B2:G6), you can select as many ranges
+ as needed, and bring them back together at the end.
+
+ **Variables** are single values extracted into a hash and made available at the transform stage. A common use
+ for this is to extract a value from somewhere in the spreadsheet so that it can be added to each row.
+
+ ---
+
+ **When you should use this**: You have a spreadsheet that has more data than you need, or is not strictly in a
+ tabular format. You may want to extract a part (or multiple parts) of the spreadsheet, and discard the
+ rest.
+
+ Below is an example configuration block:
+
+ ### Example
+
+ ```ruby
+ extract do
+   named_range name: "main", at: "B2:D4", headers: "B2:B4", ref_name: "_default"
+   named_range name: "secondary", at: "A2:D3", ref_name: "_default"
+   named_range name: "full", at: -1, ref_name: "_default"
+
+   variable name: "my_string", at: "E5", ref_name: "_default"
+ end
+ ```
+
+ - `name`: represents what you want to call the named range, mandatory.
+ - `at`: specifies the named range location in the sheet. -1 will collect the entire sheet.
+ - `headers`: specifies the location of the headers. Leave it off unless they are not on the top row.
+ - `ref_name`: specifies the file from which to extract the named range. If you are only processing a single file, you
+   do not need to supply it (the default `ref_name` is "_default").
+
+ If you do not define any named range, a default named range called "_default" is created that covers the entire sheet.
+ Other parts of the template fall back to "_default" when no named range is set, so you do not need to specify it there.
+
+ Original input:
+ ```
+ +----+----+----+----+----+
+ | a1 | b1 | c1 | d1 | e1 |
+ | a2 | b2 | c2 | d2 | e2 |
+ | a3 | b3 | c3 | d3 | e3 |
+ | a4 | b4 | c4 | d4 | e4 |
+ | a5 | b5 | c5 | d5 | e5 |
+ +----+----+----+----+----+
+ ```
+ changes to:
+ ```
+ +--------------+
+ |     main     |
+ +----+----+----+
+ | b2 | c2 | d2 |
+ | b3 | c3 | d3 |
+ | b4 | c4 | d4 |
+ +----+----+----+
+
+ +-------------------+
+ |     secondary     |
+ +----+----+----+----+
+ | a2 | b2 | c2 | d2 |
+ | a3 | b3 | c3 | d3 |
+ +----+----+----+----+
+
+ +------------------------+
+ |          full          |
+ +----+----+----+----+----+
+ | a1 | b1 | c1 | d1 | e1 |
+ | a2 | b2 | c2 | d2 | e2 |
+ | a3 | b3 | c3 | d3 | e3 |
+ | a4 | b4 | c4 | d4 | e4 |
+ | a5 | b5 | c5 | d5 | e5 |
+ +----+----+----+----+----+
+
+ variables
+ - my_string => "e5"
+ ```
+
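The A1-style ranges used by `at:` (such as `"B2:D4"`, a single cell such as `"E5"`, or `-1` for the whole sheet) can be resolved against a grid with a short helper. This is a minimal sketch, assuming the sheet is an Array of row Arrays. The helper names are hypothetical, and Cure's own range parsing may differ.

```ruby
# Hypothetical sketch, not Cure's implementation: resolve an A1-style range
# against a sheet held as an Array of row Arrays. at = -1 returns the whole sheet.

# Column letters to a zero-based index: "A" => 0, "Z" => 25, "AA" => 26.
def column_index(letters)
  letters.upcase.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - "A".ord + 1) } - 1
end

# "C3" => [2, 2] as zero-based [row, column].
def parse_cell(ref)
  match = /\A([A-Za-z]+)(\d+)\z/.match(ref) or raise ArgumentError, "bad cell reference: #{ref}"
  [match[2].to_i - 1, column_index(match[1])]
end

def extract_range(sheet, at)
  return sheet if at == -1

  first, last = at.split(":")
  row1, col1 = parse_cell(first)
  row2, col2 = parse_cell(last || first) # a single cell such as "E5"
  sheet[row1..row2].map { |row| row[col1..col2] }
end

sheet = (1..5).map { |r| %w[a b c d e].map { |c| "#{c}#{r}" } }
extract_range(sheet, "B2:D4") # => [["b2", "c2", "d2"], ["b3", "c3", "d3"], ["b4", "c4", "d4"]]
```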
@@ -0,0 +1,29 @@
+ **Metadata** > Source > Extract > Validate > Build > Query > Transform > Export
+
+ Metadata
+ =======
+
+ ### About
+
+ The metadata step does not affect processing; it lets you document information about the template.
+
+ ---
+
+ **When you should use this**: You want to record information such as the version, author, or date.
+
+ Below is an example configuration block:
+
+ ### Example
+
+ ```ruby
+ metadata do
+   name "My Dataset"
+   version 1
+   comments "A useless comment"
+   additional data: {
+     created_date: "2023-01-01 00:00",
+     author: "william"
+   }
+ end
+ ```
+
@@ -0,0 +1,45 @@
+ Source > Extract > Validate > Build > **Query** > Transform > Export
+
+ Query
+ =======
+
+ ### About
+
+ The query step allows you to customise which data from the earlier steps is passed on to the transform step.
+
+ If this step is not provided, `SELECT * FROM _default` is run. Whatever you put in the SELECT (aliases, etc.) is
+ returned to you for transforming.
+
+ ---
+
+ **When you should use this**: You want to harness the full power of SQL to return a more tailored result.
+
+ Below is an example configuration block:
+
+ ### Example
+
+ ```ruby
+ query do
+   with named_range: "data_log", query: <<-SQL
+     SELECT
+       *
+     FROM
+       data_log
+     WHERE
+       Equipment = 'Raw'
+       AND
+       (Division = 'O' OR Division = 'Open')
+       AND
+       Event = 'SBD'
+       AND
+       ParentFederation = 'IPF'
+       AND
+       Sex = 'F'
+       AND
+       strftime('%Y', Date) > '2014'
+     ORDER BY Date DESC
+   SQL
+ end
+ ```
+