embulk-filter-csv_lookup 0.1.5 → 0.1.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 81903c15c6bf37ce58f84137606978f41381550dc7848d944414d9f4d3d29196
4
- data.tar.gz: b26ed58c532bfcaadc933d2087b441dbbd4f84ae508175c314fbd4118fa5bd12
3
+ metadata.gz: d8f5a38032aa139ae3265d6a4d3fb145686e81f93d9fde6da1af260b3a64c13c
4
+ data.tar.gz: b3dfec4d3c20c44cd532b0ec61bbd7a778fb5584a63d4f585a50aa7a664a206f
5
5
  SHA512:
6
- metadata.gz: 24dc923bf58b3873532edb57f9a346a6bc0c882721cbb4a04babb4345f68ca2e126786ee7cb20f743d7f9f22d1999d37ca260dade4fa08e421edd4d62f8356e2
7
- data.tar.gz: afe81f466984806010a64e1f86343b70438af744d230810b488933be7ef14b897382df5b3c442dc9a94de6ea3e7491ca8b43a0e71bd0cc7d256e196018c0ff27
6
+ metadata.gz: 365cc634d25b4cfa4cebaf441e57d7245f4737744ed77209ecf3a73d73909bb853798b7b3bfdef89325436d3a063308defc8997b1e34d8db8b5dde399f6af452
7
+ data.tar.gz: 6d0e4a9a03f2ab5f40940809b7c587d75f1b641a9849c20a0afd1bc360c2fc1ef8e8e4badbc4ff8349115f2141f1288a5d7c0dfd3ffa49a6e8fe80ac1a26d78f
data/README.md CHANGED
@@ -22,83 +22,98 @@ An Embulk filter plugin for Lookup Transformation with CSV
22
22
  - **Name of column-1**: column name-1 from input file
23
23
  - **Name of column-2**: column name-2 from input file
24
24
  - **new_columns**: (New generated column names) (required)
25
- - **Name-1,Type-1**: Any Name, Type of the name (name: country_name, type: string)
26
- - **Name-2,Type-2**: Any Name, Type of the name (name: country_address, type: string) etc ...
25
+ - **Name-1,Type-1**: any name and its type, e.g. { name: car_id, type: long }
26
+ - **Name-2,Type-2**: any name and its type, e.g. { name: category, type: string }
27
+ - **Name-3,Type-3**: any name and its type, e.g. { name: fuel_capacity, type: string }
27
28
  - **path_of_lookup_file**: lookup file path (required)
28
29
  ## Example - columns
29
30
 
30
- Input1 for table 1 is as follows :-
31
+ customer.csv (table 1) is as follows:
31
32
 
32
33
  ```
33
- year country_code country_name literacy_rate
34
-
35
- 1990 1 India 80%
36
- 1993 2 USA 83%
37
- 1997 3 JAPAN
38
- 1999 4 China 72%
39
- 2000 5 Ukraine 68%
40
- 2002 6 Italy 79%
41
- 2004 7 UK 75%
42
- 2011 8 NULL 42%
34
+ id customer_name address email car_name company
35
+ 1 John Doe 123 Main St, Anytown USA john.doe@example.com Civic Honda
36
+ 2 Jane Smith 456 Elm St, Anytown USA jane.smith@example.com E-Class Mercedes-Benz
37
+ 3 Bob Johnson 789 Oak St, Anytown USA bob.johnson@example.com GLE-Class Mercedes-Benz
38
+ 4 Amanda Hernandez 999 Cedar St, Anytown USA amanda.hernandez@example.com 911 119
39
+ 5 Tom Brown 567 Pine St, Anytown USA tom.brown@example.com C-Class Mercedes-Benz
40
+ 6 Samantha Davis 890 Cedar St, Anytown USA samantha.davis@example.com Civic Honda
41
+ 7 Mike Wilson 1234 Spruce St, Anytown USA mike.wilson@example.com GLE-Class Mercedes-Benz
42
+ 8 Jason Brown 888 Pine St, Anytown USA jason.brown@example.com 911 Porsche
43
+ 9 David Rodriguez 9010 Oak St, Anytown USA david.rodriguez@example.com GLC-Class Mercedes-Benz
44
+ 10 Mark Davis 666 Spruce St, Anytown USA mark.davis@example.com C-Class Mercedes-Benz
45
+ 11 Chris Thompson 222 Cedar St, Anytown USA chris.thompson@example.com Cayenne Porsche
46
+ 12 Linda Young 555 Birch St, Anytown USA linda.young@example.com RAV4
47
+ 13 Kevin Hernandez 444 Maple St, Anytown USA kevin.hernandez@example.com 911 119
43
48
  ```
44
49
 
45
- Input2 for table 2 is as follows :-
50
+ car.csv (table 2, the lookup file) is as follows:
46
51
 
47
52
  ```
48
- id country_population country_address country_GDP
49
-
50
- 1 11.3 India 1.67
51
- 2 18.2 USA 16.72
52
- 3 30 JAPAN 5.00
53
- 4 4 China 9.33
54
- 5 57 Ukraine 1.08
55
- 6 63 Italy 2.068
56
- 7 17 UK 2.49
57
- 8 28 UAE 1.18
53
+ car_id model brand category fuel_capacity
54
+ 87 GLE-Class Mercedes-Benz SUV 80
55
+ 101 Cayenne Porsche SUV 75
56
+ 119 911 Porsche Sports Car 64
57
+ 205 Accord Honda Sedan 56
58
+ 334 Pilot Honda SUV 70
59
+ 434 CR-V Honda SUV 64
60
+ 559 C-Class Mercedes-Benz Sedan 66
61
+ 603 Civic Honda Sedan 42
62
+ 697 E-Class Mercedes-Benz Sedan 72
63
+ 812 GLC-Class Mercedes-Benz SUV 68
58
64
 
59
-
60
- Note: country_population is calculated in Billion and country_GDP is calculated in $USD Trillion
61
65
  ```
62
66
 
63
67
  As shown in the YAML below, each column listed in mapping_from is matched against the column at the same position in mapping_to,
64
68
  i.e.:
65
-
66
-
67
- country_code : id
68
- country_name : country_address
69
+ car_name : model
70
+ company : brand
69
71
 
70
72
  After a successful mapping, an output file is generated containing the input columns followed by the columns listed in new_columns.
71
73
 
72
74
  Generated output file:
73
75
 
74
76
  ```
75
- year country_code country_name literacy_rate country_GDP country_population
76
-
77
- 1990 1 India 80% 1.67 11.3
78
- 1993 2 USA 83% 16.72 18.2
79
- 1997 3 JAPAN 5.00 30
80
- 1999 4 China 72% 9.33 4
81
- 2000 5 Ukraine 68% 1.08 57
82
- 2002 6 Italy 79% 2.068 63
83
- 2004 7 UK 75% 2.49 17
84
- 2011 8 NULL 42%
77
+ id customer_name address email car_name company car_id category fuel_capacity
78
+ 1 John Doe 123 Main St, Anytown USA john.doe@example.com Civic Honda 603 Sedan 42
79
+ 2 Jane Smith 456 Elm St, Anytown USA jane.smith@example.com E-Class Mercedes-Benz 697 Sedan 72
80
+ 3 Bob Johnson 789 Oak St, Anytown USA bob.johnson@example.com GLE-Class Mercedes-Benz 87 SUV 80
81
+ 4 Amanda Hernandez 999 Cedar St, Anytown USA amanda.hernandez@example.com 911 119 0
82
+ 5 Tom Brown 567 Pine St, Anytown USA tom.brown@example.com C-Class Mercedes-Benz 559 Sedan 66
83
+ 6 Samantha Davis 890 Cedar St, Anytown USA samantha.davis@example.com Civic Honda 603 Sedan 42
84
+ 7 Mike Wilson 1234 Spruce St, Anytown USA mike.wilson@example.com GLE-Class Mercedes-Benz 87 SUV 80
85
+ 8 Jason Brown 888 Pine St, Anytown USA jason.brown@example.com 911 Porsche 119 Sports Car 64
86
+ 9 David Rodriguez 9010 Oak St, Anytown USA david.rodriguez@example.com GLC-Class Mercedes-Benz 812 SUV 68
87
+ 10 Mark Davis 666 Spruce St, Anytown USA mark.davis@example.com C-Class Mercedes-Benz 559 Sedan 66
88
+ 11 Chris Thompson 222 Cedar St, Anytown USA chris.thompson@example.com Cayenne Porsche 101 SUV 75
89
+ 12 Linda Young 555 Birch St, Anytown USA linda.young@example.com RAV4 \N 0
90
+ 13 Kevin Hernandez 444 Maple St, Anytown USA kevin.hernandez@example.com 911 119 0
85
91
  ```
86
92
 
87
93
  ```yaml
88
- - type: csv_lookup
89
- mapping_from:
90
- - country_code
91
- - country_name
92
- mapping_to:
93
- - id
94
- - country_address
95
- new_columns:
96
- - { name: country_GDP, type: string }
97
- - { name: country_population, type: string }
94
+ filters:
95
+   - type: csv_lookup
96
+     mapping_from:
97
+       - car_name
98
+       - company
99
+     mapping_to:
100
+       - model
101
+       - brand
102
+     new_columns:
103
+       - { name: car_id, type: long }
104
+       - { name: category, type: string }
105
+       - { name: fuel_capacity, type: string }
106
+     path_of_lookup_file: "..path../car.csv"
98
107
  ```
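The matching described above can be pictured as a hash join: index the lookup file by its mapping_to columns, build the same composite key from each input row's mapping_from columns, and append the new_columns values of the matching lookup row. The Java below is a minimal sketch of that idea under stated assumptions, not the plugin's implementation: rows are modelled as column-name to text maps, and the `\u0001` key separator and the helper names (`row`, `compositeKey`, `buildIndex`, `enrich`) are hypothetical. Unmatched rows get `null` values here, whereas the sample output above shows `0` for `car_id` on unmatched rows.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Minimal lookup-join sketch; all names and the key format are hypothetical. */
public class LookupJoinSketch {

    // Hypothetical helper: builds an ordered row from alternating name/value pairs.
    public static Map<String, String> row(String... nameValuePairs) {
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i + 1 < nameValuePairs.length; i += 2) {
            row.put(nameValuePairs[i], nameValuePairs[i + 1]);
        }
        return row;
    }

    // Joins the values of keyColumns, in list order, into one key string.
    // The separator (U+0001) is an assumption chosen to avoid accidental collisions.
    public static String compositeKey(Map<String, String> row, List<String> keyColumns) {
        StringJoiner key = new StringJoiner("\u0001");
        for (String column : keyColumns) {
            key.add(String.valueOf(row.get(column)));
        }
        return key.toString();
    }

    // Indexes the lookup rows (car.csv) by their mapping_to columns; a later duplicate key wins.
    public static Map<String, Map<String, String>> buildIndex(
            List<Map<String, String>> lookupRows, List<String> mappingTo) {
        Map<String, Map<String, String>> index = new HashMap<>();
        for (Map<String, String> lookupRow : lookupRows) {
            index.put(compositeKey(lookupRow, mappingTo), lookupRow);
        }
        return index;
    }

    // Copies the input row and appends new_columns; values stay null when nothing matches.
    public static Map<String, String> enrich(Map<String, String> inputRow, List<String> mappingFrom,
            Map<String, Map<String, String>> index, List<String> newColumns) {
        Map<String, String> out = new LinkedHashMap<>(inputRow);
        Map<String, String> match = index.get(compositeKey(inputRow, mappingFrom));
        for (String column : newColumns) {
            out.put(column, match == null ? null : match.get(column));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> cars = Arrays.asList(
                row("car_id", "603", "model", "Civic", "brand", "Honda",
                    "category", "Sedan", "fuel_capacity", "42"));
        Map<String, Map<String, String>> index = buildIndex(cars, Arrays.asList("model", "brand"));

        Map<String, String> customer = row("id", "1", "car_name", "Civic", "company", "Honda");
        System.out.println(enrich(customer, Arrays.asList("car_name", "company"),
                index, Arrays.asList("car_id", "category", "fuel_capacity")));
        // prints {id=1, car_name=Civic, company=Honda, car_id=603, category=Sedan, fuel_capacity=42}
    }
}
```

Because both keys are built by walking the column lists in order, listing mapping_from and mapping_to in different orders would make every key miss. If car.csv contained duplicate (model, brand) pairs, this sketch keeps the last one; how the plugin treats duplicates is not stated here.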
99
108
 
100
109
  Notes:
101
- 1. mapping_from attribute should be in same order as mentioned in input file.
110
+ 1. The columns in mapping_from should be listed in the same order as they appear in the input file.
111
+ 2. With a JDBC input plugin, if an integer column is returned as float/decimal, cast it to long using column_options, for example:
112
+    ```
113
+    column_options:
114
+      id: {value_type: long}
115
+    ```
116
+ 3. The matching columns must be of type int, long, or string.
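Note 2 matters because a key is only found when both sides render the same value. The Java sketch below assumes keys are compared as text (the diff does not show how the plugin renders key values) and shows a double `603.0` failing to match the CSV text `603`. `normalizedKey` is a hypothetical workaround for illustration; the `column_options` cast in note 2 remains the documented fix.

```java
import java.math.BigDecimal;

/** Sketch of the float-vs-long key mismatch; assumes text comparison of keys. */
public class KeyTypeSketch {

    // Naive rendering: a double 603.0 becomes "603.0".
    public static String asKey(Object value) {
        return String.valueOf(value);
    }

    // Hypothetical normalization: integral floating-point values lose the ".0".
    public static String normalizedKey(Object value) {
        if (value instanceof Double || value instanceof Float) {
            double d = ((Number) value).doubleValue();
            if (Double.isNaN(d) || Double.isInfinite(d)) {
                return String.valueOf(value); // BigDecimal cannot represent these
            }
        }
        if (value instanceof Double || value instanceof Float || value instanceof BigDecimal) {
            BigDecimal decimal = new BigDecimal(value.toString()).stripTrailingZeros();
            return decimal.scale() <= 0
                    ? decimal.toBigInteger().toString()
                    : decimal.toPlainString();
        }
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        String fromLookupCsv = "603";  // car_id as text in car.csv
        Object fromJdbc = 603.0;       // same value returned as a double
        System.out.println(asKey(fromJdbc).equals(fromLookupCsv));         // false
        System.out.println(normalizedKey(fromJdbc).equals(fromLookupCsv)); // true
        System.out.println(normalizedKey(603L).equals(fromLookupCsv));     // true
    }
}
```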
102
117
 
103
118
  ## Development
104
119
 
data/build.gradle CHANGED
@@ -13,7 +13,7 @@ configurations {
13
13
  provided
14
14
  }
15
15
 
16
- version = "0.1.5"
16
+ version = "0.1.6"
17
17
 
18
18
  sourceCompatibility = 1.8
19
19
  targetCompatibility = 1.8
@@ -0,0 +1,47 @@
1
+ exec:
2
+ max_threads: 1
3
+ min_output_tasks: 1
4
+ in:
5
+ type: file
6
+ path_prefix: "..path../customer.csv"
7
+ parser:
8
+ charset: UTF-8
9
+ type: csv
10
+ delimiter: ','
11
+ quote: '"'
12
+ header_line: true
13
+ columns:
14
+ - { name: id, type: long }
15
+ - { name: customer_name, type: string }
16
+ - { name: address, type: string }
17
+ - { name: email, type: string }
18
+ - { name: car_name, type: string }
19
+ - { name: company, type: string }
20
+ filters:
21
+ - type: csv_lookup
22
+ mapping_from:
23
+ - car_name
24
+ - company
25
+ mapping_to:
26
+ - model
27
+ - brand
28
+ new_columns:
29
+ - { name: car_id, type: long }
30
+ - { name: category, type: string }
31
+ - { name: fuel_capacity, type: string }
32
+ path_of_lookup_file: "..path../car.csv"
33
+ out:
34
+ type: file
35
+ path_prefix: "..path../output.csv"
36
+ file_ext: csv
37
+ formatter:
38
+ type: csv
39
+ delimiter: ","
40
+ newline: CRLF
41
+ newline_in_field: LF
42
+ charset: UTF-8
43
+ quote_policy: MINIMAL
44
+ quote: '"'
45
+ escape: "\\"
46
+ null_string: "\\N"
47
+ default_timezone: 'UTC'
@@ -0,0 +1,33 @@
1
+ exec:
2
+ max_threads: 1
3
+ min_output_tasks: 1
4
+ in:
5
+ type: sqlserver
6
+ host: Localhost
7
+ driver_path: "..path../mssql-jdbc-10.2.0.jre17.jar"
8
+ user: "user"
9
+ password: "password"
10
+ database: "test"
11
+ table: customer
12
+ filters:
13
+ - type: csv_lookup
14
+ mapping_from:
15
+ - car_name
16
+ - company
17
+ mapping_to:
18
+ - model
19
+ - brand
20
+ new_columns:
21
+ - { name: car_id, type: long }
22
+ - { name: category, type: string }
23
+ - { name: fuel_capacity, type: string }
24
+ path_of_lookup_file: "..path../car.csv"
25
+ out:
26
+ type: sqlserver
27
+ host: Localhost
28
+ driver_path: "..path../mssql-jdbc-10.2.0.jre17.jar"
29
+ user: "user"
30
+ password: "password"
31
+ database: "test"
32
+ table: output_table
33
+ mode: truncate_insert
@@ -0,0 +1,31 @@
1
+ exec:
2
+ max_threads: 1
3
+ min_output_tasks: 1
4
+ in:
5
+ type: mysql
6
+ host: localhost
7
+ user: root
8
+ password: 'password'
9
+ database: test
10
+ table: customer
11
+ filters:
12
+ - type: csv_lookup
13
+ mapping_from:
14
+ - car_name
15
+ - company
16
+ mapping_to:
17
+ - model
18
+ - brand
19
+ new_columns:
20
+ - { name: car_id, type: long }
21
+ - { name: category, type: string }
22
+ - { name: fuel_capacity, type: string }
23
+ path_of_lookup_file: "..path../car.csv"
24
+ out:
25
+ type: mysql
26
+ host: localhost
27
+ user: root
28
+ password: 'password'
29
+ database: test
30
+ table: output_table
31
+ mode: truncate_insert
@@ -0,0 +1,36 @@
1
+ exec:
2
+ max_threads: 1
3
+ min_output_tasks: 1
4
+ in:
5
+ type: jdbc
6
+ host: localhost
7
+ driver_path: "...path../ojdbc8.jar"
8
+ driver_class: 'oracle.jdbc.driver.OracleDriver'
9
+ url: jdbc:oracle:thin:@localhost:1521:orcl
10
+ user: MYUSER
11
+ password: ABCD
12
+ database: DEMO
13
+ table: customer
14
+ filters:
15
+ - type: csv_lookup
16
+ mapping_from:
17
+ - car_name
18
+ - company
19
+ mapping_to:
20
+ - model
21
+ - brand
22
+ new_columns:
23
+ - { name: car_id, type: long }
24
+ - { name: category, type: string }
25
+ - { name: fuel_capacity, type: string }
26
+ path_of_lookup_file: "..path../car.csv"
27
+ out:
28
+ type: jdbc
29
+ host: localhost
30
+ driver_path: "..path../ojdbc8.jar"
31
+ driver_class: 'oracle.jdbc.driver.OracleDriver'
32
+ url: jdbc:oracle:thin:@localhost:1521:orcl
33
+ user: MYUSER
34
+ password: ABCD
35
+ database: DEMO
36
+ table: output_table
@@ -0,0 +1,35 @@
1
+ exec:
2
+ max_threads: 1
3
+ min_output_tasks: 1
4
+ in:
5
+ type: postgresql
6
+ host: localhost
7
+ port: 5432
8
+ user: postgres
9
+ password: 1234
10
+ schema: public
11
+ database: test
12
+ table: customer
13
+ filters:
14
+ - type: csv_lookup
15
+ mapping_from:
16
+ - car_name
17
+ - company
18
+ mapping_to:
19
+ - model
20
+ - brand
21
+ new_columns:
22
+ - { name: car_id, type: long }
23
+ - { name: category, type: string }
24
+ - { name: fuel_capacity, type: string }
25
+ path_of_lookup_file: "..path../car.csv"
26
+ out:
27
+ type: postgresql
28
+ host: localhost
29
+ port: 5432
30
+ database: test
31
+ user: postgres
32
+ password: 1234
33
+ schema: public
34
+ table: output_table
35
+ mode: truncate_insert
@@ -14,6 +14,8 @@ import org.embulk.config.TaskSource;
14
14
  import org.embulk.spi.*;
15
15
  import org.embulk.spi.time.Timestamp;
16
16
  import org.embulk.spi.type.Types;
17
+ import org.slf4j.Logger;
18
+ import org.slf4j.LoggerFactory;
17
19
 
18
20
  import java.io.BufferedReader;
19
21
  import java.io.FileReader;
@@ -25,6 +27,8 @@ import java.util.*;
25
27
  public class CsvLookupFilterPlugin
26
28
  implements FilterPlugin
27
29
  {
30
+ private static final Logger logger = LoggerFactory.getLogger(CsvLookupFilterPlugin.class);
31
+
28
32
  public interface PluginTask
29
33
  extends Task
30
34
  {
@@ -178,6 +182,8 @@ public class CsvLookupFilterPlugin
178
182
  for (ColumnConfig columnConfig : task.getNewColumns().getColumns()) {
179
183
  columnConfigList.add(columnConfig);
180
184
  }
185
+ Set<String> unmatchedData = new LinkedHashSet<>();
186
+ List<String> keyColumns = task.getMappingFrom();
181
187
 
182
188
  while (reader.nextRecord()) {
183
189
 
@@ -211,6 +217,8 @@ public class CsvLookupFilterPlugin
211
217
  List<String> matchedData = new ArrayList<>();
212
218
  if (keyValuePair.containsKey(key)) {
213
219
  matchedData = keyValuePair.get(key);
220
+ }else{
221
+ unmatchedData.add(key);
214
222
  }
215
223
 
216
224
  if (matchedData.size() == 0) {
@@ -226,6 +234,19 @@ public class CsvLookupFilterPlugin
226
234
  }
227
235
  builder.addRecord();
228
236
  }
237
+ String info="\n--------------------Unmatched rows.....................\nMapping Key Columns: ";
238
+ for(int i=0;i<keyColumns.size();i++){
239
+ info+= keyColumns.get(i);
240
+ if(i!=keyColumns.size()-1){
241
+ info+=",";
242
+ }
243
+ }
244
+ info+="\n";
245
+
246
+ for(String key: unmatchedData){
247
+ info+= key+"\n";
248
+ }
249
+ logger.info(info);
229
250
 
230
251
  }
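Release 0.1.6 collects unmatched keys in a `LinkedHashSet` (so each key is reported once, in first-seen order) and logs them after the loop, assembling the message with `+=` inside a loop. A sketch of the same report built with `String.join` and a `StringBuilder` follows. It uses only the JDK and returns the text instead of calling the SLF4J `logger`; the class and method names and the sample key strings are hypothetical, since the diff does not show how keys are formatted.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch of the 0.1.6 "unmatched rows" report without repeated String concatenation. */
public class UnmatchedReportSketch {

    // Builds the same text layout as the diff: header, key column names, one key per line.
    public static String buildReport(List<String> keyColumns, Set<String> unmatchedKeys) {
        StringBuilder info = new StringBuilder();
        info.append("\n--------------------Unmatched rows.....................\n")
            .append("Mapping Key Columns: ")
            .append(String.join(",", keyColumns))
            .append('\n');
        for (String key : unmatchedKeys) {
            info.append(key).append('\n');
        }
        return info.toString();
    }

    public static void main(String[] args) {
        // LinkedHashSet keeps first-seen order and drops duplicates, like unmatchedData.
        Set<String> unmatched = new LinkedHashSet<>();
        unmatched.add("911,119");   // hypothetical key format
        unmatched.add("RAV4,null");
        unmatched.add("911,119");   // duplicate: ignored
        System.out.print(buildReport(Arrays.asList("car_name", "company"), unmatched));
    }
}
```

A possible refinement, not part of this release, is to call `logger.info` only when the set is non-empty, so fully matched runs do not log an empty report.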
231
252
 
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: embulk-filter-csv_lookup
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.5
4
+ version: 0.1.6
5
5
  platform: ruby
6
6
  authors:
7
7
  - Infoobjects Inc.
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2023-02-13 00:00:00.000000000 Z
11
+ date: 2023-03-03 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -55,11 +55,15 @@ files:
55
55
  - classpath/commons-lang3-3.12.0.jar
56
56
  - classpath/commons-logging-1.2.jar
57
57
  - classpath/commons-text-1.9.jar
58
- - classpath/embulk-filter-csv_lookup-0.1.5.jar
58
+ - classpath/embulk-filter-csv_lookup-0.1.6.jar
59
59
  - classpath/opencsv-5.6.jar
60
60
  - config/checkstyle/checkstyle.xml
61
61
  - config/checkstyle/default.xml
62
- - example/config.yml
62
+ - example/csv-filter-lookup_testing_csv.yml.liquid
63
+ - example/csv-filter-lookup_testing_mssql.yml.liquid
64
+ - example/csv-filter-lookup_testing_mysql.yml.liquid
65
+ - example/csv-filter-lookup_testing_oracle.yml.liquid
66
+ - example/csv-filter-lookup_testing_postgres.yml.liquid
63
67
  - gradle/wrapper/gradle-wrapper.jar
64
68
  - gradle/wrapper/gradle-wrapper.properties
65
69
  - gradlew
data/example/config.yml DELETED
@@ -1,49 +0,0 @@
1
- exec:
2
- max_threads: 1
3
- min_output_tasks: 1
4
- in:
5
- type: file
6
- path_prefix: /home/infoobjects/Downloads/sample/calendarFloat.csv
7
- parser:
8
- type: csv
9
- columns:
10
- - {name: dim_calendar_key, type: long}
11
- - {name: year_number, type: long}
12
- - {name: quarter_number, type: long }
13
- - {name: attr_1, type: string }
14
- filters:
15
- - type: csv_lookup
16
- path_of_lookup_file: /home/infoobjects/GetFiles/countryKey_countryName.csv
17
- new_columns:
18
- - { name: country_address, type: string }
19
- mapping_from:
20
- - quarter_number
21
- - attr_1
22
- mapping_to:
23
- - id
24
- - country_code
25
- - type: csv_lookup
26
- path_of_lookup_file: /home/infoobjects/GetFiles/countryKey_countryName.csv
27
- new_columns:
28
- - { name: country_code,type: double }
29
- mapping_from:
30
- - quarter_number
31
- - attr_1
32
- mapping_to:
33
- - id
34
- - country_code
35
- out:
36
- type: file
37
- path_prefix: /home/infoobjects/GetFiles/output.csv
38
- file_ext: csv
39
- formatter:
40
- type: csv
41
- delimiter: "\t"
42
- newline: CRLF
43
- newline_in_field: LF
44
- charset: UTF-8
45
- quote_policy: MINIMAL
46
- quote: '"'
47
- escape: "\\"
48
- null_string: "\\N"
49
- default_timezone: 'UTC'