embulk-filter-timestamp_format 0.1.4 → 0.1.5

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: c06f4d7efebebe0e2abb454b7f6545e6ea843026
-   data.tar.gz: d98aceb871fe643ed6afe46501758d15884ee124
+   metadata.gz: 7d7569b8adc1db79b292e271214f852fb080151b
+   data.tar.gz: df0c01a5893dc4a4bbb1f1228e3d72b031e59f93
  SHA512:
-   metadata.gz: d8aae59c031cd582e3df88466417822e1f952ec9573d1df8fe189109496e909c9ca71d6e448d544ac8ffa64611ec2689651fccb7caa4e5de8559ff3c30bc6213
-   data.tar.gz: 938432021acdece8554c884e033f5e4f2bcac15d2c0407fff3134f8757f84f52201e82f14ac46a4d954c900382d3d174e79ee04b94dc76848867a1b4ae4afe9e
+   metadata.gz: d81f4f2df4775444b5608432a2451158453f768cf6687188afae1169ea5eb15c699141d670987e73139c14c5ae8bbeb2122fbfcb6f73c89f5e98a425db8f2519
+   data.tar.gz: 5ecdc2f30763b7768fd1e9176c2c6b01fdafbd214f9191885e79de7f333303a9a7b86054a7c10acbf87ed458db5cab5d3b2c4871115198b829f65b8c36d855cc
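The `checksums.yaml` entries record SHA1 and SHA512 digests of the two members of the `.gem` archive, `metadata.gz` and `data.tar.gz`. As a sketch of how one might check a downloaded gem against them, the Java below hashes a byte array with the JDK's `MessageDigest`. The class and method names are illustrative, and extracting the members from the `.gem` tar archive is left out; the input here is only the standard `"abc"` test vector.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ChecksumSketch {
    // Returns the lowercase hex digest of `data`. The JDK algorithm names
    // "SHA-1" and "SHA-512" correspond to the SHA1/SHA512 keys in checksums.yaml.
    static String hexDigest(String algorithm, byte[] data) {
        try {
            byte[] hash = MessageDigest.getInstance(algorithm).digest(data);
            StringBuilder sb = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        }
        catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("unknown digest algorithm: " + algorithm, e);
        }
    }

    // True when the recomputed digest equals the value recorded in checksums.yaml.
    static boolean matches(String algorithm, byte[] data, String expectedHex) {
        return hexDigest(algorithm, data).equalsIgnoreCase(expectedHex);
    }

    public static void main(String[] args) {
        // Placeholder input: in practice these would be the bytes of metadata.gz
        // or data.tar.gz taken out of the .gem archive.
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(hexDigest("SHA-1", data)); // a9993e364706816aba3e25717850c26c9cd0d89d
    }
}
```

A mismatch between a recomputed digest and the listed value would indicate a corrupted or altered archive member.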
data/CHANGELOG.md CHANGED
@@ -1,3 +1,10 @@
+ # 0.1.5 (2016-04-29)
+
+ Enhancements:
+
+ * Support to cast from/into timestamp
+ * Support to cast into long/double (unixtimesatmp)
+
  # 0.1.4 (2016-04-26)

  Enhancements:
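The new long/double targets express a timestamp as a Unix timestamp. The `TimestampCast` code that performs this conversion is not part of this diff, so the sketch below assumes seconds since 1970-01-01T00:00:00Z (whole seconds for `long`, seconds with a fractional part for `double`) and uses `java.time.Instant` as a stand-in for Embulk's `Timestamp`.

```java
import java.time.Instant;

final class UnixTimestampSketch {
    // Assumption: "long (unixtimestamp)" means whole seconds since the epoch.
    static long asLong(Instant t) {
        return t.getEpochSecond();
    }

    // Assumption: "double (unixtimestamp)" keeps the sub-second part as a fraction.
    static double asDouble(Instant t) {
        return t.getEpochSecond() + t.getNano() / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2015-07-12T15:00:00.1Z");
        System.out.println(asLong(t)); // 1436713200
        System.out.println(asDouble(t));
    }
}
```

The `double` form keeps sub-second precision at the cost of floating-point rounding, while the `long` form truncates it.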
data/README.md CHANGED
@@ -2,20 +2,21 @@

  [![Build Status](https://secure.travis-ci.org/sonots/embulk-filter-timestamp_format.png?branch=master)](http://travis-ci.org/sonots/embulk-filter-timestamp_format)

- A filter plugin for Embulk to change timesatmp format
+ A filter plugin for Embulk to change timestamp format

  ## Configuration

  - **columns**: columns to retain (array of hash)
-   - **name**: name of column, must be a string column (required)
-   - **from_format**: specify the format of the input timestamp (array of strings, default is default_from_format)
-   - **from_timezone**: specify the timezone of the input timestamp (string, default is default_from_timezone)
-   - **to_format**: specify the format of the output timestamp (string, default is default_to_format)
-   - **to_timezone**: specify the timezone of the output timestamp (string, default is default_to_timezone)
- - **default_from_format**: default timestamp format for the input timestamp columns (array of strings, default is `["%Y-%m-%d %H:%M:%S.%N %z"]`)
- - **default_from_timezone**: default timezone for the input timestamp columns (string, default is `UTC`)
- - **default_to_format**: default timestamp format for the output timestamp columns (string, default is `%Y-%m-%d %H:%M:%S.%N %z`)
- - **default_to_timezone**: default timezone for the output timestamp olumns (string, default is `UTC`)
+   - **name**: name of column (required)
+   - **type**: type to cast (string, timestamp, long (unixtimestamp), double (unixtimestamp), default is string)
+   - **from_format**: specify the format of the input string (array of strings, default is default_from_timestamp_format)
+   - **from_timezone**: specify the timezone of the input string (string, default is default_from_timezone)
+   - **to_format**: specify the format of the output string (string, default is default_to_timestamp_format)
+   - **to_timezone**: specify the timezone of the output string (string, default is default_to_timezone)
+ - **default_from_timestamp_format**: default timestamp format for the input string (array of strings, default is `["%Y-%m-%d %H:%M:%S.%N %z"]`)
+ - **default_from_timezone**: default timezone for the input string (string, default is `UTC`)
+ - **default_to_timestamp_format**: default timestamp format for the output string (string, default is `%Y-%m-%d %H:%M:%S.%N %z`)
+ - **default_to_timezone**: default timezone for the output string (string, default is `UTC`)
  * **stop_on_invalid_record**: stop bulk load transaction if a invalid record is found (boolean, default is `false)

  ## Example
@@ -23,8 +24,8 @@ A filter plugin for Embulk to change timesatmp format
  Say example.jsonl is as follows (this is a typical format which Exporting BigQuery table outputs):

  ```
- {"timestamp":"2015-07-12 15:00:00 UTC","record":{"timestamp":"2015-07-12 15:00:00 UTC"}}
- {"timestamp":"2015-07-12 15:00:00.1 UTC","record":{"timestamp":"2015-07-12 15:00:00.1 UTC"}}
+ {"timestamp":"2015-07-12 15:00:00 UTC","nested":{"timestamp":"2015-07-12 15:00:00 UTC"}}
+ {"timestamp":"2015-07-12 15:00:00.1 UTC","nested":{"timestamp":"2015-07-12 15:00:00.1 UTC"}}
  ```

  ```yaml
@@ -35,27 +36,28 @@ in:
      type: jsonl
      columns:
      - {name: timestamp, type: string}
-     - {name: record, type: json}
+     - {name: nested, type: json}
  filters:
    - type: timestamp_format
      default_to_timezone: "Asia/Tokyo"
-     default_to_format: "%Y-%m-%d %H:%M:%S.%N"
+     default_to_timestamp_format: "%Y-%m-%d %H:%M:%S.%N"
      columns:
        - {name: timestamp, from_format: ["%Y-%m-%d %H:%M:%S.%N %z", "%Y-%m-%d %H:%M:%S %z"]}
-       - {name: record.timestamp, from_format: ["%Y-%m-%d %H:%M:%S.%N %z", "%Y-%m-%d %H:%M:%S %z"]}
+       - {name: $.nested.timestamp, from_format: ["%Y-%m-%d %H:%M:%S.%N %z", "%Y-%m-%d %H:%M:%S %z"]}
    type: stdout
  ```

  Output will be as:

  ```
- {"timestamp":"2015-07-13 00:00:00.0","record":{"timestamp":"2015-07-13 00:00:00.0}}
- {"timestamp":"2015-07-13 00:00:00.1","record":{"timestamp":"2015-07-13 00:00:00.1}}
+ {"timestamp":"2015-07-13 00:00:00.0","nested":{"timestamp":"2015-07-13 00:00:00.0}}
+ {"timestamp":"2015-07-13 00:00:00.1","nested":{"timestamp":"2015-07-13 00:00:00.1}}
  ```

+ See [./example](./example) for more examples.
+
  ## ToDo

- * Currently, input must be a String column and output will be a String column. But, support Timestamp column (input / output)
  * Write test

  ## Development
@@ -63,9 +65,8 @@ Output will be as:
  Run example:

  ```
- $ embulk gem install embulk-parser-jsonl
  $ ./gradlew classpath
- $ embulk run -I lib example/example.yml
+ $ embulk preview -I lib example/example.yml
  ```

  Run test:
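The README example lists two `from_format` entries, so each input value is tried against more than one format before being rewritten in `default_to_timezone`. The plugin parses Ruby `strptime`-style patterns such as `%Y-%m-%d %H:%M:%S.%N %z` through JRuby; the sketch below swaps in `java.time.format.DateTimeFormatter` patterns purely to illustrate the try-each-format-in-order idea and the conversion to `Asia/Tokyo`. The patterns (including the single fraction digit, chosen to match the README's sample output) and all names are assumptions, not the plugin's code.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;

final class FormatFallbackSketch {
    // java.time stand-ins for the README's two from_format entries,
    // most specific first; "VV" parses a zone ID such as "UTC".
    static final List<DateTimeFormatter> FROM_FORMATS = Arrays.asList(
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.S VV"),
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss VV"));

    // Stand-ins for default_to_timestamp_format and default_to_timezone.
    static final DateTimeFormatter TO_FORMAT = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.S");
    static final ZoneId TO_ZONE = ZoneId.of("Asia/Tokyo");

    // Returns the result of the first format that parses the whole input.
    static ZonedDateTime parse(String value) {
        for (DateTimeFormatter format : FROM_FORMATS) {
            try {
                return ZonedDateTime.parse(value, format);
            }
            catch (DateTimeParseException ignored) {
                // try the next format
            }
        }
        throw new IllegalArgumentException("no from_format matched: " + value);
    }

    // Parse, shift to the target zone (same instant), then format.
    static String convert(String value) {
        return parse(value).withZoneSameInstant(TO_ZONE).format(TO_FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(convert("2015-07-12 15:00:00 UTC"));   // 2015-07-13 00:00:00.0
        System.out.println(convert("2015-07-12 15:00:00.1 UTC")); // 2015-07-13 00:00:00.1
    }
}
```

Because the first matching format wins, the more specific pattern (with fractional seconds) is listed first, which is also the order used in the README configuration.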
data/build.gradle CHANGED
@@ -13,7 +13,7 @@ configurations {
      provided
  }

- version = "0.1.4"
+ version = "0.1.5"
  sourceCompatibility = 1.7
  targetCompatibility = 1.7

@@ -72,7 +72,6 @@ Gem::Specification.new do |spec|

    spec.add_development_dependency 'bundler', ['~> 1.0']
    spec.add_development_dependency 'rake', ['>= 10.0']
-   spec.add_development_dependency 'embulk-parser-jsonl'
  end
  /$)
      }
data/example/example.yml CHANGED
@@ -1,18 +1,22 @@
  in:
    type: file
-   path_prefix: example/example.jsonl
+   path_prefix: example/string_example.csv
    parser:
-     type: jsonl
+     type: csv
      columns:
-     - {name: timestamp, type: string}
-     - {name: record, type: json}
-     - {name: ignore_record, type: json}
+     - {name: string1, type: string}
+     - {name: string2, type: string}
+     - {name: string3, type: string}
+     - {name: string4, type: string}
  filters:
    - type: timestamp_format
      default_to_timezone: "Asia/Tokyo"
-     default_to_format: "%Y-%m-%d %H:%M:%S.%N"
+     default_to_timestamp_format: "%Y-%m-%d %H:%M:%S.%N"
+     default_from_timestamp_format: ["%Y-%m-%d %H:%M:%S.%N %z", "%Y-%m-%d %H:%M:%S %z"]
      columns:
-       - {name: timestamp, from_format: ["%Y-%m-%d %H:%M:%S.%N %z", "%Y-%m-%d %H:%M:%S %z"]}
-       - {name: "$.record.record[0].timestamp", from_format: ["%Y-%m-%d %H:%M:%S.%N %z", "%Y-%m-%d %H:%M:%S %z"]}
+       - {name: string1}
+       - {name: string2, type: timestamp}
+       - {name: string3, type: long}
+       - {name: string4, type: double}
  out:
-   type: stdout
+   type: "null"
@@ -0,0 +1,2 @@
+ {"timestamp":"2015-07-12 15:00:00 UTC","nested":{"nested":[{"timestamp":"2015-07-12 15:00:00 UTC"}]},"ignore_nested":{"timestamp":"2015-07-12 15:00:00 UTC"}}
+ {"timestamp":"2015-07-12 15:00:00.1 UTC","nested":{"nested":[{"timestamp":"2015-07-12 15:00:00.1 UTC"}]},"ignore_nested":{"timestamp":"2015-07-12 15:00:00.1 UTC"}}
@@ -0,0 +1,14 @@
+ in:
+   type: file
+   path_prefix: example/json_example.jsonl
+   parser:
+     type: json
+ filters:
+   - type: timestamp_format
+     default_to_timezone: "Asia/Tokyo"
+     default_to_timestamp_format: "%Y-%m-%d %H:%M:%S.%N"
+     columns:
+       - {name: "$.record.timestamp", from_format: ["%Y-%m-%d %H:%M:%S.%N %z", "%Y-%m-%d %H:%M:%S %z"]}
+       - {name: "$.record.nested.nested[0].timestamp", from_format: ["%Y-%m-%d %H:%M:%S.%N %z", "%Y-%m-%d %H:%M:%S %z"]}
+ out:
+   type: "null"
@@ -0,0 +1,22 @@
+ in:
+   type: file
+   path_prefix: example/string_example.csv
+   parser:
+     type: csv
+     columns:
+     - {name: string1, type: string}
+     - {name: string2, type: string}
+     - {name: string3, type: string}
+     - {name: string4, type: string}
+ filters:
+   - type: timestamp_format
+     default_to_timezone: "Asia/Tokyo"
+     default_to_timestamp_format: "%Y-%m-%d %H:%M:%S.%N"
+     default_from_timestamp_format: ["%Y-%m-%d %H:%M:%S.%N %z", "%Y-%m-%d %H:%M:%S %z"]
+     columns:
+       - {name: string1}
+       - {name: string2, type: timestamp}
+       - {name: string3, type: long}
+       - {name: string4, type: double}
+ out:
+   type: "null"
@@ -0,0 +1,22 @@
+ in:
+   type: file
+   path_prefix: example/timestamp_example.csv
+   parser:
+     type: csv
+     default_timestamp_format: "%Y-%m-%d %H:%M:%S.%N %z"
+     columns:
+     - {name: timestamp1, type: timestamp}
+     - {name: timestamp2, type: timestamp}
+     - {name: timestamp3, type: timestamp}
+     - {name: timestamp4, type: timestamp}
+ filters:
+   - type: timestamp_format
+     default_to_timezone: "Asia/Tokyo"
+     default_to_timestamp_format: "%Y-%m-%d %H:%M:%S.%N"
+     columns:
+       - {name: timestamp1}
+       - {name: timestamp2, type: timestamp}
+       - {name: timestamp3, type: long}
+       - {name: timestamp4, type: double}
+ out:
+   type: "null"
@@ -0,0 +1,132 @@
+ package org.embulk.filter.timestamp_format;
+
+ import org.embulk.filter.timestamp_format.cast.StringCast;
+ import org.embulk.filter.timestamp_format.cast.TimestampCast;
+ import org.embulk.filter.timestamp_format.TimestampFormatFilterPlugin.ColumnConfig;
+ import org.embulk.filter.timestamp_format.TimestampFormatFilterPlugin.PluginTask;
+ import org.embulk.spi.Column;
+ import org.embulk.spi.Exec;
+ import org.embulk.spi.PageBuilder;
+ import org.embulk.spi.PageReader;
+ import org.embulk.spi.Schema;
+ import org.embulk.spi.time.Timestamp;
+ import org.embulk.spi.type.DoubleType;
+ import org.embulk.spi.type.LongType;
+ import org.embulk.spi.type.StringType;
+ import org.embulk.spi.type.TimestampType;
+ import org.embulk.spi.type.Type;
+ import org.joda.time.DateTimeZone;
+ import org.msgpack.value.Value;
+ import org.slf4j.Logger;
+
+ import java.util.HashMap;
+ import java.util.List;
+
+ public class ColumnCaster
+ {
+     private static final Logger logger = Exec.getLogger(TimestampFormatFilterPlugin.class);
+     private final PluginTask task;
+     private final Schema inputSchema;
+     private final Schema outputSchema;
+     private final PageReader pageReader;
+     private final PageBuilder pageBuilder;
+     private final HashMap<String, TimestampParser> timestampParserMap = new HashMap<>();
+     private final HashMap<String, TimestampFormatter> timestampFormatterMap = new HashMap<>();
+     private final JsonVisitor jsonVisitor;
+
+     ColumnCaster(PluginTask task, Schema inputSchema, Schema outputSchema, PageReader pageReader, PageBuilder pageBuilder)
+     {
+         this.task = task;
+         this.inputSchema = inputSchema;
+         this.outputSchema = outputSchema;
+         this.pageReader = pageReader;
+         this.pageBuilder = pageBuilder;
+
+         buildTimestampParserMap();
+         buildTimestampFormatterMap();
+
+         JsonCaster jsonCaster = new JsonCaster(task, timestampParserMap, timestampFormatterMap);
+         this.jsonVisitor = new JsonVisitor(task, jsonCaster);
+     }
+
+     private void buildTimestampParserMap()
+     {
+         // columnName or jsonPath => TimestampParser
+         for (ColumnConfig columnConfig : task.getColumns()) {
+             TimestampParser parser = getTimestampParser(columnConfig, task);
+             this.timestampParserMap.put(columnConfig.getName(), parser);
+         }
+     }
+
+     private void buildTimestampFormatterMap()
+     {
+         // columnName or jsonPath => TimestampFormatter
+         for (ColumnConfig columnConfig : task.getColumns()) {
+             TimestampFormatter parser = getTimestampFormatter(columnConfig, task);
+             this.timestampFormatterMap.put(columnConfig.getName(), parser);
+         }
+     }
+
+     private TimestampParser getTimestampParser(ColumnConfig columnConfig, PluginTask task)
+     {
+         DateTimeZone timezone = columnConfig.getFromTimeZone().or(task.getDefaultFromTimeZone());
+         List<String> formatList = columnConfig.getFromFormat().or(task.getDefaultFromTimestampFormat());
+         return new TimestampParser(task.getJRuby(), formatList, timezone);
+     }
+
+     private TimestampFormatter getTimestampFormatter(ColumnConfig columnConfig, PluginTask task)
+     {
+         String format = columnConfig.getToFormat().or(task.getDefaultToTimestampFormat());
+         DateTimeZone timezone = columnConfig.getToTimeZone().or(task.getDefaultToTimeZone());
+         return new TimestampFormatter(task.getJRuby(), format, timezone);
+     }
+
+     public void setFromString(Column outputColumn, String value)
+     {
+         Type outputType = outputColumn.getType();
+         TimestampParser timestampParser = timestampParserMap.get(outputColumn.getName());
+         if (outputType instanceof StringType) {
+             TimestampFormatter timestampFormatter = timestampFormatterMap.get(outputColumn.getName());
+             pageBuilder.setString(outputColumn, StringCast.asString(value, timestampParser, timestampFormatter));
+         }
+         else if (outputType instanceof TimestampType) {
+             pageBuilder.setTimestamp(outputColumn, StringCast.asTimestamp(value, timestampParser));
+         }
+         else if (outputType instanceof LongType) {
+             pageBuilder.setLong(outputColumn, StringCast.asLong(value, timestampParser));
+         }
+         else if (outputType instanceof DoubleType) {
+             pageBuilder.setDouble(outputColumn, StringCast.asDouble(value, timestampParser));
+         }
+         else {
+             assert false;
+         }
+     }
+
+     public void setFromTimestamp(Column outputColumn, Timestamp value)
+     {
+         Type outputType = outputColumn.getType();
+         if (outputType instanceof StringType) {
+             TimestampFormatter timestampFormatter = timestampFormatterMap.get(outputColumn.getName());
+             pageBuilder.setString(outputColumn, TimestampCast.asString(value, timestampFormatter));
+         }
+         else if (outputType instanceof TimestampType) {
+             pageBuilder.setTimestamp(outputColumn, value);
+         }
+         else if (outputType instanceof LongType) {
+             pageBuilder.setLong(outputColumn, TimestampCast.asLong(value));
+         }
+         else if (outputType instanceof DoubleType) {
+             pageBuilder.setDouble(outputColumn, TimestampCast.asDouble(value));
+         }
+         else {
+             assert false;
+         }
+     }
+
+     public void setFromJson(Column outputColumn, Value value)
+     {
+         String jsonPath = new StringBuilder("$.").append(outputColumn.getName()).toString();
+         pageBuilder.setJson(outputColumn, jsonVisitor.visit(jsonPath, value));
+     }
+ }
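`ColumnCaster.setFromString` chooses the conversion from the output column's type: a reformatted string, a timestamp, or a long/double Unix timestamp. `StringCast` itself is not shown in this diff, so the sketch below reproduces only the dispatch shape, with a hypothetical `OutputType` enum in place of Embulk's `Type` classes, `java.time` in place of the JRuby-based parser and formatter, and the assumption that long/double mean seconds since the epoch.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

final class CastDispatchSketch {
    // Hypothetical stand-in for Embulk's StringType / TimestampType / LongType / DoubleType.
    enum OutputType { STRING, TIMESTAMP, LONG, DOUBLE }

    // Parse the input string once, then convert according to the output column type.
    static Object castFromString(String value, OutputType type,
                                 DateTimeFormatter parser, DateTimeFormatter formatter) {
        Instant t = ZonedDateTime.parse(value, parser).toInstant();
        switch (type) {
            case STRING:
                return formatter.format(t); // formatter must carry a zone, see main()
            case TIMESTAMP:
                return t;
            case LONG:
                return t.getEpochSecond(); // assumption: Unix time in whole seconds
            case DOUBLE:
                return t.getEpochSecond() + t.getNano() / 1_000_000_000.0;
            default:
                throw new AssertionError("unsupported type: " + type);
        }
    }

    public static void main(String[] args) {
        DateTimeFormatter parser = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss VV");
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss")
                .withZone(ZoneId.of("Asia/Tokyo"));
        for (OutputType type : OutputType.values()) {
            System.out.println(type + ": " + castFromString("2015-07-12 15:00:00 UTC", type, parser, formatter));
        }
    }
}
```

Only the string target needs the formatter; the timestamp, long and double targets depend on the parser alone, which matches how `setFromString` looks up the formatter only in its `StringType` branch.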
@@ -1,251 +1,154 @@
  package org.embulk.filter.timestamp_format;

- import com.google.common.base.Throwables;
+ import org.embulk.spi.DataException;
  import org.embulk.spi.PageReader;
- import org.msgpack.value.ArrayValue;
- import org.msgpack.value.MapValue;
- import org.msgpack.value.Value;
- import org.msgpack.value.ValueFactory;
+ import org.embulk.spi.Schema;

- import org.embulk.filter.timestamp_format.TimestampFormatFilterPlugin.ColumnConfig;
  import org.embulk.filter.timestamp_format.TimestampFormatFilterPlugin.PluginTask;

  import org.embulk.spi.Column;
  import org.embulk.spi.ColumnVisitor;
  import org.embulk.spi.Exec;
  import org.embulk.spi.PageBuilder;
- import org.embulk.spi.time.Timestamp;
- import org.embulk.spi.time.TimestampParseException;
- import org.joda.time.DateTimeZone;
  import org.slf4j.Logger;

  import java.util.HashMap;
- import java.util.HashSet;
- import java.util.List;
- import java.util.Map;
- import java.util.Objects;

  public class ColumnVisitorImpl
          implements ColumnVisitor
  {
      private static final Logger logger = Exec.getLogger(TimestampFormatFilterPlugin.class);
      private final PluginTask task;
+     private final Schema inputSchema;
+     private final Schema outputSchema;
      private final PageReader pageReader;
      private final PageBuilder pageBuilder;
-     private final HashMap<String, TimestampParser> timestampParserMap = new HashMap<String, TimestampParser>();
-     private final HashMap<String, TimestampFormatter> timestampFormatterMap = new HashMap<String, TimestampFormatter>();
-     private final HashSet<String> shouldVisitRecursivelySet = new HashSet<String>();
+     private final HashMap<String, Column> outputColumnMap = new HashMap<>();
+     private final ColumnCaster columnCaster;

-     ColumnVisitorImpl(PluginTask task, PageReader pageReader, PageBuilder pageBuilder)
+     ColumnVisitorImpl(PluginTask task, Schema inputSchema, Schema outputSchema,
+             PageReader pageReader, PageBuilder pageBuilder)
      {
-         this.task = task;
-         this.pageReader = pageReader;
-         this.pageBuilder = pageBuilder;
-
-         buildTimestampParserMap();
-         buildTimestampFormatterMap();
-         buildShouldVisitRecursivelySet();;
+         this.task = task;
+         this.inputSchema = inputSchema;
+         this.outputSchema = outputSchema;
+         this.pageReader = pageReader;
+         this.pageBuilder = pageBuilder;
+
+         buildOutputColumnMap();
+         this.columnCaster = new ColumnCaster(task, inputSchema, outputSchema, pageReader, pageBuilder);
      }

-     private void buildTimestampParserMap()
+     private void buildOutputColumnMap()
      {
-         // columnName or jsonPath => TimestampParser
-         for (ColumnConfig columnConfig : task.getColumns()) {
-             TimestampParser parser = getTimestampParser(columnConfig, task);
-             this.timestampParserMap.put(columnConfig.getName(), parser); // NOTE: value would be null
+         // columnName => outputColumn
+         for (Column column : outputSchema.getColumns()) {
+             this.outputColumnMap.put(column.getName(), column);
          }
      }

-     private TimestampParser getTimestampParser(ColumnConfig columnConfig, PluginTask task)
-     {
-         DateTimeZone timezone = columnConfig.getFromTimeZone().or(task.getDefaultFromTimeZone());
-         List<String> formatList = columnConfig.getFromFormat().or(task.getDefaultFromTimestampFormat());
-         return new TimestampParser(task.getJRuby(), formatList, timezone);
-     }
-
-     private void buildTimestampFormatterMap()
+     private interface PageBuildable
      {
-         // columnName or jsonPath => TimestampFormatter
-         for (ColumnConfig columnConfig : task.getColumns()) {
-             TimestampFormatter parser = getTimestampFormatter(columnConfig, task);
-             this.timestampFormatterMap.put(columnConfig.getName(), parser); // NOTE: value would be null
-         }
+         public void run() throws DataException;
      }

-     private TimestampFormatter getTimestampFormatter(ColumnConfig columnConfig, PluginTask task)
+     private void withStopOnInvalidRecord(final PageBuildable op,
+             final Column inputColumn, final Column outputColumn) throws DataException
      {
-         String format = columnConfig.getToFormat().or(task.getDefaultToTimestampFormat());
-         DateTimeZone timezone = columnConfig.getToTimeZone().or(task.getDefaultToTimeZone());
-         return new TimestampFormatter(task.getJRuby(), format, timezone);
-     }
-
-
-     private void buildShouldVisitRecursivelySet()
-     {
-         // json partial path => Boolean to avoid unnecessary type: json visit
-         for (ColumnConfig columnConfig : task.getColumns()) {
-             String name = columnConfig.getName();
-             if (!name.startsWith("$.")) {
-                 continue;
-             }
-             String[] parts = name.split("\\.");
-             StringBuilder partialPath = new StringBuilder("$");
-             for (int i = 1; i < parts.length; i++) {
-                 if (parts[i].contains("[")) {
-                     String[] arrayParts = parts[i].split("\\[");
-                     partialPath.append(".").append(arrayParts[0]);
-                     this.shouldVisitRecursivelySet.add(partialPath.toString());
-                     for (int j = 1; j < arrayParts.length; j++) {
-                         partialPath.append("[").append(arrayParts[j]);
-                         this.shouldVisitRecursivelySet.add(partialPath.toString());
-                     }
-                 }
-                 else {
-                     partialPath.append(".").append(parts[i]);
-                     this.shouldVisitRecursivelySet.add(partialPath.toString());
-                 }
-             }
-         }
-     }
-
-     private boolean shouldVisitRecursively(String name)
-     {
-         return shouldVisitRecursivelySet.contains(name);
-     }
-
-     private Value formatTimestampStringRecursively(PluginTask task, String path, Value value)
-             throws TimestampParseException
-     {
-         if (!shouldVisitRecursively(path)) {
-             return value;
-         }
-         if (value.isArrayValue()) {
-             ArrayValue arrayValue = value.asArrayValue();
-             int size = arrayValue.size();
-             Value[] newValue = new Value[size];
-             for (int i = 0; i < size; i++) {
-                 String k = new StringBuilder(path).append("[").append(Integer.toString(i)).append("]").toString();
-                 Value v = arrayValue.get(i);
-                 newValue[i] = formatTimestampStringRecursively(task, k, v);
-             }
-             return ValueFactory.newArray(newValue, true);
-         }
-         else if (value.isMapValue()) {
-             MapValue mapValue = value.asMapValue();
-             int size = mapValue.size() * 2;
-             Value[] newValue = new Value[size];
-             int i = 0;
-             for (Map.Entry<Value, Value> entry : mapValue.entrySet()) {
-                 Value k = entry.getKey();
-                 Value v = entry.getValue();
-                 String newPath = new StringBuilder(path).append(".").append(k.asStringValue().asString()).toString();
-                 Value r = formatTimestampStringRecursively(task, newPath, v);
-                 newValue[i++] = k;
-                 newValue[i++] = r;
-             }
-             return ValueFactory.newMap(newValue, true);
-         }
-         else if (value.isStringValue()) {
-             String stringValue = value.asStringValue().asString();
-             String newValue = formatTimestampString(task, path, stringValue);
-             return (Objects.equals(newValue, stringValue)) ? value : ValueFactory.newString(newValue);
+         if (pageReader.isNull(inputColumn)) {
+             pageBuilder.setNull(outputColumn);
          }
          else {
-             return value;
-         }
-     }
-
-     private String formatTimestampString(PluginTask task, String name, String value)
-             throws TimestampParseException
-     {
-         TimestampParser parser = timestampParserMap.get(name);
-         TimestampFormatter formatter = timestampFormatterMap.get(name);
-         if (formatter == null || parser == null) {
-             return value;
-         }
-         try {
-             Timestamp timestamp = parser.parse(value);
-             return formatter.format(timestamp);
-         }
-         catch (TimestampParseException ex) {
              if (task.getStopOnInvalidRecord()) {
-                 throw Throwables.propagate(ex);
+                 op.run();
              }
              else {
-                 logger.warn("invalid value \"{}\":\"{}\"", name, value);
-                 return value;
+                 try {
+                     op.run();
+                 }
+                 catch (final DataException ex) {
+                     logger.warn(ex.getMessage());
+                     pageBuilder.setNull(outputColumn);
+                 }
              }
          }
      }

-
      @Override
-     public void booleanColumn(Column column)
+     public void booleanColumn(final Column inputColumn)
      {
-         if (pageReader.isNull(column)) {
-             pageBuilder.setNull(column);
+         if (pageReader.isNull(inputColumn)) {
+             pageBuilder.setNull(inputColumn);
          }
          else {
-             pageBuilder.setBoolean(column, pageReader.getBoolean(column));
+             pageBuilder.setBoolean(inputColumn, pageReader.getBoolean(inputColumn));
          }
      }

      @Override
-     public void longColumn(Column column)
+     public void longColumn(final Column inputColumn)
      {
-         if (pageReader.isNull(column)) {
-             pageBuilder.setNull(column);
+         if (pageReader.isNull(inputColumn)) {
+             pageBuilder.setNull(inputColumn);
          }
          else {
-             pageBuilder.setLong(column, pageReader.getLong(column));
+             pageBuilder.setLong(inputColumn, pageReader.getLong(inputColumn));
          }
      }

      @Override
-     public void doubleColumn(Column column)
+     public void doubleColumn(final Column inputColumn)
      {
-         if (pageReader.isNull(column)) {
-             pageBuilder.setNull(column);
+         if (pageReader.isNull(inputColumn)) {
+             pageBuilder.setNull(inputColumn);
          }
          else {
-             pageBuilder.setDouble(column, pageReader.getDouble(column));
+             pageBuilder.setDouble(inputColumn, pageReader.getDouble(inputColumn));
          }
      }

      @Override
-     public void stringColumn(Column column)
+     public void stringColumn(final Column inputColumn)
      {
-         if (pageReader.isNull(column)) {
-             pageBuilder.setNull(column);
+         if (pageReader.isNull(inputColumn)) {
+             pageBuilder.setNull(inputColumn);
              return;
          }
-         String value = pageReader.getString(column);
-         String formatted = formatTimestampString(task, column.getName(), value);
-         pageBuilder.setString(column, formatted);
+         final Column outputColumn = outputColumnMap.get(inputColumn.getName());
+         PageBuildable op = new PageBuildable() {
+             public void run() throws DataException
+             {
+                 columnCaster.setFromString(outputColumn, pageReader.getString(inputColumn));
+             }
+         };
+         withStopOnInvalidRecord(op, inputColumn, outputColumn);
      }

      @Override
-     public void jsonColumn(Column column)
+     public void timestampColumn(final Column inputColumn)
      {
-         if (pageReader.isNull(column)) {
-             pageBuilder.setNull(column);
-         }
-         else {
-             String path = new StringBuilder("$.").append(column.getName()).toString();
-             Value value = pageReader.getJson(column);
-             Value formatted = formatTimestampStringRecursively(task, path, value);
-             pageBuilder.setJson(column, formatted);
+         if (pageReader.isNull(inputColumn)) {
+             pageBuilder.setNull(inputColumn);
+             return;
          }
+         final Column outputColumn = outputColumnMap.get(inputColumn.getName());
+         PageBuildable op = new PageBuildable() {
+             public void run() throws DataException
+             {
+                 columnCaster.setFromTimestamp(outputColumn, pageReader.getTimestamp(inputColumn));
+             }
+         };
+         withStopOnInvalidRecord(op, inputColumn, outputColumn);
      }

      @Override
-     public void timestampColumn(Column column)
+     public void jsonColumn(final Column inputColumn)
      {
-         if (pageReader.isNull(column)) {
-             pageBuilder.setNull(column);
-         }
-         else {
-             pageBuilder.setTimestamp(column, pageReader.getTimestamp(column));
+         if (pageReader.isNull(inputColumn)) {
+             pageBuilder.setNull(inputColumn);
+             return;
          }
+         final Column outputColumn = outputColumnMap.get(inputColumn.getName());
+         columnCaster.setFromJson(outputColumn, pageReader.getJson(inputColumn));
      }
  }
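`withStopOnInvalidRecord` turns `stop_on_invalid_record` into control flow: when it is true, a `DataException` thrown by the cast propagates and stops the load; otherwise the message is logged and the output cell is set to null. The diff spells out each callback as an anonymous `PageBuildable` class, presumably because `build.gradle` targets Java 1.7, which has no lambdas. The sketch below shows the same flow in Java 8 with a hypothetical `DataException` stand-in and a plain field in place of the page builder. It omits the `pageReader.isNull` check, since `stringColumn` and `timestampColumn` already return early on null input.

```java
import java.util.ArrayList;
import java.util.List;

final class StopOnInvalidRecordSketch {
    // Hypothetical stand-in for org.embulk.spi.DataException, which is unchecked.
    static class DataException extends RuntimeException {
        DataException(String message) {
            super(message);
        }
    }

    // Same single-method shape as the diff's PageBuildable, so a lambda can implement it.
    interface PageBuildable {
        void run() throws DataException;
    }

    private final boolean stopOnInvalidRecord;
    final List<String> warnings = new ArrayList<>(); // stands in for logger.warn
    Object cell = "unset";                           // stands in for the output column value

    StopOnInvalidRecordSketch(boolean stopOnInvalidRecord) {
        this.stopOnInvalidRecord = stopOnInvalidRecord;
    }

    void withStopOnInvalidRecord(PageBuildable op) {
        if (stopOnInvalidRecord) {
            op.run(); // a DataException escapes and aborts the load
        }
        else {
            try {
                op.run();
            }
            catch (DataException ex) {
                warnings.add(ex.getMessage()); // log and keep going
                cell = null;                   // like pageBuilder.setNull(outputColumn)
            }
        }
    }

    public static void main(String[] args) {
        StopOnInvalidRecordSketch lenient = new StopOnInvalidRecordSketch(false);
        lenient.withStopOnInvalidRecord(() -> { throw new DataException("invalid value \"foo\""); });
        System.out.println(lenient.cell + " " + lenient.warnings); // null [invalid value "foo"]
    }
}
```

Passing the cast as a callback keeps the null check and the stop-or-skip policy in one place instead of repeating a try/catch in every column visitor method.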
@@ -0,0 +1,54 @@
+ package org.embulk.filter.timestamp_format;
+
+ import org.embulk.filter.timestamp_format.cast.StringCast;
+ import org.embulk.filter.timestamp_format.TimestampFormatFilterPlugin.ColumnConfig;
+ import org.embulk.filter.timestamp_format.TimestampFormatFilterPlugin.PluginTask;
+ import org.embulk.spi.Exec;
+ import org.embulk.spi.type.DoubleType;
+ import org.embulk.spi.type.LongType;
+ import org.embulk.spi.type.StringType;
+ import org.embulk.spi.type.Type;
+ import org.msgpack.value.StringValue;
+ import org.msgpack.value.Value;
+ import org.msgpack.value.ValueFactory;
+
+ import org.slf4j.Logger;
+
+ import java.util.HashMap;
+
+ class JsonCaster
+ {
+     private static final Logger logger = Exec.getLogger(TimestampFormatFilterPlugin.class);
+     private final PluginTask task;
+     private final HashMap<String, TimestampParser> timestampParserMap;
+     private final HashMap<String, TimestampFormatter> timestampFormatterMap;
+
+     JsonCaster(PluginTask task,
+             HashMap<String, TimestampParser> timestampParserMap,
+             HashMap<String, TimestampFormatter> timestampFormatterMap)
+     {
+         this.task = task;
+         this.timestampParserMap = timestampParserMap;
+         this.timestampFormatterMap = timestampFormatterMap;
+     }
+
+     public Value fromString(ColumnConfig columnConfig, StringValue value)
+     {
+         Type outputType = columnConfig.getType();
+         TimestampParser parser = timestampParserMap.get(columnConfig.getName());
+         if (outputType instanceof StringType) {
+             TimestampFormatter formatter = timestampFormatterMap.get(columnConfig.getName());
+             return ValueFactory.newString(StringCast.asString(value.asString(), parser, formatter));
+         }
+         else if (outputType instanceof LongType) {
+             return ValueFactory.newInteger(StringCast.asLong(value.asString(), parser));
+         }
+         else if (outputType instanceof DoubleType) {
+             return ValueFactory.newFloat(StringCast.asDouble(value.asString(), parser));
+         }
+         else {
+             assert false;
+             return null;
+         }
+     }
+ }
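For `$.`-prefixed column names, the removed `buildShouldVisitRecursivelySet` in `ColumnVisitorImpl` (carried over as `buildShouldVisitSet` in `JsonVisitor`) records every prefix of each configured JSON path, so the recursive walk can return untouched any subtree that leads to no configured path. The sketch below reproduces that prefix computation and the pruned walk over `java.util` maps and lists rather than msgpack `Value`s. The class name and the caller-supplied leaf conversion are hypothetical, and the sketch converts a string only when its path is one of the configured paths, since an intermediate prefix has no column configuration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class JsonPathPruningSketch {
    private final Set<String> targets = new HashSet<>();     // configured "$." paths (leaves to cast)
    private final Set<String> shouldVisit = new HashSet<>(); // every prefix of every configured path
    private final Function<String, Object> castLeaf;         // hypothetical leaf conversion

    JsonPathPruningSketch(List<String> columnNames, Function<String, Object> castLeaf) {
        this.castLeaf = castLeaf;
        for (String name : columnNames) {
            if (!name.startsWith("$.")) {
                continue; // plain column names are not JSON paths
            }
            targets.add(name);
            // "$.a.b[0].c" yields "$.a", "$.a.b", "$.a.b[0]", "$.a.b[0].c"
            String[] parts = name.split("\\.");
            StringBuilder partial = new StringBuilder("$");
            for (int i = 1; i < parts.length; i++) {
                String[] arrayParts = parts[i].split("\\[");
                partial.append(".").append(arrayParts[0]);
                shouldVisit.add(partial.toString());
                for (int j = 1; j < arrayParts.length; j++) {
                    partial.append("[").append(arrayParts[j]);
                    shouldVisit.add(partial.toString());
                }
            }
        }
    }

    Set<String> prefixes() {
        return shouldVisit;
    }

    // Rebuilds only subtrees on a configured path; everything else is returned as-is.
    Object visit(String path, Object value) {
        if (!shouldVisit.contains(path)) {
            return value;
        }
        if (value instanceof List) {
            List<?> list = (List<?>) value;
            List<Object> out = new ArrayList<>(list.size());
            for (int i = 0; i < list.size(); i++) {
                out.add(visit(path + "[" + i + "]", list.get(i)));
            }
            return out;
        }
        if (value instanceof Map) {
            Map<Object, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                out.put(e.getKey(), visit(path + "." + e.getKey(), e.getValue()));
            }
            return out;
        }
        if (value instanceof String && targets.contains(path)) {
            return castLeaf.apply((String) value);
        }
        return value;
    }
}
```

As in `ColumnCaster.setFromJson`, a walk would start from `"$." + columnName`, for example `visit("$.record", value)` for the JSON parser's `record` column.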
@@ -0,0 +1,119 @@
+ package org.embulk.filter.timestamp_format;
+
+ import org.embulk.filter.timestamp_format.TimestampFormatFilterPlugin.ColumnConfig;
+ import org.embulk.filter.timestamp_format.TimestampFormatFilterPlugin.PluginTask;
+
+ import org.embulk.spi.Exec;
+ import org.msgpack.value.ArrayValue;
+ import org.msgpack.value.MapValue;
+ import org.msgpack.value.Value;
+ import org.msgpack.value.ValueFactory;
+
+ import org.slf4j.Logger;
+
+ import java.util.HashMap;
+ import java.util.HashSet;
+ import java.util.Map;
+
+ public class JsonVisitor
+ {
+     private static final Logger logger = Exec.getLogger(TimestampFormatFilterPlugin.class);
+     private final PluginTask task;
+     private final JsonCaster jsonCaster;
+     private final HashMap<String, ColumnConfig> jsonPathColumnConfigMap = new HashMap<>();
+     private final HashSet<String> shouldVisitSet = new HashSet<>();
+
+     JsonVisitor(PluginTask task, JsonCaster jsonCaster)
+     {
+         this.task = task;
+         this.jsonCaster = jsonCaster;
+
+         buildJsonPathColumnConfigMap();
+         buildShouldVisitSet();
+     }
+
+     private void buildJsonPathColumnConfigMap()
+     {
+         // json path => Type
+         for (ColumnConfig columnConfig : task.getColumns()) {
+             String name = columnConfig.getName();
+             if (!name.startsWith("$.")) {
+                 continue;
+             }
+             this.jsonPathColumnConfigMap.put(name, columnConfig);
+         }
+     }
+
+     private void buildShouldVisitSet()
+     {
+         // json partial path => Boolean to avoid unnecessary type: json visit
+         for (ColumnConfig columnConfig : task.getColumns()) {
+             String name = columnConfig.getName();
+             if (!name.startsWith("$.")) {
+                 continue;
+             }
+             String[] parts = name.split("\\.");
+             StringBuilder partialPath = new StringBuilder("$");
+             for (int i = 1; i < parts.length; i++) {
+                 if (parts[i].contains("[")) {
+                     String[] arrayParts = parts[i].split("\\[");
+                     partialPath.append(".").append(arrayParts[0]);
+                     this.shouldVisitSet.add(partialPath.toString());
+                     for (int j = 1; j < arrayParts.length; j++) {
+                         partialPath.append("[").append(arrayParts[j]);
+                         this.shouldVisitSet.add(partialPath.toString());
+                     }
+                 }
+                 else {
+                     partialPath.append(".").append(parts[i]);
+                     this.shouldVisitSet.add(partialPath.toString());
+                 }
+             }
+         }
+     }
+
+     private boolean shouldVisit(String jsonPath)
+     {
+         return shouldVisitSet.contains(jsonPath);
+     }
+
+     public Value visit(String jsonPath, Value value)
+     {
+         if (!shouldVisit(jsonPath)) {
+             return value;
+         }
+         if (value.isArrayValue()) {
+             ArrayValue arrayValue = value.asArrayValue();
+             int size = arrayValue.size();
+             Value[] newValue = new Value[size];
+             for (int i = 0; i < size; i++) {
+                 String k = new StringBuilder(jsonPath).append("[").append(Integer.toString(i)).append("]").toString();
+                 Value v = arrayValue.get(i);
+                 newValue[i] = visit(k, v);
+             }
+             return ValueFactory.newArray(newValue, true);
+         }
+         else if (value.isMapValue()) {
+             MapValue mapValue = value.asMapValue();
+             int size = mapValue.size() * 2;
+             Value[] newValue = new Value[size];
+             int i = 0;
+             for (Map.Entry<Value, Value> entry : mapValue.entrySet()) {
+                 Value k = entry.getKey();
+                 Value v = entry.getValue();
+                 String newPath = new StringBuilder(jsonPath).append(".").append(k.asStringValue().asString()).toString();
+                 Value r = visit(newPath, v);
+                 newValue[i++] = k;
+                 newValue[i++] = r;
+             }
+             return ValueFactory.newMap(newValue, true);
+         }
+         else if (value.isStringValue()) {
+             ColumnConfig columnConfig = jsonPathColumnConfigMap.get(jsonPath);
+             return jsonCaster.fromString(columnConfig, value.asStringValue());
+         }
+         else {
116
+ return value;
117
+ }
118
+ }
119
+ }
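`buildShouldVisitSet` precomputes every prefix of each configured JSON path. This lets `visit` return any subtree that no configured path passes through without walking into it. The standalone sketch below reproduces that prefix expansion for a single path, with no Embulk or msgpack dependency. The class name `PartialPaths` is hypothetical, and the loop body mirrors the one in the diff.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical standalone class; mirrors the loop in JsonVisitor.buildShouldVisitSet for one path.
public class PartialPaths
{
    static Set<String> partialPaths(String jsonPath)
    {
        Set<String> result = new LinkedHashSet<>();
        if (!jsonPath.startsWith("$.")) {
            return result; // plain (non-JSON) column names are skipped, as in the plugin
        }
        String[] parts = jsonPath.split("\\.");
        StringBuilder partial = new StringBuilder("$");
        for (int i = 1; i < parts.length; i++) {
            if (parts[i].contains("[")) {
                // "record[0]" splits into "record" and "0]"
                String[] arrayParts = parts[i].split("\\[");
                partial.append(".").append(arrayParts[0]);
                result.add(partial.toString());
                for (int j = 1; j < arrayParts.length; j++) {
                    partial.append("[").append(arrayParts[j]);
                    result.add(partial.toString());
                }
            }
            else {
                partial.append(".").append(parts[i]);
                result.add(partial.toString());
            }
        }
        return result;
    }

    public static void main(String[] args)
    {
        for (String p : partialPaths("$.record.record[0].timestamp")) {
            System.out.println(p);
        }
    }
}
```

For `$.record.record[0].timestamp` this yields four prefixes: `$.record`, `$.record.record`, `$.record.record[0]`, and the full path. `visit` builds array element paths as `path[i]`, so only element 0 is descended into. `$.record.record[1]` is not in the set and is returned unchanged.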
@@ -1,12 +1,15 @@
1
1
  package org.embulk.filter.timestamp_format;
2
2
 
3
+ import com.google.common.collect.ImmutableList;
3
4
  import org.embulk.config.Config;
4
5
  import org.embulk.config.ConfigDefault;
6
+ import org.embulk.config.ConfigException;
5
7
  import org.embulk.config.ConfigInject;
6
8
  import org.embulk.config.ConfigSource;
7
9
  import org.embulk.config.Task;
8
10
  import org.embulk.config.TaskSource;
9
11
 
12
+ import org.embulk.spi.Column;
10
13
  import org.embulk.spi.Exec;
11
14
  import org.embulk.spi.FilterPlugin;
12
15
  import org.embulk.spi.Page;
@@ -15,6 +18,11 @@ import org.embulk.spi.PageOutput;
15
18
  import org.embulk.spi.PageReader;
16
19
  import org.embulk.spi.Schema;
17
20
 
21
+ import org.embulk.spi.type.DoubleType;
22
+ import org.embulk.spi.type.LongType;
23
+ import org.embulk.spi.type.StringType;
24
+ import org.embulk.spi.type.TimestampType;
25
+ import org.embulk.spi.type.Type;
18
26
  import org.jruby.embed.ScriptingContainer;
19
27
  import org.slf4j.Logger;
20
28
 
@@ -24,20 +32,22 @@ public class TimestampFormatFilterPlugin implements FilterPlugin
24
32
  {
25
33
  private static final Logger logger = Exec.getLogger(TimestampFormatFilterPlugin.class);
26
34
 
27
- public TimestampFormatFilterPlugin()
28
- {
29
- }
35
+ public TimestampFormatFilterPlugin() {}
30
36
 
31
37
  // NOTE: This is not spi.ColumnConfig
32
- public interface ColumnConfig extends Task,
38
+ interface ColumnConfig extends Task,
33
39
  TimestampParser.TimestampColumnOption, TimestampFormatter.TimestampColumnOption
34
40
  {
35
41
  @Config("name")
36
42
  String getName();
43
+
44
+ @Config("type")
45
+ @ConfigDefault("\"string\"")
46
+ Type getType();
37
47
  }
38
48
 
39
- public interface PluginTask extends Task,
40
- TimestampParser.Task, TimestampFormatter.Task
49
+ interface PluginTask extends Task,
50
+ TimestampParser.Task, TimestampFormatter.Task
41
51
  {
42
52
  @Config("columns")
43
53
  @ConfigDefault("[]")
@@ -57,12 +67,20 @@ public class TimestampFormatFilterPlugin implements FilterPlugin
57
67
  {
58
68
  PluginTask task = config.loadConfig(PluginTask.class);
59
69
 
70
+ configure(task, inputSchema);
71
+ Schema outputSchema = buildOuputSchema(task, inputSchema);
72
+ control.run(task.dump(), outputSchema);
73
+ }
74
+
75
+ private void configure(PluginTask task, Schema inputSchema)
76
+ {
60
77
  List<ColumnConfig> columns = task.getColumns();
78
+
61
79
  // throw if column does not exist
62
80
  for (ColumnConfig columnConfig : columns) {
63
81
  String name = columnConfig.getName();
64
82
  if (name.startsWith("$.")) {
65
- String firstName = name.split("\\.", 3)[1];
83
+ String firstName = name.split("\\.", 3)[1]; // check only top level column name
66
84
  inputSchema.lookupColumn(firstName);
67
85
  }
68
86
  else {
@@ -70,7 +88,55 @@ public class TimestampFormatFilterPlugin implements FilterPlugin
70
88
  }
71
89
  }
72
90
 
73
- control.run(task.dump(), inputSchema);
91
+ // throw if column type is not string or timestamp
92
+ for (ColumnConfig columnConfig : columns) {
93
+ Type type = columnConfig.getType();
94
+ boolean acceptable = false;
95
+ if (type instanceof StringType) {
96
+ continue;
97
+ }
98
+ else if (type instanceof TimestampType) {
99
+ continue;
100
+ }
101
+ else if (type instanceof LongType) {
102
+ continue;
103
+ }
104
+ else if (type instanceof DoubleType) {
105
+ continue;
106
+ }
107
+ else {
108
+ throw new ConfigException("column type must be string, timestamp, long, or double");
109
+ }
110
+ }
111
+ }
112
+
113
+ private Schema buildOuputSchema(final PluginTask task, final Schema inputSchema)
114
+ {
115
+ List<ColumnConfig> columnConfigs = task.getColumns();
116
+ ImmutableList.Builder<Column> builder = ImmutableList.builder();
117
+ int i = 0;
118
+ for (Column inputColumn : inputSchema.getColumns()) {
119
+ String name = inputColumn.getName();
120
+ Type type = inputColumn.getType();
121
+ ColumnConfig columnConfig = getColumnConfig(name, columnConfigs);
122
+ if (columnConfig != null) {
123
+ type = columnConfig.getType();
124
+ }
125
+ Column outputColumn = new Column(i++, name, type);
126
+ builder.add(outputColumn);
127
+ }
128
+ return new Schema(builder.build());
129
+ }
130
+
131
+ private ColumnConfig getColumnConfig(String name, List<ColumnConfig> columnConfigs)
132
+ {
133
+ // hash should be faster, though
134
+ for (ColumnConfig columnConfig : columnConfigs) {
135
+ if (columnConfig.getName().equals(name)) {
136
+ return columnConfig;
137
+ }
138
+ }
139
+ return null;
74
140
  }
75
141
 
76
142
  @Override
@@ -82,7 +148,7 @@ public class TimestampFormatFilterPlugin implements FilterPlugin
82
148
  return new PageOutput() {
83
149
  private PageReader pageReader = new PageReader(inputSchema);
84
150
  private PageBuilder pageBuilder = new PageBuilder(Exec.getBufferAllocator(), outputSchema, output);
85
- private ColumnVisitorImpl visitor = new ColumnVisitorImpl(task, pageReader, pageBuilder);
151
+ private ColumnVisitorImpl visitor = new ColumnVisitorImpl(task, inputSchema, outputSchema, pageReader, pageBuilder);
86
152
 
87
153
  @Override
88
154
  public void finish()
@@ -102,7 +168,7 @@ public class TimestampFormatFilterPlugin implements FilterPlugin
102
168
  pageReader.setPage(page);
103
169
 
104
170
  while (pageReader.nextRecord()) {
105
- outputSchema.visitColumns(visitor);
171
+ inputSchema.visitColumns(visitor);
106
172
  pageBuilder.addRecord();
107
173
  }
108
174
  }
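The new `configure` step rejects unsupported `type` values. `buildOuputSchema` then copies the input schema, substituting each configured column's type. As its comment notes, `getColumnConfig` scans the column list linearly. The following sketch shows the same two steps with a `HashMap` lookup instead. It assumes a simplified model in which a column is just a name and a type name. The hypothetical `Col` class stands in for Embulk's `Column`, and `IllegalArgumentException` stands in for `ConfigException`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, Embulk-free model of configure() + buildOuputSchema().
public class OutputSchemaSketch
{
    static final Set<String> ACCEPTABLE =
        new HashSet<>(Arrays.asList("string", "timestamp", "long", "double"));

    static final class Col
    {
        final String name;
        final String type;

        Col(String name, String type)
        {
            this.name = name;
            this.type = type;
        }

        @Override
        public String toString()
        {
            return name + ":" + type;
        }
    }

    // Reject unsupported target types up front, like the type check in configure().
    static void validate(Map<String, String> configuredTypes)
    {
        for (String type : configuredTypes.values()) {
            if (!ACCEPTABLE.contains(type)) {
                throw new IllegalArgumentException("column type must be string, timestamp, long, or double");
            }
        }
    }

    // Keep column order and names; replace the type only for configured columns (O(1) lookup).
    static List<Col> buildOutputSchema(List<Col> input, Map<String, String> configuredTypes)
    {
        List<Col> output = new ArrayList<>();
        for (Col c : input) {
            output.add(new Col(c.name, configuredTypes.getOrDefault(c.name, c.type)));
        }
        return output;
    }

    public static void main(String[] args)
    {
        List<Col> input = Arrays.asList(new Col("id", "long"), new Col("timestamp", "string"));
        Map<String, String> configured = new HashMap<>();
        configured.put("timestamp", "double");
        validate(configured);
        System.out.println(buildOutputSchema(input, configured)); // [id:long, timestamp:double]
    }
}
```

JSON path entries such as `$.record.timestamp` never equal a top-level column name. As in the plugin, they therefore leave the output schema's column types untouched.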
@@ -10,10 +10,10 @@ import org.embulk.filter.timestamp_format.TimestampFormatFilterPlugin.PluginTask
10
10
  import org.embulk.spi.time.JRubyTimeParserHelper;
11
11
  import org.embulk.spi.time.JRubyTimeParserHelperFactory;
12
12
  import org.embulk.spi.time.Timestamp;
13
- import org.embulk.spi.time.TimestampParseException;
14
13
 
15
14
  import static org.embulk.spi.time.TimestampFormat.parseDateTimeZone;
16
15
 
16
+ import org.embulk.spi.time.TimestampParseException;
17
17
  import org.joda.time.DateTimeZone;
18
18
  import org.jruby.embed.ScriptingContainer;
19
19
 
@@ -0,0 +1,59 @@
1
+ package org.embulk.filter.timestamp_format.cast;
2
+
3
+ import org.embulk.filter.timestamp_format.TimestampFormatter;
4
+ import org.embulk.filter.timestamp_format.TimestampParser;
5
+ import org.embulk.spi.DataException;
6
+ import org.embulk.spi.time.Timestamp;
7
+ import org.embulk.spi.time.TimestampParseException;
8
+
9
+ public class StringCast
10
+ {
11
+ private StringCast() {}
12
+
13
+ private static String buildErrorMessage(String value)
14
+ {
15
+ return String.format("failed to parse string: \"%s\"", value);
16
+ }
17
+
18
+ public static String asString(String value, TimestampParser parser, TimestampFormatter formatter) throws DataException
19
+ {
20
+ try {
21
+ Timestamp timestamp = parser.parse(value);
22
+ return formatter.format(timestamp);
23
+ }
24
+ catch (TimestampParseException ex) {
25
+ throw new DataException(buildErrorMessage(value), ex);
26
+ }
27
+ }
28
+
29
+ public static Timestamp asTimestamp(String value, TimestampParser parser) throws DataException
30
+ {
31
+ try {
32
+ return parser.parse(value);
33
+ }
34
+ catch (TimestampParseException ex) {
35
+ throw new DataException(buildErrorMessage(value), ex);
36
+ }
37
+ }
38
+
39
+ public static long asLong(String value, TimestampParser parser) throws DataException
40
+ {
41
+ try {
42
+ Timestamp timestamp = parser.parse(value);
43
+ return timestamp.getEpochSecond();
44
+ }
45
+ catch (TimestampParseException ex) {
46
+ throw new DataException(buildErrorMessage(value), ex);
47
+ }
48
+ }
49
+ public static double asDouble(String value, TimestampParser parser) throws DataException
50
+ {
51
+ try {
52
+ Timestamp timestamp = parser.parse(value);
53
+ return TimestampCast.asDouble(timestamp);
54
+ }
55
+ catch (TimestampParseException ex) {
56
+ throw new DataException(buildErrorMessage(value), ex);
57
+ }
58
+ }
59
+ }
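`StringCast.asString` parses with the configured parser and formats with the configured formatter. It rethrows parse failures as `DataException`, with the input quoted in the message. The sketch below follows the same shape, using `java.time` as a stand-in for the plugin's JRuby-based `TimestampParser` and `TimestampFormatter`. The input and output patterns are hypothetical examples, not the plugin's defaults, and `IllegalArgumentException` replaces `DataException`.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Sketch of StringCast's parse-then-convert shape; java.time stands in for the plugin's parser/formatter.
public class StringCastSketch
{
    // Hypothetical input format: matches the literal "UTC" suffix, wall-clock time read as UTC.
    static final DateTimeFormatter PARSER = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss 'UTC'");
    // Hypothetical output format: ISO-like, offset printed as "Z" for UTC.
    static final DateTimeFormatter FORMATTER =
        DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ssXXX").withZone(ZoneOffset.UTC);

    static Instant parse(String value)
    {
        try {
            return LocalDateTime.parse(value, PARSER).toInstant(ZoneOffset.UTC);
        }
        catch (DateTimeParseException ex) {
            // Same message shape as StringCast.buildErrorMessage; the plugin throws DataException.
            throw new IllegalArgumentException(String.format("failed to parse string: \"%s\"", value), ex);
        }
    }

    static String asString(String value)
    {
        return FORMATTER.format(parse(value));
    }

    static long asLong(String value)
    {
        return parse(value).getEpochSecond();
    }

    public static void main(String[] args)
    {
        System.out.println(asString("2015-07-12 15:00:00 UTC")); // 2015-07-12T15:00:00Z
        System.out.println(asLong("2015-07-12 15:00:00 UTC"));   // 1436713200
    }
}
```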
@@ -0,0 +1,32 @@
1
+ package org.embulk.filter.timestamp_format.cast;
2
+
3
+ import org.embulk.filter.timestamp_format.TimestampFormatter;
4
+ import org.embulk.spi.DataException;
5
+ import org.embulk.spi.time.Timestamp;
6
+
7
+ public class TimestampCast
8
+ {
9
+ private TimestampCast() {}
10
+
11
+ public static String asString(Timestamp value, TimestampFormatter formatter) throws DataException
12
+ {
13
+ return formatter.format(value);
14
+ }
15
+
16
+ public static Timestamp asTimestamp(Timestamp value) throws DataException
17
+ {
18
+ return value;
19
+ }
20
+
21
+ public static long asLong(Timestamp value) throws DataException
22
+ {
23
+ return value.getEpochSecond();
24
+ }
25
+
26
+ public static double asDouble(Timestamp value) throws DataException
27
+ {
28
+ long epoch = value.getEpochSecond();
29
+ int nano = value.getNano();
30
+ return (double) epoch + ((double) nano / 1000000000.0);
31
+ }
32
+ }
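`TimestampCast.asLong` keeps only whole epoch seconds. `asDouble` adds the nanosecond part as a fraction, computing epoch + nano / 10^9. The following minimal sketch of that arithmetic assumes `java.time.Instant` as a stand-in for Embulk's `Timestamp`, since both expose `getEpochSecond()` and `getNano()`. The class name is hypothetical.

```java
import java.time.Instant;

// Hypothetical sketch of TimestampCast.asLong / asDouble, with Instant in place of Embulk's Timestamp.
public class EpochCastSketch
{
    // Whole seconds only; the fractional part is dropped.
    static long asLong(Instant t)
    {
        return t.getEpochSecond();
    }

    // Seconds plus nanoseconds expressed as a fraction of a second.
    static double asDouble(Instant t)
    {
        return (double) t.getEpochSecond() + ((double) t.getNano() / 1_000_000_000.0);
    }

    public static void main(String[] args)
    {
        // 2015-07-12 15:00:00.1 UTC
        Instant t = Instant.ofEpochSecond(1436713200L, 100_000_000L);
        System.out.println(asLong(t)); // 1436713200
        System.out.println(asDouble(t));
    }
}
```

Near 1.4e9 seconds, adjacent doubles are about 2.4e-7 apart. `asDouble` therefore keeps sub-microsecond precision but cannot represent every nanosecond value exactly.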
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: embulk-filter-timestamp_format
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.4
4
+ version: 0.1.5
5
5
  platform: ruby
6
6
  authors:
7
7
  - Naotoshi Seo
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2016-04-26 00:00:00.000000000 Z
11
+ date: 2016-04-28 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -38,20 +38,6 @@ dependencies:
38
38
  version: '10.0'
39
39
  prerelease: false
40
40
  type: :development
41
- - !ruby/object:Gem::Dependency
42
- name: embulk-parser-jsonl
43
- version_requirements: !ruby/object:Gem::Requirement
44
- requirements:
45
- - - '>='
46
- - !ruby/object:Gem::Version
47
- version: '0'
48
- requirement: !ruby/object:Gem::Requirement
49
- requirements:
50
- - - '>='
51
- - !ruby/object:Gem::Version
52
- version: '0'
53
- prerelease: false
54
- type: :development
55
41
  description: A filter plugin for Embulk to change timestamp format.
56
42
  email:
57
43
  - sonots@gmail.com
@@ -66,19 +52,27 @@ files:
66
52
  - README.md
67
53
  - build.gradle
68
54
  - config/checkstyle/checkstyle.xml
69
- - example/example.jsonl
70
55
  - example/example.yml
56
+ - example/json_example.jsonl
57
+ - example/json_example.yml
58
+ - example/string_example.yml
59
+ - example/timestamp_example.yml
71
60
  - gradle/wrapper/gradle-wrapper.jar
72
61
  - gradle/wrapper/gradle-wrapper.properties
73
62
  - gradlew
74
63
  - gradlew.bat
75
64
  - lib/embulk/filter/timestamp_format.rb
65
+ - src/main/java/org/embulk/filter/timestamp_format/ColumnCaster.java
76
66
  - src/main/java/org/embulk/filter/timestamp_format/ColumnVisitorImpl.java
67
+ - src/main/java/org/embulk/filter/timestamp_format/JsonCaster.java
68
+ - src/main/java/org/embulk/filter/timestamp_format/JsonVisitor.java
77
69
  - src/main/java/org/embulk/filter/timestamp_format/TimestampFormatFilterPlugin.java
78
70
  - src/main/java/org/embulk/filter/timestamp_format/TimestampFormatter.java
79
71
  - src/main/java/org/embulk/filter/timestamp_format/TimestampParser.java
72
+ - src/main/java/org/embulk/filter/timestamp_format/cast/StringCast.java
73
+ - src/main/java/org/embulk/filter/timestamp_format/cast/TimestampCast.java
80
74
  - src/test/java/org/embulk/filter/TestTimestampFormatFilterPlugin.java
81
- - classpath/embulk-filter-timestamp_format-0.1.4.jar
75
+ - classpath/embulk-filter-timestamp_format-0.1.5.jar
82
76
  homepage: https://github.com/sonots/embulk-filter-timestamp_format
83
77
  licenses:
84
78
  - MIT
@@ -1,2 +0,0 @@
1
- {"timestamp":"2015-07-12 15:00:00 UTC","record":{"record":[{"timestamp":"2015-07-12 15:00:00 UTC"}]},"ignore_record":{"timestamp":"2015-07-12 15:00:00 UTC"}}
2
- {"timestamp":"2015-07-12 15:00:00.1 UTC","record":{"record":[{"timestamp":"2015-07-12 15:00:00.1 UTC"}]},"ignore_record":{"timestamp":"2015-07-12 15:00:00.1 UTC"}}