embulk-input-mongodb 0.2.0 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: eca1bf1b21b4a44bab520a7cb93f59df2c18ea82
4
- data.tar.gz: 98f866d2313fe81dbf96da2fe61b9a303afcd698
3
+ metadata.gz: fafb06875aa38ec6b9142251fd9b57b15f56dc17
4
+ data.tar.gz: 6aa68d44e22d4b49e9c5918095dc1fd997a11f58
5
5
  SHA512:
6
- metadata.gz: 6251028e2ec5dc41523cbc43d3a06f57cde1a7b05a9a21a8098d169e6c3d0e05af74d12e325847fcbb74328f8c9cf98eb8e6ed167091cb2215bfbf2341fdb5ff
7
- data.tar.gz: 7a6c238d53ab10733917a05a170a3d06aacb8cbbd8a631c87d86d5fa2a6a198e887df27b61bfebe10655eeefe48293bf0a174417435aed4926285d6ef5ba581f
6
+ metadata.gz: 51b725f4f59acc26978861e96ce4c9388dc53fdd6ff9cb5edb959984a3fd0e2622731d4941940ece4370f5a1b2094b4bb087fd354ea086ba161d685ac170140c
7
+ data.tar.gz: 1ec52180d29002a95c024a878e257ba399caee04e06dba3e7fdb74f9d130e7e207abca5744be7620f958dbd81ff74ea59be53011216fe3809ceb9539ab9d0128
data/README.md CHANGED
@@ -3,61 +3,91 @@
3
3
  [![Build Status](https://travis-ci.org/hakobera/embulk-input-mongodb.svg)](https://travis-ci.org/hakobera/embulk-input-mongodb)
4
4
 
5
5
  MongoDB input plugin for Embulk loads records from MongoDB.
6
+ This plugin loads documents as single-column records (column name is "record"). You can use filter plugins such as [embulk-filter-expand_json](https://github.com/civitaspo/embulk-filter-expand_json) or [embulk-filter-add_time](https://github.com/treasure-data/embulk-filter-add_time) to convert the json column to typed columns. [Rename filter](http://www.embulk.org/docs/built-in.html#rename-filter-plugin) is also useful to rename the typed columns.
6
7
 
7
8
  ## Overview
8
9
 
9
10
  This plugin only works with embulk >= 0.8.8.
10
11
 
11
12
  * **Plugin type**: input
12
- * **Resume supported**: no
13
- * **Cleanup supported**: no
14
13
  * **Guess supported**: no
15
14
 
16
15
  ## Configuration
17
16
 
18
17
  - **uri**: [MongoDB connection string URI](http://docs.mongodb.org/manual/reference/connection-string/) (e.g. 'mongodb://localhost:27017/mydb') (string, required)
19
18
  - **collection**: source collection name (string, required)
20
- - **fields**: hash records that has the following two fields (array, required)
21
- - name: Name of the column
22
- - type: Column types as follows
23
- - boolean
24
- - long
25
- - double
26
- - string
27
- - timestamp
28
- - **query**: provides a JSON document as a query that optionally limits the documents returned (string, optional)
29
- - **sort**: specifies an ordering for exported results (string, optional)
19
+ - **fields**: **(deprecated)** ~~hash records that has the following two fields (array, required)~~
20
+ ~~- name: Name of the column~~
21
+ ~~- type: Column types as follows~~
22
+ ~~- boolean~~
23
+ ~~- long~~
24
+ ~~- double~~
25
+ ~~- string~~
26
+ ~~- timestamp~~
27
+ - **query**: a JSON document used for [querying](https://docs.mongodb.com/manual/tutorial/query-documents/) the source collection. Documents are loaded from the collection only if they match this condition. (string, optional)
28
+ - **projection**: A JSON document used for [projection](https://docs.mongodb.com/manual/reference/operator/projection/positional/) on query results. Fields in a document are used only if they match with this condition. (string, optional)
29
+ - **sort**: ordering of results (string, optional)
30
+ - **stop_on_invalid_record**: stop the bulk load transaction if a document includes an invalid record (such as an unsupported object type) (boolean, optional, default: false)
31
+ - **json_column_name**: column name used in outputs (string, optional, default: "record")
30
32
 
31
33
  ## Example
32
34
 
33
- ### Export all objects
35
+ ### Exporting all objects
34
36
 
35
37
  ```yaml
36
38
  in:
37
39
  type: mongodb
38
40
  uri: mongodb://myuser:mypassword@localhost:27017/my_database
39
41
  collection: "my_collection"
40
- fields:
41
- - { name: id, type: string }
42
- - { name: field1, type: long }
43
- - { name: field2, type: timestamp }
44
- - { name: field3, type: json }
45
42
  ```
46
43
 
47
- ### Filter object by query and sort
44
+ ### Filtering documents by query and projection
48
45
 
49
46
  ```yaml
50
47
  in:
51
48
  type: mongodb
52
49
  uri: mongodb://myuser:mypassword@localhost:27017/my_database
53
50
  collection: "my_collection"
54
- fields:
55
- - { name: id, type: string }
56
- - { name: field1, type: long }
57
- - { name: field2, type: timestamp }
58
- - { name: field3, type: json }
59
51
  query: '{ field1: { $gte: 3 } }'
60
- sort: '{ field1: 1 }'
52
+ projection: '{ "_id": 1, "field1": 1, "field2": 0 }'
53
+ sort: '{ "field1": 1 }'
54
+ ```
55
+
56
+ ### Advanced usage with filter plugins
57
+
58
+ ```yaml
59
+ in:
60
+ type: mongodb
61
+ uri: mongodb://myuser:mypassword@localhost:27017/my_database
62
+ collection: "my_collection"
63
+ query: '{ "age": { $gte: 3 } }'
64
+ projection: '{ "_id": 1, "age": 1, "ts": 1, "firstName": 1, "lastName": 1 }'
65
+
66
+ filters:
67
+ # convert json column into typed columns
68
+ - type: expand_json
69
+ json_column_name: record
70
+ expanded_columns:
71
+ - {name: _id, type: long}
72
+ - {name: ts, type: string}
73
+ - {name: firstName, type: string}
74
+ - {name: lastName, type: string}
75
+
76
+ # rename column names
77
+ - type: rename
78
+ columns:
79
+ _id: id
80
+ firstName: first_name
81
+ lastName: last_name
82
+
83
+ # convert string "ts" column into timestamp "time" column
84
+ - type: add_time
85
+ from_column:
86
+ name: ts
87
+ timestamp_format: "%Y-%m-%dT%H:%M:%S.%N%z"
88
+ to_column:
89
+ name: time
90
+ type: timestamp
61
91
  ```
62
92
 
63
93
  ## Build
@@ -65,3 +95,47 @@ in:
65
95
  ```
66
96
  $ ./gradlew gem
67
97
  ```
98
+
99
+ ## Test
100
+
101
+ ```
102
+ $ ./gradlew test # -t to watch change of files and rebuild continuously
103
+ ```
104
+
105
+ To run unit tests, we need to configure the following environment variables.
106
+
107
+ When these environment variables are not set, almost all test cases are skipped.
108
+
109
+ ```
110
+ MONGO_URI
111
+ MONGO_COLLECTION
112
+ ```
113
+
114
+ If you run the tests from a GUI application (such as an IDE) on Mac OS X El Capitan, set the variables as follows.
115
+ ```xml
116
+ $ vi ~/Library/LaunchAgents/environment.plist
117
+ <?xml version="1.0" encoding="UTF-8"?>
118
+ <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
119
+ <plist version="1.0">
120
+ <dict>
121
+ <key>Label</key>
122
+ <string>my.startup</string>
123
+ <key>ProgramArguments</key>
124
+ <array>
125
+ <string>sh</string>
126
+ <string>-c</string>
127
+ <string>
128
+ launchctl setenv MONGO_URI mongodb://myuser:mypassword@localhost:27017/my_database
129
+ launchctl setenv MONGO_COLLECTION my_collection
130
+ </string>
131
+ </array>
132
+ <key>RunAtLoad</key>
133
+ <true/>
134
+ </dict>
135
+ </plist>
136
+
137
+ $ launchctl load ~/Library/LaunchAgents/environment.plist
138
+ $ launchctl getenv MONGO_URI //try to get value.
139
+
140
+ Then start your applications.
141
+ ```
data/build.gradle CHANGED
@@ -2,6 +2,7 @@ plugins {
2
2
  id "com.jfrog.bintray" version "1.1"
3
3
  id "com.github.jruby-gradle.base" version "0.1.5"
4
4
  id "java"
5
+ id "jacoco"
5
6
  }
6
7
  import com.github.jrubygradle.JRubyExec
7
8
  repositories {
@@ -15,7 +16,7 @@ configurations {
15
16
  provided
16
17
  }
17
18
 
18
- version = "0.2.0"
19
+ version = "0.3.0"
19
20
 
20
21
  sourceCompatibility = 1.7
21
22
  targetCompatibility = 1.7
@@ -26,6 +27,8 @@ dependencies {
26
27
  compile "org.mongodb:mongo-java-driver:3.2.2"
27
28
 
28
29
  testCompile "junit:junit:4.+"
30
+ testCompile "org.embulk:embulk-core:0.8.8:tests"
31
+ testCompile "org.embulk:embulk-standards:0.8.8"
29
32
  }
30
33
 
31
34
  task classpath(type: Copy, dependsOn: ["jar"]) {
@@ -1,5 +1,6 @@
1
1
  package org.embulk.input.mongodb;
2
2
 
3
+ import com.google.common.base.Optional;
3
4
  import com.google.common.base.Throwables;
4
5
  import com.mongodb.MongoClient;
5
6
  import com.mongodb.MongoClientURI;
@@ -8,7 +9,9 @@ import com.mongodb.client.MongoCollection;
8
9
  import com.mongodb.client.MongoCursor;
9
10
  import com.mongodb.client.MongoDatabase;
10
11
  import com.mongodb.util.JSON;
11
- import org.bson.Document;
12
+ import com.mongodb.util.JSONParseException;
13
+ import org.bson.codecs.configuration.CodecRegistries;
14
+ import org.bson.codecs.configuration.CodecRegistry;
12
15
  import org.bson.conversions.Bson;
13
16
  import org.embulk.config.Config;
14
17
  import org.embulk.config.ConfigDefault;
@@ -21,18 +24,15 @@ import org.embulk.config.TaskReport;
21
24
  import org.embulk.config.TaskSource;
22
25
  import org.embulk.spi.BufferAllocator;
23
26
  import org.embulk.spi.Column;
24
- import org.embulk.spi.ColumnConfig;
25
27
  import org.embulk.spi.Exec;
26
28
  import org.embulk.spi.InputPlugin;
27
29
  import org.embulk.spi.PageBuilder;
28
30
  import org.embulk.spi.PageOutput;
29
31
  import org.embulk.spi.Schema;
30
32
  import org.embulk.spi.SchemaConfig;
31
- import org.embulk.spi.json.JsonParser;
32
- import org.embulk.spi.time.Timestamp;
33
- import org.embulk.spi.type.Type;
33
+ import org.embulk.spi.type.Types;
34
+ import org.msgpack.value.Value;
34
35
  import org.slf4j.Logger;
35
-
36
36
  import javax.validation.constraints.Min;
37
37
  import java.net.UnknownHostException;
38
38
  import java.util.List;
@@ -51,7 +51,12 @@ public class MongodbInputPlugin
51
51
  String getCollection();
52
52
 
53
53
  @Config("fields")
54
- SchemaConfig getFields();
54
+ @ConfigDefault("null")
55
+ Optional<SchemaConfig> getFields();
56
+
57
+ @Config("projection")
58
+ @ConfigDefault("\"{}\"")
59
+ String getProjection();
55
60
 
56
61
  @Config("query")
57
62
  @ConfigDefault("\"{}\"")
@@ -66,6 +71,14 @@ public class MongodbInputPlugin
66
71
  @Min(1)
67
72
  int getBatchSize();
68
73
 
74
+ @Config("stop_on_invalid_record")
75
+ @ConfigDefault("false")
76
+ boolean getStopOnInvalidRecord();
77
+
78
+ @Config("json_column_name")
79
+ @ConfigDefault("\"record\"")
80
+ String getJsonColumnName();
81
+
69
82
  @ConfigInject
70
83
  BufferAllocator getBufferAllocator();
71
84
  }
@@ -77,13 +90,21 @@ public class MongodbInputPlugin
77
90
  InputPlugin.Control control)
78
91
  {
79
92
  PluginTask task = config.loadConfig(PluginTask.class);
93
+ if (task.getFields().isPresent()) {
94
+ throw new ConfigException("field option was deprecated so setting will be ignored");
95
+ }
96
+
97
+ validateJsonField("projection", task.getProjection());
98
+ validateJsonField("query", task.getQuery());
99
+ validateJsonField("sort", task.getSort());
100
+
80
101
  // Connect once to throw ConfigException in earlier stage of execution
81
102
  try {
82
103
  connect(task);
83
104
  } catch (UnknownHostException | MongoException ex) {
84
105
  throw new ConfigException(ex);
85
106
  }
86
- Schema schema = task.getFields().toSchema();
107
+ Schema schema = Schema.builder().add(task.getJsonColumnName(), Types.JSON).build();
87
108
  return resume(task.dump(), schema, 1, control);
88
109
  }
89
110
 
@@ -112,33 +133,39 @@ public class MongodbInputPlugin
112
133
  PluginTask task = taskSource.loadTask(PluginTask.class);
113
134
  BufferAllocator allocator = task.getBufferAllocator();
114
135
  PageBuilder pageBuilder = new PageBuilder(allocator, schema, output);
115
- JsonParser jsonParser = new JsonParser();
116
- List<Column> columns = pageBuilder.getSchema().getColumns();
136
+ final Column column = pageBuilder.getSchema().getColumns().get(0);
117
137
 
118
- MongoCollection<Document> collection;
138
+ MongoCollection<Value> collection;
119
139
  try {
120
140
  MongoDatabase db = connect(task);
121
- collection = db.getCollection(task.getCollection());
141
+
142
+ CodecRegistry registry = CodecRegistries.fromRegistries(
143
+ MongoClient.getDefaultCodecRegistry(),
144
+ CodecRegistries.fromCodecs(new ValueCodec(task.getStopOnInvalidRecord()))
145
+ );
146
+ collection = db.getCollection(task.getCollection(), Value.class)
147
+ .withCodecRegistry(registry);
122
148
  } catch (UnknownHostException | MongoException ex) {
123
149
  throw new ConfigException(ex);
124
150
  }
125
151
 
126
152
  Bson query = (Bson) JSON.parse(task.getQuery());
127
- Bson projection = getProjection(task);
153
+ Bson projection = (Bson) JSON.parse(task.getProjection());
128
154
  Bson sort = (Bson) JSON.parse(task.getSort());
129
155
 
130
156
  log.trace("query: {}", query);
131
157
  log.trace("projection: {}", projection);
132
158
  log.trace("sort: {}", sort);
133
159
 
134
- try (MongoCursor<Document> cursor = collection
160
+ try (MongoCursor<Value> cursor = collection
135
161
  .find(query)
136
162
  .projection(projection)
137
163
  .sort(sort)
138
164
  .batchSize(task.getBatchSize())
139
165
  .iterator()) {
140
166
  while (cursor.hasNext()) {
141
- fetch(cursor, pageBuilder, jsonParser, columns);
167
+ pageBuilder.setJson(column, cursor.next());
168
+ pageBuilder.addRecord();
142
169
  }
143
170
  } catch (MongoException ex) {
144
171
  Throwables.propagate(ex);
@@ -165,73 +192,11 @@ public class MongodbInputPlugin
165
192
  return db;
166
193
  }
167
194
 
168
- private void fetch(MongoCursor<Document> cursor, PageBuilder pageBuilder,
169
- JsonParser jsonParser, List<Column> columns) {
170
- Document doc = cursor.next();
171
- for (Column c : columns) {
172
- Type t = c.getType();
173
- String key = normalize(c.getName());
174
-
175
- if (!doc.containsKey(key) || doc.get(key) == null) {
176
- pageBuilder.setNull(c);
177
- } else {
178
- switch (t.getName()) {
179
- case "boolean":
180
- pageBuilder.setBoolean(c, doc.getBoolean(key));
181
- break;
182
-
183
- case "long":
184
- // MongoDB can contain both 'int' and 'long', but embulk only support 'long'
185
- // So enable handling both 'int' and 'long', first get value as java.lang.Number, then convert it to long
186
- pageBuilder.setLong(c, ((Number) doc.get(key)).longValue());
187
- break;
188
-
189
- case "double":
190
- pageBuilder.setDouble(c, ((Number) doc.get(key)).doubleValue());
191
- break;
192
-
193
- case "string":
194
- // Enable output object like ObjectId as string, this is reason I don't use doc.getString(key).
195
- pageBuilder.setString(c, doc.get(key).toString());
196
- break;
197
-
198
- case "timestamp":
199
- pageBuilder.setTimestamp(c, Timestamp.ofEpochMilli(doc.getDate(key).getTime()));
200
- break;
201
-
202
- case "json":
203
- pageBuilder.setJson(c, jsonParser.parse(((Document) doc.get(key)).toJson()));
204
- break;
205
- }
206
- }
207
- }
208
- pageBuilder.addRecord();
209
- }
210
-
211
- private Bson getProjection(PluginTask task) {
212
- SchemaConfig fields = task.getFields();
213
- StringBuilder sb = new StringBuilder("{");
214
- int l = fields.getColumnCount();
215
-
216
- for (int i = 0; i < l; i++) {
217
- ColumnConfig c = fields.getColumn(i);
218
- if (i != 0) {
219
- sb.append(",");
220
- }
221
- String key = normalize(c.getName());
222
- sb.append(key).append(":1");
223
- }
224
- sb.append("}");
225
-
226
- return (Bson) JSON.parse(sb.toString());
227
- }
228
-
229
- private String normalize(String key) {
230
- // 'id' is special alias key name of MongoDB ObjectId
231
- // http://docs.mongodb.org/manual/reference/object-id/
232
- if (key.equals("id")) {
233
- return "_id";
195
+ private void validateJsonField(String name, String jsonString) {
196
+ try {
197
+ JSON.parse(jsonString);
198
+ } catch (JSONParseException ex) {
199
+ throw new ConfigException(String.format("Invalid JSON string was given for '%s' parameter. [%s]", name, jsonString));
234
200
  }
235
- return key;
236
201
  }
237
202
  }
@@ -0,0 +1,151 @@
1
+ package org.embulk.input.mongodb;
2
+
3
+ import org.bson.BsonReader;
4
+ import org.bson.BsonType;
5
+ import org.bson.BsonWriter;
6
+ import org.bson.codecs.Codec;
7
+ import org.bson.codecs.DecoderContext;
8
+ import org.bson.codecs.EncoderContext;
9
+ import org.embulk.spi.DataException;
10
+ import org.embulk.spi.Exec;
11
+ import org.msgpack.value.Value;
12
+ import org.slf4j.Logger;
13
+
14
+ import static org.msgpack.value.ValueFactory.newArray;
15
+ import static org.msgpack.value.ValueFactory.newBinary;
16
+ import static org.msgpack.value.ValueFactory.newBoolean;
17
+ import static org.msgpack.value.ValueFactory.newFloat;
18
+ import static org.msgpack.value.ValueFactory.newInteger;
19
+ import static org.msgpack.value.ValueFactory.newMap;
20
+ import static org.msgpack.value.ValueFactory.newNil;
21
+ import static org.msgpack.value.ValueFactory.newString;
22
+
23
+ import java.text.SimpleDateFormat;
24
+ import java.util.ArrayList;
25
+ import java.util.Date;
26
+ import java.util.LinkedHashMap;
27
+ import java.util.List;
28
+ import java.util.Map;
29
+ import java.util.TimeZone;
30
+
31
+ public class ValueCodec implements Codec<Value> {
32
+ private final SimpleDateFormat formatter;
33
+ private final Logger log = Exec.getLogger(MongodbInputPlugin.class);
34
+ private final boolean stopOnInvalidRecord;
35
+
36
+ public ValueCodec(boolean stopOnInvalidRecord) {
37
+ this.formatter = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", java.util.Locale.ENGLISH);
38
+ formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
39
+ this.stopOnInvalidRecord = stopOnInvalidRecord;
40
+ }
41
+
42
+ @Override
43
+ public void encode(final BsonWriter writer, final Value value, final EncoderContext encoderContext) {
44
+ throw new UnsupportedOperationException();
45
+ }
46
+
47
+ @Override
48
+ public Value decode(final BsonReader reader, final DecoderContext decoderContext) {
49
+ Map<Value, Value> kvs = new LinkedHashMap<>();
50
+
51
+ reader.readStartDocument();
52
+ boolean isTopLevelNode = false;
53
+ while (reader.readBsonType() != BsonType.END_OF_DOCUMENT) {
54
+ String fieldName = reader.readName();
55
+ BsonType type = reader.getCurrentBsonType();
56
+ if (type == BsonType.OBJECT_ID) {
57
+ isTopLevelNode = true;
58
+ }
59
+ fieldName = normalize(fieldName, isTopLevelNode);
60
+
61
+ try {
62
+ kvs.put(newString(fieldName), readValue(reader, decoderContext));
63
+ } catch (UnknownTypeFoundException ex) {
64
+ reader.skipValue();
65
+ if (stopOnInvalidRecord) {
66
+ throw ex;
67
+ }
68
+ log.warn(String.format("Skipped document because field '%s' contains unsupported object type [%s]",
69
+ fieldName, type));
70
+ }
71
+ }
72
+ reader.readEndDocument();
73
+
74
+ return newMap(kvs);
75
+ }
76
+
77
+ public Value decodeArray(final BsonReader reader, final DecoderContext decoderContext) {
78
+ List<Value> list = new ArrayList<>();
79
+
80
+ reader.readStartArray();
81
+ while (reader.readBsonType() != BsonType.END_OF_DOCUMENT) {
82
+ list.add(readValue(reader, decoderContext));
83
+ }
84
+ reader.readEndArray();
85
+
86
+ return newArray(list);
87
+ }
88
+
89
+ private Value readValue(BsonReader reader, DecoderContext decoderContext) {
90
+ switch (reader.getCurrentBsonType()) {
91
+ // https://docs.mongodb.com/manual/reference/bson-types/
92
+ // https://github.com/mongodb/mongo-java-driver/tree/master/bson/src/main/org/bson/codecs
93
+ case DOUBLE:
94
+ return newFloat(reader.readDouble());
95
+ case STRING:
96
+ return newString(reader.readString());
97
+ case ARRAY:
98
+ return decodeArray(reader, decoderContext);
99
+ case BINARY:
100
+ return newBinary(reader.readBinaryData().getData(), true);
101
+ case OBJECT_ID:
102
+ return newString(reader.readObjectId().toString());
103
+ case BOOLEAN:
104
+ return newBoolean(reader.readBoolean());
105
+ case DATE_TIME:
106
+ return newString(formatter.format(new Date(reader.readDateTime())));
107
+ case NULL:
108
+ reader.readNull();
109
+ return newNil();
110
+ case REGULAR_EXPRESSION:
111
+ return newString(reader.readRegularExpression().toString());
112
+ case JAVASCRIPT:
113
+ return newString(reader.readJavaScript());
114
+ case JAVASCRIPT_WITH_SCOPE:
115
+ return newString(reader.readJavaScriptWithScope());
116
+ case INT32:
117
+ return newInteger(reader.readInt32());
118
+ case TIMESTAMP:
119
+ return newInteger(reader.readTimestamp().getTime());
120
+ case INT64:
121
+ return newInteger(reader.readInt64());
122
+ case DOCUMENT:
123
+ return decode(reader, decoderContext);
124
+ default: // e.g. MIN_KEY, MAX_KEY, SYMBOL, DB_POINTER, UNDEFINED
125
+ throw new UnknownTypeFoundException(String.format("Unsupported type %s of '%s' field. Please exclude the field from 'projection:' option",
126
+ reader.getCurrentBsonType(), reader.getCurrentName()));
127
+ }
128
+ }
129
+
130
+ @Override
131
+ public Class<Value> getEncoderClass() {
132
+ return Value.class;
133
+ }
134
+
135
+ private String normalize(String key, boolean isTopLevelNode) {
136
+ // 'id' is special alias key name of MongoDB ObjectId
137
+ // http://docs.mongodb.org/manual/reference/object-id/
138
+ if (key.equals("id") && isTopLevelNode) {
139
+ return "_id";
140
+ }
141
+ return key;
142
+ }
143
+
144
+ public static class UnknownTypeFoundException extends DataException
145
+ {
146
+ UnknownTypeFoundException(String message)
147
+ {
148
+ super(message);
149
+ }
150
+ }
151
+ }
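The `DATE_TIME` branch of `ValueCodec` above turns BSON dates into strings with a fixed pattern in UTC. The following is a minimal, standalone sketch of that conversion using only the JDK. The class name `UtcDateSketch` and the helper `parseInZone` are hypothetical and not part of the plugin. The sketch also illustrates an observation rather than a documented fact: the expected values in the test file (for example `2015-01-27T10:23:49.000Z` for an input parsed from `2015-01-27T19:23:49Z`) are consistent with the test's `SimpleDateFormat` parsing in a JVM default time zone of UTC+9, because the `'Z'` in its pattern is a quoted literal and not a zone designator.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical standalone sketch; not part of embulk-input-mongodb.
final class UtcDateSketch {
    private UtcDateSketch() {}

    // Formats epoch milliseconds with the same pattern and zone that
    // ValueCodec's constructor configures for its formatter.
    static String formatUtc(long epochMillis) {
        SimpleDateFormat formatter =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", Locale.ENGLISH);
        formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
        return formatter.format(new Date(epochMillis));
    }

    // Parses text with an explicit zone instead of the JVM default.
    // The quoted 'Z' is matched literally and carries no zone information.
    static long parseInZone(String text, String zoneId) {
        SimpleDateFormat parser =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ENGLISH);
        parser.setTimeZone(TimeZone.getTimeZone(zoneId));
        try {
            return parser.parse(text).getTime();
        } catch (ParseException ex) {
            throw new IllegalArgumentException("Unparseable timestamp: " + text, ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(formatUtc(0L));
        // Assumption: the zone is chosen here only to mirror the offset
        // implied by the expected values in the test file.
        long millis = parseInZone("2015-01-27T19:23:49Z", "Asia/Tokyo");
        System.out.println(formatUtc(millis));
    }
}
```

If the observation holds, pinning the parser's zone in the test fixture (as `parseInZone` does) would make the expected strings independent of the machine's default time zone.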
@@ -1,5 +1,339 @@
1
1
  package org.embulk.input.mongodb;
2
2
 
3
- public class TestMongodbInputPlugin
4
- {
3
+ import com.fasterxml.jackson.databind.JsonNode;
4
+ import com.fasterxml.jackson.databind.ObjectMapper;
5
+ import com.google.common.collect.ImmutableList;
6
+ import com.google.common.collect.Lists;
7
+ import com.mongodb.client.MongoCollection;
8
+ import com.mongodb.client.MongoDatabase;
9
+ import org.bson.BsonBinary;
10
+ import org.bson.BsonInt64;
11
+ import org.bson.BsonJavaScript;
12
+ import org.bson.BsonMaxKey;
13
+ import org.bson.BsonRegularExpression;
14
+ import org.bson.BsonTimestamp;
15
+ import org.bson.Document;
16
+ import org.embulk.EmbulkTestRuntime;
17
+ import org.embulk.config.ConfigException;
18
+ import org.embulk.config.ConfigSource;
19
+ import org.embulk.config.TaskReport;
20
+ import org.embulk.config.TaskSource;
21
+ import org.embulk.input.mongodb.MongodbInputPlugin.PluginTask;
22
+ import org.embulk.spi.Column;
23
+ import org.embulk.spi.Exec;
24
+ import org.embulk.spi.InputPlugin;
25
+ import org.embulk.spi.Schema;
26
+ import org.embulk.spi.TestPageBuilderReader.MockPageOutput;
27
+ import org.embulk.spi.type.Types;
28
+ import org.embulk.spi.util.Pages;
29
+ import org.junit.Before;
30
+ import org.junit.BeforeClass;
31
+ import org.junit.Rule;
32
+ import org.junit.Test;
33
+ import org.junit.rules.ExpectedException;
34
+
35
+ import java.lang.reflect.InvocationTargetException;
36
+ import java.lang.reflect.Method;
37
+ import java.text.SimpleDateFormat;
38
+ import java.util.ArrayList;
39
+ import java.util.Arrays;
40
+ import java.util.List;
41
+ import java.util.Locale;
42
+
43
+ import static org.hamcrest.CoreMatchers.is;
44
+ import static org.junit.Assert.assertEquals;
45
+ import static org.junit.Assert.assertThat;
46
+
47
+ public class TestMongodbInputPlugin {
48
+ private static String MONGO_URI;
49
+ private static String MONGO_COLLECTION;
50
+
51
+ @Rule
52
+ public EmbulkTestRuntime runtime = new EmbulkTestRuntime();
53
+
54
+ @Rule
55
+ public ExpectedException exception = ExpectedException.none();
56
+
57
+ private ConfigSource config;
58
+ private MongodbInputPlugin plugin;
59
+ private MockPageOutput output;
60
+
61
+ /*
62
+ * This test case requires environment variables
63
+ * MONGO_URI
64
+ * MONGO_COLLECTION
65
+ */
66
+ @BeforeClass
67
+ public static void initializeConstant() {
68
+ MONGO_URI = System.getenv("MONGO_URI");
69
+ MONGO_COLLECTION = System.getenv("MONGO_COLLECTION");
70
+ }
71
+
72
+ @Before
73
+ public void createResources() throws Exception {
74
+ config = config();
75
+ plugin = new MongodbInputPlugin();
76
+ output = new MockPageOutput();
77
+ }
78
+
79
+ @Test
80
+ public void checkDefaultValues() {
81
+ ConfigSource config = Exec.newConfigSource()
82
+ .set("uri", MONGO_URI)
83
+ .set("collection", MONGO_COLLECTION);
84
+
85
+ PluginTask task = config.loadConfig(PluginTask.class);
86
+ assertEquals("{}", task.getQuery());
87
+ assertEquals("{}", task.getSort());
88
+ assertEquals((long) 10000, (long) task.getBatchSize());
89
+ assertEquals("record", task.getJsonColumnName());
90
+ }
91
+
92
+ @Test(expected = ConfigException.class)
93
+ public void checkDefaultValuesUriIsNull() {
94
+ ConfigSource config = Exec.newConfigSource()
95
+ .set("uri", null)
96
+ .set("collection", MONGO_COLLECTION);
97
+
98
+ plugin.transaction(config, new Control());
99
+ }
100
+
101
+ @Test(expected = ConfigException.class)
102
+ public void checkDefaultValuesInvalidUri()
103
+ {
104
+ ConfigSource config = Exec.newConfigSource()
105
+ .set("uri", "mongodb://mongouser:password@non-exists.example.com:23490/test")
106
+ .set("collection", MONGO_COLLECTION);
107
+
108
+ plugin.transaction(config, new Control());
109
+ }
110
+
111
+ @Test(expected = ConfigException.class)
112
+ public void checkDefaultValuesCollectionIsNull() {
113
+ ConfigSource config = Exec.newConfigSource()
114
+ .set("uri", MONGO_URI)
115
+ .set("collection", null);
116
+
117
+ plugin.transaction(config, new Control());
118
+ }
119
+
120
+ @Test
121
+ public void testResume() {
122
+ PluginTask task = config.loadConfig(PluginTask.class);
123
+ final Schema schema = getFieldSchema();
124
+ plugin.resume(task.dump(), schema, 0, new InputPlugin.Control() {
125
+ @Override
126
+ public List<TaskReport> run(TaskSource taskSource, Schema schema, int taskCount) {
127
+ return emptyTaskReports(taskCount);
128
+ }
129
+ });
130
+ // no errors happens
131
+ }
132
+
133
+ @Test
134
+ public void testCleanup() {
135
+ PluginTask task = config.loadConfig(PluginTask.class);
136
+ Schema schema = getFieldSchema();
137
+ plugin.cleanup(task.dump(), schema, 0, Lists.<TaskReport>newArrayList()); // no errors happens
138
+ }
139
+
140
+ @Test
141
+ public void testGuess() {
142
+ plugin.guess(config); // no errors happens
143
+ }
144
+
145
+ @Test
146
+ public void testRun() throws Exception {
147
+ PluginTask task = config.loadConfig(PluginTask.class);
148
+
149
+ dropCollection(task, MONGO_COLLECTION);
150
+ createCollection(task, MONGO_COLLECTION);
151
+ insertDocument(task, createValidDocuments());
152
+
153
+ plugin.transaction(config, new Control());
154
+ assertValidRecords(getFieldSchema(), output);
155
+ }
156
+
157
+ @Test(expected = ValueCodec.UnknownTypeFoundException.class)
158
+ public void testRunWithUnsupportedType() throws Exception {
159
+ ConfigSource config = Exec.newConfigSource()
160
+ .set("uri", MONGO_URI)
161
+ .set("collection", MONGO_COLLECTION)
162
+ .set("stop_on_invalid_record", true);
163
+
164
+ PluginTask task = config.loadConfig(PluginTask.class);
165
+
166
+ dropCollection(task, MONGO_COLLECTION);
167
+ createCollection(task, MONGO_COLLECTION);
168
+
169
+ List<Document> documents = new ArrayList<>();
170
+ documents.add(
171
+ new Document("invalid_field", new BsonMaxKey())
172
+ );
173
+ insertDocument(task, documents);
174
+
175
+ plugin.transaction(config, new Control());
176
+ }
177
+
178
+ @Test
179
+ public void testNormalize() throws Exception {
180
+ ValueCodec codec = new ValueCodec(true);
181
+
182
+ Method normalize = ValueCodec.class.getDeclaredMethod("normalize", String.class, boolean.class);
183
+ normalize.setAccessible(true);
184
+ assertEquals("_id", normalize.invoke(codec, "id", true).toString());
185
+ assertEquals("_id", normalize.invoke(codec, "_id", true).toString());
186
+ assertEquals("f1", normalize.invoke(codec, "f1", true).toString());
187
+
188
+ assertEquals("id", normalize.invoke(codec, "id", false).toString());
189
+ assertEquals("_id", normalize.invoke(codec, "_id", false).toString());
190
+ assertEquals("f1", normalize.invoke(codec, "f1", false).toString());
191
+ }
192
+
193
+ @Test
194
+ public void testValidateJsonField() throws Exception {
195
+ Method validate = MongodbInputPlugin.class.getDeclaredMethod("validateJsonField", String.class, String.class);
196
+ validate.setAccessible(true);
197
+ String invalidJsonString = "{\"name\": invalid}";
198
+ try {
199
+ validate.invoke(plugin, "name", invalidJsonString);
200
+ } catch (InvocationTargetException ex) {
201
+ assertEquals(ConfigException.class, ex.getCause().getClass());
202
+ }
203
+ }
204
+
205
+ static List<TaskReport> emptyTaskReports(int taskCount) {
206
+ ImmutableList.Builder<TaskReport> reports = new ImmutableList.Builder<>();
207
+ for (int i = 0; i < taskCount; i++) {
208
+ reports.add(Exec.newTaskReport());
209
+ }
210
+ return reports.build();
211
+ }
212
+
213
+ private class Control
214
+ implements InputPlugin.Control {
215
+ @Override
216
+ public List<TaskReport> run(TaskSource taskSource, Schema schema, int taskCount) {
217
+ List<TaskReport> reports = new ArrayList<>();
218
+ for (int i = 0; i < taskCount; i++) {
219
+ reports.add(plugin.run(taskSource, schema, i, output));
220
+ }
221
+ return reports;
222
+ }
223
+ }
224
+
225
+ private ConfigSource config() {
226
+ return Exec.newConfigSource()
227
+ .set("uri", MONGO_URI)
228
+ .set("collection", MONGO_COLLECTION)
229
+ .set("last_path", "");
230
+ }
231
+
232
+ private List<Document> createValidDocuments() throws Exception {
233
+ SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ENGLISH);
234
+
235
+ List<Document> documents = new ArrayList<>();
236
+ documents.add(
237
+ new Document("double_field", 1.23)
238
+ .append("string_field", "embulk")
239
+ .append("array_field", Arrays.asList(1,2,3))
240
+ .append("binary_field", new BsonBinary(("test").getBytes("UTF-8")))
241
+ .append("boolean_field", true)
242
+ .append("datetime_field", format.parse("2015-01-27T19:23:49Z"))
243
+ .append("null_field", null)
244
+ .append("regex_field", new BsonRegularExpression(".+?"))
245
+ .append("javascript_field", new BsonJavaScript("var s = \"javascript\";"))
246
+ .append("int32_field", 32864)
247
+ .append("timestamp_field", new BsonTimestamp(1463991177, 4))
248
+ .append("int64_field", new BsonInt64(314159265))
249
+ .append("document_field", new Document("k", true))
250
+ );
251
+
252
+ documents.add(
253
+ new Document("boolean_field", false)
254
+ .append("document_field", new Document("k", 1))
255
+ );
256
+
257
+ documents.add(new Document("document_field", new Document("k", 1.23)));
258
+
259
+ documents.add(new Document("document_field", new Document("k", "v")));
260
+
261
+ documents.add(new Document("document_field", new Document("k", format.parse("2015-02-03T08:13:45Z"))));
262
+
263
+ return documents;
264
+ }
265
+
266
+ private Schema getFieldSchema() {
267
+ ImmutableList.Builder<Column> columns = ImmutableList.builder();
268
+ columns.add(new Column(0, "record", Types.JSON));
269
+ return new Schema(columns.build());
270
+ }
271
+
+     private void assertValidRecords(Schema schema, MockPageOutput output) throws Exception {
+         List<Object[]> records = Pages.toObjects(schema, output.pages);
+         assertEquals(5, records.size());
+
+         ObjectMapper mapper = new ObjectMapper();
+
+         {
+             JsonNode node = mapper.readTree(records.get(0)[0].toString());
+             assertThat(1.23, is(node.get("double_field").asDouble()));
+             assertEquals("embulk", node.get("string_field").asText());
+             assertEquals("[1,2,3]", node.get("array_field").toString());
+             assertEquals("test", node.get("binary_field").asText());
+             assertEquals(true, node.get("boolean_field").asBoolean());
+             assertEquals("2015-01-27T10:23:49.000Z", node.get("datetime_field").asText());
+             assertEquals("null", node.get("null_field").asText());
+             assertEquals("BsonRegularExpression{pattern='.+?', options=''}", node.get("regex_field").asText());
+             assertEquals("var s = \"javascript\";", node.get("javascript_field").asText());
+             assertEquals(32864L, node.get("int32_field").asLong());
+             assertEquals("1463991177", node.get("timestamp_field").asText());
+             assertEquals(314159265L, node.get("int64_field").asLong());
+             assertEquals("{\"k\":true}", node.get("document_field").toString());
+         }
+
+         {
+             JsonNode node = mapper.readTree(records.get(1)[0].toString());
+             assertEquals(false, node.get("boolean_field").asBoolean());
+             assertEquals("{\"k\":1}", node.get("document_field").toString());
+         }
+
+         {
+             JsonNode node = mapper.readTree(records.get(2)[0].toString());
+             assertEquals("{\"k\":1.23}", node.get("document_field").toString());
+         }
+
+         {
+             JsonNode node = mapper.readTree(records.get(3)[0].toString());
+             assertEquals("{\"k\":\"v\"}", node.get("document_field").toString());
+         }
+
+         {
+             JsonNode node = mapper.readTree(records.get(4)[0].toString());
+             assertEquals("{\"k\":\"2015-02-02T23:13:45.000Z\"}", node.get("document_field").toString());
+         }
+     }
+
+     private void createCollection(PluginTask task, String collectionName) throws Exception {
+         Method method = MongodbInputPlugin.class.getDeclaredMethod("connect", PluginTask.class);
+         method.setAccessible(true);
+         MongoDatabase db = (MongoDatabase) method.invoke(plugin, task);
+         db.createCollection(collectionName);
+     }
+
+     private void dropCollection(PluginTask task, String collectionName) throws Exception {
+         Method method = MongodbInputPlugin.class.getDeclaredMethod("connect", PluginTask.class);
+         method.setAccessible(true);
+         MongoDatabase db = (MongoDatabase) method.invoke(plugin, task);
+         MongoCollection collection = db.getCollection(collectionName);
+         collection.drop();
+     }
+
+     private void insertDocument(PluginTask task, List<Document> documents) throws Exception {
+         Method method = MongodbInputPlugin.class.getDeclaredMethod("connect", PluginTask.class);
+         method.setAccessible(true);
+         MongoDatabase db = (MongoDatabase) method.invoke(plugin, task);
+         MongoCollection collection = db.getCollection(task.getCollection());
+         collection.insertMany(documents);
+     }
  }
@@ -2,9 +2,7 @@ in:
    type: mongodb
    uri: mongodb://localhost:27017/my_database
    collection: "my_collection"
-   fields:
-     - { name: name, type: string }
-     - { name: rank, type: long }
+   projection: '{ "_id": 0, "name": 1, "rank": 1 }'
    sort: '{ rank: 1 }'
  out:
    type: file
@@ -1,10 +1,10 @@
- name,rank
- obj1,1
- obj2,2
- obj3,3
- obj4,4
- obj5,5
- obj6,6
- obj7,7
- obj8,8
- obj9,9
+ record
+ "{""name"":""obj1"",""rank"":1}"
+ "{""name"":""obj2"",""rank"":2}"
+ "{""name"":""obj3"",""rank"":3}"
+ "{""name"":""obj4"",""rank"":4}"
+ "{""name"":""obj5"",""rank"":5}"
+ "{""name"":""obj6"",""rank"":6}"
+ "{""name"":""obj7"",""rank"":7}"
+ "{""name"":""obj8"",""rank"":8}"
+ "{""name"":""obj9"",""rank"":9}"
@@ -2,13 +2,7 @@ in:
    type: mongodb
    uri: mongodb://localhost:27017/my_database
    collection: "my_collection"
-   fields:
-     - { name: id, type: string }
-     - { name: name, type: string }
-     - { name: rank, type: long }
-     - { name: value, type: double }
-     - { name: created_at, type: timestamp }
-     - { name: embeded, type: json }
+   json_column_name: "json"
    query: '{ rank: { $gte: 3 } }'
    sort: '{ rank: -1 }'
    batch_size: 100
@@ -1,8 +1,8 @@
- id,name,rank,value,created_at,embeded
- 55eae883689a08361045d652,obj9,9,9.9,2015-09-06 10:05:18.786000 +0000,"{""key"":""value9""}"
- 55eae883689a08361045d651,obj8,8,8.8,2015-09-06 10:05:28.786000 +0000,"{""key"":""value8""}"
- 55eae883689a08361045d650,obj7,7,7.7,2015-09-06 10:05:38.786000 +0000,"{""key"":""value7""}"
- 55eae883689a08361045d64f,obj6,6,6.6,2015-09-06 10:05:48.786000 +0000,"{""key"":""value6""}"
- 55eae883689a08361045d64e,obj5,5,5.5,2015-09-06 10:05:58.786000 +0000,"{""key"":""value5""}"
- 55eae883689a08361045d64d,obj4,4,4.4,2015-09-06 10:06:08.786000 +0000,"{""key"":{""inner_key"":""value4""}}"
- 55eae883689a08361045d64c,obj3,3,3.3,2015-09-06 10:06:18.786000 +0000,"{""key"":[""v3-1"",""v3-2""]}"
+ json
+ "{""_id"":""55eae883689a08361045d652"",""name"":""obj9"",""rank"":9,""value"":9.9,""created_at"":""2015-09-06T10:05:18.786Z"",""embeded"":{""key"":""value9""}}"
+ "{""_id"":""55eae883689a08361045d651"",""name"":""obj8"",""rank"":8,""value"":8.8,""created_at"":""2015-09-06T10:05:28.786Z"",""embeded"":{""key"":""value8""}}"
+ "{""_id"":""55eae883689a08361045d650"",""name"":""obj7"",""rank"":7,""value"":7.7,""created_at"":""2015-09-06T10:05:38.786Z"",""embeded"":{""key"":""value7""}}"
+ "{""_id"":""55eae883689a08361045d64f"",""name"":""obj6"",""rank"":6,""value"":6.6,""created_at"":""2015-09-06T10:05:48.786Z"",""embeded"":{""key"":""value6""}}"
+ "{""_id"":""55eae883689a08361045d64e"",""name"":""obj5"",""rank"":5,""value"":5.5,""created_at"":""2015-09-06T10:05:58.786Z"",""embeded"":{""key"":""value5""}}"
+ "{""_id"":""55eae883689a08361045d64d"",""name"":""obj4"",""rank"":4,""value"":4.4,""created_at"":""2015-09-06T10:06:08.786Z"",""embeded"":{""key"":{""inner_key"":""value4""}}}"
+ "{""_id"":""55eae883689a08361045d64c"",""name"":""obj3"",""rank"":3,""value"":3.3,""created_at"":""2015-09-06T10:06:18.786Z"",""embeded"":{""key"":[""v3-1"",""v3-2""]}}"
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: embulk-input-mongodb
  version: !ruby/object:Gem::Version
-   version: 0.2.0
+   version: 0.3.0
  platform: ruby
  authors:
  - Kazuyuki Honda
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-05-30 00:00:00.000000000 Z
+ date: 2016-06-10 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    requirement: !ruby/object:Gem::Requirement
@@ -56,13 +56,14 @@ files:
  - gradlew.bat
  - lib/embulk/input/mongodb.rb
  - src/main/java/org/embulk/input/mongodb/MongodbInputPlugin.java
+ - src/main/java/org/embulk/input/mongodb/ValueCodec.java
  - src/test/java/org/embulk/input/mongodb/TestMongodbInputPlugin.java
  - src/test/resources/basic.yml
  - src/test/resources/basic_expected.csv
  - src/test/resources/full.yml
  - src/test/resources/full_expected.csv
  - src/test/resources/my_collection.jsonl
- - classpath/embulk-input-mongodb-0.2.0.jar
+ - classpath/embulk-input-mongodb-0.3.0.jar
  - classpath/mongo-java-driver-3.2.2.jar
  homepage: https://github.com/hakobera/embulk-input-mongodb
  licenses: