embulk-input-mongodb 0.2.0 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: eca1bf1b21b4a44bab520a7cb93f59df2c18ea82
4
- data.tar.gz: 98f866d2313fe81dbf96da2fe61b9a303afcd698
3
+ metadata.gz: fafb06875aa38ec6b9142251fd9b57b15f56dc17
4
+ data.tar.gz: 6aa68d44e22d4b49e9c5918095dc1fd997a11f58
5
5
  SHA512:
6
- metadata.gz: 6251028e2ec5dc41523cbc43d3a06f57cde1a7b05a9a21a8098d169e6c3d0e05af74d12e325847fcbb74328f8c9cf98eb8e6ed167091cb2215bfbf2341fdb5ff
7
- data.tar.gz: 7a6c238d53ab10733917a05a170a3d06aacb8cbbd8a631c87d86d5fa2a6a198e887df27b61bfebe10655eeefe48293bf0a174417435aed4926285d6ef5ba581f
6
+ metadata.gz: 51b725f4f59acc26978861e96ce4c9388dc53fdd6ff9cb5edb959984a3fd0e2622731d4941940ece4370f5a1b2094b4bb087fd354ea086ba161d685ac170140c
7
+ data.tar.gz: 1ec52180d29002a95c024a878e257ba399caee04e06dba3e7fdb74f9d130e7e207abca5744be7620f958dbd81ff74ea59be53011216fe3809ceb9539ab9d0128
data/README.md CHANGED
@@ -3,61 +3,91 @@
3
3
  [![Build Status](https://travis-ci.org/hakobera/embulk-input-mongodb.svg)](https://travis-ci.org/hakobera/embulk-input-mongodb)
4
4
 
5
5
  MongoDB input plugin for Embulk loads records from MongoDB.
6
+ This plugin loads documents as single-column records (column name is "record"). You can use filter plugins such as [embulk-filter-expand_json](https://github.com/civitaspo/embulk-filter-expand_json) or [embulk-filter-add_time](https://github.com/treasure-data/embulk-filter-add_time) to convert the json column to typed columns. [Rename filter](http://www.embulk.org/docs/built-in.html#rename-filter-plugin) is also useful to rename the typed columns.
6
7
 
7
8
  ## Overview
8
9
 
9
10
  This plugin only works with embulk >= 0.8.8.
10
11
 
11
12
  * **Plugin type**: input
12
- * **Resume supported**: no
13
- * **Cleanup supported**: no
14
13
  * **Guess supported**: no
15
14
 
16
15
  ## Configuration
17
16
 
18
17
  - **uri**: [MongoDB connection string URI](http://docs.mongodb.org/manual/reference/connection-string/) (e.g. 'mongodb://localhost:27017/mydb') (string, required)
19
18
  - **collection**: source collection name (string, required)
20
- - **fields**: hash records that has the following two fields (array, required)
21
- - name: Name of the column
22
- - type: Column types as follows
23
- - boolean
24
- - long
25
- - double
26
- - string
27
- - timestamp
28
- - **query**: provides a JSON document as a query that optionally limits the documents returned (string, optional)
29
- - **sort**: specifies an ordering for exported results (string, optional)
19
+ - **fields**: **(deprecated)** ~~hash records that has the following two fields (array, required)~~
20
+ ~~- name: Name of the column~~
21
+ ~~- type: Column types as follows~~
22
+ ~~- boolean~~
23
+ ~~- long~~
24
+ ~~- double~~
25
+ ~~- string~~
26
+ ~~- timestamp~~
27
+ - **query**: a JSON document used for [querying](https://docs.mongodb.com/manual/tutorial/query-documents/) the source collection. Only documents in the collection that match this condition are loaded. (string, optional)
28
+ - **projection**: a JSON document used for [projection](https://docs.mongodb.com/manual/reference/operator/projection/positional/) on query results. Only the fields of a document that this projection selects are loaded. (string, optional)
29
+ - **sort**: ordering of results (string, optional)
30
+ - **stop_on_invalid_record**: stop the bulk load transaction if a document contains an invalid record (such as an unsupported object type) (boolean, optional, default: false)
31
+ - **json_column_name**: column name used in outputs (string, optional, default: "record")
30
32
 
31
33
  ## Example
32
34
 
33
- ### Export all objects
35
+ ### Exporting all objects
34
36
 
35
37
  ```yaml
36
38
  in:
37
39
  type: mongodb
38
40
  uri: mongodb://myuser:mypassword@localhost:27017/my_database
39
41
  collection: "my_collection"
40
- fields:
41
- - { name: id, type: string }
42
- - { name: field1, type: long }
43
- - { name: field2, type: timestamp }
44
- - { name: field3, type: json }
45
42
  ```
46
43
 
47
- ### Filter object by query and sort
44
+ ### Filtering documents by query and projection
48
45
 
49
46
  ```yaml
50
47
  in:
51
48
  type: mongodb
52
49
  uri: mongodb://myuser:mypassword@localhost:27017/my_database
53
50
  collection: "my_collection"
54
- fields:
55
- - { name: id, type: string }
56
- - { name: field1, type: long }
57
- - { name: field2, type: timestamp }
58
- - { name: field3, type: json }
59
51
  query: '{ field1: { $gte: 3 } }'
60
- sort: '{ field1: 1 }'
52
+ projection: '{ "_id": 1, "field1": 1 }'
53
+ sort: '{ "field1": 1 }'
54
+ ```
55
+
56
+ ### Advanced usage with filter plugins
57
+
58
+ ```yaml
59
+ in:
60
+ type: mongodb
61
+ uri: mongodb://myuser:mypassword@localhost:27017/my_database
62
+ collection: "my_collection"
63
+ query: '{ "age": { $gte: 3 } }'
64
+ projection: '{ "_id": 1, "age": 1, "ts": 1, "firstName": 1, "lastName": 1 }'
65
+
66
+ filters:
67
+ # convert json column into typed columns
68
+ - type: expand_json
69
+ json_column_name: record
70
+ expanded_columns:
71
+ - {name: _id, type: long}
72
+ - {name: ts, type: string}
73
+ - {name: firstName, type: string}
74
+ - {name: lastName, type: string}
75
+
76
+ # rename column names
77
+ - type: rename
78
+ columns:
79
+ _id: id
80
+ firstName: first_name
81
+ lastName: last_name
82
+
83
+ # convert string "ts" column into timestamp "time" column
84
+ - type: add_time
85
+ from_column:
86
+ name: ts
87
+ timestamp_format: "%Y-%m-%dT%H:%M:%S.%N%z"
88
+ to_column:
89
+ name: time
90
+ type: timestamp
61
91
  ```
62
92
 
63
93
  ## Build
@@ -65,3 +95,47 @@ in:
65
95
  ```
66
96
  $ ./gradlew gem
67
97
  ```
98
+
99
+ ## Test
100
+
101
+ ```
102
+ $ ./gradlew test # -t to watch change of files and rebuild continuously
103
+ ```
104
+
105
+ To run unit tests, we need to configure the following environment variables.
106
+
107
+ When these environment variables are not set, almost all test cases are skipped.
108
+
109
+ ```
110
+ MONGO_URI
111
+ MONGO_COLLECTION
112
+ ```
113
+
114
+ If you're using Mac OS X El Capitan and GUI applications (such as an IDE), set the variables as follows.
115
+ ```xml
116
+ $ vi ~/Library/LaunchAgents/environment.plist
117
+ <?xml version="1.0" encoding="UTF-8"?>
118
+ <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
119
+ <plist version="1.0">
120
+ <dict>
121
+ <key>Label</key>
122
+ <string>my.startup</string>
123
+ <key>ProgramArguments</key>
124
+ <array>
125
+ <string>sh</string>
126
+ <string>-c</string>
127
+ <string>
128
+ launchctl setenv MONGO_URI mongodb://myuser:mypassword@localhost:27017/my_database
129
+ launchctl setenv MONGO_COLLECTION my_collection
130
+ </string>
131
+ </array>
132
+ <key>RunAtLoad</key>
133
+ <true/>
134
+ </dict>
135
+ </plist>
136
+
137
+ $ launchctl load ~/Library/LaunchAgents/environment.plist
138
+ $ launchctl getenv MONGO_URI //try to get value.
139
+
140
+ Then start your applications.
141
+ ```
data/build.gradle CHANGED
@@ -2,6 +2,7 @@ plugins {
2
2
  id "com.jfrog.bintray" version "1.1"
3
3
  id "com.github.jruby-gradle.base" version "0.1.5"
4
4
  id "java"
5
+ id "jacoco"
5
6
  }
6
7
  import com.github.jrubygradle.JRubyExec
7
8
  repositories {
@@ -15,7 +16,7 @@ configurations {
15
16
  provided
16
17
  }
17
18
 
18
- version = "0.2.0"
19
+ version = "0.3.0"
19
20
 
20
21
  sourceCompatibility = 1.7
21
22
  targetCompatibility = 1.7
@@ -26,6 +27,8 @@ dependencies {
26
27
  compile "org.mongodb:mongo-java-driver:3.2.2"
27
28
 
28
29
  testCompile "junit:junit:4.+"
30
+ testCompile "org.embulk:embulk-core:0.8.8:tests"
31
+ testCompile "org.embulk:embulk-standards:0.8.8"
29
32
  }
30
33
 
31
34
  task classpath(type: Copy, dependsOn: ["jar"]) {
@@ -1,5 +1,6 @@
1
1
  package org.embulk.input.mongodb;
2
2
 
3
+ import com.google.common.base.Optional;
3
4
  import com.google.common.base.Throwables;
4
5
  import com.mongodb.MongoClient;
5
6
  import com.mongodb.MongoClientURI;
@@ -8,7 +9,9 @@ import com.mongodb.client.MongoCollection;
8
9
  import com.mongodb.client.MongoCursor;
9
10
  import com.mongodb.client.MongoDatabase;
10
11
  import com.mongodb.util.JSON;
11
- import org.bson.Document;
12
+ import com.mongodb.util.JSONParseException;
13
+ import org.bson.codecs.configuration.CodecRegistries;
14
+ import org.bson.codecs.configuration.CodecRegistry;
12
15
  import org.bson.conversions.Bson;
13
16
  import org.embulk.config.Config;
14
17
  import org.embulk.config.ConfigDefault;
@@ -21,18 +24,15 @@ import org.embulk.config.TaskReport;
21
24
  import org.embulk.config.TaskSource;
22
25
  import org.embulk.spi.BufferAllocator;
23
26
  import org.embulk.spi.Column;
24
- import org.embulk.spi.ColumnConfig;
25
27
  import org.embulk.spi.Exec;
26
28
  import org.embulk.spi.InputPlugin;
27
29
  import org.embulk.spi.PageBuilder;
28
30
  import org.embulk.spi.PageOutput;
29
31
  import org.embulk.spi.Schema;
30
32
  import org.embulk.spi.SchemaConfig;
31
- import org.embulk.spi.json.JsonParser;
32
- import org.embulk.spi.time.Timestamp;
33
- import org.embulk.spi.type.Type;
33
+ import org.embulk.spi.type.Types;
34
+ import org.msgpack.value.Value;
34
35
  import org.slf4j.Logger;
35
-
36
36
  import javax.validation.constraints.Min;
37
37
  import java.net.UnknownHostException;
38
38
  import java.util.List;
@@ -51,7 +51,12 @@ public class MongodbInputPlugin
51
51
  String getCollection();
52
52
 
53
53
  @Config("fields")
54
- SchemaConfig getFields();
54
+ @ConfigDefault("null")
55
+ Optional<SchemaConfig> getFields();
56
+
57
+ @Config("projection")
58
+ @ConfigDefault("\"{}\"")
59
+ String getProjection();
55
60
 
56
61
  @Config("query")
57
62
  @ConfigDefault("\"{}\"")
@@ -66,6 +71,14 @@ public class MongodbInputPlugin
66
71
  @Min(1)
67
72
  int getBatchSize();
68
73
 
74
+ @Config("stop_on_invalid_record")
75
+ @ConfigDefault("false")
76
+ boolean getStopOnInvalidRecord();
77
+
78
+ @Config("json_column_name")
79
+ @ConfigDefault("\"record\"")
80
+ String getJsonColumnName();
81
+
69
82
  @ConfigInject
70
83
  BufferAllocator getBufferAllocator();
71
84
  }
@@ -77,13 +90,21 @@ public class MongodbInputPlugin
77
90
  InputPlugin.Control control)
78
91
  {
79
92
  PluginTask task = config.loadConfig(PluginTask.class);
93
+ if (task.getFields().isPresent()) {
94
+ throw new ConfigException("field option was deprecated so setting will be ignored");
95
+ }
96
+
97
+ validateJsonField("projection", task.getProjection());
98
+ validateJsonField("query", task.getQuery());
99
+ validateJsonField("sort", task.getSort());
100
+
80
101
  // Connect once to throw ConfigException at an earlier stage of execution
81
102
  try {
82
103
  connect(task);
83
104
  } catch (UnknownHostException | MongoException ex) {
84
105
  throw new ConfigException(ex);
85
106
  }
86
- Schema schema = task.getFields().toSchema();
107
+ Schema schema = Schema.builder().add(task.getJsonColumnName(), Types.JSON).build();
87
108
  return resume(task.dump(), schema, 1, control);
88
109
  }
89
110
 
@@ -112,33 +133,39 @@ public class MongodbInputPlugin
112
133
  PluginTask task = taskSource.loadTask(PluginTask.class);
113
134
  BufferAllocator allocator = task.getBufferAllocator();
114
135
  PageBuilder pageBuilder = new PageBuilder(allocator, schema, output);
115
- JsonParser jsonParser = new JsonParser();
116
- List<Column> columns = pageBuilder.getSchema().getColumns();
136
+ final Column column = pageBuilder.getSchema().getColumns().get(0);
117
137
 
118
- MongoCollection<Document> collection;
138
+ MongoCollection<Value> collection;
119
139
  try {
120
140
  MongoDatabase db = connect(task);
121
- collection = db.getCollection(task.getCollection());
141
+
142
+ CodecRegistry registry = CodecRegistries.fromRegistries(
143
+ MongoClient.getDefaultCodecRegistry(),
144
+ CodecRegistries.fromCodecs(new ValueCodec(task.getStopOnInvalidRecord()))
145
+ );
146
+ collection = db.getCollection(task.getCollection(), Value.class)
147
+ .withCodecRegistry(registry);
122
148
  } catch (UnknownHostException | MongoException ex) {
123
149
  throw new ConfigException(ex);
124
150
  }
125
151
 
126
152
  Bson query = (Bson) JSON.parse(task.getQuery());
127
- Bson projection = getProjection(task);
153
+ Bson projection = (Bson) JSON.parse(task.getProjection());
128
154
  Bson sort = (Bson) JSON.parse(task.getSort());
129
155
 
130
156
  log.trace("query: {}", query);
131
157
  log.trace("projection: {}", projection);
132
158
  log.trace("sort: {}", sort);
133
159
 
134
- try (MongoCursor<Document> cursor = collection
160
+ try (MongoCursor<Value> cursor = collection
135
161
  .find(query)
136
162
  .projection(projection)
137
163
  .sort(sort)
138
164
  .batchSize(task.getBatchSize())
139
165
  .iterator()) {
140
166
  while (cursor.hasNext()) {
141
- fetch(cursor, pageBuilder, jsonParser, columns);
167
+ pageBuilder.setJson(column, cursor.next());
168
+ pageBuilder.addRecord();
142
169
  }
143
170
  } catch (MongoException ex) {
144
171
  Throwables.propagate(ex);
@@ -165,73 +192,11 @@ public class MongodbInputPlugin
165
192
  return db;
166
193
  }
167
194
 
168
- private void fetch(MongoCursor<Document> cursor, PageBuilder pageBuilder,
169
- JsonParser jsonParser, List<Column> columns) {
170
- Document doc = cursor.next();
171
- for (Column c : columns) {
172
- Type t = c.getType();
173
- String key = normalize(c.getName());
174
-
175
- if (!doc.containsKey(key) || doc.get(key) == null) {
176
- pageBuilder.setNull(c);
177
- } else {
178
- switch (t.getName()) {
179
- case "boolean":
180
- pageBuilder.setBoolean(c, doc.getBoolean(key));
181
- break;
182
-
183
- case "long":
184
- // MongoDB can contain both 'int' and 'long', but embulk only support 'long'
185
- // So enable handling both 'int' and 'long', first get value as java.lang.Number, then convert it to long
186
- pageBuilder.setLong(c, ((Number) doc.get(key)).longValue());
187
- break;
188
-
189
- case "double":
190
- pageBuilder.setDouble(c, ((Number) doc.get(key)).doubleValue());
191
- break;
192
-
193
- case "string":
194
- // Enable output object like ObjectId as string, this is reason I don't use doc.getString(key).
195
- pageBuilder.setString(c, doc.get(key).toString());
196
- break;
197
-
198
- case "timestamp":
199
- pageBuilder.setTimestamp(c, Timestamp.ofEpochMilli(doc.getDate(key).getTime()));
200
- break;
201
-
202
- case "json":
203
- pageBuilder.setJson(c, jsonParser.parse(((Document) doc.get(key)).toJson()));
204
- break;
205
- }
206
- }
207
- }
208
- pageBuilder.addRecord();
209
- }
210
-
211
- private Bson getProjection(PluginTask task) {
212
- SchemaConfig fields = task.getFields();
213
- StringBuilder sb = new StringBuilder("{");
214
- int l = fields.getColumnCount();
215
-
216
- for (int i = 0; i < l; i++) {
217
- ColumnConfig c = fields.getColumn(i);
218
- if (i != 0) {
219
- sb.append(",");
220
- }
221
- String key = normalize(c.getName());
222
- sb.append(key).append(":1");
223
- }
224
- sb.append("}");
225
-
226
- return (Bson) JSON.parse(sb.toString());
227
- }
228
-
229
- private String normalize(String key) {
230
- // 'id' is special alias key name of MongoDB ObjectId
231
- // http://docs.mongodb.org/manual/reference/object-id/
232
- if (key.equals("id")) {
233
- return "_id";
195
+ private void validateJsonField(String name, String jsonString) {
196
+ try {
197
+ JSON.parse(jsonString);
198
+ } catch (JSONParseException ex) {
199
+ throw new ConfigException(String.format("Invalid JSON string was given for '%s' parameter. [%s]", name, jsonString));
234
200
  }
235
- return key;
236
201
  }
237
202
  }
@@ -0,0 +1,151 @@
1
+ package org.embulk.input.mongodb;
2
+
3
+ import org.bson.BsonReader;
4
+ import org.bson.BsonType;
5
+ import org.bson.BsonWriter;
6
+ import org.bson.codecs.Codec;
7
+ import org.bson.codecs.DecoderContext;
8
+ import org.bson.codecs.EncoderContext;
9
+ import org.embulk.spi.DataException;
10
+ import org.embulk.spi.Exec;
11
+ import org.msgpack.value.Value;
12
+ import org.slf4j.Logger;
13
+
14
+ import static org.msgpack.value.ValueFactory.newArray;
15
+ import static org.msgpack.value.ValueFactory.newBinary;
16
+ import static org.msgpack.value.ValueFactory.newBoolean;
17
+ import static org.msgpack.value.ValueFactory.newFloat;
18
+ import static org.msgpack.value.ValueFactory.newInteger;
19
+ import static org.msgpack.value.ValueFactory.newMap;
20
+ import static org.msgpack.value.ValueFactory.newNil;
21
+ import static org.msgpack.value.ValueFactory.newString;
22
+
23
+ import java.text.SimpleDateFormat;
24
+ import java.util.ArrayList;
25
+ import java.util.Date;
26
+ import java.util.LinkedHashMap;
27
+ import java.util.List;
28
+ import java.util.Map;
29
+ import java.util.TimeZone;
30
+
31
+ public class ValueCodec implements Codec<Value> {
32
+ private final SimpleDateFormat formatter;
33
+ private final Logger log = Exec.getLogger(MongodbInputPlugin.class);
34
+ private final boolean stopOnInvalidRecord;
35
+
36
+ public ValueCodec(boolean stopOnInvalidRecord) {
37
+ this.formatter = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", java.util.Locale.ENGLISH);
38
+ formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
39
+ this.stopOnInvalidRecord = stopOnInvalidRecord;
40
+ }
41
+
42
+ @Override
43
+ public void encode(final BsonWriter writer, final Value value, final EncoderContext encoderContext) {
44
+ throw new UnsupportedOperationException();
45
+ }
46
+
47
+ @Override
48
+ public Value decode(final BsonReader reader, final DecoderContext decoderContext) {
49
+ Map<Value, Value> kvs = new LinkedHashMap<>();
50
+
51
+ reader.readStartDocument();
52
+ boolean isTopLevelNode = false;
53
+ while (reader.readBsonType() != BsonType.END_OF_DOCUMENT) {
54
+ String fieldName = reader.readName();
55
+ BsonType type = reader.getCurrentBsonType();
56
+ if (type == BsonType.OBJECT_ID) {
57
+ isTopLevelNode = true;
58
+ }
59
+ fieldName = normalize(fieldName, isTopLevelNode);
60
+
61
+ try {
62
+ kvs.put(newString(fieldName), readValue(reader, decoderContext));
63
+ } catch (UnknownTypeFoundException ex) {
64
+ reader.skipValue();
65
+ if (stopOnInvalidRecord) {
66
+ throw ex;
67
+ }
68
+ log.warn(String.format("Skipped document because field '%s' contains unsupported object type [%s]",
69
+ fieldName, type));
70
+ }
71
+ }
72
+ reader.readEndDocument();
73
+
74
+ return newMap(kvs);
75
+ }
76
+
77
+ public Value decodeArray(final BsonReader reader, final DecoderContext decoderContext) {
78
+ List<Value> list = new ArrayList<>();
79
+
80
+ reader.readStartArray();
81
+ while (reader.readBsonType() != BsonType.END_OF_DOCUMENT) {
82
+ list.add(readValue(reader, decoderContext));
83
+ }
84
+ reader.readEndArray();
85
+
86
+ return newArray(list);
87
+ }
88
+
89
+ private Value readValue(BsonReader reader, DecoderContext decoderContext) {
90
+ switch (reader.getCurrentBsonType()) {
91
+ // https://docs.mongodb.com/manual/reference/bson-types/
92
+ // https://github.com/mongodb/mongo-java-driver/tree/master/bson/src/main/org/bson/codecs
93
+ case DOUBLE:
94
+ return newFloat(reader.readDouble());
95
+ case STRING:
96
+ return newString(reader.readString());
97
+ case ARRAY:
98
+ return decodeArray(reader, decoderContext);
99
+ case BINARY:
100
+ return newBinary(reader.readBinaryData().getData(), true);
101
+ case OBJECT_ID:
102
+ return newString(reader.readObjectId().toString());
103
+ case BOOLEAN:
104
+ return newBoolean(reader.readBoolean());
105
+ case DATE_TIME:
106
+ return newString(formatter.format(new Date(reader.readDateTime())));
107
+ case NULL:
108
+ reader.readNull();
109
+ return newNil();
110
+ case REGULAR_EXPRESSION:
111
+ return newString(reader.readRegularExpression().toString());
112
+ case JAVASCRIPT:
113
+ return newString(reader.readJavaScript());
114
+ case JAVASCRIPT_WITH_SCOPE:
115
+ return newString(reader.readJavaScriptWithScope());
116
+ case INT32:
117
+ return newInteger(reader.readInt32());
118
+ case TIMESTAMP:
119
+ return newInteger(reader.readTimestamp().getTime());
120
+ case INT64:
121
+ return newInteger(reader.readInt64());
122
+ case DOCUMENT:
123
+ return decode(reader, decoderContext);
124
+ default: // e.g. MIN_KEY, MAX_KEY, SYMBOL, DB_POINTER, UNDEFINED
125
+ throw new UnknownTypeFoundException(String.format("Unsupported type %s of '%s' field. Please exclude the field from 'projection:' option",
126
+ reader.getCurrentBsonType(), reader.getCurrentName()));
127
+ }
128
+ }
129
+
130
+ @Override
131
+ public Class<Value> getEncoderClass() {
132
+ return Value.class;
133
+ }
134
+
135
+ private String normalize(String key, boolean isTopLevelNode) {
136
+ // 'id' is special alias key name of MongoDB ObjectId
137
+ // http://docs.mongodb.org/manual/reference/object-id/
138
+ if (key.equals("id") && isTopLevelNode) {
139
+ return "_id";
140
+ }
141
+ return key;
142
+ }
143
+
144
+ public static class UnknownTypeFoundException extends DataException
145
+ {
146
+ UnknownTypeFoundException(String message)
147
+ {
148
+ super(message);
149
+ }
150
+ }
151
+ }
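The `DATE_TIME` branch of `readValue` in the hunk above renders BSON dates as strings through a `SimpleDateFormat` pinned to UTC. A minimal sketch of that formatting step in isolation, using only the JDK, might look like the following. The class name `DateTimeFormatSketch` and the helper `formatUtc` are hypothetical names chosen for illustration and are not part of the plugin.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical illustration class; not part of embulk-input-mongodb.
public class DateTimeFormatSketch {
    // Formats epoch milliseconds with the same pattern and time zone
    // that the ValueCodec constructor configures.
    public static String formatUtc(long epochMillis) {
        SimpleDateFormat formatter =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", Locale.ENGLISH);
        formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
        return formatter.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        // The Unix epoch itself, rendered in UTC.
        System.out.println(formatUtc(0L)); // prints 1970-01-01T00:00:00.000Z
    }
}
```

The trailing `'Z'` in the pattern is a quoted literal rather than a zone designator, so the output is only correct because the formatter's time zone is explicitly set to UTC.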
@@ -1,5 +1,339 @@
1
1
  package org.embulk.input.mongodb;
2
2
 
3
- public class TestMongodbInputPlugin
4
- {
3
+ import com.fasterxml.jackson.databind.JsonNode;
4
+ import com.fasterxml.jackson.databind.ObjectMapper;
5
+ import com.google.common.collect.ImmutableList;
6
+ import com.google.common.collect.Lists;
7
+ import com.mongodb.client.MongoCollection;
8
+ import com.mongodb.client.MongoDatabase;
9
+ import org.bson.BsonBinary;
10
+ import org.bson.BsonInt64;
11
+ import org.bson.BsonJavaScript;
12
+ import org.bson.BsonMaxKey;
13
+ import org.bson.BsonRegularExpression;
14
+ import org.bson.BsonTimestamp;
15
+ import org.bson.Document;
16
+ import org.embulk.EmbulkTestRuntime;
17
+ import org.embulk.config.ConfigException;
18
+ import org.embulk.config.ConfigSource;
19
+ import org.embulk.config.TaskReport;
20
+ import org.embulk.config.TaskSource;
21
+ import org.embulk.input.mongodb.MongodbInputPlugin.PluginTask;
22
+ import org.embulk.spi.Column;
23
+ import org.embulk.spi.Exec;
24
+ import org.embulk.spi.InputPlugin;
25
+ import org.embulk.spi.Schema;
26
+ import org.embulk.spi.TestPageBuilderReader.MockPageOutput;
27
+ import org.embulk.spi.type.Types;
28
+ import org.embulk.spi.util.Pages;
29
+ import org.junit.Before;
30
+ import org.junit.BeforeClass;
31
+ import org.junit.Rule;
32
+ import org.junit.Test;
33
+ import org.junit.rules.ExpectedException;
34
+
35
+ import java.lang.reflect.InvocationTargetException;
36
+ import java.lang.reflect.Method;
37
+ import java.text.SimpleDateFormat;
38
+ import java.util.ArrayList;
39
+ import java.util.Arrays;
40
+ import java.util.List;
41
+ import java.util.Locale;
42
+
43
+ import static org.hamcrest.CoreMatchers.is;
44
+ import static org.junit.Assert.assertEquals;
45
+ import static org.junit.Assert.assertThat;
46
+
47
+ public class TestMongodbInputPlugin {
48
+ private static String MONGO_URI;
49
+ private static String MONGO_COLLECTION;
50
+
51
+ @Rule
52
+ public EmbulkTestRuntime runtime = new EmbulkTestRuntime();
53
+
54
+ @Rule
55
+ public ExpectedException exception = ExpectedException.none();
56
+
57
+ private ConfigSource config;
58
+ private MongodbInputPlugin plugin;
59
+ private MockPageOutput output;
60
+
61
+ /*
62
+ * This test case requires environment variables
63
+ * MONGO_URI
64
+ * MONGO_COLLECTION
65
+ */
66
+ @BeforeClass
67
+ public static void initializeConstant() {
68
+ MONGO_URI = System.getenv("MONGO_URI");
69
+ MONGO_COLLECTION = System.getenv("MONGO_COLLECTION");
70
+ }
71
+
72
+ @Before
73
+ public void createResources() throws Exception {
74
+ config = config();
75
+ plugin = new MongodbInputPlugin();
76
+ output = new MockPageOutput();
77
+ }
78
+
79
+ @Test
80
+ public void checkDefaultValues() {
81
+ ConfigSource config = Exec.newConfigSource()
82
+ .set("uri", MONGO_URI)
83
+ .set("collection", MONGO_COLLECTION);
84
+
85
+ PluginTask task = config.loadConfig(PluginTask.class);
86
+ assertEquals("{}", task.getQuery());
87
+ assertEquals("{}", task.getSort());
88
+ assertEquals((long) 10000, (long) task.getBatchSize());
89
+ assertEquals("record", task.getJsonColumnName());
90
+ }
91
+
92
+ @Test(expected = ConfigException.class)
93
+ public void checkDefaultValuesUriIsNull() {
94
+ ConfigSource config = Exec.newConfigSource()
95
+ .set("uri", null)
96
+ .set("collection", MONGO_COLLECTION);
97
+
98
+ plugin.transaction(config, new Control());
99
+ }
100
+
101
+ @Test(expected = ConfigException.class)
102
+ public void checkDefaultValuesInvalidUri()
103
+ {
104
+ ConfigSource config = Exec.newConfigSource()
105
+ .set("uri", "mongodb://mongouser:password@non-exists.example.com:23490/test")
106
+ .set("collection", MONGO_COLLECTION);
107
+
108
+ plugin.transaction(config, new Control());
109
+ }
110
+
111
+ @Test(expected = ConfigException.class)
112
+ public void checkDefaultValuesCollectionIsNull() {
113
+ ConfigSource config = Exec.newConfigSource()
114
+ .set("uri", MONGO_URI)
115
+ .set("collection", null);
116
+
117
+ plugin.transaction(config, new Control());
118
+ }
119
+
120
+ @Test
121
+ public void testResume() {
122
+ PluginTask task = config.loadConfig(PluginTask.class);
123
+ final Schema schema = getFieldSchema();
124
+ plugin.resume(task.dump(), schema, 0, new InputPlugin.Control() {
125
+ @Override
126
+ public List<TaskReport> run(TaskSource taskSource, Schema schema, int taskCount) {
127
+ return emptyTaskReports(taskCount);
128
+ }
129
+ });
130
+ // no errors happens
131
+ }
132
+
133
+ @Test
134
+ public void testCleanup() {
135
+ PluginTask task = config.loadConfig(PluginTask.class);
136
+ Schema schema = getFieldSchema();
137
+ plugin.cleanup(task.dump(), schema, 0, Lists.<TaskReport>newArrayList()); // no errors happens
138
+ }
139
+
140
+ @Test
141
+ public void testGuess() {
142
+ plugin.guess(config); // no errors happens
143
+ }
144
+
145
+ @Test
146
+ public void testRun() throws Exception {
147
+ PluginTask task = config.loadConfig(PluginTask.class);
148
+
149
+ dropCollection(task, MONGO_COLLECTION);
150
+ createCollection(task, MONGO_COLLECTION);
151
+ insertDocument(task, createValidDocuments());
152
+
153
+ plugin.transaction(config, new Control());
154
+ assertValidRecords(getFieldSchema(), output);
155
+ }
156
+
157
+ @Test(expected = ValueCodec.UnknownTypeFoundException.class)
158
+ public void testRunWithUnsupportedType() throws Exception {
159
+ ConfigSource config = Exec.newConfigSource()
160
+ .set("uri", MONGO_URI)
161
+ .set("collection", MONGO_COLLECTION)
162
+ .set("stop_on_invalid_record", true);
163
+
164
+ PluginTask task = config.loadConfig(PluginTask.class);
165
+
166
+ dropCollection(task, MONGO_COLLECTION);
167
+ createCollection(task, MONGO_COLLECTION);
168
+
169
+ List<Document> documents = new ArrayList<>();
170
+ documents.add(
171
+ new Document("invalid_field", new BsonMaxKey())
172
+ );
173
+ insertDocument(task, documents);
174
+
175
+ plugin.transaction(config, new Control());
176
+ }
177
+
178
+ @Test
179
+ public void testNormalize() throws Exception {
180
+ ValueCodec codec = new ValueCodec(true);
181
+
182
+ Method normalize = ValueCodec.class.getDeclaredMethod("normalize", String.class, boolean.class);
183
+ normalize.setAccessible(true);
184
+ assertEquals("_id", normalize.invoke(codec, "id", true).toString());
185
+ assertEquals("_id", normalize.invoke(codec, "_id", true).toString());
186
+ assertEquals("f1", normalize.invoke(codec, "f1", true).toString());
187
+
188
+ assertEquals("id", normalize.invoke(codec, "id", false).toString());
189
+ assertEquals("_id", normalize.invoke(codec, "_id", false).toString());
190
+ assertEquals("f1", normalize.invoke(codec, "f1", false).toString());
191
+ }
192
+
193
+ @Test
194
+ public void testValidateJsonField() throws Exception {
195
+ Method validate = MongodbInputPlugin.class.getDeclaredMethod("validateJsonField", String.class, String.class);
196
+ validate.setAccessible(true);
197
+ String invalidJsonString = "{\"name\": invalid}";
198
+ try {
199
+ validate.invoke(plugin, "name", invalidJsonString);
200
+ } catch (InvocationTargetException ex) {
201
+ assertEquals(ConfigException.class, ex.getCause().getClass());
202
+ }
203
+ }
204
+
205
+ static List<TaskReport> emptyTaskReports(int taskCount) {
206
+ ImmutableList.Builder<TaskReport> reports = new ImmutableList.Builder<>();
207
+ for (int i = 0; i < taskCount; i++) {
208
+ reports.add(Exec.newTaskReport());
209
+ }
210
+ return reports.build();
211
+ }
212
+
213
+ private class Control
214
+ implements InputPlugin.Control {
215
+ @Override
216
+ public List<TaskReport> run(TaskSource taskSource, Schema schema, int taskCount) {
217
+ List<TaskReport> reports = new ArrayList<>();
218
+ for (int i = 0; i < taskCount; i++) {
219
+ reports.add(plugin.run(taskSource, schema, i, output));
220
+ }
221
+ return reports;
222
+ }
223
+ }
224
+
225
+ private ConfigSource config() {
226
+ return Exec.newConfigSource()
227
+ .set("uri", MONGO_URI)
228
+ .set("collection", MONGO_COLLECTION)
229
+ .set("last_path", "");
230
+ }
231
+
232
+ private List<Document> createValidDocuments() throws Exception {
233
+ SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ENGLISH);
234
+
235
+ List<Document> documents = new ArrayList<>();
236
+ documents.add(
237
+ new Document("double_field", 1.23)
238
+ .append("string_field", "embulk")
239
+ .append("array_field", Arrays.asList(1,2,3))
240
+ .append("binary_field", new BsonBinary(("test").getBytes("UTF-8")))
241
+ .append("boolean_field", true)
242
+ .append("datetime_field", format.parse("2015-01-27T19:23:49Z"))
243
+ .append("null_field", null)
244
+ .append("regex_field", new BsonRegularExpression(".+?"))
245
+ .append("javascript_field", new BsonJavaScript("var s = \"javascript\";"))
246
+ .append("int32_field", 32864)
247
+ .append("timestamp_field", new BsonTimestamp(1463991177, 4))
248
+ .append("int64_field", new BsonInt64(314159265))
249
+ .append("document_field", new Document("k", true))
250
+ );
251
+
252
+ documents.add(
253
+ new Document("boolean_field", false)
254
+ .append("document_field", new Document("k", 1))
255
+ );
256
+
257
+ documents.add(new Document("document_field", new Document("k", 1.23)));
258
+
259
+ documents.add(new Document("document_field", new Document("k", "v")));
260
+
261
+ documents.add(new Document("document_field", new Document("k", format.parse("2015-02-03T08:13:45Z"))));
262
+
263
+ return documents;
264
+ }
265
+
266
+ private Schema getFieldSchema() {
267
+ ImmutableList.Builder<Column> columns = ImmutableList.builder();
268
+ columns.add(new Column(0, "record", Types.JSON));
269
+ return new Schema(columns.build());
270
+ }
271
+
+ private void assertValidRecords(Schema schema, MockPageOutput output) throws Exception {
+ List<Object[]> records = Pages.toObjects(schema, output.pages);
+ assertEquals(5, records.size());
+
+ ObjectMapper mapper = new ObjectMapper();
+
+ {
+ JsonNode node = mapper.readTree(records.get(0)[0].toString());
+ assertThat(1.23, is(node.get("double_field").asDouble()));
+ assertEquals("embulk", node.get("string_field").asText());
+ assertEquals("[1,2,3]", node.get("array_field").toString());
+ assertEquals("test", node.get("binary_field").asText());
+ assertEquals(true, node.get("boolean_field").asBoolean());
+ assertEquals("2015-01-27T10:23:49.000Z", node.get("datetime_field").asText());
+ assertEquals("null", node.get("null_field").asText());
+ assertEquals("BsonRegularExpression{pattern='.+?', options=''}", node.get("regex_field").asText());
+ assertEquals("var s = \"javascript\";", node.get("javascript_field").asText());
+ assertEquals(32864L, node.get("int32_field").asLong());
+ assertEquals("1463991177", node.get("timestamp_field").asText());
+ assertEquals(314159265L, node.get("int64_field").asLong());
+ assertEquals("{\"k\":true}", node.get("document_field").toString());
+ }
+
+ {
+ JsonNode node = mapper.readTree(records.get(1)[0].toString());
+ assertEquals(false, node.get("boolean_field").asBoolean());
+ assertEquals("{\"k\":1}", node.get("document_field").toString());
+ }
+
+ {
+ JsonNode node = mapper.readTree(records.get(2)[0].toString());
+ assertEquals("{\"k\":1.23}", node.get("document_field").toString());
+ }
+
+ {
+ JsonNode node = mapper.readTree(records.get(3)[0].toString());
+ assertEquals("{\"k\":\"v\"}", node.get("document_field").toString());
+ }
+
+ {
+ JsonNode node = mapper.readTree(records.get(4)[0].toString());
+ assertEquals("{\"k\":\"2015-02-02T23:13:45.000Z\"}", node.get("document_field").toString());
+ }
+ }
+
+ private void createCollection(PluginTask task, String collectionName) throws Exception {
+ Method method = MongodbInputPlugin.class.getDeclaredMethod("connect", PluginTask.class);
+ method.setAccessible(true);
+ MongoDatabase db = (MongoDatabase) method.invoke(plugin, task);
+ db.createCollection(collectionName);
+ }
+
+ private void dropCollection(PluginTask task, String collectionName) throws Exception {
+ Method method = MongodbInputPlugin.class.getDeclaredMethod("connect", PluginTask.class);
+ method.setAccessible(true);
+ MongoDatabase db = (MongoDatabase) method.invoke(plugin, task);
+ MongoCollection collection = db.getCollection(collectionName);
+ collection.drop();
+ }
+
+ private void insertDocument(PluginTask task, List<Document> documents) throws Exception {
+ Method method = MongodbInputPlugin.class.getDeclaredMethod("connect", PluginTask.class);
+ method.setAccessible(true);
+ MongoDatabase db = (MongoDatabase) method.invoke(plugin, task);
+ MongoCollection collection = db.getCollection(task.getCollection());
+ collection.insertMany(documents);
+ }
  }
@@ -2,9 +2,7 @@ in:
  type: mongodb
  uri: mongodb://localhost:27017/my_database
  collection: "my_collection"
- fields:
- - { name: name, type: string }
- - { name: rank, type: long }
+ projection: '{ "_id": 0, "name": 1, "rank": 1 }'
  sort: '{ rank: 1 }'
  out:
  type: file
@@ -1,10 +1,10 @@
- name,rank
- obj1,1
- obj2,2
- obj3,3
- obj4,4
- obj5,5
- obj6,6
- obj7,7
- obj8,8
- obj9,9
+ record
+ "{""name"":""obj1"",""rank"":1}"
+ "{""name"":""obj2"",""rank"":2}"
+ "{""name"":""obj3"",""rank"":3}"
+ "{""name"":""obj4"",""rank"":4}"
+ "{""name"":""obj5"",""rank"":5}"
+ "{""name"":""obj6"",""rank"":6}"
+ "{""name"":""obj7"",""rank"":7}"
+ "{""name"":""obj8"",""rank"":8}"
+ "{""name"":""obj9"",""rank"":9}"
@@ -2,13 +2,7 @@ in:
  type: mongodb
  uri: mongodb://localhost:27017/my_database
  collection: "my_collection"
- fields:
- - { name: id, type: string }
- - { name: name, type: string }
- - { name: rank, type: long }
- - { name: value, type: double }
- - { name: created_at, type: timestamp }
- - { name: embeded, type: json }
+ json_column_name: "json"
  query: '{ rank: { $gte: 3 } }'
  sort: '{ rank: -1 }'
  batch_size: 100
@@ -1,8 +1,8 @@
- id,name,rank,value,created_at,embeded
- 55eae883689a08361045d652,obj9,9,9.9,2015-09-06 10:05:18.786000 +0000,"{""key"":""value9""}"
- 55eae883689a08361045d651,obj8,8,8.8,2015-09-06 10:05:28.786000 +0000,"{""key"":""value8""}"
- 55eae883689a08361045d650,obj7,7,7.7,2015-09-06 10:05:38.786000 +0000,"{""key"":""value7""}"
- 55eae883689a08361045d64f,obj6,6,6.6,2015-09-06 10:05:48.786000 +0000,"{""key"":""value6""}"
- 55eae883689a08361045d64e,obj5,5,5.5,2015-09-06 10:05:58.786000 +0000,"{""key"":""value5""}"
- 55eae883689a08361045d64d,obj4,4,4.4,2015-09-06 10:06:08.786000 +0000,"{""key"":{""inner_key"":""value4""}}"
- 55eae883689a08361045d64c,obj3,3,3.3,2015-09-06 10:06:18.786000 +0000,"{""key"":[""v3-1"",""v3-2""]}"
+ json
+ "{""_id"":""55eae883689a08361045d652"",""name"":""obj9"",""rank"":9,""value"":9.9,""created_at"":""2015-09-06T10:05:18.786Z"",""embeded"":{""key"":""value9""}}"
+ "{""_id"":""55eae883689a08361045d651"",""name"":""obj8"",""rank"":8,""value"":8.8,""created_at"":""2015-09-06T10:05:28.786Z"",""embeded"":{""key"":""value8""}}"
+ "{""_id"":""55eae883689a08361045d650"",""name"":""obj7"",""rank"":7,""value"":7.7,""created_at"":""2015-09-06T10:05:38.786Z"",""embeded"":{""key"":""value7""}}"
+ "{""_id"":""55eae883689a08361045d64f"",""name"":""obj6"",""rank"":6,""value"":6.6,""created_at"":""2015-09-06T10:05:48.786Z"",""embeded"":{""key"":""value6""}}"
+ "{""_id"":""55eae883689a08361045d64e"",""name"":""obj5"",""rank"":5,""value"":5.5,""created_at"":""2015-09-06T10:05:58.786Z"",""embeded"":{""key"":""value5""}}"
+ "{""_id"":""55eae883689a08361045d64d"",""name"":""obj4"",""rank"":4,""value"":4.4,""created_at"":""2015-09-06T10:06:08.786Z"",""embeded"":{""key"":{""inner_key"":""value4""}}}"
+ "{""_id"":""55eae883689a08361045d64c"",""name"":""obj3"",""rank"":3,""value"":3.3,""created_at"":""2015-09-06T10:06:18.786Z"",""embeded"":{""key"":[""v3-1"",""v3-2""]}}"
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: embulk-input-mongodb
  version: !ruby/object:Gem::Version
- version: 0.2.0
+ version: 0.3.0
  platform: ruby
  authors:
  - Kazuyuki Honda
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-05-30 00:00:00.000000000 Z
+ date: 2016-06-10 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  requirement: !ruby/object:Gem::Requirement
@@ -56,13 +56,14 @@ files:
  - gradlew.bat
  - lib/embulk/input/mongodb.rb
  - src/main/java/org/embulk/input/mongodb/MongodbInputPlugin.java
+ - src/main/java/org/embulk/input/mongodb/ValueCodec.java
  - src/test/java/org/embulk/input/mongodb/TestMongodbInputPlugin.java
  - src/test/resources/basic.yml
  - src/test/resources/basic_expected.csv
  - src/test/resources/full.yml
  - src/test/resources/full_expected.csv
  - src/test/resources/my_collection.jsonl
- - classpath/embulk-input-mongodb-0.2.0.jar
+ - classpath/embulk-input-mongodb-0.3.0.jar
  - classpath/mongo-java-driver-3.2.2.jar
  homepage: https://github.com/hakobera/embulk-input-mongodb
  licenses: