logstash-input-s3 3.3.7 → 3.7.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: b6b2e69a2cc95f3fc7bdcc0a5b828e08a3795d88f5468795fafc4a518b7e4128
- data.tar.gz: 5a1d0103482e624fe8131eed9c28e636c71bb38b70980e463e1548bcb7efaff1
+ metadata.gz: 0b5ad2fe42e0003da67ebb3e1aad774f5bab3cb2afc07e4699fbfa5e84c0d472
+ data.tar.gz: fb8691a6276181e17a2c80319247d04f11c8d547affff1b109f5447547eca4be
  SHA512:
- metadata.gz: 7cb50c30b2acdd5f8da2602346b0fc9b3ef82f4dd5d1649e7cff8e7d746c28ef0e997cb8db8c1821639e3e8b8001ef19e2c8a0557d5b8a6f37ea9b5a27e451f8
- data.tar.gz: 41be7059efac9cd05b2379a5035cc11104bd7c9716939ac9a2c63aa94ee7bbf79832c7219492ce5576987d36bae038c530af64c11e021c89dee47aa53ea4bf51
+ metadata.gz: 3782d59c63fe4e53cb01dff9c7a431773fc69126dfc17c90ea45b8018205897dfabd8d942a732c37cb1740a4c924608110e81955205734523a9f5d1a928bed5d
+ data.tar.gz: 3ff5255afa6d8a978a2231bc69a3c11dd9f354a90895e779be569971aaa75c34c456d35b979effbb0304801160cb15ce5f81b5967e121133d79e0a48f59638b7
data/CHANGELOG.md CHANGED
@@ -1,3 +1,28 @@
+ ## 3.7.0
+ - Add ECS support. [#228](https://github.com/logstash-plugins/logstash-input-s3/pull/228)
+ - Fix missing file in cutoff time change. [#224](https://github.com/logstash-plugins/logstash-input-s3/pull/224)
+
+ ## 3.6.0
+ - Fixed ingestion skipping files that share the same `last_modified` timestamp. [#220](https://github.com/logstash-plugins/logstash-input-s3/pull/220)
+
+ ## 3.5.2
+ - [DOC] Added a note that only AWS S3 is supported; other S3 compatible storage solutions are not supported. [#208](https://github.com/logstash-plugins/logstash-input-s3/issues/208)
+
+ ## 3.5.1
+ - [DOC] Added an example for `exclude_pattern` and reordered option descriptions [#204](https://github.com/logstash-plugins/logstash-input-s3/issues/204)
+
+ ## 3.5.0
+ - Added support for including objects restored from Glacier or Glacier Deep [#199](https://github.com/logstash-plugins/logstash-input-s3/issues/199)
+ - Added `gzip_pattern` option, enabling more flexible determination of whether a file is gzipped [#165](https://github.com/logstash-plugins/logstash-input-s3/issues/165)
+ - Refactor: logged exception class and unified logging messages [#201](https://github.com/logstash-plugins/logstash-input-s3/pull/201)
+
+ ## 3.4.1
+ - Fixed link formatting for input type (documentation)
+
+ ## 3.4.0
+ - Skips objects that are archived to AWS Glacier with a helpful log message (previously they would log as matched, but then fail to load events) [#160](https://github.com/logstash-plugins/logstash-input-s3/pull/160)
+ - Added `watch_for_new_files` option, enabling single-batch imports [#159](https://github.com/logstash-plugins/logstash-input-s3/pull/159)
+
  ## 3.3.7
  - Added ability to optionally include S3 object properties inside @metadata [#155](https://github.com/logstash-plugins/logstash-input-s3/pull/155)

data/LICENSE CHANGED
@@ -1,13 +1,202 @@
- Copyright (c) 2012-2018 Elasticsearch <http://www.elastic.co>

- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/

- http://www.apache.org/licenses/LICENSE-2.0
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright 2020 Elastic and contributors
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
data/README.md CHANGED
@@ -1,6 +1,6 @@
  # Logstash Plugin

- [![Travis Build Status](https://travis-ci.org/logstash-plugins/logstash-input-s3.svg)](https://travis-ci.org/logstash-plugins/logstash-input-s3)
+ [![Travis Build Status](https://travis-ci.com/logstash-plugins/logstash-input-s3.svg)](https://travis-ci.com/logstash-plugins/logstash-input-s3)

  This is a plugin for [Logstash](https://github.com/elastic/logstash).

@@ -38,7 +38,7 @@ Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/log

  ## Developing

- ### 1. Plugin Developement and Testing
+ ### 1. Plugin Development and Testing

  #### Code
  - To get started, you'll need JRuby with the Bundler gem installed.
data/docs/index.asciidoc CHANGED
@@ -23,9 +23,29 @@ include::{include_path}/plugin_header.asciidoc[]

  Stream events from files from an S3 bucket.

+ IMPORTANT: The S3 input plugin only supports AWS S3.
+ Other S3 compatible storage solutions are not supported.
+
  Each line from each file generates an event.
  Files ending in `.gz` are handled as gzip'ed files.

+ Files that are archived to AWS Glacier will be skipped.
+
+ [id="plugins-{type}s-{plugin}-ecs_metadata"]
+ ==== Event Metadata and the Elastic Common Schema (ECS)
+ This plugin adds CloudFront metadata to events.
+ When ECS compatibility is disabled, the value is stored at the root level.
+ When ECS is enabled, the value is stored in `@metadata`, where it can be used by other plugins in your pipeline.
+
+ Here’s how ECS compatibility mode affects output.
+ [cols="<l,<l,e,<e"]
+ |=======================================================================
+ | ECS disabled | ECS v1 | Availability | Description
+
+ | cloudfront_fields | [@metadata][s3][cloudfront][fields] | available when the file is a CloudFront log | column names of log
+ | cloudfront_version | [@metadata][s3][cloudfront][version] | available when the file is a CloudFront log | version of log
+ |=======================================================================
+
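NOTE: (Editor example, not part of the diff.) With `ecs_compatibility => v1` the CloudFront values land in `@metadata`, which is not indexed, so a pipeline that wants them in the event must copy them out explicitly. A minimal sketch, where the bucket and target field names are hypothetical:

[source,ruby]
-----
input {
  s3 {
    bucket            => "my-cloudfront-logs"   # hypothetical
    region            => "us-east-1"
    ecs_compatibility => "v1"
  }
}
filter {
  mutate {
    # copy the CloudFront metadata into an indexed field
    add_field => { "[cloudfront][version]" => "%{[@metadata][s3][cloudfront][version]}" }
  }
}
-----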
  [id="plugins-{type}s-{plugin}-options"]
  ==== S3 Input Configuration Options

@@ -42,8 +62,10 @@ This plugin supports the following configuration options plus the <<plugins-{typ
  | <<plugins-{type}s-{plugin}-backup_to_dir>> |<<string,string>>|No
  | <<plugins-{type}s-{plugin}-bucket>> |<<string,string>>|Yes
  | <<plugins-{type}s-{plugin}-delete>> |<<boolean,boolean>>|No
+ | <<plugins-{type}s-{plugin}-ecs_compatibility>> |<<string,string>>|No
  | <<plugins-{type}s-{plugin}-endpoint>> |<<string,string>>|No
  | <<plugins-{type}s-{plugin}-exclude_pattern>> |<<string,string>>|No
+ | <<plugins-{type}s-{plugin}-gzip_pattern>> |<<string,string>>|No
  | <<plugins-{type}s-{plugin}-include_object_properties>> |<<boolean,boolean>>|No
  | <<plugins-{type}s-{plugin}-interval>> |<<number,number>>|No
  | <<plugins-{type}s-{plugin}-prefix>> |<<string,string>>|No
@@ -55,6 +77,7 @@ This plugin supports the following configuration options plus the <<plugins-{typ
  | <<plugins-{type}s-{plugin}-session_token>> |<<string,string>>|No
  | <<plugins-{type}s-{plugin}-sincedb_path>> |<<string,string>>|No
  | <<plugins-{type}s-{plugin}-temporary_directory>> |<<string,string>>|No
+ | <<plugins-{type}s-{plugin}-watch_for_new_files>> |<<boolean,boolean>>|No
  |=======================================================================

  Also see <<plugins-{type}s-{plugin}-common-options>> for a list of options supported by all
@@ -76,6 +99,29 @@ This plugin uses the AWS SDK and supports several ways to get credentials, which
  4. Environment variables `AMAZON_ACCESS_KEY_ID` and `AMAZON_SECRET_ACCESS_KEY`
  5. IAM Instance Profile (available when running inside EC2)

+
+ [id="plugins-{type}s-{plugin}-additional_settings"]
+ ===== `additional_settings`
+
+ * Value type is <<hash,hash>>
+ * Default value is `{}`
+
+ Key-value pairs of settings and corresponding values used to parametrize
+ the connection to S3. See the full list in https://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Client.html[the AWS SDK documentation]. Example:
+
+ [source,ruby]
+ input {
+ s3 {
+ access_key_id => "1234"
+ secret_access_key => "secret"
+ bucket => "logstash-test"
+ additional_settings => {
+ force_path_style => true
+ follow_redirects => false
+ }
+ }
+ }
+
  [id="plugins-{type}s-{plugin}-aws_credentials_file"]
  ===== `aws_credentials_file`

@@ -137,6 +183,18 @@ The name of the S3 bucket.

  Whether to delete processed files from the original bucket.

+ [id="plugins-{type}s-{plugin}-ecs_compatibility"]
+ ===== `ecs_compatibility`
+
+ * Value type is <<string,string>>
+ * Supported values are:
+ ** `disabled`: does not use ECS-compatible field names
+ ** `v1`: uses metadata fields that are compatible with Elastic Common Schema
+
+ Controls this plugin's compatibility with the
+ {ecs-ref}[Elastic Common Schema (ECS)].
+ See <<plugins-{type}s-{plugin}-ecs_metadata>> for detailed information.
+
  [id="plugins-{type}s-{plugin}-endpoint"]
  ===== `endpoint`

@@ -153,29 +211,28 @@ guaranteed to work correctly with the AWS SDK.
  * Value type is <<string,string>>
  * Default value is `nil`

- Ruby style regexp of keys to exclude from the bucket
-
- [id="plugins-{type}s-{plugin}-additional_settings"]
- ===== `additional_settings`
+ Ruby style regexp of keys to exclude from the bucket.

- * Value type is <<hash,hash>>
- * Default value is `{}`
+ Note that files matching the pattern are skipped _after_ they have been listed.
+ Consider using <<plugins-{type}s-{plugin}-prefix>> instead where possible.

- Key-value pairs of settings and corresponding values used to parametrize
- the connection to s3. See full list in https://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Client.html[the AWS SDK documentation]. Example:
+ Example:

  [source,ruby]
- input {
- s3 {
- "access_key_id" => "1234"
- "secret_access_key" => "secret"
- "bucket" => "logstash-test"
- "additional_settings" => {
- "force_path_style" => true
- "follow_redirects" => false
- }
- }
- }
+ -----
+ "exclude_pattern" => "\/2020\/04\/"
+ -----
+
+ This pattern excludes all logs containing "/2020/04/" in the path.
+
+
+ [id="plugins-{type}s-{plugin}-gzip_pattern"]
+ ===== `gzip_pattern`
+
+ * Value type is <<string,string>>
+ * Default value is `"\.gz(ip)?$"`
+
+ Regular expression used to determine whether an input file is in gzip format.

  [id="plugins-{type}s-{plugin}-include_object_properties"]
  ===== `include_object_properties`
@@ -184,7 +241,7 @@ the connection to s3. See full list in https://docs.aws.amazon.com/sdkforruby/ap
  * Default value is `false`

  Whether or not to include the S3 object's properties (last_modified, content_type, metadata) into each Event at
- `[@metadata][s3]`. Regardless of this setting, `[@metdata][s3][key]` will always be present.
+ `[@metadata][s3]`. Regardless of this setting, `[@metadata][s3][key]` will always be present.

  [id="plugins-{type}s-{plugin}-interval"]
  ===== `interval`
@@ -273,7 +330,14 @@ If specified, this setting must be a filename path and not just a directory.

  Set the directory where logstash will store the tmp files before processing them.

+ [id="plugins-{type}s-{plugin}-watch_for_new_files"]
+ ===== `watch_for_new_files`
+
+ * Value type is <<boolean,boolean>>
+ * Default value is `true`

+ Whether or not to watch for new files.
+ Disabling this option causes the input to close itself after processing the files from a single listing.

  [id="plugins-{type}s-{plugin}-common-options"]
  include::{include_path}/{type}.asciidoc[]
data/lib/logstash/inputs/s3.rb CHANGED
@@ -3,19 +3,15 @@ require "logstash/inputs/base"
  require "logstash/namespace"
  require "logstash/plugin_mixins/aws_config"
  require "time"
+ require "date"
  require "tmpdir"
  require "stud/interval"
  require "stud/temporary"
  require "aws-sdk"
  require "logstash/inputs/s3/patch"
+ require "logstash/plugin_mixins/ecs_compatibility_support"

  require 'java'
- java_import java.io.InputStream
- java_import java.io.InputStreamReader
- java_import java.io.FileInputStream
- java_import java.io.BufferedReader
- java_import java.util.zip.GZIPInputStream
- java_import java.util.zip.ZipException

  Aws.eager_autoload!
  # Stream events from files from an S3 bucket.
@@ -23,7 +19,16 @@ Aws.eager_autoload!
  # Each line from each file generates an event.
  # Files ending in `.gz` are handled as gzip'ed files.
  class LogStash::Inputs::S3 < LogStash::Inputs::Base
+
+ java_import java.io.InputStream
+ java_import java.io.InputStreamReader
+ java_import java.io.FileInputStream
+ java_import java.io.BufferedReader
+ java_import java.util.zip.GZIPInputStream
+ java_import java.util.zip.ZipException
+
  include LogStash::PluginMixins::AwsConfig::V2
+ include LogStash::PluginMixins::ECSCompatibilitySupport(:disabled, :v1)

  config_name "s3"

@@ -63,6 +68,10 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  # Value is in seconds.
  config :interval, :validate => :number, :default => 60

+ # Whether to watch for new files with the interval.
+ # If false, overrides any interval and only lists the s3 bucket once.
+ config :watch_for_new_files, :validate => :boolean, :default => true
+
  # Ruby style regexp of keys to exclude from the bucket
  config :exclude_pattern, :validate => :string, :default => nil

@@ -75,13 +84,24 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  # be present.
  config :include_object_properties, :validate => :boolean, :default => false

- public
+ # Regular expression used to determine whether an input file is in gzip format.
+ # defaults to an expression that matches *.gz and *.gzip file extensions
+ config :gzip_pattern, :validate => :string, :default => "\.gz(ip)?$"
+
+ CUTOFF_SECOND = 3
+
+ def initialize(*params)
+ super
+ @cloudfront_fields_key = ecs_select[disabled: 'cloudfront_fields', v1: '[@metadata][s3][cloudfront][fields]']
+ @cloudfront_version_key = ecs_select[disabled: 'cloudfront_version', v1: '[@metadata][s3][cloudfront][version]']
+ end
+
  def register
  require "fileutils"
  require "digest/md5"
  require "aws-sdk-resources"

- @logger.info("Registering s3 input", :bucket => @bucket, :region => @region)
+ @logger.info("Registering", :bucket => @bucket, :region => @region)

  s3 = get_s3object

@@ -101,42 +121,50 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  end

  FileUtils.mkdir_p(@temporary_directory) unless Dir.exist?(@temporary_directory)
+
+ if !@watch_for_new_files && original_params.include?('interval')
+ logger.warn("`watch_for_new_files` has been disabled; `interval` directive will be ignored.")
+ end
  end

- public
  def run(queue)
  @current_thread = Thread.current
  Stud.interval(@interval) do
  process_files(queue)
+ stop unless @watch_for_new_files
  end
  end # def run

- public
  def list_new_files
- objects = {}
+ objects = []
  found = false
+ current_time = Time.now
  begin
  @s3bucket.objects(:prefix => @prefix).each do |log|
  found = true
- @logger.debug("S3 input: Found key", :key => log.key)
- if !ignore_filename?(log.key)
- if sincedb.newer?(log.last_modified) && log.content_length > 0
- objects[log.key] = log.last_modified
- @logger.debug("S3 input: Adding to objects[]", :key => log.key)
- @logger.debug("objects[] length is: ", :length => objects.length)
- end
+ @logger.debug('Found key', :key => log.key)
+ if ignore_filename?(log.key)
+ @logger.debug('Ignoring', :key => log.key)
+ elsif log.content_length <= 0
+ @logger.debug('Object Zero Length', :key => log.key)
+ elsif !sincedb.newer?(log.last_modified)
+ @logger.debug('Object Not Modified', :key => log.key)
+ elsif log.last_modified > (current_time - CUTOFF_SECOND).utc # a file modified within the cutoff window is processed in the next cycle
+ @logger.debug('Object Modified After Cutoff Time', :key => log.key)
+ elsif (log.storage_class == 'GLACIER' || log.storage_class == 'DEEP_ARCHIVE') && !file_restored?(log.object)
+ @logger.debug('Object Archived to Glacier', :key => log.key)
  else
- @logger.debug('S3 input: Ignoring', :key => log.key)
+ objects << log
+ @logger.debug("Added to objects[]", :key => log.key, :length => objects.length)
  end
  end
- @logger.info('S3 input: No files found in bucket', :prefix => prefix) unless found
+ @logger.info('No files found in bucket', :prefix => prefix) unless found
  rescue Aws::Errors::ServiceError => e
- @logger.error("S3 input: Unable to list objects in bucket", :prefix => prefix, :message => e.message)
+ @logger.error("Unable to list objects in bucket", :exception => e.class, :message => e.message, :backtrace => e.backtrace, :prefix => prefix)
  end
- objects.keys.sort {|a,b| objects[a] <=> objects[b]}
+ objects.sort_by { |log| log.last_modified }
  end # def fetch_new_files

- public
  def backup_to_bucket(object)
  unless @backup_to_bucket.nil?
  backup_key = "#{@backup_add_prefix}#{object.key}"
  backup_key = "#{@backup_add_prefix}#{object.key}"
@@ -147,28 +175,24 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
147
175
  end
148
176
  end
149
177
 
150
- public
151
178
  def backup_to_dir(filename)
152
179
  unless @backup_to_dir.nil?
153
180
  FileUtils.cp(filename, @backup_to_dir)
154
181
  end
155
182
  end
156
183
 
157
- public
158
184
  def process_files(queue)
159
185
  objects = list_new_files
160
186
 
161
- objects.each do |key|
187
+ objects.each do |log|
162
188
  if stop?
163
189
  break
164
190
  else
165
- @logger.debug("S3 input processing", :bucket => @bucket, :key => key)
166
- process_log(queue, key)
191
+ process_log(queue, log)
167
192
  end
168
193
  end
169
194
  end # def process_files
170
195
 
171
- public
172
196
  def stop
173
197
  # @current_thread is initialized in the `#run` method,
174
198
  # this variable is needed because the `#stop` is a called in another thread
@@ -212,9 +236,6 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  else
  decorate(event)

- event.set("cloudfront_version", metadata[:cloudfront_version]) unless metadata[:cloudfront_version].nil?
- event.set("cloudfront_fields", metadata[:cloudfront_fields]) unless metadata[:cloudfront_fields].nil?
-
  if @include_object_properties
  event.set("[@metadata][s3]", object.data.to_h)
  else
@@ -222,6 +243,8 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  end

  event.set("[@metadata][s3][key]", object.key)
+ event.set(@cloudfront_version_key, metadata[:cloudfront_version]) unless metadata[:cloudfront_version].nil?
+ event.set(@cloudfront_fields_key, metadata[:cloudfront_fields]) unless metadata[:cloudfront_fields].nil?

  queue << event
  end
@@ -235,24 +258,20 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  return true
  end # def process_local_log

- private
  def event_is_metadata?(event)
  return false unless event.get("message").class == String
  line = event.get("message")
  version_metadata?(line) || fields_metadata?(line)
  end

- private
  def version_metadata?(line)
  line.start_with?('#Version: ')
  end

- private
  def fields_metadata?(line)
  line.start_with?('#Fields: ')
  end

- private
  def update_metadata(metadata, event)
  line = event.get('message').strip

@@ -265,7 +284,6 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  end
  end

- private
  def read_file(filename, &block)
  if gzip?(filename)
  read_gzip_file(filename, block)
@@ -274,7 +292,7 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  end
  rescue => e
  # skip any broken file
- @logger.error("Failed to read the file. Skip processing.", :filename => filename, :exception => e.message)
+ @logger.error("Failed to read file, processing skipped", :exception => e.class, :message => e.message, :filename => filename)
  end

  def read_plain_file(filename, block)
@@ -283,7 +301,6 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  end
  end

- private
  def read_gzip_file(filename, block)
  file_stream = FileInputStream.new(filename)
  gzip_stream = GZIPInputStream.new(file_stream)
@@ -300,24 +317,20 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  file_stream.close unless file_stream.nil?
  end

- private
  def gzip?(filename)
- filename.end_with?('.gz','.gzip')
+ Regexp.new(@gzip_pattern).match(filename)
  end
-
- private
- def sincedb
+
+ def sincedb
  @sincedb ||= if @sincedb_path.nil?
  @logger.info("Using default generated file for the sincedb", :filename => sincedb_file)
  SinceDB::File.new(sincedb_file)
  else
- @logger.info("Using the provided sincedb_path",
- :sincedb_path => @sincedb_path)
+ @logger.info("Using the provided sincedb_path", :sincedb_path => @sincedb_path)
  SinceDB::File.new(@sincedb_path)
  end
  end

- private
  def sincedb_file
  digest = Digest::MD5.hexdigest("#{@bucket}+#{@prefix}")
  dir = File.join(LogStash::SETTINGS.get_value("path.data"), "plugins", "inputs", "s3")
@@ -350,11 +363,6 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  symbolized
  end

- private
- def old_sincedb_file
- end
-
- private
  def ignore_filename?(filename)
  if @prefix == filename
  return true
@@ -371,26 +379,28 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  end
  end

- private
- def process_log(queue, key)
- object = @s3bucket.object(key)
+ def process_log(queue, log)
+ @logger.debug("Processing", :bucket => @bucket, :key => log.key)
+ object = @s3bucket.object(log.key)

- filename = File.join(temporary_directory, File.basename(key))
+ filename = File.join(temporary_directory, File.basename(log.key))
  if download_remote_file(object, filename)
  if process_local_log(queue, filename, object)
- lastmod = object.last_modified
- backup_to_bucket(object)
- backup_to_dir(filename)
- delete_file_from_bucket(object)
- FileUtils.remove_entry_secure(filename, true)
- sincedb.write(lastmod)
+ if object.last_modified == log.last_modified
+ backup_to_bucket(object)
+ backup_to_dir(filename)
+ delete_file_from_bucket(object)
+ FileUtils.remove_entry_secure(filename, true)
+ sincedb.write(log.last_modified)
+ else
+ @logger.info("#{log.key} is updated at #{object.last_modified} and will process in the next cycle")
+ end
  end
  else
  FileUtils.remove_entry_secure(filename, true)
  end
  end

- private
  # Stream the remote file to the local disk
  #
  # @param [S3Object] Reference to the remote S3 object to download
@@ -398,33 +408,48 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  # @return [Boolean] True if the file was completely downloaded
  def download_remote_file(remote_object, local_filename)
  completed = false
- @logger.debug("S3 input: Download remote file", :remote_key => remote_object.key, :local_filename => local_filename)
+ @logger.debug("Downloading remote file", :remote_key => remote_object.key, :local_filename => local_filename)
  File.open(local_filename, 'wb') do |s3file|
  return completed if stop?
  begin
  remote_object.get(:response_target => s3file)
  completed = true
  rescue Aws::Errors::ServiceError => e
- @logger.warn("S3 input: Unable to download remote file", :remote_key => remote_object.key, :message => e.message)
+ @logger.warn("Unable to download remote file", :exception => e.class, :message => e.message, :remote_key => remote_object.key)
  end
  end
  completed
  end

- private
  def delete_file_from_bucket(object)
  if @delete and @backup_to_bucket.nil?
  object.delete()
  end
  end

- private
  def get_s3object
  options = symbolized_settings.merge(aws_options_hash || {})
  s3 = Aws::S3::Resource.new(options)
  end

- private
+ def file_restored?(object)
+ begin
+ restore = object.data.restore
+ if restore && restore.match(/ongoing-request\s?=\s?["']false["']/)
+ if restore = restore.match(/expiry-date\s?=\s?["'](.*?)["']/)
+ expiry_date = DateTime.parse(restore[1])
+ return true if DateTime.now < expiry_date # restored
+ else
+ @logger.debug("No expiry-date header for restore request: #{object.data.restore}")
+ return nil # no expiry-date found for ongoing request
+ end
+ end
+ rescue => e
+ @logger.debug("Could not determine Glacier restore status", :exception => e.class, :message => e.message)
+ end
+ return false
+ end
+
  module SinceDB
  class File
  def initialize(file)
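NOTE: (Editor sketch, not part of the diff.) The `file_restored?` method above inspects the object's `x-amz-restore` value, whose documented AWS format the two regexps target. A standalone illustration, using an example header value of the form shown in the AWS S3 documentation:

[source,ruby]
-----
require 'date'

restore = 'ongoing-request="false", expiry-date="Fri, 21 Dec 2012 00:00:00 GMT"'

if restore =~ /ongoing-request\s?=\s?["']false["']/ &&
   (m = restore.match(/expiry-date\s?=\s?["'](.*?)["']/))
  puts DateTime.parse(m[1]) > DateTime.now   # true while the restored copy is still available
end
-----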