logstash-input-s3 3.4.1 → 3.8.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 71fa5a89a8d744412ab8ac15f60f4d7e59ff2800de976466450de5142962285e
- data.tar.gz: 795c0caf4c280c90d4193aacacd8b09d64f33637859b3eda35349366862be9ba
+ metadata.gz: 98dffffbe8df111d10d5be8d650c47ecb5306128217388a513e29ebe6a312a15
+ data.tar.gz: 7941b6588ff4b448004140818898b09f98637405aa6cc5446a5040fb856c96ba
  SHA512:
- metadata.gz: 27be2ecb1234ba44fb4004b0a972d9cb643e9429df468d1777f189f3f207ce849d95b5655077343960edf2c4817254d4eb5ff9fb73c87afb99e0ce35c64e0f38
- data.tar.gz: 243140b50837ed67fe8e30997f560e41cc66b7f9da3fd4c6668bb6345cedd911ed9d18c5deb07c594bd24a8caa6c4efc11a3c500e66656e782343b05405d40c6
+ metadata.gz: 4f76ea76ea076bce90ee89519543e35b4845cfea30987228bd0f435f965b8e71eddd0a03a3cb51486c5dc78ec76220acf033f2d92ff90dea40d206bf2368d838
+ data.tar.gz: 0cba659c39202f51d3e8d5ffc8053d8c027a158677d1da16bcc4ac90b935eb6c1f9b3093f310a07afae79fb61cc2d49c335a5cf9b3f2ab3062f0f60c3af1047f
data/CHANGELOG.md CHANGED
@@ -1,3 +1,24 @@
+ ## 3.8.0
+ - Add ECS v8 support.
+
+ ## 3.7.0
+ - Add ECS support. [#228](https://github.com/logstash-plugins/logstash-input-s3/pull/228)
+ - Fix missing file in cutoff time change. [#224](https://github.com/logstash-plugins/logstash-input-s3/pull/224)
+
+ ## 3.6.0
+ - Fixed unprocessed file with the same `last_modified` in ingestion. [#220](https://github.com/logstash-plugins/logstash-input-s3/pull/220)
+
+ ## 3.5.2
+ - [DOC]Added note that only AWS S3 is supported. No other S3 compatible storage solutions are supported. [#208](https://github.com/logstash-plugins/logstash-input-s3/issues/208)
+
+ ## 3.5.1
+ - [DOC]Added example for `exclude_pattern` and reordered option descriptions [#204](https://github.com/logstash-plugins/logstash-input-s3/issues/204)
+
+ ## 3.5.0
+ - Added support for including objects restored from Glacier or Glacier Deep [#199](https://github.com/logstash-plugins/logstash-input-s3/issues/199)
+ - Added `gzip_pattern` option, enabling more flexible determination of whether a file is gzipped [#165](https://github.com/logstash-plugins/logstash-input-s3/issues/165)
+ - Refactor: log exception: class + unify logging messages a bit [#201](https://github.com/logstash-plugins/logstash-input-s3/pull/201)
+
  ## 3.4.1
  - Fixed link formatting for input type (documentation)

data/LICENSE CHANGED
@@ -1,13 +1,202 @@
- Copyright (c) 2012-2018 Elasticsearch <http://www.elastic.co>

- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/

- http://www.apache.org/licenses/LICENSE-2.0
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright 2020 Elastic and contributors
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
data/README.md CHANGED
@@ -1,6 +1,6 @@
  # Logstash Plugin

- [![Travis Build Status](https://travis-ci.org/logstash-plugins/logstash-input-s3.svg)](https://travis-ci.org/logstash-plugins/logstash-input-s3)
+ [![Travis Build Status](https://travis-ci.com/logstash-plugins/logstash-input-s3.svg)](https://travis-ci.com/logstash-plugins/logstash-input-s3)

  This is a plugin for [Logstash](https://github.com/elastic/logstash).

@@ -38,7 +38,7 @@ Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/log

  ## Developing

- ### 1. Plugin Developement and Testing
+ ### 1. Plugin Development and Testing

  #### Code
  - To get started, you'll need JRuby with the Bundler gem installed.
data/docs/index.asciidoc CHANGED
@@ -23,11 +23,29 @@ include::{include_path}/plugin_header.asciidoc[]

  Stream events from files from a S3 bucket.

+ IMPORTANT: The S3 input plugin only supports AWS S3.
+ Other S3 compatible storage solutions are not supported.
+
  Each line from each file generates an event.
  Files ending in `.gz` are handled as gzip'ed files.

  Files that are archived to AWS Glacier will be skipped.

+ [id="plugins-{type}s-{plugin}-ecs_metadata"]
+ ==== Event Metadata and the Elastic Common Schema (ECS)
+ This plugin adds CloudFront metadata to the event.
+ When ECS compatibility is disabled, the value is stored at the root level of the event.
+ When ECS is enabled, the value is stored in `@metadata`, where it can be used by other plugins in your pipeline.
+
+ Here’s how ECS compatibility mode affects output.
+ [cols="<l,<l,e,<e"]
+ |=======================================================================
+ | ECS disabled | ECS v1 | Availability | Description
+
+ | cloudfront_fields | [@metadata][s3][cloudfront][fields] | available when the file is a CloudFront log | column names of log
+ | cloudfront_version | [@metadata][s3][cloudfront][version] | available when the file is a CloudFront log | version of log
+ |=======================================================================
+
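When ECS compatibility is enabled these CloudFront values live under `@metadata`, which outputs do not emit; a minimal sketch (using the standard `mutate` filter and the field name from the table above) of copying the log version onto the event if it is needed downstream:

[source,ruby]
-----
filter {
  mutate {
    # copies the metadata value into a regular event field
    add_field => { "cloudfront_version" => "%{[@metadata][s3][cloudfront][version]}" }
  }
}
-----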
  [id="plugins-{type}s-{plugin}-options"]
  ==== S3 Input Configuration Options

@@ -44,8 +62,10 @@ This plugin supports the following configuration options plus the <<plugins-{typ
  | <<plugins-{type}s-{plugin}-backup_to_dir>> |<<string,string>>|No
  | <<plugins-{type}s-{plugin}-bucket>> |<<string,string>>|Yes
  | <<plugins-{type}s-{plugin}-delete>> |<<boolean,boolean>>|No
+ | <<plugins-{type}s-{plugin}-ecs_compatibility>> |<<string,string>>|No
  | <<plugins-{type}s-{plugin}-endpoint>> |<<string,string>>|No
  | <<plugins-{type}s-{plugin}-exclude_pattern>> |<<string,string>>|No
+ | <<plugins-{type}s-{plugin}-gzip_pattern>> |<<string,string>>|No
  | <<plugins-{type}s-{plugin}-include_object_properties>> |<<boolean,boolean>>|No
  | <<plugins-{type}s-{plugin}-interval>> |<<number,number>>|No
  | <<plugins-{type}s-{plugin}-prefix>> |<<string,string>>|No
@@ -79,6 +99,29 @@ This plugin uses the AWS SDK and supports several ways to get credentials, which
  4. Environment variables `AMAZON_ACCESS_KEY_ID` and `AMAZON_SECRET_ACCESS_KEY`
  5. IAM Instance Profile (available when running inside EC2)

+
+ [id="plugins-{type}s-{plugin}-additional_settings"]
+ ===== `additional_settings`
+
+ * Value type is <<hash,hash>>
+ * Default value is `{}`
+
+ Key-value pairs of settings and corresponding values used to parametrize
+ the connection to s3. See full list in https://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Client.html[the AWS SDK documentation]. Example:
+
+ [source,ruby]
+ input {
+ s3 {
+ access_key_id => "1234"
+ secret_access_key => "secret"
+ bucket => "logstash-test"
+ additional_settings => {
+ force_path_style => true
+ follow_redirects => false
+ }
+ }
+ }
+
  [id="plugins-{type}s-{plugin}-aws_credentials_file"]
  ===== `aws_credentials_file`

@@ -140,6 +183,18 @@ The name of the S3 bucket.

  Whether to delete processed files from the original bucket.

+ [id="plugins-{type}s-{plugin}-ecs_compatibility"]
+ ===== `ecs_compatibility`
+
+ * Value type is <<string,string>>
+ * Supported values are:
+ ** `disabled`: does not use ECS-compatible field names
+ ** `v1`,`v8`: uses metadata fields that are compatible with Elastic Common Schema
+
+ Controls this plugin's compatibility with the
+ {ecs-ref}[Elastic Common Schema (ECS)].
+ See <<plugins-{type}s-{plugin}-ecs_metadata>> for detailed information.
+
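A minimal sketch of pinning the setting explicitly rather than relying on the pipeline-level default (the bucket name is a placeholder):

[source,ruby]
-----
input {
  s3 {
    bucket            => "logstash-test"
    ecs_compatibility => "v8"
  }
}
-----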
  [id="plugins-{type}s-{plugin}-endpoint"]
  ===== `endpoint`

@@ -156,29 +211,28 @@ guaranteed to work correctly with the AWS SDK.
  * Value type is <<string,string>>
  * Default value is `nil`

- Ruby style regexp of keys to exclude from the bucket
+ Ruby style regexp of keys to exclude from the bucket.

- [id="plugins-{type}s-{plugin}-additional_settings"]
- ===== `additional_settings`
+ Note that files matching the pattern are skipped _after_ they have been listed.
+ Consider using <<plugins-{type}s-{plugin}-prefix>> instead where possible.

- * Value type is <<hash,hash>>
- * Default value is `{}`
-
- Key-value pairs of settings and corresponding values used to parametrize
- the connection to s3. See full list in https://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Client.html[the AWS SDK documentation]. Example:
+ Example:

  [source,ruby]
- input {
- s3 {
- "access_key_id" => "1234"
- "secret_access_key" => "secret"
- "bucket" => "logstash-test"
- "additional_settings" => {
- "force_path_style" => true
- "follow_redirects" => false
- }
- }
- }
+ -----
+ "exclude_pattern" => "\/2020\/04\/"
+ -----
+
+ This pattern excludes all logs containing "/2020/04/" in the path.
+
+
+ [id="plugins-{type}s-{plugin}-gzip_pattern"]
+ ===== `gzip_pattern`
+
+ * Value type is <<string,string>>
+ * Default value is `"\.gz(ip)?$"`
+
+ Regular expression used to determine whether an input file is in gzip format.
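For instance, to also treat objects stored under a hypothetical `.zipped` suffix as gzip, the default pattern can be widened; a sketch (the extra suffix is illustrative only):

[source,ruby]
-----
input {
  s3 {
    bucket       => "logstash-test"
    gzip_pattern => "\.gz(ip)?$|\.zipped$"
  }
}
-----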

  [id="plugins-{type}s-{plugin}-include_object_properties"]
  ===== `include_object_properties`
@@ -187,7 +241,7 @@ the connection to s3. See full list in https://docs.aws.amazon.com/sdkforruby/ap
  * Default value is `false`

  Whether or not to include the S3 object's properties (last_modified, content_type, metadata) into each Event at
- `[@metadata][s3]`. Regardless of this setting, `[@metdata][s3][key]` will always be present.
+ `[@metadata][s3]`. Regardless of this setting, `[@metadata][s3][key]` will always be present.

  [id="plugins-{type}s-{plugin}-interval"]
  ===== `interval`
data/lib/logstash/inputs/s3.rb CHANGED
@@ -3,19 +3,15 @@ require "logstash/inputs/base"
  require "logstash/namespace"
  require "logstash/plugin_mixins/aws_config"
  require "time"
+ require "date"
  require "tmpdir"
  require "stud/interval"
  require "stud/temporary"
  require "aws-sdk"
  require "logstash/inputs/s3/patch"
+ require "logstash/plugin_mixins/ecs_compatibility_support"

  require 'java'
- java_import java.io.InputStream
- java_import java.io.InputStreamReader
- java_import java.io.FileInputStream
- java_import java.io.BufferedReader
- java_import java.util.zip.GZIPInputStream
- java_import java.util.zip.ZipException

  Aws.eager_autoload!
  # Stream events from files from a S3 bucket.
@@ -23,7 +19,16 @@ Aws.eager_autoload!
  # Each line from each file generates an event.
  # Files ending in `.gz` are handled as gzip'ed files.
  class LogStash::Inputs::S3 < LogStash::Inputs::Base
+
+ java_import java.io.InputStream
+ java_import java.io.InputStreamReader
+ java_import java.io.FileInputStream
+ java_import java.io.BufferedReader
+ java_import java.util.zip.GZIPInputStream
+ java_import java.util.zip.ZipException
+
  include LogStash::PluginMixins::AwsConfig::V2
+ include LogStash::PluginMixins::ECSCompatibilitySupport(:disabled, :v1, :v8 => :v1)

  config_name "s3"

@@ -63,7 +68,7 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  # Value is in seconds.
  config :interval, :validate => :number, :default => 60

- # Whether to watch for new files with the interval.
+ # Whether to watch for new files with the interval.
  # If false, overrides any interval and only lists the s3 bucket once.
  config :watch_for_new_files, :validate => :boolean, :default => true

@@ -79,13 +84,24 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  # be present.
  config :include_object_properties, :validate => :boolean, :default => false

- public
+ # Regular expression used to determine whether an input file is in gzip format.
+ # Defaults to an expression that matches *.gz and *.gzip file extensions.
+ config :gzip_pattern, :validate => :string, :default => "\.gz(ip)?$"
+
+ CUTOFF_SECOND = 3
+
+ def initialize(*params)
+ super
+ @cloudfront_fields_key = ecs_select[disabled: 'cloudfront_fields', v1: '[@metadata][s3][cloudfront][fields]']
+ @cloudfront_version_key = ecs_select[disabled: 'cloudfront_version', v1: '[@metadata][s3][cloudfront][version]']
+ end
+
  def register
  require "fileutils"
  require "digest/md5"
  require "aws-sdk-resources"

- @logger.info("Registering s3 input", :bucket => @bucket, :region => @region)
+ @logger.info("Registering", :bucket => @bucket, :region => @region)

  s3 = get_s3object

@@ -111,7 +127,6 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  end
  end

- public
  def run(queue)
  @current_thread = Thread.current
  Stud.interval(@interval) do
@@ -120,36 +135,36 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  end
  end # def run

- public
  def list_new_files
- objects = {}
+ objects = []
  found = false
+ current_time = Time.now
  begin
  @s3bucket.objects(:prefix => @prefix).each do |log|
  found = true
- @logger.debug("S3 input: Found key", :key => log.key)
+ @logger.debug('Found key', :key => log.key)
  if ignore_filename?(log.key)
- @logger.debug('S3 input: Ignoring', :key => log.key)
+ @logger.debug('Ignoring', :key => log.key)
  elsif log.content_length <= 0
- @logger.debug('S3 Input: Object Zero Length', :key => log.key)
+ @logger.debug('Object Zero Length', :key => log.key)
  elsif !sincedb.newer?(log.last_modified)
- @logger.debug('S3 Input: Object Not Modified', :key => log.key)
- elsif log.storage_class.start_with?('GLACIER')
- @logger.debug('S3 Input: Object Archived to Glacier', :key => log.key)
+ @logger.debug('Object Not Modified', :key => log.key)
+ elsif log.last_modified > (current_time - CUTOFF_SECOND).utc # files modified within the CUTOFF_SECOND window are processed in the next cycle
+ @logger.debug('Object Modified After Cutoff Time', :key => log.key)
+ elsif (log.storage_class == 'GLACIER' || log.storage_class == 'DEEP_ARCHIVE') && !file_restored?(log.object)
+ @logger.debug('Object Archived to Glacier', :key => log.key)
  else
- objects[log.key] = log.last_modified
- @logger.debug("S3 input: Adding to objects[]", :key => log.key)
- @logger.debug("objects[] length is: ", :length => objects.length)
+ objects << log
+ @logger.debug("Added to objects[]", :key => log.key, :length => objects.length)
  end
  end
- @logger.info('S3 input: No files found in bucket', :prefix => prefix) unless found
+ @logger.info('No files found in bucket', :prefix => prefix) unless found
  rescue Aws::Errors::ServiceError => e
- @logger.error("S3 input: Unable to list objects in bucket", :prefix => prefix, :message => e.message)
+ @logger.error("Unable to list objects in bucket", :exception => e.class, :message => e.message, :backtrace => e.backtrace, :prefix => prefix)
  end
- objects.keys.sort {|a,b| objects[a] <=> objects[b]}
+ objects.sort_by { |log| log.last_modified }
  end # def fetch_new_files
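The CUTOFF_SECOND comparison above defers objects modified in the last few seconds to the next polling cycle, so keys written around the same instant as the sincedb watermark are not silently skipped; a rough sketch of the two outcomes, assuming the constant stays at 3 seconds:

[source,ruby]
-----
cutoff = (Time.now - LogStash::Inputs::S3::CUTOFF_SECOND).utc
log.last_modified >  cutoff  # too fresh: logged as 'Object Modified After Cutoff Time', retried next cycle
log.last_modified <= cutoff  # old enough: appended to objects and processed this cycle
-----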

- public
  def backup_to_bucket(object)
  unless @backup_to_bucket.nil?
  backup_key = "#{@backup_add_prefix}#{object.key}"
@@ -160,28 +175,24 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  end
  end

- public
  def backup_to_dir(filename)
  unless @backup_to_dir.nil?
  FileUtils.cp(filename, @backup_to_dir)
  end
  end

- public
  def process_files(queue)
  objects = list_new_files

- objects.each do |key|
+ objects.each do |log|
  if stop?
  break
  else
- @logger.debug("S3 input processing", :bucket => @bucket, :key => key)
- process_log(queue, key)
+ process_log(queue, log)
  end
  end
  end # def process_files

- public
  def stop
  # @current_thread is initialized in the `#run` method,
  # this variable is needed because `#stop` is called from another thread
@@ -225,9 +236,6 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  else
  decorate(event)

- event.set("cloudfront_version", metadata[:cloudfront_version]) unless metadata[:cloudfront_version].nil?
- event.set("cloudfront_fields", metadata[:cloudfront_fields]) unless metadata[:cloudfront_fields].nil?
-
  if @include_object_properties
  event.set("[@metadata][s3]", object.data.to_h)
  else
@@ -235,6 +243,8 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  end

  event.set("[@metadata][s3][key]", object.key)
+ event.set(@cloudfront_version_key, metadata[:cloudfront_version]) unless metadata[:cloudfront_version].nil?
+ event.set(@cloudfront_fields_key, metadata[:cloudfront_fields]) unless metadata[:cloudfront_fields].nil?

  queue << event
  end
@@ -248,24 +258,20 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  return true
  end # def process_local_log

- private
  def event_is_metadata?(event)
  return false unless event.get("message").class == String
  line = event.get("message")
  version_metadata?(line) || fields_metadata?(line)
  end

- private
  def version_metadata?(line)
  line.start_with?('#Version: ')
  end

- private
  def fields_metadata?(line)
  line.start_with?('#Fields: ')
  end

- private
  def update_metadata(metadata, event)
  line = event.get('message').strip

@@ -278,7 +284,6 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  end
  end

- private
  def read_file(filename, &block)
  if gzip?(filename)
  read_gzip_file(filename, block)
@@ -287,7 +292,7 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  end
  rescue => e
  # skip any broken file
- @logger.error("Failed to read the file. Skip processing.", :filename => filename, :exception => e.message)
+ @logger.error("Failed to read file, processing skipped", :exception => e.class, :message => e.message, :filename => filename)
  end

  def read_plain_file(filename, block)
@@ -296,7 +301,6 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  end
  end

- private
  def read_gzip_file(filename, block)
  file_stream = FileInputStream.new(filename)
  gzip_stream = GZIPInputStream.new(file_stream)
@@ -313,24 +317,20 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  file_stream.close unless file_stream.nil?
  end

- private
  def gzip?(filename)
- filename.end_with?('.gz','.gzip')
+ Regexp.new(@gzip_pattern).match(filename)
  end
-
- private
- def sincedb
+
+ def sincedb
  @sincedb ||= if @sincedb_path.nil?
  @logger.info("Using default generated file for the sincedb", :filename => sincedb_file)
  SinceDB::File.new(sincedb_file)
  else
- @logger.info("Using the provided sincedb_path",
- :sincedb_path => @sincedb_path)
+ @logger.info("Using the provided sincedb_path", :sincedb_path => @sincedb_path)
  SinceDB::File.new(@sincedb_path)
  end
  end

- private
  def sincedb_file
  digest = Digest::MD5.hexdigest("#{@bucket}+#{@prefix}")
  dir = File.join(LogStash::SETTINGS.get_value("path.data"), "plugins", "inputs", "s3")
@@ -363,11 +363,6 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  symbolized
  end

- private
- def old_sincedb_file
- end
-
- private
  def ignore_filename?(filename)
  if @prefix == filename
  return true
@@ -384,26 +379,28 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  end
  end

- private
- def process_log(queue, key)
- object = @s3bucket.object(key)
+ def process_log(queue, log)
+ @logger.debug("Processing", :bucket => @bucket, :key => log.key)
+ object = @s3bucket.object(log.key)

- filename = File.join(temporary_directory, File.basename(key))
+ filename = File.join(temporary_directory, File.basename(log.key))
  if download_remote_file(object, filename)
  if process_local_log(queue, filename, object)
- lastmod = object.last_modified
- backup_to_bucket(object)
- backup_to_dir(filename)
- delete_file_from_bucket(object)
- FileUtils.remove_entry_secure(filename, true)
- sincedb.write(lastmod)
+ if object.last_modified == log.last_modified
+ backup_to_bucket(object)
+ backup_to_dir(filename)
+ delete_file_from_bucket(object)
+ FileUtils.remove_entry_secure(filename, true)
+ sincedb.write(log.last_modified)
+ else
+ @logger.info("#{log.key} is updated at #{object.last_modified} and will process in the next cycle")
+ end
  end
  else
  FileUtils.remove_entry_secure(filename, true)
  end
  end

- private
  # Stream the remote file to the local disk
  #
  # @param [S3Object] Reference to the remote S3 object to download
@@ -411,33 +408,48 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
  # @return [Boolean] True if the file was completely downloaded
  def download_remote_file(remote_object, local_filename)
  completed = false
- @logger.debug("S3 input: Download remote file", :remote_key => remote_object.key, :local_filename => local_filename)
+ @logger.debug("Downloading remote file", :remote_key => remote_object.key, :local_filename => local_filename)
  File.open(local_filename, 'wb') do |s3file|
  return completed if stop?
  begin
  remote_object.get(:response_target => s3file)
  completed = true
  rescue Aws::Errors::ServiceError => e
- @logger.warn("S3 input: Unable to download remote file", :remote_key => remote_object.key, :message => e.message)
+ @logger.warn("Unable to download remote file", :exception => e.class, :message => e.message, :remote_key => remote_object.key)
  end
  end
  completed
  end

- private
  def delete_file_from_bucket(object)
  if @delete and @backup_to_bucket.nil?
  object.delete()
  end
  end

- private
  def get_s3object
  options = symbolized_settings.merge(aws_options_hash || {})
  s3 = Aws::S3::Resource.new(options)
  end

- private
+ def file_restored?(object)
+ begin
+ restore = object.data.restore
+ if restore && restore.match(/ongoing-request\s?=\s?["']false["']/)
+ if restore = restore.match(/expiry-date\s?=\s?["'](.*?)["']/)
+ expiry_date = DateTime.parse(restore[1])
+ return true if DateTime.now < expiry_date # restored
+ else
+ @logger.debug("No expiry-date header for restore request: #{object.data.restore}")
+ return nil # no expiry-date found for ongoing request
+ end
+ end
+ rescue => e
+ @logger.debug("Could not determine Glacier restore status", :exception => e.class, :message => e.message)
+ end
+ return false
+ end
+
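`file_restored?` decides whether a GLACIER or DEEP_ARCHIVE object is readable by parsing the restore status string the SDK exposes (the x-amz-restore header); a sketch of the two shapes it distinguishes, with an illustrative expiry date:

[source,ruby]
-----
'ongoing-request="false", expiry-date="Thu, 01 Jan 2099 00:00:00 GMT"'  # restore finished and copy still valid: processed
'ongoing-request="true"'                                                # restore still in progress: object skipped for now
-----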
  module SinceDB
  class File
  def initialize(file)
data/logstash-input-s3.gemspec CHANGED
@@ -1,7 +1,7 @@
  Gem::Specification.new do |s|

  s.name = 'logstash-input-s3'
- s.version = '3.4.1'
+ s.version = '3.8.0'
  s.licenses = ['Apache-2.0']
  s.summary = "Streams events from files in a S3 bucket"
  s.description = "This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program"
@@ -27,4 +27,5 @@ Gem::Specification.new do |s|
  s.add_development_dependency 'logstash-devutils'
  s.add_development_dependency "logstash-codec-json"
  s.add_development_dependency "logstash-codec-multiline"
+ s.add_runtime_dependency 'logstash-mixin-ecs_compatibility_support', '~>1.2'
  end
data/spec/inputs/s3_spec.rb CHANGED
@@ -1,5 +1,6 @@
  # encoding: utf-8
  require "logstash/devutils/rspec/spec_helper"
+ require "logstash/devutils/rspec/shared_examples"
  require "logstash/inputs/s3"
  require "logstash/codecs/multiline"
  require "logstash/errors"
@@ -8,6 +9,7 @@ require_relative "../support/helpers"
  require "stud/temporary"
  require "aws-sdk"
  require "fileutils"
+ require 'logstash/plugin_mixins/ecs_compatibility_support/spec_helper'

  describe LogStash::Inputs::S3 do
  let(:temporary_directory) { Stud::Temporary.pathname }
@@ -23,6 +25,7 @@ describe LogStash::Inputs::S3 do
  "sincedb_path" => File.join(sincedb_path, ".sincedb")
  }
  }
+ let(:cutoff) { LogStash::Inputs::S3::CUTOFF_SECOND }


  before do
@@ -32,13 +35,16 @@ describe LogStash::Inputs::S3 do
  end

  context "when interrupting the plugin" do
- let(:config) { super.merge({ "interval" => 5 }) }
+ let(:config) { super().merge({ "interval" => 5 }) }
+ let(:s3_obj) { double(:key => "awesome-key", :last_modified => Time.now.round, :content_length => 10, :storage_class => 'STANDARD', :object => double(:data => double(:restore => nil)) ) }

  before do
- expect_any_instance_of(LogStash::Inputs::S3).to receive(:list_new_files).and_return(TestInfiniteS3Object.new)
+ expect_any_instance_of(LogStash::Inputs::S3).to receive(:list_new_files).and_return(TestInfiniteS3Object.new(s3_obj))
  end

- it_behaves_like "an interruptible input plugin"
+ it_behaves_like "an interruptible input plugin" do
+ let(:allowed_lag) { 16 } if LOGSTASH_VERSION.split('.').first.to_i <= 6
+ end
  end

  describe "#register" do
@@ -114,14 +120,21 @@ describe LogStash::Inputs::S3 do
  describe "#list_new_files" do
  before { allow_any_instance_of(Aws::S3::Bucket).to receive(:objects) { objects_list } }

- let!(:present_object) { double(:key => 'this-should-be-present', :last_modified => Time.now, :content_length => 10, :storage_class => 'STANDARD') }
- let!(:archived_object) {double(:key => 'this-should-be-archived', :last_modified => Time.now, :content_length => 10, :storage_class => 'GLACIER') }
+ let!(:present_object_after_cutoff) {double(:key => 'this-should-not-be-present', :last_modified => Time.now, :content_length => 10, :storage_class => 'STANDARD', :object => double(:data => double(:restore => nil)) ) }
+ let!(:present_object) {double(:key => 'this-should-be-present', :last_modified => Time.now - cutoff, :content_length => 10, :storage_class => 'STANDARD', :object => double(:data => double(:restore => nil)) ) }
+ let!(:archived_object) {double(:key => 'this-should-be-archived', :last_modified => Time.now - cutoff, :content_length => 10, :storage_class => 'GLACIER', :object => double(:data => double(:restore => nil)) ) }
+ let!(:deep_archived_object) {double(:key => 'this-should-be-archived', :last_modified => Time.now - cutoff, :content_length => 10, :storage_class => 'GLACIER', :object => double(:data => double(:restore => nil)) ) }
+ let!(:restored_object) {double(:key => 'this-should-be-restored-from-archive', :last_modified => Time.now - cutoff, :content_length => 10, :storage_class => 'GLACIER', :object => double(:data => double(:restore => 'ongoing-request="false", expiry-date="Thu, 01 Jan 2099 00:00:00 GMT"')) ) }
+ let!(:deep_restored_object) {double(:key => 'this-should-be-restored-from-deep-archive', :last_modified => Time.now - cutoff, :content_length => 10, :storage_class => 'DEEP_ARCHIVE', :object => double(:data => double(:restore => 'ongoing-request="false", expiry-date="Thu, 01 Jan 2099 00:00:00 GMT"')) ) }
  let(:objects_list) {
  [
  double(:key => 'exclude-this-file-1', :last_modified => Time.now - 2 * day, :content_length => 100, :storage_class => 'STANDARD'),
  double(:key => 'exclude/logstash', :last_modified => Time.now - 2 * day, :content_length => 50, :storage_class => 'STANDARD'),
  archived_object,
- present_object
+ restored_object,
+ deep_restored_object,
+ present_object,
+ present_object_after_cutoff
  ]
  }

@@ -129,24 +142,32 @@ describe LogStash::Inputs::S3 do
  plugin = LogStash::Inputs::S3.new(config.merge({ "exclude_pattern" => "^exclude" }))
  plugin.register

- files = plugin.list_new_files
+ files = plugin.list_new_files.map { |item| item.key }
  expect(files).to include(present_object.key)
+ expect(files).to include(restored_object.key)
+ expect(files).to include(deep_restored_object.key)
  expect(files).to_not include('exclude-this-file-1') # matches exclude pattern
  expect(files).to_not include('exclude/logstash') # matches exclude pattern
  expect(files).to_not include(archived_object.key) # archived
- expect(files.size).to eq(1)
+ expect(files).to_not include(deep_archived_object.key) # archived
+ expect(files).to_not include(present_object_after_cutoff.key) # after cutoff
+ expect(files.size).to eq(3)
  end

  it 'should support not providing a exclude pattern' do
  plugin = LogStash::Inputs::S3.new(config)
  plugin.register

- files = plugin.list_new_files
+ files = plugin.list_new_files.map { |item| item.key }
  expect(files).to include(present_object.key)
+ expect(files).to include(restored_object.key)
+ expect(files).to include(deep_restored_object.key)
  expect(files).to include('exclude-this-file-1') # no exclude pattern given
  expect(files).to include('exclude/logstash') # no exclude pattern given
  expect(files).to_not include(archived_object.key) # archived
- expect(files.size).to eq(3)
+ expect(files).to_not include(deep_archived_object.key) # archived
+ expect(files).to_not include(present_object_after_cutoff.key) # after cutoff
+ expect(files.size).to eq(5)
  end

  context 'when all files are excluded from a bucket' do
@@ -192,7 +213,7 @@ describe LogStash::Inputs::S3 do
  'backup_to_bucket' => config['bucket']}))
  plugin.register

- files = plugin.list_new_files
+ files = plugin.list_new_files.map { |item| item.key }
  expect(files).to include(present_object.key)
  expect(files).to_not include('mybackup-log-1') # matches backup prefix
  expect(files.size).to eq(1)
@@ -206,12 +227,16 @@ describe LogStash::Inputs::S3 do
  allow_any_instance_of(LogStash::Inputs::S3::SinceDB::File).to receive(:read).and_return(Time.now - day)
  plugin.register

- files = plugin.list_new_files
+ files = plugin.list_new_files.map { |item| item.key }
  expect(files).to include(present_object.key)
+ expect(files).to include(restored_object.key)
+ expect(files).to include(deep_restored_object.key)
  expect(files).to_not include('exclude-this-file-1') # too old
  expect(files).to_not include('exclude/logstash') # too old
  expect(files).to_not include(archived_object.key) # archived
- expect(files.size).to eq(1)
+ expect(files).to_not include(deep_archived_object.key) # archived
+ expect(files).to_not include(present_object_after_cutoff.key) # after cutoff
+ expect(files.size).to eq(3)
  end

  it 'should ignore file if the file match the prefix' do
@@ -226,13 +251,14 @@ describe LogStash::Inputs::S3 do

  plugin = LogStash::Inputs::S3.new(config.merge({ 'prefix' => prefix }))
  plugin.register
- expect(plugin.list_new_files).to eq([present_object.key])
+ expect(plugin.list_new_files.map { |item| item.key }).to eq([present_object.key])
  end

  it 'should sort return object sorted by last_modification date with older first' do
  objects = [
  double(:key => 'YESTERDAY', :last_modified => Time.now - day, :content_length => 5, :storage_class => 'STANDARD'),
  double(:key => 'TODAY', :last_modified => Time.now, :content_length => 5, :storage_class => 'STANDARD'),
+ double(:key => 'TODAY_BEFORE_CUTOFF', :last_modified => Time.now - cutoff, :content_length => 5, :storage_class => 'STANDARD'),
  double(:key => 'TWO_DAYS_AGO', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD')
  ]

@@ -241,7 +267,7 @@ describe LogStash::Inputs::S3 do

  plugin = LogStash::Inputs::S3.new(config)
  plugin.register
- expect(plugin.list_new_files).to eq(['TWO_DAYS_AGO', 'YESTERDAY', 'TODAY'])
+ expect(plugin.list_new_files.map { |item| item.key }).to eq(['TWO_DAYS_AGO', 'YESTERDAY', 'TODAY_BEFORE_CUTOFF'])
  end

  describe "when doing backup on the s3" do
@@ -301,7 +327,7 @@ describe LogStash::Inputs::S3 do
  it 'should process events' do
  events = fetch_events(config)
  expect(events.size).to eq(events_to_process)
- insist { events[0].get("[@metadata][s3][key]") } == log.key
+ expect(events[0].get("[@metadata][s3][key]")).to eql log.key
  end

  it "deletes the temporary file" do
@@ -420,7 +446,7 @@ describe LogStash::Inputs::S3 do
  let(:events_to_process) { 16 }
  end
  end
-
+
  context 'compressed' do
  let(:log) { double(:key => 'log.gz', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD') }
  let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'compressed.log.gz') }
@@ -428,13 +454,20 @@ describe LogStash::Inputs::S3 do
  include_examples "generated events"
  end

- context 'compressed with gzip extension' do
+ context 'compressed with gzip extension and using default gzip_pattern option' do
  let(:log) { double(:key => 'log.gz', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD') }
  let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'compressed.log.gzip') }

  include_examples "generated events"
  end

+ context 'compressed with gzip extension and using custom gzip_pattern option' do
+ let(:config) { super().merge({ "gzip_pattern" => "gee.zip$" }) }
+ let(:log) { double(:key => 'log.gee.zip', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD') }
+ let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'compressed.log.gee.zip') }
+ include_examples "generated events"
+ end
+
  context 'plain text' do
  let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'uncompressed.log') }

@@ -464,12 +497,20 @@ describe LogStash::Inputs::S3 do
  context 'cloudfront' do
  let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'cloudfront.log') }

- it 'should extract metadata from cloudfront log' do
- events = fetch_events(config)
+ describe "metadata", :ecs_compatibility_support, :aggregate_failures do
+ ecs_compatibility_matrix(:disabled, :v1) do |ecs_select|
+ before(:each) do
+ allow_any_instance_of(described_class).to receive(:ecs_compatibility).and_return(ecs_compatibility)
+ end

- events.each do |event|
- expect(event.get('cloudfront_fields')).to eq('date time x-edge-location c-ip x-event sc-bytes x-cf-status x-cf-client-id cs-uri-stem cs-uri-query c-referrer x-page-url​ c-user-agent x-sname x-sname-query x-file-ext x-sid')
- expect(event.get('cloudfront_version')).to eq('1.0')
+ it 'should extract metadata from cloudfront log' do
+ events = fetch_events(config)
+
+ events.each do |event|
+ expect(event.get ecs_select[disabled: "cloudfront_fields", v1: "[@metadata][s3][cloudfront][fields]"] ).to eq('date time x-edge-location c-ip x-event sc-bytes x-cf-status x-cf-client-id cs-uri-stem cs-uri-query c-referrer x-page-url​ c-user-agent x-sname x-sname-query x-file-ext x-sid')
+ expect(event.get ecs_select[disabled: "cloudfront_version", v1: "[@metadata][s3][cloudfront][version]"] ).to eq('1.0')
+ end
+ end
  end
  end

@@ -477,7 +518,7 @@ describe LogStash::Inputs::S3 do
  end

  context 'when include_object_properties is set to true' do
- let(:config) { super.merge({ "include_object_properties" => true }) }
+ let(:config) { super().merge({ "include_object_properties" => true }) }
  let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'uncompressed.log') }

  it 'should extract object properties onto [@metadata][s3]' do
@@ -491,7 +532,7 @@ describe LogStash::Inputs::S3 do
  end

  context 'when include_object_properties is set to false' do
- let(:config) { super.merge({ "include_object_properties" => false }) }
+ let(:config) { super().merge({ "include_object_properties" => false }) }
  let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'uncompressed.log') }

  it 'should NOT extract object properties onto [@metadata][s3]' do
@@ -503,6 +544,67 @@ describe LogStash::Inputs::S3 do

  include_examples "generated events"
  end
+ end
+
+ describe "data loss" do
+ let(:s3_plugin) { LogStash::Inputs::S3.new(config) }
+ let(:queue) { [] }
+
+ before do
+ s3_plugin.register
+ end

+ context 'events come after cutoff time' do
+ it 'should be processed in next cycle' do
+ s3_objects = [
+ double(:key => 'TWO_DAYS_AGO', :last_modified => Time.now.round - 2 * day, :content_length => 5, :storage_class => 'STANDARD'),
+ double(:key => 'YESTERDAY', :last_modified => Time.now.round - day, :content_length => 5, :storage_class => 'STANDARD'),
+ double(:key => 'TODAY_BEFORE_CUTOFF', :last_modified => Time.now.round - cutoff, :content_length => 5, :storage_class => 'STANDARD'),
+ double(:key => 'TODAY', :last_modified => Time.now.round, :content_length => 5, :storage_class => 'STANDARD'),
+ double(:key => 'TODAY', :last_modified => Time.now.round, :content_length => 5, :storage_class => 'STANDARD')
+ ]
+ size = s3_objects.length
+
+ allow_any_instance_of(Aws::S3::Bucket).to receive(:objects) { s3_objects }
+ allow_any_instance_of(Aws::S3::Bucket).to receive(:object).and_return(*s3_objects)
+ expect(s3_plugin).to receive(:process_log).at_least(size).and_call_original
+ expect(s3_plugin).to receive(:stop?).and_return(false).at_least(size)
+ expect(s3_plugin).to receive(:download_remote_file).and_return(true).at_least(size)
+ expect(s3_plugin).to receive(:process_local_log).and_return(true).at_least(size)
+
+ # first iteration
+ s3_plugin.process_files(queue)
+
+ # second iteration
+ sleep(cutoff + 1)
+ s3_plugin.process_files(queue)
+ end
+ end
+
+ context 's3 object updated after getting summary' do
+ it 'should not update sincedb' do
+ s3_summary = [
+ double(:key => 'YESTERDAY', :last_modified => Time.now.round - day, :content_length => 5, :storage_class => 'STANDARD'),
+ double(:key => 'TODAY', :last_modified => Time.now.round - (cutoff * 10), :content_length => 5, :storage_class => 'STANDARD')
+ ]
+
+ s3_objects = [
+ double(:key => 'YESTERDAY', :last_modified => Time.now.round - day, :content_length => 5, :storage_class => 'STANDARD'),
+ double(:key => 'TODAY_UPDATED', :last_modified => Time.now.round, :content_length => 5, :storage_class => 'STANDARD')
+ ]
+
+ size = s3_objects.length
+
+ allow_any_instance_of(Aws::S3::Bucket).to receive(:objects) { s3_summary }
+ allow_any_instance_of(Aws::S3::Bucket).to receive(:object).and_return(*s3_objects)
+ expect(s3_plugin).to receive(:process_log).at_least(size).and_call_original
+ expect(s3_plugin).to receive(:stop?).and_return(false).at_least(size)
+ expect(s3_plugin).to receive(:download_remote_file).and_return(true).at_least(size)
+ expect(s3_plugin).to receive(:process_local_log).and_return(true).at_least(size)
+
+ s3_plugin.process_files(queue)
+ expect(s3_plugin.send(:sincedb).read).to eq(s3_summary[0].last_modified)
+ end
+ end
  end
  end
data/spec/integration/s3_spec.rb CHANGED
@@ -10,6 +10,7 @@ describe LogStash::Inputs::S3, :integration => true, :s3 => true do

  upload_file('../fixtures/uncompressed.log' , "#{prefix}uncompressed_1.log")
  upload_file('../fixtures/compressed.log.gz', "#{prefix}compressed_1.log.gz")
+ sleep(LogStash::Inputs::S3::CUTOFF_SECOND + 1)
  end

  after do
@@ -28,6 +29,7 @@ describe LogStash::Inputs::S3, :integration => true, :s3 => true do
  "prefix" => prefix,
  "temporary_directory" => temporary_directory } }
  let(:backup_prefix) { "backup/" }
+ let(:backup_bucket) { "logstash-s3-input-backup" }

  it "support prefix to scope the remote files" do
  events = fetch_events(minimal_settings)
@@ -49,13 +51,17 @@ describe LogStash::Inputs::S3, :integration => true, :s3 => true do
  end

  context "remote backup" do
+ before do
+ create_bucket(backup_bucket)
+ end
+
  it "another bucket" do
- fetch_events(minimal_settings.merge({ "backup_to_bucket" => "logstash-s3-input-backup"}))
- expect(list_remote_files("", "logstash-s3-input-backup").size).to eq(2)
+ fetch_events(minimal_settings.merge({ "backup_to_bucket" => backup_bucket}))
+ expect(list_remote_files("", backup_bucket).size).to eq(2)
  end

  after do
- delete_bucket("logstash-s3-input-backup")
+ delete_bucket(backup_bucket)
  end
  end
  end
data/spec/support/helpers.rb CHANGED
@@ -23,6 +23,10 @@ def list_remote_files(prefix, target_bucket = ENV['AWS_LOGSTASH_TEST_BUCKET'])
  bucket.objects(:prefix => prefix).collect(&:key)
  end

+ def create_bucket(name)
+ s3object.bucket(name).create
+ end
+
  def delete_bucket(name)
  s3object.bucket(name).objects.map(&:delete)
  s3object.bucket(name).delete
@@ -33,13 +37,16 @@ def s3object
  end

  class TestInfiniteS3Object
+ def initialize(s3_obj)
+ @s3_obj = s3_obj
+ end
+
  def each
  counter = 1

  loop do
- yield "awesome-#{counter}"
+ yield @s3_obj
  counter +=1
  end
  end
- end
-
+ end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: logstash-input-s3
  version: !ruby/object:Gem::Version
- version: 3.4.1
+ version: 3.8.0
  platform: ruby
  authors:
  - Elastic
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2018-09-14 00:00:00.000000000 Z
+ date: 2021-08-03 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  requirement: !ruby/object:Gem::Requirement
@@ -100,6 +100,20 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
+ - !ruby/object:Gem::Dependency
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '1.2'
+ name: logstash-mixin-ecs_compatibility_support
+ prerelease: false
+ type: :runtime
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '1.2'
  description: This gem is a Logstash plugin required to be installed on top of the
  Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This
  gem is not a stand-alone program
@@ -119,6 +133,7 @@ files:
  - lib/logstash/inputs/s3/patch.rb
  - logstash-input-s3.gemspec
  - spec/fixtures/cloudfront.log
+ - spec/fixtures/compressed.log.gee.zip
  - spec/fixtures/compressed.log.gz
  - spec/fixtures/compressed.log.gzip
  - spec/fixtures/invalid_utf8.gbk.log
@@ -152,13 +167,13 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubyforge_project:
- rubygems_version: 2.6.13
+ rubygems_version: 3.1.6
  signing_key:
  specification_version: 4
  summary: Streams events from files in a S3 bucket
  test_files:
  - spec/fixtures/cloudfront.log
+ - spec/fixtures/compressed.log.gee.zip
  - spec/fixtures/compressed.log.gz
  - spec/fixtures/compressed.log.gzip
  - spec/fixtures/invalid_utf8.gbk.log