logstash-input-s3 3.3.6 → 3.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 33d02e0410e180a35672fc42215f5758ab34592b0acbac20caeedc3b2c555f8c
4
- data.tar.gz: 5010dba8d504458b455a4941db2de4accd039b8ff47eeeb31da4059d0580847f
3
+ metadata.gz: a4e1cc2ba334eb9e35fc68cc6a773e13b9b7de5bc6c3b4ee40cf98903d140323
4
+ data.tar.gz: a43fe645c1016095639e092fa4d3e8e8b77384dd17190a712a0eaa5163e68854
5
5
  SHA512:
6
- metadata.gz: 57ad952479ee8ac3a8582898f630e1db633101a595693566dd7697f4f70b7fc74568f1ab6da95e9498bae49f73dc6a4d76e96b28d677eab492965fc70e948f99
7
- data.tar.gz: 34d23aebf95e88011d9bb6c2d4e6a14a2e7fc8110003672c9c703613bcbba095f5311043906022fcec443424b865c87b48d49682a13d5667d7fd5184deadb949
6
+ metadata.gz: e10a6e21aa62270ce5f78c269fb3abf0a28b0755f78b0cb7adcc76183be460472327bdabbc94fab51c35774ae930d25846084ad3b9b0e95bd9e021c995c08bbd
7
+ data.tar.gz: 23b6882049688b1cd57ccba4b8aa247103bf5b9f91011f1aa426e81280c3b4ca701bf88fd96e278968c7e38fd58a01980a830b4ce2bb9ad8e034cfe07113de79
data/CHANGELOG.md CHANGED
@@ -1,3 +1,27 @@
1
+ ## 3.6.0
2
+ - Fixed an issue where files sharing the same `last_modified` timestamp could be left unprocessed during ingestion. [#220](https://github.com/logstash-plugins/logstash-input-s3/pull/220)
3
+
4
+ ## 3.5.2
5
+ - [DOC] Added a note that only AWS S3 is supported; no other S3-compatible storage solutions are supported. [#208](https://github.com/logstash-plugins/logstash-input-s3/issues/208)
6
+
7
+ ## 3.5.1
8
+ - [DOC] Added an example for `exclude_pattern` and reordered option descriptions [#204](https://github.com/logstash-plugins/logstash-input-s3/issues/204)
9
+
10
+ ## 3.5.0
11
+ - Added support for including objects restored from Glacier or Glacier Deep [#199](https://github.com/logstash-plugins/logstash-input-s3/issues/199)
12
+ - Added `gzip_pattern` option, enabling more flexible determination of whether a file is gzipped [#165](https://github.com/logstash-plugins/logstash-input-s3/issues/165)
13
+ - Refactor: log the exception class and unify logging messages [#201](https://github.com/logstash-plugins/logstash-input-s3/pull/201)
14
+
15
+ ## 3.4.1
16
+ - Fixed link formatting for input type (documentation)
17
+
18
+ ## 3.4.0
19
+ - Skips objects that are archived to AWS Glacier with a helpful log message (previously they would log as matched, but then fail to load events) [#160](https://github.com/logstash-plugins/logstash-input-s3/pull/160)
20
+ - Added `watch_for_new_files` option, enabling single-batch imports [#159](https://github.com/logstash-plugins/logstash-input-s3/pull/159)
21
+
22
+ ## 3.3.7
23
+ - Added ability to optionally include S3 object properties inside @metadata [#155](https://github.com/logstash-plugins/logstash-input-s3/pull/155)
24
+
1
25
  ## 3.3.6
2
26
  - Fixed error in documentation by removing illegal commas [#154](https://github.com/logstash-plugins/logstash-input-s3/pull/154)
3
27
 
data/LICENSE CHANGED
@@ -1,13 +1,202 @@
1
- Copyright (c) 2012-2018 Elasticsearch <http://www.elastic.co>
2
1
 
3
- Licensed under the Apache License, Version 2.0 (the "License");
4
- you may not use this file except in compliance with the License.
5
- You may obtain a copy of the License at
2
+ Apache License
3
+ Version 2.0, January 2004
4
+ http://www.apache.org/licenses/
6
5
 
7
- http://www.apache.org/licenses/LICENSE-2.0
6
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
8
7
 
9
- Unless required by applicable law or agreed to in writing, software
10
- distributed under the License is distributed on an "AS IS" BASIS,
11
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
- See the License for the specific language governing permissions and
13
- limitations under the License.
8
+ 1. Definitions.
9
+
10
+ "License" shall mean the terms and conditions for use, reproduction,
11
+ and distribution as defined by Sections 1 through 9 of this document.
12
+
13
+ "Licensor" shall mean the copyright owner or entity authorized by
14
+ the copyright owner that is granting the License.
15
+
16
+ "Legal Entity" shall mean the union of the acting entity and all
17
+ other entities that control, are controlled by, or are under common
18
+ control with that entity. For the purposes of this definition,
19
+ "control" means (i) the power, direct or indirect, to cause the
20
+ direction or management of such entity, whether by contract or
21
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
22
+ outstanding shares, or (iii) beneficial ownership of such entity.
23
+
24
+ "You" (or "Your") shall mean an individual or Legal Entity
25
+ exercising permissions granted by this License.
26
+
27
+ "Source" form shall mean the preferred form for making modifications,
28
+ including but not limited to software source code, documentation
29
+ source, and configuration files.
30
+
31
+ "Object" form shall mean any form resulting from mechanical
32
+ transformation or translation of a Source form, including but
33
+ not limited to compiled object code, generated documentation,
34
+ and conversions to other media types.
35
+
36
+ "Work" shall mean the work of authorship, whether in Source or
37
+ Object form, made available under the License, as indicated by a
38
+ copyright notice that is included in or attached to the work
39
+ (an example is provided in the Appendix below).
40
+
41
+ "Derivative Works" shall mean any work, whether in Source or Object
42
+ form, that is based on (or derived from) the Work and for which the
43
+ editorial revisions, annotations, elaborations, or other modifications
44
+ represent, as a whole, an original work of authorship. For the purposes
45
+ of this License, Derivative Works shall not include works that remain
46
+ separable from, or merely link (or bind by name) to the interfaces of,
47
+ the Work and Derivative Works thereof.
48
+
49
+ "Contribution" shall mean any work of authorship, including
50
+ the original version of the Work and any modifications or additions
51
+ to that Work or Derivative Works thereof, that is intentionally
52
+ submitted to Licensor for inclusion in the Work by the copyright owner
53
+ or by an individual or Legal Entity authorized to submit on behalf of
54
+ the copyright owner. For the purposes of this definition, "submitted"
55
+ means any form of electronic, verbal, or written communication sent
56
+ to the Licensor or its representatives, including but not limited to
57
+ communication on electronic mailing lists, source code control systems,
58
+ and issue tracking systems that are managed by, or on behalf of, the
59
+ Licensor for the purpose of discussing and improving the Work, but
60
+ excluding communication that is conspicuously marked or otherwise
61
+ designated in writing by the copyright owner as "Not a Contribution."
62
+
63
+ "Contributor" shall mean Licensor and any individual or Legal Entity
64
+ on behalf of whom a Contribution has been received by Licensor and
65
+ subsequently incorporated within the Work.
66
+
67
+ 2. Grant of Copyright License. Subject to the terms and conditions of
68
+ this License, each Contributor hereby grants to You a perpetual,
69
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
70
+ copyright license to reproduce, prepare Derivative Works of,
71
+ publicly display, publicly perform, sublicense, and distribute the
72
+ Work and such Derivative Works in Source or Object form.
73
+
74
+ 3. Grant of Patent License. Subject to the terms and conditions of
75
+ this License, each Contributor hereby grants to You a perpetual,
76
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
77
+ (except as stated in this section) patent license to make, have made,
78
+ use, offer to sell, sell, import, and otherwise transfer the Work,
79
+ where such license applies only to those patent claims licensable
80
+ by such Contributor that are necessarily infringed by their
81
+ Contribution(s) alone or by combination of their Contribution(s)
82
+ with the Work to which such Contribution(s) was submitted. If You
83
+ institute patent litigation against any entity (including a
84
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
85
+ or a Contribution incorporated within the Work constitutes direct
86
+ or contributory patent infringement, then any patent licenses
87
+ granted to You under this License for that Work shall terminate
88
+ as of the date such litigation is filed.
89
+
90
+ 4. Redistribution. You may reproduce and distribute copies of the
91
+ Work or Derivative Works thereof in any medium, with or without
92
+ modifications, and in Source or Object form, provided that You
93
+ meet the following conditions:
94
+
95
+ (a) You must give any other recipients of the Work or
96
+ Derivative Works a copy of this License; and
97
+
98
+ (b) You must cause any modified files to carry prominent notices
99
+ stating that You changed the files; and
100
+
101
+ (c) You must retain, in the Source form of any Derivative Works
102
+ that You distribute, all copyright, patent, trademark, and
103
+ attribution notices from the Source form of the Work,
104
+ excluding those notices that do not pertain to any part of
105
+ the Derivative Works; and
106
+
107
+ (d) If the Work includes a "NOTICE" text file as part of its
108
+ distribution, then any Derivative Works that You distribute must
109
+ include a readable copy of the attribution notices contained
110
+ within such NOTICE file, excluding those notices that do not
111
+ pertain to any part of the Derivative Works, in at least one
112
+ of the following places: within a NOTICE text file distributed
113
+ as part of the Derivative Works; within the Source form or
114
+ documentation, if provided along with the Derivative Works; or,
115
+ within a display generated by the Derivative Works, if and
116
+ wherever such third-party notices normally appear. The contents
117
+ of the NOTICE file are for informational purposes only and
118
+ do not modify the License. You may add Your own attribution
119
+ notices within Derivative Works that You distribute, alongside
120
+ or as an addendum to the NOTICE text from the Work, provided
121
+ that such additional attribution notices cannot be construed
122
+ as modifying the License.
123
+
124
+ You may add Your own copyright statement to Your modifications and
125
+ may provide additional or different license terms and conditions
126
+ for use, reproduction, or distribution of Your modifications, or
127
+ for any such Derivative Works as a whole, provided Your use,
128
+ reproduction, and distribution of the Work otherwise complies with
129
+ the conditions stated in this License.
130
+
131
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
132
+ any Contribution intentionally submitted for inclusion in the Work
133
+ by You to the Licensor shall be under the terms and conditions of
134
+ this License, without any additional terms or conditions.
135
+ Notwithstanding the above, nothing herein shall supersede or modify
136
+ the terms of any separate license agreement you may have executed
137
+ with Licensor regarding such Contributions.
138
+
139
+ 6. Trademarks. This License does not grant permission to use the trade
140
+ names, trademarks, service marks, or product names of the Licensor,
141
+ except as required for reasonable and customary use in describing the
142
+ origin of the Work and reproducing the content of the NOTICE file.
143
+
144
+ 7. Disclaimer of Warranty. Unless required by applicable law or
145
+ agreed to in writing, Licensor provides the Work (and each
146
+ Contributor provides its Contributions) on an "AS IS" BASIS,
147
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
148
+ implied, including, without limitation, any warranties or conditions
149
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
150
+ PARTICULAR PURPOSE. You are solely responsible for determining the
151
+ appropriateness of using or redistributing the Work and assume any
152
+ risks associated with Your exercise of permissions under this License.
153
+
154
+ 8. Limitation of Liability. In no event and under no legal theory,
155
+ whether in tort (including negligence), contract, or otherwise,
156
+ unless required by applicable law (such as deliberate and grossly
157
+ negligent acts) or agreed to in writing, shall any Contributor be
158
+ liable to You for damages, including any direct, indirect, special,
159
+ incidental, or consequential damages of any character arising as a
160
+ result of this License or out of the use or inability to use the
161
+ Work (including but not limited to damages for loss of goodwill,
162
+ work stoppage, computer failure or malfunction, or any and all
163
+ other commercial damages or losses), even if such Contributor
164
+ has been advised of the possibility of such damages.
165
+
166
+ 9. Accepting Warranty or Additional Liability. While redistributing
167
+ the Work or Derivative Works thereof, You may choose to offer,
168
+ and charge a fee for, acceptance of support, warranty, indemnity,
169
+ or other liability obligations and/or rights consistent with this
170
+ License. However, in accepting such obligations, You may act only
171
+ on Your own behalf and on Your sole responsibility, not on behalf
172
+ of any other Contributor, and only if You agree to indemnify,
173
+ defend, and hold each Contributor harmless for any liability
174
+ incurred by, or claims asserted against, such Contributor by reason
175
+ of your accepting any such warranty or additional liability.
176
+
177
+ END OF TERMS AND CONDITIONS
178
+
179
+ APPENDIX: How to apply the Apache License to your work.
180
+
181
+ To apply the Apache License to your work, attach the following
182
+ boilerplate notice, with the fields enclosed by brackets "[]"
183
+ replaced with your own identifying information. (Don't include
184
+ the brackets!) The text should be enclosed in the appropriate
185
+ comment syntax for the file format. We also recommend that a
186
+ file or class name and description of purpose be included on the
187
+ same "printed page" as the copyright notice for easier
188
+ identification within third-party archives.
189
+
190
+ Copyright 2020 Elastic and contributors
191
+
192
+ Licensed under the Apache License, Version 2.0 (the "License");
193
+ you may not use this file except in compliance with the License.
194
+ You may obtain a copy of the License at
195
+
196
+ http://www.apache.org/licenses/LICENSE-2.0
197
+
198
+ Unless required by applicable law or agreed to in writing, software
199
+ distributed under the License is distributed on an "AS IS" BASIS,
200
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201
+ See the License for the specific language governing permissions and
202
+ limitations under the License.
data/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # Logstash Plugin
2
2
 
3
- [![Travis Build Status](https://travis-ci.org/logstash-plugins/logstash-input-s3.svg)](https://travis-ci.org/logstash-plugins/logstash-input-s3)
3
+ [![Travis Build Status](https://travis-ci.com/logstash-plugins/logstash-input-s3.svg)](https://travis-ci.com/logstash-plugins/logstash-input-s3)
4
4
 
5
5
  This is a plugin for [Logstash](https://github.com/elastic/logstash).
6
6
 
@@ -38,7 +38,7 @@ Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/log
38
38
 
39
39
  ## Developing
40
40
 
41
- ### 1. Plugin Developement and Testing
41
+ ### 1. Plugin Development and Testing
42
42
 
43
43
  #### Code
44
44
  - To get started, you'll need JRuby with the Bundler gem installed.
data/docs/index.asciidoc CHANGED
@@ -23,9 +23,14 @@ include::{include_path}/plugin_header.asciidoc[]
23
23
 
24
24
  Stream events from files from a S3 bucket.
25
25
 
26
+ IMPORTANT: The S3 input plugin only supports AWS S3.
27
+ Other S3 compatible storage solutions are not supported.
28
+
26
29
  Each line from each file generates an event.
27
30
  Files ending in `.gz` are handled as gzip'ed files.
28
31
 
32
+ Files that are archived to AWS Glacier will be skipped.
33
+
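For orientation, a minimal pipeline that reads every object under a prefix might look like the following sketch (the bucket name, region, and prefix are placeholders, not values from this gem):

[source,ruby]
-----
input {
  s3 {
    "bucket" => "my-example-bucket"   # placeholder bucket name
    "region" => "us-east-1"           # placeholder region
    "prefix" => "logs/"               # only list keys under this prefix
  }
}
-----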
29
34
  [id="plugins-{type}s-{plugin}-options"]
30
35
  ==== S3 Input Configuration Options
31
36
 
@@ -44,6 +49,8 @@ This plugin supports the following configuration options plus the <<plugins-{typ
44
49
  | <<plugins-{type}s-{plugin}-delete>> |<<boolean,boolean>>|No
45
50
  | <<plugins-{type}s-{plugin}-endpoint>> |<<string,string>>|No
46
51
  | <<plugins-{type}s-{plugin}-exclude_pattern>> |<<string,string>>|No
52
+ | <<plugins-{type}s-{plugin}-gzip_pattern>> |<<string,string>>|No
53
+ | <<plugins-{type}s-{plugin}-include_object_properties>> |<<boolean,boolean>>|No
47
54
  | <<plugins-{type}s-{plugin}-interval>> |<<number,number>>|No
48
55
  | <<plugins-{type}s-{plugin}-prefix>> |<<string,string>>|No
49
56
  | <<plugins-{type}s-{plugin}-proxy_uri>> |<<string,string>>|No
@@ -54,6 +61,7 @@ This plugin supports the following configuration options plus the <<plugins-{typ
54
61
  | <<plugins-{type}s-{plugin}-session_token>> |<<string,string>>|No
55
62
  | <<plugins-{type}s-{plugin}-sincedb_path>> |<<string,string>>|No
56
63
  | <<plugins-{type}s-{plugin}-temporary_directory>> |<<string,string>>|No
64
+ | <<plugins-{type}s-{plugin}-watch_for_new_files>> |<<boolean,boolean>>|No
57
65
  |=======================================================================
58
66
 
59
67
  Also see <<plugins-{type}s-{plugin}-common-options>> for a list of options supported by all
@@ -75,6 +83,29 @@ This plugin uses the AWS SDK and supports several ways to get credentials, which
75
83
  4. Environment variables `AMAZON_ACCESS_KEY_ID` and `AMAZON_SECRET_ACCESS_KEY`
76
84
  5. IAM Instance Profile (available when running inside EC2)
77
85
 
86
+
87
+ [id="plugins-{type}s-{plugin}-additional_settings"]
88
+ ===== `additional_settings`
89
+
90
+ * Value type is <<hash,hash>>
91
+ * Default value is `{}`
92
+
93
+ Key-value pairs of settings and corresponding values used to parametrize
94
+ the connection to s3. See full list in https://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Client.html[the AWS SDK documentation]. Example:
95
+
96
+ [source,ruby]
97
+ input {
98
+ s3 {
99
+ "access_key_id" => "1234"
100
+ "secret_access_key" => "secret"
101
+ "bucket" => "logstash-test"
102
+ "additional_settings" => {
103
+ "force_path_style" => true
104
+ "follow_redirects" => false
105
+ }
106
+ }
107
+ }
108
+
78
109
  [id="plugins-{type}s-{plugin}-aws_credentials_file"]
79
110
  ===== `aws_credentials_file`
80
111
 
@@ -152,29 +183,37 @@ guaranteed to work correctly with the AWS SDK.
152
183
  * Value type is <<string,string>>
153
184
  * Default value is `nil`
154
185
 
155
- Ruby style regexp of keys to exclude from the bucket
156
-
157
- [id="plugins-{type}s-{plugin}-additional_settings"]
158
- ===== `additional_settings`
186
+ Ruby style regexp of keys to exclude from the bucket.
159
187
 
160
- * Value type is <<hash,hash>>
161
- * Default value is `{}`
188
+ Note that files matching the pattern are skipped _after_ they have been listed.
189
+ Consider using <<plugins-{type}s-{plugin}-prefix>> instead where possible.
162
190
 
163
- Key-value pairs of settings and corresponding values used to parametrize
164
- the connection to s3. See full list in https://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Client.html[the AWS SDK documentation]. Example:
191
+ Example:
165
192
 
166
193
  [source,ruby]
167
- input {
168
- s3 {
169
- "access_key_id" => "1234"
170
- "secret_access_key" => "secret"
171
- "bucket" => "logstash-test"
172
- "additional_settings" => {
173
- "force_path_style" => true
174
- "follow_redirects" => false
175
- }
176
- }
177
- }
194
+ -----
195
+ "exclude_pattern" => "\/2020\/04\/"
196
+ -----
197
+
198
+ This pattern excludes all logs containing "/2020/04/" in the path.
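Where the keys to skip share a common leading path, a sketch using `prefix` (values are illustrative) avoids listing them at all instead of filtering them after the fact:

[source,ruby]
-----
input {
  s3 {
    "bucket" => "logstash-test"   # illustrative bucket name
    "prefix" => "production/"     # only keys under this prefix are listed
  }
}
-----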
199
+
200
+
201
+ [id="plugins-{type}s-{plugin}-gzip_pattern"]
202
+ ===== `gzip_pattern`
203
+
204
+ * Value type is <<string,string>>
205
+ * Default value is `"\.gz(ip)?$"`
206
+
207
+ Regular expression used to determine whether an input file is in gzip format.
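As a sketch, a bucket whose gzip objects carry a non-standard suffix could extend the default pattern (the extra suffix here is purely illustrative):

[source,ruby]
-----
input {
  s3 {
    "bucket" => "logstash-test"                # illustrative bucket name
    "gzip_pattern" => "\.gz(ip)?$|\.gzipped$"  # also treat *.gzipped objects as gzip
  }
}
-----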
208
+
209
+ [id="plugins-{type}s-{plugin}-include_object_properties"]
210
+ ===== `include_object_properties`
211
+
212
+ * Value type is <<boolean,boolean>>
213
+ * Default value is `false`
214
+
215
+ Whether or not to include the S3 object's properties (last_modified, content_type, metadata) into each Event at
216
+ `[@metadata][s3]`. Regardless of this setting, `[@metadata][s3][key]` will always be present.
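A sketch of enabling the option and copying one of the exposed properties out of `@metadata` so it survives into the output (`content_type` is one of the properties listed above; the target field name is illustrative):

[source,ruby]
-----
input {
  s3 {
    "bucket" => "logstash-test"          # illustrative bucket name
    "include_object_properties" => true
  }
}
filter {
  mutate {
    # @metadata is not emitted by outputs, so copy what you need onto the event
    add_field => { "s3_content_type" => "%{[@metadata][s3][content_type]}" }
  }
}
-----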
178
217
 
179
218
  [id="plugins-{type}s-{plugin}-interval"]
180
219
  ===== `interval`
@@ -263,7 +302,14 @@ If specified, this setting must be a filename path and not just a directory.
263
302
 
264
303
  Set the directory where logstash will store the tmp files before processing them.
265
304
 
305
+ [id="plugins-{type}s-{plugin}-watch_for_new_files"]
306
+ ===== `watch_for_new_files`
307
+
308
+ * Value type is <<boolean,boolean>>
309
+ * Default value is `true`
266
310
 
311
+ Whether or not to watch for new files.
312
+ Disabling this option causes the input to close itself after processing the files from a single listing.
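A sketch of a one-shot import, where the bucket listing is processed once and the input then shuts down:

[source,ruby]
-----
input {
  s3 {
    "bucket" => "logstash-test"        # illustrative bucket name
    "watch_for_new_files" => false     # process the current listing once, then stop
  }
}
-----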
267
313
 
268
314
  [id="plugins-{type}s-{plugin}-common-options"]
269
315
  include::{include_path}/{type}.asciidoc[]
data/lib/logstash/inputs/s3.rb CHANGED
@@ -3,6 +3,7 @@ require "logstash/inputs/base"
3
3
  require "logstash/namespace"
4
4
  require "logstash/plugin_mixins/aws_config"
5
5
  require "time"
6
+ require "date"
6
7
  require "tmpdir"
7
8
  require "stud/interval"
8
9
  require "stud/temporary"
@@ -10,12 +11,6 @@ require "aws-sdk"
10
11
  require "logstash/inputs/s3/patch"
11
12
 
12
13
  require 'java'
13
- java_import java.io.InputStream
14
- java_import java.io.InputStreamReader
15
- java_import java.io.FileInputStream
16
- java_import java.io.BufferedReader
17
- java_import java.util.zip.GZIPInputStream
18
- java_import java.util.zip.ZipException
19
14
 
20
15
  Aws.eager_autoload!
21
16
  # Stream events from files from a S3 bucket.
@@ -23,6 +18,14 @@ Aws.eager_autoload!
23
18
  # Each line from each file generates an event.
24
19
  # Files ending in `.gz` are handled as gzip'ed files.
25
20
  class LogStash::Inputs::S3 < LogStash::Inputs::Base
21
+
22
+ java_import java.io.InputStream
23
+ java_import java.io.InputStreamReader
24
+ java_import java.io.FileInputStream
25
+ java_import java.io.BufferedReader
26
+ java_import java.util.zip.GZIPInputStream
27
+ java_import java.util.zip.ZipException
28
+
26
29
  include LogStash::PluginMixins::AwsConfig::V2
27
30
 
28
31
  config_name "s3"
@@ -63,6 +66,10 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
63
66
  # Value is in seconds.
64
67
  config :interval, :validate => :number, :default => 60
65
68
 
69
+ # Whether to watch for new files with the interval.
70
+ # If false, overrides any interval and only lists the s3 bucket once.
71
+ config :watch_for_new_files, :validate => :boolean, :default => true
72
+
66
73
  # Ruby style regexp of keys to exclude from the bucket
67
74
  config :exclude_pattern, :validate => :string, :default => nil
68
75
 
@@ -70,13 +77,23 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
70
77
  # default to the current OS temporary directory in linux /tmp/logstash
71
78
  config :temporary_directory, :validate => :string, :default => File.join(Dir.tmpdir, "logstash")
72
79
 
73
- public
80
+ # Whether or not to include the S3 object's properties (last_modified, content_type, metadata)
81
+ # into each Event at [@metadata][s3]. Regardless of this setting, [@metadata][s3][key] will always
82
+ # be present.
83
+ config :include_object_properties, :validate => :boolean, :default => false
84
+
85
+ # Regular expression used to determine whether an input file is in gzip format.
86
+ # default to an expression that matches *.gz and *.gzip file extensions
87
+ config :gzip_pattern, :validate => :string, :default => "\.gz(ip)?$"
88
+
89
+ CUTOFF_SECOND = 3
90
+
74
91
  def register
75
92
  require "fileutils"
76
93
  require "digest/md5"
77
94
  require "aws-sdk-resources"
78
95
 
79
- @logger.info("Registering s3 input", :bucket => @bucket, :region => @region)
96
+ @logger.info("Registering", :bucket => @bucket, :region => @region)
80
97
 
81
98
  s3 = get_s3object
82
99
 
@@ -96,42 +113,49 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
96
113
  end
97
114
 
98
115
  FileUtils.mkdir_p(@temporary_directory) unless Dir.exist?(@temporary_directory)
116
+
117
+ if !@watch_for_new_files && original_params.include?('interval')
118
+ logger.warn("`watch_for_new_files` has been disabled; `interval` directive will be ignored.")
119
+ end
99
120
  end
100
121
 
101
- public
102
122
  def run(queue)
103
123
  @current_thread = Thread.current
104
124
  Stud.interval(@interval) do
105
125
  process_files(queue)
126
+ stop unless @watch_for_new_files
106
127
  end
107
128
  end # def run
108
129
 
109
- public
110
130
  def list_new_files
111
- objects = {}
131
+ objects = []
112
132
  found = false
113
133
  begin
114
134
  @s3bucket.objects(:prefix => @prefix).each do |log|
115
135
  found = true
116
- @logger.debug("S3 input: Found key", :key => log.key)
117
- if !ignore_filename?(log.key)
118
- if sincedb.newer?(log.last_modified) && log.content_length > 0
119
- objects[log.key] = log.last_modified
120
- @logger.debug("S3 input: Adding to objects[]", :key => log.key)
121
- @logger.debug("objects[] length is: ", :length => objects.length)
122
- end
136
+ @logger.debug('Found key', :key => log.key)
137
+ if ignore_filename?(log.key)
138
+ @logger.debug('Ignoring', :key => log.key)
139
+ elsif log.content_length <= 0
140
+ @logger.debug('Object Zero Length', :key => log.key)
141
+ elsif !sincedb.newer?(log.last_modified)
142
+ @logger.debug('Object Not Modified', :key => log.key)
143
+ elsif log.last_modified > (Time.now - CUTOFF_SECOND).utc # files modified within the last CUTOFF_SECOND seconds are processed in the next cycle
144
+ @logger.debug('Object Modified After Cutoff Time', :key => log.key)
145
+ elsif (log.storage_class == 'GLACIER' || log.storage_class == 'DEEP_ARCHIVE') && !file_restored?(log.object)
146
+ @logger.debug('Object Archived to Glacier', :key => log.key)
123
147
  else
124
- @logger.debug('S3 input: Ignoring', :key => log.key)
148
+ objects << log
149
+ @logger.debug("Added to objects[]", :key => log.key, :length => objects.length)
125
150
  end
126
151
  end
127
- @logger.info('S3 input: No files found in bucket', :prefix => prefix) unless found
152
+ @logger.info('No files found in bucket', :prefix => prefix) unless found
128
153
  rescue Aws::Errors::ServiceError => e
129
- @logger.error("S3 input: Unable to list objects in bucket", :prefix => prefix, :message => e.message)
154
+ @logger.error("Unable to list objects in bucket", :exception => e.class, :message => e.message, :backtrace => e.backtrace, :prefix => prefix)
130
155
  end
131
- objects.keys.sort {|a,b| objects[a] <=> objects[b]}
156
+ objects.sort_by { |log| log.last_modified }
132
157
  end # def fetch_new_files
133
158
 
134
- public
135
159
  def backup_to_bucket(object)
136
160
  unless @backup_to_bucket.nil?
137
161
  backup_key = "#{@backup_add_prefix}#{object.key}"
@@ -142,28 +166,24 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
142
166
  end
143
167
  end
144
168
 
145
- public
146
169
  def backup_to_dir(filename)
147
170
  unless @backup_to_dir.nil?
148
171
  FileUtils.cp(filename, @backup_to_dir)
149
172
  end
150
173
  end
151
174
 
152
- public
153
175
  def process_files(queue)
154
176
  objects = list_new_files
155
177
 
156
- objects.each do |key|
178
+ objects.each do |log|
157
179
  if stop?
158
180
  break
159
181
  else
160
- @logger.debug("S3 input processing", :bucket => @bucket, :key => key)
161
- process_log(queue, key)
182
+ process_log(queue, log)
162
183
  end
163
184
  end
164
185
  end # def process_files
165
186
 
166
- public
167
187
  def stop
168
188
  # @current_thread is initialized in the `#run` method,
169
189
  # this variable is needed because the `#stop` is a called in another thread
@@ -177,8 +197,9 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
177
197
  #
178
198
  # @param [Queue] Where to push the event
179
199
  # @param [String] Which file to read from
200
+ # @param [S3Object] Source s3 object
180
201
  # @return [Boolean] True if the file was completely read, false otherwise.
181
- def process_local_log(queue, filename, key)
202
+ def process_local_log(queue, filename, object)
182
203
  @logger.debug('Processing file', :filename => filename)
183
204
  metadata = {}
184
205
  # Currently codecs operates on bytes instead of stream.
@@ -209,7 +230,13 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
209
230
  event.set("cloudfront_version", metadata[:cloudfront_version]) unless metadata[:cloudfront_version].nil?
210
231
  event.set("cloudfront_fields", metadata[:cloudfront_fields]) unless metadata[:cloudfront_fields].nil?
211
232
 
212
- event.set("[@metadata][s3]", { "key" => key })
233
+ if @include_object_properties
234
+ event.set("[@metadata][s3]", object.data.to_h)
235
+ else
236
+ event.set("[@metadata][s3]", {})
237
+ end
238
+
239
+ event.set("[@metadata][s3][key]", object.key)
213
240
 
214
241
  queue << event
215
242
  end
@@ -223,24 +250,20 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
223
250
  return true
224
251
  end # def process_local_log
225
252
 
226
- private
227
253
  def event_is_metadata?(event)
228
254
  return false unless event.get("message").class == String
229
255
  line = event.get("message")
230
256
  version_metadata?(line) || fields_metadata?(line)
231
257
  end
232
258
 
233
- private
234
259
  def version_metadata?(line)
235
260
  line.start_with?('#Version: ')
236
261
  end
237
262
 
238
- private
239
263
  def fields_metadata?(line)
240
264
  line.start_with?('#Fields: ')
241
265
  end
242
266
 
243
- private
244
267
  def update_metadata(metadata, event)
245
268
  line = event.get('message').strip
246
269
 
@@ -253,7 +276,6 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
253
276
  end
254
277
  end
255
278
 
256
- private
257
279
  def read_file(filename, &block)
258
280
  if gzip?(filename)
259
281
  read_gzip_file(filename, block)
@@ -262,7 +284,7 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
262
284
  end
263
285
  rescue => e
264
286
  # skip any broken file
265
- @logger.error("Failed to read the file. Skip processing.", :filename => filename, :exception => e.message)
287
+ @logger.error("Failed to read file, processing skipped", :exception => e.class, :message => e.message, :filename => filename)
266
288
  end
267
289
 
268
290
  def read_plain_file(filename, block)
@@ -271,7 +293,6 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
271
293
  end
272
294
  end
273
295
 
274
- private
275
296
  def read_gzip_file(filename, block)
276
297
  file_stream = FileInputStream.new(filename)
277
298
  gzip_stream = GZIPInputStream.new(file_stream)
@@ -288,24 +309,20 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
288
309
  file_stream.close unless file_stream.nil?
289
310
  end
290
311
 
291
- private
292
312
  def gzip?(filename)
293
- filename.end_with?('.gz','.gzip')
313
+ Regexp.new(@gzip_pattern).match(filename)
294
314
  end
295
-
296
- private
297
- def sincedb
315
+
316
+ def sincedb
298
317
  @sincedb ||= if @sincedb_path.nil?
299
318
  @logger.info("Using default generated file for the sincedb", :filename => sincedb_file)
300
319
  SinceDB::File.new(sincedb_file)
301
320
  else
302
- @logger.info("Using the provided sincedb_path",
303
- :sincedb_path => @sincedb_path)
321
+ @logger.info("Using the provided sincedb_path", :sincedb_path => @sincedb_path)
304
322
  SinceDB::File.new(@sincedb_path)
305
323
  end
306
324
  end
307
325
 
308
- private
309
326
  def sincedb_file
310
327
  digest = Digest::MD5.hexdigest("#{@bucket}+#{@prefix}")
311
328
  dir = File.join(LogStash::SETTINGS.get_value("path.data"), "plugins", "inputs", "s3")
@@ -338,11 +355,6 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
338
355
  symbolized
339
356
  end
340
357
 
341
- private
342
- def old_sincedb_file
343
- end
344
-
345
- private
346
358
  def ignore_filename?(filename)
347
359
  if @prefix == filename
348
360
  return true
@@ -359,26 +371,28 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
359
371
  end
360
372
  end
361
373
 
362
- private
363
- def process_log(queue, key)
364
- object = @s3bucket.object(key)
374
+ def process_log(queue, log)
375
+ @logger.debug("Processing", :bucket => @bucket, :key => log.key)
376
+ object = @s3bucket.object(log.key)
365
377
 
366
- filename = File.join(temporary_directory, File.basename(key))
378
+ filename = File.join(temporary_directory, File.basename(log.key))
367
379
  if download_remote_file(object, filename)
368
- if process_local_log(queue, filename, key)
369
- lastmod = object.last_modified
370
- backup_to_bucket(object)
371
- backup_to_dir(filename)
372
- delete_file_from_bucket(object)
373
- FileUtils.remove_entry_secure(filename, true)
374
- sincedb.write(lastmod)
380
+ if process_local_log(queue, filename, object)
381
+ if object.last_modified == log.last_modified
382
+ backup_to_bucket(object)
383
+ backup_to_dir(filename)
384
+ delete_file_from_bucket(object)
385
+ FileUtils.remove_entry_secure(filename, true)
386
+ sincedb.write(log.last_modified)
387
+ else
388
+ @logger.info("#{log.key} is updated at #{object.last_modified} and will process in the next cycle")
389
+ end
375
390
  end
376
391
  else
377
392
  FileUtils.remove_entry_secure(filename, true)
378
393
  end
379
394
  end
380
395
 
381
- private
382
396
  # Stream the remote file to the local disk
383
397
  #
384
398
  # @param [S3Object] Reference to the remote S3 object to download
@@ -386,33 +400,48 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
386
400
  # @return [Boolean] True if the file was completely downloaded
387
401
  def download_remote_file(remote_object, local_filename)
388
402
  completed = false
389
- @logger.debug("S3 input: Download remote file", :remote_key => remote_object.key, :local_filename => local_filename)
403
+ @logger.debug("Downloading remote file", :remote_key => remote_object.key, :local_filename => local_filename)
390
404
  File.open(local_filename, 'wb') do |s3file|
391
405
  return completed if stop?
392
406
  begin
393
407
  remote_object.get(:response_target => s3file)
394
408
  completed = true
395
409
  rescue Aws::Errors::ServiceError => e
396
- @logger.warn("S3 input: Unable to download remote file", :remote_key => remote_object.key, :message => e.message)
410
+ @logger.warn("Unable to download remote file", :exception => e.class, :message => e.message, :remote_key => remote_object.key)
397
411
  end
398
412
  end
399
413
  completed
400
414
  end
401
415
 
402
- private
403
416
  def delete_file_from_bucket(object)
404
417
  if @delete and @backup_to_bucket.nil?
405
418
  object.delete()
406
419
  end
407
420
  end
408
421
 
409
- private
410
422
  def get_s3object
411
423
  options = symbolized_settings.merge(aws_options_hash || {})
412
424
  s3 = Aws::S3::Resource.new(options)
413
425
  end
414
426
 
415
- private
427
+ def file_restored?(object)
428
+ begin
429
+ restore = object.data.restore
430
+ if restore && restore.match(/ongoing-request\s?=\s?["']false["']/)
431
+ if restore = restore.match(/expiry-date\s?=\s?["'](.*?)["']/)
432
+ expiry_date = DateTime.parse(restore[1])
433
+ return true if DateTime.now < expiry_date # restored
434
+ else
435
+ @logger.debug("No expiry-date header for restore request: #{object.data.restore}")
436
+ return nil # no expiry-date found for ongoing request
437
+ end
438
+ end
439
+ rescue => e
440
+ @logger.debug("Could not determine Glacier restore status", :exception => e.class, :message => e.message)
441
+ end
442
+ return false
443
+ end
444
+
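For reference, a small editorial sketch (not part of the plugin source) of how the expressions above read a restore status value, using the same header shape that appears in the specs further down:

    require 'date'

    # Illustrative x-amz-restore value for an object whose Glacier restore has completed.
    restore = 'ongoing-request="false", expiry-date="Thu, 01 Jan 2099 00:00:00 GMT"'

    restore.match(/ongoing-request\s?=\s?["']false["']/)           # truthy => restore finished
    expiry  = restore.match(/expiry-date\s?=\s?["'](.*?)["']/)[1]  # => "Thu, 01 Jan 2099 00:00:00 GMT"
    DateTime.now < DateTime.parse(expiry)                          # true while the restored copy is still available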
416
445
  module SinceDB
417
446
  class File
418
447
  def initialize(file)
data/logstash-input-s3.gemspec CHANGED
@@ -1,7 +1,7 @@
1
1
  Gem::Specification.new do |s|
2
2
 
3
3
  s.name = 'logstash-input-s3'
4
- s.version = '3.3.6'
4
+ s.version = '3.6.0'
5
5
  s.licenses = ['Apache-2.0']
6
6
  s.summary = "Streams events from files in a S3 bucket"
7
7
  s.description = "This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program"
@@ -1,5 +1,6 @@
1
1
  # encoding: utf-8
2
2
  require "logstash/devutils/rspec/spec_helper"
3
+ require "logstash/devutils/rspec/shared_examples"
3
4
  require "logstash/inputs/s3"
4
5
  require "logstash/codecs/multiline"
5
6
  require "logstash/errors"
@@ -23,6 +24,7 @@ describe LogStash::Inputs::S3 do
23
24
  "sincedb_path" => File.join(sincedb_path, ".sincedb")
24
25
  }
25
26
  }
27
+ let(:cutoff) { LogStash::Inputs::S3::CUTOFF_SECOND }
26
28
 
27
29
 
28
30
  before do
@@ -32,10 +34,11 @@ describe LogStash::Inputs::S3 do
32
34
  end
33
35
 
34
36
  context "when interrupting the plugin" do
35
- let(:config) { super.merge({ "interval" => 5 }) }
37
+ let(:config) { super().merge({ "interval" => 5 }) }
38
+ let(:s3_obj) { double(:key => "awesome-key", :last_modified => Time.now.round, :content_length => 10, :storage_class => 'STANDARD', :object => double(:data => double(:restore => nil)) ) }
36
39
 
37
40
  before do
38
- expect_any_instance_of(LogStash::Inputs::S3).to receive(:list_new_files).and_return(TestInfiniteS3Object.new)
41
+ expect_any_instance_of(LogStash::Inputs::S3).to receive(:list_new_files).and_return(TestInfiniteS3Object.new(s3_obj))
39
42
  end
40
43
 
41
44
  it_behaves_like "an interruptible input plugin"
@@ -114,32 +117,61 @@ describe LogStash::Inputs::S3 do
114
117
  describe "#list_new_files" do
115
118
  before { allow_any_instance_of(Aws::S3::Bucket).to receive(:objects) { objects_list } }
116
119
 
117
- let!(:present_object) { double(:key => 'this-should-be-present', :last_modified => Time.now, :content_length => 10) }
120
+ let!(:present_object_after_cutoff) {double(:key => 'this-should-not-be-present', :last_modified => Time.now, :content_length => 10, :storage_class => 'STANDARD', :object => double(:data => double(:restore => nil)) ) }
121
+ let!(:present_object) {double(:key => 'this-should-be-present', :last_modified => Time.now - cutoff, :content_length => 10, :storage_class => 'STANDARD', :object => double(:data => double(:restore => nil)) ) }
122
+ let!(:archived_object) {double(:key => 'this-should-be-archived', :last_modified => Time.now - cutoff, :content_length => 10, :storage_class => 'GLACIER', :object => double(:data => double(:restore => nil)) ) }
123
+ let!(:deep_archived_object) {double(:key => 'this-should-be-archived', :last_modified => Time.now - cutoff, :content_length => 10, :storage_class => 'GLACIER', :object => double(:data => double(:restore => nil)) ) }
124
+ let!(:restored_object) {double(:key => 'this-should-be-restored-from-archive', :last_modified => Time.now - cutoff, :content_length => 10, :storage_class => 'GLACIER', :object => double(:data => double(:restore => 'ongoing-request="false", expiry-date="Thu, 01 Jan 2099 00:00:00 GMT"')) ) }
125
+ let!(:deep_restored_object) {double(:key => 'this-should-be-restored-from-deep-archive', :last_modified => Time.now - cutoff, :content_length => 10, :storage_class => 'DEEP_ARCHIVE', :object => double(:data => double(:restore => 'ongoing-request="false", expiry-date="Thu, 01 Jan 2099 00:00:00 GMT"')) ) }
118
126
  let(:objects_list) {
119
127
  [
120
- double(:key => 'exclude-this-file-1', :last_modified => Time.now - 2 * day, :content_length => 100),
121
- double(:key => 'exclude/logstash', :last_modified => Time.now - 2 * day, :content_length => 50),
122
- present_object
128
+ double(:key => 'exclude-this-file-1', :last_modified => Time.now - 2 * day, :content_length => 100, :storage_class => 'STANDARD'),
129
+ double(:key => 'exclude/logstash', :last_modified => Time.now - 2 * day, :content_length => 50, :storage_class => 'STANDARD'),
130
+ archived_object,
131
+ restored_object,
132
+ deep_restored_object,
133
+ present_object,
134
+ present_object_after_cutoff
123
135
  ]
124
136
  }
125
137
 
126
138
  it 'should allow user to exclude files from the s3 bucket' do
127
139
  plugin = LogStash::Inputs::S3.new(config.merge({ "exclude_pattern" => "^exclude" }))
128
140
  plugin.register
129
- expect(plugin.list_new_files).to eq([present_object.key])
141
+
142
+ files = plugin.list_new_files.map { |item| item.key }
143
+ expect(files).to include(present_object.key)
144
+ expect(files).to include(restored_object.key)
145
+ expect(files).to include(deep_restored_object.key)
146
+ expect(files).to_not include('exclude-this-file-1') # matches exclude pattern
147
+ expect(files).to_not include('exclude/logstash') # matches exclude pattern
148
+ expect(files).to_not include(archived_object.key) # archived
149
+ expect(files).to_not include(deep_archived_object.key) # archived
150
+ expect(files).to_not include(present_object_after_cutoff.key) # after cutoff
151
+ expect(files.size).to eq(3)
130
152
  end
131
153
 
132
154
  it 'should support not providing a exclude pattern' do
133
155
  plugin = LogStash::Inputs::S3.new(config)
134
156
  plugin.register
135
- expect(plugin.list_new_files).to eq(objects_list.map(&:key))
157
+
158
+ files = plugin.list_new_files.map { |item| item.key }
159
+ expect(files).to include(present_object.key)
160
+ expect(files).to include(restored_object.key)
161
+ expect(files).to include(deep_restored_object.key)
162
+ expect(files).to include('exclude-this-file-1') # no exclude pattern given
163
+ expect(files).to include('exclude/logstash') # no exclude pattern given
164
+ expect(files).to_not include(archived_object.key) # archived
165
+ expect(files).to_not include(deep_archived_object.key) # archived
166
+ expect(files).to_not include(present_object_after_cutoff.key) # after cutoff
167
+ expect(files.size).to eq(5)
136
168
  end
137
169
 
138
170
  context 'when all files are excluded from a bucket' do
139
171
  let(:objects_list) {
140
172
  [
141
- double(:key => 'exclude-this-file-1', :last_modified => Time.now - 2 * day, :content_length => 100),
142
- double(:key => 'exclude/logstash', :last_modified => Time.now - 2 * day, :content_length => 50),
173
+ double(:key => 'exclude-this-file-1', :last_modified => Time.now - 2 * day, :content_length => 100, :storage_class => 'STANDARD'),
174
+ double(:key => 'exclude/logstash', :last_modified => Time.now - 2 * day, :content_length => 50, :storage_class => 'STANDARD'),
143
175
  ]
144
176
  }
145
177
 
@@ -168,7 +200,7 @@ describe LogStash::Inputs::S3 do
168
200
  context "If the bucket is the same as the backup bucket" do
169
201
  it 'should ignore files from the bucket if they match the backup prefix' do
170
202
  objects_list = [
171
- double(:key => 'mybackup-log-1', :last_modified => Time.now, :content_length => 5),
203
+ double(:key => 'mybackup-log-1', :last_modified => Time.now, :content_length => 5, :storage_class => 'STANDARD'),
172
204
  present_object
173
205
  ]
174
206
 
@@ -177,24 +209,38 @@ describe LogStash::Inputs::S3 do
177
209
  plugin = LogStash::Inputs::S3.new(config.merge({ 'backup_add_prefix' => 'mybackup',
178
210
  'backup_to_bucket' => config['bucket']}))
179
211
  plugin.register
180
- expect(plugin.list_new_files).to eq([present_object.key])
212
+
213
+ files = plugin.list_new_files.map { |item| item.key }
214
+ expect(files).to include(present_object.key)
215
+ expect(files).to_not include('mybackup-log-1') # matches backup prefix
216
+ expect(files.size).to eq(1)
181
217
  end
182
218
  end
183
219
 
184
220
  it 'should ignore files older than X' do
185
221
  plugin = LogStash::Inputs::S3.new(config.merge({ 'backup_add_prefix' => 'exclude-this-file'}))
186
222
 
187
- expect_any_instance_of(LogStash::Inputs::S3::SinceDB::File).to receive(:read).exactly(objects_list.size) { Time.now - day }
223
+
224
+ allow_any_instance_of(LogStash::Inputs::S3::SinceDB::File).to receive(:read).and_return(Time.now - day)
188
225
  plugin.register
189
226
 
190
- expect(plugin.list_new_files).to eq([present_object.key])
227
+ files = plugin.list_new_files.map { |item| item.key }
228
+ expect(files).to include(present_object.key)
229
+ expect(files).to include(restored_object.key)
230
+ expect(files).to include(deep_restored_object.key)
231
+ expect(files).to_not include('exclude-this-file-1') # too old
232
+ expect(files).to_not include('exclude/logstash') # too old
233
+ expect(files).to_not include(archived_object.key) # archived
234
+ expect(files).to_not include(deep_archived_object.key) # archived
235
+ expect(files).to_not include(present_object_after_cutoff.key) # after cutoff
236
+ expect(files.size).to eq(3)
191
237
  end
192
238
 
193
239
  it 'should ignore file if the file match the prefix' do
194
240
  prefix = 'mysource/'
195
241
 
196
242
  objects_list = [
197
- double(:key => prefix, :last_modified => Time.now, :content_length => 5),
243
+ double(:key => prefix, :last_modified => Time.now, :content_length => 5, :storage_class => 'STANDARD'),
198
244
  present_object
199
245
  ]
200
246
 
@@ -202,14 +248,15 @@ describe LogStash::Inputs::S3 do
202
248
 
203
249
  plugin = LogStash::Inputs::S3.new(config.merge({ 'prefix' => prefix }))
204
250
  plugin.register
205
- expect(plugin.list_new_files).to eq([present_object.key])
251
+ expect(plugin.list_new_files.map { |item| item.key }).to eq([present_object.key])
206
252
  end
207
253
 
208
254
  it 'should sort return object sorted by last_modification date with older first' do
209
255
  objects = [
210
- double(:key => 'YESTERDAY', :last_modified => Time.now - day, :content_length => 5),
211
- double(:key => 'TODAY', :last_modified => Time.now, :content_length => 5),
212
- double(:key => 'TWO_DAYS_AGO', :last_modified => Time.now - 2 * day, :content_length => 5)
256
+ double(:key => 'YESTERDAY', :last_modified => Time.now - day, :content_length => 5, :storage_class => 'STANDARD'),
257
+ double(:key => 'TODAY', :last_modified => Time.now, :content_length => 5, :storage_class => 'STANDARD'),
258
+ double(:key => 'TODAY_BEFORE_CUTOFF', :last_modified => Time.now - cutoff, :content_length => 5, :storage_class => 'STANDARD'),
259
+ double(:key => 'TWO_DAYS_AGO', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD')
213
260
  ]
214
261
 
215
262
  allow_any_instance_of(Aws::S3::Bucket).to receive(:objects) { objects }
@@ -217,7 +264,7 @@ describe LogStash::Inputs::S3 do
217
264
 
218
265
  plugin = LogStash::Inputs::S3.new(config)
219
266
  plugin.register
220
- expect(plugin.list_new_files).to eq(['TWO_DAYS_AGO', 'YESTERDAY', 'TODAY'])
267
+ expect(plugin.list_new_files.map { |item| item.key }).to eq(['TWO_DAYS_AGO', 'YESTERDAY', 'TODAY_BEFORE_CUTOFF'])
221
268
  end
222
269
 
223
270
  describe "when doing backup on the s3" do
@@ -277,7 +324,7 @@ describe LogStash::Inputs::S3 do
277
324
  it 'should process events' do
278
325
  events = fetch_events(config)
279
326
  expect(events.size).to eq(events_to_process)
280
- insist { events[0].get("[@metadata][s3]") } == {"key" => log.key }
327
+ expect(events[0].get("[@metadata][s3][key]")).to eql log.key
281
328
  end
282
329
 
283
330
  it "deletes the temporary file" do
@@ -315,7 +362,7 @@ describe LogStash::Inputs::S3 do
315
362
  %w(AccessDenied NoSuchKey).each do |error|
316
363
  context "when retrieving an object, #{error} is returned" do
317
364
  let(:objects) { [log] }
318
- let(:log) { double(:key => 'uncompressed.log', :last_modified => Time.now - 2 * day, :content_length => 5) }
365
+ let(:log) { double(:key => 'uncompressed.log', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD') }
319
366
 
320
367
  let(:config) {
321
368
  {
@@ -344,7 +391,7 @@ describe LogStash::Inputs::S3 do
344
391
 
345
392
  context 'when working with logs' do
346
393
  let(:objects) { [log] }
347
- let(:log) { double(:key => 'uncompressed.log', :last_modified => Time.now - 2 * day, :content_length => 5) }
394
+ let(:log) { double(:key => 'uncompressed.log', :last_modified => Time.now - 2 * day, :content_length => 5, :data => { "etag" => 'c2c966251da0bc3229d12c2642ba50a4' }, :storage_class => 'STANDARD') }
348
395
  let(:data) { File.read(log_file) }
349
396
 
350
397
  before do
@@ -389,28 +436,35 @@ describe LogStash::Inputs::S3 do
389
436
  end
390
437
 
391
438
  context "multiple compressed streams" do
392
- let(:log) { double(:key => 'log.gz', :last_modified => Time.now - 2 * day, :content_length => 5) }
439
+ let(:log) { double(:key => 'log.gz', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD') }
393
440
  let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'multiple_compressed_streams.gz') }
394
441
 
395
442
  include_examples "generated events" do
396
443
  let(:events_to_process) { 16 }
397
444
  end
398
445
  end
399
-
446
+
400
447
  context 'compressed' do
401
- let(:log) { double(:key => 'log.gz', :last_modified => Time.now - 2 * day, :content_length => 5) }
448
+ let(:log) { double(:key => 'log.gz', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD') }
402
449
  let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'compressed.log.gz') }
403
450
 
404
451
  include_examples "generated events"
405
452
  end
406
453
 
407
- context 'compressed with gzip extension' do
408
- let(:log) { double(:key => 'log.gz', :last_modified => Time.now - 2 * day, :content_length => 5) }
454
+ context 'compressed with gzip extension and using default gzip_pattern option' do
455
+ let(:log) { double(:key => 'log.gz', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD') }
409
456
  let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'compressed.log.gzip') }
410
457
 
411
458
  include_examples "generated events"
412
459
  end
413
460
 
461
+ context 'compressed with gzip extension and using custom gzip_pattern option' do
462
+ let(:config) { super().merge({ "gzip_pattern" => "gee.zip$" }) }
463
+ let(:log) { double(:key => 'log.gee.zip', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD') }
464
+ let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'compressed.log.gee.zip') }
465
+ include_examples "generated events"
466
+ end
467
+
414
468
  context 'plain text' do
415
469
  let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'uncompressed.log') }
416
470
 
@@ -451,5 +505,95 @@ describe LogStash::Inputs::S3 do
451
505
 
452
506
  include_examples "generated events"
453
507
  end
508
+
509
+ context 'when include_object_properties is set to true' do
510
+ let(:config) { super().merge({ "include_object_properties" => true }) }
511
+ let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'uncompressed.log') }
512
+
513
+ it 'should extract object properties onto [@metadata][s3]' do
514
+ events = fetch_events(config)
515
+ events.each do |event|
516
+ expect(event.get('[@metadata][s3]')).to include(log.data)
517
+ end
518
+ end
519
+
520
+ include_examples "generated events"
521
+ end
522
+
523
+ context 'when include_object_properties is set to false' do
524
+ let(:config) { super().merge({ "include_object_properties" => false }) }
525
+ let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'uncompressed.log') }
526
+
527
+ it 'should NOT extract object properties onto [@metadata][s3]' do
528
+ events = fetch_events(config)
529
+ events.each do |event|
530
+ expect(event.get('[@metadata][s3]')).to_not include(log.data)
531
+ end
532
+ end
533
+
534
+ include_examples "generated events"
535
+ end
536
+ end
537
+
538
+ describe "data loss" do
539
+ let(:s3_plugin) { LogStash::Inputs::S3.new(config) }
540
+ let(:queue) { [] }
541
+
542
+ before do
543
+ s3_plugin.register
544
+ end
545
+
546
+ context 'events come after cutoff time' do
547
+ it 'should be processed in next cycle' do
548
+ s3_objects = [
549
+ double(:key => 'TWO_DAYS_AGO', :last_modified => Time.now.round - 2 * day, :content_length => 5, :storage_class => 'STANDARD'),
550
+ double(:key => 'YESTERDAY', :last_modified => Time.now.round - day, :content_length => 5, :storage_class => 'STANDARD'),
551
+ double(:key => 'TODAY_BEFORE_CUTOFF', :last_modified => Time.now.round - cutoff, :content_length => 5, :storage_class => 'STANDARD'),
552
+ double(:key => 'TODAY', :last_modified => Time.now.round, :content_length => 5, :storage_class => 'STANDARD'),
553
+ double(:key => 'TODAY', :last_modified => Time.now.round, :content_length => 5, :storage_class => 'STANDARD')
554
+ ]
555
+ size = s3_objects.length
556
+
557
+ allow_any_instance_of(Aws::S3::Bucket).to receive(:objects) { s3_objects }
558
+ allow_any_instance_of(Aws::S3::Bucket).to receive(:object).and_return(*s3_objects)
559
+ expect(s3_plugin).to receive(:process_log).at_least(size).and_call_original
560
+ expect(s3_plugin).to receive(:stop?).and_return(false).at_least(size)
561
+ expect(s3_plugin).to receive(:download_remote_file).and_return(true).at_least(size)
562
+ expect(s3_plugin).to receive(:process_local_log).and_return(true).at_least(size)
563
+
564
+ # first iteration
565
+ s3_plugin.process_files(queue)
566
+
567
+ # second iteration
568
+ sleep(cutoff + 1)
569
+ s3_plugin.process_files(queue)
570
+ end
571
+ end
572
+
573
+ context 's3 object updated after getting summary' do
574
+ it 'should not update sincedb' do
575
+ s3_summary = [
576
+ double(:key => 'YESTERDAY', :last_modified => Time.now.round - day, :content_length => 5, :storage_class => 'STANDARD'),
577
+ double(:key => 'TODAY', :last_modified => Time.now.round - (cutoff * 10), :content_length => 5, :storage_class => 'STANDARD')
578
+ ]
579
+
580
+ s3_objects = [
581
+ double(:key => 'YESTERDAY', :last_modified => Time.now.round - day, :content_length => 5, :storage_class => 'STANDARD'),
582
+ double(:key => 'TODAY_UPDATED', :last_modified => Time.now.round, :content_length => 5, :storage_class => 'STANDARD')
583
+ ]
584
+
585
+ size = s3_objects.length
586
+
587
+ allow_any_instance_of(Aws::S3::Bucket).to receive(:objects) { s3_summary }
588
+ allow_any_instance_of(Aws::S3::Bucket).to receive(:object).and_return(*s3_objects)
589
+ expect(s3_plugin).to receive(:process_log).at_least(size).and_call_original
590
+ expect(s3_plugin).to receive(:stop?).and_return(false).at_least(size)
591
+ expect(s3_plugin).to receive(:download_remote_file).and_return(true).at_least(size)
592
+ expect(s3_plugin).to receive(:process_local_log).and_return(true).at_least(size)
593
+
594
+ s3_plugin.process_files(queue)
595
+ expect(s3_plugin.send(:sincedb).read).to eq(s3_summary[0].last_modified)
596
+ end
597
+ end
454
598
  end
455
599
  end
@@ -10,6 +10,7 @@ describe LogStash::Inputs::S3, :integration => true, :s3 => true do
10
10
 
11
11
  upload_file('../fixtures/uncompressed.log' , "#{prefix}uncompressed_1.log")
12
12
  upload_file('../fixtures/compressed.log.gz', "#{prefix}compressed_1.log.gz")
13
+ sleep(LogStash::Inputs::S3::CUTOFF_SECOND + 1)
13
14
  end
14
15
 
15
16
  after do
@@ -28,6 +29,7 @@ describe LogStash::Inputs::S3, :integration => true, :s3 => true do
28
29
  "prefix" => prefix,
29
30
  "temporary_directory" => temporary_directory } }
30
31
  let(:backup_prefix) { "backup/" }
32
+ let(:backup_bucket) { "logstash-s3-input-backup" }
31
33
 
32
34
  it "support prefix to scope the remote files" do
33
35
  events = fetch_events(minimal_settings)
@@ -49,13 +51,17 @@ describe LogStash::Inputs::S3, :integration => true, :s3 => true do
49
51
  end
50
52
 
51
53
  context "remote backup" do
54
+ before do
55
+ create_bucket(backup_bucket)
56
+ end
57
+
52
58
  it "another bucket" do
53
- fetch_events(minimal_settings.merge({ "backup_to_bucket" => "logstash-s3-input-backup"}))
54
- expect(list_remote_files("", "logstash-s3-input-backup").size).to eq(2)
59
+ fetch_events(minimal_settings.merge({ "backup_to_bucket" => backup_bucket}))
60
+ expect(list_remote_files("", backup_bucket).size).to eq(2)
55
61
  end
56
62
 
57
63
  after do
58
- delete_bucket("logstash-s3-input-backup")
64
+ delete_bucket(backup_bucket)
59
65
  end
60
66
  end
61
67
  end
@@ -23,6 +23,10 @@ def list_remote_files(prefix, target_bucket = ENV['AWS_LOGSTASH_TEST_BUCKET'])
23
23
  bucket.objects(:prefix => prefix).collect(&:key)
24
24
  end
25
25
 
26
+ def create_bucket(name)
27
+ s3object.bucket(name).create
28
+ end
29
+
26
30
  def delete_bucket(name)
27
31
  s3object.bucket(name).objects.map(&:delete)
28
32
  s3object.bucket(name).delete
@@ -33,13 +37,16 @@ def s3object
33
37
  end
34
38
 
35
39
  class TestInfiniteS3Object
40
+ def initialize(s3_obj)
41
+ @s3_obj = s3_obj
42
+ end
43
+
36
44
  def each
37
45
  counter = 1
38
46
 
39
47
  loop do
40
- yield "awesome-#{counter}"
48
+ yield @s3_obj
41
49
  counter +=1
42
50
  end
43
51
  end
44
- end
45
-
52
+ end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: logstash-input-s3
3
3
  version: !ruby/object:Gem::Version
4
- version: 3.3.6
4
+ version: 3.6.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Elastic
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-07-19 00:00:00.000000000 Z
11
+ date: 2021-03-11 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  requirement: !ruby/object:Gem::Requirement
@@ -119,6 +119,7 @@ files:
119
119
  - lib/logstash/inputs/s3/patch.rb
120
120
  - logstash-input-s3.gemspec
121
121
  - spec/fixtures/cloudfront.log
122
+ - spec/fixtures/compressed.log.gee.zip
122
123
  - spec/fixtures/compressed.log.gz
123
124
  - spec/fixtures/compressed.log.gzip
124
125
  - spec/fixtures/invalid_utf8.gbk.log
@@ -159,6 +160,7 @@ specification_version: 4
159
160
  summary: Streams events from files in a S3 bucket
160
161
  test_files:
161
162
  - spec/fixtures/cloudfront.log
163
+ - spec/fixtures/compressed.log.gee.zip
162
164
  - spec/fixtures/compressed.log.gz
163
165
  - spec/fixtures/compressed.log.gzip
164
166
  - spec/fixtures/invalid_utf8.gbk.log