logstash-input-s3 3.3.6 → 3.6.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +24 -0
- data/LICENSE +199 -10
- data/README.md +2 -2
- data/docs/index.asciidoc +65 -19
- data/lib/logstash/inputs/s3.rb +97 -68
- data/logstash-input-s3.gemspec +1 -1
- data/spec/fixtures/compressed.log.gee.zip +0 -0
- data/spec/inputs/s3_spec.rb +172 -28
- data/spec/integration/s3_spec.rb +9 -3
- data/spec/support/helpers.rb +10 -3
- metadata +4 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a4e1cc2ba334eb9e35fc68cc6a773e13b9b7de5bc6c3b4ee40cf98903d140323
+  data.tar.gz: a43fe645c1016095639e092fa4d3e8e8b77384dd17190a712a0eaa5163e68854
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e10a6e21aa62270ce5f78c269fb3abf0a28b0755f78b0cb7adcc76183be460472327bdabbc94fab51c35774ae930d25846084ad3b9b0e95bd9e021c995c08bbd
+  data.tar.gz: 23b6882049688b1cd57ccba4b8aa247103bf5b9f91011f1aa426e81280c3b4ca701bf88fd96e278968c7e38fd58a01980a830b4ce2bb9ad8e034cfe07113de79
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,27 @@
+## 3.6.0
+  - Fixed unprocessed file with the same `last_modified` in ingestion. [#220](https://github.com/logstash-plugins/logstash-input-s3/pull/220)
+
+## 3.5.2
+  - [DOC] Added note that only AWS S3 is supported. No other S3 compatible storage solutions are supported. [#208](https://github.com/logstash-plugins/logstash-input-s3/issues/208)
+
+## 3.5.1
+  - [DOC] Added example for `exclude_pattern` and reordered option descriptions [#204](https://github.com/logstash-plugins/logstash-input-s3/issues/204)
+
+## 3.5.0
+  - Added support for including objects restored from Glacier or Glacier Deep [#199](https://github.com/logstash-plugins/logstash-input-s3/issues/199)
+  - Added `gzip_pattern` option, enabling more flexible determination of whether a file is gzipped [#165](https://github.com/logstash-plugins/logstash-input-s3/issues/165)
+  - Refactor: log exception class and unify logging messages [#201](https://github.com/logstash-plugins/logstash-input-s3/pull/201)
+
+## 3.4.1
+  - Fixed link formatting for input type (documentation)
+
+## 3.4.0
+  - Skips objects that are archived to AWS Glacier with a helpful log message (previously they would log as matched, but then fail to load events) [#160](https://github.com/logstash-plugins/logstash-input-s3/pull/160)
+  - Added `watch_for_new_files` option, enabling single-batch imports [#159](https://github.com/logstash-plugins/logstash-input-s3/pull/159)
+
+## 3.3.7
+  - Added ability to optionally include S3 object properties inside @metadata [#155](https://github.com/logstash-plugins/logstash-input-s3/pull/155)
+
 ## 3.3.6
   - Fixed error in documentation by removing illegal commas [#154](https://github.com/logstash-plugins/logstash-input-s3/pull/154)
 
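For orientation, the options these releases introduce can be combined in a single `s3` input. The block below is a minimal illustrative sketch, not an excerpt from the plugin docs; the bucket name and region are placeholders:

[source,ruby]
-----
input {
  s3 {
    "bucket" => "my-example-bucket"          # placeholder
    "region" => "us-east-1"                  # placeholder
    "watch_for_new_files" => false           # 3.4.0: list the bucket once, then stop
    "include_object_properties" => true      # 3.3.7: expose object properties at [@metadata][s3]
    "gzip_pattern" => "\.gz(ip)?$"           # 3.5.0: pattern deciding which keys are gunzipped
  }
}
-----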
data/LICENSE
CHANGED
@@ -1,13 +1,202 @@
-Copyright (c) 2012-2018 Elasticsearch <http://www.elastic.co>
-
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   [Sections 1-9 and the APPENDIX of the standard Apache License 2.0 text,
+   added verbatim; omitted here for brevity.]
+
+   Copyright 2020 Elastic and contributors
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
data/README.md
CHANGED
@@ -1,6 +1,6 @@
 # Logstash Plugin
 
-[![Travis Build Status](https://travis-ci.
+[![Travis Build Status](https://travis-ci.com/logstash-plugins/logstash-input-s3.svg)](https://travis-ci.com/logstash-plugins/logstash-input-s3)
 
 This is a plugin for [Logstash](https://github.com/elastic/logstash).
 
@@ -38,7 +38,7 @@ Need help? Try #logstash on freenode IRC or the https://discuss.elastic.co/c/log
 
 ## Developing
 
-### 1. Plugin
+### 1. Plugin Development and Testing
 
 #### Code
 - To get started, you'll need JRuby with the Bundler gem installed.
data/docs/index.asciidoc
CHANGED
@@ -23,9 +23,14 @@ include::{include_path}/plugin_header.asciidoc[]
 
 Stream events from files from a S3 bucket.
 
+IMPORTANT: The S3 input plugin only supports AWS S3.
+Other S3 compatible storage solutions are not supported.
+
 Each line from each file generates an event.
 Files ending in `.gz` are handled as gzip'ed files.
 
+Files that are archived to AWS Glacier will be skipped.
+
 [id="plugins-{type}s-{plugin}-options"]
 ==== S3 Input Configuration Options
 
@@ -44,6 +49,8 @@ This plugin supports the following configuration options plus the <<plugins-{typ
 | <<plugins-{type}s-{plugin}-delete>> |<<boolean,boolean>>|No
 | <<plugins-{type}s-{plugin}-endpoint>> |<<string,string>>|No
 | <<plugins-{type}s-{plugin}-exclude_pattern>> |<<string,string>>|No
+| <<plugins-{type}s-{plugin}-gzip_pattern>> |<<string,string>>|No
+| <<plugins-{type}s-{plugin}-include_object_properties>> |<<boolean,boolean>>|No
 | <<plugins-{type}s-{plugin}-interval>> |<<number,number>>|No
 | <<plugins-{type}s-{plugin}-prefix>> |<<string,string>>|No
 | <<plugins-{type}s-{plugin}-proxy_uri>> |<<string,string>>|No
@@ -54,6 +61,7 @@ This plugin supports the following configuration options plus the <<plugins-{typ
 | <<plugins-{type}s-{plugin}-session_token>> |<<string,string>>|No
 | <<plugins-{type}s-{plugin}-sincedb_path>> |<<string,string>>|No
 | <<plugins-{type}s-{plugin}-temporary_directory>> |<<string,string>>|No
+| <<plugins-{type}s-{plugin}-watch_for_new_files>> |<<boolean,boolean>>|No
 |=======================================================================
 
 Also see <<plugins-{type}s-{plugin}-common-options>> for a list of options supported by all
@@ -75,6 +83,29 @@ This plugin uses the AWS SDK and supports several ways to get credentials, which
 4. Environment variables `AMAZON_ACCESS_KEY_ID` and `AMAZON_SECRET_ACCESS_KEY`
 5. IAM Instance Profile (available when running inside EC2)
 
+
+[id="plugins-{type}s-{plugin}-additional_settings"]
+===== `additional_settings`
+
+* Value type is <<hash,hash>>
+* Default value is `{}`
+
+Key-value pairs of settings and corresponding values used to parametrize
+the connection to s3. See full list in https://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Client.html[the AWS SDK documentation]. Example:
+
+[source,ruby]
+    input {
+      s3 {
+        "access_key_id" => "1234"
+        "secret_access_key" => "secret"
+        "bucket" => "logstash-test"
+        "additional_settings" => {
+          "force_path_style" => true
+          "follow_redirects" => false
+        }
+      }
+    }
+
 [id="plugins-{type}s-{plugin}-aws_credentials_file"]
 ===== `aws_credentials_file`
 
@@ -152,29 +183,37 @@ guaranteed to work correctly with the AWS SDK.
 * Value type is <<string,string>>
 * Default value is `nil`
 
-Ruby style regexp of keys to exclude from the bucket
-
-[id="plugins-{type}s-{plugin}-additional_settings"]
-===== `additional_settings`
+Ruby style regexp of keys to exclude from the bucket.
 
-
-
+Note that files matching the pattern are skipped _after_ they have been listed.
+Consider using <<plugins-{type}s-{plugin}-prefix>> instead where possible.
 
-
-the connection to s3. See full list in https://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Client.html[the AWS SDK documentation]. Example:
+Example:
 
 [source,ruby]
-
-
-
-
-
-
-
-
-
-
-
+-----
+"exclude_pattern" => "\/2020\/04\/"
+-----
+
+This pattern excludes all logs containing "/2020/04/" in the path.
+
+
+[id="plugins-{type}s-{plugin}-gzip_pattern"]
+===== `gzip_pattern`
+
+* Value type is <<string,string>>
+* Default value is `"\.gz(ip)?$"`
+
+Regular expression used to determine whether an input file is in gzip format.
+
+[id="plugins-{type}s-{plugin}-include_object_properties"]
+===== `include_object_properties`
+
+* Value type is <<boolean,boolean>>
+* Default value is `false`
+
+Whether or not to include the S3 object's properties (last_modified, content_type, metadata) into each Event at
+`[@metadata][s3]`. Regardless of this setting, `[@metadata][s3][key]` will always be present.
 
 [id="plugins-{type}s-{plugin}-interval"]
 ===== `interval`
@@ -263,7 +302,14 @@ If specified, this setting must be a filename path and not just a directory.
 
 Set the directory where logstash will store the tmp files before processing them.
 
+[id="plugins-{type}s-{plugin}-watch_for_new_files"]
+===== `watch_for_new_files`
+
+* Value type is <<boolean,boolean>>
+* Default value is `true`
 
+Whether or not to watch for new files.
+Disabling this option causes the input to close itself after processing the files from a single listing.
 
 [id="plugins-{type}s-{plugin}-common-options"]
 include::{include_path}/{type}.asciidoc[]
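As a usage note that is not part of the shipped documentation: when `include_object_properties` is enabled, the object properties live under `[@metadata][s3]` and are not emitted with the event unless copied into regular fields. A standard mutate filter can do that; the target field names below are purely illustrative:

[source,ruby]
-----
filter {
  mutate {
    add_field => {
      "s3_key"           => "%{[@metadata][s3][key]}"
      "s3_last_modified" => "%{[@metadata][s3][last_modified]}"
    }
  }
}
-----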
data/lib/logstash/inputs/s3.rb
CHANGED
@@ -3,6 +3,7 @@ require "logstash/inputs/base"
 require "logstash/namespace"
 require "logstash/plugin_mixins/aws_config"
 require "time"
+require "date"
 require "tmpdir"
 require "stud/interval"
 require "stud/temporary"
@@ -10,12 +11,6 @@ require "aws-sdk"
 require "logstash/inputs/s3/patch"
 
 require 'java'
-java_import java.io.InputStream
-java_import java.io.InputStreamReader
-java_import java.io.FileInputStream
-java_import java.io.BufferedReader
-java_import java.util.zip.GZIPInputStream
-java_import java.util.zip.ZipException
 
 Aws.eager_autoload!
 # Stream events from files from a S3 bucket.
@@ -23,6 +18,14 @@ Aws.eager_autoload!
 # Each line from each file generates an event.
 # Files ending in `.gz` are handled as gzip'ed files.
 class LogStash::Inputs::S3 < LogStash::Inputs::Base
+
+  java_import java.io.InputStream
+  java_import java.io.InputStreamReader
+  java_import java.io.FileInputStream
+  java_import java.io.BufferedReader
+  java_import java.util.zip.GZIPInputStream
+  java_import java.util.zip.ZipException
+
   include LogStash::PluginMixins::AwsConfig::V2
 
   config_name "s3"
@@ -63,6 +66,10 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
   # Value is in seconds.
   config :interval, :validate => :number, :default => 60
 
+  # Whether to watch for new files with the interval.
+  # If false, overrides any interval and only lists the s3 bucket once.
+  config :watch_for_new_files, :validate => :boolean, :default => true
+
   # Ruby style regexp of keys to exclude from the bucket
   config :exclude_pattern, :validate => :string, :default => nil
 
@@ -70,13 +77,23 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
   # default to the current OS temporary directory in linux /tmp/logstash
   config :temporary_directory, :validate => :string, :default => File.join(Dir.tmpdir, "logstash")
 
-
+  # Whether or not to include the S3 object's properties (last_modified, content_type, metadata)
+  # into each Event at [@metadata][s3]. Regardless of this setting, [@metadata][s3][key] will always
+  # be present.
+  config :include_object_properties, :validate => :boolean, :default => false
+
+  # Regular expression used to determine whether an input file is in gzip format.
+  # default to an expression that matches *.gz and *.gzip file extensions
+  config :gzip_pattern, :validate => :string, :default => "\.gz(ip)?$"
+
+  CUTOFF_SECOND = 3
+
   def register
     require "fileutils"
     require "digest/md5"
     require "aws-sdk-resources"
 
-    @logger.info("Registering
+    @logger.info("Registering", :bucket => @bucket, :region => @region)
 
     s3 = get_s3object
 
@@ -96,42 +113,49 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
     end
 
     FileUtils.mkdir_p(@temporary_directory) unless Dir.exist?(@temporary_directory)
+
+    if !@watch_for_new_files && original_params.include?('interval')
+      logger.warn("`watch_for_new_files` has been disabled; `interval` directive will be ignored.")
+    end
   end
 
-  public
   def run(queue)
     @current_thread = Thread.current
     Stud.interval(@interval) do
       process_files(queue)
+      stop unless @watch_for_new_files
     end
   end # def run
 
-  public
   def list_new_files
-    objects =
+    objects = []
     found = false
     begin
       @s3bucket.objects(:prefix => @prefix).each do |log|
         found = true
-        @logger.debug(
-        if
-
-
-
-
-
+        @logger.debug('Found key', :key => log.key)
+        if ignore_filename?(log.key)
+          @logger.debug('Ignoring', :key => log.key)
+        elsif log.content_length <= 0
+          @logger.debug('Object Zero Length', :key => log.key)
+        elsif !sincedb.newer?(log.last_modified)
+          @logger.debug('Object Not Modified', :key => log.key)
+        elsif log.last_modified > (Time.now - CUTOFF_SECOND).utc # file modified within last two seconds will be processed in next cycle
+          @logger.debug('Object Modified After Cutoff Time', :key => log.key)
+        elsif (log.storage_class == 'GLACIER' || log.storage_class == 'DEEP_ARCHIVE') && !file_restored?(log.object)
+          @logger.debug('Object Archived to Glacier', :key => log.key)
         else
-
+          objects << log
+          @logger.debug("Added to objects[]", :key => log.key, :length => objects.length)
         end
       end
-      @logger.info('
+      @logger.info('No files found in bucket', :prefix => prefix) unless found
     rescue Aws::Errors::ServiceError => e
-      @logger.error("
+      @logger.error("Unable to list objects in bucket", :exception => e.class, :message => e.message, :backtrace => e.backtrace, :prefix => prefix)
     end
-    objects.
+    objects.sort_by { |log| log.last_modified }
   end # def fetch_new_files
 
-  public
   def backup_to_bucket(object)
     unless @backup_to_bucket.nil?
       backup_key = "#{@backup_add_prefix}#{object.key}"
@@ -142,28 +166,24 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
     end
   end
 
-  public
   def backup_to_dir(filename)
     unless @backup_to_dir.nil?
       FileUtils.cp(filename, @backup_to_dir)
     end
   end
 
-  public
   def process_files(queue)
     objects = list_new_files
 
-    objects.each do |
+    objects.each do |log|
      if stop?
        break
      else
-
-        process_log(queue, key)
+        process_log(queue, log)
      end
    end
  end # def process_files
 
-  public
   def stop
     # @current_thread is initialized in the `#run` method,
     # this variable is needed because the `#stop` is a called in another thread
@@ -177,8 +197,9 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
   #
   # @param [Queue] Where to push the event
   # @param [String] Which file to read from
+  # @param [S3Object] Source s3 object
   # @return [Boolean] True if the file was completely read, false otherwise.
-  def process_local_log(queue, filename,
+  def process_local_log(queue, filename, object)
     @logger.debug('Processing file', :filename => filename)
     metadata = {}
     # Currently codecs operates on bytes instead of stream.
@@ -209,7 +230,13 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
         event.set("cloudfront_version", metadata[:cloudfront_version]) unless metadata[:cloudfront_version].nil?
         event.set("cloudfront_fields", metadata[:cloudfront_fields]) unless metadata[:cloudfront_fields].nil?
 
-
+        if @include_object_properties
+          event.set("[@metadata][s3]", object.data.to_h)
+        else
+          event.set("[@metadata][s3]", {})
+        end
+
+        event.set("[@metadata][s3][key]", object.key)
 
         queue << event
       end
@@ -223,24 +250,20 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
     return true
   end # def process_local_log
 
-  private
   def event_is_metadata?(event)
     return false unless event.get("message").class == String
     line = event.get("message")
     version_metadata?(line) || fields_metadata?(line)
   end
 
-  private
   def version_metadata?(line)
     line.start_with?('#Version: ')
   end
 
-  private
   def fields_metadata?(line)
     line.start_with?('#Fields: ')
   end
 
-  private
   def update_metadata(metadata, event)
     line = event.get('message').strip
 
@@ -253,7 +276,6 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
     end
   end
 
-  private
   def read_file(filename, &block)
     if gzip?(filename)
       read_gzip_file(filename, block)
@@ -262,7 +284,7 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
     end
   rescue => e
     # skip any broken file
-    @logger.error("Failed to read
+    @logger.error("Failed to read file, processing skipped", :exception => e.class, :message => e.message, :filename => filename)
   end
 
   def read_plain_file(filename, block)
@@ -271,7 +293,6 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
     end
   end
 
-  private
   def read_gzip_file(filename, block)
     file_stream = FileInputStream.new(filename)
     gzip_stream = GZIPInputStream.new(file_stream)
@@ -288,24 +309,20 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
     file_stream.close unless file_stream.nil?
   end
 
-  private
   def gzip?(filename)
-
+    Regexp.new(@gzip_pattern).match(filename)
   end
-
-
-  def sincedb
+
+  def sincedb
     @sincedb ||= if @sincedb_path.nil?
                    @logger.info("Using default generated file for the sincedb", :filename => sincedb_file)
                    SinceDB::File.new(sincedb_file)
                  else
-                   @logger.info("Using the provided sincedb_path",
-                     :sincedb_path => @sincedb_path)
+                   @logger.info("Using the provided sincedb_path", :sincedb_path => @sincedb_path)
                    SinceDB::File.new(@sincedb_path)
                  end
   end
 
-  private
   def sincedb_file
     digest = Digest::MD5.hexdigest("#{@bucket}+#{@prefix}")
     dir = File.join(LogStash::SETTINGS.get_value("path.data"), "plugins", "inputs", "s3")
@@ -338,11 +355,6 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
     symbolized
   end
 
-  private
-  def old_sincedb_file
-  end
-
-  private
   def ignore_filename?(filename)
     if @prefix == filename
       return true
@@ -359,26 +371,28 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
     end
   end
 
-
-
-    object = @s3bucket.object(key)
+  def process_log(queue, log)
+    @logger.debug("Processing", :bucket => @bucket, :key => log.key)
+    object = @s3bucket.object(log.key)
 
-    filename = File.join(temporary_directory, File.basename(key))
+    filename = File.join(temporary_directory, File.basename(log.key))
     if download_remote_file(object, filename)
-      if process_local_log(queue, filename,
-
-
-
-
-
-
+      if process_local_log(queue, filename, object)
+        if object.last_modified == log.last_modified
+          backup_to_bucket(object)
+          backup_to_dir(filename)
+          delete_file_from_bucket(object)
+          FileUtils.remove_entry_secure(filename, true)
+          sincedb.write(log.last_modified)
+        else
+          @logger.info("#{log.key} is updated at #{object.last_modified} and will process in the next cycle")
+        end
       end
     else
       FileUtils.remove_entry_secure(filename, true)
     end
   end
 
-  private
   # Stream the remove file to the local disk
   #
   # @param [S3Object] Reference to the remove S3 objec to download
@@ -386,33 +400,48 @@ class LogStash::Inputs::S3 < LogStash::Inputs::Base
   # @return [Boolean] True if the file was completely downloaded
   def download_remote_file(remote_object, local_filename)
     completed = false
-    @logger.debug("
+    @logger.debug("Downloading remote file", :remote_key => remote_object.key, :local_filename => local_filename)
     File.open(local_filename, 'wb') do |s3file|
       return completed if stop?
       begin
         remote_object.get(:response_target => s3file)
         completed = true
       rescue Aws::Errors::ServiceError => e
-        @logger.warn("
+        @logger.warn("Unable to download remote file", :exception => e.class, :message => e.message, :remote_key => remote_object.key)
       end
     end
     completed
   end
 
-  private
   def delete_file_from_bucket(object)
     if @delete and @backup_to_bucket.nil?
       object.delete()
     end
   end
 
-  private
   def get_s3object
     options = symbolized_settings.merge(aws_options_hash || {})
     s3 = Aws::S3::Resource.new(options)
   end
 
-
+  def file_restored?(object)
+    begin
+      restore = object.data.restore
+      if restore && restore.match(/ongoing-request\s?=\s?["']false["']/)
+        if restore = restore.match(/expiry-date\s?=\s?["'](.*?)["']/)
+          expiry_date = DateTime.parse(restore[1])
+          return true if DateTime.now < expiry_date # restored
+        else
+          @logger.debug("No expiry-date header for restore request: #{object.data.restore}")
+          return nil # no expiry-date found for ongoing request
+        end
+      end
+    rescue => e
+      @logger.debug("Could not determine Glacier restore status", :exception => e.class, :message => e.message)
+    end
+    return false
+  end
+
   module SinceDB
     class File
       def initialize(file)
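The interaction between `CUTOFF_SECOND` and the sincedb comparison above is central to the 3.6.0 fix (#220): objects whose `last_modified` falls inside the cutoff window are deferred to the next polling cycle, so several objects sharing one timestamp can no longer race the sincedb update. The following standalone Ruby sketch mirrors that listing filter in simplified form; the `selectable` helper, the hash-based object stand-ins, and the `sincedb_time` argument are illustrative, not plugin API:

[source,ruby]
-----
CUTOFF_SECOND = 3  # same value the plugin defines

# Keep objects newer than the sincedb watermark but older than the cutoff
# window, then sort oldest-first, as list_new_files does.
def selectable(objects, sincedb_time)
  cutoff = Time.now - CUTOFF_SECOND
  objects.select { |o| o[:last_modified] > sincedb_time && o[:last_modified] <= cutoff }
         .sort_by { |o| o[:last_modified] }
end

now = Time.now
objects = [
  { key: 'old.log',       last_modified: now - 60 },
  { key: 'same-second-a', last_modified: now - 1 },  # inside the cutoff window
  { key: 'same-second-b', last_modified: now - 1 }   # picked up on the next cycle instead
]
p selectable(objects, now - 3600).map { |o| o[:key] }  # => ["old.log"]
-----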
data/logstash-input-s3.gemspec
CHANGED
@@ -1,7 +1,7 @@
 Gem::Specification.new do |s|
 
   s.name            = 'logstash-input-s3'
-  s.version         = '3.
+  s.version         = '3.6.0'
   s.licenses        = ['Apache-2.0']
   s.summary         = "Streams events from files in a S3 bucket"
   s.description     = "This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program"
data/spec/fixtures/compressed.log.gee.zip
Binary file
data/spec/inputs/s3_spec.rb
CHANGED
@@ -1,5 +1,6 @@
 # encoding: utf-8
 require "logstash/devutils/rspec/spec_helper"
+require "logstash/devutils/rspec/shared_examples"
 require "logstash/inputs/s3"
 require "logstash/codecs/multiline"
 require "logstash/errors"
@@ -23,6 +24,7 @@ describe LogStash::Inputs::S3 do
       "sincedb_path" => File.join(sincedb_path, ".sincedb")
     }
   }
+  let(:cutoff) { LogStash::Inputs::S3::CUTOFF_SECOND }
 
 
   before do
@@ -32,10 +34,11 @@ describe LogStash::Inputs::S3 do
   end
 
   context "when interrupting the plugin" do
-    let(:config) { super.merge({ "interval" => 5 }) }
+    let(:config) { super().merge({ "interval" => 5 }) }
+    let(:s3_obj) { double(:key => "awesome-key", :last_modified => Time.now.round, :content_length => 10, :storage_class => 'STANDARD', :object => double(:data => double(:restore => nil)) ) }
 
     before do
-      expect_any_instance_of(LogStash::Inputs::S3).to receive(:list_new_files).and_return(TestInfiniteS3Object.new)
+      expect_any_instance_of(LogStash::Inputs::S3).to receive(:list_new_files).and_return(TestInfiniteS3Object.new(s3_obj))
     end
 
     it_behaves_like "an interruptible input plugin"
@@ -114,32 +117,61 @@ describe LogStash::Inputs::S3 do
   describe "#list_new_files" do
     before { allow_any_instance_of(Aws::S3::Bucket).to receive(:objects) { objects_list } }
 
-    let!(:
+    let!(:present_object_after_cutoff) {double(:key => 'this-should-not-be-present', :last_modified => Time.now, :content_length => 10, :storage_class => 'STANDARD', :object => double(:data => double(:restore => nil)) ) }
+    let!(:present_object) {double(:key => 'this-should-be-present', :last_modified => Time.now - cutoff, :content_length => 10, :storage_class => 'STANDARD', :object => double(:data => double(:restore => nil)) ) }
+    let!(:archived_object) {double(:key => 'this-should-be-archived', :last_modified => Time.now - cutoff, :content_length => 10, :storage_class => 'GLACIER', :object => double(:data => double(:restore => nil)) ) }
+    let!(:deep_archived_object) {double(:key => 'this-should-be-archived', :last_modified => Time.now - cutoff, :content_length => 10, :storage_class => 'GLACIER', :object => double(:data => double(:restore => nil)) ) }
+    let!(:restored_object) {double(:key => 'this-should-be-restored-from-archive', :last_modified => Time.now - cutoff, :content_length => 10, :storage_class => 'GLACIER', :object => double(:data => double(:restore => 'ongoing-request="false", expiry-date="Thu, 01 Jan 2099 00:00:00 GMT"')) ) }
+    let!(:deep_restored_object) {double(:key => 'this-should-be-restored-from-deep-archive', :last_modified => Time.now - cutoff, :content_length => 10, :storage_class => 'DEEP_ARCHIVE', :object => double(:data => double(:restore => 'ongoing-request="false", expiry-date="Thu, 01 Jan 2099 00:00:00 GMT"')) ) }
     let(:objects_list) {
       [
-        double(:key => 'exclude-this-file-1', :last_modified => Time.now - 2 * day, :content_length => 100),
-        double(:key => 'exclude/logstash', :last_modified => Time.now - 2 * day, :content_length => 50),
-
+        double(:key => 'exclude-this-file-1', :last_modified => Time.now - 2 * day, :content_length => 100, :storage_class => 'STANDARD'),
+        double(:key => 'exclude/logstash', :last_modified => Time.now - 2 * day, :content_length => 50, :storage_class => 'STANDARD'),
+        archived_object,
+        restored_object,
+        deep_restored_object,
+        present_object,
+        present_object_after_cutoff
       ]
     }
 
     it 'should allow user to exclude files from the s3 bucket' do
       plugin = LogStash::Inputs::S3.new(config.merge({ "exclude_pattern" => "^exclude" }))
       plugin.register
-
+
+      files = plugin.list_new_files.map { |item| item.key }
+      expect(files).to include(present_object.key)
+      expect(files).to include(restored_object.key)
+      expect(files).to include(deep_restored_object.key)
+      expect(files).to_not include('exclude-this-file-1') # matches exclude pattern
+      expect(files).to_not include('exclude/logstash') # matches exclude pattern
+      expect(files).to_not include(archived_object.key) # archived
+      expect(files).to_not include(deep_archived_object.key) # archived
+      expect(files).to_not include(present_object_after_cutoff.key) # after cutoff
+      expect(files.size).to eq(3)
     end
 
     it 'should support not providing a exclude pattern' do
      plugin = LogStash::Inputs::S3.new(config)
      plugin.register
-
+
+      files = plugin.list_new_files.map { |item| item.key }
+      expect(files).to include(present_object.key)
+      expect(files).to include(restored_object.key)
+      expect(files).to include(deep_restored_object.key)
+      expect(files).to include('exclude-this-file-1') # no exclude pattern given
+      expect(files).to include('exclude/logstash') # no exclude pattern given
+      expect(files).to_not include(archived_object.key) # archived
+      expect(files).to_not include(deep_archived_object.key) # archived
+      expect(files).to_not include(present_object_after_cutoff.key) # after cutoff
+      expect(files.size).to eq(5)
    end
 
    context 'when all files are excluded from a bucket' do
      let(:objects_list) {
        [
-          double(:key => 'exclude-this-file-1', :last_modified => Time.now - 2 * day, :content_length => 100),
-          double(:key => 'exclude/logstash', :last_modified => Time.now - 2 * day, :content_length => 50),
+          double(:key => 'exclude-this-file-1', :last_modified => Time.now - 2 * day, :content_length => 100, :storage_class => 'STANDARD'),
+          double(:key => 'exclude/logstash', :last_modified => Time.now - 2 * day, :content_length => 50, :storage_class => 'STANDARD'),
        ]
      }
 
@@ -168,7 +200,7 @@ describe LogStash::Inputs::S3 do
     context "If the bucket is the same as the backup bucket" do
       it 'should ignore files from the bucket if they match the backup prefix' do
         objects_list = [
-          double(:key => 'mybackup-log-1', :last_modified => Time.now, :content_length => 5),
+          double(:key => 'mybackup-log-1', :last_modified => Time.now, :content_length => 5, :storage_class => 'STANDARD'),
           present_object
         ]
 
@@ -177,24 +209,38 @@ describe LogStash::Inputs::S3 do
         plugin = LogStash::Inputs::S3.new(config.merge({ 'backup_add_prefix' => 'mybackup',
                                                          'backup_to_bucket' => config['bucket']}))
         plugin.register
-
+
+        files = plugin.list_new_files.map { |item| item.key }
+        expect(files).to include(present_object.key)
+        expect(files).to_not include('mybackup-log-1') # matches backup prefix
+        expect(files.size).to eq(1)
       end
     end
 
     it 'should ignore files older than X' do
       plugin = LogStash::Inputs::S3.new(config.merge({ 'backup_add_prefix' => 'exclude-this-file'}))
 
-
+
+      allow_any_instance_of(LogStash::Inputs::S3::SinceDB::File).to receive(:read).and_return(Time.now - day)
       plugin.register
 
-
+      files = plugin.list_new_files.map { |item| item.key }
+      expect(files).to include(present_object.key)
+      expect(files).to include(restored_object.key)
+      expect(files).to include(deep_restored_object.key)
+      expect(files).to_not include('exclude-this-file-1') # too old
+      expect(files).to_not include('exclude/logstash') # too old
+      expect(files).to_not include(archived_object.key) # archived
+      expect(files).to_not include(deep_archived_object.key) # archived
+      expect(files).to_not include(present_object_after_cutoff.key) # after cutoff
+      expect(files.size).to eq(3)
     end
 
     it 'should ignore file if the file match the prefix' do
       prefix = 'mysource/'
 
       objects_list = [
-        double(:key => prefix, :last_modified => Time.now, :content_length => 5),
+        double(:key => prefix, :last_modified => Time.now, :content_length => 5, :storage_class => 'STANDARD'),
         present_object
       ]
 
@@ -202,14 +248,15 @@ describe LogStash::Inputs::S3 do
 
       plugin = LogStash::Inputs::S3.new(config.merge({ 'prefix' => prefix }))
       plugin.register
-      expect(plugin.list_new_files).to eq([present_object.key])
+      expect(plugin.list_new_files.map { |item| item.key }).to eq([present_object.key])
     end
 
     it 'should sort return object sorted by last_modification date with older first' do
       objects = [
-        double(:key => 'YESTERDAY', :last_modified => Time.now - day, :content_length => 5),
-        double(:key => 'TODAY', :last_modified => Time.now, :content_length => 5),
-        double(:key => '
+        double(:key => 'YESTERDAY', :last_modified => Time.now - day, :content_length => 5, :storage_class => 'STANDARD'),
+        double(:key => 'TODAY', :last_modified => Time.now, :content_length => 5, :storage_class => 'STANDARD'),
+        double(:key => 'TODAY_BEFORE_CUTOFF', :last_modified => Time.now - cutoff, :content_length => 5, :storage_class => 'STANDARD'),
+        double(:key => 'TWO_DAYS_AGO', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD')
       ]
 
       allow_any_instance_of(Aws::S3::Bucket).to receive(:objects) { objects }
@@ -217,7 +264,7 @@ describe LogStash::Inputs::S3 do
 
       plugin = LogStash::Inputs::S3.new(config)
       plugin.register
-      expect(plugin.list_new_files).to eq(['TWO_DAYS_AGO', 'YESTERDAY', '
+      expect(plugin.list_new_files.map { |item| item.key }).to eq(['TWO_DAYS_AGO', 'YESTERDAY', 'TODAY_BEFORE_CUTOFF'])
     end
 
     describe "when doing backup on the s3" do
@@ -277,7 +324,7 @@ describe LogStash::Inputs::S3 do
       it 'should process events' do
         events = fetch_events(config)
         expect(events.size).to eq(events_to_process)
-
+        expect(events[0].get("[@metadata][s3][key]")).to eql log.key
       end
 
       it "deletes the temporary file" do
@@ -315,7 +362,7 @@ describe LogStash::Inputs::S3 do
     %w(AccessDenied NoSuchKey).each do |error|
       context "when retrieving an object, #{error} is returned" do
         let(:objects) { [log] }
-        let(:log) { double(:key => 'uncompressed.log', :last_modified => Time.now - 2 * day, :content_length => 5) }
+        let(:log) { double(:key => 'uncompressed.log', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD') }
 
         let(:config) {
           {
@@ -344,7 +391,7 @@ describe LogStash::Inputs::S3 do
 
   context 'when working with logs' do
     let(:objects) { [log] }
-    let(:log) { double(:key => 'uncompressed.log', :last_modified => Time.now - 2 * day, :content_length => 5) }
+    let(:log) { double(:key => 'uncompressed.log', :last_modified => Time.now - 2 * day, :content_length => 5, :data => { "etag" => 'c2c966251da0bc3229d12c2642ba50a4' }, :storage_class => 'STANDARD') }
     let(:data) { File.read(log_file) }
 
     before do
@@ -389,28 +436,35 @@ describe LogStash::Inputs::S3 do
     end
 
     context "multiple compressed streams" do
-      let(:log) { double(:key => 'log.gz', :last_modified => Time.now - 2 * day, :content_length => 5) }
+      let(:log) { double(:key => 'log.gz', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD') }
      let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'multiple_compressed_streams.gz') }
 
      include_examples "generated events" do
        let(:events_to_process) { 16 }
      end
    end
-
+
    context 'compressed' do
-      let(:log) { double(:key => 'log.gz', :last_modified => Time.now - 2 * day, :content_length => 5) }
+      let(:log) { double(:key => 'log.gz', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD') }
      let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'compressed.log.gz') }
 
      include_examples "generated events"
    end
 
-    context 'compressed with gzip extension' do
-      let(:log) { double(:key => 'log.gz', :last_modified => Time.now - 2 * day, :content_length => 5) }
+    context 'compressed with gzip extension and using default gzip_pattern option' do
+      let(:log) { double(:key => 'log.gz', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD') }
      let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'compressed.log.gzip') }
 
      include_examples "generated events"
    end
 
+    context 'compressed with gzip extension and using custom gzip_pattern option' do
+      let(:config) { super().merge({ "gzip_pattern" => "gee.zip$" }) }
+      let(:log) { double(:key => 'log.gee.zip', :last_modified => Time.now - 2 * day, :content_length => 5, :storage_class => 'STANDARD') }
+      let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'compressed.log.gee.zip') }
+      include_examples "generated events"
+    end
+
    context 'plain text' do
      let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'uncompressed.log') }
 
@@ -451,5 +505,95 @@ describe LogStash::Inputs::S3 do
 
       include_examples "generated events"
     end
+
+    context 'when include_object_properties is set to true' do
+      let(:config) { super().merge({ "include_object_properties" => true }) }
+      let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'uncompressed.log') }
+
+      it 'should extract object properties onto [@metadata][s3]' do
+        events = fetch_events(config)
+        events.each do |event|
+          expect(event.get('[@metadata][s3]')).to include(log.data)
+        end
+      end
+
+      include_examples "generated events"
+    end
+
+    context 'when include_object_properties is set to false' do
+      let(:config) { super().merge({ "include_object_properties" => false }) }
+      let(:log_file) { File.join(File.dirname(__FILE__), '..', 'fixtures', 'uncompressed.log') }
+
+      it 'should NOT extract object properties onto [@metadata][s3]' do
+        events = fetch_events(config)
+        events.each do |event|
+          expect(event.get('[@metadata][s3]')).to_not include(log.data)
+        end
+      end
+
+      include_examples "generated events"
+    end
+  end
+
+  describe "data loss" do
+    let(:s3_plugin) { LogStash::Inputs::S3.new(config) }
+    let(:queue) { [] }
+
+    before do
+      s3_plugin.register
+    end
+
+    context 'events come after cutoff time' do
+      it 'should be processed in next cycle' do
+        s3_objects = [
+          double(:key => 'TWO_DAYS_AGO', :last_modified => Time.now.round - 2 * day, :content_length => 5, :storage_class => 'STANDARD'),
+          double(:key => 'YESTERDAY', :last_modified => Time.now.round - day, :content_length => 5, :storage_class => 'STANDARD'),
+          double(:key => 'TODAY_BEFORE_CUTOFF', :last_modified => Time.now.round - cutoff, :content_length => 5, :storage_class => 'STANDARD'),
+          double(:key => 'TODAY', :last_modified => Time.now.round, :content_length => 5, :storage_class => 'STANDARD'),
+          double(:key => 'TODAY', :last_modified => Time.now.round, :content_length => 5, :storage_class => 'STANDARD')
+        ]
+        size = s3_objects.length
+
+        allow_any_instance_of(Aws::S3::Bucket).to receive(:objects) { s3_objects }
+        allow_any_instance_of(Aws::S3::Bucket).to receive(:object).and_return(*s3_objects)
+        expect(s3_plugin).to receive(:process_log).at_least(size).and_call_original
+        expect(s3_plugin).to receive(:stop?).and_return(false).at_least(size)
+        expect(s3_plugin).to receive(:download_remote_file).and_return(true).at_least(size)
+        expect(s3_plugin).to receive(:process_local_log).and_return(true).at_least(size)
+
+        # first iteration
+        s3_plugin.process_files(queue)
+
+        # second iteration
+        sleep(cutoff + 1)
+        s3_plugin.process_files(queue)
+      end
+    end
+
+    context 's3 object updated after getting summary' do
+      it 'should not update sincedb' do
+        s3_summary = [
+          double(:key => 'YESTERDAY', :last_modified => Time.now.round - day, :content_length => 5, :storage_class => 'STANDARD'),
+          double(:key => 'TODAY', :last_modified => Time.now.round - (cutoff * 10), :content_length => 5, :storage_class => 'STANDARD')
+        ]
+
+        s3_objects = [
+          double(:key => 'YESTERDAY', :last_modified => Time.now.round - day, :content_length => 5, :storage_class => 'STANDARD'),
+          double(:key => 'TODAY_UPDATED', :last_modified => Time.now.round, :content_length => 5, :storage_class => 'STANDARD')
+        ]
+
+        size = s3_objects.length
+
+        allow_any_instance_of(Aws::S3::Bucket).to receive(:objects) { s3_summary }
+        allow_any_instance_of(Aws::S3::Bucket).to receive(:object).and_return(*s3_objects)
+        expect(s3_plugin).to receive(:process_log).at_least(size).and_call_original
+        expect(s3_plugin).to receive(:stop?).and_return(false).at_least(size)
+        expect(s3_plugin).to receive(:download_remote_file).and_return(true).at_least(size)
+        expect(s3_plugin).to receive(:process_local_log).and_return(true).at_least(size)
+
+        s3_plugin.process_files(queue)
+        expect(s3_plugin.send(:sincedb).read).to eq(s3_summary[0].last_modified)
+      end
+    end
   end
 end
data/spec/integration/s3_spec.rb
CHANGED
@@ -10,6 +10,7 @@ describe LogStash::Inputs::S3, :integration => true, :s3 => true do
 
     upload_file('../fixtures/uncompressed.log' , "#{prefix}uncompressed_1.log")
     upload_file('../fixtures/compressed.log.gz', "#{prefix}compressed_1.log.gz")
+    sleep(LogStash::Inputs::S3::CUTOFF_SECOND + 1)
   end
 
   after do
@@ -28,6 +29,7 @@ describe LogStash::Inputs::S3, :integration => true, :s3 => true do
                                "prefix" => prefix,
                                "temporary_directory" => temporary_directory } }
   let(:backup_prefix) { "backup/" }
+  let(:backup_bucket) { "logstash-s3-input-backup" }
 
   it "support prefix to scope the remote files" do
     events = fetch_events(minimal_settings)
@@ -49,13 +51,17 @@ describe LogStash::Inputs::S3, :integration => true, :s3 => true do
   end
 
   context "remote backup" do
+    before do
+      create_bucket(backup_bucket)
+    end
+
     it "another bucket" do
-      fetch_events(minimal_settings.merge({ "backup_to_bucket" =>
-      expect(list_remote_files("",
+      fetch_events(minimal_settings.merge({ "backup_to_bucket" => backup_bucket}))
+      expect(list_remote_files("", backup_bucket).size).to eq(2)
     end
 
     after do
-      delete_bucket(
+      delete_bucket(backup_bucket)
     end
   end
 end
data/spec/support/helpers.rb
CHANGED
@@ -23,6 +23,10 @@ def list_remote_files(prefix, target_bucket = ENV['AWS_LOGSTASH_TEST_BUCKET'])
   bucket.objects(:prefix => prefix).collect(&:key)
 end
 
+def create_bucket(name)
+  s3object.bucket(name).create
+end
+
 def delete_bucket(name)
   s3object.bucket(name).objects.map(&:delete)
   s3object.bucket(name).delete
@@ -33,13 +37,16 @@ def s3object
 end
 
 class TestInfiniteS3Object
+  def initialize(s3_obj)
+    @s3_obj = s3_obj
+  end
+
   def each
     counter = 1
 
     loop do
-      yield
+      yield @s3_obj
       counter +=1
     end
   end
-end
-
+end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: logstash-input-s3
 version: !ruby/object:Gem::Version
-  version: 3.
+  version: 3.6.0
 platform: ruby
 authors:
 - Elastic
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2021-03-11 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   requirement: !ruby/object:Gem::Requirement
@@ -119,6 +119,7 @@ files:
 - lib/logstash/inputs/s3/patch.rb
 - logstash-input-s3.gemspec
 - spec/fixtures/cloudfront.log
+- spec/fixtures/compressed.log.gee.zip
 - spec/fixtures/compressed.log.gz
 - spec/fixtures/compressed.log.gzip
 - spec/fixtures/invalid_utf8.gbk.log
@@ -159,6 +160,7 @@ specification_version: 4
 summary: Streams events from files in a S3 bucket
 test_files:
 - spec/fixtures/cloudfront.log
+- spec/fixtures/compressed.log.gee.zip
 - spec/fixtures/compressed.log.gz
 - spec/fixtures/compressed.log.gzip
 - spec/fixtures/invalid_utf8.gbk.log