embulk-input-splunk 0.1.3 → 0.2.0
- checksums.yaml +4 -4
- data/README.md +24 -6
- data/embulk-input-splunk.gemspec +1 -2
- data/lib/embulk/input/splunk.rb +39 -20
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: daf345bda2bb42c7ae1945b66cd0d70ebdd6c02cb1137c83c9ee34916716e6e4
+data.tar.gz: 503aabbc4ff9e9c5b0674f4bb910a0b40f1f01d73d53cbf0f89a4702935641ae
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 78f10b454a736fccfb373419c35897c7fcd9fc76970d4116e8101dd536d2e5b9fb108d29a2bebd7c3077d88ef93ef8253dcb3507f77327ddca3e9ab0ddb3de7b
+data.tar.gz: 398aa35c57d07675a6ad2cf1e0f71b731304d3942644002b28789c898e454c29df171e3ce4ada0d3681434c071b95c0e04412ae48c2e44f5f52809f00de1b556
data/README.md
CHANGED
@@ -2,14 +2,18 @@
 
 A simple plug-in to run a once-off Splunk query and emit the results.
 
-This plugin
+This plugin uses Splunks `table` command to effeciently and flexibly return results. If you want more flexibility, you can add `_raw` as a table field and then use filter plugins such as [embulk-filter-expand_json](https://github.com/civitaspo/embulk-filter-expand_json) or [embulk-filter-add_time](https://github.com/treasure-data/embulk-filter-add_time) to convert the json column to typed columns. [Rename filter](http://www.embulk.org/docs/built-in.html#rename-filter-plugin) is also useful to rename the typed columns.
 
-
+# _time and this plugin
+
+This plugin expects and requires `_time`. If you do not include time in your list of columns, this plugin will automatically add it. It is possible to rename or reformat `_time` in the query in a such a way that this plugin will fail or have unexpected results. It is recommended you do not alter the `_time` in the query unless you know what you're doing. If you need to do something esoteric with `_time`, create another field to work with in your Splunk query.
+
+In addition, as a column we treat `_time` as a String, but only because we couldn't get the plugin to work with timestamps. We'd welcome a pull request to fix this issue.
 
 ## Overview
 
 - **Plugin type**: input
-- **Resume supported**:
+- **Resume supported**: yes
 - **Cleanup supported**: no
 - **Guess supported**: no
 
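The README says the plugin adds `_time` to the column list when it is missing. Below is a minimal plain-Ruby sketch of that normalization. It assumes each `table` entry is a Hash with string keys, as in the YAML examples. The helper name `ensure_time_field` is hypothetical. Unlike the plugin, the sketch returns a new array instead of appending to the configured array in place.

```ruby
# Hypothetical helper mirroring the README: add Splunk's `_time` field when the table omits it.
# Assumes each entry is a Hash with string keys, e.g. {"name" => "foo", "type" => "string"}.
TIME_FIELD_NAME = "_time"

def ensure_time_field(table)
  return table if table.any? { |field| field["name"] == TIME_FIELD_NAME }

  # Build a new array with a fresh Hash rather than mutating the caller's array.
  table + [{ "name" => TIME_FIELD_NAME, "type" => "string" }]
end

table = [{ "name" => "foo", "type" => "string" }]
p ensure_time_field(table).map { |field| field["name"] } # ["foo", "_time"]
p table.size                                             # 1 (input left untouched)
```

Returning a copy keeps the configured array unchanged. This is a design choice for the sketch, not something the plugin requires.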
@@ -25,6 +29,7 @@ Note that the time is fetched from Splunk's `_time` field. It is possible to ren
 - **earliest_time**: the earliest time for the splunk search. (string, default: nil, which is unbounded)
 - **latest_time**: the latest time for the splunk search. (string, default: nil, which is unbounded)
 - **incremental**: whether to resume next search from last result time (boolean, default: false)
+- **table** array of columns to include in the results (array, default: [])
 
 ### Earliest and latest times
 
@@ -45,7 +50,7 @@ The default Splunk API limits resuts to 100. In this plugin, the limit is not se
 
 ## Examples
 
-Remember the queries much be prefixed with the search command or they are unlikely not to work.
+Remember the queries much be prefixed with the `search` command or they are unlikely not to work. See examples below.
 
 ### Unbounded time range
 
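The examples below all start with the `search` command. A caller that builds queries programmatically could normalize them before writing the plugin config. The sketch below is a hypothetical pre-processing helper and is not part of the plugin. It assumes that a query already beginning with `search`, or with a pipe that introduces a generating command such as `| inputlookup`, should be left alone.

```ruby
# Hypothetical helper: prefix a Splunk query with the `search` command when it is missing.
# Assumption: queries starting with "search" or with "|" (a generating command) are already valid.
def ensure_search_prefix(query)
  stripped = query.lstrip
  return stripped if stripped.start_with?("|") || stripped.match?(/\Asearch\b/)

  "search #{stripped}"
end

puts ensure_search_prefix('index="main"')        # search index="main"
puts ensure_search_prefix('search index="main"') # search index="main"
```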
@@ -57,6 +62,9 @@ in:
 password: abc123
 port: 8089
 query: search index="main"
+table:
+# We treat time as a string, only because we can't get timestamp + format to work
+- {name: "_time", type: "string"}
 ```
 
 ### Relative time range
@@ -70,6 +78,10 @@ in:
 port: 8089
 query: search index="main"
 earliest_time: -1m@m
+table:
+- {name: "_time", type: "string"}
+- {name: "foo", type: "string"}
+- {name: "bar", type: "long"}
 ```
 
 ### Absolute time range
@@ -84,6 +96,10 @@ in:
 query: search index="main"
 earliest_time: 2017-01-18T19:23:08.237+11:00
 latest_time: 2018-01-18T19:23:08.237+11:00
+table:
+- {name: "_time", type: "string"}
+- {name: "foo", type: "string"}
+- {name: "bar", type: "long"}
 ```
 
 ### Complex Searches
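The absolute `earliest_time` and `latest_time` values above follow the same layout as the plugin's `SPLUNK_TIME_FORMAT` constant (`%Y-%m-%dT%H:%M:%S.%L%:z`). The sketch below produces such a pair from Ruby `Time` objects using only core `Time#strftime`. The helper name `absolute_window` and its ordering check are illustrative assumptions, not plugin behaviour.

```ruby
# Same layout as SPLUNK_TIME_FORMAT in lib/embulk/input/splunk.rb.
SPLUNK_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%L%:z"

# Hypothetical helper: format an absolute time window for the plugin config.
def absolute_window(earliest, latest)
  raise ArgumentError, "earliest must be before latest" unless earliest < latest

  {
    "earliest_time" => earliest.strftime(SPLUNK_TIME_FORMAT),
    "latest_time" => latest.strftime(SPLUNK_TIME_FORMAT)
  }
end

# Rational seconds (8.237r) avoid float rounding in the %L millisecond field.
earliest = Time.new(2017, 1, 18, 19, 23, 8.237r, "+11:00")
latest = Time.new(2018, 1, 18, 19, 23, 8.237r, "+11:00")
window = absolute_window(earliest, latest)
puts window["earliest_time"] # 2017-01-18T19:23:08.237+11:00
puts window["latest_time"]   # 2018-01-18T19:23:08.237+11:00
```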
@@ -102,13 +118,15 @@ in:
 query: |
 search index="main" |
 eval foo=bar |
-where like(bar, "%baz%" |
+where like(bar, "%baz%") |
 head 100
 earliest_time: 2017-01-18T19:23:08.237+11:00
 latest_time: 2018-01-18T19:23:08.237+11:00
+table:
+- {name: "_time", type: "string"}
+- {name: "foo", type: "string"} # Uses foo from the above query
 ```
 
-
 ## Build
 
 ```
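With the `table` list from the complex-search example, the plugin appends a `table` command to the configured query (see `build_query` in the `splunk.rb` diff). The standalone re-creation below passes `fields` as an explicit argument; that is a hypothetical signature, since the plugin reads `@fields`. It keeps the plugin's string shape, including the trailing space. It also assumes the YAML `query: |` block scalar yields the multi-line pipeline with a trailing newline.

```ruby
# Standalone re-creation of the plugin's query building (illustration only).
def build_query(query, fields)
  # Same shape as the diff: append "| table <fields>" after the user's pipeline.
  "#{query} | table #{fields.join(", ")} "
end

# A YAML `query: |` block scalar keeps line breaks, including the final newline.
query = <<~SPL
  search index="main" |
  eval foo=bar |
  where like(bar, "%baz%") |
  head 100
SPL

fields = ["_time", "foo"] # names from the `table:` entries in the example
puts build_query(query, fields)
```

The resulting search ends with `head 100` followed by `| table _time, foo`. Because the plugin builds each row from the same field list, only those two fields reach Embulk.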
data/embulk-input-splunk.gemspec
CHANGED
data/lib/embulk/input/splunk.rb
CHANGED
@@ -13,9 +13,11 @@ module Embulk
 SPLUNK_UNLIMITED_RESULTS = 0
 SPLUNK_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%L%:z"
 SPLUNK_OUTPUT_FORMAT = "json"
+SPLUNK_DEFAULT_TIME_FIELD = "_time"
+SPLUNK_TIME_FIELD = { "name" => SPLUNK_DEFAULT_TIME_FIELD, "type" => "string" }
 
 def self.transaction(config, &control)
-
+
 task = {
 "scheme" => config.param("scheme", :string, default: "https"),
 "host" => config.param("host", :string),
@@ -29,16 +31,21 @@ module Embulk
 "latest_time" => config.param(:latest_time, :string, default: nil),
 
 "incremental" => config.param("incremental", :bool, default: false),
+"table" => config.param("table", :array, default: [])
 }
 
 if task["incremental"] && task["latest_time"]
 Embulk.logger.warn "Incremental is 'true' and latest_time is set. This may have unexpected results."
 end
+
+if task["table"].select { |field| field["name"] == "_time" }.empty?
+Embulk.logger.warn "_time is not included in table. Automatically adding it."
+task["table"] << SPLUNK_TIME_FIELD
+end
 
-columns = [
-Column.new(
-
-]
+columns = task["table"].map do |column|
+Column.new(nil, column["name"], column["type"]&.to_sym || :string, column["format"])
+end
 
 resume(task, columns, 1, &control)
 end
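In `transaction`, each `table` entry becomes a `Column`. Its type falls back to `:string` when `type` is omitted, and `format` is passed through. The sketch below mirrors that mapping outside Embulk. It substitutes a hypothetical `ColumnSpec` struct for Embulk's `Column` so it runs without the Embulk runtime, and it drops the index argument that the diff passes as `nil`.

```ruby
# Hypothetical stand-in for Embulk's Column, so the mapping can run without Embulk.
ColumnSpec = Struct.new(:name, :type, :format)

# Mirrors the diff: a missing "type" falls back to :string; "format" is passed through.
def columns_from_table(table)
  table.map do |column|
    ColumnSpec.new(column["name"], column["type"]&.to_sym || :string, column["format"])
  end
end

table = [
  { "name" => "_time", "type" => "string" },
  { "name" => "bar", "type" => "long" },
  { "name" => "foo" } # no type given
]
columns_from_table(table).each { |c| puts "#{c.name}: #{c.type}" }
# _time: string
# bar: long
# foo: string
```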
@@ -48,15 +55,16 @@ module Embulk
 
 next_config_diff = {}
 
-
-next_config_diff[:earliest_time] = Time.parse( task_reports.first[:latest_time_in_results] ).strftime(SPLUNK_TIME_FORMAT)
-end
+latest_time_in_results = task_reports.first[:latest_time_in_results]
 
+if task["incremental"] && latest_time_in_results.present?
+next_config_diff[:earliest_time] = latest_time_in_results
+end
+
 return next_config_diff
 end
 
 def init
-# initialization code:
 splunk_config = {
 :scheme => task[:scheme],
 :host => task[:host],
@@ -64,14 +72,23 @@ module Embulk
 :username => task[:username],
 :password => task[:password]
 }
-
-@query = task["query"]
 @earliest_time, @latest_time = task[:earliest_time], task[:latest_time]
-
+Embulk.logger.info "Earliest time: #{@earliest_time} / Latest time: #{@latest_time}"
+
+@fields = task["table"].collect { |entry| entry["name"] }
+Embulk.logger.info "Using fields #{@fields.join', '} in query"
+
+@query = build_query( task[:query] )
+
 Embulk.logger.info "Establishing connection to Splunk"
 @service = ::Splunk::connect(splunk_config)
 end
 
+def build_query(query)
+# Append table expression to query. Even if already present in the query, this should do no harm.
+"#{query} | table #{ @fields.join(", ") } "
+end
+
 def run
 Embulk.logger.info "Running query `#{@query}`"
 
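In `resume`, the next run's `earliest_time` is taken from the first task report when `incremental` is on. The diff guards this with `present?`, which is an ActiveSupport extension rather than core Ruby. The sketch below makes the same decision with plain nil and empty checks. It assumes the reports are Hashes keyed by `:latest_time_in_results`, and `next_config_diff_for` is a hypothetical name.

```ruby
# Hypothetical plain-Ruby version of the incremental decision in resume.
# Assumes task_reports is an Array of Hashes with a :latest_time_in_results String (or nil).
def next_config_diff_for(incremental, task_reports)
  next_config_diff = {}
  first_report = task_reports.first
  latest = first_report && first_report[:latest_time_in_results]

  # Only move the window forward when incremental loading is on and a time was recorded.
  if incremental && latest && !latest.empty?
    next_config_diff[:earliest_time] = latest
  end

  next_config_diff
end

reports = [{ latest_time_in_results: "2018-01-18T19:23:08.237+11:00" }]
p next_config_diff_for(true, reports)
p next_config_diff_for(false, reports) # {}
```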
@@ -83,22 +100,24 @@ module Embulk
 
 reader = ::Splunk::ResultsReader.new(stream)
 
-
+latest_time = nil
 
 reader.each do |result|
-event_time
-
+#We convert event_time to Ruby time for comparison only.
+event_time = Time.strptime( result[SPLUNK_DEFAULT_TIME_FIELD], SPLUNK_TIME_FORMAT )
 
-
-
-
-
+#We need to keep track of latest time for incremental loads.
+# Unfortunately, Splunk was not respecting our sort requests, so we need to do a comparison for each row.
+latest_time = latest_time.nil? ? event_time : [latest_time, event_time].max
+
+row = @fields.map { |field| result[ field ] }
+page_builder.add( row )
 end
 
 page_builder.finish
 
 task_result = {
-latest_time_in_results:
+latest_time_in_results: latest_time.strftime(SPLUNK_TIME_FORMAT)
 }
 
 return task_result
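Because Splunk did not honour the sort order, `run` keeps a running maximum of `_time` across rows. If the search returns no rows, `latest_time` stays `nil`, and `latest_time.strftime` in the diff would raise `NoMethodError`. The sketch below computes the same running maximum but returns `nil` for an empty result set. It assumes rows are Hashes with a `_time` string. It also parses with `Time.iso8601` from Ruby's `time` library instead of the plugin's `Time.strptime` call. `latest_event_time` is a hypothetical name.

```ruby
require "time"

SPLUNK_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%L%:z"

# Hypothetical helper: find the latest `_time` in rows that may arrive in any order.
# Returns a formatted String, or nil when there are no rows.
def latest_event_time(rows)
  latest = nil
  rows.each do |row|
    event_time = Time.iso8601(row["_time"]) # parsed only for comparison
    latest = latest.nil? ? event_time : [latest, event_time].max
  end
  latest&.strftime(SPLUNK_TIME_FORMAT)
end

rows = [
  { "_time" => "2018-02-01T10:00:00.000+11:00" },
  { "_time" => "2018-02-01T12:30:00.500+11:00" },
  { "_time" => "2018-02-01T09:15:00.000+11:00" }
]
puts latest_event_time(rows) # 2018-02-01T12:30:00.500+11:00
p latest_event_time([])      # nil
```

Returning `nil` lets a caller skip setting `earliest_time` for the next run rather than failing after an empty search.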
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: embulk-input-splunk
 version: !ruby/object:Gem::Version
-version: 0.
+version: 0.2.0
 platform: ruby
 authors:
 - Scott Arbeitman
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-02-
+date: 2018-02-21 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 requirement: !ruby/object:Gem::Requirement