embulk-input-splunk 0.1.3 → 0.2.0
- checksums.yaml +4 -4
- data/README.md +24 -6
- data/embulk-input-splunk.gemspec +1 -2
- data/lib/embulk/input/splunk.rb +39 -20
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: daf345bda2bb42c7ae1945b66cd0d70ebdd6c02cb1137c83c9ee34916716e6e4
|
4
|
+
data.tar.gz: 503aabbc4ff9e9c5b0674f4bb910a0b40f1f01d73d53cbf0f89a4702935641ae
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 78f10b454a736fccfb373419c35897c7fcd9fc76970d4116e8101dd536d2e5b9fb108d29a2bebd7c3077d88ef93ef8253dcb3507f77327ddca3e9ab0ddb3de7b
|
7
|
+
data.tar.gz: 398aa35c57d07675a6ad2cf1e0f71b731304d3942644002b28789c898e454c29df171e3ce4ada0d3681434c071b95c0e04412ae48c2e44f5f52809f00de1b556
|
data/README.md
CHANGED
@@ -2,14 +2,18 @@
|
|
2
2
|
|
3
3
|
A simple plug-in to run a once-off Splunk query and emit the results.
|
4
4
|
|
5
|
-
This plugin
|
5
|
+
This plugin uses Splunk's `table` command to efficiently and flexibly return results. If you want more flexibility, you can add `_raw` as a table field and then use filter plugins such as [embulk-filter-expand_json](https://github.com/civitaspo/embulk-filter-expand_json) or [embulk-filter-add_time](https://github.com/treasure-data/embulk-filter-add_time) to convert the JSON column to typed columns. [Rename filter](http://www.embulk.org/docs/built-in.html#rename-filter-plugin) is also useful to rename the typed columns.
|
6
6
|
|
7
|
-
|
7
|
+
# _time and this plugin
|
8
|
+
|
9
|
+
This plugin expects and requires `_time`. If you do not include `_time` in your list of columns, this plugin will automatically add it. It is possible to rename or reformat `_time` in the query in such a way that this plugin will fail or have unexpected results. It is recommended that you do not alter `_time` in the query unless you know what you're doing. If you need to do something esoteric with `_time`, create another field to work with in your Splunk query.
|
10
|
+
|
11
|
+
In addition, as a column we treat `_time` as a String, but only because we couldn't get the plugin to work with timestamps. We'd welcome a pull request to fix this issue.
|
8
12
|
|
9
13
|
## Overview
|
10
14
|
|
11
15
|
- **Plugin type**: input
|
12
|
-
- **Resume supported**:
|
16
|
+
- **Resume supported**: yes
|
13
17
|
- **Cleanup supported**: no
|
14
18
|
- **Guess supported**: no
|
15
19
|
|
@@ -25,6 +29,7 @@ Note that the time is fetched from Splunk's `_time` field. It is possible to ren
|
|
25
29
|
- **earliest_time**: the earliest time for the splunk search. (string, default: nil, which is unbounded)
|
26
30
|
- **latest_time**: the latest time for the splunk search. (string, default: nil, which is unbounded)
|
27
31
|
- **incremental**: whether to resume next search from last result time (boolean, default: false)
|
32
|
+
- **table** array of columns to include in the results (array, default: [])
|
28
33
|
|
29
34
|
### Earliest and latest times
|
30
35
|
|
@@ -45,7 +50,7 @@ The default Splunk API limits resuts to 100. In this plugin, the limit is not se
|
|
45
50
|
|
46
51
|
## Examples
|
47
52
|
|
48
|
-
Remember the queries much be prefixed with the search command or they are unlikely not to work.
|
53
|
+
Remember that queries must be prefixed with the `search` command or they are unlikely to work. See examples below.
|
49
54
|
|
50
55
|
### Unbounded time range
|
51
56
|
|
@@ -57,6 +62,9 @@ in:
|
|
57
62
|
password: abc123
|
58
63
|
port: 8089
|
59
64
|
query: search index="main"
|
65
|
+
table:
|
66
|
+
# We treat time as a string, only because we can't get timestamp + format to work
|
67
|
+
- {name: "_time", type: "string"}
|
60
68
|
```
|
61
69
|
|
62
70
|
### Relative time range
|
@@ -70,6 +78,10 @@ in:
|
|
70
78
|
port: 8089
|
71
79
|
query: search index="main"
|
72
80
|
earliest_time: -1m@m
|
81
|
+
table:
|
82
|
+
- {name: "_time", type: "string"}
|
83
|
+
- {name: "foo", type: "string"}
|
84
|
+
- {name: "bar", type: "long"}
|
73
85
|
```
|
74
86
|
|
75
87
|
### Absolute time range
|
@@ -84,6 +96,10 @@ in:
|
|
84
96
|
query: search index="main"
|
85
97
|
earliest_time: 2017-01-18T19:23:08.237+11:00
|
86
98
|
latest_time: 2018-01-18T19:23:08.237+11:00
|
99
|
+
table:
|
100
|
+
- {name: "_time", type: "string"}
|
101
|
+
- {name: "foo", type: "string"}
|
102
|
+
- {name: "bar", type: "long"}
|
87
103
|
```
|
88
104
|
|
89
105
|
### Complex Searches
|
@@ -102,13 +118,15 @@ in:
|
|
102
118
|
query: |
|
103
119
|
search index="main" |
|
104
120
|
eval foo=bar |
|
105
|
-
where like(bar, "%baz%" |
|
121
|
+
where like(bar, "%baz%") |
|
106
122
|
head 100
|
107
123
|
earliest_time: 2017-01-18T19:23:08.237+11:00
|
108
124
|
latest_time: 2018-01-18T19:23:08.237+11:00
|
125
|
+
table:
|
126
|
+
- {name: "_time", type: "string"}
|
127
|
+
- {name: "foo", type: "string"} # Uses foo from the above query
|
109
128
|
```
|
110
129
|
|
111
|
-
|
112
130
|
## Build
|
113
131
|
|
114
132
|
```
|
data/embulk-input-splunk.gemspec
CHANGED
data/lib/embulk/input/splunk.rb
CHANGED
@@ -13,9 +13,11 @@ module Embulk
|
|
13
13
|
SPLUNK_UNLIMITED_RESULTS = 0
|
14
14
|
SPLUNK_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%L%:z"
|
15
15
|
SPLUNK_OUTPUT_FORMAT = "json"
|
16
|
+
SPLUNK_DEFAULT_TIME_FIELD = "_time"
|
17
|
+
SPLUNK_TIME_FIELD = { "name" => SPLUNK_DEFAULT_TIME_FIELD, "type" => "string" }
|
16
18
|
|
17
19
|
def self.transaction(config, &control)
|
18
|
-
|
20
|
+
|
19
21
|
task = {
|
20
22
|
"scheme" => config.param("scheme", :string, default: "https"),
|
21
23
|
"host" => config.param("host", :string),
|
@@ -29,16 +31,21 @@ module Embulk
|
|
29
31
|
"latest_time" => config.param(:latest_time, :string, default: nil),
|
30
32
|
|
31
33
|
"incremental" => config.param("incremental", :bool, default: false),
|
34
|
+
"table" => config.param("table", :array, default: [])
|
32
35
|
}
|
33
36
|
|
34
37
|
if task["incremental"] && task["latest_time"]
|
35
38
|
Embulk.logger.warn "Incremental is 'true' and latest_time is set. This may have unexpected results."
|
36
39
|
end
|
40
|
+
|
41
|
+
if task["table"].select { |field| field["name"] == "_time" }.empty?
|
42
|
+
Embulk.logger.warn "_time is not included in table. Automatically adding it."
|
43
|
+
task["table"] << SPLUNK_TIME_FIELD
|
44
|
+
end
|
37
45
|
|
38
|
-
columns = [
|
39
|
-
Column.new(
|
40
|
-
|
41
|
-
]
|
46
|
+
columns = task["table"].map do |column|
|
47
|
+
Column.new(nil, column["name"], column["type"]&.to_sym || :string, column["format"])
|
48
|
+
end
|
42
49
|
|
43
50
|
resume(task, columns, 1, &control)
|
44
51
|
end
|
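The hunk above makes sure `_time` is always part of `table` and then derives the column list from it. A minimal standalone sketch of that logic follows, assuming plain hashes shaped like the README's `table` entries. The helper names `ensure_time_field` and `column_specs` are hypothetical and not part of the plugin, and unlike the plugin code this version returns a new array instead of appending to the task in place.

```ruby
# Standalone sketch: nothing here depends on Embulk.
# TIME_FIELD mirrors SPLUNK_TIME_FIELD from the diff.
TIME_FIELD = { "name" => "_time", "type" => "string" }.freeze

# Hypothetical helper: returns a table that is guaranteed to contain `_time`.
def ensure_time_field(table)
  return table if table.any? { |field| field["name"] == "_time" }

  table + [TIME_FIELD]
end

# Hypothetical helper: maps each entry to [name, type], defaulting to :string
# when no type is given, as the diff does with `|| :string`.
def column_specs(table)
  table.map { |column| [column["name"], (column["type"] || "string").to_sym] }
end

table = ensure_time_field([{ "name" => "foo" }, { "name" => "bar", "type" => "long" }])
p column_specs(table)
```

In the plugin itself each entry is turned into an Embulk `Column` rather than a plain pair. The pairs are used here only so the sketch runs without Embulk.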
@@ -48,15 +55,16 @@ module Embulk
|
|
48
55
|
|
49
56
|
next_config_diff = {}
|
50
57
|
|
51
|
-
|
52
|
-
next_config_diff[:earliest_time] = Time.parse( task_reports.first[:latest_time_in_results] ).strftime(SPLUNK_TIME_FORMAT)
|
53
|
-
end
|
58
|
+
latest_time_in_results = task_reports.first[:latest_time_in_results]
|
54
59
|
|
60
|
+
if task["incremental"] && latest_time_in_results.present?
|
61
|
+
next_config_diff[:earliest_time] = latest_time_in_results
|
62
|
+
end
|
63
|
+
|
55
64
|
return next_config_diff
|
56
65
|
end
|
57
66
|
|
58
67
|
def init
|
59
|
-
# initialization code:
|
60
68
|
splunk_config = {
|
61
69
|
:scheme => task[:scheme],
|
62
70
|
:host => task[:host],
|
@@ -64,14 +72,23 @@ module Embulk
|
|
64
72
|
:username => task[:username],
|
65
73
|
:password => task[:password]
|
66
74
|
}
|
67
|
-
|
68
|
-
@query = task["query"]
|
69
75
|
@earliest_time, @latest_time = task[:earliest_time], task[:latest_time]
|
70
|
-
|
76
|
+
Embulk.logger.info "Earliest time: #{@earliest_time} / Latest time: #{@latest_time}"
|
77
|
+
|
78
|
+
@fields = task["table"].collect { |entry| entry["name"] }
|
79
|
+
Embulk.logger.info "Using fields #{@fields.join', '} in query"
|
80
|
+
|
81
|
+
@query = build_query( task[:query] )
|
82
|
+
|
71
83
|
Embulk.logger.info "Establishing connection to Splunk"
|
72
84
|
@service = ::Splunk::connect(splunk_config)
|
73
85
|
end
|
74
86
|
|
87
|
+
def build_query(query)
|
88
|
+
# Append table expression to query. Even if already present in the query, this should do no harm.
|
89
|
+
"#{query} | table #{ @fields.join(", ") } "
|
90
|
+
end
|
91
|
+
|
75
92
|
def run
|
76
93
|
Embulk.logger.info "Running query `#{@query}`"
|
77
94
|
|
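The `build_query` method added in this hunk appends a `table` command to whatever query the user supplied. The sketch below shows the same string construction in isolation. It takes the field list as a parameter instead of reading `@fields`, and it strips surrounding whitespace from the query. Both are assumptions made for illustration, not behaviour shown in the diff.

```ruby
# Sketch of the query construction, assuming `fields` is an array of
# field names such as ["_time", "foo"].
def build_query(query, fields)
  # Append a Splunk `table` command listing the requested fields.
  "#{query.strip} | table #{fields.join(', ')}"
end

puts build_query('search index="main"', ["_time", "foo"])
```

Because the plugin always adds `_time` to the table, the generated `table` command always carries the field the incremental logic relies on.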
@@ -83,22 +100,24 @@ module Embulk
|
|
83
100
|
|
84
101
|
reader = ::Splunk::ResultsReader.new(stream)
|
85
102
|
|
86
|
-
|
103
|
+
latest_time = nil
|
87
104
|
|
88
105
|
reader.each do |result|
|
89
|
-
event_time
|
90
|
-
|
106
|
+
#We convert event_time to Ruby time for comparison only.
|
107
|
+
event_time = Time.strptime( result[SPLUNK_DEFAULT_TIME_FIELD], SPLUNK_TIME_FORMAT )
|
91
108
|
|
92
|
-
|
93
|
-
|
94
|
-
|
95
|
-
|
109
|
+
#We need to keep track of latest time for incremental loads.
|
110
|
+
# Unfortunately, Splunk was not respecting our sort requests, so we need to do a comparison for each row.
|
111
|
+
latest_time = latest_time.nil? ? event_time : [latest_time, event_time].max
|
112
|
+
|
113
|
+
row = @fields.map { |field| result[ field ] }
|
114
|
+
page_builder.add( row )
|
96
115
|
end
|
97
116
|
|
98
117
|
page_builder.finish
|
99
118
|
|
100
119
|
task_result = {
|
101
|
-
latest_time_in_results:
|
120
|
+
latest_time_in_results: latest_time.strftime(SPLUNK_TIME_FORMAT)
|
102
121
|
}
|
103
122
|
|
104
123
|
return task_result
|
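The `run` changes above track the latest `_time` row by row, because the results are not assumed to arrive sorted. A minimal sketch of that comparison follows, assuming each result is a hash with a `_time` string in the plugin's `SPLUNK_TIME_FORMAT`. The helper name `latest_time_in` is hypothetical, and returning `nil` for an empty result set is an assumption about the desired behaviour, not something the diff does.

```ruby
require "time"

# Same format string as SPLUNK_TIME_FORMAT in the diff.
SPLUNK_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%L%:z"

# Hypothetical helper: returns the latest `_time` among the results as a
# formatted string, or nil when there are no results.
def latest_time_in(results)
  latest = results.map { |r| Time.strptime(r["_time"], SPLUNK_TIME_FORMAT) }.max
  latest&.strftime(SPLUNK_TIME_FORMAT)
end

rows = [
  { "_time" => "2018-01-18T19:23:08.237+11:00" },
  { "_time" => "2018-01-18T19:23:09.000+11:00" },
  { "_time" => "2018-01-18T19:23:07.500+11:00" }
]
puts latest_time_in(rows)
```

The formatted value is what the resume step would pass on as the next `earliest_time` when `incremental` is enabled.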
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: embulk-input-splunk
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 0.2.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Scott Arbeitman
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2018-02-
|
11
|
+
date: 2018-02-21 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
requirement: !ruby/object:Gem::Requirement
|