embulk-input-splunk 0.1.3 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: bb169aa2ac863553e446617ab350bb93b56f4711ce48c5b96ec0ce47e0e0525e
-   data.tar.gz: d715cadc423c7c1dfb6136b1d1cc15a1030f426cb00ee1374eae2f9059058994
+   metadata.gz: daf345bda2bb42c7ae1945b66cd0d70ebdd6c02cb1137c83c9ee34916716e6e4
+   data.tar.gz: 503aabbc4ff9e9c5b0674f4bb910a0b40f1f01d73d53cbf0f89a4702935641ae
  SHA512:
-   metadata.gz: 8cd23ac86966cfc83fd7f7fea8c5d41623639aa2056104c8e4429eb23cf27f62099b1dea2b3167fcd0502d710d0cef3656de0663909ba500e9e35ec72debbc48
-   data.tar.gz: e9507de2e73fdc6324e7de78febcbb85f66166b77b8e3b19687bd965c6d27cfe164920c616ab5a180eaafc93d49043386e366fd79c3d2a091b779e4d7dab53fb
+   metadata.gz: 78f10b454a736fccfb373419c35897c7fcd9fc76970d4116e8101dd536d2e5b9fb108d29a2bebd7c3077d88ef93ef8253dcb3507f77327ddca3e9ab0ddb3de7b
+   data.tar.gz: 398aa35c57d07675a6ad2cf1e0f71b731304d3942644002b28789c898e454c29df171e3ce4ada0d3681434c071b95c0e04412ae48c2e44f5f52809f00de1b556
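The digests recorded in `checksums.yaml` can be recomputed with Ruby's standard `Digest` library. The following is a minimal sketch, assuming `metadata.gz` and `data.tar.gz` have already been extracted from the `.gem` archive; the helper name `digests_for` is hypothetical and not part of RubyGems.

```ruby
require "digest"

# Hypothetical helper: recompute the SHA256 and SHA512 digests of one file
# so they can be compared with the matching entries in checksums.yaml.
def digests_for(path)
  {
    "SHA256" => Digest::SHA256.file(path).hexdigest,
    "SHA512" => Digest::SHA512.file(path).hexdigest
  }
end

# Usage (assumes data.tar.gz sits in the current directory):
#   digests_for("data.tar.gz")["SHA256"] == "503aabbc4ff9e9c5b0674f4bb910a0b40f1f01d73d53cbf0f89a4702935641ae"
```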
data/README.md CHANGED
@@ -2,14 +2,18 @@
 
  A simple plug-in to run a once-off Splunk query and emit the results.
 
- This plugin loads events as two columns: `time` and `event`. `event` is JSON contain the results of your query. You can use filter plugins such as [embulk-filter-expand_json](https://github.com/civitaspo/embulk-filter-expand_json) or [embulk-filter-add_time](https://github.com/treasure-data/embulk-filter-add_time) to convert the json column to typed columns. [Rename filter](http://www.embulk.org/docs/built-in.html#rename-filter-plugin) is also useful to rename the typed columns.
+ This plugin uses Splunk's `table` command to efficiently and flexibly return results. If you want more flexibility, you can add `_raw` as a table field and then use filter plugins such as [embulk-filter-expand_json](https://github.com/civitaspo/embulk-filter-expand_json) or [embulk-filter-add_time](https://github.com/treasure-data/embulk-filter-add_time) to convert the JSON column to typed columns. [Rename filter](http://www.embulk.org/docs/built-in.html#rename-filter-plugin) is also useful to rename the typed columns.
 
- Note that the time is fetched from Splunk's `_time` field. It is possible to rename or reformat this field in the query in a such a way that this plugin will fail or have unexpected results. It is recommended you do not alter the `_time` in the query unless you know what you're doing.
+ # _time and this plugin
+
+ This plugin expects and requires `_time`. If you do not include `_time` in your list of columns, this plugin will automatically add it. It is possible to rename or reformat `_time` in the query in such a way that this plugin will fail or have unexpected results. It is recommended you do not alter `_time` in the query unless you know what you're doing. If you need to do something esoteric with `_time`, create another field to work with in your Splunk query.
+
+ In addition, the `_time` column is treated as a string, but only because we couldn't get the plugin to work with timestamps. We'd welcome a pull request to fix this issue.
 
  ## Overview
 
  - **Plugin type**: input
- - **Resume supported**: no
+ - **Resume supported**: yes
  - **Cleanup supported**: no
  - **Guess supported**: no
 
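The README text above says that `_time` is added automatically when it is missing from the column list, and that results come back through Splunk's `table` command. That behaviour can be sketched in plain Ruby. This is an illustration under assumptions rather than the plugin's own code: `ensure_time_field` and `build_table_query` are hypothetical names, and fields are modelled as hashes with `name` and `type` keys, as in the README examples.

```ruby
TIME_FIELD = { "name" => "_time", "type" => "string" }.freeze

# Return the field list with `_time` appended when the caller left it out.
def ensure_time_field(fields)
  return fields if fields.any? { |field| field["name"] == "_time" }
  fields + [TIME_FIELD]
end

# Append a `table` clause so Splunk returns only the requested fields.
def build_table_query(query, fields)
  names = fields.map { |field| field["name"] }
  "#{query} | table #{names.join(', ')}"
end

fields = ensure_time_field([{ "name" => "foo", "type" => "string" }])
puts build_table_query('search index="main"', fields)
# prints: search index="main" | table foo, _time
```

Because the clause is appended at the end of the query, any field created earlier in the pipeline (for example by `eval`) can be listed as a column.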
@@ -25,6 +29,7 @@ Note that the time is fetched from Splunk's `_time` field. It is possible to ren
  - **earliest_time**: the earliest time for the splunk search. (string, default: nil, which is unbounded)
  - **latest_time**: the latest time for the splunk search. (string, default: nil, which is unbounded)
  - **incremental**: whether to resume next search from last result time (boolean, default: false)
+ - **table**: array of columns to include in the results (array, default: [])
 
  ### Earliest and latest times
 
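When `incremental` is enabled, the next run's `earliest_time` is taken from the newest `_time` seen in the current results. The bookkeeping can be sketched as follows, assuming `_time` values arrive as strings in the `%Y-%m-%dT%H:%M:%S.%L%:z` format and that rows may come back unsorted. `latest_time_in` is a hypothetical helper; returning `nil` for an empty result set is a design choice of this sketch, so that a caller can leave `earliest_time` unchanged.

```ruby
require "time"

SPLUNK_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%L%:z"

# Hypothetical helper: find the newest `_time` among the rows and return it
# formatted for use as the next `earliest_time`, or nil when there are no rows.
def latest_time_in(results)
  latest = results
    .map { |row| Time.strptime(row["_time"], SPLUNK_TIME_FORMAT) }
    .max
  latest&.strftime(SPLUNK_TIME_FORMAT)
end
```

Each value is parsed to a `Time` only for comparison; the value handed to the next run is a string again.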
@@ -45,7 +50,7 @@ The default Splunk API limits resuts to 100. In this plugin, the limit is not se
 
  ## Examples
 
- Remember the queries much be prefixed with the search command or they are unlikely not to work.
+ Remember that queries must be prefixed with the `search` command or they are unlikely to work. See examples below.
 
  ### Unbounded time range
 
@@ -57,6 +62,9 @@ in:
    password: abc123
    port: 8089
    query: search index="main"
+   table:
+     # We treat time as a string, only because we can't get timestamp + format to work
+     - {name: "_time", type: "string"}
  ```
 
  ### Relative time range
@@ -70,6 +78,10 @@ in:
    port: 8089
    query: search index="main"
    earliest_time: -1m@m
+   table:
+     - {name: "_time", type: "string"}
+     - {name: "foo", type: "string"}
+     - {name: "bar", type: "long"}
  ```
 
  ### Absolute time range
@@ -84,6 +96,10 @@ in:
    query: search index="main"
    earliest_time: 2017-01-18T19:23:08.237+11:00
    latest_time: 2018-01-18T19:23:08.237+11:00
+   table:
+     - {name: "_time", type: "string"}
+     - {name: "foo", type: "string"}
+     - {name: "bar", type: "long"}
  ```
 
  ### Complex Searches
@@ -102,13 +118,15 @@ in:
    query: |
      search index="main" |
      eval foo=bar |
-     where like(bar, "%baz%" |
+     where like(bar, "%baz%") |
      head 100
    earliest_time: 2017-01-18T19:23:08.237+11:00
    latest_time: 2018-01-18T19:23:08.237+11:00
+   table:
+     - {name: "_time", type: "string"}
+     - {name: "foo", type: "string"} # Uses foo from the above query
  ```
 
-
  ## Build
 
  ```
@@ -1,7 +1,6 @@
-
  Gem::Specification.new do |spec|
    spec.name = "embulk-input-splunk"
-   spec.version = "0.1.3"
+   spec.version = "0.2.0"
    spec.authors = ["Scott Arbeitman"]
    spec.summary = "Splunk input plugin for Embulk"
    spec.description = "Loads records from a Splunk query."
@@ -13,9 +13,11 @@ module Embulk
  SPLUNK_UNLIMITED_RESULTS = 0
  SPLUNK_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%L%:z"
  SPLUNK_OUTPUT_FORMAT = "json"
+ SPLUNK_DEFAULT_TIME_FIELD = "_time"
+ SPLUNK_TIME_FIELD = { "name" => SPLUNK_DEFAULT_TIME_FIELD, "type" => "string" }
 
  def self.transaction(config, &control)
-   # configuration code:
+
    task = {
      "scheme" => config.param("scheme", :string, default: "https"),
      "host" => config.param("host", :string),
@@ -29,16 +31,21 @@ module Embulk
      "latest_time" => config.param(:latest_time, :string, default: nil),
 
      "incremental" => config.param("incremental", :bool, default: false),
+     "table" => config.param("table", :array, default: [])
    }
 
    if task["incremental"] && task["latest_time"]
      Embulk.logger.warn "Incremental is 'true' and latest_time is set. This may have unexpected results."
    end
+
+   if task["table"].select { |field| field["name"] == "_time" }.empty?
+     Embulk.logger.warn "_time is not included in table. Automatically adding it."
+     task["table"] << SPLUNK_TIME_FIELD
+   end
 
-   columns = [
-     Column.new(0, "time", :timestamp),
-     Column.new(1, "event", :json),
-   ]
+   columns = task["table"].map do |column|
+     Column.new(nil, column["name"], column["type"]&.to_sym || :string, column["format"])
+   end
 
    resume(task, columns, 1, &control)
  end
@@ -48,15 +55,16 @@ module Embulk
 
    next_config_diff = {}
 
-   if task["incremental"]
-     next_config_diff[:earliest_time] = Time.parse( task_reports.first[:latest_time_in_results] ).strftime(SPLUNK_TIME_FORMAT)
-   end
+   latest_time_in_results = task_reports.first[:latest_time_in_results]
 
+   if task["incremental"] && latest_time_in_results.present?
+     next_config_diff[:earliest_time] = latest_time_in_results
+   end
+
    return next_config_diff
  end
 
  def init
-   # initialization code:
    splunk_config = {
      :scheme => task[:scheme],
      :host => task[:host],
@@ -64,14 +72,23 @@ module Embulk
      :username => task[:username],
      :password => task[:password]
    }
-
-   @query = task["query"]
    @earliest_time, @latest_time = task[:earliest_time], task[:latest_time]
-
+   Embulk.logger.info "Earliest time: #{@earliest_time} / Latest time: #{@latest_time}"
+
+   @fields = task["table"].collect { |entry| entry["name"] }
+   Embulk.logger.info "Using fields #{@fields.join', '} in query"
+
+   @query = build_query( task[:query] )
+
    Embulk.logger.info "Establishing connection to Splunk"
    @service = ::Splunk::connect(splunk_config)
  end
 
+ def build_query(query)
+   # Append table expression to query. Even if already present in the query, this should do no harm.
+   "#{query} | table #{ @fields.join(", ") } "
+ end
+
  def run
    Embulk.logger.info "Running query `#{@query}`"
 
@@ -83,22 +100,24 @@ module Embulk
 
  reader = ::Splunk::ResultsReader.new(stream)
 
- latest_time_in_results = Time.at(0)
+ latest_time = nil
 
  reader.each do |result|
-   event_time = Time.strptime( result["_time"], SPLUNK_TIME_FORMAT )
-   latest_time_in_results = [latest_time_in_results, event_time].max
+   #We convert event_time to Ruby time for comparison only.
+   event_time = Time.strptime( result[SPLUNK_DEFAULT_TIME_FIELD], SPLUNK_TIME_FORMAT )
 
-   page_builder.add( [
-     event_time,
-     result.to_json
-   ] )
+   #We need to keep track of latest time for incremental loads.
+   # Unfortunately, Splunk was not respecting our sort requests, so we need to do a comparison for each row.
+   latest_time = latest_time.nil? ? event_time : [latest_time, event_time].max
+
+   row = @fields.map { |field| result[ field ] }
+   page_builder.add( row )
  end
 
  page_builder.finish
 
  task_result = {
-   latest_time_in_results: latest_time_in_results
+   latest_time_in_results: latest_time.strftime(SPLUNK_TIME_FORMAT)
  }
 
  return task_result
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: embulk-input-splunk
  version: !ruby/object:Gem::Version
-   version: 0.1.3
+   version: 0.2.0
  platform: ruby
  authors:
  - Scott Arbeitman
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2018-02-18 00:00:00.000000000 Z
+ date: 2018-02-21 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    requirement: !ruby/object:Gem::Requirement