embulk-input-splunk 0.1.3 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: bb169aa2ac863553e446617ab350bb93b56f4711ce48c5b96ec0ce47e0e0525e
4
- data.tar.gz: d715cadc423c7c1dfb6136b1d1cc15a1030f426cb00ee1374eae2f9059058994
3
+ metadata.gz: daf345bda2bb42c7ae1945b66cd0d70ebdd6c02cb1137c83c9ee34916716e6e4
4
+ data.tar.gz: 503aabbc4ff9e9c5b0674f4bb910a0b40f1f01d73d53cbf0f89a4702935641ae
5
5
  SHA512:
6
- metadata.gz: 8cd23ac86966cfc83fd7f7fea8c5d41623639aa2056104c8e4429eb23cf27f62099b1dea2b3167fcd0502d710d0cef3656de0663909ba500e9e35ec72debbc48
7
- data.tar.gz: e9507de2e73fdc6324e7de78febcbb85f66166b77b8e3b19687bd965c6d27cfe164920c616ab5a180eaafc93d49043386e366fd79c3d2a091b779e4d7dab53fb
6
+ metadata.gz: 78f10b454a736fccfb373419c35897c7fcd9fc76970d4116e8101dd536d2e5b9fb108d29a2bebd7c3077d88ef93ef8253dcb3507f77327ddca3e9ab0ddb3de7b
7
+ data.tar.gz: 398aa35c57d07675a6ad2cf1e0f71b731304d3942644002b28789c898e454c29df171e3ce4ada0d3681434c071b95c0e04412ae48c2e44f5f52809f00de1b556
data/README.md CHANGED
@@ -2,14 +2,18 @@
2
2
 
3
3
  A simple plug-in to run a once-off Splunk query and emit the results.
4
4
 
5
- This plugin loads events as two columns: `time` and `event`. `event` is JSON contain the results of your query. You can use filter plugins such as [embulk-filter-expand_json](https://github.com/civitaspo/embulk-filter-expand_json) or [embulk-filter-add_time](https://github.com/treasure-data/embulk-filter-add_time) to convert the json column to typed columns. [Rename filter](http://www.embulk.org/docs/built-in.html#rename-filter-plugin) is also useful to rename the typed columns.
5
+ This plugin uses Splunks `table` command to effeciently and flexibly return results. If you want more flexibility, you can add `_raw` as a table field and then use filter plugins such as [embulk-filter-expand_json](https://github.com/civitaspo/embulk-filter-expand_json) or [embulk-filter-add_time](https://github.com/treasure-data/embulk-filter-add_time) to convert the json column to typed columns. [Rename filter](http://www.embulk.org/docs/built-in.html#rename-filter-plugin) is also useful to rename the typed columns.
6
6
 
7
- Note that the time is fetched from Splunk's `_time` field. It is possible to rename or reformat this field in the query in a such a way that this plugin will fail or have unexpected results. It is recommended you do not alter the `_time` in the query unless you know what you're doing.
7
+ # _time and this plugin
8
+
9
+ This plugin expects and requires `_time`. If you do not include time in your list of columns, this plugin will automatically add it. It is possible to rename or reformat `_time` in the query in a such a way that this plugin will fail or have unexpected results. It is recommended you do not alter the `_time` in the query unless you know what you're doing. If you need to do something esoteric with `_time`, create another field to work with in your Splunk query.
10
+
11
+ In addition, as a column we treat `_time` as a String, but only because we couldn't get the plugin to work with timestamps. We'd welcome a pull request to fix this issue.
8
12
 
9
13
  ## Overview
10
14
 
11
15
  - **Plugin type**: input
12
- - **Resume supported**: no
16
+ - **Resume supported**: yes
13
17
  - **Cleanup supported**: no
14
18
  - **Guess supported**: no
15
19
 
@@ -25,6 +29,7 @@ Note that the time is fetched from Splunk's `_time` field. It is possible to ren
25
29
  - **earliest_time**: the earliest time for the splunk search. (string, default: nil, which is unbounded)
26
30
  - **latest_time**: the latest time for the splunk search. (string, default: nil, which is unbounded)
27
31
  - **incremental**: whether to resume next search from last result time (boolean, default: false)
32
+ - **table** array of columns to include in the results (array, default: [])
28
33
 
29
34
  ### Earliest and latest times
30
35
 
@@ -45,7 +50,7 @@ The default Splunk API limits resuts to 100. In this plugin, the limit is not se
45
50
 
46
51
  ## Examples
47
52
 
48
- Remember the queries much be prefixed with the search command or they are unlikely not to work.
53
+ Remember the queries much be prefixed with the `search` command or they are unlikely not to work. See examples below.
49
54
 
50
55
  ### Unbounded time range
51
56
 
@@ -57,6 +62,9 @@ in:
57
62
  password: abc123
58
63
  port: 8089
59
64
  query: search index="main"
65
+ table:
66
+ # We treat time as a string, only because we can't get timestamp + format to work
67
+ - {name: "_time", type: "string"}
60
68
  ```
61
69
 
62
70
  ### Relative time range
@@ -70,6 +78,10 @@ in:
70
78
  port: 8089
71
79
  query: search index="main"
72
80
  earliest_time: -1m@m
81
+ table:
82
+ - {name: "_time", type: "string"}
83
+ - {name: "foo", type: "string"}
84
+ - {name: "bar", type: "long"}
73
85
  ```
74
86
 
75
87
  ### Absolute time range
@@ -84,6 +96,10 @@ in:
84
96
  query: search index="main"
85
97
  earliest_time: 2017-01-18T19:23:08.237+11:00
86
98
  latest_time: 2018-01-18T19:23:08.237+11:00
99
+ table:
100
+ - {name: "_time", type: "string"}
101
+ - {name: "foo", type: "string"}
102
+ - {name: "bar", type: "long"}
87
103
  ```
88
104
 
89
105
  ### Complex Searches
@@ -102,13 +118,15 @@ in:
102
118
  query: |
103
119
  search index="main" |
104
120
  eval foo=bar |
105
- where like(bar, "%baz%" |
121
+ where like(bar, "%baz%") |
106
122
  head 100
107
123
  earliest_time: 2017-01-18T19:23:08.237+11:00
108
124
  latest_time: 2018-01-18T19:23:08.237+11:00
125
+ table:
126
+ - {name: "_time", type: "string"}
127
+ - {name: "foo", type: "string"} # Uses foo from the above query
109
128
  ```
110
129
 
111
-
112
130
  ## Build
113
131
 
114
132
  ```
@@ -1,7 +1,6 @@
1
-
2
1
  Gem::Specification.new do |spec|
3
2
  spec.name = "embulk-input-splunk"
4
- spec.version = "0.1.3"
3
+ spec.version = "0.2.0"
5
4
  spec.authors = ["Scott Arbeitman"]
6
5
  spec.summary = "Splunk input plugin for Embulk"
7
6
  spec.description = "Loads records from a Splunk query."
@@ -13,9 +13,11 @@ module Embulk
13
13
  SPLUNK_UNLIMITED_RESULTS = 0
14
14
  SPLUNK_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%L%:z"
15
15
  SPLUNK_OUTPUT_FORMAT = "json"
16
+ SPLUNK_DEFAULT_TIME_FIELD = "_time"
17
+ SPLUNK_TIME_FIELD = { "name" => SPLUNK_DEFAULT_TIME_FIELD, "type" => "string" }
16
18
 
17
19
  def self.transaction(config, &control)
18
- # configuration code:
20
+
19
21
  task = {
20
22
  "scheme" => config.param("scheme", :string, default: "https"),
21
23
  "host" => config.param("host", :string),
@@ -29,16 +31,21 @@ module Embulk
29
31
  "latest_time" => config.param(:latest_time, :string, default: nil),
30
32
 
31
33
  "incremental" => config.param("incremental", :bool, default: false),
34
+ "table" => config.param("table", :array, default: [])
32
35
  }
33
36
 
34
37
  if task["incremental"] && task["latest_time"]
35
38
  Embulk.logger.warn "Incremental is 'true' and latest_time is set. This may have unexpected results."
36
39
  end
40
+
41
+ if task["table"].select { |field| field["name"] == "_time" }.empty?
42
+ Embulk.logger.warn "_time is not included in table. Automatically adding it."
43
+ task["table"] << SPLUNK_TIME_FIELD
44
+ end
37
45
 
38
- columns = [
39
- Column.new(0, "time", :timestamp),
40
- Column.new(1, "event", :json),
41
- ]
46
+ columns = task["table"].map do |column|
47
+ Column.new(nil, column["name"], column["type"]&.to_sym || :string, column["format"])
48
+ end
42
49
 
43
50
  resume(task, columns, 1, &control)
44
51
  end
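
The hunk above replaces the fixed `time`/`event` columns with columns derived from the `table` setting, adding `_time` when it is missing. A minimal standalone sketch of that logic follows. `Column` here is a hypothetical `Struct` standing in for Embulk's own class, and `ensure_time_field` and `build_columns` are illustrative names that do not appear in the plugin.

```ruby
# Hypothetical stand-in for Embulk's Column class, for illustration only.
Column = Struct.new(:index, :name, :type, :format)

# Mirrors SPLUNK_TIME_FIELD from the diff: _time is exposed as a string column.
TIME_FIELD = { "name" => "_time", "type" => "string" }.freeze

# Returns the table config unchanged if it already lists _time,
# otherwise returns a new array with the _time entry appended.
def ensure_time_field(table)
  return table if table.any? { |field| field["name"] == "_time" }
  table + [TIME_FIELD]
end

# Maps each table entry to a column, defaulting the type to :string
# when the entry does not specify one.
def build_columns(table)
  table.map do |column|
    type = (column["type"] || "string").to_sym
    Column.new(nil, column["name"], type, column["format"])
  end
end

table = ensure_time_field([{ "name" => "foo" }, { "name" => "bar", "type" => "long" }])
puts build_columns(table).map { |c| "#{c.name}:#{c.type}" }.join(", ")
```

Unlike the diff, which appends to `task["table"]` in place, this sketch returns a new array so the caller's configuration is left untouched.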
@@ -48,15 +55,16 @@ module Embulk
48
55
 
49
56
  next_config_diff = {}
50
57
 
51
- if task["incremental"]
52
- next_config_diff[:earliest_time] = Time.parse( task_reports.first[:latest_time_in_results] ).strftime(SPLUNK_TIME_FORMAT)
53
- end
58
+ latest_time_in_results = task_reports.first[:latest_time_in_results]
54
59
 
60
+ if task["incremental"] && latest_time_in_results.present?
61
+ next_config_diff[:earliest_time] = latest_time_in_results
62
+ end
63
+
55
64
  return next_config_diff
56
65
  end
57
66
 
58
67
  def init
59
- # initialization code:
60
68
  splunk_config = {
61
69
  :scheme => task[:scheme],
62
70
  :host => task[:host],
@@ -64,14 +72,23 @@ module Embulk
64
72
  :username => task[:username],
65
73
  :password => task[:password]
66
74
  }
67
-
68
- @query = task["query"]
69
75
  @earliest_time, @latest_time = task[:earliest_time], task[:latest_time]
70
-
76
+ Embulk.logger.info "Earliest time: #{@earliest_time} / Latest time: #{@latest_time}"
77
+
78
+ @fields = task["table"].collect { |entry| entry["name"] }
79
+ Embulk.logger.info "Using fields #{@fields.join', '} in query"
80
+
81
+ @query = build_query( task[:query] )
82
+
71
83
  Embulk.logger.info "Establishing connection to Splunk"
72
84
  @service = ::Splunk::connect(splunk_config)
73
85
  end
74
86
 
87
+ def build_query(query)
88
+ # Append table expression to query. Even if already present in the query, this should do no harm.
89
+ "#{query} | table #{ @fields.join(", ") } "
90
+ end
91
+
75
92
  def run
76
93
  Embulk.logger.info "Running query `#{@query}`"
77
94
 
@@ -83,22 +100,24 @@ module Embulk
83
100
 
84
101
  reader = ::Splunk::ResultsReader.new(stream)
85
102
 
86
- latest_time_in_results = Time.at(0)
103
+ latest_time = nil
87
104
 
88
105
  reader.each do |result|
89
- event_time = Time.strptime( result["_time"], SPLUNK_TIME_FORMAT )
90
- latest_time_in_results = [latest_time_in_results, event_time].max
106
+ #We convert event_time to Ruby time for comparison only.
107
+ event_time = Time.strptime( result[SPLUNK_DEFAULT_TIME_FIELD], SPLUNK_TIME_FORMAT )
91
108
 
92
- page_builder.add( [
93
- event_time,
94
- result.to_json
95
- ] )
109
+ #We need to keep track of latest time for incremental loads.
110
+ # Unfortunately, Splunk was not respecting our sort requests, so we need to do a comparison for each row.
111
+ latest_time = latest_time.nil? ? event_time : [latest_time, event_time].max
112
+
113
+ row = @fields.map { |field| result[ field ] }
114
+ page_builder.add( row )
96
115
  end
97
116
 
98
117
  page_builder.finish
99
118
 
100
119
  task_result = {
101
- latest_time_in_results: latest_time_in_results
120
+ latest_time_in_results: latest_time.strftime(SPLUNK_TIME_FORMAT)
102
121
  }
103
122
 
104
123
  return task_result
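
The changes to `init`, `run` and `resume` above can be sketched together outside of Embulk. This is an illustrative approximation rather than the plugin's code: `collect_rows` and `next_config_diff` are hypothetical helpers, results are modelled as plain hashes, and the query is stripped before the `table` clause is appended, which the diff does not do. Two points in the diff look worth hedging against. `latest_time` stays `nil` when the search returns no rows, so calling `strftime` on it would raise `NoMethodError`. And `present?` is provided by ActiveSupport rather than core Ruby, so it is only available if the runtime loads it. The sketch uses the safe-navigation operator and a plain `nil` check instead.

```ruby
require "time"

SPLUNK_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%L%:z"

# Appends a `table` clause listing the requested fields to a Splunk query.
def build_query(query, fields)
  "#{query.strip} | table #{fields.join(', ')}"
end

# Builds one row per result in field order and tracks the newest _time seen.
# Returns [rows, latest_time_string]; the string is nil for an empty result set.
def collect_rows(results, fields)
  latest_time = nil
  rows = results.map do |result|
    # Parsed only for comparison; the row keeps the original string value.
    event_time = Time.strptime(result["_time"], SPLUNK_TIME_FORMAT)
    latest_time = event_time if latest_time.nil? || event_time > latest_time
    fields.map { |field| result[field] }
  end
  [rows, latest_time&.strftime(SPLUNK_TIME_FORMAT)]
end

# Computes the config diff for the next incremental run.
def next_config_diff(incremental, latest_time_in_results)
  return {} unless incremental && latest_time_in_results
  { earliest_time: latest_time_in_results }
end

fields = ["_time", "foo"]
results = [
  { "_time" => "2018-01-18T19:23:09.000+11:00", "foo" => "b" },
  { "_time" => "2018-01-18T19:23:08.237+11:00", "foo" => "a" }
]

puts build_query('search index="main"', fields)
rows, latest = collect_rows(results, fields)
puts rows.inspect
puts next_config_diff(true, latest).inspect
```

Results are compared row by row because, as the comment in the diff notes, Splunk did not return them in the requested sort order.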
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: embulk-input-splunk
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.3
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Scott Arbeitman
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-02-18 00:00:00.000000000 Z
11
+ date: 2018-02-21 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  requirement: !ruby/object:Gem::Requirement