logstash-codec-csv 0.1.2

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: a6f3dc6030f103dc961e34e2babeeccd8fe64884
4
+ data.tar.gz: 7a26a6c91a3f5cfd272a7ce75e38e46c8fe53bc5
5
+ SHA512:
6
+ metadata.gz: ec21f23421f5da8349534a2c3f787a7a4f04be9166b078c296538679bac1da9f1fbbedb3b401dde2a5d274c7a00adbce71b27fc04642d6017379e82a0d06a952
7
+ data.tar.gz: b5219a99d9bac584872604117631f99c1377195ea77317ffb867ff3f59d572cc60f3c50ee013a074ab86dd281998853e9883eb74e605fa35e6bd3c04e1312107
@@ -0,0 +1,10 @@
1
+ # 0.1.2
2
+ - Depend on logstash-core-plugin-api instead of logstash-core, removing the need to mass update plugins on major releases of logstash
3
+ # 0.1.1
4
+ - New dependency requirements for logstash-core for the 5.0 release
5
+ # 0.1.0
6
+ - Initial version of the codec. This first version includes feature parity with the current CSV filter: the ability to set column names, the column separator, and a quote character, to decide whether autogeneration of column names is allowed, to convert types, and to skip empty columns.
7
+ - This initial version also includes an option to treat the first
8
+ chunk of data seen as containing the headers. This functionality
9
+ will be useful when reading CSV files (all at once, usually) that
10
+ contain this information.
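The header option described in the changelog can be sketched outside Logstash with Ruby's standard `csv` library. This is only an illustration of the idea, not the codec's code: `HeaderTracker`, `feed`, and the sample rows are hypothetical names chosen for the example.

```ruby
require "csv"

# Hypothetical helper: remembers the first line it sees as the header row
# and maps every later line onto those header names.
class HeaderTracker
  def initialize
    @headers = nil
  end

  # Returns nil for the header line itself, otherwise a Hash of header => value.
  def feed(line)
    values = CSV.parse_line(line)
    if @headers.nil?
      @headers = values
      return nil
    end
    @headers.zip(values).to_h
  end

  # Forget the stored headers, for example when a new file starts.
  def reset
    @headers = nil
  end
end

tracker = HeaderTracker.new
tracker.feed("size,animal,movie")                  # header line, yields no row
row = tracker.feed("big,bird,sesame street")

tracker.reset
tracker.feed("host,country,city")                  # new header line after reset
other = tracker.feed("example.com,germany,berlin")
```

Keeping the headers as explicit state makes the reset step trivial, which matters when one codec instance sees several files or responses in sequence.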
@@ -0,0 +1,16 @@
1
+ The following is a list of people who have contributed ideas, code, bug
2
+ reports, or in general have helped logstash along its way.
3
+
4
+ Contributors:
5
+ * Colin Surprenant (colinsurprenant)
6
+ * Jordan Sissel (jordansissel)
7
+ * João Duarte (jsvd)
8
+ * Nick Ethier (nickethier)
9
+ * Pier-Hugues Pellerin (ph)
10
+ * Richard Pijnenburg (electrical)
11
+ * Suyog Rao (suyograo)
12
+
13
+ Note: If you've sent us patches, bug reports, or otherwise contributed to
14
+ Logstash, and you aren't on the list above and want to be, please let us know
15
+ and we'll make sure you're here. Contributions from folks like you are what make
16
+ open source awesome.
data/Gemfile ADDED
@@ -0,0 +1,2 @@
1
+ source 'https://rubygems.org'
2
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,13 @@
1
+ Copyright (c) 2012-2015 Elasticsearch <http://www.elasticsearch.org>
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License");
4
+ you may not use this file except in compliance with the License.
5
+ You may obtain a copy of the License at
6
+
7
+ http://www.apache.org/licenses/LICENSE-2.0
8
+
9
+ Unless required by applicable law or agreed to in writing, software
10
+ distributed under the License is distributed on an "AS IS" BASIS,
11
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ See the License for the specific language governing permissions and
13
+ limitations under the License.
@@ -0,0 +1,97 @@
1
+ # Logstash CSV Codec
2
+
3
+ ## WIP: Under Development, NOT FOR PRODUCTION
4
+
5
+ This is a plugin for [Logstash](https://github.com/elasticsearch/logstash).
6
+
7
+ It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want.
8
+
9
+ ## Documentation
10
+
11
+ Logstash provides infrastructure to automatically generate documentation for this plugin. We use the asciidoc format to write documentation, so any comments in the source code will first be converted into asciidoc and then into html. All plugin documentation is placed under one [central location](http://www.elasticsearch.org/guide/en/logstash/current/).
12
+
13
+ - To format code or config examples, you can use the asciidoc `[source,ruby]` directive
14
+ - For more asciidoc formatting tips, see the excellent reference here https://github.com/elasticsearch/docs#asciidoc-guide
15
+
16
+ ## Need Help?
17
+
18
+ Need help? Try #logstash on freenode IRC or the logstash-users@googlegroups.com mailing list.
19
+
20
+ ## Developing
21
+
22
+ ### 1. Plugin Development and Testing
23
+
24
+ #### Code
25
+ - To get started, you'll need JRuby with the Bundler gem installed.
26
+
27
+ - Create a new plugin or clone an existing one from the GitHub [logstash-plugins](https://github.com/logstash-plugins) organization.
28
+
29
+ - Install dependencies
30
+ ```sh
31
+ bundle install
32
+ ```
33
+
34
+ #### Test
35
+
36
+ ```sh
37
+ bundle exec rspec
38
+ ```
39
+
40
+ The Logstash code required to run the tests/specs is specified in the `Gemfile` by a line similar to:
41
+ ```ruby
42
+ gem "logstash", :github => "elasticsearch/logstash", :branch => "1.5"
43
+ ```
44
+ To test against another version or a local Logstash, edit the `Gemfile` to specify an alternative location, for example:
45
+ ```ruby
46
+ gem "logstash", :github => "elasticsearch/logstash", :ref => "master"
47
+ ```
48
+ ```ruby
49
+ gem "logstash", :path => "/your/local/logstash"
50
+ ```
51
+
52
+ Then update your dependencies and run your tests:
53
+
54
+ ```sh
55
+ bundle install
56
+ bundle exec rspec
57
+ ```
58
+
59
+ ### 2. Running your unpublished Plugin in Logstash
60
+
61
+ #### 2.1 Run in a local Logstash clone
62
+
63
+ - Edit Logstash `tools/Gemfile` and add the local plugin path, for example:
64
+ ```ruby
65
+ gem "logstash-filter-awesome", :path => "/your/local/logstash-filter-awesome"
66
+ ```
67
+ - Update Logstash dependencies
68
+ ```sh
69
+ rake vendor:gems
70
+ ```
71
+ - Run Logstash with your plugin
72
+ ```sh
73
+ bin/logstash -e 'filter {awesome {}}'
74
+ ```
75
+ At this point any modifications to the plugin code will be applied to this local Logstash setup. After modifying the plugin, simply rerun Logstash.
76
+
77
+ #### 2.2 Run in an installed Logstash
78
+
79
+ - Build your plugin gem
80
+ ```sh
81
+ gem build logstash-filter-awesome.gemspec
82
+ ```
83
+ - Install the plugin from the Logstash home
84
+ ```sh
85
+ bin/plugin install /your/local/plugin/logstash-filter-awesome.gem
86
+ ```
87
+ - Start Logstash and proceed to test the plugin
88
+
89
+ ## Contributing
90
+
91
+ All contributions are welcome: ideas, patches, documentation, bug reports, complaints, and even something you drew up on a napkin.
92
+
93
+ Programming is not a required skill. Whatever you've seen about open source and maintainers or community members saying "send patches or die" - you will not see that here.
94
+
95
+ It is more important to me that you are able to contribute.
96
+
97
+ For more information about contributing, see the [CONTRIBUTING](https://github.com/elasticsearch/logstash/blob/master/CONTRIBUTING.md) file.
@@ -0,0 +1,163 @@
1
+ # encoding: utf-8
2
+ require "logstash/codecs/base"
3
+ require "logstash/util/charset"
4
+ require "csv"
5
+
6
+ class LogStash::Codecs::CSV < LogStash::Codecs::Base
7
+
8
+ config_name "csv"
9
+
10
+ # Define a list of column names (in the order they appear in the CSV,
11
+ # as if it were a header line). If `columns` is not configured, or there
12
+ # are not enough columns specified, the default column names are
13
+ # "column1", "column2", etc. In the case that there are more columns
14
+ # in the data than specified in this column list, extra columns will be auto-numbered:
15
+ # (e.g. "user_defined_1", "user_defined_2", "column3", "column4", etc.)
16
+ config :columns, :validate => :array, :default => []
17
+
18
+ # Define the column separator value. If this is not specified, the default
19
+ # is a comma `,`.
20
+ # Optional.
21
+ config :separator, :validate => :string, :default => ","
22
+
23
+ # Define the character used to quote CSV fields. If this is not specified
24
+ # the default is a double quote `"`.
25
+ # Optional.
26
+ config :quote_char, :validate => :string, :default => '"'
27
+
28
+ # Treats the first line received as the header information; this information will
29
+ # be used to compose the field names in the generated events. Note this information can
30
+ # be reset on demand, useful for example when dealing with new files in the file input
31
+ # or new request in the http_poller. Default => false
32
+ config :include_headers, :validate => :boolean, :default => false
33
+
34
+ # Define whether column names should be autogenerated or not.
35
+ # Defaults to true. If set to false, columns not having a header specified will not be parsed.
36
+ config :autogenerate_column_names, :validate => :boolean, :default => true
37
+
38
+ # Define whether empty columns should be skipped.
39
+ # Defaults to false. If set to true, columns containing no value will not get set.
40
+ config :skip_empty_columns, :validate => :boolean, :default => false
41
+
42
+ # Define a set of datatype conversions to be applied to columns.
43
+ # Possible conversions are integer, float, date, date_time, boolean
44
+ #
45
+ # # Example:
46
+ # [source,ruby]
47
+ # input {
48
+ # stdin {
49
+ # codec => csv { convert => { "column1" => "integer", "column2" => "boolean" } }
50
+ # }
51
+ # }
52
+ config :convert, :validate => :hash, :default => {}
53
+
54
+ ##
55
+ # List of valid conversion types used for the convert option
56
+ ##
57
+ VALID_CONVERT_TYPES = [ "integer", "float", "date", "date_time", "boolean" ].freeze
58
+
59
+
60
+ # The character encoding used in this codec. Examples include "UTF-8" and
61
+ # "CP1252".
62
+ config :charset, :validate => ::Encoding.name_list, :default => "UTF-8"
63
+
64
+ def register
65
+ @converter = LogStash::Util::Charset.new(@charset)
66
+ @converter.logger = @logger
67
+
68
+ # validate conversion types to be the valid ones.
69
+ @convert.each_pair do |column, type|
70
+ if !VALID_CONVERT_TYPES.include?(type)
71
+ raise LogStash::ConfigurationError, "#{type} is not a valid conversion type."
72
+ end
73
+ end
74
+
75
+ @headers = false
76
+ @options = { :col_sep => @separator, :quote_char => @quote_char }
77
+ end
78
+
79
+ def decode(data)
80
+ data = @converter.convert(data)
81
+ begin
82
+ values = CSV.parse_line(data, @options)
83
+ if @include_headers && !@headers
84
+ @headers = true
85
+ @options[:headers] = values
86
+ else
87
+ decoded = {}
88
+ values.each_with_index do |fields, index|
89
+ field_name, value = nil, nil
90
+ if !fields.is_a?(Array) && !(@skip_empty_columns && fields.nil?) # No headers
91
+ next if ignore_field?(index)
92
+ field_name = ( !@columns[index].nil? ? @columns[index] : "column#{(index+1)}")
93
+ value = fields
94
+ elsif fields.is_a?(Array) # Got headers
95
+ field_name = fields[0]
96
+ value = fields[1]
97
+ end
98
+ next unless field_name
99
+ decoded[field_name] = if should_transform?(field_name)
100
+ transform(field_name, value)
101
+ else
102
+ value
103
+ end
104
+ end
105
+ yield LogStash::Event.new(decoded) if block_given?
106
+ end
107
+ rescue CSV::MalformedCSVError => e
108
+ @logger.info("CSV parse failure. Falling back to plain-text", :error => e, :data => data)
109
+ yield LogStash::Event.new("message" => data, "tags" => ["_csvparsefailure"]) if block_given?
110
+ end
111
+ end
112
+
113
+ def encode(event)
114
+ csv_data = CSV.generate_line(event.to_hash.values, @options)
115
+ @on_event.call(event, csv_data)
116
+ end
117
+
118
+ def reset
119
+ @headers = false
120
+ @options.delete(:headers)
121
+ end
122
+
123
+ private
124
+
125
+ def ignore_field?(index)
126
+ !@columns[index] && !@autogenerate_column_names
127
+ end
128
+
129
+ def should_transform?(field_name)
130
+ !@convert[field_name].nil?
131
+ end
132
+
133
+ def transform(field_name, value)
134
+ transformation = @convert[field_name].to_sym
135
+ converters[transformation].call(value)
136
+ end
137
+
138
+ def converters
139
+ @converters ||= {
140
+ :integer => lambda do |value|
141
+ CSV::Converters[:integer].call(value)
142
+ end,
143
+ :float => lambda do |value|
144
+ CSV::Converters[:float].call(value)
145
+
146
+ end,
147
+ :date => lambda do |value|
148
+ CSV::Converters[:date].call(value)
149
+
150
+ end,
151
+ :date_time => lambda do |value|
152
+ CSV::Converters[:date_time].call(value)
153
+ end,
154
+ :boolean => lambda do |value|
155
+ normalized = value.to_s.strip.downcase
156
+ return false if normalized == "false"
157
+ return true if normalized == "true"
158
+ return value
159
+ end
160
+ }
161
+ end
162
+
163
+ end # class LogStash::Codecs::CSV
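The column naming rules documented in the option comments (`columns`, `autogenerate_column_names`, `skip_empty_columns`) and the plain-text fallback on malformed input can be tried without Logstash. The following is a rough sketch under stated assumptions: `name_columns` and its keyword arguments are hypothetical names invented here, and the function returns plain hashes where the codec yields `LogStash::Event` objects.

```ruby
require "csv"

# Hypothetical stand-alone function that follows the documented naming rules.
# It is not the codec itself and knows nothing about Logstash events.
def name_columns(line, columns: [], autogenerate: true, skip_empty: false)
  values = CSV.parse_line(line) || []   # parse_line returns nil for an empty line
  decoded = {}
  values.each_with_index do |value, index|
    # Optionally drop columns that carry no value.
    next if skip_empty && (value.nil? || value.empty?)

    # Prefer the user-defined name, fall back to "columnN" when allowed.
    name = columns[index]
    name ||= "column#{index + 1}" if autogenerate
    next if name.nil?

    decoded[name] = value
  end
  decoded
rescue CSV::MalformedCSVError
  # Same fallback shape as the codec: raw text plus a failure tag.
  { "message" => line, "tags" => ["_csvparsefailure"] }
end

named     = name_columns("big,bird,sesame street", columns: ["first", "last"])
strict    = name_columns("val1,val2,val3", columns: ["custom1", "custom2"], autogenerate: false)
compact   = name_columns("val1,,val3", skip_empty: true)
malformed = name_columns('big,"bird')   # unclosed quote
```

Here `named` mixes user-defined and auto-numbered names, `strict` drops the unnamed third column, `compact` omits the empty second column, and `malformed` carries the original line under `"message"`.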
@@ -0,0 +1,27 @@
1
+ Gem::Specification.new do |s|
2
+
3
+ s.name = 'logstash-codec-csv'
4
+ s.version = '0.1.2'
5
+ s.licenses = ['Apache License (2.0)']
6
+ s.summary = "The csv codec takes CSV data, parses it, and passes it along"
7
+ s.description = "This gem is a logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/plugin install gemname. This gem is not a stand-alone program"
8
+ s.authors = ["Elasticsearch"]
9
+ s.email = 'info@elasticsearch.com'
10
+ s.homepage = "http://www.elasticsearch.org/guide/en/logstash/current/index.html"
11
+ s.require_paths = ["lib"]
12
+
13
+ # Files
14
+ s.files = Dir['lib/**/*','spec/**/*','vendor/**/*','*.gemspec','*.md','CONTRIBUTORS','Gemfile','LICENSE','NOTICE.TXT']
15
+
16
+ # Tests
17
+ s.test_files = s.files.grep(%r{^(test|spec|features)/})
18
+
19
+ # Special flag to let us know this is actually a logstash plugin
20
+ s.metadata = { "logstash_plugin" => "true", "logstash_group" => "codec" }
21
+
22
+ # Gem dependencies
23
+ s.add_runtime_dependency "logstash-core-plugin-api", "~> 1.0"
24
+
25
+ s.add_development_dependency 'logstash-devutils'
26
+ end
27
+
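The `"~> 1.0"` constraint on `logstash-core-plugin-api` is what lets the plugin survive new Logstash releases without a mass update: it accepts any 1.x version of the API gem and rejects 2.0. A quick way to check what a constraint admits is RubyGems' own `Gem::Requirement`. The version numbers below are arbitrary samples, not actual releases of the API gem.

```ruby
require "rubygems"

# The same pessimistic constraint used in the gemspec.
requirement = Gem::Requirement.new("~> 1.0")

# Sample versions chosen only for illustration.
samples = %w[0.9 1.0 1.60 2.0]

results = samples.map do |version|
  [version, requirement.satisfied_by?(Gem::Version.new(version))]
end.to_h

results.each { |version, ok| puts "#{version}: #{ok ? 'accepted' : 'rejected'}" }
```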
@@ -0,0 +1,206 @@
1
+ # encoding: utf-8
2
+ require "logstash/codecs/csv"
3
+ require "logstash/event"
4
+
5
+ describe LogStash::Codecs::CSV do
6
+
7
+ subject(:codec) { LogStash::Codecs::CSV.new(config) }
8
+ let(:config) { Hash.new }
9
+
10
+ before(:each) do
11
+ codec.register
12
+ end
13
+
14
+ describe "decode" do
15
+
16
+ let(:data) { "big,bird,sesame street" }
17
+
18
+ it "return an event from CSV data" do
19
+ codec.decode(data) do |event|
20
+ expect(event["column1"]).to eq("big")
21
+ expect(event["column2"]).to eq("bird")
22
+ expect(event["column3"]).to eq("sesame street")
23
+ end
24
+ end
25
+
26
+ describe "given column names" do
27
+ let(:doc) { "big,bird,sesame street" }
28
+ let(:config) do
29
+ { "columns" => ["first", "last", "address" ] }
30
+ end
31
+
32
+ it "extract all the values" do
33
+ codec.decode(data) do |event|
34
+ expect(event["first"]).to eq("big")
35
+ expect(event["last"]).to eq("bird")
36
+ expect(event["address"]).to eq("sesame street")
37
+ end
38
+ end
39
+
40
+ context "parse csv skipping empty columns" do
41
+
42
+ let(:data) { "val1,,val3" }
43
+
44
+ let(:config) do
45
+ { "skip_empty_columns" => true,
46
+ "columns" => ["custom1", "custom2", "custom3"] }
47
+ end
48
+
49
+ it "extract all the values" do
50
+ codec.decode(data) do |event|
51
+ expect(event["custom1"]).to eq("val1")
52
+ expect(event.to_hash).not_to include("custom2")
53
+ expect(event["custom3"]).to eq("val3")
54
+ end
55
+ end
56
+ end
57
+
58
+ context "parse csv without autogeneration of names" do
59
+
60
+ let(:data) { "val1,val2,val3" }
61
+ let(:config) do
62
+ { "autogenerate_column_names" => false,
63
+ "columns" => ["custom1", "custom2"] }
64
+ end
65
+
66
+ it "extract all the values" do
67
+ codec.decode(data) do |event|
68
+ expect(event["custom1"]).to eq("val1")
69
+ expect(event["custom2"]).to eq("val2")
70
+ expect(event["column3"]).to be_falsey
71
+ end
72
+ end
73
+ end
74
+
75
+ end
76
+
77
+ describe "custom separator" do
78
+ let(:data) { "big,bird;sesame street" }
79
+
80
+ let(:config) do
81
+ { "separator" => ";" }
82
+ end
83
+
84
+ it "return an event from CSV data" do
85
+ codec.decode(data) do |event|
86
+ expect(event["column1"]).to eq("big,bird")
87
+ expect(event["column2"]).to eq("sesame street")
88
+ end
89
+ end
90
+ end
91
+
92
+ describe "quote char" do
93
+ let(:data) { "big,bird,'sesame street'" }
94
+
95
+ let(:config) do
96
+ { "quote_char" => "'"}
97
+ end
98
+
99
+ it "return an event from CSV data" do
100
+ codec.decode(data) do |event|
101
+ expect(event["column1"]).to eq("big")
102
+ expect(event["column2"]).to eq("bird")
103
+ expect(event["column3"]).to eq("sesame street")
104
+ end
105
+ end
106
+
107
+ context "using the default one" do
108
+ let(:data) { 'big,bird,"sesame, street"' }
109
+ let(:config) { Hash.new }
110
+
111
+ it "return an event from CSV data" do
112
+ codec.decode(data) do |event|
113
+ expect(event["column1"]).to eq("big")
114
+ expect(event["column2"]).to eq("bird")
115
+ expect(event["column3"]).to eq("sesame, street")
116
+ end
117
+ end
118
+ end
119
+
120
+ context "using a null" do
121
+ let(:data) { 'big,bird,"sesame" street' }
122
+ let(:config) do
123
+ { "quote_char" => "\x00" }
124
+ end
125
+
126
+ it "return an event from CSV data" do
127
+ codec.decode(data) do |event|
128
+ expect(event["column1"]).to eq("big")
129
+ expect(event["column2"]).to eq("bird")
130
+ expect(event["column3"]).to eq('"sesame" street')
131
+ end
132
+ end
133
+ end
134
+ end
135
+
136
+ describe "having headers" do
137
+
138
+ let(:data) do
139
+ [ "size,animal,movie", "big,bird,sesame street"]
140
+ end
141
+
142
+ let(:new_data) do
143
+ [ "host,country,city", "example.com,germany,berlin"]
144
+ end
145
+
146
+ let(:config) do
147
+ { "include_headers" => true }
148
+ end
149
+
150
+ it "include header information when requested" do
151
+ codec.decode(data[0]) # Read the headers
152
+ codec.decode(data[1]) do |event|
153
+ expect(event["size"]).to eq("big")
154
+ expect(event["animal"]).to eq("bird")
155
+ expect(event["movie"]).to eq("sesame street")
156
+ end
157
+ end
158
+
159
+ it "reset headers and fetch the new ones" do
160
+ data.each do |row|
161
+ codec.decode(row)
162
+ end
163
+ codec.reset
164
+ codec.decode(new_data[0]) # set the new headers
165
+ codec.decode(new_data[1]) do |event|
166
+ expect(event["host"]).to eq("example.com")
167
+ expect(event["country"]).to eq("germany")
168
+ expect(event["city"]).to eq("berlin")
169
+ end
170
+ end
171
+ end
172
+
173
+ describe "using field conversion" do
174
+
175
+ let(:config) do
176
+ { "convert" => { "column1" => "integer", "column3" => "boolean" } }
177
+ end
178
+ let(:data) { "1234,bird,false" }
179
+
180
+ it "get converted values to the expected type" do
181
+ codec.decode(data) do |event|
182
+ expect(event["column1"]).to eq(1234)
183
+ expect(event["column2"]).to eq("bird")
184
+ expect(event["column3"]).to eq(false)
185
+ end
186
+ end
187
+
188
+ context "when using column names" do
189
+
190
+ let(:config) do
191
+ { "convert" => { "custom1" => "integer", "custom3" => "boolean" },
192
+ "columns" => ["custom1", "custom2", "custom3"] }
193
+ end
194
+
195
+ it "get converted values to the expected type" do
196
+ codec.decode(data) do |event|
197
+ expect(event["custom1"]).to eq(1234)
198
+ expect(event["custom2"]).to eq("bird")
199
+ expect(event["custom3"]).to eq(false)
200
+ end
201
+ end
202
+ end
203
+ end
204
+
205
+ end
206
+ end
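The conversion specs above can be reproduced with plain hashes to see what `convert` does to each value. This sketch assumes a reduced converter table (integer, float, and boolean only) and uses hypothetical names `converters` and `convert_row`. The integer and float entries reuse `CSV::Converters` from the standard library, as the codec does, and the boolean entry is written here so that unrecognized or missing values pass through unchanged.

```ruby
require "csv"

# Hypothetical converter table keyed by the type names used in `convert`.
def converters
  {
    "integer" => CSV::Converters[:integer],
    "float"   => CSV::Converters[:float],
    "boolean" => lambda do |value|
      case value.to_s.strip.downcase
      when "true"  then true
      when "false" then false
      else value   # not a boolean literal: leave the value untouched
      end
    end
  }
end

# Applies the requested conversions to a Hash of column name => raw string.
def convert_row(row, convert)
  row.each_with_object({}) do |(name, value), out|
    type = convert[name]
    out[name] = type ? converters.fetch(type).call(value) : value
  end
end

row = { "column1" => "1234", "column2" => "bird", "column3" => "false" }
converted = convert_row(row, { "column1" => "integer", "column3" => "boolean" })
untouched = convert_row({ "flag" => "maybe" }, { "flag" => "boolean" })
```

`converters.fetch` raises `KeyError` on an unknown type name, which plays the role of the `register`-time validation against `VALID_CONVERT_TYPES` in the codec.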
metadata ADDED
@@ -0,0 +1,82 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: logstash-codec-csv
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.2
5
+ platform: ruby
6
+ authors:
7
+ - Elasticsearch
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2016-03-24 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ requirement: !ruby/object:Gem::Requirement
15
+ requirements:
16
+ - - "~>"
17
+ - !ruby/object:Gem::Version
18
+ version: '1.0'
19
+ name: logstash-core-plugin-api
20
+ prerelease: false
21
+ type: :runtime
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.0'
27
+ - !ruby/object:Gem::Dependency
28
+ requirement: !ruby/object:Gem::Requirement
29
+ requirements:
30
+ - - ">="
31
+ - !ruby/object:Gem::Version
32
+ version: '0'
33
+ name: logstash-devutils
34
+ prerelease: false
35
+ type: :development
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
41
+ description: This gem is a logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/plugin install gemname. This gem is not a stand-alone program
42
+ email: info@elasticsearch.com
43
+ executables: []
44
+ extensions: []
45
+ extra_rdoc_files: []
46
+ files:
47
+ - CHANGELOG.md
48
+ - CONTRIBUTORS
49
+ - Gemfile
50
+ - LICENSE
51
+ - README.md
52
+ - lib/logstash/codecs/csv.rb
53
+ - logstash-codec-csv.gemspec
54
+ - spec/codecs/csv_spec.rb
55
+ homepage: http://www.elasticsearch.org/guide/en/logstash/current/index.html
56
+ licenses:
57
+ - Apache License (2.0)
58
+ metadata:
59
+ logstash_plugin: 'true'
60
+ logstash_group: codec
61
+ post_install_message:
62
+ rdoc_options: []
63
+ require_paths:
64
+ - lib
65
+ required_ruby_version: !ruby/object:Gem::Requirement
66
+ requirements:
67
+ - - ">="
68
+ - !ruby/object:Gem::Version
69
+ version: '0'
70
+ required_rubygems_version: !ruby/object:Gem::Requirement
71
+ requirements:
72
+ - - ">="
73
+ - !ruby/object:Gem::Version
74
+ version: '0'
75
+ requirements: []
76
+ rubyforge_project:
77
+ rubygems_version: 2.4.8
78
+ signing_key:
79
+ specification_version: 4
80
+ summary: The csv codec takes CSV data, parses it, and passes it along
81
+ test_files:
82
+ - spec/codecs/csv_spec.rb