fluent-plugin-webhdfs 0.7.1 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: a090b029f8d2a4b177bb42ae8b496c35135c759d
- data.tar.gz: 679a0a2dead29bd353ff1b3dd34585ca82ca7a52
+ metadata.gz: 8865ee69f790536d7ce119ead16abc7862efdf59
+ data.tar.gz: df3f7cc64b42d465733ca51815b9e066b9d4310c
  SHA512:
- metadata.gz: e3304414c60a15aca45e1cb37ba5dfb367a68dc008f2e21e5c9fc52f44ee28ddd62d120cd000f91ea3ac599188120ce6d9ad3c997ad90c7425f28a75d0688fa7
- data.tar.gz: c4afeda42653ea5b65a9c14838cbffc248e9a2d9cd8c18721ba11a0dc83df5443d309cc72b015f20d5e6ced1b21cd0125832ebc0b73701f31d66430876367e4d
+ metadata.gz: 384e20026a6b64a91c3ca59058e6e4e33480dda26057165cf9464006bc7e0b6d61170d97878abfe5c8c29a23b263e7a6db770f4cfcf4170b5de57ee3f2337da6
+ data.tar.gz: 464ec8333c3c97002150cabbeed4f13077abfe473944f4a56fb764414f25e6e03e793942f11ca270695c22cc813866e63c802debfa6f967ac08f98e4637b9e5d
@@ -2,10 +2,9 @@ sudo: false
  language: ruby

  rvm:
- - 2.0.0
  - 2.1
  - 2.2
- - 2.3.0
+ - 2.3.1

  branches:
  only:
@@ -23,12 +22,4 @@ script: bundle exec rake test

  gemfile:
  - Gemfile
- - gemfiles/fluentd_v0.12.gemfile
  - gemfiles/fluentd_v0.14.gemfile
-
- matrix:
- exclude:
- - rvm: 2.0.0
- gemfile: Gemfile
- - rvm: 2.0.0
- gemfile: gemfiles/fluentd_v0.14.gemfile
data/Appraisals CHANGED
@@ -1,12 +1,10 @@
  appraise "fluentd v0.12" do
  gem "fluentd", "~>0.12.0"
  gem "snappy"
- gem "bzip2-ffi"
  end

  appraise "fluentd v0.14" do
  gem "fluentd", "~>0.14.0"
  gem "snappy"
- gem "bzip2-ffi"
  end

data/README.md CHANGED
@@ -2,16 +2,18 @@

  [Fluentd](http://fluentd.org/) output plugin to write data into Hadoop HDFS over WebHDFS/HttpFs.

- WebHDFSOutput slices data by time (specified unit), and store these data as hdfs file of plain text. You can specify to:
+ "webhdfs" output plugin formats data into plain text, and store it as files on HDFS. This plugin supports:

- * format whole data as serialized JSON, single attribute or separated multi attributes
- * or LTSV, labeled-TSV (see http://ltsv.org/ )
- * include time as line header, or not
- * include tag as line header, or not
- * change field separator (default: TAB)
- * add new line as termination, or not
+ * inject tag and time into record (and output plain text data) using `<inject>` section
+ * format events into plain text by format plugins using `<format>` section
+ * control flushing using `<buffer>` section

- And you can specify output file path as 'path /path/to/dir/access.%Y%m%d.log', then got '/path/to/dir/access.20120316.log' on HDFS.
+ Paths on HDFS can be generated from event timestamp, tag or any other fields in records.
+
+ ### Older versions
+
+ The versions of `0.x.x` of this plugin are for older version of Fluentd (v0.12.x). Old style configuration parameters (using `output_data_type`, `output_include_*` or others) are still supported, but are deprecated.
+ Users should use `<format>` section to control how to format events into plain text.

  ## Configuration

@@ -26,15 +28,16 @@ To store data by time,tag,json (same with '@type file') over WebHDFS:
  path /path/on/hdfs/access.log.%Y%m%d_%H.log
  </match>

- If you want JSON object only (without time or tag or both on header of lines), specify it by `output_include_time` or `output_include_tag` (default true):
+ If you want JSON object only (without time or tag or both on header of lines), use `<format>` section to specify `json` formatter:

  <match access.**>
  @type webhdfs
  host namenode.your.cluster.local
  port 50070
  path /path/on/hdfs/access.log.%Y%m%d_%H.log
- output_include_time false
- output_include_tag false
+ <format>
+ @type json
+ </format>
  </match>

  To specify namenode, `namenode` is also available:
@@ -45,14 +48,47 @@ To specify namenode, `namenode` is also available:
  path /path/on/hdfs/access.log.%Y%m%d_%H.log
  </match>

- To store data as LTSV without time and tag over WebHDFS:
+ To store data as JSON, including time and tag (using `<inject>`), over WebHDFS:

  <match access.**>
  @type webhdfs
  host namenode.your.cluster.local
  port 50070
  path /path/on/hdfs/access.log.%Y%m%d_%H.log
- output_data_type ltsv
+ <buffer>
+ timekey_zone -0700 # to specify timezone used for "path" time placeholder formatting
+ </buffer>
+ <inject>
+ tag_key tag
+ time_key time
+ time_type string
+ timezone -0700
+ </inject>
+ <format>
+ @type json
+ </format>
+ </match>
+
+ To store data as JSON, including time as unix time, using path including tag as directory:
+
+ <match access.**>
+ @type webhdfs
+ host namenode.your.cluster.local
+ port 50070
+ path /path/on/hdfs/${tag}/access.log.%Y%m%d_%H.log
+ <buffer time,tag>
+ @type file # using file buffer
+ path /var/log/fluentd/buffer # buffer directory path
+ timekey 3h # create a file per 3h
+ timekey_use_utc true # time in path are formatted in UTC (default false means localtime)
+ </buffer>
+ <inject>
+ time_key time
+ time_type unixtime
+ </inject>
+ <format>
+ @type json
+ </format>
  </match>

  With username of pseudo authentication:
@@ -75,24 +111,6 @@ Store data over HttpFs (instead of WebHDFS):
  httpfs true
  </match>

- Store data as TSV (TAB separated values) of specified keys, without time, with tag (removed prefix 'access'):
-
- <match access.**>
- @type webhdfs
- host namenode.your.cluster.local
- port 50070
- path /path/on/hdfs/access.log.%Y%m%d_%H.log
-
- field_separator TAB # or 'SPACE', 'COMMA' or 'SOH'(Start Of Heading: \001)
- output_include_time false
- output_include_tag true
- remove_prefix access
-
- output_data_type attr:path,status,referer,agent,bytes
- </match>
-
- If message doesn't have specified attribute, fluent-plugin-webhdfs outputs 'NULL' instead of values.
-
  With ssl:

  <match access.**>
@@ -118,11 +136,8 @@ With kerberos authentication:
  port 50070
  path /path/on/hdfs/access.log.%Y%m%d_%H.log
  kerberos true
- kerberos_keytab /path/to/keytab # if needed
  </match>

- NOTE: You need to install `gssapi` gem for kerberos. See https://github.com/kzk/webhdfs#for-kerberos-authentication
-
  If you want to compress data before storing it:

  <match access.**>
@@ -134,10 +149,7 @@ If you want to compress data before storing it:
  </match>

  Note that if you set `compress gzip`, then the suffix `.gz` will be added to path (or `.bz2`, `sz`, `.lzo`).
- Note that you have to install additional gem for several compress algorithms:
-
- - snappy: install snappy gem
- - bzip2: install bzip2-ffi gem
+ Note that you have to install snappy gem if you want to set `compress snappy`.

  ### Namenode HA / Auto retry for WebHDFS known errors

@@ -164,7 +176,7 @@ And you can also specify to retry known hdfs errors (such like `LeaseExpiredExce
  ### Performance notifications

  Writing data on HDFS single file from 2 or more fluentd nodes, makes many bad blocks of HDFS. If you want to run 2 or more fluentd nodes with fluent-plugin-webhdfs, you should configure 'path' for each node.
- You can use '${hostname}' or '${uuid:random}' placeholders in configuration for this purpose.
+ To include hostname, `#{Socket.gethostname}` is available in Fluentd configuration string literals by ruby expression (in `"..."` strings). This plugin also supports `${uuid}` placeholder to include random uuid in paths.

  For hostname:

@@ -172,7 +184,7 @@ For hostname:
  @type webhdfs
  host namenode.your.cluster.local
  port 50070
- path /log/access/%Y%m%d/${hostname}.log
+ path "/log/access/%Y%m%d/#{Socket.gethostname}.log" # double quotes needed to expand ruby expression in string
  </match>

  Or with random filename (to avoid duplicated file name only):
@@ -181,10 +193,10 @@ Or with random filename (to avoid duplicated file name only):
  @type webhdfs
  host namenode.your.cluster.local
  port 50070
- path /log/access/%Y%m%d/${uuid:random}.log
+ path /log/access/%Y%m%d/${uuid}.log
  </match>

- With configurations above, you can handle all of files of '/log/access/20120820/*' as specified timeslice access logs.
+ With configurations above, you can handle all of files of `/log/access/20120820/*` as specified timeslice access logs.

  For high load cluster nodes, you can specify timeouts for HTTP requests.

@@ -220,15 +232,13 @@ With unstable datanodes that frequently downs, appending over WebHDFS may produc
  port 50070

  append no
- path /log/access/%Y%m%d/${hostname}.${chunk_id}.log
+ path "/log/access/%Y%m%d/#{Socket.gethostname}.${chunk_id}.log"
  </match>

  `out_webhdfs` creates new files on hdfs per flush of fluentd, with chunk id. You shouldn't care broken files from append operations.

  ## TODO

- * configuration example for Hadoop Namenode HA
- * here, or docs.fluentd.org ?
  * patches welcome!

  ## Copyright
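
For reference, the old-style TSV example removed from the README above can be expressed with the new `<format>`/`<inject>` sections. The sketch below is only an illustration, not text from the gem: it is pieced together from the compat conversion code later in this diff (which maps `output_data_type attr:...` onto the `tsv` formatter with `keys` and `delimiter`). Note that `<inject>` adds the tag as a record field rather than a line prefix, and `remove_prefix` is deprecated in favor of `@label` routing.

    <match access.**>
      @type webhdfs
      host namenode.your.cluster.local
      port 50070
      path /path/on/hdfs/access.log.%Y%m%d_%H.log
      <inject>
        tag_key tag            # replaces output_include_tag true; tag becomes a record field
      </inject>
      <format>
        @type tsv              # replaces output_data_type attr:...
        keys path,status,referer,agent,bytes
        # delimiter defaults to TAB, matching the old field_separator TAB
      </format>
    </match>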
@@ -2,7 +2,7 @@

  Gem::Specification.new do |gem|
  gem.name = "fluent-plugin-webhdfs"
- gem.version = "0.7.1"
+ gem.version = "1.0.0"
  gem.authors = ["TAGOMORI Satoshi"]
  gem.email = ["tagomoris@gmail.com"]
  gem.summary = %q{Fluentd plugin to write data on HDFS over WebHDFS, with flexible formatting}
@@ -17,10 +17,10 @@ Gem::Specification.new do |gem|

  gem.add_development_dependency "rake"
  gem.add_development_dependency "test-unit"
+ gem.add_development_dependency "test-unit-rr"
  gem.add_development_dependency "appraisal"
  gem.add_development_dependency "snappy", '>= 0.0.13'
- gem.add_development_dependency "bzip2-ffi"
- gem.add_runtime_dependency "fluentd", ['>= 0.10.59', "< 0.14.0"]
- gem.add_runtime_dependency "fluent-mixin-plaintextformatter", '>= 0.2.1'
+ gem.add_runtime_dependency "fluentd", '>= 0.14.4'
  gem.add_runtime_dependency "webhdfs", '>= 0.6.0'
+ gem.add_runtime_dependency "bzip2-ffi"
  end
@@ -1,131 +1,138 @@
  # -*- coding: utf-8 -*-

+ require 'fluent/plugin/output'
+ require 'fluent/config/element'
+
+ require 'webhdfs'
  require 'tempfile'
  require 'securerandom'
- require 'fluent/mixin/plaintextformatter'

- class Fluent::WebHDFSOutput < Fluent::TimeSlicedOutput
+ class Fluent::Plugin::WebHDFSOutput < Fluent::Plugin::Output
  Fluent::Plugin.register_output('webhdfs', self)

- config_set_default :buffer_type, 'memory'
- config_set_default :time_slice_format, '%Y%m%d'
-
- # For fluentd v0.12.16 or earlier
- class << self
- unless method_defined?(:desc)
- def desc(description)
- end
- end
- end
+ helpers :inject, :formatter, :compat_parameters

  desc 'WebHDFS/HttpFs host'
- config_param :host, :string, :default => nil
+ config_param :host, :string, default: nil
  desc 'WebHDFS/HttpFs port'
- config_param :port, :integer, :default => 50070
+ config_param :port, :integer, default: 50070
  desc 'Namenode (host:port)'
- config_param :namenode, :string, :default => nil # host:port
+ config_param :namenode, :string, default: nil # host:port
  desc 'Standby namenode for Namenode HA (host:port)'
- config_param :standby_namenode, :string, :default => nil # host:port
+ config_param :standby_namenode, :string, default: nil # host:port

  desc 'Ignore errors on start up'
- config_param :ignore_start_check_error, :bool, :default => false
+ config_param :ignore_start_check_error, :bool, default: false

  desc 'Output file path on HDFS'
  config_param :path, :string
  desc 'User name for pseudo authentication'
- config_param :username, :string, :default => nil
+ config_param :username, :string, default: nil

  desc 'Store data over HttpFs instead of WebHDFS'
- config_param :httpfs, :bool, :default => false
+ config_param :httpfs, :bool, default: false

  desc 'Number of seconds to wait for the connection to open'
- config_param :open_timeout, :integer, :default => 30 # from ruby net/http default
+ config_param :open_timeout, :integer, default: 30 # from ruby net/http default
  desc 'Number of seconds to wait for one block to be read'
- config_param :read_timeout, :integer, :default => 60 # from ruby net/http default
+ config_param :read_timeout, :integer, default: 60 # from ruby net/http default

  desc 'Retry automatically when known errors of HDFS are occurred'
- config_param :retry_known_errors, :bool, :default => false
+ config_param :retry_known_errors, :bool, default: false
  desc 'Retry interval'
- config_param :retry_interval, :integer, :default => nil
+ config_param :retry_interval, :integer, default: nil
  desc 'The number of retries'
- config_param :retry_times, :integer, :default => nil
+ config_param :retry_times, :integer, default: nil

  # how many times of write failure before switch to standby namenode
  # by default it's 11 times that costs 1023 seconds inside fluentd,
  # which is considered enough to exclude the scenes that caused by temporary network fail or single datanode fail
  desc 'How many times of write failure before switch to standby namenode'
- config_param :failures_before_use_standby, :integer, :default => 11
-
- include Fluent::Mixin::PlainTextFormatter
+ config_param :failures_before_use_standby, :integer, default: 11

- config_param :default_tag, :string, :default => 'tag_missing'
+ config_param :end_with_newline, :bool, default: true

  desc 'Append data or not'
- config_param :append, :bool, :default => true
+ config_param :append, :bool, default: true

  desc 'Use SSL or not'
- config_param :ssl, :bool, :default => false
+ config_param :ssl, :bool, default: false
  desc 'OpenSSL certificate authority file'
- config_param :ssl_ca_file, :string, :default => nil
+ config_param :ssl_ca_file, :string, default: nil
  desc 'OpenSSL verify mode (none,peer)'
- config_param :ssl_verify_mode, :default => nil do |val|
- case val
- when 'none'
- :none
- when 'peer'
- :peer
- else
- raise Fluent::ConfigError, "unexpected parameter on ssl_verify_mode: #{val}"
- end
- end
+ config_param :ssl_verify_mode, :enum, list: [:none, :peer], default: :none

  desc 'Use kerberos authentication or not'
- config_param :kerberos, :bool, :default => false
- desc 'kerberos keytab file'
- config_param :kerberos_keytab, :string, :default => nil
+ config_param :kerberos, :bool, default: false

- SUPPORTED_COMPRESS = ['gzip', 'bzip2', 'snappy', 'lzo_command', 'text']
+ SUPPORTED_COMPRESS = [:gzip, :bzip2, :snappy, :lzo_command, :text]
  desc "Compress method (#{SUPPORTED_COMPRESS.join(',')})"
- config_param :compress, :default => nil do |val|
- unless SUPPORTED_COMPRESS.include? val
- raise Fluent::ConfigError, "unsupported compress: #{val}"
- end
- val
- end
+ config_param :compress, :enum, list: SUPPORTED_COMPRESS, default: :text
+
+ config_param :remove_prefix, :string, default: nil, deprecated: "use @label for routing"
+ config_param :default_tag, :string, default: nil, deprecated: "use @label for routing"
+ config_param :null_value, :string, default: nil, deprecated: "use filter plugins to convert null values into any specified string"
+ config_param :suppress_log_broken_string, :bool, default: false, deprecated: "use @log_level for plugin to suppress such info logs"

  CHUNK_ID_PLACE_HOLDER = '${chunk_id}'

- attr_reader :compressor
+ config_section :buffer do
+ config_set_default :chunk_keys, ["time"]
+ end
+
+ config_section :format do
+ config_set_default :@type, 'out_file'
+ config_set_default :localtime, false # default timezone is UTC
+ end
+
+ attr_reader :formatter, :compressor

  def initialize
  super
- require 'net/http'
- require 'time'
- require 'webhdfs'
-
  @compressor = nil
- end
-
- # Define `log` method for v0.10.42 or earlier
- unless method_defined?(:log)
- define_method("log") { $log }
+ @standby_namenode_host = nil
+ @output_include_tag = @output_include_time = nil # TODO: deprecated
+ @header_separator = @field_separator = nil # TODO: deprecated
  end

  def configure(conf)
- if conf['path']
- if conf['path'].index('%S')
- conf['time_slice_format'] = '%Y%m%d%H%M%S'
- elsif conf['path'].index('%M')
- conf['time_slice_format'] = '%Y%m%d%H%M'
- elsif conf['path'].index('%H')
- conf['time_slice_format'] = '%Y%m%d%H'
- end
+ compat_parameters_convert(conf, :buffer, default_chunk_key: "time")
+
+ timekey = case conf["path"]
+ when /%S/ then 1
+ when /%M/ then 60
+ when /%H/ then 3600
+ else 86400
+ end
+ if conf.elements(name: "buffer").empty?
+ e = Fluent::Config::Element.new("buffer", "time", {}, [])
+ conf.elements << e
  end
+ buffer_config = conf.elements(name: "buffer").first
+ buffer_config["timekey"] = timekey unless buffer_config["timekey"]

- verify_config_placeholders_in_path!(conf)
+ compat_parameters_convert_plaintextformatter(conf)

  super

+ @formatter = formatter_create
+
+ if @using_formatter_config
+ @null_value = nil
+ else
+ @formatter.delimiter = "\x01" if @formatter.respond_to?(:delimiter) && @formatter.delimiter == 'SOH'
+ @null_value ||= 'NULL'
+ end
+
+ if @default_tag.nil? && !@using_formatter_config && @output_include_tag
+ @default_tag = "tag_missing"
+ end
+ if @remove_prefix
+ @remove_prefix_actual = @remove_prefix + "."
+ @remove_prefix_actual_length = @remove_prefix_actual.length
+ end
+
+ verify_config_placeholders_in_path!(conf)
  @replace_random_uuid = @path.include?('%{uuid}') || @path.include?('%{uuid_flush}')
  if @replace_random_uuid
  # to check SecureRandom.uuid is available or not (NotImplementedError raised in such environment)
@@ -136,14 +143,7 @@ class Fluent::WebHDFSOutput < Fluent::TimeSlicedOutput
  end
  end

- begin
- @compressor = COMPRESSOR_REGISTRY.lookup(@compress || 'text').new
- rescue Fluent::ConfigError
- raise
- rescue
- $log.warn "#{@comress} not found. Use 'text' instead"
- @compressor = COMPRESSOR_REGISTRY.lookup('text').new
- end
+ @compressor = COMPRESSOR_REGISTRY.lookup(@compress.to_s).new

  if @host
  @namenode_host = @host
@@ -178,7 +178,7 @@ class Fluent::WebHDFSOutput < Fluent::TimeSlicedOutput
  @client_standby = nil
  end

- if not @append
+ unless @append
  if @path.index(CHUNK_ID_PLACE_HOLDER).nil?
  raise Fluent::ConfigError, "path must contain ${chunk_id}, which is the placeholder for chunk_id, when append is set to false."
  end
@@ -204,7 +204,6 @@ class Fluent::WebHDFSOutput < Fluent::TimeSlicedOutput
  end
  if @kerberos
  client.kerberos = true
- client.kerberos_keytab = @kerberos_keytab if @kerberos_keytab
  end

  client
@@ -242,14 +241,6 @@ class Fluent::WebHDFSOutput < Fluent::TimeSlicedOutput
  end
  end

- def shutdown
- super
- end
-
- def path_format(chunk_key)
- Time.strptime(chunk_key, @time_slice_format).strftime(@path)
- end
-
  def is_standby_exception(e)
  e.is_a?(WebHDFS::IOError) && e.message.match(/org\.apache\.hadoop\.ipc\.StandbyException/)
  end
@@ -261,12 +252,6 @@ class Fluent::WebHDFSOutput < Fluent::TimeSlicedOutput
  end
  end

- def chunk_unique_id_to_str(unique_id)
- unique_id.unpack('C*').map{|x| x.to_s(16).rjust(2,'0')}.join('')
- end
-
- # TODO check conflictions
-
  def send_data(path, data)
  if @append
  begin
@@ -281,7 +266,7 @@ class Fluent::WebHDFSOutput < Fluent::TimeSlicedOutput

  HOSTNAME_PLACEHOLDERS_DEPRECATED = ['${hostname}', '%{hostname}', '__HOSTNAME__']
  UUID_RANDOM_PLACEHOLDERS_DEPRECATED = ['${uuid}', '${uuid:random}', '__UUID__', '__UUID_RANDOM__']
- UUID_OTHER_PLACEHOLDERS_OBSOLETED = ['${uuid:hostname}', '%{uuid:hostname}', '__UUID_HOSTNAME__', '${uuid:timestamp}', '%{uuid:timestamp}', '__UUID_TIMESTAMP__']
+ UUID_OTHER_PLACEHOLDERS_OBSOLETED = ['${uuid:hostname}', '%{uuid:hostname}', '__UUID_HOSTNAME__', '${uuid:timestamp}', '%{uuid:timestamp}', '__UUID_TIMESTAMP__']

  def verify_config_placeholders_in_path!(conf)
  return unless conf.has_key?('path')
@@ -310,20 +295,20 @@ class Fluent::WebHDFSOutput < Fluent::TimeSlicedOutput
  log.error "configuration placeholder #{ph} is now unsupported by webhdfs output plugin."
  end
  end
- raise Fluent::ConfigError, "there are unsupported placeholders in path."
+ raise ConfigError, "there are unsupported placeholders in path."
  end
  end

  def generate_path(chunk)
  hdfs_path = if @append
- path_format(chunk.key)
+ extract_placeholders(@path, chunk.metadata)
  else
- path_format(chunk.key).gsub(CHUNK_ID_PLACE_HOLDER, chunk_unique_id_to_str(chunk.unique_id))
+ extract_placeholders(@path, chunk.metadata).gsub(CHUNK_ID_PLACE_HOLDER, dump_unique_id(chunk.unique_id))
  end
  hdfs_path = "#{hdfs_path}#{@compressor.ext}"
  if @replace_random_uuid
  uuid_random = SecureRandom.uuid
- hdfs_path = hdfs_path.gsub('%{uuid}', uuid_random).gsub('%{uuid_flush}', uuid_random)
+ hdfs_path.gsub!('%{uuid}', uuid_random).gsub!('%{uuid_flush}', uuid_random)
  end
  hdfs_path
  end
@@ -339,6 +324,48 @@ class Fluent::WebHDFSOutput < Fluent::TimeSlicedOutput
  end
  end

+ def format(tag, time, record)
+ if @remove_prefix # TODO: remove when it's obsoleted
+ if tag.start_with?(@remove_prefix_actual)
+ if tag.length > @remove_prefix_actual_length
+ tag = tag[@remove_prefix_actual_length..-1]
+ else
+ tag = @default_tag
+ end
+ elsif tag.start_with?(@remove_prefix)
+ if tag == @remove_prefix
+ tag = @default_tag
+ else
+ tag = tag.sub(@remove_prefix, '')
+ end
+ end
+ end
+
+ if @null_value # TODO: remove when it's obsoleted
+ check_keys = (record.keys + @null_convert_keys).uniq
+ check_keys.each do |key|
+ record[key] = @null_value if record[key].nil?
+ end
+ end
+
+ if @using_formatter_config
+ record = inject_values_to_record(tag, time, record)
+ line = @formatter.format(tag, time, record)
+ else # TODO: remove when it's obsoleted
+ time_str = @output_include_time ? @time_formatter.call(time) + @header_separator : ''
+ tag_str = @output_include_tag ? tag + @header_separator : ''
+ record_str = @formatter.format(tag, time, record)
+ line = time_str + tag_str + record_str
+ end
+ line << "\n" if @end_with_newline && !line.end_with?("\n")
+ line
+ rescue => e # remove this clause when @suppress_log_broken_string is obsoleted
+ unless @suppress_log_broken_string
+ log.info "unexpected error while formatting events, ignored", tag: tag, record: record, error: e
+ end
+ ''
+ end
+
  def write(chunk)
  hdfs_path = generate_path(chunk)

@@ -369,6 +396,72 @@ class Fluent::WebHDFSOutput < Fluent::TimeSlicedOutput
  hdfs_path
  end

+ def compat_parameters_convert_plaintextformatter(conf)
+ if !conf.elements('format').empty? || !conf['output_data_type']
+ @using_formatter_config = true
+ @null_convert_keys = []
+ return
+ end
+
+ log.warn "webhdfs output plugin is working with old configuration parameters. use <inject>/<format> sections instead for further releases."
+ @using_formatter_config = false
+ @null_convert_keys = []
+
+ @header_separator = case conf['field_separator']
+ when nil then "\t"
+ when 'SPACE' then ' '
+ when 'TAB' then "\t"
+ when 'COMMA' then ','
+ when 'SOH' then "\x01"
+ else conf['field_separator']
+ end
+
+ format_section = Fluent::Config::Element.new('format', '', {}, [])
+ case conf['output_data_type']
+ when '', 'json' # blank value is for compatibility reason (especially in testing)
+ format_section['@type'] = 'json'
+ when 'ltsv'
+ format_section['@type'] = 'ltsv'
+ else
+ unless conf['output_data_type'].start_with?('attr:')
+ raise Fluent::ConfigError, "output_data_type is invalid: #{conf['output_data_type']}"
+ end
+ format_section['@format'] = 'tsv'
+ keys_part = conf['output_data_type'].sub(/^attr:/, '')
+ @null_convert_keys = keys_part.split(',')
+ format_section['keys'] = keys_part
+ format_section['delimiter'] = case conf['field_separator']
+ when nil then '\t'
+ when 'SPACE' then ' '
+ when 'TAB' then '\t'
+ when 'COMMA' then ','
+ when 'SOH' then 'SOH' # fixed later
+ else conf['field_separator']
+ end
+ end
+
+ conf.elements << format_section
+
+ @output_include_time = conf.has_key?('output_include_time') ? Fluent::Config.bool_value(conf['output_include_time']) : true
+ @output_include_tag = conf.has_key?('output_include_tag') ? Fluent::Config.bool_value(conf['output_include_tag']) : true
+
+ if @output_include_time
+ # default timezone is UTC
+ using_localtime = if !conf.has_key?('utc') && !conf.has_key?('localtime')
+ false
+ elsif conf.has_key?('localtime') && conf.has_key?('utc')
+ raise Fluent::ConfigError, "specify either 'localtime' or 'utc'"
+ elsif conf.has_key?('localtime')
+ Fluent::Config.bool_value('localtime')
+ else
+ Fluent::Config.bool_value('utc')
+ end
+ @time_formatter = Fluent::TimeFormatter.new(conf['time_format'], using_localtime)
+ else
+ @time_formatter = nil
+ end
+ end
+
  class Compressor
  include Fluent::Configurable

@@ -395,7 +488,7 @@ class Fluent::WebHDFSOutput < Fluent::TimeSlicedOutput
  begin
  Open3.capture3("#{command} -V")
  rescue Errno::ENOENT
- raise Fluent::ConfigError, "'#{command}' utility must be in PATH for #{algo} compression"
+ raise ConfigError, "'#{command}' utility must be in PATH for #{algo} compression"
  end
  end
  end
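
A closing note on the buffering change in `configure` above: when no `<buffer>` section is given, the new code adds one chunked by time and infers `timekey` from the strftime placeholders in `path` (`%S` gives 1, `%M` gives 60, `%H` gives 3600, otherwise 86400). The sketch below shows roughly what that inference amounts to for an hourly path; it is an illustration assembled from the README examples and the code above, not configuration shipped with the gem.

    <match access.**>
      @type webhdfs
      host namenode.your.cluster.local
      port 50070
      path /path/on/hdfs/access.log.%Y%m%d_%H.log
      # implied when <buffer> is omitted (a sketch):
      <buffer time>
        timekey 3600   # inferred from %H in the path
      </buffer>
    </match>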