RubyGems - template_parser - Versions diffs - 1.0.0 - Mend

template_parser 1.0.0


Files changed (8)

data/.gitignore +5 -0
data/Gemfile +4 -0
data/README.md +79 -0
data/Rakefile +2 -0
data/lib/template_parser.rb +248 -0
data/lib/template_parser/version.rb +3 -0
data/template_parser.gemspec +21 -0
metadata +63 -0

data/.gitignore ADDED

@@ -0,0 +1,5 @@
+*.gem
+.bundle
+Gemfile.lock
+pkg/*
+config/deploy.rb

data/Gemfile ADDED

@@ -0,0 +1,4 @@
+source "http://rubygems.org"
+# Specify your gem's dependencies in template_parser.gemspec
+gemspec

data/README.md ADDED

@@ -0,0 +1,79 @@
+# Template Parser
+Parse ASCII files by example
+## Basic usage
+First, create a template. In this example I'm parsing an ASCII purchase
+order of some sort.
+  template = TemplateParser.compile_template([
+    "?<:>REPORT NUMBER: ?                                                                                                                 ",
+    " INVENTORY NO:  #number                REF NUMBER:  #refer_number                  LOGICAL UNIT: #unit_number    ?                   ",
+    "                    CATALOG CODE:    :cat_code   :sub_code                         VENDOR NO:    :vendor                             ",
+    "                    TYPE CODE:       :lpr                FLAG: ?  BILL FLAG: <:bill_flag>#_    SHIPPING FROM: ?            SHIPPING TO: ?        ",
+    "                    FROM-LOCATION:  :address_code                  TO LOCATION: :to_address_code                                     ",
+    "                                    :address_name                               ?                                                    ",
+    "                                    :address_1                                  ?                                                    ",
+    "                                    :address_2                                  ?                                                    ",
+    "                                    :address_city         :address_postal       ?                                                    ",
+    "                    USER NAME: :user_name                                            SERIAL NUMBER: :serial                          "
+    ].join("\n"))
+Let's break down the template a little bit.
+The first thing to note is that it's not hard to imagine what the data
+we're parsing will actually look like. That's because the field
+definitions go in the physical locations where the data should be, while
+the rest of the report appears unchanged. While parsing a report, if a
+single character is out of place, parsing will fail with a detailed
+error message. I've found that by failing fast I've been able to find
+the edge cases easily and get perfect parsing results on literally
+gigabytes of generated reports.
+There are a few different types of fields visible here as well. Some
+start with #, indicating a numeric field. Most start with : and look
+like symbols, indicating a text field. There are also some ?'s
+indicating that something may appear there but we'll ignore it. Finally
+there are some zero-width field names which look like <:name>:_ which
+can appear where the field name would otherwise not fit. A variation of
+that is where <:> can be used by itself as a 0-width delimiter to
+prevent a field from being too long.
+Side note: typically I would use [ruby here doc](http://blog.jayfields.com/2006/12/ruby-multiline-strings-here-doc-or.html)
+string notation but here I am concatenating an array of strings to help
+demonstrate that whitespace is significant. That way I can just copy in
+an example of the report I'm interested in and carve out my field
+definitions directly.
+An array of line matchers is returned from the compile_template method.
+### Using the array of line matchers (template)
+Does the given line have a match in any of the lines in the template?
+  TemplateParser.match_template?(template, line)
+Get the results of matching any line in the template to the given line
+  TemplateParser.match_template(template, line, file_position_metadata) { |matcher, converted_data, raw_data| }
+Process all given lines against the template in order
+  TemplateParser.process_lines(template, lines, file_position_metadata)
+Return true if process_lines would run successfully on the given lines
+for the given template.
+  TemplateParser.lines_match_template?(template, lines)
+### Using individual line matchers
+Does the given line match the given line matcher?
+  TemplateParser.match_line?(line_matcher, line)
+Process a given line against a given line matcher
+  TemplateParser.process_line(line_matcher, line, file_position_metadata) { |matcher, converted_data, raw_data| }
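
To make the field syntax described in the README concrete, here is a minimal sketch that splits a template line into its pieces and labels each one. It reuses the split pattern that appears in `compile_template_line` in lib/template_parser.rb, but the constant `FIELD_PATTERN` and the helper `classify_template_parts` are hypothetical names introduced for illustration and are not part of the gem.

```ruby
# Hypothetical sketch, not part of the gem. The pattern is the one used by
# compile_template_line: it captures "#name", ":name", "?" (each with trailing
# padding) and zero-width names such as "<:name>" or the bare "<:>".
FIELD_PATTERN = /([#:]\w+\s*\]?|\?\s*\]?|<[#:]\w*>)/

def classify_template_parts(line)
  line.split(FIELD_PATTERN).reject(&:empty?).map do |part|
    kind =
      if part =~ /\A<[#:]\w*>\z/ then :zero_width_name
      elsif part.start_with?(':') then :text
      elsif part.start_with?('#') then :numeric
      elsif part.start_with?('?') then :ignored
      else :literal
      end
    # The width of a field is the marker plus its trailing whitespace.
    [kind, part, part.length]
  end
end

parts = classify_template_parts(" INVENTORY NO:  #number     FLAG: ?  <:bill_flag>#_ ")
parts.each do |kind, text, width|
  puts format('%-16s %-22s width=%d', kind, text.inspect, width)
end
```

Note that a colon followed by a space, as in `NO: `, stays part of the literal text, because a field marker needs at least one word character directly after the `:` or `#`.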

data/Rakefile ADDED

@@ -0,0 +1,2 @@
+require 'bundler'
+Bundler::GemHelper.install_tasks

data/lib/template_parser.rb ADDED

@@ -0,0 +1,248 @@
+module TemplateParser
+  class ProcessingError < StandardError
+    attr_reader :matcher, :line, :pos, :meta
+    def initialize(matcher, line, pos, meta, message)
+      @matcher, @line, @pos, @meta = matcher, line, pos, meta
+      super(message)
+    end
+  end
+  class ProcessingErrors < StandardError
+    attr_reader :errors
+    def initialize(message, errors)
+      @errors = errors
+      super(message + "\n\n" + errors.map { |e| e.message }.compact.join("\n-----------------------------------------------------------\n"))
+    end
+  end
+  module Parser
+    # Create an array of line matchers based on a template.
+    def compile_template(template)
+      template_lines = template.to_enum(:each_line).map { |line| line.chomp }
+      template_lines.map do |line|
+        compile_template_line(line)
+      end
+    end
+    # Does the given line match the given line matcher?
+    def match_line?(matchers, line)
+      matchers.first[:regex] =~ line
+    end
+    # Does the given line have a match in any of the lines in the template?
+    def match_template?(template, line)
+      template.detect { |matcher| match_line?(matcher, line) }
+    end
+    # Return true if process_lines would run successfully on the given
+    # lines for the given template.
+    def lines_match_template?(template, lines)
+      template.zip(lines).all? do |matchers, line|
+        match_line?(matchers, line)
+      end
+    end
+    # Get the results of matching any line in the template to the given line
+    def match_template(template, line, meta = {})
+      matcher = match_template? template, line
+      if matcher
+        if block_given?
+          process_line(matcher, line, meta) { |*x| yield *x }
+        else
+          process_line(matcher, line, meta)
+        end
+      else
+        errors = template.map do |matcher|
+          begin
+            process_line(matcher, line, meta) { |*x| }
+          rescue ProcessingError => e
+            e
+          end
+        end
+        raise ProcessingErrors.new("At least one of the following #{ template.length } template lines should match the given line", errors)
+      end
+    end
+    # Process all given lines against the template in order
+    def process_lines(line_matchers, lines, meta = {})
+      record = {}
+      line_matchers.zip(lines) do |matchers, line|
+        process_line(matchers, line, meta) do |matcher, data, raw|
+          record[matcher[:symbol]] = data if data != ''
+        end
+      end
+      record
+    end
+    def formatters(template)
+      template.map { |matcher| matcher.first[:formatter] }
+    end
+    def format_any_line(template, data)
+      formatter = formatters(template).detect do |formatter|
+        formatter[:lengths].all? do |name, length|
+          data[name] and data[name].to_s.length <= length
+        end
+      end
+      if formatter
+        formatter[:format] % formatter[:keys].map { |key| data[key] }
+      else
+        raise 'No formatter found'
+      end
+    end
+    def format_template(template, data)
+      template.map do |matcher|
+        formatter = matcher.first[:formatter]
+        formatter[:format] % formatter[:keys].map { |key| data[key] }
+      end.join "\n"
+    end
+    def process_any_line(template, line, meta = {})
+      record = {}
+      match_template(template, line, meta) do |matcher, data, raw|
+        record[matcher[:symbol]] = data if data != ''
+      end
+      record
+    end
+    # Process a given line against a given line matcher
+    def process_line(matchers, line, meta = {})
+      pos = 0
+      unless block_given?
+        record = {}
+        process_line(matchers, line, meta) do |m, d, lp|
+          record[m[:symbol]] = d
+        end
+        return record
+      end
+      matchers.each do |matcher|
+        line_part = line[pos, matcher[:length]]
+        processing_error!('Unexpected EOL', matcher, line, pos, meta) unless line_part
+        #processing_error!('Unexpected EOL', matcher, line, pos, meta) if line_part.length < matcher[:length]
+        case matcher[:type]
+        when :string
+          if matcher[:string] != line_part
+            processing_error!("Mismatch: #{ line_part.inspect } should be #{ matcher[:string].inspect }", matcher, line, pos, meta)
+          end
+          yield matcher, line_part, line_part if matcher[:symbol]
+        when :data
+          yield matcher, line_part.strip, line_part
+        when :int
+          data = line_part.strip
+          if data == ''
+            yield matcher, nil, line_part
+          else
+            begin
+              yield matcher, Integer(data.sub(/^0*(\d)/, '\1')), line_part
+            rescue => e
+              processing_error!(e.message, matcher, line, pos, meta)
+            end
+          end
+        end
+        pos += matcher[:length]
+      end
+    end
+    private
+    def compile_template_line(line)
+      parts = line.split(/([#:]\w+\s*\]?|\?\s*\]?|<[#:]\w*>)/)
+      next_symbol = nil
+      next_type = nil
+      matchers = parts.map do |part|
+        len = part.length
+        if len > 0
+          if part =~ /^<[#:]\w*>$/
+            next_symbol = part[2..-2]
+            if next_symbol.length > 0
+              next_symbol = next_symbol.to_sym
+              next_type = part[1, 1] == '#' ? :int : :data
+            else
+              # <:> used as a 0-width delimiter
+              next_symbol = nil
+            end
+            nil
+          else
+            part = part[0..-2] if part[-1, 1] == ']'
+            matcher = case part[0, 1]
+              when ':'
+                { :type => :data, :symbol => part.strip[1..-1].to_sym, :length => len, :template => line }
+              when '#'
+                { :type => :int, :symbol => part.strip[1..-1].to_sym, :length => len, :template => line }
+              when '?'
+                { :type => :ignore, :length => len, :template => line }
+              else
+                { :type => :string, :string => part, :length => len, :template => line }
+              end
+            if next_symbol
+              matcher[:symbol] = next_symbol
+              matcher[:type] = next_type if matcher[:type] == :ignore
+              next_symbol = nil
+            end
+            matcher
+          end
+        end
+      end.compact
+      compile_regex(matchers)
+      compile_formatter(matchers)
+      matchers
+    end
+    def compile_regex(matchers)
+      str = matchers.map do |matcher|
+        case matcher[:type]
+        when :string
+          "(?:#{Regexp.escape matcher[:string]}|#{Regexp.escape(matcher[:string].rstrip)}$)"
+        when :int
+          "(?:[ 0-9]{#{matcher[:length]}}|[ 0-9]{,#{matcher[:length]}}$)"
+        when :data, :ignore
+          "(?:.{#{matcher[:length]}}|.{,#{matcher[:length]}}$)"
+        end
+      end.join('')
+      matchers.first[:regex] = Regexp.new("\\A#{ str }", Regexp::MULTILINE)
+    end
+    def compile_formatter(matchers)
+      keys = []
+      lengths = {}
+      str = matchers.map do |matcher|
+        if matcher[:symbol]
+          keys << matcher[:symbol]
+          lengths[matcher[:symbol]] = matcher[:length]
+        end
+        case matcher[:type]
+        when :data
+          "%-#{matcher[:length]}s"
+        when :int
+          "%#{matcher[:length]}s"
+        when :ignore
+          " " * matcher[:length]
+        when :string
+          matcher[:string]
+        end
+      end.join('')
+      matchers.first[:formatter] = { :format => str, :keys => keys, :lengths => lengths }
+    end
+    def processing_error!(message, matcher, line, pos, meta)
+      message = <<-MESSAGE
+#{ message }:
+  #{ matcher[:template].inspect.gsub(/<[:#]\w*>/, '') }
+  #{ line.inspect }
+   #{ ' ' * pos }^#{ '^' * (matcher[:length] > 0 ? matcher[:length] - 1 : 0) }
+  file: #{ meta[:file] } @ #{ meta[:line_num] }
+      MESSAGE
+      message += "\n\n#{ meta[:lines] }" if meta[:lines]
+      raise ProcessingError.new(matcher, line, pos, meta, message)
+    end
+  end
+  extend Parser
+  class Base
+    include Parser
+  end
+end
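
The `:int` branch of `process_line` strips leading zeros before calling `Integer`. The source does not say why, but a likely reason (an assumption on my part) is that Ruby's `Integer` treats a leading `0` as an octal prefix, so zero-padded report fields would otherwise be misread or rejected. The sketch below shows the positional slicing and the conversion together; `parse_padded_int` is a hypothetical helper, not part of the gem, and the sample line is made up.

```ruby
# Hypothetical helper mirroring the gem's :int conversion; not part of its API.
def parse_padded_int(field)
  data = field.strip
  return nil if data.empty?            # blank numeric fields yield nil
  Integer(data.sub(/^0*(\d)/, '\1'))   # drop leading zeros but keep one digit
end

# Fields are cut out by position and width, as process_line does with
# line[pos, matcher[:length]]. The sample line is invented for illustration.
line = "QTY: 0012  SKU: 0089"
qty_raw = line[5, 6]    # six characters starting at offset 5
sku_raw = line[16, 4]   # four characters starting at offset 16

qty = parse_padded_int(qty_raw)
sku = parse_padded_int(sku_raw)
blank = parse_padded_int("    ")

# Without the zero stripping, Integer reads "0012" as octal, and "0089"
# is rejected because 8 and 9 are not octal digits.
octal_reading = Integer("0012")
rejected = begin
  Integer("0089")
  false
rescue ArgumentError
  true
end
```

Keeping the last digit in the substitution means a field of all zeros, such as `"000"`, still converts to `0` rather than becoming an empty string.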

data/lib/template_parser/version.rb ADDED

@@ -0,0 +1,3 @@
+module TemplateParser
+  VERSION = "1.0.0"
+end

data/template_parser.gemspec ADDED

@@ -0,0 +1,21 @@
+# -*- encoding: utf-8 -*-
+$:.push File.expand_path("../lib", __FILE__)
+require "template_parser/version"
+Gem::Specification.new do |s|
+  s.name        = "template_parser"
+  s.version     = TemplateParser::VERSION
+  s.platform    = Gem::Platform::RUBY
+  s.authors     = ["Darrick Wiebe"]
+  s.email       = ["darrick@innatesoftware.com"]
+  s.homepage    = "https://github.com/pangloss/template_parser"
+  s.summary     = %q{Parse text files by example}
+  s.description = %q{When you need to parse crazy oldschool ascii reports from mainframes or legacy applications of all sorts, this tool can make it quite easy and keep your code concise and maintainable.}
+  s.rubyforge_project = "template_parser"
+  s.files         = `git ls-files`.split("\n")
+  s.test_files    = `git ls-files -- {test,spec,features}/*`.split("\n")
+  s.executables   = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
+  s.require_paths = ["lib"]
+end

metadata ADDED

@@ -0,0 +1,63 @@
+--- !ruby/object:Gem::Specification
+name: template_parser
+version: !ruby/object:Gem::Version
+  prerelease:
+  version: 1.0.0
+platform: ruby
+authors:
+- Darrick Wiebe
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2011-06-24 00:00:00 -04:00
+default_executable:
+dependencies: []
+description: When you need to parse crazy oldschool ascii reports from mainframes or legacy applications of all sorts, this tool can make it quite easy and keep your code concise and maintainable.
+email:
+- darrick@innatesoftware.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- .gitignore
+- Gemfile
+- README.md
+- Rakefile
+- lib/template_parser.rb
+- lib/template_parser/version.rb
+- template_parser.gemspec
+has_rdoc: true
+homepage: https://github.com/pangloss/template_parser
+licenses: []
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+required_rubygems_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+requirements: []
+rubyforge_project: template_parser
+rubygems_version: 1.5.2
+signing_key:
+specification_version: 3
+summary: Parse text files by example
+test_files: []