RubyGems - structuredtext - Versions diffs - 1.0.0 - Mend

structuredtext 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

data/README +90 -0
data/lib/structuredtext.rb +206 -0
data/test/test_structuredtext.rb +144 -0
metadata +60 -0

data/README ADDED

@@ -0,0 +1,90 @@
+= Structured Text Utilities
+This module provides utilities for working with structured text.
+This includes comment handling and delimited field (aka CSV) parsing with
+support for quoted strings.
+=== Commented Text
+The StructuredText::CommentedReader class removes comments from text.
+Comments start with a specified comment delimiter (the default is "#") and
+continue to the end of the line.  For example, the following text:
+	line 1 # comment 1
+	# comment 2
+	line 3 # comment 3
+becomes:
+	line 1
+	line 3
+A different delimiter may be specified when the object is created.  Blank lines
+may either be returned or ignored.
+=== Delimited Text
+The StructuredText::DelimitedReader class parses field-delimited text
+yielding records.
+In field-delimited text, each line is a record that consists of a series of
+fields delimited by a specified character. When that character is a comma
+these are called comma-separated-value (CSV) files.
+An array of fields is yielded for each line of field-delimited text.  For
+example, the following text:
+	apples, red, round
+	bananas, yellow, oblong
+is parsed into these arrays:
+	['apples', ' red', ' round']
+	['bananas', ' yellow', ' oblong']
+The field text may contain quoted strings.  Delimiter characters inside
+quotes are not treated as field delimiters.  So:
+	apples,"red,green",round
+	bananas,yellow,oblong
+becomes:
+	['apples', '"red,green"', 'round']
+	['bananas', 'yellow', 'oblong']
+Note here that the second field of the first line contains the text
+"red,green".
+The caller may specify custom field delimiter and right- and left-hand quote
+characters.
+The StructuredText::LabeledDelimitedReader class extends this functionality
+by treating the first line of the text as a header row that contains field
+names. A hash with the field values assigned to their corresponding header
+names is yielded for each line of input.  For example, the following text:
+	Fruit,Color,Shape
+	apples,red,round
+	bananas,yellow,oblong
+is parsed into these hashes:
+	{"Shape"=>"round", "Fruit"=>"apples", "Color"=>"red"}
+	{"Shape"=>"oblong", "Fruit"=>"bananas", "Color"=>"yellow"}
+= History
+1.0.0:: Comment handling and field-delimited text
+= Copyright
+Copyright 2009, William Patrick McNeill
+This program is distributed under the GNU General Public License.
+= Author
+W.P. McNeill mailto:billmcn@gmail.com
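
The comment handling described in the README can be illustrated with a minimal standalone sketch. This is not the gem's code: `strip_comments` is a hypothetical helper written for current Ruby versions, it uses `String#each_line`, and it escapes the delimiter so that characters such as `*` are matched literally. Note that whitespace before the comment delimiter is kept, so the first result is `"line 1 "` with a trailing space.

```ruby
# Hypothetical helper (not part of the gem): remove end-of-line comments.
def strip_comments(text, comment_delimiter = "#", skip_blanks = true)
  # Escape the delimiter so regular expression metacharacters match literally.
  comment_regex = /#{Regexp.escape(comment_delimiter)}.*$/
  # chomp returns a new string, so the caller's text is left unmodified.
  lines = text.each_line.map { |line| line.chomp.sub(comment_regex, "") }
  skip_blanks ? lines.reject(&:empty?) : lines
end

text = "line 1 # comment 1\n# comment 2\nline 3 # comment 3\n"
strip_comments(text)              # => ["line 1 ", "line 3 "]
strip_comments(text, "#", false)  # => ["line 1 ", "", "line 3 "]
```

Lines that consist only of a comment become empty strings, which is why the `skip_blanks` flag decides whether they appear in the result.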

data/lib/structuredtext.rb ADDED

@@ -0,0 +1,206 @@
+# Copyright 2009 William Patrick McNeill
+#
+# This file is part of StructuredText.
+#
+# StructuredText is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 2 of the License, or (at your option)
+# any later version.
+#
+# StructuredText is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+# or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+# more details.
+#
+# You should have received a copy of the GNU General Public License along with
+# StructuredText; if not, write to the Free Software Foundation, Inc., 51 Franklin
+# St, Fifth Floor, Boston, MA 02110-1301 USA
+#
+# Utilities for working with various kinds of structured text.
+module StructuredText
+  VERSION = "1.0.0"
+  # Removes comments from text.
+  #
+  # Comments start with a specified comment delimiter and continue to the end
+  # of the line.
+  #
+  #   > StructuredText::CommentedReader.new(<<-EOTEXT
+  #   " line 1 # comment 1
+  #   " # comment 2
+  #   " line 3 # comment 3
+  #   " EOTEXT
+  #   > ).collect
+  #   => ["line 1 ", "line 3 "]
+  #
+  # The default comment delimiter is "#".  A different delimiter may be
+  # specified when the object is created.  Blank lines may either be returned
+  # or ignored.
+  class CommentedReader
+    include Enumerable
+    # Initialize the reader with the text to parse and parameters that
+    # determine how comments will be processed.
+    #
+    # [_source_] an enumerable set of text lines, e.g. a stream or a string.
+    # [<em>comment_delimiter</em>] the comment delimiter, the default is
+    #                              <em>"#"</em>
+    # [<em>skip_blanks</em>] skip blank lines in the input, the default is
+    #                        _true_
+    def initialize(source, comment_delimiter = "#", skip_blanks = true)
+      @source = source
+      @comment_regex = Regexp.compile(Regexp.escape(comment_delimiter) + '.*$')
+      @skip_blanks = skip_blanks
+    end
+    # Enumerate the lines in the source, removing all text after a comment
+    # character.
+    def each # :yields: line with comments removed
+      @source.each do |line|
+        line.chomp!
+        line.sub!(@comment_regex, "")
+        yield line if not @skip_blanks or not line.empty?
+      end
+    end
+  end # CommentedReader
+  # Parses field-delimited text yielding records.
+  #
+  # In field-delimited text, each line is a record that consists of a series
+  # of fields delimited by a specified character.  When that character is a
+  # comma these are called comma-separated-value (CSV) files.
+  #
+  # This class enumerates field-delimited text yielding an array of fields for
+  # each line of input.
+  #
+  #   > StructuredText::DelimitedReader.new(<<-EOTEXT
+  #   " apples, red, round
+  #   " bananas, yellow, oblong
+  #   " EOTEXT
+  #   > ).collect
+  #   => [["apples", " red", " round"], ["bananas", " yellow", " oblong"]]
+  #
+  # The field text may contain quoted strings.  Delimiter characters inside
+  # quotes are not treated as field delimiters.
+  #
+  #   > StructuredText::DelimitedReader.new(<<-EOTEXT
+  #   " apples,"red,green",round
+  #   " bananas,yellow,oblong
+  #   " EOTEXT
+  #   > ).collect
+  #   => [["apples", "\"red,green\"", "round"], ["bananas", "yellow", "oblong"]]
+  #
+  # Note here that the second field of the first line contains the text
+  # "red,green".
+  #
+  # The caller may specify custom field delimiter and right- and left-hand
+  # quote characters.
+  class DelimitedReader
+    include Enumerable
+    # Initialize the reader with the text and optional characters that control
+    # the parsing format.
+    #
+    # By default, the field delimiter is a comma (,) and the quote character
+    # is a double-quote (").  Both of these defaults can be overridden with
+    # arguments passed to this function. The caller may also specify different
+    # left-hand and right-hand quote characters, e.g. ( and ).
+    #
+    # [_source_] an enumerable set of text lines, e.g. a stream or a string.
+    # [_delimiter_] the field delimiter character
+    # [_lquote_] the left-hand field quote character
+    # [_rquote_] the right-hand field quote character; if unspecified, it is
+    #            identical to the left-hand field quote
+    def initialize(source, delimiter = ",", lquote = '"', rquote = nil)
+      @source = source
+      # Escape the custom characters in case the caller provides a regular
+      # expression control character.
+      delimiter = Regexp.escape(delimiter)
+      lquote = Regexp.escape(lquote)
+      rquote = rquote.nil? ? lquote : Regexp.escape(rquote)
+      s = <<-EOTEXT
+(?:  # Match delimiter
+  (#{delimiter})   # field delimiter
+   |    # ...or...
+  ($)   # end of line
+)
+  |     # ...or...
+(     # Match text
+  (?: #{lquote}.*?#{rquote})  # quoted string
+      |       # ...or...
+  (?: [^#{delimiter}]*)  # text without delimiters
+)
+EOTEXT
+      @field_regex = Regexp.compile(s, Regexp::EXTENDED)
+    end
+    # Enumerate the lines in the source yielding arrays of comma-separated
+    # fields.
+    #
+    # A double-quote delimited field may contain non-field-delimiting commas.
+    def each
+      @source.each do |line|
+        line.chomp!
+        record = []
+        # Scan comma-delimited fields.  Allow commas to appear inside
+        # double-quoted strings.
+        field = ""
+        line.scan(@field_regex) do |match|
+          comma_delimiter = (not match[0].nil?)
+          eol_delimiter = (not match[1].nil?)
+          text = match[2]
+          if not (comma_delimiter or eol_delimiter)
+            # Append text in the middle of a field.
+            field += text if not text.nil?
+          else
+            # Add field to the record at a delimiter.
+            record << field
+            field = ""
+          end
+        end # line.scan
+        yield record
+      end # @source.each
+    end
+  end # DelimitedReader
+  # Parses field-delimited text with a header row yielding record hashes.
+  #
+  # The first row of the file contains field names.  This class yields a hash
+  # with the field values assigned to their corresponding header names.
+  #
+  #   > StructuredText::LabeledDelimitedReader.new(<<-EOTEXT
+  #   " Fruit,Color,Shape
+  #   " apples,red,round
+  #   " bananas,yellow,oblong
+  #   " EOTEXT
+  #   > ).collect
+  #   => [{"Shape"=>"round", "Fruit"=>"apples", "Color"=>"red"}, {"Shape"=>"oblong", "Fruit"=>"bananas", "Color"=>"yellow"}]
+  #
+  # If there are fewer fields in a line than there are headers, the remaining
+  # ones will be padded with nil.  If there are more fields, a RuntimeError
+  # will be raised.
+  class LabeledDelimitedReader < DelimitedReader
+    def each # :yields: Hash of column labels and field values
+      header_row = nil
+      super do |record|
+        if header_row.nil?
+          header_row = record
+        else
+          if record.length > header_row.length
+            raise "More fields than headers:\n#{record.inspect}"
+          end
+          yield Hash[*header_row.zip(record).flatten]
+        end
+      end
+    end
+  end # LabeledDelimitedReader
+end # StructuredText
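
The quote-aware field splitting that `DelimitedReader` performs with an extended regular expression can also be sketched as a character-by-character scan. The block below is a different technique from the one in the diff, shown only to make the rule concrete: a delimiter inside a quoted span does not end the field, and the quote characters stay in the field text. `split_fields` is a hypothetical name, and the sketch assumes the delimiter and quote arguments are single characters.

```ruby
# Hypothetical helper (not part of the gem): split one line into fields,
# ignoring delimiters that appear between the quote characters.
def split_fields(line, delimiter = ",", lquote = '"', rquote = nil)
  rquote ||= lquote          # same convention as the reader: default to lquote
  fields = [String.new]
  in_quotes = false
  line.each_char do |ch|
    if in_quotes
      fields.last << ch
      in_quotes = false if ch == rquote   # closing quote ends the quoted span
    elsif ch == lquote
      fields.last << ch
      in_quotes = true                    # opening quote starts a quoted span
    elsif ch == delimiter
      fields << String.new                # unquoted delimiter starts a new field
    else
      fields.last << ch
    end
  end
  fields
end

split_fields('apples,"red,green",round')  # => ["apples", "\"red,green\"", "round"]
split_fields('a,(b,c),d', ",", "(", ")")  # => ["a", "(b,c)", "d"]
split_fields("a,b,")                      # => ["a", "b", ""]
```

A scanner like this makes the open and close quote states explicit, at the cost of more code than a single `scan` over a compiled pattern.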

data/test/test_structuredtext.rb ADDED

@@ -0,0 +1,144 @@
+#!/usr/bin/env ruby -w
+#--
+# Copyright 2009 William Patrick McNeill
+#
+# This file is part of StructuredText.
+#
+# StructuredText is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 2 of the License, or (at your option)
+# any later version.
+#
+# StructuredText is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+# or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+# more details.
+#
+# You should have received a copy of the GNU General Public License along with
+# StructuredText; if not, write to the Free Software Foundation, Inc., 51 Franklin
+# St, Fifth Floor, Boston, MA 02110-1301 USA
+#
+#++
+# Test cases for the StructuredText module
+require "test/unit"
+require "structuredtext"
+class CommentedReaderTestCase < Test::Unit::TestCase
+  def test_basic
+    # Canonical
+    assert_equal([" one "], StructuredText::CommentedReader.new(" one # comment").collect)
+    # Multiline
+    assert_equal([" one ", "two"], StructuredText::CommentedReader.new(" one # comment\ntwo").collect)
+  end
+  def test_custom_comment_delimiter
+    # Canonical
+    assert_equal([" one "], StructuredText::CommentedReader.new(" one ; comment", ";").collect)
+    # Multiline
+    assert_equal([" one ", "two"], StructuredText::CommentedReader.new(" one ; comment\ntwo", ";").collect)
+  end
+  def test_skip_blanks
+    s = " one # comment\n# comment\n\ntwo\n# comment"
+    assert_equal([" one ", "two"], StructuredText::CommentedReader.new(s).collect)
+    assert_equal([" one ", "", "", "two", ""], StructuredText::CommentedReader.new(s, "#", false).collect)
+  end
+end # CommentedReaderTestCase
+class DelimitedReaderTestCase < Test::Unit::TestCase
+  def test_basic
+    # Canonical
+    assert_equal([["a", "b", "c"]], StructuredText::DelimitedReader.new("a,b,c").collect)
+    # Multiline: uniform record length
+    assert_equal([["a", "b", "c"], ["d", "e", "f"]], StructuredText::DelimitedReader.new("a,b,c\nd,e,f").collect)
+    # Multiline: varying record length
+    assert_equal([["a", "b", "c"], ["d", "e"]], StructuredText::DelimitedReader.new("a,b,c\nd,e").collect)
+  end
+  def test_quoted
+    # Beginning
+    assert_equal([['"a,b"', "c"]], StructuredText::DelimitedReader.new('"a,b",c').collect)
+    # Middle
+    assert_equal([["a", '"b,c"', "d"]], StructuredText::DelimitedReader.new('a,"b,c",d').collect)
+    # End
+    assert_equal([["a",'"b,c"']], StructuredText::DelimitedReader.new('a,"b,c"').collect)
+  end
+  def test_empty
+    # Beginning
+    assert_equal([["", "b", "c"]], StructuredText::DelimitedReader.new(",b,c").collect)
+    # Middle
+    assert_equal([["a", "", "c"]], StructuredText::DelimitedReader.new("a,,c").collect)
+    # End
+    assert_equal([["a", "b", ""]], StructuredText::DelimitedReader.new("a,b,").collect)
+  end
+  def test_single
+    assert_equal([["a"]], StructuredText::DelimitedReader.new("a").collect)
+    assert_equal([], StructuredText::DelimitedReader.new("").collect)
+  end
+  def test_custom_delimiter
+    # Canonical
+    assert_equal([["a", "b", "c"], ["d", "e", "f"]], StructuredText::DelimitedReader.new("a;b;c\nd;e;f", ";").collect)
+    # Quoted string in the middle
+    assert_equal([["a", '"b;c"', "d"]], StructuredText::DelimitedReader.new('a;"b;c";d', ";").collect)
+  end
+  def test_custom_quote_left_and_right_same
+    # Beginning
+    assert_equal([['|a,b|', 'c']], StructuredText::DelimitedReader.new('|a,b|,c', ",", "|").collect)
+    # Middle
+    assert_equal([['a', '|b,c|', 'd']], StructuredText::DelimitedReader.new('a,|b,c|,d', ",", "|").collect)
+    # End
+    assert_equal([['a','|b,c|']], StructuredText::DelimitedReader.new('a,|b,c|', ",", "|").collect)
+  end
+  def test_custom_quote_left_and_right_different
+    # Beginning
+    assert_equal([['(a,b)', 'c']], StructuredText::DelimitedReader.new('(a,b),c', ",", "(", ")").collect)
+    # Middle
+    assert_equal([['a', '(b,c)', 'd']], StructuredText::DelimitedReader.new('a,(b,c),d', ",", "(", ")").collect)
+    # End
+    assert_equal([['a','(b,c)']], StructuredText::DelimitedReader.new('a,(b,c)', ",", "(", ")").collect)
+  end
+end # DelimitedReaderTestCase
+class LabeledReaderTestCase < Test::Unit::TestCase
+  def test_basic
+    # Canonical
+    assert_equal([{"X"=>"a", "Y"=>"b", "Z"=>"c"}], StructuredText::LabeledDelimitedReader.new("X,Y,Z\na,b,c").collect)
+    # Multiline
+    assert_equal([{"X"=>"a", "Y"=>"b", "Z"=>"c"}, {"X"=>"d", "Y"=>"e", "Z"=>"f"}],
+                 StructuredText::LabeledDelimitedReader.new("X,Y,Z\na,b,c\nd,e,f").collect)
+  end
+  def test_exception
+    assert_raise(RuntimeError) { StructuredText::LabeledDelimitedReader.new("X,Y,Z\na,b,c,d").collect  }
+  end
+end
+class ScenarioTestCase < Test::Unit::TestCase
+  def test_commented_labeled_text_with_all_custom_characters
+    text =<<-EOTEXT
+; This is the header row
+Fruit|Color|Shape
+apples|(red|green)|round ; The first data row
+bananas|yellow|oblong
+; The end
+EOTEXT
+    r = StructuredText::LabeledDelimitedReader.new(StructuredText::CommentedReader.new(text, ";"), "|", "(", ")")
+    assert_equal([{'Shape'=>'round ', 'Fruit'=>'apples', 'Color'=>'(red|green)'},
+                  {'Shape'=>'oblong', 'Fruit'=>'bananas', 'Color'=>'yellow'}], r.collect)
+  end
+end
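
The tests above cover the error raised when a row has more fields than the header, but not the case the library comments describe where a short row is padded with nil. The sketch below isolates that labeling step on records that have already been split into arrays. `label_records` is a hypothetical helper, not the gem's API, and it uses `Array#to_h` from current Ruby versions instead of the `Hash[*...flatten]` idiom in the diff.

```ruby
# Hypothetical helper (not part of the gem): treat the first record as a
# header row and convert every later record into a Hash keyed by header name.
def label_records(records)
  header = nil
  records.each_with_object([]) do |record, labeled|
    if header.nil?
      header = record                       # first record supplies the labels
    elsif record.length > header.length
      # A bare raise with a message produces a RuntimeError.
      raise "More fields than headers:\n#{record.inspect}"
    else
      # zip pads missing trailing fields with nil.
      labeled << header.zip(record).to_h
    end
  end
end

rows = [%w[Fruit Color Shape], %w[apples red round], %w[bananas yellow]]
result = label_records(rows)
result[0]["Color"]  # => "red"
result[1]["Shape"]  # => nil
```

Because `zip` is driven by the header array, a short row still produces one key per header, while a long row has to be rejected explicitly before zipping or its extra fields would be dropped silently.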

metadata ADDED

@@ -0,0 +1,60 @@
+--- !ruby/object:Gem::Specification
+name: structuredtext
+version: !ruby/object:Gem::Version
+  version: 1.0.0
+platform: ruby
+authors:
+- W.P. McNeill
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2009-08-11 00:00:00 -07:00
+default_executable:
+dependencies: []
+description: This module provides utilities for working with various kinds of structured text.  It includes comment handling and comma-separated-value (CSV) parsing with support for quoted strings.
+email: billmcn@gmail.com
+executables: []
+extensions: []
+extra_rdoc_files:
+- README
+files:
+- test/test_structuredtext.rb
+- lib/structuredtext.rb
+- README
+has_rdoc: true
+homepage: http://structuredtext.rubyforge.org/
+post_install_message:
+rdoc_options:
+- - --title
+  - StructuredText -- Structured Text Utilities
+  - --main
+  - README
+  - --line-numbers
+  - --inline-source
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+  version:
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+  version:
+requirements: []
+rubyforge_project: structuredtext
+rubygems_version: 1.1.1
+signing_key:
+specification_version: 2
+summary: Structured text processing utilities
+test_files:
+- test/test_structuredtext.rb