RubyGems - csvhuman - Versions diffs - 0.1.0 → 0.2.0 - Mend

csvhuman 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 358d150c2a69a16f741b0dae47328857787d2dc2
-  data.tar.gz: fd36923138a7453510d2d26a4e2997c4475b3aea
+  metadata.gz: 0e03d4dc51acff7d6b47f1648abb47cfaa2a9028
+  data.tar.gz: b4921c44a67c57feae5c1f62eff5aa87ef81c996
 SHA512:
-  metadata.gz: 1540846d223cb4bcf8dd4d2982f5cdc96b13966328f4037d84d6b1a7f8a3bee998856edcc3aaeb7d14e476f12e5a3cc5fd81d81a6aeed6a44e55c39a24cd8144
-  data.tar.gz: 2fbc7a4ee6f22f75ab4cdea35c5411fc311baf4185d5ebc3012ae4b99a43d301a93a70ff388ac58aaa3eec0eb630d9c5742e94e67cb583214d88dfb05fadebad
+  metadata.gz: 675050a1e5af601ea6634fe17c0dcea511c917170438469c5f09349e4bd26678b5d42cc7cd5b9c97c2455b61ea67cd5719bff7c4637971849ae056955d562f2b
+  data.tar.gz: 9a7da3cdf466ebfec142344c505558b2c86fd38bee9c3b7d766c69cd0d127e5c42f4f854a6f61df7e4c57610ea4b2bec3bbdc121821a9c72b875cb470af2b50f
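
The `checksums.yaml` hunk only swaps digest values. To show what those values are, the sketch below recomputes SHA1 and SHA512 hex digests of a file with Ruby's standard `digest` library. `gem_part_digests` and `checksum_ok?` are hypothetical helper names, not part of RubyGems or csvhuman.

```ruby
require 'digest'

# Hypothetical helper: compute the two digest kinds listed in checksums.yaml.
def gem_part_digests( path )
  {
    'SHA1'   => Digest::SHA1.file( path ).hexdigest,
    'SHA512' => Digest::SHA512.file( path ).hexdigest,
  }
end

# Compare a freshly computed digest with a recorded (hex) value.
def checksum_ok?( path, algorithm, expected_hex )
  gem_part_digests( path ).fetch( algorithm ) == expected_hex.downcase
end
```

A `.gem` file is a plain tar archive containing `metadata.gz`, `data.tar.gz` and `checksums.yaml.gz`. After `gem fetch csvhuman -v 0.2.0` and `tar -xf csvhuman-0.2.0.gem`, you could compare, for example, `checksum_ok?( 'data.tar.gz', 'SHA1', 'b4921c44a67c57feae5c1f62eff5aa87ef81c996' )` against the new SHA1 value above.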

data/Manifest.txt CHANGED

@@ -3,8 +3,11 @@ Manifest.txt
 README.md
 Rakefile
 lib/csvhuman.rb
+lib/csvhuman/column.rb
 lib/csvhuman/reader.rb
+lib/csvhuman/tag.rb
 lib/csvhuman/version.rb
 test/data/test.csv
 test/helper.rb
 test/test_reader.rb
+test/test_tags.rb

data/README.md CHANGED

@@ -10,9 +10,221 @@ csvhuman library / gem - read tabular data in the CSV Humanitarian eXchange Lang
+## What's Humanitarian eXchange Language (HXL)?
+[Humanitarian eXchange Language (HXL)](https://github.com/csvspecs/csv-hxl)
+is a (metadata) convention for
+adding agreed-on hashtags, e.g. `#org,#country,#sex+#targeted,#adm1`,
+inline in a single new row
+between the last header row and the first data row,
+for sharing tabular data across organisations
+(during a humanitarian crisis).
+Example:
+```
+What,,,Who,Where,For whom,
+Record,Sector/Cluster,Subsector,Organisation,Country,Males,Females,Subregion
+,#sector+en,#subsector,#org,#country,#sex+#targeted,#sex+#targeted,#adm1
+001,WASH,Subsector 1,Org 1,Country 1,100,100,Region 1
+002,Health,Subsector 2,Org 2,Country 2,,,Region 2
+003,Education,Subsector 3,Org 3,Country 2,250,300,Region 3
+004,WASH,Subsector 4,Org 1,Country 3,80,95,Region 4
+```
 ## Usage
-to be done
+Pass in an array of arrays (or a stream responding to `#each` with an array of strings).
+Example:
+``` ruby
+pp CsvHuman.parse( [["Organisation", "Cluster", "Province" ], ## or use HXL.parse
+                    [ "#org", "#sector", "#adm1" ],
+                    [ "Org A", "WASH", "Coastal Province" ],
+                    [ "Org B", "Health", "Mountain Province" ],
+                    [ "Org C", "Education", "Coastal Province" ],
+                    [ "Org A", "WASH", "Plains Province" ]])
+```
+resulting in:
+``` ruby
+[{"org" => "Org A", "sector" => "WASH",      "adm1" => "Coastal Province"},
+ {"org" => "Org B", "sector" => "Health",    "adm1" => "Mountain Province"},
+ {"org" => "Org C", "sector" => "Education", "adm1" => "Coastal Province"},
+ {"org" => "Org A", "sector" => "WASH",      "adm1" => "Plains Province"}]
+```
+Or pass in the text. Example:
+``` ruby
+pp CsvHuman.parse( <<TXT )      ## or use HXL.parse
+  What,,,Who,Where,For whom,
+  Record,Sector/Cluster,Subsector,Organisation,Country,Males,Females,Subregion
+  ,#sector+en,#subsector,#org,#country,#sex+#targeted,#sex+#targeted,#adm1
+  001,WASH,Subsector 1,Org 1,Country 1,100,100,Region 1
+  002,Health,Subsector 2,Org 2,Country 2,,,Region 2
+  003,Education,Subsector 3,Org 3,Country 2,250,300,Region 3
+  004,WASH,Subsector 4,Org 1,Country 3,80,95,Region 4
+TXT
+```
+resulting in:
+``` ruby
+[{"sector+en"    => "WASH",
+  "subsector"    => "Subsector 1",
+  "org"          => "Org 1",
+  "country"      => "Country 1",
+  "sex+targeted" => ["100", "100"],
+  "adm1"         => "Region 1"},
+ {"sector+en"    => "Health",
+  "subsector"    => "Subsector 2",
+  "org"          => "Org 2",
+  "country"      => "Country 2",
+  "sex+targeted" => ["", ""],
+  "adm1"         => "Region 2"},
+ {"sector+en"    => "Education",
+  "subsector"    => "Subsector 3",
+  "org"          => "Org 3",
+  "country"      => "Country 2",
+  "sex+targeted" => ["250", "300"],
+  "adm1"         => "Region 3"},
+ {"sector+en"    => "WASH",
+  "subsector"    => "Subsector 4",
+  "org"          => "Org 1",
+  "country"      => "Country 3",
+  "sex+targeted" => ["80", "95"],
+  "adm1"         => "Region 4"}]
+```
+More ways to use the reader:
+``` ruby
+csv = CsvHuman.new( recs )
+csv.each do |rec|
+  pp rec
+end
+pp csv.read
+CsvHuman.parse( recs ).each do |rec|
+  pp rec
+end
+pp CsvHuman.read( "./test.csv" )
+CsvHuman.foreach( "./test.csv" ) do |rec|
+  pp rec
+end
+#...
+```
+or use the `HXL` alias:
+``` ruby
+hxl = HXL.new( recs )
+hxl.each do |rec|
+  pp rec
+end
+pp hxl.read
+HXL.parse( recs ).each do |rec|
+  pp rec
+end
+pp HXL.read( "./test.csv" )
+HXL.foreach( "./test.csv" ) do |rec|
+  pp rec
+end
+#...
+```
+Note: Besides `CsvHuman` and `HXL`, you can also use the aliases
+`CsvHum`, `CSV_HXL`, and `CSVHXL`.
+## Tag Helpers
+**Normalize**. Use `CsvHuman::Tag.normalize` to pretty print or normalize a tag.
+All parts get downcased (lowercased), all attributes sorted a-to-z,
+all extra or missing hashtags or pluses added or removed,
+and all extra or missing spaces added or removed. Example:
+``` ruby
+HXL::Tag.normalize( "#sector+en" )
+# => "#sector +en"
+HXL::Tag.normalize( "#SECTOR EN" )
+# => "#sector +en"
+HXL::Tag.normalize( "# SECTOR  + #EN " )
+# => "#sector +en"
+HXL::Tag.normalize( "SECTOR EN" )
+# => "#sector +en"
+# ...
+```
+**Split**. Use `CsvHuman::Tag.split` to split (and normalize) a tag into its parts.
+Example:
+``` ruby
+HXL::Tag.split( "#sector+en" )
+# => ["sector", "en"]
+HXL::Tag.split( "#SECTOR EN" )
+# => ["sector", "en"]
+HXL::Tag.split( "# SECTOR  + #EN " )
+# => ["sector", "en"]
+HXL::Tag.split( "SECTOR EN" )
+# => ["sector", "en"]
+## sort attributes a-to-z
+HXL::Tag.split( "#affected +f +children" )
+# => ["affected", "children", "f"]
+HXL::Tag.split( "#population +children +affected +m" )
+# => ["population", "affected", "children", "m"]
+HXL::Tag.split( "#population+children+affected+m" )
+# => ["population", "affected", "children", "m"]
+HXL::Tag.split( "#population+#children+#affected+#m" )
+# => ["population", "affected", "children", "m"]
+HXL::Tag.split( "#population #children #affected #m" )
+# => ["population", "affected", "children", "m"]
+HXL::Tag.split( "POPULATION CHILDREN AFFECTED M" )
+# => ["population", "affected", "children", "m"]
+#...
+```
+## Frequently Asked Questions (FAQ) and Answers
+### Q: How to deal with un-tagged fields?
+**A**: Un-tagged fields get skipped / ignored.
+### Q: How to deal with duplicate / repeated fields (e.g. `#sex+#targeted,#sex+#targeted`)?
+**A**: Repeated fields (auto-magically) get turned into an array / list.
 ## License
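
The FAQ answers in the README (un-tagged fields are skipped, repeated tags are collected into arrays) can be illustrated with a stand-alone sketch that does not need the gem. `simple_tag_key` and `hxl_records` are hypothetical names. The tag normalization is deliberately reduced (downcase, split on spaces, hashes and pluses, sort attributes, join with `+`) and is not the full `Tag.split` logic.

```ruby
# Simplified sketch of the README behaviour; NOT the csvhuman implementation.
def simple_tag_key( value )
  return nil if value.nil?
  parts = value.downcase.split( /[\s#+]+/ ).reject( &:empty? )
  return nil if parts.empty?                        # empty or "stray" hashtag cell
  ( [parts[0]] + parts[1..-1].sort ).join( '+' )    # "#sex+#targeted" => "sex+targeted"
end

# Turns rows (arrays of strings) into hashes keyed by normalized tag.
def hxl_records( rows )
  # the first row with any cell starting with "#" is treated as the hashtag row
  tag_index = rows.index { |row| row.any? { |v| v && v.strip.start_with?( '#' ) } }
  return [] if tag_index.nil?

  keys   = rows[tag_index].map { |v| simple_tag_key( v ) }
  counts = Hash.new( 0 )
  keys.each { |key| counts[key] += 1 if key }

  rows[( tag_index + 1 )..-1].map do |row|
    rec = {}
    keys.each_with_index do |key, i|
      next if key.nil?                    # un-tagged field: skipped
      if counts[key] > 1
        ( rec[key] ||= [] ) << row[i]     # repeated tag: values collected in column order
      else
        rec[key] = row[i]
      end
    end
    rec
  end
end
```

Applied to the README sample rows (as arrays), the first record comes out as `{"sector+en" => "WASH", "subsector" => "Subsector 1", "org" => "Org 1", "country" => "Country 1", "sex+targeted" => ["100", "100"], "adm1" => "Region 1"}`, matching the README output.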

data/lib/csvhuman.rb CHANGED

@@ -1,13 +1,14 @@
 # encoding: utf-8
 require 'pp'
-require 'logger'
 require 'csvreader'
 ## our own code
 require 'csvhuman/version'    # note: let version always go first
+require 'csvhuman/tag'
+require 'csvhuman/column'
 require 'csvhuman/reader'

data/lib/csvhuman/column.rb ADDED

@@ -0,0 +1,89 @@
+# encoding: utf-8
+class CsvHuman
+class Columns
+  def self.build( values )
+    ## "clean" unify/normalize names
+    tag_keys = values.map do |value|
+      if value
+        if value.empty?
+          nil
+        else
+          ## e.g. #ADM1 CODE                      => #adm1 +code
+          ##      POPULATION F CHILDREN AFFECTED  => #population +affected +children +f
+          value = Tag.normalize( value )
+          ## turn empty normalized tags (e.g. "stray" hashtag) into nil too
+          value = nil   if value.empty?
+          value
+        end
+      else  # keep (nil) as is
+        nil
+      end
+    end
+    counts = {}
+    tag_keys.each_with_index do |key,i|
+       if key
+         counts[key] ||= []
+         counts[key] << i
+       end
+    end
+    ## puts "counts:"
+    ## pp counts
+    ## create all unique tags
+    tags = {}
+    counts.each_key do |key|
+      tags[key] = Tag.parse( key )
+    end
+    ## puts "tags:"
+    ## pp tags
+    cols = []
+    tag_keys.each do |key|
+      if key
+        count = counts[key]
+        tag   = tags[key]    ## note: "reuse" tag for all columns if list
+        if count.size > 1
+          ## note: defaults to use "standard/default" tag key (as a string)
+          cols << Column.new( tag.key, tag, list: true )
+        else
+          cols << Column.new( tag.key, tag )
+        end
+      else
+        cols << Column.new
+      end
+    end
+    cols
+  end
+end ## class Columns
+class Column
+   attr_reader  :key   # used for record (record key); note: list columns must use the same key
+   attr_reader  :tag
+   def initialize( key=nil, tag=nil, list: false )
+     @key  = key
+     @tag  = tag
+     @list = list
+   end
+   def tagged?()  @tag.nil? == false; end
+   def list?()    @list; end
+end  # class Column
+end # class CsvHuman
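
`Columns.build` first records, for every normalized tag key, the column positions where it occurs (the `counts` hash), then marks a column as a list column when its key occurs more than once. The same step can be written more compactly with `each_with_index` and `group_by`. This is a stylistic sketch, not the gem's code; `key_positions` and `list_flags` are hypothetical names.

```ruby
# Map each non-nil key to the indices where it appears, e.g.
#   ["org", nil, "sex+targeted", "sex+targeted"]
#   => {"org" => [0], "sex+targeted" => [2, 3]}
def key_positions( tag_keys )
  tag_keys.each_with_index
          .reject   { |key, _i| key.nil? }              # drop un-tagged columns
          .group_by { |key, _i| key }
          .transform_values { |pairs| pairs.map { |_key, i| i } }
end

# One flag per column; un-tagged columns get false, like Column.new's default list: false
def list_flags( tag_keys )
  positions = key_positions( tag_keys )
  tag_keys.map { |key| key ? positions[key].size > 1 : false }
end
```

As in `Columns.build`, every column sharing a repeated key gets the same key, so the reader appends their values to one array in column order.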

data/lib/csvhuman/reader.rb CHANGED

@@ -65,21 +65,6 @@ class CsvHuman
-class Column
-   attr_reader  :tag
-   def initialize( tag=nil, list: false )
-     @tag  = tag
-     @list = list
-   end
-   def tagged?()  @tag.nil? == false; end
-   def list?()    @list; end
-end  # class Column
 attr_reader :header, :tags
 def initialize( recs_or_stream )
@@ -106,8 +91,8 @@ def each( &block )
   @recs.each do |values|
     ## pp values
     if @cols.nil?
-      if values.any? { |value| value && value.start_with?('#') }
-        @cols = build_cols( values )
+      if values.any? { |value| value && value.strip.start_with?('#') }
+        @cols = Columns.build( values )
         @tags = values
       else
         @header << values
@@ -119,8 +104,8 @@ def each( &block )
       record = {}
       @cols.each_with_index do |col,i|
         if col.tagged?
-          key   = col.tag
-          value = values[i]
+          key   = col.key
+          value = values[i]   ## todo/fix: use col.tag.typecast( values[i] )
           if col.list?
             record[ key ] ||= []
             record[ key ] << value
@@ -144,54 +129,4 @@ def read() to_a; end # method read
 ##   add closed? and close
 ##    if self.open used without block (user needs to close file "manually")
-####
-# helpers
-def build_cols( values )
-  ## "clean" unify/normalize names
-  values = values.map do |value|
-    if value
-      if value.empty?
-        nil     ## make untagged fields nil
-      else
-        ## todo: sort attributes by a-to-z
-        ##  strip / remove all spaces
-        value.strip.gsub('#','')   ## remove leading # - why? why not?
-      end
-    else
-      value   ## keep (nil) as is
-    end
-  end
-  counts = {}
-  values.each_with_index do |value,i|
-     if value
-       counts[value] ||= []
-       counts[value] << i
-     end
-  end
-  ## pp counts
-  cols = []
-  values.each do |value|
-    if value
-      count = counts[value]
-      if count.size > 1
-        cols << Column.new( value, list: true )
-      else
-        cols << Column.new( value )
-      end
-    else
-      cols << Column.new
-    end
-  end
-  cols
-end
 end # class CsvHuman
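
One behavioural change in `CsvHuman#each` is that the tag-row check now strips each cell before testing for a leading `#`. The two lambdas below extract the old and new checks so they can be compared side by side. They are illustrations only, not gem API.

```ruby
# 0.1.0 check: the cell itself must start with "#"
OLD_TAG_ROW = ->( values ) { values.any? { |value| value && value.start_with?( '#' ) } }
# 0.2.0 check: leading/trailing whitespace is stripped first
NEW_TAG_ROW = ->( values ) { values.any? { |value| value && value.strip.start_with?( '#' ) } }

row = [ nil, " #org", " #sector" ]   # hashtags preceded by a space; nil cells are skipped
p OLD_TAG_ROW.call( row )   # prints false
p NEW_TAG_ROW.call( row )   # prints true
```

Detection still needs at least one cell to start with `#`. In `recs2` from `test_reader.rb`, only `#SECTOR` carries a hashtag, yet that is enough for the whole row (including `ORG` and `ADM1`) to be treated as the tag row.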

data/lib/csvhuman/tag.rb ADDED

@@ -0,0 +1,162 @@
+# encoding: utf-8
+class CsvHuman
+class Tag
+  ##  1) plus (with optional hashtag and/or optional leading and trailing spaces)
+  ##  2) hashtag (with optional leading and trailing spaces)
+  ##  3) spaces only (not followed by plus) or
+  ##   note: plus pattern must go first (otherwise "sector  + en" becomes ["sector", "", "en"])
+  SEP_REGEX = /(?:  \s*\++
+                        (?:\s*\#+)?
+                    \s*  )
+                   |
+               (?:  \s*\#+\s*  )
+                   |
+               (?:  \s+)
+              /x    ## check if \s includes space AND tab?
+  def self.split( value )
+    value = value.strip
+    value = value.downcase
+    while value.start_with?('#') do   ## allow one or more hashes
+      value = value[1..-1]    ## remove leading #
+      value = value.strip   ## strip (optional) leading spaces (again)
+    end
+    ## pp value
+    parts = value.split( SEP_REGEX )
+    ## sort attributes a-z
+    if parts.size > 2
+       [parts[0]] + parts[1..-1].sort
+    else
+      parts
+    end
+  end
+  def self.normalize( value )   ## todo: rename to pretty or something or add alias
+    parts = split( value )
+    name       = parts[0]
+    attributes = parts[1..-1]   ## note: might be nil
+    buf = ''
+    if name  ## note: name might be nil too e.g. value = "" or value = "   "
+      buf << '#' + name
+      if attributes && attributes.size > 0
+        buf << ' +'
+        buf << attributes.join(' +')
+      end
+    end
+    buf
+  end
+  def self.guess_type( name, attributes )
+    if name == 'date'
+       Date
+    elsif ['affected', 'inneed'].include?( name )
+       Integer
+    else
+      ## check attributes
+      if attributes.nil? || attributes.empty?
+        String  ## assume (default to) string
+      elsif attributes.include?( 'num' )
+        Integer
+      elsif attributes.include?( 'date' )   ### todo/check: exists +date?
+        Date
+      elsif attributes.include?( 'affected' )
+        Integer
+      else
+        String   ## assume (default to) string
+      end
+    end
+  end
+  def self.parse( value )
+    parts = split( value )
+    name       = parts[0]
+    attributes = parts[1..-1]   ## todo/fix: check if nil (make it empty array [] always) - why? why not?
+    type       = guess_type( name, attributes )
+    new( name, attributes, type )
+  end
+  attr_reader :name
+  attr_reader :attributes   ## use attribs or something shorter - why? why not?
+  attr_reader :type
+  def initialize( name, attributes=nil, type=String )
+    @name       = name
+    ## sorted a-z - note: make sure attributes is [] NOT nil if empty - why? why not?
+    @attributes = attributes || []
+    @type       = type         ## type class (defaults to String)
+  end
+  def key
+    ## convenience short cut for "standard/default" string key
+    ##   cache/pre-built/memoize - why? why not?
+    ##  builds:
+    ##   population+affected+children+f
+    buf = ''
+    buf << @name
+    if @attributes && @attributes.size > 0
+      buf << '+'
+      buf << @attributes.join('+')
+    end
+    buf
+  end
+  def to_s
+    ## cache/pre-built/memoize - why? why not?
+    ##
+    ##  builds
+    ##     #population +affected +children +f
+    buf = ''
+    buf << '#' + @name
+    if @attributes && @attributes.size > 0
+      buf << ' +'
+      buf << @attributes.join(' +')
+    end
+    buf
+  end
+  def typecast( value )   ## use convert or call - why? why not?
+    if @type == Integer
+      conv_to_i( value )
+    else   ## assume String
+      # pass through as is
+      value
+    end
+  end
+private
+  def conv_to_i( value )
+    if value.nil? || value.empty?
+      nil   ## return nil - why? why not?
+    else
+      Integer( value )
+    end
+  end
+end # class Tag
+end # class CsvHuman
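
`SEP_REGEX` and the core of `Tag.split` are copied below so they run without the gem (only the method name `split_tag` and the compact `while` modifier differ). The sketch also answers the comment "check if \s includes space AND tab?": in Ruby regular expressions `\s` matches tab as well as space (plus other ASCII whitespace such as newlines). The hypothetical `SPACES_FIRST` regex shows why the plus alternative must come first.

```ruby
# Copied from tag.rb (0.2.0); in /x mode whitespace is ignored and "\#" is a literal hash.
SEP_REGEX = /(?:  \s*\++
                      (?:\s*\#+)?
                  \s*  )
                 |
             (?:  \s*\#+\s*  )
                 |
             (?:  \s+)
            /x

def split_tag( value )
  value = value.strip.downcase
  value = value[1..-1].strip  while value.start_with?( '#' )   # drop leading hashes
  parts = value.split( SEP_REGEX )
  parts.size > 2 ? [parts[0]] + parts[1..-1].sort : parts      # sort attributes a-z
end

p split_tag( "#sector\t+en" )          # prints ["sector", "en"] (tab handled by \s)
p split_tag( "  # SECTOR  + EN " )     # prints ["sector", "en"]

# Hypothetical reordering: spaces alternative tried first (leftmost alternative wins)
SPACES_FIRST = /(?:\s+)|(?:\s*\++\s*)/
p "sector  + en".split( SPACES_FIRST )  # prints ["sector", "", "en"]
```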

data/lib/csvhuman/version.rb CHANGED

@@ -4,7 +4,7 @@
 class CsvHuman
   MAJOR = 0
-  MINOR = 1
+  MINOR = 2
   PATCH = 0
   VERSION = [MAJOR,MINOR,PATCH].join('.')
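
`version.rb` builds `VERSION` by joining `MAJOR`, `MINOR` and `PATCH`. A short sketch of that pattern follows, plus a general Ruby reminder (not something this diff does) that version strings compare correctly with `Gem::Version` but not as plain strings.

```ruby
require 'rubygems'   # usually already loaded; provides Gem::Version

# Same pattern as data/lib/csvhuman/version.rb (0.2.0)
MAJOR = 0
MINOR = 2
PATCH = 0
VERSION = [MAJOR, MINOR, PATCH].join( '.' )

p VERSION                                                    # prints "0.2.0"
# String comparison is character by character, so "0.10.0" sorts before "0.2.0" ...
p( "0.10.0" > "0.2.0" )                                      # prints false
# ... while Gem::Version compares each numeric segment.
p( Gem::Version.new( "0.10.0" ) > Gem::Version.new( "0.2.0" ) )  # prints true
```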

data/test/test_reader.rb CHANGED

@@ -18,6 +18,26 @@ def recs
      [ "Org A", "WASH", "Plains Province" ]]
 end
+def recs2
+   [["Organisation", "Cluster", "Province" ],
+     [ "ORG", "#SECTOR", "ADM1" ],
+     [ "Org A", "WASH", "Coastal Province" ],
+     [ "Org B", "Health", "Mountain Province" ],
+     [ "Org C", "Education", "Coastal Province" ],
+     [ "Org A", "WASH", "Plains Province" ]]
+end
+def expected_recs
+  [{"org"=>"Org A", "sector"=>"WASH",      "adm1"=>"Coastal Province"},
+   {"org"=>"Org B", "sector"=>"Health",    "adm1"=>"Mountain Province"},
+   {"org"=>"Org C", "sector"=>"Education", "adm1"=>"Coastal Province"},
+   {"org"=>"Org A", "sector"=>"WASH",      "adm1"=>"Plains Province"}]
+end
 def txt
   <<TXT
   What,,,Who,Where,For whom,
@@ -38,7 +58,10 @@ def test_readme
   end
   pp csv.read
-  pp CsvHuman.parse( recs )
+  assert_equal expected_recs, CsvHuman.parse( recs )
+  assert_equal expected_recs, CsvHuman.parse( recs2 )
   CsvHuman.parse( recs ).each do |rec|
     pp rec

data/test/test_tags.rb ADDED

@@ -0,0 +1,106 @@
+# encoding: utf-8
+###
+#  to run use
+#     ruby -I ./lib -I ./test test/test_tags.rb
+require 'helper'
+class TestTags < MiniTest::Test
+def split( value )
+  CsvHuman::Tag.split( value )  ## returns an array of strings (name+attributes[])
+end
+def normalize( value )
+  CsvHuman::Tag.normalize( value )   ## returns a string
+end
+def parse( value )
+  CsvHuman::Tag.parse( value )   ## returns a Tag class
+end
+def test_split
+  assert_equal [], split( "" )   # empty
+  assert_equal [], split( "     " )   # empty
+  ## more empties (all matched by separator regex/pattern)
+  ##   keep as empty - why? why not?
+  assert_equal [], split( " #    " )   # empty
+  assert_equal [], split( " ##   " )   # empty
+  assert_equal [], split( " +   " )   # empty
+  assert_equal [], split( " +++ " )   # empty
+  assert_equal [], split( " +++## " )   # empty
+  assert_equal ["sector", "en"], split( "#sector+en" )
+  assert_equal ["sector", "en"], split( "#SECTOR EN" )
+  assert_equal ["sector", "en"], split( "  # SECTOR  + EN " )
+  assert_equal ["sector", "en"], split( "SeCtOr en" )
+  assert_equal ["sector", "en"], split( "#sector#en" )
+  assert_equal ["sector", "en"], split( "#sector+#en" )  ## allow (optional) hash for attributes
+  assert_equal ["sector", "en"], split( "##sector#en" )  ## allow hash only for attributes
+  assert_equal ["sector", "en"], split( "# #sector+++ ##en" )  ## allow one or more plus or hashes (typos) for attributes
+  assert_equal ["adm1", "code"], split( "#ADM1 +CODE" )
+  assert_equal ["adm1", "code"], split( " # ADM1 + CODE" )
+  assert_equal ["adm1", "code"], split( "ADM1 CODE" )
+  ## sort attributes a-to-z
+  assert_equal ["affected", "children", "f"], split( "#affected +f +children" )
+  assert_equal ["population", "affected", "children", "m"], split( "#population +children +affected +m" )
+  assert_equal ["population", "affected", "children", "m"], split( "#population+children+affected+m" )
+  assert_equal ["population", "affected", "children", "m"], split( "#population+#children+#affected+#m" )
+  assert_equal ["population", "affected", "children", "m"], split( "#population #children #affected #m" )
+  assert_equal ["population", "affected", "children", "m"], split( "POPULATION CHILDREN AFFECTED M" )
+end
+def test_normalize
+  assert_equal "", normalize( "" )   # empty
+  assert_equal "", normalize( "   " )   # empty
+  assert_equal "#sector +en", normalize( "#sector+en" )
+  assert_equal "#sector +en", normalize( "#SECTOR EN" )
+  assert_equal "#sector +en", normalize( "  # SECTOR  + EN " )
+  assert_equal "#sector +en", normalize( "  # SECTOR  # EN " )
+  assert_equal "#sector +en", normalize( "SeCToR en" )
+  assert_equal "#adm1 +code", normalize( "#ADM1 +CODE" )
+  assert_equal "#adm1 +code", normalize( " # ADM1 + CODE" )
+  assert_equal "#adm1 +code", normalize( " # ADM1 + #CODE" )
+  assert_equal "#adm1 +code", normalize( "ADM1 Code" )
+  ## sort attributes a-to-z
+  assert_equal "#affected +children +f", normalize( "#affected +f +children" )
+  assert_equal "#population +affected +children +m", normalize( "#population +children +affected +m" )
+  assert_equal "#population +affected +children +m", normalize( "#population+children+affected+m" )
+  assert_equal "#population +affected +children +m", normalize( "POPULATION CHILDREN AFFECTED M" )
+end
+def test_parse
+  tag = parse( "#sector+en" )
+  assert_equal "#sector +en", tag.to_s
+  assert_equal "sector",      tag.name
+  assert_equal ["en"],        tag.attributes
+  assert_equal String,        tag.type
+  assert_equal "#sector +en", parse( "#SECTOR EN" ).to_s
+  assert_equal "#sector +en", parse( "  # SECTOR  + EN " ).to_s
+  tag = parse( "#adm1" )
+  assert_equal "#adm1", tag.to_s
+  assert_equal "adm1",  tag.name
+  assert_equal [],      tag.attributes
+  assert_equal String,  tag.type
+  assert_equal "#adm1", parse( "ADM1" ).to_s
+end
+end # class TestTags
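
`test_tags.rb` covers `split`, `normalize` and `parse`, but not `Tag#typecast`, which the reader does not call yet (see the `todo/fix` comment in `reader.rb`). The sketch copies the Integer path (`conv_to_i`) out of `tag.rb` to show how Ruby's `Kernel#Integer` behaves on typical cell values. Note that a zero-padded value such as `"010"` is read as octal. `conv_to_i_decimal` is a hypothetical variant that forces base 10; it is not part of the gem. Values typed as `Date` currently pass through `typecast` unchanged.

```ruby
# Copy of Tag#conv_to_i from tag.rb (0.2.0)
def conv_to_i( value )
  if value.nil? || value.empty?
    nil               # empty cells become nil
  else
    Integer( value )  # strict: raises ArgumentError on non-numeric text
  end
end

# Hypothetical variant: always parse as decimal, so "010" => 10
def conv_to_i_decimal( value )
  value.nil? || value.empty? ? nil : Integer( value, 10 )
end

p conv_to_i( "250" )          # prints 250
p conv_to_i( "" )             # prints nil
p conv_to_i( "010" )          # prints 8 (leading zero means octal)
p conv_to_i_decimal( "010" )  # prints 10
# conv_to_i( "n/a" )          # would raise ArgumentError
```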

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: csvhuman
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.2.0
 platform: ruby
 authors:
 - Gerald Bauer
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-11-06 00:00:00.000000000 Z
+date: 2018-11-10 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: csvreader
@@ -68,11 +68,14 @@ files:
 - README.md
 - Rakefile
 - lib/csvhuman.rb
+- lib/csvhuman/column.rb
 - lib/csvhuman/reader.rb
+- lib/csvhuman/tag.rb
 - lib/csvhuman/version.rb
 - test/data/test.csv
 - test/helper.rb
 - test/test_reader.rb
+- test/test_tags.rb
 homepage: https://github.com/csvreader/csvhuman
 licenses:
 - Public Domain