RubyGems - bio-table - Versions diffs - 0.0.1 - Mend

bio-table 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (29)

data/.document +5 -0
data/.rspec +1 -0
data/.travis.yml +12 -0
data/Gemfile +18 -0
data/LICENSE.txt +20 -0
data/README.md +283 -0
data/Rakefile +55 -0
data/VERSION +1 -0
data/bin/bio-table +141 -0
data/features/bio-table-csv-reader-feature.rb +28 -0
data/features/bio-table-csv-reader.feature +22 -0
data/features/step_definitions/bio-table_steps.rb +0 -0
data/features/support/env.rb +13 -0
data/lib/bio-table.rb +23 -0
data/lib/bio-table/columns.rb +11 -0
data/lib/bio-table/diff.rb +30 -0
data/lib/bio-table/filter.rb +57 -0
data/lib/bio-table/formatter.rb +28 -0
data/lib/bio-table/overlap.rb +1 -0
data/lib/bio-table/parser.rb +22 -0
data/lib/bio-table/table.rb +121 -0
data/lib/bio-table/tablereader.rb +13 -0
data/lib/bio-table/tablerow.rb +29 -0
data/lib/bio-table/tablewriter.rb +6 -0
data/lib/bio-table/validator.rb +26 -0
data/spec/bio-table_spec.rb +7 -0
data/spec/spec_helper.rb +12 -0
data/test/data/input/table1.csv +381 -0
metadata +168 -0

data/.document ADDED

@@ -0,0 +1,5 @@
+lib/**/*.rb
+bin/*
+-
+features/**/*.feature
+LICENSE.txt

data/.rspec ADDED

@@ -0,0 +1 @@
+--color

data/.travis.yml ADDED

@@ -0,0 +1,12 @@
+language: ruby
+rvm:
+  - 1.9.2
+  - 1.9.3
+  - jruby-19mode # JRuby in 1.9 mode
+  - rbx-19mode
+#  - 1.8.7
+#  - jruby-18mode # JRuby in 1.8 mode
+#  - rbx-18mode
+# uncomment this line if your project needs to run something other than `rake`:
+# script: bundle exec rspec spec

data/Gemfile ADDED

@@ -0,0 +1,18 @@
+source "http://rubygems.org"
+# Add dependencies required to use your gem here.
+# Example:
+#   gem "activesupport", ">= 2.3.5"
+gem "bio-logger"
+# Add dependencies to develop your gem here.
+# Include everything needed to run rake, tests, features, etc.
+group :development do
+  gem "rspec", "~> 2.8.0"
+  gem "rdoc", "~> 3.12"
+  gem "cucumber", ">= 0"
+  gem "bundler", "> 1.0.0"
+  gem "jeweler", "~> 1.8.3"
+  gem "bio", ">= 1.4.2"
+end

data/LICENSE.txt ADDED

@@ -0,0 +1,20 @@
+Copyright (c) 2012 Pjotr Prins
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,283 @@
+# bio-table
+[![Build Status](https://secure.travis-ci.org/pjotrp/bioruby-table.png)](http://travis-ci.org/pjotrp/bioruby-table)
+Tables of data are often used in bioinformatics, especially in
+conjunction with Excel spreadsheets and DB queries. This biogem
+contains support for reading tables, writing tables, and manipulation
+of rows and columns, both using a command line interface and through a
+Ruby library. If you don't like R dataframes, maybe you like this.
+Also, because bio-table is command line driven, it easily fits in a
+pipeline setup.
+A quick example: say we want to keep only the rows whose p-value, listed
+in the 4th column, is at most 0.05:
+```
+    bio-table test/data/input/table1.csv --num-filter "values[3] <= 0.05"
+```
+bio-table should be lazy, be good for big data, and the library
+should support a functional style of programming. You don't need to know Ruby
+to use the command line interface (CLI).
+Note: this software is under active development!
+## Installation
+```sh
+    gem install bio-table
+```
+## The command line interface (CLI)
+### Transforming a table
+Tables can be transformed through the command line. To transform a
+comma separated file to a tab delimited one
+```
+    bio-table test/data/input/table1.csv --in-format csv --format tab > test1.tab
+```
+Tab is the general default format. However, if the input file name ends in
+csv, bio-table will assume CSV. To convert the table back
+```
+    bio-table test1.tab --format csv > table1.csv
+```
+To keep only the rows that match a numeric condition
+```
+    bio-table test/data/input/table1.csv --num-filter "values[3] <= 0.05" > test1a.tab
+```
+The filter ignores the header row and the row names. If you need
+either, use the switches --with-header and --with-rownames. Filters can also use arithmetic, e.g. list all rows where values[3] exceeds values[6] by at least 0.05
+```
+    bio-table test/data/input/table1.csv --num-filter "values[3]-values[6] >= 0.05" > test1a.tab
+```
+or, list all rows that have at least one field with a value >= 1000.0
+```
+    bio-table test/data/input/table1.csv --num-filter "values.max >= 1000.0" > test1a.tab
+```
+Produce all rows that have more than 3 values of at least 3.0 and at least
+one value of at least 10.0:
+```
+    bio-table test/data/input/table1.csv --num-filter "values.max >= 10.0 and values.count{|x| x>=3.0} > 3"
+```
+How is that for expressiveness? Looks like Ruby to me.
+The --num-filter will convert fields lazily to numerical values (only
+valid numbers are converted). If there are NA (nil) values in the table, you
+may wish to remove them, like this
+```
+    bio-table test/data/input/table1.csv --num-filter "values[0..12].compact.max >= 1000.0" > test1a.tab
+```
+which takes the first 13 fields and compact removes the nil values.
+Also string comparisons and regular expressions can be used. E.g.
+filter on rownames and a row field both containing 'BGT'
+```
+    # not yet implemented
+    bio-table test/data/input/table1.csv --filter "rowname =~ /BGT/ and field[1] =~ /BGT/" > test1a.tab
+```
+To reorder/reduce table columns by name
+```
+    bio-table test/data/input/table1.csv --columns AJ,B6,Axb1,Axb4,AXB13,Axb15,Axb19 > test1a.tab
+```
+or use their index numbers
+```
+    bio-table test/data/input/table1.csv --columns 0,1,8,2,4,6 > test1a.tab
+```
+### Sorting a table
+To sort a table on column 4 and 2
+```
+    # not yet implemented
+    bio-table test/data/input/table1.csv --sort 4,2 > test1a.tab
+```
+Note: not all is implemented (just yet). Please check bio-table --help first.
+### Combining a table
+You can combine/concat tables by passing in multiple file names
+    bio-table test/data/input/table1.csv test/data/input/table2.csv
+assuming they have the same headers (you can use the --columns switch!)
+### Splitting a table
+Splitting a table by column is possible by named or indexed columns,
+see the --columns switch.
+more soon
+### Diffing and overlapping tables
+With two tables it may be interesting to see the differences, or
+overlap, based on shared columns. The bio-table diff command shows the
+difference between two tables using the row names (i.e. those rows
+with rownames that appear in table2, but not in table1)
+    bio-table --diff 0 table1.csv table2.csv
+To find it the other way, switch the file names
+    bio-table --diff 0 table2.csv table1.csv
+To diff on something else
+    bio-table --diff 0,3 table2.csv table1.csv
+creates a (hopefully unique) key using columns 0 and 3 (0 is the rownames column).
+Similarly
+    bio-table --overlap 2 table1.csv table2.csv
+finds the overlapping rows, based on column 2 (not yet implemented)
+### Different parsers
+more soon
+## Usage
+```ruby
+    require 'bio-table'
+    include BioTable
+```
+### Reading, transforming, and writing a table
+Note: the Ruby API below is a work in progress.
+Tables are two-dimensional matrices, which can be read from a file
+```
+    t = Table.read_file('test/data/input/table1.csv')
+    p t.header              # print the header array
+    p t.name[0],t[0]        # print the first row's name and the row itself
+    p t[0][0]               # print the top corner field
+```
+The table reader has quite a few options for defining field separator,
+which column to use for names etc. More interestingly you can pass a
+function to limit how much of each row is read into memory:
+```
+    t = Table.read_file('test/data/input/table1.csv',
+      :by_row => lambda { | row | row[0..3] } )
+```
+will create a table that keeps only the first four elements (row[0..3]) of each row, with row[0] used as the row name. You can use
+the same idea to reformat and reorder table columns when reading data
+into the table. E.g.
+```
+    t = Table.read_file('test/data/input/table1.csv',
+      :by_row => lambda { | row | [row.rowname, row[0..3], row[6].to_i].flatten } )
+```
+When a header can not be transformed, it may fail. You can test for
+the header with row.header?, but in this case you
+can pass in a :by_header, which will have :by_row only call on
+actual table rows.
+```
+    t = Table.read_file('test/data/input/table1.csv',
+      :by_header => lambda { | header | ["Row name", header[0..3], header[6]].flatten },
+      :by_row => lambda { | row | [row.rowname, row[0..3], row[6].to_i].flatten } )
+```
+When by_row returns nil or false, the table row is skipped. One way to
+transform a file without loading it into memory is
+```
+    f = File.new('test.tab','w')
+    t = Table.read_file('test/data/input/table1.csv',
+      :by_row => lambda { | row |
+        TableRow::write(f,[row.rowname,row[0..3],row[6].to_i].flatten, :separator => "\t")
+        nil   # don't create a table in memory, effectively a filter
+      })
+```
+Another function is :filter which only acts on rows, but can not
+transform them.
+To write a full table from memory to file use
+```
+    t.write_file('test1a.csv')
+```
+Again, columns can be reordered/transformed using a function. Another
+option is to pass in a list of column numbers or header names, so
+only those get written, e.g.
+```
+    t.write_file('test1a.csv', columns: [0,1,2,4,6,8])
+    t.write_file('test1b.csv', columns: ["AJ","B6","Axb1","Axb4","AXB13","Axb15","Axb19"] )
+```
+other options are available for excluding row names (rownames: false), etc.
+To sort a table file, the current routine is to load the file in
+memory and sort according to table columns. In the near future we aim
+to have a low-memory version, by reading only the sorting columns in
+memory, and indexing them before writing output. That means reading a
+file twice, but being able to handle much larger data.
+### Loading a numerical matrix
+Coming soon
+### More...
+The API doc is online. For more code examples see the test files in
+the source tree.
+## Project home page
+For information on the source tree, documentation, examples, issues and
+how to contribute, see
+  http://github.com/pjotrp/bioruby-table
+The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
+## Cite
+If you use this software, please cite one of
+* [BioRuby: bioinformatics software for the Ruby programming language](http://dx.doi.org/10.1093/bioinformatics/btq475)
+* [Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics](http://dx.doi.org/10.1093/bioinformatics/bts080)
+## Biogems.info
+This Biogem is published at [#bio-table](http://biogems.info/index.html)
+## Copyright
+Copyright (c) 2012 Pjotr Prins. See LICENSE.txt for further details.
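
Reviewer note on the README's --num-filter examples: the filter string is an ordinary Ruby expression evaluated against an array called `values`, in which fields that are not valid numbers become nil. The sketch below is only an illustration of that idea under stated assumptions. The helper names `to_numeric` and `num_filter?` are hypothetical and are not taken from lib/bio-table/filter.rb, whose contents are not shown here. It also converts every field up front, whereas the README describes the real conversion as lazy.

```ruby
# Hypothetical helpers illustrating the --num-filter idea; not the gem's code.

# Convert a field to a Float when it is a valid number, otherwise nil (NA).
def to_numeric(field)
  Float(field)
rescue ArgumentError, TypeError
  nil
end

# Evaluate a filter expression with `values` bound to the numeric fields.
# eval runs arbitrary Ruby, so this is only suitable for trusted input.
def num_filter?(fields, expression)
  values = fields.map { |f| to_numeric(f) }
  !!eval(expression, binding) # the binding exposes the local `values`
end

rows = [
  ["0.5", "12.0", "3.1", "0.01"],
  ["0.7", "NA",   "2.2", "0.20"],
]

# Keep only the rows whose 4th value is at most 0.05.
kept = rows.select { |r| num_filter?(r, "values[3] <= 0.05") }
p kept
```

Because "NA" becomes nil, an expression such as `values.compact.max >= 2.0` is needed whenever a row may contain missing values. Calling `max` on an array that contains nil would raise an error.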

data/Rakefile ADDED

@@ -0,0 +1,55 @@
+# encoding: utf-8
+require 'rubygems'
+require 'bundler'
+begin
+  Bundler.setup(:default, :development)
+rescue Bundler::BundlerError => e
+  $stderr.puts e.message
+  $stderr.puts "Run `bundle install` to install missing gems"
+  exit e.status_code
+end
+require 'rake'
+require 'jeweler'
+Jeweler::Tasks.new do |gem|
+  # gem is a Gem::Specification... see http://docs.rubygems.org/read/chapter/20 for more options
+  gem.name = "bio-table"
+  gem.homepage = "http://github.com/pjotrp/bioruby-table"
+  gem.license = "MIT"
+  gem.summary = %Q{Transforming/filtering tab/csv files}
+  gem.description = %Q{Functions and tools for transforming and changing tab delimited and comma separated table files - useful for Excel sheets and SQL/RDF output}
+  gem.email = "pjotr.public01@thebird.nl"
+  gem.authors = ["Pjotr Prins"]
+  # dependencies defined in Gemfile
+end
+Jeweler::RubygemsDotOrgTasks.new
+require 'rspec/core'
+require 'rspec/core/rake_task'
+RSpec::Core::RakeTask.new(:spec) do |spec|
+  spec.pattern = FileList['spec/**/*_spec.rb']
+end
+RSpec::Core::RakeTask.new(:rcov) do |spec|
+  spec.pattern = 'spec/**/*_spec.rb'
+  spec.rcov = true
+end
+require 'cucumber/rake/task'
+Cucumber::Rake::Task.new do |features|
+end
+task :test => [ :cucumber ]
+task :default => :test
+require 'rdoc/task'
+Rake::RDocTask.new do |rdoc|
+  version = File.exist?('VERSION') ? File.read('VERSION') : ""
+  rdoc.rdoc_dir = 'rdoc'
+  rdoc.title = "bio-table #{version}"
+  rdoc.rdoc_files.include('README*')
+  rdoc.rdoc_files.include('lib/**/*.rb')
+end

data/VERSION ADDED

@@ -0,0 +1 @@
+0.0.1

data/bin/bio-table ADDED

@@ -0,0 +1,141 @@
+#!/usr/bin/env ruby
+#
+# BioRuby bio-table Plugin BioTable
+# Author:: Pjotr Prins
+# Copyright:: 2012
+rootpath = File.dirname(File.dirname(__FILE__))
+$: << File.join(rootpath,'lib')
+_VERSION = File.new(File.join(rootpath,'VERSION')).read.chomp
+$stderr.print "bio-table "+_VERSION+" Copyright (C) 2012 Pjotr Prins <pjotr.prins@thebird.nl>\n\n"
+USAGE =<<EOU
+bio-table transforms, filters and reorders table files (CSV, tab-delimited).
+EOU
+if ARGV.size == 0
+  print USAGE
+end
+require 'bio-table'
+require 'optparse'
+require 'bio-logger'
+log = Bio::Log::LoggerPlus.new 'bio-table'
+# log.outputters = Bio::Log::Outputter.stderr
+Bio::Log::CLI.logger('stderr')
+Bio::Log::CLI.trace('info')
+options = {show_help: false, write_header: true}
+options[:show_help] = true if ARGV.size == 0
+opts = OptionParser.new do |o|
+  o.banner = "Usage: #{File.basename($0)} [options] filename\n\n"
+  o.on('--in-format [tab,csv]', [:tab, :csv], 'Input format (default tab)') do |par|
+    options[:in_format] = par.to_sym
+  end
+  o.on('--format [tab,csv]', [:tab, :csv], 'Output format (default tab)') do |par|
+    options[:format] = par.to_sym
+  end
+  o.on('--num-filter func', 'Numeric filtering function') do |par|
+    options[:num_filter] = par
+  end
+  o.on('--columns list', Array, 'List of column names or indices') do |l|
+    options[:columns] = l
+  end
+  o.on('--diff list',Array,'Diff two input files on columns') do |l|
+    options[:diff] = l
+  end
+  o.on('--overlap list',Array,'Find overlap of two input files on columns') do |l|
+    options[:overlap] = l
+  end
+  # o.on('--with-header','Include the header element in filtering etc.') do
+  #   options[:with_header] = true
+  # end
+  o.on('--with-rownames','Include the rownames in filtering etc.') do
+    options[:with_rownames] = true
+  end
+  o.separator ""
+  o.on("--logger filename",String,"Log to file (default stderr)") do | name |
+    Bio::Log::CLI.logger(name)
+  end
+  o.on("--trace options",String,"Set log level (default INFO, see bio-logger)") do | s |
+    Bio::Log::CLI.trace(s)
+  end
+  o.on("-q", "--quiet", "Run quietly") do |q|
+    Bio::Log::CLI.trace('error')
+  end
+  o.on("-v", "--verbose", "Run verbosely") do |v|
+    Bio::Log::CLI.trace('info')
+  end
+  o.on("--debug", "Show debug messages") do |v|
+    Bio::Log::CLI.trace('debug')
+  end
+  o.separator ""
+  o.on_tail('-h', '--help', 'Display this help and exit') do
+    options[:show_help] = true
+  end
+end
+begin
+  opts.parse!(ARGV)
+  if options[:show_help]
+    print opts
+    print USAGE
+  end
+  # TODO: your code here
+  # use options for your logic
+rescue OptionParser::InvalidOption => e
+  options[:invalid_argument] = e.message
+end
+Bio::Log::CLI.configure('bio-table')
+logger = Bio::Log::LoggerPlus['bio-table']
+logger.info [options, ARGV]
+include BioTable
+if options[:diff]
+  logger.warn "Column settings are ignored for --diff" if options[:columns]
+  logger.warn "Ignoring extraneous files" if ARGV.size>2
+  t1 = TableReader::read_file(ARGV[0], options)
+  t2 = TableReader::read_file(ARGV[1], options)
+  t = Diff::diff_tables(t1,t2, options)
+  t.write(options)
+  exit
+end
+if options[:overlap]
+  logger.warn "Column settings are ignored for --overlap" if options[:columns]
+  exit
+end
+ARGV.each do | fn |
+  t = TableReader::read_file(fn, options)
+  t.write(options)
+  options[:write_header] = false  # don't write the header for chained files
+end
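
Reviewer note on the --diff branch above: the README describes the diff as the rows whose key (built from the listed columns) appears in the second table but not in the first. The sketch below shows one way that semantics could work on plain arrays of rows. `row_key` and `diff_rows` are hypothetical names for illustration only. The actual lib/bio-table/diff.rb is not shown in this section and may differ.

```ruby
# Hypothetical sketch of a key-based table diff; not the gem's implementation.

# Build a key from the chosen column indices of a row.
def row_key(row, columns)
  columns.map { |i| row[i] }
end

# Return the rows of table2 whose key does not occur in table1.
def diff_rows(table1, table2, columns)
  seen = {}
  table1.each { |row| seen[row_key(row, columns)] = true }
  table2.reject { |row| seen.key?(row_key(row, columns)) }
end

t1 = [["gene1", "1.0"], ["gene2", "2.0"]]
t2 = [["gene2", "2.5"], ["gene3", "3.0"]]

# Diff on column 0 (the row names): only gene3 is new in t2.
result = diff_rows(t1, t2, [0])
p result
```

Note that OptionParser's Array conversion yields strings (for example "0" and "3"), so a caller passing `options[:diff]` to a helper like this would first need something like `map(&:to_i)`.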