RubyGems - dreader - Versions diffs - 0.5.0 → 1.0.0 - Mend

dreader 0.5.0 → 1.0.0

Files changed (21) hide show

checksums.yaml +4 -4
data/CHANGELOG.ORG +45 -0
data/Gemfile.lock +20 -7
data/README.org +794 -0
data/dreader.gemspec +6 -4
data/examples/age/age.rb +22 -6
data/examples/age_with_multiple_checks/Birthdays.ods +0 -0
data/examples/age_with_multiple_checks/age_with_multiple_checks.rb +62 -0
data/examples/template/template_generation.rb +37 -0
data/examples/wikipedia_big_us_cities/big_us_cities.rb +20 -18
data/examples/wikipedia_us_cities/us_cities.rb +28 -27
data/examples/wikipedia_us_cities/us_cities_bulk_declare.rb +22 -22
data/lib/dreader/column.rb +39 -0
data/lib/dreader/engine.rb +473 -0
data/lib/dreader/options.rb +16 -0
data/lib/dreader/util.rb +71 -0
data/lib/dreader/version.rb +1 -1
data/lib/dreader.rb +5 -411
metadata +59 -24
data/Changelog.org +0 -20
data/README.md +0 -469

data/README.md DELETED Viewed

@@ -1,469 +0,0 @@
-# Dreader
-A simple DSL built on top of [Roo](https://github.com/roo-rb/roo) to
-read and process tabular data (CSV, LibreOffice, Excel).
-This gem allows you to:
-1. specify the structure of some tabular data you want to process
-2. debug and check correctness of the data you read
-3. read a file and process it, that is, execute code for each cell and
-   each row of the file
-If your data require elaborations which cannot be performed line by
-line, you can also access all the data read by the gem and manipulate
-it as you need.
-The input data can be in CSV (comma or tab separated), LibreOffice,
-and Excel.
-We use it to import data into Rails application, but the gem can used
-in any Ruby application.
-The gem should be relatively easy to use, despite its name: *dread*
-stands for *d*ata *r*eader.
-The gem depends on `roo`, from which we leverage all data
-reading/parsing facilities and which allows us to achieve what we want
-in about 250 lines of code.
-## Installation
-Add this line to your application's Gemfile:
-```ruby
-gem 'dreader'
-```
-And then execute:
-    $ bundle
-Or install it yourself as:
-    $ gem install dreader
-## Usage
-### Declare what file you want to read
-Require `dreader` and declare an instance of the `Dreader::Engine` class:
-```ruby
-require 'dreader'
-i = Dreader::Engine.new
-```
-Specify parsing option, using the following syntax:
-```ruby
-i.options do
-  filename 'example.ods'
-  sheet 'Sheet 1'
-  first_row 1
-  last_row 20
-end
-```
-where:
-* (optional) `filename` is the file to read.  If not specified, you
-  will have to supply a filename when loading the file (see `read`,
-  below).  The extension determines the file type.  **Use `tsv` for
-  tab-separated files.**
-* (optional) `first_row` is the first line to read (use `2` if your
-  file has a header)
-* (optional) `last_row` is the last line to read. If not specified, we
-  will rely on `roo` to determine the last row
-* (optional) `sheet` is the sheet name or number to read from. If not
-  specified, the first (default) sheet is used
-### Declare the columns you want to read
-Declare the columns you want to read by assigning them a name and a
-column reference:
-```ruby
-# we will access column A in Ruby code using :name
-i.column :name do
-  colref 'A'
-end
-```
-You can also specify two ruby blocks, `process` and `check` to
-preprocess data and to check for errors.
-For instance, given the following file:
-| Name             | Date of birth   |
-|------------------|-----------------|
-| Forest Whitaker  | July 15, 1961   |
-| Daniel Day-Lewis | April 29, 1957  |
-| Sean Penn        | August 17, 1960 |
-we could use the following declaration to specify the data to read:
-```ruby
-# we want to access column 1 using :name
-# :name should be non nil and of length greater than 0
-i.column :name do
-  colref 1
-  check do |x|
-    x and x.length > 0
-  end
-end
-# we want to access column 2 (Date of birth) using :birthdate
-i.column :birthdate do
-  colref 2
-  # make sure the column is transformed into an integer
-  process do |x|
-    Date.parse(x)
-  end
-  # check age is a date (check is invoked on the value returned
-  # by process)
-  check do |x|
-    x.class == Date
-  end
-end
-# we don't care about any other column (and, therefore,
-# we are done with our declarations)
-```
-If there are different columns that you want to read and process in
-the same way, you can use the method `bulk_declare`, which accepts a
-hash as input.
-For instance:
-```ruby
-i.bulk_declare {a: 'A', b: 'B'}
-```
-is equivalent to:
-```ruby
-i.column :a do
-  colref 'A'
-end
-i.column :b do
-  colref 'B'
-end
-```
-The method also accepts a code block, which allows to define a common
-`process` function for all columns.  In case, **don't forget to put
-the hash in parentheses, or the Ruby parser won't be able to
-distinguish the hash from the code block.** For instance:
-```ruby
-i.bulk_declare({a: 'A', b: 'B'}) do
-  process do |cell|
-    ...
-  end
-end
-```
-There is an example of `bulk_declare` in the examples directory:
-([us_cities_bulk_declare.rb](examples/wikipedia_us_cities/us_cities_bulk_declare.rb)).
-**Remarks:**
-1. the column name can be anything ruby can use as a Hash key.  You
-   can use symbols, strings, and even object instances, if you wish to
-   do so.
-2. `colref` can be a string (e.g., `'A'`) or an integer, in which case
-   the first column is one
-3. **you need to declare only the columns you want to import.**  For
-   instance, we could skip the declaration for column 1, if 'Date of
-   Birth' is the only data we want to import
-4. If `process` and `check` are specified, then `check` will receive
-   the result of invoking `process` on the cell value.  This makes
-   sense if process is used to make the cell value more accessible to
-   ruby code (e.g., transforming a string into an integer).
-### Add virtual columns, if you want
-Sometimes it is convenient to aggregate or otherwise manipulate the
-data read from each row before doing the actual processing.
-For instance, we might have a table with dates of birth, while we are
-really interested in the age of people.
-In such cases, we can use virtual column. A **virtual column** allows
-one to add a column to the data read.  The value of the column for
-each row is computed using the values of other cells.
-Virtual columns are declared similar to columns.  Thus, for instance,
-the following declaration adds an `age` column to each row of the data
-we read from the previous example:
-```ruby
-i.virtual_column :age do
-  process do |row|
-    # `compute_birthday` has to be defined
-    compute_birthday(row[:birthdate])
-  end
-end
-```
-Virtual columns are, of course, available to the `mapping` directive
-(see below).
-### Specify how to process data
-Finally we can specify how we process lines, using the `mapping`
-directive.  Mapping takes an arbitrary piece of ruby code, which can
-reference the fields of a row.
-For instance:
-```ruby
-i.mapping do |row|
-  puts "#{row[:name][:value]} is #{row[:age][:value]} years old"
-end
-```
-Notice that the data read from each row of our input data is stored in
-a hash.  The hash uses column names as the primary key and stores
-the values in the `:value` key.
-### Start working with the data
-We are now all set and we can start working with the data.
-First use `read` or `load` (synonyms), to read all data and put it
-into a `@table` instance variable.
-```ruby
-i.read
-```
-After reading the file we can use `errors` to see whether any of the
-`check` functions failed:
-```ruby
-array_of_strings = i.errors
-array_of_strings ech do |error_line|
-  puts error_line
-end
-```
-We can then use `virtual_columns` to process data and generate the
-virtual columns:
-```ruby
-i.virtual_columns
-```
-Finally we can use the `process` function to execute the `mapping`
-directive to each line read from the file.
-```ruby
-i.process
-```
-Look in the examples directory for further details and a couple of
-working examples.
-**Remark.** You can override some of the defaults by passing a hash as
-argument to read.  For instance:
-```ruby
-i.read filename: another_filepath
-```
-will read data from `another_filepath`, rather than from the filename
-specified in the options.  This might be useful, for instance, if the
-same specification has to be used for different files.
-## Digging deeper
-If you need to perform more elaborations on the data which cannot be
-captured with `process` (that is, by processing the data row by row),
-you can also directly access all data read, using the `table` method:
-```ruby
-i.read
-i.table
-# an array of hashes (one hash per row)
-```
-More in details, the `read` method fills a `@table` instance variable
-with an array of hashes.  Each hash represents a line of the file.
-Each hash contains one key per column, following your specification.
-Its value is, in turn, a hash with the following structure:
-```ruby
-{
-  value: ...,     # the result of calling process on the cell
-  row_number: ... # the row number
-  col_number: ... # the column number
-  error: ...      # the result of calling check on the cell processed value
-}
-```
-(Note that virtual columns only store `value` and a Boolean `virtual`,
-which is always `true`.)
-Thus, for instance, given the example above:
-```ruby
-i.table
-[ { name: { value: "John", row_number: 1, col_number: 1, errors: nil },
-    age:  { value: 30, row_number: 1, col_number: 2, errors: nil } },
-  { name: { value: "Jane", row_number: 2, col_number: 1, errors: nil },
-    age:  { value: 31, row_number: 2, col_number: 2, errors: nil } } ]
-```
-## Simplifying the hash with the data read
-The `Dreader::Util` class provides some functions to simplify and
-restructure the hashes built by `dreader`.
-`Dreader::Util.simplify hash` simplifies the hash passed as input by
-removing all information but the value and making the value
-accessible directly from the name of the column.
-```ruby
-Dreader::Util.simplify i.table[0]
-{name: "John", age: 30}
-```
-`Dreader::Util.slice hash, keys` and `Dreader::Util.slice hash,
-keys`, where `keys` is an arrays of keys, are respectively used to
-select or remove some keys from `hash`.
-```ruby
-i.table[0]
-{ name: { value: "John", row_number: 1, col_number: 1, errors: nil },
-  age:  { value: 30, row_number: 1, col_number: 2, errors: nil }}
-Dreader::Util.slice i.table[0], :name
-{name: { value: "John", row_number: 1, col_number: 1, errors: nil}
-Dreader::Util.clean i.table[0], :name
-{age:  { value: 30, row_number: 1, col_number: 2, errors: nil }
-```
-The methods `slice` and `clean` are more useful when used in
-conjuction with `simplify`:
-```ruby
-hash = Dreader::Util.simplify i.table[0]
-{name: "John", age: 30}
-Dreader::Util.slice hash, [:age]
-{age: 30}
-Dreader::Util.clean hash, [:age]
-{name: "John"}
-```
-Notice that the output produced by `slice` and `simplify` is a has
-which can be used to create an `ActiveRecord` object.
-Finally, the `Dreader::Util.restructure` method helps building hashes
-to create
-[ActiveModel](http://api.rubyonrails.org/classes/ActiveModel/Model.html)
-objects with nested attributes:
-```ruby
-hash = {name: "John", surname: "Doe", address: "Unknown", city: "NY" }
-Dreader::Util.restructure hash, [:name, :surname], :address_attributes, [:address, :city]
-{name: "John", surname: "Doe", address_attributes: {address: "Unknonw", city: "NY"}}
-```
-## Debugging your specification
-If you are not sure about what is going on (like I often am when
-reading tabular data), you can use the `debug` function, which prints
-the current configuration, reads some records from your files, and
-shows them to standard output:
-```ruby
-i.debug
-i.debug n: 40 # read 40 lines (from first_row, if the option is declared)
-i.debug n: 40, filename: filepath # like above, but read from filepath
-```
-Another possibility is getting the value of the `@table` variable,
-which contains all the data read.
-By default `debug` invokes the `process` and `check` directives.  Pass
-the following options, if you want to disable this behavior; this
-might be useful, for instance, if you intend to check only what data
-is read:
-```ruby
-i.debug process: false, debug: false
-```
-Notice that `check` implies `process`, since `check` is invoked on the
-output of the `process` directive.`
-## Changelog
-See [[Changelog]].
-## Known Limitations
-At the moment:
-- it is not possible to specify column references using header names
-  (like Roo does).
-- it is not possible to pass options to the file readers.  As a
-  consequence tab-separated files must have the `.tsv` extension to be
-  correctly parsed.
-- some testing wouldn't hurt.
-## Known Bugs
-No known bugs and an unknown number of unknown bugs.
-(See the open issues for the known bugs.)
-## Development
-After checking out the repo, run `bin/setup` to install dependencies. You can
-also run `bin/console` for an interactive prompt that will allow you to
-experiment.
-To install this gem onto your local machine, run `bundle exec rake
-install`. To release a new version, update the version number in `version.rb`,
-and then run `bundle exec rake release`, which will create a git tag for the
-version, push git commits and tags, and push the `.gem` file to
-[rubygems.org](https://rubygems.org).
-## Contributing
-Bug reports and pull requests are welcome on GitHub at
-https://github.com/avillafiorita/dreader.
-## License
-The gem is available as open source under the terms of the [MIT
-License](https://opensource.org/licenses/MIT).