mikon 0.1.0.rc1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+ metadata.gz: b1231fd8b6e9190c0b931b73b4aaaa65ff7edf82
+ data.tar.gz: 51bb7dc183b01f776257ab967cc2375bf4f91c85
+ SHA512:
+ metadata.gz: 5ee069b432821407f437189b40c1f0b5dc5f16db44d3e4b3908845c543f847ea84e5d757146d1ef5c5ddbd311cfae175a1b0382fb7fb65307fb54c9f93845d33
+ data.tar.gz: a635fc8e8a5538564d8edbf38c9831782777ff0807e2d49129d4c69f40617728a42efd557596c804bf4123ad9bd4e9246859b47e7c6d4ea85f5353c7eedcbeee
data/.gitignore ADDED
@@ -0,0 +1,18 @@
+ *.gem
+ *.rbc
+ .bundle
+ .config
+ .yardoc
+ Gemfile.lock
+ InstalledFiles
+ _yardoc
+ coverage
+ doc/
+ lib/bundler/man
+ pkg
+ rdoc
+ spec/reports
+ test/tmp
+ test/version_tmp
+ tmp
+ *~
data/Gemfile ADDED
@@ -0,0 +1,4 @@
+ source 'https://rubygems.org'
+ # Specify your gem's dependencies in mikon.gemspec
+ gemspec
@@ -0,0 +1,22 @@
+ Copyright (c) 2014 Naoki Nishida
+ MIT License
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
@@ -0,0 +1,160 @@
+ # ☗ Mikon
+ ![top](https://dl.dropboxusercontent.com/u/47978121/mikon/top.png)
+ Mikon is a flexible data structure for Ruby, inspired by R's data.frame and Python's pandas.
+ Its goal is to make it easy to manipulate real data, apply statistical functions to it, and visualize the results in Ruby.
+ It is compatible with `Nyaplot::DataFrame` and `Statsample::Vector`, and most methods that both gems provide can be applied to Mikon's data structures.
+ Main Features:
+ * Fast data manipulation with [NMatrix](https://github.com/SciRuby/nmatrix)
+ * Advanced plotting with [Nyaplot](https://github.com/domitry/nyaplot)
+ ## Dependencies
+ * CRuby >= 2.0.0-p451
+ * [NMatrix](https://github.com/SciRuby/nmatrix) >= v0.1.0.rc5
+ * [Formatador](https://github.com/geemus/formatador) >= 0.2.5
+ ### Optional Dependencies
+ * [Nyaplot](https://github.com/domitry/nyaplot): for plotting
+ * [Statsample](https://github.com/clbustos/statsample): for statistical functions
+ * [IRuby](https://github.com/minad/iruby): for the interactive manipulation of data
+ ## Installation
+ $ gem install mikon
+ ## Examples
+ Notebooks created with [IRuby](https://github.com/minad/iruby):
+ * [Basic Data Manipulation](http://nbviewer.ipython.org/urls/dl.dropboxusercontent.com/u/47978121/gsoc/Mikon_Manipuration.ipynb)
+ * [Statistical functions](http://nbviewer.ipython.org/urls/dl.dropboxusercontent.com/u/47978121/gsoc/Mikon_stats.ipynb)
+ * [Plotting](http://nbviewer.ipython.org/urls/dl.dropboxusercontent.com/u/47978121/gsoc/Plotting.ipynb)
+ ## Usage
+ ### Initializing DataFrame
+ ```ruby
+ require 'mikon'
+ df2 = Mikon::DataFrame.new([{a: 1, b: 2}, {a: 2, b: 3}, {a: 3, b: 4}])
+ ```
+ ![init0](https://dl.dropboxusercontent.com/u/47978121/mikon/init0.png)
+ ```ruby
+ Mikon::DataFrame.new({a: [1,2,3,4], b: [2,3,4,5]}, index: [:a, :b, :c, :d])
+ ```
+ ![init1](https://dl.dropboxusercontent.com/u/47978121/mikon/init1.png)
+ ```ruby
+ df = Mikon::DataFrame.from_csv("~/data.csv")
+ ```
+ ![init2](https://dl.dropboxusercontent.com/u/47978121/mikon/init2.png)
+ ### Basic data manipulation
+ ```ruby
+ df[:value]
+ ```
+ ![column_label](https://dl.dropboxusercontent.com/u/47978121/mikon/column_label.png)
+ ```ruby
+ df[10..20]
+ ```
+ ![row_num](https://dl.dropboxusercontent.com/u/47978121/mikon/row_num.png)
+ ```ruby
+ df.head(2)
+ ```
+ ![head](https://dl.dropboxusercontent.com/u/47978121/mikon/head.png)
+ ```ruby
+ df.tail(2)
+ ```
+ ![tail](https://dl.dropboxusercontent.com/u/47978121/mikon/tail.png)
+ ### Row-based data manipulation
+ ```ruby
+ df.select{value > 100}
+ ```
+ ![select](https://dl.dropboxusercontent.com/u/47978121/mikon/select.png)
+ ```ruby
+ df2.map{b+1}.name(:c)
+ ```
+ ![map](https://dl.dropboxusercontent.com/u/47978121/mikon/map.png)
+ ```ruby
+ foo = []
+ df2.each{foo.push(2*a)}
+ p foo #-> [2,4,6]
+ ```
+ ```ruby
+ df.insert_column(:new_value){value * 2}
+ ```
+ ![insert_column](https://dl.dropboxusercontent.com/u/47978121/mikon/insert_column_row.png)
+ ```ruby
+ df.any?{value >= 100} #-> true
+ df.all?{value > 1} #-> false
+ ```
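The blocks above refer to columns by bare name (`value`, `a`, `b`), which suggests that each block is evaluated in the context of a row. The following is a minimal sketch of that idea in plain Ruby, not Mikon's actual implementation; the `Row` class and the `select_rows` helper are hypothetical names introduced only for illustration.

```ruby
# Hypothetical sketch: this is NOT Mikon's implementation.
# A Row wraps a Hash and answers a method call for each key, so a block
# run with instance_eval can refer to columns by bare name.
class Row
  def initialize(hash)
    @hash = hash
  end

  # Resolve a bare name such as `a` to the value stored under :a.
  def method_missing(name, *args)
    return @hash[name] if args.empty? && @hash.key?(name)
    super
  end

  def respond_to_missing?(name, include_private = false)
    @hash.key?(name) || super
  end
end

# Keep the rows for which the block, evaluated against the row, is truthy.
def select_rows(rows, &block)
  rows.select { |hash| Row.new(hash).instance_eval(&block) }
end

rows = [{ a: 1, b: 2 }, { a: 2, b: 3 }, { a: 3, b: 4 }]

# Only the first row satisfies a**2 < b (1 < 2; 4 < 3 and 9 < 4 are false).
selected = select_rows(rows) { a**2 < b }
p selected
```

Evaluating the block with `instance_eval` keeps the call site short, at the cost that a local variable with the same name as a column would take precedence inside the block.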
+ ### Column-based data manipulation
+ In most cases, column-based manipulation is **faster than row-based** manipulation.
+ ```ruby
+ df2[:b] - df2[:a]
+ ```
+ ![column_base0](https://dl.dropboxusercontent.com/u/47978121/mikon/column-base0.png)
+ ```ruby
+ df.insert_column(:new_value, df[:value]*2)
+ ```
+ ![column_base1](https://dl.dropboxusercontent.com/u/47978121/mikon/insert_column_row.png)
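To make the distinction concrete, here is a small sketch in plain Ruby arrays (no Mikon or NMatrix involved) showing that the row-based and column-based forms compute the same result. The `columns` hash and both variable names are assumptions made for illustration; the sketch says nothing about how Mikon implements either form internally.

```ruby
# Hypothetical sketch in plain Ruby: the same data stored column by column.
columns = { a: [1, 2, 3], b: [2, 3, 4] }

# Row-based: rebuild each row, then evaluate an expression once per row.
row_diff = columns[:a].each_index.map do |i|
  row = { a: columns[:a][i], b: columns[:b][i] }
  row[:b] - row[:a]
end

# Column-based: a single pass over two whole columns.
col_diff = columns[:b].zip(columns[:a]).map { |b, a| b - a }

p row_diff # => [1, 1, 1]
p col_diff # => [1, 1, 1]
```

The row-based form allocates an intermediate object per row, which is one plausible reason an operation on whole columns tends to be cheaper.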
+ ### Plotting
+ ```ruby
+ df[:value].plot
+ ```
+ ![hist](https://dl.dropboxusercontent.com/u/47978121/mikon/hist.png)
+ ### Plotting with Nyaplot
+ ```ruby
+ require 'nyaplot'
+ plot = Nyaplot::Plot.new
+ plot.add_with_df(df, :histogram, :value)
+ plot
+ ```
+ ![hist](https://dl.dropboxusercontent.com/u/47978121/mikon/hist.png)
+ ### Statistics with Statsample
+ `Mikon::Series` is compatible with `Statsample::Vector`, so most methods of Statsample can be applied to `Mikon::Series`.
+ ```ruby
+ require 'statsample'
+ Statsample::Analysis.store(Statsample::Test::T) do
+ t_2 = Statsample::Test.t_two_samples_independent(df[:value], df[:new_value])
+ summary t_2
+ end
+ Statsample::Analysis.run_batch
+ ```
+ ![statsample](https://dl.dropboxusercontent.com/u/47978121/mikon/statsample.png)
+ ## License
+ MIT License
+ ## Acknowledgement
+ [Ruby Association Grant 2014](http://www.ruby.or.jp/en/news/20140805.html) has been earmarked for the development of Mikon.
+ ## Contributing
+ 1. Fork it ( http://github.com/domitry/mikon/fork )
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
+ 4. Run the tests with `rspec` in `/path_to_gem/mikon/`
+ 5. Push to the branch (`git push origin my-new-feature`)
+ 6. Create new Pull Request
data/Rakefile ADDED
@@ -0,0 +1 @@
+ require "bundler/gem_tasks"
+ ]
+ }
+ ],
+ "prompt_number": 11
+ },
+ {
+ "cell_type": "code",
+ "collapsed": false,
+ "input": [
+ "2*df1[:value]"
+ ],
+ "language": "python",
+ "metadata": {},
+ "outputs": [
+ {
+ "html": [
+ "<table><tr><th></th><th>value</th></tr><tr><th>0</th><td>7.10468064397404</td></tr><tr><th>1</th><td>9.57325535130944</td></tr><tr><th>2</th><td>7.33606651925336</td></tr><tr><th>3</th><td>4.08530156538588</td></tr><tr><th>...</th><td>...</td></tr><tr><th>4127</th><td>9.9100629476181</td></tr></table>"
+ ],
+ "metadata": {},
+ "output_type": "pyout",
+ "prompt_number": 8,
+ "text": [
+ ]
+ }
+ ],
+ "prompt_number": 8
+ },
+ {
+ "cell_type": "code",
+ "collapsed": false,
+ "input": [
+ "df1[:value] - df1[:value]"
+ ],
+ "language": "python",
+ "metadata": {},
+ "outputs": [
+ {
+ "html": [
+ "<table><tr><th></th><th>value</th></tr><tr><th>0</th><td>0.0</td></tr><tr><th>1</th><td>0.0</td></tr><tr><th>2</th><td>0.0</td></tr><tr><th>3</th><td>0.0</td></tr><tr><th>...</th><td>...</td></tr><tr><th>4127</th><td>0.0</td></tr></table>"
+ ],
+ "metadata": {},
+ "output_type": "pyout",
+ "prompt_number": 7,
+ "text": [
+ ]
+ }
+ ],
+ "prompt_number": 7
+ },
+ {
+ "cell_type": "code",
+ "collapsed": false,
+ "input": [
+ "df1.any?{value > 5}"
+ ],
+ "language": "python",
+ "metadata": {},
+ "outputs": [
+ {
+ "metadata": {},
+ "output_type": "pyout",
+ "prompt_number": 13,
+ "text": [
+ "true"
+ ]
+ }
+ ],
+ "prompt_number": 13
+ },
+ {
+ "cell_type": "code",
+ "collapsed": false,
+ "input": [
+ "df1.all?{value > 1}"
+ ],
+ "language": "python",
+ "metadata": {},
+ "outputs": [
+ {
+ "metadata": {},
+ "output_type": "pyout",
+ "prompt_number": 14,
+ "text": [
+ "false"
+ ]
+ }
+ ],
+ "prompt_number": 14
+ },
+ {
+ "cell_type": "code",
+ "collapsed": false,
+ "input": [
+ "df = Mikon::DataFrame.new({a: [1,2,3], b: [2,3,4]})"
+ ],
+ "language": "python",
+ "metadata": {},
+ "outputs": [
+ {
+ "html": [
+ "<html><table><tr><td></td><th>a</th><th>b</th></tr><tr><th>0</th><td>1</td><td>2</td></tr><tr><th>1</th><td>2</td><td>3</td></tr><tr><th>2</th><td>3</td><td>4</td></tr></table>"
+ ],
+ "metadata": {},
+ "output_type": "pyout",
+ "prompt_number": 9,
+ "text": [
+ "#<Mikon::DataFrame:0xb8551b54 @labels=[:a, :b], @data=[#<Mikon::DArray:0xb8551a14 @data=#<NMatrix:0xb85519b0 shape:[3] dtype:int32 stype:dense>, @dtype=:int32>, #<Mikon::DArray:0xb855199c @data=#<NMatrix:0xb8551938 shape:[3] dtype:int32 stype:dense>, @dtype=:int32>], @index=[0, 1, 2], @name=\"0817dd66-f81c-46dc-8393-8e8821abaa7f\">"
+ ]
+ }
+ ],
+ "prompt_number": 9
+ },
+ {
+ "cell_type": "code",
+ "collapsed": false,
+ "input": [
+ "df[:b] - df[:a]"
+ ],
+ "language": "python",
+ "metadata": {},
+ "outputs": [
+ {
+ "html": [
+ "<table><tr><th></th><th>b</th></tr><tr><th>0</th><td>1</td></tr><tr><th>1</th><td>1</td></tr><tr><th>2</th><td>1</td></tr></table>"
+ ],
+ "metadata": {},
+ "output_type": "pyout",
+ "prompt_number": 10,
+ "text": [
+ "#<Mikon::Series:0xb8544e54 @data=#<Mikon::DArray:0xb8544eb8 @data=#<NMatrix:0xb8544ecc shape:[3] dtype:int32 stype:dense>, @dtype=:int32>, @index=[0, 1, 2], @name=:b>"
+ ]
+ }
+ ],
+ "prompt_number": 10
+ },
+ {
+ "cell_type": "code",
+ "collapsed": false,
+ "input": [
+ "2 * df[:a]"
+ ],
+ "language": "python",
+ "metadata": {},
+ "outputs": [
+ {
+ "html": [
+ "<table><tr><th></th><th>a</th></tr><tr><th>0</th><td>2</td></tr><tr><th>1</th><td>4</td></tr><tr><th>2</th><td>6</td></tr></table>"
+ ],
+ "metadata": {},
+ "output_type": "pyout",
+ "prompt_number": 11,
+ "text": [
+ "#<Mikon::Series:0xb8534cfc @data=#<Mikon::DArray:0xb8534d60 @data=#<NMatrix:0xb8534d74 shape:[3] dtype:int32 stype:dense>, @dtype=:int32>, @index=[0, 1, 2], @name=:a>"
+ ]
+ }
+ ],
+ "prompt_number": 11
+ },
+ {
+ "cell_type": "code",
+ "collapsed": false,
+ "input": [
+ "df[:a]%2"
+ ],
+ "language": "python",
+ "metadata": {},
+ "outputs": [
+ {
+ "html": [
+ "<table><tr><th></th><th>a</th></tr><tr><th>0</th><td>1</td></tr><tr><th>1</th><td>0</td></tr><tr><th>2</th><td>1</td></tr></table>"
+ ],
+ "metadata": {},
+ "output_type": "pyout",
+ "prompt_number": 13,
+ "text": [
+ "#<Mikon::Series:0xb8518a34 @data=#<Mikon::DArray:0xb8518a98 @data=#<NMatrix:0xb8518ac0 shape:[3] dtype:int32 stype:dense>, @dtype=:int32>, @index=[0, 1, 2], @name=:a>"
+ ]
+ }
+ ],
+ "prompt_number": 13
+ },
+ {
+ "cell_type": "code",
+ "collapsed": false,
+ "input": [
+ "df[1..2]"
+ ],
+ "language": "python",
+ "metadata": {},
+ "outputs": [
+ {
+ "html": [
+ "<html><table><tr><td></td><th>a</th><th>b</th></tr><tr><th>1</th><td>2</td><td>3</td></tr><tr><th>2</th><td>3</td><td>4</td></tr></table>"
+ ],
+ "metadata": {},
+ "output_type": "pyout",
+ "prompt_number": 16,
+ "text": [
+ "#<Mikon::DataFrame:0xb95ec2d4 @labels=[:a, :b], @data=[#<Mikon::DArray:0xb95e3ee0 @data=#<NMatrix:0xb95e3e68 shape:[2] dtype:int32 stype:dense>, @dtype=:int32>, #<Mikon::DArray:0xb95e3e54 @data=#<NMatrix:0xb95e3ddc shape:[2] dtype:int32 stype:dense>, @dtype=:int32>], @index=[1, 2], @name=\"300fa37f-11fd-4fd8-8197-5b7c46b3c454\">"
+ ]
+ }
+ ],
+ "prompt_number": 16
+ },
+ {
+ "cell_type": "code",
+ "collapsed": false,
+ "input": [
+ "df.select{a**2 < b}"
+ ],
+ "language": "python",
+ "metadata": {},
+ "outputs": [
+ {
+ "html": [
+ "<html><table><tr><td></td><th>a</th><th>b</th></tr><tr><th>0</th><td>1</td><td>2</td></tr></table>"
+ ],
+ "metadata": {},
+ "output_type": "pyout",
+ "prompt_number": 17,
+ "text": [
+ "#<Mikon::DataFrame:0xb95d3518 @labels=[:a, :b], @data=[#<Mikon::DArray:0xb95d3194 @data=#<NMatrix:0xb95d3130 shape:[1] dtype:int32 stype:dense>, @dtype=:int32>, #<Mikon::DArray:0xb95d311c @data=#<NMatrix:0xb95d30b8 shape:[1] dtype:int32 stype:dense>, @dtype=:int32>], @index=[0], @name=\"e9b2d0c5-66e5-49da-9d66-6b9068926b88\">"
+ ]
+ }
+ ],
+ "prompt_number": 17
+ },
+ {
+ "cell_type": "code",
+ "collapsed": false,
+ "input": [
+ "df2 = Mikon::DataFrame.new({a: [1,2,3,4], b: [2,3,4,5]}, index: [:a, :b, :c, :d])"
+ ],
+ "language": "python",
+ "metadata": {},
+ "outputs": [
+ {
+ "html": [
+ "<html><table><tr><td></td><th>a</th><th>b</th></tr><tr><th>a</th><td>1</td><td>2</td></tr><tr><th>b</th><td>2</td><td>3</td></tr><tr><th>c</th><td>3</td><td>4</td></tr><tr><th>d</th><td>4</td><td>5</td></tr></table>"
+ ],
+ "metadata": {},
+ "output_type": "pyout",
+ "prompt_number": 9,
+ "text": [
+ "#<Mikon::DataFrame:0xb8a919fc @labels=[:a, :b], @data=[#<Mikon::DArray:0xb8a91768 @data=#<NMatrix:0xb8a916f0 shape:[4] dtype:int32 stype:dense>, @dtype=:int32>, #<Mikon::DArray:0xb8a916dc @data=#<NMatrix:0xb8a9163c shape:[4] dtype:int32 stype:dense>, @dtype=:int32>], @index=[:a, :b, :c, :d], @name=\"0ded92a5-3b7d-45dd-8520-418e5c17a849\">"
+ ]
+ }
+ ],
+ "prompt_number": 9
+ },
+ {
+ "cell_type": "code",
+ "collapsed": false,
+ "input": [
+ "df2[:a .. :c]"
+ ],
+ "language": "python",
+ "metadata": {},
+ "outputs": [
+ {
+ "html": [
+ "<html><table><tr><td></td><th>a</th><th>b</th></tr><tr><th>a</th><td>1</td><td>2</td></tr><tr><th>b</th><td>2</td><td>3</td></tr><tr><th>c</th><td>3</td><td>4</td></tr></table>"
+ ],
+ "metadata": {},
+ "output_type": "pyout",
+ "prompt_number": 10,
+ "text": [
+ "#<Mikon::DataFrame:0xb8a940bc @labels=[:a, :b], @data=[#<Mikon::DArray:0xb8a9bc04 @data=#<NMatrix:0xb8a9b920 shape:[3] dtype:int32 stype:dense>, @dtype=:int32>, #<Mikon::DArray:0xb8a9b894 @data=#<NMatrix:0xb8a9b718 shape:[3] dtype:int32 stype:dense>, @dtype=:int32>], @index=[:a, :b, :c], @name=\"7f6fb7de-725d-47e0-babf-f37a1d4d72f4\">"
+ ]
+ }
+ ],
+ "prompt_number": 10
+ },
+ {
+ "cell_type": "code",
+ "collapsed": false,
+ "input": [
+ "[1,2].each{|val| p val}"
+ ],
+ "language": "python",
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "stream": "stdout",
+ "text": [
+ "1"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "stream": "stdout",
+ "text": [
+ "\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "stream": "stdout",
+ "text": [
+ "2"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "stream": "stdout",
+ "text": [
+ "\n"
+ ]
+ },
+ {
+ "metadata": {},
+ "output_type": "pyout",
+ "prompt_number": 20,
+ "text": [
+ "[1, 2]"
+ ]
+ }
+ ],
+ "prompt_number": 20
+ },
+ {
+ "cell_type": "code",
+ "collapsed": false,
+ "input": [
+ "df = Mikon::DataFrame.new({a: [1,2,3,4,5,6,7,8,9], b: [1,2,3,4,5,6,7,8,9]})"
+ ],
+ "language": "python",
+ "metadata": {},
+ "outputs": [
+ {
+ "html": [
+ "<html><table><tr><td></td><th>a</th><th>b</th></tr><tr><th>0</th><td>1</td><td>1</td></tr><tr><th>1</th><td>2</td><td>2</td></tr><tr><th>2</th><td>3</td><td>3</td></tr><tr><th>3</th><td>4</td><td>4</td></tr><tr><th>4</th><td>5</td><td>5</td></tr><tr><th>5</th><td>6</td><td>6</td></tr><tr><th>6</th><td>7</td><td>7</td></tr><tr><th>7</th><td>8</td><td>8</td></tr><tr><th>8</th><td>9</td><td>9</td></tr></table>"
+ ],
+ "metadata": {},
+ "output_type": "pyout",
+ "prompt_number": 21,
+ "text": [
+ "#<Mikon::DataFrame:0xb8c40e74 @labels=[:a, :b], @data=[#<Mikon::DArray:0xb8c40ce4 @data=#<NMatrix:0xb8c40c6c shape:[9] dtype:int32 stype:dense>, @dtype=:int32>, #<Mikon::DArray:0xb8c40c58 @data=#<NMatrix:0xb8c40be0 shape:[9] dtype:int32 stype:dense>, @dtype=:int32>], @index=[0, 1, 2, 3, 4, 5, 6, 7, 8], @name=\"103c6d4a-465c-45e8-b252-060d25809082\">"
+ ]
+ }
+ ],
+ "prompt_number": 21
+ },
+ {
+ "cell_type": "code",
+ "collapsed": false,
+ "input": [
+ "df.map{a+b}.name(:c)"
+ ],
+ "language": "python",
+ "metadata": {},
+ "outputs": [
+ {
+ "html": [
+ "<table><tr><th></th><th>c</th></tr><tr><th>0</th><td>3</td></tr><tr><th>1</th><td>5</td></tr><tr><th>2</th><td>7</td></tr></table>"
+ ],
+ "metadata": {},
+ "output_type": "pyout",
+ "prompt_number": 14,
+ "text": [
+ "#<Mikon::Series:0xb8503530 @data=#<Mikon::DArray:0xb8503508 @data=#<NMatrix:0xb85034a4 shape:[3] dtype:int32 stype:dense>, @dtype=:int32>, @index=[0, 1, 2], @name=:c>"
+ ]
+ }
+ ],
+ "prompt_number": 14
+ },
+ {
+ "cell_type": "code",
+ "collapsed": false,
+ "input": [
+ "hoge = []\n",
+ "df.each{hoge.push(a*b)}\n",
+ "hoge"
+ ],
+ "language": "python",
+ "metadata": {},
+ "outputs": [
+ {
+ "metadata": {},
+ "output_type": "pyout",
+ "prompt_number": 23,
+ "text": [
+ "[1, 4, 9, 16, 25, 36, 49, 64, 81]"
+ ]
+ }
+ ],
+ "prompt_number": 23
+ }
+ ],
+ "metadata": {}
+ }
+ ]
+ }