daru 0.1.1 → 0.1.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 91fd17361715d81a4eda319e4695bc53a44979de
4
- data.tar.gz: 8768b7c62619d4f8446e8167a95990ee59553dde
3
+ metadata.gz: ed2a3e2a4cd9fce8d95af6aac9c3db532eed444f
4
+ data.tar.gz: 90ca6a62ee824d20f72a9f6689c03f27d7667168
5
5
  SHA512:
6
- metadata.gz: bd7d40bb2e1b7ed2f4ea5a598556e8a58f5bf35c2fcbe895150a21752b4ca223ca184b6c7c6e1c1c30bf753df478578beec1d99aff2c31d4d7cfe2520013c47d
7
- data.tar.gz: 264425a5bcd87e2eca1261792d3adf16a497ac20ec1c9c04743cafc84e9f4335525a2c289d36be63b9f88d8fcf19f84cc70824b6ada29ce3105872bc20231d18
6
+ metadata.gz: e6f3345ef4372e1c45a3d80c0cc61c2b4c72e4c810cfb183f30bfd9285a09639ea39cd0a3597fc63551d7f72398d8d83af4424855018e8a5b2a99274b46625cd
7
+ data.tar.gz: 65d262b1deec54680a5fdcfecda3530c9fb9450dbd280c18833b655521418ed340f311253221dfd4e018577b063f9d3638d0d600a108c3b98ec5a7cd2dfe98ec
data/.travis.yml CHANGED
@@ -1,13 +1,11 @@
1
1
  language:
2
2
  ruby
3
3
 
4
- env:
5
- - CPLUS_INCLUDE_PATH=/usr/include/atlas C_INCLUDE_PATH=/usr/include/atlas
6
-
7
4
  rvm:
8
5
  - '2.0'
9
6
  - '2.1'
10
7
  - '2.2'
8
+ - '2.3.0'
11
9
 
12
10
  matrix:
13
11
  fast_finish:
@@ -17,11 +15,9 @@ script: "bundle exec rspec"
17
15
 
18
16
  install:
19
17
  - gem install bundler
20
- - ./.build.sh
21
18
  - bundle install
22
19
 
23
20
  before_install:
24
21
  - sudo apt-get update -qq
25
- - sudo apt-get install -qq libatlas-base-dev
26
22
  - sudo apt-get install -y libgsl0-dev r-base r-base-dev
27
23
  - sudo Rscript -e "install.packages(c('Rserve','irr'),,'http://cran.us.r-project.org')"
data/CONTRIBUTING.md CHANGED
@@ -2,25 +2,16 @@
2
2
 
3
3
  ## Installing daru development dependencies
4
4
 
5
- If you want to run the full rspec suite, you will need the latest unreleased nmatrix gem. They will released upstream soon but please follow this procedure for now.
6
-
7
- Keep in mind that either nmatrix or rb-gsl are NOT NECESSARY for using daru. They are just required for an optional speed up and for running the test suite.
5
+ Neither nmatrix nor rb-gsl is necessary for using daru. They are only required for an optional speed-up and for running the test suite.
8
6
 
9
7
  To install dependencies, execute the following commands:
10
8
 
11
- `export CPLUS_INCLUDE_PATH=/usr/include/atlas`
12
- `export C_INCLUDE_PATH=/usr/include/atlas`
13
9
  `sudo apt-get update -qq`
14
- `sudo apt-get install -qq libatlas-base-dev`
15
- `sudo apt-get --purge remove liblapack-dev liblapack3 liblapack3gf`
16
10
  `sudo apt-get install -y libgsl0-dev r-base r-base-dev`
17
11
  `sudo Rscript -e "install.packages(c('Rserve','irr'),,'http://cran.us.r-project.org')"`
18
12
 
19
- Then execute the [.build.sh script](https://github.com/v0dro/daru/blob/master/.build.sh) to clone and install the latest nmatrix system:
20
-
21
- `./.build.sh`
22
13
 
23
- Then finally install remaining dependencies:
14
+ Then install remaining dependencies:
24
15
 
25
16
  `bundle install`
26
17
 
data/History.md CHANGED
@@ -1,3 +1,21 @@
1
+ # 0.1.2
2
+
3
+ * Enhancements
4
+ - New method `DataFrame.from_activerecord` for importing data sets from ActiveRecord. (by @mrkn)
5
+ - Better importing of data from SQL databases by extracting that functionality into a separate class called `Daru::IO::SqlDataSource` (by @mrkn).
6
+ - Faster algorithm for performing inner joins by using the bloomfilter-rb gem. Available only for MRI. (by Peter Tung)
7
+ - Added exception `SizeError` (by Peter Tung).
8
+ - Removed outdated dependencies and build scripts, updated existing dependencies.
9
+ - Ability to sort a `Daru::Vector` with nils present (by @gnilrets).
10
+
11
+ * Fixes
12
+ - Fix column creation for `DataFrame.from_sql` (by @dansbits).
13
+ - `group_by` can now be performed on DataFrames with nils (by @gnilrets).
14
+ - Bug fix for DataFrame Vectors not duplicating when calling `DataFrame#dup` (by @gnilrets).
15
+ - Bug fix when concatenating DataFrames (by @gnilrets).
16
+ - Handling improper arguments to `Daru::Vector#[]` (by @lokeshh)
17
+ - Resolve narray conflict by using the latest nmatrix require methods (by @lokeshh)
18
+
1
19
  # 0.1.1
2
20
 
3
21
  * Enhancements
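The 0.1.2 notes above mention a faster inner join built on the bloomfilter-rb gem. The following is only a conceptual sketch of that idea in plain Ruby, not daru's implementation: `TinyBloomFilter` and `inner_join` are hypothetical names, rows are plain hashes, and a small MD5-based filter stands in for bloomfilter-rb.

```ruby
require 'digest/md5'

# Hypothetical, minimal Bloom filter: it may report false positives,
# but never false negatives.
class TinyBloomFilter
  def initialize(size: 1024, hashes: 3)
    @size = size
    @hashes = hashes
    @bits = Array.new(size, false)
  end

  def add(key)
    positions(key).each { |pos| @bits[pos] = true }
  end

  def maybe_include?(key)
    positions(key).all? { |pos| @bits[pos] }
  end

  private

  # Derive one bit position per seed by salting the key before hashing.
  def positions(key)
    (0...@hashes).map do |seed|
      Digest::MD5.hexdigest("#{seed}:#{key}").to_i(16) % @size
    end
  end
end

# Hypothetical inner join over arrays of hashes.
def inner_join(left, right, on:)
  filter = TinyBloomFilter.new
  left_by_key = left.group_by { |row| row[on] }
  left_by_key.each_key { |key| filter.add(key) }

  right.each_with_object([]) do |right_row, joined|
    key = right_row[on]
    # Cheap rejection: a key the filter has never seen cannot match.
    next unless filter.maybe_include?(key)

    # Exact lookup removes any false positives from the filter.
    (left_by_key[key] || []).each { |left_row| joined << left_row.merge(right_row) }
  end
end

left   = [{ id: 1, name: 'Kingfisher' }, { id: 2, name: 'Snow' }]
right  = [{ id: 2, country: 'China' }, { id: 3, country: 'USA' }]
result = inner_join(left, right, on: :id)
p result
```

In this toy version the exact hash lookup would be enough on its own. The filter only pays off when the exact check is expensive compared with testing a few bits.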
data/README.md CHANGED
@@ -1,18 +1,13 @@
1
- daru
2
- ====
3
-
4
- Data Analysis in RUby
1
+ # daru - Data Analysis in RUby
5
2
 
6
3
  [![Gem Version](https://badge.fury.io/rb/daru.svg)](http://badge.fury.io/rb/daru)
7
4
  [![Build Status](https://travis-ci.org/v0dro/daru.svg)](https://travis-ci.org/v0dro/daru)
8
5
 
9
6
  ## Introduction
10
7
 
11
- daru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data.
12
-
13
- daru is inspired by pandas, a very mature solution in Python.
8
+ daru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data in Ruby.
14
9
 
15
- Written in pure Ruby so should work with all ruby implementations. Tested with MRI 2.0, 2.1, 2.2.
10
+ daru makes it easy and intuitive to process data, predominantly through two data structures: `Daru::DataFrame` and `Daru::Vector`. It is written in pure Ruby, so it works with all Ruby implementations. Tested with MRI 2.0, 2.1, 2.2 and 2.3.
16
11
 
17
12
  ## Features
18
13
 
@@ -28,7 +23,7 @@ Written in pure Ruby so should work with all ruby implementations. Tested with M
28
23
  * Optional speed and space optimization on MRI with [NMatrix](https://github.com/SciRuby/nmatrix) and GSL.
29
24
  * Easy splitting, aggregation and grouping of data.
30
25
  * Quickly reducing data with pivot tables for quick data summary.
31
- * Import and export data from and to Excel, CSV, SQL Databases and plain text files.
26
+ * Import and export data from and to Excel, CSV, SQL Databases, ActiveRecord and plain text files.
32
27
 
33
28
  ## Notebooks
34
29
 
@@ -64,6 +59,111 @@ Written in pure Ruby so should work with all ruby implementations. Tested with M
64
59
  * [Analysis of Time Series in daru](http://v0dro.github.io/blog/2015/07/31/analysis-of-time-series-in-daru/)
65
60
  * [Date Offsets in Daru](http://v0dro.github.io/blog/2015/07/27/date-offsets-in-daru/)
66
61
 
62
+ ## Basic Usage
63
+
64
+ daru exposes two major data structures: `DataFrame` and `Vector`. The `Vector` is a basic 1-D structure corresponding to a labelled Array, while the `DataFrame`, daru's primary data structure, is a 2-D spreadsheet-like structure for manipulating and storing data sets.
65
+
66
+ Basic DataFrame initialization.
67
+
68
+ ``` ruby
69
+ data_frame = Daru::DataFrame.new(
70
+   {
71
+     'Beer' => ['Kingfisher', 'Snow', 'Bud Light', 'Tiger Beer', 'Budweiser'],
72
+     'Gallons sold' => [500, 400, 450, 200, 250]
73
+   },
74
+   index: ['India', 'China', 'USA', 'Malaysia', 'Canada']
75
+ )
76
+ data_frame
77
+ ```
78
+ ![init0](images/init0.png)
79
+
80
+
81
+ Load data from CSV files.
82
+ ``` ruby
83
+ df = Daru::DataFrame.from_csv('TradeoffData.csv')
84
+ ```
85
+ ![init1](images/init1.png)
86
+
87
+ *Basic Data Manipulation*
88
+
89
+ Selecting rows.
90
+ ``` ruby
91
+ data_frame.row['USA']
92
+ ```
93
+ ![man0](images/man0.png)
94
+
95
+ Selecting columns.
96
+ ``` ruby
97
+ data_frame['Beer']
98
+ ```
99
+ ![man1](images/man1.png)
100
+
101
+ A range of rows.
102
+ ``` ruby
103
+ data_frame.row['India'..'USA']
104
+ ```
105
+ ![man2](images/man2.png)
106
+
107
+ The first 2 rows.
108
+ ``` ruby
109
+ data_frame.first(2)
110
+ ```
111
+ ![man3](images/man3.png)
112
+
113
+ The last 2 rows.
114
+ ``` ruby
115
+ data_frame.last(2)
116
+ ```
117
+ ![man4](images/man4.png)
118
+
119
+ Adding a new column.
120
+ ``` ruby
121
+ data_frame['Gallons produced'] = [550, 500, 600, 210, 240]
122
+ ```
123
+ ![man5](images/man5.png)
124
+
125
+ Creating a new column based on data in other columns.
126
+ ``` ruby
127
+ data_frame['Demand supply gap'] = data_frame['Gallons produced'] - data_frame['Gallons sold']
128
+ ```
129
+ ![man6](images/man6.png)
130
+
131
+ *Condition based selection*
132
+
133
+ Selecting countries based on the number of gallons sold in each. We use a syntax similar to that defined by [Arel](https://github.com/rails/arel), i.e. by using the `where` clause.
134
+ ``` ruby
135
+ data_frame.where(data_frame['Gallons sold'].lt(300))
136
+ ```
137
+ ![con0](images/con0.png)
138
+
139
+ You can also combine several boolean operations and pass the result to the `#where` method:
140
+ ``` ruby
141
+ data_frame.where(
142
+   data_frame['Beer']
143
+     .in(['Snow', 'Kingfisher', 'Tiger Beer'])
144
+     .and(
145
+       data_frame['Gallons produced'].gt(520).or(data_frame['Gallons produced'].lt(250))
146
+     )
147
+ )
148
+ ```
149
+ ![con1](images/con1.png)
150
+
151
+ *Plotting*
152
+
153
+ Daru supports plotting of interactive graphs with [nyaplot](). You can easily create a plot with the `#plot` method. Here we plot the gallons sold on the Y axis and name of the brand on the X axis in a bar graph.
154
+ ``` ruby
155
+ data_frame.plot type: :bar, x: 'Beer', y: 'Gallons sold' do |plot, diagram|
156
+   plot.x_label "Beer"
157
+   plot.y_label "Gallons Sold"
158
+   plot.yrange [0,600]
159
+   plot.width 500
160
+   plot.height 400
161
+ end
162
+ ```
163
+ ![plot0](images/plot0.png)
164
+
165
+ In addition to nyaplot, daru also supports plotting out of the box with [gnuplotrb](https://github.com/SciRuby/gnuplotrb).
166
+
67
167
  ## Documentation
68
168
 
69
169
  Docs can be found [here](https://rubygems.org/gems/daru).
@@ -71,8 +171,6 @@ Docs can be found [here](https://rubygems.org/gems/daru).
71
171
  ## Roadmap
72
172
 
73
173
  * Enable creation of DataFrame by only specifying an NMatrix/MDArray in initialize. Vector naming happens automatically (alphabetic) or is specified in an Array.
74
- * Basic Data manipulation and analysis operations:
75
- - DF concat
76
174
  * Assignment of a column to a single number should set the entire column to that number.
77
175
  * Multiple column assignment with []=
78
176
  * Multiple value assignment for vectors with []=.
data/daru.gemspec CHANGED
@@ -34,7 +34,7 @@ Thank you for installing daru!
34
34
  oOOOOOo
35
35
  ,| oO
36
36
  //| |
37
- \\| |
37
+ \\\\| |
38
38
  `| |
39
39
  `-----`
40
40
 
@@ -50,17 +50,22 @@ Cheers!
50
50
  EOF
51
51
 
52
52
  spec.add_runtime_dependency 'reportbuilder', '~> 1.4'
53
- spec.add_runtime_dependency 'spreadsheet', '~> 1.0.3'
53
+ spec.add_runtime_dependency 'spreadsheet', '~> 1.1.1'
54
54
 
55
55
  spec.add_development_dependency 'bundler', '~> 1.10'
56
- spec.add_development_dependency 'rake'
56
+ spec.add_development_dependency 'rake', '~>10.5'
57
57
  spec.add_development_dependency 'pry', '~> 0.10'
58
58
  spec.add_development_dependency 'pry-byebug'
59
59
  spec.add_development_dependency 'rserve-client', '~> 0.3'
60
- spec.add_development_dependency 'rspec'
60
+ spec.add_development_dependency 'rspec', '~> 3.4'
61
61
  spec.add_development_dependency 'awesome_print'
62
62
  spec.add_development_dependency 'nyaplot', '~> 0.1.5'
63
- spec.add_development_dependency 'nmatrix', '~> 0.1.0'
63
+ spec.add_development_dependency 'nmatrix', '~> 0.2.1'
64
64
  spec.add_development_dependency 'distribution', '~> 0.7'
65
65
  spec.add_development_dependency 'rb-gsl', '~>1.16'
66
- end
66
+ spec.add_development_dependency 'bloomfilter-rb', '~> 2.1'
67
+ spec.add_development_dependency 'dbd-sqlite3'
68
+ spec.add_development_dependency 'dbi'
69
+ spec.add_development_dependency 'activerecord', '~> 4.0'
70
+ spec.add_development_dependency 'sqlite3'
71
+ end
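The gemspec changes above tighten several dependencies with the pessimistic operator (`~>`). As a rough illustration of what those constraints accept, here is a small sketch using RubyGems' own `Gem::Requirement`. The helper `allowed?` and the candidate version numbers are made up for the example; only the constraint strings come from the diff.

```ruby
require 'rubygems' # usually loaded already; kept so the snippet is self-contained

# Hypothetical helper: does `version` satisfy `constraint`?
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# '~> 1.1.1' means >= 1.1.1 and < 1.2
puts allowed?('~> 1.1.1', '1.1.9')  # true
puts allowed?('~> 1.1.1', '1.2.0')  # false
puts allowed?('~> 1.1.1', '1.0.3')  # false

# '~> 3.4' means >= 3.4 and < 4.0
puts allowed?('~> 3.4', '3.9.0')    # true
puts allowed?('~> 3.4', '4.0.0')    # false

# The missing space in '~>10.5' is accepted by the parser.
puts allowed?('~>10.5', '10.5.0')   # true
puts allowed?('~>10.5', '11.0.0')   # false
```

The number of segments in the constraint decides how far versions may drift: three segments pin the minor version, two segments pin only the major version.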
data/images/README.md ADDED
@@ -0,0 +1,5 @@
1
+ # Images
2
+
3
+ This folder contains the images used in the project README to illustrate code examples.
4
+
5
+ Do not rename any of the files.
data/images/con0.png ADDED
Binary file
data/images/con1.png ADDED
Binary file
data/images/init0.png ADDED
Binary file
data/images/init1.png ADDED
Binary file
data/images/man0.png ADDED
Binary file
data/images/man1.png ADDED
Binary file
data/images/man2.png ADDED
Binary file
data/images/man3.png ADDED
Binary file
data/images/man4.png ADDED
Binary file
data/images/man5.png ADDED
Binary file
data/images/man6.png ADDED
Binary file
data/images/plot0.png ADDED
Binary file
data/lib/daru.rb CHANGED
@@ -38,10 +38,12 @@ module Daru
38
38
  attr_accessor :lazy_update
39
39
 
40
40
  def create_has_library(library)
41
-   define_singleton_method("has_#{library}?") do
42
-     cv = "@@#{library}"
41
+   lib_underscore = library.to_s.gsub(/-/, '_')
42
+   define_singleton_method("has_#{lib_underscore}?") do
43
+     cv = "@@#{lib_underscore}"
43
44
      unless class_variable_defined? cv
44
45
        begin
46
+         library = 'nmatrix/nmatrix' if library == :nmatrix
45
47
          require library.to_s
46
48
          class_variable_set(cv, true)
47
49
        rescue LoadError
@@ -56,6 +58,7 @@ module Daru
56
58
  create_has_library :gsl
57
59
  create_has_library :nmatrix
58
60
  create_has_library :nyaplot
61
+ create_has_library :'bloomfilter-rb'
59
62
  end
60
63
 
61
64
  autoload :Spreadsheet, 'spreadsheet'
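The `create_has_library` change above converts hyphens to underscores so that a gem such as `bloomfilter-rb` yields a valid method name (`has_bloomfilter_rb?`). The sketch below reproduces the pattern in a stand-alone module so it can be run without daru. `LibraryProbe` and the two probed library names are assumptions chosen for the example, and the `rescue` branch storing `false` is a guess at the part of the method the diff does not show.

```ruby
module LibraryProbe
  class << self
    # Define a memoised `has_<library>?` predicate on the module.
    def create_has_library(library)
      # Hyphens are not allowed in method or class-variable names.
      lib_underscore = library.to_s.tr('-', '_')
      define_singleton_method("has_#{lib_underscore}?") do
        cv = "@@#{lib_underscore}"
        unless class_variable_defined?(cv)
          begin
            require library.to_s
            class_variable_set(cv, true)
          rescue LoadError
            # Assumed behaviour: remember that the library is unavailable.
            class_variable_set(cv, false)
          end
        end
        class_variable_get(cv)
      end
    end
  end

  create_has_library :json                   # part of Ruby's standard distribution
  create_has_library :'no-such-library-xyz'  # deliberately missing
end

puts LibraryProbe.has_json?
puts LibraryProbe.has_no_such_library_xyz?
```

Caching the outcome in a class variable means `require` is attempted at most once per library, however often the predicate is called.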
@@ -18,7 +18,7 @@ module Daru
18
18
  @context = context
19
19
  vectors = names.map { |vec| context[vec].to_a }
20
20
  tuples = vectors[0].zip(*vectors[1..-1])
21
- keys = tuples.uniq.sort
21
+ keys = tuples.uniq.sort { |a,b| a && b ? a.compact <=> b.compact : a ? 1 : -1 }
22
22
 
23
23
  keys.each do |key|
24
24
  @groups[key] = all_indices_for(tuples, key)
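The replaced line above changes how group keys are sorted. Below is a minimal stand-alone sketch, using made-up tuples rather than a real DataFrame, of why a plain `sort` fails when a key contains nil and what the new comparator does instead. The variable names are illustrative only.

```ruby
tuples = [['foo', 'one'], [nil, 'two'], ['bar', nil], ['bar', 'one']]

# Array#<=> returns nil as soon as it has to compare nil with a String,
# and Array#sort raises ArgumentError when a comparison returns nil.
plain_sort_failed =
  begin
    tuples.sort
    false
  rescue ArgumentError
    true
  end

# Comparator from the diff: nils are dropped from each tuple before comparing.
keys = tuples.uniq.sort { |a, b| a && b ? a.compact <=> b.compact : a ? 1 : -1 }

p plain_sort_failed # => true
p keys              # => [["bar", nil], ["bar", "one"], ["foo", "one"], [nil, "two"]]
```

Because nils are removed before comparison, `['bar', nil]` is ordered as if it were `['bar']`. Tuples whose remaining values have different types (for example a String and an Integer) could still fail to compare.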
@@ -28,7 +28,7 @@ module Daru
28
28
 
29
29
  # Get a Daru::Vector of the size of each group.
30
30
  def size
31
- index =
31
+ index =
32
32
  if multi_indexed_grouping?
33
33
  Daru::MultiIndex.from_tuples @groups.keys
34
34
  else
@@ -59,15 +59,15 @@ module Daru
59
59
  # d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
60
60
  # })
61
61
  # df.group_by([:a, :b]).head(1)
62
- # # =>
62
+ # # =>
63
63
  # # #<Daru::DataFrame:82745170 @name = d7003f75-5eb9-4967-9303-c08dd9160224 @size = 6>
64
- # # a b c d
65
- # # 1 bar one 2 22
66
- # # 3 bar three 1 44
67
- # # 5 bar two 6 66
68
- # # 0 foo one 1 11
69
- # # 7 foo three 8 88
70
- # # 2 foo two 3 33
64
+ # # a b c d
65
+ # # 1 bar one 2 22
66
+ # # 3 bar three 1 44
67
+ # # 5 bar two 6 66
68
+ # # 0 foo one 1 11
69
+ # # 7 foo three 8 88
70
+ # # 2 foo two 3 33
71
71
  def head quantity=5
72
72
  select_groups_from :first, quantity
73
73
  end
@@ -82,14 +82,14 @@ module Daru
82
82
  # d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
83
83
  # })
84
84
  # # df.group_by([:a, :b]).tail(1)
85
- # # =>
85
+ # # =>
86
86
  # # #<Daru::DataFrame:82378270 @name = 0623db46-5425-41bd-a843-99baac3d1d9a @size = 6>
87
- # # a b c d
88
- # # 1 bar one 2 22
89
- # # 3 bar three 1 44
90
- # # 5 bar two 6 66
91
- # # 6 foo one 3 77
92
- # # 7 foo three 8 88
87
+ # # a b c d
88
+ # # 1 bar one 2 22
89
+ # # 3 bar three 1 44
90
+ # # 5 bar two 6 66
91
+ # # 6 foo one 3 77
92
+ # # 7 foo three 8 88
93
93
  # # 4 foo two 3 55
94
94
  def tail quantity=5
95
95
  select_groups_from :last, quantity
@@ -103,15 +103,15 @@ module Daru
103
103
  # c: [1 ,2 ,3 ,1 ,3 ,6 ,3 ,8],
104
104
  # d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
105
105
  # df.group_by([:a, :b]).mean
106
- # # =>
106
+ # # =>
107
107
  # # #<Daru::DataFrame:81097450 @name = 0c32983f-3e06-451f-a9c9-051cadfe7371 @size = 6>
108
- # # c d
109
- # # ["bar", "one"] 2 22
110
- # # ["bar", "three"] 1 44
111
- # # ["bar", "two"] 6 66
112
- # # ["foo", "one"] 2.0 44.0
113
- # # ["foo", "three"] 8 88
114
- # # ["foo", "two"] 3.0 44.0
108
+ # # c d
109
+ # # ["bar", "one"] 2 22
110
+ # # ["bar", "three"] 1 44
111
+ # # ["bar", "two"] 6 66
112
+ # # ["foo", "one"] 2.0 44.0
113
+ # # ["foo", "three"] 8 88
114
+ # # ["foo", "two"] 3.0 44.0
115
115
  def mean
116
116
  apply_method :numeric, :mean
117
117
  end
@@ -128,28 +128,28 @@ module Daru
128
128
 
129
129
  # Count groups, excludes missing values.
130
130
  # @example Using count
131
- # df = Daru::DataFrame.new({
132
- # a: %w{foo bar foo bar foo bar foo foo},
133
- # b: %w{one one two three two two one three},
134
- # c: [1 ,2 ,3 ,1 ,3 ,6 ,3 ,8],
135
- # d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
136
- # })
131
+ # df = Daru::DataFrame.new({
132
+ # a: %w{foo bar foo bar foo bar foo foo},
133
+ # b: %w{one one two three two two one three},
134
+ # c: [1 ,2 ,3 ,1 ,3 ,6 ,3 ,8],
135
+ # d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
136
+ # })
137
137
  # df.group_by([:a, :b]).count
138
- # # =>
138
+ # # =>
139
139
  # # #<Daru::DataFrame:76900210 @name = 7b9cf55d-17f8-48c7-b03a-2586c6e5ec5a @size = 6>
140
- # # c d
141
- # # ["bar", "one"] 1 1
142
- # # ["bar", "two"] 1 1
143
- # # ["bar", "three"] 1 1
144
- # # ["foo", "one"] 2 2
145
- # # ["foo", "three"] 1 1
146
- # # ["foo", "two"] 2 2
140
+ # # c d
141
+ # # ["bar", "one"] 1 1
142
+ # # ["bar", "two"] 1 1
143
+ # # ["bar", "three"] 1 1
144
+ # # ["foo", "one"] 2 2
145
+ # # ["foo", "three"] 1 1
146
+ # # ["foo", "two"] 2 2
147
147
  def count
148
148
  width = @non_group_vectors.size
149
149
  Daru::DataFrame.new([size]*width, order: @non_group_vectors)
150
150
  end
151
151
 
152
- # Calculate sample standard deviation of numeric vector groups, excluding
152
+ # Calculate sample standard deviation of numeric vector groups, excluding
153
153
  # missing values.
154
154
  def std
155
155
  apply_method :numeric, :std
@@ -177,9 +177,9 @@ module Daru
177
177
  # d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
178
178
  # })
179
179
  # df.group_by([:a, :b]).get_group ['bar','two']
180
- # #=>
180
+ # #=>
181
181
  # ##<Daru::DataFrame:83258980 @name = 687ee3f6-8874-4899-97fa-9b31d84fa1d5 @size = 1>
182
- # # a b c d
182
+ # # a b c d
183
183
  # # 5 bar two 6 66
184
184
  def get_group group
185
185
  indexes = @groups[group]
@@ -198,7 +198,7 @@ module Daru
198
198
  rows, index: @context.index[indexes], order: @context.vectors)
199
199
  end
200
200
 
201
- private
201
+ private
202
202
 
203
203
  def select_groups_from method, quantity
204
204
  selection = @context
@@ -227,7 +227,7 @@ module Daru
227
227
  slice = vec[*indexes]
228
228
  single_row << (slice.is_a?(Numeric) ? slice : slice.send(method))
229
229
  end
230
- end
230
+ end
231
231
 
232
232
  rows << single_row
233
233
  end
@@ -260,4 +260,4 @@ module Daru
260
260
  end
261
261
  end
262
262
  end
263
- end
263
+ end