daru 0.1.1 → 0.1.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 91fd17361715d81a4eda319e4695bc53a44979de
4
- data.tar.gz: 8768b7c62619d4f8446e8167a95990ee59553dde
3
+ metadata.gz: ed2a3e2a4cd9fce8d95af6aac9c3db532eed444f
4
+ data.tar.gz: 90ca6a62ee824d20f72a9f6689c03f27d7667168
5
5
  SHA512:
6
- metadata.gz: bd7d40bb2e1b7ed2f4ea5a598556e8a58f5bf35c2fcbe895150a21752b4ca223ca184b6c7c6e1c1c30bf753df478578beec1d99aff2c31d4d7cfe2520013c47d
7
- data.tar.gz: 264425a5bcd87e2eca1261792d3adf16a497ac20ec1c9c04743cafc84e9f4335525a2c289d36be63b9f88d8fcf19f84cc70824b6ada29ce3105872bc20231d18
6
+ metadata.gz: e6f3345ef4372e1c45a3d80c0cc61c2b4c72e4c810cfb183f30bfd9285a09639ea39cd0a3597fc63551d7f72398d8d83af4424855018e8a5b2a99274b46625cd
7
+ data.tar.gz: 65d262b1deec54680a5fdcfecda3530c9fb9450dbd280c18833b655521418ed340f311253221dfd4e018577b063f9d3638d0d600a108c3b98ec5a7cd2dfe98ec
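The digests above cover the two archives packed inside the .gem file. As a minimal sketch of how such values could be recomputed locally with Ruby's standard `digest` library (the helper name `archive_digests` and the file path passed to it are hypothetical, not part of daru or RubyGems):

```ruby
require 'digest'

# Hypothetical helper: returns the SHA1 and SHA512 hex digests of a file,
# keyed the same way as the sections of checksums.yaml.
def archive_digests(path)
  {
    'SHA1'   => Digest::SHA1.file(path).hexdigest,
    'SHA512' => Digest::SHA512.file(path).hexdigest
  }
end
```

Comparing the returned strings with the `+` lines for `metadata.gz` and `data.tar.gz` would confirm that an unpacked 0.1.2 archive matches the published one.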
data/.travis.yml CHANGED
@@ -1,13 +1,11 @@
1
1
  language:
2
2
  ruby
3
3
 
4
- env:
5
- - CPLUS_INCLUDE_PATH=/usr/include/atlas C_INCLUDE_PATH=/usr/include/atlas
6
-
7
4
  rvm:
8
5
  - '2.0'
9
6
  - '2.1'
10
7
  - '2.2'
8
+ - '2.3.0'
11
9
 
12
10
  matrix:
13
11
  fast_finish:
@@ -17,11 +15,9 @@ script: "bundle exec rspec"
17
15
 
18
16
  install:
19
17
  - gem install bundler
20
- - ./.build.sh
21
18
  - bundle install
22
19
 
23
20
  before_install:
24
21
  - sudo apt-get update -qq
25
- - sudo apt-get install -qq libatlas-base-dev
26
22
  - sudo apt-get install -y libgsl0-dev r-base r-base-dev
27
23
  - sudo Rscript -e "install.packages(c('Rserve','irr'),,'http://cran.us.r-project.org')"
data/CONTRIBUTING.md CHANGED
@@ -2,25 +2,16 @@
2
2
 
3
3
  ## Installing daru development dependencies
4
4
 
5
- If you want to run the full rspec suite, you will need the latest unreleased nmatrix gem. They will released upstream soon but please follow this procedure for now.
6
-
7
- Keep in mind that either nmatrix or rb-gsl are NOT NECESSARY for using daru. They are just required for an optional speed up and for running the test suite.
5
+ Either nmatrix or rb-gsl are NOT NECESSARY for using daru. They are just required for an optional speed up and for running the test suite.
8
6
 
9
7
  To install dependencies, execute the following commands:
10
8
 
11
- `export CPLUS_INCLUDE_PATH=/usr/include/atlas`
12
- `export C_INCLUDE_PATH=/usr/include/atlas`
13
9
  `sudo apt-get update -qq`
14
- `sudo apt-get install -qq libatlas-base-dev`
15
- `sudo apt-get --purge remove liblapack-dev liblapack3 liblapack3gf`
16
10
  `sudo apt-get install -y libgsl0-dev r-base r-base-dev`
17
11
  `sudo Rscript -e "install.packages(c('Rserve','irr'),,'http://cran.us.r-project.org')"`
18
12
 
19
- Then execute the [.build.sh script](https://github.com/v0dro/daru/blob/master/.build.sh) to clone and install the latest nmatrix system:
20
-
21
- `./.build.sh`
22
13
 
23
- Then finally install remaining dependencies:
14
+ Then install remaining dependencies:
24
15
 
25
16
  `bundle install`
26
17
 
data/History.md CHANGED
@@ -1,3 +1,21 @@
1
+ # 0.1.2
2
+
3
+ * Enhancements
4
+ - New method `DataFrame.from_activerecord` for importing data sets from ActiveRecord. (by @mrkn)
5
+ - Better importing of data from SQL databases by extracting that functionality into a separate class called `Daru::IO::SqlDataSource` (by @mrkn).
6
+ - Faster algorithm for performing inner joins by using the bloomfilter-rb gem. Available only for MRI. (by Peter Tung)
7
+ - Added exception `SizeError` (by Peter Tung).
8
+ - Removed outdated dependencies and build scripts, updated existing dependencies.
9
+ - Ability to sort a Daru::Vector with nils present (by @gnilrets)
10
+
11
+ * Fixes
12
+ - Fix column creation for `Dataframe.from_sql` (by @dansbits).
13
+ - group_by can now be performed on DataFrames with nils (@gnilrets).
14
+ - Bug fix for DataFrame Vectors not duplicating when calling `DataFrame#dup` (by @gnilrets).
15
+ - Bug fix when concatenating DataFrames (by @gnilrets)
16
+ - Handling improper arguments to `Daru::Vector#[]` (by @lokeshh)
17
+ - Resolve narray conflict by using the latest nmatrix require methods (by @lokeshh)
18
+
1
19
  # 0.1.1
2
20
 
3
21
  * Enhancements
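The 0.1.2 changelog mentions a faster inner join built on the bloomfilter-rb gem. The sketch below only illustrates the general idea of rejecting non-matching rows with a Bloom filter before the real match. It is not daru's implementation and does not use the bloomfilter-rb API; `TinyBloom`, `inner_join`, and the CRC32-based hashing are all assumptions made for illustration.

```ruby
require 'zlib'

# Illustrative Bloom filter (hypothetical class, not from bloomfilter-rb).
class TinyBloom
  def initialize(bits = 1024, hashes = 3)
    @bits = bits
    @hashes = hashes
    @field = Array.new(bits, false)
  end

  def add(key)
    positions(key).each { |pos| @field[pos] = true }
  end

  # May report true for a key that was never added (false positive),
  # but never reports false for a key that was added.
  def maybe_include?(key)
    positions(key).all? { |pos| @field[pos] }
  end

  private

  # Derive several bit positions from one key by salting a CRC32 hash.
  def positions(key)
    (0...@hashes).map { |seed| Zlib.crc32("#{seed}:#{key}") % @bits }
  end
end

# Inner join of two arrays of hashes on the key `on`.
def inner_join(left, right, on)
  bloom = TinyBloom.new
  right.each { |row| bloom.add(row[on]) }
  lookup = right.group_by { |row| row[on] }

  left.each_with_object([]) do |row, out|
    # Cheap rejection: rows whose key is definitely absent are skipped.
    next unless bloom.maybe_include?(row[on])
    # Exact match; also discards any Bloom filter false positives.
    (lookup[row[on]] || []).each { |match| out << row.merge(match) }
  end
end
```

Because the filter can only produce false positives, the exact lookup that follows keeps the result correct; the filter merely reduces how many rows reach that step.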
data/README.md CHANGED
@@ -1,18 +1,13 @@
1
- daru
2
- ====
3
-
4
- Data Analysis in RUby
1
+ # daru - Data Analysis in RUby
5
2
 
6
3
  [![Gem Version](https://badge.fury.io/rb/daru.svg)](http://badge.fury.io/rb/daru)
7
4
  [![Build Status](https://travis-ci.org/v0dro/daru.svg)](https://travis-ci.org/v0dro/daru)
8
5
 
9
6
  ## Introduction
10
7
 
11
- daru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data.
12
-
13
- daru is inspired by pandas, a very mature solution in Python.
8
+ daru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data in Ruby.
14
9
 
15
- Written in pure Ruby so should work with all ruby implementations. Tested with MRI 2.0, 2.1, 2.2.
10
+ daru makes it easy and intuitive to process data, predominantly through two data structures: `Daru::DataFrame` and `Daru::Vector`. Written in pure Ruby, it works with all Ruby implementations. Tested with MRI 2.0, 2.1, 2.2 and 2.3.
16
11
 
17
12
  ## Features
18
13
 
@@ -28,7 +23,7 @@ Written in pure Ruby so should work with all ruby implementations. Tested with M
28
23
  * Optional speed and space optimization on MRI with [NMatrix](https://github.com/SciRuby/nmatrix) and GSL.
29
24
  * Easy splitting, aggregation and grouping of data.
30
25
  * Quickly reducing data with pivot tables for quick data summary.
31
- * Import and export data from and to Excel, CSV, SQL Databases and plain text files.
26
+ * Import and export data from and to Excel, CSV, SQL Databases, ActiveRecord and plain text files.
32
27
 
33
28
  ## Notebooks
34
29
 
@@ -64,6 +59,111 @@ Written in pure Ruby so should work with all ruby implementations. Tested with M
64
59
  * [Analysis of Time Series in daru](http://v0dro.github.io/blog/2015/07/31/analysis-of-time-series-in-daru/)
65
60
  * [Date Offsets in Daru](http://v0dro.github.io/blog/2015/07/27/date-offsets-in-daru/)
66
61
 
62
+ ## Basic Usage
63
+
64
+ daru exposes two major data structures: `DataFrame` and `Vector`. The `Vector` is a basic 1-D structure corresponding to a labelled Array, while the `DataFrame`, daru's primary data structure, is a 2-D spreadsheet-like structure for manipulating and storing data sets.
65
+
66
+ Basic DataFrame initialization.
67
+
68
+ ``` ruby
69
+ data_frame = Daru::DataFrame.new(
70
+ {
71
+ 'Beer' => ['Kingfisher', 'Snow', 'Bud Light', 'Tiger Beer', 'Budweiser'],
72
+ 'Gallons sold' => [500, 400, 450, 200, 250]
73
+ },
74
+ index: ['India', 'China', 'USA', 'Malaysia', 'Canada']
75
+ )
76
+ data_frame
77
+ ```
78
+ ![init0](images/init0.png)
79
+
80
+
81
+ Load data from CSV files.
82
+ ``` ruby
83
+ df = Daru::DataFrame.from_csv('TradeoffData.csv')
84
+ ```
85
+ ![init1](images/init1.png)
86
+
87
+ *Basic Data Manipulation*
88
+
89
+ Selecting rows.
90
+ ``` ruby
91
+ data_frame.row['USA']
92
+ ```
93
+ ![man0](images/man0.png)
94
+
95
+ Selecting columns.
96
+ ``` ruby
97
+ data_frame['Beer']
98
+ ```
99
+ ![man1](images/man1.png)
100
+
101
+ A range of rows.
102
+ ``` ruby
103
+ data_frame.row['India'..'USA']
104
+ ```
105
+ ![man2](images/man2.png)
106
+
107
+ The first 2 rows.
108
+ ``` ruby
109
+ data_frame.first(2)
110
+ ```
111
+ ![man3](images/man3.png)
112
+
113
+ The last 2 rows.
114
+ ``` ruby
115
+ data_frame.last(2)
116
+ ```
117
+ ![man4](images/man4.png)
118
+
119
+ Adding a new column.
120
+ ``` ruby
121
+ data_frame['Gallons produced'] = [550, 500, 600, 210, 240]
122
+ ```
123
+ ![man5](images/man5.png)
124
+
125
+ Creating a new column based on data in other columns.
126
+ ``` ruby
127
+ data_frame['Demand supply gap'] = data_frame['Gallons produced'] - data_frame['Gallons sold']
128
+ ```
129
+ ![man6](images/man6.png)
130
+
131
+ *Condition based selection*
132
+
133
+ Selecting countries based on the number of gallons sold in each. We use a syntax similar to that defined by [Arel](https://github.com/rails/arel), i.e. by using the `where` clause.
134
+ ``` ruby
135
+ data_frame.where(data_frame['Gallons sold'].lt(300))
136
+ ```
137
+ ![con0](images/con0.png)
138
+
139
+ You can pass a combination of boolean operations into the `#where` method and it should work fine:
140
+ ``` ruby
141
+ data_frame.where(
142
+ data_frame['Beer']
143
+ .in(['Snow', 'Kingfisher','Tiger Beer'])
144
+ .and(
145
+ data_frame['Gallons produced'].gt(520).or(data_frame['Gallons produced'].lt(250))
146
+ )
147
+ )
148
+ ```
149
+ ![con1](images/con1.png)
150
+
151
+ *Plotting*
152
+
153
+ Daru supports plotting of interactive graphs with [nyaplot](). You can easily create a plot with the `#plot` method. Here we plot the gallons sold on the Y axis and name of the brand on the X axis in a bar graph.
154
+ ``` ruby
155
+ data_frame.plot type: :bar, x: 'Beer', y: 'Gallons sold' do |plot, diagram|
156
+ plot.x_label "Beer"
157
+ plot.y_label "Gallons Sold"
158
+ plot.yrange [0,600]
159
+ plot.width 500
160
+ plot.height 400
161
+ end
162
+ ```
163
+ ![plot0](images/plot0.png)
164
+
165
+ In addition to nyaplot, daru also supports plotting out of the box with [gnuplotrb](https://github.com/SciRuby/gnuplotrb).
166
+
67
167
  ## Documentation
68
168
 
69
169
  Docs can be found [here](https://rubygems.org/gems/daru).
@@ -71,8 +171,6 @@ Docs can be found [here](https://rubygems.org/gems/daru).
71
171
  ## Roadmap
72
172
 
73
173
  * Enable creation of DataFrame by only specifying an NMatrix/MDArray in initialize. Vector naming happens automatically (alphabetic) or is specified in an Array.
74
- * Basic Data manipulation and analysis operations:
75
- - DF concat
76
174
  * Assignment of a column to a single number should set the entire column to that number.
77
175
  * Multiple column assignment with []=
78
176
  * Multiple value assignment for vectors with []=.
data/daru.gemspec CHANGED
@@ -34,7 +34,7 @@ Thank you for installing daru!
34
34
  oOOOOOo
35
35
  ,| oO
36
36
  //| |
37
- \\| |
37
+ \\\\| |
38
38
  `| |
39
39
  `-----`
40
40
 
@@ -50,17 +50,22 @@ Cheers!
50
50
  EOF
51
51
 
52
52
  spec.add_runtime_dependency 'reportbuilder', '~> 1.4'
53
- spec.add_runtime_dependency 'spreadsheet', '~> 1.0.3'
53
+ spec.add_runtime_dependency 'spreadsheet', '~> 1.1.1'
54
54
 
55
55
  spec.add_development_dependency 'bundler', '~> 1.10'
56
- spec.add_development_dependency 'rake'
56
+ spec.add_development_dependency 'rake', '~>10.5'
57
57
  spec.add_development_dependency 'pry', '~> 0.10'
58
58
  spec.add_development_dependency 'pry-byebug'
59
59
  spec.add_development_dependency 'rserve-client', '~> 0.3'
60
- spec.add_development_dependency 'rspec'
60
+ spec.add_development_dependency 'rspec', '~> 3.4'
61
61
  spec.add_development_dependency 'awesome_print'
62
62
  spec.add_development_dependency 'nyaplot', '~> 0.1.5'
63
- spec.add_development_dependency 'nmatrix', '~> 0.1.0'
63
+ spec.add_development_dependency 'nmatrix', '~> 0.2.1'
64
64
  spec.add_development_dependency 'distribution', '~> 0.7'
65
65
  spec.add_development_dependency 'rb-gsl', '~>1.16'
66
- end
66
+ spec.add_development_dependency 'bloomfilter-rb', '~> 2.1'
67
+ spec.add_development_dependency 'dbd-sqlite3'
68
+ spec.add_development_dependency 'dbi'
69
+ spec.add_development_dependency 'activerecord', '~> 4.0'
70
+ spec.add_development_dependency 'sqlite3'
71
+ end
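Most of the gemspec changes tighten dependencies with the pessimistic operator `~>`. As a small sketch of what those constraints admit, the snippet below checks versions with RubyGems' own `Gem::Requirement` and `Gem::Version` classes; the helper name `satisfies?` and the sample version numbers are made up for illustration.

```ruby
require 'rubygems'

# Hypothetical helper: does `version` satisfy the requirement string?
def satisfies?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

satisfies?('~> 1.1.1', '1.1.9')  # patch releases within 1.1.x are accepted
satisfies?('~> 1.1.1', '1.2.0')  # the next minor release is rejected
satisfies?('~> 3.4', '3.9.0')    # any 3.x release from 3.4 onward is accepted
satisfies?('~> 3.4', '4.0.0')    # the next major release is rejected
```

The number of segments in the constraint decides which segment may float: three segments pin the minor version, two segments pin the major version.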
data/images/README.md ADDED
@@ -0,0 +1,5 @@
1
+ # Images
2
+
3
+ This folder contains images that are being used in the project README to display code examples.
4
+
5
+ Do not change any names of files.
data/images/con0.png ADDED
Binary file
data/images/con1.png ADDED
Binary file
data/images/init0.png ADDED
Binary file
data/images/init1.png ADDED
Binary file
data/images/man0.png ADDED
Binary file
data/images/man1.png ADDED
Binary file
data/images/man2.png ADDED
Binary file
data/images/man3.png ADDED
Binary file
data/images/man4.png ADDED
Binary file
data/images/man5.png ADDED
Binary file
data/images/man6.png ADDED
Binary file
data/images/plot0.png ADDED
Binary file
data/lib/daru.rb CHANGED
@@ -38,10 +38,12 @@ module Daru
38
38
  attr_accessor :lazy_update
39
39
 
40
40
  def create_has_library(library)
41
- define_singleton_method("has_#{library}?") do
42
- cv = "@@#{library}"
41
+ lib_underscore = library.to_s.gsub(/-/, '_')
42
+ define_singleton_method("has_#{lib_underscore}?") do
43
+ cv = "@@#{lib_underscore}"
43
44
  unless class_variable_defined? cv
44
45
  begin
46
+ library = 'nmatrix/nmatrix' if library == :nmatrix
45
47
  require library.to_s
46
48
  class_variable_set(cv, true)
47
49
  rescue LoadError
@@ -56,6 +58,7 @@ module Daru
56
58
  create_has_library :gsl
57
59
  create_has_library :nmatrix
58
60
  create_has_library :nyaplot
61
+ create_has_library :'bloomfilter-rb'
59
62
  end
60
63
 
61
64
  autoload :Spreadsheet, 'spreadsheet'
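The change to `create_has_library` turns hyphens in a library name into underscores so that the generated predicate is a valid method name (for example `has_bloomfilter_rb?`). A simplified, self-contained sketch of the same pattern follows. The module name `LibraryProbe` and the Hash-based cache are assumptions; the diff shows daru using class variables for caching, and only part of that method is visible here.

```ruby
module LibraryProbe
  @available = {}

  class << self
    # Defines LibraryProbe.has_<name>? for the given library, where hyphens
    # in the library name become underscores in the method name.
    def create_has_library(library)
      method_name = "has_#{library.to_s.tr('-', '_')}?"
      define_singleton_method(method_name) do
        # Try the require once and remember the outcome.
        @available.fetch(library) do
          @available[library] =
            begin
              require library.to_s
              true
            rescue LoadError
              false
            end
        end
      end
    end
  end

  create_has_library :json                   # standard library, normally present
  create_has_library :'no-such-library-xyz'  # deliberately missing
end
```

With these definitions, `LibraryProbe.has_json?` reports whether `json` could be required, and the hyphenated name is reachable as `LibraryProbe.has_no_such_library_xyz?`.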
@@ -18,7 +18,7 @@ module Daru
18
18
  @context = context
19
19
  vectors = names.map { |vec| context[vec].to_a }
20
20
  tuples = vectors[0].zip(*vectors[1..-1])
21
- keys = tuples.uniq.sort
21
+ keys = tuples.uniq.sort { |a,b| a && b ? a.compact <=> b.compact : a ? 1 : -1 }
22
22
 
23
23
  keys.each do |key|
24
24
  @groups[key] = all_indices_for(tuples, key)
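The new sort block exists because a plain `sort` fails as soon as a grouping tuple contains `nil`: `nil <=> "bar"` returns `nil`, which `Array#sort` cannot use. The sketch below reuses the comparator from the added line on plain arrays, outside of daru; the names `NIL_SAFE` and `sort_group_keys` are hypothetical.

```ruby
# Comparator from the added line: compare tuples with their nils dropped.
NIL_SAFE = lambda do |a, b|
  a && b ? a.compact <=> b.compact : (a ? 1 : -1)
end

# Hypothetical helper mirroring `tuples.uniq.sort { ... }` from the diff.
def sort_group_keys(tuples)
  tuples.uniq.sort(&NIL_SAFE)
end

keys = sort_group_keys([['foo', 'one'], [nil, 'two'], ['bar', 'one'], ['foo', nil]])
# keys is [["bar", "one"], ["foo", nil], ["foo", "one"], [nil, "two"]]
```

Note that the comparison is made on the compacted tuples, so `[nil, "two"]` is ordered as if it were `["two"]`. Tuples whose remaining elements have different types (say an Integer against a String) would still fail to compare.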
@@ -28,7 +28,7 @@ module Daru
28
28
 
29
29
  # Get a Daru::Vector of the size of each group.
30
30
  def size
31
- index =
31
+ index =
32
32
  if multi_indexed_grouping?
33
33
  Daru::MultiIndex.from_tuples @groups.keys
34
34
  else
@@ -59,15 +59,15 @@ module Daru
59
59
  # d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
60
60
  # })
61
61
  # df.group_by([:a, :b]).head(1)
62
- # # =>
62
+ # # =>
63
63
  # # #<Daru::DataFrame:82745170 @name = d7003f75-5eb9-4967-9303-c08dd9160224 @size = 6>
64
- # # a b c d
65
- # # 1 bar one 2 22
66
- # # 3 bar three 1 44
67
- # # 5 bar two 6 66
68
- # # 0 foo one 1 11
69
- # # 7 foo three 8 88
70
- # # 2 foo two 3 33
64
+ # # a b c d
65
+ # # 1 bar one 2 22
66
+ # # 3 bar three 1 44
67
+ # # 5 bar two 6 66
68
+ # # 0 foo one 1 11
69
+ # # 7 foo three 8 88
70
+ # # 2 foo two 3 33
71
71
  def head quantity=5
72
72
  select_groups_from :first, quantity
73
73
  end
@@ -82,14 +82,14 @@ module Daru
82
82
  # d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
83
83
  # })
84
84
  # # df.group_by([:a, :b]).tail(1)
85
- # # =>
85
+ # # =>
86
86
  # # #<Daru::DataFrame:82378270 @name = 0623db46-5425-41bd-a843-99baac3d1d9a @size = 6>
87
- # # a b c d
88
- # # 1 bar one 2 22
89
- # # 3 bar three 1 44
90
- # # 5 bar two 6 66
91
- # # 6 foo one 3 77
92
- # # 7 foo three 8 88
87
+ # # a b c d
88
+ # # 1 bar one 2 22
89
+ # # 3 bar three 1 44
90
+ # # 5 bar two 6 66
91
+ # # 6 foo one 3 77
92
+ # # 7 foo three 8 88
93
93
  # # 4 foo two 3 55
94
94
  def tail quantity=5
95
95
  select_groups_from :last, quantity
@@ -103,15 +103,15 @@ module Daru
103
103
  # c: [1 ,2 ,3 ,1 ,3 ,6 ,3 ,8],
104
104
  # d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
105
105
  # df.group_by([:a, :b]).mean
106
- # # =>
106
+ # # =>
107
107
  # # #<Daru::DataFrame:81097450 @name = 0c32983f-3e06-451f-a9c9-051cadfe7371 @size = 6>
108
- # # c d
109
- # # ["bar", "one"] 2 22
110
- # # ["bar", "three"] 1 44
111
- # # ["bar", "two"] 6 66
112
- # # ["foo", "one"] 2.0 44.0
113
- # # ["foo", "three"] 8 88
114
- # # ["foo", "two"] 3.0 44.0
108
+ # # c d
109
+ # # ["bar", "one"] 2 22
110
+ # # ["bar", "three"] 1 44
111
+ # # ["bar", "two"] 6 66
112
+ # # ["foo", "one"] 2.0 44.0
113
+ # # ["foo", "three"] 8 88
114
+ # # ["foo", "two"] 3.0 44.0
115
115
  def mean
116
116
  apply_method :numeric, :mean
117
117
  end
@@ -128,28 +128,28 @@ module Daru
128
128
 
129
129
  # Count groups, excludes missing values.
130
130
  # @example Using count
131
- # df = Daru::DataFrame.new({
132
- # a: %w{foo bar foo bar foo bar foo foo},
133
- # b: %w{one one two three two two one three},
134
- # c: [1 ,2 ,3 ,1 ,3 ,6 ,3 ,8],
135
- # d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
136
- # })
131
+ # df = Daru::DataFrame.new({
132
+ # a: %w{foo bar foo bar foo bar foo foo},
133
+ # b: %w{one one two three two two one three},
134
+ # c: [1 ,2 ,3 ,1 ,3 ,6 ,3 ,8],
135
+ # d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
136
+ # })
137
137
  # df.group_by([:a, :b]).count
138
- # # =>
138
+ # # =>
139
139
  # # #<Daru::DataFrame:76900210 @name = 7b9cf55d-17f8-48c7-b03a-2586c6e5ec5a @size = 6>
140
- # # c d
141
- # # ["bar", "one"] 1 1
142
- # # ["bar", "two"] 1 1
143
- # # ["bar", "three"] 1 1
144
- # # ["foo", "one"] 2 2
145
- # # ["foo", "three"] 1 1
146
- # # ["foo", "two"] 2 2
140
+ # # c d
141
+ # # ["bar", "one"] 1 1
142
+ # # ["bar", "two"] 1 1
143
+ # # ["bar", "three"] 1 1
144
+ # # ["foo", "one"] 2 2
145
+ # # ["foo", "three"] 1 1
146
+ # # ["foo", "two"] 2 2
147
147
  def count
148
148
  width = @non_group_vectors.size
149
149
  Daru::DataFrame.new([size]*width, order: @non_group_vectors)
150
150
  end
151
151
 
152
- # Calculate sample standard deviation of numeric vector groups, excluding
152
+ # Calculate sample standard deviation of numeric vector groups, excluding
153
153
  # missing values.
154
154
  def std
155
155
  apply_method :numeric, :std
@@ -177,9 +177,9 @@ module Daru
177
177
  # d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
178
178
  # })
179
179
  # df.group_by([:a, :b]).get_group ['bar','two']
180
- # #=>
180
+ # #=>
181
181
  # ##<Daru::DataFrame:83258980 @name = 687ee3f6-8874-4899-97fa-9b31d84fa1d5 @size = 1>
182
- # # a b c d
182
+ # # a b c d
183
183
  # # 5 bar two 6 66
184
184
  def get_group group
185
185
  indexes = @groups[group]
@@ -198,7 +198,7 @@ module Daru
198
198
  rows, index: @context.index[indexes], order: @context.vectors)
199
199
  end
200
200
 
201
- private
201
+ private
202
202
 
203
203
  def select_groups_from method, quantity
204
204
  selection = @context
@@ -227,7 +227,7 @@ module Daru
227
227
  slice = vec[*indexes]
228
228
  single_row << (slice.is_a?(Numeric) ? slice : slice.send(method))
229
229
  end
230
- end
230
+ end
231
231
 
232
232
  rows << single_row
233
233
  end
@@ -260,4 +260,4 @@ module Daru
260
260
  end
261
261
  end
262
262
  end
263
- end
263
+ end