daru 0.0.2.3 → 0.0.3
- checksums.yaml +4 -4
- data/History.txt +10 -0
- data/README.md +17 -10
- data/Rakefile +5 -0
- data/daru.gemspec +2 -0
- data/lib/daru.rb +1 -1
- data/lib/daru/dataframe.rb +426 -146
- data/lib/daru/dataframe_by_row.rb +15 -0
- data/lib/daru/dataframe_by_vector.rb +15 -0
- data/lib/daru/index.rb +83 -0
- data/lib/daru/io.rb +30 -0
- data/lib/daru/monkeys.rb +18 -10
- data/lib/daru/vector.rb +178 -47
- data/lib/version.rb +1 -1
- data/spec/dataframe_spec.rb +550 -0
- data/spec/fixtures/countries.json +7794 -0
- data/spec/index_spec.rb +54 -0
- data/spec/io_spec.rb +49 -0
- data/spec/monkeys_spec.rb +6 -0
- data/spec/spec_helper.rb +10 -1
- data/spec/vector_spec.rb +155 -0
- metadata +47 -10
- data/spec/jruby/dataframe_spec.rb +0 -1
- data/spec/jruby/vector_spec.rb +0 -20
- data/spec/mri/dataframe_spec.rb +0 -139
- data/spec/mri/vector_spec.rb +0 -104
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7bee85826c8bd5bb962982278d93ee62b7874d93
+  data.tar.gz: 7e9e66b3282f44888c3d018bfcfe09e3d03ae065
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: de3897032876c4ced80ca9b8ac741e369b09478adbcc0bec1001e38b7c1474e62b2b0d9d2f5d29802830c2540c460b0688b66124a1aac570699424f95c6884c2
+  data.tar.gz: edeb5ae0b7d9523a1fc72ae845e89145b7a462b09c89989f01e614620c27860ea2bb14b020d59d3e9f21851085ff66d4104fe243d4daa57bc0544fcc46b023ba
data/History.txt
CHANGED
@@ -15,3 +15,13 @@
 * Vector objects passed into a DataFrame are now duplicated so that any changes dont affect the original vector.
 * Added an optional opts argument to DataFrame.
 * Sending more fields than vectors in DataFrame will cause addition of nil vectors.
+* Init a DataFrame without having to convert explicitly to vectors.
+
+== 0.0.2.4
+* Initialize dataframe from an array which looks like [{a: 10, b: 20}, {a: 11, b: 12}]. Works for parsed JSON.
+* Over-riding vectors in DataFrame will still preserve order.
+* Any re-assignment of rows in #each_row and #each_row_with_index will reflect in the DataFrame.
+* Added #to_a and #to_json to DataFrame.
+
+== 0.0.3
+* This release is a complete rewrite of the entire gem to accomodate index values.
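The 0.0.2.4 entry in this hunk mentions initializing a DataFrame from an array of hashes, including parsed JSON. As a rough illustration of the idea only, here is a plain-Ruby sketch that pivots such rows into one array per field. The helper name `columns_from_rows` is hypothetical and is not part of daru; it only assumes that field names come from the first row and that keys may be symbols or strings.

```ruby
require 'json'

# Hypothetical helper, not part of daru: pivot an array of row hashes
# into a hash of column arrays, one per field name.
def columns_from_rows(rows)
  # Take field names from the first row, normalised to symbols.
  names = rows.first.keys.map(&:to_sym)

  names.each_with_object({}) do |name, columns|
    # Parsed JSON uses string keys, so fall back to the string form.
    columns[name] = rows.map { |row| row.key?(name) ? row[name] : row[name.to_s] }
  end
end

rows   = [{ a: 10, b: 20 }, { a: 11, b: 12 }]
parsed = JSON.parse('[{"a": 10, "b": 20}, {"a": 11, "b": 12}]')

from_symbols = columns_from_rows(rows)
from_json    = columns_from_rows(parsed)
```

Both inputs end up with the same column layout, which is the property the changelog entry describes for parsed JSON.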
data/README.md
CHANGED
@@ -23,6 +23,8 @@ daru employs several data structures for storing and manipulating data:
 
 daru data structures can be constructed by using several Ruby classes. These include `Array`, `Hash`, `Matrix`, [NMatrix](https://github.com/SciRuby/nmatrix) and [MDArray](https://github.com/rbotafogo/mdarray). daru brings a uniform API for handling and manipulating data represented in any of the above Ruby classes.
 
+Currently things work as expected for Arrays only. Rest will added over the next few weeks.
+
 ## Testing
 
 Install jruby using `rvm install jruby`, then run `jruby -S gem install mdarray`, followed by `bundle install`. You will need to install `mdarray` manually because of strange gemspec file behaviour. If anyone can automate this then I'd greatly appreciate it! Then run `rspec` in JRuby to test for MDArray functionality.
@@ -33,9 +35,7 @@ Then switch to MRI, do a normal `bundle install` followed by `rspec` for testing
 
 * Automate testing for both MRI and JRuby.
 * Enable creation of DataFrame by only specifying an NMatrix/MDArray in initialize. Vector naming happens automatically (alphabetic) or is specified in an Array.
-*
-* Destructive version #filter\_rows!
-* NMatrix.first should return NMatrix (in vector).
+* Destructive map iterators for DataFrame and Vector.
 * Completely test all functionality for NMatrix and MDArray.
 * Basic Data manipulation and analysis operations:
     - Different kinds of join operations
@@ -43,12 +43,19 @@ Then switch to MRI, do a normal `bundle install` followed by `rspec` for testing
     - Creation of correlation, covariance matrices
     - Verification of data in a vector
     - Basic vector statistics - mean, median, variance, etc.
-* Add indexing on vectors.
-    - Creation of vector by supplying an index-value hash.
-    - Auto generation of real numbered indices for any vector.
-    - Ability to separately specify index for each element of a vector.
-    - Runtime alteration of index.
-* Indexing on DataFrame.
 * Vector arithmetic - elementwise addition, subtraction, multiplication, division.
 * Transpose a dataframe.
-* Option to express a DataFrame as an NMatrix or MDArray so as to use more efficient storage techniques.
+* Option to express a DataFrame as an NMatrix or MDArray so as to use more efficient storage techniques.
+* Assignment of a column to a single number should set the entire column to that number.
+* == between daru_vector and string/number.
+* Multiple column assignment with []=
+* Creation of DataFrame from Array of Arrays.
+* Multiple value assignment for vectors with []=.
+* Load DataFrame from multiple sources (excel, SQL, etc.).
+* Allow for boolean operations inside #[].
+* Deletion of elements from Vector should only modify the index and leave the vector as it is so that compacting is not needed and things are faster.
+* Add a #sync method which will sync the modified index with the unmodified vector.
+* Ability to reorder the index of a dataframe.
+
+Copyright (c) 2014, Sameer Deshmukh
+All rights reserved
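Two of the roadmap items added in this hunk describe deleting elements by touching only the index, with a later #sync to bring the stored values back in line. The sketch below is one possible reading of that idea in plain Ruby. The class `LazyVector` and its method names are hypothetical and are not daru's Vector; it assumes labels are unique and maps each label to a position in a backing array.

```ruby
# Hypothetical illustration, not daru's Vector: deletion only touches the
# label-to-position map; the backing array is compacted later by #sync.
class LazyVector
  def initialize(values, labels)
    @values    = values.dup
    @positions = labels.each_with_index.to_h # label => position in @values
  end

  def [](label)
    pos = @positions.fetch(label) { raise IndexError, "no such label: #{label}" }
    @values[pos]
  end

  # Forget the label and leave the stored value where it is.
  def delete(label)
    @positions.delete(label)
    self
  end

  # Rebuild the backing array so that it holds only live values.
  def sync
    live       = @positions.keys
    @values    = live.map { |label| @values[@positions[label]] }
    @positions = live.each_with_index.to_h
    self
  end

  # Live values in index order.
  def to_a
    @positions.values.map { |pos| @values[pos] }
  end

  # Number of slots currently held by the backing array.
  def storage_size
    @values.size
  end
end

vec = LazyVector.new([10, 20, 30], [:a, :b, :c])
vec.delete(:b)
```

After the delete, reads go through the index and skip the dropped label, while the backing array keeps its original length until `sync` is called.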
data/Rakefile
ADDED
data/daru.gemspec
CHANGED
@@ -24,7 +24,9 @@ Gem::Specification.new do |spec|
   spec.require_paths = ["lib"]
 
   spec.add_development_dependency 'bundler'
+  spec.add_development_dependency 'rake'
   spec.add_development_dependency 'rspec'
+  spec.add_development_dependency 'awesome_print'
   if RUBY_ENGINE != 'jruby'
     spec.add_development_dependency 'nmatrix', '~> 0.1.0.rc5'
   end
data/lib/daru.rb
CHANGED
data/lib/daru/dataframe.rb
CHANGED
@@ -1,259 +1,539 @@
+require_relative 'dataframe_by_row.rb'
+require_relative 'dataframe_by_vector.rb'
+require_relative 'io.rb'
+
 module Daru
   class DataFrame
 
-
-
-
+    class << self
+      def from_csv path, opts={}, &block
+        Daru::IO.from_csv path, opts, &block
+      end
+    end
 
+    attr_reader :vectors
+    attr_reader :index
+    attr_reader :name
     attr_reader :size
 
-
+    # DataFrame basically consists of an Array of Vector objects.
+    # These objects are indexed by row and column by vectors and index Index objects.
+    # Arguments - source, vectors, index, name in that order. Last 3 are optional.
+    def initialize source, *args
+      vectors = args.shift
+      index = args.shift
+      @name = args.shift || SecureRandom.uuid
 
-
-      @opts = opts
-      set_default_opts
+      @data = []
 
       if source.empty?
-        @vectors =
+        @vectors = Daru::Index.new vectors
+        @index = Daru::Index.new index
+
+        create_empty_vectors
       else
-
-
-
+        case source
+        when Array
+          if vectors.nil?
+            @vectors = Daru::Index.new source[0].keys.map(&:to_sym)
+          else
+            @vectors = Daru::Index.new (vectors + (source[0].keys - vectors)).uniq.map(&:to_sym)
+          end
+
+          if index.nil?
+            @index = Daru::Index.new source.size
+          else
+            @index = Daru::Index.new index
+          end
+
+          @vectors.each do |name|
+            v = []
+
+            source.each do |hsh|
+              v << (hsh[name] || hsh[name.to_s])
+            end
+
+            @data << v.dv(name, @index)
+          end
+        when Hash
+          create_vectors_index_with vectors, source
+
+          if all_daru_vectors_in_source? source
+
+            if !index.nil?
+              @index = index.to_index
+            elsif all_vectors_have_equal_indexes? source
+              @index = source.values[0].index.dup
+            else
+              all_indexes = []
+
+              source.each_value do |vector|
+                all_indexes << vector.index.to_a
+              end
+              # sort only if missing indexes detected
+              all_indexes.flatten!.uniq!.sort!
+
+              @index = Daru::Index.new all_indexes
+            end
+
+            @vectors.each do |vector|
+              @data << Daru::Vector.new(vector, [], @index)
+
+              @index.each do |idx|
+                begin
+                  @data[@vectors[vector]][idx] = source[vector][idx]
+                rescue IndexError
+                  # If the index is not present in the vector under consideration
+                  # (in source) then an error is raised. Put a nil in that place if
+                  # that is the case.
+                  @data[@vectors[vector]][idx] = nil
+                end
+              end
+            end
+          else
+            index = source.values[0].size if index.nil?
+
+            if index.is_a?(Daru::Index)
+              @index = index.to_index
+            else
+              @index = Daru::Index.new index
+            end
+
+            @vectors.each do |name|
+              @data << source[name].dup.dv(name, @index)
+            end
+          end
+
         end
       end
 
-
-
-
-      check_length
-      set_missing_vectors if @vectors.keys.size < @fields.size
-      set_fields_order if @vectors.keys.sort != @fields.sort
-      set_vector_names
+      set_size
+      validate
     end
 
-    def
-
-
-
-
-
-
+    def [](*names, axis)
+      if axis == :vector
+        access_vector *names
+      elsif axis == :row
+        access_row *names
+      else
+        raise IndexError, "Expected axis to be row or vector not #{axis}"
+      end
+    end
 
-
+    def []=(name, axis ,vector)
+      if axis == :vector
+        insert_or_modify_vector name, vector
+      elsif axis == :row
+        insert_or_modify_row name, vector
+      else
+        raise IndexError, "Expected axis to be row or vector, not #{axis}."
+      end
+    end
 
-
-
+    def vector
+      Daru::DataFrameByVector.new(self)
+    end
 
-
-
-
-        first = false
-      end
+    def row
+      Daru::DataFrameByRow.new(self)
+    end
 
-
+    def dup
+      src = {}
+      @vectors.each do |vector|
+        src[vector] = @data[@vectors[vector]]
       end
 
-
+      Daru::DataFrame.new src, @vectors.dup, @index.dup, @name
     end
 
-    def
-      @
-    end
+    def each_vector(&block)
+      @data.each(&block)
 
-
-      @vectors.delete name
-      @fields.delete name
+      self
     end
 
-
+    def each_vector_with_index(&block)
+      @vectors.each do |vector|
+        yield @data[@vectors[vector]], vector
+      end
 
-
-
-      raise "Expected index less than size." if index > @size
+      self
+    end
 
-
-
+    def each_row(&block)
+      @index.each do |index|
+        yield access_row(index)
       end
-
+
+      self
     end
 
-    def
-
+    def each_row_with_index(&block)
+      @index.each do |index|
+        yield access_row(index), index
+      end
 
-      self
-
+      self
+    end
 
-
+    def map_vectors(&block)
+      df = self.dup
+
+      df.each_vector_with_index do |vector, name|
+        df[name, :vector] = yield(vector)
       end
 
       df
     end
 
-    def
-
-        return column(name[0])
-      end
-
-      h = {}
-      req_fields = @fields & name
+    def map_vectors_with_index(&block)
+      df = self.dup
 
-
-
+      df.each_vector_with_index do |vector, name|
+        df[name, :vector] = yield(vector, name)
       end
 
-
+      df
     end
 
-    def
-
-
-
+    def map_rows(&block)
+      df = self.dup
+
+      df.each_row_with_index do |row, index|
+        df[index, :row] = yield(row)
+      end
 
-
-      insert_vector name, vector
+      df
     end
 
-    def
-
+    def map_rows_with_index(&block)
+      df = self.dup
 
-      row
-
-        row[column.name] = column[index]
+      df.each_row_with_index do |row, index|
+        df[index, :row] = yield(row, index)
       end
 
-
+      df
     end
 
-    def
-
+    def delete_vector vector
+      if @vectors.include? vector
+        @data.delete_at @vectors[vector]
+        @vectors = Daru::Index.new @vectors.to_a - [vector]
+      else
+        raise IndexError, "Vector #{vector} does not exist."
+      end
     end
 
-    def
-
-        yield row(index)
-      end
+    def delete_row index
+      idx = named_index_for index
 
-
-
+      if @index.include? idx
+        @index = (@index.to_a - [idx]).to_index
 
-
-
-
+        self.each_vector do |vector|
+          vector.delete_at idx
+        end
+      else
+        raise IndexError, "Index #{index} does not exist."
       end
 
-
+      set_size
     end
 
-    def
-      @
-        yield
-      end
+    def keep_row_if &block
+      @index.each do |index|
+        keep_row = yield access_row(index)
 
-
+        delete_row index unless keep_row
+      end
     end
 
-    def
-      @
-        yield @vectors[
+    def keep_vector_if &block
+      @vectors.each do |vector|
+        keep_vector = yield @data[@vectors[vector]], vector
+
+        delete_vector vector unless keep_vector
       end
-
-      self
     end
 
-    def
-
-      size." if vector.size != self.size
-
-      @vectors.merge({name => vector})
-      @fields << name
+    def has_vector? name
+      !!@vectors[name]
     end
 
-
-
-
+    # Converts the DataFrame into an array of hashes where key is vector name
+    # and value is the corresponding element.
+    # The 0th index of the array contains the array of hashes while the 1th
+    # index contains the indexes of each row of the dataframe. Each element in
+    # the index array corresponds to its row in the array of hashes, which has
+    # the same index.
+    def to_a
+      arry = [[],[]]
 
-
-
+      self.each_row do |row|
+        arry[0] << row.to_hash
       end
 
-
+      arry[1] = @index.to_a
+
+      arry
     end
 
-    def
-
+    def to_json no_index=true
+      if no_index
+        self.to_a[0].to_json
+      else
+        self.to_a.to_json
+      end
+    end
+
+    def to_html threshold=15
+      html = '<table><tr><th></th>'
+
+      @vectors.each { |vector| html += '<th>' + vector.to_s + '</th>' }
 
-      html += '<tr>'
-      @fields.each { |f| html.concat('<td>' + f.to_s + '</td>') }
       html += '</tr>'
 
-
-        break if index > threshold and index <= @size
+      @index.each_with_index do |index, num|
         html += '<tr>'
-
+        html += '<td>' + index.to_s + '</td>'
+
+        self.row[index].each do |element|
+          html += '<td>' + element.to_s + '</td>'
+        end
         html += '</tr>'
-
+
+        if num > threshold
           html += '<tr>'
-
+          (@vectors + 1).size.times { html += '<td>...</td>' }
           html += '</tr>'
+          break
         end
       end
 
       html += '</table>'
+
+      html
     end
 
     def to_s
       to_html
     end
 
-    def
+    def inspect spacing=10, threshold=15
+      longest = [@vectors.map(&:to_s).map(&:size).max,
+                 @index .map(&:to_s).map(&:size).max,
+                 @data .map{ |v| v.map(&:to_s).map(&:size).max }.max].max
+
+      name = @name || 'nil'
+      content = ""
+      longest = spacing if longest > spacing
+      formatter = "\n"
+
+      (@vectors.size + 1).times { formatter += "%#{longest}.#{longest}s " }
+
+      content += "\n#<" + self.class.to_s + ":" + self.object_id.to_s + " @name = " +
+                    name.to_s + " @size = " + @size.to_s + ">"
+
+      content += sprintf formatter, "" , *@vectors.map(&:to_s)
+
+      row_num = 1
+
+      self.each_row_with_index do |row, index|
+        content += sprintf formatter, index.to_s, *row.to_hash.values.map { |e| (e || 'nil').to_s }
+
+        row_num += 1
+        if row_num > threshold
+          dots = []
+
+          (@vectors.size + 1).times { dots << "..." }
+          content += sprint formatter, *dots
+          break
+        end
+      end
+
+      content += "\n"
+
+      content
+    end
+
+    def == other
+      @index == other.index and @size == other.size and @vectors.all? { |vector|
+                            self[vector, :vector] == other[vector, :vector] }
+    end
+
+    def method_missing(name, *args, &block)
       if md = name.match(/(.+)\=/)
-
+        insert_or_modify_vector name[/(.+)\=/].delete("="), args[0]
       elsif self.has_vector? name
-
+        self[name, :vector]
       else
         super(name, *args)
       end
     end
 
    private
-
-
-
-
-
-
-
-
-
+
+    def access_vector *names
+      unless names[1]
+        if @vectors.include? names[0]
+          return @data[@vectors[names[0]]]
+        elsif @vectors.key names[0]
+          return @data[names[0]]
+        else
+          raise IndexError, "Specified index #{names[0]} does not exist."
+        end
+      end
+
+      new_vcs = {}
+
+      names.each do |name|
+        name = name.to_sym unless name.is_a?(Integer)
+
+        new_vcs[name] = @data[@vectors[name]]
+      end
+
+      Daru::DataFrame.new new_vcs, new_vcs.keys, @index, @name
+    end
+
+    def access_row *names
+      unless names[1]
+        row = []
+
+        @vectors.each do |vector|
+          row << @data[@vectors[vector]][names[0]]
+        end
+
+        if @index.include? names[0]
+          name = names[0]
+        elsif @index.key names[0]
+          name = @index.key names[0]
+        else
+          raise IndexError, "Specified row #{names[0]} does not exist."
+        end
+
+        Daru::Vector.new name, row, @vectors
+      else
+        # TODO: Access multiple rows
+      end
+    end
+
+    def insert_or_modify_vector name, vector
+      @vectors = @vectors.re_index(@vectors + name)
+
+      v = nil
+
+      if vector.is_a?(Daru::Vector)
+        v = Daru::Vector.new name, [], @index
+
+        @index.each do |idx|
+          begin
+            v[idx] = vector[idx]
+          rescue IndexError
+            v[idx] = nil
+          end
+        end
+      else
+        raise Exception, "Specified vector of length #{vector.size} cannot be inserted in DataFrame of size #{@size}" if
+          @size != vector.size
+
+        v = vector.dv(name, @index)
+      end
+
+      @data[@vectors[name]] = v
+    end
+
+    def insert_or_modify_row name, vector
+      if @index.include? name
+        v = vector.dv(name, @vectors)
+
+        @vectors.each do |vector|
+          begin
+            @data[@vectors[vector]][name] = v[vector]
+          rescue IndexError
+            @data[@vectors[vector]][name] = nil
+          end
+        end
+      else
+        @index = @index.re_index(@index + name)
+        v = vector.dv(name, @vectors)
+
+        @vectors.each do |vector|
+          begin
+            @data[@vectors[vector]].concat v[vector], name
+          rescue IndexError
+            @data[@vectors[vector]].concat nil, name
+          end
         end
       end
 
-
+      set_size
     end
 
-    def
-      @
-
+    def create_empty_vectors
+      @vectors.each do |name|
+        @data << Daru::Vector.new(name, [], @index)
+      end
+    end
+
+    def validate_labels
+      raise IndexError, "Expected equal number of vectors for number of Hash pairs" if
+        @vectors.size != @data.size
+
+      raise IndexError, "Expected number of indexes same as number of rows" if
+        @index.size != @data[0].size
+    end
+
+    def validate_vector_sizes
+      @data.each do |vector|
+        raise IndexError, "Expected vectors with equal length" if vector.size != @size
+      end
+    end
+
+    def validate
+      # TODO: [IMP] when vectors of different dimensions are specified, they should
+      # be inserted into the dataframe by inserting nils wherever necessary.
+      validate_labels
+      validate_vector_sizes
     end
 
-
-
-
-      @fields.each do |name|
-        @vectors[name].name = name
+    def all_daru_vectors_in_source? source
+      source.values.all? do |vector|
+        vector.is_a?(Daru::Vector)
       end
     end
 
-    def
-
+    def set_size
+      @size = @index.size
+    end
+
+    def named_index_for index
+      if @index.include? index
+        index
+      elsif @index.key index
+        @index.key index
+      else
+        raise IndexError, "Specified index #{index} does not exist."
+      end
+    end
+
+    def create_vectors_index_with vectors, source
+      vectors = source.keys.sort if vectors.nil?
+
+      if vectors.is_a?(Daru::Index)
+        @vectors = vectors.to_index
+      else
+        @vectors = Daru::Index.new (vectors + (source.keys - vectors)).uniq.map(&:to_sym)
+      end
     end
 
-    def
-
+    def all_vectors_have_equal_indexes? source
+      index = source.values[0].index
 
-
-
-        @fields << field
+      source.all? do |name, vector|
+        index == vector.index
       end
     end
   end
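When the source vectors carry different indexes, the new constructor in this file builds the union of all indexes and puts nil wherever a vector has no value for a label. The following is a simplified sketch of that alignment in plain Ruby, using hashes of label => value as stand-ins for indexed vectors. The helper `align_columns` is hypothetical and not part of daru. The sketch uses the non-bang `uniq` and `sort`, since `Array#uniq!` returns nil when nothing was removed and chained bang calls can therefore fail.

```ruby
# Hypothetical helper, not part of daru: align several label => value
# columns on the sorted union of their labels, filling gaps with nil.
def align_columns(columns)
  # Union of every label seen in any column, in sorted order.
  index = columns.values.flat_map(&:keys).uniq.sort

  aligned = columns.each_with_object({}) do |(name, column), out|
    # Hash#[] returns nil for a missing label, which gives the nil fill.
    out[name] = index.map { |label| column[label] }
  end

  [index, aligned]
end

columns = {
  a: { 0 => 1, 1 => 2 },
  b: { 1 => 5, 2 => 6 }
}

index, aligned = align_columns(columns)
```

Each aligned column has one slot per label in the combined index, so all columns end up with equal length, which is the condition the `validate_vector_sizes` check in the diff enforces.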