daru 0.0.2.3 → 0.0.3
- checksums.yaml +4 -4
- data/History.txt +10 -0
- data/README.md +17 -10
- data/Rakefile +5 -0
- data/daru.gemspec +2 -0
- data/lib/daru.rb +1 -1
- data/lib/daru/dataframe.rb +426 -146
- data/lib/daru/dataframe_by_row.rb +15 -0
- data/lib/daru/dataframe_by_vector.rb +15 -0
- data/lib/daru/index.rb +83 -0
- data/lib/daru/io.rb +30 -0
- data/lib/daru/monkeys.rb +18 -10
- data/lib/daru/vector.rb +178 -47
- data/lib/version.rb +1 -1
- data/spec/dataframe_spec.rb +550 -0
- data/spec/fixtures/countries.json +7794 -0
- data/spec/index_spec.rb +54 -0
- data/spec/io_spec.rb +49 -0
- data/spec/monkeys_spec.rb +6 -0
- data/spec/spec_helper.rb +10 -1
- data/spec/vector_spec.rb +155 -0
- metadata +47 -10
- data/spec/jruby/dataframe_spec.rb +0 -1
- data/spec/jruby/vector_spec.rb +0 -20
- data/spec/mri/dataframe_spec.rb +0 -139
- data/spec/mri/vector_spec.rb +0 -104
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7bee85826c8bd5bb962982278d93ee62b7874d93
+  data.tar.gz: 7e9e66b3282f44888c3d018bfcfe09e3d03ae065
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: de3897032876c4ced80ca9b8ac741e369b09478adbcc0bec1001e38b7c1474e62b2b0d9d2f5d29802830c2540c460b0688b66124a1aac570699424f95c6884c2
+  data.tar.gz: edeb5ae0b7d9523a1fc72ae845e89145b7a462b09c89989f01e614620c27860ea2bb14b020d59d3e9f21851085ff66d4104fe243d4daa57bc0544fcc46b023ba
data/History.txt
CHANGED
@@ -15,3 +15,13 @@
 * Vector objects passed into a DataFrame are now duplicated so that any changes don't affect the original vector.
 * Added an optional opts argument to DataFrame.
 * Sending more fields than vectors in DataFrame will cause addition of nil vectors.
+* Init a DataFrame without having to convert explicitly to vectors.
+
+== 0.0.2.4
+* Initialize dataframe from an array which looks like [{a: 10, b: 20}, {a: 11, b: 12}]. Works for parsed JSON.
+* Overriding vectors in DataFrame will still preserve order.
+* Any re-assignment of rows in #each_row and #each_row_with_index will reflect in the DataFrame.
+* Added #to_a and #to_json to DataFrame.
+
+== 0.0.3
+* This release is a complete rewrite of the entire gem to accommodate index values.
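The 0.0.2.4 entry above describes building a DataFrame from an array of row hashes, including parsed JSON. What follows is a minimal plain-Ruby sketch of that row-to-column conversion, not daru's own code. `rows_to_columns` is a hypothetical helper name, and the sketch assumes the first row carries every column name. Each key is looked up as a symbol first and as a string second, which is what lets string-keyed JSON rows work.

```ruby
require 'json'

# Hypothetical helper (not part of daru): turn an array of row hashes
# into a hash of column arrays keyed by symbol.
def rows_to_columns(rows)
  names = rows.first.keys.map(&:to_sym)

  names.each_with_object({}) do |name, columns|
    # Look the key up as a symbol first; fall back to the string form
    # so that rows produced by JSON.parse are handled as well.
    columns[name] = rows.map { |row| row.fetch(name) { row[name.to_s] } }
  end
end

symbol_rows = [{ a: 10, b: 20 }, { a: 11, b: 12 }]
json_rows   = JSON.parse('[{"a": 10, "b": 20}, {"a": 11, "b": 12}]')

# Both inputs produce the same two columns, :a and :b.
p rows_to_columns(symbol_rows)
p rows_to_columns(json_rows)
```

The sketch uses `fetch` with a block rather than `||` so that a stored `false` or `nil` under the symbol key is not mistaken for a missing key.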
data/README.md
CHANGED
@@ -23,6 +23,8 @@ daru employs several data structures for storing and manipulating data:
 
 daru data structures can be constructed by using several Ruby classes. These include `Array`, `Hash`, `Matrix`, [NMatrix](https://github.com/SciRuby/nmatrix) and [MDArray](https://github.com/rbotafogo/mdarray). daru brings a uniform API for handling and manipulating data represented in any of the above Ruby classes.
 
+Currently things work as expected for Arrays only. The rest will be added over the next few weeks.
+
 ## Testing
 
 Install jruby using `rvm install jruby`, then run `jruby -S gem install mdarray`, followed by `bundle install`. You will need to install `mdarray` manually because of strange gemspec file behaviour. If anyone can automate this then I'd greatly appreciate it! Then run `rspec` in JRuby to test for MDArray functionality.
@@ -33,9 +35,7 @@ Then switch to MRI, do a normal `bundle install` followed by `rspec` for testing
 
 * Automate testing for both MRI and JRuby.
 * Enable creation of DataFrame by only specifying an NMatrix/MDArray in initialize. Vector naming happens automatically (alphabetic) or is specified in an Array.
-*
-* Destructive version #filter\_rows!
-* NMatrix.first should return NMatrix (in vector).
+* Destructive map iterators for DataFrame and Vector.
 * Completely test all functionality for NMatrix and MDArray.
 * Basic Data manipulation and analysis operations:
     - Different kinds of join operations
@@ -43,12 +43,19 @@ Then switch to MRI, do a normal `bundle install` followed by `rspec` for testing
     - Creation of correlation, covariance matrices
     - Verification of data in a vector
     - Basic vector statistics - mean, median, variance, etc.
-* Add indexing on vectors.
-    - Creation of vector by supplying an index-value hash.
-    - Auto generation of real numbered indices for any vector.
-    - Ability to separately specify index for each element of a vector.
-    - Runtime alteration of index.
-* Indexing on DataFrame.
 * Vector arithmetic - elementwise addition, subtraction, multiplication, division.
 * Transpose a dataframe.
-* Option to express a DataFrame as an NMatrix or MDArray so as to use more efficient storage techniques.
+* Option to express a DataFrame as an NMatrix or MDArray so as to use more efficient storage techniques.
+* Assignment of a column to a single number should set the entire column to that number.
+* == between daru_vector and string/number.
+* Multiple column assignment with []=
+* Creation of DataFrame from Array of Arrays.
+* Multiple value assignment for vectors with []=.
+* Load DataFrame from multiple sources (excel, SQL, etc.).
+* Allow for boolean operations inside #[].
+* Deletion of elements from Vector should only modify the index and leave the vector as it is so that compacting is not needed and things are faster.
+* Add a #sync method which will sync the modified index with the unmodified vector.
+* Ability to reorder the index of a dataframe.
+
+Copyright (c) 2014, Sameer Deshmukh
+All rights reserved
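Two of the README roadmap items above propose that deleting from a Vector should touch only the index, with a separate #sync to compact the storage later. Since these are listed as future work, the following is only a hypothetical plain-Ruby sketch of how such a scheme could behave. `LazyVector` and all of its methods are invented for illustration and are not daru's API.

```ruby
# Hypothetical sketch, not daru's API: deletions touch only the
# label-to-position index, and #sync compacts the storage on demand.
class LazyVector
  attr_reader :data

  def initialize(values, labels)
    @data  = values.dup
    @index = labels.each_with_index.to_h # label => position in @data
  end

  def [](label)
    pos = @index.fetch(label) { raise IndexError, "Index #{label} does not exist." }
    @data[pos]
  end

  # Cheap delete: drop the label and leave the stored values untouched.
  def delete(label)
    @index.delete(label)
    self
  end

  def labels
    @index.keys
  end

  # Rebuild the storage so it holds only values still reachable by a label.
  def sync
    @data  = @index.values.map { |pos| @data[pos] }
    @index = @index.keys.each_with_index.to_h
    self
  end
end

v = LazyVector.new([10, 20, 30], [:a, :b, :c])
v.delete(:b)
puts v.data.size # storage still holds all three values
v.sync
puts v.data.size # storage now holds only the two remaining values
puts v[:c]
```

The trade-off in this sketch is that a delete stays cheap because no values move, at the cost of stale values remaining in storage until `sync` is called.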
data/Rakefile
ADDED
data/daru.gemspec
CHANGED
@@ -24,7 +24,9 @@ Gem::Specification.new do |spec|
   spec.require_paths = ["lib"]
 
   spec.add_development_dependency 'bundler'
+  spec.add_development_dependency 'rake'
   spec.add_development_dependency 'rspec'
+  spec.add_development_dependency 'awesome_print'
   if RUBY_ENGINE != 'jruby'
     spec.add_development_dependency 'nmatrix', '~> 0.1.0.rc5'
   end
data/lib/daru.rb
CHANGED
data/lib/daru/dataframe.rb
CHANGED
@@ -1,259 +1,539 @@
+require_relative 'dataframe_by_row.rb'
+require_relative 'dataframe_by_vector.rb'
+require_relative 'io.rb'
+
 module Daru
   class DataFrame
 
-
-
-
+    class << self
+      def from_csv path, opts={}, &block
+        Daru::IO.from_csv path, opts, &block
+      end
+    end
 
+    attr_reader :vectors
+    attr_reader :index
+    attr_reader :name
     attr_reader :size
 
-
+    # DataFrame basically consists of an Array of Vector objects.
+    # These objects are indexed by row and column by vectors and index Index objects.
+    # Arguments - source, vectors, index, name in that order. Last 3 are optional.
+    def initialize source, *args
+      vectors = args.shift
+      index   = args.shift
+      @name   = args.shift || SecureRandom.uuid
 
-
-      @opts = opts
-      set_default_opts
+      @data = []
 
       if source.empty?
-        @vectors =
+        @vectors = Daru::Index.new vectors
+        @index   = Daru::Index.new index
+
+        create_empty_vectors
       else
-
-
-
+        case source
+        when Array
+          if vectors.nil?
+            @vectors = Daru::Index.new source[0].keys.map(&:to_sym)
+          else
+            @vectors = Daru::Index.new (vectors + (source[0].keys - vectors)).uniq.map(&:to_sym)
+          end
+
+          if index.nil?
+            @index = Daru::Index.new source.size
+          else
+            @index = Daru::Index.new index
+          end
+
+          @vectors.each do |name|
+            v = []
+
+            source.each do |hsh|
+              v << (hsh[name] || hsh[name.to_s])
+            end
+
+            @data << v.dv(name, @index)
+          end
+        when Hash
+          create_vectors_index_with vectors, source
+
+          if all_daru_vectors_in_source? source
+
+            if !index.nil?
+              @index = index.to_index
+            elsif all_vectors_have_equal_indexes? source
+              @index = source.values[0].index.dup
+            else
+              all_indexes = []
+
+              source.each_value do |vector|
+                all_indexes << vector.index.to_a
+              end
+              # sort only if missing indexes detected
+              all_indexes = all_indexes.flatten.uniq.sort
+
+              @index = Daru::Index.new all_indexes
+            end
+
+            @vectors.each do |vector|
+              @data << Daru::Vector.new(vector, [], @index)
+
+              @index.each do |idx|
+                begin
+                  @data[@vectors[vector]][idx] = source[vector][idx]
+                rescue IndexError
+                  # If the index is not present in the vector under consideration
+                  # (in source) then an error is raised. Put a nil in that place if
+                  # that is the case.
+                  @data[@vectors[vector]][idx] = nil
+                end
+              end
+            end
+          else
+            index = source.values[0].size if index.nil?
+
+            if index.is_a?(Daru::Index)
+              @index = index.to_index
+            else
+              @index = Daru::Index.new index
+            end
+
+            @vectors.each do |name|
+              @data << source[name].dup.dv(name, @index)
+            end
+          end
+
         end
       end
 
-
-
-
-      check_length
-      set_missing_vectors if @vectors.keys.size < @fields.size
-      set_fields_order if @vectors.keys.sort != @fields.sort
-      set_vector_names
+      set_size
+      validate
     end
 
-    def
-
-
-
-
-
-
+    def [](*names, axis)
+      if axis == :vector
+        access_vector *names
+      elsif axis == :row
+        access_row *names
+      else
+        raise IndexError, "Expected axis to be row or vector not #{axis}"
+      end
+    end
 
-
+    def []=(name, axis ,vector)
+      if axis == :vector
+        insert_or_modify_vector name, vector
+      elsif axis == :row
+        insert_or_modify_row name, vector
+      else
+        raise IndexError, "Expected axis to be row or vector, not #{axis}."
+      end
+    end
 
-
-
+    def vector
+      Daru::DataFrameByVector.new(self)
+    end
 
-
-
-
-        first = false
-      end
+    def row
+      Daru::DataFrameByRow.new(self)
+    end
 
-
+    def dup
+      src = {}
+      @vectors.each do |vector|
+        src[vector] = @data[@vectors[vector]]
       end
 
-
+      Daru::DataFrame.new src, @vectors.dup, @index.dup, @name
     end
 
-    def
-      @
-    end
+    def each_vector(&block)
+      @data.each(&block)
 
-
-      @vectors.delete name
-      @fields.delete name
+      self
     end
 
-
+    def each_vector_with_index(&block)
+      @vectors.each do |vector|
+        yield @data[@vectors[vector]], vector
+      end
 
-
-
-      raise "Expected index less than size." if index > @size
+      self
+    end
 
-
-
+    def each_row(&block)
+      @index.each do |index|
+        yield access_row(index)
       end
-
+
+      self
     end
 
-    def
-
+    def each_row_with_index(&block)
+      @index.each do |index|
+        yield access_row(index), index
+      end
 
-      self
-
+      self
+    end
 
-
+    def map_vectors(&block)
+      df = self.dup
+
+      df.each_vector_with_index do |vector, name|
+        df[name, :vector] = yield(vector)
       end
 
       df
     end
 
-    def
-
-        return column(name[0])
-      end
-
-      h = {}
-      req_fields = @fields & name
+    def map_vectors_with_index(&block)
+      df = self.dup
 
-
-
+      df.each_vector_with_index do |vector, name|
+        df[name, :vector] = yield(vector, name)
       end
 
-
+      df
     end
 
-    def
-
-
-
+    def map_rows(&block)
+      df = self.dup
+
+      df.each_row_with_index do |row, index|
+        df[index, :row] = yield(row)
+      end
 
-
-      insert_vector name, vector
+      df
     end
 
-    def
-
+    def map_rows_with_index(&block)
+      df = self.dup
 
-      row
-
-        row[column.name] = column[index]
+      df.each_row_with_index do |row, index|
+        df[index, :row] = yield(row, index)
       end
 
-
+      df
     end
 
-    def
-
+    def delete_vector vector
+      if @vectors.include? vector
+        @data.delete_at @vectors[vector]
+        @vectors = Daru::Index.new @vectors.to_a - [vector]
+      else
+        raise IndexError, "Vector #{vector} does not exist."
+      end
     end
 
-    def
-
-        yield row(index)
-      end
+    def delete_row index
+      idx = named_index_for index
 
-
-
+      if @index.include? idx
+        @index = (@index.to_a - [idx]).to_index
 
-
-
-
+        self.each_vector do |vector|
+          vector.delete_at idx
+        end
+      else
+        raise IndexError, "Index #{index} does not exist."
       end
 
-
+      set_size
     end
 
-    def
-      @
-        yield
-      end
+    def keep_row_if &block
+      @index.each do |index|
+        keep_row = yield access_row(index)
 
-
+        delete_row index unless keep_row
+      end
     end
 
-    def
-      @
-        yield @vectors[
+    def keep_vector_if &block
+      @vectors.each do |vector|
+        keep_vector = yield @data[@vectors[vector]], vector
+
+        delete_vector vector unless keep_vector
       end
-
-      self
     end
 
-    def
-
-        size." if vector.size != self.size
-
-      @vectors.merge({name => vector})
-      @fields << name
+    def has_vector? name
+      !!@vectors[name]
     end
 
-
-
-
+    # Converts the DataFrame into an array of hashes where key is vector name
+    # and value is the corresponding element.
+    # The 0th index of the array contains the array of hashes while the 1th
+    # index contains the indexes of each row of the dataframe. Each element in
+    # the index array corresponds to its row in the array of hashes, which has
+    # the same index.
+    def to_a
+      arry = [[],[]]
 
-
-
+      self.each_row do |row|
+        arry[0] << row.to_hash
       end
 
-
+      arry[1] = @index.to_a
+
+      arry
     end
 
-    def
-
+    def to_json no_index=true
+      if no_index
+        self.to_a[0].to_json
+      else
+        self.to_a.to_json
+      end
+    end
+
+    def to_html threshold=15
+      html = '<table><tr><th></th>'
+
+      @vectors.each { |vector| html += '<th>' + vector.to_s + '</th>' }
 
-      html += '<tr>'
-      @fields.each { |f| html.concat('<td>' + f.to_s + '</td>') }
       html += '</tr>'
 
-
-        break if index > threshold and index <= @size
+      @index.each_with_index do |index, num|
         html += '<tr>'
-
+        html += '<td>' + index.to_s + '</td>'
+
+        self.row[index].each do |element|
+          html += '<td>' + element.to_s + '</td>'
+        end
         html += '</tr>'
-
+
+        if num > threshold
           html += '<tr>'
-
+          (@vectors + 1).size.times { html += '<td>...</td>' }
           html += '</tr>'
+          break
         end
       end
 
       html += '</table>'
+
+      html
     end
 
     def to_s
       to_html
     end
 
-    def
+    def inspect spacing=10, threshold=15
+      longest = [@vectors.map(&:to_s).map(&:size).max,
+                 @index  .map(&:to_s).map(&:size).max,
+                 @data   .map{ |v| v.map(&:to_s).map(&:size).max }.max].max
+
+      name      = @name || 'nil'
+      content   = ""
+      longest   = spacing if longest > spacing
+      formatter = "\n"
+
+      (@vectors.size + 1).times { formatter += "%#{longest}.#{longest}s " }
+
+      content += "\n#<" + self.class.to_s + ":" + self.object_id.to_s + " @name = " +
+        name.to_s + " @size = " + @size.to_s + ">"
+
+      content += sprintf formatter, "" , *@vectors.map(&:to_s)
+
+      row_num = 1
+
+      self.each_row_with_index do |row, index|
+        content += sprintf formatter, index.to_s, *row.to_hash.values.map { |e| (e || 'nil').to_s }
+
+        row_num += 1
+        if row_num > threshold
+          dots = []
+
+          (@vectors.size + 1).times { dots << "..." }
+          content += sprintf formatter, *dots
+          break
+        end
+      end
+
+      content += "\n"
+
+      content
+    end
+
+    def == other
+      @index == other.index and @size == other.size and @vectors.all? { |vector|
+        self[vector, :vector] == other[vector, :vector] }
+    end
+
+    def method_missing(name, *args, &block)
       if md = name.match(/(.+)\=/)
-
+        insert_or_modify_vector name[/(.+)\=/].delete("="), args[0]
       elsif self.has_vector? name
-
+        self[name, :vector]
       else
         super(name, *args)
       end
     end
 
    private
-
-
-
-
-
-
-
-
-
+
+    def access_vector *names
+      unless names[1]
+        if @vectors.include? names[0]
+          return @data[@vectors[names[0]]]
+        elsif @vectors.key names[0]
+          return @data[names[0]]
+        else
+          raise IndexError, "Specified index #{names[0]} does not exist."
+        end
+      end
+
+      new_vcs = {}
+
+      names.each do |name|
+        name = name.to_sym unless name.is_a?(Integer)
+
+        new_vcs[name] = @data[@vectors[name]]
+      end
+
+      Daru::DataFrame.new new_vcs, new_vcs.keys, @index, @name
+    end
+
+    def access_row *names
+      unless names[1]
+        row = []
+
+        @vectors.each do |vector|
+          row << @data[@vectors[vector]][names[0]]
+        end
+
+        if @index.include? names[0]
+          name = names[0]
+        elsif @index.key names[0]
+          name = @index.key names[0]
+        else
+          raise IndexError, "Specified row #{names[0]} does not exist."
+        end
+
+        Daru::Vector.new name, row, @vectors
+      else
+        # TODO: Access multiple rows
+      end
+    end
+
+    def insert_or_modify_vector name, vector
+      @vectors = @vectors.re_index(@vectors + name)
+
+      v = nil
+
+      if vector.is_a?(Daru::Vector)
+        v = Daru::Vector.new name, [], @index
+
+        @index.each do |idx|
+          begin
+            v[idx] = vector[idx]
+          rescue IndexError
+            v[idx] = nil
+          end
+        end
+      else
+        raise Exception, "Specified vector of length #{vector.size} cannot be inserted in DataFrame of size #{@size}" if
+          @size != vector.size
+
+        v = vector.dv(name, @index)
+      end
+
+      @data[@vectors[name]] = v
+    end
+
+    def insert_or_modify_row name, vector
+      if @index.include? name
+        v = vector.dv(name, @vectors)
+
+        @vectors.each do |vector|
+          begin
+            @data[@vectors[vector]][name] = v[vector]
+          rescue IndexError
+            @data[@vectors[vector]][name] = nil
+          end
+        end
+      else
+        @index = @index.re_index(@index + name)
+        v      = vector.dv(name, @vectors)
+
+        @vectors.each do |vector|
+          begin
+            @data[@vectors[vector]].concat v[vector], name
+          rescue IndexError
+            @data[@vectors[vector]].concat nil, name
+          end
         end
       end
 
-
+      set_size
     end
 
-    def
-      @
-
+    def create_empty_vectors
+      @vectors.each do |name|
+        @data << Daru::Vector.new(name, [], @index)
+      end
+    end
+
+    def validate_labels
+      raise IndexError, "Expected equal number of vectors for number of Hash pairs" if
+        @vectors.size != @data.size
+
+      raise IndexError, "Expected number of indexes same as number of rows" if
+        @index.size != @data[0].size
+    end
+
+    def validate_vector_sizes
+      @data.each do |vector|
+        raise IndexError, "Expected vectors with equal length" if vector.size != @size
+      end
+    end
+
+    def validate
+      # TODO: [IMP] when vectors of different dimensions are specified, they should
+      # be inserted into the dataframe by inserting nils wherever necessary.
+      validate_labels
+      validate_vector_sizes
     end
 
-
-
-
-      @fields.each do |name|
-        @vectors[name].name = name
+    def all_daru_vectors_in_source? source
+      source.values.all? do |vector|
+        vector.is_a?(Daru::Vector)
       end
     end
 
-    def
-
+    def set_size
+      @size = @index.size
+    end
+
+    def named_index_for index
+      if @index.include? index
+        index
+      elsif @index.key index
+        @index.key index
+      else
+        raise IndexError, "Specified index #{index} does not exist."
+      end
+    end
+
+    def create_vectors_index_with vectors, source
+      vectors = source.keys.sort if vectors.nil?
+
+      if vectors.is_a?(Daru::Index)
+        @vectors = vectors.to_index
+      else
+        @vectors = Daru::Index.new (vectors + (source.keys - vectors)).uniq.map(&:to_sym)
+      end
     end
 
-    def
-
+    def all_vectors_have_equal_indexes? source
+      index = source.values[0].index
 
-
-
-        @fields << field
+      source.all? do |name, vector|
+        index == vector.index
       end
     end
   end
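When the Hash branch of `initialize` receives Daru vectors whose indexes differ, it builds the DataFrame index as the sorted union of all vector indexes and stores nil wherever a vector lacks a label. The following is a minimal plain-Ruby sketch of that alignment step, not daru's own code. `align_columns` is a hypothetical helper, and plain hashes of label => value stand in for `Daru::Vector`.

```ruby
# Hypothetical helper (plain Ruby, not daru): align labelled columns on
# the sorted union of their labels, filling the gaps with nil.
def align_columns(columns)
  # Non-destructive flatten/uniq/sort: the bang variants can return nil
  # when they change nothing, which would break a method chain.
  index = columns.values.flat_map(&:keys).uniq.sort

  aligned = columns.transform_values do |column|
    index.map { |label| column[label] } # Hash#[] gives nil for a missing label
  end

  [index, aligned]
end

columns = {
  a: { one: 1, two: 2, three: 3 },
  b: { two: 20, four: 40 }
}

index, aligned = align_columns(columns)
p index   # the union of all labels, sorted
p aligned # each column re-expressed against that index
```

In the diff, a missing label surfaces as an `IndexError` that is rescued and replaced with nil. The sketch gets the same result from `Hash#[]`, which returns nil for an absent key.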