daru 0.0.2.3 → 0.0.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: e425d7fab01db79087549701e3e9e95df2b495d1
- data.tar.gz: f3b186c9405b1fb14fb7b95a2ecde7a291decbb4
+ metadata.gz: 7bee85826c8bd5bb962982278d93ee62b7874d93
+ data.tar.gz: 7e9e66b3282f44888c3d018bfcfe09e3d03ae065
  SHA512:
- metadata.gz: fd89ad49623169a8caf335746094a38e2613864ec05f059488773285deaff87eaa5a8a68b6f60a2a12e37f5ad29cc800e8cdb7b0fba12eeb3a21b9bb3dbcca69
- data.tar.gz: 0b1458e1df11590ca858e9350503eef9a29420661ea92d1c73907ee47c33879637260f489bd3451aa9d7774d69e20b639e17ecdfac339c18f710c6ba25955908
+ metadata.gz: de3897032876c4ced80ca9b8ac741e369b09478adbcc0bec1001e38b7c1474e62b2b0d9d2f5d29802830c2540c460b0688b66124a1aac570699424f95c6884c2
+ data.tar.gz: edeb5ae0b7d9523a1fc72ae845e89145b7a462b09c89989f01e614620c27860ea2bb14b020d59d3e9f21851085ff66d4104fe243d4daa57bc0544fcc46b023ba
@@ -15,3 +15,13 @@
  * Vector objects passed into a DataFrame are now duplicated so that any changes don't affect the original vector.
  * Added an optional opts argument to DataFrame.
  * Sending more fields than vectors in DataFrame will cause addition of nil vectors.
+ * Init a DataFrame without having to convert explicitly to vectors.
+
+ == 0.0.2.4
+ * Initialize dataframe from an array which looks like [{a: 10, b: 20}, {a: 11, b: 12}]. Works for parsed JSON.
+ * Overriding vectors in DataFrame will still preserve order.
+ * Any re-assignment of rows in #each_row and #each_row_with_index will reflect in the DataFrame.
+ * Added #to_a and #to_json to DataFrame.
+
+ == 0.0.3
+ * This release is a complete rewrite of the entire gem to accommodate index values.
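The 0.0.2.4 entry about building a DataFrame from an array of hashes can be illustrated without the gem. The following is a minimal sketch in plain Ruby, not daru's implementation: `rows_to_columns` is a hypothetical helper that pivots row hashes into named columns and accepts either symbol keys or the string keys produced by `JSON.parse`.

```ruby
require 'json'

# Hypothetical helper (not part of daru): pivot an array of row hashes into
# a hash of column arrays. Keys may be symbols or strings, as with parsed JSON.
def rows_to_columns(rows)
  # Column names are taken from the first row and normalised to symbols.
  names = rows.first.keys.map(&:to_sym)

  names.each_with_object({}) do |name, columns|
    columns[name] = rows.map do |row|
      # Try the symbol key first, then fall back to the string key.
      row.fetch(name) { row[name.to_s] }
    end
  end
end

symbol_rows = [{a: 10, b: 20}, {a: 11, b: 12}]
json_rows   = JSON.parse('[{"a": 10, "b": 20}, {"a": 11, "b": 12}]')

p rows_to_columns(symbol_rows)
p rows_to_columns(json_rows)
```

Both calls produce the same two columns. The sketch uses `fetch` with a block rather than `||` for the string-key fallback, so a stored `false` under a symbol key is kept instead of being treated as missing.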
data/README.md CHANGED
@@ -23,6 +23,8 @@ daru employs several data structures for storing and manipulating data:
 
  daru data structures can be constructed by using several Ruby classes. These include `Array`, `Hash`, `Matrix`, [NMatrix](https://github.com/SciRuby/nmatrix) and [MDArray](https://github.com/rbotafogo/mdarray). daru brings a uniform API for handling and manipulating data represented in any of the above Ruby classes.
 
+ Currently things work as expected for Arrays only. The rest will be added over the next few weeks.
+
  ## Testing
 
  Install jruby using `rvm install jruby`, then run `jruby -S gem install mdarray`, followed by `bundle install`. You will need to install `mdarray` manually because of strange gemspec file behaviour. If anyone can automate this then I'd greatly appreciate it! Then run `rspec` in JRuby to test for MDArray functionality.
@@ -33,9 +35,7 @@ Then switch to MRI, do a normal `bundle install` followed by `rspec` for testing
 
  * Automate testing for both MRI and JRuby.
  * Enable creation of DataFrame by only specifying an NMatrix/MDArray in initialize. Vector naming happens automatically (alphabetic) or is specified in an Array.
- * Add support for missing values in vectors.
- * Destructive version #filter\_rows!
- * NMatrix.first should return NMatrix (in vector).
+ * Destructive map iterators for DataFrame and Vector.
  * Completely test all functionality for NMatrix and MDArray.
  * Basic Data manipulation and analysis operations:
  - Different kinds of join operations
@@ -43,12 +43,19 @@ Then switch to MRI, do a normal `bundle install` followed by `rspec` for testing
  - Creation of correlation, covariance matrices
  - Verification of data in a vector
  - Basic vector statistics - mean, median, variance, etc.
- * Add indexing on vectors.
- - Creation of vector by supplying an index-value hash.
- - Auto generation of real numbered indices for any vector.
- - Ability to separately specify index for each element of a vector.
- - Runtime alteration of index.
- * Indexing on DataFrame.
  * Vector arithmetic - elementwise addition, subtraction, multiplication, division.
  * Transpose a dataframe.
- * Option to express a DataFrame as an NMatrix or MDArray so as to use more efficient storage techniques.
+ * Option to express a DataFrame as an NMatrix or MDArray so as to use more efficient storage techniques.
+ * Assignment of a column to a single number should set the entire column to that number.
+ * == between daru_vector and string/number.
+ * Multiple column assignment with []=
+ * Creation of DataFrame from Array of Arrays.
+ * Multiple value assignment for vectors with []=.
+ * Load DataFrame from multiple sources (excel, SQL, etc.).
+ * Allow for boolean operations inside #[].
+ * Deletion of elements from Vector should only modify the index and leave the vector as it is so that compacting is not needed and things are faster.
+ * Add a #sync method which will sync the modified index with the unmodified vector.
+ * Ability to reorder the index of a dataframe.
+
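The two roadmap items about index-only deletion and a `#sync` method describe a design that is not implemented in this release. One possible reading of it is sketched below in plain Ruby; `LazyVector` and all of its methods are hypothetical names chosen for illustration and are not daru's API.

```ruby
# Hypothetical sketch, not daru's Vector: deletion only edits the index
# (label => position); the backing array is compacted later by #sync.
class LazyVector
  def initialize(values, labels = (0...values.size).to_a)
    @values = values.dup
    @index  = labels.each_with_index.to_h # label => position in @values
  end

  def [](label)
    pos = @index.fetch(label) { raise IndexError, "Index #{label} does not exist." }
    @values[pos]
  end

  # Cheap delete: forget the label and leave @values untouched.
  def delete(label)
    raise IndexError, "Index #{label} does not exist." unless @index.key?(label)
    @index.delete(label)
    self
  end

  # Rebuild @values so it holds only the live elements, in index order.
  def sync
    @values = @index.values.map { |pos| @values[pos] }
    @index  = @index.keys.each_with_index.to_h
    self
  end

  # Live elements only, in index order.
  def to_a
    @index.values.map { |pos| @values[pos] }
  end

  # Size of the backing array, including elements that were deleted lazily.
  def storage_size
    @values.size
  end
end

v = LazyVector.new([10, 20, 30], [:a, :b, :c])
v.delete(:b)
p v.to_a         # [10, 30]
p v.storage_size # 3, the backing array has not been compacted yet
v.sync
p v.storage_size # 2
```

The trade-off in this sketch is that reads go through one extra hash lookup, while deletes avoid shifting the backing array until `sync` is called.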
+ Copyright (c) 2014, Sameer Deshmukh
+ All rights reserved
@@ -0,0 +1,5 @@
+ require 'rspec/core/rake_task'
+
+ RSpec::Core::RakeTask.new(:spec)
+
+ task :default => :spec
@@ -24,7 +24,9 @@ Gem::Specification.new do |spec|
  spec.require_paths = ["lib"]
 
  spec.add_development_dependency 'bundler'
+ spec.add_development_dependency 'rake'
  spec.add_development_dependency 'rspec'
+ spec.add_development_dependency 'awesome_print'
  if RUBY_ENGINE != 'jruby'
  spec.add_development_dependency 'nmatrix', '~> 0.1.0.rc5'
  end
@@ -1,7 +1,7 @@
  require 'securerandom'
- require 'matrix'
  require 'csv'
 
+ require 'daru/index.rb'
  require 'daru/vector.rb'
  require 'daru/dataframe.rb'
  require 'daru/monkeys.rb'
@@ -1,259 +1,539 @@
1
+ require_relative 'dataframe_by_row.rb'
2
+ require_relative 'dataframe_by_vector.rb'
3
+ require_relative 'io.rb'
4
+
1
5
  module Daru
2
6
  class DataFrame
3
7
 
4
- attr_reader :vectors
5
-
6
- attr_reader :fields
8
+ class << self
9
+ def from_csv path, opts={}, &block
10
+ Daru::IO.from_csv path, opts, &block
11
+ end
12
+ end
7
13
 
14
+ attr_reader :vectors
15
+ attr_reader :index
16
+ attr_reader :name
8
17
  attr_reader :size
9
18
 
10
- attr_reader :name
19
+ # DataFrame basically consists of an Array of Vector objects.
20
+ # Columns are labelled by the `vectors` Index object and rows by the `index` Index object.
21
+ # Arguments - source, vectors, index, name in that order. Last 3 are optional.
22
+ def initialize source, *args
23
+ vectors = args.shift
24
+ index = args.shift
25
+ @name = args.shift || SecureRandom.uuid
11
26
 
12
- def initialize source, fields=[], name=SecureRandom.uuid, opts={}
13
- @opts = opts
14
- set_default_opts
27
+ @data = []
15
28
 
16
29
  if source.empty?
17
- @vectors = fields.inject({}){ |a,x| a[x]=Daru::Vector.new; a}
30
+ @vectors = Daru::Index.new vectors
31
+ @index = Daru::Index.new index
32
+
33
+ create_empty_vectors
18
34
  else
19
- @vectors = source.inject({}) do |acc, h|
20
- acc[h[0]] = h[1].dv.dup
21
- acc
35
+ case source
36
+ when Array
37
+ if vectors.nil?
38
+ @vectors = Daru::Index.new source[0].keys.map(&:to_sym)
39
+ else
40
+ @vectors = Daru::Index.new (vectors + (source[0].keys - vectors)).uniq.map(&:to_sym)
41
+ end
42
+
43
+ if index.nil?
44
+ @index = Daru::Index.new source.size
45
+ else
46
+ @index = Daru::Index.new index
47
+ end
48
+
49
+ @vectors.each do |name|
50
+ v = []
51
+
52
+ source.each do |hsh|
53
+ v << (hsh[name] || hsh[name.to_s])
54
+ end
55
+
56
+ @data << v.dv(name, @index)
57
+ end
58
+ when Hash
59
+ create_vectors_index_with vectors, source
60
+
61
+ if all_daru_vectors_in_source? source
62
+
63
+ if !index.nil?
64
+ @index = index.to_index
65
+ elsif all_vectors_have_equal_indexes? source
66
+ @index = source.values[0].index.dup
67
+ else
68
+ all_indexes = []
69
+
70
+ source.each_value do |vector|
71
+ all_indexes << vector.index.to_a
72
+ end
73
+ # sort only if missing indexes detected
74
+ all_indexes = all_indexes.flatten.uniq.sort
75
+
76
+ @index = Daru::Index.new all_indexes
77
+ end
78
+
79
+ @vectors.each do |vector|
80
+ @data << Daru::Vector.new(vector, [], @index)
81
+
82
+ @index.each do |idx|
83
+ begin
84
+ @data[@vectors[vector]][idx] = source[vector][idx]
85
+ rescue IndexError
86
+ # If the index is not present in the vector under consideration
87
+ # (in source) then an error is raised. Put a nil in that place if
88
+ # that is the case.
89
+ @data[@vectors[vector]][idx] = nil
90
+ end
91
+ end
92
+ end
93
+ else
94
+ index = source.values[0].size if index.nil?
95
+
96
+ if index.is_a?(Daru::Index)
97
+ @index = index.to_index
98
+ else
99
+ @index = Daru::Index.new index
100
+ end
101
+
102
+ @vectors.each do |name|
103
+ @data << source[name].dup.dv(name, @index)
104
+ end
105
+ end
106
+
22
107
  end
23
108
  end
24
109
 
25
- @fields = fields.empty? ? source.keys.sort : fields
26
- @name = name
27
-
28
- check_length
29
- set_missing_vectors if @vectors.keys.size < @fields.size
30
- set_fields_order if @vectors.keys.sort != @fields.sort
31
- set_vector_names
110
+ set_size
111
+ validate
32
112
  end
33
113
 
34
- def self.from_csv file, opts={}
35
- opts[:col_sep] ||= ','
36
- opts[:headers] ||= true
37
- opts[:converters] ||= :numeric
38
- opts[:header_converters] ||= :symbol
39
-
40
- csv = CSV.open file, 'r', opts
114
+ def [](*names, axis)
115
+ if axis == :vector
116
+ access_vector *names
117
+ elsif axis == :row
118
+ access_row *names
119
+ else
120
+ raise IndexError, "Expected axis to be row or vector, not #{axis}."
121
+ end
122
+ end
41
123
 
42
- yield csv if block_given?
124
+ def []=(name, axis, vector)
125
+ if axis == :vector
126
+ insert_or_modify_vector name, vector
127
+ elsif axis == :row
128
+ insert_or_modify_row name, vector
129
+ else
130
+ raise IndexError, "Expected axis to be row or vector, not #{axis}."
131
+ end
132
+ end
43
133
 
44
- first = true
45
- df = nil
134
+ def vector
135
+ Daru::DataFrameByVector.new(self)
136
+ end
46
137
 
47
- csv.each do |row|
48
- if first
49
- df = Daru::DataFrame.new({}, csv.headers)
50
- first = false
51
- end
138
+ def row
139
+ Daru::DataFrameByRow.new(self)
140
+ end
52
141
 
53
- df.insert_row row
142
+ def dup
143
+ src = {}
144
+ @vectors.each do |vector|
145
+ src[vector] = @data[@vectors[vector]]
54
146
  end
55
147
 
56
- df
148
+ Daru::DataFrame.new src, @vectors.dup, @index.dup, @name
57
149
  end
58
150
 
59
- def column name
60
- @vectors[name]
61
- end
151
+ def each_vector(&block)
152
+ @data.each(&block)
62
153
 
63
- def delete_vector name
64
- @vectors.delete name
65
- @fields.delete name
154
+ self
66
155
  end
67
156
 
68
- alias_method :delete, :delete_vector
157
+ def each_vector_with_index(&block)
158
+ @vectors.each do |vector|
159
+ yield @data[@vectors[vector]], vector
160
+ end
69
161
 
70
- def delete_row index
71
- # TODO: Make this work with NMatrix and MDArray
72
- raise "Expected index less than size." if index > @size
162
+ self
163
+ end
73
164
 
74
- @fields.each do |field|
75
- @vectors[field].delete index
165
+ def each_row(&block)
166
+ @index.each do |index|
167
+ yield access_row(index)
76
168
  end
77
- puts @vectors
169
+
170
+ self
78
171
  end
79
172
 
80
- def filter_rows name=self.name, &block
81
- df = DataFrame.new({}, @fields, name)
173
+ def each_row_with_index(&block)
174
+ @index.each do |index|
175
+ yield access_row(index), index
176
+ end
82
177
 
83
- self.each_row do |row|
84
- keep_row = yield row
178
+ self
179
+ end
85
180
 
86
- df.insert_row(row.values) if keep_row
181
+ def map_vectors(&block)
182
+ df = self.dup
183
+
184
+ df.each_vector_with_index do |vector, name|
185
+ df[name, :vector] = yield(vector)
87
186
  end
88
187
 
89
188
  df
90
189
  end
91
190
 
92
- def [] *name
93
- unless name[1]
94
- return column(name[0])
95
- end
96
-
97
- h = {}
98
- req_fields = @fields & name
191
+ def map_vectors_with_index(&block)
192
+ df = self.dup
99
193
 
100
- req_fields.each do |f|
101
- h[f] = @vectors[f]
194
+ df.each_vector_with_index do |vector, name|
195
+ df[name, :vector] = yield(vector, name)
102
196
  end
103
197
 
104
- DataFrame.new h, req_fields, @name
198
+ df
105
199
  end
106
200
 
107
- def == other
108
- @name == other.name and @vectors == other.vectors and
109
- @size == other.size and @fields == other.fields
110
- end
201
+ def map_rows(&block)
202
+ df = self.dup
203
+
204
+ df.each_row_with_index do |row, index|
205
+ df[index, :row] = yield(row)
206
+ end
111
207
 
112
- def []= name, vector
113
- insert_vector name, vector
208
+ df
114
209
  end
115
210
 
116
- def row index
117
- raise Exception, "Expected index to be within bounds" if index > @size
211
+ def map_rows_with_index(&block)
212
+ df = self.dup
118
213
 
119
- row = {}
120
- self.each_vector do |column|
121
- row[column.name] = column[index]
214
+ df.each_row_with_index do |row, index|
215
+ df[index, :row] = yield(row, index)
122
216
  end
123
217
 
124
- row
218
+ df
125
219
  end
126
220
 
127
- def has_vector? vector
128
- !!@vectors[vector]
221
+ def delete_vector vector
222
+ if @vectors.include? vector
223
+ @data.delete_at @vectors[vector]
224
+ @vectors = Daru::Index.new @vectors.to_a - [vector]
225
+ else
226
+ raise IndexError, "Vector #{vector} does not exist."
227
+ end
129
228
  end
130
229
 
131
- def each_row(&block)
132
- 0.upto(@size-1) do |index|
133
- yield row(index)
134
- end
230
+ def delete_row index
231
+ idx = named_index_for index
135
232
 
136
- self
137
- end
233
+ if @index.include? idx
234
+ @index = (@index.to_a - [idx]).to_index
138
235
 
139
- def each_row_with_index(&block)
140
- 0.upto(@size-1) do |index|
141
- yield row(index), index
236
+ self.each_vector do |vector|
237
+ vector.delete_at idx
238
+ end
239
+ else
240
+ raise IndexError, "Index #{index} does not exist."
142
241
  end
143
242
 
144
- self
243
+ set_size
145
244
  end
146
245
 
147
- def each_vector(&block)
148
- @fields.each do |field|
149
- yield @vectors[field]
150
- end
246
+ def keep_row_if &block
247
+ @index.each do |index|
248
+ keep_row = yield access_row(index)
151
249
 
152
- self
250
+ delete_row index unless keep_row
251
+ end
153
252
  end
154
253
 
155
- def each_vector_with_name(&block)
156
- @fields.each do |field|
157
- yield @vectors[field], field
254
+ def keep_vector_if &block
255
+ @vectors.each do |vector|
256
+ keep_vector = yield @data[@vectors[vector]], vector
257
+
258
+ delete_vector vector unless keep_vector
158
259
  end
159
-
160
- self
161
260
  end
162
261
 
163
- def insert_vector name, vector
164
- raise Exeception, "Expected vector size to be same as DataFrame\
165
- size." if vector.size != self.size
166
-
167
- @vectors.merge({name => vector})
168
- @fields << name
262
+ def has_vector? name
263
+ !!@vectors[name]
169
264
  end
170
265
 
171
- def insert_row row
172
- raise Exception, "Expected new row to same as the number of rows \
173
- in the DataFrame" if row.size != @fields.size
266
+ # Converts the DataFrame into an array of hashes where key is vector name
267
+ # and value is the corresponding element.
268
+ # Element 0 of the returned array holds the array of hashes, while element 1
269
+ # holds the index labels of the rows of the dataframe. Each element in
270
+ # the index array corresponds to its row in the array of hashes, which has
271
+ # the same index.
272
+ def to_a
273
+ arry = [[],[]]
174
274
 
175
- @fields.each_with_index do |field, index|
176
- @vectors[field] << row[index]
275
+ self.each_row do |row|
276
+ arry[0] << row.to_hash
177
277
  end
178
278
 
179
- @size += 1
279
+ arry[1] = @index.to_a
280
+
281
+ arry
180
282
  end
181
283
 
182
- def to_html(threshold=15)
183
- html = '<table>'
284
+ def to_json no_index=true
285
+ if no_index
286
+ self.to_a[0].to_json
287
+ else
288
+ self.to_a.to_json
289
+ end
290
+ end
291
+
292
+ def to_html threshold=15
293
+ html = '<table><tr><th></th>'
294
+
295
+ @vectors.each { |vector| html += '<th>' + vector.to_s + '</th>' }
184
296
 
185
- html += '<tr>'
186
- @fields.each { |f| html.concat('<td>' + f.to_s + '</td>') }
187
297
  html += '</tr>'
188
298
 
189
- self.each_row_with_index do |row, index|
190
- break if index > threshold and index <= @size
299
+ @index.each_with_index do |index, num|
191
300
  html += '<tr>'
192
- row.each_value { |val| html.concat('<td>' + val.to_s + '</td>') }
301
+ html += '<td>' + index.to_s + '</td>'
302
+
303
+ self.row[index].each do |element|
304
+ html += '<td>' + element.to_s + '</td>'
305
+ end
193
306
  html += '</tr>'
194
- if index == threshold
307
+
308
+ if num > threshold
195
309
  html += '<tr>'
196
- row.size.times { html.concat('<td>...</td>') }
310
+ (@vectors.size + 1).times { html += '<td>...</td>' }
197
311
  html += '</tr>'
312
+ break
198
313
  end
199
314
  end
200
315
 
201
316
  html += '</table>'
317
+
318
+ html
202
319
  end
203
320
 
204
321
  def to_s
205
322
  to_html
206
323
  end
207
324
 
208
- def method_missing(name, *args)
325
+ def inspect spacing=10, threshold=15
326
+ longest = [@vectors.map(&:to_s).map(&:size).max,
327
+ @index .map(&:to_s).map(&:size).max,
328
+ @data .map{ |v| v.map(&:to_s).map(&:size).max }.max].max
329
+
330
+ name = @name || 'nil'
331
+ content = ""
332
+ longest = spacing if longest > spacing
333
+ formatter = "\n"
334
+
335
+ (@vectors.size + 1).times { formatter += "%#{longest}.#{longest}s " }
336
+
337
+ content += "\n#<" + self.class.to_s + ":" + self.object_id.to_s + " @name = " +
338
+ name.to_s + " @size = " + @size.to_s + ">"
339
+
340
+ content += sprintf formatter, "" , *@vectors.map(&:to_s)
341
+
342
+ row_num = 1
343
+
344
+ self.each_row_with_index do |row, index|
345
+ content += sprintf formatter, index.to_s, *row.to_hash.values.map { |e| (e || 'nil').to_s }
346
+
347
+ row_num += 1
348
+ if row_num > threshold
349
+ dots = []
350
+
351
+ (@vectors.size + 1).times { dots << "..." }
352
+ content += sprintf formatter, *dots
353
+ break
354
+ end
355
+ end
356
+
357
+ content += "\n"
358
+
359
+ content
360
+ end
361
+
362
+ def == other
363
+ @index == other.index and @size == other.size and @vectors.all? { |vector|
364
+ self[vector, :vector] == other[vector, :vector] }
365
+ end
366
+
367
+ def method_missing(name, *args, &block)
209
368
  if md = name.match(/(.+)\=/)
210
- insert_vector name[/(.+)\=/].delete("="), args[0]
369
+ insert_or_modify_vector name[/(.+)\=/].delete("="), args[0]
211
370
  elsif self.has_vector? name
212
- column name
371
+ self[name, :vector]
213
372
  else
214
373
  super(name, *args)
215
374
  end
216
375
  end
217
376
 
218
377
  private
219
- def check_length
220
- size = nil
221
-
222
- @vectors.each_value do |vector|
223
- if size.nil?
224
- size = vector.size
225
- elsif size != vector.size
226
- raise Exception, "Expected all vectors to be of the same size. Vector \
227
- #{vector.name} is of size #{vector.size} and another one of size #{size}"
378
+
379
+ def access_vector *names
380
+ unless names[1]
381
+ if @vectors.include? names[0]
382
+ return @data[@vectors[names[0]]]
383
+ elsif @vectors.key names[0]
384
+ return @data[names[0]]
385
+ else
386
+ raise IndexError, "Specified index #{names[0]} does not exist."
387
+ end
388
+ end
389
+
390
+ new_vcs = {}
391
+
392
+ names.each do |name|
393
+ name = name.to_sym unless name.is_a?(Integer)
394
+
395
+ new_vcs[name] = @data[@vectors[name]]
396
+ end
397
+
398
+ Daru::DataFrame.new new_vcs, new_vcs.keys, @index, @name
399
+ end
400
+
401
+ def access_row *names
402
+ unless names[1]
403
+ row = []
404
+
405
+ @vectors.each do |vector|
406
+ row << @data[@vectors[vector]][names[0]]
407
+ end
408
+
409
+ if @index.include? names[0]
410
+ name = names[0]
411
+ elsif @index.key names[0]
412
+ name = @index.key names[0]
413
+ else
414
+ raise IndexError, "Specified row #{names[0]} does not exist."
415
+ end
416
+
417
+ Daru::Vector.new name, row, @vectors
418
+ else
419
+ # TODO: Access multiple rows
420
+ end
421
+ end
422
+
423
+ def insert_or_modify_vector name, vector
424
+ @vectors = @vectors.re_index(@vectors + name)
425
+
426
+ v = nil
427
+
428
+ if vector.is_a?(Daru::Vector)
429
+ v = Daru::Vector.new name, [], @index
430
+
431
+ @index.each do |idx|
432
+ begin
433
+ v[idx] = vector[idx]
434
+ rescue IndexError
435
+ v[idx] = nil
436
+ end
437
+ end
438
+ else
439
+ raise Exception, "Specified vector of length #{vector.size} cannot be inserted in DataFrame of size #{@size}" if
440
+ @size != vector.size
441
+
442
+ v = vector.dv(name, @index)
443
+ end
444
+
445
+ @data[@vectors[name]] = v
446
+ end
447
+
448
+ def insert_or_modify_row name, vector
449
+ if @index.include? name
450
+ v = vector.dv(name, @vectors)
451
+
452
+ @vectors.each do |vector|
453
+ begin
454
+ @data[@vectors[vector]][name] = v[vector]
455
+ rescue IndexError
456
+ @data[@vectors[vector]][name] = nil
457
+ end
458
+ end
459
+ else
460
+ @index = @index.re_index(@index + name)
461
+ v = vector.dv(name, @vectors)
462
+
463
+ @vectors.each do |vector|
464
+ begin
465
+ @data[@vectors[vector]].concat v[vector], name
466
+ rescue IndexError
467
+ @data[@vectors[vector]].concat nil, name
468
+ end
228
469
  end
229
470
  end
230
471
 
231
- @size = size
472
+ set_size
232
473
  end
233
474
 
234
- def set_fields_order # vectors more than specified fields
235
- @fields = @fields & @vectors.keys
236
- @fields += @vectors.keys.sort - @fields
475
+ def create_empty_vectors
476
+ @vectors.each do |name|
477
+ @data << Daru::Vector.new(name, [], @index)
478
+ end
479
+ end
480
+
481
+ def validate_labels
482
+ raise IndexError, "Expected equal number of vectors for number of Hash pairs" if
483
+ @vectors.size != @data.size
484
+
485
+ raise IndexError, "Expected number of indexes same as number of rows" if
486
+ @index.size != @data[0].size
487
+ end
488
+
489
+ def validate_vector_sizes
490
+ @data.each do |vector|
491
+ raise IndexError, "Expected vectors with equal length" if vector.size != @size
492
+ end
493
+ end
494
+
495
+ def validate
496
+ # TODO: [IMP] when vectors of different dimensions are specified, they should
497
+ # be inserted into the dataframe by inserting nils wherever necessary.
498
+ validate_labels
499
+ validate_vector_sizes
237
500
  end
238
501
 
239
- # Writes names specified in the hash to the actual name of the vector.
240
- # Will over-ride any previous name assigned to the vector.
241
- def set_vector_names
242
- @fields.each do |name|
243
- @vectors[name].name = name
502
+ def all_daru_vectors_in_source? source
503
+ source.values.all? do |vector|
504
+ vector.is_a?(Daru::Vector)
244
505
  end
245
506
  end
246
507
 
247
- def set_default_opts
248
- # Future proofing
508
+ def set_size
509
+ @size = @index.size
510
+ end
511
+
512
+ def named_index_for index
513
+ if @index.include? index
514
+ index
515
+ elsif @index.key index
516
+ @index.key index
517
+ else
518
+ raise IndexError, "Specified index #{index} does not exist."
519
+ end
520
+ end
521
+
522
+ def create_vectors_index_with vectors, source
523
+ vectors = source.keys.sort if vectors.nil?
524
+
525
+ if vectors.is_a?(Daru::Index)
526
+ @vectors = vectors.to_index
527
+ else
528
+ @vectors = Daru::Index.new (vectors + (source.keys - vectors)).uniq.map(&:to_sym)
529
+ end
249
530
  end
250
531
 
251
- def set_missing_vectors
252
- missing_fields = @fields - @vectors.keys
532
+ def all_vectors_have_equal_indexes? source
533
+ index = source.values[0].index
253
534
 
254
- missing_fields.each do |field|
255
- @vectors[field] = ([nil]*@size).dv
256
- @fields << field
535
+ source.all? do |name, vector|
536
+ index == vector.index
257
537
  end
258
538
  end
259
539
  end
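When the source Hash holds vectors whose indexes differ, the constructor in this diff builds the union of all index labels and fills positions a vector lacks with nil. The sketch below shows that alignment step in plain Ruby under simplifying assumptions: each column is an ordinary Hash of label to value instead of a `Daru::Vector`, missing labels are found by Hash lookup rather than by rescuing `IndexError`, and `align_columns` is a hypothetical name.

```ruby
# Hypothetical stand-in for the Hash-of-vectors branch: each column is a plain
# Hash of index label => value rather than a Daru::Vector.
def align_columns(columns)
  # Non-bang flat_map/uniq/sort always return arrays. The bang variants
  # return nil when they change nothing, so chaining them can raise
  # NoMethodError on nil.
  index = columns.values.flat_map(&:keys).uniq.sort

  aligned = columns.each_with_object({}) do |(name, column), out|
    # Hash#[] yields nil for a label this column does not have.
    out[name] = index.map { |label| column[label] }
  end

  [index, aligned]
end

index, aligned = align_columns({
  a: { 0 => 1, 1 => 2 },
  b: { 1 => 20, 2 => 30 }
})

p index
p aligned
```

Here the union index is `[0, 1, 2]`, column `a` gains a trailing nil and column `b` a leading nil. Note that `uniq!` returns nil when an array has no duplicates, which is why the sketch avoids chaining bang methods.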