daru 0.0.2.3 → 0.0.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: e425d7fab01db79087549701e3e9e95df2b495d1
- data.tar.gz: f3b186c9405b1fb14fb7b95a2ecde7a291decbb4
+ metadata.gz: 7bee85826c8bd5bb962982278d93ee62b7874d93
+ data.tar.gz: 7e9e66b3282f44888c3d018bfcfe09e3d03ae065
  SHA512:
- metadata.gz: fd89ad49623169a8caf335746094a38e2613864ec05f059488773285deaff87eaa5a8a68b6f60a2a12e37f5ad29cc800e8cdb7b0fba12eeb3a21b9bb3dbcca69
- data.tar.gz: 0b1458e1df11590ca858e9350503eef9a29420661ea92d1c73907ee47c33879637260f489bd3451aa9d7774d69e20b639e17ecdfac339c18f710c6ba25955908
+ metadata.gz: de3897032876c4ced80ca9b8ac741e369b09478adbcc0bec1001e38b7c1474e62b2b0d9d2f5d29802830c2540c460b0688b66124a1aac570699424f95c6884c2
+ data.tar.gz: edeb5ae0b7d9523a1fc72ae845e89145b7a462b09c89989f01e614620c27860ea2bb14b020d59d3e9f21851085ff66d4104fe243d4daa57bc0544fcc46b023ba
@@ -15,3 +15,13 @@
  * Vector objects passed into a DataFrame are now duplicated so that any changes don't affect the original vector.
  * Added an optional opts argument to DataFrame.
  * Sending more fields than vectors in DataFrame will cause addition of nil vectors.
+ * Init a DataFrame without having to convert explicitly to vectors.
+
+ == 0.0.2.4
+ * Initialize dataframe from an array which looks like [{a: 10, b: 20}, {a: 11, b: 12}]. Works for parsed JSON.
+ * Overriding vectors in DataFrame will still preserve order.
+ * Any re-assignment of rows in #each_row and #each_row_with_index will reflect in the DataFrame.
+ * Added #to_a and #to_json to DataFrame.
+
+ == 0.0.3
+ * This release is a complete rewrite of the entire gem to accommodate index values.
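The 0.0.2.4 entries describe building a DataFrame from an array of hashes (such as parsed JSON) and exporting it again with #to_a and #to_json. The plain-Ruby sketch below shows only the row-to-column reshaping involved and does not depend on daru. `rows_to_columns` and `columns_to_rows` are hypothetical helpers; taking column names from the first hash's keys and accepting either symbol or string keys is an assumption modelled on the Array branch of the new `DataFrame#initialize`, and the `[row hashes, index labels]` return shape follows the comment on the new `#to_a`. daru itself additionally wraps each column in a `Daru::Vector` with a `Daru::Index`.

```ruby
require 'json'

# Hypothetical helper, not daru's API: reshape row-oriented hashes into
# column arrays keyed by symbol. Column names come from the first row's keys;
# each row may use symbol or string keys, and a missing key yields nil.
def rows_to_columns(rows)
  names = rows.first.keys.map(&:to_sym)
  names.each_with_object({}) do |name, cols|
    cols[name] = rows.map { |row| row.key?(name) ? row[name] : row[name.to_s] }
  end
end

# Hypothetical inverse: returns [array of row hashes, array of index labels].
# When no index is given, rows are labelled 0, 1, 2, ...
def columns_to_rows(cols, index = nil)
  size = cols.empty? ? 0 : cols.values.first.size
  index ||= (0...size).to_a
  rows = (0...size).map do |i|
    cols.each_with_object({}) { |(name, values), row| row[name] = values[i] }
  end
  [rows, index]
end

cols = rows_to_columns(JSON.parse('[{"a": 10, "b": 20}, {"a": 11, "b": 12}]'))
rows, index = columns_to_rows(cols)
p index          # => [0, 1]
puts rows.to_json # prints [{"a":10,"b":20},{"a":11,"b":12}]
```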
data/README.md CHANGED
@@ -23,6 +23,8 @@ daru employs several data structures for storing and manipulating data:
 
  daru data structures can be constructed by using several Ruby classes. These include `Array`, `Hash`, `Matrix`, [NMatrix](https://github.com/SciRuby/nmatrix) and [MDArray](https://github.com/rbotafogo/mdarray). daru brings a uniform API for handling and manipulating data represented in any of the above Ruby classes.
 
+ Currently things work as expected for Arrays only. The rest will be added over the next few weeks.
+
  ## Testing
 
  Install jruby using `rvm install jruby`, then run `jruby -S gem install mdarray`, followed by `bundle install`. You will need to install `mdarray` manually because of strange gemspec file behaviour. If anyone can automate this then I'd greatly appreciate it! Then run `rspec` in JRuby to test for MDArray functionality.
@@ -33,9 +35,7 @@ Then switch to MRI, do a normal `bundle install` followed by `rspec` for testing
 
  * Automate testing for both MRI and JRuby.
  * Enable creation of DataFrame by only specifying an NMatrix/MDArray in initialize. Vector naming happens automatically (alphabetic) or is specified in an Array.
- * Add support for missing values in vectors.
- * Destructive version #filter\_rows!
- * NMatrix.first should return NMatrix (in vector).
+ * Destructive map iterators for DataFrame and Vector.
  * Completely test all functionality for NMatrix and MDArray.
  * Basic Data manipulation and analysis operations:
  - Different kinds of join operations
@@ -43,12 +43,19 @@ Then switch to MRI, do a normal `bundle install` followed by `rspec` for testing
  - Creation of correlation, covariance matrices
  - Verification of data in a vector
  - Basic vector statistics - mean, median, variance, etc.
- * Add indexing on vectors.
- - Creation of vector by supplying an index-value hash.
- - Auto generation of real numbered indices for any vector.
- - Ability to separately specify index for each element of a vector.
- - Runtime alteration of index.
- * Indexing on DataFrame.
  * Vector arithmetic - elementwise addition, subtraction, multiplication, division.
  * Transpose a dataframe.
- * Option to express a DataFrame as an NMatrix or MDArray so as to use more efficient storage techniques.
+ * Option to express a DataFrame as an NMatrix or MDArray so as to use more efficient storage techniques.
+ * Assignment of a column to a single number should set the entire column to that number.
+ * == between daru_vector and string/number.
+ * Multiple column assignment with []=
+ * Creation of DataFrame from Array of Arrays.
+ * Multiple value assignment for vectors with []=.
+ * Load DataFrame from multiple sources (excel, SQL, etc.).
+ * Allow for boolean operations inside #[].
+ * Deletion of elements from Vector should only modify the index and leave the vector as it is so that compacting is not needed and things are faster.
+ * Add a #sync method which will sync the modified index with the unmodified vector.
+ * Ability to reorder the index of a dataframe.
+
+ Copyright (c) 2014, Sameer Deshmukh
+ All rights reserved
@@ -0,0 +1,5 @@
+ require 'rspec/core/rake_task'
+
+ RSpec::Core::RakeTask.new(:spec)
+
+ task :default => :spec
@@ -24,7 +24,9 @@ Gem::Specification.new do |spec|
  spec.require_paths = ["lib"]
 
  spec.add_development_dependency 'bundler'
+ spec.add_development_dependency 'rake'
  spec.add_development_dependency 'rspec'
+ spec.add_development_dependency 'awesome_print'
  if RUBY_ENGINE != 'jruby'
  spec.add_development_dependency 'nmatrix', '~> 0.1.0.rc5'
  end
@@ -1,7 +1,7 @@
  require 'securerandom'
- require 'matrix'
  require 'csv'
 
+ require 'daru/index.rb'
  require 'daru/vector.rb'
  require 'daru/dataframe.rb'
  require 'daru/monkeys.rb'
@@ -1,259 +1,539 @@
1
+ require_relative 'dataframe_by_row.rb'
2
+ require_relative 'dataframe_by_vector.rb'
3
+ require_relative 'io.rb'
4
+
1
5
  module Daru
2
6
  class DataFrame
3
7
 
4
- attr_reader :vectors
5
-
6
- attr_reader :fields
8
+ class << self
9
+ def from_csv path, opts={}, &block
10
+ Daru::IO.from_csv path, opts, &block
11
+ end
12
+ end
7
13
 
14
+ attr_reader :vectors
15
+ attr_reader :index
16
+ attr_reader :name
8
17
  attr_reader :size
9
18
 
10
- attr_reader :name
19
+ # DataFrame basically consists of an Array of Vector objects.
20
+ # These objects are indexed by row and column by vectors and index Index objects.
21
+ # Arguments - source, vectors, index, name in that order. Last 3 are optional.
22
+ def initialize source, *args
23
+ vectors = args.shift
24
+ index = args.shift
25
+ @name = args.shift || SecureRandom.uuid
11
26
 
12
- def initialize source, fields=[], name=SecureRandom.uuid, opts={}
13
- @opts = opts
14
- set_default_opts
27
+ @data = []
15
28
 
16
29
  if source.empty?
17
- @vectors = fields.inject({}){ |a,x| a[x]=Daru::Vector.new; a}
30
+ @vectors = Daru::Index.new vectors
31
+ @index = Daru::Index.new index
32
+
33
+ create_empty_vectors
18
34
  else
19
- @vectors = source.inject({}) do |acc, h|
20
- acc[h[0]] = h[1].dv.dup
21
- acc
35
+ case source
36
+ when Array
37
+ if vectors.nil?
38
+ @vectors = Daru::Index.new source[0].keys.map(&:to_sym)
39
+ else
40
+ @vectors = Daru::Index.new (vectors + (source[0].keys - vectors)).uniq.map(&:to_sym)
41
+ end
42
+
43
+ if index.nil?
44
+ @index = Daru::Index.new source.size
45
+ else
46
+ @index = Daru::Index.new index
47
+ end
48
+
49
+ @vectors.each do |name|
50
+ v = []
51
+
52
+ source.each do |hsh|
53
+ v << (hsh[name] || hsh[name.to_s])
54
+ end
55
+
56
+ @data << v.dv(name, @index)
57
+ end
58
+ when Hash
59
+ create_vectors_index_with vectors, source
60
+
61
+ if all_daru_vectors_in_source? source
62
+
63
+ if !index.nil?
64
+ @index = index.to_index
65
+ elsif all_vectors_have_equal_indexes? source
66
+ @index = source.values[0].index.dup
67
+ else
68
+ all_indexes = []
69
+
70
+ source.each_value do |vector|
71
+ all_indexes << vector.index.to_a
72
+ end
73
+ # sort only if missing indexes detected
74
+ all_indexes.flatten!.uniq!.sort!
75
+
76
+ @index = Daru::Index.new all_indexes
77
+ end
78
+
79
+ @vectors.each do |vector|
80
+ @data << Daru::Vector.new(vector, [], @index)
81
+
82
+ @index.each do |idx|
83
+ begin
84
+ @data[@vectors[vector]][idx] = source[vector][idx]
85
+ rescue IndexError
86
+ # If the index is not present in the vector under consideration
87
+ # (in source) then an error is raised. Put a nil in that place if
88
+ # that is the case.
89
+ @data[@vectors[vector]][idx] = nil
90
+ end
91
+ end
92
+ end
93
+ else
94
+ index = source.values[0].size if index.nil?
95
+
96
+ if index.is_a?(Daru::Index)
97
+ @index = index.to_index
98
+ else
99
+ @index = Daru::Index.new index
100
+ end
101
+
102
+ @vectors.each do |name|
103
+ @data << source[name].dup.dv(name, @index)
104
+ end
105
+ end
106
+
22
107
  end
23
108
  end
24
109
 
25
- @fields = fields.empty? ? source.keys.sort : fields
26
- @name = name
27
-
28
- check_length
29
- set_missing_vectors if @vectors.keys.size < @fields.size
30
- set_fields_order if @vectors.keys.sort != @fields.sort
31
- set_vector_names
110
+ set_size
111
+ validate
32
112
  end
33
113
 
34
- def self.from_csv file, opts={}
35
- opts[:col_sep] ||= ','
36
- opts[:headers] ||= true
37
- opts[:converters] ||= :numeric
38
- opts[:header_converters] ||= :symbol
39
-
40
- csv = CSV.open file, 'r', opts
114
+ def [](*names, axis)
115
+ if axis == :vector
116
+ access_vector *names
117
+ elsif axis == :row
118
+ access_row *names
119
+ else
120
+ raise IndexError, "Expected axis to be row or vector not #{axis}"
121
+ end
122
+ end
41
123
 
42
- yield csv if block_given?
124
+ def []=(name, axis ,vector)
125
+ if axis == :vector
126
+ insert_or_modify_vector name, vector
127
+ elsif axis == :row
128
+ insert_or_modify_row name, vector
129
+ else
130
+ raise IndexError, "Expected axis to be row or vector, not #{axis}."
131
+ end
132
+ end
43
133
 
44
- first = true
45
- df = nil
134
+ def vector
135
+ Daru::DataFrameByVector.new(self)
136
+ end
46
137
 
47
- csv.each do |row|
48
- if first
49
- df = Daru::DataFrame.new({}, csv.headers)
50
- first = false
51
- end
138
+ def row
139
+ Daru::DataFrameByRow.new(self)
140
+ end
52
141
 
53
- df.insert_row row
142
+ def dup
143
+ src = {}
144
+ @vectors.each do |vector|
145
+ src[vector] = @data[@vectors[vector]]
54
146
  end
55
147
 
56
- df
148
+ Daru::DataFrame.new src, @vectors.dup, @index.dup, @name
57
149
  end
58
150
 
59
- def column name
60
- @vectors[name]
61
- end
151
+ def each_vector(&block)
152
+ @data.each(&block)
62
153
 
63
- def delete_vector name
64
- @vectors.delete name
65
- @fields.delete name
154
+ self
66
155
  end
67
156
 
68
- alias_method :delete, :delete_vector
157
+ def each_vector_with_index(&block)
158
+ @vectors.each do |vector|
159
+ yield @data[@vectors[vector]], vector
160
+ end
69
161
 
70
- def delete_row index
71
- # TODO: Make this work with NMatrix and MDArray
72
- raise "Expected index less than size." if index > @size
162
+ self
163
+ end
73
164
 
74
- @fields.each do |field|
75
- @vectors[field].delete index
165
+ def each_row(&block)
166
+ @index.each do |index|
167
+ yield access_row(index)
76
168
  end
77
- puts @vectors
169
+
170
+ self
78
171
  end
79
172
 
80
- def filter_rows name=self.name, &block
81
- df = DataFrame.new({}, @fields, name)
173
+ def each_row_with_index(&block)
174
+ @index.each do |index|
175
+ yield access_row(index), index
176
+ end
82
177
 
83
- self.each_row do |row|
84
- keep_row = yield row
178
+ self
179
+ end
85
180
 
86
- df.insert_row(row.values) if keep_row
181
+ def map_vectors(&block)
182
+ df = self.dup
183
+
184
+ df.each_vector_with_index do |vector, name|
185
+ df[name, :vector] = yield(vector)
87
186
  end
88
187
 
89
188
  df
90
189
  end
91
190
 
92
- def [] *name
93
- unless name[1]
94
- return column(name[0])
95
- end
96
-
97
- h = {}
98
- req_fields = @fields & name
191
+ def map_vectors_with_index(&block)
192
+ df = self.dup
99
193
 
100
- req_fields.each do |f|
101
- h[f] = @vectors[f]
194
+ df.each_vector_with_index do |vector, name|
195
+ df[name, :vector] = yield(vector, name)
102
196
  end
103
197
 
104
- DataFrame.new h, req_fields, @name
198
+ df
105
199
  end
106
200
 
107
- def == other
108
- @name == other.name and @vectors == other.vectors and
109
- @size == other.size and @fields == other.fields
110
- end
201
+ def map_rows(&block)
202
+ df = self.dup
203
+
204
+ df.each_row_with_index do |row, index|
205
+ df[index, :row] = yield(row)
206
+ end
111
207
 
112
- def []= name, vector
113
- insert_vector name, vector
208
+ df
114
209
  end
115
210
 
116
- def row index
117
- raise Exception, "Expected index to be within bounds" if index > @size
211
+ def map_rows_with_index(&block)
212
+ df = self.dup
118
213
 
119
- row = {}
120
- self.each_vector do |column|
121
- row[column.name] = column[index]
214
+ df.each_row_with_index do |row, index|
215
+ df[index, :row] = yield(row, index)
122
216
  end
123
217
 
124
- row
218
+ df
125
219
  end
126
220
 
127
- def has_vector? vector
128
- !!@vectors[vector]
221
+ def delete_vector vector
222
+ if @vectors.include? vector
223
+ @data.delete_at @vectors[vector]
224
+ @vectors = Daru::Index.new @vectors.to_a - [vector]
225
+ else
226
+ raise IndexError, "Vector #{vector} does not exist."
227
+ end
129
228
  end
130
229
 
131
- def each_row(&block)
132
- 0.upto(@size-1) do |index|
133
- yield row(index)
134
- end
230
+ def delete_row index
231
+ idx = named_index_for index
135
232
 
136
- self
137
- end
233
+ if @index.include? idx
234
+ @index = (@index.to_a - [idx]).to_index
138
235
 
139
- def each_row_with_index(&block)
140
- 0.upto(@size-1) do |index|
141
- yield row(index), index
236
+ self.each_vector do |vector|
237
+ vector.delete_at idx
238
+ end
239
+ else
240
+ raise IndexError, "Index #{index} does not exist."
142
241
  end
143
242
 
144
- self
243
+ set_size
145
244
  end
146
245
 
147
- def each_vector(&block)
148
- @fields.each do |field|
149
- yield @vectors[field]
150
- end
246
+ def keep_row_if &block
247
+ @index.each do |index|
248
+ keep_row = yield access_row(index)
151
249
 
152
- self
250
+ delete_row index unless keep_row
251
+ end
153
252
  end
154
253
 
155
- def each_vector_with_name(&block)
156
- @fields.each do |field|
157
- yield @vectors[field], field
254
+ def keep_vector_if &block
255
+ @vectors.each do |vector|
256
+ keep_vector = yield @data[@vectors[vector]], vector
257
+
258
+ delete_vector vector unless keep_vector
158
259
  end
159
-
160
- self
161
260
  end
162
261
 
163
- def insert_vector name, vector
164
- raise Exeception, "Expected vector size to be same as DataFrame\
165
- size." if vector.size != self.size
166
-
167
- @vectors.merge({name => vector})
168
- @fields << name
262
+ def has_vector? name
263
+ !!@vectors[name]
169
264
  end
170
265
 
171
- def insert_row row
172
- raise Exception, "Expected new row to same as the number of rows \
173
- in the DataFrame" if row.size != @fields.size
266
+ # Converts the DataFrame into an array of hashes where key is vector name
267
+ # and value is the corresponding element.
268
+ # The 0th index of the array contains the array of hashes while the 1th
269
+ # index contains the indexes of each row of the dataframe. Each element in
270
+ # the index array corresponds to its row in the array of hashes, which has
271
+ # the same index.
272
+ def to_a
273
+ arry = [[],[]]
174
274
 
175
- @fields.each_with_index do |field, index|
176
- @vectors[field] << row[index]
275
+ self.each_row do |row|
276
+ arry[0] << row.to_hash
177
277
  end
178
278
 
179
- @size += 1
279
+ arry[1] = @index.to_a
280
+
281
+ arry
180
282
  end
181
283
 
182
- def to_html(threshold=15)
183
- html = '<table>'
284
+ def to_json no_index=true
285
+ if no_index
286
+ self.to_a[0].to_json
287
+ else
288
+ self.to_a.to_json
289
+ end
290
+ end
291
+
292
+ def to_html threshold=15
293
+ html = '<table><tr><th></th>'
294
+
295
+ @vectors.each { |vector| html += '<th>' + vector.to_s + '</th>' }
184
296
 
185
- html += '<tr>'
186
- @fields.each { |f| html.concat('<td>' + f.to_s + '</td>') }
187
297
  html += '</tr>'
188
298
 
189
- self.each_row_with_index do |row, index|
190
- break if index > threshold and index <= @size
299
+ @index.each_with_index do |index, num|
191
300
  html += '<tr>'
192
- row.each_value { |val| html.concat('<td>' + val.to_s + '</td>') }
301
+ html += '<td>' + index.to_s + '</td>'
302
+
303
+ self.row[index].each do |element|
304
+ html += '<td>' + element.to_s + '</td>'
305
+ end
193
306
  html += '</tr>'
194
- if index == threshold
307
+
308
+ if num > threshold
195
309
  html += '<tr>'
196
- row.size.times { html.concat('<td>...</td>') }
310
+ (@vectors + 1).size.times { html += '<td>...</td>' }
197
311
  html += '</tr>'
312
+ break
198
313
  end
199
314
  end
200
315
 
201
316
  html += '</table>'
317
+
318
+ html
202
319
  end
203
320
 
204
321
  def to_s
205
322
  to_html
206
323
  end
207
324
 
208
- def method_missing(name, *args)
325
+ def inspect spacing=10, threshold=15
326
+ longest = [@vectors.map(&:to_s).map(&:size).max,
327
+ @index .map(&:to_s).map(&:size).max,
328
+ @data .map{ |v| v.map(&:to_s).map(&:size).max }.max].max
329
+
330
+ name = @name || 'nil'
331
+ content = ""
332
+ longest = spacing if longest > spacing
333
+ formatter = "\n"
334
+
335
+ (@vectors.size + 1).times { formatter += "%#{longest}.#{longest}s " }
336
+
337
+ content += "\n#<" + self.class.to_s + ":" + self.object_id.to_s + " @name = " +
338
+ name.to_s + " @size = " + @size.to_s + ">"
339
+
340
+ content += sprintf formatter, "" , *@vectors.map(&:to_s)
341
+
342
+ row_num = 1
343
+
344
+ self.each_row_with_index do |row, index|
345
+ content += sprintf formatter, index.to_s, *row.to_hash.values.map { |e| (e || 'nil').to_s }
346
+
347
+ row_num += 1
348
+ if row_num > threshold
349
+ dots = []
350
+
351
+ (@vectors.size + 1).times { dots << "..." }
352
+ content += sprintf formatter, *dots
353
+ break
354
+ end
355
+ end
356
+
357
+ content += "\n"
358
+
359
+ content
360
+ end
361
+
362
+ def == other
363
+ @index == other.index and @size == other.size and @vectors.all? { |vector|
364
+ self[vector, :vector] == other[vector, :vector] }
365
+ end
366
+
367
+ def method_missing(name, *args, &block)
209
368
  if md = name.match(/(.+)\=/)
210
- insert_vector name[/(.+)\=/].delete("="), args[0]
369
+ insert_or_modify_vector name[/(.+)\=/].delete("="), args[0]
211
370
  elsif self.has_vector? name
212
- column name
371
+ self[name, :vector]
213
372
  else
214
373
  super(name, *args)
215
374
  end
216
375
  end
217
376
 
218
377
  private
219
- def check_length
220
- size = nil
221
-
222
- @vectors.each_value do |vector|
223
- if size.nil?
224
- size = vector.size
225
- elsif size != vector.size
226
- raise Exception, "Expected all vectors to be of the same size. Vector \
227
- #{vector.name} is of size #{vector.size} and another one of size #{size}"
378
+
379
+ def access_vector *names
380
+ unless names[1]
381
+ if @vectors.include? names[0]
382
+ return @data[@vectors[names[0]]]
383
+ elsif @vectors.key names[0]
384
+ return @data[names[0]]
385
+ else
386
+ raise IndexError, "Specified index #{names[0]} does not exist."
387
+ end
388
+ end
389
+
390
+ new_vcs = {}
391
+
392
+ names.each do |name|
393
+ name = name.to_sym unless name.is_a?(Integer)
394
+
395
+ new_vcs[name] = @data[@vectors[name]]
396
+ end
397
+
398
+ Daru::DataFrame.new new_vcs, new_vcs.keys, @index, @name
399
+ end
400
+
401
+ def access_row *names
402
+ unless names[1]
403
+ row = []
404
+
405
+ @vectors.each do |vector|
406
+ row << @data[@vectors[vector]][names[0]]
407
+ end
408
+
409
+ if @index.include? names[0]
410
+ name = names[0]
411
+ elsif @index.key names[0]
412
+ name = @index.key names[0]
413
+ else
414
+ raise IndexError, "Specified row #{names[0]} does not exist."
415
+ end
416
+
417
+ Daru::Vector.new name, row, @vectors
418
+ else
419
+ # TODO: Access multiple rows
420
+ end
421
+ end
422
+
423
+ def insert_or_modify_vector name, vector
424
+ @vectors = @vectors.re_index(@vectors + name)
425
+
426
+ v = nil
427
+
428
+ if vector.is_a?(Daru::Vector)
429
+ v = Daru::Vector.new name, [], @index
430
+
431
+ @index.each do |idx|
432
+ begin
433
+ v[idx] = vector[idx]
434
+ rescue IndexError
435
+ v[idx] = nil
436
+ end
437
+ end
438
+ else
439
+ raise Exception, "Specified vector of length #{vector.size} cannot be inserted in DataFrame of size #{@size}" if
440
+ @size != vector.size
441
+
442
+ v = vector.dv(name, @index)
443
+ end
444
+
445
+ @data[@vectors[name]] = v
446
+ end
447
+
448
+ def insert_or_modify_row name, vector
449
+ if @index.include? name
450
+ v = vector.dv(name, @vectors)
451
+
452
+ @vectors.each do |vector|
453
+ begin
454
+ @data[@vectors[vector]][name] = v[vector]
455
+ rescue IndexError
456
+ @data[@vectors[vector]][name] = nil
457
+ end
458
+ end
459
+ else
460
+ @index = @index.re_index(@index + name)
461
+ v = vector.dv(name, @vectors)
462
+
463
+ @vectors.each do |vector|
464
+ begin
465
+ @data[@vectors[vector]].concat v[vector], name
466
+ rescue IndexError
467
+ @data[@vectors[vector]].concat nil, name
468
+ end
228
469
  end
229
470
  end
230
471
 
231
- @size = size
472
+ set_size
232
473
  end
233
474
 
234
- def set_fields_order # vectors more than specified fields
235
- @fields = @fields & @vectors.keys
236
- @fields += @vectors.keys.sort - @fields
475
+ def create_empty_vectors
476
+ @vectors.each do |name|
477
+ @data << Daru::Vector.new(name, [], @index)
478
+ end
479
+ end
480
+
481
+ def validate_labels
482
+ raise IndexError, "Expected equal number of vectors for number of Hash pairs" if
483
+ @vectors.size != @data.size
484
+
485
+ raise IndexError, "Expected number of indexes same as number of rows" if
486
+ @index.size != @data[0].size
487
+ end
488
+
489
+ def validate_vector_sizes
490
+ @data.each do |vector|
491
+ raise IndexError, "Expected vectors with equal length" if vector.size != @size
492
+ end
493
+ end
494
+
495
+ def validate
496
+ # TODO: [IMP] when vectors of different dimensions are specified, they should
497
+ # be inserted into the dataframe by inserting nils wherever necessary.
498
+ validate_labels
499
+ validate_vector_sizes
237
500
  end
238
501
 
239
- # Writes names specified in the hash to the actual name of the vector.
240
- # Will over-ride any previous name assigned to the vector.
241
- def set_vector_names
242
- @fields.each do |name|
243
- @vectors[name].name = name
502
+ def all_daru_vectors_in_source? source
503
+ source.values.all? do |vector|
504
+ vector.is_a?(Daru::Vector)
244
505
  end
245
506
  end
246
507
 
247
- def set_default_opts
248
- # Future proofing
508
+ def set_size
509
+ @size = @index.size
510
+ end
511
+
512
+ def named_index_for index
513
+ if @index.include? index
514
+ index
515
+ elsif @index.key index
516
+ @index.key index
517
+ else
518
+ raise IndexError, "Specified index #{index} does not exist."
519
+ end
520
+ end
521
+
522
+ def create_vectors_index_with vectors, source
523
+ vectors = source.keys.sort if vectors.nil?
524
+
525
+ if vectors.is_a?(Daru::Index)
526
+ @vectors = vectors.to_index
527
+ else
528
+ @vectors = Daru::Index.new (vectors + (source.keys - vectors)).uniq.map(&:to_sym)
529
+ end
249
530
  end
250
531
 
251
- def set_missing_vectors
252
- missing_fields = @fields - @vectors.keys
532
+ def all_vectors_have_equal_indexes? source
533
+ index = source.values[0].index
253
534
 
254
- missing_fields.each do |field|
255
- @vectors[field] = ([nil]*@size).dv
256
- @fields << field
535
+ source.all? do |name, vector|
536
+ index == vector.index
257
537
  end
258
538
  end
259
539
  end
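When the new `DataFrame#initialize` receives a Hash of `Daru::Vector` objects and no explicit index, it reuses the first vector's index if all indexes are equal; otherwise it collects every vector's index labels, sorts their union, and fills nil wherever a vector has no value for a label. The sketch below reproduces that alignment rule in plain Ruby, modelling each vector as a Hash of `label => value`. `align_columns` is a hypothetical name, not part of daru, and the sketch does not cover the explicit-index path.

```ruby
# Hypothetical helper, not part of daru. Each column is modelled as a Hash of
# index label => value, standing in for a Daru::Vector and its Daru::Index.
# Returns [labels, aligned], where aligned maps column name => Array of values.
def align_columns(columns)
  indexes = columns.values.map(&:keys)

  labels =
    if indexes.uniq.size <= 1
      # Every column has the same labels in the same order: keep that order.
      indexes.first || []
    else
      # Otherwise use the sorted union of all labels.
      indexes.flatten.uniq.sort
    end

  # Look each label up in every column; labels a column lacks become nil.
  aligned = columns.transform_values do |col|
    labels.map { |label| col.fetch(label, nil) }
  end

  [labels, aligned]
end

labels, aligned = align_columns({ a: { 0 => 1, 1 => 2 }, b: { 1 => 5, 2 => 6 } })
p labels # => [0, 1, 2]
```

Here `aligned[:a]` is `[1, 2, nil]` and `aligned[:b]` is `[nil, 5, 6]`. Note that the non-destructive `flatten.uniq.sort` chain is used deliberately: the bang variants such as `uniq!` return nil when nothing changes, so chaining them is unsafe.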