daru 0.2.2 → 0.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 617e082fd3366f695622071cf630690d102552821e82926af81a7007bb09093d
4
- data.tar.gz: b6b995e35e8124768a15a3e32d1fc38515aecc55f070510d7a51b45945520eb7
3
+ metadata.gz: 264a0549062a2c6b062f8c031b4e03524fd1bd852d59927a54722e8b8e68a2e8
4
+ data.tar.gz: 2dee6ded3fb009045a6ef13203c8ebe4251458e64b3e41a737b8bc04d4d0b91f
5
5
  SHA512:
6
- metadata.gz: 8ae029cac761e4a7164b472ad6ef5275d18aa7d6dace9f61ab4553cc288a1d95e80412804e28a4405444a97dfd1e850ce00783d9b45126c0cb8e0c4dafa09e63
7
- data.tar.gz: e8aa0aed6c05ec54ba4f5083ed3baa47b6eb02c3304bb0152beea1e57e181aafe45ebc779bbdec76b22b6c86b29f71dc16d013250870589cb93eeae0b8ca0917
6
+ metadata.gz: 1cb5cf9a2aa1660e9cd0d0a286af6d3dcbe987b5001f15a90e5b89394c36fdaedd8e29e742f591c36de64e61363ba66ceff8ae9f075c2ce1a4352a56584b4c24
7
+ data.tar.gz: 8e3d5843f871c0fba685430e27ff91090db4873da8edbfe2d1095b12a24a06dcda7ae1b198b9a3f7fcc013aeb757560088bcbb507a70ece4a4976eda63094cd8
@@ -2,11 +2,8 @@ language:
2
2
  ruby
3
3
 
4
4
  rvm:
5
- - '2.0'
6
- - '2.1'
7
- - '2.2'
8
- - '2.3.0'
9
- - '2.4.0'
5
+ - '2.5.1'
6
+ - '2.7.1'
10
7
 
11
8
  matrix:
12
9
  allow_failures:
@@ -14,6 +11,9 @@ matrix:
14
11
  fast_finish:
15
12
  true
16
13
 
14
+ env:
15
+ - DARU_TEST_NMATRIX=1 DARU_TEST_GSL=1
16
+
17
17
  script:
18
18
  - bundle add yard-junk
19
19
  - bundle install
@@ -22,10 +22,7 @@ script:
22
22
  - bundle exec yard-junk
23
23
 
24
24
  install:
25
- - if [ $TRAVIS_RUBY_VERSION == '2.2' ] || [ $TRAVIS_RUBY_VERSION == '2.1' ] || [ $TRAVIS_RUBY_VERSION == '2.0' ];
26
- then gem install bundler -v '~> 1.6';
27
- else gem install bundler;
28
- fi
25
+ - gem install bundler
29
26
  - gem install rainbow -v '2.2.1'
30
27
  - bundle install
31
28
 
@@ -6,15 +6,18 @@ Either nmatrix or rb-gsl are NOT NECESSARY for using daru. They are just require
6
6
 
7
7
  To install dependencies, execute the following commands:
8
8
 
9
- * `sudo apt-get update -qq`
10
- * `sudo apt-get install -y libgsl0-dev r-base r-base-dev`
11
- * `sudo Rscript -e "install.packages(c('Rserve','irr'),,'http://cran.us.r-project.org')"`
12
- * `sudo apt-get install libmagickwand-dev imagemagick`
13
-
14
-
15
- Then install remaining dependencies:
16
-
17
- `bundle install`
9
+ ``` bash
10
+ sudo apt-get update -qq
11
+ sudo apt-get install -y libgsl0-dev r-base r-base-dev
12
+ sudo Rscript -e "install.packages(c('Rserve','irr'),,'http://cran.us.r-project.org')"
13
+ sudo apt-get install libmagickwand-dev imagemagick
14
+ export DARU_TEST_NMATRIX=1 # for running nmatrix tests.
15
+ export DARU_TEST_GSL=1 # for running rb-GSL tests.
16
+ bundle install
17
+ ```
18
+ You don't need `DARU_TEST_NMATRIX` or `DARU_TEST_GSL` if you don't want to make changes
19
+ to those parts of the code. However, they will be set in CI and will raise a test failure
20
+ if something goes wrong.
18
21
 
19
22
  And run the test suite (should be all green with pending tests):
20
23
 
@@ -22,13 +25,6 @@ And run the test suite (should be all green with pending tests):
22
25
 
23
26
  If you have problems installing nmatrix, please consult the [nmatrix installation wiki](https://github.com/SciRuby/nmatrix/wiki/Installation) or the [mailing list](https://groups.google.com/forum/#!forum/sciruby-dev).
24
27
 
25
- **NOTE**: `Daru` is compatible with Ruby versions < 2.5; for later Ruby versions it breaks, returning the following error in versions >= 2.5.
26
- ```
27
- /gems/packable-1.3.10/lib/packable/extensions/io.rb:86:in `pos': Illegal seek @ rb_io_tell - <STDOUT> (Errno::ESPIPE)
28
- ```
29
- To reproduce this issue or explore this error further, head over to
30
- [issue #500](https://github.com/SciRuby/daru/issues/500),
31
- [issue #503](https://github.com/SciRuby/daru/issues/503). Also, if you want to fix this issue, then please discuss it here : [#505](https://github.com/SciRuby/daru/issues/500)
32
28
 
33
29
  While preparing your pull requests, don't forget to check your code with Rubocop:
34
30
 
data/History.md CHANGED
@@ -1,3 +1,20 @@
1
+ # 0.3 (30 May 2020)
2
+ * Major Enhancements
3
+ - Remove official support for Ruby < 2.5.1. Now we only test with 2.5.1 and 2.7.1. (@v0dro)
4
+ - Make nmatrix and gsl optional dependencies for testing. (@v0dro)
5
+ - Update sqlite, activerecord, nokogiri, packable, rake dependencies. (@v0dro)
6
+ - Remove runtime dependency on backports. (@v0dro)
7
+ - Add `Daru::Vector#match` and `Daru::Vector#apply_where` methods. (@athityakumar)
8
+ - Add support for options to the `Daru` module. Adds a separate module `Daru::Configuration` that
9
+ can hold data for overall configuration of daru's execution. (@kojix2)
10
+ * Minor Enhancements
11
+ - Add new `DataFrame#insert_vector` method. (@cyrillefr)
12
+ - Add `Vector#last`. (@kojix2)
13
+ - Add `DataFrame#rename_vectors!`. (@neumanrq)
14
+ - Refactor `GroupBy#apply_method`. (@paisible-wanderer)
15
+ - Auto-adjust header parameters when printing to terminal. (@ncs1)
16
+ - Infer offsets of timeseries automatically when they are a natural number multiple of seconds. (@jpaulgs)
17
+
1
18
  # 0.2.2 (8 August 2019)
2
19
 
3
20
  * Minor Enhancements
data/README.md CHANGED
@@ -9,8 +9,9 @@
9
9
 
10
10
  daru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data in Ruby.
11
11
 
12
- daru makes it easy and intuitive to process data predominantly through 2 data structures: `Daru::DataFrame` and `Daru::Vector`. Written in pure Ruby works with all ruby implementations. Tested with MRI 2.0, 2.1, 2.2, 2.3, and 2.4.
13
-
12
+ daru makes it easy and intuitive to process data predominantly through 2 data structures:
13
+ `Daru::DataFrame` and `Daru::Vector`. Written in pure Ruby, it works with all Ruby implementations.
14
+ Tested with MRI 2.5.1 and 2.7.1.
14
15
 
15
16
  ## daru plugin gems
16
17
 
@@ -19,7 +19,7 @@ Gem::Specification.new do |spec|
19
19
  spec.email = ['sameer.deshmukh93@gmail.com']
20
20
  spec.summary = %q{Data Analysis in RUby}
21
21
  spec.description = Daru::DESCRIPTION
22
- spec.homepage = "http://github.com/v0dro/daru"
22
+ spec.homepage = "http://github.com/SciRuby/daru"
23
23
  spec.license = 'BSD-2'
24
24
 
25
25
  spec.files = `git ls-files -z`.split("\x0")
@@ -27,14 +27,12 @@ Gem::Specification.new do |spec|
27
27
  spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
28
28
  spec.require_paths = ["lib"]
29
29
 
30
- spec.add_runtime_dependency 'backports'
31
-
32
30
  # it is required by NMatrix, yet we want to specify clearly which minimal version is OK
33
- spec.add_runtime_dependency 'packable', '~> 1.3.9'
31
+ spec.add_runtime_dependency 'packable', '~> 1.3.13'
34
32
 
35
33
  spec.add_development_dependency 'spreadsheet', '~> 1.1.1'
36
34
  spec.add_development_dependency 'bundler', '>= 1.10'
37
- spec.add_development_dependency 'rake', '~>10.5'
35
+ spec.add_development_dependency 'rake', '~>13.0'
38
36
  spec.add_development_dependency 'pry', '~> 0.10'
39
37
  spec.add_development_dependency 'pry-byebug'
40
38
  spec.add_development_dependency 'rserve-client', '~> 0.3'
@@ -42,28 +40,22 @@ Gem::Specification.new do |spec|
42
40
  spec.add_development_dependency 'rspec-its'
43
41
  spec.add_development_dependency 'awesome_print'
44
42
  spec.add_development_dependency 'nyaplot', '~> 0.1.5'
45
- spec.add_development_dependency 'nmatrix', '~> 0.2.1'
43
+ spec.add_development_dependency 'nmatrix', '~> 0.2.1' if ENV['DARU_TEST_NMATRIX']
46
44
  spec.add_development_dependency 'distribution', '~> 0.7'
47
- spec.add_development_dependency 'gsl', '~>2.1.0.2'
45
+ spec.add_development_dependency 'gsl', '~>2.1.0.2' if ENV['DARU_TEST_GSL']
48
46
  spec.add_development_dependency 'dbd-sqlite3'
49
47
  spec.add_development_dependency 'dbi'
50
- spec.add_development_dependency 'activerecord', '~> 4.0'
48
+ spec.add_development_dependency 'activerecord', '~> 6.0'
51
49
  spec.add_development_dependency 'mechanize'
52
50
  # issue : https://github.com/SciRuby/daru/issues/493 occurred
53
51
  # with latest version of sqlite3
54
- spec.add_development_dependency 'sqlite3', '~> 1.3.13'
52
+ spec.add_development_dependency 'sqlite3'
55
53
  spec.add_development_dependency 'rubocop', '~> 0.49.0'
56
54
  spec.add_development_dependency 'ruby-prof'
57
55
  spec.add_development_dependency 'simplecov'
58
56
  spec.add_development_dependency 'gruff'
59
57
  spec.add_development_dependency 'webmock'
60
58
 
61
- if RUBY_VERSION < '2.1.0'
62
- spec.add_development_dependency 'nokogiri', '<= 1.6.8.1'
63
- else
64
- spec.add_development_dependency 'nokogiri'
65
- end
66
- if RUBY_VERSION >= '2.2.5'
67
- spec.add_development_dependency 'guard-rspec'
68
- end
59
+ spec.add_development_dependency 'nokogiri'
60
+ spec.add_development_dependency 'guard-rspec'
69
61
  end
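The two `if ENV[...]` guards in this gemspec make nmatrix and gsl opt-in development dependencies. Below is a minimal sketch of that gating outside a real gemspec. The helper `optional_dev_dependencies` and its `env` parameter are hypothetical and exist only for illustration. Note that any set value enables a guard, including `"0"`, because every Ruby string is truthy.

```ruby
# Hypothetical helper mirroring the gemspec guards; not part of daru.
# `env` is any Hash-like object that returns nil for missing keys (ENV by default).
def optional_dev_dependencies(env = ENV)
  deps = []
  deps << ['nmatrix', '~> 0.2.1'] if env['DARU_TEST_NMATRIX']
  deps << ['gsl', '~>2.1.0.2'] if env['DARU_TEST_GSL']
  deps
end

p optional_dev_dependencies({})                          # no flags set
p optional_dev_dependencies('DARU_TEST_NMATRIX' => '1')  # nmatrix only
p optional_dev_dependencies('DARU_TEST_GSL' => '0')      # still enabled: "0" is truthy
```

To switch a dependency off again, unset the variable rather than assigning it an empty or zero value.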
@@ -95,13 +95,13 @@ require 'date'
95
95
  require 'daru/version.rb'
96
96
 
97
97
  require 'open-uri'
98
- require 'backports/2.1.0/array/to_h'
99
98
 
100
99
  require 'daru/index/index.rb'
101
100
  require 'daru/index/multi_index.rb'
102
101
  require 'daru/index/categorical_index.rb'
103
102
 
104
103
  require 'daru/helpers/array.rb'
104
+ require 'daru/configuration.rb'
105
105
  require 'daru/vector.rb'
106
106
  require 'daru/dataframe.rb'
107
107
  require 'daru/monkeys.rb'
@@ -0,0 +1,34 @@
1
+ module Daru
2
+ # Defines constants and methods related to configuration
3
+ module Configuration
4
+ INSPECT_OPTIONS_KEYS = [
5
+ :max_rows,
6
+ # Terminal
7
+ :spacing
8
+ ].freeze
9
+
10
+ # Jupyter
11
+ DEFAULT_MAX_ROWS = 30
12
+
13
+ # Terminal
14
+ DEFAULT_SPACING = 10
15
+
16
+ attr_accessor(*INSPECT_OPTIONS_KEYS)
17
+
18
+ def configure
19
+ yield self
20
+ end
21
+
22
+ def self.extended(base)
23
+ base.reset_options
24
+ end
25
+
26
+ def reset_options
27
+ self.max_rows = DEFAULT_MAX_ROWS
28
+
29
+ self.spacing = DEFAULT_SPACING
30
+ end
31
+ end
32
+
33
+ extend Configuration
34
+ end
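The new file relies on `extend` plus the `self.extended` hook: extending a module with `Configuration` gives it the accessors and immediately resets them to their defaults. A standalone sketch of that pattern follows. `Demo` is a hypothetical stand-in for the `Daru` module so the snippet runs without the gem; the constants and method names are copied from the diff above.

```ruby
# Standalone sketch; `Demo` is a hypothetical stand-in for the Daru module.
module Demo
  module Configuration
    INSPECT_OPTIONS_KEYS = %i[max_rows spacing].freeze
    DEFAULT_MAX_ROWS = 30
    DEFAULT_SPACING  = 10

    attr_accessor(*INSPECT_OPTIONS_KEYS)

    # Yields the extended module itself so options can be set in a block.
    def configure
      yield self
    end

    # Runs once when a module extends Configuration; installs the defaults.
    def self.extended(base)
      base.reset_options
    end

    def reset_options
      self.max_rows = DEFAULT_MAX_ROWS
      self.spacing  = DEFAULT_SPACING
    end
  end

  extend Configuration
end

Demo.configure do |config|
  config.max_rows = 100
  config.spacing  = 20
end
```

Calling `Demo.reset_options` afterwards restores both values to the defaults.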
@@ -2,21 +2,25 @@ module Daru
2
2
  module Core
3
3
  class GroupBy
4
4
  class << self
5
+ extend Gem::Deprecate
6
+
5
7
  # @private
6
- def get_positions_group_map_on(indexes_with_positions, sort: false)
7
- group_map = {}
8
+ def group_by_index_to_positions(indexes_with_positions, sort: false)
9
+ index_to_positions = {}
8
10
 
9
11
  indexes_with_positions.each do |idx, position|
10
- (group_map[idx] ||= []) << position
12
+ (index_to_positions[idx] ||= []) << position
11
13
  end
12
14
 
13
15
  if sort # TODO: maybe add a more "stable" sorting option?
14
- sorted_keys = group_map.keys.sort(&Daru::Core::GroupBy::TUPLE_SORTER)
15
- group_map = sorted_keys.map { |k| [k, group_map[k]] }.to_h
16
+ sorted_keys = index_to_positions.keys.sort(&Daru::Core::GroupBy::TUPLE_SORTER)
17
+ index_to_positions = sorted_keys.map { |k| [k, index_to_positions[k]] }.to_h
16
18
  end
17
19
 
18
- group_map
20
+ index_to_positions
19
21
  end
22
+ alias get_positions_group_map_on group_by_index_to_positions
23
+ deprecate :get_positions_group_map_on, :group_by_index_to_positions, 2019, 10
20
24
 
21
25
  # @private
22
26
  def get_positions_group_for_aggregation(multi_index, level=-1)
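The renamed `group_by_index_to_positions` builds a hash from each group key to the positions where that key occurs. A plain-Ruby sketch of the same idea, assuming ordinary arrays as input: the function name `positions_by_key` is hypothetical, and the default `Hash#sort` is used here in place of daru's `TUPLE_SORTER`.

```ruby
# Hypothetical stand-in for the grouping step; uses plain arrays, not daru objects.
def positions_by_key(keys, sort: false)
  index_to_positions = {}

  keys.each_with_index do |key, position|
    (index_to_positions[key] ||= []) << position
  end

  # Assumption: default sorting by key instead of Daru::Core::GroupBy::TUPLE_SORTER.
  sort ? index_to_positions.sort.to_h : index_to_positions
end

p positions_by_key(%w[b a b a c])
p positions_by_key(%w[b a b a c], sort: true)
```

Without `sort`, groups appear in order of first occurrence; with it, they are ordered by key.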
@@ -25,14 +29,14 @@ module Daru
25
29
  new_index = multi_index.dup
26
30
  new_index.remove_layer(level) # TODO: recheck code of Daru::MultiIndex#remove_layer
27
31
 
28
- get_positions_group_map_on(new_index.each_with_index)
32
+ group_by_index_to_positions(new_index.each_with_index)
29
33
  end
30
34
 
31
35
  # @private
32
36
  def get_positions_group_map_for_df(df, group_by_keys, sort: true)
33
37
  indexes_with_positions = df[*group_by_keys].to_df.each_row.map(&:to_a).each_with_index
34
38
 
35
- get_positions_group_map_on(indexes_with_positions, sort: sort)
39
+ group_by_index_to_positions(indexes_with_positions, sort: sort)
36
40
  end
37
41
 
38
42
  # @private
@@ -57,6 +61,9 @@ module Daru
57
61
  end
58
62
  end
59
63
 
64
+ # The group_by was done over the vectors in group_vectors; the remaining vectors are the non_group_vectors
65
+ attr_reader :group_vectors, :non_group_vectors
66
+
60
67
  # lazy accessor/attr_reader for the attribute groups
61
68
  def groups
62
69
  @groups ||= GroupBy.group_map_from_positions_to_indexes(@groups_by_pos, @context.index)
@@ -93,7 +100,7 @@ module Daru
93
100
  @group_vectors = names
94
101
  @non_group_vectors = context.vectors.to_a - names
95
102
 
96
- @context = context # TODO: maybe rename in @original_df or @grouped_db
103
+ @context = context # TODO: maybe rename in @original_df
97
104
 
98
105
  # FIXME: It feels like we don't want to sort here. Ruby's #group_by
99
106
  # never sorts:
@@ -362,21 +369,15 @@ module Daru
362
369
  Daru::DataFrame.rows(rows, order: @context.vectors, index: indexes)
363
370
  end
364
371
 
365
- def apply_method method_type, method
366
- order = @non_group_vectors.select do |ngvec|
367
- method_type == :numeric && @context[ngvec].type == :numeric
368
- end
372
+ def select_numeric_non_group_vectors
373
+ @non_group_vectors.select { |ngvec| @context[ngvec].type == :numeric }
374
+ end
369
375
 
370
- rows = groups_by_idx.map do |_group, indexes|
371
- order.map do |ngvector|
372
- slice = @context[ngvector][*indexes]
373
- slice.is_a?(Daru::Vector) ? slice.send(method) : slice
374
- end
375
- end
376
+ def apply_method method_type, method
377
+ raise 'To implement' if method_type != :numeric
378
+ aggregation_options = select_numeric_non_group_vectors.map { |k| [k, method] }.to_h
376
379
 
377
- index = get_grouped_index
378
- order = Daru::Index.new(order)
379
- Daru::DataFrame.new(rows.transpose, index: index, order: order)
380
+ aggregate(aggregation_options)
380
381
  end
381
382
 
382
383
  def get_grouped_index(index_tuples=nil)
@@ -70,6 +70,22 @@ module Daru
70
70
  resultant_dv
71
71
  end
72
72
 
73
+ def vector_apply_where dv, bool_array
74
+ _data, new_index = fetch_new_data_and_index dv, bool_array
75
+ all_index = dv.index
76
+ all_data = all_index.map { |idx| new_index.include?(idx) ? yield(dv[idx]) : dv[idx] }
77
+
78
+ resultant_dv = Daru::Vector.new all_data,
79
+ index: dv.index.class.new(all_index),
80
+ dtype: dv.dtype,
81
+ type: dv.type,
82
+ name: dv.name
83
+
84
+ # Preserve categories order for category vector
85
+ resultant_dv.categories = dv.categories if dv.category?
86
+ resultant_dv
87
+ end
88
+
73
89
  private
74
90
 
75
91
  def fetch_new_data_and_index dv, bool_array
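`vector_apply_where` keeps every element of the vector and applies the block only where the boolean mask is true. A minimal sketch of that behaviour on plain arrays follows; the name `apply_where_sketch` is hypothetical, and index handling, dtype, and category preservation from the real method are left out.

```ruby
# Hypothetical array-based sketch of the masking logic; not daru's API.
def apply_where_sketch(values, mask)
  values.each_with_index.map do |value, i|
    mask[i] ? yield(value) : value
  end
end

p apply_where_sketch([1, 2, 3, 4], [false, true, false, true]) { |v| v * 10 }
```

The result has the same length and order as the input, which is what distinguishes this from a filter followed by a map.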
@@ -12,6 +12,8 @@ module Daru
12
12
  # TODO: Remove this line, but it's causing errors due to an unknown reason
13
13
  Daru.has_nyaplot?
14
14
 
15
+ attr_accessor(*Configuration::INSPECT_OPTIONS_KEYS)
16
+
15
17
  extend Gem::Deprecate
16
18
 
17
19
  class << self
@@ -545,6 +547,17 @@ module Daru
545
547
  self[n] = vector
546
548
  end
547
549
 
550
+ def insert_vector n, name, source
551
+ raise ArgumentError unless source.is_a? Array
552
+ vector = Daru::Vector.new(source, index: @index, name: @name)
553
+ @data << vector
554
+ @vectors = @vectors.add name
555
+ ordr = @vectors.dup.to_a
556
+ elmnt = ordr.pop
557
+ ordr = ordr.insert n, elmnt
558
+ self.order=ordr
559
+ end
560
+
548
561
  # Access a row or set/create a row. Refer #[] and #[]= docs for details.
549
562
  #
550
563
  # == Usage
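`insert_vector` appends the new vector at the end and then reorders: it pops the last name off the order and re-inserts it at position `n`. The reordering step alone can be sketched on a plain array of names; `insert_name_at` is a hypothetical helper, not a daru method.

```ruby
# Hypothetical sketch of the append-then-move reordering used by insert_vector.
def insert_name_at(order, n, name)
  reordered = order + [name]   # the new name is first appended at the end
  last = reordered.pop         # then taken off again...
  reordered.insert(n, last)    # ...and placed at position n
end

p insert_name_at(%i[a b c], 1, :x)
```

Because the input array is copied with `+`, the caller's original order is left untouched.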
@@ -1696,6 +1709,24 @@ module Daru
1696
1709
  self.vectors = Daru::Index.new new_names
1697
1710
  end
1698
1711
 
1712
+ # Renames the vectors and returns itself
1713
+ #
1714
+ # == Arguments
1715
+ #
1716
+ # * name_map - A hash where the keys are the existing vector names and
1717
+ # the values are the new names. If a vector is renamed
1718
+ # to a vector name that is already in use, the existing
1719
+ # one is overwritten.
1720
+ #
1721
+ # == Usage
1722
+ #
1723
+ # df = Daru::DataFrame.new({ a: [1,2,3,4], b: [:a,:b,:c,:d], c: [11,22,33,44] })
1724
+ # df.rename_vectors! :a => :alpha, :c => :gamma # df
1725
+ def rename_vectors! name_map
1726
+ rename_vectors(name_map)
1727
+ self
1728
+ end
1729
+
1699
1730
  # Return the indexes of all the numeric vectors. Will include vectors with nils
1700
1731
  # alongwith numbers.
1701
1732
  def numeric_vectors
@@ -2091,7 +2122,7 @@ module Daru
2091
2122
  end
2092
2123
 
2093
2124
  # Convert to html for IRuby.
2094
- def to_html(threshold=30)
2125
+ def to_html(threshold=Daru.max_rows)
2095
2126
  table_thead = to_html_thead
2096
2127
  table_tbody = to_html_tbody(threshold)
2097
2128
  path = if index.is_a?(MultiIndex)
@@ -2112,7 +2143,8 @@ module Daru
2112
2143
  ERB.new(File.read(table_thead_path).strip).result(binding)
2113
2144
  end
2114
2145
 
2115
- def to_html_tbody(threshold=30)
2146
+ def to_html_tbody(threshold=Daru.max_rows)
2147
+ threshold ||= @size
2116
2148
  table_tbody_path =
2117
2149
  if index.is_a?(MultiIndex)
2118
2150
  File.expand_path('../iruby/templates/dataframe_mi_tbody.html.erb', __FILE__)
@@ -2229,10 +2261,11 @@ module Daru
2229
2261
  end
2230
2262
 
2231
2263
  # Pretty print in a nice table format for the command line (irb/pry/iruby)
2232
- def inspect spacing=10, threshold=15
2264
+ def inspect spacing=Daru.spacing, threshold=Daru.max_rows
2233
2265
  name_part = @name ? ": #{@name} " : ''
2266
+ spacing = [headers.to_a.map(&:length).max, spacing].max
2234
2267
 
2235
- "#<#{self.class}#{name_part}(#{nrows}x#{ncols})>\n" +
2268
+ "#<#{self.class}#{name_part}(#{nrows}x#{ncols})>#{$INPUT_RECORD_SEPARATOR}" +
2236
2269
  Formatters::Table.format(
2237
2270
  each_row.lazy,
2238
2271
  row_headers: row_headers,
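The added `spacing = [...].max` line widens the column spacing to fit the longest header, so long names are no longer cut off in terminal output. A small sketch of that calculation, assuming headers may be symbols or strings: `effective_spacing` is a hypothetical helper, and the `to_s` and `to_i` calls are additions for robustness that the diff itself does not make.

```ruby
# Hypothetical helper showing the auto-adjusted spacing calculation.
def effective_spacing(headers, spacing = 10)
  longest = headers.map { |header| header.to_s.length }.max.to_i # 0 when there are no headers
  [longest, spacing].max
end

p effective_spacing(%i[id name])
p effective_spacing(%w[id a_very_long_header])
```

The configured spacing therefore acts as a minimum width rather than a fixed one.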
@@ -93,11 +93,12 @@ module Daru
93
93
  def infer_offset data
94
94
  diffs = data.each_cons(2).map { |d1, d2| d2 - d1 }
95
95
 
96
- if diffs.uniq.count == 1
97
- TIME_INTERVALS[diffs.first].new
98
- else
99
- nil
100
- end
96
+ return nil unless diffs.uniq.count == 1
97
+
98
+ return TIME_INTERVALS[diffs.first].new if TIME_INTERVALS.include?(diffs.first)
99
+
100
+ number_of_seconds = diffs.first / Daru::Offsets::Second.new.multiplier
101
+ Daru::Offsets::Second.new(number_of_seconds.numerator) if number_of_seconds.denominator == 1
101
102
  end
102
103
 
103
104
  def find_index_of_date data, date_time
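The new fallback in `infer_offset` handles evenly spaced timestamps whose step is not one of the predefined intervals, as long as the step is a whole number of seconds. A standalone sketch of that check using only the standard library: it returns the step in seconds instead of a `Daru::Offsets::Second` object, and `infer_step_in_seconds` and `SECONDS_PER_DAY` are hypothetical names.

```ruby
require 'date'

SECONDS_PER_DAY = 86_400

# Hypothetical sketch: returns the step in whole seconds when the timestamps
# are evenly spaced by a whole number of seconds, and nil otherwise.
def infer_step_in_seconds(timestamps)
  # Subtracting two DateTime objects yields a Rational number of days.
  diffs = timestamps.each_cons(2).map { |earlier, later| later - earlier }
  return nil unless diffs.uniq.count == 1

  seconds = diffs.first * SECONDS_PER_DAY
  seconds.numerator if seconds.denominator == 1
end

base = DateTime.new(2020, 5, 30, 0, 0, 0)
p infer_step_in_seconds([base, base + Rational(30, SECONDS_PER_DAY), base + Rational(60, SECONDS_PER_DAY)])
```

Keeping the arithmetic in `Rational` avoids floating-point error when testing whether the step is a whole number.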