daru 0.2.2 → 0.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 617e082fd3366f695622071cf630690d102552821e82926af81a7007bb09093d
- data.tar.gz: b6b995e35e8124768a15a3e32d1fc38515aecc55f070510d7a51b45945520eb7
+ metadata.gz: 264a0549062a2c6b062f8c031b4e03524fd1bd852d59927a54722e8b8e68a2e8
+ data.tar.gz: 2dee6ded3fb009045a6ef13203c8ebe4251458e64b3e41a737b8bc04d4d0b91f
  SHA512:
- metadata.gz: 8ae029cac761e4a7164b472ad6ef5275d18aa7d6dace9f61ab4553cc288a1d95e80412804e28a4405444a97dfd1e850ce00783d9b45126c0cb8e0c4dafa09e63
- data.tar.gz: e8aa0aed6c05ec54ba4f5083ed3baa47b6eb02c3304bb0152beea1e57e181aafe45ebc779bbdec76b22b6c86b29f71dc16d013250870589cb93eeae0b8ca0917
+ metadata.gz: 1cb5cf9a2aa1660e9cd0d0a286af6d3dcbe987b5001f15a90e5b89394c36fdaedd8e29e742f591c36de64e61363ba66ceff8ae9f075c2ce1a4352a56584b4c24
+ data.tar.gz: 8e3d5843f871c0fba685430e27ff91090db4873da8edbfe2d1095b12a24a06dcda7ae1b198b9a3f7fcc013aeb757560088bcbb507a70ece4a4976eda63094cd8
@@ -2,11 +2,8 @@ language:
  ruby
 
  rvm:
- - '2.0'
- - '2.1'
- - '2.2'
- - '2.3.0'
- - '2.4.0'
+ - '2.5.1'
+ - '2.7.1'
 
  matrix:
  allow_failures:
@@ -14,6 +11,9 @@ matrix:
  fast_finish:
  true
 
+ env:
+ - DARU_TEST_NMATRIX=1 DARU_TEST_GSL=1
+
  script:
  - bundle add yard-junk
  - bundle install
@@ -22,10 +22,7 @@ script:
  - bundle exec yard-junk
 
  install:
- - if [ $TRAVIS_RUBY_VERSION == '2.2' ] || [ $TRAVIS_RUBY_VERSION == '2.1' ] || [ $TRAVIS_RUBY_VERSION == '2.0' ];
- then gem install bundler -v '~> 1.6';
- else gem install bundler;
- fi
+ - gem install bundler
  - gem install rainbow -v '2.2.1'
  - bundle install
 
@@ -6,15 +6,18 @@ Either nmatrix or rb-gsl are NOT NECESSARY for using daru. They are just require
 
  To install dependencies, execute the following commands:
 
- * `sudo apt-get update -qq`
- * `sudo apt-get install -y libgsl0-dev r-base r-base-dev`
- * `sudo Rscript -e "install.packages(c('Rserve','irr'),,'http://cran.us.r-project.org')"`
- * `sudo apt-get install libmagickwand-dev imagemagick`
-
-
- Then install remaining dependencies:
-
- `bundle install`
+ ``` bash
+ sudo apt-get update -qq
+ sudo apt-get install -y libgsl0-dev r-base r-base-dev
+ sudo Rscript -e "install.packages(c('Rserve','irr'),,'http://cran.us.r-project.org')"
+ sudo apt-get install libmagickwand-dev imagemagick
+ export DARU_TEST_NMATRIX=1 # for running nmatrix tests.
+ export DARU_TEST_GSL=1 # for running rb-GSL tests.
+ bundle install
+ ```
+ You don't need `DARU_TEST_NMATRIX` or `DARU_TEST_GSL` if you don't want to make changes
+ to those parts of the code. However, they will be set in CI and will raise a test failure
+ if something goes wrong.
 
  And run the test suite (should be all green with pending tests):
 
@@ -22,13 +25,6 @@ And run the test suite (should be all green with pending tests):
 
  If you have problems installing nmatrix, please consult the [nmatrix installation wiki](https://github.com/SciRuby/nmatrix/wiki/Installation) or the [mailing list](https://groups.google.com/forum/#!forum/sciruby-dev).
 
- **NOTE**: `Daru` is compatible with Ruby versions < 2.5; for later Ruby versions it breaks, returning the following error in versions >= 2.5.
- ```
- /gems/packable-1.3.10/lib/packable/extensions/io.rb:86:in `pos': Illegal seek @ rb_io_tell - <STDOUT> (Errno::ESPIPE)
- ```
- To reproduce this issue or explore this error further, head over to
- [issue #500](https://github.com/SciRuby/daru/issues/500),
- [issue #503](https://github.com/SciRuby/daru/issues/503). Also, if you want to fix this issue, then please discuss it here : [#505](https://github.com/SciRuby/daru/issues/500)
 
  While preparing your pull requests, don't forget to check your code with Rubocop:
 
data/History.md CHANGED
@@ -1,3 +1,20 @@
+ # 0.3 (30 May 2020)
+ * Major Enhacements
+ - Remove official support for Ruby < 2.5.1. Now we only test with 2.5.1 and 2.7.1. (@v0dro)
+ - Make nmatrix and gsl optional dependencies for testing. (@v0dro)
+ - Update sqlite, activerecord, nokogiri, packable, rake dependencies. (@v0dro)
+ - Remove runtime dependency on backports. (@v0dro)
+ - Add `Daru::Vector#match and Daru::Vector#apply_where` methods (@athityakumar).
+ - Add support for options to the `Daru` module. Adds a separate module `Daru::Configuration` that
+ can hold data for overall configuration of daru's execution. (@kojix2)
+ * Minor Enhancements
+ - Add new `DataFrame#insert_vector` method. (@cyrillefr)
+ - Add `Vector#last`. (@kojix2)
+ - Add `DataFrame#rename_vectors!`. (@neumanrq)
+ - Refactor `GroupBy#apply_method`. (@paisible-wanderer)
+ - Auto-adjust header parameters when printing to terminal. (@ncs1)
+ - Infer offsets of timeseries automatically when they are a natural number multiple of seconds. (@jpaulgs)
+
  # 0.2.2 (8 August 2019)
 
  * Minor Enhancements
data/README.md CHANGED
@@ -9,8 +9,9 @@
 
  daru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data in Ruby.
 
- daru makes it easy and intuitive to process data predominantly through 2 data structures: `Daru::DataFrame` and `Daru::Vector`. Written in pure Ruby works with all ruby implementations. Tested with MRI 2.0, 2.1, 2.2, 2.3, and 2.4.
-
+ daru makes it easy and intuitive to process data predominantly through 2 data structures:
+ `Daru::DataFrame` and `Daru::Vector`. Written in pure Ruby works with all ruby implementations.
+ Tested with MRI 2.5.1 and 2.7.1.
 
  ## daru plugin gems
 
@@ -19,7 +19,7 @@ Gem::Specification.new do |spec|
  spec.email = ['sameer.deshmukh93@gmail.com']
  spec.summary = %q{Data Analysis in RUby}
  spec.description = Daru::DESCRIPTION
- spec.homepage = "http://github.com/v0dro/daru"
+ spec.homepage = "http://github.com/SciRuby/daru"
  spec.license = 'BSD-2'
 
  spec.files = `git ls-files -z`.split("\x0")
@@ -27,14 +27,12 @@ Gem::Specification.new do |spec|
  spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
  spec.require_paths = ["lib"]
 
- spec.add_runtime_dependency 'backports'
-
  # it is required by NMatrix, yet we want to specify clearly which minimal version is OK
- spec.add_runtime_dependency 'packable', '~> 1.3.9'
+ spec.add_runtime_dependency 'packable', '~> 1.3.13'
 
  spec.add_development_dependency 'spreadsheet', '~> 1.1.1'
  spec.add_development_dependency 'bundler', '>= 1.10'
- spec.add_development_dependency 'rake', '~>10.5'
+ spec.add_development_dependency 'rake', '~>13.0'
  spec.add_development_dependency 'pry', '~> 0.10'
  spec.add_development_dependency 'pry-byebug'
  spec.add_development_dependency 'rserve-client', '~> 0.3'
@@ -42,28 +40,22 @@ Gem::Specification.new do |spec|
  spec.add_development_dependency 'rspec-its'
  spec.add_development_dependency 'awesome_print'
  spec.add_development_dependency 'nyaplot', '~> 0.1.5'
- spec.add_development_dependency 'nmatrix', '~> 0.2.1'
+ spec.add_development_dependency 'nmatrix', '~> 0.2.1' if ENV['DARU_TEST_NMATRIX']
  spec.add_development_dependency 'distribution', '~> 0.7'
- spec.add_development_dependency 'gsl', '~>2.1.0.2'
+ spec.add_development_dependency 'gsl', '~>2.1.0.2' if ENV['DARU_TEST_GSL']
  spec.add_development_dependency 'dbd-sqlite3'
  spec.add_development_dependency 'dbi'
- spec.add_development_dependency 'activerecord', '~> 4.0'
+ spec.add_development_dependency 'activerecord', '~> 6.0'
  spec.add_development_dependency 'mechanize'
  # issue : https://github.com/SciRuby/daru/issues/493 occured
  # with latest version of sqlite3
- spec.add_development_dependency 'sqlite3', '~> 1.3.13'
+ spec.add_development_dependency 'sqlite3'
  spec.add_development_dependency 'rubocop', '~> 0.49.0'
  spec.add_development_dependency 'ruby-prof'
  spec.add_development_dependency 'simplecov'
  spec.add_development_dependency 'gruff'
  spec.add_development_dependency 'webmock'
 
- if RUBY_VERSION < '2.1.0'
- spec.add_development_dependency 'nokogiri', '<= 1.6.8.1'
- else
- spec.add_development_dependency 'nokogiri'
- end
- if RUBY_VERSION >= '2.2.5'
- spec.add_development_dependency 'guard-rspec'
- end
+ spec.add_development_dependency 'nokogiri'
+ spec.add_development_dependency 'guard-rspec'
  end
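
The gemspec hunk above makes `nmatrix` and `gsl` development dependencies only when the matching environment variable is set. The sketch below illustrates that gating on a throwaway spec. The `build_demo_spec` helper and the `demo` gem are hypothetical, and a plain hash stands in for `ENV` so the snippet has no side effects.

```ruby
require 'rubygems'

# Builds a throwaway specification; `env` stands in for ENV (assumption for the demo).
def build_demo_spec(env)
  Gem::Specification.new do |spec|
    spec.name    = 'demo'
    spec.version = '0.0.1'
    spec.summary = 'Sketch of env-gated development dependencies'
    spec.authors = ['example']

    spec.add_development_dependency 'distribution', '~> 0.7'
    # Added only when the variable is present, mirroring the hunk above.
    spec.add_development_dependency 'nmatrix', '~> 0.2.1' if env['DARU_TEST_NMATRIX']
    spec.add_development_dependency 'gsl', '~>2.1.0.2' if env['DARU_TEST_GSL']
  end
end

plain = build_demo_spec({})
full  = build_demo_spec('DARU_TEST_NMATRIX' => '1', 'DARU_TEST_GSL' => '1')

plain_names = plain.development_dependencies.map(&:name) # => ["distribution"]
full_names  = full.development_dependencies.map(&:name)  # => ["distribution", "nmatrix", "gsl"]
```

Because the condition only tests for presence, any value of the variable (including `0` or an empty string) enables the dependency, since every Ruby string is truthy.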
@@ -95,13 +95,13 @@ require 'date'
  require 'daru/version.rb'
 
  require 'open-uri'
- require 'backports/2.1.0/array/to_h'
 
  require 'daru/index/index.rb'
  require 'daru/index/multi_index.rb'
  require 'daru/index/categorical_index.rb'
 
  require 'daru/helpers/array.rb'
+ require 'daru/configuration.rb'
  require 'daru/vector.rb'
  require 'daru/dataframe.rb'
  require 'daru/monkeys.rb'
@@ -0,0 +1,34 @@
+ module Daru
+ # Defines constants and methods related to configuration
+ module Configuration
+ INSPECT_OPTIONS_KEYS = [
+ :max_rows,
+ # Terminal
+ :spacing
+ ].freeze
+
+ # Jupyter
+ DEFAULT_MAX_ROWS = 30
+
+ # Terminal
+ DEFAULT_SPACING = 10
+
+ attr_accessor(*INSPECT_OPTIONS_KEYS)
+
+ def configure
+ yield self
+ end
+
+ def self.extended(base)
+ base.reset_options
+ end
+
+ def reset_options
+ self.max_rows = DEFAULT_MAX_ROWS
+
+ self.spacing = DEFAULT_SPACING
+ end
+ end
+
+ extend Configuration
+ end
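
The new file above mixes `Configuration` into `Daru` with `extend`, so the `extended` hook runs once and seeds the defaults. The following stand-alone sketch shows the same pattern. `DemoLib` is a hypothetical namespace used so the snippet runs without daru installed; it is not part of the gem.

```ruby
# Stand-alone sketch of the extend-based configuration pattern.
# `DemoLib` is a hypothetical namespace used only for illustration.
module DemoLib
  module Configuration
    INSPECT_OPTIONS_KEYS = %i[max_rows spacing].freeze

    DEFAULT_MAX_ROWS = 30
    DEFAULT_SPACING  = 10

    # Defines max_rows/max_rows= and spacing/spacing= on whatever extends this module.
    attr_accessor(*INSPECT_OPTIONS_KEYS)

    def configure
      yield self
    end

    # Runs once when a module or object extends Configuration.
    def self.extended(base)
      base.reset_options
    end

    def reset_options
      self.max_rows = DEFAULT_MAX_ROWS
      self.spacing  = DEFAULT_SPACING
    end
  end

  extend Configuration
end

# Override one option; the other keeps its default.
DemoLib.configure do |config|
  config.max_rows = 100
end
```

With daru 0.3 itself, the equivalent call would presumably be `Daru.configure { |c| c.max_rows = 100 }`; the dataframe hunks in this diff read `Daru.max_rows` and `Daru.spacing` as default arguments.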
@@ -2,21 +2,25 @@ module Daru
  module Core
  class GroupBy
  class << self
+ extend Gem::Deprecate
+
  # @private
- def get_positions_group_map_on(indexes_with_positions, sort: false)
- group_map = {}
+ def group_by_index_to_positions(indexes_with_positions, sort: false)
+ index_to_positions = {}
 
  indexes_with_positions.each do |idx, position|
- (group_map[idx] ||= []) << position
+ (index_to_positions[idx] ||= []) << position
  end
 
  if sort # TODO: maybe add a more "stable" sorting option?
- sorted_keys = group_map.keys.sort(&Daru::Core::GroupBy::TUPLE_SORTER)
- group_map = sorted_keys.map { |k| [k, group_map[k]] }.to_h
+ sorted_keys = index_to_positions.keys.sort(&Daru::Core::GroupBy::TUPLE_SORTER)
+ index_to_positions = sorted_keys.map { |k| [k, index_to_positions[k]] }.to_h
  end
 
- group_map
+ index_to_positions
  end
+ alias get_positions_group_map_on group_by_index_to_positions
+ deprecate :get_positions_group_map_on, :group_by_index_to_positions, 2019, 10
 
  # @private
  def get_positions_group_for_aggregation(multi_index, level=-1)
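
The hunk above renames a class-level helper and keeps the old name alive through `alias` plus `Gem::Deprecate#deprecate`. A minimal sketch of that rename-with-deprecation shape follows. `DemoGrouper` is a hypothetical class, and the `sort:` keyword is left out to keep the example short.

```ruby
require 'rubygems'

# Hypothetical class mirroring the rename-with-deprecation shape of the hunk above.
class DemoGrouper
  class << self
    extend Gem::Deprecate

    # Groups positions by key, e.g. [[:a, 0], [:b, 1], [:a, 2]] becomes {a: [0, 2], b: [1]}.
    def group_by_index_to_positions(indexes_with_positions)
      index_to_positions = {}
      indexes_with_positions.each do |idx, position|
        (index_to_positions[idx] ||= []) << position
      end
      index_to_positions
    end

    # Keep the old name callable, but emit a warning when it is used.
    alias get_positions_group_map_on group_by_index_to_positions
    deprecate :get_positions_group_map_on, :group_by_index_to_positions, 2019, 10
  end
end

pairs = %i[a b a].each_with_index
new_result = DemoGrouper.group_by_index_to_positions(pairs)

# skip_during silences the deprecation warning for the duration of the block.
old_result = Gem::Deprecate.skip_during do
  DemoGrouper.get_positions_group_map_on(pairs)
end
```

Calling the old name outside `skip_during` still returns the same hash; it additionally prints a note to standard error pointing at the replacement.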
@@ -25,14 +29,14 @@ module Daru
  new_index = multi_index.dup
  new_index.remove_layer(level) # TODO: recheck code of Daru::MultiIndex#remove_layer
 
- get_positions_group_map_on(new_index.each_with_index)
+ group_by_index_to_positions(new_index.each_with_index)
  end
 
  # @private
  def get_positions_group_map_for_df(df, group_by_keys, sort: true)
  indexes_with_positions = df[*group_by_keys].to_df.each_row.map(&:to_a).each_with_index
 
- get_positions_group_map_on(indexes_with_positions, sort: sort)
+ group_by_index_to_positions(indexes_with_positions, sort: sort)
  end
 
  # @private
@@ -57,6 +61,9 @@ module Daru
  end
  end
 
+ # The group_by was done over the vectors in group_vectors; the remaining vectors are the non_group_vectors
+ attr_reader :group_vectors, :non_group_vectors
+
  # lazy accessor/attr_reader for the attribute groups
  def groups
  @groups ||= GroupBy.group_map_from_positions_to_indexes(@groups_by_pos, @context.index)
@@ -93,7 +100,7 @@ module Daru
  @group_vectors = names
  @non_group_vectors = context.vectors.to_a - names
 
- @context = context # TODO: maybe rename in @original_df or @grouped_db
+ @context = context # TODO: maybe rename in @original_df
 
  # FIXME: It feels like we don't want to sort here. Ruby's #group_by
  # never sorts:
@@ -362,21 +369,15 @@ module Daru
  Daru::DataFrame.rows(rows, order: @context.vectors, index: indexes)
  end
 
- def apply_method method_type, method
- order = @non_group_vectors.select do |ngvec|
- method_type == :numeric && @context[ngvec].type == :numeric
- end
+ def select_numeric_non_group_vectors
+ @non_group_vectors.select { |ngvec| @context[ngvec].type == :numeric }
+ end
 
- rows = groups_by_idx.map do |_group, indexes|
- order.map do |ngvector|
- slice = @context[ngvector][*indexes]
- slice.is_a?(Daru::Vector) ? slice.send(method) : slice
- end
- end
+ def apply_method method_type, method
+ raise 'To implement' if method_type != :numeric
+ aggregation_options = select_numeric_non_group_vectors.map { |k| [k, method] }.to_h
 
- index = get_grouped_index
- order = Daru::Index.new(order)
- Daru::DataFrame.new(rows.transpose, index: index, order: order)
+ aggregate(aggregation_options)
  end
 
  def get_grouped_index(index_tuples=nil)
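
The refactored `apply_method` above no longer builds rows by hand; it maps every numeric non-group vector to the requested method and hands that hash to `aggregate`. The sketch below reproduces only the hash-building step in plain Ruby. The `aggregation_options_for` helper and the `column_types` hash are hypothetical stand-ins for the DataFrame state.

```ruby
# Sketch of how the options hash passed to `aggregate` is assembled.
# `column_types` maps vector names to their type (hypothetical stand-in for @context).
def aggregation_options_for(column_types, group_vectors, method)
  non_group = column_types.keys - group_vectors
  numeric   = non_group.select { |name| column_types[name] == :numeric }
  numeric.map { |name| [name, method] }.to_h
end

column_types = { city: :object, name: :object, age: :numeric, score: :numeric }
options = aggregation_options_for(column_types, [:city], :mean)
# options => {age: :mean, score: :mean}
```

Non-numeric vectors such as `:name` are dropped before aggregation, which matches the selection done by `select_numeric_non_group_vectors` in the hunk.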
@@ -70,6 +70,22 @@ module Daru
  resultant_dv
  end
 
+ def vector_apply_where dv, bool_array
+ _data, new_index = fetch_new_data_and_index dv, bool_array
+ all_index = dv.index
+ all_data = all_index.map { |idx| new_index.include?(idx) ? yield(dv[idx]) : dv[idx] }
+
+ resultant_dv = Daru::Vector.new all_data,
+ index: dv.index.class.new(all_index),
+ dtype: dv.dtype,
+ type: dv.type,
+ name: dv.name
+
+ # Preserve categories order for category vector
+ resultant_dv.categories = dv.categories if dv.category?
+ resultant_dv
+ end
+
  private
 
  def fetch_new_data_and_index dv, bool_array
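
`vector_apply_where` above keeps every element of the vector and runs the block only on the elements selected by the boolean array. A plain-Ruby sketch of that behaviour on arrays follows. The `apply_where` helper here is hypothetical and operates on positions rather than on a daru index.

```ruby
# Plain-Ruby sketch: transform only the elements whose mask entry is true.
# `apply_where` is a hypothetical Array helper, not daru's API.
def apply_where(values, mask)
  values.each_with_index.map do |value, i|
    mask[i] ? yield(value) : value
  end
end

ages   = [12, 30, 45, 7]
adults = ages.map { |age| age >= 18 } # => [false, true, true, false]
result = apply_where(ages, adults) { |age| age + 1 }
# result => [12, 31, 46, 7]
```

Unlike a plain `where`, which would return only the matching elements, the result has the same length and order as the input.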
@@ -12,6 +12,8 @@ module Daru
  # TODO: Remove this line but its causing erros due to unkown reason
  Daru.has_nyaplot?
 
+ attr_accessor(*Configuration::INSPECT_OPTIONS_KEYS)
+
  extend Gem::Deprecate
 
  class << self
@@ -545,6 +547,17 @@ module Daru
  self[n] = vector
  end
 
+ def insert_vector n, name, source
+ raise ArgumentError unless source.is_a? Array
+ vector = Daru::Vector.new(source, index: @index, name: @name)
+ @data << vector
+ @vectors = @vectors.add name
+ ordr = @vectors.dup.to_a
+ elmnt = ordr.pop
+ ordr = ordr.insert n, elmnt
+ self.order=ordr
+ end
+
  # Access a row or set/create a row. Refer #[] and #[]= docs for details.
  #
  # == Usage
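
`insert_vector` above appends the new vector at the end and then reorders the vector names so the new one lands at position `n`. The sketch below isolates that reordering step on a plain array of names. `insert_name_at` is a hypothetical helper written for illustration.

```ruby
# Sketch of the reordering step: append the new name, then move it to position n.
def insert_name_at(order, n, name)
  names = order + [name] # the new vector is appended last
  last  = names.pop      # take it back off the end
  names.insert(n, last)  # and re-insert it at position n
end

reordered = insert_name_at(%i[a b c], 1, :x)
# reordered => [:a, :x, :b, :c]
```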
@@ -1696,6 +1709,24 @@ module Daru
  self.vectors = Daru::Index.new new_names
  end
 
+ # Renames the vectors and returns itself
+ #
+ # == Arguments
+ #
+ # * name_map - A hash where the keys are the exising vector names and
+ # the values are the new names. If a vector is renamed
+ # to a vector name that is already in use, the existing
+ # one is overwritten.
+ #
+ # == Usage
+ #
+ # df = Daru::DataFrame.new({ a: [1,2,3,4], b: [:a,:b,:c,:d], c: [11,22,33,44] })
+ # df.rename_vectors! :a => :alpha, :c => :gamma # df
+ def rename_vectors! name_map
+ rename_vectors(name_map)
+ self
+ end
+
  # Return the indexes of all the numeric vectors. Will include vectors with nils
  # alongwith numbers.
  def numeric_vectors
@@ -2091,7 +2122,7 @@ module Daru
  end
 
  # Convert to html for IRuby.
- def to_html(threshold=30)
+ def to_html(threshold=Daru.max_rows)
  table_thead = to_html_thead
  table_tbody = to_html_tbody(threshold)
  path = if index.is_a?(MultiIndex)
@@ -2112,7 +2143,8 @@ module Daru
  ERB.new(File.read(table_thead_path).strip).result(binding)
  end
 
- def to_html_tbody(threshold=30)
+ def to_html_tbody(threshold=Daru.max_rows)
+ threshold ||= @size
  table_tbody_path =
  if index.is_a?(MultiIndex)
  File.expand_path('../iruby/templates/dataframe_mi_tbody.html.erb', __FILE__)
@@ -2229,10 +2261,11 @@ module Daru
  end
 
  # Pretty print in a nice table format for the command line (irb/pry/iruby)
- def inspect spacing=10, threshold=15
+ def inspect spacing=Daru.spacing, threshold=Daru.max_rows
  name_part = @name ? ": #{@name} " : ''
+ spacing = [headers.to_a.map(&:length).max, spacing].max
 
- "#<#{self.class}#{name_part}(#{nrows}x#{ncols})>\n" +
+ "#<#{self.class}#{name_part}(#{nrows}x#{ncols})>#{$INPUT_RECORD_SEPARATOR}" +
  Formatters::Table.format(
  each_row.lazy,
  row_headers: row_headers,
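
The `inspect` change above widens `spacing` whenever a header is longer than the configured value, so long column names are no longer cut off. A small sketch of that rule follows. `effective_spacing` is a hypothetical helper, and the `to_i` guard for an empty header list is an addition that the hunk does not have.

```ruby
# Sketch: use the configured spacing unless the longest header needs more room.
def effective_spacing(headers, spacing = 10)
  longest = headers.map { |header| header.to_s.length }.max.to_i # to_i guards the empty case
  [longest, spacing].max
end

short = effective_spacing(%w[id name])                   # => 10
long  = effective_spacing(['a_rather_long_column_name']) # => 25
```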
@@ -93,11 +93,12 @@ module Daru
  def infer_offset data
  diffs = data.each_cons(2).map { |d1, d2| d2 - d1 }
 
- if diffs.uniq.count == 1
- TIME_INTERVALS[diffs.first].new
- else
- nil
- end
+ return nil unless diffs.uniq.count == 1
+
+ return TIME_INTERVALS[diffs.first].new if TIME_INTERVALS.include?(diffs.first)
+
+ number_of_seconds = diffs.first / Daru::Offsets::Second.new.multiplier
+ Daru::Offsets::Second.new(number_of_seconds.numerator) if number_of_seconds.denominator == 1
  end
 
  def find_index_of_date data, date_time
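
The new branch in `infer_offset` above handles evenly spaced timestamps whose step is not one of the predefined intervals but is a whole number of seconds. The sketch below reproduces that check with the standard library only. It assumes the timestamps are `DateTime` objects (whose differences are `Rational` day fractions) and uses a hypothetical `SECONDS_PER_DAY` constant in place of `Daru::Offsets::Second.new.multiplier`, whose exact value is not shown in this diff.

```ruby
require 'date'

SECONDS_PER_DAY = 86_400 # DateTime differences are Rationals measured in days

# Returns the common step in whole seconds, or nil when the steps are
# irregular or not a whole number of seconds.
def infer_second_step(datetimes)
  diffs = datetimes.each_cons(2).map { |d1, d2| d2 - d1 }
  return nil unless diffs.uniq.count == 1

  seconds = diffs.first * SECONDS_PER_DAY # still a Rational
  seconds.numerator if seconds.denominator == 1
end

base      = DateTime.new(2020, 5, 30, 0, 0, 0)
regular   = (0..3).map { |i| base + Rational(i * 15, SECONDS_PER_DAY) }
irregular = [base, base + Rational(1, SECONDS_PER_DAY), base + Rational(3, SECONDS_PER_DAY)]

step    = infer_second_step(regular)   # => 15
no_step = infer_second_step(irregular) # => nil
```

In the hunk, the inferred count is wrapped in `Daru::Offsets::Second.new(...)` instead of being returned as a bare integer.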