red_amber 0.1.1 → 0.1.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 00ba2e99b2b1d6f977b2e2e5c7d60b9313972cf3e831918606e5388d51442137
- data.tar.gz: f0fc831937bff5fede4ee0f0537b0ef5fdfb8a1faa8a57082a197a627562252c
+ metadata.gz: 54de345111ab7c3918e119abe820d2ff207007f1ce9731e2f8954513d47c76a9
+ data.tar.gz: 75e4251c6d6be8eab05739f75e064a2e65cbe3abdafaa574c559d9356fe93a20
  SHA512:
- metadata.gz: 7bc020b8663c3523426461e3bd54642d4eb85a86296a8db3f5d94315091ee4475ec8b910fb87165c5d029e35fa9dc45f119bea6278e023d3cc63ad011388fbfb
- data.tar.gz: 78dd55182b40ee9bec769efdbcac23adb85ad93bbafe3f74c4ded9d56ab40e39da0ce1e34a841d5705e6b94fea85312057e360140965fa217243eede0d238eb5
+ metadata.gz: 60c2d11d30b91947b67e608864e5e4fe13e544662f671789256e6e2e624a892577f616572e4ba55be4de99affd528d020060b4be56f8820250697db2a80132a2
+ data.tar.gz: 19170b7cd3d6b1174b7de44c0b8841d47acc4d1832fe72fdc8adc7171245e031c922614aca979755ae035566deae0a711644a3e483cecbceabdcfc411efb2263
data/.rubocop.yml CHANGED
@@ -45,7 +45,7 @@ Lint/BinaryOperatorWithIdenticalOperands:

  # Max: 120
  Layout/LineLength:
- Max: 100
+ Max: 118
  Exclude:
  - 'test/**/*'

@@ -53,7 +53,7 @@ Layout/LineLength:
  # 18..30 unsatisfactory
  # > 30 dangerous
  Metrics/AbcSize:
- Max: 19
+ Max: 23
  Exclude:
  - 'lib/red_amber/data_frame_output.rb' # Max: 78

@@ -84,6 +84,11 @@ Metrics/MethodLength:

  # Max: 8
  Metrics/PerceivedComplexity:
- Max: 9
+ Max: 11
  Exclude:
  - 'lib/red_amber/data_frame_output.rb' # Max: 12
+
+ # Necessary to test when range.end == -1
+ Style/SlicingWithRange:
+ Exclude:
+ - 'test/test_data_frame_selectable.rb'
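For context, RuboCop's `Style/SlicingWithRange` cop flags slices written as `ary[start..-1]` and suggests the endless form `ary[start..]`; the new exclusion lets the selectable tests keep ranges that end in `-1` on purpose. The plain-Ruby comparison below (no RedAmber code involved) shows that both spellings slice an Array identically, while the Range objects themselves report different `end` values, which is why range-handling code has to cope with both `-1` and `nil`.

```ruby
# Plain-Ruby comparison of the two slice spellings discussed by
# Style/SlicingWithRange; no RedAmber code is involved.
values = %w[a b c d e]

explicit = values[1..-1] # end index -1 means "through the last element"
endless = values[1..]    # endless Range, available since Ruby 2.6

p explicit            # prints ["b", "c", "d", "e"]
p explicit == endless # prints true

# The Range objects differ: code that inspects Range#end sees -1 vs nil.
p((1..-1).end) # prints -1
p((1..).end)   # prints nil
```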
data/.rubocop_todo.yml CHANGED
@@ -1,17 +1,11 @@
  # This configuration was generated by
  # `rubocop --auto-gen-config`
- # on 2022-04-27 00:29:57 UTC using RuboCop version 1.27.0.
+ # on 2022-05-08 02:37:36 UTC using RuboCop version 1.27.0.
  # The point is for the user to remove these configuration records
  # one by one as the offenses are removed from the code base.
  # Note that changes in the inspected code, or installation of new
  # versions of RuboCop, may require this file to be generated again.

- # Offense count: 1
- # This cop supports unsafe auto-correction (--auto-correct-all).
- Style/SlicingWithRange:
- Exclude:
- - 'lib/red_amber/data_frame_selectable.rb'
-
  # Offense count: 1
  # This cop supports unsafe auto-correction (--auto-correct-all).
  # Configuration parameters: EnforcedStyle.
data/CHANGELOG.md CHANGED
@@ -1,17 +1,29 @@
- ## [0.1.2] - Unreleased
+ ## [0.1.3] - Unreleased

- - Add support for Arrow 8.0.0
  - `DataFrame`
- - Introduce updating
+ - Introduce updating capabilities
  - Introduce NA support
  - Add slice method
  - `Vector`
  - Add NaN support for functions
  - More functions

+ ## [0.1.2] - 2022-05-08 (experimental)
+
+ - Bug fixes:
+ - `DataFrame`
+ - Fix bug in `#[]` with end-less Range
+ - New features and improvements
+ - Add support for Arrow 8.0.0
+ - `DataFrame`
+ - `types` and `data_types`
+ - Range is usable to specify columns in `#[]`
+ - `Vector`
+ - `type` and `data_type`
+
  ## [0.1.1] - 2022-05-06 (experimental)

- - Release on rubygem.org
+ - Release on rubygems.org
  - Introduce class `DataFrame`
  - New from Hash, schema/rows, `Arrow::Table`, `Rover::DataFrame`
  - Load from file, string, URI
data/README.md CHANGED
@@ -8,8 +8,8 @@ A simple dataframe library for Ruby (experimental)
  ## Requirements

  ```ruby
- gem 'red-arrow', '~> 7.0.0'
- gem 'red-parquet', '~> 7.0.0' # if you use IO from/to parquet
+ gem 'red-arrow', '>= 7.0.0'
+ gem 'red-parquet', '>= 7.0.0' # if you use IO from/to parquet
  gem 'rover-df', '~> 0.3.0' # if you use IO from/to Rover::DataFrame
  ```

@@ -89,10 +89,13 @@ Or install it yourself as:

  Returns num of column names by an Array.

- - [x] `types(class_name: false)`
+ - [x] `types`

- Returns types of columns by an Array.
- If `class_name: true` returns an Array of `Arrow::DataType`.
+ Returns types of columns by an Array of Symbols.
+
+ - [x] `data_types`
+
+ Returns types of columns by an Array of `Arrow::DataType`.

  - [x] `vectors`

@@ -128,20 +131,50 @@ Or install it yourself as:

  Shows some information about self.

+ ```ruby
+ hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
+ RedAmber::DataFrame.new(hash)
+ # =>
+ RedAmber::DataFrame : 3 observations(rows) of 3 variables(columns)
+ Variables : 2 numeric, 1 string
+ # key type level data_preview
+ 1 :a uint8 3 [1, 2, 3]
+ 2 :b string 3 [A, B, C]
+ 3 :c double 3 [1.0, 2.0, 3.0]
+ ```
+
  - tally_level: max level to use tally mode
  - max_element: max num of element to show values in each row

  ### Selecting

- - [x] Selecting columns by `[]`
-
- `[key]`, `[keys]`, `[keys[index]]`
-
- - [x] Selecting rows by `[]`
+ - [x] Select columns by `[]` as `[key]`, `[keys]`, `[keys[index]]`
+ - Key in a Symbol: `df[:symbol]`
+ - Key in a String: `df["string"]`
+ - Keys in an Array: `df[:symbol1, "string", :symbol2]`
+ - Keys by indices: `df[df.keys[0]]`, `df[df.keys[1,2]]`, `df[df.keys[1..]]`
+ - Keys in a Range:
+ A Range can be used to represent keys.
+ ```ruby
+ hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
+ df = RedAmber::DataFrame.new(hash)
+ df[:b..:c, "a"]
+ # =>
+ RedAmber::DataFrame : 3 observations(rows) of 3 variables(columns)
+ Variables : 2 numeric, 1 string
+ # key type level data_preview
+ 1 :b string 3 [A, B, C]
+ 2 :c double 3 [1.0, 2.0, 3.0]
+ 3 :a uint8 3 [1, 2, 3]
+ ```

- `[index]`, `[range]`, `[array]`
+ - [x] Select rows by `[]` as `[index]`, `[range]`, `[array]`
+ - Select a row by index: `df[0]`
+ - Select rows by indices in a Range: `df[1..2]`
+ - Select rows by indices in an Array: `df[1, 2]`
+ - Mixed case: `df[2, 0..]`

- - [x] Selecting rows from top or bottom
+ - [x] Select rows from top or bottom

  `head(n=5)`, `tail(n=5)`, `first(n=1)`, `last(n=1)`

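The selection rules listed above can be approximated in plain Ruby. This is a simplified, hypothetical model (`selection_kind` and `expand_range` are illustrative names, not RedAmber's API): ranges are expanded first, an all-Integer argument list is treated as a row selection, and an all-Symbol/String list as a column selection, roughly following the dispatch in `DataFrameSelectable#[]` shown later in this diff. It skips the bounds checking the real code performs.

```ruby
# Hypothetical model of how `df[...]` chooses between row and column
# selection; no data is touched and no bounds checking is done here.
def expand_range(range, nrow)
  if [range.begin, range.end].any?(Integer)
    (0...nrow).to_a[range] # Integer ranges (incl. endless) address rows
  else
    range.to_a # e.g. (:b..:c).to_a gives [:b, :c]
  end
end

def selection_kind(*args, nrow:)
  expanded = args.flat_map { |e| e.is_a?(Range) ? expand_range(e, nrow) : [e] }

  if expanded.all?(Integer)
    [:rows, expanded]
  elsif expanded.all? { |e| e.is_a?(Symbol) || e.is_a?(String) }
    [:columns, expanded.map(&:to_sym)]
  else
    raise ArgumentError, "Invalid argument #{args}"
  end
end

p selection_kind(0, nrow: 3)             # prints [:rows, [0]]
p selection_kind((1..2), nrow: 3)        # prints [:rows, [1, 2]]
p selection_kind(2, (0..), nrow: 3)      # prints [:rows, [2, 0, 1, 2]]
p selection_kind((:b..:c), 'a', nrow: 3) # prints [:columns, [:b, :c, :a]]
```

The last call mirrors the README's `df[:b..:c, "a"]` example, which lists columns in the order b, c, a.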
@@ -213,6 +246,8 @@ Or install it yourself as:

  - [x] `type`

+ - [x] `data_type`
+
  - [ ] `each`

  - [ ] `chunked?`
@@ -324,7 +359,7 @@ Or install it yourself as:

  ## Development

- ```
+ ```shell
  git clone https://github.com/heronshoes/red_amber.git
  cd red_amber
  bundle install
@@ -9,13 +9,13 @@ module RedAmber
  include DataFrameOutput

  def initialize(*args)
- # accepts: DataFrame.new, DataFrame.new([]), DataFrame.new(nil)
+ # DataFrame.new, DataFrame.new([]), DataFrame.new({}), DataFrame.new(nil)
  # returns empty DataFrame
  @table = Arrow::Table.new({}, [])
  # bug in gobject-introspection: ruby-gnome/ruby-gnome#1472
  # [Arrow::Table] == [nil] shows ArgumentError
  # temporary use yoda condition to workaround
- return if args.empty? || args == [[]] || [nil] == args
+ return if args.empty? || args == [[]] || args == [{}] || [nil] == args

  if args.size > 1
  @table = Arrow::Table.new(*args)
@@ -26,11 +26,9 @@ module RedAmber
  when Arrow::Table then arg
  when DataFrame then arg.table
  when Rover::DataFrame then Arrow::Table.new(arg.to_h)
- when Hash
- args << [] if arg.empty? # create empty df from DataFrame.new({})
- Arrow::Table.new(*args)
+ when Hash then Arrow::Table.new(arg)
  else
- raise DataFrameTypeError, "invalid argument: #{args}"
+ raise DataFrameTypeError, "invalid argument: #{arg}"
  end
  end
  end
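With `DataFrame.new({})` now handled by the early return instead of the `when Hash` branch, four argument shapes yield an empty DataFrame. The predicate below is a hypothetical stand-alone extraction of that condition (`empty_frame_args?` and `args_of` are illustrative helpers, not part of RedAmber), written in plain Ruby so it runs without Arrow. It keeps `[nil] == args` in yoda form as the original does; the source comment attributes that to a gobject-introspection bug (ruby-gnome/ruby-gnome#1472) that only matters when `args` holds an `Arrow::Table`.

```ruby
# Hypothetical helper mirroring the early-return condition in
# DataFrame#initialize: these argument lists produce an empty DataFrame.
def empty_frame_args?(args)
  args.empty? || args == [[]] || args == [{}] || [nil] == args
end

# Collects positional arguments the same way `initialize(*args)` would.
def args_of(*args)
  args
end

p empty_frame_args?(args_of)             # prints true  (DataFrame.new)
p empty_frame_args?(args_of([]))         # prints true  (DataFrame.new([]))
p empty_frame_args?(args_of({}))         # prints true  (DataFrame.new({}))
p empty_frame_args?(args_of(nil))        # prints true  (DataFrame.new(nil))
p empty_frame_args?(args_of({ a: [1] })) # prints false (non-empty Hash)
```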
@@ -69,10 +67,15 @@ module RedAmber
  alias_method :keys, :column_names
  alias_method :header, :column_names

- def types(class_name: false)
+ def types
  @table.columns.map do |column|
- r = column.data_type
- class_name ? r.class : r.to_s.to_sym
+ column.data_type.to_s.to_sym
+ end
+ end
+
+ def data_types
+ @table.columns.map do |column|
+ column.data_type.class
  end
  end

@@ -35,7 +35,7 @@ module RedAmber
  "#{self.class} : #{nrow} observation#{r}(row#{r}) of #{ncol} variable#{c}(column#{c})"

  # 2nd row: show var counts by type
- type_groups = types(class_name: true).map { |t| type_group(t) }
+ type_groups = data_types.map { |t| type_group(t) }

  stringio.puts "Variable#{pl(ncol)} : #{var_type_count(type_groups).join(', ')}"

@@ -1,7 +1,7 @@
  # frozen_string_literal: true

  module RedAmber
- # mix-ins for the class DataFrame
+ # mix-in for the class DataFrame
  module DataFrameSelectable
  # select columns: [symbol] or [string]
  # select rows: [array of index], [range]
@@ -12,25 +12,25 @@ module RedAmber
  # expand Range like [1..3, 4] to [1, 2, 3, 4]
  expanded =
  args.each_with_object([]) do |e, a|
- e.is_a?(Range) ? a.concat(e.to_a) : a.append(e)
+ e.is_a?(Range) ? a.concat(normalized_array(e)) : a.append(e)
  end

  return select_rows(expanded) if integers?(expanded)
  return select_columns(expanded.map(&:to_sym)) if sym_or_str?(expanded)

- raise DataFrameArgumentError, "invalid argument #{args}"
+ raise DataFrameArgumentError, "Invalid argument #{args}"
  end

  def head(n_rows = 5)
- raise DataFrameArgumentError, "index is out of range #{n_rows}" if n_rows.negative?
+ raise DataFrameArgumentError, "Index is out of range #{n_rows}" if n_rows.negative?

  self[0...[n_rows, size].min]
  end

  def tail(n_rows = 5)
- raise DataFrameArgumentError, "index is out of range #{n_rows}" if n_rows.negative?
+ raise DataFrameArgumentError, "Index is out of range #{n_rows}" if n_rows.negative?

- self[-[n_rows, size].min..-1]
+ self[-[n_rows, size].min..]
  end

  def first(n_rows = 1)
@@ -52,14 +52,27 @@ module RedAmber
  end

  def select_rows(indeces)
- if out_of_range?(indeces)
- raise DataFrameArgumentError, "invalid index: #{indeces} for [0..#{size - 1}]"
- end
+ out_of_range?(indeces) && raise(DataFrameArgumentError, "Invalid index: #{indeces} for 0..#{size - 1}")

  a = indeces.map { |i| @table.slice(i).to_a }
  DataFrame.new(@table.schema, a)
  end

+ def normalized_array(range)
+ both_end = [range.begin, range.end]
+ both_end[1] -= 1 if range.exclude_end? && range.end.is_a?(Integer)
+
+ if both_end.any?(Integer) || both_end.all?(&:nil?)
+ if both_end.any? { |e| e&.>=(size) || e&.<(-size) }
+ raise DataFrameArgumentError, "Index out of range: #{range} for 0..#{size - 1}"
+ end
+
+ (0...size).to_a[range]
+ else
+ range.to_a
+ end
+ end
+
  def out_of_range?(indeces)
  indeces.max >= size || indeces.min < -size
  end
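To see what the new `normalized_array` returns, here is a stand-alone copy that takes the row count as an explicit `size` argument instead of calling the DataFrame's `size`, and raises `ArgumentError` in place of `DataFrameArgumentError`; both substitutions are assumptions so that the sketch runs without RedAmber. Integer-bounded ranges (including endless and negative ones) are bounds-checked and then resolved against `0...size`, while other ranges such as `:b..:c` fall through to `Range#to_a`.

```ruby
# Stand-alone sketch of DataFrameSelectable#normalized_array: the row count
# is passed as `size`, and ArgumentError stands in for DataFrameArgumentError.
def normalized_array(range, size)
  both_end = [range.begin, range.end]
  # For an exclusive Integer end (e.g. 0...3), validate the last included index.
  both_end[1] -= 1 if range.exclude_end? && range.end.is_a?(Integer)

  if both_end.any?(Integer) || both_end.all?(&:nil?)
    if both_end.any? { |e| e&.>=(size) || e&.<(-size) }
      raise ArgumentError, "Index out of range: #{range} for 0..#{size - 1}"
    end

    # Array#[] resolves negative and endless bounds against 0...size.
    (0...size).to_a[range]
  else
    range.to_a # non-Integer ranges such as (:b..:c)
  end
end

p normalized_array((1..2), 5)   # prints [1, 2]
p normalized_array((-2..), 5)   # prints [3, 4], the rows tail(2) picks from 5 rows
p normalized_array((0...3), 5)  # prints [0, 1, 2]
p normalized_array((:b..:c), 5) # prints [:b, :c]

begin
  normalized_array((0..5), 5)
rescue ArgumentError => e
  puts e.message # prints Index out of range: 0..5 for 0..4
end
```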
@@ -49,6 +49,10 @@ module RedAmber
  @data.value_type.nick.to_sym
  end

+ def data_type
+ @data.value_type
+ end
+
  # def each() end

  def chunked?
@@ -164,7 +164,7 @@ module RedAmber
  when Rover::Vector
  func.execute([data, other.to_a])
  else
- raise ArgumentError, "operand is not supported: #{other.class}"
+ raise ArgumentError, "Operand is not supported: #{other.class}"
  end
  options[:aggregate] ? output.value : Vector.new(output.value)
  end
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module RedAmber
- VERSION = '0.1.1'
+ VERSION = '0.1.2'
  end
data/red_amber.gemspec CHANGED
@@ -6,10 +6,10 @@ Gem::Specification.new do |spec|
  spec.name = 'red_amber'
  spec.version = RedAmber::VERSION
  spec.authors = ['Hirokazu SUZUKI (heronshoes)']
- spec.email = ['63298319+heronshoes@users.noreply.github.com']
+ spec.email = ['heronshoes877@gmail.com']

- spec.summary = 'Simple data frames for Ruby'
- spec.description = 'Powered by Red Arrow and simple API similar to Rover-df'
+ spec.summary = 'Simple dataframe library for Ruby'
+ spec.description = 'RedAmber is a simple dataframe library powered by Red Arrow with simple API similar to Rover-df.'
  spec.homepage = 'https://github.com/heronshoes/red_amber'
  spec.license = 'MIT'
  spec.required_ruby_version = '>= 2.7'
@@ -30,8 +30,8 @@ Gem::Specification.new do |spec|
  spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
  spec.require_paths = ['lib']

- spec.add_dependency 'red-arrow', '~> 7.0.0'
- spec.add_dependency 'red-parquet', '~> 7.0.0'
+ spec.add_dependency 'red-arrow', '>= 7.0.0'
+ spec.add_dependency 'red-parquet', '>= 7.0.0'
  spec.add_dependency 'rover-df', '~> 0.3.0'

  # Development dependency has gone to the Gemfile (rubygems/bundler#7237)
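Relaxing `~> 7.0.0` to `>= 7.0.0` is what allows the gem to resolve against Arrow 8.0.0, as the CHANGELOG notes. RubyGems' own `Gem::Requirement` and `Gem::Version` classes (loaded by default in Ruby) can be used to check how each constraint treats a candidate version; the version numbers below are examples only, not a statement about which red-arrow releases exist or are compatible with RedAmber.

```ruby
# Compare the old pessimistic constraint with the new open-ended one.
require 'rubygems'

old_req = Gem::Requirement.new('~> 7.0.0') # means >= 7.0.0 and < 7.1
new_req = Gem::Requirement.new('>= 7.0.0')

%w[7.0.0 7.0.1 7.1.0 8.0.0].each do |v|
  version = Gem::Version.new(v)
  puts format('%-6s old: %-5s new: %s', v, old_req.satisfied_by?(version), new_req.satisfied_by?(version))
end
```

Under the old constraint only the 7.0.x versions pass; the new one accepts all four. The trade-off of an open-ended `>=` is that it also admits future major versions that have not been tested.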
metadata CHANGED
@@ -1,41 +1,41 @@
  --- !ruby/object:Gem::Specification
  name: red_amber
  version: !ruby/object:Gem::Version
- version: 0.1.1
+ version: 0.1.2
  platform: ruby
  authors:
  - Hirokazu SUZUKI (heronshoes)
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2022-05-06 00:00:00.000000000 Z
+ date: 2022-05-08 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: red-arrow
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ">="
  - !ruby/object:Gem::Version
  version: 7.0.0
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ">="
  - !ruby/object:Gem::Version
  version: 7.0.0
  - !ruby/object:Gem::Dependency
  name: red-parquet
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ">="
  - !ruby/object:Gem::Version
  version: 7.0.0
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ">="
  - !ruby/object:Gem::Version
  version: 7.0.0
  - !ruby/object:Gem::Dependency
@@ -52,9 +52,10 @@ dependencies:
  - - "~>"
  - !ruby/object:Gem::Version
  version: 0.3.0
- description: Powered by Red Arrow and simple API similar to Rover-df
+ description: RedAmber is a simple dataframe library powered by Red Arrow with simple
+ API similar to Rover-df.
  email:
- - 63298319+heronshoes@users.noreply.github.com
+ - heronshoes877@gmail.com
  executables: []
  extensions: []
  extra_rdoc_files: []
@@ -102,5 +103,5 @@ requirements: []
  rubygems_version: 3.3.7
  signing_key:
  specification_version: 4
- summary: Simple data frames for Ruby
+ summary: Simple dataframe library for Ruby
  test_files: []