red_amber 0.1.5 → 0.1.6

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 4d18eedf5de7fd06fe52e8a82ad38fe12d590dc10929c96872e557b9e946f785
- data.tar.gz: dda93f0af421096410e00ecf2261e8846a236634bd96ae9941d1b5cd49cd5eb2
+ metadata.gz: ae6a6696e0f01ae7d621d11542e203803ba117fc1ee3d286a1444b3c4ac746fc
+ data.tar.gz: 722d4ad538fe4f0c85db4911e773e1f87eb03f47fa63c954529bf04babc55d8c
  SHA512:
- metadata.gz: 7c1b1edd6c1f6f3f275ea765c4bc8765327c88a36120a4c5a66dd8afa59f5913db4a5b436d80378554e03403bab823edf7467beea0f44e2803e36f3e9677a065
- data.tar.gz: 949fd15d2076d4e53fb141375bde282228c7f6566e137047344134c54964fe77fd2f9757b0bdc324eb3cfa14091f2ae928e0e844d28f3ebbcfa17fc7d388bbd0
+ metadata.gz: 96887abfbdd44330e80a6a97f91597c00706fc99492d086683702f1d3e757331e90fb275e5796a2b7b3228b476f8c7799ab22727411baedb79ae39acafd2d3f0
+ data.tar.gz: c020bba60734fccdeb4a18efecb98260f70fa76c5b8ba2c7f2830d2dac6de66e9776a1fa2ffc5d396e7270b29c5e4dd9737dee70b5dcc47dd3113aaafe4f4d22
data/.rubocop.yml CHANGED
@@ -56,7 +56,9 @@ Metrics/AbcSize:
  Max: 30
  Exclude:
  - 'lib/red_amber/data_frame_displayable.rb' # Max: 55
- - 'lib/red_amber/vector_compensable.rb' # Max: 36
+ - 'lib/red_amber/data_frame_selectable.rb' # Max: 51
+ - 'lib/red_amber/vector_updatable.rb' # Max: 36
+ - 'lib/red_amber/vector_selectable.rb' # Max: 33
 
  # Max: 25
  Metrics/BlockLength:
@@ -66,15 +68,18 @@ Metrics/BlockLength:
 
  # Max: 100
  Metrics/ClassLength:
- Max: 120
+ Max: 100
  Exclude:
  - 'test/**/*'
+ - 'lib/red_amber/data_frame.rb' #Max: 131
+ - 'lib/red_amber/vector.rb' #Max: 102
 
  # Max: 7
  Metrics/CyclomaticComplexity:
  Max: 12
  Exclude:
- - 'lib/red_amber/vector_compensable.rb' # Max: 14
+ - 'lib/red_amber/data_frame_selectable.rb' # Max: 14
+ - 'lib/red_amber/vector_updatable.rb' # Max: 14
 
  # Max: 10
  Metrics/MethodLength:
@@ -86,20 +91,34 @@ Metrics/MethodLength:
  Metrics/ModuleLength:
  Max: 100
  Exclude:
+ - 'lib/red_amber/data_frame_selectable.rb' # Max: 141
  - 'lib/red_amber/vector_functions.rb' # Max: 114
 
  # Max: 8
  Metrics/PerceivedComplexity:
  Max: 13
  Exclude:
- - 'lib/red_amber/vector_compensable.rb' # Max: 15
+ - 'lib/red_amber/data_frame_selectable.rb' # Max: 14
+ - 'lib/red_amber/vector_updatable.rb' # Max: 15
+
+ Naming/FileName:
+ Exclude:
+ - 'lib/red-amber.rb'
 
- # Necessary to define is_na
+ # Necessary to define is_na, is_in, etc.
  Naming/PredicateName:
  Exclude:
  - 'lib/red_amber/vector_functions.rb'
+ - 'lib/red_amber/vector.rb'
+ - 'lib/red_amber/vector_selectable.rb'
 
  # Necessary to test when range.end == -1
  Style/SlicingWithRange:
  Exclude:
  - 'test/test_data_frame_selectable.rb'
+
+ # Necessary to Vector < 0 element-wise comparison
+ Style/NumericPredicate:
+ Exclude:
+ - 'lib/red_amber/data_frame_selectable.rb'
+ - 'lib/red_amber/vector_selectable.rb'
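
The new `Style/NumericPredicate` exclusion can be motivated with a small sketch. With its default style, RuboCop suggests rewriting `x < 0` as `x.negative?`, which only makes sense for scalar numbers. `TinyVector` below is a hypothetical stand-in written for this illustration, not RedAmber's `Vector`; it only assumes that `<` is defined element-wise, as the config comment describes.

```ruby
# Hypothetical stand-in for an element-wise vector; not RedAmber's Vector.
class TinyVector
  attr_reader :values

  def initialize(values)
    @values = values
  end

  # Element-wise comparison: returns one boolean per element.
  def <(other)
    @values.map { |v| v < other }
  end
end

vector = TinyVector.new([-2, 0, 3])

# The comparison the cop would flag; here it yields a boolean mask.
mask = vector < 0
p mask

# The suggested rewrite, vector.negative?, is not available on this class.
p vector.respond_to?(:negative?)
```

Excluding the two files keeps the element-wise comparisons as written instead of forcing a predicate call that has a different meaning for a vector.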
data/CHANGELOG.md CHANGED
@@ -1,30 +1,115 @@
- ## [0.2.0] - unreleased
+ ## - unreleased
 
  - Document
  - YARD support
 
- - DataFrame#join features
+ - `datasets-red-amber` gem
+ - `red-amber` gem
 
- ## [0.1.6] - Unreleased
+ - `Vector#divmod`
+ - Introduce if Arrow's function is ready
 
- - Feedback something to Red Data Tools
+ ## - Unreleased, will be after Arrow 9.0.0 released
 
  - `DataFrame`
  - Introduce `summary` or ``describe`
- - Add `Quantile` by own code?
- - Improve dataframe obs. manipuration methods to accept float as a index (#10)
- - Improve as more performant by benchmark check.
+ - `Quantile` will be available
 
- - `Vector`
- - Support more functions
- - Support coerece
+ ## [0.1.7] - Unreleased, may be 2022-07-10
 
+ - Feedback something to Red Data Tools
+ - Support more functions
+ - Improve as more performant
  - More examples of frequently needed tasks
 
+ - New `Group` API
+ - `DataFrame#join features
+
+ ## [0.1.6] - 2022-06-26 (experimental)
+
+ - Bug fixes
+ - Fix mime-type of empty DataFrame in `#to_iruby` (#31)
+ - Fix mime setting in `DataFrame#to_iruby` (#36)
+ - Fix unmatched return val in Selectable (#34)
+ - Fix to return same error as `#[]` in `DataFrame#slice` (#34)
+
+ - New features and improvements
+ - Introduce Jupyter support (#29, #30, #31, #32)
+ - Add `DataFrame#to_html (changed to use #to_iruby)
+ - Add feature to show nil in to_iruby
+ - nil is expressed as (nil)
+ - empty string('') is ""
+ - blank spaces are " "
+
+ - Enable to change DataFrame display mode by ENV (#36)
+ - Support ENV['RED_AMBER_OUTPUT_STYLE'] to change display mode in `#inspect` and `#to_iruby`
+ - ENV['RED_AMBER_OUTPUT_STYLE'] = 'table' # => Table mode
+ - ENV['RED_AMBER_OUTPUT_STYLE'] = nil or other than 'table' # => TDR mode
+
+ - Support `require 'red-amber'`, as well (#34)
+
+ - Refine Vector slicing methods (#31)
+ - Introduce `Vector#take` method
+ - Introduce `Vector#filter` method
+ - Improve `Vector#[]` to overload take and filter
+ - Introduce `Vector#drop_nil` method
+ - Introduce `Vector#if_else` method
+ - Intorduce `Vector#is_in` method
+ - Add alias `Vector#all?`, `#any?` methods (#32)
+ - Add `Vector#has_nil?` method(#32)
+ - Add `Vector#empty?` method
+ - Add `Vector#primitive_invert` method
+ - Refactor `Vector#take`, `#filter`
+ - Move `Vector#if_else` from function to Updatable
+ - Move if_else test to updatable
+ - Rename updatable in test
+ - Remove method `Vector#take_out_element_wise`
+ - Rename inner metthod name
+
+ - Refine DataFrame slicing methods (#31)
+ - Introduce `DataFrame#take method
+ - #take is implemented as vector calculation by #if_else
+ - Introduce `DataFrame#fliter method
+ - Change `DataFrame#[] to use take and filter
+ - Float indices is acceptable (#10)
+ - Negative index (like Array) is also acceptable
+
+ - Further refinement in DataFrame slicing methods (#34)
+ - Improve `DataFrame#[]`, `#slice`, `#remove` by a new engine
+ - It parses arguments to Vector internally.
+ - Used Kernel#Array to simplify code (#16) .
+ - recycle: Move `DataFrame#slice`, `#remove` to Selectable
+ - Refine `DataFrame#take`, `#filter` (undocumented)
+
+ - Introduce coerce in Vector (#35)
+ - Introduce `Vector#coerce`
+ - Now we can `-1 * Vector.new([1, 2, 3])`
+ - Add `Vector#to_ary` method
+ - Now we can `[1, 2] + Vector.new([3, 4, 5])`
+
+ - Other new feature or refinements
+ - Common
+ - Refactor helper as common for DataFrame and Vector (#35)
+ - Change name row/col to obs/var (#34)
+ - Rename internal function name (#34)
+ - Delete unused methods (#34)
+ - DataFrame
+ - Change to return instance variable in `#to_arrow`, `#keys` and `#key_index` (#34)
+ - Change to return an Array in `DataFrame#indices` (#35)
+ - Vector
+ - Introduce `Vector#replace` method
+ - Accept Range and expanded Array in `Vector#new`
+ - Add `Vector#indices` method (#35)
+ - Add `Vector#index` method (#35)
+ - Rename VectorCompensable to *Updatable (#33)
+
+ - Documentation
+ - Fix typo in DataFrame.md
+
  ## [0.1.5] - 2022-06-12 (experimental)
 
  - Bug fixes
- - Fix DF#tdr to display timestamp type (#19)
+ - Fix DataFrame#tdr to display timestamp type (#19)
  - Add TZ setting in CI test to pass temporal tests (#19)
  - Fix example in document of #load(csv_from_URI) (#23)
 
@@ -38,7 +123,7 @@
  - Add `Vector#temporal?` to check if temporal type
  - Refine around DataFrame#variables
  - Refine init of instance variables
- - Refine DataFrame#type_classes, V#ectortype_class
+ - Refine DataFrame#type_classes, Vector#ectortype_class
  - Refine DataFrame#tdr to shorten temporal data
 
  - Add supports to make up for missing values (#20)
@@ -86,7 +171,7 @@
 
  - Bug fixes
  - Fix missing support for scalar argument (#1)
- - Fix type name of boolean in DF#types to be same as Vector#type (#6, #7)
+ - Fix type name of boolean in DataFrame#types to be same as Vector#type (#6, #7)
  - Fix zero picking to return empty DataFrame (#8)
  - Fix code at both args and a block given (#8)
 
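
The 0.1.6 changelog entries for `Vector#coerce` and `Vector#to_ary` rely on two standard Ruby conversion protocols. The sketch below shows both with a hypothetical `MiniVector` class written for this illustration. It is not RedAmber's implementation, and the way `coerce` wraps the scalar here is an assumption; only the protocols themselves (`Integer#*` calling `coerce` on a non-numeric operand, `Array#+` calling `to_ary`) are standard Ruby behavior.

```ruby
# Hypothetical minimal vector; RedAmber's Vector wraps Arrow arrays instead.
class MiniVector
  attr_reader :values

  def initialize(values)
    @values = Array(values)
  end

  # Element-wise product with a scalar or another MiniVector.
  def *(other)
    rhs = other.is_a?(MiniVector) ? other.values : Array.new(@values.size, other)
    MiniVector.new(@values.zip(rhs).map { |a, b| a * b })
  end

  # Called by Integer#* (and friends) when the right operand is not numeric.
  # Returns [converted_left, self]; Ruby then evaluates converted_left * self.
  def coerce(other)
    [MiniVector.new(Array.new(@values.size, other)), self]
  end

  # Implicit Array conversion, used by Array#+ among others.
  def to_ary
    @values
  end
end

product = -1 * MiniVector.new([1, 2, 3])
p product.values # => [-1, -2, -3]

joined = [1, 2] + MiniVector.new([3, 4, 5])
p joined # => [1, 2, 3, 4, 5]
```

Defining `to_ary` is a strong statement in Ruby: it makes the object behave as an Array in implicit contexts such as multiple assignment and `puts`, which is worth keeping in mind when reviewing this change.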
data/Gemfile CHANGED
@@ -12,6 +12,7 @@ group :test do
  gem 'rubocop-rake'
  gem 'rubocop-rubycw', require: false
 
+ gem 'iruby'
  gem 'test-unit'
  gem 'webrick'
 
data/README.md CHANGED
@@ -1,6 +1,6 @@
  # RedAmber
 
- A simple dataframe library for Ruby (experimental)
+ A simple dataframe library for Ruby (experimental).
 
  - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow)
  - Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
@@ -42,17 +42,56 @@ Or install it yourself as:
  gem install red_amber
  ```
 
+ (From v0.1.6)
+
+ RedAmber uses TDR mode for `#inspect` and `#to_iruby` by default. If you prefer Table mode, please set the environment variable
+ `RED_AMBER_OUTPUT_MODE` to `"table"`. See [TDR section](#TDR) for detail.
+
  ## `RedAmber::DataFrame`
 
- Represents a set of data in 2D-shape.
+ Represents a set of data in 2D-shape. The entity is a Red Arrow's Table object.
 
  ```ruby
- require 'red_amber'
+ require 'red_amber' # require 'red-amber' is also OK.
  require 'datasets-arrow'
 
  arrow = Datasets::Penguins.new.to_arrow
  penguins = RedAmber::DataFrame.new(arrow)
- penguins.tdr
+ penguins.table
+
+ # =>
+ #<Arrow::Table:0x111271098 ptr=0x7f9118b3e0b0>
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
+ 0 Adelie Torgersen 39.100000 18.700000 181 3750 male 2007
+ 1 Adelie Torgersen 39.500000 17.400000 186 3800 female 2007
+ 2 Adelie Torgersen 40.300000 18.000000 195 3250 female 2007
+ 3 Adelie Torgersen (null) (null) (null) (null) (null) 2007
+ 4 Adelie Torgersen 36.700000 19.300000 193 3450 female 2007
+ 5 Adelie Torgersen 39.300000 20.600000 190 3650 male 2007
+ 6 Adelie Torgersen 38.900000 17.800000 181 3625 female 2007
+ 7 Adelie Torgersen 39.200000 19.600000 195 4675 male 2007
+ 8 Adelie Torgersen 34.100000 18.100000 193 3475 (null) 2007
+ 9 Adelie Torgersen 42.000000 20.200000 190 4250 (null) 2007
+ ...
+ 334 Gentoo Biscoe 46.200000 14.100000 217 4375 female 2009
+ 335 Gentoo Biscoe 55.100000 16.000000 230 5850 male 2009
+ 336 Gentoo Biscoe 44.500000 15.700000 217 4875 (null) 2009
+ 337 Gentoo Biscoe 48.800000 16.200000 222 6000 male 2009
+ 338 Gentoo Biscoe 47.200000 13.700000 214 4925 female 2009
+ 339 Gentoo Biscoe (null) (null) (null) (null) (null) 2009
+ 340 Gentoo Biscoe 46.800000 14.300000 215 4850 female 2009
+ 341 Gentoo Biscoe 50.400000 15.700000 222 5750 male 2009
+ 342 Gentoo Biscoe 45.200000 14.800000 212 5200 female 2009
+ 343 Gentoo Biscoe 49.900000 16.100000 213 5400 male 2009
+ ```
+
+ By default, RedAmber shows self by compact transposed style. This unfamiliar style (TDR) is designed for
+ the exploratory data processing. It keeps Vectors as row vectors, shows keys and types at a glance, shows levels
+ for the 'factor-like' variables and shows the number of abnormal values like NaN and nil.
+
+ ```ruby
+ penguins
+
  # =>
  RedAmber::DataFrame : 344 x 8 Vectors
  Vectors : 5 numeric, 3 strings
@@ -139,9 +178,19 @@ Vectors accepts some [functional methods from Arrow](https://arrow.apache.org/do
 
  See [Vector.md](doc/Vector.md) for details.
 
- ## TDR concept
+ ## TDR
+
+ I named the data frame representation style in the model above as TDR (Transposed DataFrame Representation).
+
+ This library can be used with both TDR mode and usual Table mode.
+ If you set the environment variable `RED_AMBER_OUTPUT_MODE` to `"table"`, output style by `inspect` and `to_iruby` is the Table mode. Other value including nil will output TDR style.
+
+ You can switch the mode in Ruby like this.
+ ```ruby
+ ENV['RED_AMBER_OUTPUT_STYLE'] = 'table' # => Table mode
+ ```
 
- I named the data frame representation style in the model above as TDR (Transposed DataFrame Representation). See [TDR.md](doc/tdr.md) for details.
+ For more detail information about TDR, see [TDR.md](doc/tdr.md).
 
  ## Development
 
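
The display-mode rule described in the README and changelog changes (`'table'` selects Table mode, anything else including nil selects TDR) can be sketched as a small lookup. `output_style` is a hypothetical helper written for this illustration; the real check lives inside RedAmber and may differ, for example in case handling. Note that the README prose in this diff names `RED_AMBER_OUTPUT_MODE` while the changelog and the README snippet use `RED_AMBER_OUTPUT_STYLE`. The sketch follows the latter, and the diff alone does not settle which variable the library reads.

```ruby
# Hypothetical helper mirroring the documented rule; not RedAmber's own code.
# Takes any Hash-like environment so it can be exercised without touching ENV.
def output_style(env = ENV)
  env['RED_AMBER_OUTPUT_STYLE'] == 'table' ? :table : :tdr
end

unset_style = output_style({})
table_style = output_style({ 'RED_AMBER_OUTPUT_STYLE' => 'table' })
other_style = output_style({ 'RED_AMBER_OUTPUT_STYLE' => 'plain' })

p unset_style # => :tdr
p table_style # => :table
p other_style # => :tdr
```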
data/doc/DataFrame.md CHANGED
@@ -4,11 +4,13 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
  - A collection of data which have same data type within. We call it `Vector`.
  - A label is attached to `Vector`. We call it `key`.
  - A `Vector` and associated `key` is grouped as a `variable`.
- - `variable`s with same vector length are aligned and arranged to be a `DaTaFrame`.
+ - `variable`s with same vector length are aligned and arranged to be a `DataFrame`.
  - Each `Vector` in a `DataFrame` contains a set of relating data at same position. We call it `observation`.
 
  ![dataframe model image](doc/../image/dataframe_model.png)
 
+ (No change in this model in v0.1.6 .)
+
  ## Constructors and saving
 
  ### `new` from a Hash
@@ -52,7 +54,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
  - from a URI
 
  ```ruby
- uri = URI("uri = URI("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")
+ uri = URI("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")
  RedAmber::DataFrame.load(uri)
  ```
 
@@ -147,9 +149,9 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
 
  - Returns an Array of Vectors.
 
- ### `indexes`, `indices`
+ ### `indices`, `indexes`
 
- - Returns all indexes in a Range.
+ - Returns all indexes in an Array.
 
  ### `to_h`
 
@@ -179,6 +181,10 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
 
  - Returns a `Rover::DataFrame`.
 
+ ### `to_iruby`
+
+ - Show the DataFrame as a Table in Jupyter Notebook or Jupyter Lab with IRuby.
+
  ### `tdr(limit = 10, tally: 5, elements: 5)`
 
  - Shows some information about self in a transposed style.
@@ -280,6 +286,9 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
  An end-less or a begin-less Range can be used to represent indeces.
 
  - Select obs. by indeces in an Array: `df[1, 2]`
+
+ - You can use float indices.
+
  - Mixed case: `df[2, 0..]`
 
  ```ruby
@@ -423,9 +432,11 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
 
  ![slice method image](doc/../image/dataframe/slice.png)
 
- - Keys as arguments
+ - Indices as arguments
 
- `slice(indeces)` accepts indeces as arguments. Indeces should be an Integer or a Range of Integer.
+ `slice(indeces)` accepts indices as arguments. Indices should be Integers, Floats or Ranges of Integers.
+
+ Negative index from the tail like Ruby's Array is also acceptable.
 
  ```ruby
  # returns 5 obs. at start and 5 obs. from end
@@ -457,7 +468,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
  ... 5 more Vectors ...
  ```
 
- - Keys or booleans by a block
+ - Indices or booleans by a block
 
  `slice {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return indeces or a boolean Array with a same length as `size`. Block is called in the context of self.
 
@@ -469,6 +480,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
  max = vector.mean + vector.std
  vector.to_a.map { |e| (min..max).include? e }
  end
+
  # =>
  #<RedAmber::DataFrame : 204 x 8 Vectors, 0x000000000000f30c>
  Vectors : 5 numeric, 3 strings
@@ -509,7 +521,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
 
  ![remove method image](doc/../image/dataframe/remove.png)
 
- - Keys as arguments
+ - Indices as arguments
 
  `remove(indeces)` accepts indeces as arguments. Indeces should be an Integer or a Range of Integer.
 
@@ -548,7 +560,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
  8 :year uint16 3 {2007=>103, 2008=>113, 2009=>117}
  ```
 
- - Keys or booleans by a block
+ - Indices or booleans by a block
 
  `remove {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return indeces or a boolean Array with a same length as `size`. Block is called in the context of self.
 
@@ -748,6 +760,8 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
 
  ### `group(aggregating_keys, function, target_keys)`
 
+ (This is a temporary API and may change in the future version.)
+
  Create grouped dataframe by `aggregation_keys` and apply `function` to each group and returns in `target_keys`. Aggregated key name is `function(key)` style.
 
  (The current implementation is not intuitive. Needs improvement.)
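
The DataFrame.md changes above say that `slice` now accepts Float indices and negative indices "like Ruby's Array". For reference, this is how Ruby's own Array treats such indices, followed by a hypothetical `normalize_index` helper that spells out the same rule. Whether RedAmber truncates floats in exactly this way is not stated in this diff, so treat the helper as an illustration of the Array analogy only.

```ruby
letters = %w[a b c d e]

last_letter  = letters[-1]   # negative index counts from the tail
float_letter = letters[1.9]  # Array#[] truncates a Float index to 1

p last_letter  # => "e"
p float_letter # => "b"

# Hypothetical helper: map an Array-style index to a position in 0...size.
def normalize_index(index, size)
  i = index.to_i # Float#to_i truncates toward zero
  i += size if i.negative?
  raise IndexError, "index #{index} out of range" unless (0...size).cover?(i)

  i
end

p normalize_index(-1, letters.size)  # => 4
p normalize_index(1.9, letters.size) # => 1
```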