red_amber 0.1.6 → 0.1.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: ae6a6696e0f01ae7d621d11542e203803ba117fc1ee3d286a1444b3c4ac746fc
4
- data.tar.gz: 722d4ad538fe4f0c85db4911e773e1f87eb03f47fa63c954529bf04babc55d8c
3
+ metadata.gz: 88bdd603d8daec1a95c0277ef68857f84346ad7cf95d0ba23a306e6b70567c29
4
+ data.tar.gz: 40add80cbaa5183ca0e93eadcdcd1fead37015cac1cb2360660002c0b1878255
5
5
  SHA512:
6
- metadata.gz: 96887abfbdd44330e80a6a97f91597c00706fc99492d086683702f1d3e757331e90fb275e5796a2b7b3228b476f8c7799ab22727411baedb79ae39acafd2d3f0
7
- data.tar.gz: c020bba60734fccdeb4a18efecb98260f70fa76c5b8ba2c7f2830d2dac6de66e9776a1fa2ffc5d396e7270b29c5e4dd9737dee70b5dcc47dd3113aaafe4f4d22
6
+ metadata.gz: d043eea51117ecc48bdc52fa951e24d2618f273eb289a30f5bbb182e1a891763cdd35f6a7c6764f6e0061bddeaaa86b2374de1dc2b48f25a5b6b05c9af83a0e3
7
+ data.tar.gz: cdbba19750bf71fe99e55bf6c46cb4522018f43563d7a93fdc375987f9388234e4f7e833297fdb6b8dd5a41b5a1bfdbf287ea47663f5f8a90facb56a4c63daef
data/.rubocop.yml CHANGED
@@ -80,6 +80,7 @@ Metrics/CyclomaticComplexity:
80
80
  Exclude:
81
81
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 14
82
82
  - 'lib/red_amber/vector_updatable.rb' # Max: 14
83
+ - 'lib/red_amber/data_frame_displayable.rb' # Max: 18
83
84
 
84
85
  # Max: 10
85
86
  Metrics/MethodLength:
@@ -93,6 +94,7 @@ Metrics/ModuleLength:
93
94
  Exclude:
94
95
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 141
95
96
  - 'lib/red_amber/vector_functions.rb' # Max: 114
97
+ - 'lib/red_amber/data_frame_displayable.rb' # Max: 132
96
98
 
97
99
  # Max: 8
98
100
  Metrics/PerceivedComplexity:
@@ -100,6 +102,7 @@ Metrics/PerceivedComplexity:
100
102
  Exclude:
101
103
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 14
102
104
  - 'lib/red_amber/vector_updatable.rb' # Max: 15
105
+ - 'lib/red_amber/data_frame_displayable.rb' # Max: 19
103
106
 
104
107
  Naming/FileName:
105
108
  Exclude:
data/CHANGELOG.md CHANGED
@@ -1,29 +1,55 @@
1
- ## - unreleased
1
+ ## [0.1.9] - Unreleased
2
2
 
3
- - Document
4
- - YARD support
3
+ - Supports Arrow 9.0.0
5
4
 
6
- - `datasets-red-amber` gem
7
- - `red-amber` gem
5
+ ## [0.1.7] - 2022-07-15 (experimental)
8
6
 
9
- - `Vector#divmod`
10
- - Introduce if Arrow's function is ready
7
+ - Bug fixes
8
+
9
+ - Remove development dependency for red-dataset-arrow (#47)
10
+ - To avoid irregular fails in CI test
11
+ - Add red-datasets to development dependency instead (#49)
11
12
 
12
- ## - Unreleased, will be after Arrow 9.0.0 released
13
+ - Supress useless log in tests (#46)
14
+ Suppress log of Webrick and iruby.
13
15
 
14
- - `DataFrame`
15
- - Introduce `summary` or ``describe`
16
- - `Quantile` will be available
16
+ - New features and improvements
17
+
18
+ - Use Table mode as default preview mode in `inspect`/`to_s` (#40)
19
+ - Show examples in documents in Table
20
+ - Use the word rows/columns
21
+ - Update images of data processing in Table style
22
+
23
+ - Introduce a new Table formatter (#47)
24
+ - Migrate from the Arrow's formatter
25
+ - Do not use TAB, format by spaces only.
26
+ - Align column width with head rows and tail rows.
27
+ - Show nils.
28
+ - Show data types.
29
+ - Refine documents to use new formatter output
30
+
31
+ - Simplify options of Vector functions (#46)
32
+ Vector functions with options use optional argument opt in previous code.
33
+
34
+ - Add `#float?`, `#integer?` to Vector (#46)
35
+ - Add `#each` to Vector (#47)
36
+
37
+ - Introduce class `Group` (#48)
38
+ - Refine `DataFrame#group` to use class Group
39
+ - Add methods to Group
40
+
41
+ - Move parquet and rover to development dependency (#49)
42
+
43
+ - Refine text in `DataFrame#to_iruby` (#40)
17
44
 
18
- ## [0.1.7] - Unreleased, may be 2022-07-10
45
+ - Add badges in Github site
46
+ - Gitter badge for Red Data Tools (#42)
47
+ - Gem version and CI status badge (#45)
19
48
 
20
- - Feedback something to Red Data Tools
21
- - Support more functions
22
- - Improve as more performant
23
- - More examples of frequently needed tasks
49
+ - Exchange containers in red-amber.rb and red_amber.rb (#47)
50
+ - Mainly use red_amber by consistency with the folder name
24
51
 
25
- - New `Group` API
26
- - `DataFrame#join features
52
+ - Add Jupyter notebook '47 Examples of Red Amber' (#49)
27
53
 
28
54
  ## [0.1.6] - 2022-06-26 (experimental)
29
55
 
data/Gemfile CHANGED
@@ -7,6 +7,9 @@ gemspec
7
7
  group :test do
8
8
  gem 'rake'
9
9
 
10
+ gem 'red-parquet', '>= 8.0.0'
11
+ gem 'rover-df', '~> 0.3.0'
12
+
10
13
  gem 'rubocop'
11
14
  gem 'rubocop-performance', require: false
12
15
  gem 'rubocop-rake'
@@ -17,5 +20,5 @@ group :test do
17
20
  gem 'webrick'
18
21
 
19
22
  gem 'benchmark_driver'
20
- gem 'red-datasets-arrow'
23
+ gem 'red-datasets'
21
24
  end
data/README.md CHANGED
@@ -1,16 +1,20 @@
1
1
  # RedAmber
2
2
 
3
+ [![Gem Version](https://badge.fury.io/rb/red_amber.svg)](https://badge.fury.io/rb/red_amber)
4
+ [![Ruby](https://github.com/heronshoes/red_amber/actions/workflows/test.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/test.yml)
5
+
3
6
  A simple dataframe library for Ruby (experimental).
4
7
 
5
- - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow)
8
+ - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en)
6
9
  - Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
7
10
 
8
11
  ## Requirements
9
12
 
10
13
  ```ruby
11
14
  gem 'red-arrow', '>= 8.0.0'
12
- gem 'red-parquet', '>= 8.0.0' # if you use IO from/to parquet
13
- gem 'rover-df', '~> 0.3.0' # if you use IO from/to Rover::DataFrame
15
+
16
+ gem 'red-parquet', '>= 8.0.0' # Optional, if you use IO from/to parquet
17
+ gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
14
18
  ```
15
19
 
16
20
  ## Installation
@@ -18,7 +22,8 @@ gem 'rover-df', '~> 0.3.0' # if you use IO from/to Rover::DataFrame
18
22
  Install requirements before you install Red Amber.
19
23
 
20
24
  - Apache Arrow GLib (>= 8.0.0)
21
- - Apache Parquet GLib (>= 8.0.0)
25
+
26
+ - Apache Parquet GLib (>= 8.0.0) # If you use IO from/to parquet
22
27
 
23
28
  See [Apache Arrow install document](https://arrow.apache.org/install/).
24
29
 
@@ -42,11 +47,6 @@ Or install it yourself as:
42
47
  gem install red_amber
43
48
  ```
44
49
 
45
- (From v0.1.6)
46
-
47
- RedAmber uses TDR mode for `#inspect` and `#to_iruby` by default. If you prefer Table mode, please set the environment variable
48
- `RED_AMBER_OUTPUT_MODE` to `"table"`. See [TDR section](#TDR) for detail.
49
-
50
50
  ## `RedAmber::DataFrame`
51
51
 
52
52
  Represents a set of data in 2D-shape. The entity is a Red Arrow's Table object.
@@ -56,54 +56,21 @@ require 'red_amber' # require 'red-amber' is also OK.
56
56
  require 'datasets-arrow'
57
57
 
58
58
  arrow = Datasets::Penguins.new.to_arrow
59
- penguins = RedAmber::DataFrame.new(arrow)
60
- penguins.table
61
-
62
- # =>
63
- #<Arrow::Table:0x111271098 ptr=0x7f9118b3e0b0>
64
- species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
65
- 0 Adelie Torgersen 39.100000 18.700000 181 3750 male 2007
66
- 1 Adelie Torgersen 39.500000 17.400000 186 3800 female 2007
67
- 2 Adelie Torgersen 40.300000 18.000000 195 3250 female 2007
68
- 3 Adelie Torgersen (null) (null) (null) (null) (null) 2007
69
- 4 Adelie Torgersen 36.700000 19.300000 193 3450 female 2007
70
- 5 Adelie Torgersen 39.300000 20.600000 190 3650 male 2007
71
- 6 Adelie Torgersen 38.900000 17.800000 181 3625 female 2007
72
- 7 Adelie Torgersen 39.200000 19.600000 195 4675 male 2007
73
- 8 Adelie Torgersen 34.100000 18.100000 193 3475 (null) 2007
74
- 9 Adelie Torgersen 42.000000 20.200000 190 4250 (null) 2007
75
- ...
76
- 334 Gentoo Biscoe 46.200000 14.100000 217 4375 female 2009
77
- 335 Gentoo Biscoe 55.100000 16.000000 230 5850 male 2009
78
- 336 Gentoo Biscoe 44.500000 15.700000 217 4875 (null) 2009
79
- 337 Gentoo Biscoe 48.800000 16.200000 222 6000 male 2009
80
- 338 Gentoo Biscoe 47.200000 13.700000 214 4925 female 2009
81
- 339 Gentoo Biscoe (null) (null) (null) (null) (null) 2009
82
- 340 Gentoo Biscoe 46.800000 14.300000 215 4850 female 2009
83
- 341 Gentoo Biscoe 50.400000 15.700000 222 5750 male 2009
84
- 342 Gentoo Biscoe 45.200000 14.800000 212 5200 female 2009
85
- 343 Gentoo Biscoe 49.900000 16.100000 213 5400 male 2009
86
- ```
87
-
88
- By default, RedAmber shows self by compact transposed style. This unfamiliar style (TDR) is designed for
89
- the exploratory data processing. It keeps Vectors as row vectors, shows keys and types at a glance, shows levels
90
- for the 'factor-like' variables and shows the number of abnormal values like NaN and nil.
91
-
92
- ```ruby
93
- penguins
59
+ RedAmber::DataFrame.new(arrow)
94
60
 
95
61
  # =>
96
- RedAmber::DataFrame : 344 x 8 Vectors
97
- Vectors : 5 numeric, 3 strings
98
- # key type level data_preview
99
- 1 :species string 3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
100
- 2 :island string 3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
101
- 3 :bill_length_mm double 165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
102
- 4 :bill_depth_mm double 81 [18.7, 17.4, 18.0, nil, 19.3, ... ], 2 nils
103
- 5 :flipper_length_mm uint8 56 [181, 186, 195, nil, 193, ... ], 2 nils
104
- 6 :body_mass_g uint16 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
105
- 7 :sex string 3 {"male"=>168, "female"=>165, nil=>11}
106
- 8 :year uint16 3 {2007=>110, 2008=>114, 2009=>120}
62
+ #<RedAmber::DataFrame : 344 x 8 Vectors, 0x0000000000013790>
63
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
64
+ <string> <string> <double> <double> <uint8> ... <uint16>
65
+ 1 Adelie Torgersen 39.1 18.7 181 ... 2007
66
+ 2 Adelie Torgersen 39.5 17.4 186 ... 2007
67
+ 3 Adelie Torgersen 40.3 18.0 195 ... 2007
68
+ 4 Adelie Torgersen (nil) (nil) (nil) ... 2007
69
+ 5 Adelie Torgersen 36.7 19.3 193 ... 2007
70
+ : : : : : : ... :
71
+ 342 Gentoo Biscoe 50.4 15.7 222 ... 2009
72
+ 343 Gentoo Biscoe 45.2 14.8 212 ... 2009
73
+ 344 Gentoo Biscoe 49.9 16.1 213 ... 2009
107
74
  ```
108
75
 
109
76
  ### DataFrame model
@@ -113,23 +80,41 @@ For example, `DataFrame#pick` accepts keys as an argument and returns a sub Data
113
80
 
114
81
  ```ruby
115
82
  df = penguins.pick(:body_mass_g)
83
+ df
84
+
116
85
  # =>
117
- #<RedAmber::DataFrame : 344 x 1 Vector, 0x000000000000fa14>
118
- Vector : 1 numeric
119
- # key type level data_preview
120
- 1 :body_mass_g int64 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
86
+ #<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000015cc0>
87
+ body_mass_g
88
+ <uint16>
89
+ 1 3750
90
+ 2 3800
91
+ 3 3250
92
+ 4 (nil)
93
+ 5 3450
94
+ : :
95
+ 342 5750
96
+ 343 5200
97
+ 344 5400
121
98
  ```
122
99
 
123
100
  `DataFrame#assign` creates new variables (column in the table).
124
101
 
125
102
  ```ruby
126
103
  df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
104
+
127
105
  # =>
128
- #<RedAmber::DataFrame : 344 x 2 Vectors, 0x000000000000fa28>
129
- Vectors : 2 numeric
130
- # key type level data_preview
131
- 1 :body_mass_g int64 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
132
- 2 :body_mass_kg double 95 [3.75, 3.8, 3.25, nil, 3.45, ... ], 2 nils
106
+ #<RedAmber::DataFrame : 344 x 2 Vectors, 0x00000000000212f0>
107
+ body_mass_g body_mass_kg
108
+ <uint16> <double>
109
+ 1 3750 3.8
110
+ 2 3800 3.8
111
+ 3 3250 3.3
112
+ 4 (nil) (nil)
113
+ 5 3450 3.5
114
+ : : :
115
+ 342 5750 5.8
116
+ 343 5200 5.2
117
+ 344 5400 5.4
133
118
  ```
134
119
 
135
120
  DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
@@ -178,19 +163,9 @@ Vectors accepts some [functional methods from Arrow](https://arrow.apache.org/do
178
163
 
179
164
  See [Vector.md](doc/Vector.md) for details.
180
165
 
181
- ## TDR
182
-
183
- I named the data frame representation style in the model above as TDR (Transposed DataFrame Representation).
184
-
185
- This library can be used with both TDR mode and usual Table mode.
186
- If you set the environment variable `RED_AMBER_OUTPUT_MODE` to `"table"`, output style by `inspect` and `to_iruby` is the Table mode. Other value including nil will output TDR style.
187
-
188
- You can switch the mode in Ruby like this.
189
- ```ruby
190
- ENV['RED_AMBER_OUTPUT_STYLE'] = 'table' # => Table mode
191
- ```
166
+ ## Jupyter notebook
192
167
 
193
- For more detail information about TDR, see [TDR.md](doc/tdr.md).
168
+ [47 Examples of Red Amber](doc/47_examples_of_red_amber.ipynb)
194
169
 
195
170
  ## Development
196
171
 
data/Rakefile CHANGED
@@ -7,6 +7,7 @@ Rake::TestTask.new(:test) do |t|
7
7
  t.libs << 'test'
8
8
  t.libs << 'lib'
9
9
  t.test_files = FileList['test/**/test_*.rb']
10
+ t.warning = false
10
11
  end
11
12
 
12
13
  require 'rubocop/rake_task'
@@ -1,11 +1,11 @@
1
1
  prelude: |
2
- require 'datasets-arrow'
3
2
  require 'rover'
4
3
  require 'red_amber'
5
4
 
6
5
  penguins_csv = 'benchmark/cache/penguins.csv'
7
6
 
8
7
  unless File.exist?(penguins_csv)
8
+ require 'datasets-arrow'
9
9
  arrow = Datasets::Penguins.new.to_arrow
10
10
  RedAmber::DataFrame.new(arrow).save(penguins_csv)
11
11
  end