red_amber 0.1.2 → 0.1.5

Files changed (42)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +21 -10
  3. data/CHANGELOG.md +162 -6
  4. data/Gemfile +3 -0
  5. data/README.md +89 -303
  6. data/benchmark/csv_load_penguins.yml +15 -0
  7. data/benchmark/drop_nil.yml +11 -0
  8. data/doc/DataFrame.md +840 -0
  9. data/doc/Vector.md +317 -0
  10. data/doc/image/arrow_table_new.png +0 -0
  11. data/doc/image/dataframe/assign.png +0 -0
  12. data/doc/image/dataframe/drop.png +0 -0
  13. data/doc/image/dataframe/pick.png +0 -0
  14. data/doc/image/dataframe/remove.png +0 -0
  15. data/doc/image/dataframe/rename.png +0 -0
  16. data/doc/image/dataframe/slice.png +0 -0
  17. data/doc/image/dataframe_model.png +0 -0
  18. data/doc/image/example_in_red_arrow.png +0 -0
  19. data/doc/image/tdr.png +0 -0
  20. data/doc/image/tdr_and_table.png +0 -0
  21. data/doc/image/tidy_data_in_TDR.png +0 -0
  22. data/doc/image/vector/binary_element_wise.png +0 -0
  23. data/doc/image/vector/unary_aggregation.png +0 -0
  24. data/doc/image/vector/unary_aggregation_w_option.png +0 -0
  25. data/doc/image/vector/unary_element_wise.png +0 -0
  26. data/doc/tdr.md +56 -0
  27. data/doc/tdr_ja.md +56 -0
  28. data/lib/red_amber/data_frame.rb +68 -35
  29. data/lib/red_amber/data_frame_displayable.rb +132 -0
  30. data/lib/red_amber/data_frame_helper.rb +64 -0
  31. data/lib/red_amber/data_frame_indexable.rb +38 -0
  32. data/lib/red_amber/data_frame_observation_operation.rb +83 -0
  33. data/lib/red_amber/data_frame_selectable.rb +34 -43
  34. data/lib/red_amber/data_frame_variable_operation.rb +133 -0
  35. data/lib/red_amber/vector.rb +58 -6
  36. data/lib/red_amber/vector_compensable.rb +68 -0
  37. data/lib/red_amber/vector_functions.rb +147 -68
  38. data/lib/red_amber/version.rb +1 -1
  39. data/lib/red_amber.rb +9 -1
  40. data/red_amber.gemspec +3 -6
  41. metadata +36 -9
  42. data/lib/red_amber/data_frame_output.rb +0 -116
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 54de345111ab7c3918e119abe820d2ff207007f1ce9731e2f8954513d47c76a9
- data.tar.gz: 75e4251c6d6be8eab05739f75e064a2e65cbe3abdafaa574c559d9356fe93a20
+ metadata.gz: 4d18eedf5de7fd06fe52e8a82ad38fe12d590dc10929c96872e557b9e946f785
+ data.tar.gz: dda93f0af421096410e00ecf2261e8846a236634bd96ae9941d1b5cd49cd5eb2
  SHA512:
- metadata.gz: 60c2d11d30b91947b67e608864e5e4fe13e544662f671789256e6e2e624a892577f616572e4ba55be4de99affd528d020060b4be56f8820250697db2a80132a2
- data.tar.gz: 19170b7cd3d6b1174b7de44c0b8841d47acc4d1832fe72fdc8adc7171245e031c922614aca979755ae035566deae0a711644a3e483cecbceabdcfc411efb2263
+ metadata.gz: 7c1b1edd6c1f6f3f275ea765c4bc8765327c88a36120a4c5a66dd8afa59f5913db4a5b436d80378554e03403bab823edf7467beea0f44e2803e36f3e9677a065
+ data.tar.gz: 949fd15d2076d4e53fb141375bde282228c7f6566e137047344134c54964fe77fd2f9757b0bdc324eb3cfa14091f2ae928e0e844d28f3ebbcfa17fc7d388bbd0
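The digests in checksums.yaml cover the `metadata.gz` and `data.tar.gz` members of the gem archive. As a minimal sketch of how one such SHA256 entry could be checked by hand, assuming the member has already been extracted to disk, the helper below compares a computed digest against the recorded hex string. The name `sha256_matches?` is hypothetical and not part of RubyGems or red_amber; only the standard library `Digest::SHA256` is used.

```ruby
require 'digest'

# Hypothetical helper: true when the SHA256 digest of `bytes`
# equals `expected_hex` (compared case-insensitively).
def sha256_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# Usage sketch (assumes metadata.gz was extracted from the 0.1.5 gem):
#   sha256_matches?(File.binread('metadata.gz'),
#                   '4d18eedf5de7fd06fe52e8a82ad38fe12d590dc10929c96872e557b9e946f785')
```

The SHA512 entries could be checked the same way with `Digest::SHA512`.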
data/.rubocop.yml CHANGED
@@ -53,40 +53,51 @@ Layout/LineLength:
  # 18..30 unsatisfactory
  # > 30 dangerous
  Metrics/AbcSize:
- Max: 23
+ Max: 30
  Exclude:
- - 'lib/red_amber/data_frame_output.rb' # Max: 78
+ - 'lib/red_amber/data_frame_displayable.rb' # Max: 55
+ - 'lib/red_amber/vector_compensable.rb' # Max: 36

  # Max: 25
  Metrics/BlockLength:
  Max: 25
  Exclude:
  - 'test/**/*'
- - '*.gemspec'

  # Max: 100
  Metrics/ClassLength:
- Max: 100
+ Max: 120
  Exclude:
  - 'test/**/*'

  # Max: 7
  Metrics/CyclomaticComplexity:
- Max: 10
+ Max: 12
  Exclude:
- - 'lib/red_amber/data_frame_output.rb' # Max: 11
+ - 'lib/red_amber/vector_compensable.rb' # Max: 14

  # Max: 10
  Metrics/MethodLength:
- Max: 18
+ Max: 30
  Exclude:
- - 'lib/red_amber/data_frame_output.rb' # Max: 35
+ - 'lib/red_amber/data_frame_displayable.rb' # Max: 33
+
+ # Max: 100
+ Metrics/ModuleLength:
+ Max: 100
+ Exclude:
+ - 'lib/red_amber/vector_functions.rb' # Max: 114

  # Max: 8
  Metrics/PerceivedComplexity:
- Max: 11
+ Max: 13
+ Exclude:
+ - 'lib/red_amber/vector_compensable.rb' # Max: 15
+
+ # Necessary to define is_na
+ Naming/PredicateName:
  Exclude:
- - 'lib/red_amber/data_frame_output.rb' # Max: 12
+ - 'lib/red_amber/vector_functions.rb'

  # Necessary to test when range.end == -1
  Style/SlicingWithRange:
data/CHANGELOG.md CHANGED
@@ -1,12 +1,168 @@
- ## [0.1.3] - Unreleased
+ ## [0.2.0] - unreleased
+
+ - Document
+ - YARD support
+
+ - DataFrame#join features
+
+ ## [0.1.6] - Unreleased
+
+ - Feedback something to Red Data Tools

  - `DataFrame`
- - Introduce updating capabilities
- - Introduce NA support
- - Add slice method
+ - Introduce `summary` or ``describe`
+ - Add `Quantile` by own code?
+ - Improve dataframe obs. manipuration methods to accept float as a index (#10)
+ - Improve as more performant by benchmark check.
+
  - `Vector`
- - Add NaN support for functions
- - More functions
+ - Support more functions
+ - Support coerece
+
+ - More examples of frequently needed tasks
+
+ ## [0.1.5] - 2022-06-12 (experimental)
+
+ - Bug fixes
+ - Fix DF#tdr to display timestamp type (#19)
+ - Add TZ setting in CI test to pass temporal tests (#19)
+ - Fix example in document of #load(csv_from_URI) (#23)
+
+ - New features and improvements
+ - Improve usability of DataFrame manipulating block (#19)
+ - Add `DataFrame#v` to select a Vector
+ - Add `DataFrame#variables` method
+ - Add `DataFrame#to_arrow`
+ - Add instance variables in DataFrame with lazy initialization
+ - Add `Vector#key` to get key name
+ - Add `Vector#temporal?` to check if temporal type
+ - Refine around DataFrame#variables
+ - Refine init of instance variables
+ - Refine DataFrame#type_classes, V#ectortype_class
+ - Refine DataFrame#tdr to shorten temporal data
+
+ - Add supports to make up for missing values (#20)
+ - Add VectorArgumentError
+ - Add `Vector#replace_with`
+ - Add helper function to assert with NaN
+ - To assert NaN == NaN
+ - Add `Vector#fill_nil_backward`, `Vector#forward`
+ - Add `DataFrame#remove_nil` method
+ - Change to accept nil as replacement in Vector#replace_with
+
+ - Introduce index related methods (#22)
+ - Add `Vector#sort_indexes` method
+ - Add `Vector#uniq` method
+ - Add `Vector#tally` and `Vectorvalue_counts` methods
+ - Add `DataFrame#sort` method
+ - Add `DataFrame#group` method
+ - Change to use DataFrame#map_indices in #[]
+
+ - Add rounding functions with opts (#21)
+ - With options :mode and :n_digits
+ - :n_digits also can be specified with :multiple option in `Vector#round_to_multiple`
+ - `Vector#round`
+ - `Vector#ceil`
+ - `Vector#floor`
+ - `Vector#trunc`
+
+ - Documentation
+ - Update TDR, TDR_ja documents to latest (#18)
+ - Refinement and small fix in DataFrame.md (#18)
+ - Update README to use more effective example (#18)
+ - Delete expired TDR_operations.pdf (#23)
+ - Update README and dataframe_model image (#23)
+ - Update description about rover-df in README (#23)
+ - Add installation of Arrow in README (#23)
+
+ - Others
+ - Tried but cannot use bundler cache in ci test (#17)
+ - Bump up requirements to Arrow 8.0.0 (#25)
+ - Arrow 7.0.0 with Ubuntu 21.04 causes an fatal error in replace_with_mask function.
+ - Update the description of gem (#23)
+ - Add benchmark tests (#26)
+
+ ## [0.1.4] - 2022-05-29 (experimental)
+
+ - Bug fixes
+ - Fix missing support for scalar argument (#1)
+ - Fix type name of boolean in DF#types to be same as Vector#type (#6, #7)
+ - Fix zero picking to return empty DataFrame (#8)
+ - Fix code at both args and a block given (#8)
+
+ - New features and improvements
+ - `DataFrame`
+ - Refine module name `Displayable`
+ - Rename nrow/ncol methods to `size`/`n_keys` to align with TDR concept (#4)
+ - Remain `n_row`/`n_col` for compatibility
+ - Rename `ls` method to `tdr` (#4)
+ - Add limit option to `tdr`
+ - Shorten option name (#11)
+ - Introduce `pick` method to create sub DataFrame (#8)
+ - Add boolean support (#8)
+ - Refactor `pick` (#9)
+ - Introduce `drop` method to create sub DataFrame (#8)
+ - Add boolean support (#8)
+ - Refactor `drop` (#9)
+ - Add boolean array support for `[]` (#9)
+ - Add `indexes`/`indices` to use with selecting observations (#9)
+ - Introduce `slice` method to create sub DataFrame (#8)
+ - Refactor `slice` (#9)
+ - Introduce `remove` method to create sub DataFrame (#9)
+ - Introduce `rename` method to create sub DataFrame (#14)
+ - Introduce `assign` method to create sub DataFrame (#14)
+ - Improve to call block by instance_eval (#13)
+
+ - `Vector`
+ - Refine `find(function)`
+ - Add `min_max` method (#2)
+ - Add `std`/`sd` method (ddof=0 version: `stddev`) (#2)
+ - Add `var` method (ddof=0 version: `variance`) (#2)
+ - Add `VectorFunctions.arrow_doc(func_name)` (temporally)
+
+ - Documentation
+ - Show code in README
+ - Change row/column names for **TDR** concept (#4)
+ - Add documents about **TDR** concept (#4)
+ - Add example about TDR (#4)
+ - Separate README to create DataFrame and Vector documents (#12)
+ - Add DataFrame model concept image to README (#12)
+
+ - GitHub site
+ - Switched to use merge on GitHub (not to push merged master) (#1)
+ - Create lifetime issue #3 to show the goal of this project (#3)
+
+ ## [0.1.3] - 2022-05-15 (experimental)
+
+ - Bug fixes
+ - Fix boolean functions in `Vector` to align with Ruby's behavior
+ - `&` == `and_kleene`
+ - `|` == `or_kleene`
+ - Quote strings of data-preview in `DataFrame#inspect`
+ - Quote empty and blank keys in `DataFrame#inspect`
+ - Respond to error for a wrong key in `DataFrame#[]`
+
+ - New features and improvements
+ - `DataFrame`
+ - Display nil elements in `inspect`
+ - Show NaN and nil counts in `inspect`
+ - Refactor `inspect`
+ - Add method `key` and `key_index`
+ - Add how to load/save Parquet to README
+
+ - `Vector`
+ - Add categorization functions
+
+ This is an important step to support `slice` method and NA treatment features.
+ - `is_finite`
+ - `is_inf`
+ - `is_na` (RedAmber original)
+ - `is_nan`
+ - `is_nil`, `is_null`
+ - `is_valid`
+ - Show in a reduced representation for long array in `inspect`
+ - Support options in aggregatiton functions
+ - Return values in non-arrow object for scalar aggregation functions

  ## [0.1.2] - 2022-05-08 (experimental)

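The 0.1.3 changelog entry above maps `Vector#&` to `and_kleene` and `Vector#|` to `or_kleene`. As a rough illustration of what Kleene (three-valued) logic means when `nil` stands for an unknown value, here is a plain-Ruby sketch over arrays. It does not call red_amber or Arrow, and the helper names `kleene_and` and `kleene_or` are hypothetical.

```ruby
# Element-wise Kleene AND over two equal-length arrays of true/false/nil.
# A definite false on either side wins; otherwise an unknown (nil) stays unknown.
def kleene_and(xs, ys)
  xs.zip(ys).map do |x, y|
    if x == false || y == false
      false
    elsif x.nil? || y.nil?
      nil
    else
      true
    end
  end
end

# Element-wise Kleene OR: a definite true on either side wins;
# otherwise an unknown (nil) stays unknown.
def kleene_or(xs, ys)
  xs.zip(ys).map do |x, y|
    if x == true || y == true
      true
    elsif x.nil? || y.nil?
      nil
    else
      false
    end
  end
end

a = [true, true, false, nil]
b = [nil, false, nil, nil]
p kleene_and(a, b) # prints [nil, false, false, nil]
p kleene_or(a, b)  # prints [true, true, nil, nil]
```

The point of the sketch is only that `false & nil` can be decided as `false` and `true | nil` as `true`, while the remaining combinations with `nil` stay `nil`.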
data/Gemfile CHANGED
@@ -14,4 +14,7 @@ group :test do

  gem 'test-unit'
  gem 'webrick'
+
+ gem 'benchmark_driver'
+ gem 'red-datasets-arrow'
  end