red_amber 0.1.2 → 0.1.5
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.rubocop.yml +21 -10
- data/CHANGELOG.md +162 -6
- data/Gemfile +3 -0
- data/README.md +89 -303
- data/benchmark/csv_load_penguins.yml +15 -0
- data/benchmark/drop_nil.yml +11 -0
- data/doc/DataFrame.md +840 -0
- data/doc/Vector.md +317 -0
- data/doc/image/arrow_table_new.png +0 -0
- data/doc/image/dataframe/assign.png +0 -0
- data/doc/image/dataframe/drop.png +0 -0
- data/doc/image/dataframe/pick.png +0 -0
- data/doc/image/dataframe/remove.png +0 -0
- data/doc/image/dataframe/rename.png +0 -0
- data/doc/image/dataframe/slice.png +0 -0
- data/doc/image/dataframe_model.png +0 -0
- data/doc/image/example_in_red_arrow.png +0 -0
- data/doc/image/tdr.png +0 -0
- data/doc/image/tdr_and_table.png +0 -0
- data/doc/image/tidy_data_in_TDR.png +0 -0
- data/doc/image/vector/binary_element_wise.png +0 -0
- data/doc/image/vector/unary_aggregation.png +0 -0
- data/doc/image/vector/unary_aggregation_w_option.png +0 -0
- data/doc/image/vector/unary_element_wise.png +0 -0
- data/doc/tdr.md +56 -0
- data/doc/tdr_ja.md +56 -0
- data/lib/red_amber/data_frame.rb +68 -35
- data/lib/red_amber/data_frame_displayable.rb +132 -0
- data/lib/red_amber/data_frame_helper.rb +64 -0
- data/lib/red_amber/data_frame_indexable.rb +38 -0
- data/lib/red_amber/data_frame_observation_operation.rb +83 -0
- data/lib/red_amber/data_frame_selectable.rb +34 -43
- data/lib/red_amber/data_frame_variable_operation.rb +133 -0
- data/lib/red_amber/vector.rb +58 -6
- data/lib/red_amber/vector_compensable.rb +68 -0
- data/lib/red_amber/vector_functions.rb +147 -68
- data/lib/red_amber/version.rb +1 -1
- data/lib/red_amber.rb +9 -1
- data/red_amber.gemspec +3 -6
- metadata +36 -9
- data/lib/red_amber/data_frame_output.rb +0 -116
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz:
- data.tar.gz:
+ metadata.gz: 4d18eedf5de7fd06fe52e8a82ad38fe12d590dc10929c96872e557b9e946f785
+ data.tar.gz: dda93f0af421096410e00ecf2261e8846a236634bd96ae9941d1b5cd49cd5eb2
  SHA512:
- metadata.gz:
- data.tar.gz:
+ metadata.gz: 7c1b1edd6c1f6f3f275ea765c4bc8765327c88a36120a4c5a66dd8afa59f5913db4a5b436d80378554e03403bab823edf7467beea0f44e2803e36f3e9677a065
+ data.tar.gz: 949fd15d2076d4e53fb141375bde282228c7f6566e137047344134c54964fe77fd2f9757b0bdc324eb3cfa14091f2ae928e0e844d28f3ebbcfa17fc7d388bbd0
data/.rubocop.yml
CHANGED
@@ -53,40 +53,51 @@ Layout/LineLength:
  # 18..30 unsatisfactory
  # > 30 dangerous
  Metrics/AbcSize:
- Max:
+ Max: 30
  Exclude:
- - 'lib/red_amber/
+ - 'lib/red_amber/data_frame_displayable.rb' # Max: 55
+ - 'lib/red_amber/vector_compensable.rb' # Max: 36

  # Max: 25
  Metrics/BlockLength:
  Max: 25
  Exclude:
  - 'test/**/*'
- - '*.gemspec'

  # Max: 100
  Metrics/ClassLength:
- Max:
+ Max: 120
  Exclude:
  - 'test/**/*'

  # Max: 7
  Metrics/CyclomaticComplexity:
- Max:
+ Max: 12
  Exclude:
- - 'lib/red_amber/
+ - 'lib/red_amber/vector_compensable.rb' # Max: 14

  # Max: 10
  Metrics/MethodLength:
- Max:
+ Max: 30
  Exclude:
- - 'lib/red_amber/
+ - 'lib/red_amber/data_frame_displayable.rb' # Max: 33
+
+ # Max: 100
+ Metrics/ModuleLength:
+ Max: 100
+ Exclude:
+ - 'lib/red_amber/vector_functions.rb' # Max: 114

  # Max: 8
  Metrics/PerceivedComplexity:
- Max:
+ Max: 13
+ Exclude:
+ - 'lib/red_amber/vector_compensable.rb' # Max: 15
+
+ # Necessary to define is_na
+ Naming/PredicateName:
  Exclude:
- - 'lib/red_amber/
+ - 'lib/red_amber/vector_functions.rb'

  # Necessary to test when range.end == -1
  Style/SlicingWithRange:
data/CHANGELOG.md
CHANGED
@@ -1,12 +1,168 @@
- ## [0.
+ ## [0.2.0] - unreleased
+
+ - Document
+ - YARD support
+
+ - DataFrame#join features
+
+ ## [0.1.6] - Unreleased
+
+ - Feedback something to Red Data Tools

  - `DataFrame`
- - Introduce
- -
- -
+ - Introduce `summary` or ``describe`
+ - Add `Quantile` by own code?
+ - Improve dataframe obs. manipuration methods to accept float as a index (#10)
+ - Improve as more performant by benchmark check.
+
  - `Vector`
- -
- -
+ - Support more functions
+ - Support coerece
+
+ - More examples of frequently needed tasks
+
+ ## [0.1.5] - 2022-06-12 (experimental)
+
+ - Bug fixes
+ - Fix DF#tdr to display timestamp type (#19)
+ - Add TZ setting in CI test to pass temporal tests (#19)
+ - Fix example in document of #load(csv_from_URI) (#23)
+
+ - New features and improvements
+ - Improve usability of DataFrame manipulating block (#19)
+ - Add `DataFrame#v` to select a Vector
+ - Add `DataFrame#variables` method
+ - Add `DataFrame#to_arrow`
+ - Add instance variables in DataFrame with lazy initialization
+ - Add `Vector#key` to get key name
+ - Add `Vector#temporal?` to check if temporal type
+ - Refine around DataFrame#variables
+ - Refine init of instance variables
+ - Refine DataFrame#type_classes, V#ectortype_class
+ - Refine DataFrame#tdr to shorten temporal data
+
+ - Add supports to make up for missing values (#20)
+ - Add VectorArgumentError
+ - Add `Vector#replace_with`
+ - Add helper function to assert with NaN
+ - To assert NaN == NaN
+ - Add `Vector#fill_nil_backward`, `Vector#forward`
+ - Add `DataFrame#remove_nil` method
+ - Change to accept nil as replacement in Vector#replace_with
+
+ - Introduce index related methods (#22)
+ - Add `Vector#sort_indexes` method
+ - Add `Vector#uniq` method
+ - Add `Vector#tally` and `Vectorvalue_counts` methods
+ - Add `DataFrame#sort` method
+ - Add `DataFrame#group` method
+ - Change to use DataFrame#map_indices in #[]
+
+ - Add rounding functions with opts (#21)
+ - With options :mode and :n_digits
+ - :n_digits also can be specified with :multiple option in `Vector#round_to_multiple`
+ - `Vector#round`
+ - `Vector#ceil`
+ - `Vector#floor`
+ - `Vector#trunc`
+
+ - Documentation
+ - Update TDR, TDR_ja documents to latest (#18)
+ - Refinement and small fix in DataFrame.md (#18)
+ - Update README to use more effective example (#18)
+ - Delete expired TDR_operations.pdf (#23)
+ - Update README and dataframe_model image (#23)
+ - Update description about rover-df in README (#23)
+ - Add installation of Arrow in README (#23)
+
+ - Others
+ - Tried but cannot use bundler cache in ci test (#17)
+ - Bump up requirements to Arrow 8.0.0 (#25)
+ - Arrow 7.0.0 with Ubuntu 21.04 causes an fatal error in replace_with_mask function.
+ - Update the description of gem (#23)
+ - Add benchmark tests (#26)
+
+ ## [0.1.4] - 2022-05-29 (experimental)
+
+ - Bug fixes
+ - Fix missing support for scalar argument (#1)
+ - Fix type name of boolean in DF#types to be same as Vector#type (#6, #7)
+ - Fix zero picking to return empty DataFrame (#8)
+ - Fix code at both args and a block given (#8)
+
+ - New features and improvements
+ - `DataFrame`
+ - Refine module name `Displayable`
+ - Rename nrow/ncol methods to `size`/`n_keys` to align with TDR concept (#4)
+ - Remain `n_row`/`n_col` for compatibility
+ - Rename `ls` method to `tdr` (#4)
+ - Add limit option to `tdr`
+ - Shorten option name (#11)
+ - Introduce `pick` method to create sub DataFrame (#8)
+ - Add boolean support (#8)
+ - Refactor `pick` (#9)
+ - Introduce `drop` method to create sub DataFrame (#8)
+ - Add boolean support (#8)
+ - Refactor `drop` (#9)
+ - Add boolean array support for `[]` (#9)
+ - Add `indexes`/`indices` to use with selecting observations (#9)
+ - Introduce `slice` method to create sub DataFrame (#8)
+ - Refactor `slice` (#9)
+ - Introduce `remove` method to create sub DataFrame (#9)
+ - Introduce `rename` method to create sub DataFrame (#14)
+ - Introduce `assign` method to create sub DataFrame (#14)
+ - Improve to call block by instance_eval (#13)
+
+ - `Vector`
+ - Refine `find(function)`
+ - Add `min_max` method (#2)
+ - Add `std`/`sd` method (ddof=0 version: `stddev`) (#2)
+ - Add `var` method (ddof=0 version: `variance`) (#2)
+ - Add `VectorFunctions.arrow_doc(func_name)` (temporally)
+
+ - Documentation
+ - Show code in README
+ - Change row/column names for **TDR** concept (#4)
+ - Add documents about **TDR** concept (#4)
+ - Add example about TDR (#4)
+ - Separate README to create DataFrame and Vector documents (#12)
+ - Add DataFrame model concept image to README (#12)
+
+ - GitHub site
+ - Switched to use merge on GitHub (not to push merged master) (#1)
+ - Create lifetime issue #3 to show the goal of this project (#3)
+
+ ## [0.1.3] - 2022-05-15 (experimental)
+
+ - Bug fixes
+ - Fix boolean functions in `Vector` to align with Ruby's behavior
+ - `&` == `and_kleene`
+ - `|` == `or_kleene`
+ - Quote strings of data-preview in `DataFrame#inspect`
+ - Quote empty and blank keys in `DataFrame#inspect`
+ - Respond to error for a wrong key in `DataFrame#[]`
+
+ - New features and improvements
+ - `DataFrame`
+ - Display nil elements in `inspect`
+ - Show NaN and nil counts in `inspect`
+ - Refactor `inspect`
+ - Add method `key` and `key_index`
+ - Add how to load/save Parquet to README
+
+ - `Vector`
+ - Add categorization functions
+
+ This is an important step to support `slice` method and NA treatment features.
+ - `is_finite`
+ - `is_inf`
+ - `is_na` (RedAmber original)
+ - `is_nan`
+ - `is_nil`, `is_null`
+ - `is_valid`
+ - Show in a reduced representation for long array in `inspect`
+ - Support options in aggregatiton functions
+ - Return values in non-arrow object for scalar aggregation functions

  ## [0.1.2] - 2022-05-08 (experimental)

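Note on the 0.1.3 entry in the CHANGELOG diff above: it maps `Vector#&` to `and_kleene` and `Vector#|` to `or_kleene`. Under Kleene (three-valued) logic, a missing operand yields a missing result unless the other operand already decides the outcome. The following is a minimal plain-Ruby sketch of that rule, not RedAmber or Arrow code: `kleene_and` and `kleene_or` are hypothetical helper names, and `nil` stands in for an Arrow null.

```ruby
# Plain-Ruby sketch of Kleene (three-valued) logic.
# kleene_and / kleene_or are hypothetical helpers for illustration only;
# they are not part of RedAmber. nil plays the role of an Arrow null.

def kleene_and(a, b)
  return false if a == false || b == false # a definite false decides the result
  return nil if a.nil? || b.nil?           # otherwise an unknown operand stays unknown
  true
end

def kleene_or(a, b)
  return true if a == true || b == true    # a definite true decides the result
  return nil if a.nil? || b.nil?           # otherwise an unknown operand stays unknown
  false
end

left  = [true, true, false, nil]
right = [nil, false, nil, nil]

# Element-wise application, mimicking a binary element-wise Vector function.
and_result = left.zip(right).map { |a, b| kleene_and(a, b) }
or_result  = left.zip(right).map { |a, b| kleene_or(a, b) }

p and_result # => [nil, false, false, nil]
p or_result  # => [true, true, nil, nil]
```

The difference from the non-Kleene variants is visible in the third pair: `false` and a missing value give `false` under `and`, because no value of the missing operand could change the outcome.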