ruby_scientist_and_graphics 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/CHANGELOG.md +5 -0
- data/CODE_OF_CONDUCT.md +132 -0
- data/LICENSE.txt +21 -0
- data/README.md +142 -0
- data/Rakefile +12 -0
- data/demo.rb +50 -0
- data/lib/ruby_scientist_and_graphics/backends/rover_adapter.rb +102 -0
- data/lib/ruby_scientist_and_graphics/dataframe.rb +216 -0
- data/lib/ruby_scientist_and_graphics/dataset.rb +57 -0
- data/lib/ruby_scientist_and_graphics/interface.rb +102 -0
- data/lib/ruby_scientist_and_graphics/io.rb +49 -0
- data/lib/ruby_scientist_and_graphics/ml.rb +168 -0
- data/lib/ruby_scientist_and_graphics/plotter.rb +26 -0
- data/lib/ruby_scientist_and_graphics/stats.rb +48 -0
- data/lib/ruby_scientist_and_graphics/utils.rb +31 -0
- data/lib/ruby_scientist_and_graphics/version.rb +5 -0
- data/lib/ruby_scientist_and_graphics.rb +42 -0
- data/sig/ruby_scientist_and_graphics.rbs +4 -0
- metadata +106 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA256:
  metadata.gz: 8614de7395cbeab432fe7481fd46dc03099915f97e88e8396d089111a51f9fd8
  data.tar.gz: ea200fdc2b3099ff2bc6b4652c4d0869e0819eb1689dc87a77795c44a3a811e8
SHA512:
  metadata.gz: eda95e7acb3db61c1f4ad914b0ce319ba2e2370c48ee74f9d297283f540699cad28c97cdb9b24c893528585bbb54d5ae46fb0b060487e8b67396ef7cd9969932
  data.tar.gz: 40db38c0e09c3d441c0e8373362a6f9dd27e30b77a6736db7fd480c443c8d194e015d690e70c9110da07ec19c25ca5130ce8182d74166e0bf2ccabb0cb093eb1
data/CHANGELOG.md
ADDED
data/CODE_OF_CONDUCT.md
ADDED
@@ -0,0 +1,132 @@
# Contributor Covenant Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, caste, color, religion, or sexual
identity and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.

## Our Standards

Examples of behavior that contributes to a positive environment for our
community include:

* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
  and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall
  community

Examples of unacceptable behavior include:

* The use of sexualized language or imagery, and sexual attention or advances of
  any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email address,
  without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
  professional setting

## Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.

Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.

## Scope

This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official email address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
[INSERT CONTACT METHOD].
All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the
reporter of any incident.

## Enforcement Guidelines

Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:

### 1. Correction

**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.

**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.

### 2. Warning

**Community Impact**: A violation through a single incident or series of
actions.

**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or permanent
ban.

### 3. Temporary Ban

**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.

**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.

### 4. Permanent Ban

**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.

**Consequence**: A permanent ban from any sort of public interaction within the
community.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.1, available at
[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].

Community Impact Guidelines were inspired by
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].

For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
[https://www.contributor-covenant.org/translations][translations].

[homepage]: https://www.contributor-covenant.org
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2025 jtvaldivia

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,142 @@
# RubyScientistAndGraphics

Lightweight data science toolkit for Ruby: load/clean data, get quick stats, plot charts, and train simple ML models, all in one gem with zero heavy dependencies.

It ships with a minimal in-house DataFrame (no Daru required), Gruff for plotting, and tiny implementations for statistics and ML (linear regression and k-means).

## Features

- Load and save CSV/JSON, plus save/load a simple “project” (columns + rows).
- Data cleaning helpers: remove columns, fill missing values, limit rows.
- Quick stats: per-column mean/min/max and Pearson correlation.
- Plotting: bar and line charts via Gruff.
- ML: linear regression (least squares) and k-means clustering.
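As a rough illustration of what the correlation helper computes, here is a minimal standalone Pearson sketch in plain Ruby (the `pearson` function below is illustrative only, not the gem's API; the gem exposes this via `Stats#correlation`):

```ruby
# Pearson correlation: r = cov(x, y) / (stddev(x) * stddev(y))
def pearson(xs, ys)
  n = xs.size.to_f
  mx = xs.sum / n
  my = ys.sum / n
  cov = xs.zip(ys).sum { |x, y| (x - mx) * (y - my) }
  sx = Math.sqrt(xs.sum { |x| (x - mx)**2 })
  sy = Math.sqrt(ys.sum { |y| (y - my)**2 })
  cov / (sx * sy)
end

# A perfectly linear relationship gives r very close to 1.0
puts pearson([1, 2, 3, 4], [10, 20, 30, 40])
```

Values near +1 or -1 indicate a strong linear relationship; values near 0 indicate none.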
## Installation

Clone and use it directly, or add it to your Gemfile from a git source until it is published to RubyGems:

```ruby
gem 'ruby_scientist_and_graphics', git: 'https://github.com/your-user/ruby_scientist_and_graphics'
```

Then install:

```bash
bundle install
```

Ruby 3.2+ is recommended.

## Quick start

Run the demo to see the workflow end to end:

```bash
ruby demo.rb
```

Or use the API:

```ruby
require_relative 'lib/ruby_scientist_and_graphics'

interface = RubyScientistAndGraphics::Interface.new

# 1) Load and clean
interface.load('test/fixtures/sample.csv', remove_columns: [:comentarios], limit: 5)
interface.clean(missing: 0)

# 2) Stats
interface.analyze

# 3) Plot
interface.graph(type: :bar, x: :mes, y: :ventas, file: 'output.png')

# 4) Train a model
model = interface.train_model(type: :linear_regression, features: [:mes], target: :ventas)
preds = model.predict([[1.0], [2.0], [3.0]])

# 5) Save the project
interface.save_project('project.json')

# 6) Load a previously saved project and predict
interface.load_project('project.json')
interface.train_model(type: :linear_regression, features: [:mes], target: :ventas)
preds = interface.predict([[6.0], [7.0]])
```
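Under the hood, `:linear_regression` fits by ordinary least squares. For a single feature this reduces to the familiar closed form, slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). A standalone sketch (the `fit_line` helper is illustrative only, not the gem's API):

```ruby
# Ordinary least squares for one feature: returns [intercept, slope]
def fit_line(xs, ys)
  n = xs.size.to_f
  mx = xs.sum / n
  my = ys.sum / n
  slope = xs.zip(ys).sum { |x, y| (x - mx) * (y - my) } /
          xs.sum { |x| (x - mx)**2 }
  [my - slope * mx, slope]
end

intercept, slope = fit_line([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
puts "y ~ #{intercept.round(2)} + #{slope.round(2)} * x"
```

With several features the gem's ML class solves the same problem through the normal equations and a Cholesky factorization.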
## API overview

- DataFrame (internal): CSV load, indexing by column symbol, `head`, `write_csv`, `map_vectors`, `filter_rows`.
- IO: `load_csv`, `load_json`, `save_csv`, `save_json`, `save_project`, `load_project`.
- Dataset: `remove_columns`, `add_column`, `limit_rows`, `fill_missing`, `head`, `stats`, `plot`.
- Stats: `describe`, `correlation(col1, col2)`.
- Plotter: `bar(x:, y:, file:)`, `line(x:, y:, file:)`.
- Interface: `load`, `clean`, `analyze`, `graph`, `pipeline`, `train_model`, `save_project`, `load_project`, `predict`.
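The `:kmeans` trainer implements Lloyd's algorithm: repeatedly assign each point to its nearest centroid, then move each centroid to the mean of its cluster. A minimal one-dimensional sketch of a single iteration (the `lloyd_step` function is illustrative only, not the gem's API):

```ruby
# One Lloyd iteration over 1-D points: assignment step, then update step.
# Empty clusters keep their previous centroid.
def lloyd_step(points, centroids)
  clusters = Array.new(centroids.size) { [] }
  points.each do |p|
    nearest = (0...centroids.size).min_by { |i| (p - centroids[i]).abs }
    clusters[nearest] << p
  end
  centroids.each_index.map do |i|
    clusters[i].empty? ? centroids[i] : clusters[i].sum(0.0) / clusters[i].size
  end
end

points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]
centroids = [0.0, 5.0]
3.times { centroids = lloyd_step(points, centroids) }
p centroids # two centers, one near 1.0 and one near 9.5
```

The gem's KMeansModel applies the same idea to multi-dimensional feature rows.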
## Adapters (optional backends)

This gem includes a minimal in-house DataFrame that powers all features. If you want more performance or richer operations (group-by, joins, rolling windows, etc.), you can plug a third-party backend behind the same API using a simple adapter pattern.

Potential backends:

- Polars (Ruby bindings): a very fast columnar engine written in Rust.
- Rover-Df: a pure-Ruby DataFrame with a friendly API.

Adapter idea (sketch):

```ruby
module RubyScientistAndGraphics
  module Backends
    class PolarsAdapter
      def self.from_csv(path); end
      def vectors; end
      def [](col); end
      def to_a; end
      # implement the methods used by Dataset/Stats/Plotter
    end
  end
end

# Then inject at app start:
# RubyScientistAndGraphics::DataFrame = RubyScientistAndGraphics::Backends::PolarsAdapter
```

This keeps your application code unchanged while letting you switch engines.

## Development

Set up and run the tests:

```bash
bin/setup
bundle exec rake test
```

Run an interactive console:

```bash
bin/console
```

Build and install locally:

```bash
bundle exec rake install
```

Release flow: bump the version in `lib/ruby_scientist_and_graphics/version.rb`, then:

```bash
bundle exec rake release
```

## Contributing

Pull requests are welcome. Please open an issue to discuss large changes first. See CODE_OF_CONDUCT.md.

## License

MIT License. See LICENSE.txt.
data/Rakefile
ADDED
data/demo.rb
ADDED
@@ -0,0 +1,50 @@
#!/usr/bin/env ruby
require_relative "lib/ruby_scientist_and_graphics"

puts "=== RubyScientistAndGraphics demo ==="

interface = RubyScientistAndGraphics::Interface.new

puts "\nLoading dataset..."
interface.load("test/fixtures/sample.csv", remove_columns: [:comentarios], limit: 5)

puts "\nCleaning data (filling nils with 0)..."
interface.clean(missing: 0)

puts "\nShowing the first rows:"
puts interface.dataset.df.head(5)

puts "\nShowing descriptive statistics:"
interface.analyze

puts "\nGenerating a bar chart (ventas vs mes)..."
interface.graph(type: :bar, x: :mes, y: :ventas, file: "test/output_demo.png")

puts "\nTraining a linear regression model (mes -> ventas)..."
model = interface.train_model(type: :linear_regression, features: [:mes], target: :ventas)
puts "Trained model: #{model.class}"

puts "\nPredictions with the model (mes = 6, 7):"
preds = interface.predict([[6.0], [7.0]])
puts "Predictions: #{preds.inspect}"

puts "\nPearson correlation between mes and ventas:"
corr = interface.dataset.stats.correlation(:mes, :ventas)
puts "correlation(mes, ventas) = #{corr.round(4)}"

puts "\nLine chart (ventas vs mes)..."
interface.graph(type: :line, x: :mes, y: :ventas, file: "test/output_demo_line.png")

puts "\nTraining KMeans with 2 clusters on ventas..."
kmeans = interface.train_model(type: :kmeans, features: [:ventas], clusters: 2)
puts "KMeans model with n_clusters = #{kmeans.n_clusters}"

puts "\nSaving the project to 'test/project.json'..."
interface.save_project("test/project.json")

puts "\nLoading the project from 'test/project.json' and showing the first rows:"
interface2 = RubyScientistAndGraphics::Interface.new
interface2.load_project("test/project.json")
puts interface2.dataset.df.head(3)

puts "\nDemo finished."
data/lib/ruby_scientist_and_graphics/backends/rover_adapter.rb
ADDED
@@ -0,0 +1,102 @@
# Optional backend adapter using Rover::DataFrame
# Requires the 'rover-df' gem when activated via use_backend(:rover)

module RubyScientistAndGraphics
  module Backends
    class RoverAdapter
      # Constructors
      def self.from_csv(path)
        require "rover"
        new(::Rover.read_csv(path))
      end

      def self.rows(rows, order: [])
        require "rover"
        data = order.map.with_index { |k, i| [k.to_sym, rows.map { |r| r[i] }] }.to_h
        new(::Rover::DataFrame.new(data))
      end

      def initialize(df)
        @df = df
      end

      # Align with the internal DataFrame API used in the gem
      def vectors
        Vectors.new(@df.keys.map(&:to_sym))
      end

      class Vectors
        def initialize(keys)
          @keys = keys
        end

        def to_a = @keys
        def include?(key) = @keys.include?(key)
        def each(&block) = @keys.each(&block)
      end

      class Column
        def initialize(values)
          @values = values
        end

        def to_a = @values.dup
        def [](idx) = @values[idx]

        def type
          all_numeric = @values.compact.all? { |v| v.is_a?(Numeric) }
          all_numeric ? :numeric : :object
        end

        def map(&block) = @values.map(&block)
      end

      def [](col)
        values = @df[col.to_s] || @df[col.to_sym] || []
        Column.new(values.to_a)
      end

      def []=(name, values)
        @df[name.to_s] = values
      end

      def delete_vector(col)
        @df.delete(col.to_s)
      end

      def to_a
        keys = vectors.to_a
        (0...size).map do |i|
          keys.map { |k| (@df[k.to_s] || @df[k.to_sym])[i] }
        end
      end

      def size
        @df.size
      end

      def head(n = 5)
        RoverAdapter.new(@df.head(n))
      end

      def write_csv(path)
        # Rover's #to_csv returns a CSV string, so write it out explicitly
        File.write(path, @df.to_csv)
      end

      def map_vectors
        result = {}
        vectors.to_a.each do |k|
          mapped = yield Column.new((@df[k.to_s] || @df[k.to_sym]).to_a)
          result[k.to_sym] = mapped.is_a?(Column) ? mapped.to_a : Array(mapped)
        end
        self.class.new(::Rover::DataFrame.new(result))
      end

      def filter_rows(&block)
        kept = to_a.select(&block)
        self.class.rows(kept, order: vectors.to_a)
      end
    end
  end
end
data/lib/ruby_scientist_and_graphics/dataframe.rb
ADDED
@@ -0,0 +1,216 @@
module RubyScientistAndGraphics
  # Minimal DataFrame covering the API used by this gem's code and tests
  class DataFrame
    # Helper wrapper that mimics Daru's vectors
    class Vectors
      def initialize(keys)
        @keys = keys
      end

      def to_a
        @keys
      end

      def include?(key)
        @keys.include?(key)
      end

      def each(&block)
        @keys.each(&block)
      end
    end

    # Column wrapper with a minimal API
    class Column
      def initialize(values)
        @values = values
      end

      def to_a
        @values.dup
      end

      def [](idx)
        @values[idx]
      end

      # Simple type inference
      def type
        all_numeric = @values.compact.all? { |v| v.is_a?(Numeric) }
        all_numeric ? :numeric : :object
      end

      def map(&block)
        @values.map(&block)
      end
    end

    # Constructors
    def self.from_csv(path)
      require "csv"
      rows = []
      headers = nil
      CSV.foreach(path, headers: true, header_converters: ->(h) { h&.strip&.downcase&.to_sym }) do |row|
        headers ||= row.headers.map(&:to_sym)
        rows << headers.map { |h| coerce_value(row[h]) }
      end
      DataFrame.rows(rows, order: headers || [])
    end

    # Accepts an array of hashes (symbol or string keys) or a hash of arrays
    def initialize(data)
      @columns = {}
      case data
      when Array
        if data.first.is_a?(Hash)
          keys = data.map(&:keys).flatten.uniq.map { |k| k.to_sym }
          keys.each { |k| @columns[k] = [] }
          data.each do |row|
            keys.each { |k| @columns[k] << (row.key?(k) ? row[k] : row[k.to_s]) }
          end
        elsif data.first.is_a?(Array)
          # Assume the first row is the header
          headers = (data.first || []).map { |h| h.to_sym }
          body = data[1..] || []
          @columns = headers.map.with_index { |h, i| [h, body.map { |r| r[i] }] }.to_h
        else
          # Single array -> make a :value column
          @columns[:value] = data
        end
      when Hash
        data.each { |k, v| @columns[k.to_sym] = v.dup }
      else
        raise ArgumentError, "Unsupported data type for DataFrame"
      end
      normalize_column_lengths!
    end

    # Build from rows (array of arrays) and a column order
    def self.rows(rows, order: [])
      cols = order.map { |k| [k.to_sym, []] }.to_h
      rows.each do |r|
        order.each_with_index do |k, i|
          cols[k.to_sym] << r[i]
        end
      end
      new(cols)
    end

    # Basic accessors
    def vectors
      Vectors.new(@columns.keys)
    end

    def [](col)
      Column.new(@columns[col.to_sym] || [])
    end

    def []=(name, values)
      name = name.to_sym
      # Resize to match the current row count; pad with nils if needed
      if values.nil?
        @columns[name] = Array.new(size, nil)
      else
        values = values.dup
        if values.size < size
          values += Array.new(size - values.size, nil)
        elsif values.size > size
          grow_to(values.size)
        end
        @columns[name] = values
      end
    end

    def delete_vector(col)
      @columns.delete(col.to_sym)
    end

    def to_a
      order = vectors.to_a
      (0...size).map do |i|
        order.map { |k| @columns[k][i] }
      end
    end

    def size
      @columns.values.map(&:size).max || 0
    end

    def head(n = 5)
      DataFrame.rows(to_a.first(n), order: vectors.to_a)
    end

    def write_csv(path)
      require "csv"
      order = vectors.to_a
      CSV.open(path, "w") do |csv|
        csv << order
        to_a.each { |row| csv << row }
      end
    end

    # Map each column (vector) and return a new DataFrame with the resulting arrays
    def map_vectors
      result = {}
      @columns.each do |k, arr|
        mapped = yield Column.new(arr)
        result[k] = mapped.is_a?(Column) ? mapped.to_a : Array(mapped)
      end
      DataFrame.new(result)
    end

    # Filter rows with a predicate block that receives an Array of row values
    def filter_rows(&block)
      kept = to_a.select(&block)
      DataFrame.rows(kept, order: vectors.to_a)
    end

    # Pretty-print as a simple table (headers + rows)
    def to_s
      headers = vectors.to_a.map(&:to_s)
      lines = []
      lines << headers.join("\t")
      to_a.each do |row|
        cells = row.map { |v| v.nil? ? "" : v }
        lines << cells.join("\t")
      end
      lines.join("\n")
    end

    # Compact inspect showing the shape
    def inspect
      "#<#{self.class} rows=#{size} cols=#{vectors.to_a.size}>"
    end

    private

    def grow_to(n)
      @columns.each do |k, arr|
        @columns[k] = arr + Array.new(n - arr.size, nil) if arr.size < n
      end
    end

    def normalize_column_lengths!
      max_len = size
      grow_to(max_len)
    end

    class << self
      private

      def coerce_value(v)
        return nil if v.nil? || v == ""

        # Try numeric
        if v.is_a?(String)
          if v =~ /^-?\d+$/
            return v.to_i
          elsif v =~ /^-?\d*\.\d+$/
            return v.to_f
          end
        end
        v
      end
    end
  end
end
data/lib/ruby_scientist_and_graphics/dataset.rb
ADDED
@@ -0,0 +1,57 @@
module RubyScientistAndGraphics
  require "csv"
  require_relative "dataframe"
  class Dataset
    attr_accessor :df

    def initialize(dataframe, options = {})
      @df = dataframe
      apply_options(options)
    end

    # Apply initial configuration
    def apply_options(options)
      remove_columns(options[:remove_columns]) if options[:remove_columns]
      limit_rows(options[:limit]) if options[:limit]
    end

    # Remove columns
    def remove_columns(columns)
      columns.each { |col| @df.delete_vector(col) if @df.vectors.include?(col) }
      self
    end

    # Add a new column
    def add_column(name, values)
      @df[name] = values
      self
    end

    # Limit the number of rows
    def limit_rows(n)
      @df = DataFrame.rows(@df.to_a.first(n), order: @df.vectors.to_a)
      self
    end

    # Replace missing (nil) values
    def fill_missing(value)
      @df = @df.map_vectors { |vector| vector.map { |v| v.nil? ? value : v } }
      self
    end

    # Show the first rows
    def head(n = 5)
      @df.head(n)
    end

    # Quick access to statistics
    def stats
      Stats.new(@df)
    end

    # Quick access to plotting
    def plot
      Plotter.new(@df)
    end
  end
end
data/lib/ruby_scientist_and_graphics/interface.rb
ADDED
@@ -0,0 +1,102 @@
require_relative "ml"
require_relative "io"
require_relative "utils"
# Create the interface and run the whole workflow from one place
module RubyScientistAndGraphics
  class Interface
    attr_reader :dataset, :model

    def initialize
      @dataset = nil
    end

    # 1. Load data
    def load(path, options = {})
      @dataset = Dataset.new(DataFrame.from_csv(path), options)
      self
    end

    # 2. Clean data
    def clean(missing: nil, remove_columns: nil, limit: nil)
      return self unless @dataset

      @dataset.fill_missing(missing) if missing
      @dataset.remove_columns(remove_columns) if remove_columns
      @dataset.limit_rows(limit) if limit
      self
    end

    # 3. Analyze data
    def analyze
      return self unless @dataset

      @dataset.stats.describe
      self
    end

    # 4. Plot
    def graph(x:, y:, type: :bar, file: "output.png")
      return self unless @dataset

      case type
      when :bar
        @dataset.plot.bar(x: x, y: y, file: file)
      when :line
        @dataset.plot.line(x: x, y: y, file: file)
      else
        puts "Unsupported chart type."
      end
      self
    end

    # 5. Full pipeline
    def pipeline(path:, clean_opts: {}, analysis: true, graph_opts: nil)
      load(path)
      clean(**clean_opts)
      analyze if analysis
      graph(**graph_opts) if graph_opts
    end

    # 6. Train simple ML models
    # type: :linear_regression or :kmeans
    # :linear_regression requires a target
    def train_model(type:, features:, target: nil, clusters: 3)
      return nil unless @dataset

      ml = ML.new(@dataset.df)
      case type
      when :linear_regression
        raise ArgumentError, "target is required for linear_regression" unless target

        @model = ml.linear_regression(features: features, target: target)
      when :kmeans
        @model = ml.kmeans(features: features, clusters: clusters)
      else
        raise ArgumentError, "Unsupported model type"
      end
      @model
    end

    # 7b. Load a project and set the dataset
    def load_project(path)
      df = IO.load_project(path)
      @dataset = Dataset.new(df)
      self
    end

    # 8. Predict with the current model
    # data: matrix (Array<Array<Numeric>>) of feature rows
    def predict(data)
      raise "No model trained" unless @model

      @model.predict(data)
    end

    # 7. Save the project (current df as structured JSON)
    def save_project(path)
      return unless @dataset

      IO.save_project(@dataset.df, path)
    end
  end
end
data/lib/ruby_scientist_and_graphics/io.rb
ADDED
@@ -0,0 +1,49 @@
# lib/ruby_scientist_and_graphics/io.rb
require "json"
require "csv"

module RubyScientistAndGraphics
  module IO
    module_function

    require_relative "dataframe"

    # Load CSV
    def load_csv(path)
      DataFrame.from_csv(path)
    end

    # Load JSON
    def load_json(path)
      data = JSON.parse(File.read(path))
      DataFrame.new(data)
    end

    # Export to CSV
    def save_csv(df, path)
      df.write_csv(path)
      puts "Data exported to #{path}"
    end

    # Export to JSON
    def save_json(df, path)
      File.write(path, df.to_a.to_json)
      puts "Data exported to #{path}"
    end

    # Save a project (full structure)
    def save_project(df, path)
      File.write(path, {
        columns: df.vectors.to_a,
        data: df.to_a
      }.to_json)
      puts "Project saved to #{path}"
    end

    # Load a project
    def load_project(path)
      proj = JSON.parse(File.read(path))
      DataFrame.rows(proj["data"], order: proj["columns"].map(&:to_sym))
    end
  end
end
data/lib/ruby_scientist_and_graphics/ml.rb
ADDED
@@ -0,0 +1,168 @@
# lib/ruby_scientist_and_graphics/ml.rb
module RubyScientistAndGraphics
  class ML
    def initialize(df)
      @df = df
    end

    # Train a linear regression model (least squares)
    def linear_regression(features:, target:)
      x = build_matrix(features)
      y = @df[target].to_a.map(&:to_f)
      # add bias (column of 1s)
      x_bias = x.map { |row| [1.0] + row }
      xt = transpose(x_bias)
      xtx = mat_mul(xt, x_bias)
      xty = mat_vec_mul(xt, y)
      w = solve_sym_posdef(xtx, xty) # weight vector
      LinearRegressionModel.new(w)
    end

    # Train K-Means (Lloyd's algorithm)
    def kmeans(features:, clusters: 3, max_iter: 100)
      data = build_matrix(features)
      model = KMeansModel.new(clusters)
      model.fit(data, max_iter: max_iter)
      model
    end

    private

    def build_matrix(features)
      feats = Array(features)
      cols = feats.map { |f| @df[f].to_a.map(&:to_f) }
      rows = @df.size
      (0...rows).map { |i| cols.map { |c| c[i] } }
    end

    def transpose(m)
      return [] if m.empty?

      (0...m.first.size).map { |j| m.map { |row| row[j] } }
    end

    def mat_mul(a, b)
      bt = transpose(b)
      a.map { |row| bt.map { |col| dot(row, col) } }
    end

    def mat_vec_mul(a, v)
      a.map { |row| dot(row, v) }
    end

    def dot(x, y)
      x.each_index.reduce(0.0) { |s, i| s + x[i] * y[i] }
    end

    # Solve (A w = b) for symmetric positive-definite A (Cholesky)
    def solve_sym_posdef(a, b)
      l = cholesky(a)
      # forward substitution: L y = b
      y = Array.new(b.size, 0.0)
      (0...l.size).each do |i|
        sum = 0.0
        (0...i).each { |k| sum += l[i][k] * y[k] }
        y[i] = (b[i] - sum) / l[i][i]
      end
      # backward substitution: L^T w = y
      n = l.size
      w = Array.new(n, 0.0)
      (n - 1).downto(0) do |i|
        sum = 0.0
        (i + 1...n).each { |k| sum += l[k][i] * w[k] }
        w[i] = (y[i] - sum) / l[i][i]
      end
      w
    end

    def cholesky(a)
      n = a.size
      l = Array.new(n) { Array.new(n, 0.0) }
      n.times do |i|
        (0..i).each do |j|
          sum = 0.0
          (0...j).each { |k| sum += l[i][k] * l[j][k] }
          if i == j
            val = a[i][i] - sum
            l[i][j] = val > 0 ? Math.sqrt(val) : 0.0
          else
            l[i][j] = (a[i][j] - sum) / (l[j][j].zero? ? 1.0 : l[j][j])
          end
        end
      end
      l
    end
  end

  class LinearRegressionModel
    def initialize(weights)
      @weights = weights # [bias, w1, w2, ...]
    end

    def predict(x)
      mat = x.map { |row| [1.0] + row.map(&:to_f) }
      mat.map { |row| row.each_index.reduce(0.0) { |s, i| s + row[i] * @weights[i] } }
    end
  end
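The `linear_regression` method above reduces to the normal equations, w solving (XᵀX) w = Xᵀy, factored by Cholesky. A self-contained sketch of the same route on a toy dataset where the answer is known in advance (points on the exact line y = 1 + 2x, so the recovered weights should be ≈ [1.0, 2.0]):

```ruby
# Least squares via normal equations + Cholesky factorization.
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

def cholesky(a)
  n = a.size
  l = Array.new(n) { Array.new(n, 0.0) }
  n.times do |i|
    (0..i).each do |j|
      s = (0...j).sum { |k| l[i][k] * l[j][k] }
      l[i][j] = i == j ? Math.sqrt(a[i][i] - s) : (a[i][j] - s) / l[j][j]
    end
  end
  l
end

# Solve A w = b for symmetric positive-definite A.
def solve_spd(a, b)
  l = cholesky(a)
  n = l.size
  y = Array.new(n, 0.0)   # forward substitution: L y = b
  n.times { |i| y[i] = (b[i] - (0...i).sum { |k| l[i][k] * y[k] }) / l[i][i] }
  w = Array.new(n, 0.0)   # backward substitution: L^T w = y
  (n - 1).downto(0) do |i|
    w[i] = (y[i] - ((i + 1)...n).sum { |k| l[k][i] * w[k] }) / l[i][i]
  end
  w
end

xs = [0.0, 1.0, 2.0, 3.0]
ys = xs.map { |v| 1.0 + 2.0 * v }   # exact line y = 1 + 2x
x  = xs.map { |v| [1.0, v] }        # bias column + feature
xt = x.transpose
xtx = xt.map { |ri| xt.map { |rj| dot(ri, rj) } }
xty = xt.map { |ri| dot(ri, ys) }
weights = solve_spd(xtx, xty)       # ≈ [1.0, 2.0]
```

Since the data is noise-free, the solver recovers the intercept and slope to machine precision.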

  class KMeansModel
    attr_reader :n_clusters

    def initialize(k)
      @n_clusters = k
      @centroids = []
    end

    def fit(data, max_iter: 100)
      k = @n_clusters
      # init: pick first k points (simple, deterministic for tests)
      @centroids = data.first(k).map(&:dup)
      max_iter.times do
        clusters = Array.new(k) { [] }
        data.each do |point|
          idx = nearest_centroid(point)
          clusters[idx] << point
        end
        new_centroids = clusters.map do |pts|
          if pts.empty?
            @centroids.sample || Array.new(data.first.size, 0.0)
          else
            mean_point(pts)
          end
        end
        break if converged?(@centroids, new_centroids)

        @centroids = new_centroids
      end
      self
    end

    def predict(data)
      data.map { |point| nearest_centroid(point) }
    end

    private

    def mean_point(points)
      dims = points.first.size
      sums = Array.new(dims, 0.0)
      points.each { |p| p.each_index { |i| sums[i] += p[i].to_f } }
      sums.map { |s| s / points.size }
    end

    def nearest_centroid(point)
      dists = @centroids.map { |c| squared_distance(c, point) }
      dists.each_with_index.min.last
    end

    def squared_distance(a, b)
      a.each_index.reduce(0.0) { |s, i| s + (a[i].to_f - b[i].to_f)**2 }
    end

    def converged?(a, b)
      a.each_with_index.all? do |cent, i|
        cent.each_index.all? { |j| (cent[j] - b[i][j]).abs < 1e-9 }
      end
    end
  end
end
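`KMeansModel#fit` is plain Lloyd iteration with deterministic first-k initialization. A self-contained sketch of the same loop on two well-separated blobs (toy data, not from the gem), where the clustering outcome is unambiguous:

```ruby
# Lloyd's iteration: assign each point to its nearest centroid,
# recompute centroids as cluster means, repeat.
data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
k = 2
centroids = data.first(k).map(&:dup)   # deterministic init, as in fit

dist2 = ->(a, b) { a.zip(b).sum { |x, y| (x - y)**2 } }

20.times do
  clusters = Array.new(k) { [] }
  data.each do |p|
    clusters[(0...k).min_by { |i| dist2.call(centroids[i], p) }] << p
  end
  # cluster means become the new centroids
  centroids = clusters.map do |pts|
    (0...pts.first.size).map { |d| pts.sum { |pt| pt[d] } / pts.size }
  end
end

labels = data.map { |p| (0...k).min_by { |i| dist2.call(centroids[i], p) } }
# labels groups the first three points together and the last three together
```

This sketch omits the empty-cluster fallback the gem's `fit` carries, since these blobs can never empty a cluster.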
data/lib/ruby_scientist_and_graphics/plotter.rb
ADDED
@@ -0,0 +1,26 @@
module RubyScientistAndGraphics
  class Plotter
    def initialize(df)
      @df = df
    end

    def bar(x:, y:, file: "plot.png")
      g = Gruff::Bar.new
      g.title = "#{y} by #{x}"
      @df[x].to_a.each_with_index do |label, idx|
        g.data(label, [@df[y][idx]])
      end
      g.write(file)
      puts "Chart saved to #{file}"
    end

    def line(x:, y:, file: "plot.png")
      g = Gruff::Line.new
      g.title = "#{y} by #{x}"
      g.labels = @df[x].to_a.each_with_index.map { |v, i| [i, v.to_s] }.to_h
      g.data(y.to_sym, @df[y].to_a)
      g.write(file)
      puts "Chart saved to #{file}"
    end
  end
end
data/lib/ruby_scientist_and_graphics/stats.rb
ADDED
@@ -0,0 +1,48 @@
module RubyScientistAndGraphics
  class Stats
    def initialize(df)
      @df = df
    end

    def describe
      @df.vectors.each do |col|
        col_data = @df[col]
        next unless col_data.type == :numeric

        data = col_data.to_a.compact.map(&:to_f)
        next if data.empty?

        mean = data.sum / data.size
        min = data.min
        max = data.max
        puts "#{col}: Mean=#{mean.round(2)}, Min=#{min}, Max=#{max}"
      end
    end

    def correlation(col1, col2)
      x = @df[col1].to_a.compact.map(&:to_f)
      y = @df[col2].to_a.compact.map(&:to_f)
      n = [x.size, y.size].min
      x = x.first(n)
      y = y.first(n)
      return 0.0 if n == 0

      mean_x = x.sum / n
      mean_y = y.sum / n
      num = 0.0
      den_x = 0.0
      den_y = 0.0
      n.times do |i|
        dx = x[i] - mean_x
        dy = y[i] - mean_y
        num += dx * dy
        den_x += dx * dx
        den_y += dy * dy
      end
      den = Math.sqrt(den_x * den_y)
      return 0.0 if den.zero?

      num / den
    end
  end
end
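`Stats#correlation` computes Pearson's r. The same accumulation, extracted into a standalone function over plain arrays (the sample vectors are made up, chosen so the expected coefficients are exactly ±1):

```ruby
# Pearson correlation coefficient of two equal-length arrays.
def pearson(x, y)
  n = x.size
  mean_x = x.sum / n.to_f
  mean_y = y.sum / n.to_f
  num = den_x = den_y = 0.0
  n.times do |i|
    dx = x[i] - mean_x
    dy = y[i] - mean_y
    num   += dx * dy
    den_x += dx * dx
    den_y += dy * dy
  end
  den = Math.sqrt(den_x * den_y)
  den.zero? ? 0.0 : num / den   # 0.0 fallback for constant columns, as in Stats
end

r_pos = pearson([1, 2, 3, 4], [2, 4, 6, 8])   # perfectly linear
r_neg = pearson([1, 2, 3, 4], [8, 6, 4, 2])   # perfectly inverse
```

Perfectly linear data yields r = 1.0 and perfectly inverse data r = -1.0, which makes the function easy to spot-check.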
data/lib/ruby_scientist_and_graphics/utils.rb
ADDED
@@ -0,0 +1,31 @@
# lib/ruby_scientist_and_graphics/utils.rb
module RubyScientistAndGraphics
  module Utils
    module_function

    # Normalize values to the 0-1 range
    def normalize(array)
      min = array.min.to_f
      max = array.max.to_f
      array.map { |v| (v.to_f - min) / (max - min) }
    end

    # Standardize values (mean 0, standard deviation 1)
    def standardize(array)
      mean = array.sum.to_f / array.size
      stddev = Math.sqrt(array.map { |v| (v.to_f - mean)**2 }.sum / array.size)
      array.map { |v| (v.to_f - mean) / stddev }
    end

    # One-hot encoding for categorical values
    def one_hot_encode(array)
      categories = array.uniq
      array.map { |val| categories.map { |c| val == c ? 1 : 0 } }
    end

    # Drop rows containing nil values
    def drop_na(df)
      df.filter_rows { |row| !row.include?(nil) }
    end
  end
end
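Note that `normalize` above divides by `max - min`, which is zero for a constant column. A standalone sketch of the same min-max scheme with that guard added, alongside the one-hot layout `one_hot_encode` produces (the guard is a suggested hardening, not present in the gem):

```ruby
# Min-max normalization with a constant-column guard.
def minmax_normalize(array)
  min, max = array.minmax.map(&:to_f)
  range = max - min
  return array.map { 0.0 } if range.zero?   # avoid division by zero
  array.map { |v| (v.to_f - min) / range }
end

# One-hot encoding: category order follows first appearance.
def one_hot(array)
  categories = array.uniq
  array.map { |val| categories.map { |c| val == c ? 1 : 0 } }
end

norm = minmax_normalize([10, 20, 30])   # [0.0, 0.5, 1.0]
flat = minmax_normalize([5, 5, 5])      # [0.0, 0.0, 0.0] thanks to the guard
hot  = one_hot(%w[red blue red])        # [[1, 0], [0, 1], [1, 0]]
```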
data/lib/ruby_scientist_and_graphics.rb
ADDED
@@ -0,0 +1,42 @@
require "gruff"

require_relative "ruby_scientist_and_graphics/dataframe"
require_relative "ruby_scientist_and_graphics/dataset"
require_relative "ruby_scientist_and_graphics/plotter"
require_relative "ruby_scientist_and_graphics/stats"
require_relative "ruby_scientist_and_graphics/interface"
require_relative "ruby_scientist_and_graphics/version"

module RubyScientistAndGraphics
  def self.load_csv(path, options = {})
    Dataset.new(DataFrame.from_csv(path), options)
  end

  # Simple configuration holder
  module Config
    class << self
      attr_accessor :backend
    end
  end

  # Keep a reference to the original internal DataFrame
  ORIG_DATAFRAME = DataFrame unless const_defined?(:ORIG_DATAFRAME)

  # Switch the backend at runtime. Supported: :internal, :rover
  def self.use_backend(backend)
    case backend
    when :internal
      remove_const(:DataFrame) if const_defined?(:DataFrame)
      const_set(:DataFrame, ORIG_DATAFRAME)
      Config.backend = :internal
    when :rover
      require_relative "ruby_scientist_and_graphics/backends/rover_adapter"
      remove_const(:DataFrame) if const_defined?(:DataFrame)
      const_set(:DataFrame, RubyScientistAndGraphics::Backends::RoverAdapter)
      Config.backend = :rover
    else
      raise ArgumentError, "Unknown backend: #{backend}"
    end
    true
  end
end
metadata
ADDED
@@ -0,0 +1,106 @@
--- !ruby/object:Gem::Specification
name: ruby_scientist_and_graphics
version: !ruby/object:Gem::Version
  version: 0.1.0
platform: ruby
authors:
- jtvaldivia
autorequire:
bindir: exe
cert_chain: []
date: 2025-08-10 00:00:00.000000000 Z
dependencies:
- !ruby/object:Gem::Dependency
  name: csv
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
- !ruby/object:Gem::Dependency
  name: gruff
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
description: |2
  RubyScience is a gem that brings together practical data-science utilities for Ruby.
  It includes its own minimal DataFrame for data manipulation and cleaning, plus Gruff for
  visualization, all under a unified, customizable API.

  Main features:
  - Load data from CSV and other formats.
  - Clean and transform data (drop columns, handle null values, limit rows).
  - Quick descriptive statistics and correlations.
  - Bar and line charts with customizable options.
  - A simple API inspired by Python's pandas, adapted to Ruby style.

  Ideal for analysts, data scientists, and Ruby developers who need to explore data
  without depending on environments such as Python or R.
email:
- josevaldivia9@gmail.com
executables: []
extensions: []
extra_rdoc_files: []
files:
- CHANGELOG.md
- CODE_OF_CONDUCT.md
- LICENSE.txt
- README.md
- Rakefile
- demo.rb
- lib/ruby_scientist_and_graphics.rb
- lib/ruby_scientist_and_graphics/backends/rover_adapter.rb
- lib/ruby_scientist_and_graphics/dataframe.rb
- lib/ruby_scientist_and_graphics/dataset.rb
- lib/ruby_scientist_and_graphics/interface.rb
- lib/ruby_scientist_and_graphics/io.rb
- lib/ruby_scientist_and_graphics/ml.rb
- lib/ruby_scientist_and_graphics/plotter.rb
- lib/ruby_scientist_and_graphics/stats.rb
- lib/ruby_scientist_and_graphics/utils.rb
- lib/ruby_scientist_and_graphics/version.rb
- sig/ruby_scientist_and_graphics.rbs
homepage: https://github.com/jtvaldivia/Ruby_scientist_and_graphics
licenses:
- MIT
metadata:
  homepage_uri: https://github.com/jtvaldivia/Ruby_scientist_and_graphics
  source_code_uri: https://github.com/jtvaldivia/Ruby_scientist_and_graphics
  changelog_uri: https://github.com/jtvaldivia/Ruby_scientist_and_graphics/blob/master/CHANGELOG.md
post_install_message:
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: 3.2.0
required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: '0'
requirements: []
rubygems_version: 3.5.22
signing_key:
specification_version: 4
summary: 'Data Science suite for Ruby: data cleaning, analysis and visualization in
  a single gem.'
test_files: []