fast_statistics 0.2.0 → 0.2.1
- checksums.yaml +4 -4
- data/README.md +16 -13
- data/Rakefile +0 -12
- data/ext/fast_statistics/extconf.rb +3 -0
- data/lib/fast_statistics/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 98882f948017b8d64fdd334bd0ea4e3475a9beb0e1817e766af4779a9f56aafe
+data.tar.gz: e06600fef9d8026e559928bac3278b606a52a0c0f287320bfc9457d45a4ba7ff
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: '0449c749047a282bc3aaebfeb1c84a5416cd98a727884f5153f25513b7750ad557b7c51cc51356245c7a28a2ed0d096638b3efd9a6a89801e4ad532386f7e7f7'
+data.tar.gz: e3c2ceedb95e765e5e88dfb4ceddc36065f6040e2a066c03f480d75867f7c5763f559e73c26525ce493270cbdf3865cfe9f5e7a83076e85e752e21cb2fb12aad
data/README.md
CHANGED
@@ -1,4 +1,4 @@
-#
+# Fast Statistics :rocket:


 A high performance native ruby extension (written in C++) for computation of
@@ -9,6 +9,10 @@ This gem provides fast computation of descriptive statistics (min, max, mean,
 median, 1st and 3rd quartiles, population standard deviation) for a multivariate
 dataset (represented as a 2D array) in ruby.

+It is **~11x** faster than an optimal algorithm in hand-written ruby, and
+**~4.7x** faster than the next fastest available ruby gem or native extension
+(see [benchmarks](#benchmarks) below).
+
 ## Installation

 Add this line to your application's Gemfile:
@@ -81,10 +85,6 @@ Some alternatives compared are:
 - [Numo::NArray](https://github.com/ruby-numo/numo-narray)
 - Hand-written ruby (using the same algorithm implemented in C++ in this gem)

-Benchmarked on my machine (8th gen i7, sse2), this gem is **~11x**
-faster than an optimal algorithm in hand-written ruby, and **~4.7x** faster than
-the next fastest available native ruby extension (that I tested).
-
 You can reivew the benchmark implementations at `benchmark/benchmark.rb` and run the
 benchmark with `rake benchmark`.

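Reviewer's sketch: the README text in this diff names the statistics computed per variable (min, max, mean, median, 1st and 3rd quartiles, population standard deviation) and compares against a hand-written ruby baseline. A minimal pure-ruby reference for those statistics might look like the following. The method names `percentile`, `describe`, and `describe_all` are hypothetical, and the quartile convention (linear interpolation between closest ranks) is an assumption; this is neither the gem's implementation nor the code in `benchmark/benchmark.rb`.

```ruby
# Hypothetical reference implementation, not the gem's code.
# Assumption: quartiles use linear interpolation between closest ranks;
# the gem may use a different convention.
def percentile(sorted, fraction)
  return sorted.first if sorted.length == 1

  rank = fraction * (sorted.length - 1)
  lower = rank.floor
  upper = rank.ceil
  sorted[lower] + (sorted[upper] - sorted[lower]) * (rank - lower)
end

# Descriptive statistics for one variable (one inner array).
def describe(values)
  raise ArgumentError, "variable must not be empty" if values.empty?

  sorted = values.map(&:to_f).sort
  n = sorted.length
  mean = sorted.sum / n
  # Population variance divides by n (not n - 1).
  variance = sorted.sum { |x| (x - mean)**2 } / n
  {
    min: sorted.first,
    max: sorted.last,
    mean: mean,
    median: percentile(sorted, 0.5),
    q1: percentile(sorted, 0.25),
    q3: percentile(sorted, 0.75),
    standard_deviation: Math.sqrt(variance)
  }
end

# One result hash per variable of a 2D array.
def describe_all(data)
  data.map { |variable| describe(variable) }
end
```

A baseline like this is mainly useful for sanity-checking results on small inputs; it says nothing about the speed figures quoted in the README.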
@@ -124,6 +124,15 @@ first explored performing the computations natively in [this
 repository](https://github.com/Martin-Nyaga/ruby-ffi-simd). The results were
 promising, so I decided to package it as a ruby gem.

+**Note**: This is an early release and should be considered unstable, at least
+until I'm confident in the stability & performance in a real world application
+setting. Feel free to test it out in non-critical scenarios/environments (let
+me know in [this discussion
+thread](https://github.com/Martin-Nyaga/fast_statistics/discussions/1) or by
+filing an issue if you use it!). I'm also not really an expert in C++, so
+reviews & suggestions are welcome.
+
+### How is the performance achieved?
 The following factors combined help this gem achieve high performance compared
 to available native alternatives and hand-written computations in ruby:

@@ -139,20 +148,14 @@ to available native alternatives and hand-written computations in ruby:
 where possible, giving an additional speed advantage while still being single
 threaded.

-
+### Limitations of the current implementation
+The speed gains notwithstanding, there are some limitations in the current implementation:
 - The variables in the 2D array must all have the same number of data points
 (inner arrays must have the same length) and contain only numbers (i.e. no
 `nil` awareness is present).
 - There is currently no API to calculate single statistics (although this may be
 made available in the future).

-This is an early release and should be considered unstable, at least until I'm
-confident in the stability & performance in a real world application setting
-(let me know in [the Welcome discussion
-thread](https://github.com/Martin-Nyaga/fast_statistics/discussions/1) if you
-use it!). I'm also not really an expert in C++, so reviews & suggestions are
-welcome.
-
 ## Contributing

 Bug reports and pull requests are welcome on GitHub at
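Reviewer's sketch: the limitations listed in the README hunk above (inner arrays of equal length, numbers only, no `nil` awareness) suggest validating input before handing it to the extension. The helper below is one way a caller might do that in plain ruby. `validate_rectangular_numeric!` is a hypothetical name and is not part of the gem's API; how the gem itself reacts to invalid input is not stated in this diff.

```ruby
# Hypothetical caller-side pre-check, not part of fast_statistics.
# Raises ArgumentError unless `data` is a non-empty 2D array whose inner
# arrays all have the same length and contain only Numeric values.
def validate_rectangular_numeric!(data)
  unless data.is_a?(Array) && !data.empty?
    raise ArgumentError, "expected a non-empty 2D array"
  end

  expected_length = nil
  data.each_with_index do |variable, i|
    raise ArgumentError, "variable #{i} is not an array" unless variable.is_a?(Array)

    expected_length ||= variable.length
    unless variable.length == expected_length
      raise ArgumentError,
            "variable #{i} has #{variable.length} values, expected #{expected_length}"
    end

    # nil and other non-numeric values are rejected here.
    bad_index = variable.index { |value| !value.is_a?(Numeric) }
    if bad_index
      raise ArgumentError, "variable #{i} has a non-numeric value at index #{bad_index}"
    end
  end

  data
end
```

Returning `data` unchanged lets the check be chained in front of whatever call consumes the array.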
data/Rakefile
CHANGED
@@ -22,15 +22,3 @@ task :benchmark => [:clean, :compile] do
 bench.compare_results!
 bench.benchmark_ips!
 end
-
-task :profile => [:clean, :compile] do
-require "fast_statistics"
-$stdout.sync = true
-
-variables = 12
-length = 100_000
-data = (0..(variables - 1)).map { (0..(length - 1)).map { rand } }
-FastStatistics::Array2D.new(data, dtype: :float).mean.to_a
-puts
-puts
-end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: fast_statistics
 version: !ruby/object:Gem::Version
-version: 0.2.
+version: 0.2.1
 platform: ruby
 authors:
 - Martin Nyaga
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2021-
+date: 2021-07-01 00:00:00.000000000 Z
 dependencies: []
 description: Fast computation of descriptive statistics in ruby using native code
 and SIMD