fast_statistics 0.2.0 → 0.2.1
- checksums.yaml +4 -4
- data/README.md +16 -13
- data/Rakefile +0 -12
- data/ext/fast_statistics/extconf.rb +3 -0
- data/lib/fast_statistics/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 98882f948017b8d64fdd334bd0ea4e3475a9beb0e1817e766af4779a9f56aafe
+data.tar.gz: e06600fef9d8026e559928bac3278b606a52a0c0f287320bfc9457d45a4ba7ff
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: '0449c749047a282bc3aaebfeb1c84a5416cd98a727884f5153f25513b7750ad557b7c51cc51356245c7a28a2ed0d096638b3efd9a6a89801e4ad532386f7e7f7'
+data.tar.gz: e3c2ceedb95e765e5e88dfb4ceddc36065f6040e2a066c03f480d75867f7c5763f559e73c26525ce493270cbdf3865cfe9f5e7a83076e85e752e21cb2fb12aad
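A `.gem` file is a tar archive whose members include `metadata.gz`, `data.tar.gz` and a gzipped `checksums.yaml`, so these digests can be checked locally. The sketch below assumes the archive has already been unpacked (for example with `tar -xf fast_statistics-0.2.1.gem`) into a directory, and recomputes SHA256 and SHA512 in the same shape as `checksums.yaml`. The helper name `gem_checksums` is hypothetical and not part of RubyGems.

```ruby
require "digest"

# Hypothetical helper: recompute the digests recorded in a gem's checksums.yaml.
# Assumes `dir` contains the unpacked metadata.gz and data.tar.gz members.
def gem_checksums(dir)
  members = %w[metadata.gz data.tar.gz]
  {
    "SHA256" => members.to_h { |m| [m, Digest::SHA256.file(File.join(dir, m)).hexdigest] },
    "SHA512" => members.to_h { |m| [m, Digest::SHA512.file(File.join(dir, m)).hexdigest] }
  }
end
```

For the 0.2.1 package, `gem_checksums("unpacked")["SHA256"]["metadata.gz"]` should equal the `+metadata.gz` SHA256 value in the diff; a mismatch suggests the downloaded file differs from the published one.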
data/README.md
CHANGED
@@ -1,4 +1,4 @@
-#
+# Fast Statistics :rocket:
 ![Build Status](https://travis-ci.com/Martin-Nyaga/fast_statistics.svg?branch=master)
 
 A high performance native ruby extension (written in C++) for computation of
@@ -9,6 +9,10 @@ This gem provides fast computation of descriptive statistics (min, max, mean,
 median, 1st and 3rd quartiles, population standard deviation) for a multivariate
 dataset (represented as a 2D array) in ruby.
 
+It is **~11x** faster than an optimal algorithm in hand-written ruby, and
+**~4.7x** faster than the next fastest available ruby gem or native extension
+(see [benchmarks](#benchmarks) below).
+
 ## Installation
 
 Add this line to your application's Gemfile:
@@ -81,10 +85,6 @@ Some alternatives compared are:
 - [Numo::NArray](https://github.com/ruby-numo/numo-narray)
 - Hand-written ruby (using the same algorithm implemented in C++ in this gem)
 
-Benchmarked on my machine (8th gen i7, sse2), this gem is **~11x**
-faster than an optimal algorithm in hand-written ruby, and **~4.7x** faster than
-the next fastest available native ruby extension (that I tested).
-
 You can reivew the benchmark implementations at `benchmark/benchmark.rb` and run the
 benchmark with `rake benchmark`.
 
@@ -124,6 +124,15 @@ first explored performing the computations natively in [this
 repository](https://github.com/Martin-Nyaga/ruby-ffi-simd). The results were
 promising, so I decided to package it as a ruby gem.
 
+**Note**: This is an early release and should be considered unstable, at least
+until I'm confident in the stability & performance in a real world application
+setting. Feel free to test it out in non-critical scenarios/environments (let
+me know in [this discussion
+thread](https://github.com/Martin-Nyaga/fast_statistics/discussions/1) or by
+filing an issue if you use it!). I'm also not really an expert in C++, so
+reviews & suggestions are welcome.
+
+### How is the performance achieved?
 The following factors combined help this gem achieve high performance compared
 to available native alternatives and hand-written computations in ruby:
 
@@ -139,20 +148,14 @@ to available native alternatives and hand-written computations in ruby:
 where possible, giving an additional speed advantage while still being single
 threaded.
 
-
+### Limitations of the current implementation
+The speed gains notwithstanding, there are some limitations in the current implementation:
 - The variables in the 2D array must all have the same number of data points
 (inner arrays must have the same length) and contain only numbers (i.e. no
 `nil` awareness is present).
 - There is currently no API to calculate single statistics (although this may be
 made available in the future).
 
-This is an early release and should be considered unstable, at least until I'm
-confident in the stability & performance in a real world application setting
-(let me know in [the Welcome discussion
-thread](https://github.com/Martin-Nyaga/fast_statistics/discussions/1) if you
-use it!). I'm also not really an expert in C++, so reviews & suggestions are
-welcome.
-
 ## Contributing
 
 Bug reports and pull requests are welcome on GitHub at
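To make the statistics listed in the README concrete (min, max, mean, median, 1st and 3rd quartiles, population standard deviation per inner array), here is a minimal pure-Ruby sketch. It is not the gem's C++ algorithm or its `benchmark/benchmark.rb` baseline; the method names `describe_columns` and `percentile` and the result keys are hypothetical. Quartiles use linear interpolation between closest ranks, which is only one of several common conventions, so values may differ from the gem's. The input checks mirror the documented limitations (equal-length inner arrays, numbers only).

```ruby
# Hypothetical pure-Ruby reference, not the gem's implementation.
# `data` is a 2D array: each inner array holds one variable's data points.
def describe_columns(data)
  # Mirror the README's documented limitations: equal lengths, numbers only.
  raise ArgumentError, "inner arrays must have the same length" if data.map(&:length).uniq.size > 1
  raise ArgumentError, "inner arrays must not be empty" if data.any?(&:empty?)
  unless data.all? { |col| col.all? { |x| x.is_a?(Numeric) } }
    raise ArgumentError, "only numeric values are supported (no nil)"
  end

  data.map do |col|
    sorted = col.sort
    n = sorted.length
    mean = sorted.sum.fdiv(n)
    # Population variance divides by n, not n - 1.
    variance = sorted.sum { |x| (x - mean)**2 } / n
    {
      min: sorted.first,
      max: sorted.last,
      mean: mean,
      median: percentile(sorted, 0.5),
      q1: percentile(sorted, 0.25),
      q3: percentile(sorted, 0.75),
      standard_deviation: Math.sqrt(variance)
    }
  end
end

# Linear interpolation between closest ranks on an already-sorted array.
def percentile(sorted, p)
  rank = p * (sorted.length - 1)
  lower = rank.floor
  upper = rank.ceil
  weight = rank - lower
  sorted[lower] + (sorted[upper] - sorted[lower]) * weight
end
```

For example, `describe_columns([[2, 4, 4, 4, 5, 5, 7, 9]])` yields a mean of 5.0 and a population standard deviation of 2.0; a sample standard deviation (dividing by n - 1) would be larger.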
data/Rakefile
CHANGED
@@ -22,15 +22,3 @@ task :benchmark => [:clean, :compile] do
 bench.compare_results!
 bench.benchmark_ips!
 end
-
-task :profile => [:clean, :compile] do
-require "fast_statistics"
-$stdout.sync = true
-
-variables = 12
-length = 100_000
-data = (0..(variables - 1)).map { (0..(length - 1)).map { rand } }
-FastStatistics::Array2D.new(data, dtype: :float).mean.to_a
-puts
-puts
-end
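The removed `:profile` task built 12 variables of 100,000 random floats and computed their means with `FastStatistics::Array2D`. For a rough pure-Ruby point of comparison on the same data shape, a stdlib-only timing sketch might look like this. It does not call the gem; `time_ruby_means` is a hypothetical name, and unlike the original task (which used unseeded `rand`) it uses a seeded `Random` so runs are repeatable.

```ruby
# Hypothetical helper: build data shaped like the removed :profile task
# (`variables` inner arrays of `length` random floats) and time a pure-Ruby
# mean of each variable. It does not call the gem.
def time_ruby_means(variables: 12, length: 100_000, seed: 42)
  rng = Random.new(seed) # seeded so repeated runs use the same data
  data = Array.new(variables) { Array.new(length) { rng.rand } }

  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  means = data.map { |col| col.sum / col.length }
  seconds = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

  [means, seconds]
end

means, seconds = time_ruby_means
puts format("computed %d means in %.4f s", means.length, seconds)
```

Only the mean loop is timed, not data generation, so the figure is comparable in spirit to what the task profiled; absolute numbers depend on the machine and Ruby version.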
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: fast_statistics
 version: !ruby/object:Gem::Version
-version: 0.2.
+version: 0.2.1
 platform: ruby
 authors:
 - Martin Nyaga
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2021-
+date: 2021-07-01 00:00:00.000000000 Z
 dependencies: []
 description: Fast computation of descriptive statistics in ruby using native code
 and SIMD
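This release bumps only the patch component of the version (0.2.0 to 0.2.1). As a quick check with RubyGems' own `Gem::Version` and `Gem::Requirement` classes, a pessimistic constraint on the 0.2 series picks up the new release while an exact pin does not. The constraint strings below are illustrative assumptions, not taken from the gem or any Gemfile.

```ruby
require "rubygems"

# Illustrative constraints only; they are not taken from fast_statistics.
old_version = Gem::Version.new("0.2.0")
new_version = Gem::Version.new("0.2.1")

patch_series = Gem::Requirement.new("~> 0.2.0") # means >= 0.2.0 and < 0.3
exact_old = Gem::Requirement.new("= 0.2.0")

puts new_version > old_version               # prints true
puts patch_series.satisfied_by?(new_version) # prints true
puts exact_old.satisfied_by?(new_version)    # prints false
```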