tomoto 0.6.0-aarch64-linux → 0.6.2-aarch64-linux

@@ -16,7 +16,7 @@ You can get 5~10 times speed by just replacing old Eigen's Random or unvectoriza
16
16
  * 5~10 times faster than non-vectorized functions
17
17
  * Header-only (like Eigen)
18
18
  * Can be easily integrated with Eigen's expressions
19
- * Currently supports only x86, x86-64(up to AVX2), and ARM64 NEON (experimental) architecture.
19
+ * Currently supports only x86, x86-64 (up to AVX2), and ARM64 NEON architectures.
20
20
 
21
21
  ## Requirement
22
22
 
@@ -46,6 +46,13 @@ You can specify additional compiler arguments including target machine options (
46
46
  $ cmake -DCMAKE_BUILD_TYPE=Release -DEIGENRAND_CXX_FLAGS="-march=native" ..
47
47
  ```
48
48
 
49
+ Alternatively, with CMake 3.21 or later, you can use a CMake preset to compile EigenRand; this also integrates nicely with VSCode.
50
+ ```console
51
+ cmake --preset default
52
+ cmake --build --preset default
53
+ ctest --preset default
54
+ ```
55
+
49
56
  ## Documentation
50
57
 
51
58
  https://bab2min.github.io/eigenrand/
@@ -54,33 +61,37 @@ https://bab2min.github.io/eigenrand/
54
61
 
55
62
  ### Random distributions for real types
56
63
 
57
- | Function | Generator | Scalar Type | Description | Equivalent to |
58
- |:---:|:---:|:---:|:---:|:---:|
59
- | `Eigen::Rand::balanced` | `Eigen::Rand::BalancedGen` | float, double | generates real values in the [-1, 1] range | `Eigen::DenseBase<Ty>::Random` for floating point types |
60
- | `Eigen::Rand::beta` | `Eigen::Rand::BetaGen` | float, double | generates real values on a [beta distribution](https://en.wikipedia.org/wiki/Beta_distribution) | |
61
- | `Eigen::Rand::cauchy` | `Eigen::Rand::CauchyGen` | float, double | generates real values on the [Cauchy distribution](https://en.wikipedia.org/wiki/Cauchy_distribution). | `std::cauchy_distribution` |
62
- | `Eigen::Rand::chiSquared` | `Eigen::Rand::ChiSquaredGen` | float, double | generates real values on a [chi-squared distribution](https://en.wikipedia.org/wiki/Chi-squared_distribution). | `std::chi_squared_distribution` |
63
- | `Eigen::Rand::exponential` | `Eigen::Rand::ExponentialGen` | float, double | generates real values on an [exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution). | `std::exponential_distribution` |
64
- | `Eigen::Rand::extremeValue` | `Eigen::Rand::ExtremeValueGen` | float, double | generates real values on an [extreme value distribution](https://en.wikipedia.org/wiki/Generalized_extreme_value_distribution). | `std::extreme_value_distribution` |
65
- | `Eigen::Rand::fisherF` | `Eigen::Rand::FisherFGen` | float, double | generates real values on the [Fisher's F distribution](https://en.wikipedia.org/wiki/F_distribution). | `std::fisher_f_distribution` |
66
- | `Eigen::Rand::gamma` | `Eigen::Rand::GammaGen` | float, double | generates real values on a [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution). | `std::gamma_distribution` |
67
- | `Eigen::Rand::lognormal` | `Eigen::Rand::LognormalGen` | float, double | generates real values on a [lognormal distribution](https://en.wikipedia.org/wiki/Lognormal_distribution). | `std::lognormal_distribution` |
68
- | `Eigen::Rand::normal` | `Eigen::Rand::StdNormalGen`, `Eigen::Rand::NormalGen` | float, double | generates real values on a [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution). | `std::normal_distribution` |
69
- | `Eigen::Rand::studentT` | `Eigen::Rand::StudentTGen` | float, double | generates real values on the [Student's t distribution](https://en.wikipedia.org/wiki/Student%27s_t-distribution). | `std::student_t_distribution` |
70
- | `Eigen::Rand::uniformReal` | `Eigen::Rand::UniformRealGen` | float, double | generates real values in the `[0, 1)` range. | `std::generate_canonical` |
71
- | `Eigen::Rand::weibull` | `Eigen::Rand::WeibullGen` | float, double | generates real values on the [Weibull distribution](https://en.wikipedia.org/wiki/Weibull_distribution). | `std::weibull_distribution` |
64
+ | Function | Generator | Scalar Type | VoP | Description | Equivalent to |
65
+ |:---:|:---:|:---:|:---:|:---:|:---:|
66
+ | `Eigen::Rand::balanced` | `Eigen::Rand::BalancedGen` | float, double | Yes | generates real values in the [-1, 1] range | `Eigen::DenseBase<Ty>::Random` for floating point types |
67
+ | `Eigen::Rand::beta` | `Eigen::Rand::BetaGen` | float, double | | generates real values on a [beta distribution](https://en.wikipedia.org/wiki/Beta_distribution) | |
68
+ | `Eigen::Rand::cauchy` | `Eigen::Rand::CauchyGen` | float, double | Yes | generates real values on the [Cauchy distribution](https://en.wikipedia.org/wiki/Cauchy_distribution). | `std::cauchy_distribution` |
69
+ | `Eigen::Rand::chiSquared` | `Eigen::Rand::ChiSquaredGen` | float, double | | generates real values on a [chi-squared distribution](https://en.wikipedia.org/wiki/Chi-squared_distribution). | `std::chi_squared_distribution` |
70
+ | `Eigen::Rand::exponential` | `Eigen::Rand::ExponentialGen` | float, double | Yes | generates real values on an [exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution). | `std::exponential_distribution` |
71
+ | `Eigen::Rand::extremeValue` | `Eigen::Rand::ExtremeValueGen` | float, double | Yes | generates real values on an [extreme value distribution](https://en.wikipedia.org/wiki/Generalized_extreme_value_distribution). | `std::extreme_value_distribution` |
72
+ | `Eigen::Rand::fisherF` | `Eigen::Rand::FisherFGen` | float, double | | generates real values on the [Fisher's F distribution](https://en.wikipedia.org/wiki/F_distribution). | `std::fisher_f_distribution` |
73
+ | `Eigen::Rand::gamma` | `Eigen::Rand::GammaGen` | float, double | | generates real values on a [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution). | `std::gamma_distribution` |
74
+ | `Eigen::Rand::lognormal` | `Eigen::Rand::LognormalGen` | float, double | Yes | generates real values on a [lognormal distribution](https://en.wikipedia.org/wiki/Lognormal_distribution). | `std::lognormal_distribution` |
75
+ | `Eigen::Rand::normal` | `Eigen::Rand::StdNormalGen`, `Eigen::Rand::NormalGen` | float, double | Yes | generates real values on a [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution). | `std::normal_distribution` |
76
+ | `Eigen::Rand::studentT` | `Eigen::Rand::StudentTGen` | float, double | Yes | generates real values on the [Student's t distribution](https://en.wikipedia.org/wiki/Student%27s_t-distribution). | `std::student_t_distribution` |
77
+ | `Eigen::Rand::uniformReal` | `Eigen::Rand::UniformRealGen` | float, double | Yes | generates real values in the `[0, 1)` range. | `std::generate_canonical` |
78
+ | `Eigen::Rand::weibull` | `Eigen::Rand::WeibullGen` | float, double | Yes | generates real values on the [Weibull distribution](https://en.wikipedia.org/wiki/Weibull_distribution). | `std::weibull_distribution` |
79
+
80
+ * VoP indicates 'Vectorization over Parameters'.
72
81
 
73
82
  ### Random distributions for integer types
74
83
 
75
- | Function | Generator | Scalar Type | Description | Equivalent to |
76
- |:---:|:---:|:---:|:---:|:---:|
77
- | `Eigen::Rand::binomial` | `Eigen::Rand::BinomialGen` | int | generates integers on a [binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution). | `std::binomial_distribution` |
78
- | `Eigen::Rand::discrete` | `Eigen::Rand::DiscreteGen` | int | generates random integers on a discrete distribution. | `std::discrete_distribution` |
79
- | `Eigen::Rand::geometric` | `Eigen::Rand::GeometricGen` | int | generates integers on a [geometric distribution](https://en.wikipedia.org/wiki/Geometric_distribution). | `std::geometric_distribution` |
80
- | `Eigen::Rand::negativeBinomial` | `Eigen::Rand::NegativeBinomialGen` | int | generates integers on a [negative binomial distribution](https://en.wikipedia.org/wiki/Negative_binomial_distribution). | `std::negative_binomial_distribution` |
81
- | `Eigen::Rand::poisson` | `Eigen::Rand::PoissonGen` | int | generates integers on the [Poisson distribution](https://en.wikipedia.org/wiki/Poisson_distribution). | `std::poisson_distribution` |
82
- | `Eigen::Rand::randBits` | `Eigen::Rand::RandbitsGen` | int | generates integers with random bits. | `Eigen::DenseBase<Ty>::Random` for integer types |
83
- | `Eigen::Rand::uniformInt` | `Eigen::Rand::UniformIntGen` | int | generates integers in the `[min, max]` range. | `std::uniform_int_distribution` |
84
+ | Function | Generator | Scalar Type | VoP | Description | Equivalent to |
85
+ |:---:|:---:|:---:|:---:|:---:|:---:|
86
+ | `Eigen::Rand::binomial` | `Eigen::Rand::BinomialGen` | int | Yes | generates integers on a [binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution). | `std::binomial_distribution` |
87
+ | `Eigen::Rand::discrete` | `Eigen::Rand::DiscreteGen` | int | | generates random integers on a discrete distribution. | `std::discrete_distribution` |
88
+ | `Eigen::Rand::geometric` | `Eigen::Rand::GeometricGen` | int | | generates integers on a [geometric distribution](https://en.wikipedia.org/wiki/Geometric_distribution). | `std::geometric_distribution` |
89
+ | `Eigen::Rand::negativeBinomial` | `Eigen::Rand::NegativeBinomialGen` | int | | generates integers on a [negative binomial distribution](https://en.wikipedia.org/wiki/Negative_binomial_distribution). | `std::negative_binomial_distribution` |
90
+ | `Eigen::Rand::poisson` | `Eigen::Rand::PoissonGen` | int | | generates integers on the [Poisson distribution](https://en.wikipedia.org/wiki/Poisson_distribution). | `std::poisson_distribution` |
91
+ | `Eigen::Rand::randBits` | `Eigen::Rand::RandbitsGen` | int | | generates integers with random bits. | `Eigen::DenseBase<Ty>::Random` for integer types |
92
+ | `Eigen::Rand::uniformInt` | `Eigen::Rand::UniformIntGen` | int | | generates integers in the `[min, max]` range. | `std::uniform_int_distribution` |
93
+
94
+ * VoP indicates 'Vectorization over Parameters'.
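
As a rough illustration of what the VoP column refers to, the sketch below draws one normal sample per (mean, stdev) pair, so every output element has its own parameters. This is plain standard-library Python and not EigenRand's API; `normal_over_parameters` is a hypothetical name, and the loop only mirrors the element-wise semantics as I understand them, without any SIMD vectorization.

```python
import random


def normal_over_parameters(rng, means, stdevs):
    """Draw one normal sample per (mean, stdev) pair, element-wise.

    Hypothetical helper for illustration only; it is not part of EigenRand.
    """
    if len(means) != len(stdevs):
        raise ValueError("means and stdevs must have the same length")
    return [rng.gauss(m, s) for m, s in zip(means, stdevs)]


rng = random.Random(42)  # fixed seed so repeated runs give the same draws
samples = normal_over_parameters(rng, [0.0, 10.0, -5.0], [1.0, 0.5, 2.0])
print(samples)
```

A generator without VoP support would take a single scalar parameter set for the whole output instead of one per element.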
84
95
 
85
96
  ### Multivariate distributions for real vectors and matrices
86
97
 
@@ -101,259 +112,27 @@ https://bab2min.github.io/eigenrand/
101
112
  | `Eigen::Rand::P8_mt19937_64` | a vectorized version of Mersenne Twister algorithm. Since it generates eight 64bit random integers simultaneously, the random values are the same regardless of architecture. | |
102
113
 
103
114
  ## Performance
104
- The following charts show the relative speed-up of EigenRand compared to references(equivalent functions of C++ std or Eigen).
105
-
106
- ![Perf_no_vect](/doxygen/images/perf_no_vect.png)
107
- ![Perf_no_vect](/doxygen/images/perf_sse2.png)
108
- ![Perf_no_vect](/doxygen/images/perf_avx.png)
109
- ![Perf_no_vect](/doxygen/images/perf_avx2.png)
110
-
111
- The following charts are about multivariate distributions.
112
- ![Perf_no_vect](/doxygen/images/perf_mv_part1.png)
113
- ![Perf_no_vect](/doxygen/images/perf_mv_part2.png)
114
-
115
-
116
- The following result is a measure of the time in seconds it takes to generate 1M random numbers.
117
- It shows the average of 20 times.
118
-
119
- ### Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz (Ubuntu 16.04, gcc5.4)
120
-
121
- | | C++ std (or Eigen) | EigenRand (No Vect.) | EigenRand (SSE2) | EigenRand (SSSE3) | EigenRand (AVX) | EigenRand (AVX2) |
122
- |---|---:|---:|---:|---:|---:|---:|
123
- | `balanced`* | 9.0 | 5.9 | 1.5 | 1.4 | 1.3 | 0.9 |
124
- | `balanced`(double)* | 8.7 | 6.4 | 3.3 | 2.9 | 1.7 | 1.7 |
125
- | `binomial(20, 0.5)` | 400.8 | 118.5 | 32.7 | 36.6 | 30.0 | 22.7 |
126
- | `binomial(50, 0.01)` | 71.7 | 22.5 | 7.7 | 8.3 | 7.9 | 6.6 |
127
- | `binomial(100, 0.75)` | 340.5 | 454.5 | 91.7 | 111.5 | 106.3 | 86.4 |
128
- | `cauchy` | 36.1 | 54.4 | 6.1 | 7.1 | 4.7 | 3.9 |
129
- | `chiSquared` | 80.5 | 249.5 | 64.6 | 58.0 | 29.4 | 28.8 |
130
- | `discrete`(int32) | - | 14.0 | 2.9 | 2.6 | 2.4 | 1.7 |
131
- | `discrete`(fp32) | - | 21.9 | 4.3 | 4.0 | 3.6 | 3.0 |
132
- | `discrete`(fp64) | 72.4 | 21.4 | 6.9 | 6.5 | 4.9 | 3.7 |
133
- | `exponential` | 31.0 | 25.3 | 5.5 | 5.3 | 3.3 | 2.9 |
134
- | `extremeValue` | 66.0 | 60.1 | 11.9 | 10.7 | 6.5 | 5.8 |
135
- | `fisherF(1, 1)` | 178.1 | 35.1 | 33.2 | 39.3 | 22.9 | 18.7 |
136
- | `fisherF(5, 5)` | 141.8 | 415.2 | 136.47 | 172.4 | 92.4 | 74.9 |
137
- | `gamma(0.2, 1)` | 207.8 | 211.4 | 54.6 | 51.2 | 26.9 | 27.0 |
138
- | `gamma(5, 3)` | 80.9 | 60.0 | 14.3 | 13.3 | 11.4 | 8.0 |
139
- | `gamma(10.5, 1)` | 81.1 | 248.6 | 63.3 | 58.5 | 29.2 | 28.4 |
140
- | `geometric` | 43.0 | 22.4 | 6.7 | 7.4 | 5.8 | |
141
- | `lognormal` | 66.3 | 55.4 | 12.8 | 11.8 | 6.2 | 6.2 |
142
- | `negativeBinomial(10, 0.5)` | 312.0 | 301.4 | 82.9 | 100.6 | 95.3 | 77.9 |
143
- | `negativeBinomial(20, 0.25)` | 483.4 | 575.9 | 125.0 | 158.2 | 148.4 | 119.5 |
144
- | `normal(0, 1)` | 38.1 | 28.5 | 6.8 | 6.2 | 3.8 | 3.7 |
145
- | `normal(2, 3)` | 37.6 | 29.0 | 7.3 | 6.6 | 4.0 | 3.9 |
146
- | `poisson(1)` | 31.8 | 25.2 | 9.8 | 10.8 | 9.7 | 8.2 |
147
- | `poisson(16)` | 231.8 | 274.1 | 66.2 | 80.7 | 74.4 | 64.2 |
148
- | `randBits` | 5.2 | 5.4 | 1.4 | 1.3 | 1.1 | 1.0 |
149
- | `studentT(1)` | 122.7 | 120.1 | 15.3 | 19.2 | 12.6 | 9.4 |
150
- | `studentT(20)` | 102.2 | 111.1 | 15.4 | 19.2 | 12.2 | 9.4 |
151
- | `uniformInt(0~63)` | 22.4 | 4.7 | 1.7 | 1.6 | 1.4 | 1.1 |
152
- | `uniformInt(0~100k)` | 21.8 | 10.1 | 6.2 | 6.7 | 6.6 | 5.4 |
153
- | `uniformReal` | 12.9 | 5.7 | 1.4 | 1.2 | 1.4 | 0.7 |
154
- | `weibull` | 41.0 | 35.8 | 17.7 | 15.5 | 8.5 | 8.5 |
155
-
156
- * Since there is no equivalent class to `balanced` in C++11 std, we used Eigen::DenseBase::Random instead.
157
-
158
- | | C++ std | EigenRand (No Vect.) | EigenRand (SSE2) | EigenRand (SSSE3) | EigenRand (AVX) | EigenRand (AVX2) |
159
- |---|---:|---:|---:|---:|---:|---:|
160
- | Mersenne Twister(int32) | 4.7 | 5.6 | 4.0 | 3.7 | 3.5 | 3.6 |
161
- | Mersenne Twister(int64) | 5.4 | 5.3 | 4.0 | 3.9 | 3.4 | 2.6 |
162
-
163
- | | Python 3.6 + scipy 1.5.2 + numpy 1.19.2 | EigenRand (No Vect.) | EigenRand (SSE2) | EigenRand (SSSE3) | EigenRand (AVX) | EigenRand (AVX2) |
164
- |---|---:|---:|---:|---:|---:|---:|
165
- | `Dirichlet(4)` | 6.47 | 6.60 | 2.39 | 2.49 | 1.34 | 1.67 |
166
- | `Dirichlet(100)` | 75.95 | 189.97 | 66.60 | 72.11 | 38.86 | 34.98 |
167
- | `InvWishart(4)` | 140.18 | 7.62 | 4.21 | 4.54 | 3.58 | 3.39 |
168
- | `InvWishart(50)` | 1510.47 | 1737.4 | 697.39 | 733.69 | 604.59 | 554.006 |
169
- | `Multinomial(4, t=20)` | 3.32 | 4.12 | 0.95 | 1.06 | 1.00 | 1.03 |
170
- | `Multinomial(4, t=1000)` | 3.51 | 192.51 | 35.99 | 39.58 | 27.84 | 35.45 |
171
- | `Multinomial(100, t=20)` | 69.19 | 4.80 | 2.00 | 2.20 | 2.28 | 2.09 |
172
- | `Multinomial(100, t=1000)` | 139.74 | 179.43 | 49.48 | 56.19 | 40.78 | 43.18 |
173
- | `MvNormal(4)` | 2.32 | 0.96 | 0.36 | 0.37 | 0.25 | 0.30 |
174
- | `MvNormal(100)` | 49.09 | 57.18 | 17.17 | 18.51 | 10.82 | 11.03 |
175
- | `Wishart(4)` | 71.19 | 5.28 | 2.70 | 2.93 | 2.04 | 1.94 |
176
- | `Wishart(50)` | 1185.26 | 1360.49 | 492.91 | 517.44 | 359.03 | 324.60 |
177
-
178
-
179
- ### Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz (macOS 10.15, clang-1103)
180
-
181
- | | C++ std (or Eigen) | EigenRand (No Vect.) | EigenRand (SSE2) | EigenRand (SSSE3) | EigenRand (AVX) |
182
- |---|---:|---:|---:|---:|---:|
183
- | `balanced`* | 6.5 | 7.3 | 1.1 | 1.4 | 1.1 |
184
- | `balanced`(double)* | 6.6 | 7.5 | 2.6 | 3.3 | 2.4 |
185
- | `binomial(20, 0.5)` | 38.8 | 164.9 | 27.7 | 29.3 | 24.9 |
186
- | `binomial(50, 0.01)` | 21.9 | 27.6 | 6.6 | 7.0 | 6.3 |
187
- | `binomial(100, 0.75)` | 52.2 | 421.9 | 93.6 | 94.8 | 89.1 |
188
- | `cauchy` | 36.0 | 30.4 | 5.6 | 5.8 | 4.0 |
189
- | `chiSquared` | 84.4 | 152.2 | 44.1 | 48.7 | 26.2 |
190
- | `discrete`(int32) | - | 12.4 | 2.1 | 2.6 | 2.2 |
191
- | `discrete`(fp32) | - | 23.2 | 3.4 | 3.7 | 3.4 |
192
- | `discrete`(fp64) | 48.6 | 22.9 | 4.2 | 5.0 | 4.6 |
193
- | `exponential` | 22.0 | 18.0 | 4.1 | 4.9 | 3.2 |
194
- | `extremeValue` | 36.2 | 32.0 | 8.7 | 9.5 | 5.1 |
195
- | `fisherF(1, 1)` | 158.2 | 73.1 | 32.3 | 32.1 | 18.1 |
196
- | `fisherF(5, 5)` | 177.3 | 310.1 | 127.0 | 121.8 | 74.3 |
197
- | `gamma(0.2, 1)` | 69.8 | 80.4 | 28.5 | 33.8 | 19.2 |
198
- | `gamma(5, 3)` | 83.9 | 53.3 | 10.6 | 12.4 | 8.6 |
199
- | `gamma(10.5, 1)` | 83.2 | 150.4 | 43.3 | 48.4 | 26.2 |
200
- | `geometric` | 39.6 | 19.0 | 4.3 | 4.4 | 4.1 |
201
- | `lognormal` | 43.8 | 40.7 | 9.0 | 10.8 | 5.7 |
202
- | `negativeBinomial(10, 0.5)` | 217.4 | 274.8 | 71.6 | 73.7 | 68.2 |
203
- | `negativeBinomial(20, 0.25)` | 192.9 | 464.9 | 112.0 | 111.5 | 105.7 |
204
- | `normal(0, 1)` | 32.6 | 28.6 | 5.5 | 6.5 | 3.8 |
205
- | `normal(2, 3)` | 32.9 | 30.5 | 5.7 | 6.7 | 3.9 |
206
- | `poisson(1)` | 37.9 | 31.0 | 7.5 | 7.8 | 7.1 |
207
- | `poisson(16)` | 92.4 | 243.3 | 55.6 | 57.7 | 53.7 |
208
- | `randBits` | 6.5 | 6.5 | 1.1 | 1.3 | 1.1 |
209
- | `studentT(1)` | 115.0 | 54.1 | 15.5 | 15.7 | 8.3 |
210
- | `studentT(20)` | 121.2 | 53.8 | 15.8 | 16.0 | 8.2 |
211
- | `uniformInt(0~63)` | 20.2 | 9.8 | 1.8 | 1.8 | 1.6 |
212
- | `uniformInt(0~100k)` | 25.7 | 16.1 | 8.1 | 8.5 | 7.2 |
213
- | `uniformReal` | 12.7 | 7.0 | 1.0 | 1.2 | 1.1 |
214
- | `weibull` | 23.1 | 19.2 | 11.6 | 13.6 | 7.6 |
215
-
216
- * Since there is no equivalent class to `balanced` in C++11 std, we used Eigen::DenseBase::Random instead.
217
-
218
- | | C++ std | EigenRand (No Vect.) | EigenRand (SSE2) | EigenRand (SSSE3) | EigenRand (AVX) |
219
- |---|---:|---:|---:|---:|---:|
220
- | Mersenne Twister(int32) | 6.2 | 6.4 | 1.7 | 2.0 | 1.8 |
221
- | Mersenne Twister(int64) | 6.4 | 6.3 | 2.5 | 3.1 | 2.4 |
222
-
223
-
224
- | | Python 3.6 + scipy 1.5.2 + numpy 1.19.2 | EigenRand (No Vect.) | EigenRand (SSE2) | EigenRand (SSSE3) | EigenRand (AVX) |
225
- |---|---:|---:|---:|---:|---:|
226
- | `Dirichlet(4)` | 3.54 | 3.29 | 1.25 | 1.25 | 0.83 |
227
- | `Dirichlet(100)` | 57.63 | 145.32 | 49.71 | 49.50 | 29.13 |
228
- | `InvWishart(4)` | 210.92 | 7.53 | 3.72 | 3.66 | 3.10 |
229
- | `InvWishart(50)` | 1980.73 | 1446.40 | 560.40 | 559.73 | 457.07 |
230
- | `Multinomial(4, t=20)` | 2.60 | 5.22 | 1.48 | 1.50 | 1.42 |
231
- | `Multinomial(4, t=1000)` | 3.90 | 208.75 | 29.19 | 29.50 | 27.70 |
232
- | `Multinomial(100, t=20)` | 47.71 | 7.09 | 3.71 | 3.63 | 3.60 |
233
- | `Multinomial(100, t=1000)` | 128.69 | 215.19 | 44.48 | 44.63 | 43.76 |
234
- | `MvNormal(4)` | 2.04 | 1.05 | 0.35 | 0.34 | 0.19 |
235
- | `MvNormal(100)` | 48.69 | 47.10 | 16.25 | 16.12 | 11.41 |
236
- | `Wishart(4)` | 81.11 | 13.24 | 9.87 | 9.81 | 5.90 |
237
- | `Wishart(50)` | 1419.02 | 1087.40 | 448.06 | 442.97 | 328.20 |
238
-
239
-
240
- ### Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz (Windows Server 2019, MSVC2019)
241
-
242
- | | C++ std (or Eigen) | EigenRand (No Vect.) | EigenRand (SSE2) | EigenRand (AVX) | EigenRand (AVX2) |
243
- |---|---:|---:|---:|---:|---:|
244
- | `balanced`* | 20.7 | 7.2 | 3.3 | 4.0 | 2.2 |
245
- | `balanced`(double)* | 21.9 | 8.8 | 6.7 | 4.3 | 4.3 |
246
- | `binomial(20, 0.5)` | 718.3 | 141.0 | 38.1 | 30.2 | 32.7 |
247
- | `binomial(50, 0.01)` | 61.5 | 21.4 | 7.5 | 6.5 | 8.0 |
248
- | `binomial(100, 0.75)` | 495.9 | 1042.5 | 100.6 | 95.2 | 93.0 |
249
- | `cauchy` | 71.6 | 30.0 | 6.8 | 6.4 | 3.0 |
250
- | `chiSquared` | 243.0 | 147.3 | 63.5 | 34.1 | 24.0 |
251
- | `discrete`(int32) | - | 12.4 | 3.5 | 2.7 | 2.2 |
252
- | `discrete`(fp32) | - | 19.2 | 5.1 | 3.6 | 3.7 |
253
- | `discrete`(fp64) | 83.9 | 19.0 | 6.7 | 7.4 | 4.6 |
254
- | `exponential` | 58.7 | 16.0 | 6.8 | 6.4 | 3.0 |
255
- | `extremeValue` | 64.6 | 27.7 | 13.5 | 9.8 | 5.5 |
256
- | `fisherF(1, 1)` | 178.7 | 75.2 | 35.3 | 28.4 | 17.5 |
257
- | `fisherF(5, 5)` | 491.0 | 298.4 | 125.8 | 87.4 | 60.5 |
258
- | `gamma(0.2, 1)` | 211.7 | 69.3 | 43.7 | 24.7 | 18.7 |
259
- | `gamma(5, 3)` | 272.5 | 42.3 | 17.6 | 17.2 | 8.5 |
260
- | `gamma(10.5, 1)` | 237.8 | 146.2 | 63.7 | 33.8 | 23.5 |
261
- | `geometric` | 49.3 | 17.0 | 7.0 | 5.8 | 5.4 |
262
- | `lognormal` | 169.8 | 37.6 | 12.7 | 7.2 | 5.0 |
263
- | `negativeBinomial(10, 0.5)` | 752.7 | 462.3 | 87.0 | 83.0 | 81.6 |
264
- | `negativeBinomial(20, 0.25)` | 611.4 | 855.3 | 123.7 | 125.3 | 116.6 |
265
- | `normal(0, 1)` | 78.4 | 21.1 | 6.9 | 4.6 | 2.9 |
266
- | `normal(2, 3)` | 77.2 | 22.3 | 6.8 | 4.8 | 3.1 |
267
- | `poisson(1)` | 77.4 | 28.9 | 10.0 | 8.1 | 10.1 |
268
- | `poisson(16)` | 312.9 | 485.5 | 63.6 | 61.5 | 60.5 |
269
- | `randBits` | 6.0 | 6.2 | 3.1 | 2.7 | 2.7 |
270
- | `studentT(1)` | 175.8 | 53.9 | 17.3 | 12.5 | 7.7 |
271
- | `studentT(20)` | 173.2 | 55.5 | 17.9 | 12.7 | 7.6 |
272
- | `uniformInt(0~63)` | 39.1 | 5.2 | 2.0 | 1.4 | 1.6 |
273
- | `uniformInt(0~100k)` | 38.5 | 12.3 | 7.6 | 6.0 | 7.7 |
274
- | `uniformReal` | 53.4 | 5.7 | 1.9 | 2.3 | 1.0 |
275
- | `weibull` | 75.1 | 44.3 | 18.5 | 14.3 | 7.9 |
276
-
277
- * Since there is no equivalent class to `balanced` in C++11 std, we used Eigen::DenseBase::Random instead.
278
-
279
- | | C++ std | EigenRand (No Vect.) | EigenRand (SSE2) | EigenRand (AVX) | EigenRand (AVX2) |
280
- |---|---:|---:|---:|---:|---:|
281
- | Mersenne Twister(int32) | 6.5 | 6.4 | 5.6 | 5.1 | 4.5 |
282
- | Mersenne Twister(int64) | 6.6 | 6.5 | 6.9 | 5.9 | 5.1 |
283
-
284
-
285
- | | Python 3.6 + scipy 1.5.2 + numpy 1.19.2 | EigenRand (No Vect.) | EigenRand (SSE2) | EigenRand (AVX) | EigenRand (AVX2) |
286
- |---|---:|---:|---:|---:|---:|
287
- | `Dirichlet(4)` | 4.27 | 3.20 | 2.31 | 1.43 | 1.25 |
288
- | `Dirichlet(100)` | 69.61 | 150.33 | 67.01 | 47.34 | 32.47 |
289
- | `InvWishart(4)` | 482.87 | 14.52 | 8.88 | 13.17 | 11.28 |
290
- | `InvWishart(50)` | 2222.72 | 2211.66 | 902.34 | 775.36 | 610.60 |
291
- | `Multinomial(4, t=20)` | 2.99 | 5.41 | 1.99 | 1.92 | 1.78 |
292
- | `Multinomial(4, t=1000)` | 4.23 | 235.84 | 49.73 | 42.41 | 40.76 |
293
- | `Multinomial(100, t=20)` | 58.20 | 9.12 | 5.84 | 6.02 | 5.98 |
294
- | `Multinomial(100, t=1000)` | 130.54 | 234.40 | 72.99 | 66.36 | 55.28 |
295
- | `MvNormal(4)` | 2.25 | 1.89 | 0.35 | 0.32 | 0.25 |
296
- | `MvNormal(100)` | 57.71 | 68.80 | 24.40 | 18.28 | 13.05 |
297
- | `Wishart(4)` | 70.18 | 16.25 | 4.49 | 3.97 | 3.07 |
298
- | `Wishart(50)` | 1471.29 | 1641.73 | 628.58 | 485.68 | 349.81 |
299
-
300
-
301
- ### AMD Ryzen 7 3700x CPU @ 3.60GHz (Windows 10, MSVC2017)
302
-
303
- | | C++ std (or Eigen) | EigenRand (SSE2) | EigenRand (AVX) | EigenRand (AVX2) |
304
- |---|---:|---:|---:|---:|
305
- | `balanced`* | 20.8 | 1.9 | 2.0 | 1.4 |
306
- | `balanced`(double)* | 21.7 | 4.1 | 2.7 | 3.0 |
307
- | `binomial(20, 0.5)` | 416.0 | 27.7 | 28.9 | 29.1 |
308
- | `binomial(50, 0.01)` | 37.8 | 6.3 | 6.0 | 6.6 |
309
- | `binomial(100, 0.75)` | 309.1 | 72.4 | 66.0 | 67.0 |
310
- | `cauchy` | 42.2 | 4.8 | 5.1 | 2.7 |
311
- | `chiSquared` | 153.8 | 33.5 | 21.2 | 17.0 |
312
- | `discrete`(int32) | - | 2.4 | 2.3 | 2.5 |
313
- | `discrete`(fp32) | - | 2.6 | 2.3 | 3.5 |
314
- | `discrete`(fp64) | 55.8 | 5.1 | 4.7 | 4.3 |
315
- | `exponential` | 33.4 | 6.4 | 2.8 | 2.2 |
316
- | `extremeValue` | 39.4 | 7.8 | 4.6 | 4.0 |
317
- | `fisherF(1, 1)` | 103.9 | 25.3 | 14.9 | 11.7 |
318
- | `fisherF(5, 5)` | 295.7 | 85.5 | 58.3 | 44.8 |
319
- | `gamma(0.2, 1)` | 128.8 | 31.9 | 18.3 | 15.8 |
320
- | `gamma(5, 3)` | 156.1 | 9.7 | 8.0 | 5.0 |
321
- | `gamma(10.5, 1)` | 148.5 | 33.1 | 21.1 | 17.2 |
322
- | `geometric` | 27.1 | 6.6 | 4.3 | 4.1 |
323
- | `lognormal` | 104.0 | 6.6 | 4.7 | 3.5 |
324
- | `negativeBinomial(10, 0.5)` | 462.1 | 60.0 | 56.4 | 58.6 |
325
- | `negativeBinomial(20, 0.25)` | 357.6 | 84.5 | 80.6 | 78.4 |
326
- | `normal(0, 1)` | 48.8 | 4.2 | 3.7 | 2.3 |
327
- | `normal(2, 3)` | 48.8 | 4.5 | 3.8 | 2.4 |
328
- | `poisson(1)` | 46.4 | 7.9 | 7.4 | 8.2 |
329
- | `poisson(16)` | 192.4 | 43.2 | 40.4 | 40.9 |
330
- | `randBits` | 4.2 | 1.7 | 1.5 | 1.8 |
331
- | `studentT(1)` | 107.0 | 12.3 | 6.8 | 5.7 |
332
- | `studentT(20)` | 107.1 | 12.3 | 6.8 | 5.8 |
333
- | `uniformInt(0~63)` | 31.2 | 1.1 | 1.0 | 1.2 |
334
- | `uniformInt(0~100k)` | 27.7 | 5.6 | 5.6 | 5.4 |
335
- | `uniformReal` | 30.7 | 1.1 | 1.0 | 0.6 |
336
- | `weibull` | 46.5 | 10.6 | 6.4 | 5.2 |
115
+ The following charts show the relative speed-up of EigenRand compared to references (equivalent functions of C++ std or Eigen for univariate distributions, and SciPy for multivariate distributions).
337
116
 
338
117
  * Since there is no equivalent class to `balanced` in C++11 std, we used Eigen::DenseBase::Random instead.
118
+ * Cases filled with orange are generators that are slower than reference functions.
339
119
 
340
- | | C++ std | EigenRand (SSE2) | EigenRand (AVX) | EigenRand (AVX2) |
341
- |---|---:|---:|---:|---:|
342
- | Mersenne Twister(int32) | 5.0 | 3.4 | 3.4 | 3.3 |
343
- | Mersenne Twister(int64) | 5.1 | 3.9 | 3.9 | 3.3 |
344
-
345
- ### ARM64 NEON (Cortex-A73)
346
- Currently, Support for ARM64 NEON is experimental and the result may be sub-optimal.
347
- Also keep in mind that NEON does not support vectorization of double type.
348
- So if you use double type generators, they would fallback into scalar computations.
120
+ ### Windows 2019, MSVC 19.29.30147, Intel(R) Xeon(R) Platinum 8171M CPU, AVX2, Eigen 3.4.0
121
+ ![Perf_AVX2_Win](/doxygen/images/perf_avx2_win.png)
122
+ ![Perf_AVX2_Win_Mv1](/doxygen/images/perf_avx2_win_mv1.png)
123
+ ![Perf_AVX2_Win_Mv2](/doxygen/images/perf_avx2_win_mv2.png)
349
124
 
350
- ![Perf_no_vect](/doxygen/images/perf_neon_v0.3.90.png)
125
+ ### Ubuntu 18.04, gcc 7.5.0, Intel(R) Xeon(R) Platinum 8370C CPU, AVX2, Eigen 3.4.0
126
+ ![Perf_AVX2_Ubu](/doxygen/images/perf_avx2_ubu.png)
127
+ ![Perf_AVX2_Ubu_Mv1](/doxygen/images/perf_avx2_ubu_mv1.png)
128
+ ![Perf_AVX2_Ubu_Mv2](/doxygen/images/perf_avx2_ubu_mv2.png)
351
129
 
352
- The following charts are about multivariate distributions.
353
- ![Perf_no_vect](/doxygen/images/perf_mv_part1_neon_v0.3.90.png)
354
- ![Perf_no_vect](/doxygen/images/perf_mv_part2_neon_v0.3.90.png)
130
+ ### macOS Monterey 12.2.1, clang 13.1.6, Apple M1 Pro, NEON, Eigen 3.4.0
131
+ ![Perf_NEON_mac](/doxygen/images/perf_neon_mac.png)
132
+ ![Perf_NEON_mac_Mv1](/doxygen/images/perf_neon_mac_mv1.png)
133
+ ![Perf_NEON_mac_Mv2](/doxygen/images/perf_neon_mac_mv2.png)
355
134
 
356
- Cases filled with orange are generators that are slower than reference functions.
135
+ You can see the detailed numerical values used to plot the above charts on the [Action](https://github.com/bab2min/EigenRand/actions/workflows/release.yml) page.
357
136
 
358
137
  ## Accuracy
359
138
  Since vectorized mathematical functions may have a loss of precision, I measured how well the generated random number fits its actual distribution.
@@ -385,6 +164,15 @@ MIT License
385
164
 
386
165
  ## History
387
166
 
167
+ ### 0.5.1 (2024-09-08)
168
+ * Add AVX512 support
169
+ * Add `EIGENRAND_BUILD_BENCHMARK` cmake option
170
+
171
+ ### 0.5.0 (2023-01-31)
172
+ * Improved the performance of `MultinomialGen`.
173
+ * Implemented vectorization over parameters to some distributions.
174
+ * Optimized the performance of `double`-type generators on NEON architecture.
175
+
388
176
  ### 0.4.1 (2022-08-13)
389
177
  * Fixed a bug where double-type generation with std::mt19937 fails compilation.
390
178
  * Fixed a bug where `UniformIntGen` in scalar mode generates numbers in the wrong range.
@@ -55,9 +55,9 @@ What is tomotopy?
55
55
  ::
56
56
 
57
57
  import tomotopy as tp
58
- print(tp.isa) # prints 'avx2', 'avx', 'sse2' or 'none'
58
+ print(tp.isa) # prints 'avx512', 'avx2', 'sse2' or 'none'
59
59
 
60
- Currently, tomotopy can use the AVX2, AVX or SSE2 SIMD instruction sets for acceleration.
60
+ Currently, tomotopy can use the AVX512, AVX2 or SSE2 SIMD instruction sets for acceleration.
61
61
  When the package is imported, it checks which instruction sets are available in the current environment and automatically loads the best module.
62
62
  If `tp.isa` is `none`, no SIMD instruction set is available in the current environment, so training may take a long time.
63
63
  However, since most recent Intel and AMD CPUs support SIMD instruction sets, SIMD acceleration should improve performance significantly.
@@ -148,6 +148,31 @@ Because CGS and VB are techniques with entirely different approaches, the two
148
148
 
149
149
  See `tomotopy.LDAModel.save` and `tomotopy.LDAModel.load` for more details.
150
150
 
151
+ Interactive Model Viewer
152
+ ------------------------
153
+
154
+ .. raw:: html
155
+
156
+ <video src="https://private-user-images.githubusercontent.com/19266222/355924875-fc9d27f5-5542-4e65-ab69-1d96dc0913af.mp4?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjMwNTI4MTUsIm5iZiI6MTcyMzA1MjUxNSwicGF0aCI6Ii8xOTI2NjIyMi8zNTU5MjQ4NzUtZmM5ZDI3ZjUtNTU0Mi00ZTY1LWFiNjktMWQ5NmRjMDkxM2FmLm1wND9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA4MDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwODA3VDE3NDE1NVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTk1N2YxODE3MzBiZTNhMjkyNTk1OWJkODRmZjc4ZTcyYzFkZGYxZjgxODUxYTNlNGYxMzllOTgzNDI0MjA4ZDImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.evTswIGMps594nQ6JCtbd6puFM8ARHM0emgaluIUxvY" style="max-width:100%"></video>
157
+
158
+ Since v0.13.0, you can also explore topic modeling results through the interactive viewer.
159
+
160
+ ::
161
+
162
+ import tomotopy as tp
163
+ model = tp.LDAModel(...)
164
+ # ... some training code ...
165
+ tp.viewer.open_viewer(model, host="localhost", port=9999)
166
+ # And open http://localhost:9999 in your web browser!
167
+
168
+ If you already have a saved model file, you can also launch the viewer with the following command line.
169
+
170
+ ::
171
+
172
+ python -m tomotopy.viewer a_trained_model.bin --host localhost --port 9999
173
+
174
+ See `tomotopy.viewer` for details.
175
+
151
176
  Documents in the Model and out of the Model
152
177
  -------------------------------------------
153
178
  Topic models can be used for two major purposes.
@@ -540,6 +565,3 @@ Python3 example code for tomotopy is at https://github.com/bab2min/tomotopy/blob/ma
540
565
 
541
566
  * EigenRand: `MIT License
542
567
  <licenses_bundled/EigenRand>`_
543
-
544
- * Mapbox Variant: `BSD License
545
- <licenses_bundled/MapboxVariant>`_
@@ -56,9 +56,9 @@ After installing, you can start tomotopy by just importing.
56
56
  ::
57
57
 
58
58
  import tomotopy as tp
59
- print(tp.isa) # prints 'avx2', 'avx', 'sse2' or 'none'
59
+ print(tp.isa) # prints 'avx512', 'avx2', 'sse2' or 'none'
60
60
 
61
- Currently, tomotopy can exploits AVX2, AVX or SSE2 SIMD instruction set for maximizing performance.
61
+ Currently, tomotopy can exploit the AVX512, AVX2 or SSE2 SIMD instruction sets to maximize performance.
62
62
  When the package is imported, it will check available instruction sets and select the best option.
63
63
  If `tp.isa` is `none`, training iterations may take a long time.
64
64
  However, since most modern Intel and AMD CPUs provide SIMD instruction sets, SIMD acceleration can improve performance considerably.
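
The check described above can be wrapped in a small helper that turns the reported value into a readable note. This is a minimal sketch: `ISA_NOTES` and `describe_isa` are hypothetical names introduced here, and only `tp.isa` and its four documented values come from the text above. Any other value (for example on a non-x86 build) falls through to a generic message.

```python
# Hypothetical helper: map a `tp.isa` value to a short human-readable note.
ISA_NOTES = {
    "avx512": "AVX512 acceleration is active.",
    "avx2": "AVX2 acceleration is active.",
    "sse2": "SSE2 acceleration is active.",
    "none": "No SIMD acceleration; training iterations may take a long time.",
}


def describe_isa(isa):
    """Return a note for a documented value, or a fallback for anything else."""
    return ISA_NOTES.get(isa, "Unrecognized value: %r" % (isa,))


if __name__ == "__main__":
    try:
        import tomotopy as tp
    except ImportError:
        # Keeps the sketch runnable on machines without tomotopy.
        print("tomotopy is not installed")
    else:
        print(tp.isa, "->", describe_isa(tp.isa))
```

Running the check once at start-up makes it easy to spot an environment that has silently fallen back to the slow path.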
@@ -149,6 +149,31 @@ When you load the model from a file, a model type in the file should match the c
149
149
 
150
150
  See more at `tomotopy.LDAModel.save` and `tomotopy.LDAModel.load` methods.
151
151
 
152
+ Interactive Model Viewer
153
+ ------------------------
154
+
155
+ .. raw:: html
156
+
157
+ <video src="https://private-user-images.githubusercontent.com/19266222/355924875-fc9d27f5-5542-4e65-ab69-1d96dc0913af.mp4?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjMwNTI4MTUsIm5iZiI6MTcyMzA1MjUxNSwicGF0aCI6Ii8xOTI2NjIyMi8zNTU5MjQ4NzUtZmM5ZDI3ZjUtNTU0Mi00ZTY1LWFiNjktMWQ5NmRjMDkxM2FmLm1wND9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA4MDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwODA3VDE3NDE1NVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTk1N2YxODE3MzBiZTNhMjkyNTk1OWJkODRmZjc4ZTcyYzFkZGYxZjgxODUxYTNlNGYxMzllOTgzNDI0MjA4ZDImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.evTswIGMps594nQ6JCtbd6puFM8ARHM0emgaluIUxvY" style="max-width:100%"></video>
158
+
159
+ Since v0.13.0, you can explore the results of topic modeling in an interactive viewer.
160
+
161
+ ::
162
+
163
+ import tomotopy as tp
164
+ model = tp.LDAModel(...)
165
+ # ... some training code ...
166
+ tp.viewer.open_viewer(model, host="localhost", port=9999)
167
+ # And open http://localhost:9999 in your web browser!
168
+
169
+ If you have a saved model file, you can also use the following command line.
170
+
171
+ ::
172
+
173
+ python -m tomotopy.viewer a_trained_model.bin --host localhost --port 9999
174
+
175
+ See the `tomotopy.viewer` module for details.
176
+
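
If you want to start the viewer from a script rather than typing the command, one option is to build the same command line and run it as a child process. The sketch below is only an illustration: `viewer_command` and `launch_viewer` are hypothetical helpers, and the only parts taken from the text above are the `tomotopy.viewer` module name and the `--host` and `--port` options.

```python
import subprocess
import sys


def viewer_command(model_path, host="localhost", port=9999):
    """Build the argv list for `python -m tomotopy.viewer` (hypothetical helper)."""
    return [
        sys.executable, "-m", "tomotopy.viewer", str(model_path),
        "--host", host, "--port", str(port),
    ]


def launch_viewer(model_path, host="localhost", port=9999):
    """Start the viewer as a child process and return the Popen handle.

    Assumes tomotopy is installed for the current interpreter.
    """
    return subprocess.Popen(viewer_command(model_path, host, port))


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(viewer_command("a_trained_model.bin")))
```

Using `sys.executable` keeps the child process on the same interpreter, and therefore the same installed tomotopy, as the calling script.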
152
177
  Documents in the Model and out of the Model
153
178
  -------------------------------------------
154
179
  We can use Topic Model for two major purposes.
@@ -545,9 +570,6 @@ Bundled Libraries and Their License
545
570
  * EigenRand: `MIT License
546
571
  <licenses_bundled/EigenRand>`_
547
572
 
548
- * Mapbox Variant: `BSD License
549
- <licenses_bundled/MapboxVariant>`_
550
-
551
573
  Citation
552
574
  ---------
553
575
  ::