isotree 0.1.0

@@ -0,0 +1,3 @@
1
+ module IsoTree
2
+ VERSION = "0.1.0"
3
+ end
@@ -0,0 +1,25 @@
1
+ BSD 2-Clause License
2
+
3
+ Copyright (c) 2019, David Cortes
4
+ All rights reserved.
5
+
6
+ Redistribution and use in source and binary forms, with or without
7
+ modification, are permitted provided that the following conditions are met:
8
+
9
+ 1. Redistributions of source code must retain the above copyright notice, this
10
+ list of conditions and the following disclaimer.
11
+
12
+ 2. Redistributions in binary form must reproduce the above copyright notice,
13
+ this list of conditions and the following disclaimer in the documentation
14
+ and/or other materials provided with the distribution.
15
+
16
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
17
+ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
18
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
19
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
20
+ FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
21
+ DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
22
+ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
23
+ CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
24
+ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
25
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,167 @@
1
+ # IsoTree
2
+
3
+ Fast and multi-threaded implementation of Extended Isolation Forest, Fair-Cut Forest, SCiForest (a.k.a. Split-Criterion iForest), and regular Isolation Forest, for outlier/anomaly detection, plus additions for imputation of missing values, distance/similarity calculation between observations, and handling of categorical data. Written in C++ with interfaces for Python and R.
4
+
5
+ The new concepts in this software are described in:
6
+ * [Distance approximation using Isolation Forests](https://arxiv.org/abs/1910.12362)
7
+ * [Imputing missing values with unsupervised random trees](https://arxiv.org/abs/1911.06646)
8
+
9
+ # Description
10
+
11
+ Isolation Forest is an algorithm originally developed for outlier detection that consists of repeatedly splitting sub-samples of the data at random according to some attribute/feature/column. The idea is that the rarer an observation is, the more likely it is that a uniformly random split on some feature will leave it alone in one branch, so outliers take fewer splits to isolate. The concept is extended to splitting hyperplanes in the extended model (i.e. splitting by more than one column at a time), and to guided (not entirely random) splits in the SCiForest model, which aim at isolating outliers faster and at finding clustered outliers.
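The isolation idea can be illustrated with a small pure-Python sketch. This is not the package's implementation: the helper names (`isolation_depth`, `mean_depth`) are hypothetical, every "tree" uses all rows instead of a sub-sample, and only single-column splits are drawn. It counts how many random splits it takes to leave a given point alone.

```python
import random

def isolation_depth(point, rows, rng, depth=0, max_depth=50):
    """Count the random axis-parallel splits needed to leave `point` alone."""
    if len(rows) <= 1 or depth >= max_depth:
        return depth
    # Only columns that still have some spread can be split.
    cols = [c for c in range(len(point))
            if min(r[c] for r in rows) < max(r[c] for r in rows)]
    if not cols:
        return depth
    col = rng.choice(cols)
    cut = rng.uniform(min(r[col] for r in rows), max(r[col] for r in rows))
    # Keep only the branch that contains `point` and recurse into it.
    goes_left = point[col] <= cut
    branch = [r for r in rows if (r[col] <= cut) == goes_left]
    return isolation_depth(point, branch, rng, depth + 1, max_depth)

def mean_depth(point, rows, ntrees=200, seed=0):
    """Average isolation depth over `ntrees` independent random trees."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, rows, rng) for _ in range(ntrees)) / ntrees

# Standard normal cloud plus one central point and one obvious outlier.
data_rng = random.Random(1)
center = (0.0, 0.0)
outlier = (6.0, 6.0)
rows = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(100)]
rows += [center, outlier]

print("mean depth, central point:", mean_depth(center, rows))
print("mean depth, outlier:      ", mean_depth(outlier, rows))
```

The outlier should come out with a clearly smaller average depth than the central point, which is the signal that the outlier score is built on.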
12
+
13
+ Note that this is a black-box model that will not produce explanations or importances - for a different take on explainable outlier detection see [OutlierTree](https://www.github.com/david-cortes/outliertree).
14
+
15
+ ![image](image/density_regions.png "density regions")
16
+
17
+ _(Code to produce these plots can be found in the R examples in the documentation)_
18
+
19
+
20
+ # Distance / similarity calculations
21
+
22
+ The general idea was extended to produce a distance (alternatively, a similarity) between observations according to how many random splits it takes to separate them. The idea is described in ["Distance approximation using Isolation Forests"](https://arxiv.org/abs/1910.12362).
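A rough sketch of the separation-depth idea, in pure Python, is shown here. It only computes the raw average number of random splits before two points land in different branches; the standardization into a distance described in the paper is not reproduced, and the names `separation_depth` and `mean_separation` are hypothetical.

```python
import random

def separation_depth(a, b, rows, rng, depth=0, max_depth=50):
    """Count random splits until points `a` and `b` fall in different branches."""
    if depth >= max_depth:
        return depth
    # Only columns that still have some spread can be split.
    cols = [c for c in range(len(a))
            if min(r[c] for r in rows) < max(r[c] for r in rows)]
    if not cols:
        return depth
    col = rng.choice(cols)
    cut = rng.uniform(min(r[col] for r in rows), max(r[col] for r in rows))
    a_left = a[col] <= cut
    if a_left != (b[col] <= cut):
        return depth + 1  # this split separates the pair
    # Both points went the same way: follow them into that branch.
    branch = [r for r in rows if (r[col] <= cut) == a_left]
    return separation_depth(a, b, branch, rng, depth + 1, max_depth)

def mean_separation(a, b, rows, ntrees=200, seed=0):
    """Average separation depth over `ntrees` independent random trees."""
    rng = random.Random(seed)
    return sum(separation_depth(a, b, rows, rng) for _ in range(ntrees)) / ntrees

data_rng = random.Random(1)
a, near, far = (0.0, 0.0), (0.05, 0.05), (2.5, 2.5)
rows = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(100)]
rows += [a, near, far]

print("splits to separate a from near point:", mean_separation(a, near, rows))
print("splits to separate a from far point: ", mean_separation(a, far, rows))
```

Points that are close together survive more splits in the same branch, so a larger separation depth corresponds to a smaller distance.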
23
+
24
+ # Imputation of missing values
25
+
26
+ The model can also be used to impute missing values in a fashion similar to kNN: for an observation with missing values, it takes the observations in the terminal node of each tree into which that observation falls at prediction time, and combines their non-missing values as a weighted average according to the depth of the node and the number of observations that fall there. This is not related to how the model handles missing values internally, but is rather meant as a faster way of imputing by similarity. Quality is usually not as good as with chained equations, but the method is a lot faster and more scalable. Non-random splits are recommended when the model is used as an imputer. Details are described in ["Imputing missing values with unsupervised random trees"](https://arxiv.org/abs/1911.06646).
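The weighted-average step can be made concrete with a small worked example. The weighting used here (deeper terminal nodes and terminal nodes with fewer observations count more, via `depth / n_obs`) is an assumption chosen for illustration only; the package exposes its own options for this and the function name `impute_from_leaves` is hypothetical.

```python
def impute_from_leaves(leaves):
    """Combine per-tree terminal-node values into one imputed number.

    `leaves` holds one (depth, observed_values) pair per tree, where
    `observed_values` are the non-missing values of the other observations
    in the terminal node reached by the row being imputed.
    """
    numerator = 0.0
    denominator = 0.0
    for depth, values in leaves:
        if not values:
            continue  # nothing observed in this terminal node
        # Assumed weighting: deeper and less crowded nodes are more specific.
        weight = depth / len(values)
        numerator += weight * (sum(values) / len(values))
        denominator += weight
    if denominator == 0.0:
        raise ValueError("no observed values to impute from")
    return numerator / denominator

# Two trees: a deep node with two peers and a shallow node with one peer.
leaves = [(3, [1.0, 3.0]), (1, [10.0])]
print(impute_from_leaves(leaves))
```

With these inputs the node means are 2.0 and 10.0 with weights 1.5 and 1.0, so the imputed value is 13 / 2.5, or about 5.2.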
27
+
28
+ # Highlights
29
+
30
+ There are already many implementations of isolation forests available for both Python and R (such as [the one from the original paper's authors](https://sourceforge.net/projects/iforest/) or [the one in SciKit-Learn](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html)), but at the time of writing, all of them lack some important functionality and/or offer sub-optimal speed. This particular implementation offers the following:
31
+
32
+ * Implements the extended model (with splitting hyperplanes) and split-criterion model (with non-random splits).
33
+ * Can use a mixture of random and non-random splits, and can split by weighted/pooled gain (in addition to simple average).
34
+ * Can produce approximated pairwise distances between observations according to how many steps it takes on average to separate them down the tree.
35
+ * Can handle missing values (but performance with them is not so good).
36
+ * Can produce missing value imputations according to observations that fall on each terminal node.
37
+ * Can handle categorical variables (one-hot/dummy encoding does not produce the same result).
38
+ * Can work with sparse matrices.
39
+ * Supports sample/observation weights, either as sampling importance or as distribution density measurement.
40
+ * Supports user-provided column sample weights.
41
+ * Can sample columns randomly with weights given by kurtosis.
42
+ * Uses exact formula (not approximation as others do) for harmonic numbers at lower sample and remainder sizes.
43
+ * Can fit trees incrementally to user-provided data samples.
44
+ * Produces serializable model objects with reasonable file sizes.
45
+ * Fast and multi-threaded C++ code. Can be wrapped in languages other than Python and R.
46
+
47
+ (Note that categoricals, NAs, and density-like sample weights are treated heuristically, with different options, as there is no single logical extension of the original idea to them, and having them present might degrade performance/accuracy for regular numerical non-missing observations.)
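To see why the exact harmonic numbers matter at small sizes, the sketch below compares the exact sum with the usual logarithmic approximation inside the expected-depth formula from the original Isolation Forest paper, `c(n) = 2 H(n-1) - 2 (n-1) / n`. The function names are hypothetical, and the size at which the package switches between the two is not stated here.

```python
import math

EULER_GAMMA = 0.5772156649015329

def harmonic_exact(n):
    """H(n) computed as the exact finite sum 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

def harmonic_approx(n):
    """Common approximation H(n) ~ ln(n) + Euler-Mascheroni constant."""
    return math.log(n) + EULER_GAMMA

def expected_depth(n, harmonic):
    """Expected isolation depth c(n) for a node holding n observations."""
    if n <= 1:
        return 0.0
    return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n

for n in (2, 5, 20, 1000):
    exact = expected_depth(n, harmonic_exact)
    approx = expected_depth(n, harmonic_approx)
    print(n, round(exact, 4), round(approx, 4))
```

For two observations the exact value is 1 split, while the approximation gives roughly 0.15; the gap shrinks quickly as the node size grows.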
48
+
49
+ # Installation
50
+
51
+ * Python:
52
+ ```
53
+ pip install isotree
54
+ ```
55
+
56
+ **Note for macOS users:** on macOS, the Python version of this package will compile **without** multi-threading capabilities. This is because Apple's default distribution of `clang` does not provide the OpenMP modules, and because `gcc` is aliased to it, which confuses build scripts. If you have a non-Apple version of `clang` with the OpenMP modules, or if you have `gcc` installed, you can compile this package with multi-threading enabled by setting the environment variable `ENABLE_OMP=1`:
57
+ ```
58
+ export ENABLE_OMP=1
59
+ pip install isotree
60
+ ```
61
+ (Alternatively, you can pass the argument `enable-omp` to the `setup.py` file: `python setup.py install enable-omp`)
62
+
63
+ * R:
64
+
65
+ ```r
66
+ install.packages("isotree")
67
+ ```
68
+
69
+ * C++:
70
+ ```
71
+ git clone https://www.github.com/david-cortes/isotree.git
72
+ cd isotree
73
+ mkdir build
74
+ cd build
75
+ cmake ..
76
+ make
77
+
78
+ ### for a system-wide install in linux
79
+ sudo make install
80
+ sudo ldconfig
81
+ ```
82
+
83
+ (Will build as a shared object - linkage is then done with `-lisotree`)
84
+
85
+ # Sample usage
86
+
87
+ **Warning: default parameters in this implementation are very different from default parameters in others such as SciKit-Learn's, and these defaults won't scale to large datasets (see documentation for details).**
88
+
89
+ * Python:
90
+
91
+ ```python
92
+ import numpy as np
93
+ from isotree import IsolationForest
94
+
95
+ ### Random data from a standard normal distribution
96
+ np.random.seed(1)
97
+ n = 100
98
+ m = 2
99
+ X = np.random.normal(size = (n, m))
100
+
101
+ ### Will now add obvious outlier point (3, 3) to the data
102
+ X = np.r_[X, np.array([3, 3]).reshape((1, m))]
103
+
104
+ ### Fit a small isolation forest model
105
+ iso = IsolationForest(ntrees = 10, ndim = 2, nthreads = 1)
106
+ iso.fit(X)
107
+
108
+ ### Check which row has the highest outlier score
109
+ pred = iso.predict(X)
110
+ print("Point with highest outlier score: ",
111
+ X[np.argsort(-pred)[0], ])
112
+ ```
113
+
114
+ * R:
115
+ (see documentation for more examples - `help(isotree::isolation.forest)`)
116
+ ```r
117
+ ### Random data from a standard normal distribution
118
+ library(isotree)
119
+ set.seed(1)
120
+ n <- 100
121
+ m <- 2
122
+ X <- matrix(rnorm(n * m), nrow = n)
123
+
124
+ ### Will now add obvious outlier point (3, 3) to the data
125
+ X <- rbind(X, c(3, 3))
126
+
127
+ ### Fit a small isolation forest model
128
+ iso <- isolation.forest(X, ntrees = 10, nthreads = 1)
129
+
130
+ ### Check which row has the highest outlier score
131
+ pred <- predict(iso, X)
132
+ cat("Point with highest outlier score: ",
133
+ X[which.max(pred), ], "\n")
134
+ ```
135
+
136
+ * C++:
137
+
138
+ See file [isotree_cpp_ex.cpp](https://github.com/david-cortes/isotree/blob/master/example/isotree_cpp_ex.cpp).
139
+
140
+
141
+ # Examples
142
+
143
+ * Python: example notebook [here](https://nbviewer.jupyter.org/github/david-cortes/isotree/blob/master/example/isotree_example.ipynb) (also an example as imputer in an sklearn pipeline [here](https://nbviewer.jupyter.org/github/david-cortes/isotree/blob/master/example/isotree_impute.ipynb)).
144
+ * R: examples available in the documentation (`help(isotree::isolation.forest)`, [link to CRAN](https://cran.r-project.org/web/packages/isotree/index.html)).
145
+ * C++: see short example in the section above.
146
+
147
+ # Documentation
148
+
149
+ * Python: documentation is available at [ReadTheDocs](http://isotree.readthedocs.io/en/latest/).
150
+ * R: documentation is available internally in the package (e.g. `help(isolation.forest)`) and in [CRAN](https://cran.r-project.org/web/packages/isotree/index.html).
151
+ * C++: documentation is available in the public header (`include/isotree.hpp`) and in the source files.
152
+
153
+ # Known issues
154
+
155
+ When setting a random seed and using more than one thread, the results of some functions are not 100% reproducible to the last decimal - especially not for imputations. This is due to parallelized aggregations, and thus the only "fix" is to limit oneself to only one thread. The trees themselves are however not affected by this, and neither is the isolation depth (main functionality of the package).
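The root cause is that floating-point addition is not associative, so the order in which threads combine their partial sums can change the last digits. A minimal illustration in plain Python, unrelated to the package's own code:

```python
values = [0.1, 0.2, 0.3]

# Same three numbers, two different aggregation orders.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

print(left_to_right == right_to_left)
print(abs(left_to_right - right_to_left))
```

The two results differ only in the last bit, which matches the kind of discrepancy described above.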
156
+
157
+ # References
158
+
159
+ * Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "Isolation forest." 2008 Eighth IEEE International Conference on Data Mining. IEEE, 2008.
160
+ * Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "Isolation-based anomaly detection." ACM Transactions on Knowledge Discovery from Data (TKDD) 6.1 (2012): 3.
161
+ * Hariri, Sahand, Matias Carrasco Kind, and Robert J. Brunner. "Extended Isolation Forest." arXiv preprint arXiv:1811.02141 (2018).
162
+ * Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "On detecting clustered anomalies using SCiForest." Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Berlin, Heidelberg, 2010.
163
+ * https://sourceforge.net/projects/iforest/
164
+ * https://math.stackexchange.com/questions/3388518/expected-number-of-paths-required-to-separate-elements-in-a-binary-tree
165
+ * Quinlan, J. Ross. C4.5: Programs for Machine Learning. Elsevier, 2014.
166
+ * Cortes, David. "Distance approximation using Isolation Forests." arXiv preprint arXiv:1910.12362 (2019).
167
+ * Cortes, David. "Imputing missing values with unsupervised random trees." arXiv preprint arXiv:1911.06646 (2019).
@@ -0,0 +1,4 @@
1
+ PKG_CPPFLAGS = -D_FOR_R -D_USE_MERSENNE_TWISTER -D_ENABLE_CEREAL
2
+ PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS)
3
+ PKG_LIBS = $(SHLIB_OPENMP_CXXFLAGS)
4
+ CXX_STD = CXX11
@@ -0,0 +1,267 @@
1
+ // Generated by using Rcpp::compileAttributes() -> do not edit by hand
2
+ // Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393
3
+
4
+ #include <Rcpp.h>
5
+
6
+ using namespace Rcpp;
7
+
8
+ // deserialize_IsoForest
9
+ SEXP deserialize_IsoForest(Rcpp::RawVector src);
10
+ RcppExport SEXP _isotree_deserialize_IsoForest(SEXP srcSEXP) {
11
+ BEGIN_RCPP
12
+ Rcpp::RObject rcpp_result_gen;
13
+ Rcpp::RNGScope rcpp_rngScope_gen;
14
+ Rcpp::traits::input_parameter< Rcpp::RawVector >::type src(srcSEXP);
15
+ rcpp_result_gen = Rcpp::wrap(deserialize_IsoForest(src));
16
+ return rcpp_result_gen;
17
+ END_RCPP
18
+ }
19
+ // deserialize_ExtIsoForest
20
+ SEXP deserialize_ExtIsoForest(Rcpp::RawVector src);
21
+ RcppExport SEXP _isotree_deserialize_ExtIsoForest(SEXP srcSEXP) {
22
+ BEGIN_RCPP
23
+ Rcpp::RObject rcpp_result_gen;
24
+ Rcpp::RNGScope rcpp_rngScope_gen;
25
+ Rcpp::traits::input_parameter< Rcpp::RawVector >::type src(srcSEXP);
26
+ rcpp_result_gen = Rcpp::wrap(deserialize_ExtIsoForest(src));
27
+ return rcpp_result_gen;
28
+ END_RCPP
29
+ }
30
+ // deserialize_Imputer
31
+ SEXP deserialize_Imputer(Rcpp::RawVector src);
32
+ RcppExport SEXP _isotree_deserialize_Imputer(SEXP srcSEXP) {
33
+ BEGIN_RCPP
34
+ Rcpp::RObject rcpp_result_gen;
35
+ Rcpp::RNGScope rcpp_rngScope_gen;
36
+ Rcpp::traits::input_parameter< Rcpp::RawVector >::type src(srcSEXP);
37
+ rcpp_result_gen = Rcpp::wrap(deserialize_Imputer(src));
38
+ return rcpp_result_gen;
39
+ END_RCPP
40
+ }
41
+ // check_null_ptr_model
42
+ Rcpp::LogicalVector check_null_ptr_model(SEXP ptr_model);
43
+ RcppExport SEXP _isotree_check_null_ptr_model(SEXP ptr_modelSEXP) {
44
+ BEGIN_RCPP
45
+ Rcpp::RObject rcpp_result_gen;
46
+ Rcpp::RNGScope rcpp_rngScope_gen;
47
+ Rcpp::traits::input_parameter< SEXP >::type ptr_model(ptr_modelSEXP);
48
+ rcpp_result_gen = Rcpp::wrap(check_null_ptr_model(ptr_model));
49
+ return rcpp_result_gen;
50
+ END_RCPP
51
+ }
52
+ // fit_model
53
+ Rcpp::List fit_model(Rcpp::NumericVector X_num, Rcpp::IntegerVector X_cat, Rcpp::IntegerVector ncat, Rcpp::NumericVector Xc, Rcpp::IntegerVector Xc_ind, Rcpp::IntegerVector Xc_indptr, Rcpp::NumericVector sample_weights, Rcpp::NumericVector col_weights, size_t nrows, size_t ncols_numeric, size_t ncols_categ, size_t ndim, size_t ntry, Rcpp::CharacterVector coef_type, bool coef_by_prop, bool with_replacement, bool weight_as_sample, size_t sample_size, size_t ntrees, size_t max_depth, bool limit_depth, bool penalize_range, bool calc_dist, bool standardize_dist, bool sq_dist, bool calc_depth, bool standardize_depth, bool weigh_by_kurt, double prob_pick_by_gain_avg, double prob_split_by_gain_avg, double prob_pick_by_gain_pl, double prob_split_by_gain_pl, double min_gain, Rcpp::CharacterVector cat_split_type, Rcpp::CharacterVector new_cat_action, Rcpp::CharacterVector missing_action, bool all_perm, bool build_imputer, bool output_imputations, size_t min_imp_obs, Rcpp::CharacterVector depth_imp, Rcpp::CharacterVector weigh_imp_rows, int random_seed, int nthreads);
54
+ RcppExport SEXP _isotree_fit_model(SEXP X_numSEXP, SEXP X_catSEXP, SEXP ncatSEXP, SEXP XcSEXP, SEXP Xc_indSEXP, SEXP Xc_indptrSEXP, SEXP sample_weightsSEXP, SEXP col_weightsSEXP, SEXP nrowsSEXP, SEXP ncols_numericSEXP, SEXP ncols_categSEXP, SEXP ndimSEXP, SEXP ntrySEXP, SEXP coef_typeSEXP, SEXP coef_by_propSEXP, SEXP with_replacementSEXP, SEXP weight_as_sampleSEXP, SEXP sample_sizeSEXP, SEXP ntreesSEXP, SEXP max_depthSEXP, SEXP limit_depthSEXP, SEXP penalize_rangeSEXP, SEXP calc_distSEXP, SEXP standardize_distSEXP, SEXP sq_distSEXP, SEXP calc_depthSEXP, SEXP standardize_depthSEXP, SEXP weigh_by_kurtSEXP, SEXP prob_pick_by_gain_avgSEXP, SEXP prob_split_by_gain_avgSEXP, SEXP prob_pick_by_gain_plSEXP, SEXP prob_split_by_gain_plSEXP, SEXP min_gainSEXP, SEXP cat_split_typeSEXP, SEXP new_cat_actionSEXP, SEXP missing_actionSEXP, SEXP all_permSEXP, SEXP build_imputerSEXP, SEXP output_imputationsSEXP, SEXP min_imp_obsSEXP, SEXP depth_impSEXP, SEXP weigh_imp_rowsSEXP, SEXP random_seedSEXP, SEXP nthreadsSEXP) {
55
+ BEGIN_RCPP
56
+ Rcpp::RObject rcpp_result_gen;
57
+ Rcpp::RNGScope rcpp_rngScope_gen;
58
+ Rcpp::traits::input_parameter< Rcpp::NumericVector >::type X_num(X_numSEXP);
59
+ Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type X_cat(X_catSEXP);
60
+ Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type ncat(ncatSEXP);
61
+ Rcpp::traits::input_parameter< Rcpp::NumericVector >::type Xc(XcSEXP);
62
+ Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type Xc_ind(Xc_indSEXP);
63
+ Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type Xc_indptr(Xc_indptrSEXP);
64
+ Rcpp::traits::input_parameter< Rcpp::NumericVector >::type sample_weights(sample_weightsSEXP);
65
+ Rcpp::traits::input_parameter< Rcpp::NumericVector >::type col_weights(col_weightsSEXP);
66
+ Rcpp::traits::input_parameter< size_t >::type nrows(nrowsSEXP);
67
+ Rcpp::traits::input_parameter< size_t >::type ncols_numeric(ncols_numericSEXP);
68
+ Rcpp::traits::input_parameter< size_t >::type ncols_categ(ncols_categSEXP);
69
+ Rcpp::traits::input_parameter< size_t >::type ndim(ndimSEXP);
70
+ Rcpp::traits::input_parameter< size_t >::type ntry(ntrySEXP);
71
+ Rcpp::traits::input_parameter< Rcpp::CharacterVector >::type coef_type(coef_typeSEXP);
72
+ Rcpp::traits::input_parameter< bool >::type coef_by_prop(coef_by_propSEXP);
73
+ Rcpp::traits::input_parameter< bool >::type with_replacement(with_replacementSEXP);
74
+ Rcpp::traits::input_parameter< bool >::type weight_as_sample(weight_as_sampleSEXP);
75
+ Rcpp::traits::input_parameter< size_t >::type sample_size(sample_sizeSEXP);
76
+ Rcpp::traits::input_parameter< size_t >::type ntrees(ntreesSEXP);
77
+ Rcpp::traits::input_parameter< size_t >::type max_depth(max_depthSEXP);
78
+ Rcpp::traits::input_parameter< bool >::type limit_depth(limit_depthSEXP);
79
+ Rcpp::traits::input_parameter< bool >::type penalize_range(penalize_rangeSEXP);
80
+ Rcpp::traits::input_parameter< bool >::type calc_dist(calc_distSEXP);
81
+ Rcpp::traits::input_parameter< bool >::type standardize_dist(standardize_distSEXP);
82
+ Rcpp::traits::input_parameter< bool >::type sq_dist(sq_distSEXP);
83
+ Rcpp::traits::input_parameter< bool >::type calc_depth(calc_depthSEXP);
84
+ Rcpp::traits::input_parameter< bool >::type standardize_depth(standardize_depthSEXP);
85
+ Rcpp::traits::input_parameter< bool >::type weigh_by_kurt(weigh_by_kurtSEXP);
86
+ Rcpp::traits::input_parameter< double >::type prob_pick_by_gain_avg(prob_pick_by_gain_avgSEXP);
87
+ Rcpp::traits::input_parameter< double >::type prob_split_by_gain_avg(prob_split_by_gain_avgSEXP);
88
+ Rcpp::traits::input_parameter< double >::type prob_pick_by_gain_pl(prob_pick_by_gain_plSEXP);
89
+ Rcpp::traits::input_parameter< double >::type prob_split_by_gain_pl(prob_split_by_gain_plSEXP);
90
+ Rcpp::traits::input_parameter< double >::type min_gain(min_gainSEXP);
91
+ Rcpp::traits::input_parameter< Rcpp::CharacterVector >::type cat_split_type(cat_split_typeSEXP);
92
+ Rcpp::traits::input_parameter< Rcpp::CharacterVector >::type new_cat_action(new_cat_actionSEXP);
93
+ Rcpp::traits::input_parameter< Rcpp::CharacterVector >::type missing_action(missing_actionSEXP);
94
+ Rcpp::traits::input_parameter< bool >::type all_perm(all_permSEXP);
95
+ Rcpp::traits::input_parameter< bool >::type build_imputer(build_imputerSEXP);
96
+ Rcpp::traits::input_parameter< bool >::type output_imputations(output_imputationsSEXP);
97
+ Rcpp::traits::input_parameter< size_t >::type min_imp_obs(min_imp_obsSEXP);
98
+ Rcpp::traits::input_parameter< Rcpp::CharacterVector >::type depth_imp(depth_impSEXP);
99
+ Rcpp::traits::input_parameter< Rcpp::CharacterVector >::type weigh_imp_rows(weigh_imp_rowsSEXP);
100
+ Rcpp::traits::input_parameter< int >::type random_seed(random_seedSEXP);
101
+ Rcpp::traits::input_parameter< int >::type nthreads(nthreadsSEXP);
102
+ rcpp_result_gen = Rcpp::wrap(fit_model(X_num, X_cat, ncat, Xc, Xc_ind, Xc_indptr, sample_weights, col_weights, nrows, ncols_numeric, ncols_categ, ndim, ntry, coef_type, coef_by_prop, with_replacement, weight_as_sample, sample_size, ntrees, max_depth, limit_depth, penalize_range, calc_dist, standardize_dist, sq_dist, calc_depth, standardize_depth, weigh_by_kurt, prob_pick_by_gain_avg, prob_split_by_gain_avg, prob_pick_by_gain_pl, prob_split_by_gain_pl, min_gain, cat_split_type, new_cat_action, missing_action, all_perm, build_imputer, output_imputations, min_imp_obs, depth_imp, weigh_imp_rows, random_seed, nthreads));
103
+ return rcpp_result_gen;
104
+ END_RCPP
105
+ }
106
+ // fit_tree
107
+ Rcpp::RawVector fit_tree(SEXP model_R_ptr, Rcpp::NumericVector X_num, Rcpp::IntegerVector X_cat, Rcpp::IntegerVector ncat, Rcpp::NumericVector Xc, Rcpp::IntegerVector Xc_ind, Rcpp::IntegerVector Xc_indptr, Rcpp::NumericVector sample_weights, Rcpp::NumericVector col_weights, size_t nrows, size_t ncols_numeric, size_t ncols_categ, size_t ndim, size_t ntry, Rcpp::CharacterVector coef_type, bool coef_by_prop, size_t max_depth, bool limit_depth, bool penalize_range, bool weigh_by_kurt, double prob_pick_by_gain_avg, double prob_split_by_gain_avg, double prob_pick_by_gain_pl, double prob_split_by_gain_pl, double min_gain, Rcpp::CharacterVector cat_split_type, Rcpp::CharacterVector new_cat_action, Rcpp::CharacterVector missing_action, bool build_imputer, size_t min_imp_obs, SEXP imp_R_ptr, Rcpp::CharacterVector depth_imp, Rcpp::CharacterVector weigh_imp_rows, bool all_perm, uint64_t random_seed);
108
+ RcppExport SEXP _isotree_fit_tree(SEXP model_R_ptrSEXP, SEXP X_numSEXP, SEXP X_catSEXP, SEXP ncatSEXP, SEXP XcSEXP, SEXP Xc_indSEXP, SEXP Xc_indptrSEXP, SEXP sample_weightsSEXP, SEXP col_weightsSEXP, SEXP nrowsSEXP, SEXP ncols_numericSEXP, SEXP ncols_categSEXP, SEXP ndimSEXP, SEXP ntrySEXP, SEXP coef_typeSEXP, SEXP coef_by_propSEXP, SEXP max_depthSEXP, SEXP limit_depthSEXP, SEXP penalize_rangeSEXP, SEXP weigh_by_kurtSEXP, SEXP prob_pick_by_gain_avgSEXP, SEXP prob_split_by_gain_avgSEXP, SEXP prob_pick_by_gain_plSEXP, SEXP prob_split_by_gain_plSEXP, SEXP min_gainSEXP, SEXP cat_split_typeSEXP, SEXP new_cat_actionSEXP, SEXP missing_actionSEXP, SEXP build_imputerSEXP, SEXP min_imp_obsSEXP, SEXP imp_R_ptrSEXP, SEXP depth_impSEXP, SEXP weigh_imp_rowsSEXP, SEXP all_permSEXP, SEXP random_seedSEXP) {
109
+ BEGIN_RCPP
110
+ Rcpp::RObject rcpp_result_gen;
111
+ Rcpp::RNGScope rcpp_rngScope_gen;
112
+ Rcpp::traits::input_parameter< SEXP >::type model_R_ptr(model_R_ptrSEXP);
113
+ Rcpp::traits::input_parameter< Rcpp::NumericVector >::type X_num(X_numSEXP);
114
+ Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type X_cat(X_catSEXP);
115
+ Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type ncat(ncatSEXP);
116
+ Rcpp::traits::input_parameter< Rcpp::NumericVector >::type Xc(XcSEXP);
117
+ Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type Xc_ind(Xc_indSEXP);
118
+ Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type Xc_indptr(Xc_indptrSEXP);
119
+ Rcpp::traits::input_parameter< Rcpp::NumericVector >::type sample_weights(sample_weightsSEXP);
120
+ Rcpp::traits::input_parameter< Rcpp::NumericVector >::type col_weights(col_weightsSEXP);
121
+ Rcpp::traits::input_parameter< size_t >::type nrows(nrowsSEXP);
122
+ Rcpp::traits::input_parameter< size_t >::type ncols_numeric(ncols_numericSEXP);
123
+ Rcpp::traits::input_parameter< size_t >::type ncols_categ(ncols_categSEXP);
124
+ Rcpp::traits::input_parameter< size_t >::type ndim(ndimSEXP);
125
+ Rcpp::traits::input_parameter< size_t >::type ntry(ntrySEXP);
126
+ Rcpp::traits::input_parameter< Rcpp::CharacterVector >::type coef_type(coef_typeSEXP);
127
+ Rcpp::traits::input_parameter< bool >::type coef_by_prop(coef_by_propSEXP);
128
+ Rcpp::traits::input_parameter< size_t >::type max_depth(max_depthSEXP);
129
+ Rcpp::traits::input_parameter< bool >::type limit_depth(limit_depthSEXP);
130
+ Rcpp::traits::input_parameter< bool >::type penalize_range(penalize_rangeSEXP);
131
+ Rcpp::traits::input_parameter< bool >::type weigh_by_kurt(weigh_by_kurtSEXP);
132
+ Rcpp::traits::input_parameter< double >::type prob_pick_by_gain_avg(prob_pick_by_gain_avgSEXP);
133
+ Rcpp::traits::input_parameter< double >::type prob_split_by_gain_avg(prob_split_by_gain_avgSEXP);
134
+ Rcpp::traits::input_parameter< double >::type prob_pick_by_gain_pl(prob_pick_by_gain_plSEXP);
135
+ Rcpp::traits::input_parameter< double >::type prob_split_by_gain_pl(prob_split_by_gain_plSEXP);
136
+ Rcpp::traits::input_parameter< double >::type min_gain(min_gainSEXP);
137
+ Rcpp::traits::input_parameter< Rcpp::CharacterVector >::type cat_split_type(cat_split_typeSEXP);
138
+ Rcpp::traits::input_parameter< Rcpp::CharacterVector >::type new_cat_action(new_cat_actionSEXP);
139
+ Rcpp::traits::input_parameter< Rcpp::CharacterVector >::type missing_action(missing_actionSEXP);
140
+ Rcpp::traits::input_parameter< bool >::type build_imputer(build_imputerSEXP);
141
+ Rcpp::traits::input_parameter< size_t >::type min_imp_obs(min_imp_obsSEXP);
142
+ Rcpp::traits::input_parameter< SEXP >::type imp_R_ptr(imp_R_ptrSEXP);
143
+ Rcpp::traits::input_parameter< Rcpp::CharacterVector >::type depth_imp(depth_impSEXP);
144
+ Rcpp::traits::input_parameter< Rcpp::CharacterVector >::type weigh_imp_rows(weigh_imp_rowsSEXP);
145
+ Rcpp::traits::input_parameter< bool >::type all_perm(all_permSEXP);
146
+ Rcpp::traits::input_parameter< uint64_t >::type random_seed(random_seedSEXP);
147
+ rcpp_result_gen = Rcpp::wrap(fit_tree(model_R_ptr, X_num, X_cat, ncat, Xc, Xc_ind, Xc_indptr, sample_weights, col_weights, nrows, ncols_numeric, ncols_categ, ndim, ntry, coef_type, coef_by_prop, max_depth, limit_depth, penalize_range, weigh_by_kurt, prob_pick_by_gain_avg, prob_split_by_gain_avg, prob_pick_by_gain_pl, prob_split_by_gain_pl, min_gain, cat_split_type, new_cat_action, missing_action, build_imputer, min_imp_obs, imp_R_ptr, depth_imp, weigh_imp_rows, all_perm, random_seed));
148
+ return rcpp_result_gen;
149
+ END_RCPP
150
+ }
151
+ // predict_iso
152
+ void predict_iso(SEXP model_R_ptr, Rcpp::NumericVector outp, Rcpp::IntegerVector tree_num, bool is_extended, Rcpp::NumericVector X_num, Rcpp::IntegerVector X_cat, Rcpp::NumericVector Xc, Rcpp::IntegerVector Xc_ind, Rcpp::IntegerVector Xc_indptr, Rcpp::NumericVector Xr, Rcpp::IntegerVector Xr_ind, Rcpp::IntegerVector Xr_indptr, size_t nrows, int nthreads, bool standardize);
153
+ RcppExport SEXP _isotree_predict_iso(SEXP model_R_ptrSEXP, SEXP outpSEXP, SEXP tree_numSEXP, SEXP is_extendedSEXP, SEXP X_numSEXP, SEXP X_catSEXP, SEXP XcSEXP, SEXP Xc_indSEXP, SEXP Xc_indptrSEXP, SEXP XrSEXP, SEXP Xr_indSEXP, SEXP Xr_indptrSEXP, SEXP nrowsSEXP, SEXP nthreadsSEXP, SEXP standardizeSEXP) {
154
+ BEGIN_RCPP
155
+ Rcpp::RNGScope rcpp_rngScope_gen;
156
+ Rcpp::traits::input_parameter< SEXP >::type model_R_ptr(model_R_ptrSEXP);
157
+ Rcpp::traits::input_parameter< Rcpp::NumericVector >::type outp(outpSEXP);
158
+ Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type tree_num(tree_numSEXP);
159
+ Rcpp::traits::input_parameter< bool >::type is_extended(is_extendedSEXP);
160
+ Rcpp::traits::input_parameter< Rcpp::NumericVector >::type X_num(X_numSEXP);
161
+ Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type X_cat(X_catSEXP);
162
+ Rcpp::traits::input_parameter< Rcpp::NumericVector >::type Xc(XcSEXP);
163
+ Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type Xc_ind(Xc_indSEXP);
164
+ Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type Xc_indptr(Xc_indptrSEXP);
165
+ Rcpp::traits::input_parameter< Rcpp::NumericVector >::type Xr(XrSEXP);
166
+ Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type Xr_ind(Xr_indSEXP);
167
+ Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type Xr_indptr(Xr_indptrSEXP);
168
+ Rcpp::traits::input_parameter< size_t >::type nrows(nrowsSEXP);
169
+ Rcpp::traits::input_parameter< int >::type nthreads(nthreadsSEXP);
170
+ Rcpp::traits::input_parameter< bool >::type standardize(standardizeSEXP);
171
+ predict_iso(model_R_ptr, outp, tree_num, is_extended, X_num, X_cat, Xc, Xc_ind, Xc_indptr, Xr, Xr_ind, Xr_indptr, nrows, nthreads, standardize);
172
+ return R_NilValue;
173
+ END_RCPP
174
+ }
175
+ // dist_iso
176
+ void dist_iso(SEXP model_R_ptr, Rcpp::NumericVector tmat, Rcpp::NumericVector dmat, Rcpp::NumericVector rmat, bool is_extended, Rcpp::NumericVector X_num, Rcpp::IntegerVector X_cat, Rcpp::NumericVector Xc, Rcpp::IntegerVector Xc_ind, Rcpp::IntegerVector Xc_indptr, size_t nrows, int nthreads, bool assume_full_distr, bool standardize_dist, bool sq_dist, size_t n_from);
177
+ RcppExport SEXP _isotree_dist_iso(SEXP model_R_ptrSEXP, SEXP tmatSEXP, SEXP dmatSEXP, SEXP rmatSEXP, SEXP is_extendedSEXP, SEXP X_numSEXP, SEXP X_catSEXP, SEXP XcSEXP, SEXP Xc_indSEXP, SEXP Xc_indptrSEXP, SEXP nrowsSEXP, SEXP nthreadsSEXP, SEXP assume_full_distrSEXP, SEXP standardize_distSEXP, SEXP sq_distSEXP, SEXP n_fromSEXP) {
178
+ BEGIN_RCPP
179
+ Rcpp::RNGScope rcpp_rngScope_gen;
180
+ Rcpp::traits::input_parameter< SEXP >::type model_R_ptr(model_R_ptrSEXP);
181
+ Rcpp::traits::input_parameter< Rcpp::NumericVector >::type tmat(tmatSEXP);
182
+ Rcpp::traits::input_parameter< Rcpp::NumericVector >::type dmat(dmatSEXP);
183
+ Rcpp::traits::input_parameter< Rcpp::NumericVector >::type rmat(rmatSEXP);
184
+ Rcpp::traits::input_parameter< bool >::type is_extended(is_extendedSEXP);
185
+ Rcpp::traits::input_parameter< Rcpp::NumericVector >::type X_num(X_numSEXP);
186
+ Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type X_cat(X_catSEXP);
187
+ Rcpp::traits::input_parameter< Rcpp::NumericVector >::type Xc(XcSEXP);
188
+ Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type Xc_ind(Xc_indSEXP);
189
+ Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type Xc_indptr(Xc_indptrSEXP);
190
+ Rcpp::traits::input_parameter< size_t >::type nrows(nrowsSEXP);
191
+ Rcpp::traits::input_parameter< int >::type nthreads(nthreadsSEXP);
192
+ Rcpp::traits::input_parameter< bool >::type assume_full_distr(assume_full_distrSEXP);
193
+ Rcpp::traits::input_parameter< bool >::type standardize_dist(standardize_distSEXP);
194
+ Rcpp::traits::input_parameter< bool >::type sq_dist(sq_distSEXP);
195
+ Rcpp::traits::input_parameter< size_t >::type n_from(n_fromSEXP);
196
+ dist_iso(model_R_ptr, tmat, dmat, rmat, is_extended, X_num, X_cat, Xc, Xc_ind, Xc_indptr, nrows, nthreads, assume_full_distr, standardize_dist, sq_dist, n_from);
197
+ return R_NilValue;
198
+ END_RCPP
199
+ }
200
+ // impute_iso
201
+ Rcpp::List impute_iso(SEXP model_R_ptr, SEXP imputer_R_ptr, bool is_extended, Rcpp::NumericVector X_num, Rcpp::IntegerVector X_cat, Rcpp::NumericVector Xr, Rcpp::IntegerVector Xr_ind, Rcpp::IntegerVector Xr_indptr, size_t nrows, int nthreads);
202
+ RcppExport SEXP _isotree_impute_iso(SEXP model_R_ptrSEXP, SEXP imputer_R_ptrSEXP, SEXP is_extendedSEXP, SEXP X_numSEXP, SEXP X_catSEXP, SEXP XrSEXP, SEXP Xr_indSEXP, SEXP Xr_indptrSEXP, SEXP nrowsSEXP, SEXP nthreadsSEXP) {
203
+ BEGIN_RCPP
204
+ Rcpp::RObject rcpp_result_gen;
205
+ Rcpp::RNGScope rcpp_rngScope_gen;
206
+ Rcpp::traits::input_parameter< SEXP >::type model_R_ptr(model_R_ptrSEXP);
207
+ Rcpp::traits::input_parameter< SEXP >::type imputer_R_ptr(imputer_R_ptrSEXP);
208
+ Rcpp::traits::input_parameter< bool >::type is_extended(is_extendedSEXP);
209
+ Rcpp::traits::input_parameter< Rcpp::NumericVector >::type X_num(X_numSEXP);
210
+ Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type X_cat(X_catSEXP);
211
+ Rcpp::traits::input_parameter< Rcpp::NumericVector >::type Xr(XrSEXP);
212
+ Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type Xr_ind(Xr_indSEXP);
213
+ Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type Xr_indptr(Xr_indptrSEXP);
214
+ Rcpp::traits::input_parameter< size_t >::type nrows(nrowsSEXP);
215
+ Rcpp::traits::input_parameter< int >::type nthreads(nthreadsSEXP);
216
+ rcpp_result_gen = Rcpp::wrap(impute_iso(model_R_ptr, imputer_R_ptr, is_extended, X_num, X_cat, Xr, Xr_ind, Xr_indptr, nrows, nthreads));
217
+ return rcpp_result_gen;
218
+ END_RCPP
219
+ }
220
+ // get_n_nodes
221
+ Rcpp::List get_n_nodes(SEXP model_R_ptr, bool is_extended, int nthreads);
222
+ RcppExport SEXP _isotree_get_n_nodes(SEXP model_R_ptrSEXP, SEXP is_extendedSEXP, SEXP nthreadsSEXP) {
223
+ BEGIN_RCPP
224
+ Rcpp::RObject rcpp_result_gen;
225
+ Rcpp::RNGScope rcpp_rngScope_gen;
226
+ Rcpp::traits::input_parameter< SEXP >::type model_R_ptr(model_R_ptrSEXP);
227
+ Rcpp::traits::input_parameter< bool >::type is_extended(is_extendedSEXP);
228
+ Rcpp::traits::input_parameter< int >::type nthreads(nthreadsSEXP);
229
+ rcpp_result_gen = Rcpp::wrap(get_n_nodes(model_R_ptr, is_extended, nthreads));
230
+ return rcpp_result_gen;
231
+ END_RCPP
232
+ }
233
+ // append_trees_from_other
234
+ Rcpp::List append_trees_from_other(SEXP model_R_ptr, SEXP other_R_ptr, SEXP imp_R_ptr, SEXP oimp_R_ptr, bool is_extended);
235
+ RcppExport SEXP _isotree_append_trees_from_other(SEXP model_R_ptrSEXP, SEXP other_R_ptrSEXP, SEXP imp_R_ptrSEXP, SEXP oimp_R_ptrSEXP, SEXP is_extendedSEXP) {
236
+ BEGIN_RCPP
237
+ Rcpp::RObject rcpp_result_gen;
238
+ Rcpp::RNGScope rcpp_rngScope_gen;
239
+ Rcpp::traits::input_parameter< SEXP >::type model_R_ptr(model_R_ptrSEXP);
240
+ Rcpp::traits::input_parameter< SEXP >::type other_R_ptr(other_R_ptrSEXP);
241
+ Rcpp::traits::input_parameter< SEXP >::type imp_R_ptr(imp_R_ptrSEXP);
242
+ Rcpp::traits::input_parameter< SEXP >::type oimp_R_ptr(oimp_R_ptrSEXP);
243
+ Rcpp::traits::input_parameter< bool >::type is_extended(is_extendedSEXP);
244
+ rcpp_result_gen = Rcpp::wrap(append_trees_from_other(model_R_ptr, other_R_ptr, imp_R_ptr, oimp_R_ptr, is_extended));
245
+ return rcpp_result_gen;
246
+ END_RCPP
247
+ }
248
+
249
+ static const R_CallMethodDef CallEntries[] = {
250
+ {"_isotree_deserialize_IsoForest", (DL_FUNC) &_isotree_deserialize_IsoForest, 1},
251
+ {"_isotree_deserialize_ExtIsoForest", (DL_FUNC) &_isotree_deserialize_ExtIsoForest, 1},
252
+ {"_isotree_deserialize_Imputer", (DL_FUNC) &_isotree_deserialize_Imputer, 1},
253
+ {"_isotree_check_null_ptr_model", (DL_FUNC) &_isotree_check_null_ptr_model, 1},
254
+ {"_isotree_fit_model", (DL_FUNC) &_isotree_fit_model, 44},
255
+ {"_isotree_fit_tree", (DL_FUNC) &_isotree_fit_tree, 35},
256
+ {"_isotree_predict_iso", (DL_FUNC) &_isotree_predict_iso, 15},
257
+ {"_isotree_dist_iso", (DL_FUNC) &_isotree_dist_iso, 16},
258
+ {"_isotree_impute_iso", (DL_FUNC) &_isotree_impute_iso, 10},
259
+ {"_isotree_get_n_nodes", (DL_FUNC) &_isotree_get_n_nodes, 3},
260
+ {"_isotree_append_trees_from_other", (DL_FUNC) &_isotree_append_trees_from_other, 5},
261
+ {NULL, NULL, 0}
262
+ };
263
+
264
+ RcppExport void R_init_isotree(DllInfo *dll) {
265
+ R_registerRoutines(dll, NULL, CallEntries, NULL, NULL);
266
+ R_useDynamicSymbols(dll, FALSE);
267
+ }