outliertree 0.1.0
- checksums.yaml +7 -0
- data/CHANGELOG.md +3 -0
- data/LICENSE.txt +674 -0
- data/NOTICE.txt +14 -0
- data/README.md +107 -0
- data/ext/outliertree/ext.cpp +260 -0
- data/ext/outliertree/extconf.rb +21 -0
- data/lib/outliertree.rb +17 -0
- data/lib/outliertree/dataset.rb +35 -0
- data/lib/outliertree/model.rb +128 -0
- data/lib/outliertree/result.rb +190 -0
- data/lib/outliertree/version.rb +3 -0
- data/vendor/outliertree/LICENSE +674 -0
- data/vendor/outliertree/README.md +155 -0
- data/vendor/outliertree/src/Makevars +3 -0
- data/vendor/outliertree/src/RcppExports.cpp +123 -0
- data/vendor/outliertree/src/Rwrapper.cpp +1225 -0
- data/vendor/outliertree/src/cat_outlier.cpp +328 -0
- data/vendor/outliertree/src/clusters.cpp +972 -0
- data/vendor/outliertree/src/fit_model.cpp +1932 -0
- data/vendor/outliertree/src/misc.cpp +685 -0
- data/vendor/outliertree/src/outlier_tree.hpp +758 -0
- data/vendor/outliertree/src/predict.cpp +706 -0
- data/vendor/outliertree/src/split.cpp +1098 -0
- metadata +150 -0
| @@ -0,0 +1,155 @@ | |
| 1 | 
            +
            # OutlierTree
         | 
| 2 | 
            +
             | 
| 3 | 
            +
            Explainable outlier/anomaly detection based on smart decision tree grouping, similar in spirit to the GritBot software developed by RuleQuest research. Written in C++ with interfaces for R and Python. Supports columns of types numeric, categorical, binary/boolean, and ordinal, and can handle missing values in all of them. Ideal as a sanity checker in exploratory data analysis.
         | 
| 4 | 
            +
             | 
| 5 | 
            +
            # How it works
         | 
| 6 | 
            +
             | 
| 7 | 
            +
            The method fits decision trees that try to "predict" the values of each column from the values of every other column. Each time a split is evaluated, the observations that fall into each branch are treated as a homogeneous cluster, and outliers are searched for in the 1-d distribution of the column being predicted. Outliers are determined according to confidence intervals on this 1-d distribution, and must be separated by a large gap from the next observation in sorted order to be flagged. Since outliers are searched for within a decision tree branch, the method knows the conditions that make an observation rare compared to others that meet the same conditions, and those conditions will always be correlated with the target variable (as it is being predicted from them).
         | 
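As a rough illustration of the 1-d step described above, the following Python sketch flags values in a single cluster that lie far out in a tail and are also separated from the rest of the data by a large gap in sorted order. It is a simplified sketch, not the library's implementation: the function name `flag_gap_outliers`, the parameters `z_outlier`, `min_gap_sd` and `trim`, and the use of a trimmed mean and standard deviation are all assumptions made for this example.

```python
import statistics


def flag_gap_outliers(values, z_outlier=4.0, min_gap_sd=2.0, trim=0.1):
    """Return the indices of values that are extreme AND isolated by a gap.

    Hypothetical simplification: the mean and sd are estimated from the
    central part of the sorted data, so that an extreme value does not
    inflate the statistics it is being compared against.
    """
    n = len(values)
    if n < 5:
        return []  # too few observations to say anything

    # Positions of the observations in ascending order of value.
    order = sorted(range(n), key=lambda i: values[i])
    sorted_vals = [values[i] for i in order]

    # Trimmed statistics: drop a fraction of points from each tail.
    k = max(1, int(n * trim))
    central = sorted_vals[k:n - k]
    mean = statistics.mean(central)
    sd = statistics.pstdev(central)
    if sd == 0:
        return []

    flagged = []

    # Upper tail: find the first large gap beyond the z threshold,
    # then flag everything at or above it.
    for j in range(n // 2 + 1, n):
        z = (sorted_vals[j] - mean) / sd
        gap = sorted_vals[j] - sorted_vals[j - 1]
        if z > z_outlier and gap > min_gap_sd * sd:
            flagged.extend(order[j:])
            break

    # Lower tail: same logic, mirrored.
    for j in range(n // 2 - 1, -1, -1):
        z = (sorted_vals[j] - mean) / sd
        gap = sorted_vals[j + 1] - sorted_vals[j]
        if z < -z_outlier and gap > min_gap_sd * sd:
            flagged.extend(order[:j + 1])
            break

    return sorted(flagged)


# Ages of a small, made-up cluster with one implausible value at index 10.
ages = [30, 31, 29, 32, 28, 33, 30, 31, 29, 30, 75]
print(flag_gap_outliers(ages))
```

Requiring both a tail threshold and a gap means that a long but continuous tail is not flagged, which matches the intent stated in the paragraph above; the specific thresholds here are arbitrary.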
| 8 | 
            +
             | 
| 9 | 
            +
            As such, it can only detect outliers that can be described through decision tree logic. Unlike other methods such as [Isolation Forests](https://github.com/david-cortes/isotree), it cannot assign an outlier score to each observation, nor detect outliers that are just rare overall, but it always provides a human-readable justification when it flags an outlier.
         | 
| 10 | 
            +
             | 
| 11 | 
            +
            The procedure is described in more detail in [Explainable outlier detection through decision tree conditioning](http://arxiv.org/abs/2001.00636).
         | 
| 12 | 
            +
             | 
| 13 | 
            +
            # Example outputs
         | 
| 14 | 
            +
             | 
| 15 | 
            +
            Example outliers from [hypothyroid dataset](http://archive.ics.uci.edu/ml/datasets/thyroid+disease):
         | 
| 16 | 
            +
            ```
         | 
| 17 | 
            +
            row [1137] - suspicious column: [age] - suspicious value: [75.000]
         | 
| 18 | 
            +
            	distribution: 95.122% <= 42.000 - [mean: 31.462] - [sd: 5.281] - [norm. obs: 39]
         | 
| 19 | 
            +
            	given:
         | 
| 20 | 
            +
            		[pregnant] = [t]
         | 
| 21 | 
            +
             | 
| 22 | 
            +
             | 
| 23 | 
            +
            row [2229] - suspicious column: [T3] - suspicious value: [10.600]
         | 
| 24 | 
            +
            	distribution: 99.951% <= 7.100 - [mean: 1.984] - [sd: 0.750] - [norm. obs: 2050]
         | 
| 25 | 
            +
            	given:
         | 
| 26 | 
            +
            		[query hyperthyroid] = [f]
         | 
| 27 | 
            +
            ```
         | 
| 28 | 
            +
            (i.e. it's saying that it's abnormal to be pregnant at the age of 75, or to not be classified as hyperthyroid when having very high thyroid hormone levels)
         | 
| 29 | 
            +
            (this dataset is also bundled into the R package - e.g. `data(hypothyroid)`)
         | 
| 30 | 
            +
             | 
| 31 | 
            +
             | 
| 32 | 
            +
            Example outlier from [Titanic dataset](https://www.kaggle.com/c/titanic):
         | 
| 33 | 
            +
            ```
         | 
| 34 | 
            +
            row [885] - suspicious column: [Fare] - suspicious value: [29.125]
         | 
| 35 | 
            +
            	distribution: 97.849% <= 15.500 - [mean: 7.887] - [sd: 1.173] - [norm. obs: 91]
         | 
| 36 | 
            +
            	given:
         | 
| 37 | 
            +
            		[Pclass] = [3]
         | 
| 38 | 
            +
            		[SibSp] = [0]
         | 
| 39 | 
            +
            		[Embarked] = [Q]
         | 
| 40 | 
            +
            ```
         | 
| 41 | 
            +
            (i.e. it's saying that this person paid too much for the kind of accommodation he had)
         | 
| 42 | 
            +
             | 
| 43 | 
            +
            _Note that it can also produce other types of conditions such as 'between' (for numeric intervals) or 'in' (for categorical subsets)_
         | 
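To make the three kinds of conditions mentioned in the note concrete, here is a small Python sketch that checks whether a row satisfies a list of conditions. The tuple representation `(column, kind, argument)` and the function name `row_matches` are hypothetical and chosen only for illustration; they are not the format the package returns.

```python
def row_matches(row, conditions):
    """Check whether a row (a dict) satisfies every condition in the list.

    Each condition is a hypothetical (column, kind, argument) tuple:
      "="        -> argument is a single value
      "between"  -> argument is a (low, high) pair, bounds inclusive
      "in"       -> argument is a collection of allowed categories
    """
    for column, kind, argument in conditions:
        value = row.get(column)
        if value is None:
            return False  # a missing value cannot satisfy a condition here
        if kind == "=":
            ok = value == argument
        elif kind == "between":
            low, high = argument
            ok = low <= value <= high
        elif kind == "in":
            ok = value in argument
        else:
            raise ValueError(f"unknown condition kind: {kind}")
        if not ok:
            return False
    return True


# The conditions from the Titanic example, plus made-up examples of the
# other two kinds.
passenger = {"Pclass": 3, "SibSp": 0, "Embarked": "Q", "Age": 40.0}
titanic_conditions = [("Pclass", "=", 3), ("SibSp", "=", 0), ("Embarked", "=", "Q")]
other_conditions = [("Age", "between", (30.0, 50.0)), ("Embarked", "in", {"Q", "S"})]

print(row_matches(passenger, titanic_conditions))
print(row_matches(passenger, other_conditions))
```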
| 44 | 
            +
             | 
| 45 | 
            +
            # Installation
         | 
| 46 | 
            +
             | 
| 47 | 
            +
            * For R:
         | 
| 48 | 
            +
            ```r
         | 
| 49 | 
            +
            install.packages("outliertree")
         | 
| 50 | 
            +
            ```
         | 
| 51 | 
            +
             | 
| 52 | 
            +
             | 
| 53 | 
            +
            * For Python:
         | 
| 54 | 
            +
            ```
         | 
| 55 | 
            +
            pip install outliertree
         | 
| 56 | 
            +
            ```
         | 
| 57 | 
            +
            (Package has only been tested in Python 3)
         | 
| 58 | 
            +
             | 
| 59 | 
            +
            **Note for macOS users:** on macOS, the Python version of this package will compile **without** multi-threading capabilities. This is because Apple's default distribution of `clang` does not provide the OpenMP modules, and is aliased to `gcc`, which confuses build scripts. If you have a non-Apple version of `clang` with the OpenMP modules, or if you have `gcc` installed, you can compile this package with multi-threading enabled by setting the environment variable `ENABLE_OMP=1`:
         | 
| 60 | 
            +
            ```
         | 
| 61 | 
            +
            export ENABLE_OMP=1
         | 
| 62 | 
            +
            pip install outliertree
         | 
| 63 | 
            +
            ```
         | 
| 64 | 
            +
            (Alternatively, you can pass the argument `enable-omp` to the `setup.py` file: `python setup.py install enable-omp`)
         | 
| 65 | 
            +
             | 
| 66 | 
            +
             | 
| 67 | 
            +
            * For C++: the package doesn't have a build system, nor a `main` function that can produce an executable, but it can be built as a shared object and wrapped into other languages with any C++11-compliant compiler (`-std=c++11` in most compilers, `/std:c++14` in MSVC). For parallelization, it needs OpenMP linkage (`-fopenmp` in most compilers, `/openmp` in MSVC). The package should *not* be built with optimization higher than `-O3` (i.e. don't use `-Ofast`). It needs linkage to the `math` library, which should be enabled by default in most C++ compilers, but would otherwise require the `-lm` argument. No external dependencies are required.
         | 
| 68 | 
            +
             | 
| 69 | 
            +
             | 
| 70 | 
            +
            # Sample usage
         | 
| 71 | 
            +
             | 
| 72 | 
            +
            * For R:
         | 
| 73 | 
            +
            ```r
         | 
| 74 | 
            +
            library(outliertree)
         | 
| 75 | 
            +
             | 
| 76 | 
            +
            ### random data frame with an obvious outlier
         | 
| 77 | 
            +
            nrows = 100
         | 
| 78 | 
            +
            set.seed(1)
         | 
| 79 | 
            +
            df = data.frame(
         | 
| 80 | 
            +
            	numeric_col1 = c(rnorm(nrows - 1), 1e6),
         | 
| 81 | 
            +
            	numeric_col2 = rgamma(nrows, 1),
         | 
| 82 | 
            +
            	categ_col    = sample(c('categA', 'categB', 'categC'), size = nrows, replace = TRUE)
         | 
| 83 | 
            +
            	)
         | 
| 84 | 
            +
             | 
| 85 | 
            +
            ### test data frame with another obvious outlier
         | 
| 86 | 
            +
            nrows_test = 50
         | 
| 87 | 
            +
            df_test = data.frame(
         | 
| 88 | 
            +
            	numeric_col1 = rnorm(nrows_test),
         | 
| 89 | 
            +
            	numeric_col2 = c(-1e6, rgamma(nrows_test - 1, 1)),
         | 
| 90 | 
            +
            	categ_col    = sample(c('categA', 'categB', 'categC'), size = nrows_test, replace = TRUE)
         | 
| 91 | 
            +
            	)
         | 
| 92 | 
            +
             | 
| 93 | 
            +
            ### fit model
         | 
| 94 | 
            +
            outliers_model = outliertree::outlier.tree(df, outliers_print = 10, save_outliers = TRUE)
         | 
| 95 | 
            +
             | 
| 96 | 
            +
            ### find outliers in new data
         | 
| 97 | 
            +
            new_outliers = predict(outliers_model, df_test, outliers_print = 10, return_outliers = TRUE)
         | 
| 98 | 
            +
             | 
| 99 | 
            +
            ### print outliers in readable format
         | 
| 100 | 
            +
            summary(new_outliers)
         | 
| 101 | 
            +
            ```
         | 
| 102 | 
            +
            (see documentation for more examples)
         | 
| 103 | 
            +
             | 
| 104 | 
            +
            Example [RMarkdown](http://htmlpreview.github.io/?https://github.com/david-cortes/outliertree/blob/master/example/titanic_outliertree_r.html) using the Titanic dataset.
         | 
| 105 | 
            +
             | 
| 106 | 
            +
             | 
| 107 | 
            +
            * For Python:
         | 
| 108 | 
            +
            ```python
         | 
| 109 | 
            +
            import numpy as np, pandas as pd
         | 
| 110 | 
            +
            from outliertree import OutlierTree
         | 
| 111 | 
            +
             | 
| 112 | 
            +
            ### random data frame with an obvious outlier
         | 
| 113 | 
            +
            nrows = 100
         | 
| 114 | 
            +
            np.random.seed(1)
         | 
| 115 | 
            +
            df = pd.DataFrame({
         | 
| 116 | 
            +
            	"numeric_col1" : np.r_[np.random.normal(size = nrows - 1), np.array([float(1e6)])],
         | 
| 117 | 
            +
            	"numeric_col2" : np.random.gamma(1, 1, size = nrows),
         | 
| 118 | 
            +
            	"categ_col"    : np.random.choice(['categA', 'categB', 'categC'], size = nrows)
         | 
| 119 | 
            +
            	})
         | 
| 120 | 
            +
             | 
| 121 | 
            +
            ### test data frame with another obvious outlier
         | 
| 122 | 
            +
            df_test = pd.DataFrame({
         | 
| 123 | 
            +
            	"numeric_col1" : np.random.normal(size = nrows),
         | 
| 124 | 
            +
            	"numeric_col2" : np.r_[np.array([float(-1e6)]), np.random.gamma(1, 1, size = nrows - 1)],
         | 
| 125 | 
            +
            	"categ_col"    : np.random.choice(['categA', 'categB', 'categC'], size = nrows)
         | 
| 126 | 
            +
            	})
         | 
| 127 | 
            +
             | 
| 128 | 
            +
            ### fit model
         | 
| 129 | 
            +
            outliers_model = OutlierTree()
         | 
| 130 | 
            +
            outliers_df = outliers_model.fit(df, outliers_print = 10, return_outliers = True)
         | 
| 131 | 
            +
             | 
| 132 | 
            +
            ### find outliers in new data
         | 
| 133 | 
            +
            new_outliers = outliers_model.predict(df_test)
         | 
| 134 | 
            +
             | 
| 135 | 
            +
            ### print outliers in readable format
         | 
| 136 | 
            +
            outliers_model.print_outliers(new_outliers)
         | 
| 137 | 
            +
            ```
         | 
| 138 | 
            +
             | 
| 139 | 
            +
            Example [IPython notebook](http://nbviewer.ipython.org/github/david-cortes/outliertree/blob/master/example/titanic_outliertree_python.ipynb) using the Titanic dataset.
         | 
| 140 | 
            +
             | 
| 141 | 
            +
            * For C++: see functions `fit_outliers_models` and `find_new_outliers` in header `outlier_tree.hpp`.
         | 
| 142 | 
            +
             | 
| 143 | 
            +
            # Documentation
         | 
| 144 | 
            +
             | 
| 145 | 
            +
            * For R: documentation is built into the package (e.g. `help(outliertree::outlier.tree)`); the PDF can be downloaded from [CRAN](https://cran.r-project.org/web/packages/outliertree/index.html).
         | 
| 146 | 
            +
             | 
| 147 | 
            +
            * For Python: documentation is available at [ReadTheDocs](http://outliertree.readthedocs.io/en/latest/) (it is also built into the package as docstrings, e.g. `help(outliertree.OutlierTree.fit)`).
         | 
| 148 | 
            +
             | 
| 149 | 
            +
            * For C++: documentation is available in the source files (not in the header).
         | 
| 150 | 
            +
             | 
| 151 | 
            +
            # References
         | 
| 152 | 
            +
             | 
| 153 | 
            +
            * Cortes, David. "Explainable outlier detection through decision tree conditioning." arXiv preprint arXiv:2001.00636 (2020).
         | 
| 154 | 
            +
            * [GritBot software](https://www.rulequest.com/gritbot-info.html).
         | 
| 155 | 
            +
             | 
| @@ -0,0 +1,123 @@ | |
| 1 | 
            +
            // Generated by using Rcpp::compileAttributes() -> do not edit by hand
         | 
| 2 | 
            +
            // Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393
         | 
| 3 | 
            +
             | 
| 4 | 
            +
            #include <Rcpp.h>
         | 
| 5 | 
            +
             | 
| 6 | 
            +
            using namespace Rcpp;
         | 
| 7 | 
            +
             | 
| 8 | 
            +
            // deserialize_OutlierTree
         | 
| 9 | 
            +
            SEXP deserialize_OutlierTree(Rcpp::RawVector src);
         | 
| 10 | 
            +
            RcppExport SEXP _outliertree_deserialize_OutlierTree(SEXP srcSEXP) {
         | 
| 11 | 
            +
            BEGIN_RCPP
         | 
| 12 | 
            +
                Rcpp::RObject rcpp_result_gen;
         | 
| 13 | 
            +
                Rcpp::RNGScope rcpp_rngScope_gen;
         | 
| 14 | 
            +
                Rcpp::traits::input_parameter< Rcpp::RawVector >::type src(srcSEXP);
         | 
| 15 | 
            +
                rcpp_result_gen = Rcpp::wrap(deserialize_OutlierTree(src));
         | 
| 16 | 
            +
                return rcpp_result_gen;
         | 
| 17 | 
            +
            END_RCPP
         | 
| 18 | 
            +
            }
         | 
| 19 | 
            +
            // check_null_ptr_model
         | 
| 20 | 
            +
            Rcpp::LogicalVector check_null_ptr_model(SEXP ptr_model);
         | 
| 21 | 
            +
            RcppExport SEXP _outliertree_check_null_ptr_model(SEXP ptr_modelSEXP) {
         | 
| 22 | 
            +
            BEGIN_RCPP
         | 
| 23 | 
            +
                Rcpp::RObject rcpp_result_gen;
         | 
| 24 | 
            +
                Rcpp::RNGScope rcpp_rngScope_gen;
         | 
| 25 | 
            +
                Rcpp::traits::input_parameter< SEXP >::type ptr_model(ptr_modelSEXP);
         | 
| 26 | 
            +
                rcpp_result_gen = Rcpp::wrap(check_null_ptr_model(ptr_model));
         | 
| 27 | 
            +
                return rcpp_result_gen;
         | 
| 28 | 
            +
            END_RCPP
         | 
| 29 | 
            +
            }
         | 
| 30 | 
            +
            // fit_OutlierTree
         | 
| 31 | 
            +
            Rcpp::List fit_OutlierTree(Rcpp::NumericVector arr_num, size_t ncols_numeric, Rcpp::IntegerVector arr_cat, size_t ncols_categ, Rcpp::IntegerVector ncat, Rcpp::IntegerVector arr_ord, size_t ncols_ord, Rcpp::IntegerVector ncat_ord, size_t nrows, Rcpp::LogicalVector cols_ignore_r, int nthreads, bool categ_as_bin, bool ord_as_bin, bool cat_bruteforce_subset, bool categ_from_maj, bool take_mid, size_t max_depth, double max_perc_outliers, size_t min_size_numeric, size_t min_size_categ, double min_gain, bool follow_all, bool gain_as_pct, double z_norm, double z_outlier, bool return_outliers, Rcpp::ListOf<Rcpp::StringVector> cat_levels, Rcpp::ListOf<Rcpp::StringVector> ord_levels, Rcpp::StringVector colnames_num, Rcpp::StringVector colnames_cat, Rcpp::StringVector colnames_ord, Rcpp::NumericVector min_date, Rcpp::NumericVector min_ts);
         | 
| 32 | 
            +
            RcppExport SEXP _outliertree_fit_OutlierTree(SEXP arr_numSEXP, SEXP ncols_numericSEXP, SEXP arr_catSEXP, SEXP ncols_categSEXP, SEXP ncatSEXP, SEXP arr_ordSEXP, SEXP ncols_ordSEXP, SEXP ncat_ordSEXP, SEXP nrowsSEXP, SEXP cols_ignore_rSEXP, SEXP nthreadsSEXP, SEXP categ_as_binSEXP, SEXP ord_as_binSEXP, SEXP cat_bruteforce_subsetSEXP, SEXP categ_from_majSEXP, SEXP take_midSEXP, SEXP max_depthSEXP, SEXP max_perc_outliersSEXP, SEXP min_size_numericSEXP, SEXP min_size_categSEXP, SEXP min_gainSEXP, SEXP follow_allSEXP, SEXP gain_as_pctSEXP, SEXP z_normSEXP, SEXP z_outlierSEXP, SEXP return_outliersSEXP, SEXP cat_levelsSEXP, SEXP ord_levelsSEXP, SEXP colnames_numSEXP, SEXP colnames_catSEXP, SEXP colnames_ordSEXP, SEXP min_dateSEXP, SEXP min_tsSEXP) {
         | 
| 33 | 
            +
            BEGIN_RCPP
         | 
| 34 | 
            +
                Rcpp::RObject rcpp_result_gen;
         | 
| 35 | 
            +
                Rcpp::RNGScope rcpp_rngScope_gen;
         | 
| 36 | 
            +
                Rcpp::traits::input_parameter< Rcpp::NumericVector >::type arr_num(arr_numSEXP);
         | 
| 37 | 
            +
                Rcpp::traits::input_parameter< size_t >::type ncols_numeric(ncols_numericSEXP);
         | 
| 38 | 
            +
                Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type arr_cat(arr_catSEXP);
         | 
| 39 | 
            +
                Rcpp::traits::input_parameter< size_t >::type ncols_categ(ncols_categSEXP);
         | 
| 40 | 
            +
                Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type ncat(ncatSEXP);
         | 
| 41 | 
            +
                Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type arr_ord(arr_ordSEXP);
         | 
| 42 | 
            +
                Rcpp::traits::input_parameter< size_t >::type ncols_ord(ncols_ordSEXP);
         | 
| 43 | 
            +
                Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type ncat_ord(ncat_ordSEXP);
         | 
| 44 | 
            +
                Rcpp::traits::input_parameter< size_t >::type nrows(nrowsSEXP);
         | 
| 45 | 
            +
                Rcpp::traits::input_parameter< Rcpp::LogicalVector >::type cols_ignore_r(cols_ignore_rSEXP);
         | 
| 46 | 
            +
                Rcpp::traits::input_parameter< int >::type nthreads(nthreadsSEXP);
         | 
| 47 | 
            +
                Rcpp::traits::input_parameter< bool >::type categ_as_bin(categ_as_binSEXP);
         | 
| 48 | 
            +
                Rcpp::traits::input_parameter< bool >::type ord_as_bin(ord_as_binSEXP);
         | 
| 49 | 
            +
                Rcpp::traits::input_parameter< bool >::type cat_bruteforce_subset(cat_bruteforce_subsetSEXP);
         | 
| 50 | 
            +
                Rcpp::traits::input_parameter< bool >::type categ_from_maj(categ_from_majSEXP);
         | 
| 51 | 
            +
                Rcpp::traits::input_parameter< bool >::type take_mid(take_midSEXP);
         | 
| 52 | 
            +
                Rcpp::traits::input_parameter< size_t >::type max_depth(max_depthSEXP);
         | 
| 53 | 
            +
                Rcpp::traits::input_parameter< double >::type max_perc_outliers(max_perc_outliersSEXP);
         | 
| 54 | 
            +
                Rcpp::traits::input_parameter< size_t >::type min_size_numeric(min_size_numericSEXP);
         | 
| 55 | 
            +
                Rcpp::traits::input_parameter< size_t >::type min_size_categ(min_size_categSEXP);
         | 
| 56 | 
            +
                Rcpp::traits::input_parameter< double >::type min_gain(min_gainSEXP);
         | 
| 57 | 
            +
                Rcpp::traits::input_parameter< bool >::type follow_all(follow_allSEXP);
         | 
| 58 | 
            +
                Rcpp::traits::input_parameter< bool >::type gain_as_pct(gain_as_pctSEXP);
         | 
| 59 | 
            +
                Rcpp::traits::input_parameter< double >::type z_norm(z_normSEXP);
         | 
| 60 | 
            +
                Rcpp::traits::input_parameter< double >::type z_outlier(z_outlierSEXP);
         | 
| 61 | 
            +
                Rcpp::traits::input_parameter< bool >::type return_outliers(return_outliersSEXP);
         | 
| 62 | 
            +
                Rcpp::traits::input_parameter< Rcpp::ListOf<Rcpp::StringVector> >::type cat_levels(cat_levelsSEXP);
         | 
| 63 | 
            +
                Rcpp::traits::input_parameter< Rcpp::ListOf<Rcpp::StringVector> >::type ord_levels(ord_levelsSEXP);
         | 
| 64 | 
            +
                Rcpp::traits::input_parameter< Rcpp::StringVector >::type colnames_num(colnames_numSEXP);
         | 
| 65 | 
            +
                Rcpp::traits::input_parameter< Rcpp::StringVector >::type colnames_cat(colnames_catSEXP);
         | 
| 66 | 
            +
                Rcpp::traits::input_parameter< Rcpp::StringVector >::type colnames_ord(colnames_ordSEXP);
         | 
| 67 | 
            +
                Rcpp::traits::input_parameter< Rcpp::NumericVector >::type min_date(min_dateSEXP);
         | 
| 68 | 
            +
                Rcpp::traits::input_parameter< Rcpp::NumericVector >::type min_ts(min_tsSEXP);
         | 
| 69 | 
            +
                rcpp_result_gen = Rcpp::wrap(fit_OutlierTree(arr_num, ncols_numeric, arr_cat, ncols_categ, ncat, arr_ord, ncols_ord, ncat_ord, nrows, cols_ignore_r, nthreads, categ_as_bin, ord_as_bin, cat_bruteforce_subset, categ_from_maj, take_mid, max_depth, max_perc_outliers, min_size_numeric, min_size_categ, min_gain, follow_all, gain_as_pct, z_norm, z_outlier, return_outliers, cat_levels, ord_levels, colnames_num, colnames_cat, colnames_ord, min_date, min_ts));
         | 
| 70 | 
            +
                return rcpp_result_gen;
         | 
| 71 | 
            +
            END_RCPP
         | 
| 72 | 
            +
            }
         | 
| 73 | 
            +
            // predict_OutlierTree
         | 
| 74 | 
            +
            Rcpp::List predict_OutlierTree(SEXP ptr_model, size_t nrows, int nthreads, Rcpp::NumericVector arr_num, Rcpp::IntegerVector arr_cat, Rcpp::IntegerVector arr_ord, Rcpp::ListOf<Rcpp::StringVector> cat_levels, Rcpp::ListOf<Rcpp::StringVector> ord_levels, Rcpp::StringVector colnames_num, Rcpp::StringVector colnames_cat, Rcpp::StringVector colnames_ord, Rcpp::NumericVector min_date, Rcpp::NumericVector min_ts);
         | 
| 75 | 
            +
            RcppExport SEXP _outliertree_predict_OutlierTree(SEXP ptr_modelSEXP, SEXP nrowsSEXP, SEXP nthreadsSEXP, SEXP arr_numSEXP, SEXP arr_catSEXP, SEXP arr_ordSEXP, SEXP cat_levelsSEXP, SEXP ord_levelsSEXP, SEXP colnames_numSEXP, SEXP colnames_catSEXP, SEXP colnames_ordSEXP, SEXP min_dateSEXP, SEXP min_tsSEXP) {
         | 
| 76 | 
            +
            BEGIN_RCPP
         | 
| 77 | 
            +
                Rcpp::RObject rcpp_result_gen;
         | 
| 78 | 
            +
                Rcpp::RNGScope rcpp_rngScope_gen;
         | 
| 79 | 
            +
                Rcpp::traits::input_parameter< SEXP >::type ptr_model(ptr_modelSEXP);
         | 
| 80 | 
            +
                Rcpp::traits::input_parameter< size_t >::type nrows(nrowsSEXP);
         | 
| 81 | 
            +
                Rcpp::traits::input_parameter< int >::type nthreads(nthreadsSEXP);
         | 
| 82 | 
            +
                Rcpp::traits::input_parameter< Rcpp::NumericVector >::type arr_num(arr_numSEXP);
         | 
| 83 | 
            +
                Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type arr_cat(arr_catSEXP);
         | 
| 84 | 
            +
                Rcpp::traits::input_parameter< Rcpp::IntegerVector >::type arr_ord(arr_ordSEXP);
         | 
| 85 | 
            +
                Rcpp::traits::input_parameter< Rcpp::ListOf<Rcpp::StringVector> >::type cat_levels(cat_levelsSEXP);
         | 
| 86 | 
            +
                Rcpp::traits::input_parameter< Rcpp::ListOf<Rcpp::StringVector> >::type ord_levels(ord_levelsSEXP);
         | 
| 87 | 
            +
                Rcpp::traits::input_parameter< Rcpp::StringVector >::type colnames_num(colnames_numSEXP);
         | 
| 88 | 
            +
                Rcpp::traits::input_parameter< Rcpp::StringVector >::type colnames_cat(colnames_catSEXP);
         | 
| 89 | 
            +
                Rcpp::traits::input_parameter< Rcpp::StringVector >::type colnames_ord(colnames_ordSEXP);
         | 
| 90 | 
            +
                Rcpp::traits::input_parameter< Rcpp::NumericVector >::type min_date(min_dateSEXP);
         | 
| 91 | 
            +
                Rcpp::traits::input_parameter< Rcpp::NumericVector >::type min_ts(min_tsSEXP);
         | 
| 92 | 
            +
                rcpp_result_gen = Rcpp::wrap(predict_OutlierTree(ptr_model, nrows, nthreads, arr_num, arr_cat, arr_ord, cat_levels, ord_levels, colnames_num, colnames_cat, colnames_ord, min_date, min_ts));
         | 
| 93 | 
            +
                return rcpp_result_gen;
         | 
| 94 | 
            +
            END_RCPP
         | 
| 95 | 
            +
            }
         | 
| 96 | 
            +
            // check_few_values
         | 
| 97 | 
            +
            Rcpp::LogicalVector check_few_values(Rcpp::NumericVector arr_num, size_t nrows, size_t ncols, int nthreads);
         | 
| 98 | 
            +
            RcppExport SEXP _outliertree_check_few_values(SEXP arr_numSEXP, SEXP nrowsSEXP, SEXP ncolsSEXP, SEXP nthreadsSEXP) {
         | 
| 99 | 
            +
            BEGIN_RCPP
         | 
| 100 | 
            +
                Rcpp::RObject rcpp_result_gen;
         | 
| 101 | 
            +
                Rcpp::RNGScope rcpp_rngScope_gen;
         | 
| 102 | 
            +
                Rcpp::traits::input_parameter< Rcpp::NumericVector >::type arr_num(arr_numSEXP);
         | 
| 103 | 
            +
                Rcpp::traits::input_parameter< size_t >::type nrows(nrowsSEXP);
         | 
| 104 | 
            +
                Rcpp::traits::input_parameter< size_t >::type ncols(ncolsSEXP);
         | 
| 105 | 
            +
                Rcpp::traits::input_parameter< int >::type nthreads(nthreadsSEXP);
         | 
| 106 | 
            +
                rcpp_result_gen = Rcpp::wrap(check_few_values(arr_num, nrows, ncols, nthreads));
         | 
| 107 | 
            +
                return rcpp_result_gen;
         | 
| 108 | 
            +
            END_RCPP
         | 
| 109 | 
            +
            }
         | 
| 110 | 
            +
             | 
| 111 | 
            +
            static const R_CallMethodDef CallEntries[] = {
         | 
| 112 | 
            +
                {"_outliertree_deserialize_OutlierTree", (DL_FUNC) &_outliertree_deserialize_OutlierTree, 1},
         | 
| 113 | 
            +
                {"_outliertree_check_null_ptr_model", (DL_FUNC) &_outliertree_check_null_ptr_model, 1},
         | 
| 114 | 
            +
                {"_outliertree_fit_OutlierTree", (DL_FUNC) &_outliertree_fit_OutlierTree, 33},
         | 
| 115 | 
            +
                {"_outliertree_predict_OutlierTree", (DL_FUNC) &_outliertree_predict_OutlierTree, 13},
         | 
| 116 | 
            +
                {"_outliertree_check_few_values", (DL_FUNC) &_outliertree_check_few_values, 4},
         | 
| 117 | 
            +
                {NULL, NULL, 0}
         | 
| 118 | 
            +
            };
         | 
| 119 | 
            +
             | 
| 120 | 
            +
            RcppExport void R_init_outliertree(DllInfo *dll) {
         | 
| 121 | 
            +
                R_registerRoutines(dll, NULL, CallEntries, NULL, NULL);
         | 
| 122 | 
            +
                R_useDynamicSymbols(dll, FALSE);
         | 
| 123 | 
            +
            }
         | 
| @@ -0,0 +1,1225 @@ | |
| 1 | 
            +
            #include <Rcpp.h>
         | 
| 2 | 
            +
            // [[Rcpp::plugins(cpp11)]]
         | 
| 3 | 
            +
             | 
| 4 | 
            +
            /* This is to serialize the model objects */
         | 
| 5 | 
            +
            // [[Rcpp::depends(Rcereal)]]
         | 
| 6 | 
            +
            #include <cereal/archives/binary.hpp>
         | 
| 7 | 
            +
            #include <cereal/types/vector.hpp>
         | 
| 8 | 
            +
            #include <sstream>
         | 
| 9 | 
            +
            #include <string>
         | 
| 10 | 
            +
             | 
| 11 | 
            +
            /* This is the package's header */
         | 
| 12 | 
            +
            #include "outlier_tree.hpp"
         | 
| 13 | 
            +
             | 
| 14 | 
            +
            /* for model serialization and re-usage in R */
         | 
| 15 | 
            +
            /* https://stackoverflow.com/questions/18474292/how-to-handle-c-internal-data-structure-in-r-in-order-to-allow-save-load */
         | 
| 16 | 
            +
            /* this extra comment below the link is a workaround for Rcpp issue 675 in GitHub, do not remove it */
         | 
| 17 | 
            +
            #include <Rinternals.h>
         | 
| 18 | 
            +
            Rcpp::RawVector serialize_OutlierTree(ModelOutputs *model_outputs)
         | 
| 19 | 
            +
            {
         | 
| 20 | 
            +
                std::stringstream ss;
         | 
| 21 | 
            +
                {
         | 
| 22 | 
            +
                    cereal::BinaryOutputArchive oarchive(ss); // Create an output archive
         | 
| 23 | 
            +
                    oarchive(*model_outputs);
         | 
| 24 | 
            +
                }
         | 
| 25 | 
            +
                ss.seekg(0, ss.end); // seek to the end so tellg() reports the serialized size
         | 
| 26 | 
            +
                Rcpp::RawVector retval(ss.tellg());
         | 
| 27 | 
            +
                ss.seekg(0, ss.beg);
         | 
| 28 | 
            +
                ss.read(reinterpret_cast<char*>(&retval[0]), retval.size());
         | 
| 29 | 
            +
                return retval;
         | 
| 30 | 
            +
            }
         | 
| 31 | 
            +
             | 
| 32 | 
            +
            // [[Rcpp::export]]
         | 
| 33 | 
            +
            SEXP deserialize_OutlierTree(Rcpp::RawVector src)
         | 
| 34 | 
            +
            {
         | 
| 35 | 
            +
                std::stringstream ss;
         | 
| 36 | 
            +
                ss.write(reinterpret_cast<char*>(&src[0]), src.size());
         | 
| 37 | 
            +
                ss.seekg(0, ss.beg);
         | 
| 38 | 
            +
                std::unique_ptr<ModelOutputs> model_outputs = std::unique_ptr<ModelOutputs>(new ModelOutputs());
         | 
| 39 | 
            +
                {
         | 
| 40 | 
            +
                    cereal::BinaryInputArchive iarchive(ss);
         | 
| 41 | 
            +
                    iarchive(*model_outputs);
         | 
| 42 | 
            +
                }
         | 
| 43 | 
            +
                return Rcpp::XPtr<ModelOutputs>(model_outputs.release(), true); // XPtr takes ownership; 'true' registers a delete finalizer
         | 
| 44 | 
            +
            }
         | 
| 45 | 
            +
             | 
| 46 | 
            +
            // [[Rcpp::export]]
         | 
| 47 | 
            +
            Rcpp::LogicalVector check_null_ptr_model(SEXP ptr_model)
         | 
| 48 | 
            +
            {
         | 
| 49 | 
            +
                return Rcpp::LogicalVector(R_ExternalPtrAddr(ptr_model) == NULL);
         | 
| 50 | 
            +
            }
         | 
| 51 | 
            +
             | 
| 52 | 
            +
            double* set_R_nan_as_C_nan(double *restrict x_R, std::vector<double> &x_C, size_t n, int nthreads)
         | 
| 53 | 
            +
            {
         | 
| 54 | 
            +
                x_C.assign(x_R, x_R + n);
         | 
| 55 | 
            +
                #pragma omp parallel for schedule(static) num_threads(nthreads) shared(x_R, x_C, n)
         | 
| 56 | 
            +
                for (size_t_for i = 0; i < n; i++)
         | 
| 57 | 
            +
                    if (isnan(x_R[i]) || Rcpp::NumericVector::is_na(x_R[i]) || Rcpp::traits::is_nan<REALSXP>(x_R[i]))
         | 
| 58 | 
            +
                        x_C[i] = NAN;
         | 
| 59 | 
            +
                return x_C.data();
         | 
| 60 | 
            +
            }
         | 
| 61 | 
            +
             | 
| 62 | 
            +
             | 
| 63 | 
            +
            /* for predicting outliers */
         | 
| 64 | 
            +
            Rcpp::List describe_outliers(ModelOutputs &model_outputs,
         | 
| 65 | 
            +
                                         double *arr_num,
         | 
| 66 | 
            +
                                         int    *arr_cat,
         | 
| 67 | 
            +
                                         int    *arr_ord,
         | 
| 68 | 
            +
                                         Rcpp::ListOf<Rcpp::StringVector> cat_levels,
         | 
| 69 | 
            +
                                         Rcpp::ListOf<Rcpp::StringVector> ord_levels,
         | 
| 70 | 
            +
                                         Rcpp::StringVector colnames_num,
         | 
| 71 | 
            +
                                         Rcpp::StringVector colnames_cat,
         | 
| 72 | 
            +
                                         Rcpp::StringVector colnames_ord,
         | 
| 73 | 
            +
                                         Rcpp::NumericVector min_date,
         | 
| 74 | 
            +
                                         Rcpp::NumericVector min_ts)
         | 
| 75 | 
            +
            {
         | 
| 76 | 
            +
                size_t nrows         = model_outputs.outlier_scores_final.size();
         | 
| 77 | 
            +
                size_t ncols_num     = model_outputs.ncols_numeric;
         | 
| 78 | 
            +
                size_t ncols_cat     = model_outputs.ncols_categ;
         | 
| 79 | 
            +
                size_t ncols_num_num = model_outputs.ncols_numeric - min_date.size() - min_ts.size();
         | 
| 80 | 
            +
                size_t ncols_date    = min_date.size();
         | 
| 81 | 
            +
                size_t ncols_cat_cat = cat_levels.size();
         | 
| 82 | 
            +
                Rcpp::List outp;
         | 
| 83 | 
            +
                
         | 
| 84 | 
            +
                Rcpp::LogicalVector has_na_col       = Rcpp::LogicalVector(nrows, NA_LOGICAL);
         | 
| 85 | 
            +
                Rcpp::IntegerVector tree_depth       = Rcpp::IntegerVector(nrows, NA_INTEGER);
         | 
| 86 | 
            +
                Rcpp::NumericVector outlier_score    = Rcpp::NumericVector(nrows, NA_REAL);
         | 
| 87 | 
            +
                Rcpp::ListOf<Rcpp::List> outlier_val = Rcpp::ListOf<Rcpp::List>(nrows);
         | 
| 88 | 
            +
                Rcpp::ListOf<Rcpp::List> lst_stats   = Rcpp::ListOf<Rcpp::List>(nrows);
         | 
| 89 | 
            +
                Rcpp::ListOf<Rcpp::List> lst_cond    = Rcpp::ListOf<Rcpp::List>(nrows);
         | 
| 90 | 
            +
                
         | 
| 91 | 
            +
                
         | 
| 92 | 
            +
                size_t outl_col;
         | 
| 93 | 
            +
                size_t outl_clust;
         | 
| 94 | 
            +
                size_t curr_tree;
         | 
| 95 | 
            +
                size_t parent_tree;
         | 
| 96 | 
            +
                Rcpp::LogicalVector tmp_bool;
         | 
| 97 | 
            +
                
         | 
| 98 | 
            +
                for (size_t row = 0; row < nrows; row++) {
         | 
| 99 | 
            +
                    if (model_outputs.outlier_scores_final[row] < 1) {
         | 
| 100 | 
            +
                        
         | 
| 101 | 
            +
                        outl_col   = model_outputs.outlier_columns_final[row];
         | 
| 102 | 
            +
                        outl_clust = model_outputs.outlier_clusters_final[row];
         | 
| 103 | 
            +
                        
         | 
| 104 | 
            +
                        /* metrics of outlierness - used to rank when choosing which to print */
         | 
| 105 | 
            +
                        outlier_score[row] = model_outputs.outlier_scores_final[row];
         | 
| 106 | 
            +
                        tree_depth[row]    = (int)model_outputs.outlier_depth_final[row];
         | 
| 107 | 
            +
                        has_na_col[row]    = model_outputs.all_clusters[outl_col][outl_clust].has_NA_branch;
         | 
| 108 | 
            +
                        
         | 
| 109 | 
            +
                        /* first determine outlier column and suspected value */
         | 
| 110 | 
            +
                        if (outl_col < ncols_num) {
         | 
| 111 | 
            +
                            if (outl_col < ncols_num_num) {
         | 
| 112 | 
            +
                                outlier_val[row] = Rcpp::List::create(
         | 
| 113 | 
            +
                                    Rcpp::_["column"] = Rcpp::CharacterVector(1, colnames_num[outl_col]),
         | 
| 114 | 
            +
                                    Rcpp::_["value"]  = Rcpp::wrap(arr_num[row + outl_col * nrows]),
         | 
| 115 | 
            +
                                    Rcpp::_["decimals"] = Rcpp::wrap(model_outputs.outlier_decimals_distr[row])
         | 
| 116 | 
            +
                                );
         | 
| 117 | 
            +
                            } else if (outl_col < (ncols_num_num + ncols_date)) {
         | 
| 118 | 
            +
                                outlier_val[row] = Rcpp::List::create(
         | 
| 119 | 
            +
                                    Rcpp::_["column"] = Rcpp::CharacterVector(1, colnames_num[outl_col]),
         | 
| 120 | 
            +
                                    Rcpp::_["value"]  = Rcpp::Date(arr_num[row + outl_col * nrows] - 1 + min_date[outl_col - ncols_num_num])
         | 
| 121 | 
            +
                                );
         | 
| 122 | 
            +
                            } else {
         | 
| 123 | 
            +
                                outlier_val[row] = Rcpp::List::create(
         | 
| 124 | 
            +
                                    Rcpp::_["column"] = Rcpp::CharacterVector(1, colnames_num[outl_col]),
         | 
| 125 | 
            +
                                    Rcpp::_["value"]  = Rcpp::Datetime(arr_num[row + outl_col * nrows] - 1 + min_ts[outl_col - ncols_num_num - ncols_date])
         | 
| 126 | 
            +
                                );
         | 
| 127 | 
            +
                            }
         | 
| 128 | 
            +
                        } else if (outl_col < (ncols_num + ncols_cat)) {
         | 
| 129 | 
            +
                            if (outl_col < (ncols_num + ncols_cat_cat)) {
         | 
| 130 | 
            +
                                outlier_val[row] = Rcpp::List::create(
         | 
| 131 | 
            +
                                    Rcpp::_["column"] = Rcpp::CharacterVector(1, colnames_cat[outl_col - ncols_num]),
         | 
| 132 | 
            +
                                    Rcpp::_["value"]  = Rcpp::CharacterVector(1, cat_levels[outl_col - ncols_num]
         | 
| 133 | 
            +
                                                                                           [arr_cat[row + (outl_col - ncols_num) * nrows]])
         | 
| 134 | 
            +
                                );
         | 
| 135 | 
            +
                            } else {
         | 
| 136 | 
            +
                                outlier_val[row] = Rcpp::List::create(
         | 
| 137 | 
            +
                                    Rcpp::_["column"] = Rcpp::CharacterVector(1, colnames_cat[outl_col - ncols_num]),
         | 
| 138 | 
            +
                                    Rcpp::_["value"]  = Rcpp::wrap((bool)arr_cat[row + (outl_col - ncols_num) * nrows])
         | 
| 139 | 
            +
                                );
         | 
| 140 | 
            +
                            }
         | 
| 141 | 
            +
                        } else {
         | 
| 142 | 
            +
                            outlier_val[row] = Rcpp::List::create(
         | 
| 143 | 
            +
                                Rcpp::_["column"] = Rcpp::CharacterVector(1, colnames_ord[outl_col - ncols_num - ncols_cat]),
         | 
| 144 | 
            +
                                Rcpp::_["value"]  = Rcpp::CharacterVector(1, ord_levels[outl_col - ncols_num - ncols_cat]
         | 
| 145 | 
            +
                                                                                       [arr_ord[row + (outl_col - ncols_num - ncols_cat) * nrows]])
         | 
| 146 | 
            +
                            );
         | 
| 147 | 
            +
                        }
         | 
| 148 | 
            +
                        
         | 
| 149 | 
            +
                        
         | 
| 150 | 
            +
                        /* info about the normal observations in the cluster */
         | 
| 151 | 
            +
                        if (outl_col < ncols_num) {
         | 
| 152 | 
            +
                            if (outl_col < ncols_num_num) {
         | 
| 153 | 
            +
                                if (arr_num[row + outl_col * nrows] >= model_outputs.all_clusters[outl_col][outl_clust].upper_lim) {
         | 
| 154 | 
            +
                                    lst_stats[row] = Rcpp::List::create(
         | 
| 155 | 
            +
                                        Rcpp::_["upper_thr"] = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].display_lim_high),
         | 
| 156 | 
            +
                                        Rcpp::_["pct_below"] = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].perc_below),
         | 
| 157 | 
            +
                                        Rcpp::_["mean"]      = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].display_mean),
         | 
| 158 | 
            +
                                        Rcpp::_["sd"]        = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].display_sd),
         | 
| 159 | 
            +
                                        Rcpp::_["n_obs"]     = Rcpp::wrap((int)model_outputs.all_clusters[outl_col][outl_clust].cluster_size)
         | 
| 160 | 
            +
                                    );
         | 
| 161 | 
            +
                                } else {
         | 
| 162 | 
            +
                                    lst_stats[row] = Rcpp::List::create(
         | 
| 163 | 
            +
                                        Rcpp::_["lower_thr"] = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].display_lim_low),
         | 
| 164 | 
            +
                                        Rcpp::_["pct_above"] = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].perc_above),
         | 
| 165 | 
            +
                                        Rcpp::_["mean"]      = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].display_mean),
         | 
| 166 | 
            +
                                        Rcpp::_["sd"]        = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].display_sd),
         | 
| 167 | 
            +
                                        Rcpp::_["n_obs"]     = Rcpp::wrap((int)model_outputs.all_clusters[outl_col][outl_clust].cluster_size)
         | 
| 168 | 
            +
                                    );
         | 
| 169 | 
            +
                                }
         | 
| 170 | 
            +
                            } else if (outl_col < (ncols_num_num + ncols_date)) {
         | 
| 171 | 
            +
                                if (arr_num[row + outl_col * nrows] >= model_outputs.all_clusters[outl_col][outl_clust].upper_lim) {
         | 
| 172 | 
            +
                                    lst_stats[row] = Rcpp::List::create(
         | 
| 173 | 
            +
                                        Rcpp::_["upper_thr"] = Rcpp::Date(model_outputs.all_clusters[outl_col][outl_clust].display_lim_high
         | 
| 174 | 
            +
                                                                          - 1 + min_date[outl_col - ncols_num_num]),
         | 
| 175 | 
            +
                                        Rcpp::_["pct_below"] = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].perc_below),
         | 
| 176 | 
            +
                                        Rcpp::_["mean"]      = Rcpp::Date(model_outputs.all_clusters[outl_col][outl_clust].display_mean - 1 + min_date[outl_col - ncols_num_num]),
         | 
| 177 | 
            +
                                        Rcpp::_["n_obs"]     = Rcpp::wrap((int)model_outputs.all_clusters[outl_col][outl_clust].cluster_size)
         | 
| 178 | 
            +
                                    );
         | 
| 179 | 
            +
                                } else {
         | 
| 180 | 
            +
                                    lst_stats[row] = Rcpp::List::create(
         | 
| 181 | 
            +
                                        Rcpp::_["lower_thr"] = Rcpp::Date(model_outputs.all_clusters[outl_col][outl_clust].display_lim_low
         | 
| 182 | 
            +
                                                                          - 1 + min_date[outl_col - ncols_num_num]),
         | 
| 183 | 
            +
                                        Rcpp::_["pct_above"] = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].perc_above),
         | 
| 184 | 
            +
                                        Rcpp::_["mean"]      = Rcpp::Date(model_outputs.all_clusters[outl_col][outl_clust].display_mean - 1 + min_date[outl_col - ncols_num_num]),
         | 
| 185 | 
            +
                                        Rcpp::_["n_obs"]     = Rcpp::wrap((int)model_outputs.all_clusters[outl_col][outl_clust].cluster_size)
         | 
| 186 | 
            +
                                    );
         | 
| 187 | 
            +
                                }
         | 
| 188 | 
            +
                            } else {
         | 
| 189 | 
            +
                                if (arr_num[row + outl_col * nrows] >= model_outputs.all_clusters[outl_col][outl_clust].upper_lim) {
         | 
| 190 | 
            +
                                    lst_stats[row] = Rcpp::List::create(
         | 
| 191 | 
            +
                                        Rcpp::_["upper_thr"] = Rcpp::Datetime(model_outputs.all_clusters[outl_col][outl_clust].display_lim_high
         | 
| 192 | 
            +
                                                                              - 1 + min_ts[outl_col - ncols_num_num - ncols_date]),
         | 
| 193 | 
            +
                                        Rcpp::_["pct_below"] = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].perc_below),
         | 
| 194 | 
            +
                                        Rcpp::_["mean"]      = Rcpp::Datetime(model_outputs.all_clusters[outl_col][outl_clust].display_mean
         | 
| 195 | 
            +
                                                                              - 1 + min_ts[outl_col - ncols_num_num - ncols_date]),
         | 
| 196 | 
            +
                                        Rcpp::_["n_obs"]     = Rcpp::wrap((int)model_outputs.all_clusters[outl_col][outl_clust].cluster_size)
         | 
| 197 | 
            +
                                    );
         | 
| 198 | 
            +
                                } else {
         | 
| 199 | 
            +
                                    lst_stats[row] = Rcpp::List::create(
         | 
| 200 | 
            +
                                        Rcpp::_["lower_thr"] = Rcpp::Datetime(model_outputs.all_clusters[outl_col][outl_clust].display_lim_low
         | 
| 201 | 
            +
                                                                              - 1 + min_ts[outl_col - ncols_num_num - ncols_date]),
         | 
| 202 | 
            +
                                        Rcpp::_["pct_above"] = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].perc_above),
         | 
| 203 | 
            +
                                        Rcpp::_["mean"]      = Rcpp::Datetime(model_outputs.all_clusters[outl_col][outl_clust].display_mean
         | 
| 204 | 
            +
                                                                              - 1 + min_ts[outl_col - ncols_num_num - ncols_date]),
         | 
| 205 | 
            +
                                        Rcpp::_["n_obs"]     = Rcpp::wrap((int)model_outputs.all_clusters[outl_col][outl_clust].cluster_size)
         | 
| 206 | 
            +
                                    );
         | 
| 207 | 
            +
                                }
         | 
| 208 | 
            +
                            }
         | 
| 209 | 
            +
                        } else if (outl_col < (ncols_num + ncols_cat)) {
         | 
| 210 | 
            +
                            if (outl_col < (ncols_num + ncols_cat_cat)) {
         | 
| 211 | 
            +
                                tmp_bool = Rcpp::LogicalVector(model_outputs.all_clusters[outl_col][outl_clust].subset_common.size(), false);
         | 
| 212 | 
            +
                                for (size_t cat = 0; cat < tmp_bool.size(); cat++) {
         | 
| 213 | 
            +
                                    if (model_outputs.all_clusters[outl_col][outl_clust].subset_common[cat] == 0) {
         | 
| 214 | 
            +
                                        tmp_bool[cat] = true;
         | 
| 215 | 
            +
                                    }
         | 
| 216 | 
            +
                                }
         | 
| 217 | 
            +
                                if (model_outputs.all_clusters[outl_col][outl_clust].split_type != Root) {
         | 
| 218 | 
            +
                                    if (model_outputs.all_clusters[outl_col][outl_clust].categ_maj < 0) {
         | 
| 219 | 
            +
                                        lst_stats[row] = Rcpp::List::create(
         | 
| 220 | 
            +
                                            Rcpp::_["categs_common"]      = Rcpp::as<Rcpp::CharacterVector>(cat_levels[outl_col - ncols_num][tmp_bool]),
         | 
| 221 | 
            +
                                            Rcpp::_["pct_common"]         = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].perc_in_subset),
         | 
| 222 | 
            +
                                            Rcpp::_["pct_next_most_comm"] = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].perc_next_most_comm),
         | 
| 223 | 
            +
                                            Rcpp::_["prior_prob"]         = Rcpp::wrap(model_outputs.prop_categ[model_outputs.start_ix_cat_counts[outl_col - ncols_num] +
         | 
| 224 | 
            +
                                                                                       arr_cat[row + (outl_col - ncols_num) * nrows]]),
         | 
| 225 | 
            +
                                            Rcpp::_["n_obs"]              = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].cluster_size)
         | 
| 226 | 
            +
                                        );
         | 
| 227 | 
            +
                                    } else {
         | 
| 228 | 
            +
                                        lst_stats[row] = Rcpp::List::create(
         | 
| 229 | 
            +
                                            Rcpp::_["categ_maj"]   = Rcpp::as<Rcpp::CharacterVector>(cat_levels[outl_col - ncols_num][
         | 
| 230 | 
            +
                                                                                                            model_outputs.all_clusters[outl_col][outl_clust].categ_maj
         | 
| 231 | 
            +
                                                                                                            ]),
         | 
| 232 | 
            +
                                            Rcpp::_["pct_common"]  = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].perc_in_subset),
         | 
| 233 | 
            +
                                            Rcpp::_["prior_prob"]  = Rcpp::wrap(model_outputs.prop_categ[model_outputs.start_ix_cat_counts[outl_col - ncols_num] +
         | 
| 234 | 
            +
                                                                                arr_cat[row + (outl_col - ncols_num) * nrows]]),
         | 
| 235 | 
            +
                                            Rcpp::_["n_obs"]       = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].cluster_size)
         | 
| 236 | 
            +
                                        );
         | 
| 237 | 
            +
                                    }
         | 
| 238 | 
            +
                                } else {
         | 
| 239 | 
            +
                                    lst_stats[row] = Rcpp::List::create(
         | 
| 240 | 
            +
                                        Rcpp::_["categs_common"]      = Rcpp::as<Rcpp::CharacterVector>(cat_levels[outl_col - ncols_num][tmp_bool]),
         | 
| 241 | 
            +
                                        Rcpp::_["pct_common"]         = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].perc_in_subset),
         | 
| 242 | 
            +
                                        Rcpp::_["pct_next_most_comm"] = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].perc_next_most_comm),
         | 
| 243 | 
            +
                                        Rcpp::_["n_obs"]              = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].cluster_size)
         | 
| 244 | 
            +
                                    );
         | 
| 245 | 
            +
                                }
         | 
| 246 | 
            +
                            } else {
         | 
| 247 | 
            +
                                lst_stats[row] = Rcpp::List::create(
         | 
| 248 | 
            +
                                    Rcpp::_["pct_other"]  = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].perc_in_subset),
         | 
| 249 | 
            +
                                    Rcpp::_["prior_prob"] = Rcpp::wrap(model_outputs.prop_categ[model_outputs.start_ix_cat_counts[outl_col - ncols_num] +
         | 
| 250 | 
            +
                                                                       arr_cat[row + (outl_col - ncols_num) * nrows]]),
         | 
| 251 | 
            +
                                    Rcpp::_["n_obs"]      = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].cluster_size)
         | 
| 252 | 
            +
                                );
         | 
| 253 | 
            +
                            }
         | 
| 254 | 
            +
                        } else {
         | 
| 255 | 
            +
                            tmp_bool = Rcpp::LogicalVector(model_outputs.all_clusters[outl_col][outl_clust].subset_common.size(), false);
         | 
| 256 | 
            +
                            for (size_t cat = 0; cat < tmp_bool.size(); cat++) {
         | 
| 257 | 
            +
                                if (model_outputs.all_clusters[outl_col][outl_clust].subset_common[cat] == 0) {
         | 
| 258 | 
            +
                                    tmp_bool[cat] = true;
         | 
| 259 | 
            +
                                }
         | 
| 260 | 
            +
                            }
         | 
| 261 | 
            +
                            if (model_outputs.all_clusters[outl_col][outl_clust].split_type != Root) {
         | 
| 262 | 
            +
                                if (model_outputs.all_clusters[outl_col][outl_clust].categ_maj < 0) {
         | 
| 263 | 
            +
                                    lst_stats[row] = Rcpp::List::create(
         | 
| 264 | 
            +
                                        Rcpp::_["categs_common"]      = Rcpp::as<Rcpp::CharacterVector>(ord_levels[outl_col - ncols_num - ncols_cat][tmp_bool]),
         | 
| 265 | 
            +
                                        Rcpp::_["pct_common"]         = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].perc_in_subset),
         | 
| 266 | 
            +
                                        Rcpp::_["pct_next_most_comm"] = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].perc_next_most_comm),
         | 
| 267 | 
            +
                                        Rcpp::_["prior_prob"]         = Rcpp::wrap(model_outputs.prop_categ[model_outputs.start_ix_cat_counts[outl_col - ncols_num] +
         | 
| 268 | 
            +
                                                                                   arr_ord[row + (outl_col - ncols_num - ncols_cat) * nrows]]),
         | 
| 269 | 
            +
                                        Rcpp::_["n_obs"]              = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].cluster_size)
         | 
| 270 | 
            +
                                    );
         | 
| 271 | 
            +
                                } else {
         | 
| 272 | 
            +
                                    lst_stats[row] = Rcpp::List::create(
         | 
| 273 | 
            +
                                        Rcpp::_["categ_maj"]   = Rcpp::as<Rcpp::CharacterVector>(ord_levels[outl_col - ncols_num - ncols_cat][
         | 
| 274 | 
            +
                                                                                                    model_outputs.all_clusters[outl_col][outl_clust].categ_maj
         | 
| 275 | 
            +
                                                                                                    ]),
         | 
| 276 | 
            +
                                        Rcpp::_["pct_common"]  = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].perc_in_subset),
         | 
| 277 | 
            +
                                        Rcpp::_["prior_prob"]  = Rcpp::wrap(model_outputs.prop_categ[model_outputs.start_ix_cat_counts[outl_col - ncols_num] +
         | 
| 278 | 
            +
                                                                            arr_ord[row + (outl_col - ncols_num - ncols_cat) * nrows]]),
         | 
| 279 | 
            +
                                        Rcpp::_["n_obs"]       = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].cluster_size)
         | 
| 280 | 
            +
                                    );
         | 
| 281 | 
            +
                                }
         | 
| 282 | 
            +
                            } else {
         | 
| 283 | 
            +
                                lst_stats[row] = Rcpp::List::create(
         | 
| 284 | 
            +
                                    Rcpp::_["categs_common"]      = Rcpp::as<Rcpp::CharacterVector>(ord_levels[outl_col - ncols_num - ncols_cat][tmp_bool]),
         | 
| 285 | 
            +
                                    Rcpp::_["pct_common"]         = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].perc_in_subset),
         | 
| 286 | 
            +
                                    Rcpp::_["pct_next_most_comm"] = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].perc_next_most_comm),
         | 
| 287 | 
            +
                                    Rcpp::_["n_obs"]              = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].cluster_size)
         | 
| 288 | 
            +
                                );
         | 
| 289 | 
            +
                            }
         | 
| 290 | 
            +
                        }
         | 
| 291 | 
            +
                        
         | 
| 292 | 
            +
                        
         | 
| 293 | 
            +
                        /* then determine conditions from the cluster */
         | 
| 294 | 
            +
                        Rcpp::List cond_clust;
         | 
| 295 | 
            +
                        if (model_outputs.all_clusters[outl_col][outl_clust].column_type != NoType) {
         | 
| 296 | 
            +
             | 
| 297 | 
            +
                            /* add the column name and actual value for the row */
         | 
| 298 | 
            +
                            switch(model_outputs.all_clusters[outl_col][outl_clust].column_type) {
         | 
| 299 | 
            +
                                case Numeric:
         | 
| 300 | 
            +
                                {
         | 
| 301 | 
            +
                                    cond_clust["column"] = Rcpp::CharacterVector(1, colnames_num[model_outputs.all_clusters[outl_col][outl_clust].col_num]);
         | 
| 302 | 
            +
                                    if (model_outputs.all_clusters[outl_col][outl_clust].col_num < ncols_num_num) {
         | 
| 303 | 
            +
                                        cond_clust["value_this"] = Rcpp::wrap(arr_num[row + model_outputs.all_clusters[outl_col][outl_clust].col_num * nrows]);
         | 
| 304 | 
            +
                                        if (model_outputs.all_clusters[outl_col][outl_clust].split_type != IsNa)
         | 
| 305 | 
            +
                                            cond_clust["decimals"] = Rcpp::wrap(model_outputs.min_decimals_col[model_outputs.all_clusters[outl_col][outl_clust].col_num]);
         | 
| 306 | 
            +
                                    } else if (model_outputs.all_clusters[outl_col][outl_clust].col_num < (ncols_num_num + ncols_date)) {
         | 
| 307 | 
            +
                                        cond_clust["value_this"] = Rcpp::Date(arr_num[row + model_outputs.all_clusters[outl_col][outl_clust].col_num * nrows]
         | 
| 308 | 
            +
                                                                              - 1 + min_date[model_outputs.all_clusters[outl_col][outl_clust].col_num - ncols_num_num]);
         | 
| 309 | 
            +
                                    } else {
         | 
| 310 | 
            +
                                        cond_clust["value_this"] = Rcpp::Datetime(arr_num[row + model_outputs.all_clusters[outl_col][outl_clust].col_num * nrows]
         | 
| 311 | 
            +
                                                                                  - 1 + min_ts[model_outputs.all_clusters[outl_col][outl_clust].col_num - ncols_num_num - ncols_date]);
         | 
| 312 | 
            +
                                    }
         | 
| 313 | 
            +
                                    break;
         | 
| 314 | 
            +
                                }
         | 
| 315 | 
            +
                                    
         | 
| 316 | 
            +
                                case Categorical:
         | 
| 317 | 
            +
                                {
         | 
| 318 | 
            +
                                    cond_clust["column"] = Rcpp::CharacterVector(1, colnames_cat[model_outputs.all_clusters[outl_col][outl_clust].col_num]);
         | 
| 319 | 
            +
                                    if (model_outputs.all_clusters[outl_col][outl_clust].col_num < ncols_cat_cat) {
         | 
| 320 | 
            +
                                        if (arr_cat[row + model_outputs.all_clusters[outl_col][outl_clust].col_num * nrows] >= 0) {
         | 
| 321 | 
            +
                                            cond_clust["value_this"] = Rcpp::CharacterVector(1, cat_levels[model_outputs.all_clusters[outl_col][outl_clust].col_num]
         | 
| 322 | 
            +
                                                                                                          [arr_cat[row + model_outputs.all_clusters[outl_col][outl_clust].col_num * nrows]]);
         | 
| 323 | 
            +
                                        } else {
         | 
| 324 | 
            +
                                            cond_clust["value_this"] = Rcpp::as<Rcpp::CharacterVector>(NA_STRING);
         | 
| 325 | 
            +
                                        }
         | 
| 326 | 
            +
                                    } else {
         | 
| 327 | 
            +
             | 
| 328 | 
            +
                                        if (arr_cat[row + model_outputs.all_clusters[outl_col][outl_clust].col_num * nrows] >= 0) {
         | 
| 329 | 
            +
                                            cond_clust["value_this"] = Rcpp::wrap((bool)arr_cat[row + model_outputs.all_clusters[outl_col][outl_clust].col_num * nrows]);
         | 
| 330 | 
            +
                                        } else {
         | 
| 331 | 
            +
                                            cond_clust["value_this"] = Rcpp::LogicalVector(1, NA_LOGICAL);
         | 
| 332 | 
            +
                                        }
         | 
| 333 | 
            +
                                    }
         | 
| 334 | 
            +
                                    break;
         | 
| 335 | 
            +
                                }
         | 
| 336 | 
            +
                                
         | 
| 337 | 
            +
                                case Ordinal:
         | 
| 338 | 
            +
                                {
         | 
| 339 | 
            +
                                    cond_clust["column"] = Rcpp::CharacterVector(1, colnames_ord[model_outputs.all_clusters[outl_col][outl_clust].col_num]);
         | 
| 340 | 
            +
                                    if (arr_ord[row + model_outputs.all_clusters[outl_col][outl_clust].col_num * nrows] >= 0) {
         | 
| 341 | 
            +
                                        cond_clust["value_this"] = Rcpp::CharacterVector(1, ord_levels[model_outputs.all_clusters[outl_col][outl_clust].col_num]
         | 
| 342 | 
            +
                                                                                                      [arr_ord[row + model_outputs.all_clusters[outl_col][outl_clust].col_num * nrows]]);
         | 
| 343 | 
            +
                                    } else {
         | 
| 344 | 
            +
                                        cond_clust["value_this"] = Rcpp::as<Rcpp::CharacterVector>(NA_STRING);
         | 
| 345 | 
            +
                                    }
         | 
| 346 | 
            +
                                    break;
         | 
| 347 | 
            +
                                }
         | 
| 348 | 
            +
                            }
         | 
| 349 | 
            +
                            
         | 
| 350 | 
            +
                            /* add the comparison point */
         | 
| 351 | 
            +
                            switch(model_outputs.all_clusters[outl_col][outl_clust].split_type) {
         | 
| 352 | 
            +
                                
         | 
| 353 | 
            +
                                case IsNa:
         | 
| 354 | 
            +
                                {
         | 
| 355 | 
            +
                                    cond_clust["comparison"] = Rcpp::CharacterVector("is NA");
         | 
| 356 | 
            +
                                    switch(model_outputs.all_clusters[outl_col][outl_clust].column_type) {
         | 
| 357 | 
            +
                                        case Numeric:
         | 
| 358 | 
            +
                                        {
         | 
| 359 | 
            +
                                            /* http://lists.r-forge.r-project.org/pipermail/rcpp-devel/2012-October/004379.html */
         | 
| 360 | 
            +
                                            /* this extra comment line works around an Rcpp bug triggered by comments containing forward slashes, such as the URL above */
         | 
| 361 | 
            +
                                            cond_clust["value_comp"] = Rcpp::wrap(NA_REAL);
         | 
| 362 | 
            +
                                            break;
         | 
| 363 | 
            +
                                        }
         | 
| 364 | 
            +
             | 
| 365 | 
            +
                                        case Categorical:
         | 
| 366 | 
            +
                                        {
         | 
| 367 | 
            +
                                            if (model_outputs.all_clusters[outl_col][outl_clust].col_num < ncols_cat_cat) {
         | 
| 368 | 
            +
                                                cond_clust["value_comp"] = Rcpp::wrap(NA_STRING);
         | 
| 369 | 
            +
                                            } else {
         | 
| 370 | 
            +
                                                cond_clust["value_comp"] = Rcpp::LogicalVector(1, NA_LOGICAL);
         | 
| 371 | 
            +
                                            }
         | 
| 372 | 
            +
                                            break;
         | 
| 373 | 
            +
                                        }
         | 
| 374 | 
            +
             | 
| 375 | 
            +
                                        case Ordinal:
         | 
| 376 | 
            +
                                        {
         | 
| 377 | 
            +
                                            cond_clust["value_comp"] = Rcpp::as<Rcpp::CharacterVector>(NA_STRING);
         | 
| 378 | 
            +
                                            break;
         | 
| 379 | 
            +
                                        }
         | 
| 380 | 
            +
                                    }
         | 
| 381 | 
            +
                                    break;
         | 
| 382 | 
            +
                                }
         | 
| 383 | 
            +
                                    
         | 
| 384 | 
            +
                                case LessOrEqual:
         | 
| 385 | 
            +
                                {
         | 
| 386 | 
            +
                                    if (model_outputs.all_clusters[outl_col][outl_clust].column_type == Numeric) {
         | 
| 387 | 
            +
                                        if (model_outputs.all_clusters[outl_col][outl_clust].col_num < ncols_num_num) {
         | 
| 388 | 
            +
                                            cond_clust["comparison"] = Rcpp::CharacterVector("<=");
         | 
| 389 | 
            +
                                            cond_clust["value_comp"] = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].split_point);
         | 
| 390 | 
            +
                                        } else if (model_outputs.all_clusters[outl_col][outl_clust].col_num < (ncols_num_num + ncols_date)) {
         | 
| 391 | 
            +
                                            cond_clust["comparison"] = Rcpp::CharacterVector("<=");
         | 
| 392 | 
            +
                                            cond_clust["value_comp"] = Rcpp::Date(model_outputs.all_clusters[outl_col][outl_clust].split_point
         | 
| 393 | 
            +
                                                                                  - 1 + min_date[model_outputs.all_clusters[outl_col][outl_clust].col_num - ncols_num_num]);
         | 
| 394 | 
            +
                                        } else {
         | 
| 395 | 
            +
                                            cond_clust["comparison"] = Rcpp::CharacterVector("<=");
         | 
| 396 | 
            +
                                            cond_clust["value_comp"] = Rcpp::Datetime(model_outputs.all_clusters[outl_col][outl_clust].split_point
         | 
| 397 | 
            +
                                                                                      - 1 + min_ts[model_outputs.all_clusters[outl_col][outl_clust].col_num - ncols_num_num - ncols_date]);
         | 
| 398 | 
            +
                                        }
         | 
| 399 | 
            +
                                    } else {
         | 
| 400 | 
            +
                                        tmp_bool = Rcpp::LogicalVector(ord_levels[model_outputs.all_clusters[outl_col][outl_clust].col_num].size(), false);
         | 
| 401 | 
            +
                                        for (int cat = 0; cat <= model_outputs.all_clusters[outl_col][outl_clust].split_lev; cat++) tmp_bool[cat] = true;
         | 
| 402 | 
            +
                                        cond_clust["comparison"] = Rcpp::CharacterVector("in");
         | 
| 403 | 
            +
                                        cond_clust["value_comp"] = Rcpp::as<Rcpp::CharacterVector>(ord_levels[model_outputs.all_clusters[outl_col][outl_clust].col_num][tmp_bool]);
         | 
| 404 | 
            +
                                    }
         | 
| 405 | 
            +
                                    break;
         | 
| 406 | 
            +
                                }
         | 
| 407 | 
            +
                                    
         | 
| 408 | 
            +
                                case Greater:
         | 
| 409 | 
            +
                                {
         | 
| 410 | 
            +
                                    if (model_outputs.all_clusters[outl_col][outl_clust].column_type == Numeric) {
         | 
| 411 | 
            +
                                        if (model_outputs.all_clusters[outl_col][outl_clust].col_num < ncols_num_num) {
         | 
| 412 | 
            +
                                            cond_clust["comparison"] = Rcpp::CharacterVector(">");
         | 
| 413 | 
            +
                                            cond_clust["value_comp"] = Rcpp::wrap(model_outputs.all_clusters[outl_col][outl_clust].split_point);
         | 
| 414 | 
            +
                                        } else if (model_outputs.all_clusters[outl_col][outl_clust].col_num < (ncols_num_num + ncols_date)) {
         | 
| 415 | 
            +
                                            cond_clust["comparison"] = Rcpp::CharacterVector(">");
         | 
| 416 | 
            +
                                            cond_clust["value_comp"] = Rcpp::Date(model_outputs.all_clusters[outl_col][outl_clust].split_point
         | 
| 417 | 
            +
                                                                                  - 1 + min_date[model_outputs.all_clusters[outl_col][outl_clust].col_num - ncols_num_num]);
         | 
| 418 | 
            +
                                        } else {
         | 
| 419 | 
            +
                                            cond_clust["comparison"] = Rcpp::CharacterVector(">");
         | 
| 420 | 
            +
                                            cond_clust["value_comp"] = Rcpp::Datetime(model_outputs.all_clusters[outl_col][outl_clust].split_point
         | 
| 421 | 
            +
                                                                                      - 1 + min_ts[model_outputs.all_clusters[outl_col][outl_clust].col_num - ncols_num_num - ncols_date]);
         | 
| 422 | 
            +
                                        }
         | 
| 423 | 
            +
                                    } else {
         | 
| 424 | 
            +
                                        tmp_bool = Rcpp::LogicalVector(ord_levels[model_outputs.all_clusters[outl_col][outl_clust].col_num].size(), true);
         | 
| 425 | 
            +
                                        for (int cat = 0; cat <= model_outputs.all_clusters[outl_col][outl_clust].split_lev; cat++) tmp_bool[cat] = false;
         | 
| 426 | 
            +
                                        cond_clust["comparison"] = Rcpp::CharacterVector("in");
         | 
| 427 | 
            +
                                        cond_clust["value_comp"] = Rcpp::as<Rcpp::CharacterVector>(ord_levels[model_outputs.all_clusters[outl_col][outl_clust].col_num][tmp_bool]);
         | 
| 428 | 
            +
                                    }
         | 
| 429 | 
            +
                                    break;
         | 
| 430 | 
            +
                                }
         | 
| 431 | 
            +
                                    
         | 
| 432 | 
            +
                                case InSubset:
         | 
| 433 | 
            +
                                {
         | 
| 434 | 
            +
                                    tmp_bool = Rcpp::LogicalVector(model_outputs.all_clusters[outl_col][outl_clust].split_subset.size(), false);
         | 
| 435 | 
            +
                                    for (size_t cat = 0; cat < model_outputs.all_clusters[outl_col][outl_clust].split_subset.size(); cat++) {
         | 
| 436 | 
            +
                                        if (model_outputs.all_clusters[outl_col][outl_clust].split_subset[cat] > 0) {
         | 
| 437 | 
            +
                                            tmp_bool[cat] = true;
         | 
| 438 | 
            +
                                        }
         | 
| 439 | 
            +
                                    }
         | 
| 440 | 
            +
                                    cond_clust["comparison"] = Rcpp::CharacterVector("in");
         | 
| 441 | 
            +
                                    cond_clust["value_comp"] = Rcpp::as<Rcpp::CharacterVector>(cat_levels[model_outputs.all_clusters[outl_col][outl_clust].col_num][tmp_bool]);
         | 
| 442 | 
            +
                                    break;
         | 
| 443 | 
            +
                                }
         | 
| 444 | 
            +
                                    
         | 
| 445 | 
            +
                                case NotInSubset:
         | 
| 446 | 
            +
                                {
         | 
| 447 | 
            +
                                    tmp_bool = Rcpp::LogicalVector(model_outputs.all_clusters[outl_col][outl_clust].split_subset.size(), false);
         | 
| 448 | 
            +
                                    for (size_t cat = 0; cat < model_outputs.all_clusters[outl_col][outl_clust].split_subset.size(); cat++) {
         | 
| 449 | 
            +
                                        if (model_outputs.all_clusters[outl_col][outl_clust].split_subset[cat] == 0) {
         | 
| 450 | 
            +
                                            tmp_bool[cat] = true;
         | 
| 451 | 
            +
                                        }
         | 
| 452 | 
            +
                                    }
         | 
| 453 | 
            +
                                    cond_clust["comparison"] = Rcpp::CharacterVector("in"); /* "in" applies to the levels flagged 0 in split_subset, selected above */
         | 
| 454 | 
            +
                                    cond_clust["value_comp"] = Rcpp::as<Rcpp::CharacterVector>(cat_levels[model_outputs.all_clusters[outl_col][outl_clust].col_num][tmp_bool]);
         | 
| 455 | 
            +
                                    break;
         | 
| 456 | 
            +
                                }
         | 
| 457 | 
            +
                                    
         | 
| 458 | 
            +
                                case Equal:
         | 
| 459 | 
            +
                                {
         | 
| 460 | 
            +
                                    if (model_outputs.all_clusters[outl_col][outl_clust].column_type == Categorical) {
         | 
| 461 | 
            +
                                        if (model_outputs.all_clusters[outl_col][outl_clust].col_num < ncols_cat_cat) {
         | 
| 462 | 
            +
                                            cond_clust["comparison"] = Rcpp::CharacterVector("=");
         | 
| 463 | 
            +
                                            cond_clust["value_comp"] = Rcpp::CharacterVector(1, cat_levels[model_outputs.all_clusters[outl_col][outl_clust].col_num]
         | 
| 464 | 
            +
                                                                                                          [model_outputs.all_clusters[outl_col][outl_clust].split_lev]);
         | 
| 465 | 
            +
                                        } else {
         | 
| 466 | 
            +
                                            cond_clust["comparison"] = Rcpp::CharacterVector("=");
         | 
| 467 | 
            +
                                            cond_clust["value_comp"] = Rcpp::wrap((bool) model_outputs.all_clusters[outl_col][outl_clust].split_lev);
         | 
| 468 | 
            +
                                        }
         | 
| 469 | 
            +
                                    } else {
         | 
| 470 | 
            +
                                        cond_clust["comparison"] = Rcpp::CharacterVector("=");
         | 
| 471 | 
            +
                                        cond_clust["value_comp"] = Rcpp::CharacterVector(1, ord_levels[model_outputs.all_clusters[outl_col][outl_clust].col_num]
         | 
| 472 | 
            +
                                                                                                      [model_outputs.all_clusters[outl_col][outl_clust].split_lev]);
         | 
| 473 | 
            +
                                    }
         | 
| 474 | 
            +
                                    break;
         | 
| 475 | 
            +
                                }
         | 
| 476 | 
            +
                                    
         | 
| 477 | 
            +
                                case NotEqual:
         | 
| 478 | 
            +
                                {
         | 
| 479 | 
            +
                                    if (model_outputs.all_clusters[outl_col][outl_clust].column_type == Categorical) {
         | 
| 480 | 
            +
                                        if (model_outputs.all_clusters[outl_col][outl_clust].col_num < ncols_cat_cat) {
         | 
| 481 | 
            +
                                            cond_clust["comparison"] = Rcpp::CharacterVector("!=");
         | 
| 482 | 
            +
                                            cond_clust["value_comp"] = Rcpp::CharacterVector(1, cat_levels[model_outputs.all_clusters[outl_col][outl_clust].col_num]
         | 
| 483 | 
            +
                                                                                                          [model_outputs.all_clusters[outl_col][outl_clust].split_lev]);
         | 
| 484 | 
            +
                                        } else {
         | 
| 485 | 
            +
                                            cond_clust["comparison"] = Rcpp::CharacterVector("!=");
         | 
| 486 | 
            +
                                            cond_clust["value_comp"] = Rcpp::wrap(!((bool)model_outputs.all_clusters[outl_col][outl_clust].split_lev));
         | 
| 487 | 
            +
                                        }
         | 
| 488 | 
            +
                                    } else {
         | 
| 489 | 
            +
                                        cond_clust["comparison"] = Rcpp::CharacterVector("!=");
         | 
| 490 | 
            +
                                        cond_clust["value_comp"] = Rcpp::CharacterVector(1, ord_levels[model_outputs.all_clusters[outl_col][outl_clust].col_num]
         | 
| 491 | 
            +
                                                                                                      [model_outputs.all_clusters[outl_col][outl_clust].split_lev]);
         | 
| 492 | 
            +
                                    }
         | 
| 493 | 
            +
                                    break;
         | 
| 494 | 
            +
                                }
         | 
| 495 | 
            +
                                
         | 
| 496 | 
            +
                            }
         | 
| 497 | 
            +
                            lst_cond[row] = Rcpp::List::create(Rcpp::clone(cond_clust));
         | 
| 498 | 
            +
             | 
| 499 | 
            +
                            /* finally, add conditions from branches that lead to the cluster */
         | 
| 500 | 
            +
                            curr_tree = model_outputs.outlier_trees_final[row];
         | 
| 501 | 
            +
                            Rcpp::List temp_list;
         | 
| 502 | 
            +
                            while (true) {
         | 
| 503 | 
            +
                                if (curr_tree == 0 || model_outputs.all_trees[outl_col][curr_tree].parent_branch == SubTrees) {
         | 
| 504 | 
            +
                                    break;
         | 
| 505 | 
            +
                                }
         | 
| 506 | 
            +
                                parent_tree = model_outputs.all_trees[outl_col][curr_tree].parent;
         | 
| 507 | 
            +
                                cond_clust = Rcpp::List();
         | 
| 508 | 
            +
             | 
| 509 | 
            +
                                /* when using 'follow_all' */
         | 
| 510 | 
            +
                                if (model_outputs.all_trees[outl_col][parent_tree].all_branches.size() > 0) {
         | 
| 511 | 
            +
             | 
| 512 | 
            +
                                    /* add column name and value */
         | 
| 513 | 
            +
                                    switch(model_outputs.all_trees[outl_col][curr_tree].column_type) {
         | 
| 514 | 
            +
                                        case Numeric:
         | 
| 515 | 
            +
                                        {
         | 
| 516 | 
            +
                                            cond_clust["column"] = Rcpp::as<Rcpp::CharacterVector>(colnames_num[model_outputs.all_trees[outl_col][curr_tree].col_num]);
         | 
| 517 | 
            +
                                            break;
         | 
| 518 | 
            +
                                        }
         | 
| 519 | 
            +
             | 
| 520 | 
            +
                                        case Categorical:
         | 
| 521 | 
            +
                                        {
         | 
| 522 | 
            +
                                            cond_clust["column"] = Rcpp::as<Rcpp::CharacterVector>(colnames_cat[model_outputs.all_trees[outl_col][curr_tree].col_num]);
         | 
| 523 | 
            +
                                            break;
         | 
| 524 | 
            +
                                        }
         | 
| 525 | 
            +
             | 
| 526 | 
            +
                                        case Ordinal:
         | 
| 527 | 
            +
                                        {
         | 
| 528 | 
            +
                                            cond_clust["column"] = Rcpp::as<Rcpp::CharacterVector>(colnames_ord[model_outputs.all_trees[outl_col][curr_tree].col_num]);
         | 
| 529 | 
            +
                                            break;
         | 
| 530 | 
            +
                                        }
         | 
| 531 | 
            +
                                    }
         | 
| 532 | 
            +
             | 
| 533 | 
            +
                                    /* add conditions from tree */
         | 
| 534 | 
            +
                                    switch(model_outputs.all_trees[outl_col][curr_tree].column_type) {
         | 
| 535 | 
            +
             | 
| 536 | 
            +
                                        case Numeric:
         | 
| 537 | 
            +
                                        {
         | 
| 538 | 
            +
                                            /* add decimals if appropriate */
         | 
| 539 | 
            +
                                            if (
         | 
| 540 | 
            +
                                                model_outputs.all_trees[outl_col][curr_tree].col_num < ncols_num_num &&
         | 
| 541 | 
            +
                                                model_outputs.all_trees[outl_col][curr_tree].split_this_branch != IsNa
         | 
| 542 | 
            +
                                                )
         | 
| 543 | 
            +
                                            {
         | 
| 544 | 
            +
                                                cond_clust["decimals"] = Rcpp::wrap(model_outputs.min_decimals_col[model_outputs.all_trees[outl_col][curr_tree].col_num]);
         | 
| 545 | 
            +
                                            }
         | 
| 546 | 
            +
             | 
| 547 | 
            +
                                            /* then conditions */
         | 
| 548 | 
            +
                                            switch(model_outputs.all_trees[outl_col][curr_tree].split_this_branch) {
         | 
| 549 | 
            +
             | 
| 550 | 
            +
                                                case IsNa:
         | 
| 551 | 
            +
                                                {
         | 
| 552 | 
            +
                                                    cond_clust["value_this"] = Rcpp::wrap(NA_REAL);
         | 
| 553 | 
            +
                                                    cond_clust["comparison"] = Rcpp::CharacterVector("is NA");
         | 
| 554 | 
            +
                                                    cond_clust["value_comp"] = Rcpp::wrap(NA_REAL);
         | 
| 555 | 
            +
                                                    break;
         | 
| 556 | 
            +
                                                }
         | 
| 557 | 
            +
             | 
| 558 | 
            +
                                                case LessOrEqual:
         | 
| 559 | 
            +
                                                {
         | 
| 560 | 
            +
                                                    if (model_outputs.all_trees[outl_col][curr_tree].col_num < ncols_num_num) {
         | 
| 561 | 
            +
                                                        cond_clust["value_this"] = Rcpp::wrap(arr_num[row + model_outputs.all_trees[outl_col][curr_tree].col_num * nrows]);
         | 
| 562 | 
            +
                                                        cond_clust["comparison"] = Rcpp::CharacterVector("<=");
         | 
| 563 | 
            +
                                                        cond_clust["value_comp"] = Rcpp::wrap(model_outputs.all_trees[outl_col][curr_tree].split_point);
         | 
| 564 | 
            +
                                                    } else if (model_outputs.all_trees[outl_col][curr_tree].col_num < (ncols_num_num + ncols_date)) {
         | 
| 565 | 
            +
                                                        cond_clust["value_this"] = Rcpp::Date(arr_num[row + model_outputs.all_trees[outl_col][curr_tree].col_num * nrows]
         | 
| 566 | 
            +
                                                                                              - 1 + min_date[model_outputs.all_trees[outl_col][curr_tree].col_num - ncols_num_num]);
         | 
| 567 | 
            +
                                                        cond_clust["comparison"] = Rcpp::CharacterVector("<=");
         | 
| 568 | 
            +
                                                        cond_clust["value_comp"] = Rcpp::Date(model_outputs.all_trees[outl_col][curr_tree].split_point
         | 
| 569 | 
            +
                                                                                              - 1 + min_date[model_outputs.all_trees[outl_col][curr_tree].col_num - ncols_num_num]);
         | 
| 570 | 
            +
                                                    } else {
         | 
| 571 | 
            +
                                                        cond_clust["value_this"] = Rcpp::Datetime(arr_num[row + model_outputs.all_trees[outl_col][curr_tree].col_num * nrows]
         | 
| 572 | 
            +
                                                                                                  - 1 + min_ts[model_outputs.all_trees[outl_col][curr_tree].col_num - ncols_num_num - ncols_date]);
         | 
| 573 | 
            +
                                                        cond_clust["comparison"] = Rcpp::CharacterVector("<=");
         | 
| 574 | 
            +
                                                        cond_clust["value_comp"] = Rcpp::Datetime(model_outputs.all_trees[outl_col][curr_tree].split_point
         | 
| 575 | 
            +
                                                                                                  - 1 + min_ts[model_outputs.all_trees[outl_col][curr_tree].col_num - ncols_num_num - ncols_date]);
         | 
| 576 | 
            +
                                                    }
         | 
| 577 | 
            +
                                                    break;
         | 
| 578 | 
            +
                                                }
         | 
| 579 | 
            +
             | 
| 580 | 
            +
                                                case Greater:
         | 
| 581 | 
            +
                                                {
         | 
| 582 | 
            +
                                                    if (model_outputs.all_trees[outl_col][curr_tree].col_num < ncols_num_num) {
         | 
| 583 | 
            +
                                                        cond_clust["value_this"] = Rcpp::wrap(arr_num[row + model_outputs.all_trees[outl_col][curr_tree].col_num * nrows]);
         | 
| 584 | 
            +
                                                        cond_clust["comparison"] = Rcpp::CharacterVector(">");
         | 
| 585 | 
            +
                                                        cond_clust["value_comp"] = Rcpp::wrap(model_outputs.all_trees[outl_col][curr_tree].split_point);
         | 
| 586 | 
            +
                                                    } else if (model_outputs.all_trees[outl_col][curr_tree].col_num < (ncols_num_num + ncols_date)) {
         | 
| 587 | 
            +
                                                        cond_clust["value_this"] = Rcpp::Date(arr_num[row + model_outputs.all_trees[outl_col][curr_tree].col_num * nrows]
         | 
| 588 | 
            +
                                                                                              - 1 + min_date[model_outputs.all_trees[outl_col][curr_tree].col_num - ncols_num_num]);
         | 
| 589 | 
            +
                                                        cond_clust["comparison"] = Rcpp::CharacterVector(">");
         | 
| 590 | 
            +
                                                        cond_clust["value_comp"] = Rcpp::Date(model_outputs.all_trees[outl_col][curr_tree].split_point
         | 
| 591 | 
            +
                                                                                              - 1 + min_date[model_outputs.all_trees[outl_col][curr_tree].col_num - ncols_num_num]);
         | 
| 592 | 
            +
                                                    } else {
         | 
| 593 | 
            +
                                                        cond_clust["value_this"] = Rcpp::Datetime(arr_num[row + model_outputs.all_trees[outl_col][curr_tree].col_num * nrows]
         | 
| 594 | 
            +
                                                                                                  - 1 + min_ts[model_outputs.all_trees[outl_col][curr_tree].col_num - ncols_num_num - ncols_date]);
         | 
| 595 | 
            +
                                                        cond_clust["comparison"] = Rcpp::CharacterVector(">");
         | 
| 596 | 
            +
                                                        cond_clust["value_comp"] = Rcpp::Datetime(model_outputs.all_trees[outl_col][curr_tree].split_point
         | 
| 597 | 
            +
                                                                                                  - 1 + min_ts[model_outputs.all_trees[outl_col][curr_tree].col_num - ncols_num_num - ncols_date]);
         | 
| 598 | 
            +
                                                    }
         | 
| 599 | 
            +
                                                    break;
         | 
| 600 | 
            +
                                                }
         | 
| 601 | 
            +
             | 
| 602 | 
            +
                                            }
         | 
| 603 | 
            +
                                            break;
         | 
| 604 | 
            +
                                        }
         | 
| 605 | 
            +
             | 
| 606 | 
            +
                                        case Categorical:
         | 
| 607 | 
            +
                                        {
         | 
| 608 | 
            +
                                            switch(model_outputs.all_trees[outl_col][curr_tree].split_this_branch) {
         | 
| 609 | 
            +
             | 
| 610 | 
            +
                                                case IsNa:
         | 
| 611 | 
            +
                                                {
         | 
| 612 | 
            +
                                                    if (model_outputs.all_trees[outl_col][curr_tree].col_num < ncols_cat_cat) {
         | 
| 613 | 
            +
                                                        cond_clust["value_this"] = Rcpp::as<Rcpp::CharacterVector>(NA_STRING);
         | 
| 614 | 
            +
                                                        cond_clust["comparison"] = Rcpp::CharacterVector("is NA");
         | 
| 615 | 
            +
                                                        cond_clust["value_comp"] = Rcpp::as<Rcpp::CharacterVector>(NA_STRING);
         | 
| 616 | 
            +
                                                    } else {
         | 
| 617 | 
            +
                                                        cond_clust["value_this"] = Rcpp::LogicalVector(1, NA_LOGICAL);
         | 
| 618 | 
            +
                                                        cond_clust["comparison"] = Rcpp::CharacterVector("is NA");
         | 
| 619 | 
            +
                                                        cond_clust["value_comp"] = Rcpp::LogicalVector(1, NA_LOGICAL);
         | 
| 620 | 
            +
                                                    }
         | 
| 621 | 
            +
                                                    break;
         | 
| 622 | 
            +
                                                }
         | 
| 623 | 
            +
             | 
| 624 | 
            +
                                                case InSubset:
         | 
| 625 | 
            +
                                                {
         | 
| 626 | 
            +
                                                    if (model_outputs.all_trees[outl_col][curr_tree].col_num < ncols_cat_cat) {
         | 
| 627 | 
            +
                                                        tmp_bool = Rcpp::LogicalVector(model_outputs.all_trees[outl_col][curr_tree].split_subset.size(), false);
         | 
| 628 | 
            +
                                                        for (size_t cat = 0; cat < model_outputs.all_trees[outl_col][curr_tree].split_subset.size(); cat++) {
         | 
| 629 | 
            +
                                                            if (model_outputs.all_trees[outl_col][curr_tree].split_subset[cat] > 0) {
         | 
| 630 | 
            +
                                                                tmp_bool[cat] = true;
         | 
| 631 | 
            +
                                                            }
         | 
| 632 | 
            +
                                                        }
         | 
| 633 | 
            +
                                                        cond_clust["value_this"] = Rcpp::CharacterVector(1, cat_levels[model_outputs.all_trees[outl_col][curr_tree].col_num]
         | 
| 634 | 
            +
                                                                                                                      [arr_cat[row + model_outputs.all_trees[outl_col][curr_tree].col_num * nrows]]);
         | 
| 635 | 
            +
                                                        cond_clust["comparison"] = Rcpp::CharacterVector("in");
         | 
| 636 | 
            +
                                                        cond_clust["value_comp"] = Rcpp::as<Rcpp::CharacterVector>(cat_levels[model_outputs.all_trees[outl_col][curr_tree].col_num][tmp_bool]);
         | 
| 637 | 
            +
                                                    } else {
         | 
| 638 | 
            +
                                                        cond_clust["value_this"] = Rcpp::wrap((bool) arr_cat[row + model_outputs.all_trees[outl_col][curr_tree].col_num * nrows]);
         | 
| 639 | 
            +
                                                        cond_clust["comparison"] = Rcpp::CharacterVector("=");
         | 
| 640 | 
            +
                                                        cond_clust["value_comp"] = Rcpp::wrap((bool) model_outputs.all_trees[outl_col][curr_tree].split_subset[1]);
         | 
| 641 | 
            +
                                                    }
         | 
| 642 | 
            +
                                                    break;
         | 
| 643 | 
            +
                                                }
         | 
| 644 | 
            +
             | 
| 645 | 
            +
                                                case NotInSubset:
         | 
| 646 | 
            +
                                                {
         | 
| 647 | 
            +
                                                    if (model_outputs.all_trees[outl_col][curr_tree].col_num < ncols_cat_cat) {
         | 
| 648 | 
            +
                                                        tmp_bool = Rcpp::LogicalVector(model_outputs.all_trees[outl_col][curr_tree].split_subset.size(), true);
         | 
| 649 | 
            +
                                                        for (size_t cat = 0; cat < model_outputs.all_trees[outl_col][curr_tree].split_subset.size(); cat++) {
         | 
| 650 | 
            +
                                                            if (model_outputs.all_trees[outl_col][curr_tree].split_subset[cat] > 0) {
         | 
| 651 | 
            +
                                                                tmp_bool[cat] = false;
         | 
| 652 | 
            +
                                                            }
         | 
| 653 | 
            +
                                                        }
         | 
| 654 | 
            +
                                                        cond_clust["value_this"] = Rcpp::CharacterVector(1, cat_levels[model_outputs.all_trees[outl_col][curr_tree].col_num]
         | 
| 655 | 
            +
                                                                                                                      [arr_cat[row + model_outputs.all_trees[outl_col][curr_tree].col_num * nrows]]);
         | 
| 656 | 
            +
                                                        cond_clust["comparison"] = Rcpp::CharacterVector("in");
         | 
| 657 | 
            +
                                                        cond_clust["value_comp"] = Rcpp::as<Rcpp::CharacterVector>(cat_levels[model_outputs.all_trees[outl_col][curr_tree].col_num][tmp_bool]);
         | 
| 658 | 
            +
                                                    } else {
         | 
| 659 | 
            +
                                                        cond_clust["value_this"] = Rcpp::wrap((bool) arr_cat[row + model_outputs.all_trees[outl_col][curr_tree].col_num * nrows]);
         | 
| 660 | 
            +
                                                        cond_clust["comparison"] = Rcpp::CharacterVector("=");
         | 
| 661 | 
            +
                                                        cond_clust["value_comp"] = Rcpp::wrap((bool) model_outputs.all_trees[outl_col][curr_tree].split_subset[0]);
         | 
| 662 | 
            +
                                                    }
         | 
| 663 | 
            +
                                                    break;
         | 
| 664 | 
            +
                                                }
         | 
| 665 | 
            +
             | 
| 666 | 
            +
                                                case Equal:
         | 
| 667 | 
            +
                                                {
         | 
| 668 | 
            +
                                                    if (model_outputs.all_trees[outl_col][curr_tree].col_num < ncols_cat_cat) {
         | 
| 669 | 
            +
                                                        cond_clust["value_this"] = Rcpp::CharacterVector(1, cat_levels[model_outputs.all_trees[outl_col][curr_tree].col_num]
         | 
| 670 | 
            +
                                                                                                                      [arr_cat[row + model_outputs.all_trees[outl_col][curr_tree].col_num * nrows]]);
         | 
| 671 | 
            +
                                                        cond_clust["comparison"] = Rcpp::CharacterVector("=");
         | 
| 672 | 
            +
                                                        cond_clust["value_comp"] = Rcpp::CharacterVector(1, cat_levels[model_outputs.all_trees[outl_col][curr_tree].col_num]
         | 
| 673 | 
            +
                                                                                                                      [model_outputs.all_trees[outl_col][curr_tree].split_lev]);
         | 
| 674 | 
            +
                                                    } else {
         | 
| 675 | 
            +
                                                        cond_clust["value_this"] = Rcpp::wrap((bool) arr_cat[row + model_outputs.all_trees[outl_col][curr_tree].col_num * nrows]);
         | 
| 676 | 
            +
                                                        cond_clust["comparison"] = Rcpp::CharacterVector("=");
         | 
| 677 | 
            +
                                                        cond_clust["value_comp"] = Rcpp::wrap((bool) model_outputs.all_trees[outl_col][curr_tree].split_lev);
         | 
| 678 | 
            +
                                                    }
         | 
| 679 | 
            +
                                                    break;
         | 
| 680 | 
            +
                                                }
         | 
| 681 | 
            +
             | 
| 682 | 
            +
                                                case NotEqual:
         | 
| 683 | 
            +
                                                {
         | 
| 684 | 
            +
                                                    if (model_outputs.all_trees[outl_col][curr_tree].col_num < ncols_cat_cat) {
         | 
| 685 | 
            +
                                                        cond_clust["value_this"] = Rcpp::CharacterVector(1, cat_levels[model_outputs.all_trees[outl_col][curr_tree].col_num]
         | 
| 686 | 
            +
                                                                                                                      [arr_cat[row + model_outputs.all_trees[outl_col][curr_tree].col_num * nrows]]);
         | 
| 687 | 
            +
                                                        cond_clust["comparison"] = Rcpp::CharacterVector("!=");
         | 
| 688 | 
            +
                                                        cond_clust["value_comp"] = Rcpp::CharacterVector(1, cat_levels[model_outputs.all_trees[outl_col][curr_tree].col_num]
         | 
| 689 | 
            +
                                                                                                                      [model_outputs.all_trees[outl_col][curr_tree].split_lev]);
         | 
| 690 | 
            +
                                                    } else {
         | 
| 691 | 
            +
                                                        cond_clust["value_this"] = Rcpp::wrap((bool) arr_cat[row + model_outputs.all_trees[outl_col][curr_tree].col_num * nrows]);
         | 
| 692 | 
            +
                                                        cond_clust["comparison"] = Rcpp::CharacterVector("=");
         | 
| 693 | 
            +
                                                        cond_clust["value_comp"] = Rcpp::wrap((bool) !model_outputs.all_trees[outl_col][curr_tree].split_lev);
         | 
| 694 | 
            +
                                                        /* note: boolean splits should always be converted to Equal, so this branch is redundant */
         | 
| 695 | 
            +
                                                    }
         | 
| 696 | 
            +
                                                    break;
         | 
| 697 | 
            +
                                                }
         | 
| 698 | 
            +
             | 
| 699 | 
            +
                                            }
         | 
| 700 | 
            +
                                            break;
         | 
| 701 | 
            +
                                        }
         | 
| 702 | 
            +
             | 
| 703 | 
            +
                                        case Ordinal:
         | 
| 704 | 
            +
                                        {
         | 
| 705 | 
            +
                                            switch(model_outputs.all_trees[outl_col][curr_tree].split_this_branch) {
         | 
| 706 | 
            +
             | 
| 707 | 
            +
                                                case IsNa:
         | 
| 708 | 
            +
                                                {
         | 
| 709 | 
            +
                                                    cond_clust["value_this"] = Rcpp::as<Rcpp::CharacterVector>(NA_STRING);
         | 
| 710 | 
            +
                                                    cond_clust["comparison"] = Rcpp::CharacterVector("is NA");
         | 
| 711 | 
            +
                                                    cond_clust["value_comp"] = Rcpp::as<Rcpp::CharacterVector>(NA_STRING);
         | 
| 712 | 
            +
                                                    break;
         | 
| 713 | 
            +
                                                }
         | 
| 714 | 
            +
             | 
| 715 | 
            +
                                                case LessOrEqual:
         | 
| 716 | 
            +
                                                {
         | 
| 717 | 
            +
                                                    tmp_bool = Rcpp::LogicalVector(ord_levels[model_outputs.all_trees[outl_col][curr_tree].col_num].size(), false);
         | 
| 718 | 
            +
                                                    for (int cat = 0; cat <= model_outputs.all_trees[outl_col][curr_tree].split_lev; cat++) {
         | 
| 719 | 
            +
                                                        tmp_bool[cat] = true;
         | 
| 720 | 
            +
                                                    }
         | 
| 721 | 
            +
                                                    cond_clust["value_this"] = Rcpp::CharacterVector(1, ord_levels[model_outputs.all_trees[outl_col][curr_tree].col_num]
         | 
| 722 | 
            +
                                                                                                                  [arr_ord[row + model_outputs.all_trees[outl_col][curr_tree].col_num * nrows]]);
         | 
| 723 | 
            +
                                                    cond_clust["comparison"] = Rcpp::CharacterVector("in");
         | 
| 724 | 
            +
                                                    cond_clust["value_comp"] = Rcpp::as<Rcpp::CharacterVector>(ord_levels[model_outputs.all_trees[outl_col][curr_tree].col_num][tmp_bool]);
         | 
| 725 | 
            +
                                                    break;
         | 
| 726 | 
            +
                                                }
         | 
| 727 | 
            +
             | 
| 728 | 
            +
                                                case Greater:
         | 
| 729 | 
            +
                                                {
         | 
| 730 | 
            +
                                                    tmp_bool = Rcpp::LogicalVector(ord_levels[model_outputs.all_trees[outl_col][curr_tree].col_num].size(), true);
         | 
| 731 | 
            +
                                                    for (int cat = 0; cat <= model_outputs.all_trees[outl_col][curr_tree].split_lev; cat++) {
         | 
| 732 | 
            +
                                                        tmp_bool[cat] = false;
         | 
| 733 | 
            +
                                                    }
         | 
| 734 | 
            +
                                                    cond_clust["value_this"] = Rcpp::CharacterVector(1, ord_levels[model_outputs.all_trees[outl_col][curr_tree].col_num]
         | 
| 735 | 
            +
                                                                                                                  [arr_ord[row + model_outputs.all_trees[outl_col][curr_tree].col_num * nrows]]);
         | 
| 736 | 
            +
                                                    cond_clust["comparison"] = Rcpp::CharacterVector("in");
         | 
| 737 | 
            +
                                                    cond_clust["value_comp"] = Rcpp::as<Rcpp::CharacterVector>(ord_levels[model_outputs.all_trees[outl_col][curr_tree].col_num][tmp_bool]);
         | 
| 738 | 
            +
                                                    break;
         | 
| 739 | 
            +
                                                }
         | 
| 740 | 
            +
             | 
| 741 | 
            +
                                                case Equal:
         | 
| 742 | 
            +
                                                {
         | 
| 743 | 
            +
                                                    cond_clust["value_this"] = Rcpp::CharacterVector(1, ord_levels[model_outputs.all_trees[outl_col][curr_tree].col_num]
         | 
| 744 | 
            +
                                                                                                                  [arr_ord[row + model_outputs.all_trees[outl_col][curr_tree].col_num * nrows]]);
         | 
| 745 | 
            +
                                                    cond_clust["comparison"] = Rcpp::CharacterVector("=");
         | 
| 746 | 
            +
                                                    cond_clust["value_comp"] = Rcpp::CharacterVector(1, ord_levels[model_outputs.all_trees[outl_col][curr_tree].col_num]
         | 
| 747 | 
            +
                                                                                                                  [model_outputs.all_trees[outl_col][curr_tree].split_lev]);
         | 
| 748 | 
            +
                                                    break;
         | 
| 749 | 
            +
                                                }
         | 
| 750 | 
            +
             | 
| 751 | 
            +
                                                case NotEqual:
         | 
| 752 | 
            +
                                                {
         | 
| 753 | 
            +
                                                    cond_clust["value_this"] = Rcpp::CharacterVector(1, ord_levels[model_outputs.all_trees[outl_col][curr_tree].col_num]
         | 
| 754 | 
            +
                                                                                                                  [arr_ord[row + model_outputs.all_trees[outl_col][curr_tree].col_num * nrows]]);
         | 
| 755 | 
            +
                                                    cond_clust["comparison"] = Rcpp::CharacterVector("!=");
         | 
| 756 | 
            +
                                                    cond_clust["value_comp"] = Rcpp::CharacterVector(1, ord_levels[model_outputs.all_trees[outl_col][curr_tree].col_num]
         | 
| 757 | 
            +
                                                                                                                  [model_outputs.all_trees[outl_col][curr_tree].split_lev]);
         | 
| 758 | 
            +
                                                    break;
         | 
| 759 | 
            +
                                                }
         | 
| 760 | 
            +
             | 
| 761 | 
            +
                                            }
         | 
| 762 | 
            +
                                            break;
         | 
| 763 | 
            +
                                        }
         | 
| 764 | 
            +
             | 
| 765 | 
            +
                                    }
         | 
| 766 | 
            +
                                }
         | 
| 767 | 
            +
             | 
| 768 | 
            +
                                /* regular case (no 'follow_all') */
         | 
| 769 | 
            +
                                else
         | 
| 770 | 
            +
                                {
         | 
| 771 | 
            +
             | 
| 772 | 
            +
                                    /* add column name and value */
         | 
| 773 | 
            +
                                    switch(model_outputs.all_trees[outl_col][parent_tree].column_type) {
         | 
| 774 | 
            +
                                        case Numeric:
         | 
| 775 | 
            +
                                        {
         | 
| 776 | 
            +
                                            cond_clust["column"] = Rcpp::as<Rcpp::CharacterVector>(colnames_num[model_outputs.all_trees[outl_col][parent_tree].col_num]);
         | 
| 777 | 
            +
                                            /* add decimals if appropriate */
         | 
| 778 | 
            +
                                            if (
         | 
| 779 | 
            +
                                                model_outputs.all_trees[outl_col][parent_tree].col_num < ncols_num_num &&
         | 
| 780 | 
            +
                                                model_outputs.all_trees[outl_col][curr_tree].parent_branch != IsNa
         | 
| 781 | 
            +
                                                )
         | 
| 782 | 
            +
                                            {
         | 
| 783 | 
            +
                                                cond_clust["decimals"] = Rcpp::wrap(model_outputs.min_decimals_col[model_outputs.all_trees[outl_col][parent_tree].col_num]);
         | 
| 784 | 
            +
                                            }
         | 
| 785 | 
            +
                                            break;
         | 
| 786 | 
            +
                                        }
         | 
| 787 | 
            +
             | 
| 788 | 
            +
                                        case Categorical:
         | 
| 789 | 
            +
                                        {
         | 
| 790 | 
            +
                                            cond_clust["column"] = Rcpp::as<Rcpp::CharacterVector>(colnames_cat[model_outputs.all_trees[outl_col][parent_tree].col_num]);
         | 
| 791 | 
            +
                                            break;
         | 
| 792 | 
            +
                                        }
         | 
| 793 | 
            +
             | 
| 794 | 
            +
                                        case Ordinal:
         | 
| 795 | 
            +
                                        {
         | 
| 796 | 
            +
                                            cond_clust["column"] = Rcpp::as<Rcpp::CharacterVector>(colnames_ord[model_outputs.all_trees[outl_col][parent_tree].col_num]);
         | 
| 797 | 
            +
                                            break;
         | 
| 798 | 
            +
                                        }
         | 
| 799 | 
            +
                                    }
         | 
| 800 | 
            +
             | 
| 801 | 
            +
             | 
| 802 | 
            +
                                    /* add conditions from tree */
         | 
| 803 | 
            +
                                    switch(model_outputs.all_trees[outl_col][curr_tree].parent_branch) {
         | 
| 804 | 
            +
             | 
| 805 | 
            +
             | 
| 806 | 
            +
                                        case IsNa:
         | 
| 807 | 
            +
                                        {
         | 
| 808 | 
            +
                                            switch(model_outputs.all_trees[outl_col][parent_tree].column_type) {
         | 
| 809 | 
            +
                                                case Numeric:
         | 
| 810 | 
            +
                                                {
         | 
| 811 | 
            +
                                                    cond_clust["value_this"] = Rcpp::wrap(NA_REAL);
         | 
| 812 | 
            +
                                                    cond_clust["comparison"] = Rcpp::CharacterVector("is NA");
         | 
| 813 | 
            +
                                                    cond_clust["value_comp"] = Rcpp::wrap(NA_REAL);
         | 
| 814 | 
            +
                                                    break;
         | 
| 815 | 
            +
                                                }
         | 
| 816 | 
            +
             | 
| 817 | 
            +
                                                case Categorical:
         | 
| 818 | 
            +
                                                {
         | 
| 819 | 
            +
                                                    if (model_outputs.all_trees[outl_col][parent_tree].col_num < ncols_cat_cat) {
         | 
| 820 | 
            +
                                                        cond_clust["value_this"] = Rcpp::as<Rcpp::CharacterVector>(NA_STRING);
         | 
| 821 | 
            +
                                                        cond_clust["comparison"] = Rcpp::CharacterVector("is NA");
         | 
| 822 | 
            +
                                                        cond_clust["value_comp"] = Rcpp::as<Rcpp::CharacterVector>(NA_STRING);
         | 
| 823 | 
            +
                                                    } else {
         | 
| 824 | 
            +
                                                        cond_clust["value_this"] = Rcpp::LogicalVector(1, NA_LOGICAL);
         | 
| 825 | 
            +
                                                        cond_clust["comparison"] = Rcpp::CharacterVector("is NA");
         | 
| 826 | 
            +
                                                        cond_clust["value_comp"] = Rcpp::LogicalVector(1, NA_LOGICAL);
         | 
| 827 | 
            +
                                                    }
         | 
| 828 | 
            +
                                                    break;
         | 
| 829 | 
            +
                                                }
         | 
| 830 | 
            +
             | 
| 831 | 
            +
                                                case Ordinal:
         | 
| 832 | 
            +
                                                {
         | 
| 833 | 
            +
                                                    cond_clust["value_this"] = Rcpp::as<Rcpp::CharacterVector>(NA_STRING);
         | 
| 834 | 
            +
                                                    cond_clust["comparison"] = Rcpp::CharacterVector("is NA");
         | 
| 835 | 
            +
                                                    cond_clust["value_comp"] = Rcpp::as<Rcpp::CharacterVector>(NA_STRING);
         | 
| 836 | 
            +
                                                    break;
         | 
| 837 | 
            +
                                                }
         | 
| 838 | 
            +
                                            }
         | 
| 839 | 
            +
                                            break;
         | 
| 840 | 
            +
                                        }
         | 
| 841 | 
            +
             | 
| 842 | 
            +
                                        case LessOrEqual:
         | 
| 843 | 
            +
                                        {
         | 
| 844 | 
            +
                                            if (model_outputs.all_trees[outl_col][parent_tree].column_type == Numeric) {
         | 
| 845 | 
            +
                                                if (model_outputs.all_trees[outl_col][parent_tree].col_num < ncols_num_num) {
         | 
| 846 | 
            +
                                                    cond_clust["value_this"] = Rcpp::wrap(arr_num[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]);
         | 
| 847 | 
            +
                                                    cond_clust["comparison"] = Rcpp::CharacterVector("<=");
         | 
| 848 | 
            +
                                                    cond_clust["value_comp"] = Rcpp::wrap(model_outputs.all_trees[outl_col][parent_tree].split_point);
         | 
| 849 | 
            +
                                                } else if (model_outputs.all_trees[outl_col][parent_tree].col_num < (ncols_num_num + ncols_date)) {
         | 
| 850 | 
            +
                                                    cond_clust["value_this"] = Rcpp::Date(arr_num[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]
         | 
| 851 | 
            +
                                                                                          - 1 + min_date[model_outputs.all_trees[outl_col][parent_tree].col_num - ncols_num_num]);
         | 
| 852 | 
            +
                                                    cond_clust["comparison"] = Rcpp::CharacterVector("<=");
         | 
| 853 | 
            +
                                                    cond_clust["value_comp"] = Rcpp::Date(model_outputs.all_trees[outl_col][parent_tree].split_point
         | 
| 854 | 
            +
                                                                                          - 1 + min_date[model_outputs.all_trees[outl_col][parent_tree].col_num - ncols_num_num]);
         | 
| 855 | 
            +
                                                } else {
         | 
| 856 | 
            +
                                                    cond_clust["value_this"] = Rcpp::Datetime(arr_num[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]
         | 
| 857 | 
            +
                                                                                              - 1 + min_ts[model_outputs.all_trees[outl_col][parent_tree].col_num
         | 
| 858 | 
            +
                                                                                                           - ncols_num_num - ncols_date]);
         | 
| 859 | 
            +
                                                    cond_clust["comparison"] = Rcpp::CharacterVector("<=");
         | 
| 860 | 
            +
                                                    cond_clust["value_comp"] = Rcpp::Datetime(model_outputs.all_trees[outl_col][parent_tree].split_point
         | 
| 861 | 
            +
                                                                                              - 1 + min_ts[model_outputs.all_trees[outl_col][parent_tree].col_num
         | 
| 862 | 
            +
                                                                                                           - ncols_num_num - ncols_date]);
         | 
| 863 | 
            +
                                                }
         | 
| 864 | 
            +
                                            } else {
         | 
| 865 | 
            +
                                                tmp_bool = Rcpp::LogicalVector(ord_levels[model_outputs.all_trees[outl_col][parent_tree].col_num].size(), false);
         | 
| 866 | 
            +
                                                for (int cat = 0; cat <= model_outputs.all_trees[outl_col][parent_tree].split_lev; cat++) tmp_bool[cat] = true;
         | 
| 867 | 
            +
                                                cond_clust["value_this"] = Rcpp::CharacterVector(1, ord_levels[model_outputs.all_trees[outl_col][parent_tree].col_num]
         | 
| 868 | 
            +
                                                                                                              [arr_ord[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]]);
         | 
| 869 | 
            +
                                                cond_clust["comparison"] = Rcpp::CharacterVector("in");
         | 
| 870 | 
            +
                                                cond_clust["value_comp"] = Rcpp::as<Rcpp::CharacterVector>(ord_levels[model_outputs.all_trees[outl_col][parent_tree].col_num][tmp_bool]);
         | 
| 871 | 
            +
                                            }
         | 
| 872 | 
            +
                                            break;
         | 
| 873 | 
            +
                                        }
         | 
| 874 | 
            +
             | 
| 875 | 
            +
                                        case Greater:
         | 
| 876 | 
            +
                                        {
         | 
| 877 | 
            +
                                            if (model_outputs.all_trees[outl_col][parent_tree].column_type == Numeric) {
         | 
| 878 | 
            +
                                                if (model_outputs.all_trees[outl_col][parent_tree].col_num < ncols_num_num) {
         | 
| 879 | 
            +
                                                    cond_clust["value_this"] = Rcpp::wrap(arr_num[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]);
         | 
| 880 | 
            +
                                                    cond_clust["comparison"] = Rcpp::CharacterVector(">");
         | 
| 881 | 
            +
                                                    cond_clust["value_comp"] = Rcpp::wrap(model_outputs.all_trees[outl_col][parent_tree].split_point);
         | 
| 882 | 
            +
                                                } else if (model_outputs.all_trees[outl_col][parent_tree].col_num < (ncols_num_num + ncols_date)) {
         | 
| 883 | 
            +
                                                    cond_clust["value_this"] = Rcpp::Date(arr_num[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]
         | 
| 884 | 
            +
                                                                                          - 1 + min_date[model_outputs.all_trees[outl_col][parent_tree].col_num - ncols_num_num]);
         | 
| 885 | 
            +
                                                    cond_clust["comparison"] = Rcpp::CharacterVector(">");
         | 
| 886 | 
            +
                                                    cond_clust["value_comp"] = Rcpp::Date(model_outputs.all_trees[outl_col][parent_tree].split_point
         | 
| 887 | 
            +
                                                                                          - 1 + min_date[model_outputs.all_trees[outl_col][parent_tree].col_num - ncols_num_num]);
         | 
| 888 | 
            +
                                                } else {
         | 
| 889 | 
            +
                                                    cond_clust["value_this"] = Rcpp::Datetime(arr_num[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]
         | 
| 890 | 
            +
                                                                                              - 1 + min_ts[model_outputs.all_trees[outl_col][parent_tree].col_num
         | 
| 891 | 
            +
                                                                                                             - ncols_num_num - ncols_date]);
         | 
| 892 | 
            +
                                                    cond_clust["comparison"] = Rcpp::CharacterVector(">");
         | 
| 893 | 
            +
                                                    cond_clust["value_comp"] = Rcpp::Datetime(model_outputs.all_trees[outl_col][parent_tree].split_point
         | 
| 894 | 
            +
                                                                                              - 1 + min_ts[model_outputs.all_trees[outl_col][parent_tree].col_num
         | 
| 895 | 
            +
                                                                                                             - ncols_num_num - ncols_date]);
         | 
| 896 | 
            +
                                                }
         | 
| 897 | 
            +
                                            } else {
         | 
| 898 | 
            +
                                                tmp_bool = Rcpp::LogicalVector(ord_levels[model_outputs.all_trees[outl_col][parent_tree].col_num].size(), true);
         | 
| 899 | 
            +
                                                for (int cat = 0; cat <= model_outputs.all_trees[outl_col][parent_tree].split_lev; cat++) tmp_bool[cat] = false;
         | 
| 900 | 
            +
                                                cond_clust["value_this"] = Rcpp::CharacterVector(1, ord_levels[model_outputs.all_trees[outl_col][parent_tree].col_num]
         | 
| 901 | 
            +
                                                                                                              [arr_ord[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]]);
         | 
| 902 | 
            +
                                                cond_clust["comparison"] = Rcpp::CharacterVector("in");
         | 
| 903 | 
            +
                                                cond_clust["value_comp"] = Rcpp::as<Rcpp::CharacterVector>(ord_levels[model_outputs.all_trees[outl_col][parent_tree].col_num][tmp_bool]);
         | 
| 904 | 
            +
                                            }
         | 
| 905 | 
            +
                                            break;
         | 
| 906 | 
            +
                                        }
         | 
| 907 | 
            +
             | 
| 908 | 
            +
                                        case InSubset:
         | 
| 909 | 
            +
                                        {
         | 
| 910 | 
            +
                                            if (model_outputs.all_trees[outl_col][parent_tree].col_num < ncols_cat_cat) {
         | 
| 911 | 
            +
                                                tmp_bool = Rcpp::LogicalVector(cat_levels[model_outputs.all_trees[outl_col][parent_tree].col_num].size(), false);
         | 
| 912 | 
            +
                                                for (size_t cat = 0; cat < model_outputs.all_trees[outl_col][parent_tree].split_subset.size(); cat++) {
         | 
| 913 | 
            +
                                                    if (model_outputs.all_trees[outl_col][parent_tree].split_subset[cat] > 0) {
         | 
| 914 | 
            +
                                                        tmp_bool[cat] = true;
         | 
| 915 | 
            +
                                                    }
         | 
| 916 | 
            +
                                                }
         | 
| 917 | 
            +
                                                cond_clust["value_this"] = Rcpp::CharacterVector(1, cat_levels[model_outputs.all_trees[outl_col][parent_tree].col_num]
         | 
| 918 | 
            +
                                                                                                              [arr_cat[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]]);
         | 
| 919 | 
            +
                                                cond_clust["comparison"] = Rcpp::CharacterVector("in");
         | 
| 920 | 
            +
                                                cond_clust["value_comp"] = Rcpp::as<Rcpp::CharacterVector>(cat_levels[model_outputs.all_trees[outl_col][parent_tree].col_num][tmp_bool]);
         | 
| 921 | 
            +
                                            } else {
         | 
| 922 | 
            +
                                                cond_clust["value_this"] = Rcpp::wrap((bool) arr_cat[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]);
         | 
| 923 | 
            +
                                                cond_clust["comparison"] = Rcpp::CharacterVector("=");
         | 
| 924 | 
            +
                                                cond_clust["value_comp"] = Rcpp::wrap((bool) model_outputs.all_trees[outl_col][parent_tree].split_subset[1]);
         | 
| 925 | 
            +
                                            }
         | 
| 926 | 
            +
                                            break;
         | 
| 927 | 
            +
                                        }
         | 
| 928 | 
            +
             | 
| 929 | 
            +
                                        case NotInSubset:
         | 
| 930 | 
            +
                                        {
         | 
| 931 | 
            +
                                            if (model_outputs.all_trees[outl_col][parent_tree].col_num < ncols_cat_cat) {
         | 
| 932 | 
            +
                                                tmp_bool = Rcpp::LogicalVector(cat_levels[model_outputs.all_trees[outl_col][parent_tree].col_num].size(), false);
         | 
| 933 | 
            +
                                                for (size_t cat = 0; cat < model_outputs.all_trees[outl_col][parent_tree].split_subset.size(); cat++) {
         | 
| 934 | 
            +
                                                    if (model_outputs.all_trees[outl_col][parent_tree].split_subset[cat] == 0) {
         | 
| 935 | 
            +
                                                        tmp_bool[cat] = true;
         | 
| 936 | 
            +
                                                    }
         | 
| 937 | 
            +
                                                }
         | 
| 938 | 
            +
                                                cond_clust["value_this"] = Rcpp::CharacterVector(1, cat_levels[model_outputs.all_trees[outl_col][parent_tree].col_num]
         | 
| 939 | 
            +
                                                                                                              [arr_cat[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]]);
         | 
| 940 | 
            +
                                                cond_clust["comparison"] = Rcpp::CharacterVector("in");
         | 
| 941 | 
            +
                                                cond_clust["value_comp"] = Rcpp::as<Rcpp::CharacterVector>(cat_levels[model_outputs.all_trees[outl_col][parent_tree].col_num][tmp_bool]);
         | 
| 942 | 
            +
                                            } else {
         | 
| 943 | 
            +
                                                cond_clust["value_this"] = Rcpp::wrap((bool) arr_cat[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]);
         | 
| 944 | 
            +
                                                cond_clust["comparison"] = Rcpp::CharacterVector("=");
         | 
| 945 | 
            +
                                                cond_clust["value_comp"] = Rcpp::wrap((bool) model_outputs.all_trees[outl_col][parent_tree].split_subset[0]);
         | 
| 946 | 
            +
                                            }
         | 
| 947 | 
            +
                                            break;
         | 
| 948 | 
            +
                                        }
         | 
| 949 | 
            +
             | 
| 950 | 
            +
                                        case Equal:
         | 
| 951 | 
            +
                                        {
         | 
| 952 | 
            +
                                            if (model_outputs.all_trees[outl_col][parent_tree].column_type == Categorical) {
         | 
| 953 | 
            +
                                                if (model_outputs.all_trees[outl_col][parent_tree].col_num < ncols_cat_cat) {
         | 
| 954 | 
            +
                                                    cond_clust["value_this"] = Rcpp::CharacterVector(1, cat_levels[model_outputs.all_trees[outl_col][parent_tree].col_num]
         | 
| 955 | 
            +
                                                                                                                  [arr_cat[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]]);
         | 
| 956 | 
            +
                                                    cond_clust["comparison"] = Rcpp::CharacterVector("=");
         | 
| 957 | 
            +
                                                    cond_clust["value_comp"] = Rcpp::CharacterVector(1, cat_levels[model_outputs.all_trees[outl_col][parent_tree].col_num]
         | 
| 958 | 
            +
                                                                                                                  [model_outputs.all_trees[outl_col][parent_tree].split_lev]);
         | 
| 959 | 
            +
                                                } else {
         | 
| 960 | 
            +
                                                    cond_clust["value_this"] = Rcpp::wrap((bool) arr_cat[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]);
         | 
| 961 | 
            +
                                                    cond_clust["comparison"] = Rcpp::CharacterVector("=");
         | 
| 962 | 
            +
                                                    cond_clust["value_comp"] = Rcpp::wrap((bool) model_outputs.all_trees[outl_col][parent_tree].split_subset[1]);
         | 
| 963 | 
            +
                                                }
         | 
| 964 | 
            +
                                            } else {
         | 
| 965 | 
            +
                                                cond_clust["value_this"] = Rcpp::CharacterVector(1, ord_levels[model_outputs.all_trees[outl_col][parent_tree].col_num]
         | 
| 966 | 
            +
                                                                                                              [arr_ord[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]]);
         | 
| 967 | 
            +
                                                cond_clust["comparison"] = Rcpp::CharacterVector("=");
         | 
| 968 | 
            +
                                                cond_clust["value_comp"] = Rcpp::CharacterVector(1, ord_levels[model_outputs.all_trees[outl_col][parent_tree].col_num]
         | 
| 969 | 
            +
                                                                                                              [model_outputs.all_trees[outl_col][parent_tree].split_lev]);
         | 
| 970 | 
            +
                                            }
         | 
| 971 | 
            +
                                            break;
         | 
| 972 | 
            +
                                        }
         | 
| 973 | 
            +
             | 
| 974 | 
            +
                                        case NotEqual:
         | 
| 975 | 
            +
                                        {
         | 
| 976 | 
            +
                                            if (model_outputs.all_trees[outl_col][parent_tree].column_type == Categorical) {
         | 
| 977 | 
            +
                                                if (model_outputs.all_trees[outl_col][parent_tree].col_num < ncols_cat_cat) {
         | 
| 978 | 
            +
                                                    cond_clust["value_this"] = Rcpp::CharacterVector(1, cat_levels[model_outputs.all_trees[outl_col][parent_tree].col_num]
         | 
| 979 | 
            +
                                                                                                                  [arr_cat[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]]);
         | 
| 980 | 
            +
                                                    cond_clust["comparison"] = Rcpp::CharacterVector("!=");
         | 
| 981 | 
            +
                                                    cond_clust["value_comp"] = Rcpp::CharacterVector(1, cat_levels[model_outputs.all_trees[outl_col][parent_tree].col_num]
         | 
| 982 | 
            +
                                                                                                                  [model_outputs.all_trees[outl_col][parent_tree].split_lev]);
         | 
| 983 | 
            +
                                                } else {
         | 
| 984 | 
            +
                                                    cond_clust["value_this"] = Rcpp::wrap((bool) arr_cat[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]);
         | 
| 985 | 
            +
                                                    cond_clust["comparison"] = Rcpp::CharacterVector("=");
         | 
| 986 | 
            +
                                                    cond_clust["value_comp"] = Rcpp::wrap((bool) model_outputs.all_trees[outl_col][parent_tree].split_subset[0]);
         | 
| 987 | 
            +
                                                }
         | 
| 988 | 
            +
                                            } else {
         | 
| 989 | 
            +
                                                cond_clust["value_this"] = Rcpp::CharacterVector(1, ord_levels[model_outputs.all_trees[outl_col][parent_tree].col_num]
         | 
| 990 | 
            +
                                                                                                              [arr_ord[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]]);
         | 
| 991 | 
            +
                                                cond_clust["comparison"] = Rcpp::CharacterVector("!=");
         | 
| 992 | 
            +
                                                cond_clust["value_comp"] = Rcpp::CharacterVector(1, ord_levels[model_outputs.all_trees[outl_col][parent_tree].col_num]
         | 
| 993 | 
            +
                                                                                                              [model_outputs.all_trees[outl_col][parent_tree].split_lev]);
         | 
| 994 | 
            +
                                            }
         | 
| 995 | 
            +
                                            break;
         | 
| 996 | 
            +
                                        }
         | 
| 997 | 
            +
             | 
| 998 | 
            +
                                        case SingleCateg:
         | 
| 999 | 
            +
                                        {
         | 
| 1000 | 
            +
                                            if (model_outputs.all_trees[outl_col][parent_tree].col_num < ncols_cat_cat) {
         | 
| 1001 | 
            +
                                                cond_clust["value_this"] = Rcpp::CharacterVector(1, cat_levels[model_outputs.all_trees[outl_col][parent_tree].col_num]
         | 
| 1002 | 
            +
                                                                                                              [arr_cat[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]]);
         | 
| 1003 | 
            +
                                                cond_clust["comparison"] = Rcpp::CharacterVector("=");
         | 
| 1004 | 
            +
                                                cond_clust["value_comp"] = Rcpp::CharacterVector(1, cat_levels[model_outputs.all_trees[outl_col][parent_tree].col_num]
         | 
| 1005 | 
            +
                                                                                                              [arr_cat[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]]);
         | 
| 1006 | 
            +
                                            } else {
         | 
| 1007 | 
            +
                                                cond_clust["value_this"] = Rcpp::wrap((bool) arr_cat[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]);
         | 
| 1008 | 
            +
                                                cond_clust["comparison"] = Rcpp::CharacterVector("=");
         | 
| 1009 | 
            +
                                                cond_clust["value_comp"] = Rcpp::wrap((bool) arr_cat[row + model_outputs.all_trees[outl_col][parent_tree].col_num * nrows]);
         | 
| 1010 | 
            +
                                            }
         | 
| 1011 | 
            +
                                            break;
         | 
| 1012 | 
            +
                                        }
         | 
| 1013 | 
            +
             | 
| 1014 | 
            +
                                    }
         | 
| 1015 | 
            +
             | 
| 1016 | 
            +
                                    
         | 
| 1017 | 
            +
                                }
         | 
| 1018 | 
            +
             | 
| 1019 | 
            +
                                /* https://github.com/RcppCore/Rcpp/issues/979 */
         | 
| 1020 | 
            +
                                /* this extra comment works around the Rcpp issue triggered by the slashes in the comment above */
         | 
| 1021 | 
            +
                                temp_list = lst_cond[row];
         | 
| 1022 | 
            +
                                temp_list.push_back(Rcpp::clone(cond_clust));
         | 
| 1023 | 
            +
                                lst_cond[row] = temp_list;
         | 
| 1024 | 
            +
                                curr_tree = parent_tree;
         | 
| 1025 | 
            +
                            }
         | 
| 1026 | 
            +
                            
         | 
| 1027 | 
            +
                        }
         | 
| 1028 | 
            +
                        
         | 
| 1029 | 
            +
                    }
         | 
| 1030 | 
            +
                }
         | 
| 1031 | 
            +
                
         | 
| 1032 | 
            +
                outp["suspicous_value"]  = outlier_val;
         | 
| 1033 | 
            +
                outp["group_statistics"] = lst_stats;
         | 
| 1034 | 
            +
                outp["conditions"]       = lst_cond;
         | 
| 1035 | 
            +
                outp["tree_depth"]       = tree_depth;
         | 
| 1036 | 
            +
                outp["uses_NA_branch"]   = has_na_col;
         | 
| 1037 | 
            +
                outp["outlier_score"]    = outlier_score;
         | 
| 1038 | 
            +
                return outp;
         | 
| 1039 | 
            +
            }
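
The function above reads cells with the pattern `arr_num[row + col_num * nrows]` and rebuilds dates and timestamps as `value - 1 + min_date[...]` or `value - 1 + min_ts[...]`. The sketch below isolates those two steps without Rcpp. The helper names `cell_at` and `decode_offset` are hypothetical and do not exist in the package. The encoding `stored = original - minimum + 1` is an assumption inferred from the decoding arithmetic, not something the source states.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: read cell (row, col) from a column-major buffer,
// mirroring the arr_num[row + col_num * nrows] indexing used above.
double cell_at(const std::vector<double> &arr,
               std::size_t row, std::size_t col, std::size_t nrows)
{
    assert(row < nrows);                      // row must lie inside one column
    assert(row + col * nrows < arr.size());   // and the cell inside the buffer
    return arr[row + col * nrows];
}

// Hypothetical helper: undo the offset applied to date and timestamp columns.
// Assumes the column was stored as (original - col_min + 1), so that the
// smallest original value is stored as 1.
double decode_offset(double stored, double col_min)
{
    return stored - 1 + col_min;
}
```

Under that assumption, a stored value of 1 decodes back to the column minimum, which matches the `- 1 + min_date[...]` expressions in the `LessOrEqual` and `Greater` branches.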
         | 
| 1040 | 
            +
             | 
| 1041 | 
            +
            /* for extracting info about flaggable outliers */
         | 
| 1042 | 
            +
            Rcpp::List extract_outl_bounds(ModelOutputs &model_outputs,
         | 
| 1043 | 
            +
                                           Rcpp::ListOf<Rcpp::StringVector> cat_levels,
         | 
| 1044 | 
            +
                                           Rcpp::ListOf<Rcpp::StringVector> ord_levels,
         | 
| 1045 | 
            +
                                           Rcpp::NumericVector min_date,
         | 
| 1046 | 
            +
                                           Rcpp::NumericVector min_ts)
         | 
| 1047 | 
            +
            {
         | 
| 1048 | 
            +
                size_t ncols_num     = model_outputs.ncols_numeric;
         | 
| 1049 | 
            +
                size_t ncols_cat     = model_outputs.ncols_categ;
         | 
| 1050 | 
            +
                size_t ncols_ord     = model_outputs.ncols_ord;
         | 
| 1051 | 
            +
                size_t col_lim_num   = model_outputs.ncols_numeric - min_date.size() - min_ts.size();
         | 
| 1052 | 
            +
                size_t col_lim_date  = model_outputs.ncols_numeric - min_ts.size();
         | 
| 1053 | 
            +
                size_t ncols_cat_cat = cat_levels.size();
         | 
| 1054 | 
            +
                size_t tot_cols = ncols_num + ncols_cat + ncols_ord;
         | 
| 1055 | 
            +
                Rcpp::LogicalVector temp_bool;
         | 
| 1056 | 
            +
                Rcpp::LogicalVector bool_choice(2, false); bool_choice[1] = true;
         | 
| 1057 | 
            +
                Rcpp::List outp(tot_cols);
         | 
| 1058 | 
            +
                
         | 
| 1059 | 
            +
                for (size_t cl = 0; cl < tot_cols; cl++) {
         | 
| 1060 | 
            +
                    if (cl < col_lim_num) {
         | 
| 1061 | 
            +
                        /* numeric */
         | 
| 1062 | 
            +
                        outp[cl] = Rcpp::List::create(Rcpp::_["lb"] = Rcpp::wrap(model_outputs.min_outlier_any_cl[cl]),
         | 
| 1063 | 
            +
                                                      Rcpp::_["ub"] = Rcpp::wrap(model_outputs.max_outlier_any_cl[cl]));
         | 
| 1064 | 
            +
                    } else if (cl < col_lim_date) {
         | 
| 1065 | 
            +
                        /* date */
         | 
| 1066 | 
            +
                        outp[cl] = Rcpp::List::create(
         | 
| 1067 | 
            +
                            Rcpp::_["lb"] = Rcpp::Date(model_outputs.min_outlier_any_cl[cl] - 1 + min_date[cl - col_lim_num]),
         | 
| 1068 | 
            +
                            Rcpp::_["ub"] = Rcpp::Date(model_outputs.max_outlier_any_cl[cl] - 1 + min_date[cl - col_lim_num])
         | 
| 1069 | 
            +
                        );
         | 
| 1070 | 
            +
                    } else if (cl < ncols_num) {
         | 
| 1071 | 
            +
                        /* timestamp */
         | 
| 1072 | 
            +
                        outp[cl] = Rcpp::List::create(
         | 
| 1073 | 
            +
                            Rcpp::_["lb"] = Rcpp::Datetime(model_outputs.min_outlier_any_cl[cl] - 1 + min_ts[cl - col_lim_date]),
         | 
| 1074 | 
            +
                            Rcpp::_["ub"] = Rcpp::Datetime(model_outputs.max_outlier_any_cl[cl] - 1 + min_ts[cl - col_lim_date])
         | 
| 1075 | 
            +
                        );
         | 
| 1076 | 
            +
                    } else if (cl < (ncols_num + ncols_cat_cat)) {
         | 
| 1077 | 
            +
                        /* categorical */
         | 
| 1078 | 
            +
                        if (model_outputs.cat_outlier_any_cl[cl - ncols_num].size()) {
         | 
| 1079 | 
            +
                            temp_bool = Rcpp::wrap(model_outputs.cat_outlier_any_cl[cl - ncols_num]);
         | 
| 1080 | 
            +
                            outp[cl]  = cat_levels[cl - ncols_num][temp_bool];
         | 
| 1081 | 
            +
                        } else {
         | 
| 1082 | 
            +
                            outp[cl]  = Rcpp::StringVector();
         | 
| 1083 | 
            +
                        }
         | 
| 1084 | 
            +
                    } else if (cl < (ncols_num + ncols_cat)) {
         | 
| 1085 | 
            +
                        /* boolean */
         | 
| 1086 | 
            +
                        if (model_outputs.cat_outlier_any_cl[cl - ncols_num].size()) {
         | 
| 1087 | 
            +
                            temp_bool = Rcpp::wrap(model_outputs.cat_outlier_any_cl[cl - ncols_num]);
         | 
| 1088 | 
            +
                            outp[cl]  = bool_choice[temp_bool];
         | 
| 1089 | 
            +
                        } else {
         | 
| 1090 | 
            +
                            outp[cl]  = Rcpp::LogicalVector();
         | 
| 1091 | 
            +
                        }
         | 
| 1092 | 
            +
                    } else {
         | 
| 1093 | 
            +
                        /* ordinal */
         | 
| 1094 | 
            +
                        if (model_outputs.cat_outlier_any_cl[cl - ncols_num].size()) {
         | 
| 1095 | 
            +
                            temp_bool = Rcpp::wrap(model_outputs.cat_outlier_any_cl[cl - ncols_num]);
         | 
| 1096 | 
            +
                            outp[cl]  = ord_levels[cl - ncols_num - ncols_cat][temp_bool];
         | 
| 1097 | 
            +
                        } else {
         | 
| 1098 | 
            +
                            outp[cl]  = Rcpp::StringVector();
         | 
| 1099 | 
            +
                        }
         | 
| 1100 | 
            +
                    }
         | 
| 1101 | 
            +
                }
         | 
| 1102 | 
            +
                return outp;
         | 
| 1103 | 
            +
            }
         | 
| 1104 | 
            +
             | 
| 1105 | 
            +
             | 
| 1106 | 
            +
            /* external functions for fitting the model and predicting outliers */
         | 
| 1107 | 
            +
            // [[Rcpp::export]]
         | 
| 1108 | 
            +
            Rcpp::List fit_OutlierTree(Rcpp::NumericVector arr_num, size_t ncols_numeric,
         | 
| 1109 | 
            +
                                       Rcpp::IntegerVector arr_cat, size_t ncols_categ,   Rcpp::IntegerVector ncat,
         | 
| 1110 | 
            +
                                       Rcpp::IntegerVector arr_ord, size_t ncols_ord,     Rcpp::IntegerVector ncat_ord,
         | 
| 1111 | 
            +
                                       size_t nrows, Rcpp::LogicalVector cols_ignore_r, int nthreads,
         | 
| 1112 | 
            +
                                       bool categ_as_bin, bool ord_as_bin, bool cat_bruteforce_subset, bool categ_from_maj, bool take_mid,
         | 
| 1113 | 
            +
                                       size_t max_depth, double max_perc_outliers, size_t min_size_numeric, size_t min_size_categ,
         | 
| 1114 | 
            +
                                       double min_gain, bool follow_all, bool gain_as_pct, double z_norm, double z_outlier,
         | 
| 1115 | 
            +
                                       bool return_outliers,
         | 
| 1116 | 
            +
                                       Rcpp::ListOf<Rcpp::StringVector> cat_levels,
         | 
| 1117 | 
            +
                                       Rcpp::ListOf<Rcpp::StringVector> ord_levels,
         | 
| 1118 | 
            +
                                       Rcpp::StringVector colnames_num,
         | 
| 1119 | 
            +
                                       Rcpp::StringVector colnames_cat,
         | 
| 1120 | 
            +
                                       Rcpp::StringVector colnames_ord,
         | 
| 1121 | 
            +
                                       Rcpp::NumericVector min_date,
         | 
| 1122 | 
            +
                                       Rcpp::NumericVector min_ts)
         | 
| 1123 | 
            +
            {
         | 
| 1124 | 
            +
                bool found_outliers;
         | 
| 1125 | 
            +
                Rcpp::List outp;
         | 
| 1126 | 
            +
                size_t tot_cols = ncols_numeric + ncols_categ + ncols_ord;
         | 
| 1127 | 
            +
                std::vector<char> cols_ignore;
         | 
| 1128 | 
            +
                char *cols_ignore_ptr = NULL;
         | 
| 1129 | 
            +
                if (cols_ignore_r.size() > 0) {
         | 
| 1130 | 
            +
                    cols_ignore.resize(tot_cols, false);
         | 
| 1131 | 
            +
                    for (size_t cl = 0; cl < tot_cols; cl++) cols_ignore[cl] = (bool) cols_ignore_r[cl];
         | 
| 1132 | 
            +
                    cols_ignore_ptr = &cols_ignore[0];
         | 
| 1133 | 
            +
                }
         | 
| 1134 | 
            +
                std::vector<double> Xcpp;
         | 
| 1135 | 
            +
                double *arr_num_C = set_R_nan_as_C_nan(&arr_num[0], Xcpp, arr_num.size(), nthreads);
         | 
| 1136 | 
            +
             | 
| 1137 | 
            +
                std::unique_ptr<ModelOutputs> model_outputs = std::unique_ptr<ModelOutputs>(new ModelOutputs());
         | 
| 1138 | 
            +
                found_outliers = fit_outliers_models(*model_outputs,
         | 
| 1139 | 
            +
                                                     arr_num_C, ncols_numeric,
         | 
| 1140 | 
            +
                                                     &arr_cat[0], ncols_categ, &ncat[0],
         | 
| 1141 | 
            +
                                                     &arr_ord[0], ncols_ord,   &ncat_ord[0],
         | 
| 1142 | 
            +
                                                     nrows, cols_ignore_ptr, nthreads,
         | 
| 1143 | 
            +
                                                     categ_as_bin, ord_as_bin, cat_bruteforce_subset, categ_from_maj, take_mid,
         | 
| 1144 | 
            +
                                                     max_depth, max_perc_outliers, min_size_numeric, min_size_categ,
         | 
| 1145 | 
            +
                                                     min_gain, gain_as_pct, follow_all, z_norm, z_outlier);
         | 
| 1146 | 
            +
             | 
| 1147 | 
            +
                outp["bounds"] = extract_outl_bounds(*model_outputs,
         | 
| 1148 | 
            +
                                                     cat_levels,
         | 
| 1149 | 
            +
                                                     ord_levels,
         | 
| 1150 | 
            +
                                                     min_date,
         | 
| 1151 | 
            +
                                                     min_ts);
         | 
| 1152 | 
            +
             | 
| 1153 | 
            +
                outp["serialized_obj"] = serialize_OutlierTree(model_outputs.get());
         | 
| 1154 | 
            +
                if (return_outliers) {
         | 
| 1155 | 
            +
                    outp["outliers_info"] = describe_outliers(*model_outputs,
         | 
| 1156 | 
            +
                                                              arr_num_C,
         | 
| 1157 | 
            +
                                                              &arr_cat[0],
         | 
| 1158 | 
            +
                                                              &arr_ord[0],
         | 
| 1159 | 
            +
                                                              cat_levels,
         | 
| 1160 | 
            +
                                                              ord_levels,
         | 
| 1161 | 
            +
                                                              colnames_num,
         | 
| 1162 | 
            +
                                                              colnames_cat,
         | 
| 1163 | 
            +
                                                              colnames_ord,
         | 
| 1164 | 
            +
                                                              min_date,
         | 
| 1165 | 
            +
                                                              min_ts);
         | 
| 1166 | 
            +
                }
         | 
| 1167 | 
            +
                /* add number of trees and clusters */
         | 
| 1168 | 
            +
                size_t ntrees = 0, nclust = 0;
         | 
| 1169 | 
            +
                for (size_t col = 0; col < model_outputs->all_trees.size(); col++) {
         | 
| 1170 | 
            +
                    ntrees += model_outputs->all_trees[col].size();
         | 
| 1171 | 
            +
                    nclust += model_outputs->all_clusters[col].size();
         | 
| 1172 | 
            +
                }
         | 
| 1173 | 
            +
                outp["ntrees"] = Rcpp::wrap((int) ntrees);
         | 
| 1174 | 
            +
                outp["nclust"] = Rcpp::wrap((int) nclust);
         | 
| 1175 | 
            +
                outp["found_outliers"] = Rcpp::wrap(found_outliers);
         | 
| 1176 | 
            +
                
         | 
| 1177 | 
            +
                forget_row_outputs(*model_outputs);
         | 
| 1178 | 
            +
                outp["ptr_model"] = Rcpp::XPtr<ModelOutputs>(model_outputs.release(), true);
         | 
| 1179 | 
            +
                return outp;
         | 
| 1180 | 
            +
            }
         | 
| 1181 | 
            +
             | 
| 1182 | 
            +
            // [[Rcpp::export]]
         | 
| 1183 | 
            +
            Rcpp::List predict_OutlierTree(SEXP ptr_model, size_t nrows, int nthreads,
         | 
| 1184 | 
            +
                                           Rcpp::NumericVector arr_num, Rcpp::IntegerVector arr_cat, Rcpp::IntegerVector arr_ord,
         | 
| 1185 | 
            +
                                           Rcpp::ListOf<Rcpp::StringVector> cat_levels,
         | 
| 1186 | 
            +
                                           Rcpp::ListOf<Rcpp::StringVector> ord_levels,
         | 
| 1187 | 
            +
                                           Rcpp::StringVector colnames_num,
         | 
| 1188 | 
            +
                                           Rcpp::StringVector colnames_cat,
         | 
| 1189 | 
            +
                                           Rcpp::StringVector colnames_ord,
         | 
| 1190 | 
            +
                                           Rcpp::NumericVector min_date,
         | 
| 1191 | 
            +
                                           Rcpp::NumericVector min_ts)
         | 
| 1192 | 
            +
            {
         | 
| 1193 | 
            +
                std::vector<double> Xcpp;
         | 
| 1194 | 
            +
                double *arr_num_C = set_R_nan_as_C_nan(&arr_num[0], Xcpp, arr_num.size(), nthreads);
         | 
| 1195 | 
            +
             | 
| 1196 | 
            +
                ModelOutputs *model_outputs = static_cast<ModelOutputs*>(R_ExternalPtrAddr(ptr_model));
         | 
| 1197 | 
            +
                bool found_outliers = find_new_outliers(arr_num_C, &arr_cat[0], &arr_ord[0],
         | 
| 1198 | 
            +
                                                        nrows, nthreads, *model_outputs);
         | 
| 1199 | 
            +
                Rcpp::List outp = describe_outliers(*model_outputs,
         | 
| 1200 | 
            +
                                                    arr_num_C,
         | 
| 1201 | 
            +
                                                    &arr_cat[0],
         | 
| 1202 | 
            +
                                                    &arr_ord[0],
         | 
| 1203 | 
            +
                                                    cat_levels,
         | 
| 1204 | 
            +
                                                    ord_levels,
         | 
| 1205 | 
            +
                                                    colnames_num,
         | 
| 1206 | 
            +
                                                    colnames_cat,
         | 
| 1207 | 
            +
                                                    colnames_ord,
         | 
| 1208 | 
            +
                                                    min_date,
         | 
| 1209 | 
            +
                                                    min_ts);
         | 
| 1210 | 
            +
                outp["found_outliers"] = Rcpp::wrap(found_outliers);
         | 
| 1211 | 
            +
                forget_row_outputs(*model_outputs);
         | 
| 1212 | 
            +
                return outp;
         | 
| 1213 | 
            +
            }
         | 
| 1214 | 
            +
             | 
| 1215 | 
            +
            // [[Rcpp::export]]
         | 
| 1216 | 
            +
            Rcpp::LogicalVector check_few_values(Rcpp::NumericVector arr_num, size_t nrows, size_t ncols, int nthreads)
         | 
| 1217 | 
            +
            {
         | 
| 1218 | 
            +
                std::vector<char> too_few_vals(ncols, 0);
         | 
| 1219 | 
            +
                check_more_two_values(&arr_num[0], nrows, ncols, nthreads, too_few_vals.data());
         | 
| 1220 | 
            +
                Rcpp::LogicalVector outp(ncols);
         | 
| 1221 | 
            +
                for (size_t col = 0; col < ncols; col++) {
         | 
| 1222 | 
            +
                    outp[col] = (bool) too_few_vals[col];
         | 
| 1223 | 
            +
                }
         | 
| 1224 | 
            +
                return outp;
         | 
| 1225 | 
            +
            }
         |