neural-feature-importance 0.5.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,70 @@
1
+ # This workflow will upload a Python Package to PyPI when a release is created
2
+ # For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#publishing-to-package-registries
3
+
4
+ # This workflow uses actions that are not certified by GitHub.
5
+ # They are provided by a third-party and are governed by
6
+ # separate terms of service, privacy policy, and support
7
+ # documentation.
8
+
9
+ name: Upload Python Package
10
+
11
+ on:
12
+ release:
13
+ types: [published]
14
+
15
+ permissions:
16
+ contents: read
17
+
18
+ jobs:
19
+ release-build:
20
+ runs-on: ubuntu-latest
21
+
22
+ steps:
23
+ - uses: actions/checkout@v4
24
+
25
+ - uses: actions/setup-python@v5
26
+ with:
27
+ python-version: "3.x"
28
+
29
+ - name: Build release distributions
30
+ run: |
31
+ # NOTE: put your own distribution build steps here.
32
+ python -m pip install build
33
+ python -m build
34
+
35
+ - name: Upload distributions
36
+ uses: actions/upload-artifact@v4
37
+ with:
38
+ name: release-dists
39
+ path: dist/
40
+
41
+ pypi-publish:
42
+ runs-on: ubuntu-latest
43
+ needs:
44
+ - release-build
45
+ permissions:
46
+ # IMPORTANT: this permission is mandatory for trusted publishing
47
+ id-token: write
48
+
49
+ # Dedicated environments with protections for publishing are strongly recommended.
50
+ # For more information, see: https://docs.github.com/en/actions/deployment/targeting-different-environments/using-environments-for-deployment#deployment-protection-rules
51
+ environment:
52
+ name: pypi
53
+ # OPTIONAL: uncomment and update to include your PyPI project URL in the deployment status:
54
+ # url: https://pypi.org/p/YOURPROJECT
55
+ #
56
+ # ALTERNATIVE: if your GitHub Release name is the PyPI project version string
57
+ # ALTERNATIVE: exactly, uncomment the following line instead:
58
+ # url: https://pypi.org/project/YOURPROJECT/${{ github.event.release.name }}
59
+
60
+ steps:
61
+ - name: Retrieve release distributions
62
+ uses: actions/download-artifact@v4
63
+ with:
64
+ name: release-dists
65
+ path: dist/
66
+
67
+ - name: Publish release distributions to PyPI
68
+ uses: pypa/gh-action-pypi-publish@release/v1
69
+ with:
70
+ packages-dir: dist/
@@ -0,0 +1,24 @@
1
+ # Repository Guidelines
2
+
3
+ This repository contains a minimal Python package named `neural-feature-importance` that provides utilities to compute variance-based feature importances. It exposes `VarianceImportanceKeras` for `tf.keras` models, `VarianceImportanceTorch` for PyTorch, and a `MetricThreshold` helper for early stopping. The method is described in the article [CR de Sá, *Variance-based Feature Importance in Neural Networks*](https://doi.org/10.1007/978-3-030-33778-0_24).
4
+
5
+ ## Style
6
+ - Follow PEP8 and PEP257 standards.
7
+ - Use `tf.keras` APIs instead of the standalone `keras` package.
8
+ - Use Python `logging` for all user-facing messages. Do not add print statements or verbose flags.
9
+ - Provide type hints where practical and keep docstrings concise and descriptive.
10
+
11
+ ## Package layout
12
+ - The public API lives in `neural_feature_importance/callbacks.py` and is re-exported from `neural_feature_importance/__init__.py`.
13
+ - `VarianceImportanceKeras` exposes the attribute `var_scores` and the property `feature_importances_`. The module also provides `VarianceImportanceBase` and `VarianceImportanceTorch`; the `MetricThreshold` helper lives in `neural_feature_importance/utils` (see the sketch below).
14
+
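+ A minimal usage sketch of this API (assuming an already compiled `tf.keras` model `model` and training arrays `X` and `y`):
+ 
+ ```python
+ from neural_feature_importance import VarianceImportanceKeras
+ 
+ viann = VarianceImportanceKeras()
+ model.fit(X, y, epochs=10, callbacks=[viann])
+ importances = viann.feature_importances_  # same normalized array as viann.var_scores
+ ```
+ 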
15
+ ## Testing
16
+ Run these commands after making changes:
17
+
18
+ ```bash
19
+ python -m py_compile neural_feature_importance/callbacks.py
20
+ python -m py_compile "variance-based feature importance in artificial neural networks.ipynb" 2>&1 | head
21
+ # Optionally check notebook conversion if `jupyter` is available
22
+ jupyter nbconvert --to script "variance-based feature importance in artificial neural networks.ipynb" --stdout | head
23
+ ```
24
+
@@ -0,0 +1,10 @@
1
+ Metadata-Version: 2.4
2
+ Name: neural-feature-importance
3
+ Version: 0.5.0
4
+ Summary: Variance-based feature importance for Neural Networks using callbacks for Keras and PyTorch
5
+ Author: CR de Sá
6
+ Requires-Dist: numpy
7
+ Provides-Extra: tensorflow
8
+ Requires-Dist: tensorflow; extra == "tensorflow"
9
+ Provides-Extra: torch
10
+ Requires-Dist: torch; extra == "torch"
@@ -0,0 +1,95 @@
1
+ # Variance-based Feature Importance in Neural Networks / Deep Learning
2
+
3
+ This repository provides a working example of how to measure the importance of features (inputs) in neural networks.
4
+
5
+ The method measures the relative importance of features in Artificial Neural Network (ANN) models. Its underlying principle is that the more important a feature is, the more the weights connected to its input neuron change during training. To capture this behavior, a running variance of every weight connected to the input layer is tracked during training, using an adaptation of Welford's online algorithm for computing the variance.
6
+
7
+ When training finishes, the variances of the weights connected to each input are combined with the final weights to obtain a relative importance score for that feature.
8
+
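+ As a minimal sketch of the idea (not the packaged implementation), assume `snapshots` is a list of input-layer weight matrices captured after each epoch, with one row per input feature:
+ 
+ ```python
+ import numpy as np
+ 
+ def variance_importance(snapshots):
+     """Welford-style running variance of the weights, combined with the final weights."""
+     mean = snapshots[0].astype(float)
+     m2 = np.zeros_like(mean)
+     n = 0
+     for w in snapshots[1:]:  # Welford's online update, one step per epoch
+         n += 1
+         delta = w - mean
+         mean += delta / n
+         m2 += delta * (w - mean)
+     variance = m2 / (n - 1)  # sample variance (assumes at least three snapshots)
+     # importance of a feature: variance of its weights weighted by their final magnitude
+     scores = np.sum(variance * np.abs(snapshots[-1]), axis=1)
+     return (scores - scores.min()) / (scores.max() - scores.min())
+ ```
+ 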
9
+ The file **variance-based feature importance in artificial neural networks.ipynb** includes the code to fully replicate the results obtained in the paper:
10
+
11
+ CR de Sá [**Variance-based Feature Importance in Neural Networks**](https://doi.org/10.1007/978-3-030-33778-0_24)
12
+ 22nd International Conference on Discovery Science (DS 2019), Split, Croatia, October 28-30, 2019
13
+
14
+
15
+ ## VIANN
16
+ #### Variance-based Feature Importance of Artificial Neural Networks
17
+
18
+ This repository exposes the feature importance callback as a small Python package named `neural-feature-importance`.
19
+ The callback automatically tracks the first layer that contains trainable weights, so it also works with models that start with an `InputLayer` or other preprocessing layers.
20
+ There is also a helper for PyTorch models that follows the same API.
21
+
22
+ Install with pip and select the extras that match your framework:
23
+
24
+ ```bash
25
+ pip install "neural-feature-importance[tensorflow]" # for Keras
26
+ pip install "neural-feature-importance[torch]" # for PyTorch
27
+ ```
28
+
29
+ The package uses `setuptools_scm` to derive its version from Git tags. Access it
30
+ via:
31
+
32
+ ```python
33
+ from neural_feature_importance import __version__
34
+
35
+ print(__version__)
36
+ ```
37
+
38
+ ```python
39
+ from neural_feature_importance import VarianceImportanceKeras, MetricThreshold
40
+
41
+ import logging
42
+
43
+ logging.basicConfig(level=logging.INFO)
44
+
45
+ VIANN = VarianceImportanceKeras()
46
+ monitor = MetricThreshold(monitor="val_accuracy", threshold=0.95)
47
+ ```
48
+
49
+ For a PyTorch model, use ``VarianceImportanceTorch`` and call its
50
+ ``on_train_begin``, ``on_epoch_end`` and ``on_train_end`` methods inside your
51
+ training loop:
52
+
53
+ ```python
54
+ from neural_feature_importance import VarianceImportanceTorch
55
+
56
+ tracker = VarianceImportanceTorch(model)
57
+ tracker.on_train_begin()
58
+ for epoch in range(num_epochs):
59
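+     # train_one_epoch, data_loader and num_epochs are placeholders for your own training code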
+ train_one_epoch(model, optimizer, data_loader)
60
+ tracker.on_epoch_end()
61
+ tracker.on_train_end()
62
+ print(tracker.var_scores)
63
+ ```
64
+
65
+ Use this callback during model training:
66
+
67
+ ```python
68
+ from tensorflow.keras.models import Sequential
+ from tensorflow.keras.layers import Dense
+ from tensorflow.keras.regularizers import l2
+ 
+ model = Sequential()
69
+ model.add(Dense(50, input_dim=input_dim, activation='relu', kernel_initializer='normal', kernel_regularizer=l2(0.01)))
70
+ model.add(Dense(100, activation='relu', kernel_initializer='normal', kernel_regularizer=l2(0.01)))
71
+ model.add(Dense(50, activation='relu', kernel_initializer='normal', kernel_regularizer=l2(0.01)))
72
+ model.add(Dense(5, activation='softmax', kernel_initializer='normal'))
73
+
74
+ model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
75
+ model.fit(X, Y, validation_split=0.05, epochs=30, batch_size=64, shuffle=True,
76
+ verbose=1, callbacks=[VIANN, monitor])
77
+
78
+ print(VIANN.var_scores)
79
+ ```
80
+
81
+ ## Comparing with Random Forest
82
+
83
+ To verify the variance-based scores, run `compare_feature_importance.py`. The
84
+ script trains a small neural network on the Iris dataset and compares the scores
85
+ with those from a `RandomForestClassifier`.
86
+
87
+ ```bash
88
+ python compare_feature_importance.py
89
+ ```
90
+
91
+ For a larger experiment across several datasets, run `full_experiment.py`. The script builds a simple network for each dataset, applies the `AccuracyMonitor` for early stopping, and prints the correlation between neural network importances and a random forest baseline.
92
+
93
+ ```bash
94
+ python full_experiment.py
95
+ ```
@@ -0,0 +1,112 @@
1
+ """Compare feature importance methods using the Iris dataset."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import logging
6
+ import numpy as np
7
+ from sklearn.datasets import load_iris
8
+ from sklearn.ensemble import RandomForestClassifier
9
+ from sklearn.model_selection import train_test_split
10
+ from tensorflow.keras.models import Sequential
11
+ from tensorflow.keras.layers import Dense
12
+ from tensorflow.keras.utils import to_categorical
13
+ import torch
14
+ from torch import nn
15
+ from torch.utils.data import DataLoader, TensorDataset
16
+
17
+ from neural_feature_importance import (
18
+ VarianceImportanceKeras,
19
+ VarianceImportanceTorch,
20
+ )
21
+
22
+ logging.basicConfig(level=logging.INFO)
23
+ logger = logging.getLogger(__name__)
24
+
25
+
26
+ def build_model(input_dim: int, num_classes: int) -> Sequential:
27
+ """Return a simple feed-forward neural network."""
28
+ model = Sequential(
29
+ [
30
+ Dense(16, input_dim=input_dim, activation="relu"),
31
+ Dense(num_classes, activation="softmax"),
32
+ ]
33
+ )
34
+ model.compile(
35
+ optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"]
36
+ )
37
+ return model
38
+
39
+
40
+ def build_torch_model(input_dim: int, num_classes: int) -> nn.Module:
41
+ """Return a simple feed-forward PyTorch network."""
42
+ return nn.Sequential(
43
+ nn.Linear(input_dim, 16),
44
+ nn.ReLU(),
45
+ nn.Linear(16, num_classes),
46
+ )
47
+
48
+
49
+ def main() -> None:
50
+ """Train models and compare feature importances."""
51
+ data = load_iris()
52
+ X_train, X_test, y_train, y_test = train_test_split(
53
+ data.data, data.target, test_size=0.2, random_state=42
54
+ )
55
+
56
+ num_classes = len(np.unique(data.target))
57
+ y_train_cat = to_categorical(y_train, num_classes)
58
+
59
+ model = build_model(data.data.shape[1], num_classes)
60
+ callback = VarianceImportanceKeras()
61
+ model.fit(
62
+ X_train,
63
+ y_train_cat,
64
+ epochs=50,
65
+ batch_size=16,
66
+ verbose=0,
67
+ callbacks=[callback],
68
+ )
69
+
70
+ nn_scores = callback.var_scores
71
+ logger.info("NN feature importances: %s", nn_scores)
72
+
73
+ # Train a PyTorch model using the same data
74
+ torch_model = build_torch_model(data.data.shape[1], num_classes)
75
+ optimizer = torch.optim.Adam(torch_model.parameters())
76
+ loss_fn = nn.CrossEntropyLoss()
77
+ dataset = TensorDataset(
78
+ torch.tensor(X_train, dtype=torch.float32),
79
+ torch.tensor(y_train, dtype=torch.long),
80
+ )
81
+ loader = DataLoader(dataset, batch_size=16, shuffle=True)
82
+ tracker = VarianceImportanceTorch(torch_model)
83
+ tracker.on_train_begin()
84
+ for _ in range(50):
85
+ torch_model.train()
86
+ for xb, yb in loader:
87
+ optimizer.zero_grad()
88
+ out = torch_model(xb)
89
+ loss = loss_fn(out, yb)
90
+ loss.backward()
91
+ optimizer.step()
92
+ tracker.on_epoch_end()
93
+ tracker.on_train_end()
94
+ torch_scores = tracker.var_scores
95
+ logger.info("PyTorch feature importances: %s", torch_scores)
96
+
97
+ rf = RandomForestClassifier(random_state=42)
98
+ rf.fit(X_train, y_train)
99
+ rf_scores = rf.feature_importances_
100
+ logger.info("Random forest importances: %s", rf_scores)
101
+
102
+ if nn_scores is not None:
103
+ corr = np.corrcoef(nn_scores, rf_scores)[0, 1]
104
+ logger.info("Correlation between Keras and RF: %.3f", corr)
105
+
106
+ if torch_scores is not None:
107
+ corr_torch = np.corrcoef(torch_scores, rf_scores)[0, 1]
108
+ logger.info("Correlation between PyTorch and RF: %.3f", corr_torch)
109
+
110
+
111
+ if __name__ == "__main__":
112
+ main()
@@ -0,0 +1,93 @@
1
+ """Replicate the notebook experiment using the reusable callbacks."""
2
+ from __future__ import annotations
3
+
4
+ import logging
5
+ from typing import List
6
+
7
+ import numpy as np
8
+ from sklearn import datasets
9
+ from sklearn.preprocessing import LabelEncoder, scale
10
+ from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
11
+ from tensorflow.keras.models import Sequential
12
+ from tensorflow.keras.layers import Dense
13
+ from tensorflow.keras.utils import to_categorical
14
+ from tensorflow.keras.regularizers import l2
15
+
16
+ from neural_feature_importance import VarianceImportanceKeras
17
+ from neural_feature_importance.utils import MetricThreshold
18
+
19
+ logging.basicConfig(level=logging.INFO)
20
+ logger = logging.getLogger(__name__)
21
+
22
+
23
+ def nn2(input_dim: int, output_dim: int, classification: bool) -> Sequential:
24
+ """Return a small feed-forward network."""
25
+ model = Sequential(
26
+ [
27
+ Dense(50, input_dim=input_dim, activation="relu", kernel_regularizer=l2(0.01)),
28
+ Dense(100, activation="relu", kernel_regularizer=l2(0.01)),
29
+ Dense(50, activation="relu", kernel_regularizer=l2(0.01)),
30
+ ]
31
+ )
32
+ if classification:
33
+ model.add(Dense(output_dim, activation="softmax"))
34
+ model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])
35
+ else:
36
+ model.add(Dense(1))
37
+ model.compile(optimizer="sgd", loss="mean_squared_error")
38
+ return model
39
+
40
+
41
+ def run_experiment() -> None:
42
+ """Train on several datasets and compare with random forest importances."""
43
+ datasets_cfg: List[dict] = [
44
+ {"name": "breastcancer", "classification": True, "data": datasets.load_breast_cancer()},
45
+ {"name": "digits", "classification": True, "data": datasets.load_digits()},
46
+ {"name": "iris", "classification": True, "data": datasets.load_iris()},
47
+ {"name": "wine", "classification": True, "data": datasets.load_wine()},
48
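+         # NOTE: datasets.load_boston was removed in scikit-learn 1.2; this entry assumes an older scikit-learn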
+ {"name": "boston", "classification": False, "data": datasets.load_boston()},
49
+ {"name": "diabetes", "classification": False, "data": datasets.load_diabetes()},
50
+ ]
51
+
52
+ for cfg in datasets_cfg:
53
+ ds = cfg["data"]
54
+ X = scale(ds.data)
55
+ if cfg["classification"]:
56
+ enc = LabelEncoder()
57
+ y_enc = enc.fit_transform(ds.target)
58
+ Y = to_categorical(y_enc)
59
+ output_size = Y.shape[1]
60
+ rf = RandomForestClassifier(n_estimators=100)
61
+ rf.fit(X, ds.target)
62
+ monitor = MetricThreshold(monitor="val_accuracy", threshold=0.95)
63
+ else:
64
+ Y = scale(ds.target)
65
+ output_size = 1
66
+ rf = RandomForestRegressor(n_estimators=100)
67
+ rf.fit(X, ds.target)
68
+ monitor = None
69
+
70
+ model = nn2(X.shape[1], output_size, cfg["classification"])
71
+ viann = VarianceImportanceKeras()
72
+ callbacks = [viann]
73
+ if monitor:
74
+ callbacks.append(monitor)
75
+
76
+ model.fit(
77
+ X,
78
+ Y,
79
+ validation_split=0.05,
80
+ epochs=100,
81
+ batch_size=max(1, int(round(X.shape[0] / 7))),
82
+ verbose=0,
83
+ callbacks=callbacks,
84
+ )
85
+
86
+ nn_scores = viann.var_scores
87
+ rf_scores = rf.feature_importances_
88
+ corr = np.corrcoef(nn_scores, rf_scores)[0, 1]
89
+ logger.info("%s correlation with RF: %.2f", cfg["name"], corr)
90
+
91
+
92
+ if __name__ == "__main__":
93
+ run_experiment()
@@ -0,0 +1,22 @@
1
+ """Utilities for variance-based feature importance in neural networks."""
2
+
3
+ from importlib import metadata
4
+
5
+ from .callbacks import (
6
+ VarianceImportanceBase,
7
+ VarianceImportanceKeras,
8
+ VarianceImportanceTorch,
9
+ )
10
+ from .utils import MetricThreshold
11
+
12
+ try:
13
+ __version__ = metadata.version("neural-feature-importance")
14
+ except metadata.PackageNotFoundError: # pragma: no cover - package not installed
15
+ __version__ = "0.0.dev0"
16
+
17
+ __all__ = [
18
+ "VarianceImportanceBase",
19
+ "VarianceImportanceKeras",
20
+ "VarianceImportanceTorch",
21
+ "MetricThreshold",
22
+ ]
@@ -0,0 +1,142 @@
1
+ """Variance-based feature importance utilities."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import logging
6
+ from typing import Optional
7
+
8
+ import numpy as np
9
+ from tensorflow.keras.callbacks import Callback
10
+ from tensorflow.keras.layers import Layer
11
+
12
+ logger = logging.getLogger(__name__)
13
+
14
+
15
+ class VarianceImportanceBase:
16
+ """Compute feature importance using Welford's algorithm."""
17
+
18
+ def __init__(self) -> None:
19
+ self._n = 0
20
+ self._mean: np.ndarray | None = None
21
+ self._m2: np.ndarray | None = None
22
+ self._last_weights: np.ndarray | None = None
23
+ self.var_scores: np.ndarray | None = None
24
+
25
+ def start(self, weights: np.ndarray) -> None:
26
+ """Initialize statistics for the given weight matrix."""
27
+ self._mean = weights.astype(np.float64)
28
+ self._m2 = np.zeros_like(self._mean)
29
+ self._n = 0
30
+
31
+ def update(self, weights: np.ndarray) -> None:
32
+ """Update running statistics with new weights."""
33
+ if self._mean is None or self._m2 is None:
34
+ return
35
+ self._n += 1
36
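+         # Welford's online update: fold the new weights into the running mean and M2 accumulator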
+ delta = weights - self._mean
37
+ self._mean += delta / self._n
38
+ delta2 = weights - self._mean
39
+ self._m2 += delta * delta2
40
+ self._last_weights = weights
41
+
42
+ def finalize(self) -> None:
43
+ """Finalize statistics and compute normalized scores."""
44
+ if self._last_weights is None or self._m2 is None:
45
+ logger.warning(
46
+ "%s was not fully initialized; no scores computed", self.__class__.__name__
47
+ )
48
+ return
49
+
50
+ if self._n < 2:
51
+ variance = np.full_like(self._m2, np.nan)
52
+ else:
53
+ variance = self._m2 / (self._n - 1)
54
+
55
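+         # One score per input feature: each incoming weight's running variance weighted by its final magnitude, summed over the layer's units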
+ scores = np.sum(variance * np.abs(self._last_weights), axis=1)
56
+ min_val = float(np.min(scores))
57
+ max_val = float(np.max(scores))
58
+ denom = max_val - min_val if max_val != min_val else 1.0
59
+ self.var_scores = (scores - min_val) / denom
60
+
61
+ top = np.argsort(self.var_scores)[-10:][::-1]
62
+ logger.info("Most important variables: %s", top)
63
+
64
+ @property
65
+ def feature_importances_(self) -> np.ndarray | None:
66
+ """Normalized importance scores for each input feature."""
67
+ return self.var_scores
68
+
69
+
70
+ class VarianceImportanceKeras(Callback, VarianceImportanceBase):
71
+ """Keras callback implementing variance-based feature importance."""
72
+
73
+ def __init__(self) -> None:
74
+ Callback.__init__(self)
75
+ VarianceImportanceBase.__init__(self)
76
+ self._layer: Optional[Layer] = None
77
+
78
+ def on_train_begin(self, logs: Optional[dict] = None) -> None: # type: ignore[override]
79
+ self._layer = None
80
+ for layer in self.model.layers:
81
+ if layer.get_weights():
82
+ self._layer = layer
83
+ break
84
+ if self._layer is None:
85
+ raise ValueError("Model does not contain trainable weights.")
86
+ weights = self._layer.get_weights()[0]
87
+ logger.info(
88
+ "Tracking variance for layer '%s' with %d features",
89
+ self._layer.name,
90
+ weights.shape[0],
91
+ )
92
+ self.start(weights)
93
+
94
+ def on_epoch_end(self, epoch: int, logs: Optional[dict] = None) -> None: # type: ignore[override]
95
+ if self._layer is None:
96
+ return
97
+ weights = self._layer.get_weights()[0]
98
+ self.update(weights)
99
+
100
+ def on_train_end(self, logs: Optional[dict] = None) -> None: # type: ignore[override]
101
+ self.finalize()
102
+
103
+ def get_config(self) -> dict[str, int]:
104
+ """Return configuration for serialization."""
105
+ return {}
106
+
107
+
108
+ class VarianceImportanceTorch(VarianceImportanceBase):
109
+ """Track variance-based feature importance for PyTorch models."""
110
+
111
+ def __init__(self, model: "nn.Module") -> None:
112
+ from torch import nn # Local import to avoid hard dependency
113
+
114
+ super().__init__()
115
+ self.model = model
116
+ self._param: nn.Parameter | None = None
117
+
118
+ def on_train_begin(self) -> None:
119
+ from torch import nn
120
+
121
+ for name, param in self.model.named_parameters():
122
+ if param.requires_grad and param.dim() >= 2:
123
+ self._param = param
124
+ weights = param.detach().cpu().numpy().T  # transpose so rows correspond to input features, as in the Keras kernel layout
125
+ logger.info(
126
+ "Tracking variance for parameter '%s' with %d features",
127
+ name,
128
+ weights.shape[0],
129
+ )
130
+ self.start(weights)
131
+ break
132
+ if self._param is None:
133
+ raise ValueError("Model does not contain trainable parameters")
134
+
135
+ def on_epoch_end(self) -> None:
136
+ if self._param is None:
137
+ return
138
+ weights = self._param.detach().cpu().numpy().T
139
+ self.update(weights)
140
+
141
+ def on_train_end(self) -> None:
142
+ self.finalize()
@@ -0,0 +1,5 @@
1
+ """Utility callbacks for model training."""
2
+
3
+ from .monitors import MetricThreshold
4
+
5
+ __all__ = ["MetricThreshold"]
@@ -0,0 +1,54 @@
1
+ """Training utilities for keras models."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import logging
6
+ from typing import Optional
7
+
8
+ from tensorflow.keras.callbacks import Callback
9
+
10
+ logger = logging.getLogger(__name__)
11
+
12
+
13
+ class MetricThreshold(Callback):
14
+ """Stop training when a metric exceeds a given threshold.
15
+
16
+ Parameters
17
+ ----------
18
+ monitor:
19
+ Name of the metric to monitor (e.g. ``"val_accuracy"`` or ``"loss"``).
20
+ threshold:
21
+ Value that the metric must reach to trigger early stopping.
22
+ min_epochs:
23
+ Minimum number of epochs before stopping is allowed.
24
+ """
25
+
26
+ def __init__(self, monitor: str = "val_accuracy", threshold: float | None = None, min_epochs: int = 5) -> None:
27
+ super().__init__()
28
+ self.monitor = monitor
29
+ self.threshold = threshold
30
+ self.min_epochs = min_epochs
31
+ self.stopped_epoch = 0
32
+
33
+ def on_epoch_end(self, epoch: int, logs: Optional[dict] = None) -> None: # type: ignore[override]
34
+ logs = logs or {}
35
+ metric = logs.get(self.monitor)
36
+ if (
37
+ metric is not None
38
+ and self.threshold is not None
39
+ and epoch + 1 >= self.min_epochs
40
+ and metric >= self.threshold
41
+ ):
42
+ self.stopped_epoch = epoch + 1
43
+ self.model.stop_training = True
44
+ logger.info(
45
+ "MetricThreshold: stopped at epoch %d with %s=%.4f (threshold %.4f)",
46
+ self.stopped_epoch,
47
+ self.monitor,
48
+ metric,
49
+ self.threshold,
50
+ )
51
+
52
+ def on_train_end(self, logs: Optional[dict] = None) -> None: # type: ignore[override]
53
+ if self.stopped_epoch:
54
+ logger.info("Training stopped at epoch %d", self.stopped_epoch)
@@ -0,0 +1,10 @@
1
+ Metadata-Version: 2.4
2
+ Name: neural-feature-importance
3
+ Version: 0.5.0
4
+ Summary: Variance-based feature importance for Neural Networks using callbacks for Keras and PyTorch
5
+ Author: CR de Sá
6
+ Requires-Dist: numpy
7
+ Provides-Extra: tensorflow
8
+ Requires-Dist: tensorflow; extra == "tensorflow"
9
+ Provides-Extra: torch
10
+ Requires-Dist: torch; extra == "torch"
@@ -0,0 +1,16 @@
1
+ AGENTS.md
2
+ README.md
3
+ compare_feature_importance.py
4
+ full_experiment.py
5
+ pyproject.toml
6
+ variance-based feature importance in artificial neural networks.ipynb
7
+ .github/workflows/python-publish.yml
8
+ neural_feature_importance/__init__.py
9
+ neural_feature_importance/callbacks.py
10
+ neural_feature_importance.egg-info/PKG-INFO
11
+ neural_feature_importance.egg-info/SOURCES.txt
12
+ neural_feature_importance.egg-info/dependency_links.txt
13
+ neural_feature_importance.egg-info/requires.txt
14
+ neural_feature_importance.egg-info/top_level.txt
15
+ neural_feature_importance/utils/__init__.py
16
+ neural_feature_importance/utils/monitors.py
@@ -0,0 +1,7 @@
1
+ numpy
2
+
3
+ [tensorflow]
4
+ tensorflow
5
+
6
+ [torch]
7
+ torch
@@ -0,0 +1 @@
1
+ neural_feature_importance
@@ -0,0 +1,15 @@
1
+ [build-system]
2
+ requires = ["setuptools", "wheel", "setuptools_scm"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "neural-feature-importance"
7
+ description = "Variance-based feature importance for Neural Networks using callbacks for Keras and PyTorch"
8
+ authors = [{name = "CR de Sá"}]
9
+ dependencies = ["numpy"]
10
+ dynamic = ["version"]
11
+
12
+ [project.optional-dependencies]
13
+ tensorflow = ["tensorflow"]
14
+ torch = ["torch"]
15
+ [tool.setuptools_scm]
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1,520 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": null,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "import os\n",
10
+ "os.environ[\"CUDA_DEVICE_ORDER\"]=\"PCI_BUS_ID\"\n",
11
+ "os.environ[\"CUDA_VISIBLE_DEVICES\"]=\"0\"\n",
12
+ "\n",
13
+ "import tensorflow\n",
14
+ "import pandas as pd\n",
15
+ "import numpy as np\n",
16
+ "\n",
17
+ "from sklearn.model_selection import train_test_split\n",
18
+ "from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor\n",
19
+ "from tensorflow.keras.models import Sequential\n",
20
+ "from tensorflow.keras.layers import Dense, Dropout\n",
21
+ "from tensorflow.keras.utils import np_utils, to_categorical\n",
22
+ "from tensorflow.keras import optimizers\n",
23
+ "from tensorflow.keras.layers.advanced_activations import PReLU\n",
24
+ "from tensorflow.keras.layers.normalization import BatchNormalization\n",
25
+ "from tensorflow.keras.regularizers import l2\n",
26
+ "from sklearn import datasets\n",
27
+ "from sklearn import metrics\n",
28
+ "from sklearn.preprocessing import LabelEncoder, scale\n",
29
+ "from tensorflow.keras.utils import np_utils\n",
30
+ "import tensorflow.keras as keras"
31
+ ]
32
+ },
33
+ {
34
+ "cell_type": "markdown",
35
+ "metadata": {},
36
+ "source": [
37
+ "# Models"
38
+ ]
39
+ },
40
+ {
41
+ "cell_type": "code",
42
+ "execution_count": null,
43
+ "metadata": {},
44
+ "outputs": [],
45
+ "source": [
46
+ "def NN1(input_dim, output_dim, isClassification = True):\n",
47
+ " print(\"Starting NN1\")\n",
48
+ " \n",
49
+ " model = Sequential()\n",
50
+ " model.add(Dense(50, input_dim=input_dim, activation='linear', kernel_initializer='normal', kernel_regularizer=l2(0.01)))\n",
51
+ " model.add(Dense(100, activation='linear', kernel_initializer='normal', kernel_regularizer=l2(0.01)))\n",
52
+ " model.add(Dense(50, activation='linear', kernel_initializer='normal', kernel_regularizer=l2(0.01)))\n",
53
+ "\n",
54
+ " if (isClassification == False):\n",
55
+ " model.add(Dense(1, kernel_initializer='normal'))\n",
56
+ " model.compile(loss='mean_squared_error', optimizer='sgd')\n",
57
+ " elif (isClassification == True):\n",
58
+ " model.add(Dense(output_dim, activation='softmax', kernel_initializer='normal'))\n",
59
+ " model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])\n",
60
+ " \n",
61
+ " return model"
62
+ ]
63
+ },
64
+ {
65
+ "cell_type": "code",
66
+ "execution_count": null,
67
+ "metadata": {},
68
+ "outputs": [],
69
+ "source": [
70
+ "def NN2(input_dim, output_dim, isClassification = True):\n",
71
+ " print(\"Starting NN2\")\n",
72
+ " \n",
73
+ " model = Sequential()\n",
74
+ " model.add(Dense(50, input_dim=input_dim, activation='relu', kernel_initializer='normal', kernel_regularizer=l2(0.01)))\n",
75
+ " model.add(Dense(100, activation='relu', kernel_initializer='normal', kernel_regularizer=l2(0.01)))\n",
76
+ " model.add(Dense(50, activation='relu', kernel_initializer='normal', kernel_regularizer=l2(0.01)))\n",
77
+ " \n",
78
+ " if (isClassification == False):\n",
79
+ " model.add(Dense(1, kernel_initializer='normal'))\n",
80
+ " model.compile(loss='mean_squared_error', optimizer='sgd')\n",
81
+ " elif (isClassification == True):\n",
82
+ " model.add(Dense(output_dim, activation='softmax', kernel_initializer='normal'))\n",
83
+ " model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])\n",
84
+ " \n",
85
+ " return model"
86
+ ]
87
+ },
88
+ {
89
+ "cell_type": "code",
90
+ "execution_count": null,
91
+ "metadata": {},
92
+ "outputs": [],
93
+ "source": [
94
+ "# Deep Model\n",
95
+ "def DeepNN(input_dim, output_dim, isClassification = True):\n",
96
+ " print(\"Starting DeepNN\")\n",
97
+ " \n",
98
+ " model = Sequential()\n",
99
+ " model.add(Dense(500, input_dim=input_dim, activation='relu', kernel_initializer='normal'))\n",
100
+ " model.add(BatchNormalization())\n",
101
+ " model.add(Dropout(0.5))\n",
102
+ " model.add(Dense(1024, kernel_initializer='normal'))\n",
103
+ " model.add(BatchNormalization())\n",
104
+ " model.add(Dropout(0.5))\n",
105
+ " model.add(Dense(2048, kernel_initializer='normal', kernel_regularizer=l2(0.1)))\n",
106
+ " model.add(BatchNormalization())\n",
107
+ " model.add(Dropout(0.5))\n",
108
+ " model.add(Dense(4096, kernel_initializer='random_uniform', kernel_regularizer=l2(0.1)))\n",
109
+ " model.add(BatchNormalization())\n",
110
+ " model.add(Dropout(0.5))\n",
111
+ " model.add(Dense(2048, kernel_initializer='random_uniform', kernel_regularizer=l2(0.1)))\n",
112
+ " model.add(BatchNormalization())\n",
113
+ " model.add(Dropout(0.5))\n",
114
+ " model.add(Dense(1024, kernel_initializer='normal', kernel_regularizer=l2(0.1)))\n",
115
+ " model.add(BatchNormalization())\n",
116
+ " model.add(Dropout(0.5))\n",
117
+ " model.add(Dense(500, kernel_initializer='normal'))\n",
118
+ " model.add(BatchNormalization())\n",
119
+ " model.add(Dropout(0.2))\n",
120
+ " model.add(PReLU())\n",
121
+ "\n",
122
+ " if (isClassification == False):\n",
123
+ " model.add(Dense(1, kernel_initializer='normal'))\n",
124
+ " model.compile(loss='mean_squared_error', optimizer='adam')\n",
125
+ " elif (isClassification == True):\n",
126
+ " model.add(Dense(output_dim, activation='softmax', kernel_initializer='normal'))\n",
127
+ " model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])\n",
128
+ " \n",
129
+ " return model"
130
+ ]
131
+ },
132
+ {
133
+ "cell_type": "markdown",
134
+ "metadata": {},
135
+ "source": [
136
+ "# Variance Importance methods"
137
+ ]
138
+ },
139
+ {
140
+ "cell_type": "code",
141
+ "execution_count": null,
142
+ "metadata": {},
143
+ "outputs": [],
144
+ "source": [
145
+ "from neural_importance import VarianceImportanceCallback\n"
146
+ ]
147
+ },
148
+ {
149
+ "cell_type": "code",
150
+ "execution_count": null,
151
+ "metadata": {},
152
+ "outputs": [],
153
+ "source": [
154
+ "# Taken from https://csiu.github.io/blog/update/2017/03/29/day33.html\n",
155
+ "def garson(A, B):\n",
156
+ " \"\"\"\n",
157
+ " Computes Garson's algorithm\n",
158
+ " A = matrix of weights of input-hidden layer (rows=input & cols=hidden)\n",
159
+ " B = vector of weights of hidden-output layer\n",
160
+ " \"\"\"\n",
161
+ " B = np.diag(B)\n",
162
+ "\n",
163
+ " # connection weight through the different hidden node\n",
164
+ " cw = np.dot(A, B)\n",
165
+ "\n",
166
+ " # weight through node (axis=0 is column; sum per input feature)\n",
167
+ " cw_h = abs(cw).sum(axis=0)\n",
168
+ "\n",
169
+ " # relative contribution of input neuron to outgoing signal of each hidden neuron\n",
170
+ " # sum to find relative contribution of input neuron\n",
171
+ " rc = np.divide(abs(cw), abs(cw_h))\n",
172
+ " rc = rc.sum(axis=1)\n",
173
+ "\n",
174
+ " # normalize to 100% for relative importance\n",
175
+ " ri = rc / rc.sum()\n",
176
+ " return(ri)"
177
+ ]
178
+ },
179
+ {
180
+ "cell_type": "code",
181
+ "execution_count": null,
182
+ "metadata": {},
183
+ "outputs": [],
184
+ "source": [
185
+ "# Adapted from https://csiu.github.io/blog/update/2017/03/29/day33.html\n",
186
+ "class VarImpGarson(tensorflow.keras.callbacks.Callback):\n",
187
+ " def __init__(self, verbose=0):\n",
188
+ " self.verbose = verbose\n",
189
+ " \n",
190
+ " def on_train_end(self, batch, logs={}):\n",
191
+ " if self.verbose:\n",
192
+ " print(\"VarImp Garson\")\n",
193
+ " \"\"\"\n",
194
+ " Computes Garson's algorithm\n",
195
+ " A = matrix of weights of input-hidden layer (rows=input & cols=hidden)\n",
196
+ " B = vector of weights of hidden-output layer\n",
197
+ " \"\"\"\n",
198
+ " A = self.model.layers[0].get_weights()[0]\n",
199
+ " B = self.model.layers[len(self.model.layers)-1].get_weights()[0]\n",
200
+ " \n",
201
+ " self.var_scores = 0\n",
202
+ " for i in range(B.shape[1]):\n",
203
+ " self.var_scores += garson(A, np.transpose(B)[i])\n",
204
+ " if self.verbose:\n",
205
+ " print(\"Most important variables: \",\n",
206
+ " np.array(self.var_scores).argsort()[-10:][::-1])"
207
+ ]
208
+ },
209
+ {
210
+ "cell_type": "code",
211
+ "execution_count": null,
212
+ "metadata": {},
213
+ "outputs": [],
214
+ "source": [
215
+ "# Leave-One-Feature-Out LOFO\n",
216
+ "def LeaveOneFeatureOut(model, X, Y):\n",
217
+ " OneOutScore = []\n",
218
+ " n = X.shape[0]\n",
219
+ " for i in range(0,X.shape[1]):\n",
220
+ " newX = X.copy()\n",
221
+ " newX[:,i] = 0 #np.random.normal(0,1,n)\n",
222
+ " OneOutScore.append(model.evaluate(newX, Y, batch_size=2048, verbose=0))\n",
223
+ " OneOutScore = pd.DataFrame(OneOutScore[:])\n",
224
+ " ordered = np.argsort(-OneOutScore.iloc[:,0])\n",
225
+ " return(OneOutScore, ordered)"
226
+ ]
227
+ },
228
+ {
229
+ "cell_type": "markdown",
230
+ "metadata": {},
231
+ "source": [
232
+ "# Testing variable importance"
233
+ ]
234
+ },
235
+ {
236
+ "cell_type": "markdown",
237
+ "metadata": {},
238
+ "source": [
239
+ "#### Settings obtained for each dataset"
240
+ ]
241
+ },
242
+ {
243
+ "cell_type": "code",
244
+ "execution_count": null,
245
+ "metadata": {},
246
+ "outputs": [],
247
+ "source": [
248
+ "data = list()\n",
249
+ "data.append({\"name\": 'breastcancer', \"classification\": True, \"data\": datasets.load_breast_cancer()})\n",
250
+ "data.append({\"name\": 'digits', \"classification\": True, \"data\": datasets.load_digits()})\n",
251
+ "data.append({\"name\": 'iris', \"classification\": True, \"data\": datasets.load_iris()})\n",
252
+ "data.append({\"name\": 'wine', \"classification\": True, \"data\": datasets.load_wine()})\n",
253
+ "data.append({\"name\": 'boston', \"classification\": False, \"data\": datasets.load_boston()})\n",
254
+ "data.append({\"name\": 'diabetes', \"classification\": False, \"data\": datasets.load_diabetes()})"
255
+ ]
256
+ },
257
+ {
258
+ "cell_type": "code",
259
+ "execution_count": null,
260
+ "metadata": {},
261
+ "outputs": [],
262
+ "source": [
263
+ "from tensorflow.keras.callbacks import Callback\n",
264
+ "import numpy as np\n",
265
+ "\n",
266
+ "class AccuracyMonitor(Callback):\n",
267
+ " def __init__(self,\n",
268
+ " monitor='val_acc',\n",
269
+ " verbose=0,\n",
270
+ " min_epochs=5,\n",
271
+ " baseline=None):\n",
272
+ " super(AccuracyMonitor, self).__init__()\n",
273
+ "\n",
274
+ " self.monitor = monitor\n",
275
+ " self.baseline = baseline\n",
276
+ " self.verbose = verbose\n",
277
+ " self.min_epochs = min_epochs\n",
278
+ " self.stopped_epoch = 0\n",
279
+ "\n",
280
+ " def on_epoch_end(self, epoch, logs=None):\n",
281
+ " if logs.get(self.monitor) > self.baseline and epoch > self.min_epochs:\n",
282
+ " self.stopped_epoch = epoch\n",
283
+ " self.model.stop_training = True\n",
284
+ " print('\\n Stopped at epoch {epoch}. Accuracy of {accuracy} reached.'.format(epoch=(self.stopped_epoch + 1), accuracy=logs.get(self.monitor)), \"\\n\")\n",
285
+ "\n",
286
+ " def on_train_end(self, logs=None):\n",
287
+ " if self.stopped_epoch > 0 and self.verbose > 0:\n",
288
+ " print('Epoch %05d: early stopping' % (self.stopped_epoch + 1))"
289
+ ]
290
+ },
291
+ {
292
+ "cell_type": "code",
293
+ "execution_count": null,
294
+ "metadata": {
295
+ "scrolled": true
296
+ },
297
+ "outputs": [],
298
+ "source": [
299
+ "import matplotlib.pyplot as plt\n",
300
+ "from numpy.random import seed\n",
301
+ "from tensorflow.keras.callbacks import EarlyStopping\n",
302
+ "\n",
303
+ "def runExp(data, mdl = \"NN1\", xseed = 42, epochs = 1000, verbose = 0):\n",
304
+ "\n",
305
+ " res = list()\n",
306
+ " VIANN_list = []\n",
307
+ " Garson_list = []\n",
308
+ " LOFO_list = []\n",
309
+ " RF_list = []\n",
310
+ " for i in range(len(data)):\n",
311
+ " seed(xseed)\n",
312
+ " \n",
313
+ " dataset = data[i]['data']\n",
314
+ " isClassification = data[i]['classification']\n",
315
+ " datname = data[i]['name']\n",
316
+ " \n",
317
+ " print(\"============\")\n",
318
+ " print( data[i]['name'])\n",
319
+ " print(\"============\\n\")\n",
320
+ "\n",
321
+ " if isClassification == True:\n",
322
+ " #Classification\n",
323
+ "\n",
324
+ " labels_encoded = []\n",
325
+ " for labels in [dataset.target]:\n",
326
+ " encoder = LabelEncoder()\n",
327
+ " encoder.fit(labels)\n",
328
+ " encoded_Y = encoder.transform(labels)\n",
329
+ " # convert integers to dummy variables (i.e. one hot encoded)\n",
330
+ " labels_encoded.append(np_utils.to_categorical(encoded_Y))\n",
331
+ " dataset.targetLabels = labels_encoded[0]\n",
332
+ "\n",
333
+ " # fit a Random Forest model to the data\n",
334
+ " RFmodel = RandomForestClassifier(n_estimators=100)\n",
335
+ "\n",
336
+ " output_size = dataset.targetLabels.shape[1]\n",
337
+ "\n",
338
+ " else:\n",
339
+ " dataset.targetLabels = scale(dataset.target)\n",
340
+ " output_size = 1\n",
341
+ "\n",
342
+ " # fit a Random Forest model to the data\n",
343
+ " RFmodel = RandomForestRegressor(n_estimators=100)\n",
344
+ "\n",
345
+ " X = scale(dataset.data)\n",
346
+ " Y = dataset.targetLabels\n",
347
+ "\n",
348
+ " RFmodel.fit(X, Y)\n",
349
+ " \n",
350
+ " VIANN = VarianceImportanceCallback()\n",
351
+ " Garson = VarImpGarson(verbose=verbose)\n",
352
+ "\n",
353
+ " if (mdl == \"NN1\"):\n",
354
+ " model = NN1(X.shape[1], output_size, isClassification)\n",
355
+ " elif (mdl == \"NN2\"):\n",
356
+ " model = NN2(X.shape[1], output_size, isClassification)\n",
357
+ " elif (mdl == \"DeepNN\"):\n",
358
+ " model = DeepNN(X.shape[1], output_size, isClassification)\n",
359
+ " \n",
360
+ " clbs = [VIANN,Garson]\n",
361
+ " if isClassification == True:\n",
362
+ " clbs.append(AccuracyMonitor(monitor='val_acc', baseline=0.95, min_epochs = 5))\n",
363
+ " else:\n",
364
+ " epochs = 100\n",
365
+ " \n",
366
+ " model.fit(X, Y, validation_split=0.05, epochs=epochs, batch_size=np.round(X.shape[0]/7).astype(int), shuffle=True, \n",
367
+ " verbose=verbose, callbacks = clbs)\n",
368
+ "\n",
369
+ " LOFO, LOFO_Ordered = LeaveOneFeatureOut(model, X, Y)\n",
370
+ "\n",
371
+ " print('VIANN vs LOFO: ', round(np.corrcoef([VIANN.var_scores,LOFO[0]])[0,1], 2))\n",
372
+ " print('VIANN vs RF: ', round(np.corrcoef([VIANN.var_scores,RFmodel.feature_importances_])[0,1], 2))\n",
373
+ " print('Garson vs LOFO: ', round(np.corrcoef([Garson.var_scores,LOFO[0]])[0,1], 2))\n",
374
+ " print('Garson vs VIANN:', round(np.corrcoef([Garson.var_scores,VIANN.var_scores])[0,1], 2))\n",
375
+ " \n",
376
+ " res.append([data[i]['name'],\n",
377
+ " round(np.corrcoef([VIANN.var_scores,LOFO[0]])[0,1], 2), \n",
378
+ " round(np.corrcoef([VIANN.var_scores,RFmodel.feature_importances_])[0,1], 2),\n",
379
+ " round(np.corrcoef([Garson.var_scores,LOFO[0]])[0,1], 2),\n",
380
+ " round(np.corrcoef([Garson.var_scores,VIANN.var_scores])[0,1], 2)\n",
381
+ " ])\n",
382
+ " \n",
383
+ " VIANN_list.append([data[i]['name'], VIANN.var_scores])\n",
384
+ " Garson_list.append([data[i]['name'], Garson.var_scores])\n",
385
+ " LOFO_list.append([data[i]['name'], LOFO])\n",
386
+ " RF_list.append([data[i]['name'], RFmodel.feature_importances_])\n",
387
+ " \n",
388
+ " df = pd.DataFrame(res)\n",
389
+ " df.columns = (\"Dataset\", \"VIANN vs LOFO\", \"VIANN vs RF\", \"Garson vs LOFO\", \"Garson vs VIANN\")\n",
390
+ " \n",
391
+ " return df, VIANN_list, Garson_list, LOFO_list, RF_list"
392
+ ]
393
+ },
394
+ {
395
+ "cell_type": "code",
396
+ "execution_count": null,
397
+ "metadata": {},
398
+ "outputs": [],
399
+ "source": [
400
+ "rsNN1, VIANN_NN1, Garson_NN1, LOFO_NN1, RF = runExp(data, mdl = \"NN1\", verbose = 0)\n",
401
+ "rsNN1"
402
+ ]
403
+ },
404
+ {
405
+ "cell_type": "code",
406
+ "execution_count": null,
407
+ "metadata": {
408
+ "scrolled": false
409
+ },
410
+ "outputs": [],
411
+ "source": [
412
+ "rsNN2, VIANN_NN2, Garson_NN2, LOFO_NN2, RF = runExp(data, mdl = \"NN2\", verbose = 0)\n",
413
+ "rsNN2"
414
+ ]
415
+ },
416
+ {
417
+ "cell_type": "code",
418
+ "execution_count": null,
419
+ "metadata": {},
420
+ "outputs": [],
421
+ "source": [
422
+ "rsDeepNN, VIANN_DeepNN, Garson_DeepNN, LOFO_DeepNN, RF = runExp(data, mdl = \"DeepNN\", verbose = 0)\n",
423
+ "rsDeepNN"
424
+ ]
425
+ },
426
+ {
427
+ "cell_type": "markdown",
428
+ "metadata": {},
429
+ "source": [
430
+ "## Results published in Discovery Science 2019"
431
+ ]
432
+ },
433
+ {
434
+ "cell_type": "code",
435
+ "execution_count": null,
436
+ "metadata": {},
437
+ "outputs": [],
438
+ "source": [
439
+ "rsNN1"
440
+ ]
441
+ },
442
+ {
443
+ "cell_type": "code",
444
+ "execution_count": null,
445
+ "metadata": {},
446
+ "outputs": [],
447
+ "source": [
448
+ "rsNN2"
449
+ ]
450
+ },
451
+ {
452
+ "cell_type": "code",
453
+ "execution_count": null,
454
+ "metadata": {},
455
+ "outputs": [],
456
+ "source": [
457
+ "rsDeepNN"
458
+ ]
459
+ },
460
+ {
461
+ "cell_type": "code",
462
+ "execution_count": null,
463
+ "metadata": {},
464
+ "outputs": [],
465
+ "source": [
466
+ "modelname = \"NN2\"\n",
467
+ "datname = VIANN_NN2[1][0]\n",
468
+ "xx = VIANN_NN2[1][1]\n",
469
+ "yy = LOFO_NN2[1][1][0]\n",
470
+ "\n",
471
+ "f = plt.figure()\n",
472
+ "plt.scatter(xx, yy)\n",
473
+ "plt.xlabel('VIANN')\n",
474
+ "plt.ylabel('LOFO')\n",
475
+ "plt.title('VIANN vs LOFO' + \" (\" + datname + \" dataset)\")\n",
476
+ "plt.show()\n",
477
+ "f.savefig(\"VIANNvsLOFO_\" + datname + \"_\" + modelname +\".pdf\")\n",
478
+ "\n",
479
+ "print(np.corrcoef([xx,yy])[0,1])"
480
+ ]
481
+ },
482
+ {
483
+ "cell_type": "code",
484
+ "execution_count": null,
485
+ "metadata": {},
486
+ "outputs": [],
487
+ "source": [
488
+ "modelname = \"DeepNN\"\n",
489
+ "datname = VIANN_DeepNN[1][0]\n",
490
+ "xx = VIANN_DeepNN[1][1]\n",
491
+ "yy = RF[1][1]\n",
492
+ "\n",
493
+ "f = plt.figure()\n",
494
+ "plt.scatter(xx, yy)\n",
495
+ "plt.xlabel('VIANN')\n",
496
+ "plt.ylabel('RF feature importance')\n",
497
+ "plt.title('VIANN vs RF' + \" (\" + datname + \" dataset)\")\n",
498
+ "plt.show()\n",
499
+ "f.savefig(\"VIANNvsRF_\" + datname + \"_\" + modelname +\".pdf\")\n",
500
+ "\n",
501
+ "print(np.corrcoef([xx,yy])[0,1])"
502
+ ]
503
+ },
504
+ {
505
+ "cell_type": "code",
506
+ "execution_count": null,
507
+ "metadata": {},
508
+ "outputs": [],
509
+ "source": []
510
+ }
511
+ ],
512
+ "metadata": {
513
+ "kernelspec": {
514
+ "display_name": "Python 3",
515
+ "language": "python",
516
+ "name": "python3"
517
+ }
+ },
518
+ "nbformat": 4,
519
+ "nbformat_minor": 2
520
+ }