aisp 0.1.34__py3-none-any.whl → 0.1.40__py3-none-any.whl
- aisp/__init__.py +4 -0
- aisp/base/__init__.py +4 -0
- aisp/base/_classifier.py +90 -0
- aisp/exceptions.py +42 -0
- aisp/nsa/__init__.py +11 -0
- aisp/nsa/_base.py +118 -0
- aisp/nsa/_negative_selection.py +682 -0
- aisp/nsa/_ns_core.py +153 -0
- aisp/utils/__init__.py +2 -1
- aisp/utils/_multiclass.py +16 -30
- aisp/utils/distance.py +215 -0
- aisp/utils/metrics.py +22 -43
- aisp/utils/sanitizers.py +55 -0
- {aisp-0.1.34.dist-info → aisp-0.1.40.dist-info}/METADATA +11 -111
- aisp-0.1.40.dist-info/RECORD +18 -0
- {aisp-0.1.34.dist-info → aisp-0.1.40.dist-info}/WHEEL +1 -1
- aisp/NSA/__init__.py +0 -18
- aisp/NSA/_base.py +0 -281
- aisp/NSA/_negative_selection.py +0 -1115
- aisp-0.1.34.dist-info/RECORD +0 -11
- {aisp-0.1.34.dist-info → aisp-0.1.40.dist-info}/licenses/LICENSE +0 -0
- {aisp-0.1.34.dist-info → aisp-0.1.40.dist-info}/top_level.txt +0 -0
aisp/NSA/_negative_selection.py
DELETED
@@ -1,1115 +0,0 @@
-import numpy as np
-import numpy.typing as npt
-from tqdm import tqdm
-from typing import Dict, Literal, Optional, Union
-from collections import namedtuple
-from scipy.spatial.distance import cdist
-
-from ._base import Base
-from ..utils import slice_index_list_by_class
-
-
-class RNSA(Base):
|
13
|
-
"""
|
14
|
-
The ``RNSA`` (Real-Valued Negative Selection Algorithm) class is for classification and \
|
15
|
-
identification of anomalies through the self and non-self method.
|
16
|
-
|
17
|
-
Attributes:
|
18
|
-
---
|
19
|
-
* N (``int``): Number of detectors.
|
20
|
-
* r (``float``): Radius of the detector.
|
21
|
-
* r_s (``float``): Radius rₛ of the self samples in ``X``.
|
22
|
-
* k (``int``): K number of near neighbors to calculate the average
|
23
|
-
distance of the detectors.
|
24
|
-
* metric (``str``): Way to calculate the distance: ``'euclidean', 'minkowski', or 'manhattan'``.
|
25
|
-
* max_discards (``int``): This parameter indicates the maximum number of consecutive \
|
26
|
-
detector discards, aimed at preventing a possible infinite loop in case a radius is \
|
27
|
-
defined that cannot generate non-self detectors.
|
28
|
-
* seed (``int``): Seed for the random generation of detector values.
|
29
|
-
* algorithm(``str``), Set the algorithm version:
|
30
|
-
|
31
|
-
* ``'default-NSA'``: Default algorithm with fixed radius.
|
32
|
-
* ``'V-detector'``: This algorithm uses a variable radius for anomaly detection \
|
33
|
-
in feature spaces.
|
34
|
-
|
35
|
-
Defaults to ``'default-NSA'``.
|
36
|
-
|
37
|
-
* non_self_label (``str``): This variable stores the label that will be assigned when the data has \
|
38
|
-
only one output class, and the sample is classified as not belonging to that class. \
|
39
|
-
Defaults to ``'non-self'``.
|
40
|
-
* cell_bounds (``bool``): If set to ``True``, this option limits the generation of \
|
41
|
-
detectors to the space within the plane between 0 and 1. This means that any detector \
|
42
|
-
whose radius exceeds this limit is discarded, this variable is only used in the \
|
43
|
-
``V-detector`` algorithm. Defaults to ``False``.
|
44
|
-
* p (``float``): This parameter stores the value of ``p`` used in the Minkowski distance. \
|
45
|
-
The default is ``2``, which represents normalized Euclidean distance. Different values \
|
46
|
-
of p lead to different variants of the Minkowski distance \
|
47
|
-
[learn more](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.minkowski.html).
|
48
|
-
|
49
|
-
* detectors (``dict``): This variable stores a list of detectors by class.
|
50
|
-
* classes (``npt.NDArray``): list of output classes.
|
51
|
-
|
52
|
-
---
|
53
|
-
|
54
|
-
A classe ``RNSA`` (Algoritmo de Seleção Negativa de Valor Real) tem a finalidade de classificação \
|
55
|
-
e identificação de anomalias através do método self e not self .
|
56
|
-
|
57
|
-
Attributes:
|
58
|
-
---
|
59
|
-
* N (``int``): Quantidade de detectores.
|
60
|
-
* r (``float``): Raio do detector.
|
61
|
-
* r_s (``float``): O valor de ``rₛ`` é o raio das amostras próprias da matriz ``X``.
|
62
|
-
* k (``int``): K quantidade de vizinhos próximos para calcular a média da distância dos detectores.
|
63
|
-
* metric (``str``): Forma de calcular a distância: ``'euclidiana', 'minkowski', or 'manhattan'``.
|
64
|
-
* max_discards (``int``): Este parâmetro indica o número máximo de descartes de detectores \
|
65
|
-
em sequência, que tem como objetivo evitar um possível loop infinito caso seja definido \
|
66
|
-
um raio que não seja possível gerar detectores do não-próprio.
|
67
|
-
* seed (``int``): Semente para a geração randômica dos valores dos detectores.
|
68
|
-
* algorithm (``str``), Definir a versão do algoritmo:
|
69
|
-
|
70
|
-
* ``'default-NSA'``: Algoritmo padrão com raio fixo.
|
71
|
-
* ``'V-detector'``: Este algoritmo utiliza um raio variável para a detecção de \
|
72
|
-
anomalias em espaços de características.
|
73
|
-
|
74
|
-
Defaults to ``'default-NSA'``.
|
75
|
-
|
76
|
-
* non_self_label (``str``): Esta variável armazena o rótulo que será atribuído quando \
|
77
|
-
os dados possuírem apenas uma classe de saída, e a amostra for classificada como não \
|
78
|
-
pertencente a essa classe. Defaults to ``'non-self'``.
|
79
|
-
* cell_bounds (``bool``): Se definido como ``True``, esta opção limita a geração dos \
|
80
|
-
detectores ao espaço do plano compreendido entre 0 e 1. Isso significa que qualquer \
|
81
|
-
detector cujo raio ultrapasse esse limite é descartado, e esta variável é usada \
|
82
|
-
exclusivamente no algoritmo ``V-detector``.
|
83
|
-
* p (``float``): Este parâmetro armazena o valor de ``p`` utilizada na distância de \
|
84
|
-
Minkowski. O padrão é ``2``, o que significa distância euclidiana normalizada. \
|
85
|
-
Diferentes valores de p levam a diferentes variantes da distância de Minkowski \
|
86
|
-
[saiba mais](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.minkowski.html).
|
87
|
-
|
88
|
-
* detectors (``dict``): Essa variável armazena uma lista com detectores por classes.
|
89
|
-
* classes (``npt.NDArray``): lista com as classes de saída.
|
90
|
-
"""
|
91
|
-
|
92
|
-
def __init__(
|
93
|
-
self,
|
94
|
-
N: int = 100,
|
95
|
-
r: float = 0.05,
|
96
|
-
r_s: float = 0.0001,
|
97
|
-
k: int = 1,
|
98
|
-
metric: Literal["manhattan", "minkowski", "euclidean"] = "euclidean",
|
99
|
-
max_discards: int = 1000,
|
100
|
-
seed: int = None,
|
101
|
-
algorithm: Literal["default-NSA", "V-detector"] = "default-NSA",
|
102
|
-
**kwargs: Dict[str, Union[bool, str, float]],
|
103
|
-
):
|
104
|
-
"""
|
105
|
-
Negative Selection class constructor (``RNSA``).
|
106
|
-
|
107
|
-
Details:
|
108
|
-
---
|
109
|
-
This method initializes the ``detectors``, ``classes``, ``k``, ``metric``, ``N``, ``r``, \
|
110
|
-
``r_s``, ``max_discards``, ``seed`` and ``algorithm`` attributes.
|
111
|
-
|
112
|
-
Parameters:
|
113
|
-
---
|
114
|
-
* N (``int``): Number of detectors. Defaults to ``100``.
|
115
|
-
* r (``float``): Radius of the detector. Defaults to ``0.05``.
|
116
|
-
* r_s (``float``): Radius rₛ of the self samples in ``X``. Defaults to ``0.0001``.
|
117
|
-
* k (``int``): Number of neighbors near the randomly generated detectors to perform the \
|
118
|
-
distance average calculation. Defaults to ``1``.
|
119
|
-
* metric (``str``): Way to calculate the distance between the detector and the sample:
|
120
|
-
|
121
|
-
* ``'euclidean'`` ➜ The calculation of the distance is given by the expression: \
|
122
|
-
√( (x₁ – y₁)² + (x₂ – y₂)² + ... + (xₙ – yₙ)² ).
|
123
|
-
* ``'minkowski'`` ➜ The calculation of the distance is given by the expression: \
|
124
|
-
( |x₁ – y₁|ᵖ + |x₂ – y₂|ᵖ + ... + |xₙ – yₙ|ᵖ )^(1/p).
|
125
|
-
* ``'manhattan'`` ➜ The calculation of the distance is given by the expression: \
|
126
|
-
( |x₁ – y₁| + |x₂ – y₂| + ... + |xₙ – yₙ| ).
|
127
|
-
|
128
|
-
Defaults to ``'euclidean'``.
|
129
|
-
|
130
|
-
* max_discards (``int``): This parameter indicates the maximum number of consecutive \
|
131
|
-
detector discards, aimed at preventing a possible infinite loop in case a radius is \
|
132
|
-
defined that cannot generate non-self detectors. Defaults to ``1000``.
|
133
|
-
* seed (``int``): Seed for the random generation of values in the detectors. Defaults \
|
134
|
-
to ``None``.
|
135
|
-
|
136
|
-
* algorithm(``str``), Set the algorithm version:
|
137
|
-
|
138
|
-
* ``'default-NSA'``: Default algorithm with fixed radius.
|
139
|
-
* ``'V-detector'``: This algorithm is based on the article \
|
140
|
-
"[Real-Valued Negative Selection Algorithm with Variable-Sized Detectors](https://doi.org/10.1007/978-3-540-24854-5_30)", \
|
141
|
-
by Ji, Z., Dasgupta, D. (2004), and uses a variable radius for anomaly \
|
142
|
-
detection in feature spaces.
|
143
|
-
|
144
|
-
Defaults to ``'default-NSA'``.
|
145
|
-
|
146
|
-
- ``**kwargs``:
|
147
|
-
- non_self_label (``str``): This variable stores the label that will be assigned \
|
148
|
-
when the data has only one output class, and the sample is classified as not \
|
149
|
-
belonging to that class. Defaults to ``'non-self'``.
|
150
|
-
- cell_bounds (``bool``): If set to ``True``, this option limits the generation \
|
151
|
-
of detectors to the space within the plane between 0 and 1. This means that \
|
152
|
-
any detector whose radius exceeds this limit is discarded, this variable is \
|
153
|
-
only used in the ``V-detector`` algorithm. Defaults to ``False``.
|
154
|
-
- p (``float``): This parameter stores the value of ``p`` used in the Minkowski \
|
155
|
-
distance. The default is ``2``, which represents normalized Euclidean distance. \
|
156
|
-
Different values of p lead to different variants of the Minkowski distance \
|
157
|
-
[learn more](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.minkowski.html).
|
158
|
-
---
|
159
|
-
|
160
|
-
Construtor da classe de Seleção negativa (``RNSA``).
|
161
|
-
|
162
|
-
Details:
|
163
|
-
---
|
164
|
-
Este método inicializa os atributos ``detectors``, ``classes``, ``k``, ``metric``, ``N``, \
|
165
|
-
``r`` e ``seed``.
|
166
|
-
|
167
|
-
Parameters:
|
168
|
-
---
|
169
|
-
* N (``int``): Quantidade de detectores. Defaults to ``100``.
|
170
|
-
* r (``float``): Raio do detector. Defaults to ``0.05``.
|
171
|
-
* r_s (``float``): O valor de ``rₛ`` é o raio das amostras próprias da matriz ``X``. \
|
172
|
-
Defaults to ``0.0001``.
|
173
|
-
* k (``int``): Quantidade de vizinhos próximos dos detectores gerados aleatoriamente \
|
174
|
-
para efetuar o cálculo da média da distância. Defaults to ``1``.
|
175
|
-
* metric (``str``): Forma para se calcular a distância entre o detector e a amostra:
|
176
|
-
|
177
|
-
* ``'euclidiana'`` ➜ O cálculo da distância dá-se pela expressão: \
|
178
|
-
√( (x₁ – x₂)² + (y₁ – y₂)² + ... + (yn – yn)²).
|
179
|
-
* ``'minkowski'`` ➜ O cálculo da distância dá-se pela expressão: \
|
180
|
-
( |X₁ – Y₁|p + |X₂ – Y₂|p + ... + |Xn – Yn|p) ¹/ₚ.
|
181
|
-
* ``'manhattan'`` ➜ O cálculo da distância dá-se pela expressão: \
|
182
|
-
( |x₁ – x₂| + |y₁ – y₂| + ... + |yn – yn|).
|
183
|
-
|
184
|
-
Defaults to ``'euclidean'``.
|
185
|
-
|
186
|
-
* max_discards (``int``): Este parâmetro indica o número máximo de descartes de detectores \
|
187
|
-
em sequência, que tem como objetivo evitar um possível loop infinito caso seja definido \
|
188
|
-
um raio que não seja possível gerar detectores do não-próprio. Defaults to ``1000``.
|
189
|
-
* seed (``int``): Semente para a geração randômica dos valores nos detectores. \
|
190
|
-
Defaults to ``None``.
|
191
|
-
* algorithm (``str``), Definir a versão do algoritmo:
|
192
|
-
|
193
|
-
* ``'default-NSA'``: Algoritmo padrão com raio fixo.
|
194
|
-
* ``'V-detector'``: Este algoritmo é baseado no artigo \
|
195
|
-
"[Real-Valued Negative Selection Algorithm with Variable-Sized Detectors](https://doi.org/10.1007/978-3-540-24854-5_30)", \
|
196
|
-
de autoria de Ji, Z., Dasgupta, D. (2004), e utiliza um raio variável para a \
|
197
|
-
detecção de anomalias em espaços de características.
|
198
|
-
|
199
|
-
Defaults to ``'default-NSA'``.
|
200
|
-
|
201
|
-
- ``**kwargs``:
|
202
|
-
- non_self_label (``str``): Esta variável armazena o rótulo que será atribuído \
|
203
|
-
quando os dados possuírem apenas uma classe de saída, e a amostra for \
|
204
|
-
classificada como não pertencente a essa classe. Defaults to ``'non-self'``.
|
205
|
-
- cell_bounds (``bool``): Se definido como ``True``, esta opção limita a \
|
206
|
-
geração dos detectores ao espaço do plano compreendido entre 0 e 1. Isso \
|
207
|
-
significa que qualquer detector cujo raio ultrapasse esse limite é descartado, \
|
208
|
-
e esta variável é usada exclusivamente no algoritmo ``V-detector``.
|
209
|
-
- p (``float``): Este parâmetro armazena o valor de ``p`` utilizada na distância \
|
210
|
-
de Minkowski. O padrão é ``2``, o que significa distância euclidiana normalizada. \
|
211
|
-
Diferentes valores de p levam a diferentes variantes da distância de Minkowski \
|
212
|
-
[saiba mais](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.minkowski.html).
|
213
|
-
"""
-
-        super().__init__(metric)
-        if metric == "manhattan" or metric == "minkowski":
-            self.metric = metric
-        else:
-            self.metric = "euclidean"
-
-        if seed is not None and isinstance(seed, int):
-            np.random.seed(seed)
-            self.seed: int = seed
-        else:
-            self.seed = None
-
-        if k < 1:
-            self.k: int = 1
-        else:
-            self.k: int = k
-
-        if N < 1:
-            self.N: int = 100
-        else:
-            self.N: int = N
-
-        if r < 0:
-            self.r: float = 0.05
-        else:
-            self.r: float = r
-
-        if r_s > 0:
-            self.r_s: float = r_s
-        else:
-            self.r_s: float = 0
-
-        if algorithm == "V-detector":
-            self._Detector = namedtuple("Detector", ["position", "radius"])
-            self._algorithm: str = algorithm
-        else:
-            self._Detector = namedtuple("Detector", "position")
-            self._algorithm: str = "default-NSA"
-
-        if max_discards > 0:
-            self.max_discards: int = max_discards
-        else:
-            self.max_discards: int = 1000
-
-        # Retrieves the variables from kwargs.
-        self.p: float = kwargs.get("p", 2)
-        self._cell_bounds: bool = kwargs.get("cell_bounds", False)
-        self.non_self_label: str = kwargs.get("non_self_label", "non-self")
-
-        # Initializes the other class variables as None.
-        self.detectors: Union[dict, None] = None
-        self.classes: npt.NDArray = None
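The three ``metric`` options described in the constructor docstring can be illustrated with a short NumPy sketch. This is not the package's ``Base._distance`` (which is not shown in this diff); the function name ``distance`` and its arguments are hypothetical, and the fallback to Euclidean for unknown metric names is an assumption taken from the constructor logic above.

```python
import numpy as np


def distance(u, v, metric="euclidean", p=2.0):
    """Hypothetical helper illustrating the three metrics; not the aisp API."""
    # Element-wise absolute differences between the two points.
    diff = np.abs(np.asarray(u, dtype=float) - np.asarray(v, dtype=float))
    if metric == "manhattan":
        # Sum of absolute differences.
        return float(np.sum(diff))
    if metric == "minkowski":
        # p-th root of the sum of p-th powers of the absolute differences.
        return float(np.sum(diff ** p) ** (1.0 / p))
    # Any other value is treated as Euclidean (assumption, mirroring __init__).
    return float(np.sqrt(np.sum(diff ** 2)))


euclid = distance([0, 0], [3, 4])
manhattan = distance([0, 0], [3, 4], metric="manhattan")
minkowski_p1 = distance([0, 0], [3, 4], metric="minkowski", p=1)
```

With ``p = 1`` the Minkowski form reduces to the Manhattan distance, and with ``p = 2`` to the Euclidean distance.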
|
267
|
-
|
268
|
-
def fit(self, X: npt.NDArray, y: npt.NDArray, verbose: bool = True):
|
269
|
-
"""
|
270
|
-
The function ``fit(...)`` performs the training according to ``X`` and ``y``, using the
|
271
|
-
negative selection method (``NegativeSelect``).
|
272
|
-
|
273
|
-
Parameters:
|
274
|
-
---
|
275
|
-
* X (``npt.NDArray``): Training array, containing the samples and their characteristics, \
|
276
|
-
[``N samples`` (rows)][``N features`` (columns)].
|
277
|
-
* y (``npt.NDArray``): Array of target classes of ``X`` with [``N samples`` (rows)].
|
278
|
-
* verbose (``bool``): Feedback from detector generation to the user.
|
279
|
-
returns:
|
280
|
-
---
|
281
|
-
(``self``): Returns the instance itself.
|
282
|
-
|
283
|
-
----
|
284
|
-
|
285
|
-
A função ``fit(...)``, realiza o treinamento de acordo com ``X`` e ``y``, usando o método
|
286
|
-
de seleção negativa(``NegativeSelect``).
|
287
|
-
|
288
|
-
Parameters:
|
289
|
-
---
|
290
|
-
* X (``npt.NDArray``): Array de treinamento, contendo as amostras é suas características, \
|
291
|
-
[``N amostras`` (linhas)][``N características`` (colunas)].
|
292
|
-
* y (``npt.NDArray``): Array com as classes alvos de ``X`` com [``N amostras`` (linhas)].
|
293
|
-
* verbose (``bool``): Feedback da geração de detectores para o usuário.
|
294
|
-
Returns:
|
295
|
-
---
|
296
|
-
(``self``): Retorna a própria instância.
|
297
|
-
"""
-        progress = None
-        super()._check_and_raise_exceptions_fit(X, y)
-
-        # Identifying the possible classes within the output array `y`.
-        self.classes = np.unique(y)
-        # Dictionary that will store detectors with classes as keys.
-        list_detectors_by_class = dict()
-        # Separates the classes for training.
-        sample_index = self.__slice_index_list_by_class(y)
-        # Progress bar for generating all detectors.
-        if verbose:
-            progress = tqdm(total=int(self.N * (len(self.classes))),
-                            bar_format="{desc} ┇{bar}┇ {n}/{total} detectors", postfix="\n", )
-        for _class_ in self.classes:
-            # Initializes the empty set that will contain the valid detectors.
-            valid_detectors_set = []
-            discard_count = 0
-            # Indicating which class the algorithm is currently processing for the progress bar.
-            if verbose:
-                progress.set_description_str(
-                    f"Generating the detectors for the {_class_} class:"
-                )
-            while len(valid_detectors_set) < self.N:
-                # Generates a candidate detector vector randomly with values between 0 and 1.
-                vector_x = np.random.random_sample(size=X.shape[1])
-                # Checks the validity of the detector for non-self with respect to the class samples.
-                valid_detector = self.__checks_valid_detector(
-                    X=X, vector_x=vector_x, samples_index_class=sample_index[_class_]
-                )
-
-                # If the detector is valid, add it to the list of valid detectors.
-                if self._algorithm == "V-detector" and valid_detector is not False:
-                    discard_count = 0
-                    valid_detectors_set.append(
-                        self._Detector(vector_x, valid_detector[1])
-                    )
-                    if verbose:
-                        progress.update(1)
-                elif valid_detector:
-                    discard_count = 0
-                    valid_detectors_set.append(self._Detector(vector_x))
-                    if verbose:
-                        progress.update(1)
-                else:
-                    discard_count += 1
-                    if discard_count == self.max_discards:
-                        raise Exception(
-                            "An error has been identified:\n"
-                            f"the maximum number of discards of detectors for the {_class_} class "
-                            "has been reached.\nIt is recommended to check the defined radius and "
-                            "consider reducing its value."
-                        )
-
-            # Add detectors, with classes as keys in the dictionary.
-            list_detectors_by_class[_class_] = valid_detectors_set
-        # Notify completion of detector generation for the classes.
-        if verbose:
-            progress.set_description(
-                f'\033[92m✔ Non-self detectors for classes ({", ".join(map(str, self.classes))}) '
-                f'successfully generated\033[0m'
-            )
-        # Saves the found detectors in the attribute for the non-self detectors of the trained model.
-        self.detectors = list_detectors_by_class
-        return self
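The rejection loop in ``fit`` for the ``'default-NSA'`` case can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the package's implementation: ``generate_detectors`` and its argument names are hypothetical, it assumes Euclidean distance and ``k = 1``, and it swaps the global ``np.random.seed`` call for a local ``np.random.default_rng`` generator.

```python
import numpy as np


def generate_detectors(self_samples, n_detectors=10, r=0.05, r_s=0.0001,
                       max_discards=1000, seed=None):
    """Hypothetical fixed-radius detector generation by rejection sampling."""
    rng = np.random.default_rng(seed)
    self_samples = np.asarray(self_samples, dtype=float)
    detectors = []
    discards = 0  # consecutive rejected candidates
    while len(detectors) < n_detectors:
        # Candidate detector drawn uniformly from the unit hypercube [0, 1).
        candidate = rng.random(self_samples.shape[1])
        dists = np.linalg.norm(self_samples - candidate, axis=1)
        # Keep the candidate only if it is farther than r + r_s from every self sample.
        if np.all(dists > r + r_s):
            detectors.append(candidate)
            discards = 0
        else:
            discards += 1
            if discards == max_discards:
                raise RuntimeError(
                    "maximum number of consecutive discards reached; "
                    "consider reducing the radius"
                )
    return np.array(detectors)


samples = np.array([[0.5, 0.5]])
found = generate_detectors(samples, n_detectors=5, r=0.1, seed=0)
```

The discard counter resets on every accepted candidate, so ``max_discards`` bounds consecutive failures rather than total failures, which matches the behaviour described in the docstring.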
|
362
|
-
|
363
|
-
def predict(self, X: npt.NDArray) -> Optional[npt.NDArray]:
|
364
|
-
"""
|
365
|
-
Function to perform the prediction of classes based on detectors
|
366
|
-
created after training.
|
367
|
-
|
368
|
-
Parameters:
|
369
|
-
---
|
370
|
-
* X (``npt.NDArray``): Array with input samples with [``N samples`` (Lines)] and
|
371
|
-
[``N characteristics``(Columns)]
|
372
|
-
|
373
|
-
returns:
|
374
|
-
---
|
375
|
-
* C – (``npt.NDArray``): an ndarray of the form ``C`` [``N samples``],
|
376
|
-
containing the predicted classes for ``X``.
|
377
|
-
* ``None``: If there are no detectors for the prediction.
|
378
|
-
|
379
|
-
---
|
380
|
-
|
381
|
-
Função para efetuar a previsão das classes com base nos detectores
|
382
|
-
criados após o treinamento.
|
383
|
-
|
384
|
-
Parameters:
|
385
|
-
---
|
386
|
-
* X (``npt.NDArray``): Array com as amostras de entradas com [``N amostras`` (Linhas)] e
|
387
|
-
[``N características``(Colunas)]
|
388
|
-
|
389
|
-
Returns:
|
390
|
-
---
|
391
|
-
* C – (``npt.NDArray``): um ndarray de forma ``C`` [``N amostras``],
|
392
|
-
contendo as classes previstas para ``X``.
|
393
|
-
* ``None``: Se não existir detectores para a previsão.
|
394
|
-
"""
-        # If there are no detectors, returns None.
-        if self.detectors is None:
-            return None
-        elif not isinstance(X, (np.ndarray, list)):
-            raise TypeError("X is not an ndarray or list")
-        elif len(self.detectors[self.classes[0]][0].position) != len(X[0]):
-            raise Exception(
-                "X does not have {} features to make the prediction".format(
-                    len(self.detectors[self.classes[0]][0])
-                )
-            )
-
-        # Initializes an empty array that will store the predictions.
-        C = np.empty(shape=0)
-        # For each sample row in X.
-        for line in X:
-            class_found: bool
-            _class_ = self.__compare_sample_to_detectors(line)
-            if _class_ is None:
-                class_found = False
-            else:
-                C = np.append(C, [_class_])
-                class_found = True
-
-            # If there is only one class and the sample is not classified, set the output as non-self.
-            if not class_found and len(self.classes) == 1:
-                C = np.append(C, [self.non_self_label])
-            # If the class is not identified with the detectors, assign the class with the greatest distance
-            # from the mean of its detectors.
-            elif not class_found:
-                average_distance: dict = {}
-                for _class_ in self.classes:
-                    detectores = list(
-                        map(lambda x: x.position, self.detectors[_class_])
-                    )
-                    average_distance[_class_] = np.average(
-                        [self.__distance(detector, line)
-                         for detector in detectores]
-                    )
-                C = np.append(C, [max(average_distance, key=average_distance.get)])
-        return C
|
436
|
-
|
437
|
-
def __slice_index_list_by_class(self, y: npt.NDArray) -> dict:
|
438
|
-
"""
|
439
|
-
The function ``__slice_index_list_by_class(...)``, separates the indices of the lines according \
|
440
|
-
to the output class, to loop through the sample array, only in positions where the output is \
|
441
|
-
the class being trained.
|
442
|
-
|
443
|
-
Parameters:
|
444
|
-
---
|
445
|
-
* y (npt.NDArray): Receives a ``y``[``N sample``] array with the output classes of the \
|
446
|
-
``X`` sample array.
|
447
|
-
|
448
|
-
returns:
|
449
|
-
---
|
450
|
-
* dict: A dictionary with the list of array positions(``y``), with the classes as key.
|
451
|
-
|
452
|
-
---
|
453
|
-
|
454
|
-
A função ``__slice_index_list_by_class(...)``, separa os índices das linhas conforme a classe \
|
455
|
-
de saída, para percorrer o array de amostra, apenas nas posições que a saída for a classe que \
|
456
|
-
está sendo treinada.
|
457
|
-
|
458
|
-
Parameters:
|
459
|
-
---
|
460
|
-
* y (npt.NDArray): Recebe um array ``y``[``N amostra``] com as classes de saída do array \
|
461
|
-
de amostra ``X``.
|
462
|
-
|
463
|
-
Returns:
|
464
|
-
---
|
465
|
-
* dict: Um dicionário com a lista de posições do array(``y``), com as classes como chave.
|
466
|
-
"""
|
467
|
-
return slice_index_list_by_class(self.classes, y)
|
468
|
-
|
469
|
-
def __checks_valid_detector(
|
470
|
-
self,
|
471
|
-
X: npt.NDArray = None,
|
472
|
-
vector_x: npt.NDArray = None,
|
473
|
-
samples_index_class: npt.NDArray = None,
|
474
|
-
):
|
475
|
-
"""
|
476
|
-
Function to check if the detector has a valid non-self radius ``r`` for the class.
|
477
|
-
|
478
|
-
Parameters:
|
479
|
-
---
|
480
|
-
* X (``npt.NDArray``): Array ``X`` with the samples.
|
481
|
-
* vector_x (``npt.NDArray``): Randomly generated vector x candidate detector with values \
|
482
|
-
between [0, 1].
|
483
|
-
* samples_index_class (``npt.NDArray``): Sample positions of a class in ``X``.
|
484
|
-
|
485
|
-
returns:
|
486
|
-
---
|
487
|
-
* Validity (``bool``): Returns whether the detector is valid or not.
|
488
|
-
|
489
|
-
---
|
490
|
-
|
491
|
-
Função para verificar se o detector possui raio ``r`` válido do não-próprio para a classe.
|
492
|
-
|
493
|
-
Parameters:
|
494
|
-
---
|
495
|
-
* X (``npt.NDArray``): Array ``X`` com as amostras.
|
496
|
-
* vector_x (``npt.NDArray``): Vetor x candidato a detector gerado aleatoriamente com valores \
|
497
|
-
entre [0, 1].
|
498
|
-
* samples_index_class (``npt.NDArray``): Posições das amostras de uma classe em ``X``.
|
499
|
-
|
500
|
-
Returns:
|
501
|
-
---
|
502
|
-
* Validade (``bool``): Retorna se o detector é válido ou não.
|
503
|
-
|
504
|
-
"""
-        # If any of the input arrays have zero size, returns false.
-        if (np.size(samples_index_class) == 0 or np.size(X) == 0 or np.size(vector_x) == 0):
-            return False
-        # If self.k > 1, uses the k nearest neighbors (kNN); otherwise, checks the detector
-        # without considering kNN.
-        if self.k > 1:
-            knn_list = np.empty(shape=0)
-            for i in samples_index_class:
-                # Calculates the distance between the two vectors and adds it to the kNN list if the
-                # distance is smaller than the largest distance in the list.
-                knn_list = self.__compare_KnearestNeighbors_List(
-                    knn_list, self.__distance(X[i], vector_x)
-                )
-            # If the average of the distances in the kNN list is greater than the radius, returns true.
-            distance_mean = np.mean(knn_list)
-            if self._algorithm == "V-detector":
-                return self.__detector_is_valid_to_Vdetector(distance_mean, vector_x)
-            elif distance_mean > (self.r + self.r_s):
-                return True
-        else:
-            distance: Union[float, None] = None
-            for i in samples_index_class:
-                if self._algorithm == "V-detector":
-                    new_distance = self.__distance(X[i], vector_x)
-                    if distance is None:
-                        distance = new_distance
-                    elif distance > new_distance:
-                        distance = new_distance
-                else:
-                    # Calculates the distance between the vectors; if it is less than or equal to the radius
-                    # plus the sample's radius, sets the validity of the detector to false.
-                    if (self.r + self.r_s) >= self.__distance(X[i], vector_x):
-                        return False  # Detector is not valid!
-
-            if self._algorithm == "V-detector":
-                return self.__detector_is_valid_to_Vdetector(distance, vector_x)
-            return True
-
-        return False  # Detector is not valid!
|
544
|
-
|
545
|
-
def __compare_KnearestNeighbors_List(self, knn: npt.NDArray, distance: float) -> npt.NDArray:
|
546
|
-
"""
|
547
|
-
Compares the k-nearest neighbor distance at position ``k-1`` in the list ``knn``,
|
548
|
-
if the distance of the new sample is less, replace it and sort in ascending order.
|
549
|
-
|
550
|
-
|
551
|
-
Parameters:
|
552
|
-
---
|
553
|
-
knn (npt.NDArray): List of k-nearest neighbor distances.
|
554
|
-
distance (float): Distance to check.
|
555
|
-
|
556
|
-
returns:
|
557
|
-
---
|
558
|
-
npt.NDArray: Updated and sorted nearest neighbor list.
|
559
|
-
|
560
|
-
---
|
561
|
-
|
562
|
-
Compara a distância do k-vizinho mais próximo na posição ``k-1``da lista ``knn``,
|
563
|
-
se a distância da nova amostra for menor, substitui ela e ordena em ordem crescente.
|
564
|
-
|
565
|
-
|
566
|
-
Parameters:
|
567
|
-
---
|
568
|
-
knn (npt.NDArray): Lista de distâncias dos k-vizinhos mais próximos.
|
569
|
-
distance (float): Distância a ser verificada.
|
570
|
-
|
571
|
-
Returns:
|
572
|
-
---
|
573
|
-
npt.NDArray: Lista de vizinhos mais próximos atualizada e ordenada.
|
574
|
-
"""
-        # If the number of distances in kNN is less than k, adds the distance.
-        if len(knn) < self.k:
-            knn = np.append(knn, distance)
-            knn.sort()
-        else:
-            # Otherwise, add the distance if the new distance is smaller than the largest distance in the list.
-            if knn[self.k - 1] > distance:
-                knn[self.k - 1] = distance
-                knn.sort()
-
-        return knn
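The running k-nearest-neighbour list maintained by this method can be traced with a standalone sketch. The free function ``update_knn`` below is a hypothetical restatement of the method with ``k`` passed explicitly; the sample distances are made up for illustration.

```python
import numpy as np


def update_knn(knn, distance, k):
    """Hypothetical standalone version of the k-nearest list update."""
    if len(knn) < k:
        # List not full yet: append and keep it sorted in ascending order.
        knn = np.append(knn, distance)
        knn.sort()
    elif knn[k - 1] > distance:
        # List full: replace the current largest entry if the new one is smaller.
        knn[k - 1] = distance
        knn.sort()
    return knn


distances = [0.9, 0.2, 0.7, 0.1, 0.5]  # illustrative values only
knn = np.empty(shape=0)
for d in distances:
    knn = update_knn(knn, d, k=3)
```

After the loop, ``knn`` holds the three smallest distances in ascending order, which is the same result as sorting all distances and keeping the first ``k``; the incremental form simply avoids storing every distance.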
|
586
|
-
|
587
|
-
def __compare_sample_to_detectors(self, line: npt.NDArray):
|
588
|
-
"""
|
589
|
-
Function to compare a sample with the detectors, verifying if the sample is self.
|
590
|
-
|
591
|
-
Details:
|
592
|
-
---
|
593
|
-
In this function, when there is class ambiguity, it returns the class that has the greatest
|
594
|
-
average distance between the detectors.
|
595
|
-
|
596
|
-
Parameters:
|
597
|
-
---
|
598
|
-
* line: vector with N-features
|
599
|
-
|
600
|
-
returns:
|
601
|
-
---
|
602
|
-
* Returns the predicted class with the detectors or None if the sample does not qualify \
|
603
|
-
for any class.
|
604
|
-
|
605
|
-
---
|
606
|
-
|
607
|
-
Função para comparar uma amostra com os detectores, verificando se a amostra é própria.
|
608
|
-
|
609
|
-
Details:
|
610
|
-
---
|
611
|
-
Nesta função, quando possui ambiguidade de classes, retorna a classe que possuir a média de \
|
612
|
-
distância maior entre os detectores.
|
613
|
-
|
614
|
-
Parameters:
|
615
|
-
---
|
616
|
-
* line: vetor com N-características
|
617
|
-
|
618
|
-
Returns:
|
619
|
-
---
|
620
|
-
* Retorna a classe prevista com os detectores ou None se a amostra não se qualificar \
|
621
|
-
a nenhuma classe.
|
622
|
-
"""
-
-        # List to store the classes and the average distance between the detectors and the sample.
-        possible_classes = []
-        for _class_ in self.classes:
-            # Variable to indicate if the class was found with the detectors.
-            class_found: bool = True
-            sum_distance = 0
-            for detector in self.detectors[_class_]:
-                distance = self.__distance(detector.position, line)
-                sum_distance += distance
-                if self._algorithm == "V-detector":
-                    if distance <= detector.radius:
-                        class_found = False
-                        break
-                elif distance <= self.r:
-                    class_found = False
-                    break
-
-            # If the sample passes through all the detectors of a class, adds the class as a possible prediction.
-            if class_found:
-                possible_classes.append([_class_, sum_distance / self.N])
-        # If classified as belonging to only one class, returns the class.
-        if len(possible_classes) == 1:
-            return possible_classes[0][0]
-        # If belonging to more than one class, returns the class with the greatest average distance.
-        elif len(possible_classes) > 1:
-            return max(possible_classes, key=lambda x: x[1])[0]
-        else:
-            return None
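The decision rule above (a class is a candidate only when none of its non-self detectors covers the sample, and ambiguity is resolved by the greatest average distance) can be sketched as follows. This is an illustrative sketch for the fixed-radius case only: ``classify``, the plain-array detector layout, and the Euclidean distance are assumptions, not the package's API.

```python
import numpy as np


def classify(sample, detectors_by_class, r):
    """Hypothetical fixed-radius classification against per-class detectors."""
    sample = np.asarray(sample, dtype=float)
    candidates = {}
    for label, detectors in detectors_by_class.items():
        dists = np.linalg.norm(np.asarray(detectors, dtype=float) - sample, axis=1)
        # The sample is compatible with a class only if no detector of that
        # class lies within radius r of it.
        if np.all(dists > r):
            candidates[label] = float(dists.mean())
    if not candidates:
        return None  # every class rejected the sample
    # With several compatible classes, prefer the greatest average distance.
    return max(candidates, key=candidates.get)


detectors = {
    "a": [[0.9, 0.9], [0.8, 0.9]],  # illustrative non-self detectors for class "a"
    "b": [[0.1, 0.1], [0.2, 0.1]],  # illustrative non-self detectors for class "b"
}
near_b_detector = classify([0.1, 0.12], detectors, r=0.05)
ambiguous = classify([0.4, 0.4], detectors, r=0.05)
```

In the first call a detector of class ``"b"`` covers the sample, so only ``"a"`` remains; in the second call both classes are compatible and ``"a"`` wins because its detectors are farther away on average.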
|
652
|
-
|
653
|
-
def __distance(self, u: npt.NDArray, v: npt.NDArray):
|
654
|
-
"""
|
655
|
-
Function to calculate the distance between two points by the chosen ``metric``.
|
656
|
-
|
657
|
-
Parameters:
|
658
|
-
---
|
659
|
-
* u (``npt.NDArray``): Coordinates of the first point.
|
660
|
-
* v (``npt.NDArray``): Coordinates of the second point.
|
661
|
-
|
662
|
-
returns:
|
663
|
-
---
|
664
|
-
* Distance (``double``) between the two points.
|
665
|
-
|
666
|
-
---
|
667
|
-
|
668
|
-
Função para calcular a distância entre dois pontos pela ``metric`` escolhida.
|
669
|
-
|
670
|
-
Parameters:
|
671
|
-
---
|
672
|
-
* u (``npt.NDArray``): Coordenadas do primeiro ponto.
|
673
|
-
* v (``npt.NDArray``): Coordenadas do segundo ponto.
|
674
|
-
|
675
|
-
Returns:
|
676
|
-
---
|
677
|
-
* Distância (``double``) entre os dois pontos.
|
678
|
-
"""
|
679
|
-
return super()._distance(u, v)
|
680
|
-
|
681
|
-
def __detector_is_valid_to_Vdetector(self, distance: float, vector_x: npt.NDArray):
|
682
|
-
"""
|
683
|
-
Check if the distance between the detector and the samples, minus the radius of the samples,
|
684
|
-
is greater than the minimum radius.
|
685
|
-
|
686
|
-
Parameters:
|
687
|
-
---
|
688
|
-
distance (``float``): minimum distance calculated between all samples.
|
689
|
-
vector_x (``numpy.ndarray``): randomly generated candidate detector vector x with values \
|
690
|
-
between 0 and 1.
|
691
|
-
|
692
|
-
Returns:
|
693
|
-
---
|
694
|
-
* ``False``: if the calculated radius does not exceed the minimum radius ``r``, or if the \
|
695
|
-
detector crosses the edge of the space while ``cell_bounds`` is enabled.
|
696
|
-
* ``True`` and the distance minus the radius of the samples, if the radius is valid.
|
697
|
-
|
698
|
-
----
|
699
|
-
|
700
|
-
Verifique se a distância entre o detector e as amostras, descontando o raio das amostras, \
|
701
|
-
é maior do que o raio mínimo.
|
702
|
-
|
703
|
-
Parameters:
|
704
|
-
---
|
705
|
-
distance (``float``): distância mínima calculada entre todas as amostras.
|
706
|
-
vector_x (``numpy.ndarray``): vetor x candidato do detector gerado aleatoriamente, com \
|
707
|
-
valores entre 0 e 1.
|
708
|
-
|
709
|
-
Returns:
|
710
|
-
---
|
711
|
-
|
712
|
-
* ``False``: caso o raio calculado seja menor do que a distância mínima ou ultrapasse a \
|
713
|
-
borda do espaço, caso essa opção esteja habilitada.
|
714
|
-
* ``True`` e a distância menos o raio das amostras, caso o raio seja válido.
|
715
|
-
"""
|
716
|
-
new_detector_r = float(distance - self.r_s)
|
717
|
-
if self.r >= new_detector_r:
|
718
|
-
return False
|
719
|
-
else:
|
720
|
-
# If _cell_bounds is True, the detector must lie entirely within the [0, 1] bounds of the space.
|
721
|
-
if self._cell_bounds:
|
722
|
-
for p in vector_x:
|
723
|
-
if (p - new_detector_r) < 0 or (p + new_detector_r) > 1:
|
724
|
-
return False
|
725
|
-
return True, new_detector_r
|
726
|
-
|
727
|
-
def get_params(self, deep: bool = True) -> dict:
|
728
|
-
return {
|
729
|
-
"N": self.N,
|
730
|
-
"r": self.r,
|
731
|
-
"k": self.k,
|
732
|
-
"metric": self.metric,
|
733
|
-
"seed": self.seed,
|
734
|
-
"algorithm": self._algorithm,
|
735
|
-
"r_s": self.r_s,
|
736
|
-
"cell_bounds": self._cell_bounds,
|
737
|
-
"p": self.p,
|
738
|
-
}
|
739
|
-
|
740
|
-
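The radius check performed by `__detector_is_valid_to_Vdetector` above can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not the library's API: the name `vdetector_radius` and its parameters are hypothetical, and it returns `None` for an invalid candidate instead of the mixed `False` / `(True, radius)` result used in the deleted code.

```python
import numpy as np


def vdetector_radius(distance, r_min, r_s, vector_x, cell_bounds=False):
    """Return the radius of a candidate detector, or None if it is invalid.

    Hypothetical helper mirroring the check in the deleted method:
    distance  -- minimum distance from the candidate to the self samples
    r_min     -- minimum allowed detector radius (``r`` in the class)
    r_s       -- radius assigned to each self sample
    vector_x  -- candidate detector position, values in [0, 1]
    """
    new_radius = float(distance - r_s)
    # Reject candidates whose radius does not exceed the minimum radius.
    if new_radius <= r_min:
        return None
    if cell_bounds:
        x = np.asarray(vector_x, dtype=float)
        # Reject candidates whose hypersphere leaves the unit hypercube.
        if np.any(x - new_radius < 0) or np.any(x + new_radius > 1):
            return None
    return new_radius
```

Returning a single optional value keeps the call site to one `is None` test, which is a design choice of this sketch rather than something the source prescribes.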
|
741
|
-
class BNSA(Base):
|
742
|
-
"""
|
743
|
-
The ``BNSA`` (Binary Negative Selection Algorithm) class performs classification and anomaly \
|
744
|
-
identification using the self/non-self method.
|
745
|
-
|
746
|
-
Attributes:
|
747
|
-
---
|
748
|
-
|
749
|
-
* N (``int``): Number of detectors.
|
750
|
-
* aff_thresh (``float``): Affinity threshold, the fraction of similarity between the T cell \
|
751
|
-
(detector) and the self samples.
|
752
|
-
* max_discards (``int``): Maximum number of consecutive detector discards, which prevents a \
|
753
|
-
possible infinite loop when the threshold is set so that non-self detectors cannot \
|
754
|
-
be generated.
|
755
|
-
* seed (``int``): Seed for the random generation of values in the detectors.
|
756
|
-
|
757
|
-
* detectors (``dict``): This variable stores a list of detectors by class.
|
758
|
-
* classes (``npt.NDArray``): list of output classes.
|
759
|
-
|
760
|
-
|
761
|
-
---
|
762
|
-
|
763
|
-
A classe ``BNSA`` (Algoritmo de Seleção Negativa Binária) tem a finalidade de classificação e \
|
764
|
-
identificação de anomalias através do método self e not self .
|
765
|
-
|
766
|
-
Attributes:
|
767
|
-
---
|
768
|
-
* N (``int``): Quantidade de detectores. Defaults to ``100``.
|
769
|
-
* aff_thresh (``float``): A variável representa a porcentagem de similaridade entre a célula \
|
770
|
-
T e as amostras próprias. O valor padrão é de 10% (0,1), enquanto que o valor de 1,0 \
|
771
|
-
representa 100% de similaridade.
|
772
|
-
* max_discards (``int``): Este parâmetro indica o número máximo de descartes de detectores \
|
773
|
-
em sequência, que tem como objetivo evitar um possível loop infinito caso seja definido \
|
774
|
-
um raio que não seja possível gerar detectores do não-próprio. Defaults to ``1000``.
|
775
|
-
* seed (``int``): Semente para a geração randômica dos valores nos detectores. Defaults to ``None``.
|
776
|
-
* no_label_sample_selection (``str``): Method for selecting labels for samples designated as \
|
777
|
-
non-members by all non-member detectors. Defaults to ``max_average_difference``.
|
778
|
-
|
779
|
-
|
780
|
-
* detectors (``dict``): Essa variável armazena uma lista com detectores por classes.
|
781
|
-
* classes (``npt.NDArray``): lista com as classes de saída.
|
782
|
-
|
783
|
-
"""
|
784
|
-
|
785
|
-
def __init__(
|
786
|
-
self,
|
787
|
-
N: int = 100,
|
788
|
-
aff_thresh: float = 0.1,
|
789
|
-
max_discards: int = 1000,
|
790
|
-
seed: Optional[int] = None,
|
791
|
-
no_label_sample_selection: Literal[
|
792
|
-
"max_average_difference", "max_nearest_difference"
|
793
|
-
] = "max_average_difference",
|
794
|
-
):
|
795
|
-
"""
|
796
|
-
Constructor of the Negative Selection class (``BNSA``).
|
797
|
-
|
798
|
-
Details:
|
799
|
-
---
|
800
|
-
This method initializes the ``detectors``, ``classes``, ``N``, ``aff_thresh``, ``max_discards``, ``seed`` and ``no_label_sample_selection`` attributes.
|
801
|
-
|
802
|
-
Parameters:
|
803
|
-
---
|
804
|
-
* N (``int``): Number of detectors. Defaults to ``100``.
|
805
|
-
* aff_thresh (``float``): Affinity threshold, the fraction of similarity between the \
|
806
|
-
T cell (detector) and the self samples. The default value is 10% (0.1), while a value of 1.0 \
|
807
|
-
represents 100% similarity.
|
808
|
-
* max_discards (``int``): Maximum number of consecutive detector discards, which \
|
809
|
-
prevents a possible infinite loop when the threshold is set so that non-self detectors \
|
810
|
-
cannot be generated. Defaults to ``1000``.
|
811
|
-
* seed (``int``): Seed for the random generation of values in the detectors. Defaults to ``None``.
|
812
|
-
* no_label_sample_selection (``str``): Method for choosing the label of a sample that the \
|
813
|
-
non-self detectors of every class flag as non-self. Available method types:
|
814
|
-
- (``max_average_difference``): Selects the class with the highest average distance \
|
815
|
-
between its detectors and the sample.
|
816
|
-
- (``max_nearest_difference``): Selects the class whose nearest detector is farthest \
|
817
|
-
from the sample.
|
818
|
-
---
|
819
|
-
|
820
|
-
Construtor da classe de Seleção negativa (``BNSA``).
|
821
|
-
|
822
|
-
Details:
|
823
|
-
---
|
824
|
-
Este método inicializa os atributos ``detectors``, ``classes``, ``N``, ``aff_thresh``, ``max_discards``, ``seed`` e ``no_label_sample_selection``.
|
825
|
-
|
826
|
-
Parameters:
|
827
|
-
---
|
828
|
-
* N (``int``): Quantidade de detectores. Defaults to ``100``.
|
829
|
-
* aff_thresh (``float``): A variável representa a porcentagem de similaridade entre a \
|
830
|
-
célula T e as amostras próprias. O valor padrão é de 10% (0,1), enquanto que o valor \
|
831
|
-
de 1,0 representa 100% de similaridade.
|
832
|
-
* max_discards (``int``): Este parâmetro indica o número máximo de descartes de detectores \
|
833
|
-
em sequência, que tem como objetivo evitar um possível loop infinito caso seja definido \
|
834
|
-
um raio que não seja possível gerar detectores do não-próprio. Defaults to ``1000``.
|
835
|
-
* seed (``int``): Semente para a geração randômica dos valores nos detectores. Defaults to ``None``.
|
836
|
-
* no_label_sample_selection (``str``): Método para a seleção de rótulos para amostras designadas \
|
837
|
-
como não pertencentes por todos os detectores não pertencentes. Tipos de métodos disponíveis:
|
838
|
-
- (``max_average_difference``): Seleciona a classe com a maior diferença média entre os \
|
839
|
-
detectores.
|
840
|
-
- (``max_nearest_difference``): Seleciona a classe com a maior diferença entre o detector \
|
841
|
-
mais próximo e mais distante da amostra.
|
842
|
-
|
843
|
-
"""
|
844
|
-
super().__init__()
|
845
|
-
if N > 0:
|
846
|
-
self.N: int = N
|
847
|
-
else:
|
848
|
-
self.N: int = 100
|
849
|
-
|
850
|
-
if 0 < aff_thresh < 1:
|
851
|
-
self.aff_thresh: float = aff_thresh
|
852
|
-
else:
|
853
|
-
self.aff_thresh: float = 0.1
|
854
|
-
if max_discards > 0:
|
855
|
-
self.max_discards: int = max_discards
|
856
|
-
else:
|
857
|
-
self.max_discards: int = 1000
|
858
|
-
|
859
|
-
if seed is not None and isinstance(seed, int):
|
860
|
-
np.random.seed(seed)
|
861
|
-
self.seed: int = seed
|
862
|
-
else:
|
863
|
-
self.seed = None
|
864
|
-
|
865
|
-
if no_label_sample_selection in ('nearest_difference', 'max_nearest_difference'):
|
866
|
-
self.no_label_sample_selection = 'nearest_difference'
|
867
|
-
else:
|
868
|
-
self.no_label_sample_selection = 'max_average_difference'
|
869
|
-
|
870
|
-
self.classes: npt.NDArray = None
|
871
|
-
self.detectors: Optional[dict] = None
|
872
|
-
|
873
|
-
def fit(self, X: npt.NDArray, y: npt.NDArray, verbose: bool = True):
|
874
|
-
"""
|
875
|
-
The function ``fit(...)`` performs the training according to ``X`` and ``y``, using the
|
876
|
-
negative selection method (``NegativeSelect``).
|
877
|
-
|
878
|
-
Parameters:
|
879
|
-
---
|
880
|
-
* X (``npt.NDArray``): Training array, containing the samples and their characteristics,
|
881
|
-
[``N samples`` (rows)][``N features`` (columns)].
|
882
|
-
* y (``npt.NDArray``): Array of target classes of ``X`` with [``N samples`` (rows)].
|
883
|
-
* verbose (``bool``): Whether to show detector-generation progress to the user.
|
884
|
-
Returns:
|
885
|
-
---
|
886
|
-
(``self``): Returns the instance itself.
|
887
|
-
|
888
|
-
----
|
889
|
-
|
890
|
-
A função ``fit(...)``, realiza o treinamento de acordo com ``X`` e ``y``, usando o método
|
891
|
-
de seleção negativa(``NegativeSelect``).
|
892
|
-
|
893
|
-
Parameters:
|
894
|
-
---
|
895
|
-
* X (``npt.NDArray``): Array de treinamento, contendo as amostras é suas características, \
|
896
|
-
[``N amostras`` (linhas)][``N características`` (colunas)].
|
897
|
-
* y (``npt.NDArray``): Array com as classes alvos de ``X`` com [``N amostras`` (linhas)].
|
898
|
-
* verbose (``bool``): Feedback da geração de detectores para o usuário.
|
899
|
-
Returns:
|
900
|
-
---
|
901
|
-
(``self``): Retorna a própria instância.
|
902
|
-
"""
|
903
|
-
super()._check_and_raise_exceptions_fit(X, y, "BNSA")
|
904
|
-
|
905
|
-
# Converts the entire array X to boolean
|
906
|
-
if X.dtype != bool:
|
907
|
-
X = X.astype(bool)
|
908
|
-
|
909
|
-
# Identifying the possible classes within the output array `y`.
|
910
|
-
self.classes = np.unique(y)
|
911
|
-
# Dictionary that will store detectors with classes as keys.
|
912
|
-
list_detectors_by_class = dict()
|
913
|
-
# Separates the classes for training.
|
914
|
-
sample_index: dict = self.__slice_index_list_by_class(y)
|
915
|
-
# Progress bar for generating all detectors.
|
916
|
-
if verbose:
|
917
|
-
progress = tqdm(total=int(self.N * (len(self.classes))),
|
918
|
-
bar_format='{desc} ┇{bar}┇ {n}/{total} detectors', postfix='\n')
|
919
|
-
|
920
|
-
for _class_ in self.classes:
|
921
|
-
# Initializes the empty set that will contain the valid detectors.
|
922
|
-
valid_detectors_set: list = []
|
923
|
-
discard_count: int = 0
|
924
|
-
# Updating the progress bar with the current class the algorithm is processing.
|
925
|
-
if verbose:
|
926
|
-
progress.set_description_str(
|
927
|
-
f"Generating the detectors for the {_class_} class:")
|
928
|
-
while len(valid_detectors_set) < self.N:
|
929
|
-
|
930
|
-
is_valid_detector: bool = True
|
931
|
-
# Generates a candidate detector vector randomly with values 0 and 1.
|
932
|
-
vector_x = np.random.choice([False, True], size=X.shape[1])
|
933
|
-
# Calculates the distance between the candidate and the class samples.
|
934
|
-
distances = cdist(np.expand_dims(vector_x, axis=0),
|
935
|
-
X[sample_index[_class_]], metric='hamming')
|
936
|
-
# Checks if any of the distances is below or equal to the threshold.
|
937
|
-
is_valid_detector = not np.any(distances <= self.aff_thresh)
|
938
|
-
|
939
|
-
# If the detector is valid, add it to the list of valid detectors.
|
940
|
-
if is_valid_detector:
|
941
|
-
discard_count = 0
|
942
|
-
valid_detectors_set.append(vector_x)
|
943
|
-
if verbose:
|
944
|
-
progress.update(1)
|
945
|
-
else:
|
946
|
-
discard_count += 1
|
947
|
-
if discard_count == self.max_discards:
|
948
|
-
raise Exception(
|
949
|
-
"An error has been identified:\n"
|
950
|
-
f"the maximum number of discards of detectors for the {_class_} "
|
951
|
-
"class has been reached.\nIt is recommended to check the defined "
|
952
|
-
"affinity threshold (aff_thresh) and consider reducing its value."
|
953
|
-
)
|
954
|
-
|
955
|
-
# Add detectors to the dictionary with classes as keys.
|
956
|
-
list_detectors_by_class[_class_] = valid_detectors_set
|
957
|
-
|
958
|
-
# Notify the completion of detector generation for the classes.
|
959
|
-
if verbose:
|
960
|
-
progress.set_description(
|
961
|
-
f'\033[92m✔ Non-self detectors for classes ({", ".join(map(str, self.classes))}) '
|
962
|
-
f'successfully generated\033[0m')
|
963
|
-
# Saves the found detectors in the attribute for the class detectors.
|
964
|
-
self.detectors = list_detectors_by_class
|
965
|
-
return self
|
966
|
-
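The detector-generation loop in `fit` above can be sketched for a single class as follows. This is a minimal illustration under stated assumptions, not the library's code: the function name `generate_detectors` and its parameters are hypothetical, it uses `numpy.random.default_rng` rather than the global `np.random.seed` call of the original, and it raises `RuntimeError` where the original raises a bare `Exception`.

```python
import numpy as np
from scipy.spatial.distance import cdist


def generate_detectors(self_samples, n_detectors, aff_thresh,
                       max_discards=1000, seed=None):
    """Generate binary non-self detectors for one class (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    self_samples = np.asarray(self_samples, dtype=bool)
    detectors = []
    discards = 0
    while len(detectors) < n_detectors:
        # Random binary candidate with one bit per feature.
        candidate = rng.choice([False, True], size=self_samples.shape[1])
        # Normalized Hamming distance from the candidate to every self sample.
        distances = cdist(candidate[np.newaxis, :], self_samples, metric="hamming")
        if np.any(distances <= aff_thresh):
            # Too similar to a self sample: discard and count the failure.
            discards += 1
            if discards == max_discards:
                raise RuntimeError("maximum number of consecutive discards reached")
        else:
            # Valid detector: keep it and reset the consecutive-discard counter.
            discards = 0
            detectors.append(candidate)
    return np.array(detectors)
```

With all-zero self samples of 8 features and `aff_thresh=0.25`, every accepted detector has at least 3 bits set, because a Hamming distance of 2/8 is still within the threshold.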
|
967
|
-
def predict(self, X: npt.NDArray) -> Optional[npt.NDArray]:
|
968
|
-
"""
|
969
|
-
Function to perform the prediction of classes based on detectors
|
970
|
-
created after training.
|
971
|
-
|
972
|
-
Parameters:
|
973
|
-
---
|
974
|
-
* X (``npt.NDArray``): Array with input samples with [``N samples`` (rows)] and
|
975
|
-
[``N features`` (columns)].
|
976
|
-
|
977
|
-
Returns:
|
978
|
-
---
|
979
|
-
* C – (``npt.NDArray``): an ndarray of the form ``C`` [``N samples``],
|
980
|
-
containing the predicted classes for ``X``.
|
981
|
-
* ``None``: If there are no detectors for the prediction.
|
982
|
-
|
983
|
-
---
|
984
|
-
|
985
|
-
Função para efetuar a previsão das classes com base nos detectores
|
986
|
-
criados após o treinamento.
|
987
|
-
|
988
|
-
Parameters:
|
989
|
-
---
|
990
|
-
* X (``npt.NDArray``): Array com as amostras de entradas com [``N amostras`` (Linhas)] e
|
991
|
-
[``N características``(Colunas)]
|
992
|
-
|
993
|
-
Returns:
|
994
|
-
---
|
995
|
-
* C – (``npt.NDArray``): um ndarray de forma ``C`` [``N amostras``],
|
996
|
-
contendo as classes previstas para ``X``.
|
997
|
-
* ``None``: Se não existir detectores para a previsão.
|
998
|
-
"""
|
999
|
-
# If there are no detectors, returns None.
|
1000
|
-
if self.detectors is None:
|
1001
|
-
return None
|
1002
|
-
elif not isinstance(X, (np.ndarray, list)):
|
1003
|
-
raise TypeError("X is not an ndarray or list")
|
1004
|
-
elif len(self.detectors[self.classes[0]][0]) != len(X[0]):
|
1005
|
-
raise Exception(
|
1006
|
-
"X does not have {} features to make the prediction".format(
|
1007
|
-
len(self.detectors[self.classes[0]][0])
|
1008
|
-
)
|
1009
|
-
)
|
1010
|
-
# Checks if matrix X contains only binary samples. Otherwise, raises an exception.
|
1011
|
-
if not np.isin(X, [0, 1]).all():
|
1012
|
-
raise ValueError(
|
1013
|
-
"The array X contains values that are not composed only of 0 and 1."
|
1014
|
-
)
|
1015
|
-
|
1016
|
-
# Converts the entire array X to boolean.
|
1017
|
-
if not isinstance(X, np.ndarray) or X.dtype != bool:
|
1018
|
-
X = np.asarray(X).astype(bool)
|
1019
|
-
|
1020
|
-
# Initializes an empty array that will store the predictions.
|
1021
|
-
C = np.empty(shape=0)
|
1022
|
-
# For each sample row in X.
|
1023
|
-
for line in X:
|
1024
|
-
class_found: bool = True
|
1025
|
-
# List of candidate classes: those for which the sample is treated as self
|
1026
|
-
# because none of the class's non-self detectors match it.
|
1027
|
-
possible_classes: list = []
|
1028
|
-
for _class_ in self.classes:
|
1029
|
-
similarity_sum: float = 0
|
1030
|
-
# Calculates the Hamming distance between the row and all detectors.
|
1031
|
-
distances = cdist(np.expand_dims(line, axis=0),
|
1032
|
-
self.detectors[_class_], metric='hamming')
|
1033
|
-
|
1034
|
-
# Check if any distance is below or equal to the threshold.
|
1035
|
-
if np.any(distances <= self.aff_thresh):
|
1036
|
-
class_found = False
|
1037
|
-
else:
|
1038
|
-
similarity_sum = np.sum(distances)
|
1039
|
-
|
1040
|
-
# If the sample passes through all detectors of a class, adds the class as a possible prediction
|
1041
|
-
# and its average similarity.
|
1042
|
-
if class_found:
|
1043
|
-
possible_classes.append([_class_, similarity_sum / self.N])
|
1044
|
-
|
1045
|
-
# If belonging to one or more classes, adds the class with the greatest average distance.
|
1046
|
-
if len(possible_classes) > 0:
|
1047
|
-
C = np.append(
|
1048
|
-
C, [max(possible_classes, key=lambda x: x[1])[0]])
|
1049
|
-
class_found = True
|
1050
|
-
else:
|
1051
|
-
class_found = False
|
1052
|
-
|
1053
|
-
# If there is only one class and the sample is not classified, sets the output as non-self.
|
1054
|
-
if not class_found and len(self.classes) == 1:
|
1055
|
-
C = np.append(C, ["non-self"])
|
1056
|
-
# If the class cannot be identified by the detectors
|
1057
|
-
elif not class_found:
|
1058
|
-
class_differences: dict = {}
|
1059
|
-
for _class_ in self.classes:
|
1060
|
-
# Assign the label to the class with the greatest distance from the nearest detector.
|
1061
|
-
if self.no_label_sample_selection == 'nearest_difference':
|
1062
|
-
difference_min: float = cdist(np.expand_dims(line, axis=0),
|
1063
|
-
self.detectors[_class_], metric='hamming'
|
1064
|
-
).min()
|
1065
|
-
class_differences[_class_] = difference_min
|
1066
|
-
# Or based on the greatest distance from the average distances of the detectors.
|
1067
|
-
else:
|
1068
|
-
difference_sum: float = cdist(np.expand_dims(line, axis=0),
|
1069
|
-
self.detectors[_class_], metric='hamming'
|
1070
|
-
).sum()
|
1071
|
-
class_differences[_class_] = difference_sum / self.N
|
1072
|
-
|
1073
|
-
C = np.append(C, [max(class_differences, key=class_differences.get)])
|
1074
|
-
|
1075
|
-
return C
|
1076
|
-
|
1077
|
-
def __slice_index_list_by_class(self, y: npt.NDArray) -> dict:
|
1078
|
-
"""
|
1079
|
-
The function ``__slice_index_list_by_class(...)`` separates the row indices according \
|
1080
|
-
to the output class, so that the sample array is traversed only at positions where the \
|
1081
|
-
output is the class being trained.
|
1082
|
-
|
1083
|
-
Parameters:
|
1084
|
-
---
|
1085
|
-
* y (npt.NDArray): Receives a ``y``[``N sample``] array with the output classes of the \
|
1086
|
-
``X`` sample array.
|
1087
|
-
|
1088
|
-
Returns:
|
1089
|
-
---
|
1090
|
-
* dict: A dictionary with the list of array positions(``y``), with the classes as key.
|
1091
|
-
|
1092
|
-
---
|
1093
|
-
|
1094
|
-
A função ``__slice_index_list_by_class(...)``, separa os índices das linhas conforme a classe \
|
1095
|
-
de saída, para percorrer o array de amostra, apenas nas posições que a saída for a classe que \
|
1096
|
-
está sendo treinada.
|
1097
|
-
|
1098
|
-
Parameters:
|
1099
|
-
---
|
1100
|
-
* y (npt.NDArray): Recebe um array ``y``[``N amostra``] com as classes de saída do array \
|
1101
|
-
de amostra ``X``.
|
1102
|
-
|
1103
|
-
Returns:
|
1104
|
-
---
|
1105
|
-
* dict: Um dicionário com a lista de posições do array(``y``), com as classes como chave.
|
1106
|
-
"""
|
1107
|
-
return slice_index_list_by_class(self.classes, y)
|
1108
|
-
|
1109
|
-
def get_params(self, deep: bool = True) -> dict:
|
1110
|
-
return {
|
1111
|
-
"N": self.N,
|
1112
|
-
"aff_thresh": self.aff_thresh,
|
1113
|
-
"max_discards": self.max_discards,
|
1114
|
-
"seed": self.seed,
|
1115
|
-
}
|
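The fallback used in `predict` when the detectors of every class match a sample can be sketched as below. This is a minimal illustration under stated assumptions: the function name `fallback_label` and its arguments are hypothetical, the method strings follow the names documented in the constructor, and the average is taken with `mean()` where the deleted code divides the summed distances by `N`.

```python
import numpy as np
from scipy.spatial.distance import cdist


def fallback_label(sample, detectors_by_class, method="max_average_difference"):
    """Pick a label for a sample that every class's detectors reject.

    Hypothetical helper: ``detectors_by_class`` maps each class label to a
    sequence of binary detectors with the same length as ``sample``.
    """
    row = np.asarray(sample, dtype=bool)[np.newaxis, :]
    scores = {}
    for label, detectors in detectors_by_class.items():
        distances = cdist(row, np.asarray(detectors, dtype=bool), metric="hamming")
        if method == "max_nearest_difference":
            # Score by the distance to the nearest detector of the class.
            scores[label] = distances.min()
        else:
            # Score by the average distance to the detectors of the class.
            scores[label] = distances.mean()
    # Detectors represent non-self, so the class whose detectors are
    # farthest from the sample is the most plausible label.
    return max(scores, key=scores.get)
```

The two methods can disagree: a class with one very close and one very distant detector may win on the average while losing on the nearest-detector score.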