braindecode 1.2.0.dev176358851__py3-none-any.whl → 1.2.0.dev180217551__py3-none-any.whl


@@ -13,13 +13,154 @@ from braindecode.modules import CausalConv1d, Ensure4d, MaxNormLinear
13
13
 
14
14
 
15
15
  class ATCNet(EEGModuleMixin, nn.Module):
16
- """ATCNet model from Altaheri et al. (2022) [1]_
16
+ """ATCNet from Altaheri et al. (2022) [1]_.
17
17
 
18
- Pytorch implementation based on official tensorflow code [2]_.
18
+ :bdg-success:`Convolution` :bdg-info:`Small Attention`
19
19
 
20
20
  .. figure:: https://user-images.githubusercontent.com/25565236/185449791-e8539453-d4fa-41e1-865a-2cf7e91f60ef.png
21
- :align: center
22
- :alt: ATCNet Architecture
21
+ :align: center
22
+ :alt: ATCNet Architecture
23
+ :width: 650px
24
+
25
+ .. rubric:: Architectural Overview
26
+
27
+ ATCNet is a *convolution-first* architecture augmented with a *lightweight attention–TCN*
28
+ sequence module. The end-to-end flow is:
29
+
30
+ - (i) :class:`_ConvBlock` learns temporal filter-banks and spatial projections (EEGNet-style),
31
+ downsampling time to a compact feature map;
32
+
33
+ - (ii) Sliding Windows carve overlapping temporal windows from this map;
34
+
35
+ - (iii) for each window, :class:`_AttentionBlock` applies small multi-head self-attention
36
+ over time, followed by a :class:`_TCNResidualBlock` stack (causal, dilated);
37
+
38
+ - (iv) window-level features are aggregated (mean of window logits or concatenation)
39
+ and mapped via a max-norm–constrained linear layer.
40
+
41
+ Relative to ViT, ATCNet replaces linear patch projection with learned *temporal–spatial*
42
+ convolutions; it processes *parallel* window encoders (attention→TCN) instead of a deep
43
+ stack; and swaps the MLP head for a TCN suited to 1-D EEG sequences.
44
+
45
+ .. rubric:: Macro Components
46
+
47
+ - :class:`_ConvBlock` **(Shallow conv stem → feature map)**
48
+
49
+ - *Operations.*
50
+ - **Temporal conv** (:class:`torch.nn.Conv2d`) with kernel ``(L_t, 1)`` builds a
51
+ FIR-like filter bank (``F1`` maps).
52
+ - **Depthwise spatial conv** (:class:`torch.nn.Conv2d`, ``groups=F1``) with kernel
53
+ ``(1, n_chans)`` learns per-filter spatial projections (akin to EEGNet’s CSP-like step).
54
+ - **BN → ELU → AvgPool → Dropout** to stabilize and condense activations.
55
+ - **Refining temporal conv** (:class:`torch.nn.Conv2d`) with kernel ``(L_r, 1)`` +
56
+ **BN → ELU → AvgPool → Dropout**.
57
+
58
+ The output shape is ``(B, F2, T_c, 1)`` with ``F2 = F1·D`` and ``T_c = T/(P1·P2)``.
59
+ Temporal kernels behave as FIR filters; the depthwise-spatial conv yields frequency-specific
60
+ topographies. Pooling acts as a local integrator, reducing variance and imposing a
61
+ useful inductive bias on short EEG windows.
62
+
63
+ - **Sliding-Window Sequencer**
64
+
65
+ From the condensed time axis (length ``T_c``), ATCNet forms ``n`` overlapping windows
66
+ of width ``T_w = T_c - n + 1`` (one start per index). Each window produces a sequence
67
+ ``(B, F2, T_w)`` forwarded to its own attention–TCN branch. This creates *parallel*
68
+ encoders over shifted contexts and is key to robustness on nonstationary EEG.
69
+
70
+ - :class:`_AttentionBlock` **(small MHA on temporal positions)**
71
+
72
+ - *Operations.*
73
+ - Rearrange to ``(B, T_w, F2)``,
74
+ - Normalization :class:`torch.nn.LayerNorm`
75
+ - Custom MultiHeadAttention :class:`_MHA` (``num_heads=H``, per-head dim ``d_h``) + residual add,
76
+ - Dropout :class:`torch.nn.Dropout`
77
+ - Rearrange back to ``(B, F2, T_w)``.
78
+
79
+
80
+ **Note**: Attention is *local to a window* and purely temporal.
81
+
82
+ *Role.* Re-weights evidence across the window, letting the model emphasize informative
83
+ segments (onsets, bursts) before causal convolutions aggregate history.
84
+
85
+ - :class:`_TCNResidualBlock` **(causal dilated temporal CNN)**
86
+
87
+ - *Operations.*
88
+ - Two :class:`braindecode.modules.CausalConv1d` layers per block, with dilation ``1, 2, 4, …`` across blocks.
89
+ - Each block also applies :class:`torch.nn.ELU`, :class:`torch.nn.BatchNorm1d` and :class:`torch.nn.Dropout`,
90
+ plus a residual connection (identity or 1x1 mapping).
91
+ - The final feature used per window is the *last* causal step ``[..., -1]`` (forecast-style).
92
+
93
+ *Role.* Efficient long-range temporal integration with stable gradients; the dilated
94
+ receptive field complements attention’s soft selection.
95
+
96
+ - **Aggregation & Classifier**
97
+
98
+ - *Operations.*
99
+ - Either (a) map each window feature ``(B, F2)`` to logits via :class:`braindecode.modules.MaxNormLinear`
100
+ and **average** across windows (default, matching official code), or
101
+ - (b) **concatenate** all window features ``(B, n·F2)`` and apply a single :class:`MaxNormLinear`.
102
+ The max-norm constraint regularizes the readout.
103
+
104
+ .. rubric:: Convolutional Details
105
+
106
+ - **Temporal.** Temporal structure is learned in three places:
107
+ - (1) the stem’s wide ``(L_t, 1)`` conv (learned filter bank),
108
+ - (2) the refining ``(L_r, 1)`` conv after pooling (short-term dynamics), and
109
+ - (3) the TCN’s causal 1-D convolutions with exponentially increasing dilation
110
+ (long-range dependencies). The minimum sequence length required by the TCN stack is
111
+ ``(K_t - 1)·2^{L-1} + 1``; the implementation *auto-scales* kernels/pools/windows
112
+ when inputs are shorter to preserve feasibility.
113
+
114
+ - **Spatial.** A depthwise spatial conv spans the **full montage** (kernel ``(1, n_chans)``),
115
+ producing *per-temporal-filter* spatial projections (no cross-filter mixing at this step).
116
+ This mirrors EEGNet’s interpretability: each temporal filter has its own spatial pattern.
117
+
118
+
119
+ .. rubric:: Attention / Sequential Modules
120
+
121
+ - **Type.** Multi-head self-attention with ``H`` heads and per-head dim ``d_h`` implemented
122
+ in :class:`_MHA`, allowing ``embed_dim = H·d_h`` independent of input and output dims.
123
+ - **Shapes.** ``(B, F2, T_w) → (B, T_w, F2) → (B, F2, T_w)``. Attention operates along
124
+ the **temporal** axis within a window; channels/features stay in the embedding dim ``F2``.
125
+ - **Role.** Highlights salient temporal positions prior to causal convolution; small attention
126
+ keeps compute modest while improving context modeling over pooled features.
127
+
128
+ .. rubric:: Additional Mechanisms
129
+
130
+ - **Parallel encoders over shifted windows.** Improves montage/phase robustness by
131
+ ensembling nearby contexts rather than committing to a single segmentation.
132
+ - **Max-norm classifier.** Enforces weight norm constraints at the readout, a common
133
+ stabilization trick in EEG decoding.
134
+ - **ViT vs. ATCNet (design choices).** Convolutional *nonlinear* projection rather than
135
+ linear patchification; attention followed by **TCN** (not MLP); *parallel* window
136
+ encoders rather than stacked encoders.
137
+
138
+ .. rubric:: Usage and Configuration
139
+
140
+ - ``conv_block_n_filters (F1)``, ``conv_block_depth_mult (D)`` → capacity of the stem
141
+ (with ``F2 = F1·D`` setting the feature width of the attention/TCN branches, as in `EEGNetv4`).
142
+ - Pool sizes ``P1,P2`` trade temporal resolution for stability/compute; they set
143
+ ``T_c = T/(P1·P2)`` and thus window width ``T_w``.
144
+ - ``n_windows`` controls the ensemble over shifts (compute ∝ windows).
145
+ - ``att_num_heads``, ``att_head_dim`` set attention capacity; keep ``H·d_h ≈ F2``.
146
+ - ``tcn_depth``, ``tcn_kernel_size`` govern receptive field; larger values demand
147
+ longer inputs (see minimum length above). The implementation warns and *rescales*
148
+ kernels/pools/windows if inputs are too short.
149
+ - **Aggregation choice.** ``concat=False`` (default, average of per-window logits) matches
150
+ the official code; ``concat=True`` mirrors the paper’s concatenation variant.
151
+
152
+
153
+ Notes
154
+ -----
155
+ - Inputs substantially shorter than the implied minimum length trigger **automatic
156
+ downscaling** of kernels, pools, windows, and TCN kernel size to maintain validity.
157
+ - The attention–TCN sequence operates **per window**; the last causal step is used as the
158
+ window feature, aligning the temporal semantics across windows.
159
+
160
+ .. versionadded:: 1.1
161
+
162
+ - More detailed documentation of the model.
163
+
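The window arithmetic described in the ATCNet docstring above can be checked with a few lines of plain Python. This is a minimal sketch under stated assumptions, not braindecode's implementation: the helper names `atcnet_window_bounds` and `tcn_min_length` are hypothetical, integer division is assumed for ``T_c = T/(P1·P2)``, and the input values are illustrative only.

```python
def atcnet_window_bounds(n_times, pool1, pool2, n_windows):
    """Return (T_c, T_w, [(start, stop), ...]) for the sliding-window step.

    Hypothetical helper. Assumes T_c = T // (P1 * P2) and
    T_w = T_c - n + 1, following the formulas quoted in the docstring.
    """
    t_c = n_times // (pool1 * pool2)
    t_w = t_c - n_windows + 1
    if t_w < 1:
        raise ValueError("too many windows for the condensed length")
    # One window start per index: window i covers [i, i + T_w).
    bounds = [(start, start + t_w) for start in range(n_windows)]
    return t_c, t_w, bounds


def tcn_min_length(kernel_size, depth):
    """Minimum length quoted in the docstring: (K_t - 1) * 2**(L - 1) + 1."""
    return (kernel_size - 1) * 2 ** (depth - 1) + 1


# Illustrative values only (not read from the package).
t_c, t_w, bounds = atcnet_window_bounds(n_times=1125, pool1=8, pool2=7, n_windows=5)
min_len = tcn_min_length(kernel_size=4, depth=2)
```

Comparing `t_w` with `min_len` shows whether a configuration leaves each window long enough for the TCN stack; the docstring states that the real implementation rescales kernels, pools and windows when it does not.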
23
164
 
24
165
  Parameters
25
166
  ----------
@@ -85,15 +226,13 @@ class ATCNet(EEGModuleMixin, nn.Module):
85
226
  Maximum L2-norm constraint imposed on weights of the last
86
227
  fully-connected layer. Defaults to 0.25.
87
228
 
88
-
89
229
  References
90
230
  ----------
91
- .. [1] H. Altaheri, G. Muhammad and M. Alsulaiman,
92
- Physics-informed attention temporal convolutional network for EEG-based
93
- motor imagery classification in IEEE Transactions on Industrial Informatics,
94
- 2022, doi: 10.1109/TII.2022.3197419.
95
- .. [2] EEE-ATCNet implementation.
96
- https://github.com/Altaheri/EEG-ATCNet/blob/main/models.py
231
+ .. [1] H. Altaheri, G. Muhammad, M. Alsulaiman (2022).
232
+ *Physics-informed attention temporal convolutional network for EEG-based motor imagery classification.*
233
+ IEEE Transactions on Industrial Informatics. doi:10.1109/TII.2022.3197419.
234
+ .. [2] Official EEG-ATCNet implementation (TensorFlow):
235
+ https://github.com/Altaheri/EEG-ATCNet/blob/main/models.py
97
236
  """
98
237
 
99
238
  def __init__(
@@ -556,7 +695,8 @@ class _TCNResidualBlock(nn.Module):
556
695
  # Reshape the input for the residual connection when necessary
557
696
  if in_channels != n_filters:
558
697
  self.reshaping_conv = nn.Conv1d(
559
- n_filters,
698
+ in_channels=in_channels,
699
+ out_channels=n_filters,
560
700
  kernel_size=1,
561
701
  padding="same",
562
702
  )
@@ -24,26 +24,150 @@ from braindecode.modules.attention import (
24
24
 
25
25
 
26
26
  class AttentionBaseNet(EEGModuleMixin, nn.Module):
27
- """AttentionBaseNet from Wimpff M et al. (2023) [Martin2023]_.
27
+ """
28
+
29
+ :bdg-success:`Convolution` :bdg-info:`Small Attention`
28
30
 
29
31
  .. figure:: https://content.cld.iop.org/journals/1741-2552/21/3/036020/revision2/jnead48b9f2_hr.jpg
30
- :align: center
31
- :alt: Attention Base Net
32
+ :align: center
33
+ :alt: AttentionBaseNet Architecture
34
+ :width: 640px
35
+
36
+
37
+ .. rubric:: Architectural Overview
38
+
39
+ AttentionBaseNet is a *convolution-first* network with a *channel-attention* stage.
40
+ The end-to-end flow is:
41
+
42
+ - (i) :class:`_FeatureExtractor` learns a temporal filter bank and per-filter spatial
43
+ projections (depthwise across electrodes), then condenses time by pooling;
44
+ - (ii) **Channel Expansion** uses a ``1x1`` convolution to set the feature width;
45
+ - (iii) :class:`_ChannelAttentionBlock` refines features via depthwise–pointwise temporal
46
+ convs and an optional channel-attention module (SE/CBAM/ECA/…);
47
+ - (iv) **Classifier** flattens the sequence and applies a linear readout.
48
+
49
+ This design mirrors shallow CNN pipelines (EEGNet-style stem) but inserts a pluggable
50
+ attention unit that *re-weights channels* (and optionally temporal positions) before
51
+ classification.
52
+
53
+
54
+ .. rubric:: Macro Components
55
+
56
+ - :class:`_FeatureExtractor` **(Shallow conv stem → condensed feature map)**
57
+
58
+ - *Operations.*
59
+ - **Temporal conv** (:class:`torch.nn.Conv2d`) with kernel ``(1, L_t)`` creates a learned
60
+ FIR-like filter bank with ``n_temporal_filters`` maps.
61
+ - **Depthwise spatial conv** (:class:`torch.nn.Conv2d`, ``groups=n_temporal_filters``)
62
+ with kernel ``(n_chans, 1)`` learns per-filter spatial projections over the full montage.
63
+ - **BatchNorm → ELU → AvgPool → Dropout** stabilize and downsample time.
64
+ - Output shape: ``(B, F2, 1, T₁)`` with ``F2 = n_temporal_filters x spatial_expansion``.
65
+
66
+ *Interpretability/robustness.* Temporal kernels behave as analyzable FIR filters; the
67
+ depthwise spatial step yields rhythm-specific topographies. Pooling acts as a local
68
+ integrator that reduces variance on short EEG windows.
69
+
70
+ - **Channel Expansion**
71
+
72
+ - *Operations.*
73
+ - A ``1x1`` conv → BN → activation maps ``F2 → ch_dim`` without changing
74
+ the temporal length ``T₁`` (shape: ``(B, ch_dim, 1, T₁)``).
75
+ This sets the embedding width for the attention block.
76
+
77
+ - :class:`_ChannelAttentionBlock` **(temporal refinement + channel attention)**
78
+
79
+ - *Operations.*
80
+ - **Depthwise temporal conv** ``(1, L_a)`` (groups=``ch_dim``) + **pointwise ``1x1``**,
81
+ BN and activation → preserves shape ``(B, ch_dim, 1, T₁)`` while refining timing.
82
+ - **Optional attention module** (see *Additional Mechanisms*) applies channel reweighting
83
+ (some variants also apply temporal gating).
84
+ - **AvgPool (1, P₂)** with stride ``(1, S₂)`` and **Dropout** → outputs
85
+ ``(B, ch_dim, 1, T₂)``.
86
+
87
+ *Role.* Emphasizes informative channels (and, in certain modes, salient time steps)
88
+ before the classifier; complements the convolutional priors with adaptive re-weighting.
89
+
90
+ - **Classifier (aggregation + readout)**
91
+
92
+ *Operations.* :class:`torch.nn.Flatten` → :class:`torch.nn.Linear` from
93
+ ``(B, ch_dim·T₂)`` to classes.
94
+
95
+
96
+ .. rubric:: Convolutional Details
97
+
98
+ - **Temporal (where time-domain patterns are learned).**
99
+ Wide kernels in the stem (``(1, L_t)``) act as a learned filter bank for oscillatory
100
+ bands/transients; the attention block’s depthwise temporal conv (``(1, L_a)``) sharpens
101
+ short-term dynamics after downsampling. Pool sizes/strides (``P₁,S₁`` then ``P₂,S₂``)
102
+ set the token rate and effective temporal resolution.
103
+
104
+ - **Spatial (how electrodes are processed).**
105
+ A depthwise spatial conv with kernel ``(n_chans, 1)`` spans the full montage to
106
+ learn *per-temporal-filter* spatial projections (no cross-filter mixing at this step),
107
+ mirroring the interpretable spatial stage in shallow CNNs.
108
+
109
+ - **Spectral (how frequency content is captured).**
110
+ No explicit Fourier/wavelet transform is used in the stem—spectral selectivity
111
+ emerges from learned temporal kernels. When ``attention_mode="fca"``, a frequency
112
+ channel attention (DCT-based) summarizes frequencies to drive channel weights.
32
113
 
33
- Neural Network from the paper: EEG motor imagery decoding:
34
- A framework for comparative analysis with channel attention
35
- mechanisms
36
114
 
37
- The paper and original code with more details about the methodological
38
- choices are available at the [Martin2023]_ and [MartinCode]_.
115
+ .. rubric:: Attention / Sequential Modules
39
116
 
40
- The AttentionBaseNet architecture is composed of four modules:
41
- - Input Block that performs a temporal convolution and a spatial
42
- convolution.
43
- - Channel Expansion that modifies the number of channels.
44
- - An attention block that performs channel attention with several
45
- options
46
- - ClassificationHead
117
+ - **Type.** Channel attention chosen by ``attention_mode`` (SE, ECA, CBAM, CAT, GSoP,
118
+ EncNet, GE, GCT, SRM, CATLite). Most operate purely on channels; CBAM/CAT additionally
119
+ include temporal attention.
120
+
121
+ - **Shapes.** Input/Output around attention: ``(B, ch_dim, 1, T₁)``. Re-arrangements
122
+ (if any) are internal to the module; the block returns the same shape before pooling.
123
+
124
+ - **Role.** Re-weights channels (and optionally time) to highlight informative sources
125
+ and suppress distractors, improving SNR ahead of the linear head.
126
+
127
+
128
+ .. rubric:: Additional Mechanisms
129
+
130
+ - **Attention variants at a glance.**
131
+ - ``"se"``: Squeeze-and-Excitation (global pooling → bottleneck → gates).
132
+ - ``"gsop"``: Global second-order pooling (covariance-aware channel weights).
133
+ - ``"fca"``: Frequency Channel Attention (DCT summary; uses ``seq_len`` and ``freq_idx``).
134
+ - ``"encnet"``: EncNet with learned codewords (uses ``n_codewords``).
135
+ - ``"eca"``: Efficient Channel Attention (local 1-D conv over channel descriptor; uses ``kernel_size``).
136
+ - ``"ge"``: Gather–Excite (context pooling with optional MLP; can use ``extra_params``).
137
+ - ``"gct"``: Gated Channel Transformation (global context normalization + gating).
138
+ - ``"srm"``: Style-based recalibration (mean–std descriptors; optional MLP).
139
+ - ``"cbam"``: Channel then temporal attention (uses ``kernel_size``).
140
+ - ``"cat"`` / ``"catlite"``: Collaborative (channel ± temporal) attention; *lite* omits temporal.
141
+ - **Auto-compatibility on short inputs.**
142
+
143
+ If the input duration is too short for the configured kernels/pools, the implementation
144
+ **automatically rescales** temporal lengths/strides downward (with a warning) to keep
145
+ shapes valid and preserve the pipeline semantics.
146
+
147
+
148
+ .. rubric:: Usage and Configuration
149
+
150
+ - ``n_temporal_filters``, ``temporal_filter_length`` and ``spatial_expansion``:
151
+ control the capacity and the number of spatial projections in the stem.
152
+ - ``pool_length_inp``, ``pool_stride_inp`` then ``pool_length``, ``pool_stride``:
153
+ trade temporal resolution for compute; they determine the final sequence length ``T₂``.
154
+ - ``ch_dim``: width after the ``1x1`` expansion and the effective embedding size for attention.
155
+ - ``attention_mode`` + its specific hyperparameters (``reduction_rate``,
156
+ ``kernel_size``, ``seq_len``, ``freq_idx``, ``n_codewords``, ``use_mlp``):
157
+ select and tune the reweighting mechanism.
158
+ - ``drop_prob_inp`` and ``drop_prob_attn``: regularize stem and attention stages.
159
+ - **Training tips.**
160
+
161
+ Start with moderate pooling (e.g., ``P₁=75,S₁=15``) and ELU activations; enable attention
162
+ only after the stem learns stable filters. For small datasets, prefer simpler modes
163
+ (``"se"``, ``"eca"``) before heavier ones (``"gsop"``, ``"encnet"``).
164
+
165
+ Notes
166
+ -----
167
+ - Sequence length after each stage is computed internally; the final classifier expects
168
+ a flattened ``ch_dim x T₂`` vector.
169
+ - Attention operates on *channel* dimension by design; temporal gating exists only in
170
+ specific variants (CBAM/CAT).
47
171
 
48
172
  .. versionadded:: 0.9
49
173
 
@@ -73,18 +197,18 @@ class AttentionBaseNet(EEGModuleMixin, nn.Module):
73
197
  the depth of the network after the initial layer. Default is 16.
74
198
  attention_mode : str, optional
75
199
  The type of attention mechanism to apply. If `None`, no attention is applied.
76
- - "se" for Squeeze-and-excitation network
77
- - "gsop" for Global Second-Order Pooling
78
- - "fca" for Frequency Channel Attention Network
79
- - "encnet" for context encoding module
80
- - "eca" for Efficient channel attention for deep convolutional neural networks
81
- - "ge" for Gather-Excite
82
- - "gct" for Gated Channel Transformation
83
- - "srm" for Style-based Recalibration Module
84
- - "cbam" for Convolutional Block Attention Module
85
- - "cat" for Learning to collaborate channel and temporal attention
86
- from multi-information fusion
87
- - "catlite" for Learning to collaborate channel attention
200
+ - "se" for Squeeze-and-excitation network
201
+ - "gsop" for Global Second-Order Pooling
202
+ - "fca" for Frequency Channel Attention Network
203
+ - "encnet" for context encoding module
204
+ - "eca" for Efficient channel attention for deep convolutional neural networks
205
+ - "ge" for Gather-Excite
206
+ - "gct" for Gated Channel Transformation
207
+ - "srm" for Style-based Recalibration Module
208
+ - "cbam" for Convolutional Block Attention Module
209
+ - "cat" for Learning to collaborate channel and temporal attention
210
+ from multi-information fusion
211
+ - "catlite" for Learning to collaborate channel attention
88
212
  from multi-information fusion (lite version, cat w/o temporal attention)
89
213
  pool_length : int, default=8
90
214
  The length of the window for the average pooling operation.
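The AttentionBaseNet docstring says the two pooling stages determine the final sequence length ``T₂`` and that the classifier sees a flattened ``ch_dim x T₂`` vector. The sketch below works through that arithmetic in plain Python. It is an assumption-laden illustration, not the package's code: `pooled_length` and `flattened_size` are hypothetical helpers, the pooling is assumed to be unpadded, the convolutions are assumed to leave the temporal length unchanged, and the input length, second stride and ``ch_dim`` are example values.

```python
def pooled_length(length, pool, stride):
    """Output length of an unpadded 1-D average pool (floor mode)."""
    if length < pool:
        raise ValueError("input shorter than the pooling window")
    return (length - pool) // stride + 1


def flattened_size(n_times, pool1, stride1, pool2, stride2, ch_dim):
    """Return (T1, T2, ch_dim * T2) for two successive pooling stages.

    Assumption: the convolutions keep the temporal length, so only the
    two pooling stages shorten the sequence.
    """
    t1 = pooled_length(n_times, pool1, stride1)
    t2 = pooled_length(t1, pool2, stride2)
    return t1, t2, ch_dim * t2


# Example values: P1=75, S1=15 as in the training tip; the rest are assumed.
t1, t2, n_features = flattened_size(1000, 75, 15, 8, 8, ch_dim=16)
```

Running the same helper with a short input raises the `ValueError`, which is the situation the docstring describes as triggering automatic rescaling in the real model.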
@@ -27,22 +27,25 @@ class EEGConformer(EEGModuleMixin, nn.Module):
27
27
  EEG-Conformer is a *convolution-first* model augmented with a *lightweight transformer
28
28
  encoder*. The end-to-end flow is:
29
29
 
30
- - (i) :class:`_PatchEmbedding` converts the continuous EEG into a compact sequence of tokens via a :class:`ShallowFBCSPNet` temporal–spatial conv stem and temporal pooling;
31
- - (ii) :class:`_TransformerEncoder applies small multi-head self-attention to integrate longer-range temporal context across tokens;
30
+ - (i) :class:`_PatchEmbedding` converts the continuous EEG into a compact sequence of tokens via a
31
+ :class:`ShallowFBCSPNet` temporal–spatial conv stem and temporal pooling;
32
+ - (ii) :class:`_TransformerEncoder` applies small multi-head self-attention to integrate
33
+ longer-range temporal context across tokens;
32
34
  - (iii) :class:`_ClassificationHead` aggregates the sequence and performs a linear readout.
33
- This preserves the strong inductive biases of shallow CNN filter banks while adding
34
- just enough attention to capture dependencies beyond the pooling horizon [song2022]_.
35
+ This preserves the strong inductive biases of shallow CNN filter banks while adding
36
+ just enough attention to capture dependencies beyond the pooling horizon [song2022]_.
35
37
 
36
38
  .. rubric:: Macro Components
37
39
 
38
40
  - :class:`_PatchEmbedding` **(Shallow conv stem → tokens)**
39
41
 
40
- - *Operations.*
41
- - A temporal convolution (`:class:`torch.nn.Conv2d`) ``(1 x L_t)`` forms a data-driven "filter bank";
42
- - A spatial convolution (`:class:`torch.nn.Conv2d`) (n_chans x 1)`` projects across electrodes, collapsing the channel axis into a virtual channel.
43
- - **Normalization function** `:class:torch.nn.BatchNorm`
44
- - **Activation function** `:class:torch.nn.ELU`
45
- - **Average Pooling** `:class:torch.nn.AvgPool` along time (kernel ``(1, P)`` with stride ``(1, S)``)
42
+ - *Operations.*
43
+ - A temporal convolution (:class:`torch.nn.Conv2d`) ``(1 x L_t)`` forms a data-driven "filter bank";
44
+ - A spatial convolution (:class:`torch.nn.Conv2d`) ``(n_chans x 1)`` projects across electrodes,
45
+ collapsing the channel axis into a virtual channel.
46
+ - **Normalization function** :class:`torch.nn.BatchNorm2d`
47
+ - **Activation function** :class:`torch.nn.ELU`
48
+ - **Average Pooling** :class:`torch.nn.AvgPool2d` along time (kernel ``(1, P)`` with stride ``(1, S)``)
46
49
  - final ``1x1`` :class:`torch.nn.Linear` projection.
47
50
 
48
51
  The result is rearranged to a token sequence ``(B, S_tokens, D)``, where ``D = n_filters_time``.
@@ -53,7 +56,7 @@ class EEGConformer(EEGModuleMixin, nn.Module):
53
56
 
54
57
  - :class:`_TransformerEncoder` **(context over temporal tokens)**
55
58
 
56
- - *Operations.*
59
+ - *Operations.*
57
60
  - A stack of ``att_depth`` encoder blocks. :class:`_TransformerEncoderBlock`
58
61
  - Each block applies LayerNorm :class:`torch.nn.LayerNorm`
59
62
  - Multi-Head Self-Attention (``att_heads``) with dropout + residual :class:`MultiHeadAttention` (:class:`torch.nn.Dropout`)
@@ -67,7 +70,7 @@ class EEGConformer(EEGModuleMixin, nn.Module):
67
70
 
68
71
  - :class:`ClassificationHead` **(aggregation + readout)**
69
72
 
70
- - *Operations*.
73
+ - *Operations.*
71
74
  - Flatten, :class:`torch.nn.Flatten` the sequence ``(B, S_tokens·D)`` -
72
75
  - MLP (:class:`torch.nn.Linear` → activation (default: :class:`torch.nn.ELU`) → :class:`torch.nn.Dropout` → :class:`torch.nn.Linear`)
73
76
  - final Linear to classes.
@@ -100,8 +103,8 @@ class EEGConformer(EEGModuleMixin, nn.Module):
100
103
  - **Type.** Standard multi-head self-attention (MHA) with ``att_heads`` heads over the token sequence.
101
104
  - **Shapes.** Input/Output: ``(B, S_tokens, D)``; attention operates along the ``S_tokens`` axis.
102
105
  - **Role.** Re-weights and integrates evidence across pooled windows, capturing dependencies
103
- longer than any single token while leaving channel relationships to the convolutional stem.
104
- The design is intentionally *small*—attention refines rather than replaces convolutional feature extraction.
106
+ longer than any single token while leaving channel relationships to the convolutional stem.
107
+ The design is intentionally *small*—attention refines rather than replaces convolutional feature extraction.
105
108
 
106
109
  .. rubric:: Additional Mechanisms
107
110
 
@@ -112,20 +115,20 @@ class EEGConformer(EEGModuleMixin, nn.Module):
112
115
  refine temporal context before classification.
113
116
 
114
117
  - **Tokenization knob.** ``pool_time_length`` and especially ``pool_time_stride`` set
115
- the number of tokens ``S_tokens``. Smaller strides → more tokens and higher attention
116
- capacity (but higher compute); larger strides → fewer tokens and stronger inductive bias.
118
+ the number of tokens ``S_tokens``. Smaller strides → more tokens and higher attention
119
+ capacity (but higher compute); larger strides → fewer tokens and stronger inductive bias.
117
120
 
118
121
  - **Embedding dimension = filters.** ``n_filters_time`` serves double duty as both the
119
- number of temporal filters in the stem and the transformer’s embedding size ``D``,
120
- simplifying dimensional alignment.
122
+ number of temporal filters in the stem and the transformer’s embedding size ``D``,
123
+ simplifying dimensional alignment.
121
124
 
122
125
  .. rubric:: Usage and Configuration
123
126
 
124
127
  - **Instantiation.** Choose ``n_filters_time`` (embedding size ``D``) and
125
- ``filter_time_length`` to match the rhythms of interest. Tune
126
- ``pool_time_length/stride`` to trade temporal resolution for sequence length.
127
- Keep ``att_depth`` modest (e.g., 4–6) and set ``att_heads`` to divide ``D``.
128
- ``final_fc_length="auto"`` infers the flattened size from PatchEmbedding.
128
+ ``filter_time_length`` to match the rhythms of interest. Tune
129
+ ``pool_time_length/stride`` to trade temporal resolution for sequence length.
130
+ Keep ``att_depth`` modest (e.g., 4–6) and set ``att_heads`` to divide ``D``.
131
+ ``final_fc_length="auto"`` infers the flattened size from PatchEmbedding.
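The tokenization knob and the advice to set ``att_heads`` to divide ``D`` can be explored numerically. The following is a rough sketch, not EEGConformer's actual code: `n_tokens` and `valid_head_counts` are hypothetical helpers, the temporal convolution and the pooling are both assumed to be unpadded, and the numbers are illustrative.

```python
def n_tokens(n_times, filter_time_length, pool_time_length, pool_time_stride):
    """Approximate token count S_tokens after the conv stem and pooling.

    Assumption: unpadded temporal conv, then unpadded average pooling.
    """
    after_conv = n_times - filter_time_length + 1
    return (after_conv - pool_time_length) // pool_time_stride + 1


def valid_head_counts(embed_dim):
    """Head counts that divide the embedding size D evenly."""
    return [h for h in range(1, embed_dim + 1) if embed_dim % h == 0]


# Illustrative values: halving the stride roughly doubles the token count.
tokens_small_stride = n_tokens(1000, 25, 75, 15)
tokens_large_stride = n_tokens(1000, 25, 75, 30)
heads = valid_head_counts(40)
```

Since self-attention cost grows with the square of the token count, the smaller stride in this example costs roughly four times as much attention compute as the larger one.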
129
132
 
130
133
  Notes
131
134
  -----
@@ -31,8 +31,7 @@ class EEGNetv4(EEGModuleMixin, nn.Sequential):
31
31
 
32
32
  .. rubric:: Architectural Overview
33
33
 
34
- EEGNetv4 is a compact convolutional network designed for EEG decoding with a
35
- pipeline that mirrors classical EEG processing:
34
+ EEGNetv4 is a compact convolutional network designed for EEG decoding with a pipeline that mirrors classical EEG processing:
36
35
  - (i) learn temporal frequency-selective filters,
37
36
  - (ii) learn spatial filters for those frequencies, and
38
37
  - (iii) condense features with depthwise–separable convolutions before a lightweight classifier.
@@ -56,16 +55,16 @@ class EEGNetv4(EEGModuleMixin, nn.Sequential):
56
55
 
57
56
  .. rubric:: Convolutional Details
58
57
 
59
- **Temporal.** The initial temporal convs serve as a *learned filter bank*:
60
- long 1-D kernels (implemented as 2-D with singleton spatial extent) emphasize oscillatory bands and transients.
61
- Because this stage is linear prior to BN/ELU, kernels can be analyzed as FIR filters to reveal each feature’s spectrum [Lawhern2018]_.
58
+ - **Temporal.** The initial temporal convs serve as a *learned filter bank*:
59
+ long 1-D kernels (implemented as 2-D with singleton spatial extent) emphasize oscillatory bands and transients.
60
+ Because this stage is linear prior to BN/ELU, kernels can be analyzed as FIR filters to reveal each feature’s spectrum [Lawhern2018]_.
62
61
 
63
- **Spatial.** The depthwise spatial conv spans the full channel axis (kernel height = #electrodes; temporal size = 1).
64
- With ``groups = F1``, each temporal filter learns its own set of ``D`` spatial projections—akin to CSP, learned end-to-end and
65
- typically regularized with max-norm.
62
+ - **Spatial.** The depthwise spatial conv spans the full channel axis (kernel height = #electrodes; temporal size = 1).
63
+ With ``groups = F1``, each temporal filter learns its own set of ``D`` spatial projections—akin to CSP, learned end-to-end and
64
+ typically regularized with max-norm.
66
65
 
67
- **Spectral.** No explicit Fourier/wavelet transform is used. Frequency structure
68
- is captured implicitly by the temporal filter bank; later depthwise temporal kernels act as short-time integrators/refiners.
66
+ - **Spectral.** No explicit Fourier/wavelet transform is used. Frequency structure
67
+ is captured implicitly by the temporal filter bank; later depthwise temporal kernels act as short-time integrators/refiners.
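The remark that temporal kernels can be analyzed as FIR filters can be illustrated with a small frequency-response computation. This is a minimal sketch using only the standard library: `fir_magnitude` is a hypothetical helper, the 4-tap moving average stands in for a learned kernel, and the 128 Hz sampling rate is an assumed example value.

```python
import cmath
import math


def fir_magnitude(kernel, freq_hz, sfreq_hz):
    """Magnitude of the kernel's frequency response at one frequency."""
    omega = 2.0 * math.pi * freq_hz / sfreq_hz
    response = sum(w * cmath.exp(-1j * omega * k) for k, w in enumerate(kernel))
    return abs(response)


# Stand-in for one learned temporal kernel: a 4-tap moving average.
kernel = [0.25, 0.25, 0.25, 0.25]
dc_gain = fir_magnitude(kernel, 0.0, 128.0)      # passes slow drifts
notch_gain = fir_magnitude(kernel, 32.0, 128.0)  # null at sfreq / 4
```

PyTorch convolutions compute a cross-correlation, so the kernel is applied without flipping; for real-valued kernels this changes only the phase, and the magnitude spectrum computed here is unaffected.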
69
68
 
70
69
  .. rubric:: Additional Comments
71
70
 
@@ -16,9 +16,123 @@ from braindecode.modules import Conv2dWithConstraint, LinearWithConstraint
16
16
  class EEGNeX(EEGModuleMixin, nn.Module):
17
17
  """EEGNeX model from Chen et al. (2024) [eegnex]_.
18
18
 
19
+ :bdg-success:`Convolution`
20
+
19
21
  .. figure:: https://braindecode.org/dev/_static/model/eegnex.jpg
20
22
  :align: center
21
23
  :alt: EEGNeX Architecture
24
+ :width: 620px
25
+
26
+ .. rubric:: Architectural Overview
27
+
28
+ EEGNeX is a **purely convolutional** architecture that refines the EEGNet-style stem
29
+ and deepens the temporal stack with **dilated temporal convolutions**. The end-to-end
30
+ flow is:
31
+
32
+ - (i) **Block-1/2**: two temporal convolutions ``(1 x L)`` with BN refine a
33
+ learned FIR-like *temporal filter bank* (no pooling yet);
34
+ - (ii) **Block-3**: depthwise **spatial** convolution across electrodes
35
+ ``(n_chans x 1)`` with max-norm constraint, followed by ELU → AvgPool (time) → Dropout;
36
+ - (iii) **Block-4/5**: two additional **temporal** convolutions with increasing **dilation**
37
+ to expand the receptive field; the last block applies ELU → AvgPool → Dropout → Flatten;
38
+ - (iv) **Classifier**: a max-norm–constrained linear layer.
39
+
40
+ The published work positions EEGNeX as a compact, conv-only alternative that consistently
41
+ outperforms prior baselines across MOABB-style benchmarks, with the popular
42
+ “EEGNeX-8,32” shorthand denoting *8 temporal filters* and *kernel length 32*.
43
+
44
+
45
+ .. rubric:: Macro Components
46
+
47
+ - **Block-1 / Block-2 — Temporal filter (learned).**
48
+
49
+ - *Operations.*
50
+ - :class:`torch.nn.Conv2d` with kernels ``(1, L)``
51
+ - :class:`torch.nn.BatchNorm2d` (no nonlinearity until Block-3, mirroring a linear FIR analysis stage).
52
+ These layers set up frequency-selective detectors before spatial mixing.
53
+
54
+ - *Interpretability.* Kernels can be inspected as FIR filters; two stacked temporal
55
+ convs allow longer effective kernels without parameter blow-up.
56
+
57
+ - **Block-3 — Spatial projection + condensation.**
58
+
59
+ - *Operations.*
60
+ - :class:`braindecode.modules.Conv2dWithConstraint` with kernel ``(n_chans, 1)``
61
+ and ``groups = filter_2`` (depthwise across filters)
62
+ - :class:`torch.nn.BatchNorm2d`
63
+ - :class:`torch.nn.ELU`
64
+ - :class:`torch.nn.AvgPool2d` (time)
65
+ - :class:`torch.nn.Dropout`.
66
+
67
+ *Role.* Learns per-filter spatial patterns over the **full montage** while temporal
68
+ pooling stabilizes and compresses features; max-norm encourages well-behaved spatial
69
+ weights similar to EEGNet practice.
70
+
71
+ - **Block-4 / Block-5 — Dilated temporal integration.**
72
+
73
+ - *Operations.*
74
+ - :class:`torch.nn.Conv2d` with kernels ``(1, k)`` and **dilations**
75
+ (e.g., 2 then 4);
76
+ - :class:`torch.nn.BatchNorm2d`
77
+ - :class:`torch.nn.ELU`
78
+ - :class:`torch.nn.AvgPool2d` (time)
79
+ - :class:`torch.nn.Dropout`
80
+ - :class:`torch.nn.Flatten`.
81
+
82
+ *Role.* Expands the temporal receptive field efficiently to capture rhythms and
83
+ long-range context after condensation.
84
+
85
+ - **Final Classifier — Max-norm linear.**
86
+
87
+ - *Operations.*
88
+ - :class:`braindecode.modules.LinearWithConstraint` maps the flattened
89
+ vector to the target classes; the max-norm constraint regularizes the readout.
90
+
91
+
92
+ .. rubric:: Convolutional Details
93
+
94
+ - **Temporal (where time-domain patterns are learned).**
95
+ Blocks 1-2 learn the primary filter bank (oscillations/transients), while Blocks 4-5
96
+ use **dilation** to integrate over longer horizons without extra pooling. The final
97
+ AvgPool in Block-5 sets the output token rate and helps noise suppression.
98
+
99
+ - **Spatial (how electrodes are processed).**
100
+ A *single* depthwise spatial conv (Block-3) spans the entire electrode set
101
+ (kernel ``(n_chans, 1)``), producing per-temporal-filter topographies; no cross-filter
102
+ mixing occurs at this stage, aiding interpretability.
103
+
104
+ - **Spectral (how frequency content is captured).**
105
+ Frequency selectivity emerges from the learned temporal kernels; dilation broadens effective
106
+ bandwidth coverage by composing multiple scales.
107
+
108
+ .. rubric:: Additional Mechanisms
109
+
110
+ - **EEGNeX-8,32 naming.** “8,32” indicates *8 temporal filters* and *kernel length 32*,
111
+ reflecting the paper's ablation path from EEGNet-8,2 toward thicker temporal kernels
112
+ and a deeper conv stack.
113
+ - **Max-norm constraints.** Spatial (Block-3) and final linear layers use max-norm
114
+ regularization—standard in EEG CNNs—to reduce overfitting and encourage stable spatial
115
+ patterns.
116
+
117
+ .. rubric:: Usage and Configuration
118
+
119
+ - **Kernel schedule.** Start with the canonical **EEGNeX-8,32** (``filter_1=8``,
120
+ ``kernel_block_1_2=32``) and keep **Block-3** depth multiplier modest (e.g., 2) to match
121
+ the paper's “pure conv” profile.
122
+ - **Pooling vs. dilation.** Use pooling in Blocks 3 and 5 to control compute and variance;
123
+ increase dilations (Blocks 4-5) to widen temporal context when windows are short.
124
+ - **Regularization.** Combine dropout (Blocks 3 & 5) with max-norm on spatial and
125
+ classifier layers; prefer ELU activations for stable training on small EEG datasets.
126
+
127
+
128
+ Notes
129
+ -----
130
+ - The braindecode implementation follows the paper's conv-only design with five blocks
131
+ and reproduces the depthwise spatial step and dilated temporal stack. See the class
132
+ reference for exact kernel sizes, dilations, and pooling defaults.
133
+
134
+ .. versionadded:: 1.1
135
+
22
136
 
23
137
  Parameters
24
138
  ----------
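The temporal-context claims in the EEGNeX docstring above (two stacked ``(1, L)`` convolutions in Blocks 1-2, then dilations such as 2 and 4 in Blocks 4-5) can be checked with a little receptive-field arithmetic. The sketch below is illustrative only: the helper name `stacked_receptive_field` and the Block-4/5 kernel length of 16 are assumptions rather than values taken from this diff, and pooling between blocks is ignored.

```python
def stacked_receptive_field(layers):
    """Receptive field (in samples) of stride-1 1-D convolutions applied in sequence.

    `layers` is a list of (kernel_length, dilation) pairs. Each layer widens
    the receptive field by (kernel_length - 1) * dilation samples.
    """
    field = 1
    for kernel_length, dilation in layers:
        field += (kernel_length - 1) * dilation
    return field


# Blocks 1-2: two plain temporal convolutions with the "EEGNeX-8,32" kernel length.
blocks_1_2 = stacked_receptive_field([(32, 1), (32, 1)])

# Blocks 4-5: hypothetical kernel length 16 (an assumption) with dilations 2 then 4.
blocks_4_5 = stacked_receptive_field([(16, 2), (16, 4)])

print(blocks_1_2)  # 63
print(blocks_4_5)  # 91
```

Because Block-3 average-pools over time before Blocks 4-5, the span of the dilated stack measured in raw input samples would be larger still, roughly scaled by the pooling factor.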
braindecode/version.py CHANGED
@@ -1 +1 @@
1
- __version__ = "1.2.0.dev176358851"
1
+ __version__ = "1.2.0.dev180217551"
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: braindecode
3
- Version: 1.2.0.dev176358851
3
+ Version: 1.2.0.dev180217551
4
4
  Summary: Deep learning software to decode EEG, ECG or MEG signals
5
5
  Author-email: Robin Tibor Schirrmeister <robintibor@gmail.com>
6
6
  Maintainer-email: Alexandre Gramfort <agramfort@meta.com>, Bruno Aristimunha Pinto <b.aristimunha@gmail.com>, Robin Tibor Schirrmeister <robintibor@gmail.com>
@@ -3,7 +3,7 @@ braindecode/classifier.py,sha256=k9vSCtfQbld0YVleDi5rrrmk6k_k5JYEPPBYcNxYjZ8,980
3
3
  braindecode/eegneuralnet.py,sha256=dz8k_-2jV7WqkaX4bQG-dmr-vRT7ZtOwJqomXyC9PTw,15287
4
4
  braindecode/regressor.py,sha256=VLfrpiXklwI4onkwue3QmzlBWcvspu0tlrLo9RT1Oiw,9375
5
5
  braindecode/util.py,sha256=J-tBcDJNlMTIFW2mfOy6Ko0nsgdP4obRoEVDeg2rFH0,12686
6
- braindecode/version.py,sha256=pqOP2XIkF23sSxHn98gks8lm_NZ40LR5V6vz_cuxJQo,35
6
+ braindecode/version.py,sha256=Y04aPFEYg6CFbsSt6BtmY3lNGnXYQ3FAwT0WYnn5X2Q,35
7
7
  braindecode/augmentation/__init__.py,sha256=LG7ONqCufYAF9NZt8POIp10lYXb8iSueYkF-CWGK2Ls,1001
8
8
  braindecode/augmentation/base.py,sha256=gg7wYsVfa9jfqBddtE03B5ZrPHFFmPl2sa3LOrRnGfo,7325
9
9
  braindecode/augmentation/functional.py,sha256=ygkMNEFHaUdRQfk7meMML19FnM406Uf34h-ztKXdJwM,37978
@@ -27,21 +27,21 @@ braindecode/functional/__init__.py,sha256=JPUDFeKtfogEzfrwPaZRBmxexPjBw7AglYMlIm
27
27
  braindecode/functional/functions.py,sha256=CoEweM6YLhigx0tNmmz6yAc8iQ078sTFY2GeCjK5fFs,8622
28
28
  braindecode/functional/initialization.py,sha256=BUSC7y2TMsfShpMYBVwm3xg3ODFqWp-STH7yD4sn8zk,1388
29
29
  braindecode/models/__init__.py,sha256=xv1QPELZxocPgbc_mz-eYM5w08ZDNOsDV4pOnIFhUww,2551
30
- braindecode/models/atcnet.py,sha256=PhDJl6nBChButabjsmLz_heRcGFCCMKoeUt7k7neNzs,24483
31
- braindecode/models/attentionbasenet.py,sha256=1uwrtsdEGiBwokkO8A_2SR5zapOTQUBZd4q7hIpR0cw,23359
30
+ braindecode/models/atcnet.py,sha256=Pn5KzQjv7YxSNDr_CY6O_Yg9K4m9XJ7btCIqyzkcPxc,32102
31
+ braindecode/models/attentionbasenet.py,sha256=zqSzrFAjl89EoacWHuPLbjBRY4RW2awdhWElb-d9JjY,30178
32
32
  braindecode/models/base.py,sha256=9icrWNZBGbh_VLyB9m8g_K1QyK7s3mh8X-hJ29gEbWs,10802
33
33
  braindecode/models/biot.py,sha256=T4PymX3penMJcrdfb5Nq6B3P-jyP2laAIu_R9o3uCXo,17512
34
34
  braindecode/models/contrawr.py,sha256=eeR_ik4gNZ3rJLM6Mw9gJ2gTMkZ8CU8C4rN_GQMQTAE,10044
35
35
  braindecode/models/ctnet.py,sha256=-J9QtUM8kcntz_xinfuBBvwDMECHiMPMcr2MS4GDPEY,17308
36
36
  braindecode/models/deep4.py,sha256=YJQUw-0EuFUi4qjm8caJGB8wRM_aeJa5X_d8jrGaQAI,14588
37
37
  braindecode/models/deepsleepnet.py,sha256=RrciuVJtZ-fhiUl-yLPfK2FP-G29V5Wor6pPlrMHQWQ,9218
38
- braindecode/models/eegconformer.py,sha256=OSORbNYwMA0hvUMjuyB8wI8qBKVSiraioLHTHmt8sdQ,17376
38
+ braindecode/models/eegconformer.py,sha256=rxMAmqErDVLq7nS77CnTtpcC3C2OR_EoZ8-jG-dKP9I,17433
39
39
  braindecode/models/eeginception_erp.py,sha256=mwh3rGSHAJVvnbOlYTuWWkKxlmFAdAXBNCrq4IPgOS4,11408
40
40
  braindecode/models/eeginception_mi.py,sha256=aKJRFuYrpbcRbmmT2xVghKbK8pnl7fzu5hrV0ybRKso,12424
41
41
  braindecode/models/eegitnet.py,sha256=feXFmPCd-Ejxt7jgWPen1Ag0-oSclDVQai0Atwu9d_A,9827
42
42
  braindecode/models/eegminer.py,sha256=ouKZah9Q7_sxT7DJJMcPObwVxNQE87sEljJg6QwiQNw,9847
43
- braindecode/models/eegnet.py,sha256=YeBCmU6Al9FDS4MZQTOLd0MCUfPbM6tmVlGWpb59Qzg,19256
44
- braindecode/models/eegnex.py,sha256=KNJIh8pFNhY087Bey2OPzDD4Uqw9pS6UkwMjnOngBzg,8497
43
+ braindecode/models/eegnet.py,sha256=CtfQuw7iaxQh3j1dRmF_UhdjfO3uHOlObnmasHk_boM,19268
44
+ braindecode/models/eegnex.py,sha256=xfaISCgW5ShlUh9fFBHc5ylz80OhW5C69Fi-_0hVS5U,13767
45
45
  braindecode/models/eegresnet.py,sha256=cqWOSGqfJN_dNYUU9l8nYd_S3T1N-UX5-encKQzfBlg,12057
46
46
  braindecode/models/eegsimpleconv.py,sha256=sHpK-7ZGOCMuXsdkSVuarFTd1T0jMJUP_xwXP3gxQwc,7268
47
47
  braindecode/models/eegtcnet.py,sha256=np-93Ttctp2uaEYpMrfXfH5bJmCOUZZHLjv8GJEEym4,10830
@@ -93,9 +93,9 @@ braindecode/training/scoring.py,sha256=WRkwqbitA3m_dzRnGp2ZIZPge5Nhx9gAEQhIHzeH4
93
93
  braindecode/visualization/__init__.py,sha256=4EER_xHqZIDzEvmgUEm7K1bgNKpyZAIClR9ZCkMuY4M,240
94
94
  braindecode/visualization/confusion_matrices.py,sha256=qIWMLEHow5CJ7PhGggD8mnD55Le6xhma9HSzt4R33fc,9509
95
95
  braindecode/visualization/gradients.py,sha256=KZo-GA0uwiwty2_94j2IjmCR2SKcfPb1Bi3sQq7vpTk,2170
96
- braindecode-1.2.0.dev176358851.dist-info/licenses/LICENSE.txt,sha256=7rg7k6hyj8m9whQ7dpKbqnCssoOEx_Mbtqb4uSOjljE,1525
97
- braindecode-1.2.0.dev176358851.dist-info/licenses/NOTICE.txt,sha256=sOxuTbalPxTM8H6VqtvGbXCt_BoOF7JevEYG_knqbm4,620
98
- braindecode-1.2.0.dev176358851.dist-info/METADATA,sha256=23uR3nYKaKV2G4EtrutqVU6r40qFe4_Oh9DuXDX5fFI,6883
99
- braindecode-1.2.0.dev176358851.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
100
- braindecode-1.2.0.dev176358851.dist-info/top_level.txt,sha256=pHsWQmSy0uhIez62-HA9j0iaXKvSbUL39ifFRkFnChA,12
101
- braindecode-1.2.0.dev176358851.dist-info/RECORD,,
96
+ braindecode-1.2.0.dev180217551.dist-info/licenses/LICENSE.txt,sha256=7rg7k6hyj8m9whQ7dpKbqnCssoOEx_Mbtqb4uSOjljE,1525
97
+ braindecode-1.2.0.dev180217551.dist-info/licenses/NOTICE.txt,sha256=sOxuTbalPxTM8H6VqtvGbXCt_BoOF7JevEYG_knqbm4,620
98
+ braindecode-1.2.0.dev180217551.dist-info/METADATA,sha256=3_WlN-hqNZ-mvQRBABpJzmU2cIb_RKSpjWtgghqT4A0,6883
99
+ braindecode-1.2.0.dev180217551.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
100
+ braindecode-1.2.0.dev180217551.dist-info/top_level.txt,sha256=pHsWQmSy0uhIez62-HA9j0iaXKvSbUL39ifFRkFnChA,12
101
+ braindecode-1.2.0.dev180217551.dist-info/RECORD,,
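Each RECORD line above has the form `path,sha256=<digest>,<size>`. In the wheel format the digest is the SHA-256 of the file, encoded as URL-safe base64 with the trailing `=` padding removed, and the size is the file length in bytes. A minimal sketch of recomputing one entry follows. The helper name `record_entry` is hypothetical, and the file content used here rests on the assumption that `braindecode/version.py` consists of exactly the one line shown in the diff followed by a newline.

```python
import base64
import hashlib


def record_entry(path, data):
    """Build a wheel RECORD line for `data` (bytes) stored at `path`."""
    digest = hashlib.sha256(data).digest()
    # RECORD uses URL-safe base64 with the "=" padding stripped.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


# Assumed content of braindecode/version.py: the single line from the diff plus "\n".
content = b'__version__ = "1.2.0.dev180217551"\n'
entry = record_entry("braindecode/version.py", content)
print(entry)
```

Under that assumption the size field comes out as 35, which agrees with the RECORD line for `version.py`; the digest will match only if the assumed content is byte-for-byte identical to the packaged file.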