braindecode 1.2.0.dev182094932__py3-none-any.whl → 1.3.0.dev173691341__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

This version of braindecode might be problematic.

Files changed (34)
  1. braindecode/datasets/experimental.py +218 -0
  2. braindecode/models/__init__.py +6 -8
  3. braindecode/models/atcnet.py +156 -16
  4. braindecode/models/attentionbasenet.py +151 -26
  5. braindecode/models/{sleep_stager_eldele_2021.py → attn_sleep.py} +12 -2
  6. braindecode/models/ctnet.py +1 -1
  7. braindecode/models/deep4.py +6 -2
  8. braindecode/models/deepsleepnet.py +118 -5
  9. braindecode/models/eegconformer.py +114 -15
  10. braindecode/models/eeginception_erp.py +76 -7
  11. braindecode/models/eeginception_mi.py +2 -0
  12. braindecode/models/eegnet.py +27 -190
  13. braindecode/models/eegnex.py +113 -6
  14. braindecode/models/eegsimpleconv.py +2 -0
  15. braindecode/models/eegtcnet.py +1 -1
  16. braindecode/models/sccnet.py +81 -8
  17. braindecode/models/shallow_fbcsp.py +2 -0
  18. braindecode/models/sleep_stager_blanco_2020.py +2 -0
  19. braindecode/models/sleep_stager_chambon_2018.py +2 -0
  20. braindecode/models/sparcnet.py +2 -0
  21. braindecode/models/summary.csv +39 -41
  22. braindecode/models/tidnet.py +2 -0
  23. braindecode/models/tsinception.py +15 -3
  24. braindecode/models/usleep.py +103 -9
  25. braindecode/models/util.py +5 -5
  26. braindecode/preprocessing/preprocess.py +20 -26
  27. braindecode/version.py +1 -1
  28. {braindecode-1.2.0.dev182094932.dist-info → braindecode-1.3.0.dev173691341.dist-info}/METADATA +7 -2
  29. {braindecode-1.2.0.dev182094932.dist-info → braindecode-1.3.0.dev173691341.dist-info}/RECORD +33 -33
  30. braindecode/models/eegresnet.py +0 -362
  31. {braindecode-1.2.0.dev182094932.dist-info → braindecode-1.3.0.dev173691341.dist-info}/WHEEL +0 -0
  32. {braindecode-1.2.0.dev182094932.dist-info → braindecode-1.3.0.dev173691341.dist-info}/licenses/LICENSE.txt +0 -0
  33. {braindecode-1.2.0.dev182094932.dist-info → braindecode-1.3.0.dev173691341.dist-info}/licenses/NOTICE.txt +0 -0
  34. {braindecode-1.2.0.dev182094932.dist-info → braindecode-1.3.0.dev173691341.dist-info}/top_level.txt +0 -0
@@ -26,25 +26,150 @@ from braindecode.modules.attention import (
26
26
  class AttentionBaseNet(EEGModuleMixin, nn.Module):
27
27
  """AttentionBaseNet from Wimpff M et al. (2023) [Martin2023]_.
28
28
 
29
+ :bdg-success:`Convolution` :bdg-info:`Small Attention`
30
+
29
31
  .. figure:: https://content.cld.iop.org/journals/1741-2552/21/3/036020/revision2/jnead48b9f2_hr.jpg
30
- :align: center
31
- :alt: Attention Base Net
32
+ :align: center
33
+ :alt: AttentionBaseNet Architecture
34
+ :width: 640px
35
+
36
+
37
+ .. rubric:: Architectural Overview
38
+
39
+ AttentionBaseNet is a *convolution-first* network with a *channel-attention* stage.
40
+ The end-to-end flow is:
41
+
42
+ - (i) :class:`_FeatureExtractor` learns a temporal filter bank and per-filter spatial
43
+ projections (depthwise across electrodes), then condenses time by pooling;
44
+ - (ii) **Channel Expansion** uses a ``1x1`` convolution to set the feature width;
45
+ - (iii) :class:`_ChannelAttentionBlock` refines features via depthwise–pointwise temporal
46
+ convs and an optional channel-attention module (SE/CBAM/ECA/…);
47
+ - (iv) **Classifier** flattens the sequence and applies a linear readout.
48
+
49
+ This design mirrors shallow CNN pipelines (EEGNet-style stem) but inserts a pluggable
50
+ attention unit that *re-weights channels* (and optionally temporal positions) before
51
+ classification.
52
+
53
+
54
+ .. rubric:: Macro Components
55
+
56
+ - :class:`_FeatureExtractor` **(Shallow conv stem → condensed feature map)**
57
+
58
+ - *Operations.*
59
+ - **Temporal conv** (:class:`torch.nn.Conv2d`) with kernel ``(1, L_t)`` creates a learned
60
+ FIR-like filter bank with ``n_temporal_filters`` maps.
61
+ - **Depthwise spatial conv** (:class:`torch.nn.Conv2d`, ``groups=n_temporal_filters``)
62
+ with kernel ``(n_chans, 1)`` learns per-filter spatial projections over the full montage.
63
+ - **BatchNorm → ELU → AvgPool → Dropout** stabilize and downsample time.
64
+ - Output shape: ``(B, F2, 1, T₁)`` with ``F2 = n_temporal_filters x spatial_expansion``.
65
+
66
+ *Interpretability/robustness.* Temporal kernels behave as analyzable FIR filters; the
67
+ depthwise spatial step yields rhythm-specific topographies. Pooling acts as a local
68
+ integrator that reduces variance on short EEG windows.
69
+
70
+ - **Channel Expansion**
71
+
72
+ - *Operations.*
73
+ - A ``1x1`` conv → BN → activation maps ``F2 → ch_dim`` without changing
74
+ the temporal length ``T₁`` (shape: ``(B, ch_dim, 1, T₁)``).
75
+ This sets the embedding width for the attention block.
76
+
77
+ - :class:`_ChannelAttentionBlock` **(temporal refinement + channel attention)**
78
+
79
+ - *Operations.*
80
+ - **Depthwise temporal conv** ``(1, L_a)`` (groups=``ch_dim``) + **pointwise ``1x1``**,
81
+ BN and activation → preserves shape ``(B, ch_dim, 1, T₁)`` while refining timing.
82
+ - **Optional attention module** (see *Additional Mechanisms*) applies channel reweighting
83
+ (some variants also apply temporal gating).
84
+ - **AvgPool (1, P₂)** with stride ``(1, S₂)`` and **Dropout** → outputs
85
+ ``(B, ch_dim, 1, T₂)``.
86
+
87
+ *Role.* Emphasizes informative channels (and, in certain modes, salient time steps)
88
+ before the classifier; complements the convolutional priors with adaptive re-weighting.
89
+
90
+ - **Classifier (aggregation + readout)**
91
+
92
+ *Operations.* :class:`torch.nn.Flatten` → :class:`torch.nn.Linear` from
93
+ ``(B, ch_dim·T₂)`` to classes.
94
+
95
+
96
+ .. rubric:: Convolutional Details
97
+
98
+ - **Temporal (where time-domain patterns are learned).**
99
+ Wide kernels in the stem (``(1, L_t)``) act as a learned filter bank for oscillatory
100
+ bands/transients; the attention block’s depthwise temporal conv (``(1, L_a)``) sharpens
101
+ short-term dynamics after downsampling. Pool sizes/strides (``P₁,S₁`` then ``P₂,S₂``)
102
+ set the token rate and effective temporal resolution.
103
+
104
+ - **Spatial (how electrodes are processed).**
105
+ A depthwise spatial conv with kernel ``(n_chans, 1)`` spans the full montage to
106
+ learn *per-temporal-filter* spatial projections (no cross-filter mixing at this step),
107
+ mirroring the interpretable spatial stage in shallow CNNs.
32
108
 
33
- Neural Network from the paper: EEG motor imagery decoding:
34
- A framework for comparative analysis with channel attention
35
- mechanisms
109
+ - **Spectral (how frequency content is captured).**
110
+ No explicit Fourier/wavelet transform is used in the stem—spectral selectivity
111
+ emerges from learned temporal kernels. When ``attention_mode="fca"``, a frequency
112
+ channel attention (DCT-based) summarizes frequencies to drive channel weights.
36
113
 
37
- The paper and original code with more details about the methodological
38
- choices are available at the [Martin2023]_ and [MartinCode]_.
39
114
 
40
- The AttentionBaseNet architecture is composed of four modules:
41
- - Input Block that performs a temporal convolution and a spatial
42
- convolution.
43
- - Channel Expansion that modifies the number of channels.
44
- - An attention block that performs channel attention with several
45
- options
46
- - ClassificationHead
115
+ .. rubric:: Attention / Sequential Modules
47
116
 
117
+ - **Type.** Channel attention chosen by ``attention_mode`` (SE, ECA, FCA, CBAM, CAT, GSoP,
118
+ EncNet, GE, GCT, SRM, CATLite). Most operate purely on channels; CBAM/CAT additionally
119
+ include temporal attention.
120
+
121
+ - **Shapes.** Input/Output around attention: ``(B, ch_dim, 1, T₁)``. Re-arrangements
122
+ (if any) are internal to the module; the block returns the same shape before pooling.
123
+
124
+ - **Role.** Re-weights channels (and optionally time) to highlight informative sources
125
+ and suppress distractors, improving SNR ahead of the linear head.
126
+
127
+
128
+ .. rubric:: Additional Mechanisms
129
+
130
+ - **Attention variants at a glance.**
131
+ - ``"se"``: Squeeze-and-Excitation (global pooling → bottleneck → gates).
132
+ - ``"gsop"``: Global second-order pooling (covariance-aware channel weights).
133
+ - ``"fca"``: Frequency Channel Attention (DCT summary; uses ``seq_len`` and ``freq_idx``).
134
+ - ``"encnet"``: EncNet with learned codewords (uses ``n_codewords``).
135
+ - ``"eca"``: Efficient Channel Attention (local 1-D conv over channel descriptor; uses ``kernel_size``).
136
+ - ``"ge"``: Gather–Excite (context pooling with optional MLP; can use ``extra_params``).
137
+ - ``"gct"``: Gated Channel Transformation (global context normalization + gating).
138
+ - ``"srm"``: Style-based recalibration (mean–std descriptors; optional MLP).
139
+ - ``"cbam"``: Channel then temporal attention (uses ``kernel_size``).
140
+ - ``"cat"`` / ``"catlite"``: Collaborative (channel ± temporal) attention; *lite* omits temporal.
141
+ - **Auto-compatibility on short inputs.**
142
+
143
+ If the input duration is too short for the configured kernels/pools, the implementation
144
+ **automatically rescales** temporal lengths/strides downward (with a warning) to keep
145
+ shapes valid and preserve the pipeline semantics.
146
+
147
+
148
+ .. rubric:: Usage and Configuration
149
+
150
+ - ``n_temporal_filters``, ``temporal_filter_length`` and ``spatial_expansion``:
151
+ control the capacity and the number of spatial projections in the stem.
152
+ - ``pool_length_inp``, ``pool_stride_inp`` then ``pool_length``, ``pool_stride``:
153
+ trade temporal resolution for compute; they determine the final sequence length ``T₂``.
154
+ - ``ch_dim``: width after the ``1x1`` expansion and the effective embedding size for attention.
155
+ - ``attention_mode`` + its specific hyperparameters (``reduction_rate``,
156
+ ``kernel_size``, ``seq_len``, ``freq_idx``, ``n_codewords``, ``use_mlp``):
157
+ select and tune the reweighting mechanism.
158
+ - ``drop_prob_inp`` and ``drop_prob_attn``: regularize stem and attention stages.
159
+ - **Training tips.**
160
+
161
+ Start with moderate pooling (e.g., ``P₁=75,S₁=15``) and ELU activations; enable attention
162
+ only after the stem learns stable filters. For small datasets, prefer simpler modes
163
+ (``"se"``, ``"eca"``) before heavier ones (``"gsop"``, ``"encnet"``).
164
+
165
+ Notes
166
+ -----
167
+ - Sequence length after each stage is computed internally; the final classifier expects
168
+ a flattened ``ch_dim x T₂`` vector.
169
+ - Attention operates on *channel* dimension by design; temporal gating exists only in
170
+ specific variants (CBAM/CAT).
171
+ - The paper and original code with more details about the methodological
172
+ choices are available in [Martin2023]_ and [MartinCode]_.
48
173
  .. versionadded:: 0.9
49
174
 
50
175
  Parameters
@@ -73,18 +198,18 @@ class AttentionBaseNet(EEGModuleMixin, nn.Module):
73
198
  the depth of the network after the initial layer. Default is 16.
74
199
  attention_mode : str, optional
75
200
  The type of attention mechanism to apply. If `None`, no attention is applied.
76
- - "se" for Squeeze-and-excitation network
77
- - "gsop" for Global Second-Order Pooling
78
- - "fca" for Frequency Channel Attention Network
79
- - "encnet" for context encoding module
80
- - "eca" for Efficient channel attention for deep convolutional neural networks
81
- - "ge" for Gather-Excite
82
- - "gct" for Gated Channel Transformation
83
- - "srm" for Style-based Recalibration Module
84
- - "cbam" for Convolutional Block Attention Module
85
- - "cat" for Learning to collaborate channel and temporal attention
86
- from multi-information fusion
87
- - "catlite" for Learning to collaborate channel attention
201
+ - "se" for Squeeze-and-excitation network
202
+ - "gsop" for Global Second-Order Pooling
203
+ - "fca" for Frequency Channel Attention Network
204
+ - "encnet" for context encoding module
205
+ - "eca" for Efficient channel attention for deep convolutional neural networks
206
+ - "ge" for Gather-Excite
207
+ - "gct" for Gated Channel Transformation
208
+ - "srm" for Style-based Recalibration Module
209
+ - "cbam" for Convolutional Block Attention Module
210
+ - "cat" for Learning to collaborate channel and temporal attention
211
+ from multi-information fusion
212
+ - "catlite" for Learning to collaborate channel attention
88
213
  from multi-information fusion (lite version, cat w/o temporal attention)
89
214
  pool_length : int, default=8
90
215
  The length of the window for the average pooling operation.
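The AttentionBaseNet docstring above states that the pooling parameters determine the final sequence length ``T₂`` and that the classifier expects a flattened ``ch_dim x T₂`` vector. The arithmetic can be sketched as follows. This is a minimal illustration, not the library's internal computation: the helper names `pooled_length` and `classifier_in_features` are hypothetical, the convolutions are assumed to preserve the time length, the pooling is assumed to be unpadded, and the second pooling stride of 8 is an assumption (the docstring gives only `pool_length=8` and the example `P₁=75, S₁=15`).

```python
def pooled_length(n_times: int, pool_length: int, pool_stride: int) -> int:
    """Output length of an unpadded 1-D average pool (floor mode)."""
    if n_times < pool_length:
        raise ValueError("input is shorter than the pooling window")
    return (n_times - pool_length) // pool_stride + 1


def classifier_in_features(n_times, ch_dim=16, pool_inp=(75, 15), pool_attn=(8, 8)):
    """Return (T1, T2, ch_dim * T2), assuming convolutions keep the time length."""
    t1 = pooled_length(n_times, *pool_inp)   # after the stem pooling (P1, S1)
    t2 = pooled_length(t1, *pool_attn)       # after the attention-block pooling (P2, S2)
    return t1, t2, ch_dim * t2


# 1000 samples, e.g. 4 s at 250 Hz (an illustrative choice)
t1, t2, n_features = classifier_in_features(1000)
print(t1, t2, n_features)
```

In the real model the convolution padding can shift these lengths by a sample or two, which is why the docstring notes that the sequence length after each stage is computed internally.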
@@ -8,18 +8,19 @@ from copy import deepcopy
8
8
 
9
9
  import torch
10
10
  import torch.nn.functional as F
11
+ from mne.utils import deprecated
11
12
  from torch import nn
12
13
 
13
14
  from braindecode.models.base import EEGModuleMixin
14
15
  from braindecode.modules import CausalConv1d
15
16
 
16
17
 
17
- class SleepStagerEldele2021(EEGModuleMixin, nn.Module):
18
+ class AttnSleep(EEGModuleMixin, nn.Module):
18
19
  """Sleep Staging Architecture from Eldele et al. (2021) [Eldele2021]_.
19
20
 
20
21
  .. figure:: https://raw.githubusercontent.com/emadeldeen24/AttnSleep/refs/heads/main/imgs/AttnSleep.png
21
22
  :align: center
22
- :alt: SleepStagerEldele2021 Architecture
23
+ :alt: AttnSleep Architecture
23
24
 
24
25
  Attention based Neural Net for sleep staging as described in [Eldele2021]_.
25
26
  The code for the paper and this model is also available at [1]_.
@@ -533,3 +534,12 @@ class _PositionwiseFeedForward(nn.Module):
533
534
  def forward(self, x: torch.Tensor) -> torch.Tensor:
534
535
  """Implements FFN equation."""
535
536
  return self.w_2(self.dropout(self.activate(self.w_1(x))))
537
+
538
+
539
+ @deprecated(
540
+ "`SleepStagerEldele2021` was renamed to `AttnSleep` in v1.12 to follow original author's name; this alias will be removed in v1.14."
541
+ )
542
+ class SleepStagerEldele2021(AttnSleep):
543
+ """Deprecated alias for AttnSleep."""
544
+
545
+ pass
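The hunk above keeps the old class name alive as a deprecated subclass of the renamed model. The same pattern can be sketched with only the standard library. This is an illustration under stated assumptions: the diff uses `mne.utils.deprecated`, whereas this sketch swaps in the stdlib `warnings` module, and the class names `AttnSleepSketch` and `SleepStagerSketch` are hypothetical stand-ins, not the real models.

```python
import warnings


class AttnSleepSketch:
    """Stand-in for the renamed class (hypothetical, not the real model)."""

    def __init__(self, n_outputs: int = 5):
        self.n_outputs = n_outputs


class SleepStagerSketch(AttnSleepSketch):
    """Deprecated alias: behaves like the new class but warns on instantiation."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "SleepStagerSketch was renamed to AttnSleepSketch; "
            "the alias will be removed in a future release.",
            FutureWarning,
            stacklevel=2,  # point the warning at the caller, not at this __init__
        )
        super().__init__(*args, **kwargs)


# Record warnings so the behaviour can be inspected instead of printed to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model = SleepStagerSketch(n_outputs=5)

print(isinstance(model, AttnSleepSketch), caught[0].category.__name__)
```

Subclassing rather than rebinding the name keeps `isinstance` checks against the new class working for code that still imports the old name.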
@@ -39,7 +39,7 @@ class CTNet(EEGModuleMixin, nn.Module):
39
39
  The architecture consists of three main components:
40
40
 
41
41
  1. **Convolutional Module**:
42
- - Apply EEGNetV4 to perform some feature extraction, denoted here as
42
+ - Apply :class:`EEGNet` to perform some feature extraction, denoted here as
43
43
  _PatchEmbeddingEEGNet module.
44
44
 
45
45
  2. **Transformer Encoder Module**:
@@ -19,9 +19,13 @@ from braindecode.modules import (
19
19
  class Deep4Net(EEGModuleMixin, nn.Sequential):
20
20
  """Deep ConvNet model from Schirrmeister et al (2017) [Schirrmeister2017]_.
21
21
 
22
- .. figure:: https://onlinelibrary.wiley.com/cms/asset/fc200ccc-d8c4-45b4-8577-56ce4d15999a/hbm23730-fig-0001-m.jpg
22
+ :bdg-success:`Convolution`
23
+
24
+ .. figure:: https://onlinelibrary.wiley.com/cms/asset/fc200ccc-d8c4-45b4-8577-56ce4d15999a/hbm23730-fig-0001-m.jpg
23
25
  :align: center
24
- :alt: CTNet Architecture
26
+ :alt: Deep4Net Architecture
27
+ :width: 600px
28
+
25
29
 
26
30
  Model described in [Schirrmeister2017]_.
27
31
 
@@ -8,14 +8,128 @@ from braindecode.models.base import EEGModuleMixin
8
8
 
9
9
 
10
10
  class DeepSleepNet(EEGModuleMixin, nn.Module):
11
- """Sleep staging architecture from Supratak et al. (2017) [Supratak2017]_.
11
+ """DeepSleepNet from Supratak et al. (2017) [Supratak2017]_.
12
12
 
13
- .. figure:: https://raw.githubusercontent.com/akaraspt/deepsleepnet/refs/heads/master/img/deepsleepnet.png
13
+ :bdg-success:`Convolution` :bdg-info:`Recurrent`
14
+
15
+ .. figure:: https://raw.githubusercontent.com/akaraspt/deepsleepnet/master/img/deepsleepnet.png
14
16
  :align: center
15
17
  :alt: DeepSleepNet Architecture
18
+ :width: 700px
19
+
20
+ .. rubric:: Architectural Overview
21
+
22
+ DeepSleepNet couples **dual-path convolutional neural network representation learning** with
23
+ **sequence residual learning** via bidirectional LSTMs.
24
+
25
+ The network:
26
+
27
+ - (i) learns complementary, time-frequency features from each
28
+ 30-s epoch using **two parallel CNNs** (small vs. large first-layer filters), then
29
+ - (ii) models **temporal dependencies across epochs** using **two-layer BiLSTMs**
30
+ with a **residual shortcut** from the CNN features, and finally
31
+ - (iii) outputs per-epoch sleep stages. This design encodes both
32
+ epoch-local patterns and longer-range transition rules used by human scorers.
33
+
34
+ In terms of implementation:
35
+
36
+ - (i) :class:`_RepresentationLearning` two CNNs extract epoch-wise features
37
+ (small-filter path for temporal precision; large-filter path for frequency precision);
38
+ - (ii) :class:`_SequenceResidualLearning` stacked BiLSTMs with peepholes + residual shortcut
39
+ inject temporal context while preserving CNN evidence;
40
+ - (iii) :class:`_Classifier` linear readout (softmax) for the five sleep stages.
41
+
42
+ .. rubric:: Macro Components
43
+
44
+ - :class:`_RepresentationLearning` **(dual-path CNN → epoch feature)**
45
+
46
+ - *Operations.*
47
+ - **Small-filter CNN** 4 times:
48
+ - :class:`~torch.nn.Conv1d`
49
+ - :class:`~torch.nn.BatchNorm1d`
50
+ - :class:`~torch.nn.ReLU`
51
+ - :class:`~torch.nn.MaxPool1d` after.
52
+ First conv uses **filter length ≈ Fs/2** and **stride ≈ Fs/16** to emphasize *timing* of graphoelements.
53
+ - **Large-filter CNN**:
54
+ - Same stack but first conv uses **filter length ≈ 4·Fs** and
55
+ **stride ≈ Fs/2** to emphasize *frequency* content.
56
+ - Outputs from both paths are **concatenated** into the epoch embedding ``a_t``.
57
+
58
+ - *Rationale.*
59
+ Two first-layer scales provide a **learned, dual-scale filter bank** that trades
60
+ temporal vs. frequency precision without hand-crafted features.
61
+
62
+ - :class:`_SequenceResidualLearning` **(BiLSTM context + residual fusion)**
63
+
64
+ - *Operations.*
65
+ - **Two-layer BiLSTM** with **peephole connections** processes the sequence of epoch embeddings
66
+ ``{a_t}`` forward and backward; hidden states from both directions are **concatenated**.
67
+ - A **shortcut MLP** (fully connected + :class:`~torch.nn.BatchNorm1d` + :class:`~torch.nn.ReLU`) projects ``a_t`` to the BiLSTM output
68
+ dimension and is **added** (residual) to the BiLSTM output at each time step.
69
+ - *Role.* Encodes **stage-transition rules** and smooths predictions over time while preserving
70
+ salient CNN features via the residual path.
71
+
72
+ - :class:`_Classifier` **(epoch-wise prediction)**
73
+
74
+ - *Operations.*
75
+ - :class:`~torch.nn.Linear` to produce per-epoch class probabilities.
16
76
 
17
- Convolutional neural network and bidirectional-Long Short-Term
18
- for single channels sleep staging described in [Supratak2017]_.
77
+ Original training uses two-step optimization: CNN pretraining on class-balanced data,
78
+ then end-to-end fine-tuning with sequential batches.
79
+
80
+ .. rubric:: Convolutional Details
81
+
82
+ - **Temporal (where time-domain patterns are learned).**
83
+
84
+ Both CNN paths use **1-D temporal convolutions**. The *small-filter* path (first kernel ≈ Fs/2,
85
+ stride ≈ Fs/16) captures *when* characteristic transients occur; the *large-filter* path
86
+ (first kernel ≈ 4·Fs, stride ≈ Fs/2) captures *which* frequency components dominate over the
87
+ epoch. Deeper layers use **small kernels** to refine features with fewer parameters, interleaved
88
+ with **max pooling** for downsampling.
89
+
90
+ - **Spatial (how channels are processed).**
91
+ The original model operates on **single-channel** raw EEG; convolutions therefore mix only
92
+ along time (no spatial convolution across electrodes).
93
+
94
+ - **Spectral (how frequency information emerges).**
95
+ No explicit Fourier/wavelet transform is used. The **large-filter path** serves as a
96
+ *frequency-sensitive* analyzer, while the **small-filter path** remains *time-sensitive*,
97
+ together functioning as a **two-band learned filter bank** at the first layer.
98
+
99
+ .. rubric:: Attention / Sequential Modules
100
+
101
+ - **Type.** **Bidirectional LSTM** (two layers) with **peephole connections**; forward and
102
+ backward streams are independent and concatenated.
103
+ - **Shapes.** For a sequence of ``N`` epochs, the CNN produces ``{a_t} ∈ R^{D}``;
104
+ BiLSTM outputs ``h_t ∈ R^{2H}``; the shortcut MLP maps ``a_t → R^{2H}`` to enable
105
+ **element-wise residual addition**.
106
+ - **Role.** Models **long-range temporal dependencies** (e.g., persisting N2 without visible
107
+ K-complex/spindles), stabilizing per-epoch predictions.
108
+
109
+
110
+ .. rubric:: Additional Mechanisms
111
+
112
+ - **Residual shortcut over sequence encoder.** Adds projected CNN features to BiLSTM outputs,
113
+ improving gradient flow and retaining discriminative content from representation learning.
114
+ - **Two-step training.**
115
+ - (i) **Pretrain** the CNN paths with class-balanced sampling;
116
+ - (ii) **fine-tune** the full network with sequential batches, using **lower LR** for CNNs and **higher LR** for the
117
+ sequence encoder.
118
+ - **State handling.** BiLSTM states are **reinitialized per subject** so that temporal context
119
+ does not leak across recordings.
120
+
121
+
122
+ .. rubric:: Usage and Configuration
123
+
124
+ - **Epoch pipeline.** Use **two parallel CNNs** with the first conv sized to **Fs/2** (small path)
125
+ and **4·Fs** (large path), with strides **Fs/16** and **Fs/2**, respectively; stack three more
126
+ conv blocks with small kernels, plus **max pooling** in each path. Concatenate path outputs
127
+ to form epoch embeddings.
128
+ - **Sequence encoder.** Apply **two-layer BiLSTM (peepholes)** over the sequence of embeddings;
129
+ add a **projection MLP** on the CNN features and **sum** with BiLSTM outputs (residual).
130
+ Finish with :class:`~torch.nn.Linear` per epoch.
131
+ - **Reference implementation.** See the official repository for a faithful implementation and
132
+ training scripts.
19
133
 
20
134
  Parameters
21
135
  ----------
@@ -32,7 +146,6 @@ class DeepSleepNet(EEGModuleMixin, nn.Module):
32
146
  drop_prob : float, default=0.5
33
147
  The dropout rate for regularization. Values should be between 0 and 1.
34
148
 
35
-
36
149
  References
37
150
  ----------
38
151
  .. [Supratak2017] Supratak, A., Dong, H., Wu, C., & Guo, Y. (2017).
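The DeepSleepNet docstring above ties the first-layer kernel and stride of each CNN path to the sampling rate ``Fs``. A small sketch of that sizing rule follows. The helper names `first_layer_sizes` and `conv_out_length` are hypothetical, rounding down to an integer is an assumption, and the output lengths assume an unpadded convolution, which may differ from the padding used in the actual implementation.

```python
def first_layer_sizes(sfreq: float) -> dict:
    """First-conv kernel/stride per path, following the Fs-based rules in the docstring."""
    small = {"kernel": int(sfreq // 2), "stride": int(sfreq // 16)}  # timing-oriented path
    large = {"kernel": int(sfreq * 4), "stride": int(sfreq // 2)}    # frequency-oriented path
    return {"small": small, "large": large}


def conv_out_length(n_times: int, kernel: int, stride: int) -> int:
    """Output length of an unpadded 1-D convolution."""
    return (n_times - kernel) // stride + 1


sfreq = 100                      # Hz, illustrative value
n_times = 30 * sfreq             # one 30-s epoch
sizes = first_layer_sizes(sfreq)
lengths = {
    name: conv_out_length(n_times, cfg["kernel"], cfg["stride"])
    for name, cfg in sizes.items()
}
print(sizes, lengths)
```

The small path keeps many more time steps per epoch than the large path, which matches the docstring's description of one path favouring temporal precision and the other favouring frequency precision.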
@@ -12,33 +12,129 @@ from braindecode.modules import FeedForwardBlock, MultiHeadAttention
12
12
 
13
13
 
14
14
  class EEGConformer(EEGModuleMixin, nn.Module):
15
- """EEG Conformer from Song et al. (2022) from [song2022]_.
15
+ """EEG Conformer from Song et al. (2022) [song2022]_.
16
16
 
17
- .. figure:: https://raw.githubusercontent.com/eeyhsong/EEG-Conformer/refs/heads/main/visualization/Fig1.png
17
+ :bdg-success:`Convolution` :bdg-info:`Small Attention`
18
+
19
+ .. figure:: https://raw.githubusercontent.com/eeyhsong/EEG-Conformer/refs/heads/main/visualization/Fig1.png
18
20
  :align: center
19
21
  :alt: EEGConformer Architecture
22
+ :width: 600px
23
+
24
+
25
+ .. rubric:: Architectural Overview
26
+
27
+ EEG-Conformer is a *convolution-first* model augmented with a *lightweight transformer
28
+ encoder*. The end-to-end flow is:
29
+
30
+ - (i) :class:`_PatchEmbedding` converts the continuous EEG into a compact sequence of tokens via a
31
+ :class:`ShallowFBCSPNet` temporal–spatial conv stem and temporal pooling;
32
+ - (ii) :class:`_TransformerEncoder` applies small multi-head self-attention to integrate
33
+ longer-range temporal context across tokens;
34
+ - (iii) :class:`_ClassificationHead` aggregates the sequence and performs a linear readout.
35
+ This preserves the strong inductive biases of shallow CNN filter banks while adding
36
+ just enough attention to capture dependencies beyond the pooling horizon [song2022]_.
37
+
38
+ .. rubric:: Macro Components
39
+
40
+ - :class:`_PatchEmbedding` **(Shallow conv stem → tokens)**
41
+
42
+ - *Operations.*
43
+ - A temporal convolution (:class:`torch.nn.Conv2d`) with kernel ``(1 x L_t)`` forms a data-driven "filter bank";
44
+ - A spatial convolution (:class:`torch.nn.Conv2d`) with kernel ``(n_chans x 1)`` projects across electrodes,
45
+ collapsing the channel axis into a virtual channel.
46
+ - **Normalization function** :class:`torch.nn.BatchNorm2d`
47
+ - **Activation function** :class:`torch.nn.ELU`
48
+ - **Average Pooling** :class:`torch.nn.AvgPool2d` along time (kernel ``(1, P)`` with stride ``(1, S)``)
49
+ - final ``1x1`` :class:`torch.nn.Conv2d` projection.
50
+
51
+ The result is rearranged to a token sequence ``(B, S_tokens, D)``, where ``D = n_filters_time``.
52
+
53
+ *Interpretability/robustness.* Temporal kernels can be inspected as FIR filters;
54
+ the spatial conv yields channel projections analogous to :class:`ShallowFBCSPNet`’s learned
55
+ spatial filters. Temporal pooling stabilizes statistics and reduces sequence length.
56
+
57
+ - :class:`_TransformerEncoder` **(context over temporal tokens)**
58
+
59
+ - *Operations.*
60
+ - A stack of ``att_depth`` encoder blocks (:class:`_TransformerEncoderBlock`).
61
+ - Each block applies LayerNorm :class:`torch.nn.LayerNorm`
62
+ - Multi-Head Self-Attention (``att_heads``) with dropout + residual :class:`MultiHeadAttention` (:class:`torch.nn.Dropout`)
63
+ - LayerNorm :class:`torch.nn.LayerNorm`
64
+ - 2-layer feed-forward (≈4x expansion, :class:`torch.nn.GELU`) with dropout + residual.
65
+
66
+ Shapes remain ``(B, S_tokens, D)`` throughout.
20
67
 
21
- Convolutional Transformer for EEG decoding.
68
+ *Role.* Small attention focuses on interactions among *temporal patches* (not channels),
69
+ extending effective receptive fields at modest cost.
22
70
 
23
- The paper and original code with more details about the methodological
24
- choices are available at the [song2022]_ and [ConformerCode]_.
71
+ - :class:`_ClassificationHead` **(aggregation + readout)**
25
72
 
26
- This neural network architecture receives a traditional braindecode input.
27
- The input shape should be three-dimensional matrix representing the EEG
28
- signals.
73
+ - *Operations.*
74
+ - Flatten (:class:`torch.nn.Flatten`) the sequence to ``(B, S_tokens·D)``;
75
+ - MLP (:class:`torch.nn.Linear` → activation (default: :class:`torch.nn.ELU`) → :class:`torch.nn.Dropout` → :class:`torch.nn.Linear`)
76
+ - final Linear to classes.
29
77
 
30
- `(batch_size, n_channels, n_timesteps)`.
78
+ With ``return_features=True``, features before the last Linear can be exported for
79
+ linear probing or downstream tasks.
31
80
 
32
- The EEG Conformer architecture is composed of three modules:
33
- - PatchEmbedding
34
- - TransformerEncoder
35
- - ClassificationHead
81
+ .. rubric:: Convolutional Details
82
+
83
+ - **Temporal (where time-domain patterns are learned).**
84
+ The initial ``(1 x L_t)`` conv per channel acts as a *learned filter bank* for oscillatory
85
+ bands and transients. Subsequent **AvgPool** along time performs local integration,
86
+ converting activations into “patches” (tokens). Pool length/stride control the
87
+ token rate and set the lower bound on temporal context within each token.
88
+
89
+ - **Spatial (how electrodes are processed).**
90
+ A single conv with kernel ``(n_chans x 1)`` spans the full montage to learn spatial
91
+ projections for each temporal feature map, collapsing the channel axis into a
92
+ virtual channel before tokenization. This mirrors the shallow spatial step in
93
+ :class:`ShallowFBCSPNet` (temporal filters → spatial projection → temporal condensation).
94
+
95
+ - **Spectral (how frequency content is captured).**
96
+ No explicit Fourier/wavelet stage is used. Spectral selectivity emerges implicitly
97
+ from the learned temporal kernels; pooling further smooths high-frequency noise.
98
+ The effective spectral resolution is thus governed by ``L_t`` and the pooling
99
+ configuration.
100
+
101
+ .. rubric:: Attention / Sequential Modules
102
+
103
+ - **Type.** Standard multi-head self-attention (MHA) with ``att_heads`` heads over the token sequence.
104
+ - **Shapes.** Input/Output: ``(B, S_tokens, D)``; attention operates along the ``S_tokens`` axis.
105
+ - **Role.** Re-weights and integrates evidence across pooled windows, capturing dependencies
106
+ longer than any single token while leaving channel relationships to the convolutional stem.
107
+ The design is intentionally *small*—attention refines rather than replaces convolutional feature extraction.
108
+
109
+ .. rubric:: Additional Mechanisms
110
+
111
+ - **Parallel with ShallowFBCSPNet.** Both begin with a learned temporal filter bank,
112
+ spatial projection across electrodes, and early temporal condensation.
113
+ :class:`ShallowFBCSPNet` then computes band-power (via squaring/log-variance), whereas
114
+ EEG-Conformer applies BN/ELU and **continues with attention** over tokens to
115
+ refine temporal context before classification.
116
+
117
+ - **Tokenization knob.** ``pool_time_length`` and especially ``pool_time_stride`` set
118
+ the number of tokens ``S_tokens``. Smaller strides → more tokens and higher attention
119
+ capacity (but higher compute); larger strides → fewer tokens and stronger inductive bias.
120
+
121
+ - **Embedding dimension = filters.** ``n_filters_time`` serves double duty as both the
122
+ number of temporal filters in the stem and the transformer’s embedding size ``D``,
123
+ simplifying dimensional alignment.
124
+
125
+ .. rubric:: Usage and Configuration
126
+
127
+ - **Instantiation.** Choose ``n_filters_time`` (embedding size ``D``) and
128
+ ``filter_time_length`` to match the rhythms of interest. Tune
129
+ ``pool_time_length/stride`` to trade temporal resolution for sequence length.
130
+ Keep ``att_depth`` modest (e.g., 4–6) and set ``att_heads`` to divide ``D``.
131
+ ``final_fc_length="auto"`` infers the flattened size from PatchEmbedding.
36
132
 
37
133
  Notes
38
134
  -----
39
135
  The authors recommend using data augmentation before using Conformer,
40
136
  e.g. segmentation and recombination,
41
- Please refer to the original paper and code for more details.
137
+ Please refer to the original paper and code for more details [ConformerCode]_.
42
138
 
43
139
  The model was initially tuned on 4 seconds of 250 Hz data.
44
140
  Please adjust the scale of the temporal convolutional layer,
@@ -47,7 +143,10 @@ class EEGConformer(EEGModuleMixin, nn.Module):
47
143
  .. versionadded:: 0.8
48
144
 
49
145
  We aggregate the parameters based on the parts of the models, or
50
- when the parameters were used first, e.g. n_filters_time.
146
+ when the parameters were used first, e.g. ``n_filters_time``.
147
+
148
+ .. versionadded:: 1.1
149
+
51
150
 
52
151
  Parameters
53
152
  ----------
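The EEGConformer docstring above explains that ``pool_time_length`` and ``pool_time_stride`` set the number of tokens ``S_tokens``, that ``att_heads`` should divide ``D``, and that the flattened size follows from the patch embedding. A rough sketch of that bookkeeping is given below. The helper names `n_tokens` and `head_dim` are hypothetical, the default values (kernel 25, pool 75, stride 15, ``D = 40``, 10 heads) are assumptions rather than values taken from this diff, and both the temporal convolution and the pooling are assumed to be unpadded.

```python
def n_tokens(n_times, filter_time_length=25, pool_time_length=75, pool_time_stride=15):
    """Number of tokens after an unpadded temporal conv followed by average pooling."""
    after_conv = n_times - filter_time_length + 1
    if after_conv < pool_time_length:
        raise ValueError("input too short for the chosen kernel and pooling window")
    return (after_conv - pool_time_length) // pool_time_stride + 1


def head_dim(embed_dim: int, att_heads: int) -> int:
    """Per-head width; multi-head attention needs att_heads to divide embed_dim."""
    if embed_dim % att_heads != 0:
        raise ValueError("att_heads must divide the embedding size D")
    return embed_dim // att_heads


embed_dim = 40                       # D = n_filters_time (assumed default)
tokens = n_tokens(4 * 250)           # 4 s at 250 Hz, as mentioned in the Notes
flattened = tokens * embed_dim       # size seen by the classification head
print(tokens, flattened, head_dim(embed_dim, 10))
```

Halving ``pool_time_stride`` roughly doubles the token count, so the attention cost, which grows quadratically with sequence length, rises by about a factor of four.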
@@ -15,12 +15,84 @@ from braindecode.modules import DepthwiseConv2d, Ensure4d, InceptionBlock
15
15
  class EEGInceptionERP(EEGModuleMixin, nn.Sequential):
16
16
  """EEG Inception for ERP-based from Santamaria-Vazquez et al (2020) [santamaria2020]_.
17
17
 
18
+ :bdg-success:`Convolution`
19
+
18
20
  .. figure:: https://braindecode.org/dev/_static/model/eeginceptionerp.jpg
19
21
  :align: center
20
22
  :alt: EEGInceptionERP Architecture
21
23
 
22
- The code for the paper and this model is also available at [santamaria2020]_
23
- and an adaptation for PyTorch [2]_.
24
+ Figure: Overview of EEG-Inception architecture. 2D convolution blocks and depthwise 2D convolution blocks include batch normalization, activation and dropout regularization. The kernel size is displayed for convolutional and average pooling layers.
25
+
26
+ .. rubric:: Architectural Overview
27
+
28
+ A two-stage, multi-scale CNN tailored to ERP detection from short (0-1000 ms) single-trial epochs. Signals are mapped through
29
+ * (i) :class:`_InceptionModule1` multi-scale temporal feature extraction plus per-branch spatial mixing;
30
+ * (ii) :class:`_InceptionModule2` deeper multi-scale refinement at a reduced temporal resolution; and
31
+ * (iii) :class:`_OutputModule` compact aggregation and linear readout.
32
+
33
+ .. rubric:: Macro Components
34
+
35
+ - :class:`_InceptionModule1` **(multi-scale temporal + spatial mixing)**
36
+
37
+ - *Operations.*
38
+ - `EEGInceptionERP.c1`: :class:`torch.nn.Conv2d` ``k=(64,1)``, stride ``(1,1)``, *same* pad on input reshaped to ``(B,1,128,8)`` → BN → activation → dropout.
39
+ - `EEGInceptionERP.d1`: :class:`torch.nn.Conv2d` (depthwise) ``k=(1,8)``, *valid* pad over channels → BN → activation → dropout.
40
+ - `EEGInceptionERP.c2`: :class:`torch.nn.Conv2d` ``k=(32,1)`` → BN → activation → dropout; then `EEGInceptionERP.d2` depthwise ``k=(1,8)`` → BN → activation → dropout.
41
+ - `EEGInceptionERP.c3`: :class:`torch.nn.Conv2d` ``k=(16,1)`` → BN → activation → dropout; then `EEGInceptionERP.d3` depthwise ``k=(1,8)`` → BN → activation → dropout.
42
+ - `EEGInceptionERP.n1`: concatenation (:func:`torch.cat`) over branch features.
43
+ - `EEGInceptionERP.a1`: :class:`torch.nn.AvgPool2d` ``pool=(4,1)``, stride ``(4,1)`` for temporal downsampling.
44
+
45
+ *Interpretability/robustness.* Depthwise `1 x n_chans` layers act as learnable montage-wide spatial filters per temporal scale; pooling stabilizes against jitter.
46
+
47
+ - :class:`_InceptionModule2` **(refinement at coarser timebase)**
48
+
49
+ - *Operations.*
50
+ - `EEGInceptionERP.c4`: :class:`torch.nn.Conv2d` ``k=(16,1)`` → BN → activation → dropout.
51
+ - `EEGInceptionERP.c5`: :class:`torch.nn.Conv2d` ``k=(8,1)`` → BN → activation → dropout.
52
+ - `EEGInceptionERP.c6`: :class:`torch.nn.Conv2d` ``k=(4,1)`` → BN → activation → dropout.
53
+ - `EEGInceptionERP.n2`: concatenation (:func:`torch.cat`) merging C4-C6 outputs.
54
+ - `EEGInceptionERP.a2`: :class:`torch.nn.AvgPool2d` ``pool=(2,1)``, stride ``(2,1)``.
55
+ - `EEGInceptionERP.c7`: :class:`torch.nn.Conv2d` ``k=(8,1)`` → BN → activation → dropout; then `EEGInceptionERP.a3`: :class:`torch.nn.AvgPool2d` ``pool=(2,1)``.
56
+ - `EEGInceptionERP.c8`: :class:`torch.nn.Conv2d` ``k=(4,1)`` → BN → activation → dropout; then `EEGInceptionERP.a4`: :class:`torch.nn.AvgPool2d` ``pool=(2,1)``.
57
+
58
+ *Role.* Adds higher-level, shorter-window evidence while progressively compressing temporal dimension.
59
+
60
+ - :class:`_OutputModule` **(aggregation + readout)**
61
+
62
+ - *Operations.*
63
+ - :class:`torch.nn.Flatten`
64
+ - :class:`torch.nn.Linear` ``(features → 2)``
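The operations listed above imply a fixed sequence of temporal and channel sizes. The sketch below traces those sizes in plain Python. It assumes the ``(128, 8)`` input from the docstring, *same* padding for every temporal convolution (the docstring states this only for ``c1``), and a pooling stride equal to the pooling kernel. The helpers ``same_conv``, ``valid_conv``, ``avg_pool`` and ``trace`` are illustrative names and are not part of braindecode.

```python
def same_conv(length: int) -> int:
    """A stride-1 convolution with 'same' padding keeps the length."""
    return length


def valid_conv(length: int, kernel: int) -> int:
    """A stride-1 convolution without padding shortens the axis."""
    return length - kernel + 1


def avg_pool(length: int, kernel: int) -> int:
    """Unpadded average pooling with stride equal to the kernel size."""
    return (length - kernel) // kernel + 1


def trace(n_times: int = 128, n_chans: int = 8) -> dict:
    """Return the (time, channel) size after each stage named in the docstring."""
    t, c = n_times, n_chans
    shapes = {"input": (t, c)}
    t = same_conv(t)                # c1-c3: temporal kernels 64, 32, 16
    shapes["c1-c3"] = (t, c)
    c = valid_conv(c, n_chans)      # d1-d3: depthwise kernel spans all channels
    shapes["d1-d3"] = (t, c)
    t = avg_pool(t, 4)              # a1
    shapes["a1"] = (t, c)
    t = same_conv(t)                # c4-c6: temporal kernels 16, 8, 4
    shapes["c4-c6"] = (t, c)
    t = avg_pool(t, 2)              # a2
    shapes["a2"] = (t, c)
    t = avg_pool(same_conv(t), 2)   # c7 followed by a3
    shapes["c7+a3"] = (t, c)
    t = avg_pool(same_conv(t), 2)   # c8 followed by a4
    shapes["c8+a4"] = (t, c)
    return shapes


for stage, (t, c) in trace().items():
    print(f"{stage:>6}: time={t:3d}, chans={c}")
```

Under these assumptions the time axis shrinks from 128 to 4 samples and the channel axis collapses to 1, so the flattened feature size equals four times the number of feature maps produced by ``c8``.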
65
+
66
+ .. rubric:: Convolutional Details
67
+
68
+ - **Temporal (where time-domain patterns are learned).**
69
+ First module uses 1D temporal kernels along the 128-sample axis: ``64``, ``32``, ``16``
70
+ (≈500, 250, 125 ms at 128 Hz). After ``pool=(4,1)``, the second module applies ``16``,
71
+ ``8``, ``4`` (≈500, 250, 125 ms at the pooled 32 Hz rate). All strides are ``1`` in convs;
72
+ temporal resolution changes only via average pooling.
73
+
74
+ - **Spatial (how electrodes are processed).**
75
+ Depthwise convs with ``k=(1,8)`` span all channels and are applied **per temporal branch**,
76
+ yielding scale-specific channel projections (no cross-branch mixing until concatenation).
77
+ There is no full 2D mixing kernel; spatial mixing is factorized and lightweight.
78
+
79
+ - **Spectral (how frequency information is captured).**
80
+ No explicit transform; multiple temporal kernels form a *learned filter bank* over
81
+ ERP-relevant bands. Successive pooling acts as low-pass integration to emphasize sustained
82
+ post-stimulus components.
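The kernel durations quoted in the temporal bullet follow from dividing kernel length by sampling rate. The sketch below reproduces that arithmetic, assuming 128 Hz input and that ``pool=(4,1)`` divides the effective rate by four. ``kernel_ms`` is an illustrative helper, not a braindecode function. The second-stage figures depend on the reference rate: 16, 8 and 4 samples span 500, 250 and 125 ms at the pooled 32 Hz rate, and 125, 62.5 and 31.25 ms only when counted against the original 128 Hz clock.

```python
def kernel_ms(kernel_samples: int, rate_hz: float) -> float:
    """Duration in milliseconds covered by a temporal kernel."""
    return 1000.0 * kernel_samples / rate_hz


base_rate = 128.0            # sampling rate assumed by the docstring
pooled_rate = base_rate / 4  # effective rate after pool=(4,1)

first_stage = [kernel_ms(k, base_rate) for k in (64, 32, 16)]
second_stage_pooled = [kernel_ms(k, pooled_rate) for k in (16, 8, 4)]
second_stage_base = [kernel_ms(k, base_rate) for k in (16, 8, 4)]

print(first_stage)          # first Inception module at 128 Hz
print(second_stage_pooled)  # second module measured at the pooled rate
print(second_stage_base)    # same kernels measured at the original rate
```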
83
+
84
+ .. rubric:: Additional Mechanisms
85
+
86
+ - Every conv/depthwise block includes **BatchNorm**, nonlinearity (paper used grid-searched activation), and **dropout**.
87
+ - Two Inception stages followed by short convs and pooling keep parameters small (≈15k reported) while preserving multi-scale evidence.
88
+ - Expected input: epochs of shape ``(B,1,128,8)`` (time x channels as a 2D map) or reshaped from ``(B,8,128)`` with an added singleton feature dimension.
89
+
90
+ .. rubric:: Usage and Configuration
91
+
92
+ - **Key knobs.** Number of filters per branch; kernel lengths in both Inception modules; depthwise kernel over channels (typically ``n_chans``); pooling lengths/strides; dropout rate; choice of activation.
93
+ - **Training tips.** Use 0-1000 ms windows at 128 Hz with CAR; tune activation and dropout (they strongly affect performance); early-stop on validation loss when overfitting emerges.
94
+
95
+ .. rubric:: Implementation Details
24
96
 
25
97
  The model is strongly based on the original InceptionNet for an image. The main goal is
26
98
  to extract features in parallel with different scales. The authors extracted three scales
@@ -33,12 +105,9 @@ class EEGInceptionERP(EEGModuleMixin, nn.Sequential):
33
105
  The winners of the BEETL Competition/NeurIPS 2021 used parts of the
34
106
  model [beetl]_.
35
107
 
36
- The model is fully described in [santamaria2020]_.
108
+ The code for the paper and this model is also available at [santamaria2020]_
109
+ and an adaptation for PyTorch [2]_.
37
110
 
38
- Notes
39
- -----
40
- This implementation is not guaranteed to be correct, has not been checked
41
- by original authors, only reimplemented from the paper based on [2]_.
42
111
 
43
112
  Parameters
44
113
  ----------