eegdash 0.0.6__tar.gz → 0.0.8__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


This version of eegdash might be problematic.

eegdash-0.0.8/PKG-INFO ADDED
@@ -0,0 +1,157 @@
1
+ Metadata-Version: 2.4
2
+ Name: eegdash
3
+ Version: 0.0.8
4
+ Summary: EEG data for machine learning
5
+ Author-email: Young Truong <dt.young112@gmail.com>, Arnaud Delorme <adelorme@gmail.com>
6
+ License: GNU General Public License
7
+
8
+ Copyright (C) 2024-2025
9
+
10
+ Young Truong, UCSD, dt.young112@gmail.com
11
+ Arnaud Delorme, UCSD, adelorme@ucsd.edu
12
+
13
+ This program is free software; you can redistribute it and/or modify
14
+ it under the terms of the GNU General Public License as published by
15
+ the Free Software Foundation; either version 2 of the License, or
16
+ (at your option) any later version.
17
+
18
+ This program is distributed in the hope that it will be useful,
19
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
20
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
21
+ GNU General Public License for more details.
22
+
23
+ You should have received a copy of the GNU General Public License
24
+ along with this program; if not, write to the Free Software
25
+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
26
+
27
+ Project-URL: Homepage, https://github.com/sccn/EEG-Dash-Data
28
+ Project-URL: Issues, https://github.com/sccn/EEG-Dash-Data/issues
29
+ Classifier: Programming Language :: Python :: 3
30
+ Classifier: License :: OSI Approved :: MIT License
31
+ Classifier: Operating System :: OS Independent
32
+ Requires-Python: >=3.8
33
+ Description-Content-Type: text/markdown
34
+ License-File: LICENSE
35
+ Requires-Dist: xarray
36
+ Requires-Dist: python-dotenv
37
+ Requires-Dist: s3fs
38
+ Requires-Dist: mne
39
+ Requires-Dist: pynwb
40
+ Requires-Dist: h5py
41
+ Requires-Dist: pymongo
42
+ Requires-Dist: joblib
43
+ Requires-Dist: braindecode
44
+ Requires-Dist: mne-bids
45
+ Dynamic: license-file
46
+
47
+ # EEG-Dash
48
+ To leverage recent and ongoing advancements in large-scale computational methods and to ensure the preservation of scientific data generated from publicly funded research, the EEG-DaSh data archive will create a data-sharing resource for MEEG (EEG, MEG) data contributed by collaborators for machine learning (ML) and deep learning (DL) applications.
49
+
50
+ ## Data source
51
+ The data in EEG-DaSh originates from a collaboration involving 25 laboratories and encompasses 27,053 participants. The collection consists of MEEG data, that is, EEG and MEG signals, sourced from studies conducted by these labs on both healthy subjects and clinical populations with conditions such as ADHD, depression, schizophrenia, dementia, autism, and psychosis. The data also spans different mental states such as sleep, meditation, and cognitive tasks. In addition, EEG-DaSh will incorporate a subset of the data converted from NEMAR, which includes 330 MEEG BIDS-formatted datasets, further expanding the archive with well-curated, standardized neuroelectromagnetic data.
52
+
53
+ ## Available data
54
+
55
+ The following datasets are currently available on EEGDash.
56
+
57
+ | DatasetID | Participants | Files | Sessions | Population | Channels | Is 10-20? | Modality | Size |
58
+ |---|---|---|---|---|---|---|---|---|
59
+ | [ds002181](https://nemar.org/dataexplorer/detail?dataset_id=ds002181) | 20 | 949 | 1 | Healthy | 63 | 10-20 | Visual | 0.163 GB |
60
+ | [ds002578](https://nemar.org/dataexplorer/detail?dataset_id=ds002578) | 2 | 22 | 1 | Healthy | 256 | 10-20 | Visual | 0.001 TB |
61
+ | [ds002680](https://nemar.org/dataexplorer/detail?dataset_id=ds002680) | 14 | 4977 | 2 | Healthy | 0 | 10-20 | Visual | 0.01 TB |
62
+ | [ds002691](https://nemar.org/dataexplorer/detail?dataset_id=ds002691) | 20 | 146 | 1 | Healthy | 32 | other | Visual | 0.001 TB |
63
+ | [ds002718](https://nemar.org/dataexplorer/detail?dataset_id=ds002718) | 18 | 582 | 1 | Healthy | 70 | other | Visual | 0.005 TB |
64
+ | [ds003061](https://nemar.org/dataexplorer/detail?dataset_id=ds003061) | 13 | 282 | 1 | Not specified | 64 | 10-20 | Auditory | 0.002 TB |
65
+ | [ds003690](https://nemar.org/dataexplorer/detail?dataset_id=ds003690) | 75 | 2630 | 1 | Healthy | 61 | 10-20 | Auditory | 0.023 TB |
66
+ | [ds003805](https://nemar.org/dataexplorer/detail?dataset_id=ds003805) | 1 | 10 | 1 | Healthy | 19 | 10-20 | Multisensory | 0 TB |
67
+ | [ds003838](https://nemar.org/dataexplorer/detail?dataset_id=ds003838) | 65 | 947 | 1 | Healthy | 63 | 10-20 | Auditory | 100.2 GB |
68
+ | [ds004010](https://nemar.org/dataexplorer/detail?dataset_id=ds004010) | 24 | 102 | 1 | Healthy | 64 | other | Multisensory | 0.025 TB |
69
+ | [ds004040](https://nemar.org/dataexplorer/detail?dataset_id=ds004040) | 13 | 160 | 2 | Healthy | 64 | 10-20 | Auditory | 0.012 TB |
70
+ | [ds004350](https://nemar.org/dataexplorer/detail?dataset_id=ds004350) | 24 | 960 | 2 | Healthy | 64 | other | Visual | 0.023 TB |
71
+ | [ds004362](https://nemar.org/dataexplorer/detail?dataset_id=ds004362) | 109 | 9162 | 1 | Healthy | 64 | 10-20 | Visual | 0.008 TB |
72
+ | [ds004504](https://nemar.org/dataexplorer/detail?dataset_id=ds004504) | 88 | 269 | 1 | Dementia | 19 | 10-20 | Resting State | 2.6 GB |
73
+ | [ds004554](https://nemar.org/dataexplorer/detail?dataset_id=ds004554) | 16 | 101 | 1 | Healthy | 99 | 10-20 | Visual | 0.009 TB |
74
+ | [ds004635](https://nemar.org/dataexplorer/detail?dataset_id=ds004635) | 48 | 292 | 1 | Healthy | 129 | other | Multisensory | 26.1 GB |
75
+ | [ds004657](https://nemar.org/dataexplorer/detail?dataset_id=ds004657) | 24 | 838 | 6 | Not specified | 64 | 10-20 | Motor | 43.1 GB |
76
+ | [ds004660](https://nemar.org/dataexplorer/detail?dataset_id=ds004660) | 21 | 299 | 1 | Healthy | 32 | 10-20 | Multisensory | 7.2 GB |
77
+ | [ds004661](https://nemar.org/dataexplorer/detail?dataset_id=ds004661) | 17 | 90 | 1 | Not specified | 64 | 10-20 | Multisensory | 1.4 GB |
78
+ | [ds004745](https://nemar.org/dataexplorer/detail?dataset_id=ds004745) | 52 | 762 | 1 | Healthy | 64 | ? | Auditory | 0 TB |
79
+ | [ds004785](https://nemar.org/dataexplorer/detail?dataset_id=ds004785) | 17 | 74 | 1 | Healthy | 32 | ? | Motor | 0 TB |
80
+ | [ds004841](https://nemar.org/dataexplorer/detail?dataset_id=ds004841) | 20 | 1034 | 2 | Not specified | 64 | 10-20 | Multisensory | 7.3 GB |
81
+ | [ds004842](https://nemar.org/dataexplorer/detail?dataset_id=ds004842) | 14 | 719 | 2 | Not specified | 64 | ? | Multisensory | 5.2 GB |
82
+ | [ds004843](https://nemar.org/dataexplorer/detail?dataset_id=ds004843) | 14 | 649 | 1 | Not specified | 64 | ? | Visual | 7.7 GB |
83
+ | [ds004844](https://nemar.org/dataexplorer/detail?dataset_id=ds004844) | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 22.3 GB |
84
+ | [ds004849](https://nemar.org/dataexplorer/detail?dataset_id=ds004849) | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
85
+ | [ds004850](https://nemar.org/dataexplorer/detail?dataset_id=ds004850) | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
86
+ | [ds004851](https://nemar.org/dataexplorer/detail?dataset_id=ds004851) | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
87
+ | [ds004852](https://nemar.org/dataexplorer/detail?dataset_id=ds004852) | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
88
+ | [ds004853](https://nemar.org/dataexplorer/detail?dataset_id=ds004853) | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
89
+ | [ds004854](https://nemar.org/dataexplorer/detail?dataset_id=ds004854) | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
90
+ | [ds004855](https://nemar.org/dataexplorer/detail?dataset_id=ds004855) | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
91
+ | [ds005034](https://nemar.org/dataexplorer/detail?dataset_id=ds005034) | 25 | 406 | 2 | Healthy | 129 | ? | Visual | 61.4 GB |
92
+ | [ds005079](https://nemar.org/dataexplorer/detail?dataset_id=ds005079) | 1 | 210 | 12 | Healthy | 64 | ? | Multisensory | 1.7 GB |
93
+ | [ds005342](https://nemar.org/dataexplorer/detail?dataset_id=ds005342) | 32 | 134 | 1 | Healthy | 17 | ? | Visual | 2 GB |
94
+ | [ds005410](https://nemar.org/dataexplorer/detail?dataset_id=ds005410) | 81 | 492 | 1 | Healthy | 63 | ? | ? | 19.8 GB |
95
+ | [ds005505](https://nemar.org/dataexplorer/detail?dataset_id=ds005505) | 136 | 5393 | 1 | Healthy | 129 | other | Visual | 103 GB |
96
+ | [ds005506](https://nemar.org/dataexplorer/detail?dataset_id=ds005506) | 150 | 5645 | 1 | Healthy | 129 | other | Visual | 112 GB |
97
+ | [ds005507](https://nemar.org/dataexplorer/detail?dataset_id=ds005507) | 184 | 7273 | 1 | Healthy | 129 | other | Visual | 140 GB |
98
+ | [ds005508](https://nemar.org/dataexplorer/detail?dataset_id=ds005508) | 324 | 13393 | 1 | Healthy | 129 | other | Visual | 230 GB |
99
+ | [ds005509](https://nemar.org/dataexplorer/detail?dataset_id=ds005509) | 330 | 19980 | 1 | Healthy | 129 | other | Visual | 224 GB |
100
+ | [ds005510](https://nemar.org/dataexplorer/detail?dataset_id=ds005510) | 135 | 4933 | 1 | Healthy | 129 | other | Visual | 91 GB |
101
+ | [ds005511](https://nemar.org/dataexplorer/detail?dataset_id=ds005511) | 381 | 18604 | 1 | Healthy | 129 | other | Visual | 245 GB |
102
+ | [ds005512](https://nemar.org/dataexplorer/detail?dataset_id=ds005512) | 257 | 9305 | 1 | Healthy | 129 | other | Visual | 157 GB |
103
+ | [ds005514](https://nemar.org/dataexplorer/detail?dataset_id=ds005514) | 295 | 11565 | 1 | Healthy | 129 | other | Visual | 185 GB |
104
+ | [ds005672](https://nemar.org/dataexplorer/detail?dataset_id=ds005672) | 3 | 18 | 1 | Healthy | 64 | 10-20 | Visual | 4.2 GB |
105
+ | [ds005697](https://nemar.org/dataexplorer/detail?dataset_id=ds005697) | 52 | 210 | 1 | Healthy | 64 | 10-20 | Visual | 67 GB |
106
+ | [ds005787](https://nemar.org/dataexplorer/detail?dataset_id=ds005787) | 30 | ? | 4 | Healthy | 64 | 10-20 | Visual | 185 GB |
107
+
108
+ ## Data format
109
+ EEGDash queries return a **PyTorch Dataset**, an efficient, scalable, and flexible structure for machine learning (ML) and deep learning (DL) applications. PyTorch Datasets integrate directly with PyTorch’s DataLoader, which provides the batching, shuffling, and parallel data loading needed to train deep learning models on large EEG datasets.
110
+
111
+ ## Data preprocessing
112
+ EEGDash datasets are processed using the popular [BrainDecode](https://braindecode.org/stable/index.html) library. In fact, EEGDash datasets are BrainDecode datasets, which are themselves PyTorch datasets. This means that any preprocessing possible on BrainDecode datasets is also possible on EEGDash datasets. Refer to [BrainDecode](https://braindecode.org/stable/index.html) tutorials for guidance on preprocessing EEG data.
113
+
114
+ ## EEG-Dash usage
115
+
116
+ ### Install
117
+ Use your preferred Python environment manager with Python > 3.9 to install the package.
118
+ * To install the eegdash package, use the following temporary command (a direct `pip install eegdash` option will be available soon): `pip install -i https://test.pypi.org/simple/ eegdash`
119
+ * To verify the installation, start a Python session and type: `from eegdash import EEGDash`
120
+
121
+ ### Data access
122
+
123
+ To use the data from a single subject, enter:
124
+
125
+ ```python
126
+ from eegdash import EEGDashDataset
127
+ ds_NDARDB033FW5 = EEGDashDataset({'dataset': 'ds005514', 'task': 'RestingState', 'subject': 'NDARDB033FW5'})
128
+ ```
129
+
130
+ This will search and download the metadata for the task **RestingState** for subject **NDARDB033FW5** in BIDS dataset **ds005514**. The actual data will not be downloaded at this stage; it is downloaded only when it is first processed. The **ds_NDARDB033FW5** object is a fully functional BrainDecode dataset, which is itself a PyTorch dataset. This [tutorial](https://github.com/sccn/EEGDash/blob/develop/notebooks/tutorial_eoec.ipynb) shows how to preprocess the EEG data, extract the portions containing eyes-open and eyes-closed segments, and then perform eyes-open vs. eyes-closed classification using a (shallow) deep-learning model.
131
+
132
+ To use the data from multiple subjects, enter:
133
+
134
+ ```python
135
+ from eegdash import EEGDashDataset
136
+ ds_ds005505rest = EEGDashDataset({'dataset': 'ds005505', 'task': 'RestingState'}, target_name='sex')
137
+ ```
138
+
139
+ This will search and download the metadata for the task 'RestingState' for all subjects in BIDS dataset 'ds005505' (a total of 136). As above, the actual data will not be downloaded at this stage so this command is quick to execute. Also, the target class for each subject is assigned using the target_name parameter. This means that this object is ready to be directly fed to a deep learning model, although the [tutorial script](https://github.com/sccn/EEGDash/blob/develop/notebooks/tutorial_sex_classification.ipynb) performs minimal processing on it, prior to training a deep-learning model. Because 14 gigabytes of data are downloaded, this tutorial takes about 10 minutes to execute.
140
+
141
+ ### Automatic caching
142
+
143
+ EEGDash automatically caches the downloaded data in the .eegdash_cache folder of the current directory from which the script is called. This means that if you run the tutorial [scripts](https://github.com/sccn/EEGDash/tree/develop/notebooks), the data will only be downloaded the first time the script is executed.
144
+
145
+ ## Education -- Coming soon...
146
+
147
+ We organize workshops and educational events to foster cross-cultural education and student training, offering both online and in-person opportunities in collaboration with US and Israeli partners. Events for 2025 will be announced via the EEGLABNEWS mailing list. Be sure to [subscribe](https://sccn.ucsd.edu/mailman/listinfo/eeglabnews).
148
+
149
+ ## About EEG-DaSh
150
+
151
+ EEG-DaSh is a collaborative initiative between the United States and Israel, supported by the National Science Foundation (NSF). The partnership brings together experts from the Swartz Center for Computational Neuroscience (SCCN) at the University of California San Diego (UCSD) and Ben-Gurion University (BGU) in Israel.
152
+
153
+ ![Screenshot 2024-10-03 at 09 14 06](https://github.com/user-attachments/assets/327639d3-c3b4-46b1-9335-37803209b0d3)
154
+
155
+
156
+
157
+
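As a rough illustration of the query dictionaries shown in the README above, the sketch below filters an in-memory list of records by exact key/value match. The `RECORDS` list, the two `EXAMPLE…` subject IDs, and the `find_records` helper are hypothetical and not part of eegdash. The package lists pymongo as a dependency, so the real lookup presumably runs against a database rather than a Python list.

```python
# Hypothetical metadata records; field names mirror the README query keys.
RECORDS = [
    {"dataset": "ds005514", "task": "RestingState", "subject": "NDARDB033FW5"},
    {"dataset": "ds005514", "task": "RestingState", "subject": "EXAMPLE0002"},
    {"dataset": "ds005505", "task": "RestingState", "subject": "EXAMPLE0003"},
]


def find_records(query, records=RECORDS):
    """Return the records whose fields equal every key/value pair in query."""
    return [
        rec for rec in records
        if all(rec.get(key) == value for key, value in query.items())
    ]


# A query with a 'subject' key narrows the result to one subject.
single = find_records(
    {"dataset": "ds005514", "task": "RestingState", "subject": "NDARDB033FW5"}
)
# Omitting 'subject' matches every subject in the dataset for that task.
multiple = find_records({"dataset": "ds005514", "task": "RestingState"})
print(len(single), len(multiple))
```

Adding a key to the query narrows the match, which is the difference between the single-subject and multi-subject calls in the README.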
@@ -0,0 +1,111 @@
1
+ # EEG-Dash
2
+ To leverage recent and ongoing advancements in large-scale computational methods and to ensure the preservation of scientific data generated from publicly funded research, the EEG-DaSh data archive will create a data-sharing resource for MEEG (EEG, MEG) data contributed by collaborators for machine learning (ML) and deep learning (DL) applications.
3
+
4
+ ## Data source
5
+ The data in EEG-DaSh originates from a collaboration involving 25 laboratories and encompasses 27,053 participants. The collection consists of MEEG data, that is, EEG and MEG signals, sourced from studies conducted by these labs on both healthy subjects and clinical populations with conditions such as ADHD, depression, schizophrenia, dementia, autism, and psychosis. The data also spans different mental states such as sleep, meditation, and cognitive tasks. In addition, EEG-DaSh will incorporate a subset of the data converted from NEMAR, which includes 330 MEEG BIDS-formatted datasets, further expanding the archive with well-curated, standardized neuroelectromagnetic data.
6
+
7
+ ## Available data
8
+
9
+ The following datasets are currently available on EEGDash.
10
+
11
+ | DatasetID | Participants | Files | Sessions | Population | Channels | Is 10-20? | Modality | Size |
12
+ |---|---|---|---|---|---|---|---|---|
13
+ | [ds002181](https://nemar.org/dataexplorer/detail?dataset_id=ds002181) | 20 | 949 | 1 | Healthy | 63 | 10-20 | Visual | 0.163 GB |
14
+ | [ds002578](https://nemar.org/dataexplorer/detail?dataset_id=ds002578) | 2 | 22 | 1 | Healthy | 256 | 10-20 | Visual | 0.001 TB |
15
+ | [ds002680](https://nemar.org/dataexplorer/detail?dataset_id=ds002680) | 14 | 4977 | 2 | Healthy | 0 | 10-20 | Visual | 0.01 TB |
16
+ | [ds002691](https://nemar.org/dataexplorer/detail?dataset_id=ds002691) | 20 | 146 | 1 | Healthy | 32 | other | Visual | 0.001 TB |
17
+ | [ds002718](https://nemar.org/dataexplorer/detail?dataset_id=ds002718) | 18 | 582 | 1 | Healthy | 70 | other | Visual | 0.005 TB |
18
+ | [ds003061](https://nemar.org/dataexplorer/detail?dataset_id=ds003061) | 13 | 282 | 1 | Not specified | 64 | 10-20 | Auditory | 0.002 TB |
19
+ | [ds003690](https://nemar.org/dataexplorer/detail?dataset_id=ds003690) | 75 | 2630 | 1 | Healthy | 61 | 10-20 | Auditory | 0.023 TB |
20
+ | [ds003805](https://nemar.org/dataexplorer/detail?dataset_id=ds003805) | 1 | 10 | 1 | Healthy | 19 | 10-20 | Multisensory | 0 TB |
21
+ | [ds003838](https://nemar.org/dataexplorer/detail?dataset_id=ds003838) | 65 | 947 | 1 | Healthy | 63 | 10-20 | Auditory | 100.2 GB |
22
+ | [ds004010](https://nemar.org/dataexplorer/detail?dataset_id=ds004010) | 24 | 102 | 1 | Healthy | 64 | other | Multisensory | 0.025 TB |
23
+ | [ds004040](https://nemar.org/dataexplorer/detail?dataset_id=ds004040) | 13 | 160 | 2 | Healthy | 64 | 10-20 | Auditory | 0.012 TB |
24
+ | [ds004350](https://nemar.org/dataexplorer/detail?dataset_id=ds004350) | 24 | 960 | 2 | Healthy | 64 | other | Visual | 0.023 TB |
25
+ | [ds004362](https://nemar.org/dataexplorer/detail?dataset_id=ds004362) | 109 | 9162 | 1 | Healthy | 64 | 10-20 | Visual | 0.008 TB |
26
+ | [ds004504](https://nemar.org/dataexplorer/detail?dataset_id=ds004504) | 88 | 269 | 1 | Dementia | 19 | 10-20 | Resting State | 2.6 GB |
27
+ | [ds004554](https://nemar.org/dataexplorer/detail?dataset_id=ds004554) | 16 | 101 | 1 | Healthy | 99 | 10-20 | Visual | 0.009 TB |
28
+ | [ds004635](https://nemar.org/dataexplorer/detail?dataset_id=ds004635) | 48 | 292 | 1 | Healthy | 129 | other | Multisensory | 26.1 GB |
29
+ | [ds004657](https://nemar.org/dataexplorer/detail?dataset_id=ds004657) | 24 | 838 | 6 | Not specified | 64 | 10-20 | Motor | 43.1 GB |
30
+ | [ds004660](https://nemar.org/dataexplorer/detail?dataset_id=ds004660) | 21 | 299 | 1 | Healthy | 32 | 10-20 | Multisensory | 7.2 GB |
31
+ | [ds004661](https://nemar.org/dataexplorer/detail?dataset_id=ds004661) | 17 | 90 | 1 | Not specified | 64 | 10-20 | Multisensory | 1.4 GB |
32
+ | [ds004745](https://nemar.org/dataexplorer/detail?dataset_id=ds004745) | 52 | 762 | 1 | Healthy | 64 | ? | Auditory | 0 TB |
33
+ | [ds004785](https://nemar.org/dataexplorer/detail?dataset_id=ds004785) | 17 | 74 | 1 | Healthy | 32 | ? | Motor | 0 TB |
34
+ | [ds004841](https://nemar.org/dataexplorer/detail?dataset_id=ds004841) | 20 | 1034 | 2 | Not specified | 64 | 10-20 | Multisensory | 7.3 GB |
35
+ | [ds004842](https://nemar.org/dataexplorer/detail?dataset_id=ds004842) | 14 | 719 | 2 | Not specified | 64 | ? | Multisensory | 5.2 GB |
36
+ | [ds004843](https://nemar.org/dataexplorer/detail?dataset_id=ds004843) | 14 | 649 | 1 | Not specified | 64 | ? | Visual | 7.7 GB |
37
+ | [ds004844](https://nemar.org/dataexplorer/detail?dataset_id=ds004844) | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 22.3 GB |
38
+ | [ds004849](https://nemar.org/dataexplorer/detail?dataset_id=ds004849) | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
39
+ | [ds004850](https://nemar.org/dataexplorer/detail?dataset_id=ds004850) | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
40
+ | [ds004851](https://nemar.org/dataexplorer/detail?dataset_id=ds004851) | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
41
+ | [ds004852](https://nemar.org/dataexplorer/detail?dataset_id=ds004852) | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
42
+ | [ds004853](https://nemar.org/dataexplorer/detail?dataset_id=ds004853) | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
43
+ | [ds004854](https://nemar.org/dataexplorer/detail?dataset_id=ds004854) | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
44
+ | [ds004855](https://nemar.org/dataexplorer/detail?dataset_id=ds004855) | 17 | 481 | 4 | Not specified | 64 | ? | Multisensory | 0.077 GB |
45
+ | [ds005034](https://nemar.org/dataexplorer/detail?dataset_id=ds005034) | 25 | 406 | 2 | Healthy | 129 | ? | Visual | 61.4 GB |
46
+ | [ds005079](https://nemar.org/dataexplorer/detail?dataset_id=ds005079) | 1 | 210 | 12 | Healthy | 64 | ? | Multisensory | 1.7 GB |
47
+ | [ds005342](https://nemar.org/dataexplorer/detail?dataset_id=ds005342) | 32 | 134 | 1 | Healthy | 17 | ? | Visual | 2 GB |
48
+ | [ds005410](https://nemar.org/dataexplorer/detail?dataset_id=ds005410) | 81 | 492 | 1 | Healthy | 63 | ? | ? | 19.8 GB |
49
+ | [ds005505](https://nemar.org/dataexplorer/detail?dataset_id=ds005505) | 136 | 5393 | 1 | Healthy | 129 | other | Visual | 103 GB |
50
+ | [ds005506](https://nemar.org/dataexplorer/detail?dataset_id=ds005506) | 150 | 5645 | 1 | Healthy | 129 | other | Visual | 112 GB |
51
+ | [ds005507](https://nemar.org/dataexplorer/detail?dataset_id=ds005507) | 184 | 7273 | 1 | Healthy | 129 | other | Visual | 140 GB |
52
+ | [ds005508](https://nemar.org/dataexplorer/detail?dataset_id=ds005508) | 324 | 13393 | 1 | Healthy | 129 | other | Visual | 230 GB |
53
+ | [ds005509](https://nemar.org/dataexplorer/detail?dataset_id=ds005509) | 330 | 19980 | 1 | Healthy | 129 | other | Visual | 224 GB |
54
+ | [ds005510](https://nemar.org/dataexplorer/detail?dataset_id=ds005510) | 135 | 4933 | 1 | Healthy | 129 | other | Visual | 91 GB |
55
+ | [ds005511](https://nemar.org/dataexplorer/detail?dataset_id=ds005511) | 381 | 18604 | 1 | Healthy | 129 | other | Visual | 245 GB |
56
+ | [ds005512](https://nemar.org/dataexplorer/detail?dataset_id=ds005512) | 257 | 9305 | 1 | Healthy | 129 | other | Visual | 157 GB |
57
+ | [ds005514](https://nemar.org/dataexplorer/detail?dataset_id=ds005514) | 295 | 11565 | 1 | Healthy | 129 | other | Visual | 185 GB |
58
+ | [ds005672](https://nemar.org/dataexplorer/detail?dataset_id=ds005672) | 3 | 18 | 1 | Healthy | 64 | 10-20 | Visual | 4.2 GB |
59
+ | [ds005697](https://nemar.org/dataexplorer/detail?dataset_id=ds005697) | 52 | 210 | 1 | Healthy | 64 | 10-20 | Visual | 67 GB |
60
+ | [ds005787](https://nemar.org/dataexplorer/detail?dataset_id=ds005787) | 30 | ? | 4 | Healthy | 64 | 10-20 | Visual | 185 GB |
61
+
62
+ ## Data format
63
+ EEGDash queries return a **PyTorch Dataset**, an efficient, scalable, and flexible structure for machine learning (ML) and deep learning (DL) applications. PyTorch Datasets integrate directly with PyTorch’s DataLoader, which provides the batching, shuffling, and parallel data loading needed to train deep learning models on large EEG datasets.
64
+
65
+ ## Data preprocessing
66
+ EEGDash datasets are processed using the popular [BrainDecode](https://braindecode.org/stable/index.html) library. In fact, EEGDash datasets are BrainDecode datasets, which are themselves PyTorch datasets. This means that any preprocessing possible on BrainDecode datasets is also possible on EEGDash datasets. Refer to [BrainDecode](https://braindecode.org/stable/index.html) tutorials for guidance on preprocessing EEG data.
67
+
68
+ ## EEG-Dash usage
69
+
70
+ ### Install
71
+ Use your preferred Python environment manager with Python > 3.9 to install the package.
72
+ * To install the eegdash package, use the following temporary command (a direct `pip install eegdash` option will be available soon): `pip install -i https://test.pypi.org/simple/ eegdash`
73
+ * To verify the installation, start a Python session and type: `from eegdash import EEGDash`
74
+
75
+ ### Data access
76
+
77
+ To use the data from a single subject, enter:
78
+
79
+ ```python
80
+ from eegdash import EEGDashDataset
81
+ ds_NDARDB033FW5 = EEGDashDataset({'dataset': 'ds005514', 'task': 'RestingState', 'subject': 'NDARDB033FW5'})
82
+ ```
83
+
84
+ This will search and download the metadata for the task **RestingState** for subject **NDARDB033FW5** in BIDS dataset **ds005514**. The actual data will not be downloaded at this stage; it is downloaded only when it is first processed. The **ds_NDARDB033FW5** object is a fully functional BrainDecode dataset, which is itself a PyTorch dataset. This [tutorial](https://github.com/sccn/EEGDash/blob/develop/notebooks/tutorial_eoec.ipynb) shows how to preprocess the EEG data, extract the portions containing eyes-open and eyes-closed segments, and then perform eyes-open vs. eyes-closed classification using a (shallow) deep-learning model.
85
+
86
+ To use the data from multiple subjects, enter:
87
+
88
+ ```python
89
+ from eegdash import EEGDashDataset
90
+ ds_ds005505rest = EEGDashDataset({'dataset': 'ds005505', 'task': 'RestingState'}, target_name='sex')
91
+ ```
92
+
93
+ This will search and download the metadata for the task 'RestingState' for all subjects in BIDS dataset 'ds005505' (a total of 136). As above, the actual data will not be downloaded at this stage so this command is quick to execute. Also, the target class for each subject is assigned using the target_name parameter. This means that this object is ready to be directly fed to a deep learning model, although the [tutorial script](https://github.com/sccn/EEGDash/blob/develop/notebooks/tutorial_sex_classification.ipynb) performs minimal processing on it, prior to training a deep-learning model. Because 14 gigabytes of data are downloaded, this tutorial takes about 10 minutes to execute.
94
+
95
+ ### Automatic caching
96
+
97
+ EEGDash automatically caches the downloaded data in the .eegdash_cache folder of the current directory from which the script is called. This means that if you run the tutorial [scripts](https://github.com/sccn/EEGDash/tree/develop/notebooks), the data will only be downloaded the first time the script is executed.
98
+
99
+ ## Education -- Coming soon...
100
+
101
+ We organize workshops and educational events to foster cross-cultural education and student training, offering both online and in-person opportunities in collaboration with US and Israeli partners. Events for 2025 will be announced via the EEGLABNEWS mailing list. Be sure to [subscribe](https://sccn.ucsd.edu/mailman/listinfo/eeglabnews).
102
+
103
+ ## About EEG-DaSh
104
+
105
+ EEG-DaSh is a collaborative initiative between the United States and Israel, supported by the National Science Foundation (NSF). The partnership brings together experts from the Swartz Center for Computational Neuroscience (SCCN) at the University of California San Diego (UCSD) and Ben-Gurion University (BGU) in Israel.
106
+
107
+ ![Screenshot 2024-10-03 at 09 14 06](https://github.com/user-attachments/assets/327639d3-c3b4-46b1-9335-37803209b0d3)
108
+
109
+
110
+
111
+
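The README's point about PyTorch datasets rests on a small protocol: a map-style dataset only needs `__len__` and `__getitem__`. The sketch below shows that protocol in plain Python, with no torch or braindecode import. `ToyRecording` and `batches` are hypothetical names, and the batching loop is only a stand-in for what a PyTorch `DataLoader` does.

```python
class ToyRecording:
    """Hypothetical map-style dataset: one (sample, target) pair per index."""

    def __init__(self, samples, target):
        self.samples = samples  # e.g. one list of channel values per time point
        self.target = target    # a single label shared by the whole recording

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        return self.samples[index], self.target


def batches(dataset, batch_size):
    """Yield lists of (sample, target) pairs, batch_size items at a time."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield [dataset[i] for i in range(start, stop)]


recording = ToyRecording(samples=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], target="F")
batch_sizes = [len(batch) for batch in batches(recording, batch_size=2)]
print(batch_sizes)
```

Any object that follows this protocol can be indexed and batched the same way, which is why a dataset with one target per recording can be passed on to training code without conversion.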
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
4
4
 
5
5
  [project]
6
6
  name = "eegdash"
7
- version = "0.0.6"
7
+ version = "0.0.8"
8
8
  authors = [
9
9
  { name="Young Truong", email="dt.young112@gmail.com" },
10
10
  { name="Arnaud Delorme", email="adelorme@gmail.com" },
@@ -19,7 +19,6 @@ classifiers = [
19
19
  "Operating System :: OS Independent",
20
20
  ]
21
21
  dependencies = [
22
- "zarr==2.18.3",
23
22
  "xarray",
24
23
  "python-dotenv",
25
24
  "s3fs",
@@ -28,6 +27,8 @@ dependencies = [
28
27
  "h5py",
29
28
  "pymongo",
30
29
  "joblib",
30
+ "braindecode",
31
+ "mne-bids",
31
32
  ]
32
33
  [project.urls]
33
34
  Homepage = "https://github.com/sccn/EEG-Dash-Data"
@@ -0,0 +1 @@
1
+ from .main import EEGDash, EEGDashDataset
@@ -12,9 +12,106 @@ from mne._fiff.utils import _find_channels, _read_segments_file
12
12
  import s3fs
13
13
  import tempfile
14
14
  from mne._fiff.utils import _read_segments_file
15
+ from braindecode.datasets import BaseDataset
16
+ import mne_bids
17
+ from mne_bids import (
18
+ BIDSPath,
19
+ )
15
20
 
16
- class RawEEGDash(BaseRaw):
17
- r"""Raw object from EEG-Dash connection with Openneuro S3 file.
21
+ class EEGDashBaseDataset(BaseDataset):
22
+ """Returns samples from an mne.io.Raw object along with a target.
23
+
24
+ Dataset which serves samples from an mne.io.Raw object along with a target.
25
+ The target is unique for the dataset, and is obtained through the
26
+ `description` attribute.
27
+
28
+ Parameters
29
+ ----------
30
+ raw : mne.io.Raw
31
+ Continuous data.
32
+ description : dict | pandas.Series | None
33
+ Holds additional description about the continuous signal / subject.
34
+ target_name : str | tuple | None
35
+ Name(s) of the index in `description` that should be used to provide the
36
+ target (e.g., to be used in a prediction task later on).
37
+ transform : callable | None
38
+ On-the-fly transform applied to the example before it is returned.
39
+ """
40
+ AWS_BUCKET = 's3://openneuro.org'
41
+ def __init__(self, record, cache_dir, **kwargs):
42
+ super().__init__(None, **kwargs)
43
+ self.record = record
44
+ self.cache_dir = Path(cache_dir)
45
+ bids_kwargs = self.get_raw_bids_args()
46
+        self.bidspath = BIDSPath(root=self.cache_dir / record['dataset'], datatype='eeg', suffix='eeg', **bids_kwargs)
+        self.s3file = self.get_s3path(record['bidspath'])
+        self.filecache = self.cache_dir / record['bidspath']
+        self.bids_dependencies = record['bidsdependencies']
+        self._raw = None
+        # if os.path.exists(self.filecache):
+        # self.raw = mne_bids.read_raw_bids(self.bidspath, verbose=False)
+
+    def get_s3path(self, filepath):
+        return f"{self.AWS_BUCKET}/{filepath}"
+
+    def _download_s3(self):
+        self.filecache.parent.mkdir(parents=True, exist_ok=True)
+        filesystem = s3fs.S3FileSystem(anon=True, client_kwargs={'region_name': 'us-east-2'})
+        filesystem.download(self.s3file, self.filecache)
+        self.filenames = [self.filecache]
+
+    def _download_dependencies(self):
+        filesystem = s3fs.S3FileSystem(anon=True, client_kwargs={'region_name': 'us-east-2'})
+        for dep in self.bids_dependencies:
+            s3path = self.get_s3path(dep)
+            filepath = self.cache_dir / dep
+            if not filepath.exists():
+                filepath.parent.mkdir(parents=True, exist_ok=True)
+                filesystem.download(s3path, filepath)
+
+    def get_raw_bids_args(self):
+        desired_fields = ['subject', 'session', 'task', 'run']
+        return {k: self.record[k] for k in desired_fields if self.record[k]}
+
+    def check_and_get_raw(self):
+        if not os.path.exists(self.filecache): # not preload
+            if self.bids_dependencies:
+                self._download_dependencies()
+            self._download_s3()
+        if self._raw is None:
+            self._raw = mne_bids.read_raw_bids(self.bidspath, verbose=False)
+
+    def __getitem__(self, index):
+        # self.check_and_get_raw()
+
+        X = self.raw[:, index][0]
+        y = None
+        if self.target_name is not None:
+            y = self.description[self.target_name]
+        if isinstance(y, pd.Series):
+            y = y.to_list()
+        if self.transform is not None:
+            X = self.transform(X)
+        return X, y
+
+    def __len__(self):
+        if self._raw is None:
+            return self.record['rawdatainfo']['ntimes']
+        else:
+            return len(self._raw)
+
+    @property
+    def raw(self):
+        if self._raw is None:
+            self.check_and_get_raw()
+        return self._raw
+
+    @raw.setter
+    def raw(self, raw):
+        self._raw = raw
+
+class EEGDashBaseRaw(BaseRaw):
+    r"""MNE Raw object from EEG-Dash connection with Openneuro S3 file.
 
     Parameters
     ----------
@@ -40,6 +137,7 @@ class RawEEGDash(BaseRaw):
     .. versionadded:: 0.11.0
     """
 
+    AWS_BUCKET = 's3://openneuro.org'
     def __init__(
         self,
         input_fname,
@@ -48,6 +146,7 @@ class RawEEGDash(BaseRaw):
         preload=False,
         *,
         cache_dir='./.eegdash_cache',
+        bids_dependencies:list = [],
         uint16_codec=None,
         montage_units="auto",
         verbose=None,
@@ -66,9 +165,10 @@ class RawEEGDash(BaseRaw):
                 chtype = 'eog'
             ch_types.append(chtype)
         info = mne.create_info(ch_names=ch_names, sfreq=sfreq, ch_types=ch_types)
-        self.s3file = input_fname
-        os.makedirs(cache_dir, exist_ok=True)
-        self.filecache = os.path.join(cache_dir, os.path.basename(self.s3file))
+        self.s3file = self.get_s3path(input_fname)
+        self.cache_dir = Path(cache_dir)
+        self.filecache = self.cache_dir / input_fname
+        self.bids_dependencies = bids_dependencies
 
         if preload and not os.path.exists(self.filecache):
             self._download_s3()
@@ -82,17 +182,30 @@ class RawEEGDash(BaseRaw):
             verbose=verbose,
         )
 
+    def get_s3path(self, filepath):
+        return f"{self.AWS_BUCKET}/{filepath}"
+
     def _download_s3(self):
+        self.filecache.parent.mkdir(parents=True, exist_ok=True)
         filesystem = s3fs.S3FileSystem(anon=True, client_kwargs={'region_name': 'us-east-2'})
-        print('s3file', self.s3file)
-        print('filecache', self.filecache)
         filesystem.download(self.s3file, self.filecache)
         self.filenames = [self.filecache]
 
+    def _download_dependencies(self):
+        filesystem = s3fs.S3FileSystem(anon=True, client_kwargs={'region_name': 'us-east-2'})
+        for dep in self.bids_dependencies:
+            s3path = self.get_s3path(dep)
+            filepath = self.cache_dir / dep
+            if not filepath.exists():
+                filepath.parent.mkdir(parents=True, exist_ok=True)
+                filesystem.download(s3path, filepath)
+
     def _read_segment(
         self, start=0, stop=None, sel=None, data_buffer=None, *, verbose=None
     ):
         if not os.path.exists(self.filecache): # not preload
+            if self.bids_dependencies:
+                self._download_dependencies()
             self._download_s3()
         else: # not preload and file is not cached
             self.filenames = [self.filecache]
@@ -121,6 +234,7 @@ class BIDSDataset():
             raise ValueError('data_dir must be specified and must exist')
         self.bidsdir = Path(data_dir)
         self.dataset = dataset
+        assert str(self.bidsdir).endswith(self.dataset)
 
         if raw_format.lower() not in self.ALLOWED_FILE_FORMAT:
             raise ValueError('raw_format must be one of {}'.format(self.ALLOWED_FILE_FORMAT))
@@ -136,6 +250,10 @@ class BIDSDataset():
         else:
             self.files = np.load(temp_dir / f'{dataset}_files.npy', allow_pickle=True)
 
+    def get_relative_bidspath(self, filename):
+        bids_parent_dir = self.bidsdir.parent
+        return str(Path(filename).relative_to(bids_parent_dir))
+
     def get_property_from_filename(self, property, filename):
         import platform
         if platform.system() == "Windows":
@@ -177,11 +295,17 @@ class BIDSDataset():
         for file in os.listdir(path):
             # target_file = path / f"{cur_file_basename}_{extension}"
             if os.path.isfile(path/file):
-                cur_file_basename = file[:file.rfind('_')] # TODO: change to just search for any file with extension
-                if file.endswith(extension) and cur_file_basename in basename:
+                # check if file has extension extension
+                # check if file basename has extension
+                if file.endswith(extension):
                     filepath = path / file
                     bids_files.append(filepath)
 
+                # cur_file_basename = file[:file.rfind('_')] # TODO: change to just search for any file with extension
+                # if file.endswith(extension) and cur_file_basename in basename:
+                # filepath = path / file
+                # bids_files.append(filepath)
+
         # check if file is in top level directory
         if any(file in os.listdir(path) for file in top_level_files):
             return bids_files
@@ -210,10 +334,7 @@ class BIDSDataset():
         basename = filename[:filename.rfind('_')]
         # metadata files
         meta_files = self.get_bids_file_inheritance(path, basename, metadata_file_extension)
-        if not meta_files:
-            raise ValueError('No metadata files found for filepath {filepath} and extension {metadata_file_extension}')
-        else:
-            return meta_files
+        return meta_files
 
     def scan_directory(self, directory, extension):
         result_files = []
@@ -336,4 +457,25 @@ class BIDSDataset():
     def num_times(self, data_filepath):
         eeg_jsons = self.get_bids_metadata_files(data_filepath, 'eeg.json')
         eeg_json_dict = self.merge_json_inheritance(eeg_jsons)
-        return int(eeg_json_dict['SamplingFrequency'] * eeg_json_dict['RecordingDuration'])
+        return int(eeg_json_dict['SamplingFrequency'] * eeg_json_dict['RecordingDuration'])
+
+    def subject_participant_tsv(self, data_filepath):
+        '''Get participants_tsv info of a subject based on filepath'''
+        participants_tsv = pd.read_csv(self.get_bids_metadata_files(data_filepath, 'participants.tsv')[0], sep='\t')
+        # set 'participant_id' as index
+        participants_tsv.set_index('participant_id', inplace=True)
+        subject = f'sub-{self.subject(data_filepath)}'
+        return participants_tsv.loc[subject].to_dict()
+
+    def eeg_json(self, data_filepath):
+        eeg_jsons = self.get_bids_metadata_files(data_filepath, 'eeg.json')
+        eeg_json_dict = self.merge_json_inheritance(eeg_jsons)
+        return eeg_json_dict
+
+    def channel_tsv(self, data_filepath):
+        channels_tsv = pd.read_csv(self.get_bids_metadata_files(data_filepath, 'channels.tsv')[0], sep='\t')
+        channel_tsv = channels_tsv.to_dict()
+        # 'name' and 'type' now have a dictionary of index-value. Convert them to list
+        for list_field in ['name', 'type', 'units']:
+            channel_tsv[list_field] = list(channel_tsv[list_field].values())
+        return channel_tsv
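
The `channel_tsv` method added in the final hunk reads `channels.tsv` with pandas, calls `to_dict()`, and then unpacks the index-keyed dictionaries for `name`, `type` and `units` into lists. The sketch below reproduces that conversion on a small table and compares it with `DataFrame.to_dict(orient="list")`. It is only an illustration under stated assumptions: the three-channel table is made up, the TSV is inlined as a string instead of being located through `get_bids_metadata_files`, and the names `manual` and `by_orient` are hypothetical and not part of eegdash.

```python
import io

import pandas as pd

# Hypothetical channels.tsv content (illustrative values, not from any dataset).
tsv_text = "name\ttype\tunits\nFp1\tEEG\tuV\nFp2\tEEG\tuV\nHEOG\tEOG\tuV\n"
table = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Conversion as written in the diff: to_dict() yields {column: {row_index: value}},
# so each column's values are unpacked into a plain list afterwards.
manual = table.to_dict()
for list_field in ["name", "type", "units"]:
    manual[list_field] = list(manual[list_field].values())

# Alternative: orient="list" asks pandas for {column: [values]} directly.
by_orient = table.to_dict(orient="list")

print(manual["name"])
print(manual == by_orient)
```

For a table that contains only these three columns the two dictionaries compare equal. With additional columns, the loop from the diff leaves those columns as index-keyed dictionaries, whereas `orient="list"` would convert every column.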