eegdash 0.0.1__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
This version of eegdash might be problematic.
- eegdash/SignalStore/__init__.py +0 -0
- eegdash/SignalStore/signalstore/__init__.py +3 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/abstract_read_adapter.py +13 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/domain_modeling/schema_read_adapter.py +16 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/domain_modeling/vocabulary_read_adapter.py +19 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/handmade_records/excel_study_organizer_read_adapter.py +114 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/recording_acquisitions/axona/axona_read_adapter.py +912 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/recording_acquisitions/intan/ReadIntanSpikeFile.py +140 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/recording_acquisitions/intan/intan_read_adapter.py +29 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/recording_acquisitions/intan/load_intan_rhd_format/intanutil/__init__.py +0 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/recording_acquisitions/intan/load_intan_rhd_format/intanutil/data_to_result.py +62 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/recording_acquisitions/intan/load_intan_rhd_format/intanutil/get_bytes_per_data_block.py +36 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/recording_acquisitions/intan/load_intan_rhd_format/intanutil/notch_filter.py +50 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/recording_acquisitions/intan/load_intan_rhd_format/intanutil/qstring.py +41 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/recording_acquisitions/intan/load_intan_rhd_format/intanutil/read_header.py +135 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/recording_acquisitions/intan/load_intan_rhd_format/intanutil/read_one_data_block.py +45 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/recording_acquisitions/intan/load_intan_rhd_format/load_intan_rhd_format.py +204 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/recording_acquisitions/intan/load_intan_rhs_format/intanutil/__init__.py +0 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/recording_acquisitions/intan/load_intan_rhs_format/intanutil/data_to_result.py +60 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/recording_acquisitions/intan/load_intan_rhs_format/intanutil/get_bytes_per_data_block.py +37 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/recording_acquisitions/intan/load_intan_rhs_format/intanutil/notch_filter.py +50 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/recording_acquisitions/intan/load_intan_rhs_format/intanutil/qstring.py +41 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/recording_acquisitions/intan/load_intan_rhs_format/intanutil/read_header.py +153 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/recording_acquisitions/intan/load_intan_rhs_format/intanutil/read_one_data_block.py +47 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/recording_acquisitions/intan/load_intan_rhs_format/load_intan_rhs_format.py +213 -0
- eegdash/SignalStore/signalstore/adapters/read_adapters/recording_acquisitions/neurodata_without_borders/neurodata_without_borders_read_adapter.py +14 -0
- eegdash/SignalStore/signalstore/operations/__init__.py +4 -0
- eegdash/SignalStore/signalstore/operations/handler_executor.py +22 -0
- eegdash/SignalStore/signalstore/operations/handler_factory.py +41 -0
- eegdash/SignalStore/signalstore/operations/handlers/base_handler.py +44 -0
- eegdash/SignalStore/signalstore/operations/handlers/domain/property_model_handlers.py +79 -0
- eegdash/SignalStore/signalstore/operations/handlers/domain/schema_handlers.py +3 -0
- eegdash/SignalStore/signalstore/operations/helpers/abstract_helper.py +17 -0
- eegdash/SignalStore/signalstore/operations/helpers/neuroscikit_extractor.py +33 -0
- eegdash/SignalStore/signalstore/operations/helpers/neuroscikit_rawio.py +165 -0
- eegdash/SignalStore/signalstore/operations/helpers/spikeinterface_helper.py +100 -0
- eegdash/SignalStore/signalstore/operations/helpers/wrappers/neo_wrappers.py +21 -0
- eegdash/SignalStore/signalstore/operations/helpers/wrappers/nwb_wrappers.py +27 -0
- eegdash/SignalStore/signalstore/store/__init__.py +8 -0
- eegdash/SignalStore/signalstore/store/data_access_objects.py +1181 -0
- eegdash/SignalStore/signalstore/store/datafile_adapters.py +131 -0
- eegdash/SignalStore/signalstore/store/repositories.py +928 -0
- eegdash/SignalStore/signalstore/store/store_errors.py +68 -0
- eegdash/SignalStore/signalstore/store/unit_of_work.py +97 -0
- eegdash/SignalStore/signalstore/store/unit_of_work_provider.py +67 -0
- eegdash/SignalStore/signalstore/utilities/data_adapters/spike_interface_adapters/si_recording.py +1 -0
- eegdash/SignalStore/signalstore/utilities/data_adapters/spike_interface_adapters/si_sorter.py +1 -0
- eegdash/SignalStore/signalstore/utilities/testing/data_mocks.py +513 -0
- eegdash/SignalStore/signalstore/utilities/tools/dataarrays.py +49 -0
- eegdash/SignalStore/signalstore/utilities/tools/mongo_records.py +25 -0
- eegdash/SignalStore/signalstore/utilities/tools/operation_response.py +78 -0
- eegdash/SignalStore/signalstore/utilities/tools/purge_orchestration_response.py +21 -0
- eegdash/SignalStore/signalstore/utilities/tools/quantities.py +15 -0
- eegdash/SignalStore/signalstore/utilities/tools/strings.py +38 -0
- eegdash/SignalStore/signalstore/utilities/tools/time.py +17 -0
- eegdash/SignalStore/tests/conftest.py +799 -0
- eegdash/SignalStore/tests/data/valid_data/data_arrays/make_fake_data.py +59 -0
- eegdash/SignalStore/tests/unit/store/conftest.py +0 -0
- eegdash/SignalStore/tests/unit/store/test_data_access_objects.py +1235 -0
- eegdash/SignalStore/tests/unit/store/test_repositories.py +1309 -0
- eegdash/SignalStore/tests/unit/store/test_unit_of_work.py +7 -0
- eegdash/SignalStore/tests/unit/test_ci_cd.py +8 -0
- eegdash/__init__.py +1 -0
- eegdash/aws_ingest.py +29 -0
- eegdash/data_utils.py +213 -0
- eegdash/main.py +17 -0
- eegdash/signalstore_data_utils.py +280 -0
- eegdash-0.0.1.dist-info/LICENSE +20 -0
- eegdash-0.0.1.dist-info/METADATA +72 -0
- eegdash-0.0.1.dist-info/RECORD +72 -0
- eegdash-0.0.1.dist-info/WHEEL +5 -0
- eegdash-0.0.1.dist-info/top_level.txt +1 -0
eegdash/__init__.py
ADDED
@@ -0,0 +1 @@
+from eegdash.main import EEGDash
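With this re-export, the client class is available at the package top level, so `from eegdash import EEGDash` is equivalent to importing it from `eegdash.main`.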
eegdash/aws_ingest.py
ADDED
@@ -0,0 +1,29 @@
+import sys
+sys.path.append('..')
+import argparse
+from src.signalstore_data_utils import SignalstoreBIDS
+
+def add_bids_dataset(args):
+    signalstore_aws = SignalstoreBIDS(
+        dbconnectionstring='mongodb://23.21.113.214:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+2.2.1',
+        local_filesystem=False,
+        project_name='eegdash',
+    )
+    signalstore_aws.add_bids_dataset(dataset=args.dataset, data_dir=args.data, raw_format='eeglab')
+
+def main():
+    # Create the parser
+    parser = argparse.ArgumentParser(description="A simple command line argument parser")
+
+    # Add arguments
+    parser.add_argument('--data', type=str, default="/mnt/nemar/openneuro/ds004186", help="Path to data directory (Default: /mnt/nemar/openneuro/ds004186)")
+    parser.add_argument('--dataset', type=str, default="ds004186", help="Dataset name (Default: ds004186)")
+
+    # Parse the arguments
+    args = parser.parse_args()
+    print('Arguments:', args)
+
+    add_bids_dataset(args)
+
+if __name__ == "__main__":
+    main()
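As shipped, this script cannot run from the installed wheel: it imports SignalstoreBIDS from a `src` package, but the wheel's RECORD places that module at eegdash/signalstore_data_utils.py, and the `sys.path.append('..')` hack only helps from a source checkout. A working equivalent of add_bids_dataset would look like the sketch below (the dataset path and ID are the script's own defaults; note that SignalstoreBIDS ignores any dbconnectionstring argument anyway, reading a DB_CONNECTION_STRING entry from a local .env file when is_public=False):

    # Hypothetical corrected ingestion call, using the module path from RECORD
    from eegdash.signalstore_data_utils import SignalstoreBIDS

    store = SignalstoreBIDS(local_filesystem=False, project_name='eegdash')
    store.add_bids_dataset(dataset='ds004186',
                           data_dir='/mnt/nemar/openneuro/ds004186',
                           raw_format='eeglab')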
eegdash/data_utils.py
ADDED
@@ -0,0 +1,213 @@
+import os
+import sys
+from joblib import Parallel, delayed
+import mne
+import numpy as np
+from pathlib import Path
+import re
+import json
+
+verbose = False
+
+
+class BIDSDataset():
+    ALLOWED_FILE_FORMAT = ['eeglab', 'brainvision', 'biosemi', 'european']
+    RAW_EXTENSION = {
+        'eeglab': '.set',
+        'brainvision': '.vhdr',
+        'biosemi': '.bdf',
+        'european': '.edf'
+    }
+    METADATA_FILE_EXTENSIONS = ['eeg.json', 'channels.tsv', 'electrodes.tsv', 'events.tsv', 'events.json']
+    def __init__(self,
+                 data_dir=None,        # location of asr cleaned data
+                 dataset='',           # dataset name
+                 raw_format='eeglab',  # format of raw data
+                 ):
+        if data_dir is None or not os.path.exists(data_dir):
+            raise ValueError('data_dir must be specified and must exist')
+        self.bidsdir = Path(data_dir)
+        self.dataset = dataset
+
+        if raw_format.lower() not in self.ALLOWED_FILE_FORMAT:
+            raise ValueError('raw_format must be one of {}'.format(self.ALLOWED_FILE_FORMAT))
+        self.raw_format = raw_format.lower()
+
+        # get all .set files in the bids directory
+        temp_dir = (Path().resolve() / 'data')
+        if not os.path.exists(temp_dir):
+            os.mkdir(temp_dir)
+        if not os.path.exists(temp_dir / f'{dataset}_files.npy'):
+            self.files = self.get_files_with_extension_parallel(self.bidsdir, extension=self.RAW_EXTENSION[self.raw_format])
+            np.save(temp_dir / f'{dataset}_files.npy', self.files)
+        else:
+            self.files = np.load(temp_dir / f'{dataset}_files.npy', allow_pickle=True)
+
+    def get_property_from_filename(self, property, filename):
+        lookup = re.search(rf'{property}-(.*?)[_\/]', filename)
+        return lookup.group(1) if lookup else ''
+
+    def get_bids_file_inheritance(self, path, basename, extension):
+        '''
+        Get all files with given extension that applies to the basename file
+        following the BIDS inheritance principle in the order of lowest level first
+        @param
+            basename: bids file basename without _eeg.set extension for example
+            extension: e.g. channels.tsv
+        '''
+        top_level_files = ['README', 'dataset_description.json', 'participants.tsv']
+        bids_files = []
+
+        # check if path is str object
+        if isinstance(path, str):
+            path = Path(path)
+        if not path.exists:
+            raise ValueError('path {path} does not exist')
+
+        # check if file is in current path
+        for file in os.listdir(path):
+            # target_file = path / f"{cur_file_basename}_{extension}"
+            if os.path.isfile(path/file):
+                cur_file_basename = file[:file.rfind('_')]
+                if file.endswith(extension) and cur_file_basename in basename:
+                    filepath = path / file
+                    bids_files.append(filepath)
+
+        # check if file is in top level directory
+        if any(file in os.listdir(path) for file in top_level_files):
+            return bids_files
+        else:
+            # call get_bids_file_inheritance recursively with parent directory
+            bids_files.extend(self.get_bids_file_inheritance(path.parent, basename, extension))
+            return bids_files
+
+    def get_bids_metadata_files(self, filepath, metadata_file_extension):
+        """
+        (Wrapper for self.get_bids_file_inheritance)
+        Get all BIDS metadata files that are associated with the given filepath, following the BIDS inheritance principle.
+
+        Args:
+            filepath (str or Path): The filepath to get the associated metadata files for.
+            metadata_files_extensions (list): A list of file extensions to search for metadata files.
+
+        Returns:
+            list: A list of filepaths for all the associated metadata files
+        """
+        if isinstance(filepath, str):
+            filepath = Path(filepath)
+        if not filepath.exists:
+            raise ValueError('filepath {filepath} does not exist')
+        path, filename = os.path.split(filepath)
+        basename = filename[:filename.rfind('_')]
+        # metadata files
+        meta_files = self.get_bids_file_inheritance(path, basename, metadata_file_extension)
+        if not meta_files:
+            raise ValueError('No metadata files found for filepath {filepath} and extension {metadata_file_extension}')
+        else:
+            return meta_files
+
+    def scan_directory(self, directory, extension):
+        result_files = []
+        directory_to_ignore = ['.git']
+        with os.scandir(directory) as entries:
+            for entry in entries:
+                if entry.is_file() and entry.name.endswith(extension):
+                    print('Adding ', entry.path)
+                    result_files.append(entry.path)
+                elif entry.is_dir():
+                    # check that entry path doesn't contain any name in ignore list
+                    if not any(name in entry.name for name in directory_to_ignore):
+                        result_files.append(entry.path)  # Add directory to scan later
+        return result_files
+
+    def get_files_with_extension_parallel(self, directory, extension='.set', max_workers=-1):
+        result_files = []
+        dirs_to_scan = [directory]
+
+        # Use joblib.Parallel and delayed to parallelize directory scanning
+        while dirs_to_scan:
+            print(f"Scanning {len(dirs_to_scan)} directories...", dirs_to_scan)
+            # Run the scan_directory function in parallel across directories
+            results = Parallel(n_jobs=max_workers, prefer="threads", verbose=1)(
+                delayed(self.scan_directory)(d, extension) for d in dirs_to_scan
+            )
+
+            # Reset the directories to scan and process the results
+            dirs_to_scan = []
+            for res in results:
+                for path in res:
+                    if os.path.isdir(path):
+                        dirs_to_scan.append(path)  # Queue up subdirectories to scan
+                    else:
+                        result_files.append(path)  # Add files to the final result
+            print(f"Current number of files: {len(result_files)}")
+
+        return result_files
+
+    def load_and_preprocess_raw(self, raw_file, preprocess=False):
+        print(f"Loading {raw_file}")
+        EEG = mne.io.read_raw_eeglab(raw_file, preload=True, verbose='error')
+
+        if preprocess:
+            # highpass filter
+            EEG = EEG.filter(l_freq=0.25, h_freq=25, verbose=False)
+            # remove 60Hz line noise
+            EEG = EEG.notch_filter(freqs=(60), verbose=False)
+            # bring to common sampling rate
+            sfreq = 128
+            if EEG.info['sfreq'] != sfreq:
+                EEG = EEG.resample(sfreq)
+            # # normalize data to zero mean and unit variance
+            # scalar = preprocessing.StandardScaler()
+            # mat_data = scalar.fit_transform(mat_data.T).T  # scalar normalize for each feature and expects shape data x features
+
+        mat_data = EEG.get_data()
+
+        if len(mat_data.shape) > 2:
+            raise ValueError('Expect raw data to be CxT dimension')
+        return mat_data
+
+    def get_files(self):
+        return self.files
+
+    def resolve_bids_json(self, json_files: list):
+        """
+        Resolve the BIDS JSON files and return a dictionary of the resolved values.
+        Args:
+            json_files (list): A list of JSON files to resolve in order of leaf level first
+
+        Returns:
+            dict: A dictionary of the resolved values.
+        """
+        if len(json_files) == 0:
+            raise ValueError('No JSON files provided')
+        json_files.reverse()  # TODO undeterministic
+
+        json_dict = {}
+        for json_file in json_files:
+            with open(json_file) as f:
+                json_dict.update(json.load(f))
+        return json_dict
+
+    def sfreq(self, data_filepath):
+        json_files = self.get_bids_metadata_files(data_filepath, 'eeg.json')
+        if len(json_files) == 0:
+            raise ValueError('No eeg.json found')
+
+        metadata = self.resolve_bids_json(json_files)
+        if 'SamplingFrequency' not in metadata:
+            raise ValueError('SamplingFrequency not found in metadata')
+        else:
+            return metadata['SamplingFrequency']
+
+    def task(self, data_filepath):
+        return self.get_property_from_filename('task', data_filepath)
+
+    def session(self, data_filepath):
+        return self.get_property_from_filename('session', data_filepath)
+
+    def run(self, data_filepath):
+        return self.get_property_from_filename('run', data_filepath)
+
+    def subject(self, data_filepath):
+        return self.get_property_from_filename('sub', data_filepath)
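A minimal usage sketch of this class (the dataset path is illustrative; the first instantiation scans the tree in parallel and caches the file list under ./data/<dataset>_files.npy):

    from eegdash.data_utils import BIDSDataset

    ds = BIDSDataset(data_dir='/mnt/nemar/openneuro/ds004186',
                     dataset='ds004186', raw_format='eeglab')
    for f in ds.get_files():
        # entity values are parsed from the BIDS filename; sfreq() resolves
        # the nearest eeg.json sidecar via get_bids_metadata_files()
        print(ds.subject(f), ds.task(f), ds.sfreq(f))

One caveat for readers: in get_bids_file_inheritance and get_bids_metadata_files, `if not path.exists:` tests the bound method rather than calling it, so the existence guard never fires, and the accompanying error messages are missing their `f` prefixes; `path.exists()` with f-strings appears to be the intent.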
eegdash/main.py
ADDED
@@ -0,0 +1,17 @@
+from eegdash.signalstore_data_utils import SignalstoreBIDS
+
+class EEGDash:
+    def __init__(self):
+        self.sstore = SignalstoreBIDS(
+            # dbconnectionstring='mongodb://127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+2.3.1',
+            dbconnectionstring='mongodb+srv://eegdash-user:mdzoMjQcHWTVnKDq@cluster0.vz35p.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0',
+            is_public=True,
+            local_filesystem=False,
+            project_name='eegdash'
+        )
+
+    def find(self, *args):
+        return self.sstore.find(*args)
+
+    def get(self, *args):
+        return self.sstore.get(*args)
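EEGDash is the public, read-only entry point: find() returns matching metadata records, while get() also loads the underlying signal arrays. A minimal usage sketch (assuming the hard-coded Atlas instance is reachable; the query keys follow the attributes written by load_eeg_attrs_from_bids_file in signalstore_data_utils.py below, and the values are illustrative):

    from eegdash.main import EEGDash

    eegdash = EEGDash()
    records = eegdash.find({'dataset': 'ds004186', 'subject': '01'})  # metadata only
    signals = eegdash.get({'dataset': 'ds004186', 'subject': '01'})   # with xarray data

Note that this file ships a MongoDB credential hard-coded into the connection string, visible in plain text above.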
eegdash/signalstore_data_utils.py
ADDED
@@ -0,0 +1,280 @@
+from pathlib import Path
+from dotenv import load_dotenv
+import re
+import numpy as np
+import xarray as xr
+import os
+from eegdash.SignalStore.signalstore.store import UnitOfWorkProvider
+# from mongomock import MongoClient
+from pymongo.mongo_client import MongoClient
+from pymongo.server_api import ServerApi
+from fsspec.implementations.local import LocalFileSystem
+from fsspec.implementations.dirfs import DirFileSystem
+import pandas as pd
+import json
+import s3fs
+from eegdash.data_utils import BIDSDataset
+
+
+class SignalstoreBIDS():
+    AWS_BUCKET = 'eegdash'
+    def __init__(self,
+                 project_name=AWS_BUCKET,
+                 dbconnectionstring="mongodb://127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+2.3.1",
+                 is_public=False,
+                 local_filesystem=True,
+                 ):
+        self.is_public = is_public
+        if is_public:
+            dbconnectionstring='mongodb+srv://eegdash-user:mdzoMjQcHWTVnKDq@cluster0.vz35p.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0',
+        else:
+            load_dotenv()
+            dbconnectionstring = os.getenv('DB_CONNECTION_STRING')
+
+        # Create a new client and connect to the server
+        client = MongoClient(dbconnectionstring, server_api=ServerApi('1'))
+        # Send a ping to confirm a successful connection
+        try:
+            client.admin.command('ping')
+            print("Pinged your deployment. You successfully connected to MongoDB!")
+        except Exception as e:
+            print(e)
+
+        memory_store = {}
+        filesystem = self.set_up_filesystem(is_local=local_filesystem)
+        self.uow_provider = UnitOfWorkProvider(
+            mongo_client=client,
+            filesystem=filesystem,
+            memory_store=memory_store,
+            default_filetype='zarr'
+        )
+
+        self.project_name = project_name
+        self.uow = self.uow_provider(self.project_name)
+        # self.load_domain_models()
+
+    def set_up_filesystem(self, is_local=True):
+        if is_local:
+            cache_path = '/mnt/nemar/dtyoung/eeg-ssl-data'  # path where signalstore netCDF files are stored
+            # Create a directory for the dataset
+            store_path = Path(cache_path)
+            if not os.path.exists(store_path):
+                os.makedirs(store_path)
+
+            filesystem = LocalFileSystem()
+            tmp_dir_fs = DirFileSystem(
+                store_path,
+                filesystem=filesystem
+            )
+            return tmp_dir_fs
+        else:
+            if self.is_public:
+                s3 = s3fs.S3FileSystem(anon=True, client_kwargs={'region_name': 'us-east-2'})
+            else:
+                s3 = s3fs.S3FileSystem(client_kwargs={'region_name': 'us-east-2'})
+            return s3
+
+    def load_domain_models(self):
+        cwd = Path.cwd()
+        domain_models_path = cwd / f"DomainModels/{self.project_name}/data_models.json"
+        metamodel_path = cwd / f"DomainModels/{self.project_name}/metamodels.json"
+        property_path = cwd / f"DomainModels/{self.project_name}/property_models.json"
+        with open(metamodel_path) as f:
+            metamodels = json.load(f)
+
+        with open(property_path) as f:
+            property_models = json.load(f)
+
+        # load domain models json file
+        with open(domain_models_path) as f:
+            domain_models = json.load(f)
+
+        with self.uow as uow:
+            for property_model in property_models:
+                uow.domain_models.add(property_model)
+                model = uow.domain_models.get(property_model['schema_name'])
+                print('property model: ', model['schema_name'])
+            for metamodel in metamodels:
+                uow.domain_models.add(metamodel)
+                model = uow.domain_models.get(metamodel['schema_name'])
+                print('meta model: ', model['schema_name'])
+            for domain_model in domain_models:
+                uow.domain_models.add(domain_model)
+                model = uow.domain_models.get(domain_model['schema_name'])
+                print('domain model: ', model['schema_name'])
+            uow.commit()
+
+    def extract_attribute(self, pattern, filename):
+        match = re.search(pattern, filename)
+        return match.group(1) if match else None
+
+    def load_eeg_attrs_from_bids_file(self, bids_dataset: BIDSDataset, bids_file):
+        '''
+        bids_file must be a file of the bids_dataset
+        '''
+        if bids_file not in bids_dataset.files:
+            raise ValueError(f'{bids_file} not in {bids_dataset.dataset}')
+        f = os.path.basename(bids_file)
+        attrs = {
+            'schema_ref': 'eeg_signal',
+            'data_name': f'{bids_dataset.dataset}_{f}',
+            'dataset': bids_dataset.dataset,
+            'subject': bids_dataset.subject(bids_file),
+            'task': bids_dataset.task(bids_file),
+            'session': bids_dataset.session(bids_file),
+            'run': bids_dataset.run(bids_file),
+            'sampling_frequency': bids_dataset.sfreq(bids_file),
+            'modality': 'EEG',
+        }
+
+        return attrs
+
+    def load_eeg_data_from_bids_file(self, bids_dataset: BIDSDataset, bids_file, eeg_attrs=None):
+        '''
+        bids_file must be a file of the bids_dataset
+        '''
+        if bids_file not in bids_dataset.files:
+            raise ValueError(f'{bids_file} not in {bids_dataset.dataset}')
+
+        attrs = self.load_eeg_attrs_from_bids_file(bids_dataset, bids_file) if eeg_attrs is None else eeg_attrs
+
+        eeg_data = bids_dataset.load_and_preprocess_raw(bids_file)
+        print('data shape:', eeg_data.shape)
+
+        fs = attrs['sampling_frequency']
+        max_time = eeg_data.shape[1] / fs
+        time_steps = np.linspace(0, max_time, eeg_data.shape[1]).squeeze()  # in seconds
+        # print('time steps', len(time_steps))
+
+        # replace eeg.set with channels.tsv
+        # todo this is still a hacky way
+        channels_tsv = bids_dataset.get_bids_metadata_files(bids_file, 'channels.tsv')
+        channels_tsv = Path(channels_tsv[0])
+        if channels_tsv.exists():
+            channels = pd.read_csv(channels_tsv, sep='\t')
+            # get channel names from channel_coords
+            channel_names = channels['name'].values
+
+        eeg_xarray = xr.DataArray(
+            data=eeg_data,
+            dims=['channel','time'],
+            coords={
+                'time': time_steps,
+                'channel': channel_names
+            },
+            attrs=attrs
+        )
+        return eeg_xarray
+
+    def exist(self, schema_ref='eeg_signal', data_name=''):
+        with self.uow as uow:
+            query = {
+                "schema_ref": schema_ref,
+                "data_name": data_name
+            }
+            sessions = uow.data.find(query)
+            if len(sessions) > 0:
+                return True
+            else:
+                return False
+
+    def add_bids_dataset(self, dataset, data_dir, raw_format='eeglab', overwrite=False, record_only=False):
+        if self.is_public:
+            raise ValueError('This operation is not allowed for public users')
+
+        bids_dataset = BIDSDataset(
+            data_dir=data_dir,
+            dataset=dataset,
+            raw_format=raw_format,
+        )
+        for bids_file in bids_dataset.get_files():
+            print('bids raw file', bids_file)
+
+            signalstore_data_id = f"{dataset}_{os.path.basename(bids_file)}"
+            if overwrite:
+                self.remove(signalstore_data_id)
+
+            if self.exist(data_name=signalstore_data_id):
+                print('data already exist. skipped')
+                continue
+            else:
+                eeg_attrs = self.load_eeg_attrs_from_bids_file(bids_dataset, bids_file)
+                with self.uow as uow:
+                    # Assume raw data already exists, recreating record only
+                    eeg_attrs['has_file'] = True
+                    print('adding record', eeg_attrs['data_name'])
+                    uow.data.add(eeg_attrs)
+                    uow.commit()
+                if not record_only:
+                    eeg_xarray = self.load_eeg_data_from_bids_file(bids_dataset, bids_file, eeg_attrs)
+                    with self.uow as uow:
+                        print('adding data', eeg_xarray.attrs['data_name'])
+                        uow.data.add(eeg_xarray)
+                        uow.commit()
+
+    def remove(self, schema_ref='eeg_signal', data_name=''):
+        if self.is_public:
+            raise ValueError('This operation is not allowed for public users')
+
+        with self.uow as uow:
+            sessions = uow.data.find({'schema_ref': schema_ref, 'data_name': data_name})
+            if len(session) > 0:
+                for session in range(len(sessions)):
+                    uow.data.remove(session['schema_ref'], session['data_name'])
+                    uow.commit()
+
+    def remove_all(self):
+        if self.is_public:
+            raise ValueError('This operation is not allowed for public users')
+
+        with self.uow as uow:
+            sessions = uow.data.find({})
+            print(len(sessions))
+            for session in range(len(sessions)):
+                uow.data.remove(session['schema_ref'], session['data_name'])
+                uow.commit()
+
+            uow.purge()
+
+            print('Verifying deletion job. Dataset length: ', len(uow.data.find({})))
+
+    def find(self, query:dict, validate=False, get_data=False):
+        '''
+        query: {
+            'dataset': 'dsxxxx',
+
+        }'''
+        with self.uow as uow:
+            sessions = uow.data.find(query, validate=validate, get_data=get_data)
+            if sessions:
+                print(f'Found {len(sessions)} records')
+                return sessions
+            else:
+                return []
+
+    def get(self, query:dict, validate=False):
+        '''
+        query: {
+            'dataset': 'dsxxxx',
+
+        }'''
+        with self.uow as uow:
+            sessions = uow.data.find(query, validate=validate, get_data=True)
+            if sessions:
+                print(f'Found {len(sessions)} records')
+                return sessions
+            else:
+                return []
+
+if __name__ == "__main__":
+    # sstore_hbn = SignalstoreHBN()
+    # sstore_hbn.add_data()
+    # sstore_ds004584 = SignalstoreHBN(
+    #     data_path='/mnt/nemar/openneuro/ds004584',
+    #     dataset_name='eegdash',
+    #     local_filesystem=False,
+    #     dbconnectionstring='mongodb://23.21.113.214:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+2.2.1'
+    # )
+    # sstore_ds004584.load_domain_models()
+    # sstore_ds004584.add_data()
+    pass
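Several maintenance paths above cannot run as written: in remove(), `len(session)` references a name that is never bound (NameError); both remove() and remove_all() iterate `for session in range(len(sessions))` and then subscript the integer loop index as if it were a record; and add_bids_dataset() passes signalstore_data_id positionally, so it lands in schema_ref rather than data_name. A corrected sketch of the intended deletion loop (assuming uow.data.find returns a list of record dicts, as its use in exist() suggests):

    with self.uow as uow:
        sessions = uow.data.find({'schema_ref': schema_ref, 'data_name': data_name})
        for session in sessions:  # iterate records, not indices
            uow.data.remove(session['schema_ref'], session['data_name'])
        uow.commit()              # commit once after the loop

Note also the trailing comma in the is_public branch of __init__, which turns dbconnectionstring into a one-element tuple before it reaches MongoClient.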
eegdash-0.0.1.dist-info/LICENSE
ADDED
@@ -0,0 +1,20 @@
+GNU General Public License
+
+Copyright (C) 2024-2025
+
+Young Truong, UCSD, dt.young112@gmail.com
+Arnaud Delorme, UCSD, adelorme@ucsd.edu
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
eegdash-0.0.1.dist-info/METADATA
ADDED
@@ -0,0 +1,70 @@
+Metadata-Version: 2.1
+Name: eegdash
+Version: 0.0.1
+Summary: EEG data for machine learning
+Author-email: Young Truong <dt.young112@gmail.com>, Arnaud Delorme <adelorme@gmail.com>
+License: GNU General Public License
+
+Copyright (C) 2024-2025
+
+Young Truong, UCSD, dt.young112@gmail.com
+Arnaud Delorme, UCSD, adelorme@ucsd.edu
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+
+Project-URL: Homepage, https://github.com/sccn/EEG-Dash-Data
+Project-URL: Issues, https://github.com/sccn/EEG-Dash-Data/issues
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENSE
+
+# EEG-Dash
+To leverage recent and ongoing advancements in large-scale computational methods, and to ensure the preservation of scientific data generated from publicly funded research, the EEG-DaSh data archive will create a data-sharing resource for MEEG (EEG, MEG) data contributed by collaborators for machine learning (ML) and deep learning (DL) applications.
+
+## Data source
+The data in EEG-DaSh originates from a collaboration involving 25 laboratories, encompassing 27,053 participants. This extensive collection includes MEEG data, a combination of EEG and MEG signals. The data is sourced from various studies conducted by these labs, involving both healthy subjects and clinical populations with conditions such as ADHD, depression, schizophrenia, dementia, autism, and psychosis, and spans different mental states like sleep, meditation, and cognitive tasks. EEG-DaSh will additionally incorporate data converted from NEMAR, which includes a subset of the 330 MEEG BIDS-formatted datasets available on OpenNeuro, further expanding the archive with well-curated, standardized neuroelectromagnetic data.
+
+## Data formatting
+The data in EEG-DaSh is formatted to facilitate machine learning (ML) and deep learning (DL) applications by using a simplified structure commonly adopted by these communities. This involves converting raw MEEG data into a matrix format, where samples (e.g., individual EEG or MEG recordings) are represented by rows and values (such as time or channel data) are represented by columns. The data is also divided into training and testing sets, with 80% of the data allocated for training and 20% for testing, ensuring a balanced representation of relevant labels across sets. Hierarchical Event Descriptor (HED) tags annotate the labels, which are stored in a text table alongside detailed metadata, including dataset origins and methods. This formatting process ensures that data is ready for ML/DL models, allowing for efficient training and testing of algorithms while preserving data integrity and reusability.
+
+
+
+## Data access
+The data in EEG-DaSh is accessed through Python and MATLAB libraries specifically designed for this platform. These libraries use objects compatible with deep learning data storage formats in each language, such as <i>Torchvision.dataset</i> in Python and <i>DataStore</i> in MATLAB. Users can dynamically fetch data from the EEG-DaSh server, which is then cached locally.
+
+### AWS S3
+
+Coming soon...
+
+### EEG-Dash API
+
+Coming soon...
+
+## Education
+
+We organize workshops and educational events to foster cross-cultural education and student training, offering both online and in-person opportunities in collaboration with US and Israeli partners. There is no event planned for 2024. Events for 2025 will be advertised on the EEGLABNEWS mailing list, so make sure to [subscribe](https://sccn.ucsd.edu/mailman/listinfo/eeglabnews).
+
+## About EEG-DaSh
+
+EEG-DaSh is a collaborative initiative between the United States and Israel, supported by the National Science Foundation (NSF). The partnership brings together experts from the Swartz Center for Computational Neuroscience (SCCN) at the University of California San Diego (UCSD) and Ben-Gurion University (BGU) in Israel.
+
+
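Although the README's "AWS S3" section is still marked coming soon, set_up_filesystem() in eegdash/signalstore_data_utils.py already targets an S3 bucket named eegdash in us-east-2, with anonymous reads for public users. A sketch of inspecting that storage directly (assuming the bucket is publicly listable):

    import s3fs

    # Anonymous S3 access mirroring the package's public-user code path
    s3 = s3fs.S3FileSystem(anon=True, client_kwargs={'region_name': 'us-east-2'})
    print(s3.ls('eegdash'))  # list top-level keys in the archive bucket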