meegflow 0.1.1__tar.gz

Files changed (33)
  1. meegflow-0.1.1/PKG-INFO +1033 -0
  2. meegflow-0.1.1/README.md +999 -0
  3. meegflow-0.1.1/setup.cfg +4 -0
  4. meegflow-0.1.1/setup.py +46 -0
  5. meegflow-0.1.1/src/meegflow/__init__.py +3 -0
  6. meegflow-0.1.1/src/meegflow/adaptive_reject.py +235 -0
  7. meegflow-0.1.1/src/meegflow/cli.py +247 -0
  8. meegflow-0.1.1/src/meegflow/pipeline.py +2153 -0
  9. meegflow-0.1.1/src/meegflow/readers.py +454 -0
  10. meegflow-0.1.1/src/meegflow/report.py +344 -0
  11. meegflow-0.1.1/src/meegflow/utils.py +12 -0
  12. meegflow-0.1.1/src/meegflow/viz.py +53 -0
  13. meegflow-0.1.1/src/meegflow.egg-info/PKG-INFO +1033 -0
  14. meegflow-0.1.1/src/meegflow.egg-info/SOURCES.txt +31 -0
  15. meegflow-0.1.1/src/meegflow.egg-info/dependency_links.txt +1 -0
  16. meegflow-0.1.1/src/meegflow.egg-info/entry_points.txt +2 -0
  17. meegflow-0.1.1/src/meegflow.egg-info/requires.txt +9 -0
  18. meegflow-0.1.1/src/meegflow.egg-info/top_level.txt +1 -0
  19. meegflow-0.1.1/tests/test_adaptive_reject_functional.py +305 -0
  20. meegflow-0.1.1/tests/test_adaptive_reject_integration.py +158 -0
  21. meegflow-0.1.1/tests/test_cli_separation.py +188 -0
  22. meegflow-0.1.1/tests/test_custom_steps.py +275 -0
  23. meegflow-0.1.1/tests/test_excluded_channels.py +298 -0
  24. meegflow-0.1.1/tests/test_excluded_channels_integration.py +94 -0
  25. meegflow-0.1.1/tests/test_find_flat_channels.py +544 -0
  26. meegflow-0.1.1/tests/test_find_matching_paths_integration.py +207 -0
  27. meegflow-0.1.1/tests/test_get_entity_values.py +251 -0
  28. meegflow-0.1.1/tests/test_html_report_enhancements.py +257 -0
  29. meegflow-0.1.1/tests/test_pipeline_structure.py +222 -0
  30. meegflow-0.1.1/tests/test_progress_bars.py +124 -0
  31. meegflow-0.1.1/tests/test_readers.py +335 -0
  32. meegflow-0.1.1/tests/test_readers_integration.py +235 -0
  33. meegflow-0.1.1/tests/test_report_module.py +167 -0
@@ -0,0 +1,1033 @@
1
+ Metadata-Version: 2.4
2
+ Name: meegflow
3
+ Version: 0.1.1
4
+ Summary: A modular, configuration-driven extensible M/EEG preprocessing pipeline using MNE-Python
5
+ Home-page: https://github.com/Picnic-DoC/meegflow
6
+ Author: Laouen Belloli
7
+ Classifier: Development Status :: 3 - Alpha
8
+ Classifier: Intended Audience :: Science/Research
9
+ Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
10
+ Classifier: Programming Language :: Python :: 3
11
+ Classifier: Programming Language :: Python :: 3.8
12
+ Classifier: Programming Language :: Python :: 3.9
13
+ Classifier: Programming Language :: Python :: 3.10
14
+ Classifier: Programming Language :: Python :: 3.11
15
+ Requires-Python: >=3.8
16
+ Description-Content-Type: text/markdown
17
+ Requires-Dist: mne>=1.5.0
18
+ Requires-Dist: mne-bids>=0.14
19
+ Requires-Dist: numpy>=1.24.0
20
+ Requires-Dist: scipy>=1.11.0
21
+ Requires-Dist: PyYAML>=6.0
22
+ Requires-Dist: rich>=13.0.0
23
+ Requires-Dist: scikit-learn>=1.8.0
24
+ Requires-Dist: matplotlib>=3.7.0
25
+ Requires-Dist: pandas>=2.0.0
26
+ Dynamic: author
27
+ Dynamic: classifier
28
+ Dynamic: description
29
+ Dynamic: description-content-type
30
+ Dynamic: home-page
31
+ Dynamic: requires-dist
32
+ Dynamic: requires-python
33
+ Dynamic: summary
34
+
35
+ # MEEGFlow: MEEG Preprocessing Pipeline
36
+
37
+ A modular, configuration-driven M/EEG preprocessing pipeline built on MNE-Python and MNE-BIDS. Each preprocessing step is implemented as a separate function, so you can choose which steps to run, their order, and their parameters through a simple YAML configuration.
38
+
39
+ ## Features
40
+
41
+ - **Flexible File Discovery**: Support for both BIDS-formatted datasets and custom glob patterns
42
+ - **MNE-BIDS Integration**: Seamlessly reads MEEG data in BIDS format
43
+ - **Modular Design**: Each preprocessing step is a separate function
44
+ - **Configuration-Driven**: Choose steps, their order, and parameters via YAML
45
+ - **Custom Steps Support**: Extend the pipeline with your own preprocessing functions
46
+ - **Progress Tracking**: Rich progress bars show real-time progress for recordings and preprocessing steps
47
+ - **Comprehensive Logging**: MNE logger integration with optional log file output
48
+ - **Multiple Output Formats**:
49
+   - Clean preprocessed epochs in `.fif` format
50
+   - Clean preprocessed raw data in `.fif` format
51
+   - Interactive HTML reports using MNE Report
52
+   - JSON reports for easy downstream processing
53
+ - **Batch Processing**: Process multiple subjects sequentially
54
+ - **Command-line Interface**: Easy to use from the terminal
55
+
56
+ ## Installation
57
+
58
+ ### Option 1: Docker (Recommended)
59
+
60
+ Using Docker is the easiest way to get started, as it includes all dependencies and system libraries.
61
+
62
+ 1. Build the Docker image:
63
+ ```bash
64
+ git clone https://github.com/Laouen/meegflow.git
65
+ cd meegflow
66
+ docker build -t meegflow .
67
+ ```
68
+
69
+ 2. Run the container:
70
+ ```bash
71
+ docker run --rm -v /path/to/bids/data:/data meegflow \
72
+ --bids-root /data \
73
+ --subjects 01 02 \
74
+ --tasks rest \
75
+ --config /app/configs/config_example.yaml
76
+ ```
77
+
78
+ ### Option 2: Local Installation
79
+
80
+ 1. Clone this repository:
81
+ ```bash
82
+ git clone https://github.com/Laouen/meegflow.git
83
+ cd meegflow
84
+ ```
85
+
86
+ 2. Install dependencies:
87
+ ```bash
88
+ pip install -r requirements.txt
89
+ ```
90
+
91
+ 3. (Optional) Install the package to use the `meegflow` command:
92
+ ```bash
93
+ pip install -e .
94
+ ```
95
+
96
+ ## Usage
97
+
98
+ ### Using Docker
99
+
100
+ To use the Docker image, mount your BIDS dataset directory to `/data` in the container. The outputs will be written to the `derivatives/meegflow` subdirectory within your BIDS root.
101
+
102
+ **Basic usage:**
103
+ ```bash
104
+ docker run --rm \
105
+ -v /path/to/bids:/data \
106
+ meegflow \
107
+ --bids-root /data \
108
+ --tasks rest
109
+ ```
110
+
111
+ **With custom configuration:**
112
+ ```bash
113
+ docker run --rm \
114
+ -v /path/to/bids:/data \
115
+ -v /path/to/custom/config.yaml:/config.yaml \
116
+ meegflow \
117
+ --bids-root /data \
118
+ --subjects 01 02 03 \
119
+ --tasks rest \
120
+ --config /config.yaml
121
+ ```
122
+
123
+ **With log file output:**
124
+ ```bash
125
+ docker run --rm \
126
+ -v /path/to/bids:/data \
127
+ -v /path/to/logs:/logs \
128
+ meegflow \
129
+ --bids-root /data \
130
+ --tasks rest \
131
+ --log-file /logs/pipeline.log
132
+ ```
133
+
134
+ **Processing specific sessions:**
135
+ ```bash
136
+ docker run --rm \
137
+ -v /path/to/bids:/data \
138
+ meegflow \
139
+ --bids-root /data \
140
+ --subjects 01 02 \
141
+ --sessions 01 02 \
142
+ --tasks rest
143
+ ```
144
+
145
+ ### Using Local Installation
146
+
147
+ #### Process Multiple Subjects
148
+
149
+ Run the preprocessing pipeline on multiple subjects:
150
+
151
+ ```bash
152
+ python src/cli.py \
153
+ --bids-root /path/to/bids/dataset \
154
+ --subjects 01 02 03 \
155
+ --tasks rest \
156
+ --config configs/config_example.yaml
157
+ ```
158
+
159
+ If you installed the package with `pip install -e .`, you can use the `meegflow` command:
160
+
161
+ ```bash
162
+ meegflow \
163
+ --bids-root /path/to/bids/dataset \
164
+ --subjects 01 02 03 \
165
+ --tasks rest \
166
+ --config configs/config_example.yaml
167
+ ```
168
+
169
+ Process all subjects with a specific task:
170
+
171
+ ```bash
172
+ python src/cli.py \
173
+ --bids-root /path/to/bids/dataset \
174
+ --tasks rest
175
+ ```
176
+
177
+ Process specific subjects with multiple tasks:
178
+
179
+ ```bash
180
+ python src/cli.py \
181
+ --bids-root /path/to/bids/dataset \
182
+ --subjects 01 02 \
183
+ --tasks rest task1 task2
184
+ ```
185
+
186
+ #### Python API Usage
187
+
188
+ You can also use the pipeline directly in Python:
189
+
190
+ ```python
191
+ import sys
192
+ sys.path.insert(0, 'src')
193
+ from meegflow import MEEGFlowPipeline
194
+ from meegflow.readers import BIDSReader
195
+
196
+ # Load configuration
197
+ import yaml
198
+ with open('configs/config_example.yaml', 'r') as f:
199
+     config = yaml.safe_load(f)
200
+
201
+ # Create a BIDS reader
202
+ reader = BIDSReader('/path/to/bids/dataset')
203
+
204
+ # Initialize pipeline
205
+ pipeline = MEEGFlowPipeline(
206
+     reader=reader,
207
+     output_root='/path/to/derivatives',
208
+     config=config
209
+ )
210
+
211
+ # Run preprocessing on multiple subjects
212
+ results = pipeline.run_pipeline(
213
+     subjects=['01', '02', '03'],
214
+     tasks='rest'
215
+ )
216
+
217
+ # Access results for each subject
218
+ for subject, result in results.items():
219
+     print(f"Subject {subject}: {result}")
220
+ ```
221
+
222
+ ## File Discovery with Readers
223
+
224
+ The pipeline supports two types of file readers for discovering data files:
225
+
226
+ ### BIDS Reader (Default)
227
+
228
+ The BIDS reader uses MNE-BIDS to automatically discover files in BIDS-formatted datasets:
229
+
230
+ ```bash
231
+ # BIDS reader is the default (--reader bids can be omitted)
232
+ python src/cli.py \
233
+ --bids-root /path/to/bids/dataset \
234
+ --subjects 01 02 \
235
+ --tasks rest \
236
+ --config configs/config_example.yaml
237
+ ```
238
+
239
+ ### Glob Reader
240
+
241
+ The glob reader allows you to work with custom directory structures using glob patterns with variable extraction:
242
+
243
+ ```bash
244
+ python src/cli.py \
245
+ --reader glob \
246
+ --data-root /path/to/data \
247
+ --glob-pattern "sub-{subject}/ses-{session}/eeg/sub-{subject}_task-{task}_eeg.vhdr" \
248
+ --subjects 01 02 \
249
+ --tasks rest \
250
+ --config configs/config_example.yaml
251
+ ```
252
+
253
+ **Pattern syntax:** Use `{variable_name}` placeholders which:
254
+ - Convert to `*` wildcards for file matching
255
+ - Extract matched values as metadata
256
+
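The two roles of a `{variable_name}` placeholder described above can be illustrated with a small standalone sketch. This is not the `GlobReader` implementation; the helper names (`pattern_to_glob`, `pattern_to_regex`, `extract_metadata`) are hypothetical, and treating a repeated placeholder as "must match the same value" is an assumption.

```python
import re

# Matches placeholders such as {subject} or {task}.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def pattern_to_glob(pattern):
    """Replace every {name} placeholder with a '*' wildcard for file matching."""
    return PLACEHOLDER.sub("*", pattern)


def pattern_to_regex(pattern):
    """Build a regex with one named group per placeholder.

    Assumption: a placeholder that appears twice must match the same
    value both times, so the second occurrence becomes a backreference.
    """
    parts, seen, pos = [], set(), 0
    for match in PLACEHOLDER.finditer(pattern):
        parts.append(re.escape(pattern[pos:match.start()]))
        name = match.group(1)
        if name in seen:
            parts.append(f"(?P={name})")
        else:
            parts.append(f"(?P<{name}>[^/]+)")
            seen.add(name)
        pos = match.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts) + r"\Z")


def extract_metadata(pattern, relative_path):
    """Return the placeholder values for a matching path, or None."""
    match = pattern_to_regex(pattern).match(relative_path)
    return match.groupdict() if match else None


pattern = "sub-{subject}/ses-{session}/eeg/sub-{subject}_task-{task}_eeg.vhdr"
glob_pattern = pattern_to_glob(pattern)
metadata = extract_metadata(pattern, "sub-01/ses-02/eeg/sub-01_task-rest_eeg.vhdr")
print(glob_pattern)
print(metadata)
```

The glob form is what would be handed to something like `pathlib.Path.glob`, while the regex form recovers the subject, session, and task from each path that the glob returns.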
257
+ **Python API:**
258
+
259
+ ```python
260
+ from meegflow.readers import GlobReader
261
+
262
+ # Create a glob reader with your custom pattern
263
+ reader = GlobReader(
264
+     data_root='/path/to/data',
265
+     pattern='sub-{subject}/ses-{session}/eeg/sub-{subject}_task-{task}_eeg.vhdr'
266
+ )
267
+
268
+ # Initialize pipeline with the glob reader (MEEGFlowPipeline and config as in the previous example)
269
+ pipeline = MEEGFlowPipeline(
270
+     reader=reader,
271
+     config=config
272
+ )
273
+
274
+ # Run pipeline
275
+ results = pipeline.run_pipeline(subjects=['01', '02'], tasks='rest')
276
+ ```
277
+
278
+ For detailed information on readers, pattern examples, and troubleshooting, see [READERS.md](READERS.md).
279
+
280
+ ## Output Structure
281
+
282
+ The pipeline creates outputs in a BIDS-derivatives structure:
283
+
284
+ ```
285
+ derivatives/meegflow/
286
+ ├── epochs/                  # When saving epochs with save_clean_instance
287
+ │   └── sub-01/
288
+ │       └── eeg/
289
+ │           └── sub-01_task-rest_proc-clean_desc-cleaned_epo.fif
290
+ ├── raw/                     # When saving raw data with save_clean_instance
291
+ │   └── sub-01/
292
+ │       └── eeg/
293
+ │           └── sub-01_task-rest_proc-clean_desc-cleaned_epo.fif
294
+ └── reports/
295
+     └── sub-01/
296
+         └── eeg/
297
+             ├── sub-01_task-rest_proc-clean_desc-cleaned_report.json
298
+             └── sub-01_task-rest_proc-clean_desc-cleaned_report.html
299
+ ```
300
+
301
+ ### Output Details
302
+
303
+ 1. **epochs/** or **raw/**: Contains MNE data objects saved in `.fif` format (if `save_clean_instance` step is included)
304
+    - Epochs can be loaded with `mne.read_epochs()`
305
+    - Raw data can be loaded with `mne.io.read_raw_fif()`
306
+    - Includes all preprocessing (filtering, artifact removal, baseline correction)
307
+
308
+ 2. **reports/**: Contains preprocessing reports
309
+    - **JSON report**: Preprocessing parameters, quality metrics, steps performed (generated by `generate_json_report` step)
310
+    - **HTML report**: Interactive visualization (generated by `generate_html_report` step)
311
+
312
+ ## Configuration
313
+
314
+ The pipeline is configuration-driven. You define a list of preprocessing steps, their order, and parameters in a YAML file.
315
+
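As a rough illustration of what "configuration-driven" means here, the sketch below runs an ordered list of step entries through a name-to-function registry. It is not the meegflow dispatcher: the `scale` and `offset` steps, the `STEPS` registry, and `run_steps` are hypothetical names. Only the `(data, step_config)` signature and the `preprocessing_steps` record follow what this README documents for custom steps.

```python
def scale(data, step_config):
    """Toy step (hypothetical): multiply every value by a factor."""
    factor = step_config.get("factor", 1.0)
    data["values"] = [v * factor for v in data["values"]]
    data["preprocessing_steps"].append({"step": "scale", "factor": factor})
    return data


def offset(data, step_config):
    """Toy step (hypothetical): add a constant to every value."""
    amount = step_config.get("amount", 0.0)
    data["values"] = [v + amount for v in data["values"]]
    data["preprocessing_steps"].append({"step": "offset", "amount": amount})
    return data


# Registry mapping the `name` field of each config entry to a function.
STEPS = {"scale": scale, "offset": offset}


def run_steps(config, data):
    """Apply the configured steps in the order they are listed."""
    for step_config in config.get("pipeline", []):
        name = step_config["name"]
        if name not in STEPS:
            raise ValueError(f"Unknown step: {name!r}")
        data = STEPS[name](data, step_config)
    return data


# Equivalent to a YAML file with a top-level `pipeline:` list.
config = {
    "pipeline": [
        {"name": "scale", "factor": 2.0},
        {"name": "offset", "amount": 1.0},
    ]
}
result = run_steps(config, {"values": [1.0, 2.0], "preprocessing_steps": []})
print(result["values"])
print([s["step"] for s in result["preprocessing_steps"]])
```

Because the list order drives execution, swapping the two entries in `config` would change the result, which is the property the YAML `pipeline` list relies on.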
316
+ ### Available Steps
317
+
318
+ Data Organization:
319
+ - **strip_recording**: Crop recordings to remove data outside the first and last events
320
+ - **concatenate_recordings**: Concatenate multiple raw recordings into a single continuous recording
321
+ - **copy_instance**: Create a copy of a data instance for comparison or backup purposes
322
+
323
+ Setup:
324
+ - **set_montage**: Set channel montage for EEG data
325
+ - **drop_unused_channels**: Explicitly drop specified channels by name
326
+
327
+ Filtering:
328
+ - **bandpass_filter**: Apply bandpass filtering
329
+ - **notch_filter**: Apply notch filtering
330
+
331
+ Preprocessing:
332
+ - **resample**: Resample data to different sampling frequency
333
+ - **reference**: Apply re-referencing
334
+ - **ica**: ICA-based artifact removal
335
+
336
+ Bad Channel Detection:
337
+ - **find_flat_channels**: Find flat/disconnected channels based on variance
338
+ - **find_bads_channels_threshold**: Find bad channels using threshold-based rejection
339
+ - **find_bads_channels_variance**: Find bad channels using variance-based detection
340
+ - **find_bads_channels_high_frequency**: Find bad channels using high-frequency variance
341
+
342
+ Bad Channel Handling:
343
+ - **interpolate_bad_channels**: Interpolate bad channels
344
+ - **drop_bad_channels**: Drop bad channels without interpolation
345
+
346
+ Epoching:
347
+ - **find_events**: Find events in the data
348
+ - **epoch**: Create epochs around events
349
+ - **chunk_in_epoch**: Create fixed-length epochs from continuous data
350
+ - **find_bads_epochs_threshold**: Find and remove bad epochs using threshold-based rejection
351
+
352
+ Output:
353
+ - **save_clean_instance**: Save raw or epochs data to .fif file
354
+ - **generate_json_report**: Generate JSON report
355
+ - **generate_html_report**: Generate HTML report
356
+
357
+ ### Example Configuration
358
+
359
+ See `configs/config_example.yaml` for a full pipeline with epochs:
360
+
361
+ ```yaml
362
+ pipeline:
363
+   - name: bandpass_filter
364
+     l_freq: 0.5
365
+     h_freq: 40.0
366
+   - name: reference
367
+     ref_channels: average
368
+     instance: 'raw'
369
+   - name: find_events
370
+     shortest_event: 1
371
+   - name: epoch
372
+     tmin: -0.2
373
+     tmax: 0.8
374
+     baseline: [null, 0]
375
+     event_id: null
376
+     reject:
377
+       eeg: 1.5e-04
378
+   - name: save_clean_instance
379
+     instance: epochs
380
+   - name: generate_json_report
381
+   - name: generate_html_report
382
+ ```
383
+
384
+ See `configs/config_raw_only.yaml` for a simpler pipeline without epoching:
385
+
386
+ ```yaml
387
+ pipeline:
388
+   - name: bandpass_filter
389
+     l_freq: 1.0
390
+     h_freq: 30.0
391
+   - name: reference
392
+     ref_channels: average
393
+   - name: ica
394
+     n_components: 15
395
+     method: fastica
396
+     find_eog: true
397
+     apply: true
398
+   - name: generate_json_report
399
+ ```
400
+
401
+ See `configs/config_with_adaptive_reject.yaml` for a pipeline with adaptive autoreject steps. This config includes additional preprocessing steps like montage setting, notch filtering, and resampling:
402
+
403
+ ```yaml
404
+ pipeline:
405
+   - name: concatenate_recordings
406
+
407
+   - name: set_montage
408
+     montage: standard_1020
409
+
410
+   - name: bandpass_filter
411
+     l_freq: 0.1
412
+     h_freq: 40.0
413
+
414
+   - name: notch_filter
415
+     freqs: [50.0, 100.0]
416
+
417
+   - name: resample
418
+     instance: raw
419
+     sfreq: 250.0
420
+     npad: auto
421
+
422
+   - name: find_events
423
+     get_events_from: annotations
424
+     shortest_event: 1
425
+     event_id:
426
+       stim/12hz: 10001
427
+       stim/15hz: 10002
428
+
429
+   - name: epoch
430
+     tmin: -0.2
431
+     tmax: 1.2
432
+     baseline: [null, 0.0]
433
+     reject: null
434
+
435
+   - name: reference
436
+     instance: 'epochs'
437
+     ref_channels: average
438
+
439
+   - name: reference
440
+     instance: 'raw'
441
+     ref_channels: average
442
+
443
+   - name: generate_html_report
444
+ ```
445
+
446
+ Note: This config file also includes commented-out examples of bad channel detection steps (find_bads_channels_threshold, find_bads_channels_variance, find_bads_channels_high_frequency) that can be uncommented and customized as needed.
447
+
448
+ See `configs/config_minimal.yaml` for a comprehensive pipeline including strip_recording, copy_instance, and ICA:
449
+
450
+ ```yaml
451
+ pipeline:
452
+   - name: strip_recording
453
+     instance: all_raw
454
+     get_events_from: annotations
455
+     shortest_event: 5
456
+     start_padding: 1
457
+     end_padding: 1
458
+
459
+   - name: concatenate_recordings
460
+
461
+   - name: set_montage
462
+     montage: GSN-HydroCel-256
463
+
464
+   - name: copy_instance
465
+     from_instance: raw
466
+     to_instance: raw_before_cleaning
467
+
468
+   - name: find_flat_channels
469
+     threshold: 1.0e-12
470
+
471
+   - name: bandpass_filter
472
+     l_freq: 0.1
473
+     h_freq: 40.0
474
+
475
+   - name: chunk_in_epoch
476
+     duration: 1
477
+
478
+   - name: ica
479
+     n_components: 20
480
+     method: fastica
481
+     find_eog: true
482
+     apply: true
483
+
484
+   - name: save_clean_instance
485
+     instance: epochs
486
+     overwrite: true
487
+
488
+   - name: generate_html_report
489
+     compare_instances:
490
+       - title: 'Before vs After Cleaning'
491
+         instance_a:
492
+           name: 'raw'
493
+           label: 'After Cleaning'
494
+         instance_b:
495
+           name: 'raw_before_cleaning'
496
+           label: 'Before Cleaning'
497
+ ```
498
+
499
+ Additional example configurations available in `configs/`:
500
+ - `config_with_drop_bad_channels.yaml` - Example using drop_bad_channels instead of interpolation
501
+ - `config_with_excluded_channels.yaml` - Example using excluded_channels parameter to preserve reference channels
502
+ - `config_with_custom_steps.yaml` - Example showing how to integrate custom preprocessing steps
503
+
504
+ ## Command-Line Arguments
505
+
506
+ ### Required Arguments
507
+ - `--bids-root`: Path to the BIDS root directory (used by the default BIDS reader; the glob reader example above passes `--data-root` instead)
508
+
509
+ ### Optional Filter Arguments
510
+ These arguments use the same matching logic as `mne-bids` `find_matching_paths`. If not specified, all matching files will be processed.
511
+
512
+ - `--subjects`: Subject ID(s) to process, space-separated (e.g., `--subjects 01 02 03`)
513
+ - `--sessions`: Session ID(s) to process, space-separated
514
+ - `--tasks`: Task name(s) to process, space-separated (e.g., `--tasks rest task1`)
515
+ - `--acquisitions`: Acquisition parameter(s) to process
516
+ - `--runs`: Run number(s) to process
517
+ - `--extension`: File extension to process (default: `.vhdr`)
518
+
519
+ ### Other Arguments
520
+ - `--output-root`: Custom output path (optional, defaults to `<bids-root>/derivatives/meegflow`)
521
+ - `--config`: Path to YAML configuration file (optional)
522
+ - `--log-file`: Path to log file (optional, defaults to console output)
523
+ - `--log-level`: Logging level - DEBUG, INFO, WARNING, or ERROR (optional, default: INFO)
524
+
525
+ ## Custom Preprocessing Steps
526
+
527
+ The pipeline supports custom preprocessing steps, allowing you to extend the pipeline with your own processing functions without modifying the core code.
528
+
529
+ ### Creating Custom Steps
530
+
531
+ 1. **Create a Python file** with your custom step functions:
532
+
533
+ ```python
534
+ # my_custom_steps.py
535
+ def my_custom_filter(data, step_config):
536
+     """Apply custom filtering to raw data."""
537
+     if 'raw' not in data:
538
+         raise ValueError("my_custom_filter requires 'raw' in data")
539
+
540
+     # Get parameters from step_config
541
+     cutoff_freq = step_config.get('cutoff_freq', 30.0)
542
+
543
+     # Apply custom processing
544
+     data['raw'].filter(h_freq=cutoff_freq, l_freq=None)
545
+
546
+     # Record the step for reporting
547
+     data['preprocessing_steps'].append({
548
+         'step': 'my_custom_filter',
549
+         'cutoff_freq': cutoff_freq
550
+     })
551
+
552
+     return data
553
+ ```
554
+
555
+ 2. **Place the file in a dedicated folder**, for example: `/path/to/my_custom_steps/`
556
+
557
+ 3. **Update your config file** to specify the custom steps folder:
558
+
559
+ ```yaml
560
+ custom_steps_folder: /path/to/my_custom_steps
561
+
562
+ pipeline:
563
+   - name: my_custom_filter
564
+     cutoff_freq: 30.0
565
+   - name: bandpass_filter # Built-in steps still work
566
+     l_freq: 0.5
567
+     h_freq: 40.0
568
+ ```
569
+
570
+ 4. **Run the pipeline** as usual - custom steps are automatically loaded and available.
571
+
572
+ ### Custom Step Requirements
573
+
574
+ Custom step functions must follow these rules:
575
+
576
+ - **Signature**: Accept exactly 2 parameters: `data` (Dict) and `step_config` (Dict)
577
+ - **Return**: Return the updated `data` dictionary
578
+ - **Validation**: Check that required data instances exist (e.g., `'raw'`, `'epochs'`)
579
+ - **Recording**: Append a summary to `data['preprocessing_steps']` for reporting
580
+ - **Naming**: Function names become step names; avoid starting with underscore
581
+
582
+ See `configs/example_custom_steps.py` for complete examples.
583
+
584
+ ### Using Custom Steps with Docker
585
+
586
+ Mount your custom steps folder when running the container:
587
+
588
+ ```bash
589
+ docker run -v /host/bids:/data \
590
+ -v /host/custom_steps:/custom_steps \
591
+ -v /host/config:/config \
592
+ meegflow \
593
+ --bids-root /data \
594
+ --subjects 01 02 \
595
+ --tasks rest \
596
+ --config /config/my_config.yaml
597
+ ```
598
+
599
+ In your config file, use the container path:
600
+
601
+ ```yaml
602
+ custom_steps_folder: /custom_steps
603
+ pipeline:
604
+   - name: my_custom_filter
605
+     cutoff_freq: 30.0
606
+ ```
607
+
608
+ ### Advanced Features
609
+
610
+ - **Override built-in steps**: Custom steps with the same name as built-in steps will override them
611
+ - **Multiple files**: Place multiple `.py` files in the custom steps folder - all will be loaded
612
+ - **Error handling**: If a custom step file has errors, other files will still be loaded
613
+ - **Private functions**: Functions starting with `_` are ignored and not loaded as steps
614
+
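The loading rules listed above (every `.py` file is scanned, a broken file does not block the others, underscore-prefixed functions are skipped, steps take exactly two parameters) could be implemented roughly as follows. This is a sketch under those stated rules, not the loader shipped in `pipeline.py`; `load_custom_steps` is a hypothetical name, and skipping functions that were merely imported into a step file is an extra assumption.

```python
import importlib.util
import inspect
import tempfile
from pathlib import Path


def load_custom_steps(folder):
    """Collect public two-parameter functions from every .py file in folder."""
    steps = {}
    for path in sorted(Path(folder).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)
        except Exception as exc:  # a broken file must not block the others
            print(f"Skipping {path.name}: {exc}")
            continue
        for name, func in inspect.getmembers(module, inspect.isfunction):
            if name.startswith("_"):
                continue  # private helper, not a step
            if func.__module__ != module.__name__:
                continue  # assumption: ignore functions imported from elsewhere
            if len(inspect.signature(func).parameters) != 2:
                continue  # steps take exactly (data, step_config)
            steps[name] = func
    return steps


# Small demonstration with one valid file and one file containing a syntax error.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "good.py").write_text(
        "def my_custom_filter(data, step_config):\n"
        "    return data\n"
        "def _helper(data, step_config):\n"
        "    return data\n"
    )
    Path(tmp, "broken.py").write_text("def oops(:\n")
    steps = load_custom_steps(tmp)

print(sorted(steps))
```

A registry built this way can be merged over the built-in steps with `dict.update`, which would give the documented "custom steps override built-in steps with the same name" behaviour.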
615
+ ## Preprocessing Steps Details
616
+
617
+ Each step can be customized through the configuration:
618
+
619
+ ### Excluding Channels from Analysis
620
+
621
+ Many preprocessing steps support an `excluded_channels` parameter that allows you to exclude specific channels (e.g., reference channels like 'Cz') from analysis to avoid reference problems. This is useful when you want to preserve a reference channel or exclude channels that should not be analyzed in certain steps.
622
+
623
+ **Steps that support `excluded_channels`:**
624
+ - `bandpass_filter` - Exclude channels from filtering
625
+ - `notch_filter` - Exclude channels from notch filtering
626
+ - `ica` - Exclude channels from ICA decomposition
627
+ - `find_flat_channels` - Exclude channels from flat channel detection
628
+ - `find_bads_channels_threshold` - Exclude channels from bad channel detection
629
+ - `find_bads_channels_variance` - Exclude channels from variance-based detection
630
+ - `find_bads_channels_high_frequency` - Exclude channels from high-frequency analysis
631
+ - `find_bads_epochs_threshold` - Exclude channels from epoch rejection criteria
632
+ - `interpolate_bad_channels` - Exclude channels from interpolation even if marked as bad
633
+ - `drop_bad_channels` - Exclude channels from dropping even if marked as bad
634
+
635
+ **Steps where exclusion doesn't apply:**
636
+ - `reference` - Reference computation uses selected channels; use `ref_channels` parameter instead
637
+ - `resample` - Resamples all data uniformly
638
+ - `set_montage` - Sets electrode positions for all channels
639
+ - `drop_unused_channels` - Use this for explicit channel removal
640
+
641
+ **Example usage:**
642
+ ```yaml
643
+ - name: bandpass_filter
644
+   l_freq: 0.5
645
+   h_freq: 45.0
646
+   excluded_channels: ['Cz'] # Exclude Cz from filtering
647
+
648
+ - name: find_bads_channels_threshold
649
+   reject:
650
+     eeg: 1.0e-4
651
+   excluded_channels: ['Cz', 'FCz'] # Don't mark these as bad
652
+
653
+ - name: drop_bad_channels
654
+   instance: epochs
655
+   excluded_channels: ['Cz'] # Don't drop Cz even if marked as bad
656
+ ```
657
+
658
+ See `configs/config_with_excluded_channels.yaml` for a complete example.
659
+
660
+ ### Data Organization Steps
661
+
662
+ #### strip_recording
663
+ Crop recordings to remove data outside the first and last events. This is useful for removing unnecessary data at the beginning and end of recordings that don't contain task-relevant data.
664
+ - `instance`: Which data instance to crop - 'all_raw' or 'raw' (default: 'raw')
665
+ - `get_events_from`: How to extract events - 'stim' or 'annotations' (default: 'annotations')
666
+ - `shortest_event`: Minimum number of samples for an event (default: 1)
667
+ - `event_id`: Event IDs to use for finding start/end points. Can be a dict mapping event names to IDs or 'auto' (default: 'auto')
668
+ - `start_padding`: Time in seconds to keep before the first event (default: 1)
669
+ - `end_padding`: Time in seconds to keep after the last event (default: 1)
670
+
671
+ **Example:**
672
+ ```yaml
673
+ - name: strip_recording
674
+   instance: all_raw
675
+   get_events_from: annotations
676
+   shortest_event: 1
677
+   event_id:
678
+     Stimulus/CatNewRepeated/CR: 91
679
+     Stimulus/CatOld/Hit: 101
680
+   start_padding: 1.0
681
+   end_padding: 1.0
682
+ ```
683
+
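The crop window implied by `start_padding` and `end_padding` can be sketched with plain arithmetic. The helper below is hypothetical (`crop_window` is not a meegflow function), and clamping the window to the recording bounds is an assumption about how edge cases are handled.

```python
def crop_window(event_times, duration, start_padding=1.0, end_padding=1.0):
    """Return (tmin, tmax) in seconds around the first and last event.

    Assumption: the window is clamped to [0, duration] so the padding
    never reaches outside the recording.
    """
    if not event_times:
        raise ValueError("no events found; cannot determine crop window")
    tmin = max(0.0, min(event_times) - start_padding)
    tmax = min(duration, max(event_times) + end_padding)
    return tmin, tmax


# Events at 12.5 s, 40 s and 118.5 s in a 130 s recording.
window = crop_window([12.5, 40.0, 118.5], duration=130.0)
# Events close to both edges of a 10 s recording: padding is clamped.
clamped = crop_window([0.5, 9.75], duration=10.0)
print(window, clamped)
```

With MNE-Python, a window like this would typically be applied with `raw.crop(tmin=tmin, tmax=tmax)`.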
684
+ #### concatenate_recordings
685
+ Concatenate multiple raw recordings into a single continuous recording. This is useful when data is split across multiple files but needs to be processed as a single session.
686
+ - No parameters required
687
+ - Requires 'all_raw' to be present in data
688
+ - Creates a single 'raw' instance from all recordings in 'all_raw'
689
+
690
+ **Example:**
691
+ ```yaml
692
+ - name: concatenate_recordings
693
+ ```
694
+
695
+ #### copy_instance
696
+ Create a copy of a data instance. This is useful for comparing data at different stages of preprocessing (e.g., before/after cleaning or ICA).
697
+ - `from_instance`: Name of the instance to copy from (default: 'raw')
698
+ - `to_instance`: Name of the new instance to create (default: 'raw_cleaned')
699
+
700
+ **Example:**
701
+ ```yaml
702
+ - name: copy_instance
703
+   from_instance: raw
704
+   to_instance: raw_before_ica
705
+ ```
706
+
707
+ ### Preprocessing Steps
708
+
709
+ ### 1. set_montage
710
+ Set channel montage for EEG data. Useful when data lacks electrode position information.
711
+ - `montage`: Name of standard montage to use (default: 'standard_1020')
712
+   - Examples: 'standard_1020', 'standard_1005', 'biosemi64', etc.
713
+   - See MNE documentation for available montages
714
+
715
+ ### 2. drop_unused_channels
716
+ Explicitly drop specified channels from the data by name. Unlike `drop_bad_channels`, this step drops channels regardless of whether they are marked as bad.
717
+ - `channels_to_drop`: List of channel names to drop
718
+ - `instance`: Which data instance to drop channels from - 'raw' or 'epochs' (default: 'raw')
719
+
720
+ ### 3. bandpass_filter
721
+ Apply bandpass filtering.
722
+ - `l_freq`: High-pass filter frequency (Hz)
723
+ - `h_freq`: Low-pass filter frequency (Hz)
724
+ - `l_freq_order`: Filter order for high-pass (default: 6)
725
+ - `h_freq_order`: Filter order for low-pass (default: 8)
726
+ - `picks`: Optional channel indices to filter
727
+ - `excluded_channels`: List of channel names to exclude from filtering (optional)
728
+ - `n_jobs`: Number of parallel jobs (default: 1)
729
+
730
+ ### 4. notch_filter
731
+ Apply notch filtering to remove line noise.
732
+ - `freqs`: Frequencies to notch filter (e.g., [50.0, 100.0])
733
+ - `notch_widths`: Width of notch filters (optional)
734
+ - `method`: Filtering method (default: 'fft')
735
+ - `picks`: Optional channel indices to filter
736
+ - `excluded_channels`: List of channel names to exclude from filtering (optional)
737
+ - `n_jobs`: Number of parallel jobs (default: 1)
738
+
739
+ ### 5. resample
740
+ Resample the data to a different sampling frequency.
741
+ - `instance`: Which data instance to resample - 'raw' or 'epochs' (default: 'raw')
742
+ - `sfreq`: Target sampling frequency in Hz (default: 250)
743
+ - `npad`: Padding to use for resampling (default: 'auto')
744
+ - `resample_events`: Whether to also resample events (default: false)
745
+ - `n_jobs`: Number of parallel jobs (default: 1)
746
+
747
+ ### 6. reference
748
+ Apply re-referencing.
749
+ - `ref_channels`: Reference channels ('average' or channel names)
750
+ - `instance`: Which data instance to reference - 'raw' or 'epochs' (default: 'epochs')
751
+
752
+ ### 7. find_flat_channels
753
+ Find flat/disconnected channels based on variance threshold. Channels with variance below the threshold are marked as bad.
754
+ - `picks`: Channel indices to check (optional, default: EEG channels)
755
+ - `excluded_channels`: List of channel names to exclude from flat channel detection (optional)
756
+ - `threshold`: Variance threshold below which channels are considered flat (default: 1e-12)
757
+
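The variance rule for flat channels can be shown on toy data. This is a minimal sketch using only the standard library, not the `find_flat_channels` step itself; the function name `flat_channels` and the dict-of-lists input are assumptions made for illustration (the real step operates on MNE objects).

```python
from statistics import pvariance


def flat_channels(signals, threshold=1e-12, excluded_channels=()):
    """Return names of channels whose variance is below the threshold.

    signals maps channel name -> list of samples (in volts).
    Channels listed in excluded_channels are never reported.
    """
    excluded = set(excluded_channels)
    return [
        name
        for name, samples in signals.items()
        if name not in excluded and pvariance(samples) < threshold
    ]


signals = {
    "Fz": [1e-5, -2e-5, 3e-5, -1e-5],  # normal EEG-scale fluctuation
    "Cz": [0.0, 0.0, 0.0, 0.0],        # disconnected: exactly constant
    "Pz": [2e-8, 2e-8, 2e-8, 2e-8],    # stuck at a constant offset
}
flat = flat_channels(signals)
flat_without_cz = flat_channels(signals, excluded_channels=["Cz"])
print(flat, flat_without_cz)
```

The second call shows the effect of `excluded_channels`: a channel such as a reference electrode stays out of the bad-channel list even when its signal is constant.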
758
+ ### 8. interpolate_bad_channels
759
+ Interpolate bad channels using spherical spline interpolation.
760
+ - `instance`: Which data instance to interpolate - 'raw' or 'epochs' (default: 'epochs')
761
+ - `excluded_channels`: List of channel names to exclude from interpolation (optional)
762
+
763
+ ### 9. drop_bad_channels
764
+ Drop bad channels without interpolation. This step removes channels marked as bad from the data instead of interpolating them.
765
+ - `instance`: Which data instance to drop channels from - 'raw' or 'epochs' (default: 'epochs')
766
+ - `excluded_channels`: List of channel names to exclude from dropping even if marked as bad (optional)
767
+
768
+ ### 10. ica
769
+ ICA-based artifact removal.
770
+ - `n_components`: Number of ICA components (default: 20)
771
+ - `method`: ICA method ('fastica', 'infomax', 'picard', default: 'fastica')
772
+ - `random_state`: Random state for reproducibility (default: 97)
773
+ - `picks`: Channel types to include in ICA (optional, default: EEG channels)
774
+ - `excluded_channels`: List of channel names to exclude from ICA decomposition (optional)
775
+ - `ica_fit_l_freq`: High-pass frequency for filtering data before ICA fit (default: 1.0 Hz)
776
+ - `ica_fit_h_freq`: Low-pass frequency for filtering data before ICA fit (optional, default: None)
777
+ - `find_eog`: Automatically find EOG artifacts (true/false, default: false)
778
+ - `eog_channels`: List of channel names to use for EOG detection (optional, auto-detects if not provided)
779
+ - `eog_threshold`: Correlation threshold for EOG component detection (default: 'auto')
780
+ - `eog_measure`: Measure for EOG detection ('correlation' or 'ctps', default: 'correlation')
781
+ - `eog_l_freq`: High-pass frequency for EOG correlation (default: 1.0 Hz)
782
+ - `eog_h_freq`: Low-pass frequency for EOG correlation (default: 10.0 Hz)
783
+ - `find_ecg`: Automatically find ECG artifacts (true/false, default: false)
784
+ - `ecg_channels`: List of channel names to use for ECG detection (optional)
785
+ - `ecg_threshold`: Correlation threshold for ECG component detection (default: 'auto')
786
+ - `ecg_measure`: Measure for ECG detection ('correlation' or 'ctps', default: 'correlation')
787
+ - `ecg_l_freq`: High-pass frequency for ECG correlation (default: 1.0 Hz)
788
+ - `ecg_h_freq`: Low-pass frequency for ECG correlation (default: 10.0 Hz)
789
+ - `selected_indices`: Manually specify component indices to exclude (optional, list of integers)
790
+ - `apply`: Apply ICA to remove artifacts (true/false, default: true)
791
+
792
+ ### 11. find_events
793
+ Find events in the data.
794
+ - `get_events_from`: How to extract events - 'stim' or 'annotations' (default: 'annotations')
795
+ - `shortest_event`: Minimum event duration in samples (default: 1)
796
+ - `event_id`: Event IDs to extract. Can be 'auto' for all events or a dict mapping event names to IDs (default: 'auto')
797
+
798
+ ### 12. epoch
799
+ Create epochs around events.
800
+ - `tmin`: Start time before event (seconds, default: -0.2)
801
+ - `tmax`: End time after event (seconds, default: 0.5)
802
+ - `baseline`: Baseline correction window (tuple or null, default: (null, 0.0))
803
+ - `event_id`: Event IDs to include (dict or null for all)
804
+ - `reject`: Rejection criteria (dict with channel type keys, optional)
805
+
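To make the `tmin`/`tmax` defaults concrete, the sketch below converts them into a sample window around one event. It is only an illustration: the helper name `epoch_window`, the event position, and the 250 Hz sampling rate are assumptions, and it assumes both endpoints are kept, as MNE's epoching does by default.

```python
def epoch_window(event_sample, sfreq, tmin=-0.2, tmax=0.5):
    """Return (first, last) sample indices of one epoch, both inclusive.

    Hypothetical helper for illustration; not part of meegflow.
    """
    first = event_sample + int(round(tmin * sfreq))
    last = event_sample + int(round(tmax * sfreq))
    return first, last


# Assumed values: an event at sample 1000 in a 250 Hz recording.
first, last = epoch_window(event_sample=1000, sfreq=250.0)
n_samples = last - first + 1
print(first, last, n_samples)
```

With these assumed values the epoch spans 50 samples before the event and 125 after it.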
806
+ ### 13. chunk_in_epoch
807
+ Create fixed-length epochs from continuous raw data. This is an alternative to event-based epoching that splits the data into equal-duration segments.
808
+ - `duration`: Duration of each epoch in seconds (default: 1.0)
809
+
810
+ **Example:**
811
+ ```yaml
812
+ - name: chunk_in_epoch
813
+   duration: 1.0 # Create 1-second epochs
814
+ ```
815
+
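The segmentation idea behind fixed-length epoching can be sketched in a few lines. This is not meegflow's implementation: the function name `fixed_length_chunks` is hypothetical, and dropping an incomplete trailing segment is an assumption (it mirrors what MNE's fixed-length epoching does).

```python
def fixed_length_chunks(n_samples, sfreq, duration=1.0):
    """Return (start, stop) sample ranges of equal-duration segments.

    `stop` is exclusive. A trailing segment shorter than `duration`
    is dropped (assumption, see the text above).
    """
    chunk_len = int(round(duration * sfreq))
    if chunk_len <= 0:
        raise ValueError("duration is shorter than one sample")
    return [
        (start, start + chunk_len)
        for start in range(0, n_samples - chunk_len + 1, chunk_len)
    ]


# Assumed recording: 1100 samples at 250 Hz, i.e. 4.4 seconds.
chunks = fixed_length_chunks(n_samples=1100, sfreq=250.0, duration=1.0)
print(chunks)
```

Here the last 100 samples do not fill a whole second, so only four segments are returned.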
816
+ ### 14. find_bads_channels_threshold
817
+ Find bad channels using threshold-based rejection. Marks channels as bad if they exceed rejection thresholds in too many epochs.
818
+ - `picks`: Channel indices to check (optional, default: EEG channels)
819
+ - `excluded_channels`: List of channel names to exclude from bad channel detection (optional)
820
+ - `reject`: Rejection thresholds by channel type (e.g., `{"eeg": 150e-6}`)
821
+ - `n_epochs_bad_ch`: Fraction or number of epochs a channel must be bad in to be marked as bad (default: 0.5)
822
+ - `apply_on`: List of instances to mark bad channels on (default: ['epochs'])
823
+
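The counting logic described for `n_epochs_bad_ch` can be sketched as follows. Several things here are assumptions rather than documented behaviour: the function name `find_bad_channels`, the use of peak-to-peak amplitude as the rejection measure (as MNE's `reject` dictionaries do), treating a float below 1 as a fraction and anything else as an absolute count, and the inclusive `>=` comparison.

```python
def peak_to_peak(samples):
    """Peak-to-peak amplitude of one channel within one epoch."""
    return max(samples) - min(samples)


def find_bad_channels(epochs, threshold, n_epochs_bad_ch=0.5):
    """Return channels exceeding `threshold` in too many epochs.

    epochs: list of {channel_name: [samples]} dicts, one per epoch.
    Hypothetical helper for illustration; not part of meegflow.
    """
    # Assumption: a float below 1 is a fraction of epochs, otherwise a count.
    if isinstance(n_epochs_bad_ch, float) and n_epochs_bad_ch < 1:
        min_bad = n_epochs_bad_ch * len(epochs)
    else:
        min_bad = n_epochs_bad_ch
    counts = {}
    for epoch in epochs:
        for channel, samples in epoch.items():
            if peak_to_peak(samples) > threshold:
                counts[channel] = counts.get(channel, 0) + 1
    return sorted(ch for ch, n in counts.items() if n >= min_bad)


# Toy data in volts: Fz exceeds 150 µV in 3 of 4 epochs, Cz in 1 of 4.
epochs = [
    {"Cz": [0.0, 20e-6], "Fz": [0.0, 200e-6]},
    {"Cz": [0.0, 180e-6], "Fz": [-100e-6, 100e-6]},
    {"Cz": [5e-6, -5e-6], "Fz": [0.0, 10e-6]},
    {"Cz": [0.0, 30e-6], "Fz": [0.0, 300e-6]},
]
bad_fraction = find_bad_channels(epochs, threshold=150e-6, n_epochs_bad_ch=0.5)
bad_count = find_bad_channels(epochs, threshold=150e-6, n_epochs_bad_ch=1)
print(bad_fraction, bad_count)
```

The same counting, transposed to count bad channels per epoch, is the idea behind the `n_channels_bad_epoch` parameter of `find_bads_epochs_threshold`.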
824
+ ### 15. find_bads_channels_variance
825
+ Find bad channels using variance-based detection. Identifies channels with abnormally high or low variance.
826
+ - `instance`: Which data instance to use - 'raw' or 'epochs' (default: 'epochs')
827
+ - `picks`: Channel indices to check (optional, default: EEG channels)
828
+ - `excluded_channels`: List of channel names to exclude from variance analysis (optional)
829
+ - `zscore_thresh`: Z-score threshold for outlier detection (default: 4)
830
+ - `max_iter`: Maximum iterations for iterative outlier removal (default: 2)
831
+ - `apply_on`: List of instances to mark bad channels on (default: [instance])
832
+
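The interplay of `zscore_thresh` and `max_iter` is easiest to see in a small sketch of iterative outlier removal. This is an illustration under assumptions, not the package's code: the function name `find_variance_outliers` is hypothetical, the z-score uses the population standard deviation, and already-flagged channels are left out when the statistics are recomputed.

```python
from statistics import mean, pstdev


def find_variance_outliers(variances, zscore_thresh=4.0, max_iter=2):
    """Return channel names whose variance is an outlier.

    variances: {channel_name: variance}. Each pass recomputes the mean
    and standard deviation without the channels flagged so far.
    """
    bads = set()
    for _ in range(max_iter):
        good = {ch: v for ch, v in variances.items() if ch not in bads}
        if len(good) < 2:
            break
        mu = mean(good.values())
        sigma = pstdev(good.values())
        if sigma == 0:
            break
        new_bads = {
            ch for ch, v in good.items() if abs(v - mu) / sigma > zscore_thresh
        }
        if not new_bads:
            break
        bads |= new_bads
    return sorted(bads)


# Toy data: 28 ordinary channels plus one extreme and one moderate outlier.
variances = {f"ch{i:02d}": (0.9 if i % 2 == 0 else 1.1) for i in range(28)}
variances["noisy"] = 100.0
variances["drifty"] = 6.0

one_pass = find_variance_outliers(variances, max_iter=1)
two_passes = find_variance_outliers(variances, max_iter=2)
print(one_pass, two_passes)
```

In this toy data the extreme channel inflates the standard deviation so much that the moderate outlier is only detected on the second pass, which is why more than one iteration can be useful.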
833
+ ### 16. find_bads_channels_high_frequency
834
+ Find bad channels using high-frequency variance. Detects channels with excessive high-frequency noise.
835
+ - `instance`: Which data instance to use - 'raw' or 'epochs' (default: 'epochs')
836
+ - `picks`: Channel indices to check (optional, default: EEG channels)
837
+ - `excluded_channels`: List of channel names to exclude from high-frequency analysis (optional)
838
+ - `zscore_thresh`: Z-score threshold for outlier detection (default: 4)
839
+ - `max_iter`: Maximum iterations for iterative outlier removal (default: 2)
840
+ - `apply_on`: List of instances to mark bad channels on (default: [instance])
841
+
842
+ ### 17. find_bads_epochs_threshold
843
+ Find and remove bad epochs using threshold-based rejection. Drops epochs that have too many bad channels.
844
+ - `picks`: Channel indices to check (optional, default: EEG channels)
845
+ - `excluded_channels`: List of channel names to exclude from epoch rejection criteria (optional)
846
+ - `reject`: Rejection thresholds by channel type (e.g., `{"eeg": 150e-6}`)
847
+ - `n_channels_bad_epoch`: Fraction or number of channels that must be bad for an epoch to be rejected (default: 0.1)
848
+
849
+ ### 18. save_clean_instance
850
+ Save the cleaned raw or epochs data to a `.fif` file in BIDS-derivatives format.
851
+ - `instance`: Which data instance to save - 'raw' or 'epochs' (default: 'epochs')
852
+ - `overwrite`: Whether to overwrite existing files (default: true)
853
+
854
+ ### 19. generate_json_report
855
+ Generate a JSON report with preprocessing information. No parameters needed.
856
+
857
+ ### 20. generate_html_report
858
+ Generate HTML report with interactive visualizations.
859
+ - `picks`: Channel types to include in plots (optional, default: EEG channels)
860
+ - `excluded_channels`: List of channel names to exclude from plots (optional)
861
+ - `compare_instances`: List of instance comparisons to plot (optional, see config_minimal.yaml for example)
862
+ - `plot_raw_kwargs`: Additional keyword arguments for raw data plots (optional, dict)
863
+ - `plot_ica_kwargs`: Additional keyword arguments for ICA plots (optional, dict)
864
+ - `plot_events_kwargs`: Additional keyword arguments for event plots (optional, dict)
865
+ - `plot_epochs_kwargs`: Additional keyword arguments for epoch plots (optional, dict)
866
+ - `plot_evokeds_kwargs`: Additional keyword arguments for evoked response plots (optional, dict)
867
+
868
+ ## Batch Processing
869
+
870
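+ The pipeline processes multiple subjects and files sequentially. Use `--subjects`, `--sessions`, and `--tasks` to restrict a run, or omit `--subjects` to process every subject in the dataset: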
871
+
872
+ ```bash
873
+ # Process specific subjects with a specific task
874
+ python src/cli.py \
875
+ --bids-root /path/to/bids/dataset \
876
+ --subjects 01 02 03 04 05 \
877
+ --tasks rest \
878
+ --config configs/config_example.yaml
879
+
880
+ # Process all subjects in the dataset
881
+ python src/cli.py \
882
+ --bids-root /path/to/bids/dataset \
883
+ --config configs/config_example.yaml
884
+
885
+ # Process specific sessions for specific subjects
886
+ python src/cli.py \
887
+ --bids-root /path/to/bids/dataset \
888
+ --subjects 01 02 \
889
+ --sessions 01 02 \
890
+ --tasks rest
891
+ ```
892
+
893
+ For HPC/cluster environments, you can create your own SLURM or other batch submission scripts that call the pipeline with subject lists.
894
+
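One way to prepare such a submission script is to generate one command line per subject and hand each to the scheduler. The sketch below only builds and prints the commands; the helper name `build_commands` is hypothetical, the one-job-per-subject split is an assumption, and the flags are the ones used in the examples above.

```python
import shlex


def build_commands(bids_root, subjects, tasks=None, config=None):
    """Return one pipeline command (as an argument list) per subject."""
    commands = []
    for subject in subjects:
        cmd = ["python", "src/cli.py", "--bids-root", bids_root,
               "--subjects", subject]
        if tasks:
            cmd += ["--tasks", *tasks]
        if config:
            cmd += ["--config", config]
        commands.append(cmd)
    return commands


commands = build_commands(
    "/path/to/bids/dataset",
    ["01", "02", "03"],
    tasks=["rest"],
    config="configs/config_example.yaml",
)
for cmd in commands:
    # shlex.join quotes arguments safely for pasting into a shell script.
    print(shlex.join(cmd))
```

Each printed line can then be placed in a job script or an array-job task list for your scheduler.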
895
+ ## Progress Tracking and Logging
896
+
897
+ The pipeline reports its progress with progress bars and writes its messages through a configurable logger.
898
+
899
+ ### Progress Bars
900
+
901
+ When running the pipeline, you'll see two levels of progress bars:
902
+ 1. **Overall progress**: Shows progress across all recordings being processed
903
+ 2. **Step progress**: Shows progress through preprocessing steps for each recording
904
+
905
+ The progress bars use the `rich` library and display:
906
+ - Spinner animation
907
+ - Progress bar with percentage
908
+ - Time remaining estimate
909
+ - Current step being executed
910
+
911
+ ### Logging
912
+
913
+ The pipeline uses MNE's logger for all output messages. By default they go to the console; you can also write them to a file or change the verbosity.
914
+
915
+ **Console Output (default)**:
916
+ ```bash
917
+ python src/cli.py \
918
+ --bids-root /path/to/bids/dataset \
919
+ --subjects 01 02
920
+ ```
921
+
922
+ **Log to File**:
923
+ ```bash
924
+ python src/cli.py \
925
+ --bids-root /path/to/bids/dataset \
926
+ --subjects 01 02 \
927
+ --log-file /path/to/logs/pipeline.log
928
+ ```
929
+
930
+ **Adjust Logging Level**:
931
+ ```bash
932
+ python src/cli.py \
933
+ --bids-root /path/to/bids/dataset \
934
+ --subjects 01 02 \
935
+ --log-level DEBUG
936
+ ```
937
+
938
+ Available log levels: `DEBUG`, `INFO` (default), `WARNING`, `ERROR`
939
+
940
+ The pipeline also saves a summary of results to `derivatives/meegflow/pipeline_results.json` for easy programmatic access.
941
+
942
+ ## Docker Notes
943
+
944
+ ### Volume Mounting
945
+
946
+ When using Docker, you need to mount your local directories to paths inside the container using the `-v` flag:
947
+
948
+ - **BIDS dataset**: Mount your BIDS root directory to `/data` or any path you specify with `--bids-root`
949
+ - **Configuration files**: Mount custom config files if not using the built-in configs in `/app/configs/`
950
+ - **Output directory**: The pipeline writes outputs to `<bids-root>/derivatives/meegflow/` by default
951
+ - **Log files**: If using `--log-file`, mount a directory for log output
952
+
953
+ ### File Permissions
954
+
955
+ The Docker container runs as root by default. Files created by the container will be owned by root. To avoid permission issues:
956
+
957
+ 1. Run with your user ID:
958
+ ```bash
959
+ docker run --rm --user $(id -u):$(id -g) \
960
+ -v /path/to/bids:/data \
961
+ meegflow \
962
+ --bids-root /data \
963
+ --tasks rest
964
+ ```
965
+
966
+ 2. Or fix permissions after processing:
967
+ ```bash
968
+ sudo chown -R $USER:$USER /path/to/bids/derivatives
969
+ ```
970
+
971
+ ### Using Built-in Configurations
972
+
973
+ The Docker image includes several pre-configured pipeline examples in `/app/configs/`:
974
+ - `/app/configs/config_example.yaml` - Standard pipeline with epochs
975
+ - `/app/configs/config_raw_only.yaml` - Raw data processing without epoching
976
+ - `/app/configs/config_with_adaptive_reject.yaml` - Advanced pipeline with concatenation and event-based epochs
977
+ - `/app/configs/config_minimal.yaml` - Comprehensive pipeline with strip_recording, ICA, and instance comparison
978
+ - `/app/configs/config_with_drop_bad_channels.yaml` - Pipeline using drop_bad_channels instead of interpolation
979
+ - `/app/configs/config_with_excluded_channels.yaml` - Pipeline demonstrating excluded_channels parameter
980
+ - `/app/configs/config_with_custom_steps.yaml` - Example template for using custom preprocessing steps
981
+
982
+ Example using a built-in config:
983
+ ```bash
984
+ docker run --rm \
985
+ -v /path/to/bids:/data \
986
+ meegflow \
987
+ --bids-root /data \
988
+ --tasks rest \
989
+ --config /app/configs/config_with_adaptive_reject.yaml
990
+ ```
991
+
992
+ ### Building from Source
993
+
994
+ If you want to customize the Docker image or use a development version:
995
+
996
+ ```bash
997
+ git clone https://github.com/Laouen/meegflow.git
998
+ cd meegflow
999
+ docker build -t meegflow:custom .
1000
+ ```
1001
+
1002
+ **Building in CI/CD environments with self-signed certificates:**
1003
+
1004
+ If you're building in a CI/CD environment with self-signed SSL certificates, use the `PIP_TRUSTED_HOST` build argument:
1005
+
1006
+ ```bash
1007
+ docker build --build-arg PIP_TRUSTED_HOST=1 -t meegflow:custom .
1008
+ ```
1009
+
1010
+ Note: This disables SSL verification for PyPI and should only be used in trusted CI/CD environments, not for production builds.
1011
+
1012
+ ## Requirements
1013
+
1014
+ - Python >= 3.8
1015
+ - mne >= 1.5.0
1016
+ - mne-bids >= 0.14
1017
+ - numpy >= 1.24.0
1018
+ - scipy >= 1.11.0
1019
+ - rich >= 13.0.0
1020
+ - matplotlib >= 3.7.0 (recommended)
1021
+ - pandas >= 2.0.0 (recommended)
1022
+
1023
+ ## License
1024
+
1025
+ This project is ready to use for several projects and includes scripts for SLURM execution.
1026
+
1027
+ ## Contributing
1028
+
1029
+ Contributions are welcome! Please feel free to submit a Pull Request.
1030
+
1031
+ ## Support
1032
+
1033
+ For issues or questions, please open an issue on the GitHub repository.