@yibeichan/claude-skills 1.0.2
# Preprocessing fMRI Data for State-Space Modeling

## Table of Contents
1. [fMRIPrep Output Structure](#fmriprep-outputs)
2. [XCP-D Denoising for SSMs](#xcpd)
3. [Confound Strategy](#confounds)
4. [Dimensionality Reduction](#dim-reduction)
5. [Parcellation](#parcellation)
6. [CIFTI Surface-Based Processing](#cifti)
7. [ICA-Based Approaches](#ica)
8. [Temporal Filtering](#filtering)
9. [Data Quality Checks Before SSM Fitting](#qc)
10. [Preparing the Data Matrix](#data-matrix)

---
## 1. fMRIPrep Output Structure {#fmriprep-outputs}

fMRIPrep produces minimally preprocessed data with extensive metadata. Key outputs for SSMs:

**BOLD data (choose one):**
- `*_space-MNI152NLin6Asym_res-2_desc-preproc_bold.nii.gz` — volumetric, MNI space
- `*_space-fsLR_den-91k_bold.dtseries.nii` — CIFTI surface (preferred for surface analyses)
- `*_space-T1w_desc-preproc_bold.nii.gz` — native T1w space (for subject-specific parcellations)

**Confounds file:**
- `*_desc-confounds_timeseries.tsv` — all computed confounds (100+ columns)
- Use selectively — do NOT regress out everything

**Brain masks:**
- `*_space-MNI152NLin6Asym_res-2_desc-brain_mask.nii.gz`

**Transforms (for custom parcellation):**
- `*_from-MNI152NLin6Asym_to-T1w_mode-image_xfm.h5` (and reverse)

---
## 2. XCP-D Denoising for SSMs {#xcpd}

XCP-D applies denoising strategies to fMRIPrep outputs. For SSM analyses, the key choices are:

**Recommended pipeline:** `36P` or `acompcor` denoising strategy

**36P strategy:**
- 6 motion parameters + their temporal derivatives + the squares of both (24 motion regressors)
- Mean WM signal + its derivative + the squares of both (4 regressors)
- Mean CSF signal + its derivative + the squares of both (4 regressors)
- Mean global signal + its derivative + the squares of both (4 regressors) — CONTROVERSIAL, see below

**aCompCor strategy (alternative):**
- Top 5 aCompCor components from WM + CSF
- 6 motion parameters + temporal derivatives
- Avoids global signal regression

**Global signal regression (GSR) — the controversy for SSMs:**
GSR removes variance shared across all brain regions. This:
- Removes global arousal/drowsiness fluctuations (often desirable for resting-state SSMs)
- But introduces mathematical anticorrelations between regions
- These anticorrelations can create artifactual "anticorrelated" states in HMMs
- **Recommendation:** Run analyses both with and without GSR. If your states change
  dramatically, the GSR-sensitive states may be artifacts.

**XCP-D output for SSMs:**
```
xcp_d/sub-01/func/
    sub-01_task-rest_space-MNI152NLin6Asym_res-2_desc-denoised_bold.nii.gz
    sub-01_task-rest_space-fsLR_den-91k_desc-denoised_bold.dtseries.nii
    sub-01_task-rest_desc-confounds_timeseries.tsv  # nuisance regressors used in denoising
```
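For the with/without-GSR comparison recommended above, one rough way to check whether states "change dramatically" is to pair up states across the two fits and see how similar the paired mean patterns are. The sketch below is a hypothetical helper, not part of XCP-D or any HMM library. It assumes both fits used the same K and the same ROIs, and that you have each model's per-state means (for an hmmlearn `GaussianHMM`, `model.means_`). It uses SciPy's Hungarian-algorithm solver to find the pairing with the highest total correlation; states whose best match still correlates weakly are candidates for GSR-sensitive states.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_states(means_a, means_b):
    """Match states between two fits by correlating their mean patterns.

    means_a, means_b : arrays of shape (K, n_features), e.g. the per-state
        means from a fit on GSR data and a fit on non-GSR data (same ROIs, same K).
    Returns (order_b, r): order_b[k] is the state in fit B matched to state k
    of fit A, and r[k] is the Pearson correlation of that matched pair.
    """
    K = means_a.shape[0]
    # Rows are variables: take the A-vs-B block of the joint correlation matrix
    corr = np.corrcoef(means_a, means_b)[:K, K:]
    # Hungarian algorithm: maximize total correlation (minimize its negative)
    rows, cols = linear_sum_assignment(-corr)
    order_b = cols[np.argsort(rows)]
    r = corr[np.arange(K), order_b]
    return order_b, r
```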
---

## 3. Confound Strategy {#confounds}

### Minimum recommended confounds (if not using XCP-D)

```python
import numpy as np
import pandas as pd


def load_confounds_for_ssm(confounds_file, strategy='moderate'):
    """Load fMRIPrep confounds appropriate for SSM analysis.

    Parameters
    ----------
    confounds_file : str
        Path to *_desc-confounds_timeseries.tsv
    strategy : str
        'minimal': 6 motion + WM + CSF (8 regressors)
        'moderate': 24 motion + aCompCor top 5 (29 regressors)
        'aggressive': 36P (36 regressors, includes GSR)

    Returns
    -------
    confounds : array, shape (T, n_regressors)
    confound_cols : list of str
    """
    df = pd.read_csv(confounds_file, sep='\t')

    # Motion parameters (always include)
    motion_cols = ['trans_x', 'trans_y', 'trans_z', 'rot_x', 'rot_y', 'rot_z']

    def expand(cols):
        # fMRIPrep naming: <name>, <name>_derivative1, <name>_power2,
        # <name>_derivative1_power2 -> 4 regressors per base signal
        return (cols
                + [f'{c}_derivative1' for c in cols]
                + [f'{c}_power2' for c in cols]
                + [f'{c}_derivative1_power2' for c in cols])

    if strategy == 'minimal':
        confound_cols = motion_cols + ['csf', 'white_matter']

    elif strategy == 'moderate':
        # First five aCompCor columns in file order. The confounds JSON sidecar
        # records which mask (combined / WM / CSF) each component comes from.
        acompcor = [c for c in df.columns if c.startswith('a_comp_cor_')][:5]
        confound_cols = expand(motion_cols) + acompcor

    elif strategy == 'aggressive':
        # 36P: 9 base signals (6 motion + WM + CSF + global) x 4 terms.
        # Listing columns explicitly avoids picking up other columns that
        # merely share a prefix (prefix matching would exceed 36 regressors).
        confound_cols = expand(motion_cols + ['white_matter', 'csf', 'global_signal'])

    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")

    confounds = df[confound_cols].values
    # Handle NaN in first row (derivatives)
    confounds = np.nan_to_num(confounds, nan=0.0)

    return confounds, confound_cols
```
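fMRIPrep already ships the expanded columns used above. To see where the 36 in 36P comes from (9 base signals × 4 terms), or to expand a custom nuisance signal the same way, here is a minimal sketch. The function name is hypothetical; it assumes fMRIPrep's convention that `_derivative1` is the backward difference and `_power2` the element-wise square, and it sets the undefined first-TR derivative to 0, matching the `nan_to_num` step above.

```python
import numpy as np


def friston_expand(signals):
    """Expand base nuisance signals into signal, derivative, and their squares.

    signals : array, shape (T, n_base)
    Returns an array of shape (T, 4 * n_base) ordered as
    [signal, derivative1, power2, derivative1_power2], each block n_base wide.
    """
    signals = np.asarray(signals, dtype=float)
    # Backward difference; the first TR has no predecessor, so it is set to 0
    deriv = np.vstack([np.zeros((1, signals.shape[1])), np.diff(signals, axis=0)])
    return np.hstack([signals, deriv, signals ** 2, deriv ** 2])
```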
### Motion scrubbing / censoring

High-motion time points can create artifactual states. Two approaches:

**Approach A: Scrub before fitting (recommended)**
Remove high-motion TRs (framewise displacement > 0.5mm) and their neighbors. For the
remaining gaps, use this strategy based on gap size:

- **Short gaps (1–2 consecutive censored TRs):** Linearly interpolate across the gap so
  the HMM sees a continuous sequence without an abrupt discontinuity. Interpolated TRs
  do not contribute real dynamics but prevent boundary artifacts.
- **Longer gaps (≥3 consecutive censored TRs):** Treat the gap as a run boundary: split the
  run at the gap and pass the pieces on either side as separate segments (via the `lengths`
  array in hmmlearn, or as separate arrays in a list for `ssm`). Do NOT interpolate across
  long gaps; the interpolated signal would be fabricated.

**Approach B: Flag and verify after fitting**
Fit the SSM on all data, then check whether any state's occupancy tracks framewise displacement (FD).
If a state's occupancy time course correlates with FD (r > 0.3), it is likely motion-driven.
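
Approach B can be checked with a short sketch that correlates each state's occupancy time course with FD. The helper name is hypothetical. It assumes hard state assignments (e.g. from hmmlearn's `model.predict`) aligned TR-for-TR with the FD column; the same loop works on posterior probabilities (e.g. a column of hmmlearn's `model.predict_proba`) in place of the binary indicator. Compare the returned values against the 0.3 cutoff above.

```python
import numpy as np


def state_fd_correlations(states, fd, n_states):
    """Correlate each state's binary occupancy time course with framewise displacement.

    states : int array, shape (T,), decoded state per TR
    fd : float array, shape (T,), framewise displacement (NaN treated as 0)
    n_states : int, number of states K
    Returns K Pearson correlations (NaN for states that are never or always active).
    """
    states = np.asarray(states)
    fd = np.nan_to_num(np.asarray(fd, dtype=float), nan=0.0)
    r = np.full(n_states, np.nan)
    for k in range(n_states):
        occupancy = (states == k).astype(float)
        # Correlation is undefined when occupancy is constant
        if 0 < occupancy.sum() < len(occupancy):
            r[k] = np.corrcoef(occupancy, fd)[0, 1]
    return r
```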
```python
import numpy as np
import pandas as pd


def identify_high_motion_trs(confounds_file, fd_threshold=0.5, n_before=0, n_after=2):
    """Identify TRs to censor due to high motion.

    Each TR with FD > fd_threshold is censored together with its n_before
    preceding and n_after following TRs.

    Returns boolean mask: True = keep, False = censor.
    """
    df = pd.read_csv(confounds_file, sep='\t')
    # First TR has no FD (NaN in fMRIPrep output); treat it as 0
    fd = df['framewise_displacement'].fillna(0).to_numpy()

    censor = fd > fd_threshold

    # Expand censoring to neighbors
    censor_expanded = censor.copy()
    for i in np.flatnonzero(censor):
        start = max(0, i - n_before)
        end = min(len(censor), i + n_after + 1)
        censor_expanded[start:end] = True

    keep_mask = ~censor_expanded
    pct_removed = 100 * censor_expanded.sum() / len(censor_expanded)
    print(f"Censoring {censor_expanded.sum()}/{len(censor)} TRs ({pct_removed:.1f}%)")

    if pct_removed > 25:
        print("WARNING: >25% of data censored. Consider excluding this run.")

    return keep_mask
```
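The gap rule from Approach A (interpolate runs of 1–2 censored TRs, split at runs of 3 or more) can be sketched as follows. This is a hypothetical helper, not part of hmmlearn or `ssm`; it assumes `keep_mask` comes from `identify_high_motion_trs` and that the timeseries covers the full run. Censored stretches touching the first or last TR cannot be interpolated, so the sketch drops them; that edge handling is a design choice not specified in the text above.

```python
import numpy as np


def segments_from_keep_mask(timeseries, keep_mask, max_interp_gap=2):
    """Interpolate short censored gaps and split the run at long ones.

    timeseries : array, shape (T, n_features)
    keep_mask : bool array, shape (T,), True = keep
    max_interp_gap : longest run of censored TRs to interpolate linearly

    Returns (segments, lengths): a list of (T_i, n_features) arrays and their
    lengths, usable as hmmlearn's `lengths` or as a list of arrays for `ssm`.
    """
    X = np.asarray(timeseries, dtype=float).copy()
    keep = np.asarray(keep_mask, dtype=bool)
    T = len(keep)
    drop = np.zeros(T, dtype=bool)  # TRs removed and used as split points

    t = 0
    while t < T:
        if keep[t]:
            t += 1
            continue
        start = t
        while t < T and not keep[t]:
            t += 1
        end = t  # censored run covers [start, end)
        interior = start > 0 and end < T
        if interior and (end - start) <= max_interp_gap:
            # Linear interpolation between the kept TRs on either side
            left, right = X[start - 1], X[end]
            n_steps = end - start + 1
            for j, idx in enumerate(range(start, end), start=1):
                X[idx] = left + (right - left) * j / n_steps
        else:
            # Long gap, or gap at the edge of the run: drop it
            drop[start:end] = True

    # Each contiguous stretch of non-dropped TRs becomes its own segment
    segments, current = [], []
    for idx in range(T):
        if drop[idx]:
            if current:
                segments.append(X[current])
                current = []
        else:
            current.append(idx)
    if current:
        segments.append(X[current])
    lengths = [len(s) for s in segments]
    return segments, lengths
```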
---

## 4. Dimensionality Reduction {#dim-reduction}

### When to reduce dimensions
- ROI timeseries from fine parcellations (>100 ROIs): full-covariance HMMs may need reduction
- ICA with many components (>50): consider selecting or reducing
- CIFTI / voxel-level data: always reduce before SSM fitting
- Rule of thumb: n_features should be at most T / (10 × K) for stable full-covariance estimation, where T is the number of TRs and K the number of states

### PCA

```python
from sklearn.decomposition import PCA


def reduce_dimensions_pca(bold_data, n_components=None, variance_explained=0.95):
    """PCA dimensionality reduction for SSM input.

    Parameters
    ----------
    bold_data : array, shape (T, n_features)
    n_components : int or None
        Fixed number of components. If None, use variance_explained.
    variance_explained : float
        Target cumulative variance explained, between 0 and 1
        (used if n_components is None)

    Returns
    -------
    reduced : array, shape (T, n_kept_components)
    pca : fitted sklearn PCA object
    """
    if n_components is not None:
        pca = PCA(n_components=n_components)
    else:
        # A float in (0, 1) keeps enough components to reach that fraction of
        # variance; sklearn documents this for the full SVD solver
        pca = PCA(n_components=variance_explained, svd_solver='full')

    reduced = pca.fit_transform(bold_data)
    print(f"Reduced {bold_data.shape[1]} features to {reduced.shape[1]} components")
    print(f"Cumulative variance explained: {pca.explained_variance_ratio_.sum():.3f}")

    return reduced, pca
```
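The rule of thumb in the list above can be made concrete. A full covariance matrix over p features has p(p + 1)/2 free parameters per state, and the rule caps p at T / (10 × K). The helper below is a hypothetical planning aid that reports both numbers.

```python
def full_cov_budget(n_trs, n_states, n_features):
    """Check the n_features <= T / (10 * K) rule of thumb for full-covariance HMMs.

    Returns (max_features, n_cov_params_per_state, ok).
    """
    max_features = n_trs // (10 * n_states)
    # Symmetric p x p covariance: p variances + p(p - 1)/2 covariances
    n_cov_params = n_features * (n_features + 1) // 2
    return max_features, n_cov_params, n_features <= max_features


# Example: 1200 TRs and K = 6 allow at most 20 features, so 200 parcels
# (20100 covariance parameters per state) would need reducing first.
print(full_cov_budget(1200, 6, 200))
```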
### Recommended preprocessing order before dimensionality reduction

Apply steps in this order to avoid introducing artifacts:
1. **Confound regression** — regress out motion, WM/CSF signals, aCompCor components
2. **Z-score per region** — zero mean and unit variance across time (per ROI)
3. **PCA or ICA** — after z-scoring, so components reflect the correlation structure across regions rather than each region's raw scale

Reversing steps 2 and 3 (PCA before z-scoring) can bias components toward regions with high raw variance, which are not necessarily the most informative regions.
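
The three steps can be sketched with plain NumPy on an already-extracted ROI matrix. This is an illustration under simple assumptions (ordinary least-squares confound regression with an intercept, a single run, no temporal filtering), and the function name is hypothetical; in practice nilearn's maskers or `nilearn.signal.clean` perform steps 1 and 2 during extraction. PCA is computed directly from the SVD of the z-scored matrix.

```python
import numpy as np


def regress_zscore_pca(roi_ts, confounds, n_components=20):
    """Confound regression -> per-ROI z-scoring -> PCA, in that order.

    roi_ts : array, shape (T, n_rois)
    confounds : array, shape (T, n_confounds)
    Returns (reduced, components): component scores (T, n_components) and
    spatial loadings (n_components, n_rois).
    """
    T = roi_ts.shape[0]
    # 1. OLS confound regression (intercept column included)
    design = np.column_stack([np.ones(T), confounds])
    beta, *_ = np.linalg.lstsq(design, roi_ts, rcond=None)
    residuals = roi_ts - design @ beta

    # 2. Z-score each ROI across time
    z = (residuals - residuals.mean(axis=0)) / residuals.std(axis=0)

    # 3. PCA via SVD of the column-centered, z-scored data
    U, S, Vt = np.linalg.svd(z, full_matrices=False)
    reduced = U[:, :n_components] * S[:n_components]
    components = Vt[:n_components]
    return reduced, components
```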
### Which dimensionality reduction for which model?

| Model | Recommended approach | Typical n_components |
|-------|---------------------|---------------------|
| Gaussian HMM (full cov) | PCA or parcellation | 15-50 |
| Gaussian HMM (diag cov) | Parcellation alone is fine | 50-400 |
| HMM-MAR | ICA or PCA (mandatory) | 15-25 |
| SLDS/rSLDS | PCA or parcellation | 20-50 (observation), 5-15 (latent) |

---
## 5. Parcellation {#parcellation}

### Common parcellation atlases

| Atlas | Resolutions | Space | Notes |
|-------|------------|-------|-------|
| Schaefer | 100–1000 parcels (steps of 100) | MNI, fsLR | Most popular for HMMs. Comes with 7- and 17-network labels. |
| Gordon | 333 parcels | MNI, fsLR | Good community detection-based parcellation |
| Glasser (HCP-MMP) | 360 parcels | fsLR (surface) | Multimodal parcellation, surface-based |
| AAL | 90/116 regions | MNI | Older, anatomical. Still used but Schaefer preferred. |
| Harvard-Oxford | 48/96 regions | MNI | Probabilistic, anatomical |
| Tian (subcortical) | 16/32/50/54 parcels (scales S1–S4) | MNI | Pair with Schaefer for subcortical coverage |

### Parcellating with nilearn

```python
from nilearn import datasets, maskers
import numpy as np


def parcellate_bold(bold_file, atlas='schaefer', n_rois=200, tr=2.0,
                    confounds=None, standardize='zscore_sample'):
    """Extract parcellated timeseries from BOLD data.

    Parameters
    ----------
    bold_file : str
        Path to preprocessed BOLD NIfTI file
    atlas : str
        'schaefer', 'aal', 'harvard_oxford', or a path to a NIfTI label image
        (e.g. a Gordon atlas file downloaded separately; nilearn has no
        Gordon fetcher)
    n_rois : int
        Number of ROIs (for Schaefer)
    tr : float
        Repetition time in seconds. Set it to your acquisition's TR: nilearn
        needs it to design the high-pass filter.
    confounds : array or None
        Confound matrix to regress out during extraction
    standardize : str
        'zscore_sample' recommended for SSMs (zero mean, unit variance per region)
    """
    if atlas == 'schaefer':
        atlas_data = datasets.fetch_atlas_schaefer_2018(
            n_rois=n_rois, resolution_mm=2
        )
        labels_img = atlas_data.maps
    elif atlas == 'aal':
        labels_img = datasets.fetch_atlas_aal().maps
    elif atlas == 'harvard_oxford':
        labels_img = datasets.fetch_atlas_harvard_oxford('cort-maxprob-thr25-2mm').maps
    else:
        # Any other value is treated as a path to a label image
        labels_img = atlas

    masker = maskers.NiftiLabelsMasker(
        labels_img=labels_img,
        standardize=standardize,
        detrend=True,
        high_pass=0.01,  # Remove very slow drift
        t_r=tr,          # needed to apply high_pass
        memory='nilearn_cache',
    )

    timeseries = masker.fit_transform(bold_file, confounds=confounds)
    print(f"Extracted timeseries: {timeseries.shape} (TRs × ROIs)")

    return timeseries, masker


def add_subcortical(cortical_ts, bold_file, tian_labels_img, tr=2.0, confounds=None):
    """Add subcortical ROIs (Tian atlas) to a cortical parcellation.

    tian_labels_img : path to a Tian subcortical label image in the same
        space as bold_file (downloaded separately; not shipped with nilearn)
    """
    masker_sub = maskers.NiftiLabelsMasker(
        labels_img=tian_labels_img,
        standardize='zscore_sample',
        detrend=True,
        high_pass=0.01,  # same filtering as the cortical extraction
        t_r=tr,
    )
    subcort_ts = masker_sub.fit_transform(bold_file, confounds=confounds)
    combined = np.hstack([cortical_ts, subcort_ts])
    print(f"Combined: {combined.shape} ({cortical_ts.shape[1]} cortical + {subcort_ts.shape[1]} subcortical)")
    return combined
```
---

## 6. CIFTI Surface-Based Processing {#cifti}

CIFTI files (.dtseries.nii) contain surface vertices (L/R cortex) + subcortical voxels in a
single file. This is the preferred format for HCP-style analyses and preserves cortical
topology better than volumetric approaches.

### Loading CIFTI data

```python
import nibabel as nib
import numpy as np


def load_cifti_timeseries(cifti_file):
    """Load a CIFTI dtseries file and return timeseries + metadata."""
    img = nib.load(cifti_file)
    data = img.get_fdata()  # shape: (T, n_greyordinates)

    # Get brain model information
    axes = [img.header.get_axis(i) for i in range(img.ndim)]
    brain_axis = axes[1]  # BrainModelAxis

    # Identify structures
    structures = {}
    for name, indices, model in brain_axis.iter_structures():
        structures[str(name)] = {
            'indices': indices,  # slice along the greyordinate axis
            # Use len(model): the last slice from iter_structures() has
            # stop=None, so range(indices.start, indices.stop) would fail
            'n_vertices': len(model),  # vertices (surface) or voxels (volume)
        }

    print(f"CIFTI shape: {data.shape}")
    for name, info in structures.items():
        print(f"  {name}: {info['n_vertices']} greyordinates")

    return data, img, structures


def parcellate_cifti(cifti_file, dlabel_file):
    """Parcellate CIFTI timeseries using a dlabel parcellation.

    Parameters
    ----------
    cifti_file : str
        Path to .dtseries.nii
    dlabel_file : str
        Path to .dlabel.nii parcellation (e.g., Schaefer on fsLR)

    Returns
    -------
    parcellated : array, shape (T, n_parcels)
    """
    bold_img = nib.load(cifti_file)
    bold_data = bold_img.get_fdata()  # (T, n_greyordinates)

    label_img = nib.load(dlabel_file)
    labels = label_img.get_fdata().squeeze()  # (n_greyordinates,)

    # This simple approach assumes both files share the same greyordinate
    # layout. A dlabel file may cover cortex only or include medial-wall
    # vertices; use wb_command -cifti-parcellate in that case.
    if labels.shape[0] != bold_data.shape[1]:
        raise ValueError(
            f"dlabel has {labels.shape[0]} greyordinates but dtseries has "
            f"{bold_data.shape[1]}; use wb_command -cifti-parcellate instead"
        )

    unique_labels = np.unique(labels)
    unique_labels = unique_labels[unique_labels > 0]  # remove background

    parcellated = np.zeros((bold_data.shape[0], len(unique_labels)))
    for i, label in enumerate(unique_labels):
        mask = labels == label
        parcellated[:, i] = bold_data[:, mask].mean(axis=1)

    # Z-score each parcel
    parcellated = (parcellated - parcellated.mean(axis=0)) / parcellated.std(axis=0)

    print(f"Parcellated CIFTI: {parcellated.shape}")
    return parcellated
```
### Workbench command-line tools for CIFTI

```bash
# Smooth on the surface first (recommended: 4-6mm FWHM).
# Kernel sizes are read as Gaussian sigma unless -fwhm is given.
wb_command -cifti-smoothing \
    sub-01_bold.dtseries.nii \
    4 4 COLUMN \
    sub-01_bold_smoothed.dtseries.nii \
    -fwhm \
    -left-surface sub-01.L.midthickness.32k_fs_LR.surf.gii \
    -right-surface sub-01.R.midthickness.32k_fs_LR.surf.gii

# Then parcellate the smoothed data (fast, handles structures correctly)
wb_command -cifti-parcellate \
    sub-01_bold_smoothed.dtseries.nii \
    Schaefer2018_200Parcels_17Networks.dlabel.nii \
    COLUMN \
    sub-01_bold_parcellated.ptseries.nii
```

---
## 7. ICA-Based Approaches {#ica}

### When to use ICA instead of parcellation
- When you want data-driven spatial features (not constrained by atlas boundaries)
- When you expect the relevant signals to be spatially distributed/overlapping
- When using HMM-MAR (ICA components are the standard input)
- When the number of meaningful dimensions is unknown

### Group ICA with FSL MELODIC

```bash
# Run temporal-concatenation group ICA (typically 15-50 components for SSM input).
# -a concat selects temporal concatenation, the usual input for dual regression.
# Set --tr to your own TR (0.8 s here is only an example).
melodic -i bold_files_list.txt \
    -o group_ica_output \
    -a concat \
    --dim=25 \
    --tr=0.8 \
    --Oall \
    --report
```

### Extracting subject-level ICA timeseries (dual regression)

```bash
# Dual regression: project group ICA maps onto each subject's data.
# Positional arguments, in order:
#   group IC maps
#   des_norm: 1 = variance-normalize the stage-1 timecourses
#   design: -1 = group mean only (replaces design.mat and design.con)
#   n_perm: 0 = do not run randomise
#   output directory, then each subject's input file as its own argument
# (Comments cannot follow a line-continuation backslash, so they live here.)
dual_regression \
    group_ica_output/melodic_IC.nii.gz \
    1 \
    -1 \
    0 \
    output_dir \
    $(cat bold_files_list.txt)
```
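For SSM input you need the stage-1 output of dual regression: one text file per subject holding that subject's component timecourses (rows = TRs, columns = group components), in the order the inputs were given. A minimal loader follows. The function name is hypothetical, and it assumes FSL's usual `dr_stage1_subject*.txt` naming, so check your output directory; each component is z-scored per subject, in line with the z-scoring step recommended in Section 4.

```python
import glob
import os

import numpy as np


def load_dr_stage1(output_dir):
    """Load dual-regression stage-1 timecourses as a list of z-scored arrays.

    Assumes whitespace-delimited files named dr_stage1_subject*.txt in
    output_dir, one row per TR and one column per group component.
    """
    files = sorted(glob.glob(os.path.join(output_dir, 'dr_stage1_subject*.txt')))
    runs = []
    for f in files:
        ts = np.loadtxt(f, ndmin=2)  # (T, n_components)
        ts = (ts - ts.mean(axis=0)) / ts.std(axis=0)
        runs.append(ts)
    return runs
```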
### Using nilearn for ICA

```python
from nilearn.decomposition import CanICA


def run_group_ica(bold_files, n_components=25, random_state=42):
    """Run group ICA on multiple subjects using nilearn."""
    canica = CanICA(
        n_components=n_components,
        memory='nilearn_cache',
        memory_level=2,
        threshold=3.,
        n_init=10,
        random_state=random_state,
    )
    canica.fit(bold_files)

    # Extract timeseries for each subject
    all_timeseries = []
    for bold_file in bold_files:
        ts = canica.transform([bold_file])[0]  # (T, n_components)
        all_timeseries.append(ts)

    return all_timeseries, canica
```

---
## 8. Temporal Filtering {#filtering}

### High-pass filtering
fMRIPrep does not high-pass filter the BOLD series itself. Instead, it computes discrete cosine
basis regressors (the `cosine*` columns in the confounds file, default 128s cutoff) that you can
include in confound regression. XCP-D may apply additional filtering. For SSMs, slow drift
removal is important: otherwise, slow drift can be mistaken for a "state."

**Recommended:** High-pass filter at 0.01 Hz (100s period) or use cosine regressors.
Do NOT use aggressive high-pass (>0.03 Hz) as this can remove real slow state dynamics.
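
If you filter outside nilearn (for example, on timeseries extracted without `high_pass`), a 0.01 Hz high-pass with no low-pass can be sketched with SciPy's Butterworth design and zero-phase `sosfiltfilt`. The function name is hypothetical, and the linear detrend beforehand and the filter order of 5 are illustrative choices, not values from this document. Expect some edge transients near the start and end of short runs.

```python
import numpy as np
from scipy.signal import butter, detrend, sosfiltfilt


def highpass_only(timeseries, tr, cutoff_hz=0.01, order=5):
    """Zero-phase Butterworth high-pass applied to each column (no low-pass).

    timeseries : array, shape (T, n_features)
    tr : repetition time in seconds (sampling rate = 1 / tr)
    """
    # Remove the linear trend first so the filter starts from a flat baseline
    x = detrend(np.asarray(timeseries, dtype=float), axis=0)
    sos = butter(order, cutoff_hz, btype='highpass', fs=1.0 / tr, output='sos')
    # Forward-backward filtering: no phase shift, so state timing is preserved
    return sosfiltfilt(sos, x, axis=0)
```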
### Low-pass filtering
Generally NOT recommended for SSMs. Low-pass filtering removes the high-frequency information
that distinguishes states. Exception: if you have a very fast TR (<0.5s) and want to remove
cardiac/respiratory aliasing, a gentle low-pass at 0.2 Hz may help.

### Band-pass filtering
Some resting-state analyses use a 0.01-0.1 Hz band-pass. This is standard for FC analyses
but overly aggressive for SSMs — it removes fast transitions that SSMs are designed to detect.
**Recommendation:** Use 0.01 Hz high-pass only, no low-pass, unless you have specific reasons.

---
## 9. Data Quality Checks Before SSM Fitting {#qc}

Run these checks BEFORE fitting any SSM:

```python
import numpy as np
import pandas as pd


def pre_ssm_quality_checks(timeseries, confounds_file, tr, raw_timeseries=None):
    """Quality checks for SSM input data.

    Parameters
    ----------
    timeseries : array, shape (T, n_features)
        SSM input (typically z-scored)
    confounds_file : str
        fMRIPrep *_desc-confounds_timeseries.tsv for the same run
    tr : float
        Repetition time in seconds
    raw_timeseries : array, shape (T, n_features), optional
        The same regions before detrending or standardization. tSNR
        (mean / std) is only meaningful on such data; z-scored data has mean ~0.
    """
    T, p = timeseries.shape
    df = pd.read_csv(confounds_file, sep='\t')

    # 1. Check for NaN/Inf
    n_nan = np.isnan(timeseries).sum()
    n_inf = np.isinf(timeseries).sum()
    print(f"NaN values: {n_nan}, Inf values: {n_inf}")
    if n_nan or n_inf:
        raise ValueError("Data contains NaN or Inf — fix preprocessing")

    # 2. Check temporal SNR per region (needs non-demeaned data)
    tsnr = None
    if raw_timeseries is not None:
        tsnr = raw_timeseries.mean(axis=0) / raw_timeseries.std(axis=0)
        low_tsnr = (tsnr < 20).sum()
        print(f"Regions with tSNR < 20: {low_tsnr}/{p}")
        if low_tsnr > p * 0.1:
            print("WARNING: >10% of regions have low tSNR")

    # 3. Check motion (first TR has no FD; treat it as 0)
    fd = df['framewise_displacement'].fillna(0).to_numpy()
    mean_fd = fd.mean()
    pct_high = 100 * (fd > 0.5).sum() / len(fd)
    print(f"Mean FD: {mean_fd:.3f} mm, TRs with FD>0.5mm: {pct_high:.1f}%")
    if mean_fd > 0.3:
        print("WARNING: High mean motion — consider excluding")

    # 4. Check variance across regions (detect dead regions)
    region_var = timeseries.var(axis=0)
    dead_regions = (region_var < 1e-6).sum()
    print(f"Dead regions (near-zero variance): {dead_regions}")

    # 5. Check for extreme outliers
    z_scores = np.abs((timeseries - timeseries.mean(0)) / timeseries.std(0))
    extreme_trs = (z_scores > 5).any(axis=1).sum()
    print(f"TRs with extreme outliers (|z|>5): {extreme_trs}/{T}")

    # 6. Scan length adequacy (rough heuristics)
    print(f"Total TRs: {T} ({T*tr/60:.1f} minutes)")
    print(f"Rough max K at ~50 TRs per state: ~{T // 50}")
    print(f"Max K under the n_features <= T/(10K) rule (full cov): {T // (10 * p)}")

    return {
        'tsnr': tsnr, 'mean_fd': mean_fd, 'pct_high_motion': pct_high,
        'dead_regions': dead_regions, 'extreme_trs': extreme_trs,
    }
```

---
## 10. Preparing the Data Matrix {#data-matrix}

### Final assembly for SSM fitting

```python
import numpy as np

# Uses load_confounds_for_ssm, parcellate_bold, and pre_ssm_quality_checks
# from the sections above.


def prepare_ssm_input(bold_files, confounds_files, tr, atlas='schaefer', n_rois=200,
                      standardize=True, concat_runs=True):
    """Full pipeline from fMRIPrep outputs to SSM-ready data matrix.

    Parameters
    ----------
    tr : float
        Repetition time in seconds (passed to parcellate_bold for filtering)

    Returns
    -------
    data : array or list of arrays
        If concat_runs: single array (total_T, n_features)
        If not: list of arrays, one per run
    run_boundaries : list of int
        TR indices where runs start (for resetting HMM forward algorithm)
    """
    all_runs = []
    run_boundaries = [0]

    for bold_file, confounds_file in zip(bold_files, confounds_files):
        # 1. Load confounds
        confounds, _ = load_confounds_for_ssm(confounds_file, strategy='moderate')

        # 2. Parcellate (with confound regression built in)
        ts, masker = parcellate_bold(
            bold_file, atlas=atlas, n_rois=n_rois, tr=tr,
            confounds=confounds, standardize='zscore_sample'
        )

        # 3. Quality check (raises on NaN/Inf, prints warnings otherwise)
        qc = pre_ssm_quality_checks(ts, confounds_file, tr)

        all_runs.append(ts)
        run_boundaries.append(run_boundaries[-1] + ts.shape[0])

    if concat_runs:
        data = np.vstack(all_runs)
        print(f"Concatenated: {data.shape}")
        if standardize:
            data = (data - data.mean(axis=0)) / data.std(axis=0)
        return data, run_boundaries[:-1]  # drop the end-of-last-run index
    else:
        return all_runs, run_boundaries[:-1]
```
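`prepare_ssm_input` returns run *start* indices, whereas hmmlearn's `fit` and `predict` take per-run *lengths*. The conversion is a one-liner; the helper name below is hypothetical.

```python
import numpy as np


def boundaries_to_lengths(run_boundaries, total_T):
    """Convert run start indices (e.g. [0, 300, 600]) into run lengths."""
    return np.diff(np.append(run_boundaries, total_T)).tolist()


# e.g. data, run_boundaries = prepare_ssm_input(...)
#      lengths = boundaries_to_lengths(run_boundaries, data.shape[0])
```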
### Handling run boundaries in HMMs

When concatenating runs, you MUST tell the HMM where run boundaries are. Otherwise it will
try to model transitions from the end of run N to the start of run N+1, which are not
real transitions.

```python
import numpy as np
from hmmlearn import hmm

# all_runs: list of (T_i, n_features) arrays, e.g. from
# prepare_ssm_input(..., concat_runs=False)
K = 6  # number of hidden states (example value)

# hmmlearn supports this via the 'lengths' parameter
lengths = [run.shape[0] for run in all_runs]
data_concat = np.vstack(all_runs)

model = hmm.GaussianHMM(n_components=K, n_iter=200)
model.fit(data_concat, lengths=lengths)

# For prediction, also pass lengths
states = model.predict(data_concat, lengths=lengths)
```

For the `ssm` library, fit on a list of timeseries instead:
```python
import ssm

D = all_runs[0].shape[1]  # number of observed features
model = ssm.HMM(K, D, observations='gaussian')
model.fit(all_runs, method='em')  # pass list of arrays

# Decode each run separately
states = [model.most_likely_states(run) for run in all_runs]
```