autobidsify-0.5.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,65 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
env/
ENV/
.venv/
auto_bidsify/

# IDEs
.vscode/
.idea/
*.swp
*.swo
*~
.DS_Store

# Project specific - Data directories
datasets/
outputs/

# Project specific - Staging and temp
_staging/
*.log
*.tmp

# API keys and secrets
.env
.env.local
*.key
*.pem

# Jupyter
.ipynb_checkpoints/
*.ipynb

# Testing
.pytest_cache/
.coverage
htmlcov/

# Backups
backups/
*.backup
*.bak
@@ -0,0 +1,231 @@
Metadata-Version: 2.4
Name: autobidsify
Version: 0.5.0
Summary: Automated BIDS standardization tool powered by LLM-first architecture
Project-URL: Homepage, https://github.com/fangzhouliucode/autobidsify
Project-URL: Documentation, https://autobidsify.readthedocs.io
Project-URL: Repository, https://github.com/fangzhouliucode/autobidsify
Project-URL: Issues, https://github.com/fangzhouliucode/autobidsify/issues
Project-URL: Changelog, https://github.com/fangzhouliucode/autobidsify/blob/main/CHANGELOG.md
Author-email: Yiyi Liu <yiyi.liu3@northeastern.edu>
Maintainer-email: Yiyi Liu <yiyi.liu3@northeastern.edu>
License: MIT
Keywords: bids,brain-imaging,data-standardization,dicom,fnirs,medical-imaging,mri,neuroimaging,nifti
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Image Processing
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Requires-Python: >=3.10
Requires-Dist: h5py>=3.8.0
Requires-Dist: nibabel>=5.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: openai>=1.0.0
Requires-Dist: openpyxl>=3.1.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: pdfplumber>=0.10.0
Requires-Dist: pydicom>=2.4.0
Requires-Dist: pypdf2>=3.0.0
Requires-Dist: python-docx>=1.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: scipy>=1.10.0
Provides-Extra: all
Requires-Dist: black>=23.0; extra == 'all'
Requires-Dist: mypy>=1.0; extra == 'all'
Requires-Dist: myst-parser>=2.0; extra == 'all'
Requires-Dist: pre-commit>=3.0; extra == 'all'
Requires-Dist: pytest-cov>=4.0; extra == 'all'
Requires-Dist: pytest>=7.0; extra == 'all'
Requires-Dist: python-dotenv>=1.0.0; extra == 'all'
Requires-Dist: ruff>=0.1.0; extra == 'all'
Requires-Dist: sphinx-rtd-theme>=1.0; extra == 'all'
Requires-Dist: sphinx>=6.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: black>=23.0; extra == 'dev'
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pre-commit>=3.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: myst-parser>=2.0; extra == 'docs'
Requires-Dist: sphinx-rtd-theme>=1.0; extra == 'docs'
Requires-Dist: sphinx>=6.0; extra == 'docs'
Provides-Extra: dotenv
Requires-Dist: python-dotenv>=1.0.0; extra == 'dotenv'
Description-Content-Type: text/markdown

# auto-bidsify

Automated BIDS standardization tool powered by LLM-first architecture.

## Features

- **General compatibility**: Handles diverse dataset structures (flat, hierarchical, multi-site)
- **Multi-modal support**: MRI, fNIRS, and mixed modality datasets
- **Intelligent metadata extraction**: Automatic participant demographics from DICOM headers, documents, and filenames
- **Format conversion**: DICOM→NIfTI, CSV→SNIRF, and more
- **Evidence-based reasoning**: Confidence scoring and provenance tracking for all decisions

## Supported Formats

**Input formats:**
- MRI: DICOM, NIfTI (.nii, .nii.gz)
- fNIRS: SNIRF, Homer3 (.nirs), CSV/TSV tables
- Documents: PDF, DOCX, TXT, Markdown, ...

**Output:** BIDS-compliant dataset (v1.10.0)

## Quick Start

### Installation

```bash
# Clone repository
git clone https://github.com/yourusername/auto-bidsify.git
cd auto-bidsify

# Setup environment
conda create -n bidsify python=3.10
conda activate bidsify
pip install -r requirements.txt

# Set OpenAI API key
export OPENAI_API_KEY="your-key-here"
```

### Basic Usage

```bash
# Full pipeline (one command)
python cli.py full \
    --input /path/to/your/data \
    --output outputs/my_dataset \
    --model gpt-4o \
    --modality mri

# Step-by-step execution
python cli.py ingest --input data.zip --output outputs/run
python cli.py evidence --output outputs/run --modality mri
python cli.py trio --output outputs/run --model gpt-4o
python cli.py plan --output outputs/run --model gpt-4o
python cli.py execute --output outputs/run
python cli.py validate --output outputs/run
```

### Command Options

```bash
--input PATH        Input data (archive or directory)
--output PATH       Output directory
--model MODEL       LLM model (default: gpt-4o)
--modality TYPE     Data modality: mri|nirs|mixed
--nsubjects N       Number of subjects (optional)
--describe "TEXT"   Dataset description (recommended)
```

## Pipeline Stages

| Stage | Command | Input | Output | Purpose |
|-------|-------------|-----------------|----------------------------|------------------------------------|
| 1 | `ingest` | Raw data | `ingest_info.json` | Extract/reference data |
| 2 | `evidence` | All files | `evidence_bundle.json` | Analyze structure, detect subjects |
| 3 | `classify` | Mixed data | `classification_plan.json` | Separate MRI/fNIRS (optional) |
| 4 | `trio` | Evidence | BIDS trio files | Generate metadata files |
| 5 | `plan` | Evidence + trio | `BIDSPlan.yaml` | Create conversion strategy |
| 6 | `execute` | Plan | `bids_compatible/` | Execute conversions |
| 7 | `validate` | BIDS dataset | Validation report | Check compliance |

## Output Structure

```
outputs/my_dataset/
    bids_compatible/              # Final BIDS dataset
        dataset_description.json
        README.md
        participants.tsv
        sub-001/
            anat/
                sub-001_T1w.nii.gz
            func/
                sub-001_task-rest_bold.nii.gz
    _staging/                     # Intermediate files
        evidence_bundle.json
        BIDSPlan.yaml
        conversion_log.json
```

## Examples

### Example 1: Single-site MRI study
```bash
python cli.py full \
    --input brain_scans/ \
    --output outputs/study1 \
    --nsubjects 50 \
    --model gpt-4o \
    --modality mri
```

### Example 2: Multi-site dataset with description
```bash
python cli.py full \
    --input camcan_data/ \
    --output outputs/camcan \
    --model gpt-4o \
    --modality mri \
    --describe "Cambridge Centre for Ageing and Neuroscience: 650 participants, ages 18-88, multi-site MRI study"
```

### Example 3: fNIRS dataset from CSV
```bash
python cli.py full \
    --input fnirs_study/ \
    --output outputs/fnirs \
    --model gpt-4o \
    --modality nirs \
    --describe "Prefrontal cortex activation during cognitive tasks, 30 subjects"
```

## Architecture

**LLM-First Design:**
- **Python**: Deterministic operations (file I/O, format conversion, validation)
- **LLM**: Semantic understanding (file classification, metadata extraction, pattern recognition)
- **Hybrid**: Best of both worlds - reliability + flexibility

## Requirements

- Python 3.10+
- OpenAI API key
- Optional: `dcm2niix` for DICOM conversion
- Optional: `bids-validator` for validation

## Current Status

**Version:** 1.0 (LLM-First Architecture with Evidence-Based Reasoning)

**Tested datasets:**
- Visible Human Project (flat structure, CT scans)
- CamCAN (hierarchical, multi-site, 1288 subjects)
- [Your dataset here - help us test!]

**Known limitations:**
- Classification stage (Stage 3) and mat/spreadsheet conversion are experimental
- Some edge cases in participant metadata extraction are not yet handled

## Contributing

We need YOUR datasets to improve robustness! Please test and report:
- Success cases
- Failure cases
- Edge cases

@@ -0,0 +1,166 @@
# auto-bidsify

Automated BIDS standardization tool powered by LLM-first architecture.

## Features

- **General compatibility**: Handles diverse dataset structures (flat, hierarchical, multi-site)
- **Multi-modal support**: MRI, fNIRS, and mixed modality datasets
- **Intelligent metadata extraction**: Automatic participant demographics from DICOM headers, documents, and filenames
- **Format conversion**: DICOM→NIfTI, CSV→SNIRF, and more
- **Evidence-based reasoning**: Confidence scoring and provenance tracking for all decisions

## Supported Formats

**Input formats:**
- MRI: DICOM, NIfTI (.nii, .nii.gz)
- fNIRS: SNIRF, Homer3 (.nirs), CSV/TSV tables
- Documents: PDF, DOCX, TXT, Markdown, ...

**Output:** BIDS-compliant dataset (v1.10.0)

## Quick Start

### Installation

```bash
# Clone repository
git clone https://github.com/yourusername/auto-bidsify.git
cd auto-bidsify

# Setup environment
conda create -n bidsify python=3.10
conda activate bidsify
pip install -r requirements.txt

# Set OpenAI API key
export OPENAI_API_KEY="your-key-here"
```

### Basic Usage

```bash
# Full pipeline (one command)
python cli.py full \
    --input /path/to/your/data \
    --output outputs/my_dataset \
    --model gpt-4o \
    --modality mri

# Step-by-step execution
python cli.py ingest --input data.zip --output outputs/run
python cli.py evidence --output outputs/run --modality mri
python cli.py trio --output outputs/run --model gpt-4o
python cli.py plan --output outputs/run --model gpt-4o
python cli.py execute --output outputs/run
python cli.py validate --output outputs/run
```

### Command Options

```bash
--input PATH        Input data (archive or directory)
--output PATH       Output directory
--model MODEL       LLM model (default: gpt-4o)
--modality TYPE     Data modality: mri|nirs|mixed
--nsubjects N       Number of subjects (optional)
--describe "TEXT"   Dataset description (recommended)
```

## Pipeline Stages

| Stage | Command | Input | Output | Purpose |
|-------|-------------|-----------------|----------------------------|------------------------------------|
| 1 | `ingest` | Raw data | `ingest_info.json` | Extract/reference data |
| 2 | `evidence` | All files | `evidence_bundle.json` | Analyze structure, detect subjects |
| 3 | `classify` | Mixed data | `classification_plan.json` | Separate MRI/fNIRS (optional) |
| 4 | `trio` | Evidence | BIDS trio files | Generate metadata files |
| 5 | `plan` | Evidence + trio | `BIDSPlan.yaml` | Create conversion strategy |
| 6 | `execute` | Plan | `bids_compatible/` | Execute conversions |
| 7 | `validate` | BIDS dataset | Validation report | Check compliance |

## Output Structure

```
outputs/my_dataset/
    bids_compatible/              # Final BIDS dataset
        dataset_description.json
        README.md
        participants.tsv
        sub-001/
            anat/
                sub-001_T1w.nii.gz
            func/
                sub-001_task-rest_bold.nii.gz
    _staging/                     # Intermediate files
        evidence_bundle.json
        BIDSPlan.yaml
        conversion_log.json
```

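The `sub-*` filenames above follow the BIDS entity convention, `sub-<label>[_task-<label>]_<suffix>.<extension>`. As a rough illustration of that naming scheme (this is not autobidsify's validator, just a simplified pattern covering the two file types shown):

```python
import re

# Simplified BIDS filename pattern for the anat/func files above:
# sub-<label>[_task-<label>]_<suffix>.nii[.gz]
BIDS_NAME = re.compile(
    r"^sub-[0-9a-zA-Z]+"          # subject entity, e.g. sub-001
    r"(?:_task-[0-9a-zA-Z]+)?"    # optional task entity, e.g. _task-rest
    r"_(?:T1w|bold)"              # suffix: anatomical T1w or functional bold
    r"\.nii(?:\.gz)?$"            # NIfTI extension, optionally gzipped
)

def is_bids_name(name: str) -> bool:
    """Return True if a filename matches the simplified BIDS pattern."""
    return bool(BIDS_NAME.match(name))
```

Both example files match: `is_bids_name("sub-001_T1w.nii.gz")` and `is_bids_name("sub-001_task-rest_bold.nii.gz")` are true, while sidecar files like `participants.tsv` are not.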
101
## Examples

### Example 1: Single-site MRI study
```bash
python cli.py full \
    --input brain_scans/ \
    --output outputs/study1 \
    --nsubjects 50 \
    --model gpt-4o \
    --modality mri
```

### Example 2: Multi-site dataset with description
```bash
python cli.py full \
    --input camcan_data/ \
    --output outputs/camcan \
    --model gpt-4o \
    --modality mri \
    --describe "Cambridge Centre for Ageing and Neuroscience: 650 participants, ages 18-88, multi-site MRI study"
```

### Example 3: fNIRS dataset from CSV
```bash
python cli.py full \
    --input fnirs_study/ \
    --output outputs/fnirs \
    --model gpt-4o \
    --modality nirs \
    --describe "Prefrontal cortex activation during cognitive tasks, 30 subjects"
```

## Architecture

**LLM-First Design:**
- **Python**: Deterministic operations (file I/O, format conversion, validation)
- **LLM**: Semantic understanding (file classification, metadata extraction, pattern recognition)
- **Hybrid**: Best of both worlds - reliability + flexibility

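One way to picture this split is a thin routing layer: deterministic rules settle whatever a file extension already decides, and only ambiguous files are escalated to the model. The sketch below is illustrative, not autobidsify's actual code; `llm_classify` is a hypothetical stand-in for whatever model call the tool makes.

```python
from pathlib import Path

# Extensions deterministic code can classify without any model call.
RULES = {".dcm": "mri", ".nii": "mri", ".snirf": "nirs", ".nirs": "nirs"}

def classify_file(path: str, llm_classify=None) -> str:
    """Rule-based classification first; LLM fallback for ambiguous files."""
    suffix = Path(path).suffix.lower()
    if suffix in RULES:
        return RULES[suffix]           # Python: deterministic path
    if llm_classify is not None:
        return llm_classify(path)      # LLM: semantic path (e.g. CSV tables)
    return "unknown"
```

`classify_file("scan.dcm")` resolves to `"mri"` without touching the model; a bare `.csv` table, which could be fNIRS data or demographics, is the kind of case that gets routed through `llm_classify`.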
140
## Requirements

- Python 3.10+
- OpenAI API key
- Optional: `dcm2niix` for DICOM conversion
- Optional: `bids-validator` for validation

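Because `dcm2niix` and `bids-validator` are optional external tools, a quick PATH check before a long run can save a failed conversion at the execute stage. A small sketch (an assumed helper, not part of the CLI):

```python
import shutil

def missing_optional_tools(tools=("dcm2niix", "bids-validator")):
    """Return the optional external tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list means both converters are available; anything returned will need to be installed before the corresponding stage can use it.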
147
## Current Status

**Version:** 1.0 (LLM-First Architecture with Evidence-Based Reasoning)

**Tested datasets:**
- Visible Human Project (flat structure, CT scans)
- CamCAN (hierarchical, multi-site, 1288 subjects)
- [Your dataset here - help us test!]

**Known limitations:**
- Classification stage (Stage 3) and mat/spreadsheet conversion are experimental
- Some edge cases in participant metadata extraction are not yet handled

## Contributing

We need YOUR datasets to improve robustness! Please test and report:
- Success cases
- Failure cases
- Edge cases

@@ -0,0 +1,20 @@
"""
autobidsify: Automated BIDS standardization tool powered by LLM-first architecture.
"""

__version__ = "1.0.0"
__author__ = "Your Name"

from autobidsify.utils import info, warn, fatal, debug
from autobidsify.constants import BIDS_VERSION, MODALITY_MRI, MODALITY_NIRS

__all__ = [
    "__version__",
    "info",
    "warn",
    "fatal",
    "debug",
    "BIDS_VERSION",
    "MODALITY_MRI",
    "MODALITY_NIRS",
]