honeybee-ml 0.1.0 (tar.gz)
- honeybee_ml-0.1.0/CONTRIBUTING.md +363 -0
- honeybee_ml-0.1.0/LICENSE +21 -0
- honeybee_ml-0.1.0/MANIFEST.in +31 -0
- honeybee_ml-0.1.0/PKG-INFO +278 -0
- honeybee_ml-0.1.0/README.md +198 -0
- honeybee_ml-0.1.0/honeybee/__init__.py +198 -0
- honeybee_ml-0.1.0/honeybee/loaders/Radiology/__init__.py +19 -0
- honeybee_ml-0.1.0/honeybee/loaders/Radiology/dataset.py +239 -0
- honeybee_ml-0.1.0/honeybee/loaders/Radiology/loader.py +543 -0
- honeybee_ml-0.1.0/honeybee/loaders/Radiology/metadata.py +90 -0
- honeybee_ml-0.1.0/honeybee/loaders/Radiology/radiology.py +80 -0
- honeybee_ml-0.1.0/honeybee/loaders/Reader/mindsDBreader.py +45 -0
- honeybee_ml-0.1.0/honeybee/loaders/Reader/reader.py +84 -0
- honeybee_ml-0.1.0/honeybee/loaders/Scans/scan.py +40 -0
- honeybee_ml-0.1.0/honeybee/loaders/Slide/__init__.py +4 -0
- honeybee_ml-0.1.0/honeybee/loaders/Slide/slide.py +291 -0
- honeybee_ml-0.1.0/honeybee/loaders/__init__.py +14 -0
- honeybee_ml-0.1.0/honeybee/models/HuggingFaceEmbedder/embedder.py +101 -0
- honeybee_ml-0.1.0/honeybee/models/LlamaEmbedder/__init__.py +3 -0
- honeybee_ml-0.1.0/honeybee/models/LlamaEmbedder/llama_embedder.py +307 -0
- honeybee_ml-0.1.0/honeybee/models/REMEDIS/remedis.py +58 -0
- honeybee_ml-0.1.0/honeybee/models/RadImageNet/__init__.py +9 -0
- honeybee_ml-0.1.0/honeybee/models/RadImageNet/radimagenet.py +457 -0
- honeybee_ml-0.1.0/honeybee/models/TissueDetector/tissue_detector.py +91 -0
- honeybee_ml-0.1.0/honeybee/models/UNI/uni.py +34 -0
- honeybee_ml-0.1.0/honeybee/models/UNI2/__init__.py +3 -0
- honeybee_ml-0.1.0/honeybee/models/UNI2/uni2.py +178 -0
- honeybee_ml-0.1.0/honeybee/models/Virchow2/__init__.py +3 -0
- honeybee_ml-0.1.0/honeybee/models/Virchow2/virchow2.py +164 -0
- honeybee_ml-0.1.0/honeybee/models/Virchow2/virchow2_simple.py +144 -0
- honeybee_ml-0.1.0/honeybee/models/__init__.py +6 -0
- honeybee_ml-0.1.0/honeybee/processors/__init__.py +25 -0
- honeybee_ml-0.1.0/honeybee/processors/clinical_processor.py +1351 -0
- honeybee_ml-0.1.0/honeybee/processors/pathology_processor.py +552 -0
- honeybee_ml-0.1.0/honeybee/processors/radiology/__init__.py +35 -0
- honeybee_ml-0.1.0/honeybee/processors/radiology/preprocessing.py +573 -0
- honeybee_ml-0.1.0/honeybee/processors/radiology/processor.py +790 -0
- honeybee_ml-0.1.0/honeybee/processors/radiology/segmentation.py +714 -0
- honeybee_ml-0.1.0/honeybee/processors/radiology_processor.py +20 -0
- honeybee_ml-0.1.0/honeybee/processors/wsi/__init__.py +58 -0
- honeybee_ml-0.1.0/honeybee/processors/wsi/stain_normalization.py +700 -0
- honeybee_ml-0.1.0/honeybee/processors/wsi/stain_normalization_fixed.py +150 -0
- honeybee_ml-0.1.0/honeybee/processors/wsi/stain_normalization_working.py +130 -0
- honeybee_ml-0.1.0/honeybee/processors/wsi/stain_separation.py +299 -0
- honeybee_ml-0.1.0/honeybee/processors/wsi/tissue_detection.py +357 -0
- honeybee_ml-0.1.0/honeybee/utils/__init__.py +5 -0
- honeybee_ml-0.1.0/honeybee/utils/pydantic_compat.py +31 -0
- honeybee_ml-0.1.0/honeybee_ml.egg-info/PKG-INFO +278 -0
- honeybee_ml-0.1.0/honeybee_ml.egg-info/SOURCES.txt +53 -0
- honeybee_ml-0.1.0/honeybee_ml.egg-info/dependency_links.txt +1 -0
- honeybee_ml-0.1.0/honeybee_ml.egg-info/requires.txt +57 -0
- honeybee_ml-0.1.0/honeybee_ml.egg-info/top_level.txt +1 -0
- honeybee_ml-0.1.0/pyproject.toml +180 -0
- honeybee_ml-0.1.0/requirements.txt +34 -0
- honeybee_ml-0.1.0/setup.cfg +4 -0

@@ -0,0 +1,363 @@
# Contributing to HoneyBee

Thank you for your interest in contributing to HoneyBee! We welcome contributions from the community and are grateful for any help you can provide. This guide will help you get started contributing to our multimodal AI framework for oncology.

## Table of Contents

- [Code of Conduct](#code-of-conduct)
- [Getting Started](#getting-started)
- [Development Setup](#development-setup)
- [How to Contribute](#how-to-contribute)
- [Contribution Guidelines](#contribution-guidelines)
- [Testing](#testing)
- [Documentation](#documentation)
- [Pull Request Process](#pull-request-process)
- [Community](#community)

## Code of Conduct

By participating in this project, you agree to abide by our Code of Conduct:

- **Be Respectful**: Treat everyone with respect. No harassment, discrimination, or inappropriate behavior will be tolerated.
- **Be Collaborative**: Work together to resolve conflicts and assume good intentions.
- **Be Professional**: Keep discussions focused on improving the project.
- **Be Inclusive**: Welcome newcomers and help them get started.

## Getting Started

1. **Fork the Repository**: Click the "Fork" button on the [HoneyBee GitHub page](https://github.com/lab-rasool/HoneyBee)

2. **Star the Repository**: If you find HoneyBee useful, please star it to show your support!

3. **Check Issues**: Look through our [open issues](https://github.com/lab-rasool/HoneyBee/issues) for something to work on:
   - Issues labeled `good first issue` are great for newcomers
   - Issues labeled `help wanted` need attention
   - Issues labeled `enhancement` are for new features

## Development Setup

### Prerequisites

- Python 3.8 or higher
- PyTorch 2.0 or higher
- Git
- CUDA 11.7+ (optional, for GPU support)

### System Dependencies

```bash
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y openslide-tools tesseract-ocr

# macOS
brew install openslide tesseract
```

### Setting Up Your Development Environment

1. **Clone Your Fork**
   ```bash
   git clone https://github.com/YOUR_USERNAME/HoneyBee.git
   cd HoneyBee
   ```

2. **Add Upstream Remote**
   ```bash
   git remote add upstream https://github.com/lab-rasool/HoneyBee.git
   ```

3. **Create a Virtual Environment**
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

4. **Install Dependencies**
   ```bash
   pip install -r requirements.txt
   pip install -r requirements-dev.txt  # If available
   python -c "import nltk; nltk.download('punkt')"
   ```

5. **Install HoneyBee in Development Mode**
   ```bash
   pip install -e .
   # Or, to also install the tools from the `dev` extra (pytest, pytest-cov, black, ruff, mypy):
   pip install -e ".[dev]"
   ```

6. **Set Up Environment Variables**
   Create a `.env` file in the project root (see README.md for details).

## How to Contribute

### Reporting Bugs

Before creating a bug report, please check whether the issue already exists. If it does not, create a new issue with:

- **Clear title**: Summarize the bug concisely
- **Description**: Detailed description of the bug
- **Steps to reproduce**: List the exact steps to reproduce the behavior
- **Expected behavior**: What you expected to happen
- **Actual behavior**: What actually happened
- **Environment details**: Python version, OS, GPU info, etc.
- **Error messages**: Include the full error traceback
- **Code samples**: Minimal code to reproduce the issue

### Suggesting Enhancements

Enhancement suggestions are tracked as GitHub issues. When creating an enhancement suggestion, include:

- **Clear title**: Summarize the enhancement
- **Motivation**: Why is this enhancement needed?
- **Detailed description**: How should it work?
- **Alternative solutions**: Any alternative solutions you've considered
- **Additional context**: Mockups, diagrams, or examples

### Contributing Code

1. **Find or Create an Issue**: Before starting work, make sure there is an issue for what you are working on

2. **Create a Feature Branch**
   ```bash
   git checkout -b feature/your-feature-name
   # or
   git checkout -b fix/your-bug-fix
   ```

3. **Make Your Changes**
   - Follow our coding standards (see below)
   - Add or update tests as needed
   - Update documentation if necessary

4. **Commit Your Changes**
   ```bash
   git add .
   git commit -m "feat: add new feature X"
   # or
   git commit -m "fix: resolve issue with Y"
   ```

   We follow the Conventional Commits format:
   - `feat:` New feature
   - `fix:` Bug fix
   - `docs:` Documentation changes
   - `style:` Code style changes (formatting, etc.)
   - `refactor:` Code refactoring
   - `test:` Test additions or modifications
   - `chore:` Maintenance tasks

5. **Push to Your Fork**
   ```bash
   git push origin feature/your-feature-name
   ```

6. **Create a Pull Request**: Go to GitHub and open a PR from your fork to the main repository
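
The commit-message convention from step 4 can be checked mechanically. This is a minimal sketch, assuming only the seven types listed there; it also accepts an optional `(scope)` and a `!` breaking-change marker, which the Conventional Commits specification allows but this guide does not mention. `is_conventional` is a hypothetical helper, not a HoneyBee tool.

```python
import re

# The commit types listed in this guide.
ALLOWED_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# "<type>(<optional scope>)!: <description>" with a non-blank description.
_PATTERN = re.compile(
    r"^(?:" + "|".join(ALLOWED_TYPES) + r")(?:\([\w./-]+\))?!?: \S.*"
)


def is_conventional(message: str) -> bool:
    """Return True if the first line of a commit message follows the convention."""
    lines = message.strip().splitlines()
    return bool(lines) and _PATTERN.match(lines[0]) is not None
```

Saved as a script, a check like this could back a `commit-msg` hook, which git runs with the path of the file holding the proposed message as its only argument.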

## Contribution Guidelines

### Code Style

- **Python Style**: Follow [PEP 8](https://www.python.org/dev/peps/pep-0008/)
- **Docstrings**: Use Google-style docstrings for all public functions and classes
- **Type Hints**: Use type hints where appropriate
- **Line Length**: Maximum 100 characters
- **Imports**: Sort imports with `isort`

Example:

```python
from typing import Optional

import numpy as np
import torch
from torch import nn


# Subclasses torch.nn.Module directly: the 0.1.0 package ships no
# honeybee.models.base module to inherit from.
class NewModel(nn.Module):
    """Brief description of the model.

    Longer description explaining the model's purpose,
    architecture, and use cases.

    Args:
        input_dim: Dimension of input features
        hidden_dim: Dimension of hidden layers
        output_dim: Dimension of output

    Example:
        >>> model = NewModel(input_dim=512, hidden_dim=256, output_dim=10)
        >>> embeddings = model.generate_embeddings(torch.randn(4, 512))
        >>> embeddings.shape
        (4, 10)
    """

    def __init__(
        self,
        input_dim: int,
        hidden_dim: int = 256,
        output_dim: int = 128,
    ) -> None:
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, output_dim),
        )

    @torch.no_grad()
    def generate_embeddings(
        self,
        data: torch.Tensor,
        batch_size: Optional[int] = None,
    ) -> np.ndarray:
        """Generate embeddings for input data.

        Args:
            data: Input tensor of shape (num_samples, input_dim)
            batch_size: Optional batch size for processing; if None, all
                samples are encoded in a single pass

        Returns:
            Embeddings as a numpy array of shape (num_samples, output_dim)
        """
        self.eval()
        device = next(self.parameters()).device  # works on CPU or GPU
        step = batch_size or len(data)
        outputs = [
            self.encoder(data[start : start + step].to(device)).cpu()
            for start in range(0, len(data), step)
        ]
        return torch.cat(outputs).numpy()
```

### Architecture Guidelines

When adding new components, follow the 3-layer architecture:

1. **Data Loaders** (`honeybee/loaders/`)
   - Inherit from the appropriate base class
   - Implement the standard loading interface
   - Handle error cases gracefully
   - Document supported formats

2. **Embedding Models** (`honeybee/models/`)
   - Implement a `generate_embeddings()` method
   - Support both CPU and GPU
   - Include model weights handling
   - Document model requirements

3. **Processors** (`honeybee/processors/`)
   - Combine loaders and models
   - Implement preprocessing pipelines
   - Handle multimodal integration
   - Provide clear configuration options
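
The division of labor between the three layers can be sketched with `typing.Protocol` interfaces. Every name below (`Loader`, `EmbeddingModel`, `Processor`, `InMemoryTextLoader`, `CharStatsModel`) is hypothetical and only illustrates the layering; it is not HoneyBee's actual class hierarchy, and a real model would return learned embeddings rather than character statistics.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Sequence

Vector = List[float]


class Loader(Protocol):
    """Layer 1 (loaders): turn a source identifier into raw data."""

    def load(self, source: str) -> str: ...


class EmbeddingModel(Protocol):
    """Layer 2 (models): map raw data to fixed-size vectors."""

    def generate_embeddings(self, data: Sequence[str]) -> List[Vector]: ...


@dataclass
class Processor:
    """Layer 3 (processors): combine a loader and a model with preprocessing."""

    loader: Loader
    model: EmbeddingModel
    lowercase: bool = True  # example of a configuration option

    def preprocess(self, raw: str) -> str:
        text = " ".join(raw.split())  # collapse runs of whitespace
        return text.lower() if self.lowercase else text

    def process(self, sources: Sequence[str]) -> List[Vector]:
        cleaned = [self.preprocess(self.loader.load(source)) for source in sources]
        return self.model.generate_embeddings(cleaned)


class InMemoryTextLoader:
    """Toy loader: looks documents up in a dict and fails loudly on unknown IDs."""

    def __init__(self, documents: Dict[str, str]) -> None:
        self.documents = documents

    def load(self, source: str) -> str:
        if source not in self.documents:
            raise KeyError(f"Unknown document: {source}")
        return self.documents[source]


class CharStatsModel:
    """Toy model: a 2-d 'embedding' of text length and fraction of letters."""

    def generate_embeddings(self, data: Sequence[str]) -> List[Vector]:
        return [[float(len(t)), sum(c.isalpha() for c in t) / max(len(t), 1)] for t in data]
```

Because the processor only depends on the two interfaces, swapping in a different loader or model does not require touching the pipeline code.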

### Adding New Features

1. **Discuss First**: For major features, open an issue for discussion before implementation
2. **Backward Compatibility**: Ensure changes don't break existing functionality
3. **Configuration**: Make new features configurable when possible
4. **Examples**: Add example usage in the `examples/` directory
5. **Documentation**: Update relevant documentation

## Testing

Currently, HoneyBee uses example scripts for testing. When contributing:

1. **Test Your Changes**
   ```bash
   # Run relevant example scripts
   python clinical/test_clinical_processing.py
   python examples/survival.py
   ```

2. **Add Test Cases**: If adding new functionality, create corresponding test scripts

3. **Test Multiple Environments**: Test on different Python versions and with/without GPU

In the future, we plan to build a comprehensive test suite with pytest.
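
Until that suite exists, new test scripts can already be written in a pytest-compatible style: plain `test_` functions with bare `assert` statements, which pytest collects automatically from files named `test_*.py` and which also run as an ordinary script. `l2_normalize` is a hypothetical function used only to illustrate the pattern.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Hypothetical function under test: scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        return list(vector)
    return [x / norm for x in vector]


def test_output_has_unit_length():
    result = l2_normalize([3.0, 4.0])
    assert math.isclose(math.sqrt(sum(x * x for x in result)), 1.0)


def test_zero_vector_is_returned_unchanged():
    assert l2_normalize([0.0, 0.0]) == [0.0, 0.0]


def test_direction_is_preserved():
    result = l2_normalize([3.0, 4.0])
    assert all(math.isclose(a, b) for a, b in zip(result, [0.6, 0.8]))


if __name__ == "__main__":
    # Lets the file double as a plain example script before pytest is adopted.
    for name, func in list(globals().items()):
        if name.startswith("test_") and callable(func):
            func()
    print("all tests passed")
```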

## Documentation

### Code Documentation

- All public functions and classes must have docstrings
- Use clear, descriptive variable names
- Add inline comments for complex logic
- Include type hints

### User Documentation

When adding new features:

1. Update the main `README.md` if needed
2. Update `CLAUDE.md` with relevant information
3. Add example notebooks to `examples/`
4. Update the website documentation in `website/`

### Website Development

The documentation website uses Astro:

```bash
cd website
npm install
npm run dev  # Start development server
```

## Pull Request Process

1. **Update Your Branch**
   ```bash
   git fetch upstream
   git rebase upstream/main
   ```

2. **Ensure Quality**
   - [ ] Code follows style guidelines
   - [ ] Tests pass (run example scripts)
   - [ ] Documentation is updated
   - [ ] Commit messages follow conventions
   - [ ] No sensitive data (keys, passwords) included

3. **PR Description Template**
   ```markdown
   ## Description
   Brief description of changes

   ## Type of Change
   - [ ] Bug fix
   - [ ] New feature
   - [ ] Breaking change
   - [ ] Documentation update

   ## Testing
   - [ ] Tested locally
   - [ ] Added new tests
   - [ ] All tests pass

   ## Checklist
   - [ ] My code follows the project style guidelines
   - [ ] I have performed a self-review
   - [ ] I have commented my code where necessary
   - [ ] I have updated the documentation
   - [ ] My changes generate no new warnings
   ```

4. **Review Process**
   - A maintainer will review your PR
   - Address any requested changes
   - Once approved, your PR will be merged

## Community

### Getting Help

- **Issues**: Use GitHub issues for bugs and features
- **Discussions**: Use GitHub Discussions for questions and ideas

### Recognition

Contributors are recognized in several ways:

- Listed in the project's contributors section
- Mentioned in release notes for significant contributions
- Co-authorship on related publications (for substantial research contributions)

## Thank You!

Your contributions make HoneyBee better for everyone in the oncology AI research community. We appreciate your time and effort in improving this project!

---

If you have any questions about contributing, please don't hesitate to ask. We're here to help and look forward to your contributions! 🐝

@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Aakash Tripathi @ Moffitt Cancer Center

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

@@ -0,0 +1,31 @@
# Include documentation and license files
include README.md
include LICENSE
include CONTRIBUTING.md

# Include requirements file for reference
include requirements.txt

# Exclude unnecessary files
exclude .env
exclude .gitignore
exclude .gitattributes

# Exclude test and development directories
recursive-exclude tests *
recursive-exclude docs *
recursive-exclude website *
recursive-exclude results *
recursive-exclude BACKUP *
recursive-exclude to_delete *
recursive-exclude paper *

# Exclude compiled Python files
global-exclude __pycache__
global-exclude *.py[co]
global-exclude *.so

# Exclude git and IDE files
global-exclude .git*
global-exclude .vscode
global-exclude .DS_Store

@@ -0,0 +1,278 @@
Metadata-Version: 2.4
Name: honeybee-ml
Version: 0.1.0
Summary: A Scalable Modular Framework for Multimodal AI in Oncology
Author-email: Aakash Tripathi <aakash.tripathi@moffitt.org>, Lab Rasool <ghulam.rasool@moffitt.org>
Maintainer-email: Aakash Tripathi <aakash.tripathi@moffitt.org>
License-Expression: MIT
Project-URL: Homepage, https://github.com/lab-rasool/HoneyBee
Project-URL: Documentation, https://lab-rasool.github.io/HoneyBee/
Project-URL: Repository, https://github.com/lab-rasool/HoneyBee
Project-URL: Bug Tracker, https://github.com/lab-rasool/HoneyBee/issues
Project-URL: Paper, https://arxiv.org/abs/2405.07460
Keywords: multimodal AI,oncology,cancer research,medical imaging,clinical NLP,machine learning,pathology,radiology,biomedical,healthcare
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Healthcare Industry
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.20.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: torch>=2.0.0
Requires-Dist: torchvision>=0.15.0
Requires-Dist: torchaudio>=2.0.0
Requires-Dist: transformers>=4.30.0
Requires-Dist: accelerate>=0.20.0
Requires-Dist: bitsandbytes>=0.40.0
Requires-Dist: scikit-image>=0.19.0
Requires-Dist: scipy>=1.7.0
Requires-Dist: matplotlib>=3.3.0
Requires-Dist: imageio>=2.9.0
Provides-Extra: clinical
Requires-Dist: llama-index>=0.9.0; extra == "clinical"
Requires-Dist: langchain>=0.1.0; extra == "clinical"
Requires-Dist: pytesseract>=0.3.8; extra == "clinical"
Requires-Dist: pdf2image>=1.16.0; extra == "clinical"
Requires-Dist: PyPDF2>=3.0.0; extra == "clinical"
Provides-Extra: radiology
Requires-Dist: pydicom>=2.3.0; extra == "radiology"
Requires-Dist: SimpleITK>=2.2.0; extra == "radiology"
Requires-Dist: nibabel>=4.0.0; extra == "radiology"
Requires-Dist: opencv-python>=4.5.0; extra == "radiology"
Provides-Extra: pathology
Requires-Dist: openslide-python>=1.2.0; extra == "pathology"
Requires-Dist: colour-science>=0.4.0; extra == "pathology"
Requires-Dist: albumentations>=1.3.0; extra == "pathology"
Requires-Dist: cucim>=23.0.0; extra == "pathology"
Provides-Extra: molecular
Requires-Dist: pyarrow>=10.0.0; extra == "molecular"
Requires-Dist: fastparquet>=2023.0.0; extra == "molecular"
Provides-Extra: database
Requires-Dist: pymongo>=4.0.0; extra == "database"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Provides-Extra: visualization
Requires-Dist: ipykernel>=6.0.0; extra == "visualization"
Requires-Dist: ipywidgets>=8.0.0; extra == "visualization"
Provides-Extra: models
Requires-Dist: timm>=0.9.0; extra == "models"
Requires-Dist: onnxruntime>=1.15.0; extra == "models"
Requires-Dist: peft>=0.5.0; extra == "models"
Provides-Extra: all
Requires-Dist: honeybee-ml[clinical,database,models,molecular,pathology,radiology,visualization]; extra == "all"
Dynamic: license-file

<div align="center">
<img src="website/public/images/logo.png" alt="HoneyBee Logo" width="200">

# HoneyBee

**A Scalable Modular Framework for Multimodal AI in Oncology**

[arXiv](https://arxiv.org/abs/2405.07460)
[License: MIT](LICENSE)
[GitHub Stars](https://github.com/lab-rasool/HoneyBee/stargazers)
[Python 3.8+](https://www.python.org/downloads/)
[PyTorch 2.0+](https://pytorch.org/)

[Documentation](https://lab-rasool.github.io/HoneyBee/) | [Paper](https://arxiv.org/abs/2405.07460) | [Examples](examples/) | [Demo](app.py) | [Google Colab](https://colab.research.google.com/)
</div>

## 🚀 Overview

HoneyBee is a comprehensive multimodal AI framework designed specifically for oncology research and clinical applications. It integrates and processes diverse medical data types (clinical text, radiology images, pathology slides, and molecular data) through a unified, modular architecture. Built with scalability and extensibility in mind, HoneyBee helps researchers develop AI models for cancer diagnosis, prognosis, and treatment planning.

> [!WARNING]
> **Alpha Release**: This framework is currently in alpha. APIs may change, and some features are still under development.

## ✨ Key Features

### 🏗️ Modular Architecture
- **3-Layer Design**: Clean separation between data loaders, embedding models, and processors
- **Unified API**: Consistent interface across all modalities
- **Extensible**: Easy to add new models and data sources
- **Production-Ready**: Optimized for both research and clinical deployment

### 📊 Comprehensive Data Support

#### Medical Imaging
- **Pathology**: Whole Slide Images (WSI) in SVS and TIFF formats, with tissue detection
- **Radiology**: DICOM and NIfTI processing with 3D volume support
- **Preprocessing**: Advanced augmentation and normalization pipelines

#### Clinical Text
- **Document Processing**: PDF support with OCR for scanned documents
- **NLP Pipeline**: Cancer entity extraction, temporal parsing, medical ontology integration
- **Database Integration**: Native [MINDS](https://github.com/lab-rasool/MINDS) format support
- **Long Document Handling**: Multiple tokenization strategies for clinical notes
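
Clinical notes often exceed a transformer's maximum input length (512 tokens for BERT-style models such as BioBERT). One common strategy is a sliding window with overlap, whose per-window embeddings are pooled afterwards. The sketch below shows only the windowing step on already-tokenized input; `sliding_window_chunks` and its defaults are hypothetical and are not HoneyBee's API.

```python
from typing import List, Sequence


def sliding_window_chunks(
    tokens: Sequence[str], max_length: int = 512, overlap: int = 128
) -> List[List[str]]:
    """Split a token sequence into windows of at most `max_length` tokens.

    Consecutive windows share `overlap` tokens, so text that straddles a
    window boundary appears intact in at least one window.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not 0 <= overlap < max_length:
        raise ValueError("overlap must be in [0, max_length)")
    step = max_length - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start : start + max_length]))
        if start + max_length >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks
```

With a WordPiece tokenizer, keep `max_length` a couple of tokens below the model limit (for example 510 for a 512-token BERT model) to leave room for `[CLS]` and `[SEP]`; the window embeddings can then be mean-pooled into one document vector.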

#### Molecular Data
- **Genomics**: Support for expression data and mutation profiles
- **Integration**: Seamless combination with imaging and clinical data

### 🧠 State-of-the-Art Embedding Models

#### Clinical Text Embeddings
- **GatorTron**: Domain-specific clinical language model
- **BioBERT**: Biomedical text understanding
- **PubMedBERT**: Scientific literature embeddings
- **Clinical-T5**: Text-to-text clinical transformers

#### Medical Image Embeddings
- **REMEDIS**: Self-supervised medical image representations
- **RadImageNet**: Pre-trained radiological feature extractors
- **UNI**: General-purpose self-supervised foundation model for histopathology
- **Custom Models**: Easy integration of proprietary models

### 🛠️ Advanced Capabilities

#### Multimodal Integration
- **Cross-Modal Learning**: Unified representations across modalities
- **Attention Mechanisms**: Interpretable fusion strategies
- **Patient-Level Aggregation**: Comprehensive patient profiles
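
A simple way to build a patient-level profile is late fusion: mean-pool each modality's embeddings, normalize, and concatenate in a fixed modality order. This is a minimal sketch under that assumption, not HoneyBee's fusion method; attention-based fusion would replace the fixed mean with learned weights. The modality names and dimensions are illustrative.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average several embeddings of one modality (e.g. slide tiles, note chunks)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def l2_normalize(vector: Vector) -> Vector:
    """Scale to unit length so no modality dominates by magnitude alone."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def patient_profile(embeddings: Dict[str, Sequence[Vector]], dims: Dict[str, int]) -> Vector:
    """Concatenate one pooled, normalized vector per modality listed in `dims`.

    Missing modalities are zero-filled so every profile has length
    sum(dims.values()); modalities not listed in `dims` are ignored.
    """
    profile: Vector = []
    for modality in sorted(dims):  # fixed order keeps profiles comparable
        vectors = embeddings.get(modality)
        if not vectors:
            profile.extend([0.0] * dims[modality])
            continue
        pooled = l2_normalize(mean_pool(vectors))
        if len(pooled) != dims[modality]:
            raise ValueError(f"{modality}: expected {dims[modality]} dims, got {len(pooled)}")
        profile.extend(pooled)
    return profile
```

Zero-filling keeps every patient's profile the same length, so downstream models can consume patients with incomplete data.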

#### Analysis Tools
- **Survival Analysis**: Cox proportional hazards (Cox PH), Random Survival Forest, DeepSurv
- **Classification**: Multi-class cancer type prediction
- **Retrieval**: Similar patient identification
- **Visualization**: Interactive t-SNE dashboards
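
Similar-patient retrieval typically ranks a cohort by the cosine similarity between patient embeddings. The linear scan below is a minimal sketch of that idea, assuming each patient is already represented by one vector (for example a concatenated multimodal profile); `most_similar_patients` is a hypothetical helper, not HoneyBee's retrieval API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_patients(
    query: Sequence[float], cohort: Dict[str, Sequence[float]], k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k cohort patients most similar to the query embedding."""
    scored = [(pid, cosine_similarity(query, emb)) for pid, emb in cohort.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

For large cohorts, an approximate nearest-neighbor index (such as FAISS) would replace the linear scan.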

#### Clinical Applications
- **Risk Stratification**: Patient outcome prediction
- **Treatment Planning**: Personalized therapy recommendations
- **Biomarker Discovery**: Multi-omic pattern identification

## 🚀 Quick Start

### Prerequisites

- Python 3.8+
- PyTorch 2.0+
- CUDA 11.7+ (optional, for GPU acceleration)

### System Dependencies

```bash
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y openslide-tools tesseract-ocr

# macOS
brew install openslide tesseract

# Windows
# Install from official websites:
# - OpenSlide: https://openslide.org/download/
# - Tesseract: https://github.com/UB-Mannheim/tesseract/wiki
```

### Installation

```bash
# Clone the repository
git clone https://github.com/lab-rasool/HoneyBee.git
cd HoneyBee

# Install dependencies
pip install -r requirements.txt

# Download required NLTK data
python -c "import nltk; nltk.download('punkt')"

# Install HoneyBee in development mode
pip install -e .
```
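
Beyond the core requirements, the package metadata declares optional extras (`clinical`, `radiology`, `pathology`, `molecular`, `database`, `visualization`, `models`, plus `dev` and `all`). The sketch below reports which extras still have missing modules. The import names are an assumption derived from the `Requires-Dist` entries (for example `opencv-python` imports as `cv2` and `colour-science` as `colour`), and `find_spec` only checks that a package is installed without importing it, so a missing native library such as libopenslide is not detected.

```python
from importlib.util import find_spec
from typing import Dict, List

# Import names for the optional extras listed in honeybee-ml's PKG-INFO.
EXTRAS: Dict[str, List[str]] = {
    "clinical": ["llama_index", "langchain", "pytesseract", "pdf2image", "PyPDF2"],
    "radiology": ["pydicom", "SimpleITK", "nibabel", "cv2"],
    "pathology": ["openslide", "colour", "albumentations", "cucim"],
    "molecular": ["pyarrow", "fastparquet"],
    "database": ["pymongo"],
    "visualization": ["ipykernel", "ipywidgets"],
    "models": ["timm", "onnxruntime", "peft"],
}


def missing_modules(extras: Dict[str, List[str]] = EXTRAS) -> Dict[str, List[str]]:
    """Map each extra to the modules that cannot be found on this interpreter."""
    report: Dict[str, List[str]] = {}
    for extra, modules in extras.items():
        absent = [name for name in modules if find_spec(name) is None]
        if absent:
            report[extra] = absent
    return report


if __name__ == "__main__":
    for extra, absent in missing_modules().items():
        print(f"[{extra}] missing: {', '.join(absent)}")
```

A reported extra can then be installed from a clone with `pip install -e ".[radiology]"` (substituting the extra's name), or from PyPI with `pip install "honeybee-ml[radiology]"`.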

### Environment Setup

Create a `.env` file in the project root, and keep it out of version control because it holds credentials:

```bash
# MINDS database credentials (if using MINDS format)
HOST=your_server
PORT=5433
DB_USER=postgres
PASSWORD=your_password
DATABASE=minds

# Hugging Face API key (required by some models)
HF_API_KEY=your_huggingface_api_key
```
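
How HoneyBee itself reads these variables is not shown here; many projects use the `python-dotenv` package for this. As a dependency-free sketch, the hypothetical `load_env_file` helper below parses the simple `KEY=value` lines and `#` comments used above and copies them into `os.environ`, without overwriting variables that are already set unless asked to.

```python
import os
from pathlib import Path
from typing import Dict, Union


def load_env_file(path: Union[str, Path] = ".env", override: bool = False) -> Dict[str, str]:
    """Parse KEY=value lines from a .env file and export them to os.environ.

    Handles blank lines, full-line '#' comments and matching surrounding
    quotes; it does not implement full dotenv syntax (e.g. variable expansion).
    """
    values: Dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # strip matching surrounding quotes
        values[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

After `load_env_file()`, values are read as strings, so numeric settings need converting, for example `int(os.environ["PORT"])`.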

## 🔬 Research Applications

HoneyBee has been successfully applied to:

- **Cancer Subtype Classification**: Automated identification of cancer subtypes from multimodal data
- **Survival Prediction**: Risk stratification and outcome prediction for treatment planning
- **Similar Patient Retrieval**: Finding patients with similar clinical profiles for precision medicine
- **Biomarker Discovery**: Identifying multimodal patterns associated with treatment response

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.

### Development Setup

```bash
# Fork and clone your fork
git clone https://github.com/YOUR_USERNAME/HoneyBee.git
cd HoneyBee

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -r requirements.txt
pip install -e .
```

## 🐛 Known Issues & Limitations

- **Alpha Status**: Some features are still under development
- **Memory Requirements**: WSI processing requires significant RAM (16GB+ recommended)
- **GPU Recommended**: While CPU fallback exists, GPU acceleration significantly improves performance
- **Limited Test Coverage**: A comprehensive test suite is planned for future releases

## 📜 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 📝 Citation

If you use HoneyBee in your research, please cite our paper:

```bibtex
@article{tripathi2024honeybee,
  title={HoneyBee: A Scalable Modular Framework for Creating Multimodal Oncology Datasets with Foundational Embedding Models},
  author={Aakash Tripathi and Asim Waqas and Yasin Yilmaz and Ghulam Rasool},
  journal={arXiv preprint arXiv:2405.07460},
  year={2024},
  eprint={2405.07460},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```

---

<div align="center">
Made with ❤️ by the <a href="https://github.com/lab-rasool">Lab Rasool</a> team
</div>