honeybee-ml 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (55)
  1. honeybee_ml-0.1.0/CONTRIBUTING.md +363 -0
  2. honeybee_ml-0.1.0/LICENSE +21 -0
  3. honeybee_ml-0.1.0/MANIFEST.in +31 -0
  4. honeybee_ml-0.1.0/PKG-INFO +278 -0
  5. honeybee_ml-0.1.0/README.md +198 -0
  6. honeybee_ml-0.1.0/honeybee/__init__.py +198 -0
  7. honeybee_ml-0.1.0/honeybee/loaders/Radiology/__init__.py +19 -0
  8. honeybee_ml-0.1.0/honeybee/loaders/Radiology/dataset.py +239 -0
  9. honeybee_ml-0.1.0/honeybee/loaders/Radiology/loader.py +543 -0
  10. honeybee_ml-0.1.0/honeybee/loaders/Radiology/metadata.py +90 -0
  11. honeybee_ml-0.1.0/honeybee/loaders/Radiology/radiology.py +80 -0
  12. honeybee_ml-0.1.0/honeybee/loaders/Reader/mindsDBreader.py +45 -0
  13. honeybee_ml-0.1.0/honeybee/loaders/Reader/reader.py +84 -0
  14. honeybee_ml-0.1.0/honeybee/loaders/Scans/scan.py +40 -0
  15. honeybee_ml-0.1.0/honeybee/loaders/Slide/__init__.py +4 -0
  16. honeybee_ml-0.1.0/honeybee/loaders/Slide/slide.py +291 -0
  17. honeybee_ml-0.1.0/honeybee/loaders/__init__.py +14 -0
  18. honeybee_ml-0.1.0/honeybee/models/HuggingFaceEmbedder/embedder.py +101 -0
  19. honeybee_ml-0.1.0/honeybee/models/LlamaEmbedder/__init__.py +3 -0
  20. honeybee_ml-0.1.0/honeybee/models/LlamaEmbedder/llama_embedder.py +307 -0
  21. honeybee_ml-0.1.0/honeybee/models/REMEDIS/remedis.py +58 -0
  22. honeybee_ml-0.1.0/honeybee/models/RadImageNet/__init__.py +9 -0
  23. honeybee_ml-0.1.0/honeybee/models/RadImageNet/radimagenet.py +457 -0
  24. honeybee_ml-0.1.0/honeybee/models/TissueDetector/tissue_detector.py +91 -0
  25. honeybee_ml-0.1.0/honeybee/models/UNI/uni.py +34 -0
  26. honeybee_ml-0.1.0/honeybee/models/UNI2/__init__.py +3 -0
  27. honeybee_ml-0.1.0/honeybee/models/UNI2/uni2.py +178 -0
  28. honeybee_ml-0.1.0/honeybee/models/Virchow2/__init__.py +3 -0
  29. honeybee_ml-0.1.0/honeybee/models/Virchow2/virchow2.py +164 -0
  30. honeybee_ml-0.1.0/honeybee/models/Virchow2/virchow2_simple.py +144 -0
  31. honeybee_ml-0.1.0/honeybee/models/__init__.py +6 -0
  32. honeybee_ml-0.1.0/honeybee/processors/__init__.py +25 -0
  33. honeybee_ml-0.1.0/honeybee/processors/clinical_processor.py +1351 -0
  34. honeybee_ml-0.1.0/honeybee/processors/pathology_processor.py +552 -0
  35. honeybee_ml-0.1.0/honeybee/processors/radiology/__init__.py +35 -0
  36. honeybee_ml-0.1.0/honeybee/processors/radiology/preprocessing.py +573 -0
  37. honeybee_ml-0.1.0/honeybee/processors/radiology/processor.py +790 -0
  38. honeybee_ml-0.1.0/honeybee/processors/radiology/segmentation.py +714 -0
  39. honeybee_ml-0.1.0/honeybee/processors/radiology_processor.py +20 -0
  40. honeybee_ml-0.1.0/honeybee/processors/wsi/__init__.py +58 -0
  41. honeybee_ml-0.1.0/honeybee/processors/wsi/stain_normalization.py +700 -0
  42. honeybee_ml-0.1.0/honeybee/processors/wsi/stain_normalization_fixed.py +150 -0
  43. honeybee_ml-0.1.0/honeybee/processors/wsi/stain_normalization_working.py +130 -0
  44. honeybee_ml-0.1.0/honeybee/processors/wsi/stain_separation.py +299 -0
  45. honeybee_ml-0.1.0/honeybee/processors/wsi/tissue_detection.py +357 -0
  46. honeybee_ml-0.1.0/honeybee/utils/__init__.py +5 -0
  47. honeybee_ml-0.1.0/honeybee/utils/pydantic_compat.py +31 -0
  48. honeybee_ml-0.1.0/honeybee_ml.egg-info/PKG-INFO +278 -0
  49. honeybee_ml-0.1.0/honeybee_ml.egg-info/SOURCES.txt +53 -0
  50. honeybee_ml-0.1.0/honeybee_ml.egg-info/dependency_links.txt +1 -0
  51. honeybee_ml-0.1.0/honeybee_ml.egg-info/requires.txt +57 -0
  52. honeybee_ml-0.1.0/honeybee_ml.egg-info/top_level.txt +1 -0
  53. honeybee_ml-0.1.0/pyproject.toml +180 -0
  54. honeybee_ml-0.1.0/requirements.txt +34 -0
  55. honeybee_ml-0.1.0/setup.cfg +4 -0
@@ -0,0 +1,363 @@
1
+ # Contributing to HoneyBee
2
+
3
+ Thank you for your interest in contributing to HoneyBee! We welcome contributions from the community and are grateful for any help you can provide. This guide will help you get started with contributing to our multimodal AI framework for oncology.
4
+
5
+ ## Table of Contents
6
+
7
+ - [Code of Conduct](#code-of-conduct)
8
+ - [Getting Started](#getting-started)
9
+ - [Development Setup](#development-setup)
10
+ - [How to Contribute](#how-to-contribute)
11
+ - [Contribution Guidelines](#contribution-guidelines)
12
+ - [Testing](#testing)
13
+ - [Documentation](#documentation)
14
+ - [Pull Request Process](#pull-request-process)
15
+ - [Community](#community)
16
+
17
+ ## Code of Conduct
18
+
19
+ By participating in this project, you agree to abide by our Code of Conduct:
20
+
21
+ - **Be Respectful**: Treat everyone with respect. No harassment, discrimination, or inappropriate behavior will be tolerated.
22
+ - **Be Collaborative**: Work together to resolve conflicts and assume good intentions.
23
+ - **Be Professional**: Keep discussions focused on improving the project.
24
+ - **Be Inclusive**: Welcome newcomers and help them get started.
25
+
26
+ ## Getting Started
27
+
28
+ 1. **Fork the Repository**: Click the "Fork" button on the [HoneyBee GitHub page](https://github.com/lab-rasool/HoneyBee)
29
+
30
+ 2. **Star the Repository**: If you find HoneyBee useful, please star it to show your support!
31
+
32
+ 3. **Check Issues**: Look through our [open issues](https://github.com/lab-rasool/HoneyBee/issues) for something to work on:
33
+ - Issues labeled `good first issue` are great for newcomers
34
+ - Issues labeled `help wanted` need attention
35
+ - Issues labeled `enhancement` are for new features
36
+
37
+ ## Development Setup
38
+
39
+ ### Prerequisites
40
+
41
+ - Python 3.8 or higher
42
+ - PyTorch 2.0 or higher
43
+ - Git
44
+ - CUDA 11.7+ (optional, for GPU support)
45
+
46
+ ### System Dependencies
47
+
48
+ ```bash
49
+ # Ubuntu/Debian
50
+ sudo apt-get update
51
+ sudo apt-get install -y openslide-tools tesseract-ocr
52
+
53
+ # macOS
54
+ brew install openslide tesseract
55
+ ```
56
+
57
+ ### Setting Up Your Development Environment
58
+
59
+ 1. **Clone Your Fork**
60
+ ```bash
61
+ git clone https://github.com/YOUR_USERNAME/HoneyBee.git
62
+ cd HoneyBee
63
+ ```
64
+
65
+ 2. **Add Upstream Remote**
66
+ ```bash
67
+ git remote add upstream https://github.com/lab-rasool/HoneyBee.git
68
+ ```
69
+
70
+ 3. **Create a Virtual Environment**
71
+ ```bash
72
+ python -m venv venv
73
+ source venv/bin/activate # On Windows: venv\Scripts\activate
74
+ ```
75
+
76
+ 4. **Install Dependencies**
77
+ ```bash
78
+ pip install -r requirements.txt
79
+ pip install -r requirements-dev.txt # If available
80
+ python -c "import nltk; nltk.download('punkt')"
81
+ ```
82
+
83
+ 5. **Install HoneyBee in Development Mode**
84
+ ```bash
85
+ pip install -e .
86
+ ```
87
+
88
+ 6. **Set Up Environment Variables**
89
+ Create a `.env` file in the project root (see README.md for details)
90
+
91
+ ## How to Contribute
92
+
93
+ ### Reporting Bugs
94
+
95
+ Before creating a bug report, please check if the issue already exists. If not, create a new issue with:
96
+
97
+ - **Clear title**: Summarize the bug concisely
98
+ - **Description**: Detailed description of the bug
99
+ - **Steps to reproduce**: List the exact steps to reproduce the behavior
100
+ - **Expected behavior**: What you expected to happen
101
+ - **Actual behavior**: What actually happened
102
+ - **Environment details**: Python version, OS, GPU info, etc.
103
+ - **Error messages**: Include full error traceback
104
+ - **Code samples**: Minimal code to reproduce the issue
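The environment details above can be collected in one step with a small script like the one below. It is a sketch, not a tool shipped with HoneyBee. It uses only the standard library, plus PyTorch when PyTorch is installed, and it looks up the distribution name `honeybee-ml` taken from the package metadata.

```python
import importlib.util
import platform
import sys
from importlib import metadata  # standard library since Python 3.8
from typing import Dict


def environment_report() -> Dict[str, str]:
    """Collect the version and hardware details requested in a bug report."""
    report = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "executable": sys.executable,
    }
    try:
        report["honeybee-ml"] = metadata.version("honeybee-ml")
    except metadata.PackageNotFoundError:
        report["honeybee-ml"] = "not installed"

    # Import torch lazily so the script still runs where torch is not installed.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        report["cuda_build"] = str(torch.version.cuda)  # "None" for CPU-only builds
        report["cuda_available"] = str(torch.cuda.is_available())
        if torch.cuda.is_available():
            report["gpu"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

You can paste the printed `key: value` lines directly into the issue.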
105
+
106
+ ### Suggesting Enhancements
107
+
108
+ Enhancement suggestions are tracked as GitHub issues. When creating an enhancement suggestion, include:
109
+
110
+ - **Clear title**: Summarize the enhancement
111
+ - **Motivation**: Why is this enhancement needed?
112
+ - **Detailed description**: How should it work?
113
+ - **Alternative solutions**: Any alternative solutions you've considered
114
+ - **Additional context**: Mockups, diagrams, or examples
115
+
116
+ ### Contributing Code
117
+
118
+ 1. **Find or Create an Issue**: Before starting work, ensure there's an issue for what you're working on
119
+
120
+ 2. **Create a Feature Branch**
121
+ ```bash
122
+ git checkout -b feature/your-feature-name
123
+ # or
124
+ git checkout -b fix/your-bug-fix
125
+ ```
126
+
127
+ 3. **Make Your Changes**
128
+ - Follow our coding standards (see below)
129
+ - Add or update tests as needed
130
+ - Update documentation if necessary
131
+
132
+ 4. **Commit Your Changes**
133
+ ```bash
134
+ git add .
135
+ git commit -m "feat: add new feature X"
136
+ # or
137
+ git commit -m "fix: resolve issue with Y"
138
+ ```
139
+
140
+ We follow the [Conventional Commits](https://www.conventionalcommits.org/) specification:
141
+ - `feat:` New feature
142
+ - `fix:` Bug fix
143
+ - `docs:` Documentation changes
144
+ - `style:` Code style changes (formatting, etc.)
145
+ - `refactor:` Code refactoring
146
+ - `test:` Test additions or modifications
147
+ - `chore:` Maintenance tasks
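You can check the prefix convention before pushing. The helper below is hypothetical and is not part of the repository. The optional `(scope)` and the `!` breaking-change marker come from the Conventional Commits specification, not from this guide. A function like this could be called from a Git `commit-msg` hook, which receives the path of the file holding the proposed message as its first argument.

```python
import re

# Commit types listed in this guide; extend the tuple if the project adopts more.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# type, optional "(scope)", optional "!" for breaking changes, then ": " and a description.
SUBJECT_RE = re.compile(r"^(%s)(\([\w\-./]+\))?!?: \S" % "|".join(COMMIT_TYPES))


def is_conventional(message: str) -> bool:
    """Return True if the first line of a commit message follows the convention."""
    first_line = message.splitlines()[0] if message else ""
    return SUBJECT_RE.match(first_line) is not None
```

For example, `is_conventional("fix(loaders): handle missing DICOM tags")` is accepted. `is_conventional("Added new feature")` is rejected.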
148
+
149
+ 5. **Push to Your Fork**
150
+ ```bash
151
+ git push origin feature/your-feature-name
152
+ ```
153
+
154
+ 6. **Create a Pull Request**: Go to GitHub and create a PR from your fork to the main repository
155
+
156
+ ## Contribution Guidelines
157
+
158
+ ### Code Style
159
+
160
+ - **Python Style**: Follow [PEP 8](https://www.python.org/dev/peps/pep-0008/)
161
+ - **Docstrings**: Use Google-style docstrings for all public functions and classes
162
+ - **Type Hints**: Use type hints where appropriate
163
+ - **Line Length**: Maximum 100 characters
164
+ - **Imports**: Sort imports with `isort`
165
+
166
+ Example:
167
+ ```python
168
+ from typing import Optional
169
+
170
+ import numpy as np
171
+ import torch
172
+ from torch import nn
173
+
174
+ from honeybee.models.base import BaseModel
175
+
176
+
177
+ class NewModel(BaseModel):
178
+     """Brief description of the model.
179
+
180
+     Longer description explaining the model's purpose,
181
+     architecture, and use cases.
182
+
183
+     Args:
184
+         input_dim: Dimension of input features
185
+         hidden_dim: Dimension of hidden layers
186
+         output_dim: Dimension of output
187
+
188
+     Example:
189
+         >>> model = NewModel(input_dim=512, hidden_dim=256, output_dim=10)
190
+         >>> embeddings = model.generate_embeddings(data)
191
+     """
192
+
193
+     def __init__(
194
+         self,
195
+         input_dim: int,
196
+         hidden_dim: int = 256,
197
+         output_dim: int = 128,
198
+     ) -> None:
199
+         super().__init__()
200
+         self.encoder = nn.Sequential(
201
+             nn.Linear(input_dim, hidden_dim),
202
+             nn.ReLU(),
203
+             nn.Linear(hidden_dim, output_dim),
204
+         )
205
+
206
+     def generate_embeddings(
207
+         self,
208
+         data: torch.Tensor,
209
+         batch_size: Optional[int] = None,
210
+     ) -> np.ndarray:
211
+         """Generate embeddings for input data.
212
+
213
+         Args:
214
+             data: Input tensor
215
+             batch_size: Optional batch size for processing
216
+
217
+         Returns:
218
+             Embeddings as numpy array
219
+         """
220
+         # Implementation here
221
+         pass
222
+ ```
223
+
224
+ ### Architecture Guidelines
225
+
226
+ When adding new components, follow the 3-layer architecture:
227
+
228
+ 1. **Data Loaders** (`honeybee/loaders/`)
229
+ - Inherit from appropriate base class
230
+ - Implement standard loading interface
231
+ - Handle error cases gracefully
232
+ - Document supported formats
233
+
234
+ 2. **Embedding Models** (`honeybee/models/`)
235
+ - Implement `generate_embeddings()` method
236
+ - Support both CPU and GPU
237
+ - Include model weights handling
238
+ - Document model requirements
239
+
240
+ 3. **Processors** (`honeybee/processors/`)
241
+ - Combine loaders and models
242
+ - Implement preprocessing pipelines
243
+ - Handle multimodal integration
244
+ - Provide clear configuration options
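The toy example below shows one way to split responsibilities across the three layers. All class names in it (`Loader`, `EmbeddingModel`, `TextLoader`, `LengthEmbedder`, `Processor`) are hypothetical and chosen for illustration. HoneyBee's actual base classes and signatures may differ. Only the `generate_embeddings()` method name comes from this guide.

```python
from typing import Dict, List, Protocol, Sequence


class Loader(Protocol):
    """Layer 1 (hypothetical interface): turns a source path into raw items."""

    def load(self, path: str) -> List[str]: ...


class EmbeddingModel(Protocol):
    """Layer 2 (hypothetical interface): maps raw items to fixed-size vectors."""

    def generate_embeddings(self, items: Sequence[str]) -> List[List[float]]: ...


class TextLoader:
    """Toy loader: each non-empty line of an in-memory 'file' becomes one item."""

    def __init__(self, files: Dict[str, str]) -> None:
        self.files = files

    def load(self, path: str) -> List[str]:
        return [line.strip() for line in self.files[path].splitlines() if line.strip()]


class LengthEmbedder:
    """Toy model: a 2-d 'embedding' made of character count and word count."""

    def generate_embeddings(self, items: Sequence[str]) -> List[List[float]]:
        return [[float(len(text)), float(len(text.split()))] for text in items]


class Processor:
    """Layer 3: wires a loader to a model and owns preprocessing and configuration."""

    def __init__(self, loader: Loader, model: EmbeddingModel, lowercase: bool = True) -> None:
        self.loader = loader
        self.model = model
        self.lowercase = lowercase  # example of a configurable preprocessing option

    def process(self, path: str) -> List[List[float]]:
        items = self.loader.load(path)
        if self.lowercase:
            items = [text.lower() for text in items]
        return self.model.generate_embeddings(items)
```

The processor depends only on the two protocols. As a result, a real loader (for example, one that reads DICOM series) or a real model can replace either toy class without any change to `Processor`.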
245
+
246
+ ### Adding New Features
247
+
248
+ 1. **Discuss First**: For major features, open an issue for discussion before implementation
249
+ 2. **Backward Compatibility**: Ensure changes don't break existing functionality
250
+ 3. **Configuration**: Make new features configurable when possible
251
+ 4. **Examples**: Add example usage in `examples/` directory
252
+ 5. **Documentation**: Update relevant documentation
253
+
254
+ ## Testing
255
+
256
+ Currently, HoneyBee uses example scripts for testing. When contributing:
257
+
258
+ 1. **Test Your Changes**
259
+ ```bash
260
+ # Run relevant example scripts
261
+ python clinical/test_clinical_processing.py
262
+ python examples/survival.py
263
+ ```
264
+
265
+ 2. **Add Test Cases**: If adding new functionality, create corresponding test scripts
266
+
267
+ 3. **Test Multiple Environments**: Test on different Python versions and with/without GPU
268
+
269
+ Future: We plan to implement a comprehensive test suite using pytest.
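Until that suite exists, you can still cover new functionality with pytest-style tests. pytest discovers files named `test_*.py` and collects the functions in them whose names start with `test_`. The helper under test here, `iter_batches`, is hypothetical. In a real contribution it would be imported from the module you changed.

```python
from typing import Iterator, List, Sequence


def iter_batches(items: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Hypothetical helper under test: yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def test_iter_batches_keeps_order_and_remainder() -> None:
    batches = list(iter_batches(list(range(7)), batch_size=3))
    assert batches == [[0, 1, 2], [3, 4, 5], [6]]


def test_iter_batches_rejects_non_positive_size() -> None:
    try:
        list(iter_batches([1, 2], batch_size=0))
    except ValueError:
        return  # expected: an invalid size raises as soon as iteration starts
    raise AssertionError("iter_batches accepted batch_size=0")
```

If you save this as, say, `tests/test_batching.py`, it runs with `pytest tests/`. pytest is included in the `dev` extra. Plain `assert` statements are enough because pytest rewrites them to report the compared values on failure.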
270
+
271
+ ## Documentation
272
+
273
+ ### Code Documentation
274
+
275
+ - All public functions and classes must have docstrings
276
+ - Use clear, descriptive variable names
277
+ - Add inline comments for complex logic
278
+ - Include type hints
279
+
280
+ ### User Documentation
281
+
282
+ When adding new features:
283
+
284
+ 1. Update the main `README.md` if needed
285
+ 2. Update `CLAUDE.md` with relevant information
286
+ 3. Add example notebooks to `examples/`
287
+ 4. Update the website documentation in `website/`
288
+
289
+ ### Website Development
290
+
291
+ The documentation website uses Astro:
292
+
293
+ ```bash
294
+ cd website
295
+ npm install
296
+ npm run dev # Start development server
297
+ ```
298
+
299
+ ## Pull Request Process
300
+
301
+ 1. **Update Your Branch**
302
+ ```bash
303
+ git fetch upstream
304
+ git rebase upstream/main
305
+ ```
306
+
307
+ 2. **Ensure Quality**
308
+ - [ ] Code follows style guidelines
309
+ - [ ] Tests pass (run example scripts)
310
+ - [ ] Documentation is updated
311
+ - [ ] Commit messages follow conventions
312
+ - [ ] No sensitive data (keys, passwords) included
313
+
314
+ 3. **PR Description Template**
315
+ ```markdown
316
+ ## Description
317
+ Brief description of changes
318
+
319
+ ## Type of Change
320
+ - [ ] Bug fix
321
+ - [ ] New feature
322
+ - [ ] Breaking change
323
+ - [ ] Documentation update
324
+
325
+ ## Testing
326
+ - [ ] Tested locally
327
+ - [ ] Added new tests
328
+ - [ ] All tests pass
329
+
330
+ ## Checklist
331
+ - [ ] My code follows the project style guidelines
332
+ - [ ] I have performed a self-review
333
+ - [ ] I have commented my code where necessary
334
+ - [ ] I have updated the documentation
335
+ - [ ] My changes generate no new warnings
336
+ ```
337
+
338
+ 4. **Review Process**
339
+ - A maintainer will review your PR
340
+ - Address any requested changes
341
+ - Once approved, your PR will be merged
342
+
343
+ ## Community
344
+
345
+ ### Getting Help
346
+
347
+ - **Issues**: Use GitHub issues for bugs and features
348
+ - **Discussions**: Use GitHub Discussions for questions and ideas
349
+
350
+ ### Recognition
351
+
352
+ Contributors are recognized in several ways:
353
+ - Listed in the project's contributors section
354
+ - Mentioned in release notes for significant contributions
355
+ - Co-authorship on related publications (for substantial research contributions)
356
+
357
+ ## Thank You!
358
+
359
+ Your contributions make HoneyBee better for everyone in the oncology AI research community. We appreciate your time and effort in improving this project!
360
+
361
+ ---
362
+
363
+ If you have any questions about contributing, please don't hesitate to ask. We're here to help and look forward to your contributions! 🐝
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2024 Aakash Tripathi @ Moffitt Cancer Center
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,31 @@
1
+ # Include documentation and license files
2
+ include README.md
3
+ include LICENSE
4
+ include CONTRIBUTING.md
5
+
6
+ # Include requirements file for reference
7
+ include requirements.txt
8
+
9
+ # Exclude unnecessary files
10
+ exclude .env
11
+ exclude .gitignore
12
+ exclude .gitattributes
13
+
14
+ # Exclude test and development directories
15
+ recursive-exclude tests *
16
+ recursive-exclude docs *
17
+ recursive-exclude website *
18
+ recursive-exclude results *
19
+ recursive-exclude BACKUP *
20
+ recursive-exclude to_delete *
21
+ recursive-exclude paper *
22
+
23
+ # Exclude compiled Python files
24
+ global-exclude __pycache__
25
+ global-exclude *.py[co]
26
+ global-exclude *.so
27
+
28
+ # Exclude git and IDE files
29
+ global-exclude .git*
30
+ global-exclude .vscode
31
+ global-exclude .DS_Store
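`.env` holds credentials, so it is worth confirming that a built source distribution really omits it. The sketch below inspects a `.tar.gz` sdist using only the standard library. Its pattern list mirrors a few of the exclusions above, matched with `fnmatch`. `fnmatch` only approximates how setuptools interprets `MANIFEST.in`, so treat the result as a spot check.

```python
import fnmatch
import tarfile
from typing import Iterable, List

# A few of the MANIFEST.in exclusions above, expressed as base-name glob patterns.
FORBIDDEN = [".env", ".git*", "__pycache__", "*.py[co]", "*.so", ".DS_Store"]


def forbidden_members(sdist_path: str, patterns: Iterable[str] = FORBIDDEN) -> List[str]:
    """Return archive members whose base name matches any forbidden pattern."""
    patterns = list(patterns)
    hits = []
    with tarfile.open(sdist_path, "r:gz") as archive:
        for member in archive.getmembers():
            base_name = member.name.rsplit("/", 1)[-1]
            # fnmatchcase keeps matching case-sensitive on every platform.
            if any(fnmatch.fnmatchcase(base_name, pattern) for pattern in patterns):
                hits.append(member.name)
    return hits
```

For example, after building with `python -m build --sdist` (this requires the separate `build` package), `forbidden_members("dist/honeybee_ml-0.1.0.tar.gz")` should return an empty list.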
@@ -0,0 +1,278 @@
1
+ Metadata-Version: 2.4
2
+ Name: honeybee-ml
3
+ Version: 0.1.0
4
+ Summary: A Scalable Modular Framework for Multimodal AI in Oncology
5
+ Author-email: Aakash Tripathi <aakash.tripathi@moffitt.org>, Lab Rasool <ghulam.rasool@moffitt.org>
6
+ Maintainer-email: Aakash Tripathi <aakash.tripathi@moffitt.org>
7
+ License-Expression: MIT
8
+ Project-URL: Homepage, https://github.com/lab-rasool/HoneyBee
9
+ Project-URL: Documentation, https://lab-rasool.github.io/HoneyBee/
10
+ Project-URL: Repository, https://github.com/lab-rasool/HoneyBee
11
+ Project-URL: Bug Tracker, https://github.com/lab-rasool/HoneyBee/issues
12
+ Project-URL: Paper, https://arxiv.org/abs/2405.07460
13
+ Keywords: multimodal AI,oncology,cancer research,medical imaging,clinical NLP,machine learning,pathology,radiology,biomedical,healthcare
14
+ Classifier: Development Status :: 3 - Alpha
15
+ Classifier: Intended Audience :: Science/Research
16
+ Classifier: Intended Audience :: Healthcare Industry
17
+ Classifier: Programming Language :: Python :: 3
18
+ Classifier: Programming Language :: Python :: 3.8
19
+ Classifier: Programming Language :: Python :: 3.9
20
+ Classifier: Programming Language :: Python :: 3.10
21
+ Classifier: Programming Language :: Python :: 3.11
22
+ Classifier: Programming Language :: Python :: 3.12
23
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
24
+ Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
25
+ Classifier: Topic :: Scientific/Engineering :: Image Recognition
26
+ Classifier: Topic :: Text Processing :: Linguistic
27
+ Classifier: Operating System :: OS Independent
28
+ Requires-Python: >=3.8
29
+ Description-Content-Type: text/markdown
30
+ License-File: LICENSE
31
+ Requires-Dist: numpy>=1.20.0
32
+ Requires-Dist: pandas>=1.3.0
33
+ Requires-Dist: torch>=2.0.0
34
+ Requires-Dist: torchvision>=0.15.0
35
+ Requires-Dist: torchaudio>=2.0.0
36
+ Requires-Dist: transformers>=4.30.0
37
+ Requires-Dist: accelerate>=0.20.0
38
+ Requires-Dist: bitsandbytes>=0.40.0
39
+ Requires-Dist: scikit-image>=0.19.0
40
+ Requires-Dist: scipy>=1.7.0
41
+ Requires-Dist: matplotlib>=3.3.0
42
+ Requires-Dist: imageio>=2.9.0
43
+ Provides-Extra: clinical
44
+ Requires-Dist: llama-index>=0.9.0; extra == "clinical"
45
+ Requires-Dist: langchain>=0.1.0; extra == "clinical"
46
+ Requires-Dist: pytesseract>=0.3.8; extra == "clinical"
47
+ Requires-Dist: pdf2image>=1.16.0; extra == "clinical"
48
+ Requires-Dist: PyPDF2>=3.0.0; extra == "clinical"
49
+ Provides-Extra: radiology
50
+ Requires-Dist: pydicom>=2.3.0; extra == "radiology"
51
+ Requires-Dist: SimpleITK>=2.2.0; extra == "radiology"
52
+ Requires-Dist: nibabel>=4.0.0; extra == "radiology"
53
+ Requires-Dist: opencv-python>=4.5.0; extra == "radiology"
54
+ Provides-Extra: pathology
55
+ Requires-Dist: openslide-python>=1.2.0; extra == "pathology"
56
+ Requires-Dist: colour-science>=0.4.0; extra == "pathology"
57
+ Requires-Dist: albumentations>=1.3.0; extra == "pathology"
58
+ Requires-Dist: cucim>=23.0.0; extra == "pathology"
59
+ Provides-Extra: molecular
60
+ Requires-Dist: pyarrow>=10.0.0; extra == "molecular"
61
+ Requires-Dist: fastparquet>=2023.0.0; extra == "molecular"
62
+ Provides-Extra: database
63
+ Requires-Dist: pymongo>=4.0.0; extra == "database"
64
+ Provides-Extra: dev
65
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
66
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
67
+ Requires-Dist: black>=23.0.0; extra == "dev"
68
+ Requires-Dist: ruff>=0.1.0; extra == "dev"
69
+ Requires-Dist: mypy>=1.0.0; extra == "dev"
70
+ Provides-Extra: visualization
71
+ Requires-Dist: ipykernel>=6.0.0; extra == "visualization"
72
+ Requires-Dist: ipywidgets>=8.0.0; extra == "visualization"
73
+ Provides-Extra: models
74
+ Requires-Dist: timm>=0.9.0; extra == "models"
75
+ Requires-Dist: onnxruntime>=1.15.0; extra == "models"
76
+ Requires-Dist: peft>=0.5.0; extra == "models"
77
+ Provides-Extra: all
78
+ Requires-Dist: honeybee-ml[clinical,database,models,molecular,pathology,radiology,visualization]; extra == "all"
79
+ Dynamic: license-file
80
+
81
+ <div align="center">
82
+ <img src="website/public/images/logo.png" alt="HoneyBee Logo" width="200">
83
+
84
+ # HoneyBee
85
+
86
+ **A Scalable Modular Framework for Multimodal AI in Oncology**
87
+
88
+ [![arXiv](https://img.shields.io/badge/arXiv-2405.07460-b31b1b.svg)](https://arxiv.org/abs/2405.07460)
89
+ [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
90
+ [![GitHub stars](https://img.shields.io/github/stars/lab-rasool/HoneyBee?style=social)](https://github.com/lab-rasool/HoneyBee/stargazers)
91
+ [![Python](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
92
+ [![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-ee4c2c.svg)](https://pytorch.org/)
93
+
94
+ [Documentation](https://lab-rasool.github.io/HoneyBee/) | [Paper](https://arxiv.org/abs/2405.07460) | [Examples](examples/) | [Demo](app.py) | [Google Colab](https://colab.research.google.com/)
95
+ </div>
96
+
97
+ ## 🚀 Overview
98
+
99
+ HoneyBee is a multimodal AI framework for oncology research and clinical applications. It processes diverse medical data types (clinical text, radiology images, pathology slides, and molecular data) through a unified, modular architecture designed to scale and to accept new models and data sources. Researchers can use it to build AI models for cancer diagnosis, prognosis, and treatment planning.
100
+
101
+ > [!WARNING]
102
+ > **Alpha Release**: This framework is currently in alpha. APIs may change, and some features are still under development.
103
+
104
+ ## ✨ Key Features
105
+
106
+ ### 🏗️ Modular Architecture
107
+ - **3-Layer Design**: Clean separation between data loaders, embedding models, and processors
108
+ - **Unified API**: Consistent interface across all modalities
109
+ - **Extensible**: Easy to add new models and data sources
110
+ - **Deployment-Oriented**: Designed for both research and clinical deployment, though still in alpha
111
+
112
+ ### 📊 Comprehensive Data Support
113
+
114
+ #### Medical Imaging
115
+ - **Pathology**: Whole Slide Images (WSI) - SVS, TIFF formats with tissue detection
116
+ - **Radiology**: DICOM and NIfTI processing with 3D support
117
+ - **Preprocessing**: Advanced augmentation and normalization pipelines
118
+
119
+ #### Clinical Text
120
+ - **Document Processing**: PDF support with OCR for scanned documents
121
+ - **NLP Pipeline**: Cancer entity extraction, temporal parsing, medical ontology integration
122
+ - **Database Integration**: Native [MINDS](https://github.com/lab-rasool/MINDS) format support
123
+ - **Long Document Handling**: Multiple tokenization strategies for clinical notes
124
+
125
+ #### Molecular Data
126
+ - **Genomics**: Support for expression data and mutation profiles
127
+ - **Integration**: Seamless combination with imaging and clinical data
128
+
129
+ ### 🧠 State-of-the-Art Embedding Models
130
+
131
+ #### Clinical Text Embeddings
132
+ - **GatorTron**: Domain-specific clinical language model
133
+ - **BioBERT**: Biomedical text understanding
134
+ - **PubMedBERT**: Scientific literature embeddings
135
+ - **Clinical-T5**: Text-to-text clinical transformers
136
+
137
+ #### Medical Image Embeddings
138
+ - **REMEDIS**: Self-supervised medical image representations
139
+ - **RadImageNet**: Pre-trained radiological feature extractors
140
+ - **UNI**: General-purpose self-supervised encoder for histopathology images
141
+ - **Custom Models**: Easy integration of proprietary models
142
+
143
+ ### 🛠️ Advanced Capabilities
144
+
145
+ #### Multimodal Integration
146
+ - **Cross-Modal Learning**: Unified representations across modalities
147
+ - **Attention Mechanisms**: Interpretable fusion strategies
148
+ - **Patient-Level Aggregation**: Comprehensive patient profiles
149
+
150
+ #### Analysis Tools
151
+ - **Survival Analysis**: Cox PH, Random Survival Forest, DeepSurv
152
+ - **Classification**: Multi-class cancer type prediction
153
+ - **Retrieval**: Similar patient identification
154
+ - **Visualization**: Interactive t-SNE dashboards
155
+
156
+ #### Clinical Applications
157
+ - **Risk Stratification**: Patient outcome prediction
158
+ - **Treatment Planning**: Personalized therapy recommendations
159
+ - **Biomarker Discovery**: Multi-omic pattern identification
160
+
161
+ ## 🚀 Quick Start
162
+
163
+ ### Prerequisites
164
+
165
+ - Python 3.8+
166
+ - PyTorch 2.0+
167
+ - CUDA 11.7+ (optional, for GPU acceleration)
168
+
169
+ ### System Dependencies
170
+
171
+ ```bash
172
+ # Ubuntu/Debian
173
+ sudo apt-get update
174
+ sudo apt-get install -y openslide-tools tesseract-ocr
175
+
176
+ # macOS
177
+ brew install openslide tesseract
178
+
179
+ # Windows
180
+ # Install from official websites:
181
+ # - OpenSlide: https://openslide.org/download/
182
+ # - Tesseract: https://github.com/UB-Mannheim/tesseract/wiki
183
+ ```
184
+
185
+ ### Installation
186
+
187
+ ```bash
188
+ # Clone the repository
189
+ git clone https://github.com/lab-rasool/HoneyBee.git
190
+ cd HoneyBee
191
+
192
+ # Install dependencies
193
+ pip install -r requirements.txt
194
+
195
+ # Download required NLTK data
196
+ python -c "import nltk; nltk.download('punkt')"
197
+
198
+ # Install HoneyBee in development mode
199
+ pip install -e .
200
+ ```
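The package metadata declares modality-specific libraries as optional extras (`clinical`, `radiology`, `pathology`, `molecular`, `database`, and others). From a clone you can install them with, for example, `pip install -e ".[radiology]"`. The sketch below reports which of those modules are importable. The mapping from distribution names to import names is written out by hand and may need updating. For example, `opencv-python` imports as `cv2`, `colour-science` as `colour`, and `openslide-python` as `openslide`. Note that `find_spec` only detects Python packages. It does not confirm that system libraries such as OpenSlide or Tesseract are present.

```python
import importlib.util
from typing import Dict, List

# Import names for the optional extras declared in the package metadata.
EXTRAS: Dict[str, List[str]] = {
    "clinical": ["llama_index", "langchain", "pytesseract", "pdf2image", "PyPDF2"],
    "radiology": ["pydicom", "SimpleITK", "nibabel", "cv2"],
    "pathology": ["openslide", "colour", "albumentations", "cucim"],
    "molecular": ["pyarrow", "fastparquet"],
    "database": ["pymongo"],
}


def missing_modules(extras: Dict[str, List[str]] = EXTRAS) -> Dict[str, List[str]]:
    """Map each extra to the modules that cannot be found (an empty list means ready)."""
    return {
        extra: [name for name in modules if importlib.util.find_spec(name) is None]
        for extra, modules in extras.items()
    }


if __name__ == "__main__":
    for extra, missing in missing_modules().items():
        print(f"{extra}: {'ok' if not missing else 'missing ' + ', '.join(missing)}")
```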
201
+
202
+ ### Environment Setup
203
+
204
+ Create a `.env` file in the project root:
205
+
206
+ ```bash
207
+ # MINDS database credentials (if using MINDS format)
208
+ HOST=your_server
209
+ PORT=5433
210
+ DB_USER=postgres
211
+ PASSWORD=your_password
212
+ DATABASE=minds
213
+
214
+ # HuggingFace API (for some models)
215
+ HF_API_KEY=your_huggingface_api_key
216
+ ```
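The sketch below shows a minimal way to read these values. It rests on several assumptions:

- The file uses plain `KEY=VALUE` lines as shown. Inline comments are not handled.
- The MINDS database is PostgreSQL, as `DB_USER=postgres` suggests.
- The returned keyword names match `psycopg2.connect`.

How HoneyBee itself consumes these credentials is not shown here. Real environment variables take precedence over the file. Projects that already depend on `python-dotenv` can use its `load_dotenv()` instead.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED = ("HOST", "PORT", "DB_USER", "PASSWORD", "DATABASE")


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments and stripping quotes."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def minds_settings(
    file_values: Mapping[str, str], environ: Optional[Mapping[str, str]] = None
) -> Dict[str, object]:
    """Merge .env values with the process environment; the environment wins."""
    environ = os.environ if environ is None else environ

    def lookup(key: str) -> Optional[str]:
        return environ.get(key) or file_values.get(key)

    missing = [key for key in REQUIRED if not lookup(key)]
    if missing:
        raise KeyError("missing MINDS settings: " + ", ".join(missing))
    return {
        "host": lookup("HOST"),
        "port": int(lookup("PORT")),
        "user": lookup("DB_USER"),
        "password": lookup("PASSWORD"),
        "dbname": lookup("DATABASE"),
    }
```

Typical use is `minds_settings(parse_env_file(open(".env").read()))`. Keep `.env` out of version control, since it holds the database password and the Hugging Face key.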
217
+
218
+ ## 🔬 Research Applications
219
+
220
+ HoneyBee has been successfully applied to:
221
+
222
+ - **Cancer Subtype Classification**: Automated identification of cancer subtypes from multimodal data
223
+ - **Survival Prediction**: Risk stratification and outcome prediction for treatment planning
224
+ - **Similar Patient Retrieval**: Finding patients with similar clinical profiles for precision medicine
225
+ - **Biomarker Discovery**: Identifying multimodal patterns associated with treatment response
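One common way to implement similar-patient retrieval is to represent each patient as an embedding vector and rank other patients by cosine similarity. The sketch below assumes that setup. It also uses mean pooling to merge several embeddings (for example, one per slide or per note) into a single patient vector. Both choices are illustrative. This README does not specify which aggregation or similarity measure HoneyBee uses.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def mean_pool(vectors: Sequence[Vector]) -> List[float]:
    """Average several same-length embeddings into one patient-level vector."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def most_similar(query: Vector, patients: Dict[str, Vector], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k patient IDs most similar to the query, best first."""
    scored = [(pid, cosine(query, vec)) for pid, vec in patients.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

With thousands of patients, you would typically replace the Python loops with vectorized matrix products or a nearest-neighbour index.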
226
+
227
+ ## 🤝 Contributing
228
+
229
+ We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.
230
+
231
+ ### Development Setup
232
+
233
+ ```bash
234
+ # Fork and clone your fork
235
+ git clone https://github.com/YOUR_USERNAME/HoneyBee.git
236
+ cd HoneyBee
237
+
238
+ # Create a virtual environment
239
+ python -m venv venv
240
+ source venv/bin/activate # On Windows: venv\Scripts\activate
241
+
242
+ # Install in development mode
243
+ pip install -r requirements.txt
244
+ pip install -e .
245
+ ```
246
+
247
+ ## 🐛 Known Issues & Limitations
248
+
249
+ - **Alpha Status**: Some features are still under development
250
+ - **Memory Requirements**: WSI processing requires significant RAM (16GB+ recommended)
251
+ - **GPU Recommended**: While CPU fallback exists, GPU acceleration significantly improves performance
252
+ - **Limited Test Coverage**: Comprehensive test suite is planned for future releases
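The RAM figure follows from simple arithmetic. A fully decoded 8-bit RGB image needs width * height * 3 bytes, which is why whole-slide pipelines read slides in tiles. The slide dimensions below are a hypothetical level-0 size chosen for illustration. The 256-pixel tile size is also an assumed setting, not HoneyBee's documented default.

```python
import math
from typing import Tuple


def rgb_bytes(width: int, height: int, channels: int = 3, bytes_per_channel: int = 1) -> int:
    """Memory needed to hold an uncompressed image fully decoded."""
    return width * height * channels * bytes_per_channel


def tile_grid(width: int, height: int, tile: int = 256) -> Tuple[int, int]:
    """Columns and rows of tile x tile patches needed to cover the image (edge tiles partial)."""
    return math.ceil(width / tile), math.ceil(height / tile)


if __name__ == "__main__":
    width, height = 100_000, 80_000  # hypothetical level-0 dimensions
    cols, rows = tile_grid(width, height)
    print(f"full slide: {rgb_bytes(width, height) / 1e9:.1f} GB")  # 24.0 GB
    print(f"tiles: {cols} x {rows} = {cols * rows}")  # 391 x 313 = 122383
    print(f"one tile: {rgb_bytes(256, 256) / 1024:.0f} KiB")  # 192 KiB
```

Holding one 192 KiB tile, or a batch of them, at a time is what keeps memory bounded. The 16GB+ recommendation then covers models, batches, and thumbnails rather than the decoded slide itself.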
253
+
254
+ ## 📜 License
255
+
256
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
257
+
258
+ ## 📝 Citation
259
+
260
+ If you use HoneyBee in your research, please cite our paper:
261
+
262
+ ```bibtex
263
+ @article{tripathi2024honeybee,
264
+ title={HoneyBee: A Scalable Modular Framework for Creating Multimodal Oncology Datasets with Foundational Embedding Models},
265
+ author={Aakash Tripathi and Asim Waqas and Yasin Yilmaz and Ghulam Rasool},
266
+ journal={arXiv preprint arXiv:2405.07460},
267
+ year={2024},
268
+ eprint={2405.07460},
269
+ archivePrefix={arXiv},
270
+ primaryClass={cs.LG}
271
+ }
272
+ ```
273
+
274
+ ---
275
+
276
+ <div align="center">
277
+ Made with ❤️ by the <a href="https://github.com/lab-rasool">Lab Rasool</a> team
278
+ </div>