scibex-0.1.0b1.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- scibex-0.1.0b1/CONTRIBUTING.md +119 -0
- scibex-0.1.0b1/HISTORY.md +13 -0
- scibex-0.1.0b1/LICENSE +21 -0
- scibex-0.1.0b1/MANIFEST.in +10 -0
- scibex-0.1.0b1/PKG-INFO +184 -0
- scibex-0.1.0b1/README.md +147 -0
- scibex-0.1.0b1/docs/index.md +1 -0
- scibex-0.1.0b1/docs/installation.md +175 -0
- scibex-0.1.0b1/docs/usage.md +121 -0
- scibex-0.1.0b1/pyproject.toml +76 -0
- scibex-0.1.0b1/setup.cfg +4 -0
- scibex-0.1.0b1/src/scibex/__init__.py +9 -0
- scibex-0.1.0b1/src/scibex/_ibex.py +89 -0
- scibex-0.1.0b1/src/scibex/_r.py +71 -0
- scibex-0.1.0b1/src/scibex/_types.py +10 -0
- scibex-0.1.0b1/src/scibex/tl/__init__.py +1 -0
- scibex-0.1.0b1/src/scibex/tl/_ibex.py +121 -0
- scibex-0.1.0b1/src/scibex/utils.py +148 -0
- scibex-0.1.0b1/src/scibex.egg-info/PKG-INFO +184 -0
- scibex-0.1.0b1/src/scibex.egg-info/SOURCES.txt +24 -0
- scibex-0.1.0b1/src/scibex.egg-info/dependency_links.txt +1 -0
- scibex-0.1.0b1/src/scibex.egg-info/requires.txt +14 -0
- scibex-0.1.0b1/src/scibex.egg-info/top_level.txt +1 -0
- scibex-0.1.0b1/tests/__init__.py +1 -0
- scibex-0.1.0b1/tests/conftest.py +40 -0
- scibex-0.1.0b1/tests/test_ibex.py +665 -0
|
@@ -0,0 +1,119 @@
|
|
|
1
|
+
# Contributing
|
|
2
|
+
|
|
3
|
+
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
|
|
4
|
+
|
|
5
|
+
You can contribute in many ways:
|
|
6
|
+
|
|
7
|
+
## Types of Contributions
|
|
8
|
+
|
|
9
|
+
### Report Bugs
|
|
10
|
+
|
|
11
|
+
Report bugs at https://github.com/Qile0317/scibex/issues.
|
|
12
|
+
|
|
13
|
+
If you are reporting a bug, please include:
|
|
14
|
+
|
|
15
|
+
- Your operating system name and version.
|
|
16
|
+
- Any details about your local setup that might be helpful in troubleshooting.
|
|
17
|
+
- Detailed steps to reproduce the bug.
|
|
18
|
+
|
|
19
|
+
### Fix Bugs
|
|
20
|
+
|
|
21
|
+
Look through the GitHub issues for bugs. Anything tagged with "bug" and "help wanted" is open to whoever wants to implement it.
|
|
22
|
+
|
|
23
|
+
### Implement Features
|
|
24
|
+
|
|
25
|
+
Look through the GitHub issues for features. Anything tagged with "enhancement" and "help wanted" is open to whoever wants to implement it.
|
|
26
|
+
|
|
27
|
+
### Write Documentation
|
|
28
|
+
|
|
29
|
+
scibex could always use more documentation, whether as part of the official docs, in docstrings, or even on the web in blog posts, articles, and such.
|
|
30
|
+
|
|
31
|
+
### Submit Feedback
|
|
32
|
+
|
|
33
|
+
The best way to send feedback is to file an issue at https://github.com/Qile0317/scibex/issues.
|
|
34
|
+
|
|
35
|
+
If you are proposing a feature:
|
|
36
|
+
|
|
37
|
+
- Explain in detail how it would work.
|
|
38
|
+
- Keep the scope as narrow as possible, to make it easier to implement.
|
|
39
|
+
- Remember that this is a volunteer-driven project, and that contributions are welcome :)
|
|
40
|
+
|
|
41
|
+
## Get Started!
|
|
42
|
+
|
|
43
|
+
Ready to contribute? Here's how to set up `scibex` for local development.
|
|
44
|
+
|
|
45
|
+
1. Fork the `scibex` repo on GitHub.
|
|
46
|
+
2. Clone your fork locally:
|
|
47
|
+
|
|
48
|
+
```sh
|
|
49
|
+
git clone git@github.com:your_name_here/scibex.git
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:
|
|
53
|
+
|
|
54
|
+
```sh
|
|
55
|
+
mkvirtualenv scibex
|
|
56
|
+
cd scibex/
|
|
57
|
+
pip install -e .
|
|
58
|
+
```
|
|
59
|
+
|
|
60
|
+
4. Create a branch for local development:
|
|
61
|
+
|
|
62
|
+
```sh
|
|
63
|
+
git checkout -b name-of-your-bugfix-or-feature
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
Now you can make your changes locally.
|
|
67
|
+
|
|
68
|
+
5. When you're done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:
|
|
69
|
+
|
|
70
|
+
```sh
|
|
71
|
+
make lint
|
|
72
|
+
make test
|
|
73
|
+
# Or
|
|
74
|
+
make test-all
|
|
75
|
+
```
|
|
76
|
+
|
|
77
|
+
To get flake8 and tox, just pip install them into your virtualenv.
|
|
78
|
+
|
|
79
|
+
6. Commit your changes and push your branch to GitHub:
|
|
80
|
+
|
|
81
|
+
```sh
|
|
82
|
+
git add .
|
|
83
|
+
git commit -m "Your detailed description of your changes."
|
|
84
|
+
git push origin name-of-your-bugfix-or-feature
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
7. Submit a pull request through the GitHub website.
|
|
88
|
+
|
|
89
|
+
## Pull Request Guidelines
|
|
90
|
+
|
|
91
|
+
Before you submit a pull request, check that it meets these guidelines:
|
|
92
|
+
|
|
93
|
+
1. The pull request should include tests.
|
|
94
|
+
2. If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add the feature to the list in README.md.
|
|
95
|
+
3. The pull request should work for Python 3.12 and 3.13. Tests run in GitHub Actions on every pull request to the main branch; make sure they pass for all supported Python versions.
|
|
96
|
+
|
|
97
|
+
## Tips
|
|
98
|
+
|
|
99
|
+
To run a subset of tests:
|
|
100
|
+
|
|
101
|
+
```sh
|
|
102
|
+
pytest tests/test_ibex.py
|
|
103
|
+
```
|
|
104
|
+
|
|
105
|
+
## Deploying
|
|
106
|
+
|
|
107
|
+
A reminder for the maintainers on how to deploy. Make sure all your changes are committed (including an entry in HISTORY.md). Then run:
|
|
108
|
+
|
|
109
|
+
```sh
|
|
110
|
+
uv version patch # or: minor, major
|
|
111
|
+
git commit -am "Release X.Y.Z"
|
|
112
|
+
just tag
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
GitHub Actions will automatically publish to PyPI when the tag is pushed. See `.github/workflows/publish.yml` for details.
|
|
116
|
+
|
|
117
|
+
## Code of Conduct
|
|
118
|
+
|
|
119
|
+
Please note that this project is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By participating in this project you agree to abide by its terms.
|
|
@@ -0,0 +1,13 @@
|
|
|
1
|
+
# History
|
|
2
|
+
|
|
3
|
+
## 0.1.0b1 (2026-06-12)
|
|
4
|
+
|
|
5
|
+
* First public pre-release on PyPI.
|
|
6
|
+
* Python interface to the Ibex BCR embedding R package for scverse single-cell analysis.
|
|
7
|
+
* `scibex.tl.ibex` — embed BCR sequences from scirpy `AnnData`/`MuData` into `obsm`.
|
|
8
|
+
* `scibex.ibex_matrix` — low-level embedding of a plain sequence list.
|
|
9
|
+
* Supports geometric, CNN, and VAE encoders; CDR3-only and expanded CDR1+2+3 (EXP) variants.
|
|
10
|
+
* `strategy` parameter (`"lenient"` / `"strict"`) for handling partial CDR missingness in EXP models.
|
|
11
|
+
* `fill_value` and `verbose` options for missing-sequence handling.
|
|
12
|
+
* Central type aliases (`_types.py`) for all shared `Literal` annotations.
|
|
13
|
+
* Zensical documentation site; ReadTheDocs integration.
|
scibex-0.1.0b1/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026, Qile Yang
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
scibex-0.1.0b1/PKG-INFO
ADDED
|
@@ -0,0 +1,184 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: scibex
|
|
3
|
+
Version: 0.1.0b1
|
|
4
|
+
Summary: Python interface to the Ibex BCR embedding R package for scverse single-cell analysis
|
|
5
|
+
Author-email: Qile Yang <qile.yang@berkeley.edu>
|
|
6
|
+
Maintainer-email: Qile Yang <qile.yang@berkeley.edu>
|
|
7
|
+
License: MIT
|
|
8
|
+
Project-URL: bugs, https://github.com/Qile0317/scibex/issues
|
|
9
|
+
Project-URL: changelog, https://github.com/Qile0317/scibex/blob/main/changelog.md
|
|
10
|
+
Project-URL: homepage, https://github.com/Qile0317/scibex
|
|
11
|
+
Classifier: Development Status :: 4 - Beta
|
|
12
|
+
Classifier: Intended Audience :: Science/Research
|
|
13
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
14
|
+
Classifier: Programming Language :: Python :: 3
|
|
15
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
16
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
17
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
18
|
+
Classifier: Programming Language :: Python :: 3.13
|
|
19
|
+
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
|
|
20
|
+
Classifier: Typing :: Typed
|
|
21
|
+
Requires-Python: >=3.10
|
|
22
|
+
Description-Content-Type: text/markdown
|
|
23
|
+
License-File: LICENSE
|
|
24
|
+
Requires-Dist: rpy2
|
|
25
|
+
Requires-Dist: anndata
|
|
26
|
+
Requires-Dist: scirpy
|
|
27
|
+
Requires-Dist: muon
|
|
28
|
+
Provides-Extra: docs
|
|
29
|
+
Requires-Dist: zensical; extra == "docs"
|
|
30
|
+
Provides-Extra: test
|
|
31
|
+
Requires-Dist: coverage; extra == "test"
|
|
32
|
+
Requires-Dist: pytest; extra == "test"
|
|
33
|
+
Requires-Dist: ruff; extra == "test"
|
|
34
|
+
Requires-Dist: ty; extra == "test"
|
|
35
|
+
Requires-Dist: ipdb; extra == "test"
|
|
36
|
+
Dynamic: license-file
|
|
37
|
+
|
|
38
|
+
# scibex
|
|
39
|
+
|
|
40
|
+
[PyPI](https://pypi.org/project/scibex/)
|
|
41
|
+
[Documentation](https://scibex.readthedocs.io/en/latest/?version=latest)
|
|
42
|
+
[License: MIT](https://opensource.org/licenses/MIT)
|
|
43
|
+
|
|
44
|
+
**scibex** brings [Ibex](https://github.com/BorchLab/Ibex) BCR embeddings into the [scverse](https://scverse.org) ecosystem. It wraps the R Ibex package via rpy2 and stores results directly in `AnnData.obsm`, making it a drop-in complement to [scirpy](https://scirpy.scverse.org) for B-cell receptor analysis.
|
|
45
|
+
|
|
46
|
+
Ibex encodes CDR3 (or CDR1+2+3) amino acid sequences from paired heavy and light chains using convolutional/variational autoencoders or a fast geometric transform. The resulting low-dimensional embeddings can be combined with gene expression data for multimodal single-cell analysis.
|
|
47
|
+
|
|
48
|
+
---
|
|
49
|
+
|
|
50
|
+
## Features
|
|
51
|
+
|
|
52
|
+
- **scirpy-native**: reads chain sequences from `obsm["chain_indices"]`; writes embeddings back to `obsm`
|
|
53
|
+
- **Heavy and light chains**: embed each independently then combine downstream
|
|
54
|
+
- **Multiple models**: geometric baseline, CNN autoencoder, VAE, and expanded CDR1+2+3 variants
|
|
55
|
+
- **Multiple encodings**: Atchley factors, Kidera factors, Cruciani properties, MSWHIM, tScales, one-hot
|
|
56
|
+
- **No manual sequence handling**: `scibex.tl.ibex(mdata, ...)` does the full extract → embed → store pipeline
|
|
57
|
+
|
|
58
|
+
---
|
|
59
|
+
|
|
60
|
+
## Installation
|
|
61
|
+
|
|
62
|
+
```bash
|
|
63
|
+
pip install scibex
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
scibex wraps the [Ibex R package](https://github.com/BorchLab/Ibex) via rpy2.
|
|
67
|
+
Install the R dependency from Python:
|
|
68
|
+
|
|
69
|
+
```python
|
|
70
|
+
import scibex as ib
|
|
71
|
+
ib.install_r_deps() # into R's default library
|
|
72
|
+
ib.install_r_deps(lib_loc="/path/to/my/Rlib") # into a specific directory
|
|
73
|
+
ib.install_r_deps(force=True) # force-reinstall everything
|
|
74
|
+
```
|
|
75
|
+
|
|
76
|
+
<!-- This also installs `callr`, which basilisk needs to run the encoder in an
|
|
77
|
+
isolated subprocess (required when calling scibex from a Jupyter notebook). -->
|
|
78
|
+
|
|
79
|
+
Or directly in R:
|
|
80
|
+
|
|
81
|
+
```r
|
|
82
|
+
remotes::install_github("BorchLab/Ibex@devel")
|
|
83
|
+
```
|
|
84
|
+
|
|
85
|
+
If Ibex is in a non-default R library, call `ib.setup(lib_loc=...)` **once**
|
|
86
|
+
before any embedding call:
|
|
87
|
+
|
|
88
|
+
```python
|
|
89
|
+
ib.setup(lib_loc="/path/to/my/Rlib")
|
|
90
|
+
```
|
|
91
|
+
|
|
92
|
+
See the [Installation docs](docs/installation.md) for R environment
|
|
93
|
+
troubleshooting (conda ABI mismatches, `.Rprofile` interference, keras setup).
|
|
94
|
+
|
|
95
|
+
---
|
|
96
|
+
|
|
97
|
+
## Quick start
|
|
98
|
+
|
|
99
|
+
```python
|
|
100
|
+
import scirpy as ir
|
|
101
|
+
import scibex as ib
|
|
102
|
+
|
|
103
|
+
# Load a scirpy MuData (chain_indices must already be populated)
|
|
104
|
+
mdata = ir.datasets.stephenson2021_5k()
|
|
105
|
+
|
|
106
|
+
# Embed heavy-chain CDR3 sequences → stored in mdata["airr"].obsm["X_ibex_heavy"]
|
|
107
|
+
ib.tl.ibex(mdata, chain="Heavy", key_added="X_ibex_heavy")
|
|
108
|
+
|
|
109
|
+
# Embed light-chain CDR3 sequences → stored in mdata["airr"].obsm["X_ibex_light"]
|
|
110
|
+
ib.tl.ibex(mdata, chain="Light", key_added="X_ibex_light")
|
|
111
|
+
```
|
|
112
|
+
|
|
113
|
+
Switch `encoder_input` or `encoder_model` for different representations:
|
|
114
|
+
|
|
115
|
+
```python
|
|
116
|
+
ib.tl.ibex(
|
|
117
|
+
mdata,
|
|
118
|
+
chain="Heavy",
|
|
119
|
+
method="encoder",
|
|
120
|
+
encoder_model="VAE",
|
|
121
|
+
encoder_input="kideraFactors",
|
|
122
|
+
key_added="X_ibex_heavy",
|
|
123
|
+
)
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
<!-- For a fast geometric baseline (no model download needed):
|
|
127
|
+
|
|
128
|
+
```python
|
|
129
|
+
ib.tl.ibex(mdata, chain="Heavy", method="geometric", key_added="X_ibex_heavy")
|
|
130
|
+
``` -->
|
|
131
|
+
|
|
132
|
+
If you only have a list of sequences (e.g. from a custom pipeline), use the low-level function directly:
|
|
133
|
+
|
|
134
|
+
```python
|
|
135
|
+
embedding = ib.ibex_matrix(
|
|
136
|
+
["CARDLVSYGMDVW", "CAKGGQIFHFSSGFYFDFW"],
|
|
137
|
+
chain="Heavy",
|
|
138
|
+
method="encoder",
|
|
139
|
+
) # returns np.ndarray of shape [N, D]
|
|
140
|
+
```
|
|
141
|
+
|
|
142
|
+
---
|
|
143
|
+
|
|
144
|
+
## Tutorial
|
|
145
|
+
|
|
146
|
+
A complete end-to-end tutorial on the Stephenson 2021 COVID-19 dataset (5k BCR cells) is available in
|
|
147
|
+
[`docs/notebooks/tutorial_5k_bcr.ipynb`](docs/notebooks/tutorial_5k_bcr.ipynb).
|
|
148
|
+
|
|
149
|
+
It covers:
|
|
150
|
+
|
|
151
|
+
- Loading a scirpy `MuData`
|
|
152
|
+
- Embedding heavy and light chains with `scibex.tl.ibex`
|
|
153
|
+
- Visualising the embedding space as a UMAP
|
|
154
|
+
- Training a logistic-regression classifier to predict patient outcome from paired BCR embeddings
|
|
155
|
+
|
|
156
|
+
---
|
|
157
|
+
|
|
158
|
+
## API overview
|
|
159
|
+
|
|
160
|
+
| Function | Description |
|
|
161
|
+
| --- | --- |
|
|
162
|
+
| `scibex.tl.ibex(adata, ...)` | Embed BCR sequences in a scirpy `AnnData`/`MuData`; stores result in `obsm` |
|
|
163
|
+
| `scibex.ibex_matrix(seqs, ...)` | Low-level: embed a list of CDR3 strings, returns `[N, D]` numpy array |
|
|
164
|
+
|
|
165
|
+
**Key parameters for `tl.ibex`:**
|
|
166
|
+
|
|
167
|
+
| Parameter | Options | Default |
|
|
168
|
+
| --- | --- | --- |
|
|
169
|
+
| `chain` | `"Heavy"`, `"Light"` | `"Heavy"` |
|
|
170
|
+
| `method` | `"encoder"`, `"geometric"` | `"encoder"` |
|
|
171
|
+
| `encoder_model` | `"CNN"`, `"VAE"`, `"CNN.EXP"`, `"VAE.EXP"` | `"VAE"` |
|
|
172
|
+
| `encoder_input` | `"atchleyFactors"`, `"kideraFactors"`, `"crucianiProperties"`, `"MSWHIM"`, `"tScales"`, `"OHE"` | `"atchleyFactors"` |
|
|
173
|
+
| `species` | `"Human"`, `"Mouse"` | `"Human"` |
|
|
174
|
+
| `key_added` | any string | `"X_ibex"` |
|
|
175
|
+
|
|
176
|
+
---
|
|
177
|
+
|
|
178
|
+
## Acknowledgements
|
|
179
|
+
|
|
180
|
+
scibex is a Python interface to the [Ibex R package](https://github.com/BorchLab/Ibex). If you use scibex in your work, please cite the original Ibex publication.
|
|
181
|
+
|
|
182
|
+
- PyPI: <https://pypi.org/project/scibex/>
|
|
183
|
+
- Documentation: <https://scibex.readthedocs.io>
|
|
184
|
+
- License: MIT
|
scibex-0.1.0b1/README.md
ADDED
|
@@ -0,0 +1,147 @@
|
|
|
1
|
+
# scibex
|
|
2
|
+
|
|
3
|
+
[PyPI](https://pypi.org/project/scibex/)
|
|
4
|
+
[Documentation](https://scibex.readthedocs.io/en/latest/?version=latest)
|
|
5
|
+
[License: MIT](https://opensource.org/licenses/MIT)
|
|
6
|
+
|
|
7
|
+
**scibex** brings [Ibex](https://github.com/BorchLab/Ibex) BCR embeddings into the [scverse](https://scverse.org) ecosystem. It wraps the R Ibex package via rpy2 and stores results directly in `AnnData.obsm`, making it a drop-in complement to [scirpy](https://scirpy.scverse.org) for B-cell receptor analysis.
|
|
8
|
+
|
|
9
|
+
Ibex encodes CDR3 (or CDR1+2+3) amino acid sequences from paired heavy and light chains using convolutional/variational autoencoders or a fast geometric transform. The resulting low-dimensional embeddings can be combined with gene expression data for multimodal single-cell analysis.
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## Features
|
|
14
|
+
|
|
15
|
+
- **scirpy-native**: reads chain sequences from `obsm["chain_indices"]`; writes embeddings back to `obsm`
|
|
16
|
+
- **Heavy and light chains**: embed each independently then combine downstream
|
|
17
|
+
- **Multiple models**: geometric baseline, CNN autoencoder, VAE, and expanded CDR1+2+3 variants
|
|
18
|
+
- **Multiple encodings**: Atchley factors, Kidera factors, Cruciani properties, MSWHIM, tScales, one-hot
|
|
19
|
+
- **No manual sequence handling**: `scibex.tl.ibex(mdata, ...)` does the full extract → embed → store pipeline
|
|
20
|
+
|
|
21
|
+
---
|
|
22
|
+
|
|
23
|
+
## Installation
|
|
24
|
+
|
|
25
|
+
```bash
|
|
26
|
+
pip install scibex
|
|
27
|
+
```
|
|
28
|
+
|
|
29
|
+
scibex wraps the [Ibex R package](https://github.com/BorchLab/Ibex) via rpy2.
|
|
30
|
+
Install the R dependency from Python:
|
|
31
|
+
|
|
32
|
+
```python
|
|
33
|
+
import scibex as ib
|
|
34
|
+
ib.install_r_deps() # into R's default library
|
|
35
|
+
ib.install_r_deps(lib_loc="/path/to/my/Rlib") # into a specific directory
|
|
36
|
+
ib.install_r_deps(force=True) # force-reinstall everything
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
<!-- This also installs `callr`, which basilisk needs to run the encoder in an
|
|
40
|
+
isolated subprocess (required when calling scibex from a Jupyter notebook). -->
|
|
41
|
+
|
|
42
|
+
Or directly in R:
|
|
43
|
+
|
|
44
|
+
```r
|
|
45
|
+
remotes::install_github("BorchLab/Ibex@devel")
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
If Ibex is in a non-default R library, call `ib.setup(lib_loc=...)` **once**
|
|
49
|
+
before any embedding call:
|
|
50
|
+
|
|
51
|
+
```python
|
|
52
|
+
ib.setup(lib_loc="/path/to/my/Rlib")
|
|
53
|
+
```
|
|
54
|
+
|
|
55
|
+
See the [Installation docs](docs/installation.md) for R environment
|
|
56
|
+
troubleshooting (conda ABI mismatches, `.Rprofile` interference, keras setup).
|
|
57
|
+
|
|
58
|
+
---
|
|
59
|
+
|
|
60
|
+
## Quick start
|
|
61
|
+
|
|
62
|
+
```python
|
|
63
|
+
import scirpy as ir
|
|
64
|
+
import scibex as ib
|
|
65
|
+
|
|
66
|
+
# Load a scirpy MuData (chain_indices must already be populated)
|
|
67
|
+
mdata = ir.datasets.stephenson2021_5k()
|
|
68
|
+
|
|
69
|
+
# Embed heavy-chain CDR3 sequences → stored in mdata["airr"].obsm["X_ibex_heavy"]
|
|
70
|
+
ib.tl.ibex(mdata, chain="Heavy", key_added="X_ibex_heavy")
|
|
71
|
+
|
|
72
|
+
# Embed light-chain CDR3 sequences → stored in mdata["airr"].obsm["X_ibex_light"]
|
|
73
|
+
ib.tl.ibex(mdata, chain="Light", key_added="X_ibex_light")
|
|
74
|
+
```
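The feature list says to embed each chain independently and combine downstream, but the combination step is not shown. One simple option, offered here as a sketch rather than as what the tutorial does, is to concatenate the two `obsm` matrices along the feature axis. The arrays below are random stand-ins for `mdata["airr"].obsm["X_ibex_heavy"]` and `mdata["airr"].obsm["X_ibex_light"]`, and `combine_chains` is a hypothetical helper, not part of scibex.

```python
import numpy as np

# Stand-ins for the two obsm matrices: 4 cells, 3 latent dimensions per chain.
# The sizes are chosen only for illustration.
rng = np.random.default_rng(0)
x_heavy = rng.normal(size=(4, 3))
x_light = rng.normal(size=(4, 3))


def combine_chains(heavy: np.ndarray, light: np.ndarray) -> np.ndarray:
    """Concatenate per-cell heavy and light embeddings along the feature axis."""
    if heavy.shape[0] != light.shape[0]:
        raise ValueError("heavy and light embeddings must have one row per cell")
    return np.hstack([heavy, light])


x_paired = combine_chains(x_heavy, x_light)
print(x_paired.shape)  # (4, 6)
```

With real data, the result could be stored under a new key such as `mdata["airr"].obsm["X_ibex_paired"]` (the key name is an assumption).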
|
|
75
|
+
|
|
76
|
+
Switch `encoder_input` or `encoder_model` for different representations:
|
|
77
|
+
|
|
78
|
+
```python
|
|
79
|
+
ib.tl.ibex(
|
|
80
|
+
mdata,
|
|
81
|
+
chain="Heavy",
|
|
82
|
+
method="encoder",
|
|
83
|
+
encoder_model="VAE",
|
|
84
|
+
encoder_input="kideraFactors",
|
|
85
|
+
key_added="X_ibex_heavy",
|
|
86
|
+
)
|
|
87
|
+
```
|
|
88
|
+
|
|
89
|
+
<!-- For a fast geometric baseline (no model download needed):
|
|
90
|
+
|
|
91
|
+
```python
|
|
92
|
+
ib.tl.ibex(mdata, chain="Heavy", method="geometric", key_added="X_ibex_heavy")
|
|
93
|
+
``` -->
|
|
94
|
+
|
|
95
|
+
If you only have a list of sequences (e.g. from a custom pipeline), use the low-level function directly:
|
|
96
|
+
|
|
97
|
+
```python
|
|
98
|
+
embedding = ib.ibex_matrix(
|
|
99
|
+
["CARDLVSYGMDVW", "CAKGGQIFHFSSGFYFDFW"],
|
|
100
|
+
chain="Heavy",
|
|
101
|
+
method="encoder",
|
|
102
|
+
) # returns np.ndarray of shape [N, D]
|
|
103
|
+
```
|
|
104
|
+
|
|
105
|
+
---
|
|
106
|
+
|
|
107
|
+
## Tutorial
|
|
108
|
+
|
|
109
|
+
A complete end-to-end tutorial on the Stephenson 2021 COVID-19 dataset (5k BCR cells) is available in
|
|
110
|
+
[`docs/notebooks/tutorial_5k_bcr.ipynb`](docs/notebooks/tutorial_5k_bcr.ipynb).
|
|
111
|
+
|
|
112
|
+
It covers:
|
|
113
|
+
|
|
114
|
+
- Loading a scirpy `MuData`
|
|
115
|
+
- Embedding heavy and light chains with `scibex.tl.ibex`
|
|
116
|
+
- Visualising the embedding space as a UMAP
|
|
117
|
+
- Training a logistic-regression classifier to predict patient outcome from paired BCR embeddings
|
|
118
|
+
|
|
119
|
+
---
|
|
120
|
+
|
|
121
|
+
## API overview
|
|
122
|
+
|
|
123
|
+
| Function | Description |
|
|
124
|
+
| --- | --- |
|
|
125
|
+
| `scibex.tl.ibex(adata, ...)` | Embed BCR sequences in a scirpy `AnnData`/`MuData`; stores result in `obsm` |
|
|
126
|
+
| `scibex.ibex_matrix(seqs, ...)` | Low-level: embed a list of CDR3 strings, returns `[N, D]` numpy array |
|
|
127
|
+
|
|
128
|
+
**Key parameters for `tl.ibex`:**
|
|
129
|
+
|
|
130
|
+
| Parameter | Options | Default |
|
|
131
|
+
| --- | --- | --- |
|
|
132
|
+
| `chain` | `"Heavy"`, `"Light"` | `"Heavy"` |
|
|
133
|
+
| `method` | `"encoder"`, `"geometric"` | `"encoder"` |
|
|
134
|
+
| `encoder_model` | `"CNN"`, `"VAE"`, `"CNN.EXP"`, `"VAE.EXP"` | `"VAE"` |
|
|
135
|
+
| `encoder_input` | `"atchleyFactors"`, `"kideraFactors"`, `"crucianiProperties"`, `"MSWHIM"`, `"tScales"`, `"OHE"` | `"atchleyFactors"` |
|
|
136
|
+
| `species` | `"Human"`, `"Mouse"` | `"Human"` |
|
|
137
|
+
| `key_added` | any string | `"X_ibex"` |
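The options in this table map naturally onto `typing.Literal` aliases, which the changelog says scibex centralises in `_types.py`. The sketch below shows how such aliases could also drive a runtime check. The alias names and the `check_option` helper are hypothetical and may not match the package's actual definitions.

```python
from typing import Literal, get_args

# Hypothetical aliases mirroring the options table above.
Chain = Literal["Heavy", "Light"]
Method = Literal["encoder", "geometric"]
EncoderModel = Literal["CNN", "VAE", "CNN.EXP", "VAE.EXP"]
EncoderInput = Literal[
    "atchleyFactors", "kideraFactors", "crucianiProperties",
    "MSWHIM", "tScales", "OHE",
]
Species = Literal["Human", "Mouse"]


def check_option(name: str, value: str, alias) -> str:
    """Return value unchanged if it is one of the alias's allowed strings."""
    allowed = get_args(alias)
    if value not in allowed:
        raise ValueError(f"{name}={value!r} is not one of {allowed}")
    return value


check_option("encoder_model", "VAE", EncoderModel)  # accepted
try:
    check_option("encoder_input", "kidera", EncoderInput)  # not a listed option
except ValueError as err:
    print(err)
```

Keeping the allowed strings in one place means a static type checker and a runtime check like this one read from the same definition.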
|
|
138
|
+
|
|
139
|
+
---
|
|
140
|
+
|
|
141
|
+
## Acknowledgements
|
|
142
|
+
|
|
143
|
+
scibex is a Python interface to the [Ibex R package](https://github.com/BorchLab/Ibex). If you use scibex in your work, please cite the original Ibex publication.
|
|
144
|
+
|
|
145
|
+
- PyPI: <https://pypi.org/project/scibex/>
|
|
146
|
+
- Documentation: <https://scibex.readthedocs.io>
|
|
147
|
+
- License: MIT
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
--8<-- "README.md"
|
|
@@ -0,0 +1,175 @@
|
|
|
1
|
+
# Installation
|
|
2
|
+
|
|
3
|
+
## Python package
|
|
4
|
+
|
|
5
|
+
```sh
|
|
6
|
+
pip install scibex
|
|
7
|
+
# or with uv
|
|
8
|
+
uv add scibex
|
|
9
|
+
```
|
|
10
|
+
|
|
11
|
+
## R dependency
|
|
12
|
+
|
|
13
|
+
scibex is a Python wrapper around the [Ibex R package](https://github.com/BorchLab/Ibex).
|
|
14
|
+
The R package must be installed and visible to the R runtime that rpy2 uses before
|
|
15
|
+
calling any `scibex` function.
|
|
16
|
+
|
|
17
|
+
### Option A — install from Python (recommended)
|
|
18
|
+
|
|
19
|
+
```python
|
|
20
|
+
import scibex as ib
|
|
21
|
+
|
|
22
|
+
ib.install_r_deps() # installs into R's default .libPaths()
|
|
23
|
+
ib.install_r_deps(lib_loc="/path/to/my/Rlib") # install into a specific directory
|
|
24
|
+
ib.install_r_deps(force=True) # force-reinstall everything
|
|
25
|
+
```
|
|
26
|
+
|
|
27
|
+
This installs Ibex, `remotes`, and `callr`. Packages already present are
|
|
28
|
+
skipped, so repeated calls return quickly. `callr` is required for the
|
|
29
|
+
encoder (`method="encoder"`) to work from a Jupyter notebook: without it,
|
|
30
|
+
basilisk runs inline and conflicts with rpy2's pre-initialized Python.
|
|
31
|
+
|
|
32
|
+
If the target directory is non-standard, tell scibex where to find it at runtime:
|
|
33
|
+
|
|
34
|
+
```python
|
|
35
|
+
ib.setup(lib_loc="/path/to/my/Rlib")
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
`setup()` must be called **before** the first `ib.tl.ibex(...)` or `ib.ibex_matrix(...)` call.
|
|
39
|
+
|
|
40
|
+
### Option B — install directly in R
|
|
41
|
+
|
|
42
|
+
```r
|
|
43
|
+
remotes::install_github("BorchLab/Ibex@devel")
|
|
44
|
+
```
|
|
45
|
+
|
|
46
|
+
## Troubleshooting R environments
|
|
47
|
+
|
|
48
|
+
### rpy2 / conda: base packages not found at startup
|
|
49
|
+
|
|
50
|
+
```text
|
|
51
|
+
During startup - Warning messages:
|
|
52
|
+
1: package 'grDevices' in options("defaultPackages") was not found
|
|
53
|
+
2: package 'graphics' in options("defaultPackages") was not found
|
|
54
|
+
3: package 'stats' in options("defaultPackages") was not found
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
This means rpy2's C extension was compiled against a different `libR.so` than the
|
|
58
|
+
one in your conda environment. It happens when rpy2 is installed from PyPI inside a
|
|
59
|
+
conda env — the PyPI wheel is not compiled with the `-rpath` flag pointing at
|
|
60
|
+
conda's R library.
|
|
61
|
+
|
|
62
|
+
**Fix (option A) — recompile from source in the conda env:**
|
|
63
|
+
|
|
64
|
+
```bash
|
|
65
|
+
conda activate ibex # or your env name
|
|
66
|
+
LDFLAGS="-Wl,-rpath,$CONDA_PREFIX/lib/R/lib" \
|
|
67
|
+
pip install --force-reinstall --no-binary rpy2 rpy2
|
|
68
|
+
```
|
|
69
|
+
|
|
70
|
+
**Fix (option B) — use the conda-forge build (pre-patched):**
|
|
71
|
+
|
|
72
|
+
```bash
|
|
73
|
+
conda install -n ibex -c conda-forge rpy2
|
|
74
|
+
```
|
|
75
|
+
|
|
76
|
+
### rpy2 picks up the wrong R installation
|
|
77
|
+
|
|
78
|
+
rpy2 uses whichever `R` binary appears first on `PATH`. Check which one it will use:
|
|
79
|
+
|
|
80
|
+
```bash
|
|
81
|
+
R --version # should match your intended R installation
|
|
82
|
+
Rscript -e "R.home()"
|
|
83
|
+
```
|
|
84
|
+
|
|
85
|
+
If you are using conda, activate the correct environment before starting Python.
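The same check can be run from inside Python, which helps when the interpreter is launched by a notebook server or IDE whose `PATH` differs from your shell's. This is a minimal sketch using only the standard library. `describe_r_on_path` is a hypothetical helper, and it also reports the `R_HOME` environment variable on the assumption that a stale value there could point at a different installation.

```python
import os
import shutil
import subprocess


def describe_r_on_path() -> dict:
    """Collect the R binary found first on PATH and the home directory it reports."""
    r_binary = shutil.which("R")
    info = {
        "binary": r_binary,
        "r_home_env": os.environ.get("R_HOME"),
        "r_home_reported": None,
    }
    if r_binary is not None:
        try:
            # `R RHOME` prints the home directory of that R installation.
            result = subprocess.run(
                [r_binary, "RHOME"],
                capture_output=True, text=True, check=False, timeout=30,
            )
        except (OSError, subprocess.TimeoutExpired):
            return info
        if result.returncode == 0:
            info["r_home_reported"] = result.stdout.strip()
    return info


info = describe_r_on_path()
for key, value in info.items():
    print(f"{key}: {value}")
```

If `r_home_env` is set and differs from `r_home_reported`, that mismatch is worth resolving before debugging anything else.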
|
|
86
|
+
|
|
87
|
+
### `list.files` arity error (R ABI mismatch)
|
|
88
|
+
|
|
89
|
+
```text
|
|
90
|
+
RRuntimeError: Error in list.files(...) :
|
|
91
|
+
8 arguments passed to .Internal(list.files) which requires 9
|
|
92
|
+
```
|
|
93
|
+
|
|
94
|
+
R 4.5+ changed `.Internal(list.files)` from 8 to 9 arguments. This error means
|
|
95
|
+
the C runtime (the `libR.so` loaded by rpy2) is R 4.5+ but the R bytecode being
|
|
96
|
+
executed was compiled for R ≤ 4.4.
|
|
97
|
+
|
|
98
|
+
**Case A — conda R only (partial upgrade).** After `conda update r-base`, the
|
|
99
|
+
base package bytecode may lag the C library. Fix:
|
|
100
|
+
|
|
101
|
+
```bash
|
|
102
|
+
conda install -n <env> -c conda-forge r-base --force-reinstall
|
|
103
|
+
```
|
|
104
|
+
|
|
105
|
+
Then reinstall any R packages that depend on compiled C/C++/Fortran code.
|
|
106
|
+
|
|
107
|
+
**Case B — conda R 4.5.x + system R 4.6+.** If your machine has both a conda R
|
|
108
|
+
(e.g. 4.5.1) and a system R (e.g. 4.6.0), rpy2's compiled CFFI extension may
|
|
109
|
+
load the *system* `libR.so` via its rpath, while `R_HOME` points to the conda
|
|
110
|
+
install. The result is an ABI mismatch between the system R 4.6 C code and the
|
|
111
|
+
conda R 4.5 bytecode.
|
|
112
|
+
|
|
113
|
+
Fix — recompile rpy2 and patch the binary's rpath:
|
|
114
|
+
|
|
115
|
+
```bash
|
|
116
|
+
conda activate <env>
|
|
117
|
+
# Install patchelf if not already present:
|
|
118
|
+
conda install -c conda-forge patchelf
|
|
119
|
+
# Recompile rpy2 and patch the CFFI extension in one step:
|
|
120
|
+
just setup-r
|
|
121
|
+
```
|
|
122
|
+
|
|
123
|
+
If you are not using `just`, the equivalent manual steps are:
|
|
124
|
+
|
|
125
|
+
```bash
|
|
126
|
+
LDFLAGS="-Wl,-rpath,$CONDA_PREFIX/lib/R/lib" \
|
|
127
|
+
pip install --force-reinstall --no-binary rpy2 rpy2
|
|
128
|
+
|
|
129
|
+
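# Assumes Python lives in a virtualenv; if it is in the conda env instead, search "$CONDA_PREFIX/lib".
CFFI_SO=$(find "$VIRTUAL_ENV/lib" -name "_rinterface_cffi_api*.so" | head -1)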
|
|
130
|
+
patchelf --force-rpath \
|
|
131
|
+
--set-rpath "$CONDA_PREFIX/lib/R/lib:$CONDA_PREFIX/lib" "$CFFI_SO"
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
Without patchelf you can use `LD_PRELOAD` as a per-process workaround:
|
|
135
|
+
|
|
136
|
+
```bash
|
|
137
|
+
LD_PRELOAD="$CONDA_PREFIX/lib/R/lib/libR.so" python my_script.py
|
|
138
|
+
```
|
|
139
|
+
|
|
140
|
+
### `.Rprofile` interference
|
|
141
|
+
|
|
142
|
+
If an ancestor directory contains an `.Rprofile` (e.g. a sibling project's
|
|
143
|
+
`renv/activate.R`), R will source it at startup, potentially modifying
|
|
144
|
+
`.libPaths()` in unexpected ways. Disable `.Rprofile` loading when running tests
|
|
145
|
+
or one-off scripts:
|
|
146
|
+
|
|
147
|
+
```bash
|
|
148
|
+
R_PROFILE_USER=/dev/null python my_script.py
|
|
149
|
+
```
|
|
150
|
+
|
|
151
|
+
To inject a custom R library path without touching `.Rprofile`:
|
|
152
|
+
|
|
153
|
+
```bash
|
|
154
|
+
R_LIBS_USER=/path/to/my/Rlib python my_script.py
|
|
155
|
+
# or equivalently via scibex:
|
|
156
|
+
python -c "import scibex as ib; ib.setup(lib_loc='/path/to/my/Rlib'); ..."
|
|
157
|
+
```
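To see whether an `.Rprofile` could be affecting a run, you can scan the working directory and its ancestors before R starts. The sketch below follows the description above. `find_rprofiles` is a hypothetical helper, and the environment variable is set from Python on the assumption that this happens before rpy2 initialises R.

```python
import os
from pathlib import Path


def find_rprofiles(start: Path) -> list[Path]:
    """Return every .Rprofile in `start` and its ancestor directories, nearest first."""
    start = start.resolve()
    return [
        d / ".Rprofile"
        for d in (start, *start.parents)
        if (d / ".Rprofile").is_file()
    ]


hits = find_rprofiles(Path.cwd())
if hits:
    print("Found .Rprofile files:", *hits, sep="\n  ")
    # Same effect as the shell example: point the user profile at the null
    # device for this process. Must run before rpy2 (and therefore R) starts.
    os.environ["R_PROFILE_USER"] = os.devnull
```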
|
|
158
|
+
|
|
159
|
+
### Encoder models require keras / tensorflow
|
|
160
|
+
|
|
161
|
+
`method="encoder"` (the default) downloads model weights on first use via the
|
|
162
|
+
`basilisk`-managed Python environment inside the Ibex R package. If keras is not
|
|
163
|
+
available, use the fast geometric baseline instead:
|
|
164
|
+
|
|
165
|
+
```python
|
|
166
|
+
ib.tl.ibex(mdata, chain="Heavy", method="geometric")
|
|
167
|
+
```
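If the same script has to run on machines with and without a working keras setup, the fallback can be automated. The wrapper below is a sketch under stated assumptions: `embed_with_fallback` is a hypothetical helper, it catches a broad `Exception` because the exact error type raised when keras is missing is not documented here, and `fake_embed` is a stub standing in for `ib.ibex_matrix` so the example runs without R.

```python
from typing import Any, Callable


def embed_with_fallback(embed: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, str]:
    """Try method="encoder"; on any error, retry with method="geometric".

    `embed` is any callable that accepts a `method` keyword.
    Returns the result together with the method that produced it.
    """
    try:
        return embed(*args, method="encoder", **kwargs), "encoder"
    except Exception as err:  # broad on purpose; see the note above
        print(f"encoder failed ({err}); falling back to geometric")
        return embed(*args, method="geometric", **kwargs), "geometric"


# Stub standing in for ib.ibex_matrix (assumption: it fails when keras is missing).
def fake_embed(seqs, method="encoder", **_):
    if method == "encoder":
        raise RuntimeError("keras not available")
    return [[0.0] * 2 for _ in seqs]


result, used = embed_with_fallback(fake_embed, ["CARDLVSYGMDVW"], chain="Heavy")
print(used)  # geometric
```

Encoder and geometric embeddings are different representations, so it is worth recording which method was used alongside the stored result rather than mixing the two silently.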
|
|
168
|
+
|
|
169
|
+
## From source
|
|
170
|
+
|
|
171
|
+
```bash
|
|
172
|
+
git clone https://github.com/Qile0317/scibex
|
|
173
|
+
cd scibex
|
|
174
|
+
uv pip install -e ".[test]"
|
|
175
|
+
```
|