PyPI - mapmycells2cl - Versions diffs - 0.1.0__tar.gz - Mend

mapmycells2cl 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20) hide show

mapmycells2cl-0.1.0/.gitignore +32 -0
mapmycells2cl-0.1.0/LICENSE +21 -0
mapmycells2cl-0.1.0/PKG-INFO +349 -0
mapmycells2cl-0.1.0/README.md +327 -0
mapmycells2cl-0.1.0/pyproject.toml +84 -0
mapmycells2cl-0.1.0/src/mapmycells2cl/__init__.py +5 -0
mapmycells2cl-0.1.0/src/mapmycells2cl/annotator.py +340 -0
mapmycells2cl-0.1.0/src/mapmycells2cl/cli.py +199 -0
mapmycells2cl-0.1.0/src/mapmycells2cl/data/.gitkeep +0 -0
mapmycells2cl-0.1.0/src/mapmycells2cl/data/mapping.json +184030 -0
mapmycells2cl-0.1.0/src/mapmycells2cl/mapper.py +205 -0
mapmycells2cl-0.1.0/src/mapmycells2cl/parser.py +609 -0
mapmycells2cl-0.1.0/tests/__init__.py +0 -0
mapmycells2cl-0.1.0/tests/conftest.py +111 -0
mapmycells2cl-0.1.0/tests/integration/__init__.py +0 -0
mapmycells2cl-0.1.0/tests/integration/test_h5ad.py +45 -0
mapmycells2cl-0.1.0/tests/unit/__init__.py +0 -0
mapmycells2cl-0.1.0/tests/unit/test_annotator.py +245 -0
mapmycells2cl-0.1.0/tests/unit/test_mapper.py +111 -0
mapmycells2cl-0.1.0/tests/unit/test_parser.py +103 -0

mapmycells2cl-0.1.0/.gitignore ADDED Viewed

@@ -0,0 +1,32 @@
+# Python-generated files
+__pycache__/
+*.py[oc]
+build/
+dist/
+wheels/
+*.egg-info
+# Virtual environments
+.venv
+# Coverage
+.coverage
+htmlcov/
+# Large OWL/ontology source files (use update-mappings to regenerate)
+cl.owl
+cl-full.owl
+cl-full.json
+pcl.owl
+# MapMyCells annotated output files
+*_annotated.csv
+*_annotated.json
+# Pytest cache
+.pytest_cache/
+# Sphinx docs build output and generated AutoAPI source files
+docs/_build/
+docs/autoapi/

mapmycells2cl-0.1.0/LICENSE ADDED Viewed

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Cellular Semantics
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

mapmycells2cl-0.1.0/PKG-INFO ADDED Viewed

@@ -0,0 +1,349 @@
+Metadata-Version: 2.4
+Name: mapmycells2cl
+Version: 0.1.0
+Summary: Map MapMyCells ABA taxonomy IDs to Cell Ontology (CL) terms
+Project-URL: Homepage, https://github.com/Cellular-Semantics/MapMyCells2CL
+Project-URL: Repository, https://github.com/Cellular-Semantics/MapMyCells2CL
+Project-URL: Issues, https://github.com/Cellular-Semantics/MapMyCells2CL/issues
+Author: Cellular Semantics
+License: MIT
+License-File: LICENSE
+Keywords: allen-brain-atlas,cell-ontology,mapmycells,single-cell
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
+Requires-Python: >=3.13
+Requires-Dist: anndata>=0.10
+Requires-Dist: click>=8.1
+Description-Content-Type: text/markdown
+# MapMyCells2CL
+Annotate [MapMyCells](https://brain-map.org/bkp/analyze/mapmycells) output with [Cell Ontology (CL)](https://obofoundry.org/ontology/cl.html) terms.
+MapMyCells assigns cells to Allen Brain Atlas (ABA) taxonomy nodes (e.g. `CS20230722_SUBC_053`). This library maps those IDs to CL or Provisional Cell Ontology (PCL) terms and selects the **most specific CL term** using information-content (IC) ranking — ready for [CELLxGENE](https://cellxgene.cziscience.com/) schema compliance.
+---
+## Quick start
+```bash
+pip install mapmycells2cl
+# Annotate a MapMyCells CSV
+mmc2cl annotate results.csv
+# → results_annotated.csv
+# Annotate an h5ad file (CxG-compliant obs columns)
+mmc2cl annotate-h5ad results.csv cells.h5ad
+# → cells_annotated.h5ad
+```
+---
+## Installation
+```bash
+pip install mapmycells2cl
+# or
+uv add mapmycells2cl
+```
+Installing the package places two equivalent commands in your PATH: `mmc2cl` (short form) and `mapmycells2cl`.
+### From source (development)
+Requires [uv](https://docs.astral.sh/uv/).
+```bash
+git clone https://github.com/Cellular-Semantics/MapMyCells2CL.git
+cd MapMyCells2CL
+uv sync
+# Run via uv (no venv activation needed)
+uv run mmc2cl annotate results.csv
+# Or activate the venv once and use the command directly
+source .venv/bin/activate
+mmc2cl annotate results.csv
+```
+---
+## CLI reference
+### `annotate`
+Annotate a MapMyCells CSV or JSON output file with CL terms.
+```bash
+mmc2cl annotate INPUT_FILE [OPTIONS]
+```
+| Option | Description |
+|--------|-------------|
+| `-o, --output PATH` | Output file path. Defaults to `<input>_annotated.<ext>` |
+| `--mapping PATH` | Path to a custom mapping JSON (default: bundled `mapping.json`) |
+**Examples:**
+```bash
+# Annotate CSV
+mmc2cl annotate results.csv
+# Annotate JSON
+mmc2cl annotate results.json
+# Specify output path
+mmc2cl annotate results.csv -o /data/annotated.csv
+```
+**CSV output columns** — added after each `{level}_label` column using the CAP/HCA double-dash convention:
+| Column | Content | When |
+|--------|---------|------|
+| `{level}--cell_type_ontology_term_id` | Most specific CL CURIE (IC-ranked) | Always |
+| `{level}--cell_type` | Label for the above | Always |
+| `{level}--cell_type_pcl_ontology_term_id` | PCL exact match CURIE | PCL exact only |
+| `{level}--cell_type_pcl` | PCL exact label | PCL exact only |
+| `{level}--cell_type_cl_broad_ontology_term_ids` | All CL broad CURIEs, `\|`-joined | PCL exact only |
+Example (subclass level, PCL exact match):
+```
+subclass_label                              → CS20230722_SUBC_053
+subclass--cell_type_ontology_term_id        → CL:4023017
+subclass--cell_type                         → sst GABAergic cortical interneuron
+subclass--cell_type_pcl_ontology_term_id    → PCL:0110113
+subclass--cell_type_pcl                     → Sst Gaba sst GABAergic cortical interneuron (Mmus)
+subclass--cell_type_cl_broad_ontology_term_ids → CL:4023017|CL:4023069
+```
+**JSON output** — `cell_type_ontology_term_id`, `cell_type`, and (for PCL) `cell_type_pcl_ontology_term_id`, `cell_type_pcl`, `cell_type_cl_broad_ontology_term_ids` are added to each level's assignment dict.
+---
+### `annotate-h5ad`
+Annotate an AnnData h5ad file with CL terms from a MapMyCells CSV. Adds CL columns directly to `adata.obs`, including the unprefixed `cell_type_ontology_term_id` / `cell_type` pair required by the [CELLxGENE schema](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/5.3.0/schema.md).
+```bash
+mmc2cl annotate-h5ad MMC_CSV H5AD_IN [OPTIONS]
+```
+| Option | Description |
+|--------|-------------|
+| `-o, --output PATH` | Output h5ad path. Defaults to `<input>_annotated.h5ad` |
+| `--cxg-level TEXT` | Taxonomy level used for unprefixed CxG columns (default: `cluster`) |
+| `--mapping PATH` | Path to a custom mapping JSON |
+**Examples:**
+```bash
+# Annotate h5ad — output written to cells_annotated.h5ad
+mmc2cl annotate-h5ad results.csv cells.h5ad
+# Use supertype level for the CxG cell_type columns
+mmc2cl annotate-h5ad results.csv cells.h5ad --cxg-level supertype
+```
+**obs columns added:**
+| Column | Content |
+|--------|---------|
+| `cell_type_ontology_term_id` | IC-best CL CURIE from `--cxg-level` (CxG required) |
+| `cell_type` | Label for the above (CxG required) |
+| `{level}--cell_type_ontology_term_id` | Per-level IC-best CL CURIE |
+| `{level}--cell_type` | Per-level label |
+| `{level}--cell_type_pcl_ontology_term_id` | PCL CURIE (PCL exact only) |
+| `{level}--cell_type_pcl` | PCL label (PCL exact only) |
+| `{level}--cell_type_cl_broad_ontology_term_ids` | `\|`-joined broad CL CURIEs (PCL exact only) |
+Cells present in the h5ad but absent from the mmc CSV get empty strings.
+---
+### `update-mappings`
+Download the latest `pcl.owl` and regenerate the bundled `mapping.json`. Pass `--cl-owl` to include IC-ranked best-CL data (strongly recommended).
+```bash
+mmc2cl update-mappings [OPTIONS]
+```
+| Option | Description |
+|--------|-------------|
+| `--owl PATH` | Use a local `pcl.owl` instead of downloading |
+| `--cl-owl PATH` | Path to base `cl.owl` for IC computation. Downloads if omitted |
+| `--output PATH` | Output path (default: bundled `src/mapmycells2cl/data/mapping.json`) |
+**Examples:**
+```bash
+# Download latest pcl.owl and regenerate (no IC)
+mmc2cl update-mappings
+# With IC ranking (recommended) — requires cl.owl (~63 MB)
+mmc2cl update-mappings --cl-owl cl.owl
+# Use locally cached files
+mmc2cl update-mappings --owl pcl.owl --cl-owl cl.owl
+```
+> **Note:** `cl.owl` is large (~63 MB). The PURL `http://purl.obolibrary.org/obo/cl.owl` redirects to GitHub; download it manually if needed and pass the path with `--cl-owl`.
+---
+## Python API
+### `CellTypeMapper`
+```python
+from mapmycells2cl import CellTypeMapper
+mapper = CellTypeMapper()              # bundled mapping
+print(mapper.mapping_version)          # e.g. "2025-07-07"
+print(mapper.has_ic)                   # True when mapping includes IC data
+```
+### Single lookup
+```python
+result = mapper.lookup("CS20230722_SUBC_313")
+result.found            # True
+result.exact_id         # "CL:4300353"
+result.exact_label      # "Purkinje cell (Mmus)"
+result.ontology         # "CL"
+result.broad            # [] — already CL, no broad match needed
+result.best_cl_id       # "CL:4300353" — IC-ranked most specific CL term
+result.best_cl_label    # "Purkinje cell (Mmus)"
+result.best_cl_ic       # IC score (higher = more specific)
+result.mapping_version  # "2025-07-07"
+```
+```python
+result = mapper.lookup("CS20230722_SUBC_053")
+result.exact_id     # "PCL:0110113"
+result.ontology     # "PCL"
+result.best_cl_id   # "CL:4023017"  — IC-ranked best CL broad match
+result.broad        # [BroadMatch(id="CL:4023017", ...), BroadMatch(id="CL:4023069", ...)]
+for b in result.broad:
+    print(b.id, b.label, b.via)
+```
+```python
+result = mapper.lookup("CS20230722_UNKNOWN_999")
+result.found        # False
+result.best_cl_id   # ""
+```
+### Batch lookup
+```python
+results = mapper.lookup_many([
+    "CS20230722_SUBC_313",
+    "CS20230722_SUBC_053",
+    "CS20230722_CLUS_0768",
+])
+# Returns List[MatchResult] in the same order
+```
+### Annotator (programmatic use)
+```python
+from pathlib import Path
+from mapmycells2cl import CellTypeMapper
+from mapmycells2cl.annotator import annotate_csv, annotate_json, annotate_h5ad
+mapper = CellTypeMapper()
+# CSV / JSON
+annotate_csv(Path("results.csv"), Path("results_annotated.csv"), mapper)
+annotate_json(Path("results.json"), Path("results_annotated.json"), mapper)
+# h5ad — CxG-compliant obs columns
+annotate_h5ad(
+    Path("results.csv"),
+    Path("cells.h5ad"),
+    Path("cells_annotated.h5ad"),
+    mapper,
+    cxg_level="cluster",   # level used for unprefixed cell_type columns
+)
+```
+---
+## How it works
+### Exact matches
+Extracted from `owl:equivalentClass` axioms in `pcl.owl`:
+```
+CL/PCL_class ≡ CL_0000000 ∧ (RO_0015001 hasValue <ABA_individual>)
+```
+Every ABA taxonomy ID maps to either a **CL term** (direct Cell Ontology entry) or a **PCL term** (Provisional Cell Ontology — finer-grained types not yet promoted to CL).
+### Broad matches
+For PCL exact matches, the library walks `rdfs:subClassOf` edges upward until CL terms are reached. Because the hierarchy is a DAG (not a tree), a single PCL term may yield **multiple CL broad matches** (polyhierarchy).
+### IC-ranked best CL term
+When multiple CL broad matches exist, the **most specific** is selected using structure-based Information Content computed over the base CL hierarchy (no PCL):
+```
+IC(c) = -log2(|distinct leaf descendants of c| / |total CL leaves|)
+```
+Higher IC = more specific. This is pre-computed at `update-mappings` time and stored in `mapping.json`, so there is no runtime CL dependency.
+### Coverage (CCN20230722 taxonomy)
+| Level | → CL | → PCL | Total |
+|-------|------|-------|-------|
+| CLAS (class) | 3 | 24 | 27 |
+| SUBC (subclass) | 15 | 230 | 245 |
+| SUPT (supertype) | 32 | 983 | 1,015 |
+| CLUS (cluster) | 80 | 5,234 | 5,314 |
+| **Total** | **130** | **6,471** | **6,601** |
+---
+## Data sources
+- [`pcl.owl`](http://purl.obolibrary.org/obo/pcl.owl) — Provisional Cell Ontology; primary mapping source
+- [`cl.owl`](http://purl.obolibrary.org/obo/cl.owl) — Base Cell Ontology (no imports); used for IC computation
+Both large OWL files are excluded from the repo. The bundled `mapping.json` is versioned with the PCL release date and includes all pre-computed IC scores.
+---
+## Development
+```bash
+uv sync --dev
+uv run mypy src/                          # type check
+uv run ruff check --fix src/ tests/       # lint
+uv run ruff format src/ tests/            # format
+uv run pytest -m unit --cov              # unit tests (fast, no external deps)
+uv run pytest -m integration             # integration tests (requires test_resources/)
+```
+CI runs mypy, ruff, and unit tests on every PR via GitHub Actions.
+---
+## Known gaps
+- Basal Ganglia ABA mappings are absent from CL — fix planned for a future CL release.
+- `oaklib` integration deferred (not yet needed for current use cases).