PyPI - biopipen - Versions diffs - 0.34.2__py3-none-any.whl → 0.34.3__py3-none-any.whl - Mend

biopipen 0.34.2__py3-none-any.whl → 0.34.3__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of biopipen might be problematic.

Files changed (27)

biopipen/__init__.py CHANGED

@@ -1 +1 @@
-__version__ = "0.34.2"
+__version__ = "0.34.3"

biopipen/ns/scrna.py CHANGED

@@ -531,6 +531,8 @@ class SeuratClusterStats(Proc):
     Envs:
         mutaters (type=json): The mutaters to mutate the metadata to subset the cells.
             The mutaters will be applied in the order specified.
+            You can also use the clone selectors to select the TCR clones/clusters.
+            See <https://pwwang.github.io/scplotter/reference/clone_selectors.html>.
         cache (type=auto): Whether to cache the plots.
             Currently only plots for features are supported, since creating the those
             plots can be time consuming.
@@ -564,6 +566,7 @@ class SeuratClusterStats(Proc):
                 - res (type=int): The resolution of the plots.
                 - height (type=int): The height of the plots.
                 - width (type=int): The width of the plots.
+            - descr: The description of the plot, showing in the report.
             - more_formats (type=list): The formats to save the plots other than `png`.
             - save_code (flag): Whether to save the code to reproduce the plot.
             - save_data (flag): Whether to save the data used to generate the plot.
@@ -655,6 +658,7 @@ class SeuratClusterStats(Proc):
         "clustrees": {},
         "stats_defaults": {
             "subset": None,
+            "descr": None,
             "devpars": {"res": 100},
             "more_formats": [],
             "save_code": False,
@@ -663,10 +667,12 @@ class SeuratClusterStats(Proc):
         "stats": {
             "Number of cells in each cluster (Bar Chart)": {
                 "plot_type": "bar",
+                "x_text_angle": 90,
             },
             "Number of cells in each cluster by Sample (Bar Chart)": {
                 "plot_type": "bar",
                 "group_by": "Sample",
+                "x_text_angle": 90,
             },
         },
         "ngenes_defaults": {
@@ -698,7 +704,6 @@ class SeuratClusterStats(Proc):
         "dimplots": {
             "Dimensional reduction plot": {
                 "label": True,
-                "label_insitu": True,
             },
         },
     }
@@ -1025,7 +1030,9 @@ class MarkersFinder(Proc):
         ncores (type=int): Number of cores to use for parallel computing for some `Seurat` procedures.
             * Used in `future::plan(strategy = "multicore", workers = <ncores>)` to parallelize some Seurat procedures.
             * See also: <https://satijalab.org/seurat/articles/future_vignette.html>
-        mutaters (type=json): The mutaters to mutate the metadata
+        mutaters (type=json): The mutaters to mutate the metadata.
+            You can also use the clone selectors to select the TCR clones/clusters.
+            See <https://pwwang.github.io/scplotter/reference/clone_selectors.html>.
         group_by: The column name in metadata to group the cells.
             If only `group_by` is specified, and `ident-1` and `ident-2` are
             not specified, markers will be found for all groups in this column
@@ -1237,7 +1244,9 @@ class TopExpressingGenes(Proc):
         outdir: The output directory for the tables and plots
     Envs:
-        mutaters (type=json): The mutaters to mutate the metadata
+        mutaters (type=json): The mutaters to mutate the metadata.
+            You can also use the clone selectors to select the TCR clones/clusters.
+            See <https://pwwang.github.io/scplotter/reference/clone_selectors.html>.
         ident: The group of cells to find the top expressing genes.
             The cells will be selected by the `group_by` column with this
             `ident` value in metadata.
@@ -1606,6 +1615,8 @@ class ScFGSEA(Proc):
             Passed to `nproc` of `fgseaMultilevel()`.
         mutaters (type=json): The mutaters to mutate the metadata.
             The key-value pairs will be passed the `dplyr::mutate()` to mutate the metadata.
+            You can also use the clone selectors to select the TCR clones/clusters.
+            See <https://pwwang.github.io/scplotter/reference/clone_selectors.html>.
         group_by: The column name in metadata to group the cells.
         ident_1: The first group of cells to compare
@@ -2699,6 +2710,8 @@ class PseudoBulkDEG(Proc):
             seurat object. Keys are the new column names and values are the
             expressions to mutate the columns. These new columns can be
             used to define your cases.
+            You can also use the clone selectors to select the TCR clones/clusters.
+            See <https://pwwang.github.io/scplotter/reference/clone_selectors.html>.
         each: The column name in metadata to separate the cells into different cases.
             When specified, the case will be expanded to multiple cases for
             each value in the column.

biopipen/ns/scrna_metabolic_landscape.py CHANGED

@@ -165,7 +165,7 @@ class MetabolicFeatures(Proc):
             `1`, `2` and `3` in the `group_by` column, we could have
             `comparisons = ["1", "2"]`, which will compare the group `1` with groups
             `2` and `3`, and the group `2` with groups `1` and `3`. We could also
-            have `comparisons = ["1,2", "1,3"]`, which will compare the group `1` with
+            have `comparisons = ["1:2", "1:3"]`, which will compare the group `1` with
             group `2` and group `1` with group `3`.
         fgsea_args (type=json): Other arguments for the `fgsea::fgsea()` function.
             For example, `{"minSize": 15, "maxSize": 500}`.

biopipen/ns/tcr.py CHANGED

@@ -1749,6 +1749,11 @@ class ScRepCombiningExpression(Proc):
     Output:
         outfile: The `Seurat` object with the TCR/BCR data combined
+            In addition to the meta columns added by
+            `scRepertoire::combineExpression()`, a new column `TCR_Presence` will be
+            added to the metadata. It indicates whether the cell has a TCR/BCR
+            sequence or not. The value is `TRUE` if the cell has a TCR/BCR sequence,
+            and `FALSE` otherwise.
     Envs:
         cloneCall: How to call the clone - VDJC gene (gene), CDR3 nucleotide (nt),

biopipen/reports/scrna_metabolic_landscape/MetabolicFeatures.svelte CHANGED

@@ -34,15 +34,15 @@ The cells are grouped at 2 dimensions: `subset_by`, usually the clinic groups th
 <UnorderedList>
 <ListItem>
-    <a href="../MetabolicPathwayActivity/index.html">MetabolicPathwayActivity</a>
+    <a href="?proc=MetabolicPathwayActivity" class="listitem">MetabolicPathwayActivity</a>
     <Tile><p>Investigating the metabolic pathways of the cells in different subsets and groups.</p></Tile>
 </ListItem>
 <ListItem>
-    <a href="../MetabolicPathwayHeterogeneity/index.html">MetabolicPathwayHeterogeneity</a>
+    <a href="?proc=MetabolicPathwayHeterogeneity" class="listitem">MetabolicPathwayHeterogeneity</a>
     <Tile><p>Showing metabolic pathways enriched in genes with highest contribution to the metabolic heterogeneities</p></Tile>
 </ListItem>
 <ListItem>
-    MetabolicFeatures (this page)
+    <span class="listitem">MetabolicFeatures (this page)</span>
     <Tile>
     <p>Gene set enrichment analysis against the metabolic pathways for comparisons by different groups in different subsets.</p>
     <p>The metabolic features are actual gene set enrichment analysis (GSEA) results for the metabolic pathways with given comparisons.</p>
@@ -59,3 +59,12 @@ The cells are grouped at 2 dimensions: `subset_by`, usually the clinic groups th
 {%- endmacro -%}
 {{ report_jobs(jobs, head_job, report_job) }}
+<style>
+.listitem {
+    font-size: large;
+    font-weight: bold;
+    margin: 1rem 0 0.5rem 0;
+    display: inline-block;
+}
+</style>

biopipen/reports/scrna_metabolic_landscape/MetabolicPathwayActivity.svelte CHANGED

@@ -34,7 +34,7 @@ The cells are grouped at 2 dimensions: `subset_by`, usually the clinic groups th
 <UnorderedList>
 <ListItem>
-    MetabolicPathwayActivity (this page)
+    <span class="listitem">MetabolicPathwayActivity (this page)</span>
     <Tile>
         <p>Investigating the metabolic pathways of the cells in different subsets and groups.</p>
         <p>The cells are first subset by subsets and then the metabolic activities are examined for each groups in different subsets.</p>
@@ -69,13 +69,13 @@ The cells are grouped at 2 dimensions: `subset_by`, usually the clinic groups th
     </Tile>
 </ListItem>
 <ListItem>
-    <a href="../MetabolicPathwayHeterogeneity/index.html">MetabolicPathwayHeterogeneity</a>
+    <a href="?proc=MetabolicPathwayHeterogeneity" class="listitem">MetabolicPathwayHeterogeneity</a>
     <Tile>
         <p>Showing metabolic pathways enriched in genes with highest contribution to the metabolic heterogeneities</p>
     </Tile>
 </ListItem>
 <ListItem>
-    <a href="../MetabolicFeatures/index.html">MetabolicFeatures</a>
+    <a href="?proc=MetabolicFeatures" class="listitem">MetabolicFeatures</a>
     <Tile>
         <p>Gene set enrichment analysis against the metabolic pathways for comparisons by different groups in different subsets.</p>
     </Tile>
@@ -91,3 +91,12 @@ The cells are grouped at 2 dimensions: `subset_by`, usually the clinic groups th
 {%- endmacro -%}
 {{ report_jobs(jobs, head_job, report_job) }}
+<style>
+.listitem {
+    font-size: large;
+    font-weight: bold;
+    margin: 1rem 0 0.5rem 0;
+    display: inline-block;
+}
+</style>

biopipen/reports/scrna_metabolic_landscape/MetabolicPathwayHeterogeneity.svelte CHANGED

@@ -34,13 +34,13 @@ The cells are grouped at 2 dimensions: `subset_by`, usually the clinic groups th
 <UnorderedList>
 <ListItem>
-    <a href="../MetabolicPathwayActivity/index.html">MetabolicPathwayActivity</a>
+    <a href="?proc=MetabolicPathwayActivity" class="listitem">MetabolicPathwayActivity</a>
     <Tile>
     <p>Investigating the metabolic pathways of the cells in different subsets and groups.</p>
     </Tile>
 </ListItem>
 <ListItem>
-    MetabolicPathwayHeterogeneity (this page)
+    <span class="listitem">MetabolicPathwayHeterogeneity (this page)</span>
     <Tile>
     <p>Showing metabolic pathways enriched in genes with highest contribution to the metabolic heterogeneities</p>
     <p>
@@ -54,7 +54,7 @@ The cells are grouped at 2 dimensions: `subset_by`, usually the clinic groups th
     </Tile>
 </ListItem>
 <ListItem>
-    <a href="../MetabolicFeatures/index.html">MetabolicFeatures</a>
+    <a href="?proc=MetabolicFeatures" class="listitem">MetabolicFeatures</a>
     <Tile>
     <p>Gene set enrichment analysis against the metabolic pathways for comparisons by different groups in different subsets.</p>
     </Tile>
@@ -70,3 +70,12 @@ The cells are grouped at 2 dimensions: `subset_by`, usually the clinic groups th
 {%- endmacro -%}
 {{ report_jobs(jobs, head_job, report_job) }}
+<style>
+.listitem {
+    font-size: large;
+    font-weight: bold;
+    margin: 1rem 0 0.5rem 0;
+    display: inline-block;
+}
+</style>

biopipen/scripts/scrna/CellTypeAnnotation-celltypist.R CHANGED

@@ -26,15 +26,8 @@ if (is.null(celltypist_args$model)) {
 }
 dir.create(file.path(outdir, "data", "models"), recursive = TRUE, showWarnings = FALSE)
 modelfile <- file.path(outdir, "data", "models", basename(celltypist_args$model))
-if (!file.exists(modelfile)) {
-    file.symlink(celltypist_args$model, modelfile)
-} else {
-    real_modelfile <- normalizePath(Sys.readlink(modelfile))
-    if (real_modelfile != normalizePath(celltypist_args$model)) {
-        file.remove(modelfile)
-        file.symlink(celltypist_args$model, modelfile)
-    }
-}
+suppressWarnings(file.remove(modelfile))
+file.symlink(normalizePath(celltypist_args$model), modelfile)
 sobj <- NULL
 if (!endsWith(sobjfile, ".h5ad")) {
@@ -43,7 +36,7 @@ if (!endsWith(sobjfile, ".h5ad")) {
         # find the default ident name in meta.data
         for (col in colnames(sobj@meta.data)) {
             if (!is.factor(sobj@meta.data[[col]])) { next }
-            if (isTRUE(all.equal(Idents(sobj), sobj@meta.data[[col]]))) {
+            if (isTRUE(all.equal(unname(Idents(sobj)), sobj@meta.data[[col]]))) {
                 celltypist_args$over_clustering <- col
                 break
             }
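
The first hunk above replaces the conditional relinking with an unconditional "remove, then link to the normalized path". A minimal Python sketch of that pattern follows, assuming a POSIX filesystem. `refresh_symlink` and the file names are hypothetical and not part of biopipen.

```python
import os
import tempfile
from contextlib import suppress


def refresh_symlink(target: str, link: str) -> None:
    """Point `link` at the resolved path of `target`, replacing any old link."""
    # A missing link is not an error, similar to suppressWarnings(file.remove(...)).
    with suppress(FileNotFoundError):
        os.remove(link)
    # Resolve the target first so the link does not depend on the working directory.
    os.symlink(os.path.realpath(target), link)


with tempfile.TemporaryDirectory() as tmp:
    model_a = os.path.join(tmp, "a.pkl")
    model_b = os.path.join(tmp, "b.pkl")
    for path in (model_a, model_b):
        open(path, "w").close()
    link = os.path.join(tmp, "model.pkl")
    refresh_symlink(model_a, link)
    refresh_symlink(model_b, link)  # relinking replaces the stale link
    resolved = os.path.realpath(link)
    expected = os.path.realpath(model_b)
```

Always recreating the link avoids comparing the old and new targets, at the cost of one extra filesystem call per run.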

biopipen/scripts/scrna/SeuratClusterStats-clustree.R CHANGED

@@ -26,6 +26,22 @@ if (
     if (length(clustrees) == 0) {
         log$warn("- no case found, skipping ...")
     } else {
+        reporter$add(
+            list(
+                kind = "descr",
+                content = 'The clustree plots displays clustering results from the Seurat object across different
+                resolutions of the clustering algorithm
+                (<a target="_blank" href="https://satijalab.org/seurat/reference/findclusters">Seurat::FindClusters</a>).
+                Each node represents a cluster, with the resolution levels labeled along the vertical (y) axis.
+                The size of each node reflects the number of cells in that cluster. Edges connect clusters between
+                adjacent resolutions and indicate how cells transition between clusters as resolution increases.
+                The thickness of the edges corresponds to the proportion of shared cells (in_prop) between clusters,
+                where darker lines signify a higher overlap (up to 100%). The color of the edges indicates the actual
+                number of cells that transitioned between clusters.'
+            ),
+            h1 = "Clustree plots"
+        )
         reports <- list()
         for (name in names(clustrees)) {
             if (is.null(clustrees[[name]]$prefix)) {

biopipen/scripts/scrna/SeuratClusterStats-dimplots.R CHANGED

@@ -40,7 +40,7 @@ do_one_dimplot = function(name) {
     reporter$add(
         list(
             kind = "descr",
-            content = paste0("Dimensionality reduction plot for ", case$group.by)
+            content = paste0("Dimensionality reduction plot for ", case$group_by)
         ),
         reporter$image(prefix, "pdf", FALSE),
         h1 = name

biopipen/scripts/scrna/SeuratClusterStats-features.R CHANGED

@@ -64,11 +64,11 @@ do_one_features <- function(name) {
     log$info("- Case: {name}")
     case <- list_update(features_defaults, features[[name]])
-    case$descr <- case$descr %||% ""
     case <- extract_vars(
         case,
         "devpars", "more_formats", "save_code", "save_data", "order_by",
-        "subset", "features", "descr")
+        "subset", "features", "descr",
+        allow_nonexisting = TRUE)
     if (!is.null(subset)) {
         case$object <- srtobj %>% filter(!!parse_expr(subset))
@@ -77,6 +77,7 @@ do_one_features <- function(name) {
     }
     if (exists("order_by") && !is.null(order_by)) {
+        case$ident <- case$ident %||% GetIdentityColumn(case$object)
         if (length(order_by) < 2) {
             clusters <- case$object@meta.data %>%
                 group_by(!!sym(case$ident)) %>%
@@ -126,12 +127,34 @@ do_one_features <- function(name) {
         caching$save(info$prefix)
     }
     # add reports
-    if (!is.null(descr) && nchar(descr) > 0) {
-        reporter$add2(
-            list(kind = "descr", content = descr),
-            hs = c(info$section, info$name)
+    default_descr <- glue(
+        "The plot shows the distribution or pattern of the specified features ({paste(case$features %||% features, collapse = ', ')}) ",
+        "across cells",
+        "{if (!is.null(case$ident)) glue(', identified by \"{case$ident}\"') else ''}",
+        "{if (!is.null(case$group_by)) glue(', grouped by \"{case$group_by}\"') else ''}",
+        "{if (!is.null(case$split_by)) glue(', and split by \"{case$split_by}\"') else ''}. ",
+        "The plot type is '{case$plot_type}', ",
+        "{if (case$plot_type == 'dim') 'displaying the features on a dimensional reduction embedding' ",
+        " else if (case$plot_type == 'heatmap') 'arranged as a heatmap by rows_name and other grouping variables' ",
+        " else if (case$plot_type %in% c('violin', 'box', 'ridge')) 'showing the distribution of feature values by the grouping variables' ",
+        " else if (case$plot_type == 'cor') 'showing the correlation between features' ",
+        " else 'showing aggregated feature values by the grouping variables'}. ",
+        "{if (!is.null(case$facet_by)) glue('Plots are further faceted by \"{case$facet_by}\". ') else ''}",
+        "{if (case$plot_type == 'dim') glue('The reduction used is \"{if (!is.null(case$reduction)) case$reduction else DefaultDimReduc(case$object)}\"') else ''}",
+        "{if (case$plot_type == 'dim' && !is.null(case$graph)) glue(', with graph \"{case$graph}\" drawn to show cell neighbor edges') else ''}",
+        "{if (case$plot_type == 'dim' && !is.null(case$bg_cutoff) && case$bg_cutoff > 0) glue(', and a background cutoff of {case$bg_cutoff}') else ''}",
+        "{if (case$plot_type == 'dim') glue(', using dimensions {paste(case$dims %||% 1:2, collapse = \",\")}') else ''}"
+    )
+    if (!is.null(case$comparisons)) {
+        default_descr <- paste0(
+            default_descr,
+            "Statistical comparisons were performed between groups using '{case$pairwise_method %||% 'wilcox.test'}' method."
         )
     }
+    reporter$add2(
+        list(kind = "descr", content = descr %||% default_descr),
+        hs = c(info$section, info$name)
+    )
     if (save_data) {
         reporter$add2(

biopipen/scripts/scrna/SeuratClusterStats-stats.R CHANGED

@@ -5,17 +5,26 @@ log$info("stats:")
 odir <- file.path(outdir, "stats")
 dir.create(odir, recursive=TRUE, showWarnings=FALSE)
 do_one_stats <- function(name) {
     log$info("- Case: {name}")
     case <- list_update(stats_defaults, stats[[name]])
-    extract_vars(case, "devpars", "more_formats", "save_code", "save_data", "subset")
+    case <- extract_vars(case, "devpars", "more_formats", "save_code", "save_data", "subset", "descr")
     if (!is.null(subset)) {
         case$object <- srtobj %>% filter(!!parse_expr(subset))
     } else {
         case$object <- srtobj
     }
+    ident <- case$ident %||% GetIdentityColumn(case$object)
+    groupings <- unique(c(case$group_by, case$rows_by, case$columns_by, case$pie_group_by, ident))
+    if (length(groupings) > 0) {
+        for (g in groupings) {
+            case$object <- filter(case$object, !is.na(!!sym(g)))
+        }
+    }
     info <- case_info(name, odir, is_dir = FALSE, create = TRUE)
     p <- do_call(gglogger::register(CellStatPlot), case)
@@ -27,6 +36,20 @@ do_one_stats <- function(name) {
             auto_data_setup = FALSE)
     }
+    frac <- case$frac %||% "none"
+    default_descr <- glue(
+        "The {case$plot_type} plot shows the distribution of cells across categories defined by '{ident}'",
+        "{if (!is.null(case$group_by)) glue(', grouped by {case$group_by}') else ''}",
+        "{if (!is.null(case$split_by)) glue(', and split by {case$split_by}') else ''}. ",
+        "The values represent ",
+        "{if (frac == 'none') 'the number of cells' else glue('the fraction of cells calculated by \"{frac}\"')}. "
+    )
+    if (!is.null(case$comparisons)) {
+        default_descr <- paste0(
+            default_descr,
+            "Statistical comparisons were performed between groups using '{case$pairwise_method %||% 'wilcox.test'}' method."
+        )
+    }
     if (save_data) {
         pdata <- attr(p, "data") %||% p$data
         if (!inherits(pdata, "data.frame") && !inherits(pdata, "matrix")) {
@@ -37,6 +60,10 @@ do_one_stats <- function(name) {
             list(
                 name = "Plot",
                 contents = list(
+                    list(
+                        kind = "descr",
+                        content = case$descr %||% default_descr
+                    ),
                     reporter$image(
                         info$prefix, more_formats, save_code, kind = "image")
                 )
@@ -60,6 +87,7 @@ do_one_stats <- function(name) {
         )
     } else {
         reporter$add2(
+            list(kind = "descr", content = case$descr %||% default_descr),
             reporter$image(info$prefix, more_formats, save_code, kind = "image"),
             hs = c(info$section, info$name)
         )
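
The stats hunks build a fallback description from the case settings and use it only when no `descr` is supplied. The Python sketch below illustrates the same "user value, else generated text" logic. The function names are hypothetical, and the wording is only modeled on the `glue()` template in the hunk, not copied from the package's output. One detail that may be worth checking in the hunk itself: the comparisons sentence is appended with `paste0()` rather than `glue()`, so its `{...}` placeholder appears to stay literal unless it is interpolated later.

```python
from typing import Optional


def default_descr(plot_type: str, ident: str,
                  group_by: Optional[str] = None,
                  split_by: Optional[str] = None,
                  frac: str = "none") -> str:
    """Assemble a fallback description from the plot settings."""
    parts = [f"The {plot_type} plot shows the distribution of cells "
             f"across categories defined by '{ident}'"]
    if group_by is not None:
        parts.append(f", grouped by {group_by}")
    if split_by is not None:
        parts.append(f", and split by {split_by}")
    parts.append(". The values represent ")
    if frac == "none":
        parts.append("the number of cells")
    else:
        parts.append(f'the fraction of cells calculated by "{frac}"')
    parts.append(".")
    return "".join(parts)


def resolve_descr(user_descr: Optional[str], **settings) -> str:
    # A user-supplied description wins; otherwise use the generated one.
    return user_descr if user_descr is not None else default_descr(**settings)


text = resolve_descr(None, plot_type="bar", ident="seurat_clusters", group_by="Sample")
custom = resolve_descr("My own text.", plot_type="bar", ident="seurat_clusters")
```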

biopipen/scripts/scrna/SeuratClusterStats.R CHANGED

@@ -3,6 +3,7 @@ library(rlang)
 library(dplyr)
 library(tidyr)
 library(tibble)
+library(glue)
 library(forcats)
 library(tidyseurat)
 library(gglogger)

biopipen/scripts/scrna/celltypist-wrapper.py CHANGED

@@ -29,6 +29,8 @@ if __name__ == "__main__":
         raise ValueError(
             f"Over clustering column '{over_clustering}' not found in AnnData object."
         )
+    if 'neighbors' in adata.uns and 'params' in adata.uns['neighbors']:
+        adata.uns['neighbors']['params'].setdefault('n_neighbors', 15)
     annotated = celltypist.annotate(
         adata,
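
The two added lines fill in `n_neighbors` only when the neighbors parameters exist but lack that key. The sketch below reproduces the guard on plain dictionaries so it runs without `anndata`. `ensure_n_neighbors` is a hypothetical helper, and `uns` stands in for `adata.uns`.

```python
def ensure_n_neighbors(uns: dict, default: int = 15) -> dict:
    """Set neighbors.params.n_neighbors only when it is absent."""
    if "neighbors" in uns and "params" in uns["neighbors"]:
        # setdefault leaves an existing value untouched.
        uns["neighbors"]["params"].setdefault("n_neighbors", default)
    return uns


missing = ensure_n_neighbors({"neighbors": {"params": {"method": "umap"}}})
present = ensure_n_neighbors({"neighbors": {"params": {"n_neighbors": 30}}})
untouched = ensure_n_neighbors({})
```

Because `setdefault` never overwrites, an object that already records its own `n_neighbors` keeps it.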

biopipen/scripts/scrna_metabolic_landscape/MetabolicFeatures.R CHANGED

@@ -98,7 +98,13 @@ do_comparison <- function(object, caseinfo, subset_by, subset_val, group_by, gro
     }
     classes <- as.character(object@meta.data[[group_by]])
-    classes[classes != group1] <- "_REST"
+    if (!group1 %in% classes) {
+        stop("Group '", group1, "' not found in '", group_by, "' column of the Seurat object.")
+    }
+    if (!is.null(group2) && !group2 %in% classes) {
+        stop("Group '", group2, "' not found in '", group_by, "' column of the Seurat object.")
+    }
+    classes[classes != group1] <- "Other"
     if (any(table(classes) < 5)) {
         msg <- paste0(
             "  ! skipped. Group has less than 5 cells: ",
@@ -266,8 +272,8 @@ do_subset <- function(object, caseinfo, subset_by, subset_val, group_by, compari
             rbind, lapply(
                 as.character(comparisons),
                 function(comparison) {
-                    if (grepl(",", comparison)) {
-                        group1 <- trimws(unlist(strsplit(comparison, ",")))
+                    if (grepl(":", comparison)) {
+                        group1 <- trimws(unlist(strsplit(comparison, ":")))
                         group2 <- group1[2]
                         group1 <- group1[1]
                     } else {
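
The hunks above switch the pair separator in `comparisons` from `,` to `:` and add an early failure when a named group is missing. The Python sketch below shows the parsing and validation, assuming that a bare group name means "this group versus all others", as the docstring describes. The function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def parse_comparison(comparison: str) -> Tuple[str, Optional[str]]:
    """Split 'g1:g2' into (g1, g2); a bare 'g1' yields (g1, None)."""
    if ":" in comparison:
        parts = [p.strip() for p in comparison.split(":")]
        return parts[0], parts[1]
    return comparison, None


def check_groups(group1: str, group2: Optional[str], classes: Iterable[str]) -> None:
    """Fail early when a requested group is absent from the grouping column."""
    available = set(classes)
    for group in (group1, group2):
        if group is not None and group not in available:
            raise ValueError(f"Group '{group}' not found.")


pairs = [parse_comparison(c) for c in ["1:2", "1 : 3", "2"]]
check_groups("1", "2", ["1", "2", "3"])  # passes silently
```

Configurations written for 0.34.2 with `"1,2"` would now be read as a single group name, which the new existence check would reject.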

biopipen/scripts/scrna_metabolic_landscape/MetabolicPathwayActivity.R CHANGED

@@ -315,8 +315,8 @@ do_subset <- function(
             plotargs$keep_empty <- TRUE
             p <- do_call(plotfn, plotargs)
-            devpars$width <- devpars$width %||% (attr(p, "width") * devpars$res) %||% 1000
-            devpars$height <- devpars$height %||% (attr(p, "height") * devpars$res) %||% 1000
+            devpars$width <- devpars$width %||% (attr(p, "width") * 2 * devpars$res) %||% 1000
+            devpars$height <- devpars$height %||% (attr(p, "height") * 2 * devpars$res) %||% 1000
         } else {  # heatmap
             minval <- min(dat)
             maxval <- max(dat)

biopipen/scripts/scrna_metabolic_landscape/MetabolicPathwayHeterogeneity.R CHANGED

@@ -195,6 +195,7 @@ do_subset <- function(object, caseinfo, subset_by, subset_val, group_by, plots,
         plotprefix <- file.path(odir, slugify(plot))
         plotargs$devpars$width <- plotargs$devpars$width %||% (attr(p, "width") * plotargs$devpars$res) %||% 800
         plotargs$devpars$height <- plotargs$devpars$height %||% (attr(p, "height") * plotargs$devpars$res) %||% 600
+        plotargs$devpars$height <- max(plotargs$devpars$height, plotargs$devpars$width / 1.5)
         png(
             filename = paste0(plotprefix, ".png"),
             width = plotargs$devpars$width,
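
The added line caps the width-to-height ratio at 1.5 after the usual fallback chain for the device size. A small Python sketch of that sizing rule follows. `resolve_size`, the `None`-based fallbacks, and the default `res=100` are assumptions made for illustration and do not mirror R's `%||%` exactly.

```python
def resolve_size(width=None, height=None, plot_width=None, plot_height=None, res=100):
    """Pick pixel dimensions: explicit value, else plot hint * res, else a default."""
    if width is None:
        width = plot_width * res if plot_width is not None else 800
    if height is None:
        height = plot_height * res if plot_height is not None else 600
    # Never let the image be more than 1.5 times as wide as it is tall.
    height = max(height, width / 1.5)
    return width, height


wide = resolve_size(plot_width=12, plot_height=4)
fallback = resolve_size()
```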

biopipen/scripts/tcr/GIANA/GIANA4.py CHANGED

@@ -36,9 +36,6 @@ from sklearn.manifold import MDS
 import faiss
 from query import *
 try:
-    from Bio.SubsMat.MatrixInfo import blosum62
-    print(blosum62)
-except ModuleNotFoundError:
     from Bio.Align import substitution_matrices
     blosum62 = substitution_matrices.load("BLOSUM62")
     _tmp = {}
@@ -46,7 +43,8 @@ except ModuleNotFoundError:
         for ab2 in blosum62.alphabet:
             _tmp[(ab1, ab2)] = int(blosum62[(ab1, ab2)])
     blosum62 = _tmp
-    print(blosum62)
+except ModuleNotFoundError:
+    from Bio.SubsMat.MatrixInfo import blosum62
 AAstring = "ACDEFGHIKLMNPQRSTVWY"
 AAstringList = list(AAstring)

biopipen/scripts/tcr/ScRepCombiningExpression.R CHANGED

@@ -34,6 +34,7 @@ obj <- combineExpression(
     cloneSize = unlist(cloneSize),
     addLabel = addLabel
 )
+obj$TCR_Presence <- !is.na(obj$CTaa)
 log$info("Saving combined object ...")
 save_obj(obj, outfile)
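
The new `TCR_Presence` column is simply the negation of "`CTaa` is missing". A toy Python illustration follows. The barcodes and sequences are made up, and `None` stands in for R's `NA`.

```python
cells = [
    {"barcode": "AAAC-1", "CTaa": "CAVRDSNYQLIW_CASSLGQGAYEQYF"},
    {"barcode": "AAAG-1", "CTaa": None},  # no TCR/BCR contig matched this cell
]
for cell in cells:
    # True when the cell carries a clonotype amino-acid sequence.
    cell["TCR_Presence"] = cell["CTaa"] is not None

flags = [cell["TCR_Presence"] for cell in cells]
```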

biopipen/scripts/tcr/ScRepLoading.R CHANGED

@@ -118,8 +118,13 @@ load_contig <- function(input, sample, fmt) {
     fmt <- dirfmt[[2]]
     if (is.null(dir)) { return(NULL) }
     x <- loadContigs(dir, format = fmt %||% "10X")
-    x[[1]]$sample <- NULL
-    x[[1]]
+    x <- x[[1]]
+    x$sample <- NULL
+    if (identical(fmt %||% "10X", "10X") && colnames(x)[1] == "X") {
+        x$X <- NULL
+    }
+    x
 }

biopipen/scripts/tcr/TCRClustering.R CHANGED

@@ -130,11 +130,10 @@ output.clusters_df.to_csv(clustcr_dir + "/clusters.txt", sep="\t", index=False)
     clustcr_file
 }
-clean_clustcr_output = function(clustcr_outfile, clustcr_input) {
+clean_clustcr_output = function(clustcr_outfile) {
     clustcr_out = read.delim2(clustcr_outfile, header=TRUE, row.names = NULL)
     colnames(clustcr_out) = c("CDR3.aa", "TCR_Cluster")
-    in_cdr3 = read.delim2(clustcr_input, header=TRUE, row.names = NULL)
-    out = left_join(in_cdr3, distinct(clustcr_out), by=c("CDR3.aa")) %>%
+    out = left_join(cdr3aa_df, distinct(clustcr_out), by=c(cdr3seq4clustering = "CDR3.aa")) %>%
         mutate(
             TCR_Cluster = if_else(
                 is.na(TCR_Cluster),
@@ -170,7 +169,7 @@ run_clustcr = function() {
         quit(status=rc)
     }
     clustcr_outfile = file.path(clustcr_dir, "clusters.txt")
-    clean_clustcr_output(clustcr_outfile, clustcr_input)
+    clean_clustcr_output(clustcr_outfile)
 }
 prepare_giana = function() {
@@ -193,21 +192,8 @@ prepare_giana = function() {
 }
 prepare_input = function() {
-    # prepare input file for GIANA
-    cdr3 = c()
-    # cdr3col = if (!on_multi) "cdr3" else "CDR3.aa"
-    cdr3col = "CDR3.aa"
-    for (sample in names(seqdata)) {
-        sdata = seqdata[[sample]]
-        if (on_multi) {
-            sdata[[cdr3col]] = sub(";", "", sdata[[cdr3col]])
-        } else if ("chain" %in% colnames(sdata)) {
-            sdata = sdata %>% separate_rows(chain, cdr3col, sep = ";") %>%
-                filter(chain == "TRB")
-        }
-        cdr3 = union(cdr3, unique(sdata[[cdr3col]]))
-    }
-    cdr3 = unique(cdr3)
+    cdr3aa_df$cdr3seq4clustering <<- gsub("[^A-Z]", "", cdr3aa_df$CDR3.aa)  # Remove non-amino acid characters
+    cdr3 <- unique(cdr3aa_df$cdr3seq4clustering)
     # cdr3 = distinct(cdr3, aminoAcid, vMaxResolved)
@@ -220,15 +206,14 @@ prepare_input = function() {
     cdr3file
 }
-clean_giana_output = function(giana_outfile, giana_infile) {
+clean_giana_output = function(giana_outfile) {
     # generate an output file with columns:
     # CDR3.aa, TCR_Cluster, V.name, Sample
     # If sequence doesn't exist in the input file,
     # Then a unique cluster id is assigned to it.
     giana_out = read.delim2(giana_outfile, header=FALSE, comment.char = "#", row.names = NULL)[, 1:2, drop=FALSE]
     colnames(giana_out) = c("CDR3.aa", "TCR_Cluster")
-    in_cdr3 = read.delim2(giana_infile, header=TRUE, row.names = NULL)
-    out = left_join(in_cdr3, distinct(giana_out), by=c("CDR3.aa")) %>%
+    out = left_join(cdr3aa_df, distinct(giana_out), by=c(cdr3seq4clustering = "CDR3.aa")) %>%
         mutate(
             TCR_Cluster = if_else(
                 is.na(TCR_Cluster),
@@ -283,10 +268,11 @@ run_giana = function() {
         quit(status=rc)
     }
     giana_outfile = file.path(giana_outdir, "cdr3--RotationEncodingBL62.txt")
-    clean_giana_output(giana_outfile, giana_input)
+    clean_giana_output(giana_outfile)
 }
 attach_to_obj = function(obj, out) {
+    out <- as.data.frame(out)
     rownames(out) <- out$Barcode
     if (is_seurat) {
         # Attach results to Seurat object
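
The TCRClustering hunks stop re-reading the clustering input file. They derive a cleaned sequence column (`cdr3seq4clustering`) from `CDR3.aa` and join the tool's output back on it. The Python sketch below follows that flow on toy data. The `S_<n>` naming for unclustered sequences is a hypothetical placeholder, since the actual fallback expression is cut off in the hunk.

```python
import re
from itertools import count


def clean_cdr3(seq: str) -> str:
    """Drop every character that is not an uppercase letter (e.g. ';' or '_')."""
    return re.sub(r"[^A-Z]", "", seq)


def assign_clusters(cdr3_values, clusters_by_seq):
    """Map each original CDR3 string to a cluster via its cleaned sequence."""
    fresh = count(1)
    singleton_ids = {}
    out = {}
    for raw in cdr3_values:
        key = clean_cdr3(raw)
        if key in clusters_by_seq:
            out[raw] = clusters_by_seq[key]
        else:
            # Hypothetical naming scheme for sequences the tool did not cluster.
            if key not in singleton_ids:
                singleton_ids[key] = f"S_{next(fresh)}"
            out[raw] = singleton_ids[key]
    return out


result = assign_clusters(
    ["CASSLG;CAVRD", "CASSLGCAVRD", "CASSPT"],
    {"CASSLGCAVRD": "0"},
)
```

Joining on the cleaned key means that two raw strings differing only in separators end up in the same cluster.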

biopipen/scripts/tcr/TESSA.R CHANGED

@@ -39,9 +39,11 @@ log$info("Preparing TCR input file ...")
 # If immfile endswith .rds, then it is an immunarch object
 tcrdata <- sobj@meta.data %>%
     rownames_to_column("contig_id") %>%
+    select(contig_id, CTaa, CTgene, sample = Sample) %>%
     filter(!is.na(CTaa) & !is.na(CTgene)) %>%
-    separate(CTaa, into = c(NA, "cdr3"), sep = "_", remove = FALSE) %>%
-    separate(CTgene, into = c(NA, "vjgene"), sep = "_", remove = FALSE) %>%
+    separate(CTaa, into = c(NA, "cdr3"), sep = "_", remove = TRUE) %>%
+    filter(!is.na(cdr3) & cdr3 != "NA" & cdr3 != "nan") %>%
+    separate(CTgene, into = c(NA, "vjgene"), sep = "_", remove = TRUE) %>%
     separate(vjgene, into = c("v_gene", NA, "j_gene", NA), sep = "\\.", remove = TRUE) %>%
     mutate(v_gene = sub("-\\d+$", "", v_gene), j_gene = sub("-\\d+$", "", j_gene))
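
The TESSA hunk keeps the second `_`-separated part of `CTaa` and `CTgene`, drops rows whose CDR3 is missing or is the literal `NA`/`nan`, and strips the trailing `-<digits>` from the V and J gene names. A Python sketch of the per-row logic follows. The example strings are invented, `parse_clonotype` is a hypothetical name, and, unlike `tidyr::separate()`, it returns `None` instead of filling short rows with `NA`.

```python
import re


def parse_clonotype(ctaa, ctgene):
    """Return (cdr3, v_gene, j_gene) from the second '_'-separated part, or None."""
    if ctaa is None or ctgene is None:
        return None
    aa_parts = ctaa.split("_")
    gene_parts = ctgene.split("_")
    if len(aa_parts) < 2 or len(gene_parts) < 2:
        return None
    cdr3 = aa_parts[1]
    if cdr3 in ("NA", "nan"):
        return None  # mirrors the added filter on cdr3
    segments = gene_parts[1].split(".")  # expected order: V, D, J, C
    if len(segments) < 3:
        return None
    v_gene = re.sub(r"-\d+$", "", segments[0])
    j_gene = re.sub(r"-\d+$", "", segments[2])
    return cdr3, v_gene, j_gene


genes = "TRAV1.TRAJ2.TRAC_TRBV7-2.TRBD1.TRBJ2-1.TRBC2"
kept = parse_clonotype("CAVRD_CASSLG", genes)
dropped = parse_clonotype("CAVRD_NA", genes)
```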

{biopipen-0.34.2.dist-info → biopipen-0.34.3.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.3
 Name: biopipen
-Version: 0.34.2
+Version: 0.34.3
 Summary: Bioinformatics processes/pipelines that can be run from `pipen run`
 License: MIT
 Author: pwwang
