DeConveil 0.1.0__tar.gz → 0.1.1__tar.gz
- {deconveil-0.1.0 → deconveil-0.1.1}/DeConveil.egg-info/PKG-INFO +3 -2
- deconveil-0.1.1/DeConveil.egg-info/SOURCES.txt +18 -0
- deconveil-0.1.1/DeConveil.egg-info/top_level.txt +1 -0
- {deconveil-0.1.0 → deconveil-0.1.1}/PKG-INFO +3 -2
- deconveil-0.1.1/README.md +64 -0
- {deconveil-0.1.0/DeConveil → deconveil-0.1.1/deconveil}/dds.py +6 -10
- {deconveil-0.1.0/DeConveil → deconveil-0.1.1/deconveil}/default_inference.py +9 -17
- {deconveil-0.1.0/DeConveil → deconveil-0.1.1/deconveil}/ds.py +1 -3
- {deconveil-0.1.0/DeConveil → deconveil-0.1.1/deconveil}/grid_search.py +2 -2
- {deconveil-0.1.0/DeConveil → deconveil-0.1.1/deconveil}/inference.py +2 -9
- deconveil-0.1.1/deconveil/utils_clustering.py +201 -0
- deconveil-0.1.0/DeConveil/utils_CNaware.py → deconveil-0.1.1/deconveil/utils_fit.py +3 -287
- deconveil-0.1.1/deconveil/utils_plot.py +308 -0
- deconveil-0.1.1/deconveil/utils_processing.py +132 -0
- {deconveil-0.1.0 → deconveil-0.1.1}/setup.py +2 -2
- deconveil-0.1.0/DeConveil.egg-info/SOURCES.txt +0 -15
- deconveil-0.1.0/DeConveil.egg-info/top_level.txt +0 -1
- deconveil-0.1.0/README.md +0 -40
- {deconveil-0.1.0 → deconveil-0.1.1}/DeConveil.egg-info/dependency_links.txt +0 -0
- {deconveil-0.1.0 → deconveil-0.1.1}/DeConveil.egg-info/requires.txt +0 -0
- {deconveil-0.1.0 → deconveil-0.1.1}/LICENSE +0 -0
- {deconveil-0.1.0/DeConveil → deconveil-0.1.1/deconveil}/__init__.py +0 -0
- {deconveil-0.1.0 → deconveil-0.1.1}/setup.cfg +0 -0
|
@@ -1,6 +1,6 @@
|
|
|
1
|
-
Metadata-Version: 2.
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
2
|
Name: DeConveil
|
|
3
|
-
Version: 0.1.
|
|
3
|
+
Version: 0.1.1
|
|
4
4
|
Summary: An extension of PyDESeq2/DESeq2 designed to account for genome aneuploidy
|
|
5
5
|
Home-page: https://github.com/caravagnalab/DeConveil
|
|
6
6
|
Author: Katsiaryna Davydzenka
|
|
@@ -29,6 +29,7 @@ Dynamic: author
|
|
|
29
29
|
Dynamic: author-email
|
|
30
30
|
Dynamic: home-page
|
|
31
31
|
Dynamic: license
|
|
32
|
+
Dynamic: license-file
|
|
32
33
|
Dynamic: provides-extra
|
|
33
34
|
Dynamic: requires-dist
|
|
34
35
|
Dynamic: requires-python
|
|
@@ -0,0 +1,18 @@
|
|
|
1
|
+
LICENSE
|
|
2
|
+
README.md
|
|
3
|
+
setup.py
|
|
4
|
+
DeConveil.egg-info/PKG-INFO
|
|
5
|
+
DeConveil.egg-info/SOURCES.txt
|
|
6
|
+
DeConveil.egg-info/dependency_links.txt
|
|
7
|
+
DeConveil.egg-info/requires.txt
|
|
8
|
+
DeConveil.egg-info/top_level.txt
|
|
9
|
+
deconveil/__init__.py
|
|
10
|
+
deconveil/dds.py
|
|
11
|
+
deconveil/default_inference.py
|
|
12
|
+
deconveil/ds.py
|
|
13
|
+
deconveil/grid_search.py
|
|
14
|
+
deconveil/inference.py
|
|
15
|
+
deconveil/utils_clustering.py
|
|
16
|
+
deconveil/utils_fit.py
|
|
17
|
+
deconveil/utils_plot.py
|
|
18
|
+
deconveil/utils_processing.py
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
deconveil
|
|
@@ -1,6 +1,6 @@
|
|
|
1
|
-
Metadata-Version: 2.
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
2
|
Name: DeConveil
|
|
3
|
-
Version: 0.1.
|
|
3
|
+
Version: 0.1.1
|
|
4
4
|
Summary: An extension of PyDESeq2/DESeq2 designed to account for genome aneuploidy
|
|
5
5
|
Home-page: https://github.com/caravagnalab/DeConveil
|
|
6
6
|
Author: Katsiaryna Davydzenka
|
|
@@ -29,6 +29,7 @@ Dynamic: author
|
|
|
29
29
|
Dynamic: author-email
|
|
30
30
|
Dynamic: home-page
|
|
31
31
|
Dynamic: license
|
|
32
|
+
Dynamic: license-file
|
|
32
33
|
Dynamic: provides-extra
|
|
33
34
|
Dynamic: requires-dist
|
|
34
35
|
Dynamic: requires-python
|
|
@@ -0,0 +1,64 @@
|
|
|
1
|
+
# DeConveil <a href="caravagnalab.github.io/DeConveil"><img src="docs/deconveil_logo.png" align="right" height="150" /></a>
|
|
2
|
+
|
|
3
|
+
The goal of *DeConveil* is the extension of Differential Gene Expression testing by accounting for genome aneuploidy.
|
|
4
|
+
This computational framework extends traditional DGE analysis by integrating DNA Copy Number Variation (CNV) data.
|
|
5
|
+
This approach adjusts for dosage effects and categorizes genes as *dosage-sensitive (DSG)*, *dosage-insensitive (DIG)*, and *dosage-compensated (DCG)*, separating the expression changes caused by CNVs from other alterations in transcriptional regulation.
|
|
6
|
+
To perform this gene separation we need to carry out DGE testing using both *PyDESeq2 (CN-naive)* and *DeConveil (CN-aware)* methods.
|
|
7
|
+
|
|
8
|
+
You can download the results of our analysis from [deconveilCaseStudies](https://github.com/kdavydzenka/deconveilCaseStudies)
|
|
9
|
+
|
|
10
|
+
|
|
11
|
+
### Installation
|
|
12
|
+
|
|
13
|
+
**Pre-required installations before running DeConveil**
|
|
14
|
+
|
|
15
|
+
Python libraries are required to be installed: *pydeseq2*
|
|
16
|
+
|
|
17
|
+
`pip install pydeseq2`
|
|
18
|
+
|
|
19
|
+
`pip install DeConveil`
|
|
20
|
+
|
|
21
|
+
or `git clone https://github.com/caravagnalab/DeConveil.git`
|
|
22
|
+
|
|
23
|
+
|
|
24
|
+
**Input data**
|
|
25
|
+
|
|
26
|
+
DeConveil requires the following input matrices:
|
|
27
|
+
|
|
28
|
+
- matched mRNA read counts (normal and tumor samples) and absolute CN values (for normal diploid samples we assign CN=2), structured as NxG matrix, where N represents the number of samples and G represents the number of genes;
|
|
29
|
+
|
|
30
|
+
- a design matrix structured as an N × F matrix, where N is the number of samples and F is the number of features or covariates.
|
|
31
|
+
|
|
32
|
+
Example of CN data for a given gene *g*:
|
|
33
|
+
CN = [1, 2, 3, 4, 5, 6].
|
|
34
|
+
|
|
35
|
+
An example of the input data can be found in the *test_deconveil* Jupyter Notebook.
|
|
36
|
+
|
|
37
|
+
|
|
38
|
+
**Output data**
|
|
39
|
+
|
|
40
|
+
`res_CNnaive.csv` (for *PyDESeq2* method) and `res_CNaware.csv` (for *DeConveil*) data frames reporting *log2FC* and *p.adjust* values for both methods.
|
|
41
|
+
|
|
42
|
+
These data frames are further processed to separate gene groups using the `define_gene_groups()` function included in the DeConveil framework.
|
|
43
|
+
|
|
44
|
+
A tutorial of the analysis workflow is available in `test_deconveil.ipynb`
|
|
45
|
+
|
|
46
|
+
|
|
47
|
+
#### Citation
|
|
48
|
+
|
|
49
|
+
[](https://doi.org/10.1101/2025.03.29.646108)
|
|
50
|
+
|
|
51
|
+
If you use `DeConveil`, cite:
|
|
52
|
+
|
|
53
|
+
K. Davydzenka, G. Caravagna, G. Sanguinetti. Extending differential gene expression testing to handle genome aneuploidy in cancer. [bioRxiv preprint](https://doi.org/10.1101/2025.03.29.646108), 2025.
|
|
54
|
+
|
|
55
|
+
|
|
56
|
+
#### Copyright and contacts
|
|
57
|
+
|
|
58
|
+
Katsiaryna Davydzenka, Cancer Data Science (CDS) Laboratory.
|
|
59
|
+
|
|
60
|
+
[](https://github.com/caravagnalab)
|
|
61
|
+
[](https://www.caravagnalab.org/)
|
|
62
|
+
|
|
63
|
+
|
|
64
|
+
|
|
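The gene grouping described in the README hunk above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the package's implementation: `classify_gene` is a hypothetical helper, and the label strings (`"Up-reg"`, `"Down-reg"`, `"n.s."`) are taken from the 0.1.0 `define_gene_groups()` code that is removed elsewhere in this diff. Combinations that function does not cover (for example, opposite directions in the two methods) are returned as `None` here.

```python
from typing import Optional

# Hypothetical helper for illustration only; the labels mirror the strings
# used by the 0.1.0 define_gene_groups() shown elsewhere in this diff.
def classify_gene(de_naive: str, de_aware: str) -> Optional[str]:
    """Map a (CN-naive, CN-aware) pair of DE calls to a gene category."""
    significant = {"Up-reg", "Down-reg"}
    if de_naive == "n.s." and de_aware == "n.s.":
        return "non-DEGs"
    if de_naive in significant and de_aware == "n.s.":
        return "DSGs"  # DE call disappears once copy number is modelled
    if de_naive == "n.s." and de_aware in significant:
        return "DCGs"  # DE call appears only once copy number is modelled
    if de_naive in significant and de_naive == de_aware:
        return "DIGs"  # same DE call with and without copy number
    return None  # e.g. opposite directions: not assigned to any group


# Made-up calls for five genes: (CN-naive result, CN-aware result).
calls = {
    "geneA": ("Up-reg", "n.s."),
    "geneB": ("Down-reg", "Down-reg"),
    "geneC": ("n.s.", "Up-reg"),
    "geneD": ("n.s.", "n.s."),
    "geneE": ("Up-reg", "Down-reg"),
}
groups = {gene: classify_gene(*pair) for gene, pair in calls.items()}
print(groups)
```

The gene names and calls are invented for the example; in the package, the same comparison is applied to the joined `res_CNnaive.csv` and `res_CNaware.csv` result tables.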
@@ -1,11 +1,7 @@
|
|
|
1
1
|
import sys
|
|
2
2
|
import time
|
|
3
3
|
import warnings
|
|
4
|
-
from typing import List
|
|
5
|
-
from typing import Literal
|
|
6
|
-
from typing import Optional
|
|
7
|
-
from typing import Union
|
|
8
|
-
from typing import cast
|
|
4
|
+
from typing import List, Literal, Optional, Union, cast
|
|
9
5
|
|
|
10
6
|
import numpy as np
|
|
11
7
|
import pandas as pd
|
|
@@ -16,11 +12,11 @@ from scipy.stats import trim_mean # type: ignore
|
|
|
16
12
|
|
|
17
13
|
from deconveil.default_inference import DefInference
|
|
18
14
|
from deconveil.inference import Inference
|
|
19
|
-
from deconveil import
|
|
20
|
-
from deconveil.
|
|
21
|
-
from deconveil.
|
|
22
|
-
from deconveil.
|
|
23
|
-
from deconveil.
|
|
15
|
+
from deconveil import utils_fit
|
|
16
|
+
from deconveil.utils_fit import fit_rough_dispersions
|
|
17
|
+
from deconveil.utils_fit import fit_moments_dispersions2
|
|
18
|
+
from deconveil.utils_fit import grid_fit_beta
|
|
19
|
+
from deconveil.utils_fit import irls_glm
|
|
24
20
|
|
|
25
21
|
from pydeseq2.preprocessing import deseq2_norm_fit
|
|
26
22
|
from pydeseq2.preprocessing import deseq2_norm_transform
|
|
@@ -1,17 +1,13 @@
|
|
|
1
|
-
from typing import Literal
|
|
2
|
-
from typing import Optional
|
|
3
|
-
from typing import Tuple
|
|
1
|
+
from typing import Literal, Optional, Tuple
|
|
4
2
|
|
|
5
3
|
import numpy as np
|
|
6
4
|
import pandas as pd
|
|
7
|
-
from joblib import Parallel # type: ignore
|
|
8
|
-
from joblib import delayed
|
|
9
|
-
from joblib import parallel_backend
|
|
5
|
+
from joblib import Parallel, delayed, parallel_backend # type: ignore
|
|
10
6
|
from scipy.optimize import minimize # type: ignore
|
|
11
7
|
|
|
12
8
|
from deconveil import inference
|
|
13
|
-
from deconveil import
|
|
14
|
-
from deconveil.
|
|
9
|
+
from deconveil import utils_fit
|
|
10
|
+
from deconveil.utils_fit import fit_lin_mu
|
|
15
11
|
|
|
16
12
|
from pydeseq2 import utils
|
|
17
13
|
from pydeseq2.utils import get_num_processes
|
|
@@ -42,8 +38,8 @@ class DefInference(inference.Inference):
|
|
|
42
38
|
Joblib backend.
|
|
43
39
|
"""
|
|
44
40
|
|
|
45
|
-
fit_rough_dispersions = staticmethod(
|
|
46
|
-
fit_moments_dispersions2 = staticmethod(
|
|
41
|
+
fit_rough_dispersions = staticmethod(utils_fit.fit_rough_dispersions) # type: ignore
|
|
42
|
+
fit_moments_dispersions2 = staticmethod(utils_fit.fit_moments_dispersions2) # type: ignore
|
|
47
43
|
|
|
48
44
|
def __init__(
|
|
49
45
|
self,
|
|
@@ -79,7 +75,7 @@ class DefInference(inference.Inference):
|
|
|
79
75
|
verbose=self._joblib_verbosity,
|
|
80
76
|
batch_size=self._batch_size,
|
|
81
77
|
)(
|
|
82
|
-
delayed(
|
|
78
|
+
delayed(utils_fit.fit_lin_mu)(
|
|
83
79
|
counts=counts[:, i],
|
|
84
80
|
size_factors=size_factors,
|
|
85
81
|
design_matrix=design_matrix,
|
|
@@ -110,7 +106,7 @@ class DefInference(inference.Inference):
|
|
|
110
106
|
verbose=self._joblib_verbosity,
|
|
111
107
|
batch_size=self._batch_size,
|
|
112
108
|
)(
|
|
113
|
-
delayed(
|
|
109
|
+
delayed(utils_fit.irls_glm)(
|
|
114
110
|
counts=counts[:, i],
|
|
115
111
|
size_factors=size_factors,
|
|
116
112
|
design_matrix=design_matrix,
|
|
@@ -262,7 +258,7 @@ class DefInference(inference.Inference):
|
|
|
262
258
|
verbose=self._joblib_verbosity,
|
|
263
259
|
batch_size=self._batch_size,
|
|
264
260
|
)(
|
|
265
|
-
delayed(
|
|
261
|
+
delayed(utils_fit.nbinomGLM)(
|
|
266
262
|
design_matrix=design_matrix,
|
|
267
263
|
counts=counts[:, i],
|
|
268
264
|
cnv=cnv[:, i],
|
|
@@ -278,7 +274,3 @@ class DefInference(inference.Inference):
|
|
|
278
274
|
res = zip(*res)
|
|
279
275
|
lfcs, inv_hessians, l_bfgs_b_converged_ = (np.array(m) for m in res)
|
|
280
276
|
return lfcs, inv_hessians, l_bfgs_b_converged_
|
|
281
|
-
|
|
282
|
-
|
|
283
|
-
|
|
284
|
-
|
|
@@ -3,7 +3,7 @@ from typing import Optional
|
|
|
3
3
|
import numpy as np
|
|
4
4
|
from scipy.special import gammaln # type: ignore
|
|
5
5
|
|
|
6
|
-
from deconveil import
|
|
6
|
+
from deconveil import utils_fit
|
|
7
7
|
|
|
8
8
|
|
|
9
9
|
def grid_fit_beta(
|
|
@@ -156,7 +156,7 @@ def grid_fit_shrink_beta(
|
|
|
156
156
|
def loss(beta: np.ndarray) -> float:
|
|
157
157
|
# closure to minimize
|
|
158
158
|
return (
|
|
159
|
-
|
|
159
|
+
utils_fit.nbinomFn(
|
|
160
160
|
beta,
|
|
161
161
|
design_matrix,
|
|
162
162
|
counts,
|
|
@@ -1,8 +1,6 @@
|
|
|
1
1
|
from abc import ABC
|
|
2
2
|
from abc import abstractmethod
|
|
3
|
-
from typing import Literal
|
|
4
|
-
from typing import Optional
|
|
5
|
-
from typing import Tuple
|
|
3
|
+
from typing import Literal, Optional, Tuple
|
|
6
4
|
|
|
7
5
|
import numpy as np
|
|
8
6
|
import pandas as pd
|
|
@@ -365,9 +363,4 @@ class Inference(ABC):
|
|
|
365
363
|
converged: ndarray
|
|
366
364
|
Whether L-BFGS-B converged for each optimization problem.
|
|
367
365
|
"""
|
|
368
|
-
|
|
369
|
-
|
|
370
|
-
|
|
371
|
-
|
|
372
|
-
|
|
373
|
-
|
|
366
|
+
|
|
@@ -0,0 +1,201 @@
|
|
|
1
|
+
import numpy as np
|
|
2
|
+
import pandas as pd
|
|
3
|
+
from sklearn.decomposition import PCA
|
|
4
|
+
from sklearn.cluster import KMeans, AgglomerativeClustering
|
|
5
|
+
from sklearn.metrics import silhouette_score
|
|
6
|
+
from scipy.spatial.distance import pdist, squareform
|
|
7
|
+
import random
|
|
8
|
+
|
|
9
|
+
import matplotlib.pyplot as plt
|
|
10
|
+
import seaborn as sns
|
|
11
|
+
|
|
12
|
+
|
|
13
|
+
def pca_cluster_cn(
|
|
14
|
+
gene_cn: pd.DataFrame,
|
|
15
|
+
n_components: int = 20,
|
|
16
|
+
k: int = 2,
|
|
17
|
+
method: str = "kmeans",
|
|
18
|
+
random_state: int = 0,
|
|
19
|
+
) -> dict:
|
|
20
|
+
"""
|
|
21
|
+
Perform PCA on gene-level CN and cluster patients in PCA space.
|
|
22
|
+
|
|
23
|
+
Parameters
|
|
24
|
+
----------
|
|
25
|
+
gene_cn : DataFrame
|
|
26
|
+
Gene x Sample matrix of CN values (log2 ratios).
|
|
27
|
+
n_components : int
|
|
28
|
+
Number of PCA components to keep.
|
|
29
|
+
k : int
|
|
30
|
+
Number of clusters.
|
|
31
|
+
method : str
|
|
32
|
+
'kmeans' or 'hierarchical'.
|
|
33
|
+
random_state : int
|
|
34
|
+
For reproducibility.
|
|
35
|
+
|
|
36
|
+
Returns
|
|
37
|
+
-------
|
|
38
|
+
dict with:
|
|
39
|
+
- labels: pd.Series (sample -> cluster)
|
|
40
|
+
- pca_coords: DataFrame of PCA coords
|
|
41
|
+
- explained_var: explained variance ratios
|
|
42
|
+
"""
|
|
43
|
+
X = gene_cn.fillna(0).T # samples × genes
|
|
44
|
+
pca = PCA(n_components=min(n_components, X.shape[1]))
|
|
45
|
+
coords = pca.fit_transform(X)
|
|
46
|
+
coords_df = pd.DataFrame(
|
|
47
|
+
coords, index=X.index, columns=[f"PC{i+1}" for i in range(coords.shape[1])]
|
|
48
|
+
)
|
|
49
|
+
|
|
50
|
+
if method == "kmeans":
|
|
51
|
+
model = KMeans(n_clusters=k, n_init=20, random_state=random_state)
|
|
52
|
+
labels = model.fit_predict(coords)
|
|
53
|
+
elif method == "hierarchical":
|
|
54
|
+
model = AgglomerativeClustering(n_clusters=k)
|
|
55
|
+
labels = model.fit_predict(coords)
|
|
56
|
+
else:
|
|
57
|
+
raise ValueError("method must be 'kmeans' or 'hierarchical'")
|
|
58
|
+
|
|
59
|
+
labels = pd.Series(labels, index=X.index, name="cluster")
|
|
60
|
+
return {
|
|
61
|
+
"labels": labels,
|
|
62
|
+
"pca_coords": coords_df,
|
|
63
|
+
"explained_var": pca.explained_variance_ratio_,
|
|
64
|
+
}
|
|
65
|
+
|
|
66
|
+
|
|
67
|
+
def consensus_cluster_cn(
|
|
68
|
+
gene_cn: pd.DataFrame,
|
|
69
|
+
k: int = 2,
|
|
70
|
+
n_resamples: int = 50,
|
|
71
|
+
sample_fraction: float = 0.8,
|
|
72
|
+
feature_fraction: float = 0.8,
|
|
73
|
+
top_genes: int = 2000,
|
|
74
|
+
random_state: int = 0,
|
|
75
|
+
) -> dict:
|
|
76
|
+
"""
|
|
77
|
+
Consensus clustering of patients based on CN profiles.
|
|
78
|
+
|
|
79
|
+
Parameters
|
|
80
|
+
----------
|
|
81
|
+
gene_cn : DataFrame
|
|
82
|
+
Gene x Sample CN matrix.
|
|
83
|
+
k : int
|
|
84
|
+
Number of clusters.
|
|
85
|
+
n_resamples : int
|
|
86
|
+
Number of resampling iterations.
|
|
87
|
+
sample_fraction : float
|
|
88
|
+
Fraction of patients sampled each iteration.
|
|
89
|
+
feature_fraction : float
|
|
90
|
+
Fraction of genes sampled each iteration.
|
|
91
|
+
top_genes : int
|
|
92
|
+
Use top variable genes only.
|
|
93
|
+
random_state : int
|
|
94
|
+
For reproducibility.
|
|
95
|
+
|
|
96
|
+
Returns
|
|
97
|
+
-------
|
|
98
|
+
dict with:
|
|
99
|
+
- labels: pd.Series (sample -> cluster) from consensus
|
|
100
|
+
- consensus_matrix: DataFrame (samples × samples) with co-clustering frequencies
|
|
101
|
+
"""
|
|
102
|
+
rng = np.random.RandomState(random_state)
|
|
103
|
+
|
|
104
|
+
# Select top variable genes
|
|
105
|
+
var_genes = gene_cn.var(axis=1).sort_values(ascending=False).index[:top_genes]
|
|
106
|
+
data = gene_cn.loc[var_genes].fillna(0).values # genes × samples
|
|
107
|
+
samples = gene_cn.columns.tolist()
|
|
108
|
+
n = len(samples)
|
|
109
|
+
|
|
110
|
+
co_mat = np.zeros((n, n))
|
|
111
|
+
counts = np.zeros((n, n))
|
|
112
|
+
|
|
113
|
+
for r in range(n_resamples):
|
|
114
|
+
samp_idx = rng.choice(n, size=int(sample_fraction * n), replace=False)
|
|
115
|
+
feat_idx = rng.choice(
|
|
116
|
+
data.shape[0], size=int(feature_fraction * data.shape[0]), replace=False
|
|
117
|
+
)
|
|
118
|
+
X = data[feat_idx][:, samp_idx].T # subsampled patients × genes
|
|
119
|
+
|
|
120
|
+
# k-means in subsample
|
|
121
|
+
km = KMeans(n_clusters=k, n_init=10, random_state=rng).fit(X)
|
|
122
|
+
labels_sub = km.labels_
|
|
123
|
+
|
|
124
|
+
# update co-occurrence
|
|
125
|
+
for i, si in enumerate(samp_idx):
|
|
126
|
+
for j, sj in enumerate(samp_idx):
|
|
127
|
+
counts[si, sj] += 1
|
|
128
|
+
if labels_sub[i] == labels_sub[j]:
|
|
129
|
+
co_mat[si, sj] += 1
|
|
130
|
+
|
|
131
|
+
consensus = np.divide(co_mat, counts, out=np.zeros_like(co_mat), where=counts > 0)
|
|
132
|
+
consensus_df = pd.DataFrame(consensus, index=samples, columns=samples)
|
|
133
|
+
|
|
134
|
+
# Cluster consensus matrix
|
|
135
|
+
dist = 1 - consensus
|
|
136
|
+
agg = AgglomerativeClustering(n_clusters=k, affinity="precomputed", linkage="average")
|
|
137
|
+
labels = agg.fit_predict(dist)
|
|
138
|
+
labels = pd.Series(labels, index=samples, name="cluster")
|
|
139
|
+
|
|
140
|
+
return {"labels": labels, "consensus_matrix": consensus_df}
|
|
141
|
+
|
|
142
|
+
|
|
143
|
+
def consensus_cdf_range(
|
|
144
|
+
gene_cn, k_values=(2,3,4,5,6),
|
|
145
|
+
n_resamples=50, sample_fraction=0.8, feature_fraction=0.8,
|
|
146
|
+
top_genes=2000, random_state=0
|
|
147
|
+
):
|
|
148
|
+
"""
|
|
149
|
+
Run consensus clustering across multiple k and plot CDFs.
|
|
150
|
+
|
|
151
|
+
Parameters
|
|
152
|
+
----------
|
|
153
|
+
gene_cn : DataFrame
|
|
154
|
+
Gene × Sample CN matrix.
|
|
155
|
+
k_values : list/tuple
|
|
156
|
+
Range of k to test.
|
|
157
|
+
n_resamples, sample_fraction, feature_fraction, top_genes, random_state
|
|
158
|
+
Passed to consensus_cluster_cn().
|
|
159
|
+
|
|
160
|
+
Returns
|
|
161
|
+
-------
|
|
162
|
+
dict
|
|
163
|
+
{k: {"labels", "consensus_matrix", "auc"}}
|
|
164
|
+
"""
|
|
165
|
+
results = {}
|
|
166
|
+
|
|
167
|
+
plt.figure(figsize=(7,5))
|
|
168
|
+
|
|
169
|
+
for k in k_values:
|
|
170
|
+
res = consensus_cluster_cn(
|
|
171
|
+
gene_cn, k=k,
|
|
172
|
+
n_resamples=n_resamples,
|
|
173
|
+
sample_fraction=sample_fraction,
|
|
174
|
+
feature_fraction=feature_fraction,
|
|
175
|
+
top_genes=top_genes,
|
|
176
|
+
random_state=random_state
|
|
177
|
+
)
|
|
178
|
+
|
|
179
|
+
mat = res["consensus_matrix"].values
|
|
180
|
+
mask = ~np.eye(mat.shape[0], dtype=bool)
|
|
181
|
+
vals = mat[mask]
|
|
182
|
+
|
|
183
|
+
sorted_vals = np.sort(vals)
|
|
184
|
+
cdf = np.arange(1, len(sorted_vals)+1) / len(sorted_vals)
|
|
185
|
+
|
|
186
|
+
# Compute area under CDF (AUC)
|
|
187
|
+
auc = np.trapz(cdf, sorted_vals)
|
|
188
|
+
res["auc"] = auc
|
|
189
|
+
results[k] = res
|
|
190
|
+
|
|
191
|
+
plt.plot(sorted_vals, cdf, lw=2, label=f"k={k} (AUC={auc:.3f})")
|
|
192
|
+
|
|
193
|
+
plt.xlabel("Consensus value")
|
|
194
|
+
plt.ylabel("Cumulative fraction")
|
|
195
|
+
plt.title("Consensus CDF across k", fontsize=14)
|
|
196
|
+
plt.legend()
|
|
197
|
+
plt.grid(True, alpha=0.3)
|
|
198
|
+
plt.tight_layout()
|
|
199
|
+
plt.show()
|
|
200
|
+
|
|
201
|
+
return results
|
|
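To make the CDF and area computation in `consensus_cdf_range` concrete, here is a dependency-free sketch of the same arithmetic: collect the off-diagonal consensus values, sort them, build the empirical CDF, and integrate it with the trapezoidal rule. The function name `consensus_cdf_auc` and the toy matrix are assumptions made for this example; the hunk above does the equivalent with NumPy arrays.

```python
def consensus_cdf_auc(consensus):
    """Empirical CDF of off-diagonal consensus values and its trapezoidal area.

    `consensus` is a square list of lists holding co-clustering
    frequencies in [0, 1] (hypothetical input format for this sketch).
    """
    n = len(consensus)
    # Drop the diagonal: a sample always co-clusters with itself.
    vals = sorted(consensus[i][j] for i in range(n) for j in range(n) if i != j)
    m = len(vals)
    cdf = [(idx + 1) / m for idx in range(m)]
    # Trapezoidal rule with the sorted values on x and the CDF on y.
    auc = sum(
        (vals[i + 1] - vals[i]) * (cdf[i] + cdf[i + 1]) / 2 for i in range(m - 1)
    )
    return vals, cdf, auc


# Toy case: four samples that always split cleanly into two pairs.
toy = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
vals, cdf, auc = consensus_cdf_auc(toy)
print(f"AUC = {auc:.4f}")
```

Two version notes may be worth checking against the installed libraries: NumPy 2.0 introduced `np.trapezoid` and deprecated `np.trapz`, and recent scikit-learn releases name the `AgglomerativeClustering` distance argument `metric` rather than `affinity`.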
@@ -1,30 +1,20 @@
|
|
|
1
1
|
import os
|
|
2
2
|
import multiprocessing
|
|
3
3
|
import warnings
|
|
4
|
-
from math import ceil
|
|
5
|
-
from math import floor
|
|
4
|
+
from math import ceil, floor
|
|
6
5
|
from pathlib import Path
|
|
7
|
-
from typing import List
|
|
8
|
-
from typing import Literal
|
|
9
|
-
from typing import Optional
|
|
10
|
-
from typing import Tuple
|
|
11
|
-
from typing import Union
|
|
12
|
-
from typing import cast
|
|
6
|
+
from typing import List, Literal, Optional, Tuple, Union, cast, Dict, Any
|
|
13
7
|
|
|
14
8
|
import numpy as np
|
|
15
9
|
import pandas as pd
|
|
16
|
-
from matplotlib import pyplot as plt
|
|
17
10
|
from scipy.linalg import solve # type: ignore
|
|
18
11
|
from scipy.optimize import minimize # type: ignore
|
|
19
12
|
from scipy.special import gammaln # type: ignore
|
|
20
13
|
from scipy.special import polygamma # type: ignore
|
|
21
14
|
from scipy.stats import norm # type: ignore
|
|
22
15
|
from sklearn.linear_model import LinearRegression # type: ignore
|
|
23
|
-
import matplotlib.pyplot as plt
|
|
24
|
-
import seaborn as sns
|
|
25
16
|
|
|
26
17
|
from deconveil.grid_search import grid_fit_beta
|
|
27
|
-
|
|
28
18
|
from pydeseq2.utils import fit_alpha_mle
|
|
29
19
|
from pydeseq2.utils import get_num_processes
|
|
30
20
|
from pydeseq2.grid_search import grid_fit_alpha
|
|
@@ -209,7 +199,6 @@ def fit_rough_dispersions(
|
|
|
209
199
|
return np.maximum(alpha_rde, 0)
|
|
210
200
|
|
|
211
201
|
|
|
212
|
-
|
|
213
202
|
def fit_moments_dispersions2(
|
|
214
203
|
normed_counts: np.ndarray, size_factors: np.ndarray
|
|
215
204
|
) -> np.ndarray:
|
|
@@ -470,6 +459,7 @@ def nbinomGLM(
|
|
|
470
459
|
inv_hessian = np.linalg.inv(ddf(beta, 1))
|
|
471
460
|
|
|
472
461
|
return beta, inv_hessian, converged
|
|
462
|
+
|
|
473
463
|
|
|
474
464
|
def nbinomFn(
|
|
475
465
|
beta: np.ndarray,
|
|
@@ -533,277 +523,3 @@ def nbinomFn(
|
|
|
533
523
|
).sum(0)
|
|
534
524
|
|
|
535
525
|
return prior - nll
|
|
536
|
-
|
|
537
|
-
|
|
538
|
-
|
|
539
|
-
def process_results(file_path, method, lfc_cut = 1.0, pval_cut = 0.05):
|
|
540
|
-
df = pd.read_csv(file_path, index_col=0)
|
|
541
|
-
df['isDE'] = (np.abs(df['log2FoldChange']) >= lfc_cut) & (df['padj'] <= pval_cut)
|
|
542
|
-
df['DEtype'] = np.where(
|
|
543
|
-
~df['isDE'],
|
|
544
|
-
"n.s.",
|
|
545
|
-
np.where(df['log2FoldChange'] > 0, "Up-reg", "Down-reg")
|
|
546
|
-
)
|
|
547
|
-
df['method'] = method
|
|
548
|
-
return df[['log2FoldChange', 'padj', 'isDE', 'DEtype', 'method']]
|
|
549
|
-
|
|
550
|
-
|
|
551
|
-
def define_gene_groups(res_joint):
|
|
552
|
-
DSGs = res_joint[
|
|
553
|
-
((res_joint['DEtype_naive'] == "Up-reg") & (res_joint['DEtype_aware'] == "n.s.")) |
|
|
554
|
-
((res_joint['DEtype_naive'] == "Down-reg") & (res_joint['DEtype_aware'] == "n.s."))
|
|
555
|
-
].assign(gene_category='DSGs')
|
|
556
|
-
|
|
557
|
-
DIGs = res_joint[
|
|
558
|
-
((res_joint['DEtype_naive'] == "Up-reg") & (res_joint['DEtype_aware'] == "Up-reg")) |
|
|
559
|
-
((res_joint['DEtype_naive'] == "Down-reg") & (res_joint['DEtype_aware'] == "Down-reg"))
|
|
560
|
-
].assign(gene_category='DIGs')
|
|
561
|
-
|
|
562
|
-
DCGs = res_joint[
|
|
563
|
-
((res_joint['DEtype_naive'] == "n.s.") & (res_joint['DEtype_aware'] == "Up-reg")) |
|
|
564
|
-
((res_joint['DEtype_naive'] == "n.s.") & (res_joint['DEtype_aware'] == "Down-reg"))
|
|
565
|
-
].assign(gene_category='DCGs')
|
|
566
|
-
|
|
567
|
-
non_DEGs = res_joint[
|
|
568
|
-
(res_joint['DEtype_naive'] == "n.s.") & (res_joint['DEtype_aware'] == "n.s.")
|
|
569
|
-
].assign(gene_category='non-DEGs')
|
|
570
|
-
|
|
571
|
-
return {
|
|
572
|
-
"DSGs": DSGs,
|
|
573
|
-
"DIGs": DIGs,
|
|
574
|
-
"DCGs": DCGs,
|
|
575
|
-
"non_DEGs": non_DEGs
|
|
576
|
-
}
|
|
577
|
-
|
|
578
|
-
|
|
579
|
-
def generate_volcano_plot(plot_data, lfc_cut=1.0, pval_cut=0.05, xlim=None, ylim=None):
|
|
580
|
-
plot_data['gene_group'] = plot_data['gene_group'].astype('category')
|
|
581
|
-
|
|
582
|
-
# Define gene group colors
|
|
583
|
-
gene_group_colors = {
|
|
584
|
-
"DIGs": "#8F3931FF",
|
|
585
|
-
"DSGs": "#FFB977",
|
|
586
|
-
"DCGs": "#FFC300"
|
|
587
|
-
}
|
|
588
|
-
|
|
589
|
-
# Create a FacetGrid for faceted plots
|
|
590
|
-
g = sns.FacetGrid(
|
|
591
|
-
plot_data,
|
|
592
|
-
col="method",
|
|
593
|
-
margin_titles=True,
|
|
594
|
-
hue="gene_group",
|
|
595
|
-
palette=gene_group_colors,
|
|
596
|
-
sharey=False,
|
|
597
|
-
sharex=True
|
|
598
|
-
)
|
|
599
|
-
|
|
600
|
-
|
|
601
|
-
# Add points for "DIGs"
|
|
602
|
-
g.map_dataframe(
|
|
603
|
-
sns.scatterplot,
|
|
604
|
-
x="log2FC",
|
|
605
|
-
y="-log10(padj)",
|
|
606
|
-
alpha=0.1,
|
|
607
|
-
size=0.5,
|
|
608
|
-
legend=False,
|
|
609
|
-
data=plot_data[plot_data['gene_group'].isin(["DIGs"])]
|
|
610
|
-
)
|
|
611
|
-
|
|
612
|
-
# Add points for "DSGs" and "DCGs"
|
|
613
|
-
g.map_dataframe(
|
|
614
|
-
sns.scatterplot,
|
|
615
|
-
x="log2FC",
|
|
616
|
-
y="-log10(padj)",
|
|
617
|
-
alpha=1.0,
|
|
618
|
-
size=3.0,
|
|
619
|
-
legend=False,
|
|
620
|
-
data=plot_data[plot_data['gene_group'].isin(["DSGs", "DCGs"])]
|
|
621
|
-
)
|
|
622
|
-
|
|
623
|
-
# Add vertical and horizontal dashed lines
|
|
624
|
-
for ax in g.axes.flat:
|
|
625
|
-
ax.axvline(x=-lfc_cut, color="gray", linestyle="dashed")
|
|
626
|
-
ax.axvline(x=lfc_cut, color="gray", linestyle="dashed")
|
|
627
|
-
ax.axhline(y=-np.log10(pval_cut), color="gray", linestyle="dashed")
|
|
628
|
-
|
|
629
|
-
if xlim:
|
|
630
|
-
ax.set_xlim(xlim)
|
|
631
|
-
if ylim:
|
|
632
|
-
ax.set_ylim(ylim)
|
|
633
|
-
|
|
634
|
-
# Set axis labels
|
|
635
|
-
g.set_axis_labels("Log2 FC", "-Log10 P-value")
|
|
636
|
-
|
|
637
|
-
# Add titles, legends, and customize
|
|
638
|
-
g.add_legend(title="Gene category")
|
|
639
|
-
g.set_titles(row_template="{row_name}", col_template="{col_name}")
|
|
640
|
-
g.tight_layout()
|
|
641
|
-
|
|
642
|
-
# Adjust font sizes for better readability
|
|
643
|
-
for ax in g.axes.flat:
|
|
644
|
-
ax.tick_params(axis='both', labelsize=12)
|
|
645
|
-
ax.set_xlabel("Log2 FC", fontsize=14)
|
|
646
|
-
ax.set_ylabel("-Log10 P-value", fontsize=14)
|
|
647
|
-
|
|
648
|
-
# Save or display the plot
|
|
649
|
-
plt.show()
|
|
650
|
-
|
|
651
|
-
|
|
652
|
-
def plot_cnv_hist(cnv_mean, binwidth=0.2):
|
|
653
|
-
"""
|
|
654
|
-
Plots a histogram of the CNV mean distribution.
|
|
655
|
-
|
|
656
|
-
Parameters:
|
|
657
|
-
cnv_mean (pd.Series or list): The CNV mean values to plot.
|
|
658
|
-
binwidth (float): The bin width for the histogram.
|
|
659
|
-
"""
|
|
660
|
-
# Convert to a DataFrame if it's not already
|
|
661
|
-
if isinstance(cnv_mean, list):
|
|
662
|
-
cnv_mean = pd.DataFrame({'cnv_mean': cnv_mean})
|
|
663
|
-
elif isinstance(cnv_mean, pd.Series):
|
|
664
|
-
cnv_mean = cnv_mean.to_frame(name='cnv_mean')
|
|
665
|
-
|
|
666
|
-
# Create the histogram plot
|
|
667
|
-
plt.figure(figsize=(5, 5))
|
|
668
|
-
sns.histplot(
|
|
669
|
-
cnv_mean['cnv_mean'],
|
|
670
|
-
bins=int((cnv_mean['cnv_mean'].max() - cnv_mean['cnv_mean'].min()) / binwidth),
|
|
671
|
-
kde=False,
|
|
672
|
-
color="#F39B7F",
|
|
673
|
-
edgecolor="black",
|
|
674
|
-
alpha=0.7
|
|
675
|
-
)
|
|
676
|
-
|
|
677
|
-
# Add labels and titles
|
|
678
|
-
plt.title("", fontsize=14)
|
|
679
|
-
plt.xlabel("CN state", fontsize=14, labelpad=8)
|
|
680
|
-
plt.ylabel("Frequency", fontsize=14, labelpad=8)
|
|
681
|
-
|
|
682
|
-
# Customize the appearance of axes
|
|
683
|
-
plt.xticks(fontsize=12, color="black", rotation=45, ha="right")
|
|
684
|
-
plt.yticks(fontsize=12, color="black")
|
|
685
|
-
plt.gca().spines["top"].set_visible(False)
|
|
686
|
-
plt.gca().spines["right"].set_visible(False)
|
|
687
|
-
plt.gca().spines["left"].set_linewidth(1)
|
|
688
|
-
plt.gca().spines["bottom"].set_linewidth(1)
|
|
689
|
-
|
|
690
|
-
# Add a grid
|
|
691
|
-
plt.grid(visible=False)
|
|
692
|
-
|
|
693
|
-
# Show the plot
|
|
694
|
-
plt.tight_layout()
|
|
695
|
-
plt.show()
|
|
696
|
-
|
|
697
|
-
|
|
698
|
-
def plot_stacked_bar(combined_data):
|
|
699
|
-
"""
|
|
700
|
-
Creates a stacked bar plot of gene counts by CNV group for each tumor type.
|
|
701
|
-
|
|
702
|
-
Parameters:
|
|
703
|
-
- combined_data: DataFrame containing the data to plot.
|
|
704
|
-
"""
|
|
705
|
-
# Define CNV colors inside the function
|
|
706
|
-
cnv_colors = {
|
|
707
|
-
"loss": "#0000FF",
|
|
708
|
-
"neutral": "#808080",
|
|
709
|
-
"gain": "#00FF00",
|
|
710
|
-
"amplification": "#FF0000"
|
|
711
|
-
}
|
|
712
|
-
|
|
713
|
-
tumor_types = combined_data['tumor_type'].unique()
|
|
714
|
-
|
|
715
|
-
# Create subplots for each tumor type
|
|
716
|
-
fig, axes = plt.subplots(1, len(tumor_types), figsize=(5, 5), sharey=True)
|
|
717
|
-
|
|
718
|
-
# If there's only one tumor type, axes will not be an array, so we convert it into a list
|
|
719
|
-
if len(tumor_types) == 1:
|
|
720
|
-
axes = [axes]
|
|
721
|
-
|
|
722
|
-
for idx, tumor_type in enumerate(tumor_types):
|
|
723
|
-
ax = axes[idx]
|
|
724
|
-
tumor_data = combined_data[combined_data['tumor_type'] == tumor_type]
|
|
725
|
-
|
|
726
|
-
# Create a table of counts for CNV group vs gene group
|
|
727
|
-
counts = pd.crosstab(tumor_data['gene_group'], tumor_data['cnv_group'])
|
|
728
|
-
|
|
729
|
-
# Plot stacked bars
|
|
730
|
-
counts.plot(kind='bar', stacked=True, ax=ax, color=[cnv_colors[group] for group in counts.columns], width=0.6)
|
|
731
|
-
|
|
732
|
-
ax.set_title(tumor_type, fontsize=16)
|
|
733
|
-
ax.set_xlabel("")
|
|
734
|
-
ax.set_ylabel("Gene Counts", fontsize=16)
|
|
735
|
-
|
|
736
|
-
# Customize axis labels and tick marks
|
|
737
|
-
ax.tick_params(axis='x', labelsize=16, labelcolor="black")
|
|
738
|
-
ax.tick_params(axis='y', labelsize=16, labelcolor="black")
|
|
739
|
-
|
|
740
|
-
# Overall settings for layout and labels
|
|
741
|
-
plt.xticks(fontsize=12, color="black", rotation=45, ha="right")
|
|
742
|
-
plt.tight_layout()
|
|
743
|
-
plt.show()
|
|
744
|
-
|
|
745
|
-
|
|
746
|
-
def plot_percentage_bar(barplot_data):
|
|
747
|
-
"""
|
|
748
|
-
Creates a bar plot showing the percentage of genes for each gene group across tumor types.
|
|
749
|
-
|
|
750
|
-
Parameters:
|
|
751
|
-
- barplot_data: DataFrame containing 'gene_group', 'percentage', and 'Count' columns.
|
|
752
|
-
"""
|
|
753
|
-
# Define the gene group colors inside the function
|
|
754
|
-
gene_group_colors = {
|
|
755
|
-
"DIGs": "#8F3931FF",
|
|
756
|
-
"DSGs": "#FFB977",
|
|
757
|
-
"DCGs": "#FFC300"
|
|
758
|
-
}
|
|
759
|
-
|
|
760
|
-
tumor_types = barplot_data['tumor_type'].unique()
|
|
761
|
-
|
|
762
|
-
plt.figure(figsize=(5, 5))
|
|
763
|
-
sns.set(style="whitegrid")
|
|
764
|
-
|
|
765
|
-
# Create subplots for each tumor type
|
|
766
|
-
fig, axes = plt.subplots(1, len(tumor_types), figsize=(5, 5), sharey=True)
|
|
767
|
-
|
|
768
|
-
# If only one tumor type, ensure axes is a list
|
|
769
|
-
if len(tumor_types) == 1:
|
|
770
|
-
axes = [axes]
|
|
771
|
-
|
|
772
|
-
for idx, tumor_type in enumerate(tumor_types):
|
|
773
|
-
ax = axes[idx]
|
|
774
|
-
tumor_data = barplot_data[barplot_data['tumor_type'] == tumor_type]
|
|
775
|
-
|
|
776
|
-
# Plot the percentage bar plot
|
|
777
|
-
sns.barplot(data=tumor_data, x="gene_group", y="percentage", hue="gene_group",
|
|
778
|
-
palette=gene_group_colors, ax=ax, width=0.6)
|
|
779
|
-
|
|
780
|
-
# Add counts and percentages as labels
|
|
781
|
-
for p in ax.patches:
|
|
782
|
-
height = p.get_height()
|
|
783
|
-
gene_group = p.get_x() + p.get_width() / 2 # Get the x position of the patch (bar)
|
|
784
|
-
|
|
785
|
-
# Find the gene_group in the data based on its position
|
|
786
|
-
group_name = tumor_data.iloc[int(gene_group)]['gene_group']
|
|
787
|
-
count = tumor_data.loc[tumor_data['gene_group'] == group_name, 'Count'].values[0]
|
|
788
|
-
percentage = tumor_data.loc[tumor_data['gene_group'] == group_name, 'percentage'].values[0]
|
|
789
|
-
|
|
790
|
-
# Position the labels slightly above the bars
|
|
791
|
-
ax.text(p.get_x() + p.get_width() / 2, height + 0.5, f'{count} ({round(percentage, 1)}%)',
|
|
792
|
-
ha='center', va='bottom', fontsize=12, color="black")
|
|
793
|
-
|
|
794
|
-
ax.set_title(tumor_type, fontsize=16)
|
|
795
|
-
ax.set_xlabel("")
|
|
796
|
-
ax.set_ylabel("Percentage of Genes", fontsize=16)
|
|
797
|
-
|
|
798
|
-
# Customize axis labels and tick marks
|
|
799
|
-
ax.tick_params(axis='x', labelsize=16, labelcolor="black", rotation=45)
|
|
800
|
-
ax.tick_params(axis='y', labelsize=16, labelcolor="black")
|
|
801
|
-
|
|
802
|
-
# Explicitly set the x-tick labels with proper rotation and alignment
|
|
803
|
-
for tick in ax.get_xticklabels():
|
|
804
|
-
tick.set_horizontalalignment('right') # This ensures proper alignment for x-ticks
|
|
805
|
-
tick.set_rotation(45)
|
|
806
|
-
|
|
807
|
-
# Overall settings for layout and labels
|
|
808
|
-
plt.tight_layout()
|
|
809
|
-
plt.show()
|
|
@@ -0,0 +1,308 @@
|
|
|
1
|
+
import numpy as np
|
|
2
|
+
import pandas as pd
|
|
3
|
+
|
|
4
|
+
import matplotlib.pyplot as plt
|
|
5
|
+
import seaborn as sns
|
|
6
|
+
|
|
7
|
+
|
|
8
|
+
def plot_volcano(plot_data, lfc_cut=1.0, pval_cut=0.05, xlim=None, ylim=None):
|
|
9
|
+
plot_data['gene_group'] = plot_data['gene_group'].astype('category')
|
|
10
|
+
|
|
11
|
+
# Define gene group colors
|
|
12
|
+
gene_group_colors = {
|
|
13
|
+
"DIGs": "#8F3931FF",
|
|
14
|
+
"DSGs": "#FFB977",
|
|
15
|
+
"DCGs": "#FFC300"
|
|
16
|
+
}
|
|
17
|
+
|
|
18
|
+
# Create a FacetGrid for faceted plots
|
|
19
|
+
g = sns.FacetGrid(
|
|
20
|
+
plot_data,
|
|
21
|
+
col="method",
|
|
22
|
+
margin_titles=True,
|
|
23
|
+
hue="gene_group",
|
|
24
|
+
palette=gene_group_colors,
|
|
25
|
+
sharey=False,
|
|
26
|
+
sharex=True
|
|
27
|
+
)
|
|
28
|
+
|
|
29
|
+
|
|
30
|
+
# Add points for "DIGs"
|
|
31
|
+
g.map_dataframe(
|
|
32
|
+
sns.scatterplot,
|
|
33
|
+
x="log2FC",
|
|
34
|
+
y="-log10(padj)",
|
|
35
|
+
alpha=0.2,
|
|
36
|
+
size=0.5,
|
|
37
|
+
legend=False,
|
|
38
|
+
data=plot_data[plot_data['gene_group'] == "DIGs"]
|
|
39
|
+
)
|
|
40
|
+
|
|
41
|
+
# Add points for "DSGs" and "DCGs"
|
|
42
|
+
g.map_dataframe(
|
|
43
|
+
sns.scatterplot,
|
|
44
|
+
x="log2FC",
|
|
45
|
+
y="-log10(padj)",
|
|
46
|
+
alpha=0.8,
|
|
47
|
+
s=3.0,
|
|
48
|
+
legend=False,
|
|
49
|
+
data=plot_data[plot_data['gene_group'] == "DSGs"]
|
|
50
|
+
)
|
|
51
|
+
|
|
52
|
+
g.map_dataframe(
|
|
53
|
+
sns.scatterplot,
|
|
54
|
+
x="log2FC",
|
|
55
|
+
y="-log10(padj)",
|
|
56
|
+
alpha=1.0,
|
|
57
|
+
s=3.0,
|
|
58
|
+
legend=False,
|
|
59
|
+
data=plot_data[plot_data['gene_group'] == "DCGs"],
|
|
60
|
+
zorder=5 # force to front
|
|
61
|
+
)
|
|
62
|
+
|
|
63
|
+
# Threshold lines
|
|
64
|
+
for ax in g.axes.flat:
|
|
65
|
+
ax.axvline(x=-lfc_cut, color="gray", linestyle="dashed")
|
|
66
|
+
ax.axvline(x=lfc_cut, color="gray", linestyle="dashed")
|
|
67
|
+
ax.axhline(y=-np.log10(pval_cut), color="gray", linestyle="dashed")
|
|
68
|
+
|
|
69
|
+
if xlim:
|
|
70
|
+
ax.set_xlim(xlim)
|
|
71
|
+
if ylim:
|
|
72
|
+
ax.set_ylim(ylim)
|
|
73
|
+
|
|
74
|
+
# Labels and legend
|
|
75
|
+
g.set_axis_labels("Log2 FC", "-Log10 P-value")
|
|
76
|
+
g.add_legend(title="Gene category")
|
|
77
|
+
g.set_titles(row_template="{row_name}", col_template="{col_name}")
|
|
78
|
+
g.tight_layout()
|
|
79
|
+
|
|
80
|
+
# Axis formatting
|
|
81
|
+
for ax in g.axes.flat:
|
|
82
|
+
ax.tick_params(axis='both', labelsize=12)
|
|
83
|
+
ax.set_xlabel("Log2 FC", fontsize=14)
|
|
84
|
+
ax.set_ylabel("-Log10 P-value", fontsize=14)
|
|
85
|
+
|
|
86
|
+
# Display the plot
|
|
87
|
+
plt.show()
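`plot_volcano` reads the columns `log2FC`, `-log10(padj)`, `gene_group` and `method` from `plot_data`. A minimal sketch of preparing such a table is shown here; the gene names, method labels and values are invented for illustration, and the only assumption about the input is that it carries a `padj` column.

```python
import numpy as np
import pandas as pd

# Toy results table; gene names, methods and values are illustrative only.
plot_data = pd.DataFrame(
    {
        "log2FC": [2.1, -1.4, 0.2],
        "padj": [1e-6, 0.01, 0.8],
        "gene_group": ["DIGs", "DSGs", "DCGs"],
        "method": ["CN-aware", "CN-naive", "CN-aware"],
    },
    index=["GENE_A", "GENE_B", "GENE_C"],
)

# The column name must match the y= argument used inside plot_volcano.
plot_data["-log10(padj)"] = -np.log10(plot_data["padj"])
```

Adjusted p-values of exactly zero would map to infinity here, so they may need clipping before plotting.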
|
|
88
|
+
|
|
89
|
+
|
|
90
|
+
def plot_cnv_hist(cnv_mean, binwidth=0.2, title="CNV Mean Distribution"):
|
|
91
|
+
"""
|
|
92
|
+
Plots a histogram of the CNV mean distribution.
|
|
93
|
+
|
|
94
|
+
Parameters:
|
|
95
|
+
cnv_mean (pd.Series or list): The CNV mean values to plot.
|
|
96
|
+
binwidth (float): The bin width for the histogram.
|
|
97
|
+
title (str): The title of the plot.
|
|
98
|
+
"""
|
|
99
|
+
# Convert to a DataFrame if it's not already
|
|
100
|
+
if isinstance(cnv_mean, list):
|
|
101
|
+
cnv_mean = pd.DataFrame({'cnv_mean': cnv_mean})
|
|
102
|
+
elif isinstance(cnv_mean, pd.Series):
|
|
103
|
+
cnv_mean = cnv_mean.to_frame(name='cnv_mean')
|
|
104
|
+
|
|
105
|
+
# Create the histogram plot
|
|
106
|
+
plt.figure(figsize=(5, 5))
|
|
107
|
+
sns.histplot(
|
|
108
|
+
cnv_mean['cnv_mean'],
|
|
109
|
+
bins=int((cnv_mean['cnv_mean'].max() - cnv_mean['cnv_mean'].min()) / binwidth),
|
|
110
|
+
kde=False,
|
|
111
|
+
color="#F39B7F",
|
|
112
|
+
edgecolor="black",
|
|
113
|
+
alpha=0.7
|
|
114
|
+
)
|
|
115
|
+
|
|
116
|
+
# Add labels and titles
|
|
117
|
+
plt.title(title, fontsize=14, pad=12)
|
|
118
|
+
plt.xlabel("CN state", fontsize=14, labelpad=8)
|
|
119
|
+
plt.ylabel("Frequency", fontsize=14, labelpad=8)
|
|
120
|
+
|
|
121
|
+
# Customize the appearance of axes
|
|
122
|
+
plt.xticks(fontsize=12, color="black", rotation=45, ha="right")
|
|
123
|
+
plt.yticks(fontsize=12, color="black")
|
|
124
|
+
plt.gca().spines["top"].set_visible(False)
|
|
125
|
+
plt.gca().spines["right"].set_visible(False)
|
|
126
|
+
plt.gca().spines["left"].set_linewidth(1)
|
|
127
|
+
plt.gca().spines["bottom"].set_linewidth(1)
|
|
128
|
+
|
|
129
|
+
# Add a grid
|
|
130
|
+
plt.grid(visible=False)
|
|
131
|
+
|
|
132
|
+
# Show the plot
|
|
133
|
+
plt.tight_layout()
|
|
134
|
+
plt.show()
|
|
135
|
+
|
|
136
|
+
|
|
137
|
+
def plot_stacked_bar(combined_data):
|
|
138
|
+
"""
|
|
139
|
+
Creates a stacked bar plot of gene counts by CNV group for each tumor type.
|
|
140
|
+
|
|
141
|
+
Parameters:
|
|
142
|
+
- combined_data: DataFrame containing 'tumor_type', 'gene_group', and 'cnv_group' columns.
|
|
143
|
+
"""
|
|
144
|
+
# Define CNV colors inside the function
|
|
145
|
+
cnv_colors = {
|
|
146
|
+
"loss": "dodgerblue",
|
|
147
|
+
"neutral": "gray",
|
|
148
|
+
"gain": "yellowgreen",
|
|
149
|
+
"amplification": "coral"
|
|
150
|
+
}
|
|
151
|
+
|
|
152
|
+
tumor_types = combined_data['tumor_type'].unique()
|
|
153
|
+
|
|
154
|
+
# Create subplots for each tumor type
|
|
155
|
+
fig, axes = plt.subplots(1, len(tumor_types), figsize=(5, 5), sharey=True)
|
|
156
|
+
|
|
157
|
+
# If there's only one tumor type, axes will not be an array, so we convert it into a list
|
|
158
|
+
if len(tumor_types) == 1:
|
|
159
|
+
axes = [axes]
|
|
160
|
+
|
|
161
|
+
for idx, tumor_type in enumerate(tumor_types):
|
|
162
|
+
ax = axes[idx]
|
|
163
|
+
tumor_data = combined_data[combined_data['tumor_type'] == tumor_type]
|
|
164
|
+
|
|
165
|
+
# Create a table of counts for CNV group vs gene group
|
|
166
|
+
counts = pd.crosstab(tumor_data['gene_group'], tumor_data['cnv_group'])
|
|
167
|
+
|
|
168
|
+
# Plot stacked bars
|
|
169
|
+
counts.plot(kind='bar', stacked=True, ax=ax, color=[cnv_colors[group] for group in counts.columns], width=0.6)
|
|
170
|
+
|
|
171
|
+
ax.set_title(tumor_type, fontsize=16)
|
|
172
|
+
ax.set_xlabel("")
|
|
173
|
+
ax.set_ylabel("Gene Counts", fontsize=16)
|
|
174
|
+
|
|
175
|
+
# Customize axis labels and tick marks
|
|
176
|
+
ax.tick_params(axis='x', labelsize=16, labelcolor="black")
|
|
177
|
+
ax.tick_params(axis='y', labelsize=16, labelcolor="black")
|
|
178
|
+
|
|
179
|
+
# Overall settings for layout and labels
|
|
180
|
+
plt.xticks(fontsize=12, color="black", rotation=45, ha="right")
|
|
181
|
+
plt.tight_layout()
|
|
182
|
+
plt.show()
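The stacked bars in `plot_stacked_bar` are built from a `pd.crosstab` of gene group against CNV group. A small sketch of that table on invented data, to show its shape:

```python
import pandas as pd

# Toy per-gene annotations; values are illustrative only.
combined = pd.DataFrame(
    {
        "gene_group": ["DSGs", "DSGs", "DIGs", "DCGs", "DIGs"],
        "cnv_group": ["gain", "gain", "neutral", "loss", "gain"],
    }
)

# Rows are gene groups, columns are CNV groups, cells are gene counts.
counts = pd.crosstab(combined["gene_group"], combined["cnv_group"])
```

The columns come out in sorted order (`gain`, `loss`, `neutral`), which is why the function looks colors up by column name rather than by position.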
|
|
183
|
+
|
|
184
|
+
|
|
185
|
+
def plot_percentage_bar(barplot_data):
|
|
186
|
+
"""
|
|
187
|
+
Creates a bar plot showing the percentage of genes for each gene group across tumor types.
|
|
188
|
+
|
|
189
|
+
Parameters:
|
|
190
|
+
- barplot_data: DataFrame containing 'tumor_type', 'gene_group', 'percentage', and 'Count' columns.
|
|
191
|
+
"""
|
|
192
|
+
# Define the gene group colors inside the function
|
|
193
|
+
gene_group_colors = {
|
|
194
|
+
"DIGs": "#8F3931FF",
|
|
195
|
+
"DSGs": "#FFB977",
|
|
196
|
+
"DCGs": "#FFC300"
|
|
197
|
+
}
|
|
198
|
+
|
|
199
|
+
tumor_types = barplot_data['tumor_type'].unique()
|
|
200
|
+
|
|
201
|
+
plt.figure(figsize=(5, 5))
|
|
202
|
+
sns.set(style="whitegrid")
|
|
203
|
+
|
|
204
|
+
# Create subplots for each tumor type
|
|
205
|
+
fig, axes = plt.subplots(1, len(tumor_types), figsize=(5, 5), sharey=True)
|
|
206
|
+
|
|
207
|
+
# If only one tumor type, ensure axes is a list
|
|
208
|
+
if len(tumor_types) == 1:
|
|
209
|
+
axes = [axes]
|
|
210
|
+
|
|
211
|
+
for idx, tumor_type in enumerate(tumor_types):
|
|
212
|
+
ax = axes[idx]
|
|
213
|
+
tumor_data = barplot_data[barplot_data['tumor_type'] == tumor_type]
|
|
214
|
+
|
|
215
|
+
# Plot the percentage bar plot
|
|
216
|
+
sns.barplot(data=tumor_data, x="gene_group", y="percentage", hue="gene_group",
|
|
217
|
+
palette=gene_group_colors, ax=ax, width=0.6)
|
|
218
|
+
|
|
219
|
+
# Add counts and percentages as labels
|
|
220
|
+
for p in ax.patches:
|
|
221
|
+
height = p.get_height()
|
|
222
|
+
gene_group = p.get_x() + p.get_width() / 2 # Get the x position of the patch (bar)
|
|
223
|
+
|
|
224
|
+
# Find the gene_group in the data based on its position
|
|
225
|
+
group_name = tumor_data.iloc[int(gene_group)]['gene_group']
|
|
226
|
+
count = tumor_data.loc[tumor_data['gene_group'] == group_name, 'Count'].values[0]
|
|
227
|
+
percentage = tumor_data.loc[tumor_data['gene_group'] == group_name, 'percentage'].values[0]
|
|
228
|
+
|
|
229
|
+
# Position the labels slightly above the bars
|
|
230
|
+
ax.text(p.get_x() + p.get_width() / 2, height + 0.5, f'{count} ({round(percentage, 1)}%)',
|
|
231
|
+
ha='center', va='bottom', fontsize=12, color="black")
|
|
232
|
+
|
|
233
|
+
ax.set_title(tumor_type, fontsize=16)
|
|
234
|
+
ax.set_xlabel("")
|
|
235
|
+
ax.set_ylabel("Percentage of Genes", fontsize=16)
|
|
236
|
+
|
|
237
|
+
# Customize axis labels and tick marks
|
|
238
|
+
ax.tick_params(axis='x', labelsize=16, labelcolor="black", rotation=45)
|
|
239
|
+
ax.tick_params(axis='y', labelsize=16, labelcolor="black")
|
|
240
|
+
|
|
241
|
+
# Explicitly set the x-tick labels with proper rotation and alignment
|
|
242
|
+
for tick in ax.get_xticklabels():
|
|
243
|
+
tick.set_horizontalalignment('right') # This ensures proper alignment for x-ticks
|
|
244
|
+
tick.set_rotation(45)
|
|
245
|
+
|
|
246
|
+
# Overall settings for layout and labels
|
|
247
|
+
plt.tight_layout()
|
|
248
|
+
plt.show()
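`plot_percentage_bar` expects one row per tumor type and gene group, with `Count` and `percentage` already computed. One possible way to derive that table from a per-gene annotation is sketched below; the tumor type labels and the per-gene table are assumptions made for illustration.

```python
import pandas as pd

# Toy per-gene table; tumor types and group labels are illustrative only.
genes = pd.DataFrame(
    {
        "tumor_type": ["LUAD"] * 4 + ["BRCA"] * 2,
        "gene_group": ["DSGs", "DSGs", "DIGs", "DCGs", "DIGs", "DIGs"],
    }
)

# Count genes per (tumor type, gene group) pair.
barplot_data = (
    genes.groupby(["tumor_type", "gene_group"]).size().reset_index(name="Count")
)

# Express each count as a percentage of the genes in its tumor type.
totals = barplot_data.groupby("tumor_type")["Count"].transform("sum")
barplot_data["percentage"] = 100 * barplot_data["Count"] / totals
```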
|
|
249
|
+
|
|
250
|
+
|
|
251
|
+
def plot_pca_clusters(pca_coords, labels, explained_var, title="PCA Clustering"):
|
|
252
|
+
"""
|
|
253
|
+
Scatterplot of first 2 PCs with cluster colors, showing variance explained.
|
|
254
|
+
|
|
255
|
+
Parameters
|
|
256
|
+
----------
|
|
257
|
+
pca_coords : DataFrame
|
|
258
|
+
PCA coordinates (samples × PCs), from pca_cluster_cn().
|
|
259
|
+
labels : pd.Series
|
|
260
|
+
Cluster assignments (index must match pca_coords).
|
|
261
|
+
explained_var : array-like
|
|
262
|
+
Explained variance ratio for each PC.
|
|
263
|
+
title : str
|
|
264
|
+
Plot title.
|
|
265
|
+
"""
|
|
266
|
+
df_plot = pca_coords.copy()
|
|
267
|
+
df_plot["cluster"] = labels.astype(str)
|
|
268
|
+
|
|
269
|
+
plt.figure(figsize=(7,6))
|
|
270
|
+
sns.scatterplot(
|
|
271
|
+
x="PC1", y="PC2", hue="cluster",
|
|
272
|
+
data=df_plot, palette="Set2", s=70, alpha=0.9, edgecolor="k"
|
|
273
|
+
)
|
|
274
|
+
|
|
275
|
+
# Format axis labels with variance %
|
|
276
|
+
pc1_var = explained_var[0] * 100
|
|
277
|
+
pc2_var = explained_var[1] * 100
|
|
278
|
+
plt.xlabel(f"PC1 ({pc1_var:.1f}% variance)")
|
|
279
|
+
plt.ylabel(f"PC2 ({pc2_var:.1f}% variance)")
|
|
280
|
+
|
|
281
|
+
plt.title(title, fontsize=14)
|
|
282
|
+
plt.legend(title="Cluster", bbox_to_anchor=(1.05, 1), loc="upper left")
|
|
283
|
+
plt.tight_layout()
|
|
284
|
+
plt.show()
|
|
285
|
+
|
|
286
|
+
|
|
287
|
+
def plot_consensus_matrix(consensus_matrix, labels, title="Consensus Matrix"):
|
|
288
|
+
"""
|
|
289
|
+
Heatmap of consensus matrix with samples ordered by cluster.
|
|
290
|
+
|
|
291
|
+
Parameters
|
|
292
|
+
----------
|
|
293
|
+
consensus_matrix : pd.DataFrame
|
|
294
|
+
Sample × Sample consensus values.
|
|
295
|
+
labels : pd.Series
|
|
296
|
+
Final cluster assignments (same sample index).
|
|
297
|
+
"""
|
|
298
|
+
# Order samples by cluster
|
|
299
|
+
ordered_samples = labels.sort_values().index
|
|
300
|
+
mat = consensus_matrix.loc[ordered_samples, ordered_samples]
|
|
301
|
+
|
|
302
|
+
plt.figure(figsize=(7,6))
|
|
303
|
+
sns.heatmap(mat, cmap="viridis", square=True, cbar_kws={"label": "Consensus"})
|
|
304
|
+
plt.title(title, fontsize=14)
|
|
305
|
+
plt.xlabel("Samples")
|
|
306
|
+
plt.ylabel("Samples")
|
|
307
|
+
plt.tight_layout()
|
|
308
|
+
plt.show()
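`plot_consensus_matrix` reorders both axes of the matrix so that samples from the same cluster sit next to each other. A sketch of that reordering on invented data follows; it passes `kind="stable"` to `sort_values` (an addition, not in the function above) so that the order within each cluster is reproducible.

```python
import pandas as pd

samples = ["s1", "s2", "s3", "s4"]

# Toy cluster assignments and consensus values; illustrative only.
labels = pd.Series([1, 0, 1, 0], index=samples)
consensus = pd.DataFrame(
    [
        [1.0, 0.1, 0.9, 0.2],
        [0.1, 1.0, 0.0, 0.8],
        [0.9, 0.0, 1.0, 0.1],
        [0.2, 0.8, 0.1, 1.0],
    ],
    index=samples,
    columns=samples,
)

# Sort samples by cluster label, then apply the same order to rows and columns.
ordered = labels.sort_values(kind="stable").index
mat = consensus.loc[ordered, ordered]
```

After reordering, high consensus values form blocks along the diagonal when the clustering is consistent.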
|
|
@@ -0,0 +1,132 @@
|
|
|
1
|
+
import os
|
|
2
|
+
import warnings
|
|
3
|
+
from math import ceil, floor
|
|
4
|
+
from pathlib import Path
|
|
5
|
+
|
|
6
|
+
import numpy as np
|
|
7
|
+
import pandas as pd
|
|
8
|
+
|
|
9
|
+
from typing import List, Literal, Optional, Dict, Any, cast
|
|
10
|
+
|
|
11
|
+
|
|
12
|
+
|
|
13
|
+
def filter_low_count_genes(
|
|
14
|
+
df: pd.DataFrame,
|
|
15
|
+
other_dfs: Optional[List[pd.DataFrame]] = None,
|
|
16
|
+
min_count: int = 10,
|
|
17
|
+
min_samples: Optional[int] = 3,
|
|
18
|
+
min_frac: Optional[float] = None,
|
|
19
|
+
return_mask: bool = False
|
|
20
|
+
) -> Dict[str, Any]:
|
|
21
|
+
"""
|
|
22
|
+
Filter genes (columns) by expression thresholds.
|
|
23
|
+
|
|
24
|
+
Parameters
|
|
25
|
+
----------
|
|
26
|
+
df : pd.DataFrame
|
|
27
|
+
Main dataframe (genes as columns, samples as rows).
|
|
28
|
+
other_dfs : list of pd.DataFrame, optional
|
|
29
|
+
Other dataframes with the same columns to filter in parallel.
|
|
30
|
+
min_count : int, default=10
|
|
31
|
+
Minimum expression/count threshold.
|
|
32
|
+
min_samples : int, default=3
|
|
33
|
+
Minimum number of samples meeting the threshold.
|
|
34
|
+
min_frac : float, optional
|
|
35
|
+
Fraction of samples that must meet the threshold.
|
|
36
|
+
If provided, overrides min_samples.
|
|
37
|
+
return_mask : bool, default=False
|
|
38
|
+
If True, also return the boolean mask of kept genes.
|
|
39
|
+
|
|
40
|
+
Returns
|
|
41
|
+
-------
|
|
42
|
+
result : dict
|
|
43
|
+
{
|
|
44
|
+
"filtered_df": pd.DataFrame,
|
|
45
|
+
"other_filtered": list[pd.DataFrame] or None,
|
|
46
|
+
"mask": pd.Series (if return_mask),
|
|
47
|
+
"stats": dict with counts
|
|
48
|
+
}
|
|
49
|
+
"""
|
|
50
|
+
# compute required min_samples
|
|
51
|
+
if min_frac is not None:
|
|
52
|
+
min_samples = max(1, int(round(min_frac * df.shape[0])))
|
|
53
|
+
|
|
54
|
+
# gene-wise filter mask
|
|
55
|
+
mask = (df >= min_count).sum(axis=0) >= min_samples
|
|
56
|
+
|
|
57
|
+
# apply mask
|
|
58
|
+
filtered_df = df.loc[:, mask]
|
|
59
|
+
filtered_others = [odf.loc[:, mask] for odf in other_dfs] if other_dfs else None
|
|
60
|
+
|
|
61
|
+
# collect stats
|
|
62
|
+
stats = {
|
|
63
|
+
"n_total": df.shape[1],
|
|
64
|
+
"n_kept": int(mask.sum()),
|
|
65
|
+
"n_removed": int((~mask).sum()),
|
|
66
|
+
"min_count": min_count,
|
|
67
|
+
"min_samples": min_samples,
|
|
68
|
+
}
|
|
69
|
+
|
|
70
|
+
result = {
|
|
71
|
+
"filtered_df": filtered_df,
|
|
72
|
+
"other_filtered": filtered_others,
|
|
73
|
+
"stats": stats,
|
|
74
|
+
}
|
|
75
|
+
if return_mask:
|
|
76
|
+
result["mask"] = mask
|
|
77
|
+
|
|
78
|
+
return result
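The core of `filter_low_count_genes` is the gene-wise mask: a gene is kept when at least `min_samples` samples reach `min_count`. A sketch of that logic on an invented count matrix (samples as rows, genes as columns):

```python
import pandas as pd

# Toy count matrix; sample and gene names are illustrative only.
counts = pd.DataFrame(
    {"geneA": [12, 15, 30, 0], "geneB": [0, 3, 11, 2], "geneC": [10, 10, 10, 9]},
    index=["s1", "s2", "s3", "s4"],
)

min_count, min_samples = 10, 3

# Per gene, count the samples at or above min_count and compare to min_samples.
mask = (counts >= min_count).sum(axis=0) >= min_samples
filtered = counts.loc[:, mask]

# With min_frac, the sample threshold is derived from the number of rows.
min_samples_from_frac = max(1, int(round(0.5 * counts.shape[0])))
```

Note that Python's built-in `round` rounds halves to the nearest even integer, so a fraction of 0.5 over five samples gives a threshold of 2, not 3.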
|
|
79
|
+
|
|
80
|
+
|
|
81
|
+
def process_results(file_path, method, lfc_cut = 1.0, pval_cut = 0.05):
|
|
82
|
+
df = pd.read_csv(file_path, index_col=0)
|
|
83
|
+
df['isDE'] = (np.abs(df['log2FoldChange']) >= lfc_cut) & (df['padj'] <= pval_cut)
|
|
84
|
+
df['DEtype'] = np.where(
|
|
85
|
+
~df['isDE'],
|
|
86
|
+
"n.s.",
|
|
87
|
+
np.where(df['log2FoldChange'] > 0, "Up-reg", "Down-reg")
|
|
88
|
+
)
|
|
89
|
+
df['method'] = method
|
|
90
|
+
return df[['log2FoldChange', 'padj', 'isDE', 'DEtype', 'method']]
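The labeling rule in `process_results` can be checked on a small in-memory table instead of a CSV file. The sketch below repeats the same two steps on invented values:

```python
import numpy as np
import pandas as pd

# Toy results; gene names and values are illustrative only.
df = pd.DataFrame(
    {"log2FoldChange": [2.0, -1.5, 0.3, 3.0], "padj": [0.001, 0.04, 0.001, 0.2]},
    index=["g1", "g2", "g3", "g4"],
)
lfc_cut, pval_cut = 1.0, 0.05

# A gene is DE only if it passes both the effect-size and significance cutoffs.
df["isDE"] = (np.abs(df["log2FoldChange"]) >= lfc_cut) & (df["padj"] <= pval_cut)

# Non-DE genes are "n.s."; DE genes are labeled by the sign of the fold change.
df["DEtype"] = np.where(
    ~df["isDE"],
    "n.s.",
    np.where(df["log2FoldChange"] > 0, "Up-reg", "Down-reg"),
)
```

Here `g3` fails the fold-change cutoff and `g4` fails the significance cutoff, so both end up as `n.s.`. A missing `padj` would also yield `n.s.`, because comparisons against NaN evaluate to False.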
|
|
91
|
+
|
|
92
|
+
|
|
93
|
+
def define_gene_groups(res_joint):
|
|
94
|
+
DSGs = res_joint[
|
|
95
|
+
((res_joint['DEtype_naive'] == "Up-reg") & (res_joint['DEtype_aware'] == "n.s.")) |
|
|
96
|
+
((res_joint['DEtype_naive'] == "Down-reg") & (res_joint['DEtype_aware'] == "n.s."))
|
|
97
|
+
].assign(gene_category='DSGs')
|
|
98
|
+
|
|
99
|
+
DIGs = res_joint[
|
|
100
|
+
((res_joint['DEtype_naive'] == "Up-reg") & (res_joint['DEtype_aware'] == "Up-reg")) |
|
|
101
|
+
((res_joint['DEtype_naive'] == "Down-reg") & (res_joint['DEtype_aware'] == "Down-reg"))
|
|
102
|
+
].assign(gene_category='DIGs')
|
|
103
|
+
|
|
104
|
+
DCGs = res_joint[
|
|
105
|
+
((res_joint['DEtype_naive'] == "n.s.") & (res_joint['DEtype_aware'] == "Up-reg")) |
|
|
106
|
+
((res_joint['DEtype_naive'] == "n.s.") & (res_joint['DEtype_aware'] == "Down-reg"))
|
|
107
|
+
].assign(gene_category='DCGs')
|
|
108
|
+
|
|
109
|
+
non_DEGs = res_joint[
|
|
110
|
+
(res_joint['DEtype_naive'] == "n.s.") & (res_joint['DEtype_aware'] == "n.s.")
|
|
111
|
+
].assign(gene_category='non-DEGs')
|
|
112
|
+
|
|
113
|
+
return {
|
|
114
|
+
"DSGs": DSGs,
|
|
115
|
+
"DIGs": DIGs,
|
|
116
|
+
"DCGs": DCGs,
|
|
117
|
+
"non_DEGs": non_DEGs
|
|
118
|
+
}
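The four filters in `define_gene_groups` amount to a small decision table over the two `DEtype` labels. The helper below is hypothetical (it is not part of the package) and assumes the labels are limited to `"Up-reg"`, `"Down-reg"` and `"n.s."`:

```python
def classify(naive: str, aware: str) -> str:
    """Hypothetical helper mirroring the category rules of define_gene_groups."""
    if naive == "n.s." and aware == "n.s.":
        return "non-DEGs"
    if naive != "n.s." and aware == "n.s.":
        return "DSGs"  # significant only without CN correction
    if naive == "n.s." and aware != "n.s.":
        return "DCGs"  # significant only with CN correction
    if naive == aware:
        return "DIGs"  # same direction under both methods
    return "unassigned"  # opposite directions; matched by none of the filters
```

The last branch makes one property of the filters explicit: a gene called up-regulated by one method and down-regulated by the other appears in none of the four returned data frames.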
|
|
119
|
+
|
|
120
|
+
|
|
121
|
+
def clean_gene_group(df, mode="naive"):
|
|
122
|
+
"""Rename and subset a gene group dataframe for a given mode."""
|
|
123
|
+
suffix = f"_{mode}"
|
|
124
|
+
rename_map = {
|
|
125
|
+
f"logFC{suffix}": "log2FC",
|
|
126
|
+
f"padj{suffix}": "padj",
|
|
127
|
+
f"isDE{suffix}": "isDE",
|
|
128
|
+
f"DEtype{suffix}": "DEtype",
|
|
129
|
+
f"method{suffix}": "method",
|
|
130
|
+
"gene_category": "gene_group"
|
|
131
|
+
}
|
|
132
|
+
return df.rename(columns=rename_map)[["log2FC", "padj", "isDE", "DEtype", "method", "gene_group"]]
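`clean_gene_group` expects suffixed columns such as `logFC_naive` and `padj_naive`. Since `process_results` returns a column named `log2FoldChange`, the joint table is presumably renamed somewhere upstream; that step is not shown in this diff. The sketch below assumes a joint table that already uses the `logFC_*` names and repeats the rename-and-subset step for `mode="naive"`:

```python
import pandas as pd

# Toy joint table; column layout is an assumption, values are illustrative only.
joint = pd.DataFrame(
    {
        "logFC_naive": [1.8],
        "padj_naive": [0.01],
        "isDE_naive": [True],
        "DEtype_naive": ["Up-reg"],
        "method_naive": ["CN naive"],
        "logFC_aware": [0.4],
        "padj_aware": [0.30],
        "gene_category": ["DSGs"],
    },
    index=["g1"],
)

suffix = "_naive"
rename_map = {
    f"logFC{suffix}": "log2FC",
    f"padj{suffix}": "padj",
    f"isDE{suffix}": "isDE",
    f"DEtype{suffix}": "DEtype",
    f"method{suffix}": "method",
    "gene_category": "gene_group",
}
keep = ["log2FC", "padj", "isDE", "DEtype", "method", "gene_group"]

# Rename the suffixed columns, then keep only the plotting columns.
cleaned = joint.rename(columns=rename_map)[keep]
```

The result has the column names that `plot_volcano` reads, apart from `-log10(padj)`, which still has to be computed.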
|
|
@@ -6,13 +6,13 @@ long_description = (this_directory / "README.md").read_text()
|
|
|
6
6
|
|
|
7
7
|
setup(
|
|
8
8
|
name="DeConveil",
|
|
9
|
-
version="0.1.
|
|
9
|
+
version="0.1.1",
|
|
10
10
|
description="An extension of PyDESeq2/DESeq2 designed to account for genome aneuploidy",
|
|
11
11
|
url="https://github.com/caravagnalab/DeConveil",
|
|
12
12
|
author="Katsiaryna Davydzenka",
|
|
13
13
|
author_email="katiasen89@gmail.com",
|
|
14
14
|
license="MIT",
|
|
15
|
-
packages=["
|
|
15
|
+
packages=["deconveil"],
|
|
16
16
|
python_requires=">=3.10.0",
|
|
17
17
|
install_requires=[
|
|
18
18
|
"anndata>=0.8.0",
|
|
@@ -1,15 +0,0 @@
|
|
|
1
|
-
LICENSE
|
|
2
|
-
README.md
|
|
3
|
-
setup.py
|
|
4
|
-
DeConveil/__init__.py
|
|
5
|
-
DeConveil/dds.py
|
|
6
|
-
DeConveil/default_inference.py
|
|
7
|
-
DeConveil/ds.py
|
|
8
|
-
DeConveil/grid_search.py
|
|
9
|
-
DeConveil/inference.py
|
|
10
|
-
DeConveil/utils_CNaware.py
|
|
11
|
-
DeConveil.egg-info/PKG-INFO
|
|
12
|
-
DeConveil.egg-info/SOURCES.txt
|
|
13
|
-
DeConveil.egg-info/dependency_links.txt
|
|
14
|
-
DeConveil.egg-info/requires.txt
|
|
15
|
-
DeConveil.egg-info/top_level.txt
|
|
@@ -1 +0,0 @@
|
|
|
1
|
-
DeConveil
|
deconveil-0.1.0/README.md
DELETED
|
@@ -1,40 +0,0 @@
|
|
|
1
|
-
# DeConveil
|
|
2
|
-
|
|
3
|
-
*DeConveil* extends Differential Gene Expression (DGE) testing to account for genome aneuploidy.
|
|
4
|
-
This computational framework extends traditional DGE analysis by integrating Copy Number Variation (CNV) data.
|
|
5
|
-
This approach adjusts for dosage effects and categorizes genes as *dosage-sensitive (DSG)*, *dosage-insensitive (DIG)*, and *dosage-compensated (DCG)*, separating the expression changes caused by CNVs from other alterations in transcriptional regulation.
|
|
6
|
-
Separating genes into these groups requires running DGE testing with both the PyDESeq2 (CN-naive) and DeConveil (CN-aware) methods.
|
|
7
|
-
|
|
8
|
-
**Pre-required installations before running DeConveil**
|
|
9
|
-
|
|
10
|
-
The following Python library must be installed: *pydeseq2*
|
|
11
|
-
|
|
12
|
-
`pip install pydeseq2`
|
|
13
|
-
|
|
14
|
-
|
|
15
|
-
**How to install DeConveil**
|
|
16
|
-
|
|
17
|
-
`git clone https://github.com/caravagnalab/DeConveil.git`
|
|
18
|
-
|
|
19
|
-
|
|
20
|
-
**Input data**
|
|
21
|
-
|
|
22
|
-
DeConveil requires the following input matrices: matched mRNA read counts (normal and tumor samples), absolute CN values (normal diploid samples are assigned CN 2), and a design matrix. Example of CN data for a given gene *g*: [1,2,3,4,5,6]. Each CN value is divided by 2 (CN/2). An example of the input data is shown in the *test_deconveil* Jupyter Notebook.
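As a small illustration of the CN scaling step, the example vector for gene *g* can be divided by 2 with NumPy. This is only a sketch of the arithmetic, not the package's own preprocessing code:

```python
import numpy as np

# Absolute copy-number values for one gene across six samples (from the example above).
cn = np.array([1, 2, 3, 4, 5, 6], dtype=float)

# Scale relative to the diploid state, so CN 2 maps to 1.0.
cn_scaled = cn / 2
```

After scaling, a diploid sample has the value 1.0, a single-copy loss has 0.5, and a gain to three copies has 1.5.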
|
|
23
|
-
|
|
24
|
-
|
|
25
|
-
**Output data**
|
|
26
|
-
|
|
27
|
-
`res_CNnaive.csv` (for the *PyDESeq2* method) and `res_CNaware.csv` (for *DeConveil*) are data frames reporting *log2FC* and *p.adjust* for the respective method.
|
|
28
|
-
These data are further processed to separate gene groups using the `define_gene_groups()` function included in the DeConveil framework.
|
|
29
|
-
|
|
30
|
-
A tutorial of the analysis workflow is provided in `test_deconveil.ipynb`.
|
|
31
|
-
|
|
32
|
-
|
|
33
|
-
|
|
34
|
-
### References
|
|
35
|
-
|
|
36
|
-
1. Michael I Love, Wolfgang Huber, and Simon Anders. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12):1–21, 2014. doi:10.1186/s13059-014-0550-8.
|
|
37
|
-
|
|
38
|
-
2. Boris Muzellec, Maria Telenczuk, Vincent Cabeli, and Mathieu Andreux. PyDESeq2: a Python package for bulk RNA-seq differential expression analysis. bioRxiv, pages 2022–12, 2022. doi:10.1101/2022.12.14.520412.
|
|
39
|
-
|
|
40
|
-
3. Anqi Zhu, Joseph G Ibrahim, and Michael I Love. Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences. Bioinformatics, 35(12):2084–2092, 2019. doi:10.1093/bioinformatics/bty895.
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|