bio-analyze-plot 0.1.0a0__tar.gz
- bio_analyze_plot-0.1.0a0/.gitignore +26 -0
- bio_analyze_plot-0.1.0a0/CHANGELOG.md +0 -0
- bio_analyze_plot-0.1.0a0/PKG-INFO +273 -0
- bio_analyze_plot-0.1.0a0/README.md +249 -0
- bio_analyze_plot-0.1.0a0/metadata/bar.json +169 -0
- bio_analyze_plot-0.1.0a0/metadata/bar_api.json +190 -0
- bio_analyze_plot-0.1.0a0/metadata/bar_cli.json +170 -0
- bio_analyze_plot-0.1.0a0/metadata/box.json +149 -0
- bio_analyze_plot-0.1.0a0/metadata/box_api.json +150 -0
- bio_analyze_plot-0.1.0a0/metadata/box_cli.json +150 -0
- bio_analyze_plot-0.1.0a0/metadata/chromosome_api.json +120 -0
- bio_analyze_plot-0.1.0a0/metadata/chromosome_cli.json +110 -0
- bio_analyze_plot-0.1.0a0/metadata/gsea.json +159 -0
- bio_analyze_plot-0.1.0a0/metadata/gsea_api.json +140 -0
- bio_analyze_plot-0.1.0a0/metadata/gsea_cli.json +160 -0
- bio_analyze_plot-0.1.0a0/metadata/heatmap.json +99 -0
- bio_analyze_plot-0.1.0a0/metadata/heatmap_api.json +120 -0
- bio_analyze_plot-0.1.0a0/metadata/heatmap_cli.json +120 -0
- bio_analyze_plot-0.1.0a0/metadata/line.json +169 -0
- bio_analyze_plot-0.1.0a0/metadata/line_api.json +200 -0
- bio_analyze_plot-0.1.0a0/metadata/line_cli.json +170 -0
- bio_analyze_plot-0.1.0a0/metadata/pca.json +109 -0
- bio_analyze_plot-0.1.0a0/metadata/pca_api.json +120 -0
- bio_analyze_plot-0.1.0a0/metadata/pca_cli.json +150 -0
- bio_analyze_plot-0.1.0a0/metadata/pie.json +119 -0
- bio_analyze_plot-0.1.0a0/metadata/pie_api.json +110 -0
- bio_analyze_plot-0.1.0a0/metadata/pie_cli.json +120 -0
- bio_analyze_plot-0.1.0a0/metadata/scatter.json +129 -0
- bio_analyze_plot-0.1.0a0/metadata/scatter_api.json +130 -0
- bio_analyze_plot-0.1.0a0/metadata/scatter_cli.json +130 -0
- bio_analyze_plot-0.1.0a0/metadata/volcano.json +99 -0
- bio_analyze_plot-0.1.0a0/metadata/volcano_api.json +120 -0
- bio_analyze_plot-0.1.0a0/metadata/volcano_cli.json +100 -0
- bio_analyze_plot-0.1.0a0/pyproject.toml +39 -0
- bio_analyze_plot-0.1.0a0/src/bio_analyze_plot/__init__.py +35 -0
- bio_analyze_plot-0.1.0a0/src/bio_analyze_plot/cli.py +578 -0
- bio_analyze_plot-0.1.0a0/src/bio_analyze_plot/plots/__init__.py +30 -0
- bio_analyze_plot-0.1.0a0/src/bio_analyze_plot/plots/bar.py +282 -0
- bio_analyze_plot-0.1.0a0/src/bio_analyze_plot/plots/base.py +147 -0
- bio_analyze_plot-0.1.0a0/src/bio_analyze_plot/plots/box.py +137 -0
- bio_analyze_plot-0.1.0a0/src/bio_analyze_plot/plots/chromosome.py +219 -0
- bio_analyze_plot-0.1.0a0/src/bio_analyze_plot/plots/gsea.py +248 -0
- bio_analyze_plot-0.1.0a0/src/bio_analyze_plot/plots/heatmap.py +143 -0
- bio_analyze_plot-0.1.0a0/src/bio_analyze_plot/plots/line.py +254 -0
- bio_analyze_plot-0.1.0a0/src/bio_analyze_plot/plots/pca.py +176 -0
- bio_analyze_plot-0.1.0a0/src/bio_analyze_plot/plots/pie.py +119 -0
- bio_analyze_plot-0.1.0a0/src/bio_analyze_plot/plots/scatter.py +197 -0
- bio_analyze_plot-0.1.0a0/src/bio_analyze_plot/plots/volcano.py +158 -0
- bio_analyze_plot-0.1.0a0/src/bio_analyze_plot/theme.py +349 -0
- bio_analyze_plot-0.1.0a0/src/bio_analyze_plot/volcano.py +84 -0
- bio_analyze_plot-0.1.0a0/tests/conftest.py +139 -0
- bio_analyze_plot-0.1.0a0/tests/debug_fonts.py +41 -0
- bio_analyze_plot-0.1.0a0/tests/test_bar/test_bar_plot_custom_error_bars.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_bar/test_bar_plot_generation.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_bar/test_bar_plot_sd.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_bar/test_bar_plot_se.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_bar/test_bar_plot_significance.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_bar.py +83 -0
- bio_analyze_plot-0.1.0a0/tests/test_box/test_box_plot_generation.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_box/test_box_plot_significance.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_box/test_box_plot_swarm.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_box.py +67 -0
- bio_analyze_plot-0.1.0a0/tests/test_chinese_font.py +44 -0
- bio_analyze_plot-0.1.0a0/tests/test_chromosome/test_chromosome_plot_custom_colors.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_chromosome/test_chromosome_plot_generation.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_chromosome/test_chromosome_plot_max_chroms.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_chromosome/test_chromosome_plot_sorting.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_chromosome.py +49 -0
- bio_analyze_plot-0.1.0a0/tests/test_cli.py +103 -0
- bio_analyze_plot-0.1.0a0/tests/test_cli_sheet.py +45 -0
- bio_analyze_plot-0.1.0a0/tests/test_custom_theme.py +54 -0
- bio_analyze_plot-0.1.0a0/tests/test_customization.py +83 -0
- bio_analyze_plot-0.1.0a0/tests/test_gsea/test_gsea_no_border.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_gsea/test_gsea_no_metric.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_gsea/test_gsea_with_metric.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_gsea.py +69 -0
- bio_analyze_plot-0.1.0a0/tests/test_heatmap/test_heatmap_plot_generation.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_heatmap.py +32 -0
- bio_analyze_plot-0.1.0a0/tests/test_latex.py +35 -0
- bio_analyze_plot-0.1.0a0/tests/test_line/test_line_plot_generation.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_line.py +22 -0
- bio_analyze_plot-0.1.0a0/tests/test_pca/test_pca_plot.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_pca/test_pca_plot_tidy.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_pca.py +56 -0
- bio_analyze_plot-0.1.0a0/tests/test_pca_cluster.py +38 -0
- bio_analyze_plot-0.1.0a0/tests/test_pie/test_pie_plot.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_pie/test_pie_plot_explode.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_pie/test_pie_plot_explode_list.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_pie.py +50 -0
- bio_analyze_plot-0.1.0a0/tests/test_plot.py +6 -0
- bio_analyze_plot-0.1.0a0/tests/test_scatter/test_scatter_basic.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_scatter/test_scatter_complex.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_scatter/test_scatter_ellipse.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_scatter.py +69 -0
- bio_analyze_plot-0.1.0a0/tests/test_themes_exhaustive.py +85 -0
- bio_analyze_plot-0.1.0a0/tests/test_volcano/test_volcano_plot_generation.png +0 -0
- bio_analyze_plot-0.1.0a0/tests/test_volcano.py +23 -0
@@ -0,0 +1,26 @@
+__pycache__/
+*.py[cod]
+*.pyd
+*.pyo
+*.so
+.Python
+.venv/
+env/
+venv/
+
+build/
+dist/
+*.egg-info/
+.pytest_cache/
+.ruff_cache/
+.mypy_cache/
+
+.idea/
+.vscode/
+
+uv.lock
+
+*.log
+output
+.trae/
+*.xml
File without changes
@@ -0,0 +1,273 @@
+Metadata-Version: 2.4
+Name: bio-analyze-plot
+Version: 0.1.0a0
+Summary: Publication-ready plotting module for bio-analyze.
+Author: qww
+License: GPL-3.0
+Requires-Python: <3.15,>=3.9
+Requires-Dist: bio-analyze-core>=0.1.0a0
+Requires-Dist: matplotlib>=3.8.0
+Requires-Dist: numpy>=1.20.0
+Requires-Dist: openpyxl>=3.1.0
+Requires-Dist: pandas>=2.0.0
+Requires-Dist: scikit-learn>=1.3.0
+Requires-Dist: scipy>=1.10.0
+Requires-Dist: seaborn>=0.13.0
+Requires-Dist: statannotations>=0.6.0
+Requires-Dist: typer>=0.9.0
+Provides-Extra: dev
+Requires-Dist: allure-pytest; extra == 'dev'
+Requires-Dist: pillow; extra == 'dev'
+Requires-Dist: pytest; extra == 'dev'
+Requires-Dist: pytest-regressions; extra == 'dev'
+Description-Content-Type: text/markdown
+
+# bio-analyze-plot
+
+**bio-analyze-plot** is the professional plotting module in the `bio-analyze` toolbox. Built on `matplotlib` and `seaborn`, it aims to generate publication-ready statistical charts and supports one-click switching between journal themes like `Nature` and `Science`.
+
+## ✨ Features
+
+- **Publication-Ready Themes**: Built-in `nature`, `science`, and `default` themes that automatically adjust fonts, font sizes, line widths, and color palettes.
+- **Wide Data Support**: Supports `.csv`, `.tsv`, `.txt`, `.xlsx`, and `.xls` formats. Specific Excel sheets can be targeted via `--sheet`.
+- **Multi-Format Export**: Supports various image formats including `png`, `pdf`, `svg`, `jpg`, and `tiff`.
+- **LaTeX Support**: Automatically parses LaTeX formulas in axis labels (e.g., `$y = \sin(x)$`).
+- **Unified CLI**: All charts can be invoked through a unified command-line interface.
+
+## 📊 Supported Plots
+
+### 1. Volcano Plot
+
+Used to display the distribution of Differentially Expressed Genes (DEGs), intuitively showing significantly up-regulated and down-regulated genes.
+
+- **Command**: `volcano`
+- **Key Parameters**:
+  - `-x`: log2 Fold Change column name (Default: "log2FoldChange")
+  - `-y`: P-value column name (Default: "pvalue")
+  - `--fc-cutoff`: Fold Change threshold
+  - `--p-cutoff`: P-value threshold
+  - `--labels`: Custom labels (e.g., `{"up": "Up", "down": "Down", "ns": "NS"}`)
+
+### 2. Bar Plot
+
+Supports bar charts with error bars (SD/SE/CI) and significance markers.
+
+- **Command**: `bar`
+- **Key Parameters**:
+  - `--error-bar-type`: Error bar type. Options: `SD` (Standard Deviation), `SE` (Standard Error), `CI` (Confidence Interval).
+  - `--error-bar-ci`: Confidence level when type is `CI` (Default: 95).
+  - `--significance`: Specify group pairs for significance annotation, e.g., "Control,Treated".
+  - `--test`: Significance test method. Supports `t-test_ind`, `t-test_welch`, `Mann-Whitney`, etc.
+  - `--text-format`: Significance marker format (`star`, `full`, `simple`, `pvalue`).
+
+### 3. Box Plot
+
+Displays data distribution, supporting overlaid SwarmPlot scatter points and significance markers.
+
+- **Command**: `box`
+- **Key Parameters**:
+  - `-x`: Grouping column (Categorical)
+  - `-y`: Value column (Numerical)
+  - `--hue`: Color grouping column
+  - `--add-swarm`: Whether to overlay a Swarmplot to show all data points.
+  - `--significance`: Group pairs for significance annotation.
+
+### 4. Heatmap
+
+Used to display clustered heatmaps of gene expression or other matrix data.
+
+- **Command**: `heatmap`
+- **Key Parameters**:
+  - `--cluster-rows` / `--cluster-cols`: Whether to cluster rows/columns.
+  - `--z-score`: Perform Z-score normalization on rows (0) or columns (1).
+
+### 5. PCA Plot
+
+Displays the distribution of samples in principal component space, supporting automatic clustering ellipses.
+
+- **Command**: `pca`
+- **Key Parameters**:
+  - `--transpose`: Whether to transpose the matrix (if input is Genes x Samples, it usually needs to be transposed to Samples x Genes).
+  - `--hue`: Sample grouping column.
+  - `--cluster`: Whether to display clustering confidence ellipses.
+
+### 6. Line Plot
+
+Used to display time series or trend data, supporting smooth fitting and error bars.
+
+- **Command**: `line`
+- **Key Parameters**:
+  - `--hue`: Grouping column; different groups are shown in different colors.
+  - `--smooth`: Enable smooth curve fitting (B-spline).
+  - `--smooth-points`: Number of interpolation points for smoothing (Default: 300).
+  - `--error-bar-type`: Error bar type (`SD`, `SE`, `CI`).
+  - `--error-bar-ci`: Confidence interval size.
+  - `--error-bar-capsize`: Width of the error bar caps.
+  - `--markers`: Display markers for original data points.
+
+### 7. Scatter Plot
+
+Displays the relationship between two variables, supporting confidence ellipses.
+
+- **Command**: `scatter`
+- **Key Parameters**:
+  - `--x`, `--y`: X/Y axis column names.
+  - `--hue`: Color grouping column.
+  - `--size`: Column to map point sizes.
+  - `--style`: Column to map point styles/shapes.
+  - `--add-ellipse`: Draw confidence ellipses for each group.
+  - `--ellipse-std`: Standard deviation multiplier for the ellipse (Default: 2.0).
+
+### 8. Pie Chart
+
+Displays the proportions of categorical data.
+
+- **Command**: `pie`
+- **Key Parameters**:
+  - `--explode`: Distance to explode sectors.
+  - `--autopct`: Percentage display format (Default: "%1.1f%%").
+
+### 9. Chromosome Distribution Plot
+
+Displays the distribution density of Reads across whole-genome chromosomes.
+
+- **Command**: `chromosome`
+- **Description**: Typically used with the `rna_seq` pipeline to show read coverage on positive and negative strands.
+
+### 10. GSEA Enrichment Plot
+
+Displays the Enrichment Score trend of GSEA analysis.
+
+- **Command**: `gsea`
+- **Key Parameters**:
+  - `--rank`: Rank value column name.
+  - `--score`: Running ES column name.
+  - `--nes`: Normalized Enrichment Score (displayed in the title).
+  - `--pvalue` / `--fdr`: Statistical significance metrics.
+
+## 🎨 Themes
+
+Supports customizing plotting themes via JSON files or Python code.
+
+### Using Built-in Themes
+
+```bash
+# Use Nature style
+bioanalyze plot volcano result.csv --theme nature
+
+# Use Science style
+bioanalyze plot volcano result.csv --theme science
+```
+
+### Custom Themes (JSON)
+
+Create `my_theme.json`:
+
+```json
+{
+  "name": "dark_presentation",
+  "style": "darkgrid",
+  "context": "talk",
+  "font": "Arial",
+  "rc_params": {
+    "lines.linewidth": 2.5,
+    "axes.labelsize": 14
+  }
+}
+```
+
+Usage: `bioanalyze plot volcano ... --theme ./my_theme.json`
+
+## 📦 Python API
+
+All charts can be invoked directly via Python classes, supporting more flexible customization.
+
+### Basic Usage
+
+```python
+import pandas as pd
+from bio_analyze_plot.plots import VolcanoPlot, PCAPlot
+
+# 1. Plot Volcano
+df = pd.read_csv("de_results.csv")
+plotter = VolcanoPlot(theme="nature")
+plotter.plot(
+    data=df,
+    x="log2FoldChange",
+    y="padj",
+    fc_cutoff=1.5,
+    p_cutoff=0.05,
+    title="Differential Expression",
+    output="volcano.pdf"
+)
+
+# 2. Plot PCA
+counts = pd.read_csv("counts.csv", index_col=0)
+pca = PCAPlot(theme="science")
+pca.plot(
+    data=counts,
+    transpose=True, # If input is Genes x Samples
+    hue=["Control", "Control", "Treat", "Treat"], # Sample grouping
+    cluster=True, # Draw confidence ellipses
+    output="pca.png"
+)
+```
+
+### Chart Class Index
+
+#### `VolcanoPlot`
+- **plot() Parameters**:
+  - `data` (DataFrame): Data source.
+  - `x`, `y` (str): Column names.
+  - `log_y` (bool): Whether to apply -log10 to y (Default: True).
+  - `fc_cutoff`, `p_cutoff` (float): Threshold lines.
+  - `labels` (dict): Legend labels (e.g., `{"up": "Up", "down": "Down", "ns": "NS"}`).
+
+#### `HeatmapPlot`
+- **plot() Parameters**:
+  - `cluster_rows`, `cluster_cols` (bool): Whether to cluster.
+  - `z_score` (int): 0=Row standardization, 1=Column standardization, None=No standardization.
+  - `cmap` (str): Colormap (Default: "vlag").
+
+#### `BoxPlot`
+- **plot() Parameters**:
+  - `significance` (list[tuple]): Pairs for significance markers (e.g., `[("Ctrl", "Treat")]`).
+  - `test` (str): Test method (Default: "t-test_ind").
+  - `add_swarm` (bool): Whether to overlay scatter points.
+
+#### `LinePlot`
+- **plot() Parameters**:
+  - `smooth` (bool): Enable smooth curve.
+  - `error_bar_type` (str): Error bar type.
+  - `markers` (bool/list): Data point markers.
+
+#### `ScatterPlot`
+- **plot() Parameters**:
+  - `add_ellipse` (bool): Draw confidence ellipses.
+  - `ellipse_std` (float): Ellipse standard deviation.
+  - `style`, `size` (str): Style/size mapping columns.
+
+#### `PCAPlot` (Inherits from ScatterPlot)
+- **plot() Parameters**:
+  - `transpose` (bool): Whether to transpose the input matrix.
+  - `n_components` (int): Number of principal components.
+  - `cluster` (bool): Whether to draw confidence ellipses.
+
+#### `ChromosomeDistributionPlot`
+- **plot() Parameters**:
+  - `chrom_col`, `pos_col`: Chromosome and position column names.
+  - `pos_counts_col`, `neg_counts_col`: Positive and negative strand count columns.
+  - `max_chroms` (int): Maximum number of chromosomes to display.
+
+#### `GSEAPlot`
+- **plot() Parameters**:
+  - `rank`, `score`: Rank and score data columns.
+  - `nes`, `pvalue`, `fdr`: Statistical metrics (displayed in the plot).
+
+## 💻 Development
+
+Unit test outputs are located in the `packages/plot/tests/output` directory.
+
+```bash
+pytest packages/plot/tests
+```
@@ -0,0 +1,249 @@
+# bio-analyze-plot
+
+**bio-analyze-plot** is the professional plotting module in the `bio-analyze` toolbox. Built on `matplotlib` and `seaborn`, it aims to generate publication-ready statistical charts and supports one-click switching between journal themes like `Nature` and `Science`.
+
+## ✨ Features
+
+- **Publication-Ready Themes**: Built-in `nature`, `science`, and `default` themes that automatically adjust fonts, font sizes, line widths, and color palettes.
+- **Wide Data Support**: Supports `.csv`, `.tsv`, `.txt`, `.xlsx`, and `.xls` formats. Specific Excel sheets can be targeted via `--sheet`.
+- **Multi-Format Export**: Supports various image formats including `png`, `pdf`, `svg`, `jpg`, and `tiff`.
+- **LaTeX Support**: Automatically parses LaTeX formulas in axis labels (e.g., `$y = \sin(x)$`).
+- **Unified CLI**: All charts can be invoked through a unified command-line interface.
+
+## 📊 Supported Plots
+
+### 1. Volcano Plot
+
+Used to display the distribution of Differentially Expressed Genes (DEGs), intuitively showing significantly up-regulated and down-regulated genes.
+
+- **Command**: `volcano`
+- **Key Parameters**:
+  - `-x`: log2 Fold Change column name (Default: "log2FoldChange")
+  - `-y`: P-value column name (Default: "pvalue")
+  - `--fc-cutoff`: Fold Change threshold
+  - `--p-cutoff`: P-value threshold
+  - `--labels`: Custom labels (e.g., `{"up": "Up", "down": "Down", "ns": "NS"}`)
+
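Reviewer sketch (not part of the package): the two cutoffs described for the volcano plot are usually combined into a three-way up/down/ns label. The snippet below shows that common rule with the standard library only. The function names, the default cutoffs, and the choice to compare the cutoff against the log2 fold change directly are assumptions for illustration; the diff shown here does not reveal how `VolcanoPlot` applies them.

```python
import math


def classify_gene(log2_fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Return 'up', 'down', or 'ns' for one gene (hypothetical helper)."""
    significant = p_value < p_cutoff
    if significant and log2_fc >= fc_cutoff:
        return "up"
    if significant and log2_fc <= -fc_cutoff:
        return "down"
    return "ns"


def volcano_y(p_value):
    """-log10 transform commonly used for the y axis; assumes p_value > 0."""
    return -math.log10(p_value)


# Toy input: gene -> (log2 fold change, p-value)
genes = {"geneA": (2.3, 0.001), "geneB": (-1.8, 0.02), "geneC": (0.4, 0.5)}
classes = {name: classify_gene(fc, p) for name, (fc, p) in genes.items()}
print(classes)
```

The keys `up`, `down`, and `ns` match the keys of the `--labels` example above, which is why they are used as the class names here.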
+### 2. Bar Plot
+
+Supports bar charts with error bars (SD/SE/CI) and significance markers.
+
+- **Command**: `bar`
+- **Key Parameters**:
+  - `--error-bar-type`: Error bar type. Options: `SD` (Standard Deviation), `SE` (Standard Error), `CI` (Confidence Interval).
+  - `--error-bar-ci`: Confidence level when type is `CI` (Default: 95).
+  - `--significance`: Specify group pairs for significance annotation, e.g., "Control,Treated".
+  - `--test`: Significance test method. Supports `t-test_ind`, `t-test_welch`, `Mann-Whitney`, etc.
+  - `--text-format`: Significance marker format (`star`, `full`, `simple`, `pvalue`).
+
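Reviewer sketch (not part of the package): the three error bar types named above differ only in how the half-width around the mean is computed. The following standard-library example shows one conventional definition of each. The `error_bar` helper is hypothetical, and the CI branch uses a normal approximation as an assumption; the package may instead rely on a t-distribution or on seaborn's bootstrap.

```python
import math
import statistics


def error_bar(values, kind="SD", ci=95):
    """Half-width of an error bar around the mean of `values` (hypothetical)."""
    sd = statistics.stdev(values)        # sample standard deviation (n - 1)
    se = sd / math.sqrt(len(values))     # standard error of the mean
    if kind == "SD":
        return sd
    if kind == "SE":
        return se
    if kind == "CI":
        # Normal approximation: z is about 1.96 for a 95% confidence level.
        z = statistics.NormalDist().inv_cdf(0.5 + ci / 200)
        return z * se
    raise ValueError(f"unknown error bar type: {kind!r}")


control = [4.0, 5.0, 6.0, 5.0]
for kind in ("SD", "SE", "CI"):
    print(kind, round(error_bar(control, kind), 4))
```

For small groups the normal approximation gives a narrower interval than a t-based one, so treat the CI value as a lower bound on what a more careful method would draw.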
+### 3. Box Plot
+
+Displays data distribution, supporting overlaid SwarmPlot scatter points and significance markers.
+
+- **Command**: `box`
+- **Key Parameters**:
+  - `-x`: Grouping column (Categorical)
+  - `-y`: Value column (Numerical)
+  - `--hue`: Color grouping column
+  - `--add-swarm`: Whether to overlay a Swarmplot to show all data points.
+  - `--significance`: Group pairs for significance annotation.
+
+### 4. Heatmap
+
+Used to display clustered heatmaps of gene expression or other matrix data.
+
+- **Command**: `heatmap`
+- **Key Parameters**:
+  - `--cluster-rows` / `--cluster-cols`: Whether to cluster rows/columns.
+  - `--z-score`: Perform Z-score normalization on rows (0) or columns (1).
+
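Reviewer sketch (not part of the package): Z-score normalization rescales each row (or column) to mean 0 and standard deviation 1, so that clustering compares expression patterns rather than absolute levels. The example below illustrates the row (0) versus column (1) convention described above on a plain list-of-lists matrix. The `z_score` function is a hypothetical stand-in, and the use of the sample standard deviation is an assumption.

```python
import statistics


def z_score(matrix, axis=0):
    """Standardize a list-of-lists matrix: axis=0 per row, axis=1 per column."""

    def standardize(values):
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)  # sample SD; a constant row would give sd == 0
        return [(v - mean) / sd for v in values]

    if axis == 0:
        return [standardize(row) for row in matrix]
    if axis == 1:
        columns = [standardize(col) for col in zip(*matrix)]
        return [list(row) for row in zip(*columns)]  # transpose back to rows
    raise ValueError("axis must be 0 or 1")


# Two genes measured in three samples, on very different scales.
expr = [[1.0, 2.0, 3.0], [10.0, 20.0, 60.0]]
by_row = z_score(expr, axis=0)
by_col = z_score(expr, axis=1)
print(by_row[0])
```

Rows with zero variance would raise a division error in this sketch; real data usually needs such rows filtered out first.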
+### 5. PCA Plot
+
+Displays the distribution of samples in principal component space, supporting automatic clustering ellipses.
+
+- **Command**: `pca`
+- **Key Parameters**:
+  - `--transpose`: Whether to transpose the matrix (if input is Genes x Samples, it usually needs to be transposed to Samples x Genes).
+  - `--hue`: Sample grouping column.
+  - `--cluster`: Whether to display clustering confidence ellipses.
+
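Reviewer sketch (not part of the package): PCA treats each row as one observation, which is why a Genes x Samples count table has to be flipped before samples can be projected. The toy example below shows that orientation change, and how a per-sample group list lines up with the transposed rows, using plain dictionaries. All names and values are made up for illustration.

```python
sample_names = ["S1", "S2", "S3", "S4"]

# Genes x Samples: one list of per-sample values for each gene.
genes_by_samples = {
    "geneA": [5.0, 6.0, 1.0, 2.0],
    "geneB": [3.0, 3.5, 8.0, 9.0],
    "geneC": [0.5, 0.4, 0.6, 0.5],
}


def transpose(table, columns):
    """Turn {row: [value per column]} into {column: [value per row]}."""
    return {
        col: [values[i] for values in table.values()]
        for i, col in enumerate(columns)
    }


# Samples x Genes: each sample is now one observation with three features.
samples_by_genes = transpose(genes_by_samples, sample_names)

# One group label per sample, in the same order as the transposed rows.
hue = ["Control", "Control", "Treat", "Treat"]
groups = dict(zip(sample_names, hue))
print(samples_by_genes["S1"], groups["S1"])
```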
+### 6. Line Plot
+
+Used to display time series or trend data, supporting smooth fitting and error bars.
+
+- **Command**: `line`
+- **Key Parameters**:
+  - `--hue`: Grouping column; different groups are shown in different colors.
+  - `--smooth`: Enable smooth curve fitting (B-spline).
+  - `--smooth-points`: Number of interpolation points for smoothing (Default: 300).
+  - `--error-bar-type`: Error bar type (`SD`, `SE`, `CI`).
+  - `--error-bar-ci`: Confidence interval size.
+  - `--error-bar-capsize`: Width of the error bar caps.
+  - `--markers`: Display markers for original data points.
+
+### 7. Scatter Plot
+
+Displays the relationship between two variables, supporting confidence ellipses.
+
+- **Command**: `scatter`
+- **Key Parameters**:
+  - `--x`, `--y`: X/Y axis column names.
+  - `--hue`: Color grouping column.
+  - `--size`: Column to map point sizes.
+  - `--style`: Column to map point styles/shapes.
+  - `--add-ellipse`: Draw confidence ellipses for each group.
+  - `--ellipse-std`: Standard deviation multiplier for the ellipse (Default: 2.0).
+
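Reviewer sketch (not part of the package): a standard way to draw a group ellipse is to take the eigenvalues of the group's 2x2 covariance matrix and scale the axes by a standard deviation multiplier, which is what an option like `--ellipse-std` suggests. The example below computes the ellipse geometry in closed form with the standard library. The `covariance_ellipse` function is hypothetical, and whether `ScatterPlot` uses this exact construction is an assumption.

```python
import math
import statistics


def covariance_ellipse(xs, ys, n_std=2.0):
    """Center, full axis lengths, and major-axis angle (degrees) of a covariance ellipse."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    n = len(xs)
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

    # Eigenvalues of [[var_x, cov_xy], [cov_xy, var_y]] in closed form.
    half_trace = (var_x + var_y) / 2
    spread = math.hypot((var_x - var_y) / 2, cov_xy)
    lam_major = half_trace + spread
    lam_minor = max(half_trace - spread, 0.0)  # guard against rounding below zero

    return {
        "center": (mean_x, mean_y),
        "width": 2 * n_std * math.sqrt(lam_major),
        "height": 2 * n_std * math.sqrt(lam_minor),
        "angle": 0.5 * math.degrees(math.atan2(2 * cov_xy, var_x - var_y)),
    }


# Perfectly correlated points: the ellipse collapses onto the 45 degree line.
ellipse = covariance_ellipse([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0])
print(ellipse)
```

For bivariate normal data with a known covariance, a multiplier of 2 encloses roughly 86% of the points (1 - exp(-2)), not 95%, so the multiplier should not be read as a confidence level directly.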
+### 8. Pie Chart
+
+Displays the proportions of categorical data.
+
+- **Command**: `pie`
+- **Key Parameters**:
+  - `--explode`: Distance to explode sectors.
+  - `--autopct`: Percentage display format (Default: "%1.1f%%").
+
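Reviewer sketch (not part of the package): the default `--autopct` value is a printf-style format string, which matplotlib's pie chart applies to each wedge's percentage as `fmt % pct`. The short example below shows what that format produces for a toy set of counts; the category names and counts are made up for illustration.

```python
counts = {"exon": 50, "intron": 30, "intergenic": 20}
total = sum(counts.values())

autopct = "%1.1f%%"  # one decimal place, followed by a literal percent sign

# Each wedge label is the format string applied to that wedge's percentage.
wedge_labels = {
    name: autopct % (100 * value / total) for name, value in counts.items()
}
print(wedge_labels)
```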
+### 9. Chromosome Distribution Plot
+
+Displays the distribution density of Reads across whole-genome chromosomes.
+
+- **Command**: `chromosome`
+- **Description**: Typically used with the `rna_seq` pipeline to show read coverage on positive and negative strands.
+
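Reviewer sketch (not part of the package): chromosome names sort badly as plain strings (`chr10` lands before `chr2`), so genome-wide plots usually order them with a natural sort key and then cap how many are shown. The example below illustrates one such ordering. Both helper functions are hypothetical, and the rule of keeping the first `max_chroms` names after sorting is an assumption; the package could, for instance, select by chromosome size instead.

```python
import re


def chrom_sort_key(name):
    """Numeric chromosomes first in numeric order, then named ones alphabetically."""
    core = re.sub(r"^chr", "", name, flags=re.IGNORECASE)
    if core.isdigit():
        return (0, int(core), "")
    return (1, 0, core)


def select_chromosomes(names, max_chroms=None):
    """Deduplicate, sort naturally, and optionally keep only the first max_chroms."""
    ordered = sorted(set(names), key=chrom_sort_key)
    return ordered if max_chroms is None else ordered[:max_chroms]


observed = ["chr10", "chrX", "chr2", "chr1", "chrM", "chr2"]
shown = select_chromosomes(observed, max_chroms=4)
print(shown)
```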
+### 10. GSEA Enrichment Plot
+
+Displays the Enrichment Score trend of GSEA analysis.
+
+- **Command**: `gsea`
+- **Key Parameters**:
+  - `--rank`: Rank value column name.
+  - `--score`: Running ES column name.
+  - `--nes`: Normalized Enrichment Score (displayed in the title).
+  - `--pvalue` / `--fdr`: Statistical significance metrics.
+
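Reviewer sketch (not part of the package): the `gsea` command plots a running ES column that is expected to be computed beforehand. For readers unfamiliar with that column, the example below computes a running enrichment score in its simplest, unweighted (Kolmogorov-Smirnov style) form: walk down the ranked gene list, step up on a gene-set hit and down on a miss. The function name and toy data are assumptions, and standard GSEA additionally weights hits by the ranking metric, which this sketch omits.

```python
def running_enrichment_score(ranked_genes, gene_set):
    """Return (running ES curve, ES) for an unweighted walk down the ranking."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need at least one hit and one miss")

    running, curve = 0.0, []
    for is_hit in hits:
        # Hits and misses are scaled so the walk always ends back at zero.
        running += 1 / n_hit if is_hit else -1 / n_miss
        curve.append(running)

    es = max(curve, key=abs)  # largest deviation from zero, sign preserved
    return curve, es


ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]  # most to least up-regulated
curve, es = running_enrichment_score(ranked, {"g1", "g2", "g5"})
print([round(v, 3) for v in curve], round(es, 3))
```

A curve that peaks early, as in this toy input, indicates that the gene set is concentrated near the top of the ranking.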
+## 🎨 Themes
+
+Supports customizing plotting themes via JSON files or Python code.
+
+### Using Built-in Themes
+
+```bash
+# Use Nature style
+bioanalyze plot volcano result.csv --theme nature
+
+# Use Science style
+bioanalyze plot volcano result.csv --theme science
+```
+
+### Custom Themes (JSON)
+
+Create `my_theme.json`:
+
+```json
+{
+  "name": "dark_presentation",
+  "style": "darkgrid",
+  "context": "talk",
+  "font": "Arial",
+  "rc_params": {
+    "lines.linewidth": 2.5,
+    "axes.labelsize": 14
+  }
+}
+```
+
+Usage: `bioanalyze plot volcano ... --theme ./my_theme.json`
+
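Reviewer sketch (not part of the package): the custom theme file above is ordinary JSON whose `style` and `context` values match seaborn's style and context names, and whose `rc_params` keys are matplotlib rcParams. The example below shows how such a file could be parsed and checked before use. The `load_theme` function, the allowed-key set, and the mapping of `font` to the `font.family` rcParam are assumptions for illustration; the contents of the package's `theme.py` are not shown in this part of the diff.

```python
import json

# Keys that appear in the README's example theme (assumed to be the full schema).
ALLOWED_KEYS = {"name", "style", "context", "font", "rc_params"}

THEME_JSON = """
{
  "name": "dark_presentation",
  "style": "darkgrid",
  "context": "talk",
  "font": "Arial",
  "rc_params": {"lines.linewidth": 2.5, "axes.labelsize": 14}
}
"""


def load_theme(text):
    """Parse a theme document and flatten it into style, context, and rc settings."""
    theme = json.loads(text)
    unknown = set(theme) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown theme keys: {sorted(unknown)}")

    rc = dict(theme.get("rc_params", {}))
    if "font" in theme:
        # Assumed mapping: the top-level font becomes matplotlib's font.family.
        rc.setdefault("font.family", theme["font"])

    return {
        "name": theme.get("name", "custom"),
        "style": theme.get("style"),
        "context": theme.get("context"),
        "rc": rc,
    }


theme = load_theme(THEME_JSON)
print(theme["name"], theme["rc"])
```

Rejecting unknown keys up front makes a typo such as `rc_param` fail loudly instead of being silently ignored.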
+## 📦 Python API
+
+All charts can be invoked directly via Python classes, supporting more flexible customization.
+
+### Basic Usage
+
+```python
+import pandas as pd
+from bio_analyze_plot.plots import VolcanoPlot, PCAPlot
+
+# 1. Plot Volcano
+df = pd.read_csv("de_results.csv")
+plotter = VolcanoPlot(theme="nature")
+plotter.plot(
+    data=df,
+    x="log2FoldChange",
+    y="padj",
+    fc_cutoff=1.5,
+    p_cutoff=0.05,
+    title="Differential Expression",
+    output="volcano.pdf"
+)
+
+# 2. Plot PCA
+counts = pd.read_csv("counts.csv", index_col=0)
+pca = PCAPlot(theme="science")
+pca.plot(
+    data=counts,
+    transpose=True, # If input is Genes x Samples
+    hue=["Control", "Control", "Treat", "Treat"], # Sample grouping
+    cluster=True, # Draw confidence ellipses
+    output="pca.png"
+)
+```
+
+### Chart Class Index
+
+#### `VolcanoPlot`
+- **plot() Parameters**:
+  - `data` (DataFrame): Data source.
+  - `x`, `y` (str): Column names.
+  - `log_y` (bool): Whether to apply -log10 to y (Default: True).
+  - `fc_cutoff`, `p_cutoff` (float): Threshold lines.
+  - `labels` (dict): Legend labels (e.g., `{"up": "Up", "down": "Down", "ns": "NS"}`).
+
+#### `HeatmapPlot`
+- **plot() Parameters**:
+  - `cluster_rows`, `cluster_cols` (bool): Whether to cluster.
+  - `z_score` (int): 0=Row standardization, 1=Column standardization, None=No standardization.
+  - `cmap` (str): Colormap (Default: "vlag").
+
+#### `BoxPlot`
+- **plot() Parameters**:
+  - `significance` (list[tuple]): Pairs for significance markers (e.g., `[("Ctrl", "Treat")]`).
+  - `test` (str): Test method (Default: "t-test_ind").
+  - `add_swarm` (bool): Whether to overlay scatter points.
+
+#### `LinePlot`
+- **plot() Parameters**:
+  - `smooth` (bool): Enable smooth curve.
+  - `error_bar_type` (str): Error bar type.
+  - `markers` (bool/list): Data point markers.
+
+#### `ScatterPlot`
+- **plot() Parameters**:
+  - `add_ellipse` (bool): Draw confidence ellipses.
+  - `ellipse_std` (float): Ellipse standard deviation.
+  - `style`, `size` (str): Style/size mapping columns.
+
+#### `PCAPlot` (Inherits from ScatterPlot)
+- **plot() Parameters**:
+  - `transpose` (bool): Whether to transpose the input matrix.
+  - `n_components` (int): Number of principal components.
+  - `cluster` (bool): Whether to draw confidence ellipses.
+
+#### `ChromosomeDistributionPlot`
+- **plot() Parameters**:
+  - `chrom_col`, `pos_col`: Chromosome and position column names.
+  - `pos_counts_col`, `neg_counts_col`: Positive and negative strand count columns.
+  - `max_chroms` (int): Maximum number of chromosomes to display.
+
+#### `GSEAPlot`
+- **plot() Parameters**:
+  - `rank`, `score`: Rank and score data columns.
+  - `nes`, `pvalue`, `fdr`: Statistical metrics (displayed in the plot).
+
+## 💻 Development
+
+Unit test outputs are located in the `packages/plot/tests/output` directory.
+
+```bash
+pytest packages/plot/tests
+```