pathview-plus 2.0.0__tar.gz
- pathview_plus-2.0.0/PKG-INFO +661 -0
- pathview_plus-2.0.0/README.md +621 -0
- pathview_plus-2.0.0/bin/pathview-cli.py +252 -0
- pathview_plus-2.0.0/lib/__init__.py +124 -0
- pathview_plus-2.0.0/lib/color_mapping.py +153 -0
- pathview_plus-2.0.0/lib/constants.py +27 -0
- pathview_plus-2.0.0/lib/databases.py +309 -0
- pathview_plus-2.0.0/lib/examples.py +342 -0
- pathview_plus-2.0.0/lib/highlighting.py +375 -0
- pathview_plus-2.0.0/lib/id_mapping.py +170 -0
- pathview_plus-2.0.0/lib/kegg_api.py +143 -0
- pathview_plus-2.0.0/lib/kgml_parser.py +189 -0
- pathview_plus-2.0.0/lib/mol_data.py +168 -0
- pathview_plus-2.0.0/lib/node_mapping.py +99 -0
- pathview_plus-2.0.0/lib/pathview.py +316 -0
- pathview_plus-2.0.0/lib/rendering.py +409 -0
- pathview_plus-2.0.0/lib/sbgn_parser.py +353 -0
- pathview_plus-2.0.0/lib/splines.py +304 -0
- pathview_plus-2.0.0/lib/svg_rendering.py +305 -0
- pathview_plus-2.0.0/lib/test_all_features.py +343 -0
- pathview_plus-2.0.0/lib/utils.py +80 -0
- pathview_plus-2.0.0/pathview_plus.egg-info/PKG-INFO +661 -0
- pathview_plus-2.0.0/pathview_plus.egg-info/SOURCES.txt +26 -0
- pathview_plus-2.0.0/pathview_plus.egg-info/dependency_links.txt +1 -0
- pathview_plus-2.0.0/pathview_plus.egg-info/requires.txt +15 -0
- pathview_plus-2.0.0/pathview_plus.egg-info/top_level.txt +1 -0
- pathview_plus-2.0.0/setup.cfg +4 -0
- pathview_plus-2.0.0/setup.py +53 -0
|
@@ -0,0 +1,661 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: pathview-plus
|
|
3
|
+
Version: 2.0.0
|
|
4
|
+
Summary: Complete pathway visualization: KEGG + SBGN + highlighting + splines
|
|
5
|
+
Home-page: https://github.com/raw-lab/pathview-plus
|
|
6
|
+
Author: Richard Allen White III
|
|
7
|
+
Classifier: Development Status :: 4 - Beta
|
|
8
|
+
Classifier: Intended Audience :: Science/Research
|
|
9
|
+
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
|
|
10
|
+
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
|
|
11
|
+
Classifier: Programming Language :: Python :: 3
|
|
12
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
13
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
14
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
15
|
+
Classifier: Programming Language :: Python :: 3.13
|
|
16
|
+
Requires-Python: >=3.10
|
|
17
|
+
Description-Content-Type: text/markdown
|
|
18
|
+
Requires-Dist: polars>=0.19.0
|
|
19
|
+
Requires-Dist: numpy>=1.24.0
|
|
20
|
+
Requires-Dist: matplotlib>=3.7.0
|
|
21
|
+
Requires-Dist: seaborn>=0.12.0
|
|
22
|
+
Requires-Dist: Pillow>=10.0.0
|
|
23
|
+
Requires-Dist: networkx>=3.1
|
|
24
|
+
Requires-Dist: requests>=2.31.0
|
|
25
|
+
Provides-Extra: dev
|
|
26
|
+
Requires-Dist: pytest>=7.0; extra == "dev"
|
|
27
|
+
Requires-Dist: black>=23.0; extra == "dev"
|
|
28
|
+
Requires-Dist: mypy>=1.0; extra == "dev"
|
|
29
|
+
Provides-Extra: fast
|
|
30
|
+
Requires-Dist: lxml>=4.9.0; extra == "fast"
|
|
31
|
+
Dynamic: author
|
|
32
|
+
Dynamic: classifier
|
|
33
|
+
Dynamic: description
|
|
34
|
+
Dynamic: description-content-type
|
|
35
|
+
Dynamic: home-page
|
|
36
|
+
Dynamic: provides-extra
|
|
37
|
+
Dynamic: requires-dist
|
|
38
|
+
Dynamic: requires-python
|
|
39
|
+
Dynamic: summary
|
|
40
|
+
|
|
41
|
+
# Pathview-plus — Complete Pathway Visualization
|
|
42
|
+
|
|
43
|
+
**Full-featured Python implementation of R pathview + SBGNview with support for KEGG, Reactome, MetaCyc, and more.**
|
|
44
|
+
|
|
45
|
+
[Python 3.10+](https://www.python.org/downloads/)
|
|
46
|
+
[License: GPL v3](https://www.gnu.org/licenses/gpl-3.0)
|
|
47
|
+
|
|
48
|
+
---
|
|
49
|
+
|
|
50
|
+
## 🎯 Features
|
|
51
|
+
|
|
52
|
+
### Core Capabilities
|
|
53
|
+
- ✅ **KEGG Pathways** — Download and visualize any KEGG pathway
|
|
54
|
+
- ✅ **SBGN Pathways** — Support for Reactome, MetaCyc, PANTHER, SMPDB
|
|
55
|
+
- ✅ **Multiple Formats** — PNG (native overlay), SVG (vector), PDF (graph layout)
|
|
56
|
+
- ✅ **Gene & Metabolite Data** — Overlay expression and abundance data
|
|
57
|
+
- ✅ **Multi-Condition** — Visualize multiple experiments side-by-side
|
|
58
|
+
- ✅ **ID Conversion** — Automatic mapping: Entrez ↔ Symbol ↔ UniProt ↔ Ensembl
|
|
59
|
+
- ✅ **Highlighting** — Post-hoc emphasis of specific nodes/edges/paths
|
|
60
|
+
- ✅ **Spline Curves** — Smooth Bezier edge routing
|
|
61
|
+
- ✅ **Custom Colors** — Configurable diverging color scales
|
|
62
|
+
|
|
63
|
+
### New in v2.0
|
|
64
|
+
- 🆕 **Full SBGN-ML support** — Parse and render SBGN Process Description files
|
|
65
|
+
- 🆕 **Database integration** — Direct download from Reactome, MetaCyc
|
|
66
|
+
- 🆕 **SVG vector output** — Scalable graphics for web and publication
|
|
67
|
+
- 🆕 **Highlighting system** — ggplot2-style composable modifications
|
|
68
|
+
- 🆕 **Spline rendering** — Cubic Bezier and Catmull-Rom curves
|
|
69
|
+
|
|
70
|
+
---
|
|
71
|
+
|
|
72
|
+
## 📦 Installation
|
|
73
|
+
|
|
74
|
+
### Quick install
|
|
75
|
+
|
|
76
|
+
```bash
|
|
77
|
+
pip install pathview-plus
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
### Custom install
|
|
81
|
+
|
|
82
|
+
```bash
|
|
83
|
+
# Clone repository
|
|
84
|
+
git clone https://github.com/raw-lab/pathview-plus
|
|
85
|
+
cd pathview-plus
|
|
86
|
+
|
|
87
|
+
# Install dependencies
|
|
88
|
+
pip install -r requirements.txt
|
|
89
|
+
pip install .
|
|
90
|
+
|
|
91
|
+
# Or install specific packages
|
|
92
|
+
pip install polars numpy matplotlib seaborn Pillow networkx requests
|
|
93
|
+
```
|
|
94
|
+
|
|
95
|
+
**Dependencies:**
|
|
96
|
+
- Python ≥ 3.10
|
|
97
|
+
- polars ≥ 0.19.0
|
|
98
|
+
- matplotlib ≥ 3.7.0
|
|
99
|
+
- seaborn ≥ 0.12.0
|
|
100
|
+
- numpy ≥ 1.24.0
|
|
101
|
+
- Pillow ≥ 10.0.0
|
|
102
|
+
- networkx ≥ 3.1
|
|
103
|
+
- requests ≥ 2.31.0
|
|
104
|
+
|
|
105
|
+
---
|
|
106
|
+
|
|
107
|
+
## 🚀 Quick Start
|
|
108
|
+
|
|
109
|
+
### 1. Basic KEGG Pathway
|
|
110
|
+
|
|
111
|
+
```python
|
|
112
|
+
import polars as pl
|
|
113
|
+
from pathview import pathview
|
|
114
|
+
|
|
115
|
+
# Load your data
|
|
116
|
+
gene_data = pl.read_csv("gene_expr.tsv", separator="\t")
|
|
117
|
+
|
|
118
|
+
# Visualize on KEGG pathway
|
|
119
|
+
result = pathview(
|
|
120
|
+
pathway_id="04110", # Cell cycle
|
|
121
|
+
gene_data=gene_data,
|
|
122
|
+
species="hsa",
|
|
123
|
+
output_format="png"
|
|
124
|
+
)
|
|
125
|
+
```
|
|
126
|
+
|
|
127
|
+
### 2. Reactome SBGN Pathway
|
|
128
|
+
|
|
129
|
+
```python
|
|
130
|
+
from pathview import download_reactome, parse_sbgn, sbgn_to_df, pathview
|
|
131
|
+
|
|
132
|
+
# Download Reactome pathway
|
|
133
|
+
path = download_reactome("R-HSA-109582") # Hemostasis
|
|
134
|
+
|
|
135
|
+
# Parse and visualize
|
|
136
|
+
pathway = parse_sbgn(path)
|
|
137
|
+
node_df = sbgn_to_df(pathway)
|
|
138
|
+
|
|
139
|
+
# Overlay data
|
|
140
|
+
result = pathview(
|
|
141
|
+
pathway_id="R-HSA-109582",
|
|
142
|
+
gene_data=gene_data,
|
|
143
|
+
output_format="svg" # Vector graphics
|
|
144
|
+
)
|
|
145
|
+
```
|
|
146
|
+
|
|
147
|
+
### 3. Multi-Condition Comparison
|
|
148
|
+
|
|
149
|
+
```python
|
|
150
|
+
# Three experimental conditions
|
|
151
|
+
gene_data = pl.DataFrame({
|
|
152
|
+
"entrez": ["1956", "2099", "5594", "207"],
|
|
153
|
+
"Control": [0.5, -0.3, 1.2, -0.8],
|
|
154
|
+
"Treatment_A": [2.1, -1.5, 0.4, 1.3],
|
|
155
|
+
"Treatment_B": [1.8, -0.9, 2.3, 0.7],
|
|
156
|
+
})
|
|
157
|
+
|
|
158
|
+
result = pathview(
|
|
159
|
+
pathway_id="04010", # MAPK signaling
|
|
160
|
+
gene_data=gene_data,
|
|
161
|
+
species="hsa",
|
|
162
|
+
limit={"gene": 2.5, "cpd": 1.5},
|
|
163
|
+
)
|
|
164
|
+
# Each node shows 3 color bands (one per condition)
|
|
165
|
+
```
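The "one color band per condition" layout can be pictured with a small geometry sketch. This is an illustration only: `condition_bands` is a hypothetical helper (not part of the package API), and it assumes the bands are equal in width and laid out left to right inside the node rectangle.

```python
def condition_bands(x, y, width, height, n_conditions):
    """Split a node rectangle into equal-width bands, left to right.

    Returns one (x0, y0, x1, y1) box per condition.
    """
    if n_conditions < 1:
        raise ValueError("n_conditions must be at least 1")
    band_width = width / n_conditions
    return [
        (x + i * band_width, y, x + (i + 1) * band_width, y + height)
        for i in range(n_conditions)
    ]


# Hypothetical node geometry; three conditions as in the example above.
bands = condition_bands(x=100, y=50, width=46, height=17, n_conditions=3)
for name, box in zip(["Control", "Treatment_A", "Treatment_B"], bands):
    print(name, box)
```

Each band would then be filled with the color computed for that condition's value.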
|
|
166
|
+
|
|
167
|
+
### 4. Custom Color Schemes
|
|
168
|
+
|
|
169
|
+
```python
|
|
170
|
+
result = pathview(
|
|
171
|
+
pathway_id="04151",
|
|
172
|
+
gene_data=gene_data,
|
|
173
|
+
species="hsa",
|
|
174
|
+
low={"gene": "#2166AC", "cpd": "#4575B4"}, # Blue
|
|
175
|
+
mid={"gene": "#F7F7F7", "cpd": "#F7F7F7"}, # White
|
|
176
|
+
high={"gene": "#D6604D", "cpd": "#B2182B"}, # Red
|
|
177
|
+
)
|
|
178
|
+
```
|
|
179
|
+
|
|
180
|
+
---
|
|
181
|
+
|
|
182
|
+
## 📖 Complete Examples
|
|
183
|
+
|
|
184
|
+
### Example 1: Gene Symbol IDs
|
|
185
|
+
|
|
186
|
+
```python
|
|
187
|
+
gene_data = pl.DataFrame({
|
|
188
|
+
"symbol": ["TP53", "EGFR", "KRAS", "PIK3CA", "AKT1"],
|
|
189
|
+
"log2fc": [-1.8, 2.4, 1.1, 1.5, 0.9],
|
|
190
|
+
})
|
|
191
|
+
|
|
192
|
+
result = pathview(
|
|
193
|
+
pathway_id="04151",
|
|
194
|
+
gene_data=gene_data,
|
|
195
|
+
species="hsa",
|
|
196
|
+
gene_idtype="SYMBOL", # Automatic conversion to Entrez
|
|
197
|
+
)
|
|
198
|
+
```
|
|
199
|
+
|
|
200
|
+
### Example 2: Combined Gene + Metabolite
|
|
201
|
+
|
|
202
|
+
```python
|
|
203
|
+
from pathview import sim_mol_data
|
|
204
|
+
|
|
205
|
+
gene_data = sim_mol_data(mol_type="gene", species="hsa", n_mol=80)
|
|
206
|
+
cpd_data = sim_mol_data(mol_type="cpd", n_mol=30)
|
|
207
|
+
|
|
208
|
+
result = pathview(
|
|
209
|
+
pathway_id="00010", # Glycolysis
|
|
210
|
+
gene_data=gene_data,
|
|
211
|
+
cpd_data=cpd_data,
|
|
212
|
+
species="hsa",
|
|
213
|
+
low={"gene": "green", "cpd": "blue"},
|
|
214
|
+
high={"gene": "red", "cpd": "yellow"},
|
|
215
|
+
)
|
|
216
|
+
```
|
|
217
|
+
|
|
218
|
+
### Example 3: SVG Vector Output
|
|
219
|
+
|
|
220
|
+
```python
|
|
221
|
+
result = pathview(
|
|
222
|
+
pathway_id="04110",
|
|
223
|
+
gene_data=gene_data,
|
|
224
|
+
species="hsa",
|
|
225
|
+
output_format="svg", # Scalable vector graphics
|
|
226
|
+
)
|
|
227
|
+
# Output: hsa04110.pathview.svg
|
|
228
|
+
# - Scalable without quality loss
|
|
229
|
+
# - Smaller file size
|
|
230
|
+
# - Editable in Inkscape/Illustrator
|
|
231
|
+
```
|
|
232
|
+
|
|
233
|
+
### Example 4: Graph Layout (No PNG Background)
|
|
234
|
+
|
|
235
|
+
```python
|
|
236
|
+
result = pathview(
|
|
237
|
+
pathway_id="04010",
|
|
238
|
+
gene_data=gene_data,
|
|
239
|
+
species="hsa",
|
|
240
|
+
kegg_native=False, # Use NetworkX layout
|
|
241
|
+
output_format="pdf",
|
|
242
|
+
)
|
|
243
|
+
# Output: hsa04010.pathview.pdf
|
|
244
|
+
```
|
|
245
|
+
|
|
246
|
+
### Example 5: Highlighting (API Preview)
|
|
247
|
+
|
|
248
|
+
```python
|
|
249
|
+
from pathview import highlight_nodes, highlight_path
|
|
250
|
+
|
|
251
|
+
result = pathview("04010", gene_data=gene_data)
|
|
252
|
+
|
|
253
|
+
# Composable modifications (ggplot2-style)
|
|
254
|
+
highlighted = (result
|
|
255
|
+
+ highlight_nodes(["1956", "2099"], color="red", width=4)
|
|
256
|
+
+ highlight_path(["1956", "2099", "5594"], color="orange"))
|
|
257
|
+
|
|
258
|
+
highlighted.save("highlighted.png")
|
|
259
|
+
```
|
|
260
|
+
|
|
261
|
+
### Example 6: Spline Curves
|
|
262
|
+
|
|
263
|
+
```python
|
|
264
|
+
from pathview import cubic_bezier, catmull_rom_spline
|
|
265
|
+
import matplotlib.pyplot as plt
|
|
266
|
+
|
|
267
|
+
# Smooth Bezier curve
|
|
268
|
+
curve = cubic_bezier((0,0), (1,2), (3,2), (4,0), n_points=100)
|
|
269
|
+
|
|
270
|
+
plt.plot(curve[:, 0], curve[:, 1], linewidth=2)
|
|
271
|
+
plt.title("Bezier Curve Edge Routing")
|
|
272
|
+
plt.savefig("bezier_example.png")
|
|
273
|
+
```
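For readers who want to see the math behind a cubic Bezier curve, here is a minimal pure-Python sketch using the Bernstein polynomial form. `cubic_bezier_points` is a hypothetical stand-in written for this illustration; it is not the package's `cubic_bezier`, and it returns a list of tuples rather than a NumPy array.

```python
def cubic_bezier_points(p0, p1, p2, p3, n_points=50):
    """Sample a cubic Bezier curve at n_points evenly spaced t values in [0, 1]."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    points = []
    for i in range(n_points):
        t = i / (n_points - 1)
        u = 1.0 - t
        # Bernstein basis polynomials of degree 3
        b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
        x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
        y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
        points.append((x, y))
    return points


# Same control points as the example above, sampled coarsely.
curve = cubic_bezier_points((0, 0), (1, 2), (3, 2), (4, 0), n_points=5)
print(curve)
```

The curve starts at `p0`, ends at `p3`, and is pulled toward `p1` and `p2` without passing through them.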
|
|
274
|
+
|
|
275
|
+
### Example 7: Batch Processing
|
|
276
|
+
|
|
277
|
+
```python
|
|
278
|
+
pathways = ["04110", "04010", "04151", "00010"]
|
|
279
|
+
|
|
280
|
+
for pw_id in pathways:
|
|
281
|
+
try:
|
|
282
|
+
result = pathview(
|
|
283
|
+
pathway_id=pw_id,
|
|
284
|
+
gene_data=gene_data,
|
|
285
|
+
species="hsa",
|
|
286
|
+
out_suffix=f"batch_{pw_id}",
|
|
287
|
+
)
|
|
288
|
+
print(f"✓ Completed {pw_id}")
|
|
289
|
+
except Exception as e:
|
|
290
|
+
print(f"✗ Failed {pw_id}: {e}")
|
|
291
|
+
```
|
|
292
|
+
|
|
293
|
+
---
|
|
294
|
+
|
|
295
|
+
## 🖥️ Command Line Interface
|
|
296
|
+
|
|
297
|
+
```bash
|
|
298
|
+
# Basic usage
|
|
299
|
+
python pathview_cli.py --pathway-id 04110 --gene-data expr.tsv
|
|
300
|
+
|
|
301
|
+
# Specify species and ID type
|
|
302
|
+
python pathview_cli.py \
|
|
303
|
+
--pathway-id 04110 \
|
|
304
|
+
--species hsa \
|
|
305
|
+
--gene-data expr.tsv \
|
|
306
|
+
--gene-idtype SYMBOL
|
|
307
|
+
|
|
308
|
+
# Custom colors
|
|
309
|
+
python pathview_cli.py \
|
|
310
|
+
--pathway-id 04010 \
|
|
311
|
+
--gene-data expr.tsv \
|
|
312
|
+
--low-gene '#2166AC' \
|
|
313
|
+
--high-gene '#D6604D' \
|
|
314
|
+
--output-format svg
|
|
315
|
+
|
|
316
|
+
# Simulate data (for testing)
|
|
317
|
+
python pathview_cli.py \
|
|
318
|
+
--pathway-id 04110 \
|
|
319
|
+
--simulate \
|
|
320
|
+
--n-sim 200
|
|
321
|
+
|
|
322
|
+
# Display KEGG legend
|
|
323
|
+
python pathview_cli.py --legend
|
|
324
|
+
```
|
|
325
|
+
|
|
326
|
+
**CLI Arguments:**
|
|
327
|
+
|
|
328
|
+
```
|
|
329
|
+
Pathway:
|
|
330
|
+
--pathway-id ID KEGG pathway number (e.g., '04110')
|
|
331
|
+
|
|
332
|
+
Input data:
|
|
333
|
+
--gene-data TSV Gene expression file (TSV)
|
|
334
|
+
--cpd-data TSV Compound abundance file (TSV)
|
|
335
|
+
--gene-idtype TYPE Gene ID type: ENTREZ, SYMBOL, UNIPROT, ENSEMBL
|
|
336
|
+
--cpd-idtype TYPE Compound ID type: KEGG, PUBCHEM, CHEBI
|
|
337
|
+
|
|
338
|
+
Species & paths:
|
|
339
|
+
--species CODE KEGG species code (default: hsa)
|
|
340
|
+
--kegg-dir DIR Directory for files (default: .)
|
|
341
|
+
--out-suffix SUFFIX Output filename suffix (default: pathview)
|
|
342
|
+
|
|
343
|
+
Rendering:
|
|
344
|
+
--kegg-native Use KEGG PNG background (default: True)
|
|
345
|
+
--output-format FORMAT Output format: png, pdf, svg (default: png)
|
|
346
|
+
--map-symbol Replace Entrez with symbols (default: True)
|
|
347
|
+
--node-sum METHOD Aggregation: sum, mean, median, max
|
|
348
|
+
--no-signature Suppress watermark
|
|
349
|
+
--no-col-key Suppress color legend
|
|
350
|
+
|
|
351
|
+
Color scale:
|
|
352
|
+
--limit-gene FLOAT Color scale limit (default: 1.0)
|
|
353
|
+
--bins-gene INT Color bins (default: 10)
|
|
354
|
+
--low-gene COLOR Low-end color (default: green)
|
|
355
|
+
--mid-gene COLOR Mid-point color (default: gray)
|
|
356
|
+
--high-gene COLOR High-end color (default: red)
|
|
357
|
+
--low-cpd COLOR Low compound color (default: blue)
|
|
358
|
+
--high-cpd COLOR High compound color (default: yellow)
|
|
359
|
+
|
|
360
|
+
Utilities:
|
|
361
|
+
--legend Display KEGG legend and exit
|
|
362
|
+
--simulate Generate simulated data
|
|
363
|
+
--n-sim INT Number of simulated molecules (default: 200)
|
|
364
|
+
```
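The `--node-sum` option names four aggregation methods. When several genes map to one pathway node, their values have to be collapsed into a single number. The sketch below shows one way such a step could look. `summarize_node` and `AGGREGATORS` are hypothetical names for this illustration, and the package's own `mol_sum` may behave differently.

```python
import statistics

# Method names mirror the --node-sum choices listed above.
AGGREGATORS = {
    "sum": sum,
    "mean": statistics.mean,
    "median": statistics.median,
    "max": max,
}


def summarize_node(values, method="sum"):
    """Collapse the values of all molecules mapped to one node into one number."""
    if method not in AGGREGATORS:
        raise ValueError(f"unknown method: {method!r}")
    if not values:
        return None  # nothing mapped to this node
    return AGGREGATORS[method](values)


# Three genes mapped to the same node
node_values = [1.0, -0.5, 2.0]
for method in AGGREGATORS:
    print(method, summarize_node(node_values, method))
```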
|
|
365
|
+
|
|
366
|
+
---
|
|
367
|
+
|
|
368
|
+
## 📊 Input File Formats
|
|
369
|
+
|
|
370
|
+
### Gene Data (TSV)
|
|
371
|
+
|
|
372
|
+
The first column holds gene IDs; the remaining columns hold numeric expression values.
|
|
373
|
+
|
|
374
|
+
```tsv
|
|
375
|
+
entrez Control Treatment_A Treatment_B
|
|
376
|
+
1956 2.31 0.45 1.82
|
|
377
|
+
2099 -1.14 -0.88 0.33
|
|
378
|
+
5594 0.72 1.33 -0.51
|
|
379
|
+
207 -0.88 1.21 0.94
|
|
380
|
+
```
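To make the layout concrete, here is a small standard-library sketch that reads a file of this shape. `read_expression_tsv` is a hypothetical helper for illustration only; the examples elsewhere in this README load such files with `polars` instead. Gene IDs are kept as strings so that identifiers are not altered by numeric parsing.

```python
import csv
import io


def read_expression_tsv(handle):
    """Read a TSV whose first column is a gene ID and whose other columns are numeric.

    Returns (condition_names, {gene_id: [values...]}).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    conditions = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        table[row[0]] = [float(v) for v in row[1:]]
    return conditions, table


sample = (
    "entrez\tControl\tTreatment_A\tTreatment_B\n"
    "1956\t2.31\t0.45\t1.82\n"
    "2099\t-1.14\t-0.88\t0.33\n"
)
conditions, table = read_expression_tsv(io.StringIO(sample))
print(conditions)
print(table["1956"])
```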
|
|
381
|
+
|
|
382
|
+
### Gene Symbols
|
|
383
|
+
|
|
384
|
+
```tsv
|
|
385
|
+
gene_symbol log2fc p_value
|
|
386
|
+
TP53 -1.8 0.001
|
|
387
|
+
EGFR 2.4 0.0001
|
|
388
|
+
KRAS 1.1 0.01
|
|
389
|
+
```
|
|
390
|
+
|
|
391
|
+
### Compound Data (TSV)
|
|
392
|
+
|
|
393
|
+
```tsv
|
|
394
|
+
kegg abundance
|
|
395
|
+
C00031 1.45
|
|
396
|
+
C00118 -0.83
|
|
397
|
+
C00022 2.11
|
|
398
|
+
```
|
|
399
|
+
|
|
400
|
+
---
|
|
401
|
+
|
|
402
|
+
## 🎨 Color Scale Configuration
|
|
403
|
+
|
|
404
|
+
### Three-Point Diverging Scale
|
|
405
|
+
|
|
406
|
+
```python
|
|
407
|
+
pathview(
|
|
408
|
+
pathway_id="04110",
|
|
409
|
+
gene_data=data,
|
|
410
|
+
limit={"gene": 2.0, "cpd": 1.5}, # ±2.0 for genes, ±1.5 for compounds
|
|
411
|
+
bins={"gene": 20, "cpd": 10}, # Color resolution
|
|
412
|
+
low={"gene": "blue", "cpd": "green"},
|
|
413
|
+
mid={"gene": "white", "cpd": "gray"},
|
|
414
|
+
high={"gene": "red", "cpd": "yellow"},
|
|
415
|
+
)
|
|
416
|
+
```
|
|
417
|
+
|
|
418
|
+
The scale maps:
|
|
419
|
+
- `-limit` → `low` color (default: green for genes, blue for compounds)
|
|
420
|
+
- `0` → `mid` color (default: gray for both genes and compounds)
|
|
421
|
+
- `+limit` → `high` color (default: red for genes, yellow for compounds)
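A three-point diverging scale can be sketched in a few lines of plain Python. This is not the package's implementation: `diverging_color` is a hypothetical function, the hex defaults are illustrative stand-ins for the named colors, values beyond the limit are assumed to be clamped, and the `bins` quantization is left out so the interpolation is continuous.

```python
def hex_to_rgb(color):
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def lerp_rgb(a, b, t):
    """Linearly interpolate between two RGB tuples, t in [0, 1]."""
    return tuple(round(a[k] + (b[k] - a[k]) * t) for k in range(3))


def diverging_color(value, limit=1.0, low="#00FF00", mid="#808080", high="#FF0000"):
    """Map a value onto a low/mid/high scale centered at zero."""
    clamped = max(-limit, min(limit, value))  # assumption: out-of-range values saturate
    t = clamped / limit  # now in [-1, 1]
    if t < 0:
        rgb = lerp_rgb(hex_to_rgb(mid), hex_to_rgb(low), -t)
    else:
        rgb = lerp_rgb(hex_to_rgb(mid), hex_to_rgb(high), t)
    return "#{:02X}{:02X}{:02X}".format(*rgb)


for v in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(v, diverging_color(v, limit=1.0))
```

Negative values blend from the midpoint toward the low color, and positive values blend toward the high color.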
|
|
422
|
+
|
|
423
|
+
### One-Directional Scale
|
|
424
|
+
|
|
425
|
+
```python
|
|
426
|
+
both_dirs={"gene": False, "cpd": False}
|
|
427
|
+
# Maps: 0 (mid) → max (high)
|
|
428
|
+
```
|
|
429
|
+
|
|
430
|
+
---
|
|
431
|
+
|
|
432
|
+
## 🗂️ Supported ID Types
|
|
433
|
+
|
|
434
|
+
### Gene IDs
|
|
435
|
+
|
|
436
|
+
| Type | Value | Example |
|
|
437
|
+
|------|-------|---------|
|
|
438
|
+
| Entrez | `ENTREZ` | `1956` |
|
|
439
|
+
| Symbol | `SYMBOL` | `EGFR` |
|
|
440
|
+
| UniProt | `UNIPROT` | `P00533` |
|
|
441
|
+
| Ensembl | `ENSEMBL` | `ENSG00000146648` |
|
|
442
|
+
| KEGG | `KEGG` | `hsa:1956` |
|
|
443
|
+
|
|
444
|
+
### Compound IDs
|
|
445
|
+
|
|
446
|
+
| Type | Value | Example |
|
|
447
|
+
|------|-------|---------|
|
|
448
|
+
| KEGG | `KEGG` | `C00031` |
|
|
449
|
+
| PubChem | `PUBCHEM` | `5793` |
|
|
450
|
+
| ChEBI | `CHEBI` | `4167` |
|
|
451
|
+
|
|
452
|
+
---
|
|
453
|
+
|
|
454
|
+
## 🧬 Supported Databases
|
|
455
|
+
|
|
456
|
+
### KEGG
|
|
457
|
+
- **Format:** KGML (XML)
|
|
458
|
+
- **Species:** 500+ organisms
|
|
459
|
+
- **Download:** Automatic via KEGG REST API
|
|
460
|
+
- **Example:** `pathway_id="hsa04110"`
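As a rough illustration of what "automatic download via the KEGG REST API" involves, the sketch below builds the two request URLs for a pathway: one for the KGML file and one for the PNG image. The `/get/<id>/kgml` and `/get/<id>/image` endpoints are part of the public KEGG REST API. `kegg_urls` itself is a hypothetical helper, and this README does not show how the package builds its requests.

```python
KEGG_REST = "https://rest.kegg.jp"


def kegg_urls(pathway_id, species="hsa"):
    """Return the KGML and image URLs for a KEGG pathway.

    Accepts either a bare number ('04110') or a prefixed ID ('hsa04110').
    """
    # A bare number gets the species prefix; a prefixed ID is used as given.
    full_id = f"{species}{pathway_id}" if pathway_id.isdigit() else pathway_id
    return {
        "kgml": f"{KEGG_REST}/get/{full_id}/kgml",
        "image": f"{KEGG_REST}/get/{full_id}/image",
    }


urls = kegg_urls("04110", species="hsa")
print(urls["kgml"])
print(urls["image"])
```

No network request is made here. The URLs could be passed to `requests.get` and the responses cached on disk.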
|
|
461
|
+
|
|
462
|
+
### Reactome
|
|
463
|
+
- **Format:** SBGN-ML
|
|
464
|
+
- **Species:** Human, mouse, rat, and more
|
|
465
|
+
- **Download:** `download_reactome("R-HSA-109582")`
|
|
466
|
+
- **Example:** Hemostasis, Immune System, Signaling
|
|
467
|
+
|
|
468
|
+
### MetaCyc
|
|
469
|
+
- **Format:** SBGN-ML
|
|
470
|
+
- **Coverage:** 2,800+ metabolic pathways
|
|
471
|
+
- **Download:** `download_metacyc("PWY-7210")`
|
|
472
|
+
- **Example:** Pyrimidine biosynthesis
|
|
473
|
+
|
|
474
|
+
### PANTHER
|
|
475
|
+
- **Format:** SBGN-ML
|
|
476
|
+
- **Coverage:** 177 signaling and metabolic pathways
|
|
477
|
+
- **Note:** Manual download required
|
|
478
|
+
|
|
479
|
+
### SMPDB
|
|
480
|
+
- **Format:** SBGN-ML
|
|
481
|
+
- **Coverage:** Small molecule pathways
|
|
482
|
+
- **Note:** Manual download from website
|
|
483
|
+
|
|
484
|
+
---
|
|
485
|
+
|
|
486
|
+
## 🏗️ Architecture
|
|
487
|
+
|
|
488
|
+
```
|
|
489
|
+
pathview/
|
|
490
|
+
├── __init__.py # Public API exports
|
|
491
|
+
├── constants.py # Type definitions
|
|
492
|
+
├── utils.py # String/numeric utilities
|
|
493
|
+
│
|
|
494
|
+
├── id_mapping.py # Gene/compound ID conversion
|
|
495
|
+
├── mol_data.py # Data aggregation, simulation
|
|
496
|
+
│
|
|
497
|
+
├── kegg_api.py # KEGG REST API
|
|
498
|
+
├── databases.py # Reactome, MetaCyc downloaders
|
|
499
|
+
│
|
|
500
|
+
├── kgml_parser.py # KEGG KGML (XML) parser
|
|
501
|
+
├── sbgn_parser.py # SBGN-ML (XML) parser
|
|
502
|
+
│
|
|
503
|
+
├── color_mapping.py # Colormaps, node coloring
|
|
504
|
+
├── node_mapping.py # Map data onto nodes
|
|
505
|
+
│
|
|
506
|
+
├── rendering.py # PNG/PDF renderers
|
|
507
|
+
├── svg_rendering.py # SVG vector renderer
|
|
508
|
+
├── highlighting.py # Post-hoc modifications
|
|
509
|
+
├── splines.py # Bezier curve math
|
|
510
|
+
│
|
|
511
|
+
└── pathview.py # Core orchestrator
|
|
512
|
+
|
|
513
|
+
pathview_cli.py # Command-line interface
|
|
514
|
+
requirements.txt # Dependencies
|
|
515
|
+
README.md # This file
|
|
516
|
+
```
|
|
517
|
+
|
|
518
|
+
**Module Statistics:**
|
|
519
|
+
- **15 modules** | **3,506 lines of code**
|
|
520
|
+
- Functional programming style
|
|
521
|
+
- Full type hints
|
|
522
|
+
- Comprehensive docstrings
|
|
523
|
+
|
|
524
|
+
---
|
|
525
|
+
|
|
526
|
+
## 🔧 API Reference
|
|
527
|
+
|
|
528
|
+
### Core Function
|
|
529
|
+
|
|
530
|
+
```python
|
|
531
|
+
pathview(
|
|
532
|
+
pathway_id: str,
|
|
533
|
+
gene_data: Optional[pl.DataFrame] = None,
|
|
534
|
+
cpd_data: Optional[pl.DataFrame] = None,
|
|
535
|
+
species: str = "hsa",
|
|
536
|
+
kegg_dir: Path = ".",
|
|
537
|
+
kegg_native: bool = True,
|
|
538
|
+
output_format: str = "png", # "png", "pdf", "svg"
|
|
539
|
+
gene_idtype: str = "ENTREZ",
|
|
540
|
+
cpd_idtype: str = "KEGG",
|
|
541
|
+
out_suffix: str = "pathview",
|
|
542
|
+
node_sum: str = "sum",
|
|
543
|
+
map_symbol: bool = True,
|
|
544
|
+
map_null: bool = True,
|
|
545
|
+
min_nnodes: int = 3,
|
|
546
|
+
new_signature: bool = True,
|
|
547
|
+
plot_col_key: bool = True,
|
|
548
|
+
# Color scale parameters
|
|
549
|
+
limit: dict = {"gene": 1.0, "cpd": 1.0},
|
|
550
|
+
bins: dict = {"gene": 10, "cpd": 10},
|
|
551
|
+
both_dirs: dict = {"gene": True, "cpd": True},
|
|
552
|
+
low: dict = {"gene": "green", "cpd": "blue"},
|
|
553
|
+
mid: dict = {"gene": "gray", "cpd": "gray"},
|
|
554
|
+
high: dict = {"gene": "red", "cpd": "yellow"},
|
|
555
|
+
na_col: str = "transparent",
|
|
556
|
+
) -> dict
|
|
557
|
+
```
|
|
558
|
+
|
|
559
|
+
### Data Functions
|
|
560
|
+
|
|
561
|
+
```python
|
|
562
|
+
sim_mol_data(mol_type="gene", species="hsa", n_mol=100, n_exp=1) → pl.DataFrame
|
|
563
|
+
mol_sum(mol_data, id_map, sum_method="sum") → pl.DataFrame
|
|
564
|
+
```
|
|
565
|
+
|
|
566
|
+
### ID Mapping
|
|
567
|
+
|
|
568
|
+
```python
|
|
569
|
+
id2eg(ids, category, org="Hs") → pl.DataFrame
|
|
570
|
+
eg2id(eg_ids, category="SYMBOL", org="Hs") → pl.DataFrame
|
|
571
|
+
cpd_id_map(in_ids, in_type, out_type="KEGG") → pl.DataFrame
|
|
572
|
+
```
|
|
573
|
+
|
|
574
|
+
### Parsing
|
|
575
|
+
|
|
576
|
+
```python
|
|
577
|
+
# KEGG
|
|
578
|
+
parse_kgml(filepath) → KGMLPathway
|
|
579
|
+
node_info(pathway) → pl.DataFrame
|
|
580
|
+
|
|
581
|
+
# SBGN
|
|
582
|
+
parse_sbgn(filepath) → SBGNPathway
|
|
583
|
+
sbgn_to_df(pathway) → pl.DataFrame
|
|
584
|
+
```
|
|
585
|
+
|
|
586
|
+
### Database Downloads
|
|
587
|
+
|
|
588
|
+
```python
|
|
589
|
+
download_kegg(pathway_id, species="hsa", kegg_dir=".") → dict
|
|
590
|
+
download_reactome(pathway_id, output_dir=".") → Path
|
|
591
|
+
download_metacyc(pathway_id, output_dir=".") → Path
|
|
592
|
+
list_reactome_pathways(species="Homo sapiens") → list[dict]
|
|
593
|
+
detect_database(pathway_id) → str
|
|
594
|
+
```
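The identifier styles used in this README differ enough that the source database can often be guessed from the ID alone. The sketch below shows one possible approach using regular expressions. `guess_database` and its patterns are assumptions made for illustration, not the package's `detect_database`. Real MetaCyc identifiers in particular are more varied than the `PWY...-<number>` shape matched here.

```python
import re

# Patterns are checked in order; the first match wins.
PATTERNS = [
    ("reactome", re.compile(r"^R-[A-Z]{3}-\d+$")),       # e.g. R-HSA-109582
    ("kegg", re.compile(r"^(?:[a-z]{2,4})?\d{5}$")),     # e.g. hsa04110 or 04110
    ("metacyc", re.compile(r"^PWY[A-Z0-9]*-\d+$")),      # e.g. PWY-7210
]


def guess_database(pathway_id):
    """Guess the source database from the shape of a pathway identifier."""
    for name, pattern in PATTERNS:
        if pattern.match(pathway_id):
            return name
    return "unknown"


for pw in ("R-HSA-109582", "hsa04110", "04110", "PWY-7210", "not-a-pathway"):
    print(pw, "->", guess_database(pw))
```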
|
|
595
|
+
|
|
596
|
+
### Highlighting
|
|
597
|
+
|
|
598
|
+
```python
|
|
599
|
+
# API design (full implementation in progress)
|
|
600
|
+
result = pathview(...)
|
|
601
|
+
highlighted = result + highlight_nodes(["1956", "2099"], color="red")
|
|
602
|
+
highlighted.save("output.png")
|
|
603
|
+
```
|
|
604
|
+
|
|
605
|
+
### Splines
|
|
606
|
+
|
|
607
|
+
```python
|
|
608
|
+
cubic_bezier(p0, p1, p2, p3, n_points=50) → np.ndarray
|
|
609
|
+
quadratic_bezier(p0, p1, p2, n_points=50) → np.ndarray
|
|
610
|
+
catmull_rom_spline(points, n_points=50, alpha=0.5) → np.ndarray
|
|
611
|
+
route_edge_spline(source, target, obstacles, mode="orthogonal") → np.ndarray
|
|
612
|
+
bezier_to_svg_path(curve, close=False) → str
|
|
613
|
+
```
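To show how a sampled curve can end up in an SVG file, here is a minimal sketch that turns a list of points into SVG path data using the standard `M` (move to), `L` (line to), and `Z` (close path) commands. `points_to_svg_path` is a hypothetical helper written for this illustration, and the package's `bezier_to_svg_path` may produce different output.

```python
def points_to_svg_path(points, close=False):
    """Build an SVG path 'd' attribute from a sequence of (x, y) points."""
    if not points:
        return ""
    first_x, first_y = points[0]
    commands = [f"M {first_x:g} {first_y:g}"]
    commands += [f"L {x:g} {y:g}" for x, y in points[1:]]
    if close:
        commands.append("Z")
    return " ".join(commands)


path_d = points_to_svg_path([(0, 0), (1, 1.5), (4, 0)])
print(f'<path d="{path_d}" fill="none" stroke="black"/>')
```

With enough sample points, a polyline like this is visually indistinguishable from the underlying smooth curve.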
|
|
614
|
+
|
|
615
|
+
---
|
|
616
|
+
|
|
617
|
+
## 📈 Performance
|
|
618
|
+
|
|
619
|
+
- **KEGG pathways:** ~2-5 seconds (download + render)
|
|
620
|
+
- **SBGN pathways:** ~3-8 seconds (more complex)
|
|
621
|
+
- **Multi-condition:** Render time scales linearly with the number of conditions
|
|
622
|
+
- **Batch processing:** Pathways are independent of one another, so batch jobs can be parallelized by the caller
|
|
623
|
+
|
|
624
|
+
**Optimization tips:**
|
|
625
|
+
- Cache downloaded files (automatic)
|
|
626
|
+
- Use `output_format="svg"` for faster rendering
|
|
627
|
+
- Disable color key for batch jobs: `plot_col_key=False`
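Batch jobs can be parallelized from the calling code. The sketch below shows the general pattern with `concurrent.futures`. `run_batch` is a hypothetical wrapper, and `fake_render` is a stand-in for a real call such as `pathview(pathway_id=..., ...)`. Whether `pathview()` is safe to call from several threads is not stated in this README. Matplotlib's `pyplot` interface is not thread-safe, so a process pool may be the safer choice in practice.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batch(pathway_ids, render_one, max_workers=4):
    """Run render_one(pathway_id) for each ID in parallel.

    Returns (results, failures): two dicts keyed by pathway ID.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(render_one, pw): pw for pw in pathway_ids}
        for future in as_completed(futures):
            pw = futures[future]
            try:
                results[pw] = future.result()
            except Exception as exc:  # keep going if one pathway fails
                failures[pw] = str(exc)
    return results, failures


def fake_render(pw_id):
    """Stand-in worker; a real job would call pathview() here."""
    if pw_id == "99999":
        raise ValueError("unknown pathway")
    return f"hsa{pw_id}.pathview.png"


results, failures = run_batch(["04110", "04010", "99999"], fake_render)
print(sorted(results))
print(failures)
```

Collecting failures separately mirrors the try/except loop in the batch example: one bad pathway does not stop the rest.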
|
|
628
|
+
|
|
629
|
+
---
|
|
630
|
+
|
|
631
|
+
## 🤝 Contributing
|
|
632
|
+
|
|
633
|
+
Contributions welcome! Areas for improvement:
|
|
634
|
+
|
|
635
|
+
1. **SBGN rendering** — Improve glyph shape variety
|
|
636
|
+
2. **Edge routing** — Implement A* pathfinding for splines
|
|
637
|
+
3. **Database integration** — Add PANTHER, SMPDB auto-download
|
|
638
|
+
4. **Highlighting** — Wire up image modification backend
|
|
639
|
+
5. **Performance** — Parallel pathway processing
|
|
640
|
+
|
|
641
|
+
---
|
|
642
|
+
|
|
643
|
+
## 📄 License
|
|
644
|
+
|
|
645
|
+
GPL v3.0 — See LICENSE file
|
|
646
|
+
|
|
647
|
+
**Citations:**
|
|
648
|
+
- Original R pathview: Luo & Brouwer (2013). Bioinformatics
|
|
649
|
+
- SBGNview: Dong et al. (2022). Bioinformatics
|
|
650
|
+
- KEGG: Kanehisa et al. (2023). Nucleic Acids Research
|
|
651
|
+
- Reactome: Gillespie et al. (2022). Nucleic Acids Research
|
|
652
|
+
|
|
653
|
+
---
|
|
654
|
+
|
|
655
|
+
## 📞 Support
|
|
656
|
+
|
|
657
|
+
- **Issues:** https://github.com/raw-lab/pathview-plus/issues
|
|
658
|
+
- **Email:** rwhit101@charlotte.edu
|
|
659
|
+
---
|
|
660
|
+
|
|
661
|
+
**Made with ❤️ for the pathway visualization community**
|