celljanus 0.1.3__tar.gz

@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Zhaoqing Wang

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,326 @@
Metadata-Version: 2.4
Name: celljanus
Version: 0.1.3
Summary: CellJanus: Dual-Perspective Deconvolution of Host and Microbial Transcriptomes from FASTQ Data
Author: Zhaoqing Wang
License: MIT
Project-URL: Homepage, https://github.com/zhaoqing-wang/CellJanus
Project-URL: Repository, https://github.com/zhaoqing-wang/CellJanus
Project-URL: Changelog, https://github.com/zhaoqing-wang/CellJanus/blob/main/CHANGELOG.md
Project-URL: Bug Tracker, https://github.com/zhaoqing-wang/CellJanus/issues
Keywords: bioinformatics,metagenomics,single-cell,spatial-transcriptomics,microbiome,host-microbe,deconvolution,docker
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.1
Requires-Dist: rich>=13.0
Requires-Dist: pandas>=2.0
Requires-Dist: numpy>=1.24
Requires-Dist: matplotlib>=3.7
Requires-Dist: seaborn>=0.12
Requires-Dist: biopython>=1.81
Requires-Dist: requests>=2.31
Requires-Dist: tqdm>=4.65
Requires-Dist: psutil>=5.9
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: ruff>=0.1; extra == "dev"
Dynamic: license-file

<table>
<tr>
<td>
<h1>CellJanus: Dual-Perspective Deconvolution of Host and Microbial Transcriptomes from FASTQ Data</h1>
<p>
<a href="https://github.com/zhaoqing-wang/CellJanus/releases"><img src="https://img.shields.io/badge/dynamic/toml?url=https%3A%2F%2Fraw.githubusercontent.com%2Fzhaoqing-wang%2FCellJanus%2Fmain%2Fpyproject.toml&label=Version&query=project.version&color=blue&style=flat-square" alt="Version" /></a>
<a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-green.svg" alt="License: MIT" /></a>
<a href="https://www.python.org/"><img src="https://img.shields.io/badge/Python-3.9%2B-blue.svg" alt="Python 3.9+" /></a>
<a href="https://github.com/zhaoqing-wang"><img src="https://img.shields.io/badge/Maintainer-Zhaoqing_Wang-green" alt="GitHub Maintainer" /></a>
</p>
</td>
<td width="200">
<img src="docs/Sticker.png" alt="CellJanus Logo" width="200" />
</td>
</tr>
</table>

## Pipeline

```
FASTQ ─→ fastp (QC) ─→ Bowtie2 (host alignment) ─→ unmapped reads
                          │                              │
                          ▼                              ▼
                  host_aligned.bam               Kraken2 + Bracken ─→ plots (PNG/PDF) + CSV tables
                  (gene expression)              (microbial abundance)
```

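The diagram maps onto a chain of ordinary command-line calls. The sketch below only *builds* one plausible argument list per stage, in the order shown; the flags are the tools' standard options, but the helper `build_stage_commands` and the intermediate file names are hypothetical and need not match what CellJanus actually runs.

```python
from pathlib import Path
from typing import List, Tuple


def build_stage_commands(r1, r2, host_index, kraken2_db, out, threads=4) -> List[Tuple[str, List[str]]]:
    """Return (stage, argv) pairs in pipeline order; nothing is executed.

    Hypothetical helper: standard flags of each tool, illustrative file names.
    """
    out = Path(out)
    t = str(threads)
    qc1, qc2 = out / "01_qc" / "R1_qc.fastq.gz", out / "01_qc" / "R2_qc.fastq.gz"
    sam = out / "02_alignment" / "host_aligned.sam"
    bam = out / "02_alignment" / "host_aligned.sorted.bam"
    un1 = out / "02_alignment" / "unmapped_R1.fastq.gz"
    un2 = out / "02_alignment" / "unmapped_R2.fastq.gz"
    report = out / "04_classification" / "kraken2_report.txt"
    return [
        # QC: trim adapters and low-quality bases from both mates
        ("qc", ["fastp", "-i", str(r1), "-I", str(r2), "-o", str(qc1), "-O", str(qc2), "-w", t]),
        # Host alignment: Bowtie2's SAM output keeps the two mates of a pair adjacent
        ("align", ["bowtie2", "-x", str(host_index), "-1", str(qc1), "-2", str(qc2),
                   "-p", t, "-S", str(sam)]),
        # Coordinate-sorted BAM for the host (gene-expression) side
        ("sort", ["samtools", "sort", "-@", t, "-o", str(bam), str(sam)]),
        # -f 12 = read unmapped (0x4) + mate unmapped (0x8): both mates are non-host
        ("extract", ["samtools", "fastq", "-f", "12", "-1", str(un1), "-2", str(un2), str(sam)]),
        # Classify the non-host pairs against the Kraken2 database
        ("classify", ["kraken2", "--db", str(kraken2_db), "--threads", t, "--paired",
                      "--gzip-compressed", "--confidence", "0.05", "--report", str(report),
                      "--output", str(out / "04_classification" / "kraken2_output.txt"),
                      str(un1), str(un2)]),
        # Re-estimate species-level (-l S) abundance from the Kraken2 report
        ("quantify", ["bracken", "-d", str(kraken2_db), "-i", str(report),
                      "-o", str(out / "04_classification" / "bracken_S.txt"), "-l", "S"]),
    ]


# Each argv could be run with subprocess.run(argv, check=True). Unmapped pairs are
# taken from the unsorted SAM rather than the sorted BAM because `samtools fastq`
# needs mates next to each other to write them to paired R1/R2 files.
```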
## Contents

1. [Installation](#installation)
2. [Quick Start](#quick-start)
3. [Real Data](#real-data)
4. [CLI Reference](#cli-reference)
5. [Python API](#python-api)
6. [Output Structure](#output-structure)
7. [Performance](#performance)
8. [Citation](#citation)

---

## Installation

> Requires **Linux / macOS / WSL2**. Bioconda packages are not available on native Windows.

```bash
# Create a conda environment with the external tools
conda create -n CellJanus -c bioconda -c conda-forge \
  python=3.11 fastp bowtie2 samtools kraken2 bracken

# Install CellJanus (pip installs its Python dependencies)
conda activate CellJanus
git clone https://github.com/zhaoqing-wang/CellJanus.git
cd CellJanus
pip install .

# Verify installation
celljanus check
```

All required tools should show **✔ Found**. STAR is optional (for future RNA-seq alignment support).

<details>
<summary>Docker alternative</summary>

```bash
docker build -t celljanus .
docker run --rm celljanus celljanus check
```
</details>

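`celljanus check` is the supported way to verify the setup. If you need the same check from your own script, a rough stand-in is to resolve each executable on `PATH` with `shutil.which`. This assumes that is approximately what `check` does; `check_tools` is a hypothetical helper, not part of the package.

```python
import shutil
from typing import Dict, List, Optional, Sequence, Tuple

# Executables installed by the conda command above; STAR is optional.
REQUIRED_TOOLS = ("fastp", "bowtie2", "samtools", "kraken2", "bracken")
OPTIONAL_TOOLS = ("STAR",)


def check_tools(
    required: Sequence[str] = REQUIRED_TOOLS,
    optional: Sequence[str] = OPTIONAL_TOOLS,
) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Resolve each tool on PATH. Returns ({tool: path or None}, missing_required)."""
    found = {tool: shutil.which(tool) for tool in (*required, *optional)}
    missing = [tool for tool in required if found[tool] is None]
    return found, missing


# Usage:
# found, missing = check_tools()
# for tool, path in found.items():
#     print("✔ Found" if path else "✘ Missing", tool, path or "")
```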
---

## Quick Start

The repository includes test data and pre-built reference databases, so you can run the full pipeline immediately with **no downloads required**.

```bash
conda activate CellJanus
cd CellJanus

celljanus run \
  --read1 testdata/reads_R1.fastq.gz \
  --read2 testdata/reads_R2.fastq.gz \
  --host-index testdata/refs/host_genome/host \
  --kraken2-db testdata/refs/kraken2_testdb \
  --output-dir test_results \
  --threads 4
```

**Test data**: 1,000 read pairs (600 human, 300 microbial, 100 low-quality).

**Results** (~4 seconds):

| Step | Metric |
|------|--------|
| QC | 1,000 → 900 pairs retained (90%); Q20 rate improved from 88% to 98% |
| Host alignment | 66.39% of QC-passed reads aligned to the host genome |
| Classification | 300 reads classified; 3 species detected |
| Top species | *S. aureus* 38.7%, *K. pneumoniae* 31.3%, *E. coli* 30.0% |
| Output | 8 plot files (4 figures, PNG + PDF), 3 CSV tables, QC reports |

### Example Output

| Pipeline Dashboard |
|:--:|
| ![Pipeline Dashboard](docs/pipeline_dashboard.png) |
| *Summarises QC, alignment and classification metrics in a single view.* |

| Abundance Bar Chart | Abundance Donut Chart | Abundance Heatmap |
|:--:|:--:|:--:|
| ![Bar](docs/abundance_bar.png) | ![Donut](docs/abundance_pie.png) | ![Heatmap](docs/abundance_heatmap.png) |
| Top species ranked by read count. | Relative proportion of each species. | Log₁₀-scaled heatmap of species abundance. |

### Run Individual Steps

```bash
# QC only
celljanus qc -1 testdata/reads_R1.fastq.gz -2 testdata/reads_R2.fastq.gz -o results/01_qc

# Align to host
celljanus align -1 results/01_qc/reads_R1_qc.fastq.gz \
  -2 results/01_qc/reads_R2_qc.fastq.gz \
  -x testdata/refs/host_genome/host -o results/02_alignment

# Classify microbial reads
celljanus classify -1 results/02_alignment/unmapped_R1.fastq.gz \
  -2 results/02_alignment/unmapped_R2.fastq.gz \
  -d testdata/refs/kraken2_testdb -o results/04_classification

# Generate plots
celljanus visualize -b results/04_classification/bracken_S.txt -o results/05_visualisation
```

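Each step reads the files written by the previous one, so the order matters. To drive the same four commands from Python, a thin `subprocess` wrapper is enough; `individual_steps` and `run_steps` below are hypothetical helpers that replay the commands above verbatim.

```python
import subprocess
from pathlib import Path
from typing import List


def individual_steps(results: str = "results", data: str = "testdata") -> List[List[str]]:
    """argv lists for the four commands above; each consumes the previous step's output."""
    r, d = Path(results), Path(data)
    return [
        ["celljanus", "qc", "-1", str(d / "reads_R1.fastq.gz"),
         "-2", str(d / "reads_R2.fastq.gz"), "-o", str(r / "01_qc")],
        ["celljanus", "align", "-1", str(r / "01_qc" / "reads_R1_qc.fastq.gz"),
         "-2", str(r / "01_qc" / "reads_R2_qc.fastq.gz"),
         "-x", str(d / "refs" / "host_genome" / "host"), "-o", str(r / "02_alignment")],
        ["celljanus", "classify", "-1", str(r / "02_alignment" / "unmapped_R1.fastq.gz"),
         "-2", str(r / "02_alignment" / "unmapped_R2.fastq.gz"),
         "-d", str(d / "refs" / "kraken2_testdb"), "-o", str(r / "04_classification")],
        ["celljanus", "visualize", "-b", str(r / "04_classification" / "bracken_S.txt"),
         "-o", str(r / "05_visualisation")],
    ]


def run_steps(steps: List[List[str]], dry_run: bool = False) -> None:
    """Run steps in order; check=True raises CalledProcessError at the first failure."""
    for argv in steps:
        print("$", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# Usage, from the repository root:
# run_steps(individual_steps())
```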
---

## Real Data

### 1. Download reference databases

```bash
# Human genome hg38 + Bowtie2 index (~5 GB)
celljanus download hg38 -o ./refs

# Kraken2 Standard-8 database (capped at 8 GB)
celljanus download kraken2 -o ./refs --db-name standard_8
```

### 2. Run the pipeline

```bash
celljanus run \
  -1 /path/to/sample_R1.fastq.gz \
  -2 /path/to/sample_R2.fastq.gz \
  -x ./refs/bowtie2_index/GRCh38_noalt_as \
  -d ./refs/standard_8 \
  -o ./results \
  --threads 8
```

### Key Options

| Option | Default | Description |
|--------|---------|-------------|
| `-1, --read1` | *required* | R1 FASTQ (or the single-end FASTQ) |
| `-2, --read2` | — | R2 FASTQ (paired-end only) |
| `-x, --host-index` | *required* | Bowtie2 index prefix |
| `-d, --kraken2-db` | *required* | Kraken2 database directory |
| `-o, --output-dir` | `celljanus_output` | Output directory |
| `-t, --threads` | auto (CPUs − 2) | Worker threads |
| `--min-quality` | 15 | Phred quality threshold |
| `--confidence` | 0.05 | Kraken2 confidence score threshold (0–1) |
| `--bracken-level` | S | Bracken taxonomic level (D/P/C/O/F/G/S) |
| `--skip-qc` | off | Skip the QC step |
| `--skip-classify` | off | Skip classification |
| `--skip-visualize` | off | Skip visualisation |

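For scripting, the defaults in this table can be mirrored in a small validated config object. This is a sketch only: `RunOptions` and `default_threads` are hypothetical (the package's own config class is `CellJanusConfig`, whose full field list is not documented here), and flooring the automatic thread count at 1 is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Optional

BRACKEN_LEVELS = set("DPCOFGS")  # domain, phylum, class, order, family, genus, species


def default_threads(cpu_count: Optional[int] = None) -> int:
    """'auto (CPUs − 2)' from the table; the floor of 1 thread is an assumption."""
    n = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, n - 2)


@dataclass
class RunOptions:
    """Hypothetical mirror of the `celljanus run` defaults listed above."""
    min_quality: int = 15
    confidence: float = 0.05
    bracken_level: str = "S"
    threads: int = 0  # 0 means "auto"

    def __post_init__(self) -> None:
        if self.bracken_level not in BRACKEN_LEVELS:
            raise ValueError(f"bracken_level must be one of D/P/C/O/F/G/S, got {self.bracken_level!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("Kraken2 confidence must lie in [0, 1]")
        if self.threads <= 0:
            self.threads = default_threads()
```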
---

## CLI Reference

| Command | Description |
|---------|-------------|
| `celljanus run` | Full pipeline: QC → Align → Classify → Visualize |
| `celljanus qc` | Quality control (fastp) |
| `celljanus align` | Host alignment + unmapped-read extraction (Bowtie2) |
| `celljanus extract` | Extract unmapped reads from a BAM file |
| `celljanus classify` | Taxonomic classification (Kraken2 + Bracken) |
| `celljanus visualize` | Generate abundance plots |
| `celljanus download` | Download reference databases |
| `celljanus check` | Verify that the external tools are installed |

Run `celljanus <command> --help` for full option details.

---

## Python API

```python
from pathlib import Path
from celljanus.config import CellJanusConfig
from celljanus.pipeline import run_pipeline

cfg = CellJanusConfig(
    output_dir=Path("./results"),
    host_index=Path("./refs/bowtie2_index/GRCh38_noalt_as"),
    kraken2_db=Path("./refs/standard_8"),
    threads=8,
)

result = run_pipeline(
    Path("sample_R1.fastq.gz"),
    read2=Path("sample_R2.fastq.gz"),
    cfg=cfg,
)

print(result.bracken_df)           # Species abundance (pandas DataFrame)
print(result.qc_report.summary())  # QC statistics
```

---

## Output Structure

```
output_dir/
├── 01_qc/                         # Quality control
│   ├── *_qc.fastq.gz              # Trimmed reads
│   ├── *_fastp.json               # QC metrics
│   └── *_fastp.html               # Interactive report
├── 02_alignment/                  # Host alignment
│   ├── host_aligned.sorted.bam    # Full alignment
│   ├── host_mapped.sorted.bam     # Host-only reads
│   ├── unmapped_R{1,2}.fastq.gz   # Non-host reads → classification
│   └── host_align_stats.txt       # Alignment statistics
├── 04_classification/             # Microbial classification
│   ├── kraken2_report.txt         # Taxonomic report
│   ├── kraken2_output.txt         # Per-read assignments
│   └── bracken_S.txt              # Species abundance
├── 05_visualisation/plots/        # Figures (PNG + PDF)
│   ├── abundance_bar.*            # Horizontal bar chart
│   ├── abundance_pie.*            # Donut chart
│   ├── abundance_heatmap.*        # Heatmap (log₁₀ scale)
│   └── pipeline_dashboard.*       # Summary dashboard
├── 06_tables/                     # Machine-readable results
│   ├── pipeline_summary.csv       # Per-step metrics
│   ├── species_abundance.csv      # Species × reads × fraction
│   └── output_manifest.csv        # File inventory with sizes
└── celljanus.log                  # Pipeline log
```

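`bracken_S.txt` is Bracken's abundance table. Assuming CellJanus writes it in Bracken's standard tab-separated layout (`name`, `taxonomy_id`, `taxonomy_lvl`, `kraken_assigned_reads`, `added_reads`, `new_est_reads`, `fraction_total_reads`), it can be read with the standard library alone. `read_bracken` is a hypothetical helper.

```python
import csv
from typing import Dict, Iterable, List, Union

Row = Dict[str, Union[str, int, float]]


def read_bracken(lines: Iterable[str], min_fraction: float = 0.0) -> List[Row]:
    """Parse a Bracken abundance table (tab-separated, with Bracken's header row).

    Keeps rows whose fraction_total_reads >= min_fraction and sorts them by
    new_est_reads, highest first.
    """
    rows: List[Row] = []
    for rec in csv.DictReader(lines, delimiter="\t"):
        fraction = float(rec["fraction_total_reads"])
        if fraction >= min_fraction:
            rows.append({
                "name": rec["name"],
                "taxonomy_id": int(rec["taxonomy_id"]),
                "new_est_reads": int(rec["new_est_reads"]),
                "fraction": fraction,
            })
    return sorted(rows, key=lambda r: r["new_est_reads"], reverse=True)


# Usage (path from the tree above):
# with open("output_dir/04_classification/bracken_S.txt", newline="") as fh:
#     for row in read_bracken(fh, min_fraction=0.01):
#         print(row["name"], row["new_est_reads"], f"{100 * row['fraction']:.2f}%")
```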
### CSV Tables

**`species_abundance.csv`**:

| name | taxonomy_id | bracken_estimated | fraction_pct |
|------|-------------|------------------:|-------------:|
| Staphylococcus aureus | 1280 | 116 | 38.67 |
| Klebsiella pneumoniae | 573 | 94 | 31.33 |
| Escherichia coli | 562 | 90 | 30.00 |

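The example rows are consistent with `fraction_pct` being each species' `bracken_estimated` count divided by the total across species (116 + 94 + 90 = 300), times 100. That definition is inferred from the numbers rather than documented; `to_fraction_pct` is a hypothetical helper that reproduces it.

```python
from typing import Dict


def to_fraction_pct(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of all Bracken-estimated reads per species, rounded to 2 d.p."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}  # avoid dividing by zero
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


# Counts from the species_abundance.csv example above (total 300)
example = {"Staphylococcus aureus": 116, "Klebsiella pneumoniae": 94, "Escherichia coli": 90}
print(to_fraction_pct(example))
# → {'Staphylococcus aureus': 38.67, 'Klebsiella pneumoniae': 31.33, 'Escherichia coli': 30.0}
```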
**`pipeline_summary.csv`**: one row per metric, with columns `Step`, `Metric` and `Value`, covering QC, alignment and classification statistics.

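Because this table is in long format, pivoting it into a per-step lookup is often handier. A minimal sketch with the standard `csv` module, assuming the header row is literally `Step,Metric,Value`; `summary_by_step` is hypothetical, and the actual metric names are whatever the pipeline writes.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable


def summary_by_step(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Pivot long-format (Step, Metric, Value) rows into {step: {metric: value}}.

    Values are left as strings; convert them as needed.
    """
    table: Dict[str, Dict[str, str]] = defaultdict(dict)
    for row in csv.DictReader(lines):
        table[row["Step"]][row["Metric"]] = row["Value"]
    return dict(table)


# Usage:
# with open("output_dir/06_tables/pipeline_summary.csv", newline="") as fh:
#     summary = summary_by_step(fh)
#     print(list(summary))
```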
---

## Performance

| Component | Memory | Note |
|-----------|--------|------|
| fastp | < 1 GB | Streaming I/O |
| Bowtie2 + hg38 | ~3.5 GB | Memory-mapped index |
| Kraken2 (Standard-8 DB) | ~8 GB | `--memory-mapping` flag |
| **Peak total** | **~12–14 GB** | Fits on a laptop with 32 GB RAM |

---

## Citation

```
Wang Z (2026). CellJanus: A Dual-Perspective Tool for Deconvolving Host
Single-Cell and Microbial Transcriptomes. Python package version 0.1.3.
https://github.com/zhaoqing-wang/CellJanus
```

## License

[MIT](LICENSE)
@@ -0,0 +1,10 @@
"""
CellJanus: A Dual-Perspective Tool for Deconvolving Host Single-Cell
and Microbial Transcriptomes.

Pipeline: FASTQ → QC (fastp) → Align hg38 (Bowtie2) → Extract unmapped
(samtools) → Classify (Kraken2) → Quantify (Bracken) → Visualize
"""

__version__ = "0.1.3"
__author__ = "CellJanus Team"
@@ -0,0 +1,6 @@
"""Allow running: python -m celljanus"""

from celljanus.cli import main

if __name__ == "__main__":
    main()