@bgicli/bgicli 2.1.1 → 2.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/data/skills/aav-vector-design-agent/SKILL.md +198 -0
- package/data/skills/adaptyv/SKILL.md +112 -0
- package/data/skills/adhd-daily-planner/SKILL.md +271 -0
- package/data/skills/aeon/SKILL.md +372 -0
- package/data/skills/agent-browser/SKILL.md +159 -0
- package/data/skills/agentd-drug-discovery/SKILL.md +52 -0
- package/data/skills/ai-analyzer/SKILL.md +218 -0
- package/data/skills/alphafold/SKILL.md +183 -0
- package/data/skills/alphafold-database/SKILL.md +500 -0
- package/data/skills/anndata/SKILL.md +394 -0
- package/data/skills/antibody-design-agent/SKILL.md +64 -0
- package/data/skills/arboreto/SKILL.md +237 -0
- package/data/skills/armored-cart-design-agent/SKILL.md +225 -0
- package/data/skills/arxiv-search/SKILL.md +224 -0
- package/data/skills/autonomous-oncology-agent/SKILL.md +77 -0
- package/data/skills/bayesian-optimizer/SKILL.md +60 -0
- package/data/skills/benchling-integration/SKILL.md +473 -0
- package/data/skills/bgpt-paper-search/SKILL.md +81 -0
- package/data/skills/bindcraft/SKILL.md +198 -0
- package/data/skills/binder-design/SKILL.md +182 -0
- package/data/skills/binding-characterization/SKILL.md +234 -0
- package/data/skills/bindingdb-database/SKILL.md +332 -0
- package/data/skills/bio-admet-prediction/SKILL.md +224 -0
- package/data/skills/bio-alignment-files-bam-statistics/SKILL.md +340 -0
- package/data/skills/bio-alignment-filtering/SKILL.md +322 -0
- package/data/skills/bio-alignment-indexing/SKILL.md +249 -0
- package/data/skills/bio-alignment-io/SKILL.md +301 -0
- package/data/skills/bio-alignment-msa-parsing/SKILL.md +366 -0
- package/data/skills/bio-alignment-msa-statistics/SKILL.md +375 -0
- package/data/skills/bio-alignment-pairwise/SKILL.md +277 -0
- package/data/skills/bio-alignment-sorting/SKILL.md +296 -0
- package/data/skills/bio-alignment-validation/SKILL.md +374 -0
- package/data/skills/bio-atac-seq-atac-peak-calling/SKILL.md +221 -0
- package/data/skills/bio-atac-seq-atac-qc/SKILL.md +292 -0
- package/data/skills/bio-atac-seq-differential-accessibility/SKILL.md +268 -0
- package/data/skills/bio-atac-seq-footprinting/SKILL.md +256 -0
- package/data/skills/bio-atac-seq-motif-deviation/SKILL.md +319 -0
- package/data/skills/bio-atac-seq-nucleosome-positioning/SKILL.md +321 -0
- package/data/skills/bio-basecalling/SKILL.md +368 -0
- package/data/skills/bio-batch-downloads/SKILL.md +384 -0
- package/data/skills/bio-batch-processing/SKILL.md +303 -0
- package/data/skills/bio-bedgraph-handling/SKILL.md +336 -0
- package/data/skills/bio-blast-searches/SKILL.md +354 -0
- package/data/skills/bio-causal-genomics-colocalization-analysis/SKILL.md +264 -0
- package/data/skills/bio-causal-genomics-fine-mapping/SKILL.md +267 -0
- package/data/skills/bio-causal-genomics-mediation-analysis/SKILL.md +264 -0
- package/data/skills/bio-causal-genomics-mendelian-randomization/SKILL.md +221 -0
- package/data/skills/bio-causal-genomics-pleiotropy-detection/SKILL.md +292 -0
- package/data/skills/bio-cfdna-preprocessing/SKILL.md +200 -0
- package/data/skills/bio-chipseq-differential-binding/SKILL.md +262 -0
- package/data/skills/bio-chipseq-motif-analysis/SKILL.md +387 -0
- package/data/skills/bio-chipseq-peak-annotation/SKILL.md +239 -0
- package/data/skills/bio-chipseq-peak-calling/SKILL.md +277 -0
- package/data/skills/bio-chipseq-qc/SKILL.md +391 -0
- package/data/skills/bio-chipseq-super-enhancers/SKILL.md +288 -0
- package/data/skills/bio-chipseq-visualization/SKILL.md +289 -0
- package/data/skills/bio-clinical-databases-clinvar-lookup/SKILL.md +188 -0
- package/data/skills/bio-clinical-databases-dbsnp-queries/SKILL.md +171 -0
- package/data/skills/bio-clinical-databases-gnomad-frequencies/SKILL.md +205 -0
- package/data/skills/bio-clinical-databases-hla-typing/SKILL.md +248 -0
- package/data/skills/bio-clinical-databases-myvariant-queries/SKILL.md +174 -0
- package/data/skills/bio-clinical-databases-pharmacogenomics/SKILL.md +232 -0
- package/data/skills/bio-clinical-databases-polygenic-risk/SKILL.md +276 -0
- package/data/skills/bio-clinical-databases-somatic-signatures/SKILL.md +261 -0
- package/data/skills/bio-clinical-databases-tumor-mutational-burden/SKILL.md +301 -0
- package/data/skills/bio-clinical-databases-variant-prioritization/SKILL.md +225 -0
- package/data/skills/bio-clip-seq-binding-site-annotation/SKILL.md +66 -0
- package/data/skills/bio-clip-seq-clip-alignment/SKILL.md +70 -0
- package/data/skills/bio-clip-seq-clip-motif-analysis/SKILL.md +62 -0
- package/data/skills/bio-clip-seq-clip-peak-calling/SKILL.md +282 -0
- package/data/skills/bio-clip-seq-clip-preprocessing/SKILL.md +142 -0
- package/data/skills/bio-codon-usage/SKILL.md +353 -0
- package/data/skills/bio-comparative-genomics-ancestral-reconstruction/SKILL.md +312 -0
- package/data/skills/bio-comparative-genomics-hgt-detection/SKILL.md +341 -0
- package/data/skills/bio-comparative-genomics-ortholog-inference/SKILL.md +308 -0
- package/data/skills/bio-comparative-genomics-positive-selection/SKILL.md +354 -0
- package/data/skills/bio-comparative-genomics-synteny-analysis/SKILL.md +315 -0
- package/data/skills/bio-compressed-files/SKILL.md +263 -0
- package/data/skills/bio-consensus-sequences/SKILL.md +340 -0
- package/data/skills/bio-copy-number-cnv-annotation/SKILL.md +307 -0
- package/data/skills/bio-copy-number-cnv-visualization/SKILL.md +294 -0
- package/data/skills/bio-copy-number-cnvkit-analysis/SKILL.md +290 -0
- package/data/skills/bio-copy-number-gatk-cnv/SKILL.md +270 -0
- package/data/skills/bio-crispr-screens-base-editing-analysis/SKILL.md +110 -0
- package/data/skills/bio-crispr-screens-batch-correction/SKILL.md +316 -0
- package/data/skills/bio-crispr-screens-crispresso-editing/SKILL.md +205 -0
- package/data/skills/bio-crispr-screens-hit-calling/SKILL.md +264 -0
- package/data/skills/bio-crispr-screens-jacks-analysis/SKILL.md +313 -0
- package/data/skills/bio-crispr-screens-library-design/SKILL.md +417 -0
- package/data/skills/bio-crispr-screens-mageck-analysis/SKILL.md +222 -0
- package/data/skills/bio-crispr-screens-screen-qc/SKILL.md +243 -0
- package/data/skills/bio-ctdna-mutation-detection/SKILL.md +234 -0
- package/data/skills/bio-data-visualization-circos-plots/SKILL.md +405 -0
- package/data/skills/bio-data-visualization-color-palettes/SKILL.md +244 -0
- package/data/skills/bio-data-visualization-genome-browser-tracks/SKILL.md +328 -0
- package/data/skills/bio-data-visualization-genome-tracks/SKILL.md +249 -0
- package/data/skills/bio-data-visualization-ggplot2-fundamentals/SKILL.md +313 -0
- package/data/skills/bio-data-visualization-heatmaps-clustering/SKILL.md +227 -0
- package/data/skills/bio-data-visualization-interactive-visualization/SKILL.md +210 -0
- package/data/skills/bio-data-visualization-multipanel-figures/SKILL.md +274 -0
- package/data/skills/bio-data-visualization-specialized-omics-plots/SKILL.md +251 -0
- package/data/skills/bio-data-visualization-upset-plots/SKILL.md +228 -0
- package/data/skills/bio-data-visualization-volcano-customization/SKILL.md +233 -0
- package/data/skills/bio-de-deseq2-basics/SKILL.md +376 -0
- package/data/skills/bio-de-edger-basics/SKILL.md +418 -0
- package/data/skills/bio-de-results/SKILL.md +378 -0
- package/data/skills/bio-de-visualization/SKILL.md +408 -0
- package/data/skills/bio-differential-expression-batch-correction/SKILL.md +253 -0
- package/data/skills/bio-differential-expression-timeseries-de/SKILL.md +370 -0
- package/data/skills/bio-differential-splicing/SKILL.md +177 -0
- package/data/skills/bio-duplicate-handling/SKILL.md +292 -0
- package/data/skills/bio-entrez-fetch/SKILL.md +334 -0
- package/data/skills/bio-entrez-link/SKILL.md +325 -0
- package/data/skills/bio-entrez-search/SKILL.md +311 -0
- package/data/skills/bio-epidemiological-genomics-amr-surveillance/SKILL.md +233 -0
- package/data/skills/bio-epidemiological-genomics-pathogen-typing/SKILL.md +202 -0
- package/data/skills/bio-epidemiological-genomics-phylodynamics/SKILL.md +207 -0
- package/data/skills/bio-epidemiological-genomics-transmission-inference/SKILL.md +237 -0
- package/data/skills/bio-epidemiological-genomics-variant-surveillance/SKILL.md +237 -0
- package/data/skills/bio-epitranscriptomics-m6a-differential/SKILL.md +88 -0
- package/data/skills/bio-epitranscriptomics-m6a-peak-calling/SKILL.md +89 -0
- package/data/skills/bio-epitranscriptomics-m6anet-analysis/SKILL.md +101 -0
- package/data/skills/bio-epitranscriptomics-merip-preprocessing/SKILL.md +81 -0
- package/data/skills/bio-epitranscriptomics-modification-visualization/SKILL.md +98 -0
- package/data/skills/bio-experimental-design-batch-design/SKILL.md +110 -0
- package/data/skills/bio-experimental-design-multiple-testing/SKILL.md +98 -0
- package/data/skills/bio-experimental-design-power-analysis/SKILL.md +84 -0
- package/data/skills/bio-experimental-design-sample-size/SKILL.md +93 -0
- package/data/skills/bio-expression-matrix-counts-ingest/SKILL.md +220 -0
- package/data/skills/bio-expression-matrix-gene-id-mapping/SKILL.md +256 -0
- package/data/skills/bio-expression-matrix-metadata-joins/SKILL.md +271 -0
- package/data/skills/bio-expression-matrix-sparse-handling/SKILL.md +247 -0
- package/data/skills/bio-fastq-quality/SKILL.md +279 -0
- package/data/skills/bio-filter-sequences/SKILL.md +265 -0
- package/data/skills/bio-flow-cytometry-bead-normalization/SKILL.md +315 -0
- package/data/skills/bio-flow-cytometry-clustering-phenotyping/SKILL.md +237 -0
- package/data/skills/bio-flow-cytometry-compensation-transformation/SKILL.md +196 -0
- package/data/skills/bio-flow-cytometry-cytometry-qc/SKILL.md +382 -0
- package/data/skills/bio-flow-cytometry-differential-analysis/SKILL.md +217 -0
- package/data/skills/bio-flow-cytometry-doublet-detection/SKILL.md +288 -0
- package/data/skills/bio-flow-cytometry-fcs-handling/SKILL.md +221 -0
- package/data/skills/bio-flow-cytometry-gating-analysis/SKILL.md +193 -0
- package/data/skills/bio-format-conversion/SKILL.md +193 -0
- package/data/skills/bio-fragment-analysis/SKILL.md +214 -0
- package/data/skills/bio-gatk-variant-calling/SKILL.md +422 -0
- package/data/skills/bio-genome-assembly-assembly-polishing/SKILL.md +333 -0
- package/data/skills/bio-genome-assembly-assembly-qc/SKILL.md +344 -0
- package/data/skills/bio-genome-assembly-contamination-detection/SKILL.md +235 -0
- package/data/skills/bio-genome-assembly-hifi-assembly/SKILL.md +178 -0
- package/data/skills/bio-genome-assembly-long-read-assembly/SKILL.md +307 -0
- package/data/skills/bio-genome-assembly-metagenome-assembly/SKILL.md +227 -0
- package/data/skills/bio-genome-assembly-scaffolding/SKILL.md +204 -0
- package/data/skills/bio-genome-assembly-short-read-assembly/SKILL.md +319 -0
- package/data/skills/bio-genome-engineering-base-editing-design/SKILL.md +277 -0
- package/data/skills/bio-genome-engineering-grna-design/SKILL.md +221 -0
- package/data/skills/bio-genome-engineering-hdr-template-design/SKILL.md +264 -0
- package/data/skills/bio-genome-engineering-off-target-prediction/SKILL.md +232 -0
- package/data/skills/bio-genome-engineering-prime-editing-design/SKILL.md +275 -0
- package/data/skills/bio-genome-intervals-bed-file-basics/SKILL.md +357 -0
- package/data/skills/bio-genome-intervals-bigwig-tracks/SKILL.md +351 -0
- package/data/skills/bio-genome-intervals-coverage-analysis/SKILL.md +300 -0
- package/data/skills/bio-genome-intervals-gtf-gff-handling/SKILL.md +345 -0
- package/data/skills/bio-genome-intervals-interval-arithmetic/SKILL.md +485 -0
- package/data/skills/bio-genome-intervals-proximity-operations/SKILL.md +337 -0
- package/data/skills/bio-geo-data/SKILL.md +380 -0
- package/data/skills/bio-hi-c-analysis-compartment-analysis/SKILL.md +261 -0
- package/data/skills/bio-hi-c-analysis-contact-pairs/SKILL.md +278 -0
- package/data/skills/bio-hi-c-analysis-hic-data-io/SKILL.md +260 -0
- package/data/skills/bio-hi-c-analysis-hic-differential/SKILL.md +328 -0
- package/data/skills/bio-hi-c-analysis-hic-visualization/SKILL.md +297 -0
- package/data/skills/bio-hi-c-analysis-loop-calling/SKILL.md +284 -0
- package/data/skills/bio-hi-c-analysis-matrix-operations/SKILL.md +274 -0
- package/data/skills/bio-hi-c-analysis-tad-detection/SKILL.md +239 -0
- package/data/skills/bio-imaging-mass-cytometry-cell-segmentation/SKILL.md +241 -0
- package/data/skills/bio-imaging-mass-cytometry-data-preprocessing/SKILL.md +279 -0
- package/data/skills/bio-imaging-mass-cytometry-interactive-annotation/SKILL.md +304 -0
- package/data/skills/bio-imaging-mass-cytometry-phenotyping/SKILL.md +231 -0
- package/data/skills/bio-imaging-mass-cytometry-quality-metrics/SKILL.md +316 -0
- package/data/skills/bio-imaging-mass-cytometry-spatial-analysis/SKILL.md +246 -0
- package/data/skills/bio-immunoinformatics-epitope-prediction/SKILL.md +259 -0
- package/data/skills/bio-immunoinformatics-immunogenicity-scoring/SKILL.md +275 -0
- package/data/skills/bio-immunoinformatics-mhc-binding-prediction/SKILL.md +260 -0
- package/data/skills/bio-immunoinformatics-neoantigen-prediction/SKILL.md +277 -0
- package/data/skills/bio-immunoinformatics-tcr-epitope-binding/SKILL.md +257 -0
- package/data/skills/bio-isoform-switching/SKILL.md +192 -0
- package/data/skills/bio-liquid-biopsy-pipeline/SKILL.md +311 -0
- package/data/skills/bio-local-blast/SKILL.md +350 -0
- package/data/skills/bio-long-read-sequencing-clair3-variants/SKILL.md +252 -0
- package/data/skills/bio-long-read-sequencing-isoseq-analysis/SKILL.md +334 -0
- package/data/skills/bio-long-read-sequencing-nanopore-methylation/SKILL.md +110 -0
- package/data/skills/bio-longitudinal-monitoring/SKILL.md +271 -0
- package/data/skills/bio-longread-alignment/SKILL.md +193 -0
- package/data/skills/bio-longread-medaka/SKILL.md +176 -0
- package/data/skills/bio-longread-qc/SKILL.md +224 -0
- package/data/skills/bio-longread-structural-variants/SKILL.md +201 -0
- package/data/skills/bio-machine-learning-atlas-mapping/SKILL.md +139 -0
- package/data/skills/bio-machine-learning-biomarker-discovery/SKILL.md +157 -0
- package/data/skills/bio-machine-learning-model-validation/SKILL.md +148 -0
- package/data/skills/bio-machine-learning-omics-classifiers/SKILL.md +146 -0
- package/data/skills/bio-machine-learning-prediction-explanation/SKILL.md +162 -0
- package/data/skills/bio-machine-learning-survival-analysis/SKILL.md +176 -0
- package/data/skills/bio-metabolomics-lipidomics/SKILL.md +265 -0
- package/data/skills/bio-metabolomics-metabolite-annotation/SKILL.md +241 -0
- package/data/skills/bio-metabolomics-msdial-preprocessing/SKILL.md +308 -0
- package/data/skills/bio-metabolomics-normalization-qc/SKILL.md +283 -0
- package/data/skills/bio-metabolomics-pathway-mapping/SKILL.md +237 -0
- package/data/skills/bio-metabolomics-statistical-analysis/SKILL.md +276 -0
- package/data/skills/bio-metabolomics-targeted-analysis/SKILL.md +314 -0
- package/data/skills/bio-metabolomics-xcms-preprocessing/SKILL.md +268 -0
- package/data/skills/bio-metagenomics-abundance/SKILL.md +203 -0
- package/data/skills/bio-metagenomics-amr-detection/SKILL.md +293 -0
- package/data/skills/bio-metagenomics-functional-profiling/SKILL.md +252 -0
- package/data/skills/bio-metagenomics-kraken/SKILL.md +204 -0
- package/data/skills/bio-metagenomics-metaphlan/SKILL.md +214 -0
- package/data/skills/bio-metagenomics-strain-tracking/SKILL.md +292 -0
- package/data/skills/bio-metagenomics-visualization/SKILL.md +240 -0
- package/data/skills/bio-methylation-based-detection/SKILL.md +223 -0
- package/data/skills/bio-methylation-bismark-alignment/SKILL.md +195 -0
- package/data/skills/bio-methylation-calling/SKILL.md +200 -0
- package/data/skills/bio-methylation-dmr-detection/SKILL.md +211 -0
- package/data/skills/bio-methylation-methylkit/SKILL.md +219 -0
- package/data/skills/bio-microbiome-amplicon-processing/SKILL.md +137 -0
- package/data/skills/bio-microbiome-differential-abundance/SKILL.md +147 -0
- package/data/skills/bio-microbiome-diversity-analysis/SKILL.md +188 -0
- package/data/skills/bio-microbiome-functional-prediction/SKILL.md +153 -0
- package/data/skills/bio-microbiome-qiime2-workflow/SKILL.md +219 -0
- package/data/skills/bio-microbiome-taxonomy-assignment/SKILL.md +168 -0
- package/data/skills/bio-molecular-descriptors/SKILL.md +200 -0
- package/data/skills/bio-molecular-io/SKILL.md +188 -0
- package/data/skills/bio-motif-search/SKILL.md +354 -0
- package/data/skills/bio-multi-omics-data-harmonization/SKILL.md +228 -0
- package/data/skills/bio-multi-omics-mixomics-analysis/SKILL.md +221 -0
- package/data/skills/bio-multi-omics-mofa-integration/SKILL.md +225 -0
- package/data/skills/bio-multi-omics-similarity-network/SKILL.md +235 -0
- package/data/skills/bio-orchestrator/SKILL.md +133 -0
- package/data/skills/bio-paired-end-fastq/SKILL.md +334 -0
- package/data/skills/bio-pathway-enrichment-visualization/SKILL.md +278 -0
- package/data/skills/bio-pathway-go-enrichment/SKILL.md +218 -0
- package/data/skills/bio-pathway-gsea/SKILL.md +227 -0
- package/data/skills/bio-pathway-kegg-pathways/SKILL.md +234 -0
- package/data/skills/bio-pathway-reactome/SKILL.md +215 -0
- package/data/skills/bio-pathway-wikipathways/SKILL.md +255 -0
- package/data/skills/bio-pdb-geometric-analysis/SKILL.md +475 -0
- package/data/skills/bio-pdb-structure-io/SKILL.md +296 -0
- package/data/skills/bio-pdb-structure-modification/SKILL.md +448 -0
- package/data/skills/bio-pdb-structure-navigation/SKILL.md +335 -0
- package/data/skills/bio-phasing-imputation-genotype-imputation/SKILL.md +201 -0
- package/data/skills/bio-phasing-imputation-haplotype-phasing/SKILL.md +190 -0
- package/data/skills/bio-phasing-imputation-imputation-qc/SKILL.md +265 -0
- package/data/skills/bio-phasing-imputation-reference-panels/SKILL.md +203 -0
- package/data/skills/bio-phylo-distance-calculations/SKILL.md +307 -0
- package/data/skills/bio-phylo-modern-tree-inference/SKILL.md +274 -0
- package/data/skills/bio-phylo-tree-io/SKILL.md +252 -0
- package/data/skills/bio-phylo-tree-manipulation/SKILL.md +375 -0
- package/data/skills/bio-phylo-tree-visualization/SKILL.md +275 -0
- package/data/skills/bio-pileup-generation/SKILL.md +314 -0
- package/data/skills/bio-population-genetics-association-testing/SKILL.md +293 -0
- package/data/skills/bio-population-genetics-linkage-disequilibrium/SKILL.md +260 -0
- package/data/skills/bio-population-genetics-plink-basics/SKILL.md +338 -0
- package/data/skills/bio-population-genetics-population-structure/SKILL.md +352 -0
- package/data/skills/bio-population-genetics-scikit-allel-analysis/SKILL.md +306 -0
- package/data/skills/bio-population-genetics-selection-statistics/SKILL.md +251 -0
- package/data/skills/bio-primer-design-primer-basics/SKILL.md +289 -0
- package/data/skills/bio-primer-design-primer-validation/SKILL.md +344 -0
- package/data/skills/bio-primer-design-qpcr-primers/SKILL.md +273 -0
- package/data/skills/bio-proteomics-data-import/SKILL.md +122 -0
- package/data/skills/bio-proteomics-dia-analysis/SKILL.md +246 -0
- package/data/skills/bio-proteomics-differential-abundance/SKILL.md +129 -0
- package/data/skills/bio-proteomics-peptide-identification/SKILL.md +122 -0
- package/data/skills/bio-proteomics-protein-inference/SKILL.md +174 -0
- package/data/skills/bio-proteomics-proteomics-qc/SKILL.md +208 -0
- package/data/skills/bio-proteomics-ptm-analysis/SKILL.md +139 -0
- package/data/skills/bio-proteomics-quantification/SKILL.md +141 -0
- package/data/skills/bio-proteomics-spectral-libraries/SKILL.md +270 -0
- package/data/skills/bio-reaction-enumeration/SKILL.md +251 -0
- package/data/skills/bio-read-alignment-bowtie2-alignment/SKILL.md +189 -0
- package/data/skills/bio-read-alignment-bwa-alignment/SKILL.md +166 -0
- package/data/skills/bio-read-alignment-hisat2-alignment/SKILL.md +205 -0
- package/data/skills/bio-read-alignment-star-alignment/SKILL.md +204 -0
- package/data/skills/bio-read-qc-adapter-trimming/SKILL.md +222 -0
- package/data/skills/bio-read-qc-contamination-screening/SKILL.md +252 -0
- package/data/skills/bio-read-qc-fastp-workflow/SKILL.md +278 -0
- package/data/skills/bio-read-qc-quality-filtering/SKILL.md +231 -0
- package/data/skills/bio-read-qc-quality-reports/SKILL.md +204 -0
- package/data/skills/bio-read-qc-umi-processing/SKILL.md +391 -0
- package/data/skills/bio-read-sequences/SKILL.md +319 -0
- package/data/skills/bio-reference-operations/SKILL.md +302 -0
- package/data/skills/bio-reporting-automated-qc-reports/SKILL.md +103 -0
- package/data/skills/bio-reporting-figure-export/SKILL.md +112 -0
- package/data/skills/bio-reporting-jupyter-reports/SKILL.md +98 -0
- package/data/skills/bio-reporting-quarto-reports/SKILL.md +295 -0
- package/data/skills/bio-reporting-rmarkdown-reports/SKILL.md +276 -0
- package/data/skills/bio-research-tools-biomarker-signature-studio/SKILL.md +99 -0
- package/data/skills/bio-restriction-enzyme-selection/SKILL.md +342 -0
- package/data/skills/bio-restriction-fragment-analysis/SKILL.md +259 -0
- package/data/skills/bio-restriction-mapping/SKILL.md +239 -0
- package/data/skills/bio-restriction-sites/SKILL.md +222 -0
- package/data/skills/bio-reverse-complement/SKILL.md +250 -0
- package/data/skills/bio-ribo-seq-orf-detection/SKILL.md +303 -0
- package/data/skills/bio-ribo-seq-riboseq-preprocessing/SKILL.md +176 -0
- package/data/skills/bio-ribo-seq-ribosome-periodicity/SKILL.md +182 -0
- package/data/skills/bio-ribo-seq-ribosome-stalling/SKILL.md +217 -0
- package/data/skills/bio-ribo-seq-translation-efficiency/SKILL.md +183 -0
- package/data/skills/bio-rna-quantification-alignment-free-quant/SKILL.md +226 -0
- package/data/skills/bio-rna-quantification-count-matrix-qc/SKILL.md +310 -0
- package/data/skills/bio-rna-quantification-featurecounts-counting/SKILL.md +190 -0
- package/data/skills/bio-rna-quantification-tximport-workflow/SKILL.md +240 -0
- package/data/skills/bio-rnaseq-qc/SKILL.md +320 -0
- package/data/skills/bio-sam-bam-basics/SKILL.md +248 -0
- package/data/skills/bio-sashimi-plots/SKILL.md +175 -0
- package/data/skills/bio-seq-objects/SKILL.md +240 -0
- package/data/skills/bio-sequence-properties/SKILL.md +397 -0
- package/data/skills/bio-sequence-similarity/SKILL.md +335 -0
- package/data/skills/bio-sequence-slicing/SKILL.md +232 -0
- package/data/skills/bio-sequence-statistics/SKILL.md +318 -0
- package/data/skills/bio-similarity-searching/SKILL.md +200 -0
- package/data/skills/bio-single-cell-batch-integration/SKILL.md +317 -0
- package/data/skills/bio-single-cell-cell-annotation/SKILL.md +259 -0
- package/data/skills/bio-single-cell-cell-communication/SKILL.md +257 -0
- package/data/skills/bio-single-cell-clustering/SKILL.md +330 -0
- package/data/skills/bio-single-cell-data-io/SKILL.md +315 -0
- package/data/skills/bio-single-cell-doublet-detection/SKILL.md +362 -0
- package/data/skills/bio-single-cell-lineage-tracing/SKILL.md +319 -0
- package/data/skills/bio-single-cell-markers-annotation/SKILL.md +317 -0
- package/data/skills/bio-single-cell-metabolite-communication/SKILL.md +258 -0
- package/data/skills/bio-single-cell-multimodal-integration/SKILL.md +242 -0
- package/data/skills/bio-single-cell-perturb-seq/SKILL.md +258 -0
- package/data/skills/bio-single-cell-preprocessing/SKILL.md +338 -0
- package/data/skills/bio-single-cell-scatac-analysis/SKILL.md +326 -0
- package/data/skills/bio-single-cell-splicing/SKILL.md +199 -0
- package/data/skills/bio-single-cell-trajectory-inference/SKILL.md +225 -0
- package/data/skills/bio-small-rna-seq-differential-mirna/SKILL.md +194 -0
- package/data/skills/bio-small-rna-seq-mirdeep2-analysis/SKILL.md +180 -0
- package/data/skills/bio-small-rna-seq-mirge3-analysis/SKILL.md +178 -0
- package/data/skills/bio-small-rna-seq-smrna-preprocessing/SKILL.md +174 -0
- package/data/skills/bio-small-rna-seq-target-prediction/SKILL.md +202 -0
- package/data/skills/bio-spatial-transcriptomics-image-analysis/SKILL.md +283 -0
- package/data/skills/bio-spatial-transcriptomics-spatial-communication/SKILL.md +299 -0
- package/data/skills/bio-spatial-transcriptomics-spatial-data-io/SKILL.md +272 -0
- package/data/skills/bio-spatial-transcriptomics-spatial-deconvolution/SKILL.md +314 -0
- package/data/skills/bio-spatial-transcriptomics-spatial-domains/SKILL.md +254 -0
- package/data/skills/bio-spatial-transcriptomics-spatial-multiomics/SKILL.md +181 -0
- package/data/skills/bio-spatial-transcriptomics-spatial-neighbors/SKILL.md +198 -0
- package/data/skills/bio-spatial-transcriptomics-spatial-preprocessing/SKILL.md +269 -0
- package/data/skills/bio-spatial-transcriptomics-spatial-proteomics/SKILL.md +124 -0
- package/data/skills/bio-spatial-transcriptomics-spatial-statistics/SKILL.md +237 -0
- package/data/skills/bio-spatial-transcriptomics-spatial-visualization/SKILL.md +287 -0
- package/data/skills/bio-splicing-pipeline/SKILL.md +253 -0
- package/data/skills/bio-splicing-qc/SKILL.md +190 -0
- package/data/skills/bio-splicing-quantification/SKILL.md +145 -0
- package/data/skills/bio-sra-data/SKILL.md +363 -0
- package/data/skills/bio-structural-biology-alphafold-predictions/SKILL.md +258 -0
- package/data/skills/bio-structural-biology-modern-structure-prediction/SKILL.md +346 -0
- package/data/skills/bio-substructure-search/SKILL.md +206 -0
- package/data/skills/bio-systems-biology-context-specific-models/SKILL.md +241 -0
- package/data/skills/bio-systems-biology-flux-balance-analysis/SKILL.md +206 -0
- package/data/skills/bio-systems-biology-gene-essentiality/SKILL.md +235 -0
- package/data/skills/bio-systems-biology-metabolic-reconstruction/SKILL.md +215 -0
- package/data/skills/bio-systems-biology-model-curation/SKILL.md +243 -0
- package/data/skills/bio-tcr-bcr-analysis-immcantation-analysis/SKILL.md +195 -0
- package/data/skills/bio-tcr-bcr-analysis-mixcr-analysis/SKILL.md +167 -0
- package/data/skills/bio-tcr-bcr-analysis-repertoire-visualization/SKILL.md +224 -0
- package/data/skills/bio-tcr-bcr-analysis-scirpy-analysis/SKILL.md +168 -0
- package/data/skills/bio-tcr-bcr-analysis-vdjtools-analysis/SKILL.md +188 -0
- package/data/skills/bio-transcription-translation/SKILL.md +237 -0
- package/data/skills/bio-tumor-fraction-estimation/SKILL.md +211 -0
- package/data/skills/bio-uniprot-access/SKILL.md +239 -0
- package/data/skills/bio-variant-annotation/SKILL.md +410 -0
- package/data/skills/bio-variant-calling/SKILL.md +266 -0
- package/data/skills/bio-variant-calling-clinical-interpretation/SKILL.md +355 -0
- package/data/skills/bio-variant-calling-deepvariant/SKILL.md +315 -0
- package/data/skills/bio-variant-calling-filtering-best-practices/SKILL.md +403 -0
- package/data/skills/bio-variant-calling-joint-calling/SKILL.md +338 -0
- package/data/skills/bio-variant-calling-structural-variant-calling/SKILL.md +253 -0
- package/data/skills/bio-variant-normalization/SKILL.md +325 -0
- package/data/skills/bio-vcf-basics/SKILL.md +342 -0
- package/data/skills/bio-vcf-manipulation/SKILL.md +429 -0
- package/data/skills/bio-vcf-statistics/SKILL.md +445 -0
- package/data/skills/bio-virtual-screening/SKILL.md +263 -0
- package/data/skills/bio-workflow-management-cwl-workflows/SKILL.md +433 -0
- package/data/skills/bio-workflow-management-nextflow-pipelines/SKILL.md +386 -0
- package/data/skills/bio-workflow-management-snakemake-workflows/SKILL.md +383 -0
- package/data/skills/bio-workflow-management-wdl-workflows/SKILL.md +500 -0
- package/data/skills/bio-workflows-atacseq-pipeline/SKILL.md +362 -0
- package/data/skills/bio-workflows-biomarker-pipeline/SKILL.md +272 -0
- package/data/skills/bio-workflows-chipseq-pipeline/SKILL.md +282 -0
- package/data/skills/bio-workflows-clip-pipeline/SKILL.md +268 -0
- package/data/skills/bio-workflows-cnv-pipeline/SKILL.md +324 -0
- package/data/skills/bio-workflows-crispr-editing-pipeline/SKILL.md +455 -0
- package/data/skills/bio-workflows-crispr-screen-pipeline/SKILL.md +278 -0
- package/data/skills/bio-workflows-cytometry-pipeline/SKILL.md +328 -0
- package/data/skills/bio-workflows-expression-to-pathways/SKILL.md +329 -0
- package/data/skills/bio-workflows-fastq-to-variants/SKILL.md +374 -0
- package/data/skills/bio-workflows-genome-assembly-pipeline/SKILL.md +290 -0
- package/data/skills/bio-workflows-gwas-pipeline/SKILL.md +323 -0
- package/data/skills/bio-workflows-hic-pipeline/SKILL.md +304 -0
- package/data/skills/bio-workflows-imc-pipeline/SKILL.md +304 -0
- package/data/skills/bio-workflows-longread-sv-pipeline/SKILL.md +281 -0
- package/data/skills/bio-workflows-merip-pipeline/SKILL.md +222 -0
- package/data/skills/bio-workflows-metabolic-modeling-pipeline/SKILL.md +408 -0
- package/data/skills/bio-workflows-metabolomics-pipeline/SKILL.md +297 -0
- package/data/skills/bio-workflows-metagenomics-pipeline/SKILL.md +283 -0
- package/data/skills/bio-workflows-methylation-pipeline/SKILL.md +274 -0
- package/data/skills/bio-workflows-microbiome-pipeline/SKILL.md +221 -0
- package/data/skills/bio-workflows-multi-omics-pipeline/SKILL.md +362 -0
- package/data/skills/bio-workflows-multiome-pipeline/SKILL.md +298 -0
- package/data/skills/bio-workflows-neoantigen-pipeline/SKILL.md +325 -0
- package/data/skills/bio-workflows-outbreak-pipeline/SKILL.md +341 -0
- package/data/skills/bio-workflows-proteomics-pipeline/SKILL.md +226 -0
- package/data/skills/bio-workflows-riboseq-pipeline/SKILL.md +94 -0
- package/data/skills/bio-workflows-rnaseq-to-de/SKILL.md +345 -0
- package/data/skills/bio-workflows-scrnaseq-pipeline/SKILL.md +354 -0
- package/data/skills/bio-workflows-smrna-pipeline/SKILL.md +86 -0
- package/data/skills/bio-workflows-somatic-variant-pipeline/SKILL.md +313 -0
- package/data/skills/bio-workflows-spatial-pipeline/SKILL.md +267 -0
- package/data/skills/bio-workflows-tcr-pipeline/SKILL.md +84 -0
- package/data/skills/bio-write-sequences/SKILL.md +205 -0
- package/data/skills/bioinformatics-singlecell/SKILL.md +143 -0
- package/data/skills/biokernel/SKILL.md +61 -0
- package/data/skills/biologist-analyst/SKILL.md +799 -0
- package/data/skills/biomaster-workflows/SKILL.md +55 -0
- package/data/skills/biomcp-server/SKILL.md +65 -0
- package/data/skills/biomedical-data-analysis/SKILL.md +56 -0
- package/data/skills/biomedical-search/SKILL.md +214 -0
- package/data/skills/biomni/SKILL.md +309 -0
- package/data/skills/biomni-general-agent/SKILL.md +43 -0
- package/data/skills/biomni-research-agent/SKILL.md +76 -0
- package/data/skills/biopython/SKILL.md +437 -0
- package/data/skills/biorxiv-database/SKILL.md +477 -0
- package/data/skills/bioservices/SKILL.md +355 -0
- package/data/skills/boltz/SKILL.md +188 -0
- package/data/skills/boltzgen/SKILL.md +287 -0
- package/data/skills/bone-marrow-ai-agent/SKILL.md +163 -0
- package/data/skills/brainstorming/SKILL.md +96 -0
- package/data/skills/brenda-database/SKILL.md +714 -0
- package/data/skills/bulk-combat-correction/SKILL.md +54 -0
- package/data/skills/bulk-deg-analysis/SKILL.md +61 -0
- package/data/skills/bulk-deseq2-analysis/SKILL.md +50 -0
- package/data/skills/bulk-stringdb-ppi/SKILL.md +49 -0
- package/data/skills/bulk-to-single-deconvolution/SKILL.md +50 -0
- package/data/skills/bulk-trajblend-interpolation/SKILL.md +52 -0
- package/data/skills/bulk-wgcna-analysis/SKILL.md +56 -0
- package/data/skills/cancer-metabolism-agent/SKILL.md +180 -0
- package/data/skills/care-coordination/SKILL.md +35 -0
- package/data/skills/cart-design-optimizer-agent/SKILL.md +162 -0
- package/data/skills/cbioportal-database/SKILL.md +367 -0
- package/data/skills/cell-free-expression/SKILL.md +291 -0
- package/data/skills/cellagent-annotation/SKILL.md +69 -0
- package/data/skills/cellfree-rna-agent/SKILL.md +182 -0
- package/data/skills/cellular-senescence-agent/SKILL.md +183 -0
- package/data/skills/cellxgene-census/SKILL.md +505 -0
- package/data/skills/chai/SKILL.md +272 -0
- package/data/skills/chatehr-clinician-assistant/SKILL.md +67 -0
- package/data/skills/chematagent-drug-discovery/SKILL.md +68 -0
- package/data/skills/chembl-database/SKILL.md +383 -0
- package/data/skills/chembl-search/SKILL.md +211 -0
- package/data/skills/chemcrow-drug-discovery/SKILL.md +61 -0
- package/data/skills/chemical-property-lookup/SKILL.md +42 -0
- package/data/skills/chemist-analyst/SKILL.md +1603 -0
- package/data/skills/chemistry-agent/SKILL.md +62 -0
- package/data/skills/chip-clonal-hematopoiesis-agent/SKILL.md +224 -0
- package/data/skills/chromosomal-instability-agent/SKILL.md +187 -0
- package/data/skills/citation-management/SKILL.md +1081 -0
- package/data/skills/claims-appeals/SKILL.md +35 -0
- package/data/skills/claw-ancestry-pca/SKILL.md +145 -0
- package/data/skills/claw-metagenomics/SKILL.md +238 -0
- package/data/skills/claw-semantic-sim/SKILL.md +151 -0
- package/data/skills/clinical-decision-support/SKILL.md +504 -0
- package/data/skills/clinical-diagnostic-reasoning/SKILL.md +222 -0
- package/data/skills/clinical-nlp-extractor/SKILL.md +59 -0
- package/data/skills/clinical-note-summarization/SKILL.md +52 -0
- package/data/skills/clinical-reports/SKILL.md +1127 -0
- package/data/skills/clinical-trial-protocol-skill/SKILL.md +508 -0
- package/data/skills/clinical-trials-search/SKILL.md +211 -0
- package/data/skills/clinicaltrials-database/SKILL.md +501 -0
- package/data/skills/clinpgx/SKILL.md +96 -0
- package/data/skills/clinpgx-database/SKILL.md +632 -0
- package/data/skills/clinvar-database/SKILL.md +356 -0
- package/data/skills/cnv-caller-agent/SKILL.md +171 -0
- package/data/skills/coagulation-thrombosis-agent/SKILL.md +141 -0
- package/data/skills/cobrapy/SKILL.md +457 -0
- package/data/skills/compbioagent-explorer/SKILL.md +67 -0
- package/data/skills/computational-pathology-agent/SKILL.md +72 -0
- package/data/skills/convergence-study/SKILL.md +98 -0
- package/data/skills/cosmic-database/SKILL.md +330 -0
- package/data/skills/crisis-detection-intervention-ai/SKILL.md +569 -0
- package/data/skills/crisis-response-protocol/SKILL.md +456 -0
- package/data/skills/crispr-guide-design/SKILL.md +72 -0
- package/data/skills/crispr-offtarget-predictor/SKILL.md +56 -0
- package/data/skills/cryoem-ai-drug-design-agent/SKILL.md +216 -0
- package/data/skills/ctdna-dynamics-mrd-agent/SKILL.md +206 -0
- package/data/skills/cytokine-storm-analysis-agent/SKILL.md +180 -0
- package/data/skills/dask/SKILL.md +454 -0
- package/data/skills/data-stats-analysis/SKILL.md +477 -0
- package/data/skills/data-transform/SKILL.md +576 -0
- package/data/skills/data-visualization-biomedical/SKILL.md +252 -0
- package/data/skills/data-visualization-expert/SKILL.md +72 -0
- package/data/skills/data-viz-plots/SKILL.md +461 -0
- package/data/skills/datacommons-client/SKILL.md +253 -0
- package/data/skills/datamol/SKILL.md +700 -0
- package/data/skills/deep-research/SKILL.md +111 -0
- package/data/skills/deep-research-swarm/SKILL.md +62 -0
- package/data/skills/deep-visual-proteomics-agent/SKILL.md +149 -0
- package/data/skills/deepchem/SKILL.md +591 -0
- package/data/skills/deeptools/SKILL.md +525 -0
- package/data/skills/depmap/SKILL.md +300 -0
- package/data/skills/diffdock/SKILL.md +477 -0
- package/data/skills/differentiation-schemes/SKILL.md +159 -0
- package/data/skills/digital-twin-clinical-agent/SKILL.md +228 -0
- package/data/skills/dispatching-parallel-agents/SKILL.md +180 -0
- package/data/skills/dnanexus-integration/SKILL.md +376 -0
- package/data/skills/doc-coauthoring/SKILL.md +375 -0
- package/data/skills/docx/SKILL.md +590 -0
- package/data/skills/docx-official/SKILL.md +197 -0
- package/data/skills/drug-discovery-search/SKILL.md +214 -0
- package/data/skills/drug-interaction-checker/SKILL.md +56 -0
- package/data/skills/drug-labels-search/SKILL.md +211 -0
- package/data/skills/drug-photo/SKILL.md +149 -0
- package/data/skills/drugbank-database/SKILL.md +184 -0
- package/data/skills/drugbank-search/SKILL.md +211 -0
- package/data/skills/ehr-fhir-integration/SKILL.md +60 -0
- package/data/skills/emergency-card/SKILL.md +426 -0
- package/data/skills/ena-database/SKILL.md +198 -0
- package/data/skills/ensembl-database/SKILL.md +305 -0
- package/data/skills/epidemiologist-analyst/SKILL.md +1844 -0
- package/data/skills/epigenomics-methylgpt-agent/SKILL.md +111 -0
- package/data/skills/equity-scorer/SKILL.md +182 -0
- package/data/skills/esm/SKILL.md +300 -0
- package/data/skills/etetoolkit/SKILL.md +617 -0
- package/data/skills/executing-plans/SKILL.md +84 -0
- package/data/skills/exosome-ev-analysis-agent/SKILL.md +171 -0
- package/data/skills/exploratory-data-analysis/SKILL.md +440 -0
- package/data/skills/family-health-analyzer/SKILL.md +137 -0
- package/data/skills/fastq-analysis/SKILL.md +191 -0
- package/data/skills/fda-database/SKILL.md +512 -0
- package/data/skills/fhir-developer-skill/SKILL.md +294 -0
- package/data/skills/fhir-development/SKILL.md +35 -0
- package/data/skills/find-skills/SKILL.md +133 -0
- package/data/skills/finishing-a-development-branch/SKILL.md +200 -0
- package/data/skills/fitness-analyzer/SKILL.md +431 -0
- package/data/skills/flowio/SKILL.md +602 -0
- package/data/skills/foldseek/SKILL.md +179 -0
- package/data/skills/galaxy-bridge/SKILL.md +215 -0
- package/data/skills/gene-database/SKILL.md +173 -0
- package/data/skills/gene-panel-design-agent/SKILL.md +192 -0
- package/data/skills/geniml/SKILL.md +312 -0
- package/data/skills/genome-compare/SKILL.md +127 -0
- package/data/skills/geo-database/SKILL.md +809 -0
- package/data/skills/geopandas/SKILL.md +245 -0
- package/data/skills/gget/SKILL.md +865 -0
- package/data/skills/ginkgo-cloud-lab/SKILL.md +56 -0
- package/data/skills/glycoengineering/SKILL.md +338 -0
- package/data/skills/gnomad-database/SKILL.md +395 -0
- package/data/skills/goal-analyzer/SKILL.md +605 -0
- package/data/skills/grief-companion/SKILL.md +250 -0
- package/data/skills/gsea-enrichment/SKILL.md +151 -0
- package/data/skills/gtars/SKILL.md +279 -0
- package/data/skills/gtex-database/SKILL.md +315 -0
- package/data/skills/gwas-database/SKILL.md +602 -0
- package/data/skills/gwas-lookup/SKILL.md +122 -0
- package/data/skills/gwas-prs/SKILL.md +178 -0
- package/data/skills/health-trend-analyzer/SKILL.md +451 -0
- package/data/skills/hemoglobinopathy-analysis-agent/SKILL.md +167 -0
- package/data/skills/hipaa-compliance/SKILL.md +230 -0
- package/data/skills/histolab/SKILL.md +672 -0
- package/data/skills/hmdb-database/SKILL.md +190 -0
- package/data/skills/hrd-analysis-agent/SKILL.md +184 -0
- package/data/skills/hrv-alexithymia-expert/SKILL.md +151 -0
- package/data/skills/hypogenic/SKILL.md +649 -0
- package/data/skills/hypothesis-generation/SKILL.md +286 -0
- package/data/skills/imaging-data-commons/SKILL.md +843 -0
- package/data/skills/immune-checkpoint-combination-agent/SKILL.md +170 -0
- package/data/skills/infographics/SKILL.md +563 -0
- package/data/skills/instrument-data-to-allotrope/SKILL.md +280 -0
- package/data/skills/interpro-database/SKILL.md +305 -0
- package/data/skills/ipsae/SKILL.md +190 -0
- package/data/skills/iso-13485-certification/SKILL.md +678 -0
- package/data/skills/jaspar-database/SKILL.md +351 -0
- package/data/skills/jungian-psychologist/SKILL.md +191 -0
- package/data/skills/kegg-database/SKILL.md +371 -0
- package/data/skills/knowledge-synthesis/SKILL.md +283 -0
- package/data/skills/kragen-knowledge-graph/SKILL.md +68 -0
- package/data/skills/lab-results/SKILL.md +35 -0
- package/data/skills/labarchive-integration/SKILL.md +262 -0
- package/data/skills/labstep/SKILL.md +208 -0
- package/data/skills/lamindb/SKILL.md +384 -0
- package/data/skills/latchbio-integration/SKILL.md +347 -0
- package/data/skills/latex-posters/SKILL.md +1602 -0
- package/data/skills/leads-literature-mining/SKILL.md +68 -0
- package/data/skills/ligandmpnn/SKILL.md +170 -0
- package/data/skills/linear-solvers/SKILL.md +165 -0
- package/data/skills/liquid-biopsy-analytics-agent/SKILL.md +171 -0
- package/data/skills/lit-synthesizer/SKILL.md +53 -0
- package/data/skills/literature-review/SKILL.md +584 -0
- package/data/skills/literature-search/SKILL.md +214 -0
- package/data/skills/lobster-bioinformatics/SKILL.md +305 -0
- package/data/skills/long-read-sequencing-agent/SKILL.md +181 -0
- package/data/skills/mage-antibody-generator/SKILL.md +54 -0
- package/data/skills/markdown-mermaid-writing/SKILL.md +327 -0
- package/data/skills/markitdown/SKILL.md +486 -0
- package/data/skills/matchms/SKILL.md +197 -0
- package/data/skills/matplotlib/SKILL.md +359 -0
- package/data/skills/mcpmed-bioinformatics-server/SKILL.md +42 -0
- package/data/skills/medchem/SKILL.md +400 -0
- package/data/skills/medea-therapeutic-discovery/SKILL.md +45 -0
- package/data/skills/medical-entity-extractor/SKILL.md +144 -0
- package/data/skills/medical-imaging-review/SKILL.md +170 -0
- package/data/skills/medical-research-toolkit/SKILL.md +273 -0
- package/data/skills/medrxiv-search/SKILL.md +211 -0
- package/data/skills/mental-health-analyzer/SKILL.md +981 -0
- package/data/skills/mesh-generation/SKILL.md +149 -0
- package/data/skills/metabolomics-workbench-database/SKILL.md +253 -0
- package/data/skills/microbiome-cancer-agent/SKILL.md +180 -0
- package/data/skills/modern-drug-rehab-computer/SKILL.md +392 -0
- package/data/skills/molecular-dynamics/SKILL.md +457 -0
- package/data/skills/molecular-glue-discovery-agent/SKILL.md +224 -0
- package/data/skills/molecule-evolution-agent/SKILL.md +62 -0
- package/data/skills/molfeat/SKILL.md +505 -0
- package/data/skills/monarch-database/SKILL.md +372 -0
- package/data/skills/mpn-progression-monitor-agent/SKILL.md +228 -0
- package/data/skills/mpn-research-assistant/SKILL.md +197 -0
- package/data/skills/mrd-edge-detection-agent/SKILL.md +213 -0
- package/data/skills/multi-ancestry-prs-agent/SKILL.md +224 -0
- package/data/skills/multi-search-engine/SKILL.md +110 -0
- package/data/skills/multimodal-medical-imaging/SKILL.md +59 -0
- package/data/skills/multimodal-radpath-fusion-agent/SKILL.md +213 -0
- package/data/skills/myeloma-mrd-agent/SKILL.md +184 -0
- package/data/skills/networkx/SKILL.md +435 -0
- package/data/skills/neurokit2/SKILL.md +350 -0
- package/data/skills/neuropixels-analysis/SKILL.md +344 -0
- package/data/skills/nextflow-development/SKILL.md +290 -0
- package/data/skills/ngs-analysis/SKILL.md +183 -0
- package/data/skills/nicheformer-spatial-agent/SKILL.md +197 -0
- package/data/skills/nk-cell-therapy-agent/SKILL.md +186 -0
- package/data/skills/nonlinear-solvers/SKILL.md +180 -0
- package/data/skills/numerical-integration/SKILL.md +166 -0
- package/data/skills/numerical-stability/SKILL.md +149 -0
- package/data/skills/nutrition-analyzer/SKILL.md +775 -0
- package/data/skills/occupational-health-analyzer/SKILL.md +386 -0
- package/data/skills/omero-integration/SKILL.md +245 -0
- package/data/skills/ontology-explorer/SKILL.md +168 -0
- package/data/skills/ontology-mapper/SKILL.md +171 -0
- package/data/skills/ontology-validator/SKILL.md +136 -0
- package/data/skills/open-notebook/SKILL.md +289 -0
- package/data/skills/open-targets-search/SKILL.md +211 -0
- package/data/skills/openalex-database/SKILL.md +488 -0
- package/data/skills/opentargets-database/SKILL.md +367 -0
- package/data/skills/opentrons-integration/SKILL.md +567 -0
- package/data/skills/opentrons-protocol-agent/SKILL.md +58 -0
- package/data/skills/organoid-drug-response-agent/SKILL.md +189 -0
- package/data/skills/pan-cancer-multiomics-agent/SKILL.md +159 -0
- package/data/skills/paper-2-web/SKILL.md +495 -0
- package/data/skills/parameter-optimization/SKILL.md +141 -0
- package/data/skills/patents-search/SKILL.md +211 -0
- package/data/skills/pathml/SKILL.md +160 -0
- package/data/skills/patiently-ai/SKILL.md +103 -0
- package/data/skills/pdb/SKILL.md +217 -0
- package/data/skills/pdb-database/SKILL.md +303 -0
- package/data/skills/pdf/SKILL.md +314 -0
- package/data/skills/pdf-anthropic/SKILL.md +294 -0
- package/data/skills/pdf-processing/SKILL.md +149 -0
- package/data/skills/pdf-processing-pro/SKILL.md +296 -0
- package/data/skills/pdx-model-analysis-agent/SKILL.md +169 -0
- package/data/skills/peer-review/SKILL.md +565 -0
- package/data/skills/performance-profiling/SKILL.md +255 -0
- package/data/skills/perplexity-search/SKILL.md +441 -0
- package/data/skills/pharmacogenomics-agent/SKILL.md +143 -0
- package/data/skills/pharmgx-reporter/SKILL.md +134 -0
- package/data/skills/phylogenetics/SKILL.md +404 -0
- package/data/skills/plotly/SKILL.md +265 -0
- package/data/skills/polars/SKILL.md +385 -0
- package/data/skills/popeve-variant-predictor-agent/SKILL.md +213 -0
- package/data/skills/post-processing/SKILL.md +338 -0
- package/data/skills/pptx/SKILL.md +232 -0
- package/data/skills/pptx-official/SKILL.md +484 -0
- package/data/skills/pptx-posters/SKILL.md +414 -0
- package/data/skills/precision-oncology-agent/SKILL.md +53 -0
- package/data/skills/prior-auth-coworker/SKILL.md +60 -0
- package/data/skills/prior-auth-review-skill/SKILL.md +360 -0
- package/data/skills/profile-report/SKILL.md +120 -0
- package/data/skills/protac-design-agent/SKILL.md +220 -0
- package/data/skills/protein-design-workflow/SKILL.md +199 -0
- package/data/skills/protein-qc/SKILL.md +300 -0
- package/data/skills/protein-structure-prediction/SKILL.md +59 -0
- package/data/skills/proteinmpnn/SKILL.md +279 -0
- package/data/skills/protocolsio-integration/SKILL.md +415 -0
- package/data/skills/prs-net-deep-learning-agent/SKILL.md +232 -0
- package/data/skills/psychologist-analyst/SKILL.md +1888 -0
- package/data/skills/pubchem-database/SKILL.md +568 -0
- package/data/skills/pubmed-database/SKILL.md +454 -0
- package/data/skills/pubmed-search/SKILL.md +103 -0
- package/data/skills/pydeseq2/SKILL.md +553 -0
- package/data/skills/pydicom/SKILL.md +428 -0
- package/data/skills/pyhealth/SKILL.md +485 -0
- package/data/skills/pylabrobot/SKILL.md +179 -0
- package/data/skills/pymc/SKILL.md +566 -0
- package/data/skills/pymoo/SKILL.md +565 -0
- package/data/skills/pyopenms/SKILL.md +211 -0
- package/data/skills/pysam/SKILL.md +259 -0
- package/data/skills/pytdc/SKILL.md +454 -0
- package/data/skills/pytorch-lightning/SKILL.md +172 -0
- package/data/skills/pyzotero/SKILL.md +111 -0
- package/data/skills/radgpt-radiology-reporter/SKILL.md +67 -0
- package/data/skills/radiomics-pathomics-fusion-agent/SKILL.md +221 -0
- package/data/skills/rdkit/SKILL.md +763 -0
- package/data/skills/reactome-database/SKILL.md +272 -0
- package/data/skills/receiving-code-review/SKILL.md +213 -0
- package/data/skills/recovery-community-moderator/SKILL.md +175 -0
- package/data/skills/regulatory-drafter/SKILL.md +56 -0
- package/data/skills/regulatory-drafting/SKILL.md +35 -0
- package/data/skills/rehabilitation-analyzer/SKILL.md +636 -0
- package/data/skills/repro-enforcer/SKILL.md +50 -0
- package/data/skills/requesting-code-review/SKILL.md +105 -0
- package/data/skills/research-grants/SKILL.md +935 -0
- package/data/skills/research-literature/SKILL.md +35 -0
- package/data/skills/research-lookup/SKILL.md +502 -0
- package/data/skills/rfdiffusion/SKILL.md +306 -0
- package/data/skills/rna-velocity-agent/SKILL.md +174 -0
- package/data/skills/scanpy/SKILL.md +380 -0
- package/data/skills/scfoundation-model-agent/SKILL.md +210 -0
- package/data/skills/scientific-brainstorming/SKILL.md +185 -0
- package/data/skills/scientific-critical-thinking/SKILL.md +566 -0
- package/data/skills/scientific-manuscript/SKILL.md +181 -0
- package/data/skills/scientific-problem-selection/SKILL.md +269 -0
- package/data/skills/scientific-schematics/SKILL.md +619 -0
- package/data/skills/scientific-slides/SKILL.md +1154 -0
- package/data/skills/scientific-visualization/SKILL.md +773 -0
- package/data/skills/scientific-writing/SKILL.md +483 -0
- package/data/skills/scikit-bio/SKILL.md +431 -0
- package/data/skills/scikit-learn/SKILL.md +515 -0
- package/data/skills/scikit-survival/SKILL.md +393 -0
- package/data/skills/scrna-orchestrator/SKILL.md +204 -0
- package/data/skills/scrna-qc/SKILL.md +43 -0
- package/data/skills/scvelo/SKILL.md +321 -0
- package/data/skills/scvi-tools/SKILL.md +184 -0
- package/data/skills/seaborn/SKILL.md +671 -0
- package/data/skills/search-strategy/SKILL.md +247 -0
- package/data/skills/seq-wrangler/SKILL.md +58 -0
- package/data/skills/shap/SKILL.md +560 -0
- package/data/skills/simo-multiomics-integration-agent/SKILL.md +178 -0
- package/data/skills/simpy/SKILL.md +423 -0
- package/data/skills/simulation-orchestrator/SKILL.md +230 -0
- package/data/skills/simulation-validator/SKILL.md +195 -0
- package/data/skills/single-annotation/SKILL.md +129 -0
- package/data/skills/single-cell-rna-qc/SKILL.md +175 -0
- package/data/skills/single-cellphone-db/SKILL.md +68 -0
- package/data/skills/single-clustering/SKILL.md +75 -0
- package/data/skills/single-downstream-analysis/SKILL.md +150 -0
- package/data/skills/single-multiomics/SKILL.md +44 -0
- package/data/skills/single-preprocessing/SKILL.md +184 -0
- package/data/skills/single-to-spatial-mapping/SKILL.md +48 -0
- package/data/skills/single-trajectory/SKILL.md +62 -0
- package/data/skills/sleep-analyzer/SKILL.md +773 -0
- package/data/skills/slurm-job-script-generator/SKILL.md +135 -0
- package/data/skills/solublempnn/SKILL.md +165 -0
- package/data/skills/spatial-agent/SKILL.md +56 -0
- package/data/skills/spatial-epigenomics-agent/SKILL.md +163 -0
- package/data/skills/spatial-transcriptomics-agent/SKILL.md +75 -0
- package/data/skills/spatial-transcriptomics-analysis/SKILL.md +72 -0
- package/data/skills/spatial-transcriptomics-analysis/STAgent/SKILL.md +75 -0
- package/data/skills/spatial-transcriptomics-analysis/SpatialAgent/SKILL.md +56 -0
- package/data/skills/spatial-transcriptomics-analysis/bioSkills/image-analysis/SKILL.md +266 -0
- package/data/skills/spatial-transcriptomics-analysis/bioSkills/spatial-communication/SKILL.md +287 -0
- package/data/skills/spatial-transcriptomics-analysis/bioSkills/spatial-data-io/SKILL.md +243 -0
- package/data/skills/spatial-transcriptomics-analysis/bioSkills/spatial-deconvolution/SKILL.md +298 -0
- package/data/skills/spatial-transcriptomics-analysis/bioSkills/spatial-domains/SKILL.md +229 -0
- package/data/skills/spatial-transcriptomics-analysis/bioSkills/spatial-multiomics/SKILL.md +172 -0
- package/data/skills/spatial-transcriptomics-analysis/bioSkills/spatial-neighbors/SKILL.md +189 -0
- package/data/skills/spatial-transcriptomics-analysis/bioSkills/spatial-preprocessing/SKILL.md +232 -0
- package/data/skills/spatial-transcriptomics-analysis/bioSkills/spatial-proteomics/SKILL.md +127 -0
- package/data/skills/spatial-transcriptomics-analysis/bioSkills/spatial-statistics/SKILL.md +225 -0
- package/data/skills/spatial-transcriptomics-analysis/bioSkills/spatial-visualization/SKILL.md +270 -0
- package/data/skills/spatial-tutorials/SKILL.md +87 -0
- package/data/skills/speech-pathology-ai/SKILL.md +184 -0
- package/data/skills/statistical-analysis/SKILL.md +626 -0
- package/data/skills/statsmodels/SKILL.md +608 -0
- package/data/skills/string-database/SKILL.md +528 -0
- package/data/skills/struct-predictor/SKILL.md +52 -0
- package/data/skills/subagent-driven-development/SKILL.md +242 -0
- package/data/skills/systematic-debugging/SKILL.md +296 -0
- package/data/skills/tcell-exhaustion-analysis-agent/SKILL.md +139 -0
- package/data/skills/tcga-preprocessing/SKILL.md +49 -0
- package/data/skills/tcm-constitution-analyzer/SKILL.md +664 -0
- package/data/skills/tcr-pmhc-prediction-agent/SKILL.md +226 -0
- package/data/skills/tcr-repertoire-analysis-agent/SKILL.md +218 -0
- package/data/skills/test-driven-development/SKILL.md +371 -0
- package/data/skills/tiledbvcf/SKILL.md +459 -0
- package/data/skills/time-resolved-cryoem-agent/SKILL.md +223 -0
- package/data/skills/time-stepping/SKILL.md +140 -0
- package/data/skills/timesfm-forecasting/SKILL.md +785 -0
- package/data/skills/tme-immune-profiling-agent/SKILL.md +220 -0
- package/data/skills/tooluniverse-adverse-event-detection/SKILL.md +1115 -0
- package/data/skills/tooluniverse-antibody-engineering/SKILL.md +1581 -0
- package/data/skills/tooluniverse-binder-discovery/SKILL.md +1459 -0
- package/data/skills/tooluniverse-cancer-variant-interpretation/SKILL.md +971 -0
- package/data/skills/tooluniverse-chemical-compound-retrieval/SKILL.md +322 -0
- package/data/skills/tooluniverse-chemical-safety/SKILL.md +733 -0
- package/data/skills/tooluniverse-clinical-guidelines/SKILL.md +399 -0
- package/data/skills/tooluniverse-clinical-trial-design/SKILL.md +1195 -0
- package/data/skills/tooluniverse-clinical-trial-matching/SKILL.md +1333 -0
- package/data/skills/tooluniverse-crispr-screen-analysis/SKILL.md +900 -0
- package/data/skills/tooluniverse-disease-research/SKILL.md +630 -0
- package/data/skills/tooluniverse-drug-drug-interaction/SKILL.md +73 -0
- package/data/skills/tooluniverse-drug-repurposing/SKILL.md +595 -0
- package/data/skills/tooluniverse-drug-research/SKILL.md +1642 -0
- package/data/skills/tooluniverse-drug-target-validation/SKILL.md +1206 -0
- package/data/skills/tooluniverse-epigenomics/SKILL.md +1489 -0
- package/data/skills/tooluniverse-expression-data-retrieval/SKILL.md +389 -0
- package/data/skills/tooluniverse-gene-enrichment/SKILL.md +402 -0
- package/data/skills/tooluniverse-gwas-drug-discovery/SKILL.md +576 -0
- package/data/skills/tooluniverse-gwas-finemapping/SKILL.md +309 -0
- package/data/skills/tooluniverse-gwas-snp-interpretation/SKILL.md +223 -0
- package/data/skills/tooluniverse-gwas-study-explorer/SKILL.md +342 -0
- package/data/skills/tooluniverse-gwas-trait-to-gene/SKILL.md +236 -0
- package/data/skills/tooluniverse-image-analysis/SKILL.md +439 -0
- package/data/skills/tooluniverse-immune-repertoire-analysis/SKILL.md +949 -0
- package/data/skills/tooluniverse-immunotherapy-response-prediction/SKILL.md +865 -0
- package/data/skills/tooluniverse-infectious-disease/SKILL.md +749 -0
- package/data/skills/tooluniverse-literature-deep-research/SKILL.md +1050 -0
- package/data/skills/tooluniverse-metabolomics/SKILL.md +298 -0
- package/data/skills/tooluniverse-metabolomics-analysis/SKILL.md +764 -0
- package/data/skills/tooluniverse-multi-omics-integration/SKILL.md +703 -0
- package/data/skills/tooluniverse-multiomic-disease-characterization/SKILL.md +1138 -0
- package/data/skills/tooluniverse-network-pharmacology/SKILL.md +1312 -0
- package/data/skills/tooluniverse-pharmacovigilance/SKILL.md +807 -0
- package/data/skills/tooluniverse-phylogenetics/SKILL.md +461 -0
- package/data/skills/tooluniverse-polygenic-risk-score/SKILL.md +397 -0
- package/data/skills/tooluniverse-precision-medicine-stratification/SKILL.md +1143 -0
- package/data/skills/tooluniverse-precision-oncology/SKILL.md +1091 -0
- package/data/skills/tooluniverse-protein-interactions/SKILL.md +446 -0
- package/data/skills/tooluniverse-protein-structure-retrieval/SKILL.md +416 -0
- package/data/skills/tooluniverse-protein-therapeutic-design/SKILL.md +637 -0
- package/data/skills/tooluniverse-proteomics-analysis/SKILL.md +843 -0
- package/data/skills/tooluniverse-rare-disease-diagnosis/SKILL.md +1257 -0
- package/data/skills/tooluniverse-rnaseq-deseq2/SKILL.md +536 -0
- package/data/skills/tooluniverse-sequence-retrieval/SKILL.md +419 -0
- package/data/skills/tooluniverse-single-cell/SKILL.md +719 -0
- package/data/skills/tooluniverse-spatial-omics-analysis/SKILL.md +1102 -0
- package/data/skills/tooluniverse-spatial-transcriptomics/SKILL.md +788 -0
- package/data/skills/tooluniverse-statistical-modeling/SKILL.md +557 -0
- package/data/skills/tooluniverse-structural-variant-analysis/SKILL.md +1356 -0
- package/data/skills/tooluniverse-systems-biology/SKILL.md +374 -0
- package/data/skills/tooluniverse-target-research/SKILL.md +1510 -0
- package/data/skills/tooluniverse-variant-analysis/SKILL.md +448 -0
- package/data/skills/tooluniverse-variant-interpretation/SKILL.md +1118 -0
- package/data/skills/torch-geometric/SKILL.md +674 -0
- package/data/skills/torch_geometric/SKILL.md +670 -0
- package/data/skills/torchdrug/SKILL.md +444 -0
- package/data/skills/tpd-ternary-complex-agent/SKILL.md +226 -0
- package/data/skills/transformers/SKILL.md +157 -0
- package/data/skills/travel-health-analyzer/SKILL.md +421 -0
- package/data/skills/treatment-plans/SKILL.md +1576 -0
- package/data/skills/trial-eligibility-agent/SKILL.md +54 -0
- package/data/skills/trialgpt-matching/SKILL.md +66 -0
- package/data/skills/tumor-clonal-evolution-agent/SKILL.md +134 -0
- package/data/skills/tumor-heterogeneity-agent/SKILL.md +216 -0
- package/data/skills/tumor-mutational-burden-agent/SKILL.md +188 -0
- package/data/skills/ukb-navigator/SKILL.md +113 -0
- package/data/skills/umap-learn/SKILL.md +473 -0
- package/data/skills/uniprot-database/SKILL.md +189 -0
- package/data/skills/universal-single-cell-annotator/SKILL.md +72 -0
- package/data/skills/using-git-worktrees/SKILL.md +218 -0
- package/data/skills/using-superpowers/SKILL.md +95 -0
- package/data/skills/usmle/SKILL.md +62 -0
- package/data/skills/uspto-database/SKILL.md +597 -0
- package/data/skills/vaex/SKILL.md +180 -0
- package/data/skills/varcadd-pathogenicity/SKILL.md +68 -0
- package/data/skills/variant-interpretation-acmg/SKILL.md +58 -0
- package/data/skills/variant-interpretation-acmg/bioSkills/clinical-interpretation/SKILL.md +334 -0
- package/data/skills/variant-interpretation-acmg/bioSkills/consensus-sequences/SKILL.md +343 -0
- package/data/skills/variant-interpretation-acmg/bioSkills/deepvariant/SKILL.md +279 -0
- package/data/skills/variant-interpretation-acmg/bioSkills/filtering-best-practices/SKILL.md +362 -0
- package/data/skills/variant-interpretation-acmg/bioSkills/gatk-variant-calling/SKILL.md +398 -0
- package/data/skills/variant-interpretation-acmg/bioSkills/joint-calling/SKILL.md +343 -0
- package/data/skills/variant-interpretation-acmg/bioSkills/structural-variant-calling/SKILL.md +256 -0
- package/data/skills/variant-interpretation-acmg/bioSkills/variant-annotation/SKILL.md +387 -0
- package/data/skills/variant-interpretation-acmg/bioSkills/variant-calling/SKILL.md +258 -0
- package/data/skills/variant-interpretation-acmg/bioSkills/variant-normalization/SKILL.md +304 -0
- package/data/skills/variant-interpretation-acmg/bioSkills/vcf-basics/SKILL.md +329 -0
- package/data/skills/variant-interpretation-acmg/bioSkills/vcf-manipulation/SKILL.md +398 -0
- package/data/skills/variant-interpretation-acmg/bioSkills/vcf-statistics/SKILL.md +424 -0
- package/data/skills/variant-interpretation-acmg/varCADD/SKILL.md +68 -0
- package/data/skills/vcf-annotator/SKILL.md +55 -0
- package/data/skills/verification-before-completion/SKILL.md +139 -0
- package/data/skills/virtual-lab-agent/SKILL.md +240 -0
- package/data/skills/wearable-analysis-agent/SKILL.md +70 -0
- package/data/skills/weightloss-analyzer/SKILL.md +320 -0
- package/data/skills/wellally-tech/SKILL.md +685 -0
- package/data/skills/wikipedia-search/SKILL.md +481 -0
- package/data/skills/writing-plans/SKILL.md +116 -0
- package/data/skills/writing-skills/SKILL.md +655 -0
- package/data/skills/xlsx/SKILL.md +292 -0
- package/data/skills/xlsx-official/SKILL.md +289 -0
- package/data/skills/zarr-python/SKILL.md +777 -0
- package/data/skills/zinc-database/SKILL.md +398 -0
- package/data/tools/__init__.py +8 -0
- package/data/tools/hpc.py +71 -0
- package/data/tools/hpc_client/__init__.py +8 -0
- package/data/tools/hpc_client/builders/__init__.py +12 -0
- package/data/tools/hpc_client/builders/alphafold.py +36 -0
- package/data/tools/hpc_client/builders/boltz.py +33 -0
- package/data/tools/hpc_client/builders/chai.py +30 -0
- package/data/tools/hpc_client/builders/immunebuilder.py +31 -0
- package/data/tools/hpc_client/builders/rfantibody.py +58 -0
- package/data/tools/hpc_client/builders/thermompnn.py +16 -0
- package/data/tools/hpc_client/hpc_api.py +41 -0
- package/data/tools/hpc_client/hpc_tools.py +218 -0
- package/data/tools/hpc_dynamic.py +71 -0
- package/data/tools/integrations/__init__.py +14 -0
- package/data/tools/integrations/adaptyv.py +107 -0
- package/data/tools/integrations/addgene.py +52 -0
- package/data/tools/integrations/api_internal.py +33 -0
- package/data/tools/molecular_biology.py +688 -0
- package/data/tools/pharmacology.py +67 -0
- package/data/workflows/bulk-omics-clustering/SKILL.md +501 -0
- package/data/workflows/bulk-omics-clustering/references/best_practices.md +395 -0
- package/data/workflows/bulk-omics-clustering/references/clustering_methods_comparison.md +288 -0
- package/data/workflows/bulk-omics-clustering/references/common-patterns.md +1136 -0
- package/data/workflows/bulk-omics-clustering/references/decision-guide.md +819 -0
- package/data/workflows/bulk-omics-clustering/references/distance_metrics_guide.md +388 -0
- package/data/workflows/bulk-omics-clustering/references/parameter_guide.md +396 -0
- package/data/workflows/bulk-omics-clustering/references/r-quick-start.md +105 -0
- package/data/workflows/bulk-omics-clustering/references/validation_metrics_guide.md +315 -0
- package/data/workflows/bulk-omics-clustering/scripts/characterize_clusters.py +255 -0
- package/data/workflows/bulk-omics-clustering/scripts/cluster_validation.py +449 -0
- package/data/workflows/bulk-omics-clustering/scripts/density_clustering.py +321 -0
- package/data/workflows/bulk-omics-clustering/scripts/dimensionality_reduction.py +328 -0
- package/data/workflows/bulk-omics-clustering/scripts/distance_metrics.py +251 -0
- package/data/workflows/bulk-omics-clustering/scripts/export_results.py +456 -0
- package/data/workflows/bulk-omics-clustering/scripts/hierarchical_clustering.R +229 -0
- package/data/workflows/bulk-omics-clustering/scripts/hierarchical_clustering.py +269 -0
- package/data/workflows/bulk-omics-clustering/scripts/kmeans_clustering.py +346 -0
- package/data/workflows/bulk-omics-clustering/scripts/load_example_data.R +171 -0
- package/data/workflows/bulk-omics-clustering/scripts/load_example_data.py +171 -0
- package/data/workflows/bulk-omics-clustering/scripts/model_based_clustering.py +370 -0
- package/data/workflows/bulk-omics-clustering/scripts/optimal_clusters.py +381 -0
- package/data/workflows/bulk-omics-clustering/scripts/plot_cluster_heatmap.R +141 -0
- package/data/workflows/bulk-omics-clustering/scripts/plot_clustering_results.py +452 -0
- package/data/workflows/bulk-omics-clustering/scripts/prepare_data.py +250 -0
- package/data/workflows/bulk-omics-clustering/scripts/stability_analysis.py +434 -0
- package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/SKILL.md +505 -0
- package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/references/comprehensive-reference.md +440 -0
- package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/references/decision-guide.md +327 -0
- package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/references/troubleshooting.md +456 -0
- package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/references/usage-guide.md +75 -0
- package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/scripts/basic_workflow.R +149 -0
- package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/scripts/batch_correction.R +44 -0
- package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/scripts/export_results.R +190 -0
- package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/scripts/extract_results.R +242 -0
- package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/scripts/load_example_data.R +250 -0
- package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/scripts/multi_condition.R +50 -0
- package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/scripts/qc_plots.R +410 -0
- package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/scripts/transformations.R +218 -0
- package/data/workflows/chip-atlas-diff-analysis/SKILL.md +222 -0
- package/data/workflows/chip-atlas-diff-analysis/references/chipatlas_diff_api_format.md +106 -0
- package/data/workflows/chip-atlas-diff-analysis/references/diff_analysis_methods.md +89 -0
- package/data/workflows/chip-atlas-diff-analysis/references/output_format.md +78 -0
- package/data/workflows/chip-atlas-diff-analysis/scripts/__init__.py +1 -0
- package/data/workflows/chip-atlas-diff-analysis/scripts/annotate_genes.py +144 -0
- package/data/workflows/chip-atlas-diff-analysis/scripts/export_all.py +498 -0
- package/data/workflows/chip-atlas-diff-analysis/scripts/filter_regions.py +176 -0
- package/data/workflows/chip-atlas-diff-analysis/scripts/generate_all_plots.py +321 -0
- package/data/workflows/chip-atlas-diff-analysis/scripts/load_example_data.py +149 -0
- package/data/workflows/chip-atlas-diff-analysis/scripts/load_user_data.py +211 -0
- package/data/workflows/chip-atlas-diff-analysis/scripts/parse_bed_results.py +240 -0
- package/data/workflows/chip-atlas-diff-analysis/scripts/qc_checks.py +621 -0
- package/data/workflows/chip-atlas-diff-analysis/scripts/query_chipatlas_api.py +329 -0
- package/data/workflows/chip-atlas-diff-analysis/scripts/run_diff_workflow.py +256 -0
- package/data/workflows/chip-atlas-peak-enrichment/SKILL.md +212 -0
- package/data/workflows/chip-atlas-peak-enrichment/references/chipatlas_metadata_format.md +115 -0
- package/data/workflows/chip-atlas-peak-enrichment/references/enrichment_statistics.md +145 -0
- package/data/workflows/chip-atlas-peak-enrichment/references/peak_thresholds.md +63 -0
- package/data/workflows/chip-atlas-peak-enrichment/references/promoter_definitions.md +69 -0
- package/data/workflows/chip-atlas-peak-enrichment/scripts/__init__.py +1 -0
- package/data/workflows/chip-atlas-peak-enrichment/scripts/convert_genes_to_regions.py +271 -0
- package/data/workflows/chip-atlas-peak-enrichment/scripts/export_all.py +456 -0
- package/data/workflows/chip-atlas-peak-enrichment/scripts/filter_experiments.py +116 -0
- package/data/workflows/chip-atlas-peak-enrichment/scripts/generate_all_plots.py +280 -0
- package/data/workflows/chip-atlas-peak-enrichment/scripts/load_example_data.py +96 -0
- package/data/workflows/chip-atlas-peak-enrichment/scripts/load_user_data.py +183 -0
- package/data/workflows/chip-atlas-peak-enrichment/scripts/query_chipatlas_api.py +349 -0
- package/data/workflows/chip-atlas-peak-enrichment/scripts/run_enrichment_workflow.py +271 -0
- package/data/workflows/chip-atlas-target-genes/SKILL.md +230 -0
- package/data/workflows/chip-atlas-target-genes/references/macs2_binding_scores.md +89 -0
- package/data/workflows/chip-atlas-target-genes/references/string_scores.md +58 -0
- package/data/workflows/chip-atlas-target-genes/references/target_genes_data_format.md +73 -0
- package/data/workflows/chip-atlas-target-genes/scripts/__init__.py +0 -0
- package/data/workflows/chip-atlas-target-genes/scripts/download_target_genes.py +200 -0
- package/data/workflows/chip-atlas-target-genes/scripts/export_all.py +340 -0
- package/data/workflows/chip-atlas-target-genes/scripts/filter_targets.py +205 -0
- package/data/workflows/chip-atlas-target-genes/scripts/generate_all_plots.py +330 -0
- package/data/workflows/chip-atlas-target-genes/scripts/load_example_query.py +61 -0
- package/data/workflows/chip-atlas-target-genes/scripts/load_user_query.py +47 -0
- package/data/workflows/chip-atlas-target-genes/scripts/run_target_genes_workflow.py +141 -0
- package/data/workflows/clinicaltrials-landscape/SKILL.md +257 -0
- package/data/workflows/clinicaltrials-landscape/references/api-parameters.md +181 -0
- package/data/workflows/clinicaltrials-landscape/references/mechanisms.md +141 -0
- package/data/workflows/clinicaltrials-landscape/references/output-schema.md +184 -0
- package/data/workflows/clinicaltrials-landscape/scripts/__init__.py +1 -0
- package/data/workflows/clinicaltrials-landscape/scripts/classify_mechanisms.py +359 -0
- package/data/workflows/clinicaltrials-landscape/scripts/compile_trials.py +579 -0
- package/data/workflows/clinicaltrials-landscape/scripts/disease_config.py +161 -0
- package/data/workflows/clinicaltrials-landscape/scripts/export_all.py +242 -0
- package/data/workflows/clinicaltrials-landscape/scripts/generate_landscape_plots.py +761 -0
- package/data/workflows/clinicaltrials-landscape/scripts/generate_pdf_report.py +1465 -0
- package/data/workflows/clinicaltrials-landscape/scripts/generate_report.py +1813 -0
- package/data/workflows/clinicaltrials-landscape/scripts/query_clinicaltrials.py +307 -0
- package/data/workflows/coexpression-network/SKILL.md +344 -0
- package/data/workflows/coexpression-network/references/parameter-tuning-guide.md +591 -0
- package/data/workflows/coexpression-network/references/troubleshooting.md +483 -0
- package/data/workflows/coexpression-network/references/wgcna-best-practices.md +563 -0
- package/data/workflows/coexpression-network/references/wgcna-reference.md +538 -0
- package/data/workflows/coexpression-network/scripts/build_network.R +43 -0
- package/data/workflows/coexpression-network/scripts/correlate_modules_traits.R +92 -0
- package/data/workflows/coexpression-network/scripts/export_wgcna_results.R +117 -0
- package/data/workflows/coexpression-network/scripts/identify_hub_genes.R +63 -0
- package/data/workflows/coexpression-network/scripts/load_example_data.R +214 -0
- package/data/workflows/coexpression-network/scripts/module_enrichment.R +159 -0
- package/data/workflows/coexpression-network/scripts/pick_soft_power.R +70 -0
- package/data/workflows/coexpression-network/scripts/plot_all_wgcna.R +104 -0
- package/data/workflows/coexpression-network/scripts/plot_eigengene_heatmap.R +65 -0
- package/data/workflows/coexpression-network/scripts/plot_hub_genes.R +70 -0
- package/data/workflows/coexpression-network/scripts/plot_module_dendrogram.R +50 -0
- package/data/workflows/coexpression-network/scripts/plotting_helpers.R +87 -0
- package/data/workflows/coexpression-network/scripts/prepare_wgcna_data.R +73 -0
- package/data/workflows/coexpression-network/scripts/wgcna_workflow.R +93 -0
- package/data/workflows/experimental-design-statistics/SKILL.md +408 -0
- package/data/workflows/experimental-design-statistics/references/batch_effect_mitigation.md +756 -0
- package/data/workflows/experimental-design-statistics/references/cv_tissue_database.csv +30 -0
- package/data/workflows/experimental-design-statistics/references/experimental_design_best_practices.md +515 -0
- package/data/workflows/experimental-design-statistics/references/multiple_testing_guide.md +730 -0
- package/data/workflows/experimental-design-statistics/references/power_analysis_guidelines.md +635 -0
- package/data/workflows/experimental-design-statistics/references/qc_guidelines.md +310 -0
- package/data/workflows/experimental-design-statistics/references/software_requirements.md +328 -0
- package/data/workflows/experimental-design-statistics/references/troubleshooting_guide.md +510 -0
- package/data/workflows/experimental-design-statistics/scripts/batch_assignment.R +302 -0
- package/data/workflows/experimental-design-statistics/scripts/batch_validation.R +342 -0
- package/data/workflows/experimental-design-statistics/scripts/export_design.R +352 -0
- package/data/workflows/experimental-design-statistics/scripts/load_example_data.R +204 -0
- package/data/workflows/experimental-design-statistics/scripts/multiple_testing.R +417 -0
- package/data/workflows/experimental-design-statistics/scripts/plot_power_curves.R +317 -0
- package/data/workflows/experimental-design-statistics/scripts/power_atacseq.R +229 -0
- package/data/workflows/experimental-design-statistics/scripts/power_pilot_based.R +289 -0
- package/data/workflows/experimental-design-statistics/scripts/power_rnaseq.R +247 -0
- package/data/workflows/experimental-design-statistics/scripts/sample_size_de.R +327 -0
- package/data/workflows/experimental-design-statistics/scripts/sample_size_scrna.R +304 -0
- package/data/workflows/functional-enrichment-from-degs/SKILL.md +387 -0
- package/data/workflows/functional-enrichment-from-degs/references/database_guide.md +354 -0
- package/data/workflows/functional-enrichment-from-degs/references/decision-guide.md +546 -0
- package/data/workflows/functional-enrichment-from-degs/references/gsea_ora_comparison.md +213 -0
- package/data/workflows/functional-enrichment-from-degs/references/gsea_ora_validation_framework.md +483 -0
- package/data/workflows/functional-enrichment-from-degs/references/interpretation_guidelines.md +374 -0
- package/data/workflows/functional-enrichment-from-degs/references/method-reference.md +742 -0
- package/data/workflows/functional-enrichment-from-degs/scripts/export_results.R +190 -0
- package/data/workflows/functional-enrichment-from-degs/scripts/generate_plots.R +240 -0
- package/data/workflows/functional-enrichment-from-degs/scripts/get_msigdb_genesets.R +75 -0
- package/data/workflows/functional-enrichment-from-degs/scripts/load_de_results.R +60 -0
- package/data/workflows/functional-enrichment-from-degs/scripts/load_example_data.R +212 -0
- package/data/workflows/functional-enrichment-from-degs/scripts/prepare_gene_lists.R +92 -0
- package/data/workflows/functional-enrichment-from-degs/scripts/run_gsea.R +44 -0
- package/data/workflows/functional-enrichment-from-degs/scripts/run_ora.R +53 -0
- package/data/workflows/genetic-variant-annotation/SKILL.md +440 -0
- package/data/workflows/genetic-variant-annotation/references/auto_installation_implementation.md +274 -0
- package/data/workflows/genetic-variant-annotation/references/consequence_terms.md +392 -0
- package/data/workflows/genetic-variant-annotation/references/filtering_strategies.md +808 -0
- package/data/workflows/genetic-variant-annotation/references/installation_guide.md +557 -0
- package/data/workflows/genetic-variant-annotation/references/pathogenicity_interpretation.md +473 -0
- package/data/workflows/genetic-variant-annotation/references/qc_guidelines.md +524 -0
- package/data/workflows/genetic-variant-annotation/references/snpeff_best_practices.md +481 -0
- package/data/workflows/genetic-variant-annotation/references/tool_selection_guide.md +433 -0
- package/data/workflows/genetic-variant-annotation/references/troubleshooting_guide.md +678 -0
- package/data/workflows/genetic-variant-annotation/references/vep_best_practices.md +450 -0
- package/data/workflows/genetic-variant-annotation/scripts/annotate_genes.py +243 -0
- package/data/workflows/genetic-variant-annotation/scripts/export_results.py +450 -0
- package/data/workflows/genetic-variant-annotation/scripts/filter_variants.py +365 -0
- package/data/workflows/genetic-variant-annotation/scripts/install_tools.py +246 -0
- package/data/workflows/genetic-variant-annotation/scripts/load_example_data.py +166 -0
- package/data/workflows/genetic-variant-annotation/scripts/parse_snpeff_output.py +283 -0
- package/data/workflows/genetic-variant-annotation/scripts/parse_vep_output.py +257 -0
- package/data/workflows/genetic-variant-annotation/scripts/plot_variant_distribution.py +372 -0
- package/data/workflows/genetic-variant-annotation/scripts/prioritize_variants.py +287 -0
- package/data/workflows/genetic-variant-annotation/scripts/run_snpeff.py +418 -0
- package/data/workflows/genetic-variant-annotation/scripts/run_vep.py +358 -0
- package/data/workflows/genetic-variant-annotation/scripts/select_tool.py +203 -0
- package/data/workflows/genetic-variant-annotation/scripts/test_complete_workflow.py +312 -0
- package/data/workflows/genetic-variant-annotation/scripts/test_pickle_load.py +118 -0
- package/data/workflows/genetic-variant-annotation/scripts/validate_vcf.py +351 -0
- package/data/workflows/genetic-variant-annotation/scripts/verify_changes.py +212 -0
- package/data/workflows/grn-pyscenic/SKILL.md +331 -0
- package/data/workflows/grn-pyscenic/references/cli_interface.md +222 -0
- package/data/workflows/grn-pyscenic/references/database_downloads.md +245 -0
- package/data/workflows/grn-pyscenic/scripts/export_all.py +192 -0
- package/data/workflows/grn-pyscenic/scripts/generate_report.py +512 -0
- package/data/workflows/grn-pyscenic/scripts/integrate_with_adata.py +54 -0
- package/data/workflows/grn-pyscenic/scripts/load_example_data.py +200 -0
- package/data/workflows/grn-pyscenic/scripts/load_expression_data.py +61 -0
- package/data/workflows/grn-pyscenic/scripts/plot_regulon_visualizations.py +263 -0
- package/data/workflows/grn-pyscenic/scripts/run_grn_workflow.py +184 -0
- package/data/workflows/gwas-to-function-twas/SKILL.md +394 -0
- package/data/workflows/gwas-to-function-twas/references/fusion_best_practices.md +120 -0
- package/data/workflows/gwas-to-function-twas/references/installation-guide.md +414 -0
- package/data/workflows/gwas-to-function-twas/references/ldsc_qc_guidelines.md +287 -0
- package/data/workflows/gwas-to-function-twas/references/spredixxcan_best_practices.md +166 -0
- package/data/workflows/gwas-to-function-twas/references/therapeutic_interpretation_guide.md +717 -0
- package/data/workflows/gwas-to-function-twas/references/tissue_reference_guide.md +182 -0
- package/data/workflows/gwas-to-function-twas/references/troubleshooting_guide.md +317 -0
- package/data/workflows/gwas-to-function-twas/references/twas_hub_validation_guide.md +88 -0
- package/data/workflows/gwas-to-function-twas/scripts/colocalization_analysis.py +187 -0
- package/data/workflows/gwas-to-function-twas/scripts/druggability_scoring.py +199 -0
- package/data/workflows/gwas-to-function-twas/scripts/export_results.py +220 -0
- package/data/workflows/gwas-to-function-twas/scripts/integrate_variant_annotation.py +194 -0
- package/data/workflows/gwas-to-function-twas/scripts/interpret_therapeutic_direction.py +418 -0
- package/data/workflows/gwas-to-function-twas/scripts/mendelian_randomization.py +749 -0
- package/data/workflows/gwas-to-function-twas/scripts/multilayer_direction_analysis.py +471 -0
- package/data/workflows/gwas-to-function-twas/scripts/plot_twas_results.py +252 -0
- package/data/workflows/gwas-to-function-twas/scripts/run_fusion.py +155 -0
- package/data/workflows/gwas-to-function-twas/scripts/run_smultixcan.py +102 -0
- package/data/workflows/gwas-to-function-twas/scripts/run_spredixxcan.py +138 -0
- package/data/workflows/gwas-to-function-twas/scripts/select_reference_panel.py +253 -0
- package/data/workflows/gwas-to-function-twas/scripts/validate_gwas_sumstats.py +214 -0
- package/data/workflows/gwas-to-function-twas/scripts/validate_with_twas_hub.py +439 -0
- package/data/workflows/lasso-biomarker-panel/SKILL.md +322 -0
- package/data/workflows/lasso-biomarker-panel/references/decision-guide.md +64 -0
- package/data/workflows/lasso-biomarker-panel/references/lasso-reference.md +110 -0
- package/data/workflows/lasso-biomarker-panel/references/validation-guide.md +105 -0
- package/data/workflows/lasso-biomarker-panel/scripts/biological_interpretation.R +1560 -0
- package/data/workflows/lasso-biomarker-panel/scripts/biomarker_plots.R +350 -0
- package/data/workflows/lasso-biomarker-panel/scripts/export_results.R +1492 -0
- package/data/workflows/lasso-biomarker-panel/scripts/lasso_workflow.R +328 -0
- package/data/workflows/lasso-biomarker-panel/scripts/load_example_data.R +1903 -0
- package/data/workflows/lasso-biomarker-panel/scripts/plotting_helpers.R +78 -0
- package/data/workflows/lasso-biomarker-panel/scripts/prepare_features.R +225 -0
- package/data/workflows/lasso-biomarker-panel/scripts/query_cellxgene.py +107 -0
- package/data/workflows/lasso-biomarker-panel/scripts/validate_external.R +174 -0
- package/data/workflows/literature-preclinical/SKILL.md +276 -0
- package/data/workflows/literature-preclinical/assets/eval/simple_test.py +386 -0
- package/data/workflows/literature-preclinical/references/experiment-extraction-guide.md +147 -0
- package/data/workflows/literature-preclinical/references/full-text-enrichment-guide.md +121 -0
- package/data/workflows/literature-preclinical/references/preclinical-search-guide.md +117 -0
- package/data/workflows/literature-preclinical/scripts/extract_experiments.py +401 -0
- package/data/workflows/literature-preclinical/scripts/generate_plots.R +303 -0
- package/data/workflows/literature-preclinical/scripts/narrative_synthesis.py +653 -0
- package/data/workflows/literature-preclinical/scripts/preclinical_search.py +332 -0
- package/data/workflows/literature-preclinical/scripts/preclinical_synthesis.py +237 -0
- package/data/workflows/literature-preclinical/scripts/report_generation.py +326 -0
- package/data/workflows/mendelian-randomization-twosamplemr/SKILL.md +210 -0
- package/data/workflows/mendelian-randomization-twosamplemr/references/interpretation-guide.md +239 -0
- package/data/workflows/mendelian-randomization-twosamplemr/references/method-reference.md +190 -0
- package/data/workflows/mendelian-randomization-twosamplemr/scripts/export_results.R +123 -0
- package/data/workflows/mendelian-randomization-twosamplemr/scripts/generate_report.R +411 -0
- package/data/workflows/mendelian-randomization-twosamplemr/scripts/load_data.R +281 -0
- package/data/workflows/mendelian-randomization-twosamplemr/scripts/mr_plots.R +163 -0
- package/data/workflows/mendelian-randomization-twosamplemr/scripts/run_mr_analysis.R +322 -0
- package/data/workflows/pcr-primer-design/SKILL.md +397 -0
- package/data/workflows/pcr-primer-design/references/code_examples.md +594 -0
- package/data/workflows/pcr-primer-design/references/miqe_guidelines.md +453 -0
- package/data/workflows/pcr-primer-design/references/parameter_ranges.md +356 -0
- package/data/workflows/pcr-primer-design/references/primer_design_best_practices.md +451 -0
- package/data/workflows/pcr-primer-design/references/troubleshooting_guide.md +477 -0
- package/data/workflows/pcr-primer-design/scripts/__init__.py +2 -0
- package/data/workflows/pcr-primer-design/scripts/calculate_tm.py +306 -0
- package/data/workflows/pcr-primer-design/scripts/check_dimers.py +298 -0
- package/data/workflows/pcr-primer-design/scripts/check_secondary_structures.py +343 -0
- package/data/workflows/pcr-primer-design/scripts/design_qpcr_primers.py +233 -0
- package/data/workflows/pcr-primer-design/scripts/design_standard_primers.py +197 -0
- package/data/workflows/pcr-primer-design/scripts/design_taqman_probes.py +226 -0
- package/data/workflows/pcr-primer-design/scripts/export_results.py +382 -0
- package/data/workflows/pcr-primer-design/scripts/generate_reports.py +379 -0
- package/data/workflows/pcr-primer-design/scripts/validate_specificity.py +311 -0
- package/data/workflows/pcr-primer-design/scripts/visualize_primers.py +379 -0
- package/data/workflows/polygenic-risk-score-prs-catalog/SKILL.md +195 -0
- package/data/workflows/polygenic-risk-score-prs-catalog/references/interpretation-guide.md +80 -0
- package/data/workflows/polygenic-risk-score-prs-catalog/references/pgs-catalog-guide.md +109 -0
- package/data/workflows/polygenic-risk-score-prs-catalog/scripts/export_results.R +186 -0
- package/data/workflows/polygenic-risk-score-prs-catalog/scripts/generate_plots.R +283 -0
- package/data/workflows/polygenic-risk-score-prs-catalog/scripts/load_pgs_weights.R +228 -0
- package/data/workflows/polygenic-risk-score-prs-catalog/scripts/load_reference_data.R +191 -0
- package/data/workflows/polygenic-risk-score-prs-catalog/scripts/score_traits.R +216 -0
- package/data/workflows/pooled-crispr-screens/SKILL.md +362 -0
- package/data/workflows/pooled-crispr-screens/references/crispr_screen_best_practices.md +349 -0
- package/data/workflows/pooled-crispr-screens/references/qc_guidelines.md +722 -0
- package/data/workflows/pooled-crispr-screens/references/statistical_methods.md +644 -0
- package/data/workflows/pooled-crispr-screens/references/troubleshooting_guide.md +684 -0
- package/data/workflows/pooled-crispr-screens/references/umi_optimization.md +297 -0
- package/data/workflows/pooled-crispr-screens/scripts/concatenate_libraries.py +132 -0
- package/data/workflows/pooled-crispr-screens/scripts/detect_perturbed_cells.py +255 -0
- package/data/workflows/pooled-crispr-screens/scripts/differential_expression.py +202 -0
- package/data/workflows/pooled-crispr-screens/scripts/differential_expression_glmgampoi.py +320 -0
- package/data/workflows/pooled-crispr-screens/scripts/export_results.py +261 -0
- package/data/workflows/pooled-crispr-screens/scripts/expression_filtering.py +159 -0
- package/data/workflows/pooled-crispr-screens/scripts/gene_name_corrections.py +188 -0
- package/data/workflows/pooled-crispr-screens/scripts/generate_report.py +485 -0
- package/data/workflows/pooled-crispr-screens/scripts/load_10x_libraries.py +69 -0
- package/data/workflows/pooled-crispr-screens/scripts/load_example_data.py +257 -0
- package/data/workflows/pooled-crispr-screens/scripts/map_sgrna_to_cells.py +119 -0
- package/data/workflows/pooled-crispr-screens/scripts/normalize_and_scale.py +140 -0
- package/data/workflows/pooled-crispr-screens/scripts/qc_filtering.py +185 -0
- package/data/workflows/pooled-crispr-screens/scripts/run_glmgampoi.R +181 -0
- package/data/workflows/pooled-crispr-screens/scripts/screen_all_perturbations.py +306 -0
- package/data/workflows/pooled-crispr-screens/scripts/validate_perturbations.py +314 -0
- package/data/workflows/pooled-crispr-screens/scripts/visualize_perturbations.py +314 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/SKILL.md +425 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/references/ambient_rna_correction.md +422 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/references/common-patterns.md +533 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/references/integration_methods.md +820 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/references/marker_gene_database.md +471 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/references/pseudobulk_de_guide.md +408 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/references/qc_guidelines.md +535 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/references/scanpy_best_practices.md +496 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/references/troubleshooting_guide.md +668 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/references/workflow-details.md +727 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/annotate_celltypes.py +431 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/cluster_cells.py +293 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/export_results.py +423 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/filter_cells.py +531 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/find_markers.py +391 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/find_variable_genes.py +222 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/integrate_scvi.py +665 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/integration_diagnostics.py +678 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/load_example_data.py +68 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/normalize_data.py +325 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/plot_dimreduction.py +389 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/plot_qc.py +320 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/pseudobulk_de.py +553 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/qc_metrics.py +477 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/remove_ambient_rna.py +347 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/run_umap.py +188 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/scale_and_pca.py +365 -0
- package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/setup_and_import.py +334 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/SKILL.md +585 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/references/ambient_rna_correction.md +422 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/references/common-patterns.md +667 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/references/decision-guide.md +456 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/references/integration_methods.md +864 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/references/marker_gene_database.md +471 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/references/pseudobulk_de_guide.md +408 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/references/qc_guidelines.md +452 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/references/seurat_best_practices.md +417 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/references/troubleshooting_guide.md +566 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/references/workflow-details.md +801 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/scripts/annotate_celltypes.R +306 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/scripts/cluster_cells.R +223 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/scripts/export_results.R +292 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/scripts/filter_cells.R +576 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/scripts/find_markers.R +325 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/scripts/find_variable_features.R +106 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/scripts/integrate_batches.R +504 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/scripts/integration_diagnostics.R +596 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/scripts/load_example_data.R +89 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/scripts/normalize_data.R +184 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/scripts/plot_dimreduction.R +273 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/scripts/plot_qc.R +250 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/scripts/pseudobulk_de.R +324 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/scripts/qc_metrics.R +358 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/scripts/remove_ambient_rna.R +281 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/scripts/run_umap.R +116 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/scripts/scale_and_pca.R +243 -0
- package/data/workflows/scrnaseq-seurat-core-analysis/scripts/setup_and_import.R +193 -0
- package/data/workflows/spatial-transcriptomics/SKILL.md +256 -0
- package/data/workflows/spatial-transcriptomics/references/spatial-analysis-guide.md +216 -0
- package/data/workflows/spatial-transcriptomics/scripts/export_results.py +214 -0
- package/data/workflows/spatial-transcriptomics/scripts/generate_all_plots.py +397 -0
- package/data/workflows/spatial-transcriptomics/scripts/load_example_data.py +175 -0
- package/data/workflows/spatial-transcriptomics/scripts/spatial_workflow.py +206 -0
- package/dist/bgi.js +28 -1
- package/package.json +2 -1
|
@@ -0,0 +1,819 @@
|
|
|
1
|
+
# Clustering Decision Guide
|
|
2
|
+
|
|
3
|
+
Comprehensive guide for making critical clustering decisions with flowcharts,
|
|
4
|
+
examples, and troubleshooting.
|
|
5
|
+
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## Decision 1: Which Clustering Algorithm?
|
|
9
|
+
|
|
10
|
+
### Overview
|
|
11
|
+
|
|
12
|
+
Choosing the right clustering algorithm depends on dataset size, expected
|
|
13
|
+
cluster shapes, whether you know k, and computational constraints.
|
|
14
|
+
|
|
15
|
+
### Algorithm Selection Table
|
|
16
|
+
|
|
17
|
+
| Algorithm | Best For | Pros | Cons | Sample Size | Requires k? |
|
|
18
|
+
| ---------------- | ----------------------------------------------------- | ----------------------------------------------------------------------------- | --------------------------------------------------------------------------- | --------------- | ------------------- |
|
|
19
|
+
| **Hierarchical** | Exploration, visualization, biological interpretation | Dendrogram shows structure at all scales; deterministic; flexible k selection | Slow for >5k samples; high memory for distance matrix | <5k samples | No (cut tree later) |
|
|
20
|
+
| **K-means** | Large datasets, spherical clusters | Very fast; scales to millions of samples; simple interpretation | Requires k upfront; assumes spherical clusters; sensitive to initialization | >1k samples | Yes |
|
|
21
|
+
| **HDBSCAN** | Unknown k, outlier detection, arbitrary shapes | Finds k automatically; handles arbitrary shapes; marks outliers | Slow for >50k samples; parameter tuning can be tricky | 100-50k samples | No (automatic) |
|
|
22
|
+
| **GMM** | Soft clustering, uncertainty quantification | Probabilistic memberships; overlapping clusters; statistical framework | Requires k; assumes Gaussian distributions; can overfit | 100-10k samples | Yes |
|
|
23
|
+
|
|
24
|
+
### Decision Flowchart
|
|
25
|
+
|
|
26
|
+
```
|
|
27
|
+
START
|
|
28
|
+
|
|
|
29
|
+
├─ Do you know k (number of clusters)?
|
|
30
|
+
| |
|
|
31
|
+
| ├─ NO → Do you want to detect outliers?
|
|
32
|
+
| | |
|
|
33
|
+
| | ├─ YES → HDBSCAN (automatic k + outlier detection)
|
|
34
|
+
| | |
|
|
35
|
+
| | └─ NO → Hierarchical (explore dendrogram, cut at any k)
|
|
36
|
+
| |
|
|
37
|
+
| └─ YES → How many samples do you have?
|
|
38
|
+
| |
|
|
39
|
+
| ├─ <5k samples → Do you need visualization?
|
|
40
|
+
| | |
|
|
41
|
+
| | ├─ YES → Hierarchical (dendrogram + known k)
|
|
42
|
+
| | |
|
|
43
|
+
| | └─ NO → Do clusters overlap?
|
|
44
|
+
| | |
|
|
45
|
+
| | ├─ YES → GMM (soft clustering)
|
|
46
|
+
| | |
|
|
47
|
+
| | └─ NO → K-means (efficient partitioning)
|
|
48
|
+
| |
|
|
49
|
+
| └─ >5k samples → K-means (fast, scalable)
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
### When to Use Each Algorithm
|
|
53
|
+
|
|
54
|
+
#### Hierarchical Clustering
|
|
55
|
+
|
|
56
|
+
**Use when:**
|
|
57
|
+
|
|
58
|
+
- ✅ You want to explore data structure before committing to k
|
|
59
|
+
- ✅ You need a dendrogram for publication/interpretation
|
|
60
|
+
- ✅ Sample size <5k (feasible computation)
|
|
61
|
+
- ✅ You want deterministic results (same every run)
|
|
62
|
+
- ✅ You need to show relationships between clusters
|
|
63
|
+
|
|
64
|
+
**Don't use when:**
|
|
65
|
+
|
|
66
|
+
- ❌ You have >10k samples (too slow, too much memory)
|
|
67
|
+
- ❌ You only care about final partitioning (k-means is faster)
|
|
68
|
+
- ❌ You have very high-dimensional data without PCA preprocessing
|
|
69
|
+
|
|
70
|
+
**Linkage methods:**
|
|
71
|
+
|
|
72
|
+
- **Ward** (default): Minimizes within-cluster variance, creates compact
|
|
73
|
+
spherical clusters
|
|
74
|
+
- **Average**: More flexible cluster shapes, good for gene expression
|
|
75
|
+
- **Complete**: Avoids chaining, creates tight clusters
|
|
76
|
+
- **Single**: Can create elongated clusters, sensitive to outliers
|
|
77
|
+
|
|
78
|
+
#### K-means Clustering
|
|
79
|
+
|
|
80
|
+
**Use when:**
|
|
81
|
+
|
|
82
|
+
- ✅ You have >5k samples and need speed
|
|
83
|
+
- ✅ You know k or can test a range
|
|
84
|
+
- ✅ Clusters are roughly spherical/compact
|
|
85
|
+
- ✅ You want hard assignments (each sample to one cluster)
|
|
86
|
+
- ✅ You can run PCA preprocessing for high-dimensional data
|
|
87
|
+
|
|
88
|
+
**Don't use when:**
|
|
89
|
+
|
|
90
|
+
- ❌ Clusters have arbitrary shapes (use HDBSCAN)
|
|
91
|
+
- ❌ You don't know k at all (use hierarchical or HDBSCAN)
|
|
92
|
+
- ❌ You have extreme outliers (they will be forced into clusters)
|
|
93
|
+
- ❌ Clusters have very different sizes/densities
|
|
94
|
+
|
|
95
|
+
**Variants:**
|
|
96
|
+
|
|
97
|
+
- **K-means** (Lloyd's algorithm): Standard, uses centroids
|
|
98
|
+
- **K-means++**: Better initialization (recommended)
|
|
99
|
+
- **Mini-batch K-means**: For >100k samples, trades accuracy for speed
|
|
100
|
+
- **K-medoids**: Uses actual data points as centers, robust to outliers
|
|
101
|
+
|
|
102
|
+
#### HDBSCAN (Density-Based)
|
|
103
|
+
|
|
104
|
+
**Use when:**
|
|
105
|
+
|
|
106
|
+
- ✅ You don't know k and want automatic detection
|
|
107
|
+
- ✅ You expect arbitrary cluster shapes (non-spherical)
|
|
108
|
+
- ✅ You want outlier detection
|
|
109
|
+
- ✅ Clusters have varying densities
|
|
110
|
+
- ✅ You want hierarchical + density-based benefits
|
|
111
|
+
|
|
112
|
+
**Don't use when:**
|
|
113
|
+
|
|
114
|
+
- ❌ You have >50k samples (becomes very slow)
|
|
115
|
+
- ❌ All your data points should be in clusters (HDBSCAN marks noise)
|
|
116
|
+
- ❌ Clusters are well-separated and spherical (k-means is faster)
|
|
117
|
+
- ❌ You need exactly k clusters for downstream analysis
|
|
118
|
+
|
|
119
|
+
**Key parameters:**
|
|
120
|
+
|
|
121
|
+
- **min_cluster_size**: Minimum samples per cluster (start with 5-10)
|
|
122
|
+
- **min_samples**: Noise threshold (same as min_cluster_size is good default)
|
|
123
|
+
|
|
124
|
+
#### Gaussian Mixture Models (GMM)
|
|
125
|
+
|
|
126
|
+
**Use when:**
|
|
127
|
+
|
|
128
|
+
- ✅ You need soft clustering (probabilistic memberships)
|
|
129
|
+
- ✅ Clusters overlap and you want uncertainty estimates
|
|
130
|
+
- ✅ You assume clusters follow Gaussian distributions
|
|
131
|
+
- ✅ You want a statistical model framework
|
|
132
|
+
- ✅ You need to quantify cluster assignment confidence
|
|
133
|
+
|
|
134
|
+
**Don't use when:**
|
|
135
|
+
|
|
136
|
+
- ❌ You want hard assignments only (k-means is simpler)
|
|
137
|
+
- ❌ Clusters are not approximately Gaussian
|
|
138
|
+
- ❌ You have many features without PCA (model can overfit)
|
|
139
|
+
- ❌ You need maximum speed (GMM is slower than k-means)
|
|
140
|
+
|
|
141
|
+
**Covariance types:**
|
|
142
|
+
|
|
143
|
+
- **full**: Each cluster has its own full covariance matrix (most flexible)
|
|
144
|
+
- **tied**: All clusters share same covariance (assumes similar shapes)
|
|
145
|
+
- **diag**: Diagonal covariance (assumes features are independent)
|
|
146
|
+
- **spherical**: Single variance per cluster (like k-means)
|
|
147
|
+
|
|
148
|
+
### Common Decision Scenarios
|
|
149
|
+
|
|
150
|
+
**Scenario 1:** "I have 500 gene expression samples and don't know how many
|
|
151
|
+
subtypes exist"
|
|
152
|
+
|
|
153
|
+
- **Solution:** Hierarchical clustering with correlation distance
|
|
154
|
+
- **Why:** Small sample size (efficient), explore dendrogram, cut at multiple k
|
|
155
|
+
values, correlation handles expression patterns
|
|
156
|
+
|
|
157
|
+
**Scenario 2:** "I have 50,000 samples and know there should be 5 clusters"
|
|
158
|
+
|
|
159
|
+
- **Solution:** K-means with k=5 and n_init=50
|
|
160
|
+
- **Why:** Large sample size needs speed, known k, k-means scales well
|
|
161
|
+
|
|
162
|
+
**Scenario 3:** "I expect 3-4 clusters but also outliers from batch effects"
|
|
163
|
+
|
|
164
|
+
- **Solution:** HDBSCAN to identify outliers, then k-means on clean data
|
|
165
|
+
- **Why:** HDBSCAN finds outliers (-1 label), remove them, then efficient
|
|
166
|
+
k-means
|
|
167
|
+
|
|
168
|
+
**Scenario 4:** "My clusters might overlap and I need confidence scores"
|
|
169
|
+
|
|
170
|
+
- **Solution:** GMM with appropriate k
|
|
171
|
+
- **Why:** Probabilistic memberships quantify overlap, soft clustering handles
|
|
172
|
+
uncertainty
|
|
173
|
+
|
|
174
|
+
**Scenario 5:** "I want to compare multiple methods for publication"
|
|
175
|
+
|
|
176
|
+
- **Solution:** Run hierarchical, k-means, HDBSCAN, and GMM; compare validation
|
|
177
|
+
metrics
|
|
178
|
+
- **Why:** Robust results agree across methods, shows thorough analysis
|
|
179
|
+
|
|
180
|
+
---
|
|
181
|
+
|
|
182
|
+
## Decision 2: Which Distance Metric?
|
|
183
|
+
|
|
184
|
+
### Overview
|
|
185
|
+
|
|
186
|
+
The distance metric determines how similarity is measured between
|
|
187
|
+
samples/features. The right choice depends on data type, normalization, and what
|
|
188
|
+
"similarity" means in your context.
|
|
189
|
+
|
|
190
|
+
### Distance Metric Comparison Table
|
|
191
|
+
|
|
192
|
+
| Metric | Best For | Properties | Sensitive To | Range |
|
|
193
|
+
| --------------- | ------------------------------------------------- | ----------------------------------------------------- | -------------------------------------- | ------ |
|
|
194
|
+
| **Euclidean** | Normalized continuous data, physical measurements | Straight-line distance; all features weighted equally | Scale differences, outliers | [0, ∞) |
|
|
195
|
+
| **Correlation** | Gene expression, time series | Pattern similarity (shape), ignores magnitude | Missing values, zero-variance features | [0, 2] |
|
|
196
|
+
| **Manhattan** | High-dimensional data, outlier-robust | City-block distance; less sensitive to outliers | Scale differences | [0, ∞) |
|
|
197
|
+
| **Cosine** | Sparse data, text-like features, directional data | Angle between vectors; ignores magnitude | Zero vectors | [0, 2] |
|
|
198
|
+
|
|
199
|
+
### Distance Metric Decision Tree
|
|
200
|
+
|
|
201
|
+
```
|
|
202
|
+
START
|
|
203
|
+
|
|
|
204
|
+
├─ What type of data?
|
|
205
|
+
| |
|
|
206
|
+
| ├─ Gene expression / Time series
|
|
207
|
+
| | └─ Do you care about pattern similarity more than magnitude?
|
|
208
|
+
| | ├─ YES → Correlation (1 - Pearson correlation)
|
|
209
|
+
| | └─ NO → Euclidean (after normalization)
|
|
210
|
+
| |
|
|
211
|
+
| ├─ Proteomics / Metabolomics
|
|
212
|
+
| | └─ Is data normalized?
|
|
213
|
+
| | ├─ YES → Euclidean
|
|
214
|
+
| | └─ NO → Normalize first, then Euclidean
|
|
215
|
+
| |
|
|
216
|
+
| ├─ Clinical / Mixed features
|
|
217
|
+
| | └─ Do you have outliers?
|
|
218
|
+
| | ├─ YES → Manhattan (robust)
|
|
219
|
+
| | └─ NO → Euclidean
|
|
220
|
+
| |
|
|
221
|
+
| └─ High-dimensional / Sparse
|
|
222
|
+
| └─ Cosine (directional similarity)
|
|
223
|
+
```
|
|
224
|
+
|
|
225
|
+
### Detailed Metric Properties
|
|
226
|
+
|
|
227
|
+
#### Euclidean Distance
|
|
228
|
+
|
|
229
|
+
**Formula:** `d(x,y) = sqrt(Σ(xi - yi)²)`
|
|
230
|
+
|
|
231
|
+
**Use when:**
|
|
232
|
+
|
|
233
|
+
- ✅ Data is normalized (all features on same scale)
|
|
234
|
+
- ✅ Magnitude differences are meaningful
|
|
235
|
+
- ✅ Features have similar importance
|
|
236
|
+
- ✅ Data is continuous and not too sparse
|
|
237
|
+
|
|
238
|
+
**Properties:**
|
|
239
|
+
|
|
240
|
+
- Most commonly used metric
|
|
241
|
+
- Assumes all features contribute equally
|
|
242
|
+
- Sensitive to outliers (squared differences)
|
|
243
|
+
- Works well with k-means (minimizes Euclidean distance by design)
|
|
244
|
+
|
|
245
|
+
**Example:** Comparing normalized protein abundances where both pattern and
|
|
246
|
+
magnitude matter
|
|
247
|
+
|
|
248
|
+
#### Correlation Distance
|
|
249
|
+
|
|
250
|
+
**Formula:** `d(x,y) = 1 - Pearson_correlation(x,y)`
|
|
251
|
+
|
|
252
|
+
**Use when:**
|
|
253
|
+
|
|
254
|
+
- ✅ Gene expression data (pattern similarity)
|
|
255
|
+
- ✅ Time-series data with different baselines
|
|
256
|
+
- ✅ You want to group samples with similar trends regardless of magnitude
|
|
257
|
+
- ✅ Features are co-regulated or follow similar patterns
|
|
258
|
+
|
|
259
|
+
**Properties:**
|
|
260
|
+
|
|
261
|
+
- Invariant to linear transformations (scale + shift)
|
|
262
|
+
- Range: [0, 2] where 0 = perfect correlation, 2 = perfect anti-correlation
|
|
263
|
+
- Works excellently with hierarchical clustering (average linkage)
|
|
264
|
+
- Can fail with zero-variance features (returns NaN)
|
|
265
|
+
|
|
266
|
+
**Example:** Clustering genes by expression pattern across conditions (high vs
|
|
267
|
+
low expression levels don't matter, pattern does)
|
|
268
|
+
|
|
269
|
+
#### Manhattan Distance
|
|
270
|
+
|
|
271
|
+
**Formula:** `d(x,y) = Σ|xi - yi|`
|
|
272
|
+
|
|
273
|
+
**Use when:**
|
|
274
|
+
|
|
275
|
+
- ✅ Data has outliers
|
|
276
|
+
- ✅ High-dimensional data (curse of dimensionality)
|
|
277
|
+
- ✅ Features on different scales (more robust than Euclidean)
|
|
278
|
+
- ✅ You want robust clustering
|
|
279
|
+
|
|
280
|
+
**Properties:**
|
|
281
|
+
|
|
282
|
+
- Less sensitive to outliers than Euclidean (absolute vs squared)
|
|
283
|
+
- Also called L1 distance or city-block distance
|
|
284
|
+
- Often performs better in high dimensions
|
|
285
|
+
- Computationally efficient
|
|
286
|
+
|
|
287
|
+
**Example:** Clinical data with mixed continuous features that may have outliers
|
|
288
|
+
|
|
289
|
+
#### Cosine Distance
|
|
290
|
+
|
|
291
|
+
**Formula:** `d(x,y) = 1 - (x·y)/(||x|| ||y||)`
|
|
292
|
+
|
|
293
|
+
**Use when:**
|
|
294
|
+
|
|
295
|
+
- ✅ Sparse, high-dimensional data
|
|
296
|
+
- ✅ Direction matters more than magnitude
|
|
297
|
+
- ✅ Features represent counts or frequencies
|
|
298
|
+
- ✅ Data is strictly non-negative
|
|
299
|
+
|
|
300
|
+
**Properties:**
|
|
301
|
+
|
|
302
|
+
- Measures angle between vectors, not magnitude
|
|
303
|
+
- Range: [0, 2] where 0 = same direction
|
|
304
|
+
- Invariant to vector scaling
|
|
305
|
+
- Fails with zero vectors
|
|
306
|
+
|
|
307
|
+
**Example:** Document clustering, sparse gene expression matrices, directional
|
|
308
|
+
data
|
|
309
|
+
|
|
310
|
+
### Feature Scaling Requirements
|
|
311
|
+
|
|
312
|
+
**Before using Euclidean or Manhattan:**
|
|
313
|
+
|
|
314
|
+
- ✅ **Z-score normalization** (mean=0, sd=1): Recommended for most cases
|
|
315
|
+
- ✅ **Min-max scaling** (range [0,1]): When you want bounded distances
|
|
316
|
+
- ✅ **Robust scaling** (median, IQR): When outliers present
|
|
317
|
+
|
|
318
|
+
**Correlation distance:**
|
|
319
|
+
|
|
320
|
+
- No scaling needed (inherently scale-invariant)
|
|
321
|
+
- But remove zero-variance features (cause NaN)
|
|
322
|
+
|
|
323
|
+
**Cosine distance:**
|
|
324
|
+
|
|
325
|
+
- No scaling needed (magnitude-invariant)
|
|
326
|
+
- Works directly on raw counts or frequencies
|
|
327
|
+
|
|
328
|
+
### When to Try Multiple Metrics
|
|
329
|
+
|
|
330
|
+
**Always try multiple metrics when:**
|
|
331
|
+
|
|
332
|
+
- You're doing exploratory analysis
|
|
333
|
+
- You have mixed feature types
|
|
334
|
+
- Clusters are unstable with one metric
|
|
335
|
+
- Results don't match biological expectations
|
|
336
|
+
|
|
337
|
+
**Compare metrics by:**
|
|
338
|
+
|
|
339
|
+
- Silhouette scores (which metric gives better separation?)
|
|
340
|
+
- Stability analysis (which is more robust?)
|
|
341
|
+
- Biological validation (which matches known groups?)
|
|
342
|
+
|
|
343
|
+
---
|
|
344
|
+
|
|
345
|
+
## Decision 3: How Many Clusters (k)?
|
|
346
|
+
|
|
347
|
+
### Overview
|
|
348
|
+
|
|
349
|
+
Determining the optimal number of clusters is often the hardest decision.
|
|
350
|
+
Multiple methods exist, and they often disagree. Use multiple approaches and
|
|
351
|
+
prioritize biological interpretability.
|
|
352
|
+
|
|
353
|
+
### Optimal k Determination Methods
|
|
354
|
+
|
|
355
|
+
#### 1. Elbow Method
|
|
356
|
+
|
|
357
|
+
**What it does:** Plots within-cluster sum of squares (inertia) vs k
|
|
358
|
+
|
|
359
|
+
**How to use:**
|
|
360
|
+
|
|
361
|
+
- Look for "elbow" where adding clusters gives diminishing returns
|
|
362
|
+
- Elbow = point where curve bends sharply
|
|
363
|
+
|
|
364
|
+
**Strengths:**
|
|
365
|
+
|
|
366
|
+
- ✅ Simple, intuitive
|
|
367
|
+
- ✅ Works with any clustering method
|
|
368
|
+
|
|
369
|
+
**Weaknesses:**
|
|
370
|
+
|
|
371
|
+
- ❌ Elbow often ambiguous or absent
|
|
372
|
+
- ❌ Subjective interpretation
|
|
373
|
+
|
|
374
|
+
**Best for:** Initial exploration, k-means clustering
|
|
375
|
+
|
|
376
|
+
#### 2. Silhouette Method
|
|
377
|
+
|
|
378
|
+
**What it does:** Measures how similar each point is to its own cluster vs other
|
|
379
|
+
clusters
|
|
380
|
+
|
|
381
|
+
**How to use:**
|
|
382
|
+
|
|
383
|
+
- Compute average silhouette score for each k
|
|
384
|
+
- Choose k with highest average silhouette
|
|
385
|
+
- Silhouette ranges from -1 (wrong cluster) to +1 (perfect)
|
|
386
|
+
|
|
387
|
+
**Thresholds:**
|
|
388
|
+
|
|
389
|
+
- > 0.7: Strong structure
|
|
390
|
+
- 0.5-0.7: Reasonable structure
|
|
391
|
+
- 0.25-0.5: Weak structure
|
|
392
|
+
- <0.25: No substantial structure
|
|
393
|
+
|
|
394
|
+
**Strengths:**
|
|
395
|
+
|
|
396
|
+
- ✅ Accounts for both cohesion and separation
|
|
397
|
+
- ✅ Works with any distance metric
|
|
398
|
+
- ✅ Per-sample scores identify misclassified points
|
|
399
|
+
|
|
400
|
+
**Weaknesses:**
|
|
401
|
+
|
|
402
|
+
- ❌ Computationally expensive for large datasets
|
|
403
|
+
- ❌ Favors equal-sized clusters
|
|
404
|
+
|
|
405
|
+
**Best for:** Final validation, comparing multiple k values
|
|
406
|
+
|
|
407
|
+
#### 3. Gap Statistic
|
|
408
|
+
|
|
409
|
+
**What it does:** Compares within-cluster dispersion to null reference
|
|
410
|
+
distribution
|
|
411
|
+
|
|
412
|
+
**How to use:**
|
|
413
|
+
|
|
414
|
+
- Choose k where gap(k) is maximum
|
|
415
|
+
- Or use "1-standard-error rule": smallest k where gap(k) ≥ gap(k+1) - SE(k+1)
|
|
416
|
+
|
|
417
|
+
**Strengths:**
|
|
418
|
+
|
|
419
|
+
- ✅ Statistical rigor (compares to null)
|
|
420
|
+
- ✅ Works when no clear structure exists (suggests k=1)
|
|
421
|
+
- ✅ Less biased than elbow method
|
|
422
|
+
|
|
423
|
+
**Weaknesses:**
|
|
424
|
+
|
|
425
|
+
- ❌ Very slow (requires many bootstrap samples)
|
|
426
|
+
- ❌ Can be unstable with small samples
|
|
427
|
+
|
|
428
|
+
**Best for:** Publication-quality analysis, when you need statistical
|
|
429
|
+
justification
|
|
430
|
+
|
|
431
|
+
#### 4. Calinski-Harabasz Index (Variance Ratio Criterion)
|
|
432
|
+
|
|
433
|
+
**What it does:** Ratio of between-cluster variance to within-cluster variance
|
|
434
|
+
|
|
435
|
+
**How to use:**
|
|
436
|
+
|
|
437
|
+
- Choose k with highest index
|
|
438
|
+
- Higher values = better defined clusters
|
|
439
|
+
|
|
440
|
+
**Strengths:**
|
|
441
|
+
|
|
442
|
+
- ✅ Very fast to compute
|
|
443
|
+
- ✅ Works well for compact, well-separated clusters
|
|
444
|
+
|
|
445
|
+
**Weaknesses:**
|
|
446
|
+
|
|
447
|
+
- ❌ Favors k-means-like solutions
|
|
448
|
+
- ❌ Can overestimate k
|
|
449
|
+
|
|
450
|
+
**Best for:** Quick screening of many k values
|
|
451
|
+
|
|
452
|
+
### Decision Strategy When Metrics Disagree
|
|
453
|
+
|
|
454
|
+
**Common scenario:** Elbow suggests k=4, Silhouette suggests k=3, Gap suggests
|
|
455
|
+
k=5
|
|
456
|
+
|
|
457
|
+
**Strategy:**
|
|
458
|
+
|
|
459
|
+
1. **Prioritize Silhouette** (most robust for cluster quality)
|
|
460
|
+
2. **Check biological interpretability** (does k=3 make sense in your domain?)
|
|
461
|
+
3. **Test stability** at k=3, 4, and 5 (which is most reproducible?)
|
|
462
|
+
4. **Visualize** with PCA/UMAP (do you see 3, 4, or 5 groups?)
|
|
463
|
+
5. **Consider range** [k-1, k+1] for sensitivity analysis
|
|
464
|
+
|
|
465
|
+
### Biological Interpretability Considerations
|
|
466
|
+
|
|
467
|
+
**Ask these questions:**
|
|
468
|
+
|
|
469
|
+
- Does k match known biological subtypes?
|
|
470
|
+
- Can you explain what distinguishes each cluster?
|
|
471
|
+
- Are cluster sizes reasonable (not 1% vs 99%)?
|
|
472
|
+
- Do clusters correspond to experimental groups, tissues, stages?
|
|
473
|
+
- Can you find differentially expressed genes between clusters?
|
|
474
|
+
|
|
475
|
+
**Example:** If silhouette suggests k=7 but you only have 3 treatment groups,
|
|
476
|
+
check:
|
|
477
|
+
|
|
478
|
+
- Are clusters splitting known groups? (may indicate batch effects)
|
|
479
|
+
- Are there true biological subtypes within groups? (subtype discovery)
|
|
480
|
+
- Is k=3 much worse than k=7? (sacrifice some metrics for interpretability)
|
|
481
|
+
|
|
482
|
+
### When k Doesn't Matter
|
|
483
|
+
|
|
484
|
+
**Hierarchical clustering:**
|
|
485
|
+
|
|
486
|
+
- Don't need to choose k upfront
|
|
487
|
+
- Cut dendrogram at multiple heights
|
|
488
|
+
- Report cluster membership at different k values
|
|
489
|
+
|
|
490
|
+
**HDBSCAN:**
|
|
491
|
+
|
|
492
|
+
- Finds k automatically
|
|
493
|
+
- Focus on tuning min_cluster_size instead
|
|
494
|
+
- Accept that some points are labeled as noise (-1)
|
|
495
|
+
|
|
496
|
+
### Practical Workflow for k Determination
|
|
497
|
+
|
|
498
|
+
```python
|
|
499
|
+
# 1. Test range of k values
|
|
500
|
+
from scripts.optimal_clusters import find_optimal_clusters
|
|
501
|
+
|
|
502
|
+
results = find_optimal_clusters(
|
|
503
|
+
data, method="kmeans", k_range=range(2, 16),
|
|
504
|
+
metrics=["elbow", "silhouette", "gap", "calinski"]
|
|
505
|
+
)
|
|
506
|
+
|
|
507
|
+
# 2. Examine plots for all metrics
|
|
508
|
+
# Look for agreement and reasonable patterns
|
|
509
|
+
|
|
510
|
+
# 3. Short-list 2-3 candidate k values
|
|
511
|
+
candidates = [3, 5, 7] # Based on metric peaks
|
|
512
|
+
|
|
513
|
+
# 4. Test stability for each candidate
|
|
514
|
+
from scripts.stability_analysis import stability_analysis
|
|
515
|
+
|
|
516
|
+
for k in candidates:
|
|
517
|
+
labels, _ = kmeans_clustering(data, n_clusters=k)
|
|
518
|
+
stability = stability_analysis(data, labels, clustering_method="kmeans")
|
|
519
|
+
print(f"k={k}: stability = {stability['mean_stability']:.3f}")
|
|
520
|
+
|
|
521
|
+
# 5. Visualize each candidate
|
|
522
|
+
for k in candidates:
|
|
523
|
+
labels, _ = kmeans_clustering(data, n_clusters=k)
|
|
524
|
+
plot_pca_scatter(pca_data, labels, output_path=f"pca_k{k}.png")
|
|
525
|
+
|
|
526
|
+
# 6. Choose based on: silhouette + stability + visualization + biology
|
|
527
|
+
# Prioritize biological interpretation over marginal metric improvements
|
|
528
|
+
```
|
|
529
|
+
|
|
530
|
+
---
|
|
531
|
+
|
|
532
|
+
## Decision 4: Dimensionality Reduction
|
|
533
|
+
|
|
534
|
+
### When to Use PCA Before Clustering
|
|
535
|
+
|
|
536
|
+
**Use PCA when:**
|
|
537
|
+
|
|
538
|
+
- ✅ You have >1000 features (genes, proteins, metabolites)
|
|
539
|
+
- ✅ Features are correlated (gene expression data)
|
|
540
|
+
- ✅ You want to reduce noise
|
|
541
|
+
- ✅ Computational cost is high
|
|
542
|
+
- ✅ You want to focus on major variation patterns
|
|
543
|
+
|
|
544
|
+
**Skip PCA when:**
|
|
545
|
+
|
|
546
|
+
- ❌ You have <100 features
|
|
547
|
+
- ❌ You want to interpret individual features
|
|
548
|
+
- ❌ Features are already uncorrelated
|
|
549
|
+
- ❌ You have sparse data (may want sparse methods instead)
|
|
550
|
+
|
|
551
|
+
### How Many PCA Components to Keep?
|
|
552
|
+
|
|
553
|
+
**General rules:**
|
|
554
|
+
|
|
555
|
+
- **80-95% variance explained**: Most common choice
|
|
556
|
+
- **Elbow in scree plot**: Where explained variance flattens
|
|
557
|
+
- **Kaiser criterion**: Keep components with eigenvalue >1
|
|
558
|
+
- **Parallel analysis**: Compare to random data
|
|
559
|
+
|
|
560
|
+
**Practical approach:**
|
|
561
|
+
|
|
562
|
+
```python
|
|
563
|
+
# Try multiple component counts
|
|
564
|
+
n_components_list = [20, 50, 100]
|
|
565
|
+
|
|
566
|
+
for n in n_components_list:
|
|
567
|
+
pca_data, _, explained_var = apply_pca(data, n_components=n)
|
|
568
|
+
print(f"n={n}: {explained_var.sum():.1%} variance explained")
|
|
569
|
+
|
|
570
|
+
# Cluster and validate
|
|
571
|
+
labels, _ = kmeans_clustering(pca_data, n_clusters=5)
|
|
572
|
+
validation = validate_clustering(pca_data, labels)
|
|
573
|
+
print(f" Silhouette: {validation['silhouette']:.3f}")
|
|
574
|
+
|
|
575
|
+
# Choose n with good variance/silhouette trade-off
|
|
576
|
+
```
|
|
577
|
+
|
|
578
|
+
### UMAP: Visualization vs Clustering
|
|
579
|
+
|
|
580
|
+
**UMAP for visualization:**
|
|
581
|
+
|
|
582
|
+
- ✅ Create 2D embeddings for plotting
|
|
583
|
+
- ✅ Better preserves local structure than PCA
|
|
584
|
+
- ✅ Makes nice publication figures
|
|
585
|
+
|
|
586
|
+
**UMAP for clustering:**
|
|
587
|
+
|
|
588
|
+
- ⚠️ Can distort distances
|
|
589
|
+
- ⚠️ Results depend heavily on parameters
|
|
590
|
+
- ⚠️ May create artificial clusters
|
|
591
|
+
- **Recommendation:** Use PCA for clustering, UMAP for visualization only
|
|
592
|
+
|
|
593
|
+
---
|
|
594
|
+
|
|
595
|
+
## Decision 5: Validation Strategy
|
|
596
|
+
|
|
597
|
+
### Internal Validation (No Labels)
|
|
598
|
+
|
|
599
|
+
**Use when:** You don't have ground truth labels (most common)
|
|
600
|
+
|
|
601
|
+
**Metrics:**
|
|
602
|
+
|
|
603
|
+
- **Silhouette score**: Cluster cohesion and separation
|
|
604
|
+
- **Davies-Bouldin index**: Average similarity ratio (lower is better)
|
|
605
|
+
- **Calinski-Harabasz index**: Variance ratio (higher is better)
|
|
606
|
+
|
|
607
|
+
**Strategy:**
|
|
608
|
+
|
|
609
|
+
1. Compute all three metrics
|
|
610
|
+
2. Look for agreement
|
|
611
|
+
3. Silhouette >0.5 + Davies-Bouldin <1.0 = good clustering
|
|
612
|
+
|
|
613
|
+
### External Validation (Labels Available)
|
|
614
|
+
|
|
615
|
+
**Use when:** You have known groups (tissue types, treatment groups, etc.)
|
|
616
|
+
|
|
617
|
+
**Metrics:**
|
|
618
|
+
|
|
619
|
+
- **Adjusted Rand Index (ARI)**: Measures agreement with true labels (0-1)
|
|
620
|
+
- **Normalized Mutual Information (NMI)**: Information theoretic measure
|
|
621
|
+
- **Fowlkes-Mallows Index**: Geometric mean of precision and recall
|
|
622
|
+
|
|
623
|
+
**Strategy:**
|
|
624
|
+
|
|
625
|
+
- If ARI >0.7: Clustering matches known groups well
|
|
626
|
+
- If ARI 0.3-0.7: Partial agreement (may indicate subtypes)
|
|
627
|
+
- If ARI <0.3: Clustering finds different structure than labels
|
|
628
|
+
|
|
629
|
+
### Stability Testing
|
|
630
|
+
|
|
631
|
+
**Use when:** You want reproducible, robust clusters
|
|
632
|
+
|
|
633
|
+
**Method:**
|
|
634
|
+
|
|
635
|
+
1. Bootstrap resample your data (80% of samples)
|
|
636
|
+
2. Re-cluster each bootstrap
|
|
637
|
+
3. Measure consistency of cluster assignments
|
|
638
|
+
4. Stability >0.85 = very robust
|
|
639
|
+
|
|
640
|
+
**When stability is low (<0.7):**
|
|
641
|
+
|
|
642
|
+
- Try different k (may be unstable at this k)
|
|
643
|
+
- Try different algorithm
|
|
644
|
+
- Check for outliers or batch effects
|
|
645
|
+
- May indicate no strong clustering structure
|
|
646
|
+
|
|
647
|
+
### Biological Validation
|
|
648
|
+
|
|
649
|
+
**Use when:** You want biologically meaningful clusters
|
|
650
|
+
|
|
651
|
+
**Approaches:**
|
|
652
|
+
|
|
653
|
+
1. **Differential expression**: Find genes that distinguish clusters
|
|
654
|
+
2. **Pathway enrichment**: Are clusters enriched for biological functions?
|
|
655
|
+
3. **Clinical correlation**: Do clusters associate with outcomes/phenotypes?
|
|
656
|
+
4. **Literature validation**: Do clusters match known subtypes?
|
|
657
|
+
|
|
658
|
+
**Red flags:**
|
|
659
|
+
|
|
660
|
+
- Clusters separate by batch instead of biology
|
|
661
|
+
- No differentially expressed genes between clusters
|
|
662
|
+
- Clusters have no clinical/phenotypic differences
|
|
663
|
+
- Known subtypes split across multiple clusters
|
|
664
|
+
|
|
665
|
+
---
|
|
666
|
+
|
|
667
|
+
## Common Decision Scenarios (Comprehensive)
|
|
668
|
+
|
|
669
|
+
### Scenario 1: Small Exploratory Analysis
|
|
670
|
+
|
|
671
|
+
**Context:** 200 samples, 2000 genes, no idea about clusters
|
|
672
|
+
|
|
673
|
+
**Decisions:**
|
|
674
|
+
|
|
675
|
+
- Algorithm: Hierarchical (small dataset, exploration)
|
|
676
|
+
- Distance: Correlation (gene expression)
|
|
677
|
+
- k: Examine dendrogram, try k=2-10
|
|
678
|
+
- Dim reduction: PCA to 50 components (retain 90% variance)
|
|
679
|
+
- Validation: Silhouette + stability
|
|
680
|
+
|
|
681
|
+
### Scenario 2: Large-Scale Classification
|
|
682
|
+
|
|
683
|
+
**Context:** 10,000 samples, expecting 5 disease subtypes
|
|
684
|
+
|
|
685
|
+
**Decisions:**
|
|
686
|
+
|
|
687
|
+
- Algorithm: K-means (large dataset, known k)
|
|
688
|
+
- Distance: Euclidean (with z-score normalization)
|
|
689
|
+
- k: 5 (based on prior knowledge), validate with silhouette
|
|
690
|
+
- Dim reduction: PCA to 100 components
|
|
691
|
+
- Validation: Internal metrics + clinical correlation
|
|
692
|
+
|
|
693
|
+
### Scenario 3: Outlier Detection Focus
|
|
694
|
+
|
|
695
|
+
**Context:** 500 samples, suspect batch effects/outliers
|
|
696
|
+
|
|
697
|
+
**Decisions:**
|
|
698
|
+
|
|
699
|
+
- Algorithm: HDBSCAN (outlier detection)
|
|
700
|
+
- Distance: Manhattan (robust to outliers)
|
|
701
|
+
- k: Automatic (HDBSCAN finds it)
|
|
702
|
+
- Dim reduction: PCA to 30 components
|
|
703
|
+
- Validation: Identify outliers (-1 label), re-cluster clean data
|
|
704
|
+
|
|
705
|
+
### Scenario 4: Gene Pattern Discovery
|
|
706
|
+
|
|
707
|
+
**Context:** 5,000 genes, 50 samples, find co-expressed gene modules
|
|
708
|
+
|
|
709
|
+
**Decisions:**
|
|
710
|
+
|
|
711
|
+
- Algorithm: Hierarchical (visualization of gene relationships)
|
|
712
|
+
- Distance: Correlation (pattern similarity)
|
|
713
|
+
- k: Cut dendrogram at height corresponding to 10-20 modules
|
|
714
|
+
- Dim reduction: None (clustering in original space)
|
|
715
|
+
- Validation: Check for pathway enrichment in modules
|
|
716
|
+
|
|
717
|
+
### Scenario 5: Publication-Quality Analysis
|
|
718
|
+
|
|
719
|
+
**Context:** 400 samples, preparing manuscript, need robust results
|
|
720
|
+
|
|
721
|
+
**Decisions:**
|
|
722
|
+
|
|
723
|
+
- Algorithm: Compare hierarchical, k-means, HDBSCAN
|
|
724
|
+
- Distance: Try Euclidean and correlation
|
|
725
|
+
- k: Use gap statistic + silhouette, test stability
|
|
726
|
+
- Dim reduction: PCA with variance explained justification
|
|
727
|
+
- Validation: All internal metrics + stability + biological validation
|
|
728
|
+
|
|
729
|
+
### Scenario 6: Time-Series / Trajectory
|
|
730
|
+
|
|
731
|
+
**Context:** Developmental stages, time-course experiment
|
|
732
|
+
|
|
733
|
+
**Decisions:**
|
|
734
|
+
|
|
735
|
+
- Algorithm: Hierarchical (see progression) or HDBSCAN
|
|
736
|
+
- Distance: Correlation (pattern over time)
|
|
737
|
+
- k: Based on known stages or dendrogram
|
|
738
|
+
- Dim reduction: PCA + UMAP for trajectory visualization
|
|
739
|
+
- Validation: Check if clusters follow time order
|
|
740
|
+
|
|
741
|
+
---
|
|
742
|
+
|
|
743
|
+
## Decision Checklist
|
|
744
|
+
|
|
745
|
+
Before finalizing your clustering analysis, ensure:
|
|
746
|
+
|
|
747
|
+
- [ ] **Algorithm chosen** based on sample size, cluster shape expectations, and
|
|
748
|
+
k knowledge
|
|
749
|
+
- [ ] **Distance metric selected** based on data type, normalization, and
|
|
750
|
+
similarity definition
|
|
751
|
+
- [ ] **Optimal k determined** using multiple metrics (or method doesn't require
|
|
752
|
+
k)
|
|
753
|
+
- [ ] **Dimensionality reduction** decision made (PCA components or original
|
|
754
|
+
space)
|
|
755
|
+
- [ ] **Validation strategy** selected and executed
|
|
756
|
+
- [ ] **Computational resources** sufficient for chosen method
|
|
757
|
+
- [ ] **Results visualized** with PCA/UMAP to sanity-check
|
|
758
|
+
- [ ] **Cluster sizes** reasonable (not 1% vs 99%)
|
|
759
|
+
- [ ] **Stability tested** if results will be used for important decisions
|
|
760
|
+
- [ ] **Biological interpretation** considered and documented
|
|
761
|
+
- [ ] **Batch effects** checked and accounted for
|
|
762
|
+
- [ ] **Outliers** identified and handled appropriately
|
|
763
|
+
- [ ] **Multiple k values** tested for sensitivity analysis
|
|
764
|
+
- [ ] **Method comparison** performed if results are critical
|
|
765
|
+
- [ ] **Parameters documented** for reproducibility
|
|
766
|
+
|
|
767
|
+
---
|
|
768
|
+
|
|
769
|
+
## Troubleshooting Decision Problems
|
|
770
|
+
|
|
771
|
+
### Problem: "All my metrics disagree on optimal k"
|
|
772
|
+
|
|
773
|
+
**Solutions:**
|
|
774
|
+
|
|
775
|
+
1. Plot cluster results for k-1, k, k+1 (visualize with PCA)
|
|
776
|
+
2. Check stability at each k
|
|
777
|
+
3. Prioritize silhouette over other metrics
|
|
778
|
+
4. Consider biological interpretability
|
|
779
|
+
5. Report results for multiple k values
|
|
780
|
+
|
|
781
|
+
### Problem: "My clusters don't make biological sense"
|
|
782
|
+
|
|
783
|
+
**Check:**
|
|
784
|
+
|
|
785
|
+
1. Are you separating by batch instead of biology? (check PCA colored by batch)
|
|
786
|
+
2. Is your distance metric appropriate? (try correlation for gene expression)
|
|
787
|
+
3. Have you normalized your data properly?
|
|
788
|
+
4. Are there outliers dominating the clustering?
|
|
789
|
+
5. Try different algorithms (maybe data doesn't cluster well)
|
|
790
|
+
|
|
791
|
+
### Problem: "Clustering is very unstable"
|
|
792
|
+
|
|
793
|
+
**Solutions:**
|
|
794
|
+
|
|
795
|
+
1. Increase n_init for k-means (try 100+)
|
|
796
|
+
2. Try hierarchical instead (deterministic)
|
|
797
|
+
3. Reduce k (may be overfitting)
|
|
798
|
+
4. Remove outliers first
|
|
799
|
+
5. Check if there's real clustering structure (try HDBSCAN)
|
|
800
|
+
|
|
801
|
+
### Problem: "Results vary dramatically with small parameter changes"
|
|
802
|
+
|
|
803
|
+
**This suggests:**
|
|
804
|
+
|
|
805
|
+
- No strong clustering structure exists
|
|
806
|
+
- Consider not clustering at all
|
|
807
|
+
- Use stability analysis extensively
|
|
808
|
+
- Report sensitivity in results
|
|
809
|
+
- Try consensus clustering across multiple runs
|
|
810
|
+
|
|
811
|
+
### Problem: "Method A says k=3, Method B says k=7"
|
|
812
|
+
|
|
813
|
+
**Strategy:**
|
|
814
|
+
|
|
815
|
+
1. Check if k=7 is splitting k=3 clusters into subclusters
|
|
816
|
+
2. Test both with stability analysis
|
|
817
|
+
3. Look at dendrogram to see hierarchical relationships
|
|
818
|
+
4. Consider reporting both (coarse and fine-grained clustering)
|
|
819
|
+
5. Let biological interpretation guide final choice
|