@bgicli/bgicli 2.1.1 → 2.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (1266)
  1. package/data/skills/aav-vector-design-agent/SKILL.md +198 -0
  2. package/data/skills/adaptyv/SKILL.md +112 -0
  3. package/data/skills/adhd-daily-planner/SKILL.md +271 -0
  4. package/data/skills/aeon/SKILL.md +372 -0
  5. package/data/skills/agent-browser/SKILL.md +159 -0
  6. package/data/skills/agentd-drug-discovery/SKILL.md +52 -0
  7. package/data/skills/ai-analyzer/SKILL.md +218 -0
  8. package/data/skills/alphafold/SKILL.md +183 -0
  9. package/data/skills/alphafold-database/SKILL.md +500 -0
  10. package/data/skills/anndata/SKILL.md +394 -0
  11. package/data/skills/antibody-design-agent/SKILL.md +64 -0
  12. package/data/skills/arboreto/SKILL.md +237 -0
  13. package/data/skills/armored-cart-design-agent/SKILL.md +225 -0
  14. package/data/skills/arxiv-search/SKILL.md +224 -0
  15. package/data/skills/autonomous-oncology-agent/SKILL.md +77 -0
  16. package/data/skills/bayesian-optimizer/SKILL.md +60 -0
  17. package/data/skills/benchling-integration/SKILL.md +473 -0
  18. package/data/skills/bgpt-paper-search/SKILL.md +81 -0
  19. package/data/skills/bindcraft/SKILL.md +198 -0
  20. package/data/skills/binder-design/SKILL.md +182 -0
  21. package/data/skills/binding-characterization/SKILL.md +234 -0
  22. package/data/skills/bindingdb-database/SKILL.md +332 -0
  23. package/data/skills/bio-admet-prediction/SKILL.md +224 -0
  24. package/data/skills/bio-alignment-files-bam-statistics/SKILL.md +340 -0
  25. package/data/skills/bio-alignment-filtering/SKILL.md +322 -0
  26. package/data/skills/bio-alignment-indexing/SKILL.md +249 -0
  27. package/data/skills/bio-alignment-io/SKILL.md +301 -0
  28. package/data/skills/bio-alignment-msa-parsing/SKILL.md +366 -0
  29. package/data/skills/bio-alignment-msa-statistics/SKILL.md +375 -0
  30. package/data/skills/bio-alignment-pairwise/SKILL.md +277 -0
  31. package/data/skills/bio-alignment-sorting/SKILL.md +296 -0
  32. package/data/skills/bio-alignment-validation/SKILL.md +374 -0
  33. package/data/skills/bio-atac-seq-atac-peak-calling/SKILL.md +221 -0
  34. package/data/skills/bio-atac-seq-atac-qc/SKILL.md +292 -0
  35. package/data/skills/bio-atac-seq-differential-accessibility/SKILL.md +268 -0
  36. package/data/skills/bio-atac-seq-footprinting/SKILL.md +256 -0
  37. package/data/skills/bio-atac-seq-motif-deviation/SKILL.md +319 -0
  38. package/data/skills/bio-atac-seq-nucleosome-positioning/SKILL.md +321 -0
  39. package/data/skills/bio-basecalling/SKILL.md +368 -0
  40. package/data/skills/bio-batch-downloads/SKILL.md +384 -0
  41. package/data/skills/bio-batch-processing/SKILL.md +303 -0
  42. package/data/skills/bio-bedgraph-handling/SKILL.md +336 -0
  43. package/data/skills/bio-blast-searches/SKILL.md +354 -0
  44. package/data/skills/bio-causal-genomics-colocalization-analysis/SKILL.md +264 -0
  45. package/data/skills/bio-causal-genomics-fine-mapping/SKILL.md +267 -0
  46. package/data/skills/bio-causal-genomics-mediation-analysis/SKILL.md +264 -0
  47. package/data/skills/bio-causal-genomics-mendelian-randomization/SKILL.md +221 -0
  48. package/data/skills/bio-causal-genomics-pleiotropy-detection/SKILL.md +292 -0
  49. package/data/skills/bio-cfdna-preprocessing/SKILL.md +200 -0
  50. package/data/skills/bio-chipseq-differential-binding/SKILL.md +262 -0
  51. package/data/skills/bio-chipseq-motif-analysis/SKILL.md +387 -0
  52. package/data/skills/bio-chipseq-peak-annotation/SKILL.md +239 -0
  53. package/data/skills/bio-chipseq-peak-calling/SKILL.md +277 -0
  54. package/data/skills/bio-chipseq-qc/SKILL.md +391 -0
  55. package/data/skills/bio-chipseq-super-enhancers/SKILL.md +288 -0
  56. package/data/skills/bio-chipseq-visualization/SKILL.md +289 -0
  57. package/data/skills/bio-clinical-databases-clinvar-lookup/SKILL.md +188 -0
  58. package/data/skills/bio-clinical-databases-dbsnp-queries/SKILL.md +171 -0
  59. package/data/skills/bio-clinical-databases-gnomad-frequencies/SKILL.md +205 -0
  60. package/data/skills/bio-clinical-databases-hla-typing/SKILL.md +248 -0
  61. package/data/skills/bio-clinical-databases-myvariant-queries/SKILL.md +174 -0
  62. package/data/skills/bio-clinical-databases-pharmacogenomics/SKILL.md +232 -0
  63. package/data/skills/bio-clinical-databases-polygenic-risk/SKILL.md +276 -0
  64. package/data/skills/bio-clinical-databases-somatic-signatures/SKILL.md +261 -0
  65. package/data/skills/bio-clinical-databases-tumor-mutational-burden/SKILL.md +301 -0
  66. package/data/skills/bio-clinical-databases-variant-prioritization/SKILL.md +225 -0
  67. package/data/skills/bio-clip-seq-binding-site-annotation/SKILL.md +66 -0
  68. package/data/skills/bio-clip-seq-clip-alignment/SKILL.md +70 -0
  69. package/data/skills/bio-clip-seq-clip-motif-analysis/SKILL.md +62 -0
  70. package/data/skills/bio-clip-seq-clip-peak-calling/SKILL.md +282 -0
  71. package/data/skills/bio-clip-seq-clip-preprocessing/SKILL.md +142 -0
  72. package/data/skills/bio-codon-usage/SKILL.md +353 -0
  73. package/data/skills/bio-comparative-genomics-ancestral-reconstruction/SKILL.md +312 -0
  74. package/data/skills/bio-comparative-genomics-hgt-detection/SKILL.md +341 -0
  75. package/data/skills/bio-comparative-genomics-ortholog-inference/SKILL.md +308 -0
  76. package/data/skills/bio-comparative-genomics-positive-selection/SKILL.md +354 -0
  77. package/data/skills/bio-comparative-genomics-synteny-analysis/SKILL.md +315 -0
  78. package/data/skills/bio-compressed-files/SKILL.md +263 -0
  79. package/data/skills/bio-consensus-sequences/SKILL.md +340 -0
  80. package/data/skills/bio-copy-number-cnv-annotation/SKILL.md +307 -0
  81. package/data/skills/bio-copy-number-cnv-visualization/SKILL.md +294 -0
  82. package/data/skills/bio-copy-number-cnvkit-analysis/SKILL.md +290 -0
  83. package/data/skills/bio-copy-number-gatk-cnv/SKILL.md +270 -0
  84. package/data/skills/bio-crispr-screens-base-editing-analysis/SKILL.md +110 -0
  85. package/data/skills/bio-crispr-screens-batch-correction/SKILL.md +316 -0
  86. package/data/skills/bio-crispr-screens-crispresso-editing/SKILL.md +205 -0
  87. package/data/skills/bio-crispr-screens-hit-calling/SKILL.md +264 -0
  88. package/data/skills/bio-crispr-screens-jacks-analysis/SKILL.md +313 -0
  89. package/data/skills/bio-crispr-screens-library-design/SKILL.md +417 -0
  90. package/data/skills/bio-crispr-screens-mageck-analysis/SKILL.md +222 -0
  91. package/data/skills/bio-crispr-screens-screen-qc/SKILL.md +243 -0
  92. package/data/skills/bio-ctdna-mutation-detection/SKILL.md +234 -0
  93. package/data/skills/bio-data-visualization-circos-plots/SKILL.md +405 -0
  94. package/data/skills/bio-data-visualization-color-palettes/SKILL.md +244 -0
  95. package/data/skills/bio-data-visualization-genome-browser-tracks/SKILL.md +328 -0
  96. package/data/skills/bio-data-visualization-genome-tracks/SKILL.md +249 -0
  97. package/data/skills/bio-data-visualization-ggplot2-fundamentals/SKILL.md +313 -0
  98. package/data/skills/bio-data-visualization-heatmaps-clustering/SKILL.md +227 -0
  99. package/data/skills/bio-data-visualization-interactive-visualization/SKILL.md +210 -0
  100. package/data/skills/bio-data-visualization-multipanel-figures/SKILL.md +274 -0
  101. package/data/skills/bio-data-visualization-specialized-omics-plots/SKILL.md +251 -0
  102. package/data/skills/bio-data-visualization-upset-plots/SKILL.md +228 -0
  103. package/data/skills/bio-data-visualization-volcano-customization/SKILL.md +233 -0
  104. package/data/skills/bio-de-deseq2-basics/SKILL.md +376 -0
  105. package/data/skills/bio-de-edger-basics/SKILL.md +418 -0
  106. package/data/skills/bio-de-results/SKILL.md +378 -0
  107. package/data/skills/bio-de-visualization/SKILL.md +408 -0
  108. package/data/skills/bio-differential-expression-batch-correction/SKILL.md +253 -0
  109. package/data/skills/bio-differential-expression-timeseries-de/SKILL.md +370 -0
  110. package/data/skills/bio-differential-splicing/SKILL.md +177 -0
  111. package/data/skills/bio-duplicate-handling/SKILL.md +292 -0
  112. package/data/skills/bio-entrez-fetch/SKILL.md +334 -0
  113. package/data/skills/bio-entrez-link/SKILL.md +325 -0
  114. package/data/skills/bio-entrez-search/SKILL.md +311 -0
  115. package/data/skills/bio-epidemiological-genomics-amr-surveillance/SKILL.md +233 -0
  116. package/data/skills/bio-epidemiological-genomics-pathogen-typing/SKILL.md +202 -0
  117. package/data/skills/bio-epidemiological-genomics-phylodynamics/SKILL.md +207 -0
  118. package/data/skills/bio-epidemiological-genomics-transmission-inference/SKILL.md +237 -0
  119. package/data/skills/bio-epidemiological-genomics-variant-surveillance/SKILL.md +237 -0
  120. package/data/skills/bio-epitranscriptomics-m6a-differential/SKILL.md +88 -0
  121. package/data/skills/bio-epitranscriptomics-m6a-peak-calling/SKILL.md +89 -0
  122. package/data/skills/bio-epitranscriptomics-m6anet-analysis/SKILL.md +101 -0
  123. package/data/skills/bio-epitranscriptomics-merip-preprocessing/SKILL.md +81 -0
  124. package/data/skills/bio-epitranscriptomics-modification-visualization/SKILL.md +98 -0
  125. package/data/skills/bio-experimental-design-batch-design/SKILL.md +110 -0
  126. package/data/skills/bio-experimental-design-multiple-testing/SKILL.md +98 -0
  127. package/data/skills/bio-experimental-design-power-analysis/SKILL.md +84 -0
  128. package/data/skills/bio-experimental-design-sample-size/SKILL.md +93 -0
  129. package/data/skills/bio-expression-matrix-counts-ingest/SKILL.md +220 -0
  130. package/data/skills/bio-expression-matrix-gene-id-mapping/SKILL.md +256 -0
  131. package/data/skills/bio-expression-matrix-metadata-joins/SKILL.md +271 -0
  132. package/data/skills/bio-expression-matrix-sparse-handling/SKILL.md +247 -0
  133. package/data/skills/bio-fastq-quality/SKILL.md +279 -0
  134. package/data/skills/bio-filter-sequences/SKILL.md +265 -0
  135. package/data/skills/bio-flow-cytometry-bead-normalization/SKILL.md +315 -0
  136. package/data/skills/bio-flow-cytometry-clustering-phenotyping/SKILL.md +237 -0
  137. package/data/skills/bio-flow-cytometry-compensation-transformation/SKILL.md +196 -0
  138. package/data/skills/bio-flow-cytometry-cytometry-qc/SKILL.md +382 -0
  139. package/data/skills/bio-flow-cytometry-differential-analysis/SKILL.md +217 -0
  140. package/data/skills/bio-flow-cytometry-doublet-detection/SKILL.md +288 -0
  141. package/data/skills/bio-flow-cytometry-fcs-handling/SKILL.md +221 -0
  142. package/data/skills/bio-flow-cytometry-gating-analysis/SKILL.md +193 -0
  143. package/data/skills/bio-format-conversion/SKILL.md +193 -0
  144. package/data/skills/bio-fragment-analysis/SKILL.md +214 -0
  145. package/data/skills/bio-gatk-variant-calling/SKILL.md +422 -0
  146. package/data/skills/bio-genome-assembly-assembly-polishing/SKILL.md +333 -0
  147. package/data/skills/bio-genome-assembly-assembly-qc/SKILL.md +344 -0
  148. package/data/skills/bio-genome-assembly-contamination-detection/SKILL.md +235 -0
  149. package/data/skills/bio-genome-assembly-hifi-assembly/SKILL.md +178 -0
  150. package/data/skills/bio-genome-assembly-long-read-assembly/SKILL.md +307 -0
  151. package/data/skills/bio-genome-assembly-metagenome-assembly/SKILL.md +227 -0
  152. package/data/skills/bio-genome-assembly-scaffolding/SKILL.md +204 -0
  153. package/data/skills/bio-genome-assembly-short-read-assembly/SKILL.md +319 -0
  154. package/data/skills/bio-genome-engineering-base-editing-design/SKILL.md +277 -0
  155. package/data/skills/bio-genome-engineering-grna-design/SKILL.md +221 -0
  156. package/data/skills/bio-genome-engineering-hdr-template-design/SKILL.md +264 -0
  157. package/data/skills/bio-genome-engineering-off-target-prediction/SKILL.md +232 -0
  158. package/data/skills/bio-genome-engineering-prime-editing-design/SKILL.md +275 -0
  159. package/data/skills/bio-genome-intervals-bed-file-basics/SKILL.md +357 -0
  160. package/data/skills/bio-genome-intervals-bigwig-tracks/SKILL.md +351 -0
  161. package/data/skills/bio-genome-intervals-coverage-analysis/SKILL.md +300 -0
  162. package/data/skills/bio-genome-intervals-gtf-gff-handling/SKILL.md +345 -0
  163. package/data/skills/bio-genome-intervals-interval-arithmetic/SKILL.md +485 -0
  164. package/data/skills/bio-genome-intervals-proximity-operations/SKILL.md +337 -0
  165. package/data/skills/bio-geo-data/SKILL.md +380 -0
  166. package/data/skills/bio-hi-c-analysis-compartment-analysis/SKILL.md +261 -0
  167. package/data/skills/bio-hi-c-analysis-contact-pairs/SKILL.md +278 -0
  168. package/data/skills/bio-hi-c-analysis-hic-data-io/SKILL.md +260 -0
  169. package/data/skills/bio-hi-c-analysis-hic-differential/SKILL.md +328 -0
  170. package/data/skills/bio-hi-c-analysis-hic-visualization/SKILL.md +297 -0
  171. package/data/skills/bio-hi-c-analysis-loop-calling/SKILL.md +284 -0
  172. package/data/skills/bio-hi-c-analysis-matrix-operations/SKILL.md +274 -0
  173. package/data/skills/bio-hi-c-analysis-tad-detection/SKILL.md +239 -0
  174. package/data/skills/bio-imaging-mass-cytometry-cell-segmentation/SKILL.md +241 -0
  175. package/data/skills/bio-imaging-mass-cytometry-data-preprocessing/SKILL.md +279 -0
  176. package/data/skills/bio-imaging-mass-cytometry-interactive-annotation/SKILL.md +304 -0
  177. package/data/skills/bio-imaging-mass-cytometry-phenotyping/SKILL.md +231 -0
  178. package/data/skills/bio-imaging-mass-cytometry-quality-metrics/SKILL.md +316 -0
  179. package/data/skills/bio-imaging-mass-cytometry-spatial-analysis/SKILL.md +246 -0
  180. package/data/skills/bio-immunoinformatics-epitope-prediction/SKILL.md +259 -0
  181. package/data/skills/bio-immunoinformatics-immunogenicity-scoring/SKILL.md +275 -0
  182. package/data/skills/bio-immunoinformatics-mhc-binding-prediction/SKILL.md +260 -0
  183. package/data/skills/bio-immunoinformatics-neoantigen-prediction/SKILL.md +277 -0
  184. package/data/skills/bio-immunoinformatics-tcr-epitope-binding/SKILL.md +257 -0
  185. package/data/skills/bio-isoform-switching/SKILL.md +192 -0
  186. package/data/skills/bio-liquid-biopsy-pipeline/SKILL.md +311 -0
  187. package/data/skills/bio-local-blast/SKILL.md +350 -0
  188. package/data/skills/bio-long-read-sequencing-clair3-variants/SKILL.md +252 -0
  189. package/data/skills/bio-long-read-sequencing-isoseq-analysis/SKILL.md +334 -0
  190. package/data/skills/bio-long-read-sequencing-nanopore-methylation/SKILL.md +110 -0
  191. package/data/skills/bio-longitudinal-monitoring/SKILL.md +271 -0
  192. package/data/skills/bio-longread-alignment/SKILL.md +193 -0
  193. package/data/skills/bio-longread-medaka/SKILL.md +176 -0
  194. package/data/skills/bio-longread-qc/SKILL.md +224 -0
  195. package/data/skills/bio-longread-structural-variants/SKILL.md +201 -0
  196. package/data/skills/bio-machine-learning-atlas-mapping/SKILL.md +139 -0
  197. package/data/skills/bio-machine-learning-biomarker-discovery/SKILL.md +157 -0
  198. package/data/skills/bio-machine-learning-model-validation/SKILL.md +148 -0
  199. package/data/skills/bio-machine-learning-omics-classifiers/SKILL.md +146 -0
  200. package/data/skills/bio-machine-learning-prediction-explanation/SKILL.md +162 -0
  201. package/data/skills/bio-machine-learning-survival-analysis/SKILL.md +176 -0
  202. package/data/skills/bio-metabolomics-lipidomics/SKILL.md +265 -0
  203. package/data/skills/bio-metabolomics-metabolite-annotation/SKILL.md +241 -0
  204. package/data/skills/bio-metabolomics-msdial-preprocessing/SKILL.md +308 -0
  205. package/data/skills/bio-metabolomics-normalization-qc/SKILL.md +283 -0
  206. package/data/skills/bio-metabolomics-pathway-mapping/SKILL.md +237 -0
  207. package/data/skills/bio-metabolomics-statistical-analysis/SKILL.md +276 -0
  208. package/data/skills/bio-metabolomics-targeted-analysis/SKILL.md +314 -0
  209. package/data/skills/bio-metabolomics-xcms-preprocessing/SKILL.md +268 -0
  210. package/data/skills/bio-metagenomics-abundance/SKILL.md +203 -0
  211. package/data/skills/bio-metagenomics-amr-detection/SKILL.md +293 -0
  212. package/data/skills/bio-metagenomics-functional-profiling/SKILL.md +252 -0
  213. package/data/skills/bio-metagenomics-kraken/SKILL.md +204 -0
  214. package/data/skills/bio-metagenomics-metaphlan/SKILL.md +214 -0
  215. package/data/skills/bio-metagenomics-strain-tracking/SKILL.md +292 -0
  216. package/data/skills/bio-metagenomics-visualization/SKILL.md +240 -0
  217. package/data/skills/bio-methylation-based-detection/SKILL.md +223 -0
  218. package/data/skills/bio-methylation-bismark-alignment/SKILL.md +195 -0
  219. package/data/skills/bio-methylation-calling/SKILL.md +200 -0
  220. package/data/skills/bio-methylation-dmr-detection/SKILL.md +211 -0
  221. package/data/skills/bio-methylation-methylkit/SKILL.md +219 -0
  222. package/data/skills/bio-microbiome-amplicon-processing/SKILL.md +137 -0
  223. package/data/skills/bio-microbiome-differential-abundance/SKILL.md +147 -0
  224. package/data/skills/bio-microbiome-diversity-analysis/SKILL.md +188 -0
  225. package/data/skills/bio-microbiome-functional-prediction/SKILL.md +153 -0
  226. package/data/skills/bio-microbiome-qiime2-workflow/SKILL.md +219 -0
  227. package/data/skills/bio-microbiome-taxonomy-assignment/SKILL.md +168 -0
  228. package/data/skills/bio-molecular-descriptors/SKILL.md +200 -0
  229. package/data/skills/bio-molecular-io/SKILL.md +188 -0
  230. package/data/skills/bio-motif-search/SKILL.md +354 -0
  231. package/data/skills/bio-multi-omics-data-harmonization/SKILL.md +228 -0
  232. package/data/skills/bio-multi-omics-mixomics-analysis/SKILL.md +221 -0
  233. package/data/skills/bio-multi-omics-mofa-integration/SKILL.md +225 -0
  234. package/data/skills/bio-multi-omics-similarity-network/SKILL.md +235 -0
  235. package/data/skills/bio-orchestrator/SKILL.md +133 -0
  236. package/data/skills/bio-paired-end-fastq/SKILL.md +334 -0
  237. package/data/skills/bio-pathway-enrichment-visualization/SKILL.md +278 -0
  238. package/data/skills/bio-pathway-go-enrichment/SKILL.md +218 -0
  239. package/data/skills/bio-pathway-gsea/SKILL.md +227 -0
  240. package/data/skills/bio-pathway-kegg-pathways/SKILL.md +234 -0
  241. package/data/skills/bio-pathway-reactome/SKILL.md +215 -0
  242. package/data/skills/bio-pathway-wikipathways/SKILL.md +255 -0
  243. package/data/skills/bio-pdb-geometric-analysis/SKILL.md +475 -0
  244. package/data/skills/bio-pdb-structure-io/SKILL.md +296 -0
  245. package/data/skills/bio-pdb-structure-modification/SKILL.md +448 -0
  246. package/data/skills/bio-pdb-structure-navigation/SKILL.md +335 -0
  247. package/data/skills/bio-phasing-imputation-genotype-imputation/SKILL.md +201 -0
  248. package/data/skills/bio-phasing-imputation-haplotype-phasing/SKILL.md +190 -0
  249. package/data/skills/bio-phasing-imputation-imputation-qc/SKILL.md +265 -0
  250. package/data/skills/bio-phasing-imputation-reference-panels/SKILL.md +203 -0
  251. package/data/skills/bio-phylo-distance-calculations/SKILL.md +307 -0
  252. package/data/skills/bio-phylo-modern-tree-inference/SKILL.md +274 -0
  253. package/data/skills/bio-phylo-tree-io/SKILL.md +252 -0
  254. package/data/skills/bio-phylo-tree-manipulation/SKILL.md +375 -0
  255. package/data/skills/bio-phylo-tree-visualization/SKILL.md +275 -0
  256. package/data/skills/bio-pileup-generation/SKILL.md +314 -0
  257. package/data/skills/bio-population-genetics-association-testing/SKILL.md +293 -0
  258. package/data/skills/bio-population-genetics-linkage-disequilibrium/SKILL.md +260 -0
  259. package/data/skills/bio-population-genetics-plink-basics/SKILL.md +338 -0
  260. package/data/skills/bio-population-genetics-population-structure/SKILL.md +352 -0
  261. package/data/skills/bio-population-genetics-scikit-allel-analysis/SKILL.md +306 -0
  262. package/data/skills/bio-population-genetics-selection-statistics/SKILL.md +251 -0
  263. package/data/skills/bio-primer-design-primer-basics/SKILL.md +289 -0
  264. package/data/skills/bio-primer-design-primer-validation/SKILL.md +344 -0
  265. package/data/skills/bio-primer-design-qpcr-primers/SKILL.md +273 -0
  266. package/data/skills/bio-proteomics-data-import/SKILL.md +122 -0
  267. package/data/skills/bio-proteomics-dia-analysis/SKILL.md +246 -0
  268. package/data/skills/bio-proteomics-differential-abundance/SKILL.md +129 -0
  269. package/data/skills/bio-proteomics-peptide-identification/SKILL.md +122 -0
  270. package/data/skills/bio-proteomics-protein-inference/SKILL.md +174 -0
  271. package/data/skills/bio-proteomics-proteomics-qc/SKILL.md +208 -0
  272. package/data/skills/bio-proteomics-ptm-analysis/SKILL.md +139 -0
  273. package/data/skills/bio-proteomics-quantification/SKILL.md +141 -0
  274. package/data/skills/bio-proteomics-spectral-libraries/SKILL.md +270 -0
  275. package/data/skills/bio-reaction-enumeration/SKILL.md +251 -0
  276. package/data/skills/bio-read-alignment-bowtie2-alignment/SKILL.md +189 -0
  277. package/data/skills/bio-read-alignment-bwa-alignment/SKILL.md +166 -0
  278. package/data/skills/bio-read-alignment-hisat2-alignment/SKILL.md +205 -0
  279. package/data/skills/bio-read-alignment-star-alignment/SKILL.md +204 -0
  280. package/data/skills/bio-read-qc-adapter-trimming/SKILL.md +222 -0
  281. package/data/skills/bio-read-qc-contamination-screening/SKILL.md +252 -0
  282. package/data/skills/bio-read-qc-fastp-workflow/SKILL.md +278 -0
  283. package/data/skills/bio-read-qc-quality-filtering/SKILL.md +231 -0
  284. package/data/skills/bio-read-qc-quality-reports/SKILL.md +204 -0
  285. package/data/skills/bio-read-qc-umi-processing/SKILL.md +391 -0
  286. package/data/skills/bio-read-sequences/SKILL.md +319 -0
  287. package/data/skills/bio-reference-operations/SKILL.md +302 -0
  288. package/data/skills/bio-reporting-automated-qc-reports/SKILL.md +103 -0
  289. package/data/skills/bio-reporting-figure-export/SKILL.md +112 -0
  290. package/data/skills/bio-reporting-jupyter-reports/SKILL.md +98 -0
  291. package/data/skills/bio-reporting-quarto-reports/SKILL.md +295 -0
  292. package/data/skills/bio-reporting-rmarkdown-reports/SKILL.md +276 -0
  293. package/data/skills/bio-research-tools-biomarker-signature-studio/SKILL.md +99 -0
  294. package/data/skills/bio-restriction-enzyme-selection/SKILL.md +342 -0
  295. package/data/skills/bio-restriction-fragment-analysis/SKILL.md +259 -0
  296. package/data/skills/bio-restriction-mapping/SKILL.md +239 -0
  297. package/data/skills/bio-restriction-sites/SKILL.md +222 -0
  298. package/data/skills/bio-reverse-complement/SKILL.md +250 -0
  299. package/data/skills/bio-ribo-seq-orf-detection/SKILL.md +303 -0
  300. package/data/skills/bio-ribo-seq-riboseq-preprocessing/SKILL.md +176 -0
  301. package/data/skills/bio-ribo-seq-ribosome-periodicity/SKILL.md +182 -0
  302. package/data/skills/bio-ribo-seq-ribosome-stalling/SKILL.md +217 -0
  303. package/data/skills/bio-ribo-seq-translation-efficiency/SKILL.md +183 -0
  304. package/data/skills/bio-rna-quantification-alignment-free-quant/SKILL.md +226 -0
  305. package/data/skills/bio-rna-quantification-count-matrix-qc/SKILL.md +310 -0
  306. package/data/skills/bio-rna-quantification-featurecounts-counting/SKILL.md +190 -0
  307. package/data/skills/bio-rna-quantification-tximport-workflow/SKILL.md +240 -0
  308. package/data/skills/bio-rnaseq-qc/SKILL.md +320 -0
  309. package/data/skills/bio-sam-bam-basics/SKILL.md +248 -0
  310. package/data/skills/bio-sashimi-plots/SKILL.md +175 -0
  311. package/data/skills/bio-seq-objects/SKILL.md +240 -0
  312. package/data/skills/bio-sequence-properties/SKILL.md +397 -0
  313. package/data/skills/bio-sequence-similarity/SKILL.md +335 -0
  314. package/data/skills/bio-sequence-slicing/SKILL.md +232 -0
  315. package/data/skills/bio-sequence-statistics/SKILL.md +318 -0
  316. package/data/skills/bio-similarity-searching/SKILL.md +200 -0
  317. package/data/skills/bio-single-cell-batch-integration/SKILL.md +317 -0
  318. package/data/skills/bio-single-cell-cell-annotation/SKILL.md +259 -0
  319. package/data/skills/bio-single-cell-cell-communication/SKILL.md +257 -0
  320. package/data/skills/bio-single-cell-clustering/SKILL.md +330 -0
  321. package/data/skills/bio-single-cell-data-io/SKILL.md +315 -0
  322. package/data/skills/bio-single-cell-doublet-detection/SKILL.md +362 -0
  323. package/data/skills/bio-single-cell-lineage-tracing/SKILL.md +319 -0
  324. package/data/skills/bio-single-cell-markers-annotation/SKILL.md +317 -0
  325. package/data/skills/bio-single-cell-metabolite-communication/SKILL.md +258 -0
  326. package/data/skills/bio-single-cell-multimodal-integration/SKILL.md +242 -0
  327. package/data/skills/bio-single-cell-perturb-seq/SKILL.md +258 -0
  328. package/data/skills/bio-single-cell-preprocessing/SKILL.md +338 -0
  329. package/data/skills/bio-single-cell-scatac-analysis/SKILL.md +326 -0
  330. package/data/skills/bio-single-cell-splicing/SKILL.md +199 -0
  331. package/data/skills/bio-single-cell-trajectory-inference/SKILL.md +225 -0
  332. package/data/skills/bio-small-rna-seq-differential-mirna/SKILL.md +194 -0
  333. package/data/skills/bio-small-rna-seq-mirdeep2-analysis/SKILL.md +180 -0
  334. package/data/skills/bio-small-rna-seq-mirge3-analysis/SKILL.md +178 -0
  335. package/data/skills/bio-small-rna-seq-smrna-preprocessing/SKILL.md +174 -0
  336. package/data/skills/bio-small-rna-seq-target-prediction/SKILL.md +202 -0
  337. package/data/skills/bio-spatial-transcriptomics-image-analysis/SKILL.md +283 -0
  338. package/data/skills/bio-spatial-transcriptomics-spatial-communication/SKILL.md +299 -0
  339. package/data/skills/bio-spatial-transcriptomics-spatial-data-io/SKILL.md +272 -0
  340. package/data/skills/bio-spatial-transcriptomics-spatial-deconvolution/SKILL.md +314 -0
  341. package/data/skills/bio-spatial-transcriptomics-spatial-domains/SKILL.md +254 -0
  342. package/data/skills/bio-spatial-transcriptomics-spatial-multiomics/SKILL.md +181 -0
  343. package/data/skills/bio-spatial-transcriptomics-spatial-neighbors/SKILL.md +198 -0
  344. package/data/skills/bio-spatial-transcriptomics-spatial-preprocessing/SKILL.md +269 -0
  345. package/data/skills/bio-spatial-transcriptomics-spatial-proteomics/SKILL.md +124 -0
  346. package/data/skills/bio-spatial-transcriptomics-spatial-statistics/SKILL.md +237 -0
  347. package/data/skills/bio-spatial-transcriptomics-spatial-visualization/SKILL.md +287 -0
  348. package/data/skills/bio-splicing-pipeline/SKILL.md +253 -0
  349. package/data/skills/bio-splicing-qc/SKILL.md +190 -0
  350. package/data/skills/bio-splicing-quantification/SKILL.md +145 -0
  351. package/data/skills/bio-sra-data/SKILL.md +363 -0
  352. package/data/skills/bio-structural-biology-alphafold-predictions/SKILL.md +258 -0
  353. package/data/skills/bio-structural-biology-modern-structure-prediction/SKILL.md +346 -0
  354. package/data/skills/bio-substructure-search/SKILL.md +206 -0
  355. package/data/skills/bio-systems-biology-context-specific-models/SKILL.md +241 -0
  356. package/data/skills/bio-systems-biology-flux-balance-analysis/SKILL.md +206 -0
  357. package/data/skills/bio-systems-biology-gene-essentiality/SKILL.md +235 -0
  358. package/data/skills/bio-systems-biology-metabolic-reconstruction/SKILL.md +215 -0
  359. package/data/skills/bio-systems-biology-model-curation/SKILL.md +243 -0
  360. package/data/skills/bio-tcr-bcr-analysis-immcantation-analysis/SKILL.md +195 -0
  361. package/data/skills/bio-tcr-bcr-analysis-mixcr-analysis/SKILL.md +167 -0
  362. package/data/skills/bio-tcr-bcr-analysis-repertoire-visualization/SKILL.md +224 -0
  363. package/data/skills/bio-tcr-bcr-analysis-scirpy-analysis/SKILL.md +168 -0
  364. package/data/skills/bio-tcr-bcr-analysis-vdjtools-analysis/SKILL.md +188 -0
  365. package/data/skills/bio-transcription-translation/SKILL.md +237 -0
  366. package/data/skills/bio-tumor-fraction-estimation/SKILL.md +211 -0
  367. package/data/skills/bio-uniprot-access/SKILL.md +239 -0
  368. package/data/skills/bio-variant-annotation/SKILL.md +410 -0
  369. package/data/skills/bio-variant-calling/SKILL.md +266 -0
  370. package/data/skills/bio-variant-calling-clinical-interpretation/SKILL.md +355 -0
  371. package/data/skills/bio-variant-calling-deepvariant/SKILL.md +315 -0
  372. package/data/skills/bio-variant-calling-filtering-best-practices/SKILL.md +403 -0
  373. package/data/skills/bio-variant-calling-joint-calling/SKILL.md +338 -0
  374. package/data/skills/bio-variant-calling-structural-variant-calling/SKILL.md +253 -0
  375. package/data/skills/bio-variant-normalization/SKILL.md +325 -0
  376. package/data/skills/bio-vcf-basics/SKILL.md +342 -0
  377. package/data/skills/bio-vcf-manipulation/SKILL.md +429 -0
  378. package/data/skills/bio-vcf-statistics/SKILL.md +445 -0
  379. package/data/skills/bio-virtual-screening/SKILL.md +263 -0
  380. package/data/skills/bio-workflow-management-cwl-workflows/SKILL.md +433 -0
  381. package/data/skills/bio-workflow-management-nextflow-pipelines/SKILL.md +386 -0
  382. package/data/skills/bio-workflow-management-snakemake-workflows/SKILL.md +383 -0
  383. package/data/skills/bio-workflow-management-wdl-workflows/SKILL.md +500 -0
  384. package/data/skills/bio-workflows-atacseq-pipeline/SKILL.md +362 -0
  385. package/data/skills/bio-workflows-biomarker-pipeline/SKILL.md +272 -0
  386. package/data/skills/bio-workflows-chipseq-pipeline/SKILL.md +282 -0
  387. package/data/skills/bio-workflows-clip-pipeline/SKILL.md +268 -0
  388. package/data/skills/bio-workflows-cnv-pipeline/SKILL.md +324 -0
  389. package/data/skills/bio-workflows-crispr-editing-pipeline/SKILL.md +455 -0
  390. package/data/skills/bio-workflows-crispr-screen-pipeline/SKILL.md +278 -0
  391. package/data/skills/bio-workflows-cytometry-pipeline/SKILL.md +328 -0
  392. package/data/skills/bio-workflows-expression-to-pathways/SKILL.md +329 -0
  393. package/data/skills/bio-workflows-fastq-to-variants/SKILL.md +374 -0
  394. package/data/skills/bio-workflows-genome-assembly-pipeline/SKILL.md +290 -0
  395. package/data/skills/bio-workflows-gwas-pipeline/SKILL.md +323 -0
  396. package/data/skills/bio-workflows-hic-pipeline/SKILL.md +304 -0
  397. package/data/skills/bio-workflows-imc-pipeline/SKILL.md +304 -0
  398. package/data/skills/bio-workflows-longread-sv-pipeline/SKILL.md +281 -0
  399. package/data/skills/bio-workflows-merip-pipeline/SKILL.md +222 -0
  400. package/data/skills/bio-workflows-metabolic-modeling-pipeline/SKILL.md +408 -0
  401. package/data/skills/bio-workflows-metabolomics-pipeline/SKILL.md +297 -0
  402. package/data/skills/bio-workflows-metagenomics-pipeline/SKILL.md +283 -0
  403. package/data/skills/bio-workflows-methylation-pipeline/SKILL.md +274 -0
  404. package/data/skills/bio-workflows-microbiome-pipeline/SKILL.md +221 -0
  405. package/data/skills/bio-workflows-multi-omics-pipeline/SKILL.md +362 -0
  406. package/data/skills/bio-workflows-multiome-pipeline/SKILL.md +298 -0
  407. package/data/skills/bio-workflows-neoantigen-pipeline/SKILL.md +325 -0
  408. package/data/skills/bio-workflows-outbreak-pipeline/SKILL.md +341 -0
  409. package/data/skills/bio-workflows-proteomics-pipeline/SKILL.md +226 -0
  410. package/data/skills/bio-workflows-riboseq-pipeline/SKILL.md +94 -0
  411. package/data/skills/bio-workflows-rnaseq-to-de/SKILL.md +345 -0
  412. package/data/skills/bio-workflows-scrnaseq-pipeline/SKILL.md +354 -0
  413. package/data/skills/bio-workflows-smrna-pipeline/SKILL.md +86 -0
  414. package/data/skills/bio-workflows-somatic-variant-pipeline/SKILL.md +313 -0
  415. package/data/skills/bio-workflows-spatial-pipeline/SKILL.md +267 -0
  416. package/data/skills/bio-workflows-tcr-pipeline/SKILL.md +84 -0
  417. package/data/skills/bio-write-sequences/SKILL.md +205 -0
  418. package/data/skills/bioinformatics-singlecell/SKILL.md +143 -0
  419. package/data/skills/biokernel/SKILL.md +61 -0
  420. package/data/skills/biologist-analyst/SKILL.md +799 -0
  421. package/data/skills/biomaster-workflows/SKILL.md +55 -0
  422. package/data/skills/biomcp-server/SKILL.md +65 -0
  423. package/data/skills/biomedical-data-analysis/SKILL.md +56 -0
  424. package/data/skills/biomedical-search/SKILL.md +214 -0
  425. package/data/skills/biomni/SKILL.md +309 -0
  426. package/data/skills/biomni-general-agent/SKILL.md +43 -0
  427. package/data/skills/biomni-research-agent/SKILL.md +76 -0
  428. package/data/skills/biopython/SKILL.md +437 -0
  429. package/data/skills/biorxiv-database/SKILL.md +477 -0
  430. package/data/skills/bioservices/SKILL.md +355 -0
  431. package/data/skills/boltz/SKILL.md +188 -0
  432. package/data/skills/boltzgen/SKILL.md +287 -0
  433. package/data/skills/bone-marrow-ai-agent/SKILL.md +163 -0
  434. package/data/skills/brainstorming/SKILL.md +96 -0
  435. package/data/skills/brenda-database/SKILL.md +714 -0
  436. package/data/skills/bulk-combat-correction/SKILL.md +54 -0
  437. package/data/skills/bulk-deg-analysis/SKILL.md +61 -0
  438. package/data/skills/bulk-deseq2-analysis/SKILL.md +50 -0
  439. package/data/skills/bulk-stringdb-ppi/SKILL.md +49 -0
  440. package/data/skills/bulk-to-single-deconvolution/SKILL.md +50 -0
  441. package/data/skills/bulk-trajblend-interpolation/SKILL.md +52 -0
  442. package/data/skills/bulk-wgcna-analysis/SKILL.md +56 -0
  443. package/data/skills/cancer-metabolism-agent/SKILL.md +180 -0
  444. package/data/skills/care-coordination/SKILL.md +35 -0
  445. package/data/skills/cart-design-optimizer-agent/SKILL.md +162 -0
  446. package/data/skills/cbioportal-database/SKILL.md +367 -0
  447. package/data/skills/cell-free-expression/SKILL.md +291 -0
  448. package/data/skills/cellagent-annotation/SKILL.md +69 -0
  449. package/data/skills/cellfree-rna-agent/SKILL.md +182 -0
  450. package/data/skills/cellular-senescence-agent/SKILL.md +183 -0
  451. package/data/skills/cellxgene-census/SKILL.md +505 -0
  452. package/data/skills/chai/SKILL.md +272 -0
  453. package/data/skills/chatehr-clinician-assistant/SKILL.md +67 -0
  454. package/data/skills/chematagent-drug-discovery/SKILL.md +68 -0
  455. package/data/skills/chembl-database/SKILL.md +383 -0
  456. package/data/skills/chembl-search/SKILL.md +211 -0
  457. package/data/skills/chemcrow-drug-discovery/SKILL.md +61 -0
  458. package/data/skills/chemical-property-lookup/SKILL.md +42 -0
  459. package/data/skills/chemist-analyst/SKILL.md +1603 -0
  460. package/data/skills/chemistry-agent/SKILL.md +62 -0
  461. package/data/skills/chip-clonal-hematopoiesis-agent/SKILL.md +224 -0
  462. package/data/skills/chromosomal-instability-agent/SKILL.md +187 -0
  463. package/data/skills/citation-management/SKILL.md +1081 -0
  464. package/data/skills/claims-appeals/SKILL.md +35 -0
  465. package/data/skills/claw-ancestry-pca/SKILL.md +145 -0
  466. package/data/skills/claw-metagenomics/SKILL.md +238 -0
  467. package/data/skills/claw-semantic-sim/SKILL.md +151 -0
  468. package/data/skills/clinical-decision-support/SKILL.md +504 -0
  469. package/data/skills/clinical-diagnostic-reasoning/SKILL.md +222 -0
  470. package/data/skills/clinical-nlp-extractor/SKILL.md +59 -0
  471. package/data/skills/clinical-note-summarization/SKILL.md +52 -0
  472. package/data/skills/clinical-reports/SKILL.md +1127 -0
  473. package/data/skills/clinical-trial-protocol-skill/SKILL.md +508 -0
  474. package/data/skills/clinical-trials-search/SKILL.md +211 -0
  475. package/data/skills/clinicaltrials-database/SKILL.md +501 -0
  476. package/data/skills/clinpgx/SKILL.md +96 -0
  477. package/data/skills/clinpgx-database/SKILL.md +632 -0
  478. package/data/skills/clinvar-database/SKILL.md +356 -0
  479. package/data/skills/cnv-caller-agent/SKILL.md +171 -0
  480. package/data/skills/coagulation-thrombosis-agent/SKILL.md +141 -0
  481. package/data/skills/cobrapy/SKILL.md +457 -0
  482. package/data/skills/compbioagent-explorer/SKILL.md +67 -0
  483. package/data/skills/computational-pathology-agent/SKILL.md +72 -0
  484. package/data/skills/convergence-study/SKILL.md +98 -0
  485. package/data/skills/cosmic-database/SKILL.md +330 -0
  486. package/data/skills/crisis-detection-intervention-ai/SKILL.md +569 -0
  487. package/data/skills/crisis-response-protocol/SKILL.md +456 -0
  488. package/data/skills/crispr-guide-design/SKILL.md +72 -0
  489. package/data/skills/crispr-offtarget-predictor/SKILL.md +56 -0
  490. package/data/skills/cryoem-ai-drug-design-agent/SKILL.md +216 -0
  491. package/data/skills/ctdna-dynamics-mrd-agent/SKILL.md +206 -0
  492. package/data/skills/cytokine-storm-analysis-agent/SKILL.md +180 -0
  493. package/data/skills/dask/SKILL.md +454 -0
  494. package/data/skills/data-stats-analysis/SKILL.md +477 -0
  495. package/data/skills/data-transform/SKILL.md +576 -0
  496. package/data/skills/data-visualization-biomedical/SKILL.md +252 -0
  497. package/data/skills/data-visualization-expert/SKILL.md +72 -0
  498. package/data/skills/data-viz-plots/SKILL.md +461 -0
  499. package/data/skills/datacommons-client/SKILL.md +253 -0
  500. package/data/skills/datamol/SKILL.md +700 -0
  501. package/data/skills/deep-research/SKILL.md +111 -0
  502. package/data/skills/deep-research-swarm/SKILL.md +62 -0
  503. package/data/skills/deep-visual-proteomics-agent/SKILL.md +149 -0
  504. package/data/skills/deepchem/SKILL.md +591 -0
  505. package/data/skills/deeptools/SKILL.md +525 -0
  506. package/data/skills/depmap/SKILL.md +300 -0
  507. package/data/skills/diffdock/SKILL.md +477 -0
  508. package/data/skills/differentiation-schemes/SKILL.md +159 -0
  509. package/data/skills/digital-twin-clinical-agent/SKILL.md +228 -0
  510. package/data/skills/dispatching-parallel-agents/SKILL.md +180 -0
  511. package/data/skills/dnanexus-integration/SKILL.md +376 -0
  512. package/data/skills/doc-coauthoring/SKILL.md +375 -0
  513. package/data/skills/docx/SKILL.md +590 -0
  514. package/data/skills/docx-official/SKILL.md +197 -0
  515. package/data/skills/drug-discovery-search/SKILL.md +214 -0
  516. package/data/skills/drug-interaction-checker/SKILL.md +56 -0
  517. package/data/skills/drug-labels-search/SKILL.md +211 -0
  518. package/data/skills/drug-photo/SKILL.md +149 -0
  519. package/data/skills/drugbank-database/SKILL.md +184 -0
  520. package/data/skills/drugbank-search/SKILL.md +211 -0
  521. package/data/skills/ehr-fhir-integration/SKILL.md +60 -0
  522. package/data/skills/emergency-card/SKILL.md +426 -0
  523. package/data/skills/ena-database/SKILL.md +198 -0
  524. package/data/skills/ensembl-database/SKILL.md +305 -0
  525. package/data/skills/epidemiologist-analyst/SKILL.md +1844 -0
  526. package/data/skills/epigenomics-methylgpt-agent/SKILL.md +111 -0
  527. package/data/skills/equity-scorer/SKILL.md +182 -0
  528. package/data/skills/esm/SKILL.md +300 -0
  529. package/data/skills/etetoolkit/SKILL.md +617 -0
  530. package/data/skills/executing-plans/SKILL.md +84 -0
  531. package/data/skills/exosome-ev-analysis-agent/SKILL.md +171 -0
  532. package/data/skills/exploratory-data-analysis/SKILL.md +440 -0
  533. package/data/skills/family-health-analyzer/SKILL.md +137 -0
  534. package/data/skills/fastq-analysis/SKILL.md +191 -0
  535. package/data/skills/fda-database/SKILL.md +512 -0
  536. package/data/skills/fhir-developer-skill/SKILL.md +294 -0
  537. package/data/skills/fhir-development/SKILL.md +35 -0
  538. package/data/skills/find-skills/SKILL.md +133 -0
  539. package/data/skills/finishing-a-development-branch/SKILL.md +200 -0
  540. package/data/skills/fitness-analyzer/SKILL.md +431 -0
  541. package/data/skills/flowio/SKILL.md +602 -0
  542. package/data/skills/foldseek/SKILL.md +179 -0
  543. package/data/skills/galaxy-bridge/SKILL.md +215 -0
  544. package/data/skills/gene-database/SKILL.md +173 -0
  545. package/data/skills/gene-panel-design-agent/SKILL.md +192 -0
  546. package/data/skills/geniml/SKILL.md +312 -0
  547. package/data/skills/genome-compare/SKILL.md +127 -0
  548. package/data/skills/geo-database/SKILL.md +809 -0
  549. package/data/skills/geopandas/SKILL.md +245 -0
  550. package/data/skills/gget/SKILL.md +865 -0
  551. package/data/skills/ginkgo-cloud-lab/SKILL.md +56 -0
  552. package/data/skills/glycoengineering/SKILL.md +338 -0
  553. package/data/skills/gnomad-database/SKILL.md +395 -0
  554. package/data/skills/goal-analyzer/SKILL.md +605 -0
  555. package/data/skills/grief-companion/SKILL.md +250 -0
  556. package/data/skills/gsea-enrichment/SKILL.md +151 -0
  557. package/data/skills/gtars/SKILL.md +279 -0
  558. package/data/skills/gtex-database/SKILL.md +315 -0
  559. package/data/skills/gwas-database/SKILL.md +602 -0
  560. package/data/skills/gwas-lookup/SKILL.md +122 -0
  561. package/data/skills/gwas-prs/SKILL.md +178 -0
  562. package/data/skills/health-trend-analyzer/SKILL.md +451 -0
  563. package/data/skills/hemoglobinopathy-analysis-agent/SKILL.md +167 -0
  564. package/data/skills/hipaa-compliance/SKILL.md +230 -0
  565. package/data/skills/histolab/SKILL.md +672 -0
  566. package/data/skills/hmdb-database/SKILL.md +190 -0
  567. package/data/skills/hrd-analysis-agent/SKILL.md +184 -0
  568. package/data/skills/hrv-alexithymia-expert/SKILL.md +151 -0
  569. package/data/skills/hypogenic/SKILL.md +649 -0
  570. package/data/skills/hypothesis-generation/SKILL.md +286 -0
  571. package/data/skills/imaging-data-commons/SKILL.md +843 -0
  572. package/data/skills/immune-checkpoint-combination-agent/SKILL.md +170 -0
  573. package/data/skills/infographics/SKILL.md +563 -0
  574. package/data/skills/instrument-data-to-allotrope/SKILL.md +280 -0
  575. package/data/skills/interpro-database/SKILL.md +305 -0
  576. package/data/skills/ipsae/SKILL.md +190 -0
  577. package/data/skills/iso-13485-certification/SKILL.md +678 -0
  578. package/data/skills/jaspar-database/SKILL.md +351 -0
  579. package/data/skills/jungian-psychologist/SKILL.md +191 -0
  580. package/data/skills/kegg-database/SKILL.md +371 -0
  581. package/data/skills/knowledge-synthesis/SKILL.md +283 -0
  582. package/data/skills/kragen-knowledge-graph/SKILL.md +68 -0
  583. package/data/skills/lab-results/SKILL.md +35 -0
  584. package/data/skills/labarchive-integration/SKILL.md +262 -0
  585. package/data/skills/labstep/SKILL.md +208 -0
  586. package/data/skills/lamindb/SKILL.md +384 -0
  587. package/data/skills/latchbio-integration/SKILL.md +347 -0
  588. package/data/skills/latex-posters/SKILL.md +1602 -0
  589. package/data/skills/leads-literature-mining/SKILL.md +68 -0
  590. package/data/skills/ligandmpnn/SKILL.md +170 -0
  591. package/data/skills/linear-solvers/SKILL.md +165 -0
  592. package/data/skills/liquid-biopsy-analytics-agent/SKILL.md +171 -0
  593. package/data/skills/lit-synthesizer/SKILL.md +53 -0
  594. package/data/skills/literature-review/SKILL.md +584 -0
  595. package/data/skills/literature-search/SKILL.md +214 -0
  596. package/data/skills/lobster-bioinformatics/SKILL.md +305 -0
  597. package/data/skills/long-read-sequencing-agent/SKILL.md +181 -0
  598. package/data/skills/mage-antibody-generator/SKILL.md +54 -0
  599. package/data/skills/markdown-mermaid-writing/SKILL.md +327 -0
  600. package/data/skills/markitdown/SKILL.md +486 -0
  601. package/data/skills/matchms/SKILL.md +197 -0
  602. package/data/skills/matplotlib/SKILL.md +359 -0
  603. package/data/skills/mcpmed-bioinformatics-server/SKILL.md +42 -0
  604. package/data/skills/medchem/SKILL.md +400 -0
  605. package/data/skills/medea-therapeutic-discovery/SKILL.md +45 -0
  606. package/data/skills/medical-entity-extractor/SKILL.md +144 -0
  607. package/data/skills/medical-imaging-review/SKILL.md +170 -0
  608. package/data/skills/medical-research-toolkit/SKILL.md +273 -0
  609. package/data/skills/medrxiv-search/SKILL.md +211 -0
  610. package/data/skills/mental-health-analyzer/SKILL.md +981 -0
  611. package/data/skills/mesh-generation/SKILL.md +149 -0
  612. package/data/skills/metabolomics-workbench-database/SKILL.md +253 -0
  613. package/data/skills/microbiome-cancer-agent/SKILL.md +180 -0
  614. package/data/skills/modern-drug-rehab-computer/SKILL.md +392 -0
  615. package/data/skills/molecular-dynamics/SKILL.md +457 -0
  616. package/data/skills/molecular-glue-discovery-agent/SKILL.md +224 -0
  617. package/data/skills/molecule-evolution-agent/SKILL.md +62 -0
  618. package/data/skills/molfeat/SKILL.md +505 -0
  619. package/data/skills/monarch-database/SKILL.md +372 -0
  620. package/data/skills/mpn-progression-monitor-agent/SKILL.md +228 -0
  621. package/data/skills/mpn-research-assistant/SKILL.md +197 -0
  622. package/data/skills/mrd-edge-detection-agent/SKILL.md +213 -0
  623. package/data/skills/multi-ancestry-prs-agent/SKILL.md +224 -0
  624. package/data/skills/multi-search-engine/SKILL.md +110 -0
  625. package/data/skills/multimodal-medical-imaging/SKILL.md +59 -0
  626. package/data/skills/multimodal-radpath-fusion-agent/SKILL.md +213 -0
  627. package/data/skills/myeloma-mrd-agent/SKILL.md +184 -0
  628. package/data/skills/networkx/SKILL.md +435 -0
  629. package/data/skills/neurokit2/SKILL.md +350 -0
  630. package/data/skills/neuropixels-analysis/SKILL.md +344 -0
  631. package/data/skills/nextflow-development/SKILL.md +290 -0
  632. package/data/skills/ngs-analysis/SKILL.md +183 -0
  633. package/data/skills/nicheformer-spatial-agent/SKILL.md +197 -0
  634. package/data/skills/nk-cell-therapy-agent/SKILL.md +186 -0
  635. package/data/skills/nonlinear-solvers/SKILL.md +180 -0
  636. package/data/skills/numerical-integration/SKILL.md +166 -0
  637. package/data/skills/numerical-stability/SKILL.md +149 -0
  638. package/data/skills/nutrition-analyzer/SKILL.md +775 -0
  639. package/data/skills/occupational-health-analyzer/SKILL.md +386 -0
  640. package/data/skills/omero-integration/SKILL.md +245 -0
  641. package/data/skills/ontology-explorer/SKILL.md +168 -0
  642. package/data/skills/ontology-mapper/SKILL.md +171 -0
  643. package/data/skills/ontology-validator/SKILL.md +136 -0
  644. package/data/skills/open-notebook/SKILL.md +289 -0
  645. package/data/skills/open-targets-search/SKILL.md +211 -0
  646. package/data/skills/openalex-database/SKILL.md +488 -0
  647. package/data/skills/opentargets-database/SKILL.md +367 -0
  648. package/data/skills/opentrons-integration/SKILL.md +567 -0
  649. package/data/skills/opentrons-protocol-agent/SKILL.md +58 -0
  650. package/data/skills/organoid-drug-response-agent/SKILL.md +189 -0
  651. package/data/skills/pan-cancer-multiomics-agent/SKILL.md +159 -0
  652. package/data/skills/paper-2-web/SKILL.md +495 -0
  653. package/data/skills/parameter-optimization/SKILL.md +141 -0
  654. package/data/skills/patents-search/SKILL.md +211 -0
  655. package/data/skills/pathml/SKILL.md +160 -0
  656. package/data/skills/patiently-ai/SKILL.md +103 -0
  657. package/data/skills/pdb/SKILL.md +217 -0
  658. package/data/skills/pdb-database/SKILL.md +303 -0
  659. package/data/skills/pdf/SKILL.md +314 -0
  660. package/data/skills/pdf-anthropic/SKILL.md +294 -0
  661. package/data/skills/pdf-processing/SKILL.md +149 -0
  662. package/data/skills/pdf-processing-pro/SKILL.md +296 -0
  663. package/data/skills/pdx-model-analysis-agent/SKILL.md +169 -0
  664. package/data/skills/peer-review/SKILL.md +565 -0
  665. package/data/skills/performance-profiling/SKILL.md +255 -0
  666. package/data/skills/perplexity-search/SKILL.md +441 -0
  667. package/data/skills/pharmacogenomics-agent/SKILL.md +143 -0
  668. package/data/skills/pharmgx-reporter/SKILL.md +134 -0
  669. package/data/skills/phylogenetics/SKILL.md +404 -0
  670. package/data/skills/plotly/SKILL.md +265 -0
  671. package/data/skills/polars/SKILL.md +385 -0
  672. package/data/skills/popeve-variant-predictor-agent/SKILL.md +213 -0
  673. package/data/skills/post-processing/SKILL.md +338 -0
  674. package/data/skills/pptx/SKILL.md +232 -0
  675. package/data/skills/pptx-official/SKILL.md +484 -0
  676. package/data/skills/pptx-posters/SKILL.md +414 -0
  677. package/data/skills/precision-oncology-agent/SKILL.md +53 -0
  678. package/data/skills/prior-auth-coworker/SKILL.md +60 -0
  679. package/data/skills/prior-auth-review-skill/SKILL.md +360 -0
  680. package/data/skills/profile-report/SKILL.md +120 -0
  681. package/data/skills/protac-design-agent/SKILL.md +220 -0
  682. package/data/skills/protein-design-workflow/SKILL.md +199 -0
  683. package/data/skills/protein-qc/SKILL.md +300 -0
  684. package/data/skills/protein-structure-prediction/SKILL.md +59 -0
  685. package/data/skills/proteinmpnn/SKILL.md +279 -0
  686. package/data/skills/protocolsio-integration/SKILL.md +415 -0
  687. package/data/skills/prs-net-deep-learning-agent/SKILL.md +232 -0
  688. package/data/skills/psychologist-analyst/SKILL.md +1888 -0
  689. package/data/skills/pubchem-database/SKILL.md +568 -0
  690. package/data/skills/pubmed-database/SKILL.md +454 -0
  691. package/data/skills/pubmed-search/SKILL.md +103 -0
  692. package/data/skills/pydeseq2/SKILL.md +553 -0
  693. package/data/skills/pydicom/SKILL.md +428 -0
  694. package/data/skills/pyhealth/SKILL.md +485 -0
  695. package/data/skills/pylabrobot/SKILL.md +179 -0
  696. package/data/skills/pymc/SKILL.md +566 -0
  697. package/data/skills/pymoo/SKILL.md +565 -0
  698. package/data/skills/pyopenms/SKILL.md +211 -0
  699. package/data/skills/pysam/SKILL.md +259 -0
  700. package/data/skills/pytdc/SKILL.md +454 -0
  701. package/data/skills/pytorch-lightning/SKILL.md +172 -0
  702. package/data/skills/pyzotero/SKILL.md +111 -0
  703. package/data/skills/radgpt-radiology-reporter/SKILL.md +67 -0
  704. package/data/skills/radiomics-pathomics-fusion-agent/SKILL.md +221 -0
  705. package/data/skills/rdkit/SKILL.md +763 -0
  706. package/data/skills/reactome-database/SKILL.md +272 -0
  707. package/data/skills/receiving-code-review/SKILL.md +213 -0
  708. package/data/skills/recovery-community-moderator/SKILL.md +175 -0
  709. package/data/skills/regulatory-drafter/SKILL.md +56 -0
  710. package/data/skills/regulatory-drafting/SKILL.md +35 -0
  711. package/data/skills/rehabilitation-analyzer/SKILL.md +636 -0
  712. package/data/skills/repro-enforcer/SKILL.md +50 -0
  713. package/data/skills/requesting-code-review/SKILL.md +105 -0
  714. package/data/skills/research-grants/SKILL.md +935 -0
  715. package/data/skills/research-literature/SKILL.md +35 -0
  716. package/data/skills/research-lookup/SKILL.md +502 -0
  717. package/data/skills/rfdiffusion/SKILL.md +306 -0
  718. package/data/skills/rna-velocity-agent/SKILL.md +174 -0
  719. package/data/skills/scanpy/SKILL.md +380 -0
  720. package/data/skills/scfoundation-model-agent/SKILL.md +210 -0
  721. package/data/skills/scientific-brainstorming/SKILL.md +185 -0
  722. package/data/skills/scientific-critical-thinking/SKILL.md +566 -0
  723. package/data/skills/scientific-manuscript/SKILL.md +181 -0
  724. package/data/skills/scientific-problem-selection/SKILL.md +269 -0
  725. package/data/skills/scientific-schematics/SKILL.md +619 -0
  726. package/data/skills/scientific-slides/SKILL.md +1154 -0
  727. package/data/skills/scientific-visualization/SKILL.md +773 -0
  728. package/data/skills/scientific-writing/SKILL.md +483 -0
  729. package/data/skills/scikit-bio/SKILL.md +431 -0
  730. package/data/skills/scikit-learn/SKILL.md +515 -0
  731. package/data/skills/scikit-survival/SKILL.md +393 -0
  732. package/data/skills/scrna-orchestrator/SKILL.md +204 -0
  733. package/data/skills/scrna-qc/SKILL.md +43 -0
  734. package/data/skills/scvelo/SKILL.md +321 -0
  735. package/data/skills/scvi-tools/SKILL.md +184 -0
  736. package/data/skills/seaborn/SKILL.md +671 -0
  737. package/data/skills/search-strategy/SKILL.md +247 -0
  738. package/data/skills/seq-wrangler/SKILL.md +58 -0
  739. package/data/skills/shap/SKILL.md +560 -0
  740. package/data/skills/simo-multiomics-integration-agent/SKILL.md +178 -0
  741. package/data/skills/simpy/SKILL.md +423 -0
  742. package/data/skills/simulation-orchestrator/SKILL.md +230 -0
  743. package/data/skills/simulation-validator/SKILL.md +195 -0
  744. package/data/skills/single-annotation/SKILL.md +129 -0
  745. package/data/skills/single-cell-rna-qc/SKILL.md +175 -0
  746. package/data/skills/single-cellphone-db/SKILL.md +68 -0
  747. package/data/skills/single-clustering/SKILL.md +75 -0
  748. package/data/skills/single-downstream-analysis/SKILL.md +150 -0
  749. package/data/skills/single-multiomics/SKILL.md +44 -0
  750. package/data/skills/single-preprocessing/SKILL.md +184 -0
  751. package/data/skills/single-to-spatial-mapping/SKILL.md +48 -0
  752. package/data/skills/single-trajectory/SKILL.md +62 -0
  753. package/data/skills/sleep-analyzer/SKILL.md +773 -0
  754. package/data/skills/slurm-job-script-generator/SKILL.md +135 -0
  755. package/data/skills/solublempnn/SKILL.md +165 -0
  756. package/data/skills/spatial-agent/SKILL.md +56 -0
  757. package/data/skills/spatial-epigenomics-agent/SKILL.md +163 -0
  758. package/data/skills/spatial-transcriptomics-agent/SKILL.md +75 -0
  759. package/data/skills/spatial-transcriptomics-analysis/SKILL.md +72 -0
  760. package/data/skills/spatial-transcriptomics-analysis/STAgent/SKILL.md +75 -0
  761. package/data/skills/spatial-transcriptomics-analysis/SpatialAgent/SKILL.md +56 -0
  762. package/data/skills/spatial-transcriptomics-analysis/bioSkills/image-analysis/SKILL.md +266 -0
  763. package/data/skills/spatial-transcriptomics-analysis/bioSkills/spatial-communication/SKILL.md +287 -0
  764. package/data/skills/spatial-transcriptomics-analysis/bioSkills/spatial-data-io/SKILL.md +243 -0
  765. package/data/skills/spatial-transcriptomics-analysis/bioSkills/spatial-deconvolution/SKILL.md +298 -0
  766. package/data/skills/spatial-transcriptomics-analysis/bioSkills/spatial-domains/SKILL.md +229 -0
  767. package/data/skills/spatial-transcriptomics-analysis/bioSkills/spatial-multiomics/SKILL.md +172 -0
  768. package/data/skills/spatial-transcriptomics-analysis/bioSkills/spatial-neighbors/SKILL.md +189 -0
  769. package/data/skills/spatial-transcriptomics-analysis/bioSkills/spatial-preprocessing/SKILL.md +232 -0
  770. package/data/skills/spatial-transcriptomics-analysis/bioSkills/spatial-proteomics/SKILL.md +127 -0
  771. package/data/skills/spatial-transcriptomics-analysis/bioSkills/spatial-statistics/SKILL.md +225 -0
  772. package/data/skills/spatial-transcriptomics-analysis/bioSkills/spatial-visualization/SKILL.md +270 -0
  773. package/data/skills/spatial-tutorials/SKILL.md +87 -0
  774. package/data/skills/speech-pathology-ai/SKILL.md +184 -0
  775. package/data/skills/statistical-analysis/SKILL.md +626 -0
  776. package/data/skills/statsmodels/SKILL.md +608 -0
  777. package/data/skills/string-database/SKILL.md +528 -0
  778. package/data/skills/struct-predictor/SKILL.md +52 -0
  779. package/data/skills/subagent-driven-development/SKILL.md +242 -0
  780. package/data/skills/systematic-debugging/SKILL.md +296 -0
  781. package/data/skills/tcell-exhaustion-analysis-agent/SKILL.md +139 -0
  782. package/data/skills/tcga-preprocessing/SKILL.md +49 -0
  783. package/data/skills/tcm-constitution-analyzer/SKILL.md +664 -0
  784. package/data/skills/tcr-pmhc-prediction-agent/SKILL.md +226 -0
  785. package/data/skills/tcr-repertoire-analysis-agent/SKILL.md +218 -0
  786. package/data/skills/test-driven-development/SKILL.md +371 -0
  787. package/data/skills/tiledbvcf/SKILL.md +459 -0
  788. package/data/skills/time-resolved-cryoem-agent/SKILL.md +223 -0
  789. package/data/skills/time-stepping/SKILL.md +140 -0
  790. package/data/skills/timesfm-forecasting/SKILL.md +785 -0
  791. package/data/skills/tme-immune-profiling-agent/SKILL.md +220 -0
  792. package/data/skills/tooluniverse-adverse-event-detection/SKILL.md +1115 -0
  793. package/data/skills/tooluniverse-antibody-engineering/SKILL.md +1581 -0
  794. package/data/skills/tooluniverse-binder-discovery/SKILL.md +1459 -0
  795. package/data/skills/tooluniverse-cancer-variant-interpretation/SKILL.md +971 -0
  796. package/data/skills/tooluniverse-chemical-compound-retrieval/SKILL.md +322 -0
  797. package/data/skills/tooluniverse-chemical-safety/SKILL.md +733 -0
  798. package/data/skills/tooluniverse-clinical-guidelines/SKILL.md +399 -0
  799. package/data/skills/tooluniverse-clinical-trial-design/SKILL.md +1195 -0
  800. package/data/skills/tooluniverse-clinical-trial-matching/SKILL.md +1333 -0
  801. package/data/skills/tooluniverse-crispr-screen-analysis/SKILL.md +900 -0
  802. package/data/skills/tooluniverse-disease-research/SKILL.md +630 -0
  803. package/data/skills/tooluniverse-drug-drug-interaction/SKILL.md +73 -0
  804. package/data/skills/tooluniverse-drug-repurposing/SKILL.md +595 -0
  805. package/data/skills/tooluniverse-drug-research/SKILL.md +1642 -0
  806. package/data/skills/tooluniverse-drug-target-validation/SKILL.md +1206 -0
  807. package/data/skills/tooluniverse-epigenomics/SKILL.md +1489 -0
  808. package/data/skills/tooluniverse-expression-data-retrieval/SKILL.md +389 -0
  809. package/data/skills/tooluniverse-gene-enrichment/SKILL.md +402 -0
  810. package/data/skills/tooluniverse-gwas-drug-discovery/SKILL.md +576 -0
  811. package/data/skills/tooluniverse-gwas-finemapping/SKILL.md +309 -0
  812. package/data/skills/tooluniverse-gwas-snp-interpretation/SKILL.md +223 -0
  813. package/data/skills/tooluniverse-gwas-study-explorer/SKILL.md +342 -0
  814. package/data/skills/tooluniverse-gwas-trait-to-gene/SKILL.md +236 -0
  815. package/data/skills/tooluniverse-image-analysis/SKILL.md +439 -0
  816. package/data/skills/tooluniverse-immune-repertoire-analysis/SKILL.md +949 -0
  817. package/data/skills/tooluniverse-immunotherapy-response-prediction/SKILL.md +865 -0
  818. package/data/skills/tooluniverse-infectious-disease/SKILL.md +749 -0
  819. package/data/skills/tooluniverse-literature-deep-research/SKILL.md +1050 -0
  820. package/data/skills/tooluniverse-metabolomics/SKILL.md +298 -0
  821. package/data/skills/tooluniverse-metabolomics-analysis/SKILL.md +764 -0
  822. package/data/skills/tooluniverse-multi-omics-integration/SKILL.md +703 -0
  823. package/data/skills/tooluniverse-multiomic-disease-characterization/SKILL.md +1138 -0
  824. package/data/skills/tooluniverse-network-pharmacology/SKILL.md +1312 -0
  825. package/data/skills/tooluniverse-pharmacovigilance/SKILL.md +807 -0
  826. package/data/skills/tooluniverse-phylogenetics/SKILL.md +461 -0
  827. package/data/skills/tooluniverse-polygenic-risk-score/SKILL.md +397 -0
  828. package/data/skills/tooluniverse-precision-medicine-stratification/SKILL.md +1143 -0
  829. package/data/skills/tooluniverse-precision-oncology/SKILL.md +1091 -0
  830. package/data/skills/tooluniverse-protein-interactions/SKILL.md +446 -0
  831. package/data/skills/tooluniverse-protein-structure-retrieval/SKILL.md +416 -0
  832. package/data/skills/tooluniverse-protein-therapeutic-design/SKILL.md +637 -0
  833. package/data/skills/tooluniverse-proteomics-analysis/SKILL.md +843 -0
  834. package/data/skills/tooluniverse-rare-disease-diagnosis/SKILL.md +1257 -0
  835. package/data/skills/tooluniverse-rnaseq-deseq2/SKILL.md +536 -0
  836. package/data/skills/tooluniverse-sequence-retrieval/SKILL.md +419 -0
  837. package/data/skills/tooluniverse-single-cell/SKILL.md +719 -0
  838. package/data/skills/tooluniverse-spatial-omics-analysis/SKILL.md +1102 -0
  839. package/data/skills/tooluniverse-spatial-transcriptomics/SKILL.md +788 -0
  840. package/data/skills/tooluniverse-statistical-modeling/SKILL.md +557 -0
  841. package/data/skills/tooluniverse-structural-variant-analysis/SKILL.md +1356 -0
  842. package/data/skills/tooluniverse-systems-biology/SKILL.md +374 -0
  843. package/data/skills/tooluniverse-target-research/SKILL.md +1510 -0
  844. package/data/skills/tooluniverse-variant-analysis/SKILL.md +448 -0
  845. package/data/skills/tooluniverse-variant-interpretation/SKILL.md +1118 -0
  846. package/data/skills/torch-geometric/SKILL.md +674 -0
  847. package/data/skills/torch_geometric/SKILL.md +670 -0
  848. package/data/skills/torchdrug/SKILL.md +444 -0
  849. package/data/skills/tpd-ternary-complex-agent/SKILL.md +226 -0
  850. package/data/skills/transformers/SKILL.md +157 -0
  851. package/data/skills/travel-health-analyzer/SKILL.md +421 -0
  852. package/data/skills/treatment-plans/SKILL.md +1576 -0
  853. package/data/skills/trial-eligibility-agent/SKILL.md +54 -0
  854. package/data/skills/trialgpt-matching/SKILL.md +66 -0
  855. package/data/skills/tumor-clonal-evolution-agent/SKILL.md +134 -0
  856. package/data/skills/tumor-heterogeneity-agent/SKILL.md +216 -0
  857. package/data/skills/tumor-mutational-burden-agent/SKILL.md +188 -0
  858. package/data/skills/ukb-navigator/SKILL.md +113 -0
  859. package/data/skills/umap-learn/SKILL.md +473 -0
  860. package/data/skills/uniprot-database/SKILL.md +189 -0
  861. package/data/skills/universal-single-cell-annotator/SKILL.md +72 -0
  862. package/data/skills/using-git-worktrees/SKILL.md +218 -0
  863. package/data/skills/using-superpowers/SKILL.md +95 -0
  864. package/data/skills/usmle/SKILL.md +62 -0
  865. package/data/skills/uspto-database/SKILL.md +597 -0
  866. package/data/skills/vaex/SKILL.md +180 -0
  867. package/data/skills/varcadd-pathogenicity/SKILL.md +68 -0
  868. package/data/skills/variant-interpretation-acmg/SKILL.md +58 -0
  869. package/data/skills/variant-interpretation-acmg/bioSkills/clinical-interpretation/SKILL.md +334 -0
  870. package/data/skills/variant-interpretation-acmg/bioSkills/consensus-sequences/SKILL.md +343 -0
  871. package/data/skills/variant-interpretation-acmg/bioSkills/deepvariant/SKILL.md +279 -0
  872. package/data/skills/variant-interpretation-acmg/bioSkills/filtering-best-practices/SKILL.md +362 -0
  873. package/data/skills/variant-interpretation-acmg/bioSkills/gatk-variant-calling/SKILL.md +398 -0
  874. package/data/skills/variant-interpretation-acmg/bioSkills/joint-calling/SKILL.md +343 -0
  875. package/data/skills/variant-interpretation-acmg/bioSkills/structural-variant-calling/SKILL.md +256 -0
  876. package/data/skills/variant-interpretation-acmg/bioSkills/variant-annotation/SKILL.md +387 -0
  877. package/data/skills/variant-interpretation-acmg/bioSkills/variant-calling/SKILL.md +258 -0
  878. package/data/skills/variant-interpretation-acmg/bioSkills/variant-normalization/SKILL.md +304 -0
  879. package/data/skills/variant-interpretation-acmg/bioSkills/vcf-basics/SKILL.md +329 -0
  880. package/data/skills/variant-interpretation-acmg/bioSkills/vcf-manipulation/SKILL.md +398 -0
  881. package/data/skills/variant-interpretation-acmg/bioSkills/vcf-statistics/SKILL.md +424 -0
  882. package/data/skills/variant-interpretation-acmg/varCADD/SKILL.md +68 -0
  883. package/data/skills/vcf-annotator/SKILL.md +55 -0
  884. package/data/skills/verification-before-completion/SKILL.md +139 -0
  885. package/data/skills/virtual-lab-agent/SKILL.md +240 -0
  886. package/data/skills/wearable-analysis-agent/SKILL.md +70 -0
  887. package/data/skills/weightloss-analyzer/SKILL.md +320 -0
  888. package/data/skills/wellally-tech/SKILL.md +685 -0
  889. package/data/skills/wikipedia-search/SKILL.md +481 -0
  890. package/data/skills/writing-plans/SKILL.md +116 -0
  891. package/data/skills/writing-skills/SKILL.md +655 -0
  892. package/data/skills/xlsx/SKILL.md +292 -0
  893. package/data/skills/xlsx-official/SKILL.md +289 -0
  894. package/data/skills/zarr-python/SKILL.md +777 -0
  895. package/data/skills/zinc-database/SKILL.md +398 -0
  896. package/data/tools/__init__.py +8 -0
  897. package/data/tools/hpc.py +71 -0
  898. package/data/tools/hpc_client/__init__.py +8 -0
  899. package/data/tools/hpc_client/builders/__init__.py +12 -0
  900. package/data/tools/hpc_client/builders/alphafold.py +36 -0
  901. package/data/tools/hpc_client/builders/boltz.py +33 -0
  902. package/data/tools/hpc_client/builders/chai.py +30 -0
  903. package/data/tools/hpc_client/builders/immunebuilder.py +31 -0
  904. package/data/tools/hpc_client/builders/rfantibody.py +58 -0
  905. package/data/tools/hpc_client/builders/thermompnn.py +16 -0
  906. package/data/tools/hpc_client/hpc_api.py +41 -0
  907. package/data/tools/hpc_client/hpc_tools.py +218 -0
  908. package/data/tools/hpc_dynamic.py +71 -0
  909. package/data/tools/integrations/__init__.py +14 -0
  910. package/data/tools/integrations/adaptyv.py +107 -0
  911. package/data/tools/integrations/addgene.py +52 -0
  912. package/data/tools/integrations/api_internal.py +33 -0
  913. package/data/tools/molecular_biology.py +688 -0
  914. package/data/tools/pharmacology.py +67 -0
  915. package/data/workflows/bulk-omics-clustering/SKILL.md +501 -0
  916. package/data/workflows/bulk-omics-clustering/references/best_practices.md +395 -0
  917. package/data/workflows/bulk-omics-clustering/references/clustering_methods_comparison.md +288 -0
  918. package/data/workflows/bulk-omics-clustering/references/common-patterns.md +1136 -0
  919. package/data/workflows/bulk-omics-clustering/references/decision-guide.md +819 -0
  920. package/data/workflows/bulk-omics-clustering/references/distance_metrics_guide.md +388 -0
  921. package/data/workflows/bulk-omics-clustering/references/parameter_guide.md +396 -0
  922. package/data/workflows/bulk-omics-clustering/references/r-quick-start.md +105 -0
  923. package/data/workflows/bulk-omics-clustering/references/validation_metrics_guide.md +315 -0
  924. package/data/workflows/bulk-omics-clustering/scripts/characterize_clusters.py +255 -0
  925. package/data/workflows/bulk-omics-clustering/scripts/cluster_validation.py +449 -0
  926. package/data/workflows/bulk-omics-clustering/scripts/density_clustering.py +321 -0
  927. package/data/workflows/bulk-omics-clustering/scripts/dimensionality_reduction.py +328 -0
  928. package/data/workflows/bulk-omics-clustering/scripts/distance_metrics.py +251 -0
  929. package/data/workflows/bulk-omics-clustering/scripts/export_results.py +456 -0
  930. package/data/workflows/bulk-omics-clustering/scripts/hierarchical_clustering.R +229 -0
  931. package/data/workflows/bulk-omics-clustering/scripts/hierarchical_clustering.py +269 -0
  932. package/data/workflows/bulk-omics-clustering/scripts/kmeans_clustering.py +346 -0
  933. package/data/workflows/bulk-omics-clustering/scripts/load_example_data.R +171 -0
  934. package/data/workflows/bulk-omics-clustering/scripts/load_example_data.py +171 -0
  935. package/data/workflows/bulk-omics-clustering/scripts/model_based_clustering.py +370 -0
  936. package/data/workflows/bulk-omics-clustering/scripts/optimal_clusters.py +381 -0
  937. package/data/workflows/bulk-omics-clustering/scripts/plot_cluster_heatmap.R +141 -0
  938. package/data/workflows/bulk-omics-clustering/scripts/plot_clustering_results.py +452 -0
  939. package/data/workflows/bulk-omics-clustering/scripts/prepare_data.py +250 -0
  940. package/data/workflows/bulk-omics-clustering/scripts/stability_analysis.py +434 -0
  941. package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/SKILL.md +505 -0
  942. package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/references/comprehensive-reference.md +440 -0
  943. package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/references/decision-guide.md +327 -0
  944. package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/references/troubleshooting.md +456 -0
  945. package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/references/usage-guide.md +75 -0
  946. package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/scripts/basic_workflow.R +149 -0
  947. package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/scripts/batch_correction.R +44 -0
  948. package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/scripts/export_results.R +190 -0
  949. package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/scripts/extract_results.R +242 -0
  950. package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/scripts/load_example_data.R +250 -0
  951. package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/scripts/multi_condition.R +50 -0
  952. package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/scripts/qc_plots.R +410 -0
  953. package/data/workflows/bulk-rnaseq-counts-to-de-deseq2/scripts/transformations.R +218 -0
  954. package/data/workflows/chip-atlas-diff-analysis/SKILL.md +222 -0
  955. package/data/workflows/chip-atlas-diff-analysis/references/chipatlas_diff_api_format.md +106 -0
  956. package/data/workflows/chip-atlas-diff-analysis/references/diff_analysis_methods.md +89 -0
  957. package/data/workflows/chip-atlas-diff-analysis/references/output_format.md +78 -0
  958. package/data/workflows/chip-atlas-diff-analysis/scripts/__init__.py +1 -0
  959. package/data/workflows/chip-atlas-diff-analysis/scripts/annotate_genes.py +144 -0
  960. package/data/workflows/chip-atlas-diff-analysis/scripts/export_all.py +498 -0
  961. package/data/workflows/chip-atlas-diff-analysis/scripts/filter_regions.py +176 -0
  962. package/data/workflows/chip-atlas-diff-analysis/scripts/generate_all_plots.py +321 -0
  963. package/data/workflows/chip-atlas-diff-analysis/scripts/load_example_data.py +149 -0
  964. package/data/workflows/chip-atlas-diff-analysis/scripts/load_user_data.py +211 -0
  965. package/data/workflows/chip-atlas-diff-analysis/scripts/parse_bed_results.py +240 -0
  966. package/data/workflows/chip-atlas-diff-analysis/scripts/qc_checks.py +621 -0
  967. package/data/workflows/chip-atlas-diff-analysis/scripts/query_chipatlas_api.py +329 -0
  968. package/data/workflows/chip-atlas-diff-analysis/scripts/run_diff_workflow.py +256 -0
  969. package/data/workflows/chip-atlas-peak-enrichment/SKILL.md +212 -0
  970. package/data/workflows/chip-atlas-peak-enrichment/references/chipatlas_metadata_format.md +115 -0
  971. package/data/workflows/chip-atlas-peak-enrichment/references/enrichment_statistics.md +145 -0
  972. package/data/workflows/chip-atlas-peak-enrichment/references/peak_thresholds.md +63 -0
  973. package/data/workflows/chip-atlas-peak-enrichment/references/promoter_definitions.md +69 -0
  974. package/data/workflows/chip-atlas-peak-enrichment/scripts/__init__.py +1 -0
  975. package/data/workflows/chip-atlas-peak-enrichment/scripts/convert_genes_to_regions.py +271 -0
  976. package/data/workflows/chip-atlas-peak-enrichment/scripts/export_all.py +456 -0
  977. package/data/workflows/chip-atlas-peak-enrichment/scripts/filter_experiments.py +116 -0
  978. package/data/workflows/chip-atlas-peak-enrichment/scripts/generate_all_plots.py +280 -0
  979. package/data/workflows/chip-atlas-peak-enrichment/scripts/load_example_data.py +96 -0
  980. package/data/workflows/chip-atlas-peak-enrichment/scripts/load_user_data.py +183 -0
  981. package/data/workflows/chip-atlas-peak-enrichment/scripts/query_chipatlas_api.py +349 -0
  982. package/data/workflows/chip-atlas-peak-enrichment/scripts/run_enrichment_workflow.py +271 -0
  983. package/data/workflows/chip-atlas-target-genes/SKILL.md +230 -0
  984. package/data/workflows/chip-atlas-target-genes/references/macs2_binding_scores.md +89 -0
  985. package/data/workflows/chip-atlas-target-genes/references/string_scores.md +58 -0
  986. package/data/workflows/chip-atlas-target-genes/references/target_genes_data_format.md +73 -0
  987. package/data/workflows/chip-atlas-target-genes/scripts/__init__.py +0 -0
  988. package/data/workflows/chip-atlas-target-genes/scripts/download_target_genes.py +200 -0
  989. package/data/workflows/chip-atlas-target-genes/scripts/export_all.py +340 -0
  990. package/data/workflows/chip-atlas-target-genes/scripts/filter_targets.py +205 -0
  991. package/data/workflows/chip-atlas-target-genes/scripts/generate_all_plots.py +330 -0
  992. package/data/workflows/chip-atlas-target-genes/scripts/load_example_query.py +61 -0
  993. package/data/workflows/chip-atlas-target-genes/scripts/load_user_query.py +47 -0
  994. package/data/workflows/chip-atlas-target-genes/scripts/run_target_genes_workflow.py +141 -0
  995. package/data/workflows/clinicaltrials-landscape/SKILL.md +257 -0
  996. package/data/workflows/clinicaltrials-landscape/references/api-parameters.md +181 -0
  997. package/data/workflows/clinicaltrials-landscape/references/mechanisms.md +141 -0
  998. package/data/workflows/clinicaltrials-landscape/references/output-schema.md +184 -0
  999. package/data/workflows/clinicaltrials-landscape/scripts/__init__.py +1 -0
  1000. package/data/workflows/clinicaltrials-landscape/scripts/classify_mechanisms.py +359 -0
  1001. package/data/workflows/clinicaltrials-landscape/scripts/compile_trials.py +579 -0
  1002. package/data/workflows/clinicaltrials-landscape/scripts/disease_config.py +161 -0
  1003. package/data/workflows/clinicaltrials-landscape/scripts/export_all.py +242 -0
  1004. package/data/workflows/clinicaltrials-landscape/scripts/generate_landscape_plots.py +761 -0
  1005. package/data/workflows/clinicaltrials-landscape/scripts/generate_pdf_report.py +1465 -0
  1006. package/data/workflows/clinicaltrials-landscape/scripts/generate_report.py +1813 -0
  1007. package/data/workflows/clinicaltrials-landscape/scripts/query_clinicaltrials.py +307 -0
  1008. package/data/workflows/coexpression-network/SKILL.md +344 -0
  1009. package/data/workflows/coexpression-network/references/parameter-tuning-guide.md +591 -0
  1010. package/data/workflows/coexpression-network/references/troubleshooting.md +483 -0
  1011. package/data/workflows/coexpression-network/references/wgcna-best-practices.md +563 -0
  1012. package/data/workflows/coexpression-network/references/wgcna-reference.md +538 -0
  1013. package/data/workflows/coexpression-network/scripts/build_network.R +43 -0
  1014. package/data/workflows/coexpression-network/scripts/correlate_modules_traits.R +92 -0
  1015. package/data/workflows/coexpression-network/scripts/export_wgcna_results.R +117 -0
  1016. package/data/workflows/coexpression-network/scripts/identify_hub_genes.R +63 -0
  1017. package/data/workflows/coexpression-network/scripts/load_example_data.R +214 -0
  1018. package/data/workflows/coexpression-network/scripts/module_enrichment.R +159 -0
  1019. package/data/workflows/coexpression-network/scripts/pick_soft_power.R +70 -0
  1020. package/data/workflows/coexpression-network/scripts/plot_all_wgcna.R +104 -0
  1021. package/data/workflows/coexpression-network/scripts/plot_eigengene_heatmap.R +65 -0
  1022. package/data/workflows/coexpression-network/scripts/plot_hub_genes.R +70 -0
  1023. package/data/workflows/coexpression-network/scripts/plot_module_dendrogram.R +50 -0
  1024. package/data/workflows/coexpression-network/scripts/plotting_helpers.R +87 -0
  1025. package/data/workflows/coexpression-network/scripts/prepare_wgcna_data.R +73 -0
  1026. package/data/workflows/coexpression-network/scripts/wgcna_workflow.R +93 -0
  1027. package/data/workflows/experimental-design-statistics/SKILL.md +408 -0
  1028. package/data/workflows/experimental-design-statistics/references/batch_effect_mitigation.md +756 -0
  1029. package/data/workflows/experimental-design-statistics/references/cv_tissue_database.csv +30 -0
  1030. package/data/workflows/experimental-design-statistics/references/experimental_design_best_practices.md +515 -0
  1031. package/data/workflows/experimental-design-statistics/references/multiple_testing_guide.md +730 -0
  1032. package/data/workflows/experimental-design-statistics/references/power_analysis_guidelines.md +635 -0
  1033. package/data/workflows/experimental-design-statistics/references/qc_guidelines.md +310 -0
  1034. package/data/workflows/experimental-design-statistics/references/software_requirements.md +328 -0
  1035. package/data/workflows/experimental-design-statistics/references/troubleshooting_guide.md +510 -0
  1036. package/data/workflows/experimental-design-statistics/scripts/batch_assignment.R +302 -0
  1037. package/data/workflows/experimental-design-statistics/scripts/batch_validation.R +342 -0
  1038. package/data/workflows/experimental-design-statistics/scripts/export_design.R +352 -0
  1039. package/data/workflows/experimental-design-statistics/scripts/load_example_data.R +204 -0
  1040. package/data/workflows/experimental-design-statistics/scripts/multiple_testing.R +417 -0
  1041. package/data/workflows/experimental-design-statistics/scripts/plot_power_curves.R +317 -0
  1042. package/data/workflows/experimental-design-statistics/scripts/power_atacseq.R +229 -0
  1043. package/data/workflows/experimental-design-statistics/scripts/power_pilot_based.R +289 -0
  1044. package/data/workflows/experimental-design-statistics/scripts/power_rnaseq.R +247 -0
  1045. package/data/workflows/experimental-design-statistics/scripts/sample_size_de.R +327 -0
  1046. package/data/workflows/experimental-design-statistics/scripts/sample_size_scrna.R +304 -0
  1047. package/data/workflows/functional-enrichment-from-degs/SKILL.md +387 -0
  1048. package/data/workflows/functional-enrichment-from-degs/references/database_guide.md +354 -0
  1049. package/data/workflows/functional-enrichment-from-degs/references/decision-guide.md +546 -0
  1050. package/data/workflows/functional-enrichment-from-degs/references/gsea_ora_comparison.md +213 -0
  1051. package/data/workflows/functional-enrichment-from-degs/references/gsea_ora_validation_framework.md +483 -0
  1052. package/data/workflows/functional-enrichment-from-degs/references/interpretation_guidelines.md +374 -0
  1053. package/data/workflows/functional-enrichment-from-degs/references/method-reference.md +742 -0
  1054. package/data/workflows/functional-enrichment-from-degs/scripts/export_results.R +190 -0
  1055. package/data/workflows/functional-enrichment-from-degs/scripts/generate_plots.R +240 -0
  1056. package/data/workflows/functional-enrichment-from-degs/scripts/get_msigdb_genesets.R +75 -0
  1057. package/data/workflows/functional-enrichment-from-degs/scripts/load_de_results.R +60 -0
  1058. package/data/workflows/functional-enrichment-from-degs/scripts/load_example_data.R +212 -0
  1059. package/data/workflows/functional-enrichment-from-degs/scripts/prepare_gene_lists.R +92 -0
  1060. package/data/workflows/functional-enrichment-from-degs/scripts/run_gsea.R +44 -0
  1061. package/data/workflows/functional-enrichment-from-degs/scripts/run_ora.R +53 -0
  1062. package/data/workflows/genetic-variant-annotation/SKILL.md +440 -0
  1063. package/data/workflows/genetic-variant-annotation/references/auto_installation_implementation.md +274 -0
  1064. package/data/workflows/genetic-variant-annotation/references/consequence_terms.md +392 -0
  1065. package/data/workflows/genetic-variant-annotation/references/filtering_strategies.md +808 -0
  1066. package/data/workflows/genetic-variant-annotation/references/installation_guide.md +557 -0
  1067. package/data/workflows/genetic-variant-annotation/references/pathogenicity_interpretation.md +473 -0
  1068. package/data/workflows/genetic-variant-annotation/references/qc_guidelines.md +524 -0
  1069. package/data/workflows/genetic-variant-annotation/references/snpeff_best_practices.md +481 -0
  1070. package/data/workflows/genetic-variant-annotation/references/tool_selection_guide.md +433 -0
  1071. package/data/workflows/genetic-variant-annotation/references/troubleshooting_guide.md +678 -0
  1072. package/data/workflows/genetic-variant-annotation/references/vep_best_practices.md +450 -0
  1073. package/data/workflows/genetic-variant-annotation/scripts/annotate_genes.py +243 -0
  1074. package/data/workflows/genetic-variant-annotation/scripts/export_results.py +450 -0
  1075. package/data/workflows/genetic-variant-annotation/scripts/filter_variants.py +365 -0
  1076. package/data/workflows/genetic-variant-annotation/scripts/install_tools.py +246 -0
  1077. package/data/workflows/genetic-variant-annotation/scripts/load_example_data.py +166 -0
  1078. package/data/workflows/genetic-variant-annotation/scripts/parse_snpeff_output.py +283 -0
  1079. package/data/workflows/genetic-variant-annotation/scripts/parse_vep_output.py +257 -0
  1080. package/data/workflows/genetic-variant-annotation/scripts/plot_variant_distribution.py +372 -0
  1081. package/data/workflows/genetic-variant-annotation/scripts/prioritize_variants.py +287 -0
  1082. package/data/workflows/genetic-variant-annotation/scripts/run_snpeff.py +418 -0
  1083. package/data/workflows/genetic-variant-annotation/scripts/run_vep.py +358 -0
  1084. package/data/workflows/genetic-variant-annotation/scripts/select_tool.py +203 -0
  1085. package/data/workflows/genetic-variant-annotation/scripts/test_complete_workflow.py +312 -0
  1086. package/data/workflows/genetic-variant-annotation/scripts/test_pickle_load.py +118 -0
  1087. package/data/workflows/genetic-variant-annotation/scripts/validate_vcf.py +351 -0
  1088. package/data/workflows/genetic-variant-annotation/scripts/verify_changes.py +212 -0
  1089. package/data/workflows/grn-pyscenic/SKILL.md +331 -0
  1090. package/data/workflows/grn-pyscenic/references/cli_interface.md +222 -0
  1091. package/data/workflows/grn-pyscenic/references/database_downloads.md +245 -0
  1092. package/data/workflows/grn-pyscenic/scripts/export_all.py +192 -0
  1093. package/data/workflows/grn-pyscenic/scripts/generate_report.py +512 -0
  1094. package/data/workflows/grn-pyscenic/scripts/integrate_with_adata.py +54 -0
  1095. package/data/workflows/grn-pyscenic/scripts/load_example_data.py +200 -0
  1096. package/data/workflows/grn-pyscenic/scripts/load_expression_data.py +61 -0
  1097. package/data/workflows/grn-pyscenic/scripts/plot_regulon_visualizations.py +263 -0
  1098. package/data/workflows/grn-pyscenic/scripts/run_grn_workflow.py +184 -0
  1099. package/data/workflows/gwas-to-function-twas/SKILL.md +394 -0
  1100. package/data/workflows/gwas-to-function-twas/references/fusion_best_practices.md +120 -0
  1101. package/data/workflows/gwas-to-function-twas/references/installation-guide.md +414 -0
  1102. package/data/workflows/gwas-to-function-twas/references/ldsc_qc_guidelines.md +287 -0
  1103. package/data/workflows/gwas-to-function-twas/references/spredixxcan_best_practices.md +166 -0
  1104. package/data/workflows/gwas-to-function-twas/references/therapeutic_interpretation_guide.md +717 -0
  1105. package/data/workflows/gwas-to-function-twas/references/tissue_reference_guide.md +182 -0
  1106. package/data/workflows/gwas-to-function-twas/references/troubleshooting_guide.md +317 -0
  1107. package/data/workflows/gwas-to-function-twas/references/twas_hub_validation_guide.md +88 -0
  1108. package/data/workflows/gwas-to-function-twas/scripts/colocalization_analysis.py +187 -0
  1109. package/data/workflows/gwas-to-function-twas/scripts/druggability_scoring.py +199 -0
  1110. package/data/workflows/gwas-to-function-twas/scripts/export_results.py +220 -0
  1111. package/data/workflows/gwas-to-function-twas/scripts/integrate_variant_annotation.py +194 -0
  1112. package/data/workflows/gwas-to-function-twas/scripts/interpret_therapeutic_direction.py +418 -0
  1113. package/data/workflows/gwas-to-function-twas/scripts/mendelian_randomization.py +749 -0
  1114. package/data/workflows/gwas-to-function-twas/scripts/multilayer_direction_analysis.py +471 -0
  1115. package/data/workflows/gwas-to-function-twas/scripts/plot_twas_results.py +252 -0
  1116. package/data/workflows/gwas-to-function-twas/scripts/run_fusion.py +155 -0
  1117. package/data/workflows/gwas-to-function-twas/scripts/run_smultixcan.py +102 -0
  1118. package/data/workflows/gwas-to-function-twas/scripts/run_spredixxcan.py +138 -0
  1119. package/data/workflows/gwas-to-function-twas/scripts/select_reference_panel.py +253 -0
  1120. package/data/workflows/gwas-to-function-twas/scripts/validate_gwas_sumstats.py +214 -0
  1121. package/data/workflows/gwas-to-function-twas/scripts/validate_with_twas_hub.py +439 -0
  1122. package/data/workflows/lasso-biomarker-panel/SKILL.md +322 -0
  1123. package/data/workflows/lasso-biomarker-panel/references/decision-guide.md +64 -0
  1124. package/data/workflows/lasso-biomarker-panel/references/lasso-reference.md +110 -0
  1125. package/data/workflows/lasso-biomarker-panel/references/validation-guide.md +105 -0
  1126. package/data/workflows/lasso-biomarker-panel/scripts/biological_interpretation.R +1560 -0
  1127. package/data/workflows/lasso-biomarker-panel/scripts/biomarker_plots.R +350 -0
  1128. package/data/workflows/lasso-biomarker-panel/scripts/export_results.R +1492 -0
  1129. package/data/workflows/lasso-biomarker-panel/scripts/lasso_workflow.R +328 -0
  1130. package/data/workflows/lasso-biomarker-panel/scripts/load_example_data.R +1903 -0
  1131. package/data/workflows/lasso-biomarker-panel/scripts/plotting_helpers.R +78 -0
  1132. package/data/workflows/lasso-biomarker-panel/scripts/prepare_features.R +225 -0
  1133. package/data/workflows/lasso-biomarker-panel/scripts/query_cellxgene.py +107 -0
  1134. package/data/workflows/lasso-biomarker-panel/scripts/validate_external.R +174 -0
  1135. package/data/workflows/literature-preclinical/SKILL.md +276 -0
  1136. package/data/workflows/literature-preclinical/assets/eval/simple_test.py +386 -0
  1137. package/data/workflows/literature-preclinical/references/experiment-extraction-guide.md +147 -0
  1138. package/data/workflows/literature-preclinical/references/full-text-enrichment-guide.md +121 -0
  1139. package/data/workflows/literature-preclinical/references/preclinical-search-guide.md +117 -0
  1140. package/data/workflows/literature-preclinical/scripts/extract_experiments.py +401 -0
  1141. package/data/workflows/literature-preclinical/scripts/generate_plots.R +303 -0
  1142. package/data/workflows/literature-preclinical/scripts/narrative_synthesis.py +653 -0
  1143. package/data/workflows/literature-preclinical/scripts/preclinical_search.py +332 -0
  1144. package/data/workflows/literature-preclinical/scripts/preclinical_synthesis.py +237 -0
  1145. package/data/workflows/literature-preclinical/scripts/report_generation.py +326 -0
  1146. package/data/workflows/mendelian-randomization-twosamplemr/SKILL.md +210 -0
  1147. package/data/workflows/mendelian-randomization-twosamplemr/references/interpretation-guide.md +239 -0
  1148. package/data/workflows/mendelian-randomization-twosamplemr/references/method-reference.md +190 -0
  1149. package/data/workflows/mendelian-randomization-twosamplemr/scripts/export_results.R +123 -0
  1150. package/data/workflows/mendelian-randomization-twosamplemr/scripts/generate_report.R +411 -0
  1151. package/data/workflows/mendelian-randomization-twosamplemr/scripts/load_data.R +281 -0
  1152. package/data/workflows/mendelian-randomization-twosamplemr/scripts/mr_plots.R +163 -0
  1153. package/data/workflows/mendelian-randomization-twosamplemr/scripts/run_mr_analysis.R +322 -0
  1154. package/data/workflows/pcr-primer-design/SKILL.md +397 -0
  1155. package/data/workflows/pcr-primer-design/references/code_examples.md +594 -0
  1156. package/data/workflows/pcr-primer-design/references/miqe_guidelines.md +453 -0
  1157. package/data/workflows/pcr-primer-design/references/parameter_ranges.md +356 -0
  1158. package/data/workflows/pcr-primer-design/references/primer_design_best_practices.md +451 -0
  1159. package/data/workflows/pcr-primer-design/references/troubleshooting_guide.md +477 -0
  1160. package/data/workflows/pcr-primer-design/scripts/__init__.py +2 -0
  1161. package/data/workflows/pcr-primer-design/scripts/calculate_tm.py +306 -0
  1162. package/data/workflows/pcr-primer-design/scripts/check_dimers.py +298 -0
  1163. package/data/workflows/pcr-primer-design/scripts/check_secondary_structures.py +343 -0
  1164. package/data/workflows/pcr-primer-design/scripts/design_qpcr_primers.py +233 -0
  1165. package/data/workflows/pcr-primer-design/scripts/design_standard_primers.py +197 -0
  1166. package/data/workflows/pcr-primer-design/scripts/design_taqman_probes.py +226 -0
  1167. package/data/workflows/pcr-primer-design/scripts/export_results.py +382 -0
  1168. package/data/workflows/pcr-primer-design/scripts/generate_reports.py +379 -0
  1169. package/data/workflows/pcr-primer-design/scripts/validate_specificity.py +311 -0
  1170. package/data/workflows/pcr-primer-design/scripts/visualize_primers.py +379 -0
  1171. package/data/workflows/polygenic-risk-score-prs-catalog/SKILL.md +195 -0
  1172. package/data/workflows/polygenic-risk-score-prs-catalog/references/interpretation-guide.md +80 -0
  1173. package/data/workflows/polygenic-risk-score-prs-catalog/references/pgs-catalog-guide.md +109 -0
  1174. package/data/workflows/polygenic-risk-score-prs-catalog/scripts/export_results.R +186 -0
  1175. package/data/workflows/polygenic-risk-score-prs-catalog/scripts/generate_plots.R +283 -0
  1176. package/data/workflows/polygenic-risk-score-prs-catalog/scripts/load_pgs_weights.R +228 -0
  1177. package/data/workflows/polygenic-risk-score-prs-catalog/scripts/load_reference_data.R +191 -0
  1178. package/data/workflows/polygenic-risk-score-prs-catalog/scripts/score_traits.R +216 -0
  1179. package/data/workflows/pooled-crispr-screens/SKILL.md +362 -0
  1180. package/data/workflows/pooled-crispr-screens/references/crispr_screen_best_practices.md +349 -0
  1181. package/data/workflows/pooled-crispr-screens/references/qc_guidelines.md +722 -0
  1182. package/data/workflows/pooled-crispr-screens/references/statistical_methods.md +644 -0
  1183. package/data/workflows/pooled-crispr-screens/references/troubleshooting_guide.md +684 -0
  1184. package/data/workflows/pooled-crispr-screens/references/umi_optimization.md +297 -0
  1185. package/data/workflows/pooled-crispr-screens/scripts/concatenate_libraries.py +132 -0
  1186. package/data/workflows/pooled-crispr-screens/scripts/detect_perturbed_cells.py +255 -0
  1187. package/data/workflows/pooled-crispr-screens/scripts/differential_expression.py +202 -0
  1188. package/data/workflows/pooled-crispr-screens/scripts/differential_expression_glmgampoi.py +320 -0
  1189. package/data/workflows/pooled-crispr-screens/scripts/export_results.py +261 -0
  1190. package/data/workflows/pooled-crispr-screens/scripts/expression_filtering.py +159 -0
  1191. package/data/workflows/pooled-crispr-screens/scripts/gene_name_corrections.py +188 -0
  1192. package/data/workflows/pooled-crispr-screens/scripts/generate_report.py +485 -0
  1193. package/data/workflows/pooled-crispr-screens/scripts/load_10x_libraries.py +69 -0
  1194. package/data/workflows/pooled-crispr-screens/scripts/load_example_data.py +257 -0
  1195. package/data/workflows/pooled-crispr-screens/scripts/map_sgrna_to_cells.py +119 -0
  1196. package/data/workflows/pooled-crispr-screens/scripts/normalize_and_scale.py +140 -0
  1197. package/data/workflows/pooled-crispr-screens/scripts/qc_filtering.py +185 -0
  1198. package/data/workflows/pooled-crispr-screens/scripts/run_glmgampoi.R +181 -0
  1199. package/data/workflows/pooled-crispr-screens/scripts/screen_all_perturbations.py +306 -0
  1200. package/data/workflows/pooled-crispr-screens/scripts/validate_perturbations.py +314 -0
  1201. package/data/workflows/pooled-crispr-screens/scripts/visualize_perturbations.py +314 -0
  1202. package/data/workflows/scrnaseq-scanpy-core-analysis/SKILL.md +425 -0
  1203. package/data/workflows/scrnaseq-scanpy-core-analysis/references/ambient_rna_correction.md +422 -0
  1204. package/data/workflows/scrnaseq-scanpy-core-analysis/references/common-patterns.md +533 -0
  1205. package/data/workflows/scrnaseq-scanpy-core-analysis/references/integration_methods.md +820 -0
  1206. package/data/workflows/scrnaseq-scanpy-core-analysis/references/marker_gene_database.md +471 -0
  1207. package/data/workflows/scrnaseq-scanpy-core-analysis/references/pseudobulk_de_guide.md +408 -0
  1208. package/data/workflows/scrnaseq-scanpy-core-analysis/references/qc_guidelines.md +535 -0
  1209. package/data/workflows/scrnaseq-scanpy-core-analysis/references/scanpy_best_practices.md +496 -0
  1210. package/data/workflows/scrnaseq-scanpy-core-analysis/references/troubleshooting_guide.md +668 -0
  1211. package/data/workflows/scrnaseq-scanpy-core-analysis/references/workflow-details.md +727 -0
  1212. package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/annotate_celltypes.py +431 -0
  1213. package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/cluster_cells.py +293 -0
  1214. package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/export_results.py +423 -0
  1215. package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/filter_cells.py +531 -0
  1216. package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/find_markers.py +391 -0
  1217. package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/find_variable_genes.py +222 -0
  1218. package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/integrate_scvi.py +665 -0
  1219. package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/integration_diagnostics.py +678 -0
  1220. package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/load_example_data.py +68 -0
  1221. package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/normalize_data.py +325 -0
  1222. package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/plot_dimreduction.py +389 -0
  1223. package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/plot_qc.py +320 -0
  1224. package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/pseudobulk_de.py +553 -0
  1225. package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/qc_metrics.py +477 -0
  1226. package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/remove_ambient_rna.py +347 -0
  1227. package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/run_umap.py +188 -0
  1228. package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/scale_and_pca.py +365 -0
  1229. package/data/workflows/scrnaseq-scanpy-core-analysis/scripts/setup_and_import.py +334 -0
  1230. package/data/workflows/scrnaseq-seurat-core-analysis/SKILL.md +585 -0
  1231. package/data/workflows/scrnaseq-seurat-core-analysis/references/ambient_rna_correction.md +422 -0
  1232. package/data/workflows/scrnaseq-seurat-core-analysis/references/common-patterns.md +667 -0
  1233. package/data/workflows/scrnaseq-seurat-core-analysis/references/decision-guide.md +456 -0
  1234. package/data/workflows/scrnaseq-seurat-core-analysis/references/integration_methods.md +864 -0
  1235. package/data/workflows/scrnaseq-seurat-core-analysis/references/marker_gene_database.md +471 -0
  1236. package/data/workflows/scrnaseq-seurat-core-analysis/references/pseudobulk_de_guide.md +408 -0
  1237. package/data/workflows/scrnaseq-seurat-core-analysis/references/qc_guidelines.md +452 -0
  1238. package/data/workflows/scrnaseq-seurat-core-analysis/references/seurat_best_practices.md +417 -0
  1239. package/data/workflows/scrnaseq-seurat-core-analysis/references/troubleshooting_guide.md +566 -0
  1240. package/data/workflows/scrnaseq-seurat-core-analysis/references/workflow-details.md +801 -0
  1241. package/data/workflows/scrnaseq-seurat-core-analysis/scripts/annotate_celltypes.R +306 -0
  1242. package/data/workflows/scrnaseq-seurat-core-analysis/scripts/cluster_cells.R +223 -0
  1243. package/data/workflows/scrnaseq-seurat-core-analysis/scripts/export_results.R +292 -0
  1244. package/data/workflows/scrnaseq-seurat-core-analysis/scripts/filter_cells.R +576 -0
  1245. package/data/workflows/scrnaseq-seurat-core-analysis/scripts/find_markers.R +325 -0
  1246. package/data/workflows/scrnaseq-seurat-core-analysis/scripts/find_variable_features.R +106 -0
  1247. package/data/workflows/scrnaseq-seurat-core-analysis/scripts/integrate_batches.R +504 -0
  1248. package/data/workflows/scrnaseq-seurat-core-analysis/scripts/integration_diagnostics.R +596 -0
  1249. package/data/workflows/scrnaseq-seurat-core-analysis/scripts/load_example_data.R +89 -0
  1250. package/data/workflows/scrnaseq-seurat-core-analysis/scripts/normalize_data.R +184 -0
  1251. package/data/workflows/scrnaseq-seurat-core-analysis/scripts/plot_dimreduction.R +273 -0
  1252. package/data/workflows/scrnaseq-seurat-core-analysis/scripts/plot_qc.R +250 -0
  1253. package/data/workflows/scrnaseq-seurat-core-analysis/scripts/pseudobulk_de.R +324 -0
  1254. package/data/workflows/scrnaseq-seurat-core-analysis/scripts/qc_metrics.R +358 -0
  1255. package/data/workflows/scrnaseq-seurat-core-analysis/scripts/remove_ambient_rna.R +281 -0
  1256. package/data/workflows/scrnaseq-seurat-core-analysis/scripts/run_umap.R +116 -0
  1257. package/data/workflows/scrnaseq-seurat-core-analysis/scripts/scale_and_pca.R +243 -0
  1258. package/data/workflows/scrnaseq-seurat-core-analysis/scripts/setup_and_import.R +193 -0
  1259. package/data/workflows/spatial-transcriptomics/SKILL.md +256 -0
  1260. package/data/workflows/spatial-transcriptomics/references/spatial-analysis-guide.md +216 -0
  1261. package/data/workflows/spatial-transcriptomics/scripts/export_results.py +214 -0
  1262. package/data/workflows/spatial-transcriptomics/scripts/generate_all_plots.py +397 -0
  1263. package/data/workflows/spatial-transcriptomics/scripts/load_example_data.py +175 -0
  1264. package/data/workflows/spatial-transcriptomics/scripts/spatial_workflow.py +206 -0
  1265. package/dist/bgi.js +28 -1
  1266. package/package.json +2 -1
@@ -0,0 +1,1489 @@
1
+ ---
2
+ name: tooluniverse-epigenomics
3
+ description: Production-ready genomics and epigenomics data processing for BixBench questions. Handles methylation array analysis (CpG filtering, differential methylation, age-related CpG detection, chromosome-level density), ChIP-seq peak analysis (peak calling, motif enrichment, coverage stats), ATAC-seq chromatin accessibility, multi-omics integration (expression + methylation correlation), and genome-wide statistics. Pure Python computation (pandas, scipy, numpy, pysam, statsmodels) plus ToolUniverse annotation tools (Ensembl, ENCODE, SCREEN, JASPAR, ReMap, RegulomeDB, ChIPAtlas). Supports BED, BigWig, methylation beta-value matrices, Illumina manifest files, and multi-sample clinical data. Use when processing methylation data, ChIP-seq peaks, ATAC-seq signals, or answering questions about CpG sites, differential methylation, chromatin accessibility, histone marks, or epigenomic statistics.
4
+ ---
5
+
6
+ # Genomics and Epigenomics Data Processing
7
+
8
+ Production-ready computational skill for processing and analyzing epigenomics data. Combines local Python computation (pandas, scipy, numpy, pysam, statsmodels) with ToolUniverse annotation tools for regulatory context. Designed to solve BixBench-style questions about methylation, ChIP-seq, ATAC-seq, and multi-omics integration.
9
+
10
+ ## When to Use This Skill
11
+
12
+ **Triggers**:
13
+ - User provides methylation data (beta-value matrices, Illumina arrays) and asks about CpG sites
14
+ - Questions about differential methylation analysis
15
+ - Age-related CpG detection or epigenetic clock questions
16
+ - Chromosome-level methylation density or statistics
17
+ - ChIP-seq peak files (BED format) with analysis questions
18
+ - ATAC-seq chromatin accessibility questions
19
+ - Multi-omics integration (expression + methylation, expression + ChIP-seq)
20
+ - Genome-wide epigenomic statistics
21
+ - Questions mentioning "methylation", "CpG", "ChIP-seq", "ATAC-seq", "histone", "chromatin", "epigenetic"
22
+ - Questions about missing data across clinical/genomic/epigenomic modalities
23
+ - Regulatory element annotation for processed epigenomic data
24
+
25
+ **Example Questions This Skill Solves**:
26
+ 1. "How many patients have no missing data for vital status, gene expression, and methylation data?"
27
+ 2. "What is the ratio of filtered age-related CpG density between chromosomes?"
28
+ 3. "What is the genome-wide average chromosomal density of unique age-related CpGs per base pair?"
29
+ 4. "How many CpG sites show significant differential methylation (padj < 0.05)?"
30
+ 5. "What is the Pearson correlation between methylation and expression for gene X?"
31
+ 6. "How many ChIP-seq peaks overlap with promoter regions?"
32
+ 7. "What fraction of ATAC-seq peaks are in enhancer regions?"
33
+ 8. "Which chromosome has the highest density of hypermethylated CpGs?"
34
+ 9. "Filter CpG sites by variance > threshold and map to nearest genes"
35
+ 10. "What is the average beta value difference between tumor and normal for chromosome 17?"
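Example 1 becomes a set-intersection problem once sample barcodes are collapsed to patients. Below is a minimal sketch. It assumes three things, and you should adjust each to the actual files:

- TCGA-style barcodes, where the first 12 characters (`TCGA-XX-XXXX`) identify the patient.
- A clinical table with `patient_id` and `vital_status` columns.
- A hypothetical set of placeholder strings that should count as missing.

```python
import pandas as pd

# Placeholder strings treated as missing (assumption: check the actual clinical file)
MISSING_TOKENS = {"", "NA", "[Not Available]", "[Not Applicable]", "[Unknown]"}


def patient_ids(sample_ids):
    """Collapse TCGA-style sample barcodes to patient IDs (first 12 characters)."""
    return {str(s)[:12] for s in sample_ids}


def count_complete_patients(clinical, expression_samples, methylation_samples,
                            id_col="patient_id", status_col="vital_status"):
    """Count patients with a usable status value AND data in both omics layers."""
    status = clinical[status_col]
    usable = status.notna() & ~status.astype(str).str.strip().isin(MISSING_TOKENS)
    with_status = patient_ids(clinical.loc[usable, id_col])
    return len(with_status & patient_ids(expression_samples) & patient_ids(methylation_samples))
```

Report the per-modality counts as well as the intersection, because the answer depends on which placeholders are treated as missing.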
36
+
37
+ **NOT for** (use other skills instead):
38
+ - Gene regulation lookup without data files -> Use existing epigenomics annotation pattern
39
+ - RNA-seq differential expression -> Use `tooluniverse-rnaseq-deseq2`
40
+ - Variant calling/annotation from VCF -> Use `tooluniverse-variant-analysis`
41
+ - Gene enrichment analysis -> Use `tooluniverse-gene-enrichment`
42
+ - Protein structure analysis -> Use `tooluniverse-protein-structure-retrieval`
43
+
44
+ ---
45
+
46
+ ## Required Python Packages
47
+
48
+ ```python
49
+ # Core (MUST be available)
50
+ import pandas as pd
51
+ import numpy as np
52
+ from scipy import stats
53
+ import statsmodels.stats.multitest as mt
54
+
55
+ # Optional: install separately; these imports raise ImportError if the package is missing
56
+ import pysam # BAM/CRAM file access
57
+ import gseapy # Enrichment of genes from methylation analysis
58
+
59
+ # ToolUniverse (for annotation)
60
+ from tooluniverse import ToolUniverse
61
+ ```
62
+
63
+ ---
64
+
65
+ ## KEY PRINCIPLES
66
+
67
+ 1. **Data-first approach** - Load and inspect data files BEFORE any analysis
68
+ 2. **Question-driven** - Parse what the user is actually asking and extract the specific numeric answer
69
+ 3. **File format detection** - Automatically detect methylation arrays, BED files, BigWig, clinical data
70
+ 4. **Coordinate system awareness** - Track genome build (hg19, hg38, mm10), handle chr prefix differences
71
+ 5. **Statistical rigor** - Proper multiple testing correction, effect size filtering, sample size awareness
72
+ 6. **Missing data handling** - Explicitly report and handle NaN/missing values
73
+ 7. **Chromosome normalization** - Always normalize chromosome names (chr1 vs 1, chrX vs X)
74
+ 8. **CpG site identification** - Parse Illumina probe IDs (cg/ch probes), genomic coordinates
75
+ 9. **Report-first** - Create output file first, populate progressively
76
+ 10. **English-first queries** - Use English in all tool calls
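For principle 8, it helps to tally probe IDs by prefix before any filtering. On Illumina 450K/EPIC arrays, `cg` probes target CpG sites, `ch` probes target non-CpG (CpH) cytosines, and `rs` probes are SNP controls. Lumping every other prefix into `other` is an assumption of this sketch.

```python
import pandas as pd


def probe_type_counts(probe_ids):
    """Count probe IDs by two-letter prefix: 'cg', 'ch', 'rs', else 'other'."""
    prefixes = pd.Series(probe_ids, dtype=str).str[:2]
    prefixes = prefixes.where(prefixes.isin(["cg", "ch", "rs"]), "other")
    return prefixes.value_counts().to_dict()
```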
77
+
78
+ ---
79
+
80
+ ## Complete Workflow
81
+
82
+ ### Phase 0: Question Parsing and Data Discovery
83
+
84
+ **CRITICAL FIRST STEP**: Before writing ANY code, parse the question to identify what is being asked and what data files are available.
85
+
86
+ #### 0.1 Discover Available Data Files
87
+
88
+ ```python
89
+ import os
90
+ import glob
91
+
92
+ data_dir = "." # or specified path
93
+ all_files = [f for f in glob.glob(os.path.join(data_dir, "**", "*"), recursive=True) if os.path.isfile(f)]
94
+
95
+ # Categorize files
96
+ methylation_files = [f for f in all_files if any(x in f.lower() for x in
97
+ ['methyl', 'beta', 'cpg', 'illumina', '450k', '850k', 'epic', 'mval'])]
98
+ chipseq_files = [f for f in all_files if any(x in f.lower() for x in
99
+ ['chip', 'peak', 'narrowpeak', 'broadpeak', 'histone'])]
100
+ atacseq_files = [f for f in all_files if any(x in f.lower() for x in
101
+     ['atac', 'accessibility', 'openchromatin', 'dnase'])]
102
+ bed_files = [f for f in all_files if f.endswith(('.bed', '.bed.gz', '.narrowPeak', '.broadPeak'))]
103
+ bigwig_files = [f for f in all_files if f.endswith(('.bw', '.bigwig', '.bigWig'))]
104
+ clinical_files = [f for f in all_files if any(x in f.lower() for x in
105
+ ['clinical', 'patient', 'sample', 'metadata', 'phenotype', 'survival'])]
106
+ expression_files = [f for f in all_files if any(x in f.lower() for x in
107
+ ['express', 'rnaseq', 'fpkm', 'tpm', 'counts', 'transcriptom'])]
108
+ manifest_files = [f for f in all_files if any(x in f.lower() for x in
109
+ ['manifest', 'annotation', 'probe', 'platform'])]
110
+
111
+ # Print summary
112
+ for category, files in [
113
+ ('Methylation', methylation_files),
114
+ ('ChIP-seq', chipseq_files),
115
+ ('ATAC-seq', atacseq_files),
116
+ ('BED', bed_files),
117
+ ('BigWig', bigwig_files),
118
+ ('Clinical', clinical_files),
119
+ ('Expression', expression_files),
120
+ ('Manifest', manifest_files),
121
+ ]:
122
+ if files:
123
+ print(f"{category}: {files}")
124
+ ```
125
+
126
+ #### 0.2 Parse Question Parameters
127
+
128
+ Extract these from the question:
129
+
130
+ | Parameter | Default | Example Question Text |
131
+ |-----------|---------|----------------------|
132
+ | **Significance threshold** | 0.05 | "padj < 0.05", "FDR < 0.01" |
133
+ | **Beta difference threshold** | 0 | "|delta_beta| > 0.2", "mean difference > 0.1" |
134
+ | **Variance filter** | None | "variance > 0.01", "top 5000 most variable" |
135
+ | **Chromosome filter** | All | "chromosome 17", "autosomes only" |
136
+ | **Genome build** | hg38 | "hg19", "GRCh37", "mm10" |
137
+ | **CpG type filter** | All | "cg probes only", "exclude ch probes" |
138
+ | **Region filter** | None | "promoter", "gene body", "intergenic" |
139
+ | **Missing data handling** | Report | "complete cases", "no missing data" |
140
+ | **Specific comparison** | Infer | "tumor vs normal", "old vs young" |
141
+ | **Specific statistic** | Infer | "density", "ratio", "count", "average" |
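The table rows can be turned into a rough parser. This sketch pulls numeric thresholds out of question text with regular expressions. The patterns and the returned keys are hypothetical and cover only the phrasings listed above. Anything they miss should fall back to the defaults in the table, and the answer should say so.

```python
import re

# Numbers such as 0.05, .05, 5000 or 1e-5 (the text is lower-cased first)
_NUM = r"(\d*\.?\d+(?:e-?\d+)?)"

# Hypothetical patterns for the phrasings in the table; extend as needed
PATTERNS = {
    "alpha": rf"(?:padj|p\.adj|fdr|q-?value|adjusted p(?:-value)?)\s*<=?\s*{_NUM}",
    "delta_beta": rf"(?:delta[_ ]beta\|?|mean difference)\s*>=?\s*{_NUM}",
    "variance": rf"variance\s*>=?\s*{_NUM}",
    "top_n_variable": r"top\s+(\d+)\s+most\s+variable",
}


def parse_thresholds(question):
    """Return {parameter: value} for every threshold pattern found in the question."""
    text = question.lower()
    found = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            value = match.group(1)
            found[key] = int(value) if key == "top_n_variable" else float(value)
    return found


print(parse_thresholds("Use FDR < 0.01 and |delta_beta| > 0.2"))
# {'alpha': 0.01, 'delta_beta': 0.2}
```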
142
+
143
+ #### 0.3 Decision Tree
144
+
145
+ ```
146
+ Q: What type of epigenomics data?
147
+ METHYLATION -> Phase 1 (Methylation Processing)
148
+ CHIP-SEQ -> Phase 2 (ChIP-seq Processing)
149
+ ATAC-SEQ -> Phase 3 (ATAC-seq Processing)
150
+ MULTI-OMICS -> Phase 4 (Integration)
151
+ CLINICAL -> Phase 5 (Clinical Integration)
152
+ ANNOTATION -> Phase 6 (ToolUniverse Annotation)
153
+
154
+ Q: Is this a genome-wide statistics question?
155
+ YES -> Focus on chromosome-level aggregation (Phase 7)
156
+ NO -> Focus on site/region-level analysis
157
+ ```
158
+
159
+ ---
160
+
161
+ ### Phase 1: Methylation Data Processing
162
+
163
+ #### 1.1 Load Methylation Data
164
+
165
+ ```python
166
+ import os
+ import pandas as pd
167
+ import numpy as np
168
+
169
+ def load_methylation_data(file_path, **kwargs):
170
+ """Load methylation beta-value or M-value matrix.
171
+
172
+ Expected format:
173
+ - Rows: CpG probes (cg00000029, cg00000108, ...)
174
+ - Columns: Samples (TCGA-XX-XXXX, ...)
175
+ - Values: Beta values (0-1) or M-values (log2 ratio)
176
+ """
177
+ ext = os.path.splitext(file_path)[1].lower()
178
+
179
+ if ext in ['.csv']:
180
+ df = pd.read_csv(file_path, index_col=0, **kwargs)
181
+ elif ext in ['.tsv', '.txt']:
182
+ df = pd.read_csv(file_path, sep='\t', index_col=0, **kwargs)
183
+ elif ext in ['.parquet']:
184
+ df = pd.read_parquet(file_path, **kwargs)
185
+ elif ext in ['.h5', '.hdf5']:
186
+ df = pd.read_hdf(file_path, **kwargs)
187
+ else:
188
+ # Try tab first, then comma
189
+ try:
190
+ df = pd.read_csv(file_path, sep='\t', index_col=0, **kwargs)
191
+ except Exception:
192
+ df = pd.read_csv(file_path, index_col=0, **kwargs)
193
+
194
+ return df
195
+
196
+
197
+ def detect_methylation_type(df):
198
+ """Detect if data is beta values (0-1) or M-values (unbounded)."""
199
+     sample_vals = df.iloc[:1000, :5].to_numpy(dtype=float).ravel()
200
+ sample_vals = sample_vals[~np.isnan(sample_vals)]
201
+
202
+ if sample_vals.min() >= 0 and sample_vals.max() <= 1:
203
+ return 'beta'
204
+ else:
205
+ return 'mvalue'
206
+
207
+
208
+ def beta_to_mvalue(beta):
209
+ """Convert beta values to M-values: M = log2(beta / (1 - beta))."""
210
+ beta = np.clip(beta, 1e-6, 1 - 1e-6)
211
+ return np.log2(beta / (1 - beta))
212
+
213
+
214
+ def mvalue_to_beta(mvalue):
215
+ """Convert M-values to beta values: beta = 2^M / (2^M + 1)."""
216
+ return 2**mvalue / (2**mvalue + 1)
217
+ ```
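A quick numeric check of the two conversions. Beta = 0.5 (half the molecules methylated) maps to M = 0, and beta = 0.8 maps to M = log2(0.8 / 0.2) = 2. The sketch restates the formulas from the block above so that it runs on its own.

```python
import numpy as np


def beta_to_mvalue(beta, eps=1e-6):
    """M = log2(beta / (1 - beta)); beta is clipped away from 0 and 1 to keep M finite."""
    beta = np.clip(np.asarray(beta, dtype=float), eps, 1 - eps)
    return np.log2(beta / (1 - beta))


def mvalue_to_beta(m):
    """beta = 2^M / (2^M + 1), the inverse of beta_to_mvalue."""
    m = np.asarray(m, dtype=float)
    return 2.0**m / (2.0**m + 1)


betas = np.array([0.2, 0.5, 0.8])
m = beta_to_mvalue(betas)      # M-values of -2, 0 and 2
back = mvalue_to_beta(m)       # recovers 0.2, 0.5 and 0.8
```

Because M is symmetric around 0, equal beta distances from 0.5 give M-values of equal magnitude and opposite sign.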
218
+
219
+ #### 1.2 Load Methylation Manifest / Probe Annotation
220
+
221
+ ```python
222
+ def load_probe_annotation(manifest_path):
223
+ """Load Illumina methylation array manifest.
224
+
225
+ Common columns: IlmnID, Name, CHR, MAPINFO (position), Strand,
226
+ UCSC_RefGene_Name, UCSC_RefGene_Group, Relation_to_UCSC_CpG_Island
227
+ """
228
+     # Illumina manifests begin with a [Heading] block; the column header is often on line 8
229
+     manifest = None
230
+     for skiprows in [0, 7, 8]:
231
+         try:
232
+             manifest = pd.read_csv(manifest_path, skiprows=skiprows, low_memory=False)
233
+             if {'CHR', 'chr', 'Name', 'IlmnID'} & set(manifest.columns):
234
+                 break
235
+         except Exception:
236
+             continue
237
+     if manifest is None:
238
+         raise ValueError(f"Could not parse manifest file: {manifest_path}")
239
+
240
+ # Normalize column names
241
+     col_map = {}
242
+     for col in manifest.columns:
243
+         lower = col.lower()
244
+         if lower in ['chr', 'chromosome']:
245
+             col_map[col] = 'chr'
246
+         elif lower in ['mapinfo', 'position', 'pos', 'start']:
247
+             col_map[col] = 'position'
248
+         elif lower in ['name', 'probe_id', 'cpg_id'] or (lower == 'ilmnid' and 'Name' not in manifest.columns):
249
+             col_map[col] = 'probe_id'  # one ID column only; IlmnID is the fallback
250
+         elif 'refgene_group' in lower:  # test before the generic gene-name match
251
+             col_map[col] = 'gene_group'
252
+         elif 'refgene_name' in lower or lower in ['gene', 'gene_name']:
253
+             col_map[col] = 'gene_name'
254
+         elif 'cpg_island' in lower or 'relation' in lower:
255
+             col_map[col] = 'cpg_island_relation'
256
+
257
+ manifest = manifest.rename(columns=col_map)
258
+ return manifest
259
+
260
+
261
+ def normalize_chromosome(chrom):
262
+     """Normalize chromosome names: '1' -> 'chr1', 'Chr1' -> 'chr1', 'X' -> 'chrX', 1.0 -> 'chr1'."""
263
+     if chrom is None or pd.isna(chrom):
264
+         return None
265
+     chrom = str(int(chrom)) if isinstance(chrom, float) else str(chrom).strip()
266
+     if chrom.lower().startswith('chr'):
267
+         chrom = chrom[3:]
268
+     return 'chr' + chrom
269
+
270
+
271
+ def get_chromosome_lengths(genome='hg38'):
272
+ """Return chromosome lengths for common genome builds."""
273
+ # hg38 chromosome sizes (main chromosomes)
274
+ hg38 = {
275
+ 'chr1': 248956422, 'chr2': 242193529, 'chr3': 198295559,
276
+ 'chr4': 190214555, 'chr5': 181538259, 'chr6': 170805979,
277
+ 'chr7': 159345973, 'chr8': 145138636, 'chr9': 138394717,
278
+ 'chr10': 133797422, 'chr11': 135086622, 'chr12': 133275309,
279
+ 'chr13': 114364328, 'chr14': 107043718, 'chr15': 101991189,
280
+ 'chr16': 90338345, 'chr17': 83257441, 'chr18': 80373285,
281
+ 'chr19': 58617616, 'chr20': 64444167, 'chr21': 46709983,
282
+ 'chr22': 50818468, 'chrX': 156040895, 'chrY': 57227415,
283
+ }
284
+ hg19 = {
285
+ 'chr1': 249250621, 'chr2': 243199373, 'chr3': 198022430,
286
+ 'chr4': 191154276, 'chr5': 180915260, 'chr6': 171115067,
287
+ 'chr7': 159138663, 'chr8': 146364022, 'chr9': 141213431,
288
+ 'chr10': 135534747, 'chr11': 135006516, 'chr12': 133851895,
289
+ 'chr13': 115169878, 'chr14': 107349540, 'chr15': 102531392,
290
+ 'chr16': 90354753, 'chr17': 81195210, 'chr18': 78077248,
291
+ 'chr19': 59128983, 'chr20': 63025520, 'chr21': 48129895,
292
+ 'chr22': 51304566, 'chrX': 155270560, 'chrY': 59373566,
293
+ }
294
+ mm10 = {
295
+ 'chr1': 195471971, 'chr2': 182113224, 'chr3': 160039680,
296
+ 'chr4': 156508116, 'chr5': 151834684, 'chr6': 149736546,
297
+ 'chr7': 145441459, 'chr8': 129401213, 'chr9': 124595110,
298
+ 'chr10': 130694993, 'chr11': 122082543, 'chr12': 120129022,
299
+ 'chr13': 120421639, 'chr14': 124902244, 'chr15': 104043685,
300
+ 'chr16': 98207768, 'chr17': 94987271, 'chr18': 90702639,
301
+ 'chr19': 61431566, 'chrX': 171031299, 'chrY': 91744698,
302
+ }
303
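+     genomes = {'hg38': hg38, 'GRCh38': hg38, 'hg19': hg19, 'GRCh37': hg19, 'mm10': mm10, 'GRCm38': mm10}
304
+     return genomes[genome]  # KeyError for unsupported builds instead of a silent hg38 fallback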
305
+ ```
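Before computing any per-chromosome statistic, check how many probes in the beta matrix actually have coordinates in the manifest. This sketch uses toy data and assumes the manifest already has the normalized `probe_id`, `chr` and `position` columns that load_probe_annotation() produces. The helper name is hypothetical.

```python
import pandas as pd


def attach_coordinates(probe_ids, manifest):
    """Look up manifest coordinates for each probe and drop (but report) unmapped probes."""
    coords = (manifest.drop_duplicates("probe_id")
                      .set_index("probe_id")[["chr", "position"]])
    located = coords.reindex(pd.Index(probe_ids, name="probe_id"))
    missing = located["chr"].isna()
    print(f"{int(missing.sum())} of {len(located)} probes have no manifest coordinates")
    return located[~missing]
```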
306
+
307
+ #### 1.3 CpG Site Filtering
308
+
309
+ ```python
310
+ def filter_cpg_probes(df, manifest=None, filters=None):
311
+ """Filter CpG probes based on various criteria.
312
+
313
+ Args:
314
+ df: Methylation matrix (probes x samples)
315
+ manifest: Probe annotation DataFrame
316
+ filters: dict with keys:
317
+ - 'variance_threshold': float, minimum variance across samples
318
+ - 'mean_beta_range': tuple (min, max), filter probes with extreme mean beta
319
+ - 'missing_threshold': float (0-1), max fraction of NaN allowed per probe
320
+ - 'chromosomes': list, keep only these chromosomes
321
+ - 'exclude_sex_chr': bool, remove chrX and chrY
322
+ - 'probe_type': 'cg' or 'ch', keep only one type
323
+ - 'cpg_island': str ('Island', 'Shore', 'Shelf', 'OpenSea')
324
+ - 'gene_group': str ('TSS200', 'TSS1500', 'Body', '1stExon', etc.)
325
+ - 'top_n_variable': int, keep top N most variable probes
326
+ """
327
+ if filters is None:
328
+ filters = {}
329
+
330
+ probe_mask = pd.Series(True, index=df.index)
331
+
332
+ # Probe type filter (cg vs ch)
333
+ if 'probe_type' in filters:
334
+ ptype = filters['probe_type']
335
+ probe_mask &= df.index.str.startswith(ptype)
336
+
337
+ # Missing data filter
338
+ if 'missing_threshold' in filters:
339
+ threshold = filters['missing_threshold']
340
+ missing_frac = df.isna().mean(axis=1)
341
+ probe_mask &= missing_frac <= threshold
342
+
343
+ # Variance filter
344
+ if 'variance_threshold' in filters:
345
+ var_threshold = filters['variance_threshold']
346
+ probe_var = df.var(axis=1, skipna=True)
347
+ probe_mask &= probe_var >= var_threshold
348
+
349
+ # Mean beta range filter
350
+ if 'mean_beta_range' in filters:
351
+ min_beta, max_beta = filters['mean_beta_range']
352
+ probe_mean = df.mean(axis=1, skipna=True)
353
+ probe_mask &= (probe_mean >= min_beta) & (probe_mean <= max_beta)
354
+
355
+     # Top N most variable (ranked over ALL probes, so fewer than N may survive the other filters)
356
+ if 'top_n_variable' in filters:
357
+ n = filters['top_n_variable']
358
+ probe_var = df.var(axis=1, skipna=True)
359
+ top_probes = probe_var.nlargest(n).index
360
+ probe_mask &= df.index.isin(top_probes)
361
+
362
+ # Manifest-based filters
363
+ if manifest is not None and len(manifest) > 0:
364
+ probe_id_col = 'probe_id' if 'probe_id' in manifest.columns else manifest.columns[0]
365
+ manifest_indexed = manifest.set_index(probe_id_col) if probe_id_col in manifest.columns else manifest
366
+
367
+ # Chromosome filter
368
+ if 'chromosomes' in filters and 'chr' in manifest_indexed.columns:
369
+ valid_chr = [normalize_chromosome(c) for c in filters['chromosomes']]
370
+ chr_probes = manifest_indexed[
371
+ manifest_indexed['chr'].apply(normalize_chromosome).isin(valid_chr)
372
+ ].index
373
+ probe_mask &= df.index.isin(chr_probes)
374
+
375
+ # Exclude sex chromosomes
376
+ if filters.get('exclude_sex_chr', False) and 'chr' in manifest_indexed.columns:
377
+         sex_chr = ['chrX', 'chrY']  # names as returned by normalize_chromosome()
378
+         nonsex_probes = manifest_indexed[
379
+             ~manifest_indexed['chr'].apply(normalize_chromosome).isin(sex_chr)
380
+ ].index
381
+ probe_mask &= df.index.isin(nonsex_probes)
382
+
383
+ # CpG island relation filter
384
+ if 'cpg_island' in filters and 'cpg_island_relation' in manifest_indexed.columns:
385
+ relation = filters['cpg_island']
386
+ island_probes = manifest_indexed[
387
+ manifest_indexed['cpg_island_relation'].str.contains(relation, na=False, case=False)
388
+ ].index
389
+ probe_mask &= df.index.isin(island_probes)
390
+
391
+ # Gene group filter (TSS200, Body, etc.)
392
+ if 'gene_group' in filters and 'gene_group' in manifest_indexed.columns:
393
+ group = filters['gene_group']
394
+ group_probes = manifest_indexed[
395
+ manifest_indexed['gene_group'].str.contains(group, na=False, case=False)
396
+ ].index
397
+ probe_mask &= df.index.isin(group_probes)
398
+
399
+ filtered_df = df[probe_mask]
400
+ return filtered_df
401
+ ```
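Variance thresholds depend on the estimator. pandas `DataFrame.var()` computes the sample variance (`ddof=1`) by default, whereas `numpy.var()` computes the population variance (`ddof=0`), so the same cut-off can keep a different set of probes. The toy matrix below shows this. Which convention a given question expects is an assumption that should be stated in the answer.

```python
import numpy as np
import pandas as pd

beta = pd.DataFrame(
    {"s1": [0.10, 0.50, 0.90], "s2": [0.12, 0.60, 0.10], "s3": [0.11, 0.40, 0.50]},
    index=["cg_a", "cg_b", "cg_c"],
)

var_sample = beta.var(axis=1)                   # ddof=1, pandas default
var_population = beta.var(axis=1, ddof=0)       # ddof=0, same as numpy's default
var_numpy = np.var(beta.to_numpy(), axis=1)     # numpy default (ddof=0)

threshold = 0.007
kept_sample = var_sample[var_sample > threshold].index.tolist()
kept_population = var_population[var_population > threshold].index.tolist()
print(kept_sample, kept_population)  # ['cg_b', 'cg_c'] ['cg_c']
```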
402
+
403
+ #### 1.4 Differential Methylation Analysis
404
+
405
+ ```python
406
+ from scipy import stats
407
+ import statsmodels.stats.multitest as mt
408
+
409
+ def differential_methylation(beta_df, group1_samples, group2_samples,
410
+ test='ttest', correction='fdr_bh', alpha=0.05):
411
+ """Perform differential methylation analysis between two groups.
412
+
413
+ Args:
414
+ beta_df: Beta-value matrix (probes x samples)
415
+ group1_samples: list of sample IDs for group 1
416
+ group2_samples: list of sample IDs for group 2
417
+ test: 'ttest', 'wilcoxon', or 'ks' (Kolmogorov-Smirnov)
418
+ correction: multiple testing correction method
419
+ alpha: significance threshold
420
+
421
+ Returns:
422
+         DataFrame with columns: mean_g1, mean_g2, delta_beta (= mean_g2 - mean_g1), pvalue, padj
423
+ """
424
+ g1 = beta_df[group1_samples]
425
+ g2 = beta_df[group2_samples]
426
+
427
+ results = []
428
+ for probe in beta_df.index:
429
+ vals1 = g1.loc[probe].dropna().values
430
+ vals2 = g2.loc[probe].dropna().values
431
+
432
+ if len(vals1) < 2 or len(vals2) < 2:
433
+ results.append({
434
+ 'probe': probe, 'mean_g1': np.nan, 'mean_g2': np.nan,
435
+ 'delta_beta': np.nan, 'pvalue': np.nan
436
+ })
437
+ continue
438
+
439
+ mean1 = np.nanmean(vals1)
440
+ mean2 = np.nanmean(vals2)
441
+ delta = mean2 - mean1
442
+
443
+ if test == 'ttest':
444
+ stat, pval = stats.ttest_ind(vals1, vals2, equal_var=False)
445
+ elif test == 'wilcoxon':
446
+ stat, pval = stats.mannwhitneyu(vals1, vals2, alternative='two-sided')
447
+ elif test == 'ks':
448
+ stat, pval = stats.ks_2samp(vals1, vals2)
449
+ else:
450
+             raise ValueError(f"Unknown test '{test}'; use 'ttest', 'wilcoxon' or 'ks'")
451
+
452
+ results.append({
453
+ 'probe': probe, 'mean_g1': mean1, 'mean_g2': mean2,
454
+ 'delta_beta': delta, 'pvalue': pval
455
+ })
456
+
457
+ result_df = pd.DataFrame(results).set_index('probe')
458
+
459
+ # Multiple testing correction
460
+ valid_pvals = result_df['pvalue'].dropna()
461
+ if len(valid_pvals) > 0:
462
+ reject, padj, _, _ = mt.multipletests(valid_pvals.values, alpha=alpha, method=correction)
463
+ result_df.loc[valid_pvals.index, 'padj'] = padj
464
+ else:
465
+ result_df['padj'] = np.nan
466
+
467
+ return result_df
468
+
469
+
470
+ def identify_dmps(dm_results, alpha=0.05, delta_beta_threshold=0.0):
471
+ """Identify differentially methylated positions (DMPs).
472
+
473
+ Args:
474
+ dm_results: Output from differential_methylation()
475
+ alpha: adjusted p-value threshold
476
+ delta_beta_threshold: minimum absolute beta-value difference
477
+
478
+ Returns:
479
+ DataFrame of significant DMPs
480
+ """
481
+ dmps = dm_results[
482
+ (dm_results['padj'] < alpha) &
483
+ (dm_results['delta_beta'].abs() >= delta_beta_threshold)
484
+ ].copy()
485
+     dmps['direction'] = np.where(dmps['delta_beta'] > 0, 'hyper', 'hypo')  # group 2 relative to group 1
486
+ return dmps.sort_values('padj')
487
+ ```
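The per-probe loop in differential_methylation() is easy to read but slow on 450K/EPIC-sized matrices. A vectorized Welch t-test computes all probes at once with NumPy. It should agree row by row with `scipy.stats.ttest_ind(..., equal_var=False)`.

This is a sketch, not a validated drop-in replacement:

- NaNs are dropped per probe.
- Probes with fewer than two values in either group get a NaN p-value.

```python
import numpy as np
import pandas as pd
from scipy import stats


def welch_ttest_rows(beta_df, group1_samples, group2_samples):
    """Row-wise Welch t-test (unequal variances), ignoring NaNs per probe."""
    a = beta_df[group1_samples].to_numpy(dtype=float)
    b = beta_df[group2_samples].to_numpy(dtype=float)
    n1 = np.sum(~np.isnan(a), axis=1)
    n2 = np.sum(~np.isnan(b), axis=1)

    with np.errstate(invalid="ignore", divide="ignore"):
        m1, m2 = np.nanmean(a, axis=1), np.nanmean(b, axis=1)
        se2_1 = np.nanvar(a, axis=1, ddof=1) / n1
        se2_2 = np.nanvar(b, axis=1, ddof=1) / n2
        t = (m1 - m2) / np.sqrt(se2_1 + se2_2)
        # Welch-Satterthwaite degrees of freedom
        dof = (se2_1 + se2_2) ** 2 / (se2_1**2 / (n1 - 1) + se2_2**2 / (n2 - 1))
        pval = 2 * stats.t.sf(np.abs(t), dof)

    valid = (n1 >= 2) & (n2 >= 2)
    return pd.DataFrame(
        {"mean_g1": m1, "mean_g2": m2, "delta_beta": m2 - m1,
         "t": t, "pvalue": np.where(valid, pval, np.nan)},
        index=beta_df.index,
    )
```

The `pvalue` column can be passed to `statsmodels.stats.multitest.multipletests` exactly as in the loop version.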
488
+
489
+ #### 1.5 Age-Related CpG Analysis
490
+
491
+ ```python
492
+ def identify_age_related_cpgs(beta_df, ages, method='correlation',
493
+ correction='fdr_bh', alpha=0.05):
494
+ """Identify CpG sites associated with age.
495
+
496
+ Args:
497
+ beta_df: Beta-value matrix (probes x samples)
498
+     ages: Series indexed by sample ID, or an array already in beta_df column order
498
+     method: 'correlation' (Pearson), 'spearman', or 'regression' (same p-value as Pearson)
500
+ correction: multiple testing method
501
+ alpha: significance threshold
502
+
503
+ Returns:
504
+ DataFrame with correlation, p-value, adjusted p-value
505
+ """
506
+     if isinstance(ages, pd.Series) and set(beta_df.columns) <= set(ages.index):
507
+         age_arr = ages.reindex(beta_df.columns).to_numpy(dtype=float)  # align by sample ID
508
+     else:
509
+         age_arr = np.asarray(ages, dtype=float)  # assumed to follow beta_df column order
510
+     results = []
511
+     for probe in beta_df.index:
512
+         vals = beta_df.loc[probe].to_numpy(dtype=float)
513
+         mask = ~np.isnan(vals) & ~np.isnan(age_arr)
514
+         if mask.sum() < 5:
515
+             results.append({'probe': probe, 'correlation': np.nan, 'pvalue': np.nan})
516
+             continue
517
+
518
+         if method == 'spearman':
519
+             corr, pval = stats.spearmanr(age_arr[mask], vals[mask])
520
+         else:
521
+             # 'correlation' and 'regression' both use Pearson r: with a single predictor,
522
+             # the Pearson test and the OLS slope t-test give the same p-value
523
+             corr, pval = stats.pearsonr(age_arr[mask], vals[mask])
524
+
525
+         results.append({'probe': probe, 'correlation': corr, 'pvalue': pval})
526
+
527
+ result_df = pd.DataFrame(results).set_index('probe')
528
+
529
+ # Multiple testing correction
530
+ valid_pvals = result_df['pvalue'].dropna()
531
+ if len(valid_pvals) > 0:
532
+ reject, padj, _, _ = mt.multipletests(valid_pvals.values, alpha=alpha, method=correction)
533
+ result_df.loc[valid_pvals.index, 'padj'] = padj
534
+ else:
535
+ result_df['padj'] = np.nan
536
+
537
+ return result_df
538
+ ```
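The same vectorization idea applies to age association. Pearson r is computed for every probe at once from centered dot products. The p-value comes from the t distribution with n - 2 degrees of freedom, which is the same test `scipy.stats.pearsonr` performs.

This sketch makes two assumptions:

- The data are complete (no NaNs).
- `ages` is already ordered like the matrix columns.

Probes with missing values still need the per-probe loop above.

```python
import numpy as np
import pandas as pd
from scipy import stats


def age_correlation_rows(beta_df, ages):
    """Pearson r and two-sided p-value between each probe's beta values and age."""
    x = np.asarray(ages, dtype=float)
    y = beta_df.to_numpy(dtype=float)
    n = x.size

    xc = x - x.mean()
    yc = y - y.mean(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (yc @ xc) / (np.sqrt((yc**2).sum(axis=1)) * np.sqrt((xc**2).sum()))
        r = np.clip(r, -1.0, 1.0)
        t = r * np.sqrt((n - 2) / (1 - r**2))
        pval = 2 * stats.t.sf(np.abs(t), n - 2)
    return pd.DataFrame({"correlation": r, "pvalue": pval}, index=beta_df.index)
```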
539
+
540
+ #### 1.6 Chromosome-Level Methylation Statistics
541
+
542
+ ```python
543
+ def chromosome_cpg_density(cpg_probes, manifest, genome='hg38'):
544
+ """Calculate CpG density per chromosome.
545
+
546
+ Args:
547
+ cpg_probes: list/Index of CpG probe IDs
548
+ manifest: probe annotation with chr and position columns
549
+ genome: genome build for chromosome lengths
550
+
551
+ Returns:
552
+ DataFrame with chr, n_cpgs, chr_length, density (CpGs per bp)
553
+ """
554
+ chr_lengths = get_chromosome_lengths(genome)
555
+
556
+ # Map probes to chromosomes
557
+ probe_id_col = 'probe_id' if 'probe_id' in manifest.columns else manifest.columns[0]
558
+ if probe_id_col in manifest.columns:
559
+ probe_chr = manifest.set_index(probe_id_col)
560
+ else:
561
+ probe_chr = manifest
562
+
563
+ # Get chromosome for each probe
564
+ if 'chr' in probe_chr.columns:
565
+ chr_col = 'chr'
566
+ elif 'CHR' in probe_chr.columns:
567
+ chr_col = 'CHR'
568
+ else:
569
+ raise ValueError("No chromosome column found in manifest")
570
+
571
+ # Count probes per chromosome
572
+ probe_chrs = probe_chr.loc[probe_chr.index.isin(cpg_probes), chr_col]
573
+ probe_chrs = probe_chrs.apply(normalize_chromosome)
574
+ chr_counts = probe_chrs.value_counts()
575
+
576
+ results = []
577
+ for chrom, count in chr_counts.items():
578
+ if chrom in chr_lengths:
579
+ length = chr_lengths[chrom]
580
+ density = count / length
581
+ results.append({
582
+ 'chr': chrom,
583
+ 'n_cpgs': count,
584
+ 'chr_length': length,
585
+ 'density_per_bp': density,
586
+ 'density_per_mb': density * 1e6,
587
+ })
588
+
589
+ return pd.DataFrame(results).sort_values('chr',
590
+ key=lambda x: x.str.replace('chr', '').replace({'X': '23', 'Y': '24'}).astype(int))
591
+
592
+
593
+ def genome_wide_average_density(density_df):
594
+     """Pooled genome-wide CpG density: total CpGs / total length of the listed chromosomes.
595
+
596
+     Args:
597
+         density_df: Output from chromosome_cpg_density()
598
+
599
+     Returns:
600
+         float: CpGs per bp; chromosomes without CpGs are absent from density_df and excluded from the denominator
601
+ """
602
+ total_cpgs = density_df['n_cpgs'].sum()
603
+ total_length = density_df['chr_length'].sum()
604
+ return total_cpgs / total_length
605
+
606
+
607
+ def chromosome_density_ratio(density_df, chr1, chr2):
608
+ """Calculate density ratio between two chromosomes.
609
+
610
+ Args:
611
+ density_df: Output from chromosome_cpg_density()
612
+ chr1, chr2: chromosome names (e.g., 'chr1', 'chr2')
613
+
614
+ Returns:
615
+ float: density ratio (chr1 / chr2)
616
+ """
617
+ chr1 = normalize_chromosome(chr1)
618
+ chr2 = normalize_chromosome(chr2)
619
+ d1 = density_df[density_df['chr'] == chr1]['density_per_bp'].values[0]
620
+ d2 = density_df[density_df['chr'] == chr2]['density_per_bp'].values[0]
621
+ return d1 / d2
622
+ ```
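"Genome-wide average density" can mean two different numbers:

- The pooled density (total CpGs / total length of the chromosomes counted), which is what genome_wide_average_density() returns.
- The unweighted mean of the per-chromosome densities.

The two differ whenever chromosome lengths differ, so compute both and state which one is reported. This toy sketch uses made-up CpG counts; only the two chromosome lengths are real hg38 values.

```python
import pandas as pd

# Illustrative CpG counts; chr1 and chr21 lengths are the hg38 values
density_df = pd.DataFrame({
    "chr": ["chr1", "chr21"],
    "n_cpgs": [2000, 600],
    "chr_length": [248956422, 46709983],
})
density_df["density_per_bp"] = density_df["n_cpgs"] / density_df["chr_length"]

pooled = density_df["n_cpgs"].sum() / density_df["chr_length"].sum()
mean_of_chromosomes = density_df["density_per_bp"].mean()
print(f"pooled: {pooled:.3e}  mean of per-chromosome densities: {mean_of_chromosomes:.3e}")
```

Here the short, CpG-rich chr21 pulls the unweighted mean above the pooled value. The pooled figure weights each chromosome by its length.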
623
+
624
+ ---
625
+
626
+ ### Phase 2: ChIP-seq Peak Analysis
627
+
628
+ #### 2.1 Load BED/Peak Files
629
+
630
+ ```python
631
+ import gzip
+
+ def load_bed_file(file_path, format='bed'):
632
+     """Load a BED-format file (standard BED, narrowPeak, broadPeak), plain or gzipped.
633
+
634
+     Standard BED: chrom, start, end, name, score, strand (0-based, half-open)
635
+     narrowPeak: + signalValue, pValue, qValue, peak
636
+     broadPeak: + signalValue, pValue, qValue
637
+     """
638
+     base = file_path[:-3] if file_path.endswith('.gz') else file_path
+     if format == 'narrowPeak' or base.endswith('.narrowPeak'):
639
+         names = ['chrom', 'start', 'end', 'name', 'score', 'strand',
640
+                  'signalValue', 'pValue', 'qValue', 'peak']
641
+     elif format == 'broadPeak' or base.endswith('.broadPeak'):
642
+         names = ['chrom', 'start', 'end', 'name', 'score', 'strand',
643
+                  'signalValue', 'pValue', 'qValue']
644
+     else:
645
+         # Standard BED - detect the number of columns from the first data line
646
+         opener = gzip.open if file_path.endswith('.gz') else open
+         with opener(file_path, 'rt') as f:
647
+             first_line = f.readline().strip()
648
+             while first_line.startswith(('#', 'track', 'browser')):
649
+                 first_line = f.readline().strip()
650
+             n_cols = len(first_line.split('\t'))
651
+
652
+ bed_col_names = ['chrom', 'start', 'end', 'name', 'score', 'strand',
653
+ 'thickStart', 'thickEnd', 'itemRgb', 'blockCount',
654
+ 'blockSizes', 'blockStarts']
655
+ names = bed_col_names[:n_cols]
656
+
657
+ df = pd.read_csv(file_path, sep='\t', header=None, names=names,
658
+ comment='#', low_memory=False)
659
+
660
+ # Skip track/browser lines
661
+ df = df[~df['chrom'].astype(str).str.startswith(('track', 'browser'))]
662
+
663
+ # Normalize chromosomes
664
+ df['chrom'] = df['chrom'].apply(normalize_chromosome)
665
+
666
+ # Ensure numeric types
667
+ df['start'] = pd.to_numeric(df['start'], errors='coerce')
668
+ df['end'] = pd.to_numeric(df['end'], errors='coerce')
669
+
670
+ return df
671
+
672
+
673
+ def peak_statistics(peaks_df):
674
+ """Calculate basic peak statistics.
675
+
676
+ Args:
677
+ peaks_df: BED DataFrame from load_bed_file()
678
+
679
+ Returns:
680
+ dict with peak statistics
681
+ """
682
+ peaks_df = peaks_df.copy()
683
+ peaks_df['length'] = peaks_df['end'] - peaks_df['start']
684
+
685
+ stats_dict = {
686
+ 'total_peaks': len(peaks_df),
687
+ 'mean_peak_length': peaks_df['length'].mean(),
688
+ 'median_peak_length': peaks_df['length'].median(),
689
+         'total_coverage_bp': peaks_df['length'].sum(),  # overlapping peaks are counted twice
690
+ 'peaks_per_chromosome': peaks_df['chrom'].value_counts().to_dict(),
691
+ }
692
+
693
+ if 'signalValue' in peaks_df.columns:
694
+ stats_dict['mean_signal'] = peaks_df['signalValue'].mean()
695
+ stats_dict['median_signal'] = peaks_df['signalValue'].median()
696
+
697
+ if 'qValue' in peaks_df.columns:
698
+ stats_dict['mean_qvalue'] = peaks_df['qValue'].mean()
699
+
700
+ return stats_dict
701
+ ```
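Illumina manifests give 1-based CpG positions (MAPINFO), whereas BED intervals are 0-based and half-open. A CpG at 1-based position p therefore lies in a peak when start <= p - 1 < end.

This sketch counts CpGs inside peaks on a single chromosome with `numpy.searchsorted`. It assumes the peaks do not overlap each other; merge them first if they do. The function name is hypothetical.

```python
import numpy as np


def cpgs_in_peaks(cpg_pos_1based, peak_starts, peak_ends):
    """Boolean mask of CpGs (1-based positions) lying inside any BED peak [start, end).

    Assumes one chromosome and non-overlapping peaks (merge overlapping peaks first).
    """
    order = np.argsort(peak_starts)
    starts = np.asarray(peak_starts)[order]
    ends = np.asarray(peak_ends)[order]
    pos0 = np.asarray(cpg_pos_1based) - 1  # 1-based -> 0-based coordinate
    # Index of the last peak starting at or before each CpG (-1 if there is none)
    idx = np.searchsorted(starts, pos0, side="right") - 1
    inside = idx >= 0
    inside[inside] = pos0[inside] < ends[idx[inside]]
    return inside
```

`cpgs_in_peaks(...).sum()` then gives the number of CpGs falling in peaks for that chromosome.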
702
+
703
+ #### 2.2 Peak Annotation
704
+
705
+ ```python
706
+ def annotate_peaks_to_genes(peaks_df, gene_annotation=None,
707
+ tss_upstream=2000, tss_downstream=500):
708
+ """Annotate peaks to nearest gene / genomic feature.
709
+
710
+ Args:
711
+ peaks_df: BED DataFrame
712
+ gene_annotation: DataFrame with gene coordinates (chr, start, end, gene_name, strand)
713
+ tss_upstream: bp upstream of TSS to define promoter
714
+ tss_downstream: bp downstream of TSS to define promoter
715
+
716
+ Returns:
717
+ DataFrame with peak annotations
718
+ """
719
+ if gene_annotation is None:
720
+ return peaks_df # No annotation available
721
+
723
+ annotations = []
724
+
725
+ for _, peak in peaks_df.iterrows():
726
+ peak_chr = peak['chrom']
727
+ peak_mid = (peak['start'] + peak['end']) // 2
728
+
729
+ # Filter genes on same chromosome
730
+ chr_genes = gene_annotation[gene_annotation['chr'] == peak_chr]
731
+
732
+ if len(chr_genes) == 0:
733
+ annotations.append({
734
+ 'nearest_gene': 'intergenic',
735
+ 'distance_to_tss': np.nan,
736
+ 'feature': 'intergenic'
737
+ })
738
+ continue
739
+
740
+ # Calculate distance to TSS
741
+ tss_positions = chr_genes.apply(
742
+ lambda g: g['start'] if g.get('strand', '+') == '+' else g['end'],
743
+ axis=1
744
+ )
745
+ distances = (peak_mid - tss_positions).abs()
746
+ nearest_idx = distances.idxmin()
747
+ nearest_gene = chr_genes.loc[nearest_idx]
748
+ distance = distances.loc[nearest_idx]
749
+ tss = tss_positions.loc[nearest_idx]
750
+
751
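+         # Classify feature type (strand-aware: a negative offset is upstream of the TSS)
752
+         offset = (peak_mid - tss) if nearest_gene.get('strand', '+') == '+' else (tss - peak_mid)
+         if -tss_upstream <= offset <= tss_downstream:
753
+             feature = 'promoter'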
754
+ elif peak['start'] >= nearest_gene['start'] and peak['end'] <= nearest_gene['end']:
755
+ feature = 'gene_body'
756
+ elif abs(peak_mid - tss) <= 10000:
757
+ feature = 'proximal'
758
+ else:
759
+ feature = 'distal'
760
+
761
+ annotations.append({
762
+ 'nearest_gene': nearest_gene.get('gene_name', nearest_gene.name),
763
+ 'distance_to_tss': int(distance),
764
+ 'feature': feature
765
+ })
766
+
767
+ ann_df = pd.DataFrame(annotations, index=peaks_df.index)
768
+ return pd.concat([peaks_df, ann_df], axis=1)
769
+
770
+
771
+ def classify_peak_regions(annotated_peaks):
772
+ """Classify peaks into genomic regions.
773
+
774
+ Returns:
775
+ dict with counts per region type
776
+ """
777
+ if 'feature' not in annotated_peaks.columns:
778
+ return {'unknown': len(annotated_peaks)}
779
+
780
+ return annotated_peaks['feature'].value_counts().to_dict()
781
+ ```
782
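The per-peak loop above scans every gene on the chromosome once per peak. For large peak sets, a vectorized alternative sorts the TSS positions once per chromosome and finds the nearest one with `np.searchsorted`. This is a minimal sketch that assumes the same column names as above (`chrom`/`start`/`end` for peaks; `chr`/`start`/`end`/`strand`/`gene_name` for genes). The helper name `nearest_tss_vectorized` is hypothetical. It returns only the nearest gene and the absolute distance, not the feature class, and it may break ties differently from `idxmin`.

```python
import numpy as np
import pandas as pd


def nearest_tss_vectorized(peaks_df, gene_annotation):
    """Hypothetical helper: nearest TSS per peak via binary search.

    Returns a DataFrame indexed like peaks_df with nearest_gene and
    distance_to_tss (NaN when the peak's chromosome has no genes).
    """
    genes = gene_annotation.copy()
    if 'strand' not in genes.columns:
        genes['strand'] = '+'
    # TSS = start on the '+' strand, end otherwise
    genes['tss'] = np.where(genes['strand'] == '+', genes['start'], genes['end'])

    out = pd.DataFrame(index=peaks_df.index,
                       columns=['nearest_gene', 'distance_to_tss'], dtype=object)
    for chrom, chr_peaks in peaks_df.groupby('chrom'):
        chr_genes = genes[genes['chr'] == chrom].sort_values('tss')
        if chr_genes.empty:
            continue
        tss = chr_genes['tss'].to_numpy()
        names = chr_genes['gene_name'].to_numpy()
        mids = ((chr_peaks['start'] + chr_peaks['end']) // 2).to_numpy()

        # First TSS >= midpoint; the nearest TSS is it or its left neighbour
        right = np.clip(np.searchsorted(tss, mids, side='left'), 0, len(tss) - 1)
        left = np.clip(right - 1, 0, len(tss) - 1)
        pick_left = np.abs(mids - tss[left]) <= np.abs(mids - tss[right])
        best = np.where(pick_left, left, right)

        out.loc[chr_peaks.index, 'nearest_gene'] = names[best]
        out.loc[chr_peaks.index, 'distance_to_tss'] = np.abs(mids - tss[best])
    return out
```

Sorting once and binary-searching turns the per-peak cost from linear in the number of genes to logarithmic. That is why this layout tends to pay off for genome-wide peak sets.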
+
+ #### 2.3 Peak Overlap Analysis
+
+ ```python
+ def find_overlaps(peaks_a, peaks_b, min_overlap=1):
+     """Find overlapping peaks between two BED DataFrames.
+
+     Pure Python/NumPy interval overlap (no pybedtools). Intervals are 0-based
+     half-open, so peaks that only touch end-to-start do not overlap.
+
+     Args:
+         peaks_a: BED DataFrame (query)
+         peaks_b: BED DataFrame (subject)
+         min_overlap: minimum overlap in bp
+
+     Returns:
+         DataFrame with one row per overlapping (a, b) pair
+     """
+     columns = ['a_chrom', 'a_start', 'a_end', 'b_start', 'b_end', 'overlap_bp']
+     overlaps = []
+
+     # Group by chromosome; sort B by start so candidates can be bounded
+     for chrom in peaks_a['chrom'].unique():
+         a_chr = peaks_a[peaks_a['chrom'] == chrom]
+         b_chr = peaks_b[peaks_b['chrom'] == chrom].sort_values('start')
+         if len(b_chr) == 0:
+             continue
+
+         b_starts = b_chr['start'].to_numpy()
+         b_ends = b_chr['end'].to_numpy()
+
+         for a_start, a_end in zip(a_chr['start'], a_chr['end']):
+             # Binary search: only B peaks starting before a_end can overlap
+             cutoff = np.searchsorted(b_starts, a_end, side='left')
+             # Among those, keep B peaks that end after a_start
+             for j in np.nonzero(b_ends[:cutoff] > a_start)[0]:
+                 overlap_bp = min(a_end, b_ends[j]) - max(a_start, b_starts[j])
+                 if overlap_bp >= min_overlap:
+                     overlaps.append({
+                         'a_chrom': chrom,
+                         'a_start': a_start,
+                         'a_end': a_end,
+                         'b_start': b_starts[j],
+                         'b_end': b_ends[j],
+                         'overlap_bp': overlap_bp,
+                     })
+
+     return pd.DataFrame(overlaps, columns=columns)
+
+
+ def jaccard_similarity(peaks_a, peaks_b, genome='hg38'):
+     """Calculate Jaccard similarity between two peak sets.
+
+     Jaccard = intersection / union of genomic coverage (bp).
+     Assumes peaks within each set do not overlap one another; otherwise
+     coverage is double-counted (merge each set first). `genome` is unused.
+     """
+     coverage_a = (peaks_a['end'] - peaks_a['start']).sum()
+     coverage_b = (peaks_b['end'] - peaks_b['start']).sum()
+
+     overlaps = find_overlaps(peaks_a, peaks_b)
+     if len(overlaps) == 0:
+         return 0.0
+
+     intersection = overlaps['overlap_bp'].sum()
+     union = coverage_a + coverage_b - intersection
+
+     return intersection / union if union > 0 else 0.0
+ ```
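`jaccard_similarity` double-counts coverage when peaks within a set overlap each other. This sketch of a merge-first variant collapses each set into non-overlapping intervals per chromosome. It then measures the intersection with a two-pointer sweep over the two sorted lists. The helper names `merge_intervals` and `jaccard_merged` are hypothetical, and both use the same `chrom`/`start`/`end` columns.

```python
import pandas as pd


def merge_intervals(df):
    """Hypothetical helper: merge overlapping or abutting half-open intervals.

    Returns {chrom: [(start, end), ...]} sorted by start.
    """
    merged = {}
    for chrom, grp in df.sort_values(['chrom', 'start']).groupby('chrom'):
        out = []
        for start, end in zip(grp['start'], grp['end']):
            if out and start <= out[-1][1]:
                # Overlaps or abuts the previous interval: extend it
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


def jaccard_merged(peaks_a, peaks_b):
    """Jaccard index of genomic coverage after merging each peak set."""
    a, b = merge_intervals(peaks_a), merge_intervals(peaks_b)
    cov_a = sum(e - s for ivs in a.values() for s, e in ivs)
    cov_b = sum(e - s for ivs in b.values() for s, e in ivs)

    intersection = 0
    for chrom in set(a) & set(b):
        ia, ib = a[chrom], b[chrom]
        i = j = 0
        # Two-pointer sweep over sorted, non-overlapping intervals
        while i < len(ia) and j < len(ib):
            s = max(ia[i][0], ib[j][0])
            e = min(ia[i][1], ib[j][1])
            if e > s:
                intersection += e - s
            # Advance whichever interval ends first
            if ia[i][1] < ib[j][1]:
                i += 1
            else:
                j += 1

    union = cov_a + cov_b - intersection
    return intersection / union if union > 0 else 0.0
```

Consider A = {[0,100), [50,150)} and B = {[100,200)}. This version gives 50 / 200 = 0.25. The unmerged calculation counts 200 bp of coverage for A and so gives 50 / 250 = 0.2.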
+
+ ---
+
+ ### Phase 3: ATAC-seq Analysis
+
+ #### 3.1 ATAC-seq Peak Processing
+
+ ```python
+ def load_atac_peaks(file_path):
+     """Load ATAC-seq peak file (typically narrowPeak format)."""
+     return load_bed_file(file_path, format='narrowPeak')
+
+
+ def atac_peak_statistics(peaks_df):
+     """ATAC-seq specific statistics.
+
+     ATAC-seq peaks represent open chromatin regions.
+     """
+     basic_stats = peak_statistics(peaks_df)
+
+     # Rough nucleosome-free region (NFR) proxy: the NFR vs. nucleosomal split is
+     # defined on sequenced fragment lengths, not peak widths, so counting
+     # narrow peaks (< 150 bp) only approximates NFR signal
+     peaks_df = peaks_df.copy()
+     peaks_df['length'] = peaks_df['end'] - peaks_df['start']
+     nfr_peaks = peaks_df[peaks_df['length'] < 150]
+     nucleosome_peaks = peaks_df[peaks_df['length'] >= 150]
+
+     basic_stats['nfr_peaks'] = len(nfr_peaks)
+     basic_stats['nucleosome_peaks'] = len(nucleosome_peaks)
+     basic_stats['nfr_fraction'] = len(nfr_peaks) / len(peaks_df) if len(peaks_df) > 0 else 0
+
+     return basic_stats
+
+
+ def chromatin_accessibility_by_region(peaks_df, gene_annotation=None):
+     """Calculate chromatin accessibility distribution across genomic regions."""
+     annotated = annotate_peaks_to_genes(peaks_df, gene_annotation)
+     regions = classify_peak_regions(annotated)
+
+     total = sum(regions.values())
+     region_fractions = {k: (v / total if total > 0 else 0.0) for k, v in regions.items()}
+
+     return {
+         'counts': regions,
+         'fractions': region_fractions,
+         'total_peaks': total,
+     }
+ ```
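The NFR/nucleosomal distinction is defined on fragment lengths, so it is better computed from paired-end insert sizes than from peak widths. One source is the absolute `template_length` of properly paired reads iterated with pysam. This sketch bins an array of fragment lengths using cut-offs commonly applied in ATAC-seq QC: < 100 bp nucleosome-free, 180-247 bp mono-nucleosome and 315-473 bp di-nucleosome. Treat these bounds, and the hypothetical helper name `fragment_size_classes`, as assumptions to adjust for your protocol.

```python
import numpy as np

# Assumed fragment-length bins in bp, half-open [low, high); adjust per protocol
FRAGMENT_BINS = {
    'nfr': (0, 100),
    'mono_nucleosome': (180, 248),
    'di_nucleosome': (315, 474),
}


def fragment_size_classes(fragment_lengths, bins=FRAGMENT_BINS):
    """Hypothetical helper: count and fraction of fragments per size class.

    fragment_lengths: iterable of insert sizes (sign ignored, zeros dropped).
    Fragments falling outside every bin are reported as 'other'.
    """
    lengths = np.abs(np.asarray(fragment_lengths, dtype=int))
    lengths = lengths[lengths > 0]
    total = lengths.size

    counts = {}
    assigned = np.zeros(total, dtype=bool)
    for name, (low, high) in bins.items():
        in_bin = (lengths >= low) & (lengths < high)
        counts[name] = int(in_bin.sum())
        assigned |= in_bin
    counts['other'] = int((~assigned).sum())

    fractions = {k: (v / total if total else 0.0) for k, v in counts.items()}
    return {'total': total, 'counts': counts, 'fractions': fractions}
```

Both mates of a pair report the same absolute template length. Count each pair once, for example only read 1, to avoid doubling every fragment.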
+
+ ---
+
+ ### Phase 4: Multi-Omics Integration
+
+ #### 4.1 Expression-Methylation Correlation
+
+ ```python
+ def correlate_methylation_expression(beta_df, expression_df, probe_gene_map,
+                                      method='pearson', correction='fdr_bh'):
+     """Correlate methylation levels with gene expression.
+
+     Args:
+         beta_df: Methylation matrix (probes x samples)
+         expression_df: Expression matrix (genes x samples)
+         probe_gene_map: dict or Series mapping probe IDs to gene symbols
+         method: 'pearson' or 'spearman'
+         correction: multiple-testing method passed to
+             statsmodels.stats.multitest.multipletests (e.g. 'fdr_bh')
+
+     Returns:
+         DataFrame with correlation, p-value and adjusted p-value (padj)
+         per probe-gene pair
+     """
+     # Align samples (sorted for a deterministic column order)
+     common_samples = sorted(set(beta_df.columns) & set(expression_df.columns))
+     if len(common_samples) < 5:
+         raise ValueError(f"Not enough common samples: {len(common_samples)}")
+
+     beta_aligned = beta_df[common_samples]
+     expr_aligned = expression_df[common_samples]
+
+     results = []
+     for probe, gene in probe_gene_map.items():
+         if probe not in beta_aligned.index or gene not in expr_aligned.index:
+             continue
+
+         meth_vals = beta_aligned.loc[probe].to_numpy(dtype=float)
+         expr_vals = expr_aligned.loc[gene].to_numpy(dtype=float)
+
+         # Use only samples where both measurements are present
+         mask = ~np.isnan(meth_vals) & ~np.isnan(expr_vals)
+         if mask.sum() < 5:
+             continue
+
+         if method == 'pearson':
+             corr, pval = stats.pearsonr(meth_vals[mask], expr_vals[mask])
+         else:
+             corr, pval = stats.spearmanr(meth_vals[mask], expr_vals[mask])
+
+         results.append({
+             'probe': probe,
+             'gene': gene,
+             'correlation': corr,
+             'pvalue': pval,
+             'n_samples': int(mask.sum()),
+         })
+
+     result_df = pd.DataFrame(results)
+
+     if len(result_df) > 0:
+         valid_pvals = result_df['pvalue'].dropna()
+         if len(valid_pvals) > 0:
+             _, padj, _, _ = mt.multipletests(valid_pvals.values, method=correction)
+             result_df.loc[valid_pvals.index, 'padj'] = padj
+
+     return result_df
+ ```
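The loop above calls `pearsonr` once per probe-gene pair. Suppose both matrices are complete (no NaNs) and already paired row by row, so that row i of the methylation matrix goes with row i of the expression matrix. Then all correlations can be computed in one vectorized pass. This is a sketch under that assumption. `rowwise_pearson` is a hypothetical helper that derives two-sided p-values from the t distribution with n - 2 degrees of freedom, which is equivalent to the test `scipy.stats.pearsonr` performs.

```python
import numpy as np
from scipy import stats


def rowwise_pearson(meth, expr):
    """Hypothetical helper: Pearson r and two-sided p-value per row pair.

    meth, expr: 2-D arrays of identical shape (pairs x samples) without NaNs;
    row i of meth is correlated with row i of expr.
    """
    meth = np.asarray(meth, dtype=float)
    expr = np.asarray(expr, dtype=float)
    n = meth.shape[1]

    # Center each row, then r = sum(xy) / sqrt(sum(x^2) * sum(y^2))
    mc = meth - meth.mean(axis=1, keepdims=True)
    ec = expr - expr.mean(axis=1, keepdims=True)
    r = (mc * ec).sum(axis=1) / np.sqrt((mc ** 2).sum(axis=1) * (ec ** 2).sum(axis=1))
    r = np.clip(r, -1.0, 1.0)

    # t statistic with n - 2 degrees of freedom (|r| = 1 gives t = +/-inf, p = 0)
    with np.errstate(divide='ignore'):
        t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    p = 2 * stats.t.sf(np.abs(t), df=n - 2)
    return r, p
```

The resulting p-values can be passed to `statsmodels.stats.multitest.multipletests(p, method='fdr_bh')` exactly as in the loop version.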
+
+ #### 4.2 ChIP-seq + Expression Integration
+
+ ```python
+ def integrate_chipseq_expression(peaks_df, expression_df, gene_annotation,
+                                  tss_window=5000):
+     """Integrate ChIP-seq peaks with gene expression.
+
+     Args:
+         peaks_df: ChIP-seq peaks (BED)
+         expression_df: Gene expression (genes x samples)
+         gene_annotation: Gene coordinates (required for promoter assignment)
+         tss_window: bp on each side of the TSS counted as promoter
+
+     Returns:
+         DataFrame of genes with at least one promoter peak and their mean expression
+     """
+     annotated = annotate_peaks_to_genes(peaks_df, gene_annotation,
+                                         tss_upstream=tss_window,
+                                         tss_downstream=tss_window)
+     promoter_peaks = annotated[annotated['feature'] == 'promoter']
+
+     # Genes with promoter peaks that are also in the expression matrix
+     peak_genes = promoter_peaks['nearest_gene'].unique()
+     common_genes = [g for g in peak_genes if g in expression_df.index]
+
+     result = pd.DataFrame({
+         'gene': common_genes,
+         'has_promoter_peak': True,
+         'mean_expression': [expression_df.loc[g].mean() for g in common_genes],
+     })
+
+     return result
+ ```
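`integrate_chipseq_expression` returns only genes that have a promoter peak, so it has no comparison group. This sketch labels every gene in the expression matrix by membership in a set of peak-bound genes, for example the `gene` column of that function's output. It then compares per-gene mean expression between the two groups with a two-sided Mann-Whitney U test (`scipy.stats.mannwhitneyu`). `compare_bound_vs_unbound` is a hypothetical helper.

```python
import pandas as pd
from scipy import stats


def compare_bound_vs_unbound(expression_df, bound_genes):
    """Hypothetical helper: expression of peak-bound vs. unbound genes.

    expression_df: genes x samples; bound_genes: iterable of gene symbols.
    Returns (per-gene table, summary dict with group sizes and test result).
    """
    table = pd.DataFrame({
        'mean_expression': expression_df.mean(axis=1),
        'has_promoter_peak': expression_df.index.isin(list(bound_genes)),
    })
    bound = table.loc[table['has_promoter_peak'], 'mean_expression']
    unbound = table.loc[~table['has_promoter_peak'], 'mean_expression']

    summary = {'n_bound': len(bound), 'n_unbound': len(unbound)}
    if len(bound) > 0 and len(unbound) > 0:
        res = stats.mannwhitneyu(bound, unbound, alternative='two-sided')
        summary.update(
            median_bound=float(bound.median()),
            median_unbound=float(unbound.median()),
            u_statistic=float(res.statistic),
            pvalue=float(res.pvalue),
        )
    return table, summary
```

A rank-based test is used here because per-gene expression means are usually heavily skewed.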
+
+ ---
+
+ ### Phase 5: Clinical Data Integration
+
+ #### 5.1 Missing Data Analysis
+
+ ```python
+ def missing_data_analysis(clinical_df=None, expression_df=None,
+                           methylation_df=None, sample_id_col=None):
+     """Analyze missing data across multiple omics modalities.
+
+     For BixBench questions like:
+     "How many patients have no missing data for vital status, gene expression, and methylation?"
+
+     Sample IDs must use the same format in every modality (see Sample ID Mismatches).
+     To also require specific clinical variables (e.g. vital status) to be
+     non-missing, use find_complete_cases().
+
+     Args:
+         clinical_df: Clinical data (patients x variables)
+         expression_df: Expression matrix (genes x samples)
+         methylation_df: Methylation matrix (probes x samples)
+         sample_id_col: Column name for sample/patient IDs in clinical data
+             (defaults to the clinical index)
+
+     Returns:
+         dict with per-modality sample counts, samples present in every provided
+         modality, and per-variable missing counts for the clinical data
+     """
+     results = {}
+
+     # Get sample sets from each modality
+     clinical_samples = set()
+     if clinical_df is not None:
+         if sample_id_col and sample_id_col in clinical_df.columns:
+             clinical_samples = set(clinical_df[sample_id_col].dropna())
+         else:
+             clinical_samples = set(clinical_df.index)
+         results['clinical_samples'] = len(clinical_samples)
+
+     expression_samples = set()
+     if expression_df is not None:
+         expression_samples = set(expression_df.columns)
+         results['expression_samples'] = len(expression_samples)
+
+     methylation_samples = set()
+     if methylation_df is not None:
+         methylation_samples = set(methylation_df.columns)
+         results['methylation_samples'] = len(methylation_samples)
+
+     # Intersection: samples present in ALL provided modalities
+     # (a provided but empty modality yields zero complete samples)
+     all_sets = [s for df, s in [(clinical_df, clinical_samples),
+                                 (expression_df, expression_samples),
+                                 (methylation_df, methylation_samples)]
+                 if df is not None]
+
+     complete_samples = set.intersection(*all_sets) if all_sets else set()
+     results['complete_samples'] = len(complete_samples)
+     results['complete_sample_ids'] = sorted(complete_samples)
+
+     # Additional: per-variable missing values within clinical data
+     if clinical_df is not None:
+         n_total = len(clinical_df)
+         for col in clinical_df.columns:
+             n_missing = int(clinical_df[col].isna().sum())
+             results[f'clinical_{col}_missing'] = n_missing
+             results[f'clinical_{col}_complete'] = n_total - n_missing
+
+     return results
+
+
+ def find_complete_cases(data_frames, variables=None):
+     """Find samples that are complete across specified data frames and variables.
+
+     Args:
+         data_frames: dict of {name: DataFrame}. Omics matrices are features x samples
+             (samples in columns); clinical tables are samples x variables.
+         variables: dict of {df_name: [variable_names]} that must be non-missing.
+             A variable found in the columns is checked per row (samples in the
+             index); one found in the index is checked per column.
+
+     Returns:
+         set of sample IDs with complete data
+
+     Raises:
+         KeyError: if a requested variable is not found in its DataFrame
+     """
+     sample_sets = []
+     for name, df in data_frames.items():
+         if df is None:
+             continue
+         if variables and name in variables:
+             # Require each listed variable to be non-missing
+             for var in variables[name]:
+                 if var in df.columns:
+                     sample_sets.append(set(df[df[var].notna()].index))
+                 elif var in df.index:
+                     sample_sets.append(set(df.columns[df.loc[var].notna()]))
+                 else:
+                     raise KeyError(f"Variable {var!r} not found in {name!r}")
+         else:
+             # Omics matrix: just check sample presence (columns)
+             sample_sets.append(set(df.columns))
+
+     if not sample_sets:
+         return set()
+
+     return set.intersection(*sample_sets)
+ ```
+
+ ---
+
+ ### Phase 6: ToolUniverse Annotation Integration
+
+ Use ToolUniverse tools for biological annotation of epigenomic findings.
+
+ #### 6.1 Gene-Level Annotation
+
+ ```python
+ import pandas as pd
+ from tooluniverse import ToolUniverse
+
+ tu = ToolUniverse()
+ tu.load_tools()
+
+
+ def annotate_genes_with_tooluniverse(gene_list, tu):
+     """Annotate genes (e.g. differentially methylated genes) using ToolUniverse.
+
+     Uses:
+     - Ensembl for gene coordinates and biotype
+     - SCREEN for enhancer cCREs near the gene (presence flag only)
+
+     For ChIP-seq experiment metadata, see query_chipatlas_experiments (6.2).
+     Only the first 20 genes are queried to stay within API rate limits.
+     """
+     annotations = {}
+     for gene in gene_list[:20]:  # Limit for API rate
+         annotation = {'gene': gene}
+
+         # Ensembl lookup
+         try:
+             ens = tu.tools.ensembl_lookup_gene(id=gene, species='homo_sapiens')
+             if isinstance(ens, dict):
+                 data = ens.get('data', ens)
+                 annotation['ensembl_id'] = data.get('id', 'N/A')
+                 annotation['chr'] = data.get('seq_region_name', 'N/A')
+                 annotation['start'] = data.get('start', 'N/A')
+                 annotation['end'] = data.get('end', 'N/A')
+                 annotation['biotype'] = data.get('biotype', 'N/A')
+         except Exception:
+             pass  # Leave Ensembl fields empty for this gene
+
+         # SCREEN regulatory elements
+         try:
+             screen = tu.tools.SCREEN_get_regulatory_elements(
+                 gene_name=gene, element_type="enhancer", limit=5
+             )
+             if screen is not None:
+                 annotation['screen_enhancers'] = 'available'
+         except Exception:
+             pass
+
+         annotations[gene] = annotation
+
+     return pd.DataFrame.from_dict(annotations, orient='index')
+ ```
+
+ #### 6.2 ChIPAtlas Integration
+
+ ```python
+ def query_chipatlas_experiments(antigen, genome='hg38', cell_type=None, tu=None):
+     """Query ChIPAtlas for available ChIP-seq experiments.
+
+     Args:
+         antigen: TF or histone mark name (e.g., 'H3K27ac', 'CTCF')
+         genome: genome build
+         cell_type: optional cell type filter
+         tu: existing ToolUniverse instance (a new one is created if None)
+
+     Returns:
+         ChIPAtlas experiment metadata (up to 50 experiments)
+     """
+     if tu is None:
+         from tooluniverse import ToolUniverse
+         tu = ToolUniverse()
+         tu.load_tools()
+
+     params = {
+         'operation': 'get_experiment_list',  # required by all ChIPAtlas tools
+         'genome': genome,
+         'antigen': antigen,
+         'limit': 50,
+     }
+     if cell_type:
+         params['cell_type'] = cell_type
+
+     return tu.tools.ChIPAtlas_get_experiments(**params)
+ ```
+
+ #### 6.3 Ensembl Regulatory Feature Annotation
+
+ ```python
+ def annotate_regions_with_ensembl(regions, species='human', tu=None):
+     """Annotate genomic regions with Ensembl regulatory features.
+
+     Args:
+         regions: list of (chr, start, end) tuples. Ensembl regions are 1-based
+             and inclusive, so convert BED starts with start + 1 beforehand.
+         species: Ensembl species name
+         tu: existing ToolUniverse instance (a new one is created if None)
+
+     Returns:
+         dict of (chr, start, end) -> regulatory features (or {'error': ...});
+         only the first 10 regions are queried to stay within API rate limits
+     """
+     if tu is None:
+         from tooluniverse import ToolUniverse
+         tu = ToolUniverse()
+         tu.load_tools()
+
+     annotations = {}
+     for chrom, start, end in regions[:10]:  # Limit for API rate
+         # Ensembl expects chromosome names without the leading 'chr'
+         ens_chrom = chrom[3:] if chrom.startswith('chr') else chrom
+         region_str = f"{ens_chrom}:{start}-{end}"
+
+         try:
+             result = tu.tools.ensembl_get_regulatory_features(
+                 region=region_str, feature="regulatory", species=species
+             )
+             annotations[(chrom, start, end)] = result
+         except Exception as e:
+             annotations[(chrom, start, end)] = {'error': str(e)}
+
+     return annotations
+ ```
+
+ ---
+
+ ### Phase 7: Genome-Wide Statistics
+
+ #### 7.1 Comprehensive Genome Statistics
+
+ ```python
+ def genome_wide_methylation_stats(beta_df, manifest=None, genome='hg38'):
+     """Calculate comprehensive genome-wide methylation statistics.
+
+     Args:
+         beta_df: Methylation matrix (probes x samples)
+         manifest: Probe annotation
+         genome: genome build
+
+     Returns:
+         dict with genome-wide statistics
+     """
+     values = beta_df.to_numpy(dtype=float)
+     stats_result = {
+         'total_probes': len(beta_df),
+         'total_samples': beta_df.shape[1],
+         # Pooled over all non-missing beta values (not a median of per-sample medians)
+         'global_mean_beta': float(np.nanmean(values)),
+         'global_median_beta': float(np.nanmedian(values)),
+         'global_std_beta': float(np.nanstd(values)),
+         'missing_fraction': float(np.isnan(values).mean()),
+     }
+
+     # Per-sample statistics (distribution of sample means)
+     stats_result['sample_means'] = beta_df.mean(axis=0).describe().to_dict()
+
+     # Per-probe statistics
+     probe_var = beta_df.var(axis=1, skipna=True)
+     stats_result['probe_variance'] = {
+         'mean': float(probe_var.mean()),
+         'median': float(probe_var.median()),
+         'max': float(probe_var.max()),
+     }
+     # Variance > 0.01 corresponds to a beta-value SD > 0.1
+     stats_result['high_variance_probes'] = int((probe_var > 0.01).sum())
+
+     # Chromosome-level stats (if manifest available)
+     if manifest is not None:
+         density_df = chromosome_cpg_density(beta_df.index.tolist(), manifest, genome)
+         stats_result['chromosome_density'] = density_df.to_dict('records')
+         stats_result['genome_wide_density'] = genome_wide_average_density(density_df)
+
+     return stats_result
+
+
+ def summarize_differential_methylation(dm_results, alpha=0.05):
+     """Summarize differential methylation results.
+
+     Args:
+         dm_results: Output from differential_methylation()
+             (needs 'padj' and 'delta_beta' columns)
+         alpha: significance threshold on padj (rows with NaN padj count as
+             not significant)
+
+     Returns:
+         dict with summary statistics
+     """
+     sig = dm_results[dm_results['padj'] < alpha]
+     hyper = sig[sig['delta_beta'] > 0]
+     hypo = sig[sig['delta_beta'] < 0]
+
+     return {
+         'total_tested': len(dm_results),
+         'total_significant': len(sig),
+         'hypermethylated': len(hyper),
+         'hypomethylated': len(hypo),
+         'fraction_significant': len(sig) / len(dm_results) if len(dm_results) > 0 else 0,
+         'mean_delta_beta_sig': float(sig['delta_beta'].mean()) if len(sig) > 0 else 0,
+         'max_abs_delta_beta': float(sig['delta_beta'].abs().max()) if len(sig) > 0 else 0,
+     }
+ ```
+
+ ---
+
+ ## ToolUniverse Tool Parameter Reference
+
+ ### Regulatory Annotation Tools
+
+ | Tool | Parameters | Returns |
+ |------|-----------|---------|
+ | `ensembl_lookup_gene` | `id: str`, `species: str` (REQUIRED) | `{status, data: {id, display_name, seq_region_name, start, end, strand, biotype}, url}` |
+ | `ensembl_get_regulatory_features` | `region: str` (no "chr"), `feature: str`, `species: str` | `{status, data: [...features...]}` |
+ | `ensembl_get_overlap_features` | `region: str`, `feature: str`, `species: str` | Gene/transcript overlap data |
+ | `SCREEN_get_regulatory_elements` | `gene_name: str`, `element_type: str`, `limit: int` | cCREs (enhancers, promoters, insulators) |
+ | `ReMap_get_transcription_factor_binding` | `gene_name: str`, `cell_type: str`, `limit: int` | TF binding sites |
+ | `RegulomeDB_query_variant` | `rsid: str` | `{status, data, url}` regulatory score |
+ | `jaspar_search_matrices` | `search: str`, `collection: str`, `species: str` | `{count, results: [...matrices...]}` |
+ | `ENCODE_search_experiments` | `assay_title: str`, `target: str`, `organism: str`, `limit: int` | Experiment metadata |
+ | `ChIPAtlas_get_experiments` | `operation: str` (REQUIRED: "get_experiment_list"), `genome: str`, `antigen: str`, `cell_type: str`, `limit: int` | Experiment list |
+ | `ChIPAtlas_search_datasets` | `operation` REQUIRED, `antigenList/celltypeList` | Dataset search results |
+ | `ChIPAtlas_enrichment_analysis` | Various input types (BED regions, motifs, genes) | Enrichment results |
+ | `ChIPAtlas_get_peak_data` | `operation` REQUIRED | Peak data download URLs |
+ | `FourDN_search_data` | `operation: str` (REQUIRED: "search_data"), `assay_title: str`, `limit: int` | Chromatin conformation data |
+
+ ### Gene Annotation Tools
+
+ | Tool | Parameters | Returns |
+ |------|-----------|---------|
+ | `MyGene_query_genes` | `query: str` | `{hits: [{_id, symbol, ensembl, ...}]}` |
+ | `MyGene_batch_query` | `gene_ids: list[str]`, `fields: str` | `{results: [{query, symbol, ...}]}` |
+ | `HGNC_get_gene_info` | `symbol: str` | Gene symbol, aliases, IDs |
+ | `GO_get_annotations_for_gene` | `gene_id: str` | GO annotations |
+
+ ### CRITICAL Tool Notes
+
+ - **ensembl_lookup_gene**: REQUIRES `species='homo_sapiens'` parameter
+ - **ensembl_get_regulatory_features**: Region format is "17:start-end" (NO "chr" prefix; Ensembl coordinates are 1-based, inclusive)
+ - **ChIPAtlas tools**: ALL require `operation` parameter (SOAP-style)
+ - **FourDN tools**: ALL require `operation` parameter (SOAP-style)
+ - **SCREEN**: Returns JSON-LD format with `@context` and `@graph` keys
+
+ ---
+
+ ## Response Format Notes
+
+ - **Methylation data**: Typically stored as probes (rows) x samples (columns), beta values 0-1
+ - **BED files**: Tab-separated, 0-based half-open coordinates
+ - **narrowPeak**: BED6+4 format, i.e. the six standard BED columns plus signalValue, pValue, qValue and peak (summit offset from start)
+ - **Illumina manifests**: Contain probe ID, chromosome, position and gene annotation
+ - **Clinical data**: Patient/sample-centric with clinical variables as columns
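A minimal narrowPeak reader sketch with pandas, for reference. It assumes a headerless, tab-separated file. The first three column names follow this skill's `chrom`/`start`/`end` convention and the rest follow the narrowPeak field names. The helper name `read_narrowpeak` is hypothetical, and the skill's own `load_bed_file` may behave differently. In this format, -1 in `pValue`, `qValue` or `peak` means the value is not available.

```python
import io
import pandas as pd

NARROWPEAK_COLUMNS = ['chrom', 'start', 'end', 'name', 'score', 'strand',
                      'signalValue', 'pValue', 'qValue', 'peak']


def read_narrowpeak(handle):
    """Hypothetical helper: parse narrowPeak (BED6+4) text into a DataFrame.

    handle: file path or text file object; 'track', 'browser' and '#' lines are skipped.
    """
    if isinstance(handle, str):
        with open(handle) as fh:
            lines = fh.readlines()
    else:
        lines = handle.readlines()
    body = [ln for ln in lines
            if ln.strip() and not ln.startswith(('track', 'browser', '#'))]
    if not body:
        return pd.DataFrame(columns=NARROWPEAK_COLUMNS + ['summit'])

    df = pd.read_csv(io.StringIO(''.join(body)), sep='\t', header=None,
                     names=NARROWPEAK_COLUMNS,
                     dtype={'chrom': str, 'name': str, 'strand': str})
    # -1 marks "not available" in pValue/qValue/peak
    for col in ['pValue', 'qValue', 'peak']:
        df[col] = df[col].where(df[col] != -1)
    df['summit'] = df['start'] + df['peak']  # absolute 0-based summit position
    return df
```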
+
+ ---
+
+ ## Fallback Strategies
+
+ | Scenario | Primary | Fallback |
+ |----------|---------|----------|
+ | No manifest file | Load from data dir | Build minimal from Ensembl lookup |
+ | No pybedtools | Pure Python overlap | pandas-based interval intersection |
+ | No pyBigWig | Skip BigWig analysis | Use pre-computed summary tables |
+ | Missing clinical data | Report missing | Use available samples only |
+ | Low sample count | Parametric t-test | Non-parametric Wilcoxon rank-sum (Mann-Whitney U) |
+ | Large dataset (>500K probes) | Full analysis | Subsample probes or process in chunks |
+
+ ---
+
+ ## Common Use Patterns
+
+ ### Pattern 1: Methylation Array Analysis
+ ```
+ Input: Beta-value matrix + manifest + clinical data
+ Question: "How many CpGs are differentially methylated?"
+
+ Flow:
+ 1. Load beta matrix, manifest, clinical data
+ 2. Filter probes (keep cg probes only, remove sex-chromosome probes, apply variance filter)
+ 3. Define groups from clinical data
+ 4. Run differential_methylation()
+ 5. Apply thresholds (padj < 0.05, |delta_beta| > 0.2)
+ 6. Report count and direction (hyper/hypo)
+ ```
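Step 5 combines a significance cut-off with an effect-size cut-off. `summarize_differential_methylation` applies only the first. This sketch assumes a results DataFrame with `padj` and `delta_beta` columns, as `differential_methylation()` is described as producing. The helper name `count_dmps` is hypothetical, and its defaults mirror the thresholds listed above.

```python
import pandas as pd


def count_dmps(dm_results, alpha=0.05, min_abs_delta=0.2):
    """Hypothetical helper: count differentially methylated positions passing
    both padj < alpha and |delta_beta| > min_abs_delta (NaN padj never passes)."""
    passing = dm_results[(dm_results['padj'] < alpha)
                         & (dm_results['delta_beta'].abs() > min_abs_delta)]
    return {
        'n_dmps': len(passing),
        'hypermethylated': int((passing['delta_beta'] > 0).sum()),
        'hypomethylated': int((passing['delta_beta'] < 0).sum()),
    }
```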
+
+ ### Pattern 2: Age-Related CpG Density
+ ```
+ Input: Beta-value matrix + manifest + ages
+ Question: "What is the density ratio of age-related CpGs between chr1 and chr2?"
+
+ Flow:
+ 1. Load beta matrix and ages from clinical data
+ 2. Run identify_age_related_cpgs()
+ 3. Filter significant age-related CpGs
+ 4. Map to chromosomes using manifest
+ 5. Calculate chromosome_cpg_density()
+ 6. Compute ratio between specified chromosomes
+ ```
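Steps 4-6 can be sketched as counting significant CpGs per chromosome and normalising by chromosome length. CpGs per megabase is one plausible definition of density. The skill's `chromosome_cpg_density()` may normalise differently, so confirm which definition the question expects. The lengths below are the GRCh38 (hg38) lengths of chr1 and chr2. `density_ratio` is a hypothetical helper that takes a probe-to-chromosome Series built from the manifest.

```python
import pandas as pd

# GRCh38 chromosome lengths (bp) for the chromosomes in the example question
HG38_LENGTHS = {'chr1': 248_956_422, 'chr2': 242_193_529}


def density_ratio(sig_probes, probe_chrom, chrom_a='chr1', chrom_b='chr2',
                  lengths=HG38_LENGTHS):
    """Hypothetical helper: ratio of significant-CpG density (per Mb) between two chromosomes.

    sig_probes: iterable of significant probe IDs
    probe_chrom: Series mapping probe ID -> chromosome ('1' or 'chr1' style)
    """
    chroms = probe_chrom.reindex(list(sig_probes)).dropna().astype(str)
    # Normalise to 'chr'-prefixed names; probes missing from the map are dropped
    chroms = chroms.where(chroms.str.startswith('chr'), 'chr' + chroms)
    counts = chroms.value_counts()

    density = {c: counts.get(c, 0) / (lengths[c] / 1e6) for c in (chrom_a, chrom_b)}
    ratio = density[chrom_a] / density[chrom_b] if density[chrom_b] > 0 else float('nan')
    return density, ratio
```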
+
+ ### Pattern 3: Multi-Omics Missing Data
+ ```
+ Input: Clinical + expression + methylation data files
+ Question: "How many patients have complete data for all modalities?"
+
+ Flow:
+ 1. Load all data files
+ 2. Extract sample IDs from each and harmonize formats (e.g., TCGA barcode vs patient ID)
+ 3. Find intersection (common samples)
+ 4. Check for NaN/missing values in the required clinical variables (e.g., vital status)
+ 5. Report complete cases count
+ ```
+
+ ### Pattern 4: ChIP-seq Peak Annotation
+ ```
+ Input: BED/narrowPeak file
+ Question: "What fraction of peaks are in promoter regions?"
+
+ Flow:
+ 1. Load BED file with load_bed_file()
+ 2. Load or fetch gene annotation (Ensembl)
+ 3. Run annotate_peaks_to_genes()
+ 4. Classify regions with classify_peak_regions()
+ 5. Calculate fraction in promoters
+ ```
+
+ ### Pattern 5: Methylation-Expression Integration
+ ```
+ Input: Beta matrix + expression matrix + probe-gene mapping
+ Question: "What is the correlation between methylation and expression?"
+
+ Flow:
+ 1. Load both matrices
+ 2. Build probe-gene map from manifest
+ 3. Align samples across datasets
+ 4. Run correlate_methylation_expression()
+ 5. Report significant anti-correlations
+ ```
+
+ ---
+
+ ## Edge Cases
+
+ ### Missing Probe Annotation
+ When no manifest/annotation file is available:
+ - Parse coordinates from probe IDs only if the IDs embed them; standard Illumina `cg`/`ch` IDs do not encode chromosome or position
+ - Use ToolUniverse Ensembl tools to build minimal annotation
+ - Report the limitation, e.g. "chromosome mapping unavailable for X probes"
+
+ ### Mixed Genome Builds
+ When data uses different builds:
+ - Detect build from context (data README, file names, known coordinates)
+ - Use appropriate chromosome lengths for density calculations
+ - Do NOT mix hg19 and hg38 coordinates
+
+ ### Very Large Datasets
+ For datasets with >500K CpG sites:
+ - Use chunked processing for differential methylation
+ - Pre-filter by variance before statistical testing
+ - Use vectorized operations (avoid row-by-row loops where possible)
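The first two points can be combined in one sketch. Low-variance probes are dropped first. A vectorized Welch t-test (`scipy.stats.ttest_ind(..., axis=1, equal_var=False)`) then runs chunk by chunk, so only one slice is converted to a dense float array per test call. The helper name `chunked_welch_ttest`, the chunk size and the variance cut-off are illustrative assumptions. The sketch also assumes no missing values within a chunk; otherwise pass `nan_policy='omit'`, at some speed cost.

```python
import numpy as np
import pandas as pd
from scipy import stats


def chunked_welch_ttest(beta_df, group_a, group_b, min_var=0.001, chunk_size=50_000):
    """Hypothetical helper: variance pre-filter + chunked Welch t-test.

    beta_df: probes x samples; group_a/group_b: lists of sample column names.
    Returns a DataFrame with delta_beta (mean_a - mean_b), t statistic and p-value.
    """
    # Pre-filter: keep probes whose variance across all samples exceeds min_var
    keep = beta_df.var(axis=1) > min_var
    filtered = beta_df.loc[keep]

    parts = []
    for start in range(0, len(filtered), chunk_size):
        chunk = filtered.iloc[start:start + chunk_size]
        a = chunk[group_a].to_numpy(dtype=float)
        b = chunk[group_b].to_numpy(dtype=float)
        t, p = stats.ttest_ind(a, b, axis=1, equal_var=False)
        parts.append(pd.DataFrame({
            'delta_beta': a.mean(axis=1) - b.mean(axis=1),
            't_stat': t,
            'pvalue': p,
        }, index=chunk.index))

    if not parts:
        return pd.DataFrame(columns=['delta_beta', 't_stat', 'pvalue'])
    return pd.concat(parts)
```

Multiple-testing correction should be applied once to the concatenated p-values, not per chunk.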
+
+ ### Sample ID Mismatches
+ Clinical and molecular data may use different ID formats:
+ - TCGA: sample/aliquot barcode (TCGA-XX-XXXX-01A) vs patient ID (TCGA-XX-XXXX)
+ - Truncate barcodes to the first 12 characters (the patient ID) before matching, and check for several samples per patient (e.g., tumor and matched normal) after truncation
+ - Report number of matched/unmatched samples
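This sketch shows the truncation approach for TCGA barcodes. The first 12 characters (`TCGA-XX-XXXX`) identify the patient, so sample-level IDs can be mapped to patient IDs before intersecting. `match_tcga_ids` is a hypothetical helper. It keeps the first sample seen per patient and reports the rest as duplicates, for example a matched normal from the same patient. You may prefer to resolve those by sample type instead.

```python
def match_tcga_ids(clinical_ids, sample_ids):
    """Hypothetical helper: match TCGA sample barcodes to patient IDs.

    clinical_ids: iterable of patient IDs (TCGA-XX-XXXX)
    sample_ids: iterable of sample/aliquot barcodes (TCGA-XX-XXXX-01A...)
    Returns a dict with the patient -> sample mapping and match statistics.
    """
    patients = {pid[:12] for pid in clinical_ids}
    mapping, duplicates = {}, []
    for sid in sample_ids:
        pid = sid[:12]  # patient portion of the barcode
        if pid in mapping:
            duplicates.append(sid)  # another sample from an already-seen patient
        else:
            mapping[pid] = sid

    sampled = set(mapping)
    matched = sorted(patients & sampled)
    return {
        'mapping': {pid: mapping[pid] for pid in matched},
        'n_matched': len(matched),
        'unmatched_patients': sorted(patients - sampled),
        'unmatched_samples': sorted(mapping[pid] for pid in sampled - patients),
        'duplicate_samples': duplicates,
    }
```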
+
+ ---
+
+ ## Limitations
+
+ - **No native pybedtools**: Uses pure Python interval operations (slower for very large BED files)
+ - **No native pyBigWig**: BigWig files cannot be read unless pyBigWig is installed
+ - **No R bridge**: Does not use methylKit, ChIPseeker, or DiffBind
+ - **Illumina-centric**: Methylation functions designed for 450K/EPIC arrays
+ - **Statistical simplicity**: Uses t-test/Wilcoxon for differential methylation (not limma/bumphunter)
+ - **No peak calling**: Assumes peaks are pre-called; does not run MACS2 or similar
+ - **API rate limits**: The ToolUniverse annotation helpers cap each call at 20 genes (6.1) or 10 regions (6.3)
+
+ ---
+
+ ## Summary
+
+ **Genomics & Epigenomics Data Processing Skill** provides:
+
+ 1. **Methylation analysis** - Beta-value processing, CpG filtering, differential methylation, age-related CpGs, chromosome density
+ 2. **ChIP-seq analysis** - Peak loading (BED/narrowPeak), peak annotation, overlap analysis, statistics
+ 3. **ATAC-seq analysis** - Chromatin accessibility peaks, NFR detection, region classification
+ 4. **Multi-omics integration** - Methylation-expression correlation, ChIP-seq + expression
+ 5. **Clinical integration** - Missing data analysis across modalities, complete case identification
+ 6. **Genome-wide statistics** - Chromosome-level density, genome-wide averages, density ratios
+ 7. **ToolUniverse annotation** - Ensembl, SCREEN, ChIPAtlas, JASPAR, ENCODE for biological context
+
+ **Core packages**: pandas, numpy, scipy, statsmodels, pysam, gseapy
+ **ToolUniverse tools**: 25+ tools across Ensembl, SCREEN, ENCODE, ChIPAtlas, JASPAR, ReMap, RegulomeDB, 4DN
+ **Best for**: BixBench-style quantitative questions about methylation data, ChIP-seq peaks, chromatin accessibility, and multi-omics integration