pi-skill-search 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +20 -0
- package/LICENSE +21 -0
- package/README.md +97 -0
- package/index.ts +163 -0
- package/package.json +48 -0
- package/skills/adaptyv/SKILL.md +92 -0
- package/skills/add-community-extension/SKILL.md +85 -0
- package/skills/aeon/SKILL.md +111 -0
- package/skills/ai-slop-cleaner/SKILL.md +118 -0
- package/skills/anndata/SKILL.md +83 -0
- package/skills/arboreto/SKILL.md +107 -0
- package/skills/ask/SKILL.md +55 -0
- package/skills/astropy/SKILL.md +30 -0
- package/skills/async-worker-recovery/SKILL.md +44 -0
- package/skills/autopilot/SKILL.md +63 -0
- package/skills/autoresearch/SKILL.md +64 -0
- package/skills/autoskill/SKILL.md +116 -0
- package/skills/babysit/SKILL.md +43 -0
- package/skills/benchling-integration/SKILL.md +106 -0
- package/skills/bgpt-paper-search/SKILL.md +67 -0
- package/skills/biopython/SKILL.md +29 -0
- package/skills/bioservices/SKILL.md +96 -0
- package/skills/brainstorming/SKILL.md +104 -0
- package/skills/cancel/SKILL.md +85 -0
- package/skills/ccg/SKILL.md +87 -0
- package/skills/celery-pipeline/SKILL.md +30 -0
- package/skills/cellxgene-census/SKILL.md +104 -0
- package/skills/child-pi-spawning/SKILL.md +85 -0
- package/skills/cirq/SKILL.md +113 -0
- package/skills/citation-management/SKILL.md +91 -0
- package/skills/clinical-decision-support/SKILL.md +117 -0
- package/skills/clinical-reports/SKILL.md +118 -0
- package/skills/clinical-trial/SKILL.md +28 -0
- package/skills/cobrapy/SKILL.md +116 -0
- package/skills/configure-notifications/SKILL.md +85 -0
- package/skills/consciousness-council/SKILL.md +120 -0
- package/skills/context-artifact-hygiene/SKILL.md +85 -0
- package/skills/context-mode-ops/SKILL.md +87 -0
- package/skills/dask/SKILL.md +85 -0
- package/skills/database-lookup/SKILL.md +118 -0
- package/skills/datamol/SKILL.md +108 -0
- package/skills/debug/SKILL.md +32 -0
- package/skills/deep-dive/SKILL.md +114 -0
- package/skills/deep-interview/SKILL.md +90 -0
- package/skills/deepchem/SKILL.md +117 -0
- package/skills/deepinit/SKILL.md +100 -0
- package/skills/deeptools/SKILL.md +118 -0
- package/skills/delegation-patterns/SKILL.md +56 -0
- package/skills/depmap/SKILL.md +94 -0
- package/skills/dhdna-profiler/SKILL.md +86 -0
- package/skills/diffdock/SKILL.md +101 -0
- package/skills/dispatching-parallel-agents/SKILL.md +119 -0
- package/skills/dnanexus-integration/SKILL.md +118 -0
- package/skills/do/SKILL.md +48 -0
- package/skills/docker-sandbox/SKILL.md +29 -0
- package/skills/docx/SKILL.md +119 -0
- package/skills/esm/SKILL.md +116 -0
- package/skills/etetoolkit/SKILL.md +103 -0
- package/skills/event-log-tracing/SKILL.md +85 -0
- package/skills/exa-search/SKILL.md +72 -0
- package/skills/executing-plans/SKILL.md +69 -0
- package/skills/exploratory-data-analysis/SKILL.md +118 -0
- package/skills/external-context/SKILL.md +80 -0
- package/skills/fastapi/SKILL.md +30 -0
- package/skills/finishing-a-development-branch/SKILL.md +106 -0
- package/skills/flowio/SKILL.md +114 -0
- package/skills/fluidsim/SKILL.md +108 -0
- package/skills/generate-image/SKILL.md +108 -0
- package/skills/geniml/SKILL.md +117 -0
- package/skills/geomaster/SKILL.md +109 -0
- package/skills/geopandas/SKILL.md +114 -0
- package/skills/get-available-resources/SKILL.md +100 -0
- package/skills/gget/SKILL.md +111 -0
- package/skills/ginkgo-cloud-lab/SKILL.md +52 -0
- package/skills/git-master/SKILL.md +85 -0
- package/skills/glycoengineering/SKILL.md +104 -0
- package/skills/gtars/SKILL.md +104 -0
- package/skills/hackernews-frontpage/SKILL.md +46 -0
- package/skills/histolab/SKILL.md +98 -0
- package/skills/how-it-works/SKILL.md +25 -0
- package/skills/hud/SKILL.md +86 -0
- package/skills/hugging-science/SKILL.md +93 -0
- package/skills/huggingface/SKILL.md +30 -0
- package/skills/hypogenic/SKILL.md +107 -0
- package/skills/hypothesis-generation/SKILL.md +118 -0
- package/skills/imaging-data-commons/SKILL.md +119 -0
- package/skills/infographics/SKILL.md +102 -0
- package/skills/iso-13485-certification/SKILL.md +114 -0
- package/skills/knowledge-agent/SKILL.md +83 -0
- package/skills/labarchive-integration/SKILL.md +98 -0
- package/skills/lamindb/SKILL.md +119 -0
- package/skills/landsat/SKILL.md +29 -0
- package/skills/latchbio-integration/SKILL.md +118 -0
- package/skills/latex-posters/SKILL.md +112 -0
- package/skills/learn-codebase/SKILL.md +24 -0
- package/skills/learner/SKILL.md +118 -0
- package/skills/literature-review/SKILL.md +118 -0
- package/skills/live-agent-lifecycle/SKILL.md +85 -0
- package/skills/mailbox-interactive/SKILL.md +85 -0
- package/skills/make-plan/SKILL.md +59 -0
- package/skills/markdown-mermaid-writing/SKILL.md +118 -0
- package/skills/market-research-reports/SKILL.md +119 -0
- package/skills/markitdown/SKILL.md +111 -0
- package/skills/markitdown-docs/SKILL.md +28 -0
- package/skills/matchms/SKILL.md +91 -0
- package/skills/matlab/SKILL.md +118 -0
- package/skills/matplotlib/SKILL.md +30 -0
- package/skills/mcp-setup/SKILL.md +84 -0
- package/skills/medchem/SKILL.md +109 -0
- package/skills/mem-search/SKILL.md +96 -0
- package/skills/modal/SKILL.md +104 -0
- package/skills/model-routing-context/SKILL.md +85 -0
- package/skills/molecular-dynamics/SKILL.md +116 -0
- package/skills/molfeat/SKILL.md +110 -0
- package/skills/multi-perspective-review/SKILL.md +85 -0
- package/skills/networkx/SKILL.md +111 -0
- package/skills/neurokit2/SKILL.md +114 -0
- package/skills/neuropixels-analysis/SKILL.md +112 -0
- package/skills/nilearn/SKILL.md +29 -0
- package/skills/observability-reliability/SKILL.md +43 -0
- package/skills/omc-doctor/SKILL.md +86 -0
- package/skills/omc-reference/SKILL.md +119 -0
- package/skills/omc-setup/SKILL.md +82 -0
- package/skills/omc-teams/SKILL.md +81 -0
- package/skills/omero-integration/SKILL.md +111 -0
- package/skills/open-notebook/SKILL.md +100 -0
- package/skills/openephys/SKILL.md +28 -0
- package/skills/opentrons-integration/SKILL.md +110 -0
- package/skills/optimize-for-gpu/SKILL.md +119 -0
- package/skills/orchestration/SKILL.md +85 -0
- package/skills/ownership-session-security/SKILL.md +43 -0
- package/skills/paper-lookup/SKILL.md +119 -0
- package/skills/paperzilla/SKILL.md +114 -0
- package/skills/parallel-web/SKILL.md +64 -0
- package/skills/pathfinder/SKILL.md +114 -0
- package/skills/pathml/SKILL.md +98 -0
- package/skills/pdf/SKILL.md +113 -0
- package/skills/peer-review/SKILL.md +119 -0
- package/skills/pennylane/SKILL.md +119 -0
- package/skills/phylogenetics/SKILL.md +102 -0
- package/skills/pi-extension-lifecycle/SKILL.md +41 -0
- package/skills/plan/SKILL.md +66 -0
- package/skills/polars/SKILL.md +114 -0
- package/skills/polars-bio/SKILL.md +84 -0
- package/skills/pptx/SKILL.md +118 -0
- package/skills/pptx-posters/SKILL.md +112 -0
- package/skills/primekg/SKILL.md +97 -0
- package/skills/project-session-manager/SKILL.md +85 -0
- package/skills/protocolsio-integration/SKILL.md +119 -0
- package/skills/pubmed-search/SKILL.md +29 -0
- package/skills/pufferlib/SKILL.md +103 -0
- package/skills/pydeseq2/SKILL.md +106 -0
- package/skills/pydicom/SKILL.md +115 -0
- package/skills/pyhealth/SKILL.md +117 -0
- package/skills/pylabrobot/SKILL.md +100 -0
- package/skills/pymatgen/SKILL.md +28 -0
- package/skills/pymc/SKILL.md +108 -0
- package/skills/pymoo/SKILL.md +90 -0
- package/skills/pyopenms/SKILL.md +119 -0
- package/skills/pysam/SKILL.md +118 -0
- package/skills/pyspark/SKILL.md +30 -0
- package/skills/pytdc/SKILL.md +102 -0
- package/skills/pytorch/SKILL.md +31 -0
- package/skills/pytorch-lightning/SKILL.md +119 -0
- package/skills/pyzotero/SKILL.md +104 -0
- package/skills/qiskit/SKILL.md +119 -0
- package/skills/qutip/SKILL.md +111 -0
- package/skills/ralph/SKILL.md +23 -0
- package/skills/ralplan/SKILL.md +105 -0
- package/skills/rdflib/SKILL.md +29 -0
- package/skills/rdkit/SKILL.md +30 -0
- package/skills/read-only-explorer/SKILL.md +85 -0
- package/skills/receiving-code-review/SKILL.md +103 -0
- package/skills/release/SKILL.md +117 -0
- package/skills/remember/SKILL.md +39 -0
- package/skills/requesting-code-review/SKILL.md +85 -0
- package/skills/requirements-to-task-packet/SKILL.md +65 -0
- package/skills/research-grants/SKILL.md +118 -0
- package/skills/research-lookup/SKILL.md +117 -0
- package/skills/research-reproducibility/SKILL.md +28 -0
- package/skills/resource-discovery-config/SKILL.md +43 -0
- package/skills/rowan/SKILL.md +100 -0
- package/skills/runtime-state-reader/SKILL.md +46 -0
- package/skills/safe-bash/SKILL.md +85 -0
- package/skills/scanpy/SKILL.md +32 -0
- package/skills/scholar-evaluation/SKILL.md +115 -0
- package/skills/scientific-brainstorming/SKILL.md +118 -0
- package/skills/scientific-critical-thinking/SKILL.md +119 -0
- package/skills/scientific-schematics/SKILL.md +116 -0
- package/skills/scientific-slides/SKILL.md +117 -0
- package/skills/scientific-visualization/SKILL.md +109 -0
- package/skills/scientific-writing/SKILL.md +119 -0
- package/skills/scikit-bio/SKILL.md +92 -0
- package/skills/scikit-learn/SKILL.md +99 -0
- package/skills/scikit-survival/SKILL.md +110 -0
- package/skills/sciomc/SKILL.md +86 -0
- package/skills/scvelo/SKILL.md +106 -0
- package/skills/scvi-tools/SKILL.md +114 -0
- package/skills/seaborn/SKILL.md +97 -0
- package/skills/secure-agent-orchestration-review/SKILL.md +47 -0
- package/skills/self-improve/SKILL.md +119 -0
- package/skills/semantic-compression/SKILL.md +62 -0
- package/skills/setup/SKILL.md +42 -0
- package/skills/shap/SKILL.md +103 -0
- package/skills/simpy/SKILL.md +116 -0
- package/skills/skill/SKILL.md +117 -0
- package/skills/skill-search/SKILL.md +67 -0
- package/skills/skillify/SKILL.md +46 -0
- package/skills/smart-explore/SKILL.md +94 -0
- package/skills/sqlite-pandas/SKILL.md +30 -0
- package/skills/stable-baselines3/SKILL.md +86 -0
- package/skills/state-mutation-locking/SKILL.md +44 -0
- package/skills/statistical-analysis/SKILL.md +108 -0
- package/skills/statsmodels/SKILL.md +29 -0
- package/skills/subagent-driven-development/SKILL.md +89 -0
- package/skills/sympy/SKILL.md +115 -0
- package/skills/system-prompts/SKILL.md +116 -0
- package/skills/systematic-debugging/SKILL.md +119 -0
- package/skills/team/SKILL.md +85 -0
- package/skills/test-driven-development/SKILL.md +84 -0
- package/skills/tiledbvcf/SKILL.md +119 -0
- package/skills/timeline-report/SKILL.md +85 -0
- package/skills/timesfm-forecasting/SKILL.md +112 -0
- package/skills/torch-geometric/SKILL.md +118 -0
- package/skills/torchdrug/SKILL.md +118 -0
- package/skills/trace/SKILL.md +118 -0
- package/skills/transformers/SKILL.md +110 -0
- package/skills/treatment-plans/SKILL.md +119 -0
- package/skills/ui-render-performance/SKILL.md +41 -0
- package/skills/ultragoal/SKILL.md +63 -0
- package/skills/ultraqa/SKILL.md +85 -0
- package/skills/ultrawork/SKILL.md +20 -0
- package/skills/umap-learn/SKILL.md +119 -0
- package/skills/usfiscaldata/SKILL.md +118 -0
- package/skills/using-git-worktrees/SKILL.md +112 -0
- package/skills/using-superpowers/SKILL.md +85 -0
- package/skills/using-vetc/SKILL.md +92 -0
- package/skills/vaex/SKILL.md +111 -0
- package/skills/venue-templates/SKILL.md +113 -0
- package/skills/verification-before-completion/SKILL.md +88 -0
- package/skills/verification-before-done/SKILL.md +68 -0
- package/skills/verify/SKILL.md +33 -0
- package/skills/version-bump/SKILL.md +54 -0
- package/skills/vetc-analyze-ba/SKILL.md +117 -0
- package/skills/vetc-analyze-codebase/SKILL.md +118 -0
- package/skills/vetc-api-design/SKILL.md +103 -0
- package/skills/vetc-brainstorming/SKILL.md +116 -0
- package/skills/vetc-change-proposal/SKILL.md +111 -0
- package/skills/vetc-cicd/SKILL.md +113 -0
- package/skills/vetc-continuous-learning/SKILL.md +115 -0
- package/skills/vetc-deep-interview/SKILL.md +103 -0
- package/skills/vetc-docgen/SKILL.md +108 -0
- package/skills/vetc-frontend-patterns/SKILL.md +99 -0
- package/skills/vetc-iterative-retrieval/SKILL.md +110 -0
- package/skills/vetc-java-patterns/SKILL.md +113 -0
- package/skills/vetc-meta-skill-creator/SKILL.md +99 -0
- package/skills/vetc-oracle-patterns/SKILL.md +109 -0
- package/skills/vetc-performance-testing/SKILL.md +104 -0
- package/skills/vetc-pr-response/SKILL.md +106 -0
- package/skills/vetc-ralph/SKILL.md +108 -0
- package/skills/vetc-ralplan/SKILL.md +116 -0
- package/skills/vetc-receiving-review/SKILL.md +106 -0
- package/skills/vetc-reconcile-patterns/SKILL.md +117 -0
- package/skills/vetc-refactoring/SKILL.md +96 -0
- package/skills/vetc-runbook/SKILL.md +118 -0
- package/skills/vetc-sast/SKILL.md +118 -0
- package/skills/vetc-sdlc/SKILL.md +97 -0
- package/skills/vetc-security/SKILL.md +117 -0
- package/skills/vetc-spec-driven/SKILL.md +111 -0
- package/skills/vetc-spec-quality/SKILL.md +117 -0
- package/skills/vetc-systematic-debugging/SKILL.md +74 -0
- package/skills/vetc-tdd/SKILL.md +96 -0
- package/skills/vetc-thinking-pm/SKILL.md +110 -0
- package/skills/vetc-ui-visual-qa/SKILL.md +117 -0
- package/skills/vetc-verify/SKILL.md +101 -0
- package/skills/visual-verdict/SKILL.md +59 -0
- package/skills/what-if-oracle/SKILL.md +87 -0
- package/skills/widget-rendering/SKILL.md +85 -0
- package/skills/wiki/SKILL.md +69 -0
- package/skills/workspace-isolation/SKILL.md +85 -0
- package/skills/worktree-isolation/SKILL.md +85 -0
- package/skills/wowerpoint/SKILL.md +101 -0
- package/skills/writer-memory/SKILL.md +82 -0
- package/skills/writing-plans/SKILL.md +115 -0
- package/skills/writing-skills/SKILL.md +115 -0
- package/skills/xgboost/SKILL.md +29 -0
- package/skills/xgboost-ts/SKILL.md +28 -0
- package/skills/xlsx/SKILL.md +111 -0
- package/skills/zarr-python/SKILL.md +101 -0
- package/src/categories.ts +383 -0
- package/src/format.ts +104 -0
- package/src/indexer.ts +101 -0
- package/src/proactive.ts +51 -0
- package/src/scanner.ts +85 -0
- package/src/search.ts +89 -0
- package/src/strip.ts +29 -0
- package/src/synonyms.ts +83 -0
- package/src/text.ts +118 -0
- package/src/types.ts +64 -0

@@ -0,0 +1,90 @@
---
name: pymoo
description: Multi-objective optimization framework. NSGA-II, NSGA-III, MOEA/D, Pareto fronts, constraint handling, benchmarks (ZDT, DTLZ), for engineering design and optimization problems.
---

# Pymoo - Multi-Objective Optimization in Python

## Overview

Pymoo is a comprehensive Python framework for optimization with emphasis on multi-objective problems. Solve single and multi-objective optimization using state-of-the-art algorithms (NSGA-II/III, MOEA/D), benchmark problems (ZDT, DTLZ), customizable genetic operators, and multi-criteria decision making methods. Excels at finding trade-off solutions (Pareto fronts) for problems with conflicting objectives.

## When to Use This Skill

This skill should be used when:
- Solving optimization problems with one or multiple objectives
- Finding Pareto-optimal solutions and analyzing trade-offs
- Implementing evolutionary algorithms (GA, DE, PSO, NSGA-II/III)
- Working with constrained optimization problems
- Benchmarking algorithms on standard test problems (ZDT, DTLZ, WFG)
- Customizing genetic operators (crossover, mutation, selection)
- Visualizing high-dimensional optimization results
- Making decisions from multiple competing solutions
- Handling binary, discrete, continuous, or mixed-variable problems

## Core Concepts

### The Unified Interface

Pymoo uses a consistent `minimize()` function for all optimization tasks:

```python
from pymoo.optimize import minimize

result = minimize(
    problem,      # What to optimize
    algorithm,    # How to optimize
    termination,  # When to stop
    seed=1,
    verbose=True
)
```

### Problem Types

- **Single-objective:** One objective to minimize/maximize
- **Multi-objective:** 2-3 conflicting objectives → Pareto front
- **Many-objective:** 4+ objectives → High-dimensional Pareto front
- **Constrained:** Objectives + inequality/equality constraints
- **Dynamic:** Time-varying objectives or constraints
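
The "Pareto front" named in the problem types above is the set of solutions that no other solution beats on every objective. The following is a minimal pure-Python sketch of that idea, assuming all objectives are minimized; `dominates`, `pareto_front`, and the sample values are hypothetical names and data for illustration, not part of pymoo's API.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if point a Pareto-dominates point b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective values, e.g. (cost, weight), both minimized
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 5.0)]
front = pareto_front(candidates)
print(front)
```

This quadratic-time filter is only meant to make the definition concrete; algorithms such as NSGA-II rely on more efficient non-dominated sorting.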

## Quick Start Workflows

### Workflow 1: Single-Objective Optimization

**When:** Optimizing one objective function

**Steps:**
1. Define or select problem
2. Choose single-objective algorithm (GA, DE, PSO, CMA-ES)
3. Configure termination criteria
4. Run optimization
5. Extract best solution

**Example:**
```python
from pymoo.algorithms.soo.nonconvex.ga import GA
from pymoo.problems import get_problem
from pymoo.optimize import minimize

# Built-in problem
problem = get_problem("rastrigin", n_var=10)

# Configure Genetic Algorithm
algorithm = GA(
    pop_size=100,
    eliminate_duplicates=True
)

# Optimize
result = minimize(
    problem,
    algorithm,
    ('n_gen', 200),
    seed=1,
    verbose=True
)

print(f"Best solution: {result.X}")
print(f"Best objective: {result.F[0]}")
```

@@ -0,0 +1,119 @@
---
name: pyopenms
description: Complete mass spectrometry analysis platform. Use for proteomics workflows (feature detection, peptide identification, protein quantification) and complex LC-MS/MS pipelines. Supports extensive file formats and algorithms. Best for proteomics and comprehensive MS data processing. For simple spectral comparison and metabolite ID use matchms.
---

# PyOpenMS

## Overview

PyOpenMS provides Python bindings to the OpenMS library for computational mass spectrometry, enabling analysis of proteomics and metabolomics data. Use for handling mass spectrometry file formats, processing spectral data, detecting features, identifying peptides/proteins, and performing quantitative analysis.

## Core Capabilities

PyOpenMS organizes functionality into these domains:

### 1. File I/O

```python
import pyopenms as ms

# Read mzML file
exp = ms.MSExperiment()
ms.MzMLFile().load("data.mzML", exp)

# Access spectra
for spectrum in exp:
    mz, intensity = spectrum.get_peaks()
    print(f"Spectrum: {len(mz)} peaks")
```

**For detailed file handling**: see the docs.

### 2. Signal Processing

Process raw spectral data with smoothing, filtering, centroiding, and normalization.

Basic spectrum processing:

```python
# Smooth spectrum with Gaussian filter
gaussian = ms.GaussFilter()
params = gaussian.getParameters()
params.setValue("gaussian_width", 0.1)
gaussian.setParameters(params)
gaussian.filterExperiment(exp)
```

**For algorithm details**: see the docs.
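
To make the smoothing step above more concrete, here is a small pure-Python sketch of Gaussian smoothing on an intensity trace. It assumes evenly spaced samples and measures `sigma` in samples, so it is not a re-implementation of `GaussFilter` and `sigma` is not the same thing as OpenMS's `gaussian_width`; `gaussian_kernel`, `smooth`, and the sample data are hypothetical.

```python
import math
from typing import List


def gaussian_kernel(sigma: float, radius: int) -> List[float]:
    """Build a normalized 1-D Gaussian kernel with 2 * radius + 1 taps."""
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(intensities: List[float], sigma: float = 1.0, radius: int = 3) -> List[float]:
    """Convolve intensities with a Gaussian kernel, repeating edge values."""
    kernel = gaussian_kernel(sigma, radius)
    n = len(intensities)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index at the borders
            acc += w * intensities[j]
        out.append(acc)
    return out


# Hypothetical noisy peak sampled on an evenly spaced grid
raw = [0.0, 1.0, 0.0, 8.0, 10.0, 7.0, 0.0, 1.0, 0.0]
smoothed = smooth(raw, sigma=1.0, radius=2)
print([round(v, 2) for v in smoothed])
```

The smoothed trace keeps its apex at the same index while the isolated one-count spikes are flattened, which is the effect wanted before centroiding.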

### 3. Feature Detection

Detect and link features across spectra and samples for quantitative analysis.

```python
# Detect features in centroided data
ff = ms.FeatureFinder()
params = ff.getParameters("centroided")  # default parameters for this algorithm
features = ms.FeatureMap()               # receives the detected features
ff.run("centroided", exp, features, params, ms.FeatureMap())  # last argument: seeds (empty)
```

**For complete workflows**: see the docs.

### 4. Peptide and Protein Identification

Integrate with search engines and process identification results.

**Supported engines**: Comet, Mascot, MSGFPlus, XTandem, OMSSA, Myrimatch

Basic identification workflow:

```python
# Load identification data
protein_ids = []
peptide_ids = []
ms.IdXMLFile().load("identifications.idXML", protein_ids, peptide_ids)

# Apply FDR filtering
fdr = ms.FalseDiscoveryRate()
fdr.apply(peptide_ids)
```

**For detailed workflows**: see the docs.
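
The idea behind FDR filtering can be sketched with the generic target-decoy estimate: at a given score cutoff, the FDR is approximated by the number of decoy hits divided by the number of target hits above that cutoff. The code below is an illustration under stated assumptions (higher score is better, each match carries a decoy flag); it is not what `FalseDiscoveryRate.apply` does internally, and `accept_at_fdr` and the sample scores are hypothetical.

```python
from typing import List, Tuple


def accept_at_fdr(psms: List[Tuple[float, bool]], max_fdr: float = 0.01) -> List[Tuple[float, bool]]:
    """Return the target matches accepted at the given FDR.

    Each match is (score, is_decoy); higher scores are assumed to be better.
    FDR at a cutoff is estimated as decoys / targets among matches above it.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    keep = 0  # number of top-ranked matches to keep
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            keep = i  # deepest rank at which the estimate is still acceptable
    return [p for p in ranked[:keep] if not p[1]]


# Hypothetical scored matches: (score, is_decoy)
psms = [(9.1, False), (8.7, False), (8.0, False), (7.5, True),
        (7.2, False), (6.9, True), (6.0, False)]
accepted = accept_at_fdr(psms, max_fdr=0.25)
print([score for score, _ in accepted])
```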

### 5. Metabolomics Analysis

Perform untargeted metabolomics preprocessing and analysis.

Typical workflow:
1. Load and process raw data
2. Detect features
3. Align retention times across samples
4. Link features to consensus map
5. Annotate with compound databases

**For complete metabolomics workflows**: see the docs.

## Data Structures

PyOpenMS uses these primary objects:

- **MSExperiment**: Collection of spectra and chromatograms
- **MSSpectrum**: Single mass spectrum with m/z and intensity pairs
- **MSChromatogram**: Chromatographic trace
- **Feature**: Detected chromatographic peak with quality metrics
- **FeatureMap**: Collection of features
- **PeptideIdentification**: Search results for peptides
- **ProteinIdentification**: Search results for proteins

**For detailed documentation**: see the docs.

## Common Workflows

### Quick Start: Load and Explore Data

```python
import pyopenms as ms

# Load mzML file
exp = ms.MSExperiment()
ms.MzMLFile().load("sample.mzML", exp)

# Get basic statistics
print(f"Number of spectra: {exp.getNrSpectra()}")
```

@@ -0,0 +1,118 @@
---
name: pysam
description: Genomic file toolkit. Read/write SAM/BAM/CRAM alignments, VCF/BCF variants, FASTA/FASTQ sequences, extract regions, calculate coverage, for NGS data processing pipelines.
---

# Pysam

## Overview

Pysam is a Python module for reading, manipulating, and writing genomic datasets. Read/write SAM/BAM/CRAM alignment files, VCF/BCF variant files, and FASTA/FASTQ sequences with a Pythonic interface to htslib. Query tabix-indexed files, perform pileup analysis for coverage, and execute samtools/bcftools commands.

## When to Use This Skill

This skill should be used when:
- Working with sequencing alignment files (BAM/CRAM)
- Analyzing genetic variants (VCF/BCF)
- Extracting reference sequences or gene regions
- Processing raw sequencing data (FASTQ)
- Calculating coverage or read depth
- Implementing bioinformatics analysis pipelines
- Quality control of sequencing data
- Variant calling and annotation workflows

## Quick Start

### Basic Examples

**Read alignment file:**
```python
import pysam

# Open BAM file and fetch reads in region
samfile = pysam.AlignmentFile("example.bam", "rb")
for read in samfile.fetch("chr1", 1000, 2000):
    print(f"{read.query_name}: {read.reference_start}")
samfile.close()
```

**Read variant file:**
```python
# Open VCF file and iterate variants
vcf = pysam.VariantFile("variants.vcf")
for variant in vcf:
    print(f"{variant.chrom}:{variant.pos} {variant.ref}>{variant.alts}")
vcf.close()
```

**Query reference sequence:**
```python
# Open FASTA and extract sequence
fasta = pysam.FastaFile("reference.fasta")
sequence = fasta.fetch("chr1", 1000, 2000)
print(sequence)
fasta.close()
```

## Core Capabilities

### 1. Alignment File Operations (SAM/BAM/CRAM)

Use the `AlignmentFile` class to work with aligned sequencing reads. This is appropriate for analyzing mapping results, calculating coverage, extracting reads, or quality control.

**Common operations:**
- Open and read BAM/SAM/CRAM files
- Fetch reads from specific genomic regions
- Filter reads by mapping quality, flags, or other criteria
- Write filtered or modified alignments
- Calculate coverage statistics
- Perform pileup analysis (base-by-base coverage)
- Access read sequences, quality scores, and alignment information

**Reference:** See the docs for detailed documentation on opening and reading alignment files.
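
Base-by-base coverage, mentioned in the operations above, amounts to counting how many reads overlap each reference position. The sketch below does this in pure Python from `(start, end)` pairs in 0-based, half-open coordinates, such as the `reference_start` and `reference_end` of fetched reads. The `coverage` helper and the sample intervals are hypothetical, and the sketch treats each read as one contiguous block, ignoring gaps and deletions.

```python
from typing import List, Tuple


def coverage(reads: List[Tuple[int, int]], start: int, end: int) -> List[int]:
    """Per-base read depth over the half-open window [start, end).

    Each read is a (reference_start, reference_end) pair in 0-based,
    half-open coordinates; gaps inside a read are ignored.
    """
    # Difference array: +1 where a read enters the window, -1 where it leaves
    diff = [0] * (end - start + 1)
    for r_start, r_end in reads:
        lo, hi = max(r_start, start), min(r_end, end)
        if lo < hi:  # read overlaps the window
            diff[lo - start] += 1
            diff[hi - start] -= 1
    depth, running = [], 0
    for delta in diff[:-1]:
        running += delta
        depth.append(running)
    return depth


# Hypothetical read intervals on one contig
reads = [(0, 5), (3, 8), (4, 6), (20, 30)]
print(coverage(reads, 2, 8))
```

The difference-array approach touches each read once and each position once, which scales better than incrementing every covered base per read.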

### 2. Variant File Operations (VCF/BCF)

Use the `VariantFile` class to work with genetic variants from variant calling pipelines. This is appropriate for variant analysis, filtering, annotation, or population genetics.

**Common operations:**
- Read and write VCF/BCF files
- Query variants in specific regions
- Access variant information (position, alleles, quality)
- Extract genotype data for samples
- Filter variants by quality, allele frequency, or other criteria
- Annotate variants with additional information
- Subset samples or regions

**Reference:** See the docs for detailed documentation on opening and reading variant files.

### 3. Sequence File Operations (FASTA/FASTQ)

Use `FastaFile` for random access to reference sequences and `FastxFile` for reading raw sequencing data. This is appropriate for extracting gene sequences, validating variants against reference, or processing raw reads.

**Common operations:**
- Query reference sequences by genomic coordinates
- Extract sequences for genes or regions of interest
- Read FASTQ files with quality scores
- Validate variant reference alleles
- Calculate sequence statistics
- Filter reads by quality or length
- Convert between FASTA and FASTQ formats

**Reference:** See the docs for detailed documentation on FASTA file access and indexing.

## Key Concepts

### Coordinate Systems

**Critical:** Pysam uses **0-based, half-open** coordinates (Python convention):
- Start positions are 0-based (first base is position 0)
- End positions are exclusive (not included in the range)
- Region 1000-2000 includes bases 1000-1999 (1000 bases total)

**Exception:** Region strings passed to `fetch()` follow the samtools convention (1-based, inclusive), so the region string `chr1:1001-2000` covers the same bases as `fetch("chr1", 1000, 2000)`.
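
The two conventions above differ only in the start coordinate, which makes off-by-one errors easy. A minimal pure-Python sketch of the conversion follows; `to_region_string` and `from_region_string` are hypothetical helper names, not pysam functions.

```python
from typing import Tuple


def to_region_string(contig: str, start: int, end: int) -> str:
    """Convert a 0-based, half-open interval to a 1-based, inclusive region string."""
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    return f"{contig}:{start + 1}-{end}"


def from_region_string(region: str) -> Tuple[str, int, int]:
    """Convert a 1-based, inclusive region string to a 0-based, half-open interval."""
    contig, _, span = region.rpartition(":")
    first, last = (int(x.replace(",", "")) for x in span.split("-"))
    return contig, first - 1, last


print(to_region_string("chr1", 1000, 2000))
print(from_region_string("chr1:1001-2000"))
```

Only the start shifts by one: the exclusive 0-based end and the inclusive 1-based end are the same number.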
@@ -0,0 +1,30 @@
---
name: pyspark
description: Distributed data processing with Apache Spark. Use when working with large-scale datasets, distributed SQL, DataFrame operations, ETL pipelines, or cluster computing. Trigger on imports of pyspark, SparkSession, or mentions of big data, distributed, cluster, ETL, Spark, DataFrame at scale.
---
# pyspark

Use this skill for large-scale distributed data processing.

## Core patterns

- **Session**: `SparkSession.builder.appName('analysis').getOrCreate()`.
- **Read**: `spark.read.parquet('data/')` or `spark.read.csv('data.csv', header=True, inferSchema=True)`.
- **Transform**: `df.filter()`, `df.select()`, `df.groupBy().agg()`, `df.join(other, on='key')`.
- **SQL**: `df.createOrReplaceTempView('table')` → `spark.sql('SELECT * FROM table')`.
- **Write**: `df.write.parquet('output/', mode='overwrite')`.

## Rules

- Use `coalesce(1)` or `repartition()` before writing small outputs so the job does not produce many tiny files.
- Persist intermediate DataFrames that are used multiple times: `df.persist(StorageLevel.MEMORY_AND_DISK)`.
- Use a broadcast join when joining a small table to a large one: `large_df.join(broadcast(small_df), on='key')`.
- Avoid `collect()` on large DataFrames; use `take(n)` to inspect a sample, and call `toPandas()` only on results known to fit in driver memory.

## Anti-patterns

- Don't call `toPandas()` on large DataFrames: it collects all data to the driver.
- Don't use Python UDFs when built-in Spark SQL functions suffice.
- Don't create a `SparkSession` per operation; reuse one across the application.

@@ -0,0 +1,102 @@
---
name: pytdc
description: Therapeutics Data Commons. AI-ready drug discovery datasets (ADME, toxicity, DTI), benchmarks, scaffold splits, molecular oracles, for therapeutic ML and pharmacological prediction.
---

# PyTDC (Therapeutics Data Commons)

## Overview

PyTDC is an open-science platform providing AI-ready datasets and benchmarks for drug discovery and development. Access curated datasets spanning the entire therapeutics pipeline with standardized evaluation metrics and meaningful data splits, organized into three categories: single-instance prediction (molecular/protein properties), multi-instance prediction (drug-target interactions, DDI), and generation (molecule generation, retrosynthesis).

## When to Use This Skill

This skill should be used when:
- Working with drug discovery or therapeutic ML datasets
- Benchmarking machine learning models on standardized pharmaceutical tasks
- Predicting molecular properties (ADME, toxicity, bioactivity)
- Predicting drug-target or drug-drug interactions
- Generating novel molecules with desired properties
- Accessing curated datasets with proper train/test splits (scaffold, cold-split)
- Using molecular oracles for property optimization

## Quick Start

The basic pattern for accessing any TDC dataset follows this structure:

```python
from tdc.<problem> import <Task>
data = <Task>(name='<Dataset>')
split = data.get_split(method='scaffold', seed=1, frac=[0.7, 0.1, 0.2])
df = data.get_data(format='df')
```

Where:
- `<problem>`: One of `single_pred`, `multi_pred`, or `generation`
- `<Task>`: Specific task category (e.g., ADME, DTI, MolGen)
- `<Dataset>`: Dataset name within that task

`data.get_split(...)` returns a dict with `'train'`, `'valid'`, and `'test'` DataFrames.
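
A scaffold split keeps molecules that share a core structure in the same partition, so the test set contains scaffolds the model never saw in training. The sketch below shows one common way to do this, assigning whole scaffold groups to partitions largest-first. It is a generic illustration and not TDC's implementation; `scaffold_split` and the scaffold keys are hypothetical, and the keys are assumed to be precomputed because deriving real scaffolds needs a cheminformatics toolkit.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def scaffold_split(scaffolds: Sequence[str],
                   frac: Sequence[float] = (0.7, 0.1, 0.2)) -> Dict[str, List[int]]:
    """Assign whole scaffold groups to train/valid/test, largest groups first.

    scaffolds[i] is a precomputed scaffold key for molecule i (hypothetical input).
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)

    n = len(scaffolds)
    train_cap, valid_cap = round(frac[0] * n), round(frac[1] * n)
    split: Dict[str, List[int]] = {"train": [], "valid": [], "test": []}
    # Largest groups first; ties broken by key so the result is deterministic
    for key, members in sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        if len(split["train"]) + len(members) <= train_cap:
            split["train"].extend(members)
        elif len(split["valid"]) + len(members) <= valid_cap:
            split["valid"].extend(members)
        else:
            split["test"].extend(members)
    return split


# Ten hypothetical molecules: five share scaffold A, two share B, three are singletons
keys = ["A"] * 5 + ["B"] * 2 + ["C", "D", "E"]
parts = scaffold_split(keys)
print({name: len(ids) for name, ids in parts.items()})
```

Because groups are never divided, the realized fractions only approximate `frac` when a few scaffolds dominate the dataset.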
|
|
41
|
+
|
|
42
|
+
## Single-Instance Prediction Tasks
|
|
43
|
+
|
|
44
|
+
Single-instance prediction involves forecasting properties of individual biomedical entities (molecules, proteins, etc.).
|
|
45
|
+
|
|
46
|
+
### Available Task Categories
|
|
47
|
+
|
|
48
|
+
#### 1. ADME (Absorption, Distribution, Metabolism, Excretion)
|
|
49
|
+
|
|
50
|
+
Predict pharmacokinetic properties of drug molecules.
|
|
51
|
+
|
|
52
|
+
```python
|
|
53
|
+
from tdc.single_pred import ADME
|
|
54
|
+
data = ADME(name='Caco2_Wang') # Intestinal permeability
|
|
55
|
+
# Other datasets: HIA_Hou, Bioavailability_Ma, Lipophilicity_AstraZeneca, etc.
|
|
56
|
+
```
|
|
57
|
+
|
|
58
|
+
**Common ADME datasets:**
|
|
59
|
+
- Caco2 - Intestinal permeability
|
|
60
|
+
- HIA - Human intestinal absorption
|
|
61
|
+
- Bioavailability - Oral bioavailability
|
|
62
|
+
- Lipophilicity - Octanol-water partition coefficient
|
|
63
|
+
- Solubility - Aqueous solubility
|
|
64
|
+
- BBB - Blood-brain barrier penetration
|
|
65
|
+
- CYP - Cytochrome P450 metabolism
|
|
66
|
+
|
|
67
|
+
#### 2. Toxicity (Tox)
|
|
68
|
+
|
|
69
|
+
Predict toxicity and adverse effects of compounds.
|
|
70
|
+
|
|
71
|
+
Dataset names include `AMES`, `DILI`, `Carcinogens_Lagunin`, etc.
|
|
72
|
+
|
|
73
|
+
|
|
74
|
+
**Common toxicity datasets:**
|
|
75
|
+
- hERG - Cardiac toxicity
|
|
76
|
+
- AMES - Mutagenicity
|
|
77
|
+
- DILI - Drug-induced liver injury
|
|
78
|
+
- Carcinogens - Carcinogenicity
|
|
79
|
+
- ClinTox - Clinical trial toxicity
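
As a hedged sketch, a toxicity dataset can be loaded with the same pattern used for ADME. The helper name `load_tox_split` is hypothetical. The example assumes PyTDC is installed and that `AMES` is accepted as a dataset name; the import is kept inside the function so the snippet can be defined without PyTDC or network access.

```python
def load_tox_split(name="AMES", seed=1):
    """Load a TDC toxicity dataset and return its train/valid/test split."""
    # Imported lazily so this helper can be defined without PyTDC installed.
    from tdc.single_pred import Tox

    data = Tox(name=name)  # downloads the dataset on first use
    return data.get_split(method="scaffold", seed=seed, frac=[0.7, 0.1, 0.2])


# Usage (requires PyTDC and network access):
# split = load_tox_split("AMES")
# train_df, valid_df, test_df = split["train"], split["valid"], split["test"]
```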
|
|
80
|
+
|
|
81
|
+
#### 3. HTS (High-Throughput Screening)
|
|
82
|
+
|
|
83
|
+
Bioactivity predictions from screening data.
|
|
84
|
+
|
|
85
|
+
|
|
86
|
+
|
|
87
|
+
## Multi-Instance Prediction Tasks
|
|
88
|
+
|
|
89
|
+
Multi-instance prediction involves forecasting properties of interactions between multiple biomedical entities.
|
|
90
|
+
|
|
91
|
+
### Available Task Categories
|
|
92
|
+
|
|
93
|
+
#### 1. DTI (Drug-Target Interaction)
|
|
94
|
+
|
|
95
|
+
Predict binding affinity between drugs and protein targets.
|
|
96
|
+
|
|
97
|
+
```python
|
|
98
|
+
from tdc.multi_pred import DTI
|
|
99
|
+
data = DTI(name='BindingDB_Kd')
|
|
100
|
+
split = data.get_split()
|
|
101
|
+
```
|
|
102
|
+
|
|
@@ -0,0 +1,31 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: pytorch
|
|
3
|
+
description: Deep learning framework for building and training neural networks. Use when creating CNN, RNN, Transformer, or custom architectures, training models with GPU acceleration, implementing custom loss functions, or optimizing with autograd. Trigger on imports of torch, torchvision, torchaudio, nn.Module, or mentions of neural network training, GPU, CUDA, tensor operations.
|
|
4
|
+
---
|
|
5
|
+
# pytorch
|
|
6
|
+
|
|
7
|
+
Use this skill for deep learning model development.
|
|
8
|
+
|
|
9
|
+
## Core patterns
|
|
10
|
+
|
|
11
|
+
- **Model**: Subclass `nn.Module`, define `__init__` and `forward()`.
|
|
12
|
+
- **Training loop**: Forward → loss → `loss.backward()` → `optimizer.step()` → `optimizer.zero_grad()`.
|
|
13
|
+
- **Data**: `Dataset` + `DataLoader(shuffle=True, num_workers=4, pin_memory=True)`.
|
|
14
|
+
- **GPU**: `tensor.to(device)`, `model.to(device)`. Check `torch.cuda.is_available()`.
|
|
15
|
+
- **Saving**: `torch.save(model.state_dict(), path)` / `model.load_state_dict(torch.load(path))`.
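
The patterns above can be combined into a short end-to-end sketch. The model `TinyRegressor`, the synthetic data, and the hyperparameters are all illustrative assumptions, not recommendations.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"


class TinyRegressor(nn.Module):
    """Two-layer MLP; a stand-in for a real architecture."""

    def __init__(self, in_features: int = 3, hidden: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


# Synthetic regression data: the target is the sum of the inputs.
x = torch.randn(64, 3)
y = x.sum(dim=1, keepdim=True)
loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

model = TinyRegressor().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()


def train_one_epoch() -> float:
    model.train()
    running = torch.zeros((), device=device)
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        loss = loss_fn(model(xb), yb)  # forward pass, then loss
        loss.backward()                # backpropagate
        optimizer.step()               # update weights
        optimizer.zero_grad()          # clear gradients for the next batch
        running += loss.detach() * xb.size(0)
    # Convert to a Python float once per epoch rather than once per batch.
    return (running / len(loader.dataset)).item()


losses = [train_one_epoch() for _ in range(20)]
```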
|
|
16
|
+
|
|
17
|
+
## Rules
|
|
18
|
+
|
|
19
|
+
- Use `torch.no_grad()` context during inference and evaluation.
|
|
20
|
+
- Set `model.eval()` before validation; `model.train()` before training.
|
|
21
|
+
- Use `nn.Sequential` for simple stacks; custom `forward()` for complex architectures.
|
|
22
|
+
- Learning rate scheduling: call `scheduler.step()` after `optimizer.step()`.
|
|
23
|
+
- Mixed precision: `torch.amp.autocast('cuda')` + `GradScaler` for faster training.
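
A minimal sketch of the evaluation rules, together with a `state_dict` round trip. The layer sizes and the in-memory buffer (used here in place of a file path) are assumptions chosen to keep the example self-contained.

```python
import io

import torch
from torch import nn

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"


def make_model() -> nn.Module:
    return nn.Sequential(
        nn.Linear(4, 8), nn.ReLU(), nn.Dropout(0.5), nn.Linear(8, 2)
    ).to(device)


model = make_model()
inputs = torch.randn(5, 4, device=device)

model.eval()           # dropout off, batch-norm layers use running statistics
with torch.no_grad():  # no autograd graph is built
    logits = model(inputs)

# Save the weights to an in-memory buffer, then load them into a fresh model.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

restored = make_model()
restored.load_state_dict(torch.load(buffer, map_location=device))
restored.eval()
with torch.no_grad():
    restored_logits = restored(inputs)
```

Call `model.train()` again before resuming training, since `eval()` stays in effect until it is switched back.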
|
|
24
|
+
|
|
25
|
+
## Anti-patterns
|
|
26
|
+
|
|
27
|
+
- Don't forget `optimizer.zero_grad()` — gradients accumulate by default.
|
|
28
|
+
- Don't call `.item()` on every step inside the training loop: it forces a GPU-to-CPU sync and only works on single-element tensors. Reserve it for scalar metrics you actually log.
|
|
29
|
+
- Don't hardcode device — always use `device = 'cuda' if torch.cuda.is_available() else 'cpu'`.
|
|
30
|
+
|
|
31
|
+
|
|
@@ -0,0 +1,119 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: pytorch-lightning
|
|
3
|
+
description: Deep learning framework (PyTorch Lightning). Organize PyTorch code into LightningModules, configure Trainers for multi-GPU/TPU, implement data pipelines, callbacks, logging (W&B, TensorBoard), distributed training (DDP, FSDP, DeepSpeed), for scalable neural network training.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# PyTorch Lightning
|
|
7
|
+
|
|
8
|
+
## Overview
|
|
9
|
+
|
|
10
|
+
PyTorch Lightning is a deep learning framework that organizes PyTorch code to eliminate boilerplate while maintaining full flexibility. It automates training workflows and multi-device orchestration, and it encodes best practices for training and scaling neural networks across multiple GPUs/TPUs.
|
|
11
|
+
|
|
12
|
+
## When to Use This Skill
|
|
13
|
+
|
|
14
|
+
This skill should be used when:
|
|
15
|
+
- Building, training, or deploying neural networks using PyTorch Lightning
|
|
16
|
+
- Organizing PyTorch code into LightningModules
|
|
17
|
+
- Configuring Trainers for multi-GPU/TPU training
|
|
18
|
+
- Implementing data pipelines with LightningDataModules
|
|
19
|
+
- Working with callbacks, logging, and distributed training strategies (DDP, FSDP, DeepSpeed)
|
|
20
|
+
- Structuring deep learning projects professionally
|
|
21
|
+
|
|
22
|
+
## Core Capabilities
|
|
23
|
+
|
|
24
|
+
### 1. LightningModule - Model Definition
|
|
25
|
+
|
|
26
|
+
Organize PyTorch models into six logical sections:
|
|
27
|
+
|
|
28
|
+
1. **Initialization** - `__init__()` and `setup()`
|
|
29
|
+
2. **Training Loop** - `training_step(batch, batch_idx)`
|
|
30
|
+
3. **Validation Loop** - `validation_step(batch, batch_idx)`
|
|
31
|
+
4. **Test Loop** - `test_step(batch, batch_idx)`
|
|
32
|
+
5. **Prediction** - `predict_step(batch, batch_idx)`
|
|
33
|
+
6. **Optimizer Configuration** - `configure_optimizers()`
|
|
34
|
+
|
|
35
|
+
**Quick template reference:** See `scripts/template_lightning_module.py` for a complete boilerplate.
|
|
36
|
+
|
|
37
|
+
**Detailed documentation:** See the docs for comprehensive method documentation, hooks, properties, and best practices.
|
|
38
|
+
|
|
39
|
+
### 2. Trainer - Training Automation
|
|
40
|
+
|
|
41
|
+
The Trainer automates the training loop, device management, gradient operations, and callbacks. Key features:
|
|
42
|
+
|
|
43
|
+
- Multi-GPU/TPU support with strategy selection (DDP, FSDP, DeepSpeed)
|
|
44
|
+
- Automatic mixed precision training
|
|
45
|
+
- Gradient accumulation and clipping
|
|
46
|
+
- Checkpointing and early stopping
|
|
47
|
+
- Progress bars and logging
|
|
48
|
+
|
|
49
|
+
**Quick setup reference:** See `scripts/quick_trainer_setup.py` for common Trainer configurations.
|
|
50
|
+
|
|
51
|
+
**Detailed documentation:** See the docs for all parameters, methods, and configuration options.
|
|
52
|
+
|
|
53
|
+
### 3. LightningDataModule - Data Pipeline Organization
|
|
54
|
+
|
|
55
|
+
Encapsulate all data processing steps in a reusable class:
|
|
56
|
+
|
|
57
|
+
1. `prepare_data()` - Download and process data (single-process)
|
|
58
|
+
2. `setup()` - Create datasets and apply transforms (per-GPU)
|
|
59
|
+
3. `train_dataloader()` - Return training DataLoader
|
|
60
|
+
4. `val_dataloader()` - Return validation DataLoader
|
|
61
|
+
5. `test_dataloader()` - Return test DataLoader
|
|
62
|
+
|
|
63
|
+
**Quick template reference:** See `scripts/template_datamodule.py` for a complete boilerplate.
|
|
64
|
+
|
|
65
|
+
**Detailed documentation:** See the docs for method details and usage patterns.
|
|
66
|
+
|
|
67
|
+
### 4. Callbacks - Extensible Training Logic
|
|
68
|
+
|
|
69
|
+
Add custom functionality at specific training hooks without modifying your LightningModule. Built-in callbacks include:
|
|
70
|
+
|
|
71
|
+
- **ModelCheckpoint** - Save best/latest models
|
|
72
|
+
- **EarlyStopping** - Stop when metrics plateau
|
|
73
|
+
- **LearningRateMonitor** - Track LR scheduler changes
|
|
74
|
+
- **BatchSizeFinder** - Auto-determine optimal batch size
|
|
75
|
+
|
|
76
|
+
**Detailed documentation:** See the docs for built-in callbacks and custom callback creation.
|
|
77
|
+
|
|
78
|
+
### 5. Logging - Experiment Tracking
|
|
79
|
+
|
|
80
|
+
Integrate with multiple logging platforms:
|
|
81
|
+
|
|
82
|
+
- TensorBoard (default)
|
|
83
|
+
- Weights & Biases (WandbLogger)
|
|
84
|
+
- MLflow (MLFlowLogger)
|
|
85
|
+
- Neptune (NeptuneLogger)
|
|
86
|
+
- Comet (CometLogger)
|
|
87
|
+
- CSV (CSVLogger)
|
|
88
|
+
|
|
89
|
+
Log metrics using `self.log("metric_name", value)` in any LightningModule method.
|
|
90
|
+
|
|
91
|
+
**Detailed documentation:** See the docs for logger setup and configuration.
|
|
92
|
+
|
|
93
|
+
### 6. Distributed Training - Scale to Multiple Devices
|
|
94
|
+
|
|
95
|
+
Choose the right strategy based on model size:
|
|
96
|
+
|
|
97
|
+
- **DDP** - For models <500M parameters (ResNet, smaller transformers)
|
|
98
|
+
- **FSDP** - For models 500M+ parameters (large transformers, recommended for Lightning users)
|
|
99
|
+
- **DeepSpeed** - For cutting-edge features and fine-grained control
|
|
100
|
+
|
|
101
|
+
Configure with: `Trainer(strategy="ddp", accelerator="gpu", devices=4)`
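
The rule of thumb above can be written as a small helper. `choose_strategy` and `trainer_kwargs` are hypothetical names, and the 500M threshold is taken directly from the list above rather than from any library default.

```python
def choose_strategy(num_parameters: int, need_fine_grained_control: bool = False) -> str:
    """Pick a Trainer strategy name from the model's parameter count."""
    if need_fine_grained_control:
        return "deepspeed"
    if num_parameters < 500_000_000:
        return "ddp"
    return "fsdp"


def trainer_kwargs(num_parameters: int, devices: int = 4) -> dict:
    """Keyword arguments that could be passed on as Trainer(**kwargs)."""
    return {
        "strategy": choose_strategy(num_parameters),
        "accelerator": "gpu",
        "devices": devices,
    }


small = trainer_kwargs(25_000_000)     # a ResNet-sized model
large = trainer_kwargs(7_000_000_000)  # a large transformer
```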
|
|
102
|
+
|
|
103
|
+
**Detailed documentation:** See the docs for strategy comparison and configuration.
|
|
104
|
+
|
|
105
|
+
### 7. Best Practices
|
|
106
|
+
|
|
107
|
+
- Device agnostic code - Use `self.device` instead of `.cuda()`
|
|
108
|
+
- Hyperparameter saving - Use `self.save_hyperparameters()` in `__init__()`
|
|
109
|
+
- Metric logging - Use `self.log()` for automatic aggregation across devices
|
|
110
|
+
- Reproducibility - Use `seed_everything()` and `Trainer(deterministic=True)`
|
|
111
|
+
- Debugging - Use `Trainer(fast_dev_run=True)` to test with 1 batch
|
|
112
|
+
|
|
113
|
+
**Detailed documentation:** See the docs for common patterns and pitfalls.
|
|
114
|
+
|
|
115
|
+
## Quick Workflow
|
|
116
|
+
|
|
117
|
+
1. **Define model:**
|
|
118
|
+
|
|
119
|
+
|