npm - pi-skill-search - Versions diffs - 0.1.0 - Mend

pi-skill-search 0.1.0

Files changed (299)

package/CHANGELOG.md +20 -0
package/LICENSE +21 -0
package/README.md +97 -0
package/index.ts +163 -0
package/package.json +48 -0
package/skills/adaptyv/SKILL.md +92 -0
package/skills/add-community-extension/SKILL.md +85 -0
package/skills/aeon/SKILL.md +111 -0
package/skills/ai-slop-cleaner/SKILL.md +118 -0
package/skills/anndata/SKILL.md +83 -0
package/skills/arboreto/SKILL.md +107 -0
package/skills/ask/SKILL.md +55 -0
package/skills/astropy/SKILL.md +30 -0
package/skills/async-worker-recovery/SKILL.md +44 -0
package/skills/autopilot/SKILL.md +63 -0
package/skills/autoresearch/SKILL.md +64 -0
package/skills/autoskill/SKILL.md +116 -0
package/skills/babysit/SKILL.md +43 -0
package/skills/benchling-integration/SKILL.md +106 -0
package/skills/bgpt-paper-search/SKILL.md +67 -0
package/skills/biopython/SKILL.md +29 -0
package/skills/bioservices/SKILL.md +96 -0
package/skills/brainstorming/SKILL.md +104 -0
package/skills/cancel/SKILL.md +85 -0
package/skills/ccg/SKILL.md +87 -0
package/skills/celery-pipeline/SKILL.md +30 -0
package/skills/cellxgene-census/SKILL.md +104 -0
package/skills/child-pi-spawning/SKILL.md +85 -0
package/skills/cirq/SKILL.md +113 -0
package/skills/citation-management/SKILL.md +91 -0
package/skills/clinical-decision-support/SKILL.md +117 -0
package/skills/clinical-reports/SKILL.md +118 -0
package/skills/clinical-trial/SKILL.md +28 -0
package/skills/cobrapy/SKILL.md +116 -0
package/skills/configure-notifications/SKILL.md +85 -0
package/skills/consciousness-council/SKILL.md +120 -0
package/skills/context-artifact-hygiene/SKILL.md +85 -0
package/skills/context-mode-ops/SKILL.md +87 -0
package/skills/dask/SKILL.md +85 -0
package/skills/database-lookup/SKILL.md +118 -0
package/skills/datamol/SKILL.md +108 -0
package/skills/debug/SKILL.md +32 -0
package/skills/deep-dive/SKILL.md +114 -0
package/skills/deep-interview/SKILL.md +90 -0
package/skills/deepchem/SKILL.md +117 -0
package/skills/deepinit/SKILL.md +100 -0
package/skills/deeptools/SKILL.md +118 -0
package/skills/delegation-patterns/SKILL.md +56 -0
package/skills/depmap/SKILL.md +94 -0
package/skills/dhdna-profiler/SKILL.md +86 -0
package/skills/diffdock/SKILL.md +101 -0
package/skills/dispatching-parallel-agents/SKILL.md +119 -0
package/skills/dnanexus-integration/SKILL.md +118 -0
package/skills/do/SKILL.md +48 -0
package/skills/docker-sandbox/SKILL.md +29 -0
package/skills/docx/SKILL.md +119 -0
package/skills/esm/SKILL.md +116 -0
package/skills/etetoolkit/SKILL.md +103 -0
package/skills/event-log-tracing/SKILL.md +85 -0
package/skills/exa-search/SKILL.md +72 -0
package/skills/executing-plans/SKILL.md +69 -0
package/skills/exploratory-data-analysis/SKILL.md +118 -0
package/skills/external-context/SKILL.md +80 -0
package/skills/fastapi/SKILL.md +30 -0
package/skills/finishing-a-development-branch/SKILL.md +106 -0
package/skills/flowio/SKILL.md +114 -0
package/skills/fluidsim/SKILL.md +108 -0
package/skills/generate-image/SKILL.md +108 -0
package/skills/geniml/SKILL.md +117 -0
package/skills/geomaster/SKILL.md +109 -0
package/skills/geopandas/SKILL.md +114 -0
package/skills/get-available-resources/SKILL.md +100 -0
package/skills/gget/SKILL.md +111 -0
package/skills/ginkgo-cloud-lab/SKILL.md +52 -0
package/skills/git-master/SKILL.md +85 -0
package/skills/glycoengineering/SKILL.md +104 -0
package/skills/gtars/SKILL.md +104 -0
package/skills/hackernews-frontpage/SKILL.md +46 -0
package/skills/histolab/SKILL.md +98 -0
package/skills/how-it-works/SKILL.md +25 -0
package/skills/hud/SKILL.md +86 -0
package/skills/hugging-science/SKILL.md +93 -0
package/skills/huggingface/SKILL.md +30 -0
package/skills/hypogenic/SKILL.md +107 -0
package/skills/hypothesis-generation/SKILL.md +118 -0
package/skills/imaging-data-commons/SKILL.md +119 -0
package/skills/infographics/SKILL.md +102 -0
package/skills/iso-13485-certification/SKILL.md +114 -0
package/skills/knowledge-agent/SKILL.md +83 -0
package/skills/labarchive-integration/SKILL.md +98 -0
package/skills/lamindb/SKILL.md +119 -0
package/skills/landsat/SKILL.md +29 -0
package/skills/latchbio-integration/SKILL.md +118 -0
package/skills/latex-posters/SKILL.md +112 -0
package/skills/learn-codebase/SKILL.md +24 -0
package/skills/learner/SKILL.md +118 -0
package/skills/literature-review/SKILL.md +118 -0
package/skills/live-agent-lifecycle/SKILL.md +85 -0
package/skills/mailbox-interactive/SKILL.md +85 -0
package/skills/make-plan/SKILL.md +59 -0
package/skills/markdown-mermaid-writing/SKILL.md +118 -0
package/skills/market-research-reports/SKILL.md +119 -0
package/skills/markitdown/SKILL.md +111 -0
package/skills/markitdown-docs/SKILL.md +28 -0
package/skills/matchms/SKILL.md +91 -0
package/skills/matlab/SKILL.md +118 -0
package/skills/matplotlib/SKILL.md +30 -0
package/skills/mcp-setup/SKILL.md +84 -0
package/skills/medchem/SKILL.md +109 -0
package/skills/mem-search/SKILL.md +96 -0
package/skills/modal/SKILL.md +104 -0
package/skills/model-routing-context/SKILL.md +85 -0
package/skills/molecular-dynamics/SKILL.md +116 -0
package/skills/molfeat/SKILL.md +110 -0
package/skills/multi-perspective-review/SKILL.md +85 -0
package/skills/networkx/SKILL.md +111 -0
package/skills/neurokit2/SKILL.md +114 -0
package/skills/neuropixels-analysis/SKILL.md +112 -0
package/skills/nilearn/SKILL.md +29 -0
package/skills/observability-reliability/SKILL.md +43 -0
package/skills/omc-doctor/SKILL.md +86 -0
package/skills/omc-reference/SKILL.md +119 -0
package/skills/omc-setup/SKILL.md +82 -0
package/skills/omc-teams/SKILL.md +81 -0
package/skills/omero-integration/SKILL.md +111 -0
package/skills/open-notebook/SKILL.md +100 -0
package/skills/openephys/SKILL.md +28 -0
package/skills/opentrons-integration/SKILL.md +110 -0
package/skills/optimize-for-gpu/SKILL.md +119 -0
package/skills/orchestration/SKILL.md +85 -0
package/skills/ownership-session-security/SKILL.md +43 -0
package/skills/paper-lookup/SKILL.md +119 -0
package/skills/paperzilla/SKILL.md +114 -0
package/skills/parallel-web/SKILL.md +64 -0
package/skills/pathfinder/SKILL.md +114 -0
package/skills/pathml/SKILL.md +98 -0
package/skills/pdf/SKILL.md +113 -0
package/skills/peer-review/SKILL.md +119 -0
package/skills/pennylane/SKILL.md +119 -0
package/skills/phylogenetics/SKILL.md +102 -0
package/skills/pi-extension-lifecycle/SKILL.md +41 -0
package/skills/plan/SKILL.md +66 -0
package/skills/polars/SKILL.md +114 -0
package/skills/polars-bio/SKILL.md +84 -0
package/skills/pptx/SKILL.md +118 -0
package/skills/pptx-posters/SKILL.md +112 -0
package/skills/primekg/SKILL.md +97 -0
package/skills/project-session-manager/SKILL.md +85 -0
package/skills/protocolsio-integration/SKILL.md +119 -0
package/skills/pubmed-search/SKILL.md +29 -0
package/skills/pufferlib/SKILL.md +103 -0
package/skills/pydeseq2/SKILL.md +106 -0
package/skills/pydicom/SKILL.md +115 -0
package/skills/pyhealth/SKILL.md +117 -0
package/skills/pylabrobot/SKILL.md +100 -0
package/skills/pymatgen/SKILL.md +28 -0
package/skills/pymc/SKILL.md +108 -0
package/skills/pymoo/SKILL.md +90 -0
package/skills/pyopenms/SKILL.md +119 -0
package/skills/pysam/SKILL.md +118 -0
package/skills/pyspark/SKILL.md +30 -0
package/skills/pytdc/SKILL.md +102 -0
package/skills/pytorch/SKILL.md +31 -0
package/skills/pytorch-lightning/SKILL.md +119 -0
package/skills/pyzotero/SKILL.md +104 -0
package/skills/qiskit/SKILL.md +119 -0
package/skills/qutip/SKILL.md +111 -0
package/skills/ralph/SKILL.md +23 -0
package/skills/ralplan/SKILL.md +105 -0
package/skills/rdflib/SKILL.md +29 -0
package/skills/rdkit/SKILL.md +30 -0
package/skills/read-only-explorer/SKILL.md +85 -0
package/skills/receiving-code-review/SKILL.md +103 -0
package/skills/release/SKILL.md +117 -0
package/skills/remember/SKILL.md +39 -0
package/skills/requesting-code-review/SKILL.md +85 -0
package/skills/requirements-to-task-packet/SKILL.md +65 -0
package/skills/research-grants/SKILL.md +118 -0
package/skills/research-lookup/SKILL.md +117 -0
package/skills/research-reproducibility/SKILL.md +28 -0
package/skills/resource-discovery-config/SKILL.md +43 -0
package/skills/rowan/SKILL.md +100 -0
package/skills/runtime-state-reader/SKILL.md +46 -0
package/skills/safe-bash/SKILL.md +85 -0
package/skills/scanpy/SKILL.md +32 -0
package/skills/scholar-evaluation/SKILL.md +115 -0
package/skills/scientific-brainstorming/SKILL.md +118 -0
package/skills/scientific-critical-thinking/SKILL.md +119 -0
package/skills/scientific-schematics/SKILL.md +116 -0
package/skills/scientific-slides/SKILL.md +117 -0
package/skills/scientific-visualization/SKILL.md +109 -0
package/skills/scientific-writing/SKILL.md +119 -0
package/skills/scikit-bio/SKILL.md +92 -0
package/skills/scikit-learn/SKILL.md +99 -0
package/skills/scikit-survival/SKILL.md +110 -0
package/skills/sciomc/SKILL.md +86 -0
package/skills/scvelo/SKILL.md +106 -0
package/skills/scvi-tools/SKILL.md +114 -0
package/skills/seaborn/SKILL.md +97 -0
package/skills/secure-agent-orchestration-review/SKILL.md +47 -0
package/skills/self-improve/SKILL.md +119 -0
package/skills/semantic-compression/SKILL.md +62 -0
package/skills/setup/SKILL.md +42 -0
package/skills/shap/SKILL.md +103 -0
package/skills/simpy/SKILL.md +116 -0
package/skills/skill/SKILL.md +117 -0
package/skills/skill-search/SKILL.md +67 -0
package/skills/skillify/SKILL.md +46 -0
package/skills/smart-explore/SKILL.md +94 -0
package/skills/sqlite-pandas/SKILL.md +30 -0
package/skills/stable-baselines3/SKILL.md +86 -0
package/skills/state-mutation-locking/SKILL.md +44 -0
package/skills/statistical-analysis/SKILL.md +108 -0
package/skills/statsmodels/SKILL.md +29 -0
package/skills/subagent-driven-development/SKILL.md +89 -0
package/skills/sympy/SKILL.md +115 -0
package/skills/system-prompts/SKILL.md +116 -0
package/skills/systematic-debugging/SKILL.md +119 -0
package/skills/team/SKILL.md +85 -0
package/skills/test-driven-development/SKILL.md +84 -0
package/skills/tiledbvcf/SKILL.md +119 -0
package/skills/timeline-report/SKILL.md +85 -0
package/skills/timesfm-forecasting/SKILL.md +112 -0
package/skills/torch-geometric/SKILL.md +118 -0
package/skills/torchdrug/SKILL.md +118 -0
package/skills/trace/SKILL.md +118 -0
package/skills/transformers/SKILL.md +110 -0
package/skills/treatment-plans/SKILL.md +119 -0
package/skills/ui-render-performance/SKILL.md +41 -0
package/skills/ultragoal/SKILL.md +63 -0
package/skills/ultraqa/SKILL.md +85 -0
package/skills/ultrawork/SKILL.md +20 -0
package/skills/umap-learn/SKILL.md +119 -0
package/skills/usfiscaldata/SKILL.md +118 -0
package/skills/using-git-worktrees/SKILL.md +112 -0
package/skills/using-superpowers/SKILL.md +85 -0
package/skills/using-vetc/SKILL.md +92 -0
package/skills/vaex/SKILL.md +111 -0
package/skills/venue-templates/SKILL.md +113 -0
package/skills/verification-before-completion/SKILL.md +88 -0
package/skills/verification-before-done/SKILL.md +68 -0
package/skills/verify/SKILL.md +33 -0
package/skills/version-bump/SKILL.md +54 -0
package/skills/vetc-analyze-ba/SKILL.md +117 -0
package/skills/vetc-analyze-codebase/SKILL.md +118 -0
package/skills/vetc-api-design/SKILL.md +103 -0
package/skills/vetc-brainstorming/SKILL.md +116 -0
package/skills/vetc-change-proposal/SKILL.md +111 -0
package/skills/vetc-cicd/SKILL.md +113 -0
package/skills/vetc-continuous-learning/SKILL.md +115 -0
package/skills/vetc-deep-interview/SKILL.md +103 -0
package/skills/vetc-docgen/SKILL.md +108 -0
package/skills/vetc-frontend-patterns/SKILL.md +99 -0
package/skills/vetc-iterative-retrieval/SKILL.md +110 -0
package/skills/vetc-java-patterns/SKILL.md +113 -0
package/skills/vetc-meta-skill-creator/SKILL.md +99 -0
package/skills/vetc-oracle-patterns/SKILL.md +109 -0
package/skills/vetc-performance-testing/SKILL.md +104 -0
package/skills/vetc-pr-response/SKILL.md +106 -0
package/skills/vetc-ralph/SKILL.md +108 -0
package/skills/vetc-ralplan/SKILL.md +116 -0
package/skills/vetc-receiving-review/SKILL.md +106 -0
package/skills/vetc-reconcile-patterns/SKILL.md +117 -0
package/skills/vetc-refactoring/SKILL.md +96 -0
package/skills/vetc-runbook/SKILL.md +118 -0
package/skills/vetc-sast/SKILL.md +118 -0
package/skills/vetc-sdlc/SKILL.md +97 -0
package/skills/vetc-security/SKILL.md +117 -0
package/skills/vetc-spec-driven/SKILL.md +111 -0
package/skills/vetc-spec-quality/SKILL.md +117 -0
package/skills/vetc-systematic-debugging/SKILL.md +74 -0
package/skills/vetc-tdd/SKILL.md +96 -0
package/skills/vetc-thinking-pm/SKILL.md +110 -0
package/skills/vetc-ui-visual-qa/SKILL.md +117 -0
package/skills/vetc-verify/SKILL.md +101 -0
package/skills/visual-verdict/SKILL.md +59 -0
package/skills/what-if-oracle/SKILL.md +87 -0
package/skills/widget-rendering/SKILL.md +85 -0
package/skills/wiki/SKILL.md +69 -0
package/skills/workspace-isolation/SKILL.md +85 -0
package/skills/worktree-isolation/SKILL.md +85 -0
package/skills/wowerpoint/SKILL.md +101 -0
package/skills/writer-memory/SKILL.md +82 -0
package/skills/writing-plans/SKILL.md +115 -0
package/skills/writing-skills/SKILL.md +115 -0
package/skills/xgboost/SKILL.md +29 -0
package/skills/xgboost-ts/SKILL.md +28 -0
package/skills/xlsx/SKILL.md +111 -0
package/skills/zarr-python/SKILL.md +101 -0
package/src/categories.ts +383 -0
package/src/format.ts +104 -0
package/src/indexer.ts +101 -0
package/src/proactive.ts +51 -0
package/src/scanner.ts +85 -0
package/src/search.ts +89 -0
package/src/strip.ts +29 -0
package/src/synonyms.ts +83 -0
package/src/text.ts +118 -0
package/src/types.ts +64 -0

package/skills/hugging-science/SKILL.md ADDED

@@ -0,0 +1,93 @@
+---
+name: hugging-science
+description: Use when the user is doing AI/ML work in a scientific domain — biology, chemistry, physics, astronomy, climate, genomics, materials science, medicine, ecology, energy, conservation, engineering, mathematics, scientific reasoning, drug discovery, protein design, weather modeling, theorem proving, single-cell, PDE solving, or anything similar. Hugging Science (huggingscience.co) is a curated catalog of scientific datasets, models, blog posts, and interactive Spaces; the `hugging-science` org on Hugging Face hosts community datasets, models, and demo Spaces. This skill helps you discover the right resource AND actually use it — loading datasets via `datasets`, running models via `transformers` or the HF Inference API, calling Spaces like BoltzGen via `gradio_client`, and citing blog posts for methodology. Trigger this skill whenever a user mentions a scientific ML task, asks for "a dataset/model for X" where X is a scientific topic, wants to fine-tune on scientific data, asks about protein / molecule / genome / climate / materials / astronomy / pathology / weather ML, or needs AI tools for research — even if they never say "Hugging Science" explicitly. The catalog is purpose-built for LLM agents (it ships an `llms-full.txt`); prefer it over generic web search for these tasks.
+---
+# Hugging Science
+Hugging Science is a curated, LLM-friendly index of scientific datasets, models, blog posts, and interactive demos for ML researchers. Use it when a scientific ML question lands in front of you — it's much higher signal than generic search and the entries are pre-filtered for quality and openness.
+There are two related surfaces, and you should use both:
+- **The catalog at `huggingscience.co`** — a static, parseable index of resources across 17 scientific domains. It exposes `llms.txt` (compact), `llms-full.txt` (full content), and `topics/<slug>.md` (per-domain). These are markdown files designed to be fetched and read.
+- **The `hugging-science` Hugging Face organization** — `huggingface.co/hugging-science` — community-submitted datasets, a few models, and ~27 interactive Spaces (notably BoltzGen for protein/binder design, Dataset Quest for submissions, and Science Release Heatmap for ecosystem visualization).
+The catalog *points to* resources hosted on the broader Hugging Face Hub. So an entry like `arcinstitute/opengenome2` is a regular HF dataset that you load with the `datasets` library; an entry like `facebook/esm2_t33_650M_UR50D` is a regular HF model you load with `transformers`. The catalog's job is curation and discovery; usage goes through standard Hugging Face APIs.
+## When to use this skill
+Engage this skill when the user's task involves AI/ML applied to science. Common signals:
+- Names a scientific domain (protein, genome, molecule, crystal, weather, climate, galaxy, EEG, microbiome, pathology, plasma, …)
+- Asks "is there a dataset/model for X" where X is scientific
+- Wants to fine-tune on scientific data, evaluate on scientific benchmarks, or reproduce a scientific ML paper
+- Asks about specific known scientific models (Evo-2, ESM2, BoltzGen, Nucleotide Transformer, AlphaFold-derived, etc.)
+- Needs an interactive demo for a scientific task (binder design, theorem proving, etc.)
+If the task is generic ML (recommendation systems, chatbot RAG, vision on cats and dogs), this skill is **not** the right tool — defer to general HF Hub knowledge instead.
+## Core workflow
+Most invocations follow this five-step loop. Don't skip discovery — the value of Hugging Science is that it has already filtered hundreds of resources down to high-signal picks per domain.
+### 1. Identify the domain(s)
+Map the user's task to one or more of the 17 topic slugs:
+`astronomy` · `benchmark` · `biology` · `biotechnology` · `chemistry` · `climate` · `conservation` · `earth-science` · `ecology` · `energy` · `engineering` · `genomics` · `materials-science` · `mathematics` · `medicine` · `physics` · `scientific-reasoning`
+Some tasks span multiple topics (e.g., drug discovery → `chemistry` + `biology` + `medicine`). Fetch each relevant topic.
+### 2. Fetch the relevant catalog content
+Use the bundled script for clean, structured access:
+```bash
+python scripts/fetch_catalog.py topic biology
+python scripts/fetch_catalog.py topic materials-science --filter models
+python scripts/fetch_catalog.py search "protein language model"
+python scripts/fetch_catalog.py all     # full llms-full.txt
+```
+You can also fetch the raw markdown directly:
+- `https://huggingscience.co/llms.txt` — compact index
+- `https://huggingscience.co/llms-full.txt` — every entry, every domain
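As a rough illustration of steps 1 and 2, the sketch below maps topic slugs to catalog URLs and fetches the markdown with the standard library only. It assumes the per-topic files live at `topics/<slug>.md` directly under the site root; the helper names (`topic_url`, `urls_for_task`, `fetch_markdown`) are hypothetical and are not part of the bundled `fetch_catalog.py` script.

```python
from typing import Iterable, List

# The 17 topic slugs listed in step 1.
TOPIC_SLUGS = (
    "astronomy", "benchmark", "biology", "biotechnology", "chemistry",
    "climate", "conservation", "earth-science", "ecology", "energy",
    "engineering", "genomics", "materials-science", "mathematics",
    "medicine", "physics", "scientific-reasoning",
)

BASE_URL = "https://huggingscience.co"


def topic_url(slug: str) -> str:
    """Build the per-topic URL (assumes topics/<slug>.md sits under the site root)."""
    if slug not in TOPIC_SLUGS:
        raise ValueError(f"unknown topic slug: {slug!r}")
    return f"{BASE_URL}/topics/{slug}.md"


def urls_for_task(slugs: Iterable[str]) -> List[str]:
    """A task can span several topics; return one URL per relevant slug."""
    return [topic_url(s) for s in slugs]


def fetch_markdown(url: str, timeout: float = 30.0) -> str:
    """Download one catalog file as text (network access required)."""
    from urllib.request import urlopen  # imported lazily; only needed when fetching

    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For a drug-discovery task, `urls_for_task(["chemistry", "biology", "medicine"])` would give the three topic files to read before choosing a resource.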
+### 3. Pick the right resource(s)
+Read the descriptions and tags. Match to the user's task with judgment, not keyword overlap. Things to weigh:
+- **Scale fit** — Evo-2 40B is overkill for a quick sequence classification on a laptop; ESM2 35M might be perfect.
+- **License and access** — most are open, but check the underlying HF model card.
+- **Modality alignment** — DNA vs. protein vs. SMILES vs. crystal structure; many "biology" models are not interchangeable.
+- **Recency / supersession** — if both an older and newer entry cover the same task, prefer newer unless there's a reason not to.
+If you're not sure which resource to pick, briefly present the top 2–3 candidates to the user with their tradeoffs, then proceed once they choose. Don't pick silently when the choice materially changes the work.
+For domain-specific go-to picks (the "if in doubt, start here" entries), see `(see docs)`.
+### 4. Use the resource
+The mechanics depend on resource type. Read the matching reference file before writing code:
+- **Datasets** → `(see docs)` — loading via `datasets`, streaming for huge corpora, common columns, splits
+- **Models** → `(see docs)` — local `transformers`, Hugging Face Inference API, Inference Providers for very large models, GPU sizing
+- **Spaces (interactive demos)** → `(see docs)` — `gradio_client` pattern with a worked BoltzGen example
+The reference files are short and focused. If you're already fluent in the relevant API, skim; if not, read fully before writing code. The patterns are different from generic HF usage in a few important places (e.g., `trust_remote_code` requirements, scientific-data dtype gotchas).
+### 5. Cite the methodology
+When the catalog has a blog post matching the task (`Type: blog` or in the Blog Posts section of a topic file), include its URL when you explain your approach to the user. Methodology blogs are written by the dataset/model authors and answer "why this design" questions that model cards usually skip. Treat them like citations — a one-line "see <link> for the methodology behind X" is plenty.
+## Authentication: HF_TOKEN
+Many catalog resources are gated (clinical data, large foundation models, private Spaces). Authenticate via the `HF_TOKEN` environment variable.
+**Load `HF_TOKEN` from a `.env` file when available** — that's where the user keeps secrets. Use `python-dotenv` at the top of any script that hits the HF API:
+```python
+from dotenv import load_dotenv
+load_dotenv()    # picks up HF_TOKEN from .env in cwd or any parent dir
+```

package/skills/huggingface/SKILL.md ADDED

@@ -0,0 +1,30 @@
+---
+name: huggingface
+description: Machine learning model hub and transformer inference. Use when loading pretrained models, running NLP tasks, text generation, image classification, or working with transformers, tokenizers, and model hubs. Trigger on imports of transformers, datasets, torch, or mentions of BERT, GPT, fine-tuning, NLP, model hub, scikit, sklearn.
+---
+# huggingface
+Use this skill for working with pretrained ML models and transformers.
+## Core patterns
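+- **Pipeline**: `pipeline('text-classification', model='distilbert-base-uncased-finetuned-sst-2-english')` for quick inference. Pass a checkpoint fine-tuned for the task; a bare `bert-base-uncased` has an untrained classification head.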
+- **Tokenizer + Model**: `AutoTokenizer.from_pretrained()` + `AutoModelForSequenceClassification.from_pretrained()`.
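+- **Fine-tuning**: `Trainer(model=model, args=args, train_dataset=train_dataset, eval_dataset=eval_dataset)` with `TrainingArguments`. Pass keyword arguments: the third positional parameter of `Trainer` is `data_collator`, not `train_dataset`.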
+- **Datasets**: `load_dataset('squad')` for benchmark datasets.
+- **GPU**: `model = model.to('cuda')` + `inputs = tokenizer(text, return_tensors='pt').to('cuda')`.
+## Rules
+- Always specify `revision` when loading models in production — default `main` can change.
+- Use `torch.no_grad()` for inference to save memory.
+- Tokenize with `truncation=True, max_length=512` to prevent oversized inputs.
+- For custom training, implement `compute_metrics()` for evaluation during training.
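The patterns and rules above can be combined into one inference helper. This is a minimal sketch, not a canonical recipe: the checkpoint name is only an example, `classify` and `TOKENIZER_KWARGS` are hypothetical names, and the `revision="main"` default is a placeholder that should be replaced with a pinned commit hash in production.

```python
from typing import List, Sequence

# Assumption: an example sentiment checkpoint; swap in the model for your task.
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"

# Tokenizer settings from the rules above: truncate to at most 512 tokens.
TOKENIZER_KWARGS = {
    "padding": True,
    "truncation": True,
    "max_length": 512,
    "return_tensors": "pt",
}


def classify(texts: Sequence[str], model_name: str = MODEL_NAME,
             revision: str = "main") -> List[str]:
    """Return one predicted label per input text."""
    # Heavy imports stay inside the function so importing this module is cheap.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_name, revision=revision)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, revision=revision
    )
    model.to(device).eval()

    # Move the tokenized batch to the same device as the model.
    inputs = tokenizer(list(texts), **TOKENIZER_KWARGS).to(device)
    with torch.no_grad():  # no gradients needed for inference
        logits = model(**inputs).logits
    predicted_ids = logits.argmax(dim=-1).tolist()
    return [model.config.id2label[i] for i in predicted_ids]
```

Calling `classify(["I loved it"])` downloads the checkpoint on first use, so it needs network access and enough memory for the model.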
+## Anti-patterns
+- Don't load models without checking GPU memory first: large models fail with out-of-memory errors, or the process is killed outright.
+- Don't use `pipeline()` without specifying model — defaults change between versions.
+- Don't fine-tune on raw text — tokenize and format as Dataset first.

package/skills/hypogenic/SKILL.md ADDED

@@ -0,0 +1,107 @@
+---
+name: hypogenic
+description: Automated LLM-driven hypothesis generation and testing on tabular datasets. Use when you want to systematically explore hypotheses about patterns in empirical data (e.g., deception detection, content analysis). Combines literature insights with data-driven hypothesis testing. For manual hypothesis formulation use hypothesis-generation; for creative ideation use scientific-brainstorming.
+---
+# Hypogenic
+## Overview
+Hypogenic provides automated hypothesis generation and testing using large language models to accelerate scientific discovery. The framework supports three approaches: HypoGeniC (data-driven hypothesis generation), HypoRefine (synergistic literature and data integration), and Union methods (mechanistic combination of literature and data-driven hypotheses).
+## Quick Start
+Get started with Hypogenic in minutes:
+```bash
+# Clone example datasets
+git clone https://github.com/ChicagoHAI/HypoGeniC-datasets.git ./data
+# Run basic hypothesis generation
+hypogenic_generation --config ./data/your_task/config.yaml --method hypogenic --num_hypotheses 20
+# Run inference on generated hypotheses
+hypogenic_inference --config ./data/your_task/config.yaml --hypotheses output/hypotheses.json
+```
+**Or use Python API:**
+```python
+from hypogenic import BaseTask
+# Create task with your configuration
+task = BaseTask(config_path="./data/your_task/config.yaml")
+# Generate hypotheses
+task.generate_hypotheses(method="hypogenic", num_hypotheses=20)
+# Run inference
+results = task.inference(hypothesis_bank="./output/hypotheses.json")
+```
+## When to Use This Skill
+Use this skill when working on:
+- Generating scientific hypotheses from observational datasets
+- Testing multiple competing hypotheses systematically
+- Combining literature insights with empirical patterns
+- Accelerating research discovery through automated hypothesis ideation
+- Domains requiring hypothesis-driven analysis: deception detection, AI-generated content identification, mental health indicators, predictive modeling, or other empirical research
+## Key Features
+**Automated Hypothesis Generation**
+- Generate 10-20+ testable hypotheses from data in minutes
+- Iterative refinement based on validation performance
+- Support for both API-based (OpenAI, Anthropic) and local LLMs
+**Literature Integration**
+- Extract insights from research papers via PDF processing
+- Combine theoretical foundations with empirical patterns
+- Systematic literature-to-hypothesis pipeline with GROBID
+**Performance Optimization**
+- Redis caching reduces API costs for repeated experiments
+- Parallel processing for large-scale hypothesis testing
+## Core Capabilities
+### 1. HypoGeniC: Data-Driven Hypothesis Generation
+Generate hypotheses solely from observational data through iterative refinement.
+**Process:**
+1. Initialize with a small data subset to generate candidate hypotheses
+2. Iteratively refine hypotheses based on performance
+3. Replace poorly-performing hypotheses with new ones from challenging examples
+**Best for:** Exploratory research without existing literature, pattern discovery in novel datasets
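The three-step process above can be pictured with a toy refinement loop. This is only an illustrative sketch of the idea, not the `hypogenic` package's implementation: hypotheses are plain Python predicates, and `propose` is a hypothetical callback standing in for the LLM call that would write new hypotheses from challenging examples.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, bool]  # (text, label)


@dataclass
class Hypothesis:
    name: str
    predict: Callable[[str], bool]


def accuracy(h: Hypothesis, examples: Sequence[Example]) -> float:
    """Fraction of examples the hypothesis labels correctly."""
    if not examples:
        return 0.0
    return sum(h.predict(x) == y for x, y in examples) / len(examples)


def refine_bank(
    bank: Sequence[Hypothesis],
    examples: Sequence[Example],
    propose: Callable[[List[Example]], Sequence[Hypothesis]],
    rounds: int = 3,
    keep: int = 2,
) -> List[Hypothesis]:
    """Keep the best hypotheses and replace the rest with new proposals."""
    if not bank:
        raise ValueError("need at least one initial hypothesis")
    current = list(bank)
    for _ in range(rounds):
        current.sort(key=lambda h: accuracy(h, examples), reverse=True)
        survivors = current[:keep]
        # Challenging examples: those the current best hypothesis gets wrong.
        hard = [(x, y) for x, y in examples if survivors[0].predict(x) != y]
        if not hard:
            current = survivors
            break
        # Poor performers are dropped; new candidates come from hard examples.
        current = survivors + list(propose(hard))
    current.sort(key=lambda h: accuracy(h, examples), reverse=True)
    return current
```

In a real run the scoring would use held-out validation data rather than the same examples used to propose hypotheses.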
+### 2. HypoRefine: Literature and Data Integration
+Synergistically combine existing literature with empirical data through an agentic framework.
+**Process:**
+1. Extract insights from relevant research papers (typically 10 papers)
+2. Generate theory-grounded hypotheses from literature
+3. Generate data-driven hypotheses from observational patterns
+4. Refine both hypothesis banks through iterative improvement
+**Best for:** Research with established theoretical foundations, validating or extending existing theories
+### 3. Union Methods
+Mechanistically combine literature-only hypotheses with framework outputs.
+**Variants:**
+- **Literature ∪ HypoGeniC**: Combines literature hypotheses with data-driven generation
+- **Literature ∪ HypoRefine**: Combines literature hypotheses with integrated approach
+**Best for:** Comprehensive hypothesis coverage, eliminating redundancy while maintaining diverse perspectives
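A union of two hypothesis banks can be sketched as a merge with duplicate removal. The function below is a hypothetical helper, not the framework's API, and it only removes duplicates that match after lowercasing and whitespace normalization; detecting semantically redundant hypotheses would need a stronger comparison.

```python
from typing import Iterable, List


def union_hypotheses(literature: Iterable[str], generated: Iterable[str]) -> List[str]:
    """Merge two hypothesis lists, keeping first occurrences in order."""
    seen = set()
    merged: List[str] = []
    for text in [*literature, *generated]:
        # Normalize case and whitespace so trivial variants count as duplicates.
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            merged.append(text)
    return merged
```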
+# For HypoGeniC examples
+git clone https://github.com/ChicagoHAI/HypoGeniC-datasets.git ./data
+# For HypoRefine/Union examples

package/skills/hypothesis-generation/SKILL.md ADDED

@@ -0,0 +1,118 @@
+---
+name: hypothesis-generation
+description: Structured hypothesis formulation from observations. Use when you have experimental observations or data and need to formulate testable hypotheses with predictions, propose mechanisms, and design experiments to test them. Follows scientific method framework. For open-ended ideation use scientific-brainstorming; for automated LLM-driven hypothesis testing on datasets use hypogenic.
+---
+# Scientific Hypothesis Generation
+## Overview
+Hypothesis generation is a systematic process for developing testable explanations. Formulate evidence-based hypotheses from observations, design experiments, explore competing explanations, and develop predictions. Apply this skill for scientific inquiry across domains.
+## When to Use This Skill
+This skill should be used when:
+- Developing hypotheses from observations or preliminary data
+- Designing experiments to test scientific questions
+- Exploring competing explanations for phenomena
+- Formulating testable predictions for research
+- Conducting literature-based hypothesis generation
+- Planning mechanistic studies across scientific domains
+## Visual Enhancement with Scientific Schematics
+**⚠️ MANDATORY: Every hypothesis generation report MUST include at least 1-2 AI-generated figures using the scientific-schematics skill.**
+This is not optional. Hypothesis reports without visual elements are incomplete. Before finalizing any document:
+1. Generate at minimum ONE schematic or diagram (e.g., hypothesis framework showing competing explanations)
+2. Prefer 2-3 figures for comprehensive reports (mechanistic pathway, experimental design flowchart, prediction decision tree)
+**How to generate figures:**
+- Use the **scientific-schematics** skill to generate AI-powered publication-quality diagrams
+- Simply describe your desired diagram in natural language
+- Nano Banana Pro will automatically generate, review, and refine the schematic
+**How to generate schematics:**
+## Workflow
+Follow this systematic process to generate robust scientific hypotheses:
+### 1. Understand the Phenomenon
+Start by clarifying the observation, question, or phenomenon that requires explanation:
+- Identify the core observation or pattern that needs explanation
+- Define the scope and boundaries of the phenomenon
+- Note any constraints or specific contexts
+- Clarify what is already known vs. what is uncertain
+- Identify the relevant scientific domain(s)
+### 2. Conduct Comprehensive Literature Search
+Search existing scientific literature to ground hypotheses in current evidence. Use both PubMed (for biomedical topics) and general web search (for broader scientific domains):
+**For biomedical topics:**
+- Use WebFetch with PubMed URLs to access relevant literature
+- Search for recent reviews, meta-analyses, and primary research
+- Look for similar phenomena, related mechanisms, or analogous systems
+**For all scientific domains:**
+- Use WebSearch to find recent papers, preprints, and reviews
+- Search for established theories, mechanisms, or frameworks
+- Identify gaps in current understanding
+**Search strategy:**
+### 3. Synthesize Existing Evidence
+Analyze and integrate findings from literature search:
+- Summarize current understanding of the phenomenon
+- Identify established mechanisms or theories that may apply
+- Note conflicting evidence or alternative viewpoints
+- Recognize gaps, limitations, or unanswered questions
+- Identify analogies from related systems or domains
+### 4. Generate Competing Hypotheses
+Develop 3-5 distinct hypotheses that could explain the phenomenon. Each hypothesis should:
+- Provide a mechanistic explanation (not just description)
+- Be distinguishable from other hypotheses
+- Draw on evidence from the literature synthesis
+- Consider different levels of explanation (molecular, cellular, systemic, population, etc.)
+**Strategies for generating hypotheses:**
+- Apply known mechanisms from analogous systems
+- Consider multiple causative pathways
+- Explore different scales of explanation
+- Question assumptions in existing explanations
+- Combine mechanisms in novel ways
+### 5. Evaluate Hypothesis Quality
+Assess each hypothesis against established quality criteria from `(see docs)`:
+**Testability:** Can the hypothesis be empirically tested?
+**Falsifiability:** What observations would disprove it?
+**Parsimony:** Is it the simplest explanation that fits the evidence?
+**Explanatory Power:** How much of the phenomenon does it explain?
+**Scope:** What range of observations does it cover?
+**Consistency:** Does it align with established principles?
+**Novelty:** Does it offer new insights beyond existing explanations?
+Explicitly note the strengths and weaknesses of each hypothesis.
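One way to keep the evaluation explicit is to record a score per criterion alongside the noted strengths and weaknesses. The sketch below is an assumption-laden illustration: the 1 to 5 scale, the equal weighting, and the names `HypothesisReview` and `rank` are all hypothetical and do not come from the skill's reference files.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The seven quality criteria listed above.
CRITERIA = (
    "testability", "falsifiability", "parsimony", "explanatory_power",
    "scope", "consistency", "novelty",
)


@dataclass
class HypothesisReview:
    statement: str
    scores: Dict[str, int]  # criterion -> score on an assumed 1-5 scale
    strengths: List[str] = field(default_factory=list)
    weaknesses: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        missing = [c for c in CRITERIA if c not in self.scores]
        if missing:
            raise ValueError(f"missing scores for: {', '.join(missing)}")

    def mean_score(self) -> float:
        """Unweighted mean across all criteria (equal weighting is an assumption)."""
        return sum(self.scores[c] for c in CRITERIA) / len(CRITERIA)


def rank(reviews: List[HypothesisReview]) -> List[HypothesisReview]:
    """Order reviews from highest to lowest mean score."""
    return sorted(reviews, key=lambda r: r.mean_score(), reverse=True)
```

The ranking is a prompt for discussion, not a verdict; a hypothesis with a low mean may still be worth testing if it is the only falsifiable one.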
+### 6. Design Experimental Tests
+For each viable hypothesis, propose specific experiments or studies to test it. Consult `(see docs)` for common approaches:
+**Experimental design elements:**
+- What would be measured or observed?
+- What comparisons or controls are needed?
+- What methods or techniques would be used?
+- What sample sizes or statistical approaches are appropriate?
+- What are the potential confounds, and how will they be addressed?
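For the sample-size question, a rough power calculation is often a useful starting point. The sketch below assumes a two-sided, two-sample comparison of means and uses the normal approximation; the function name `n_per_group` and the default alpha and power values are illustrative choices, not something this skill prescribes.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2,
    where d is the standardized effect size (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


print(n_per_group(0.5))  # → 63
```

An exact t-based calculation gives a slightly larger number, so treat the result as a lower bound when planning.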

package/skills/imaging-data-commons/SKILL.md ADDED

@@ -0,0 +1,119 @@
+---
+name: imaging-data-commons
+description: Query and download public cancer imaging data from NCI Imaging Data Commons using idc-index. Use for accessing large-scale radiology (CT, MR, PET) and pathology datasets for AI training or research. No authentication required. Query by metadata, visualize in browser, check licenses.
+---
+# Imaging Data Commons
+## Overview
+Use the `idc-index` Python package to query and download public cancer imaging data from the National Cancer Institute Imaging Data Commons (IDC). No authentication required for data access.
+**Current IDC Data Version: v23** (always verify with `IDCClient().get_idc_version()`)
+**Primary tool:** `idc-index` ([GitHub](https://github.com/imagingdatacommons/idc-index))
+**CRITICAL - Check package version and upgrade if needed (run this FIRST):**
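+```python
+import idc_index
+from idc_index import IDCClient
+REQUIRED_VERSION = "0.11.14"  # Must match metadata.idc-index in this file
+installed = idc_index.__version__
+def version_tuple(version):
+    # Compare numerically so that "0.11.9" sorts below "0.11.14"
+    return tuple(int(part) for part in version.split(".")[:3] if part.isdigit())
+if version_tuple(installed) < version_tuple(REQUIRED_VERSION):
+    print(f"idc-index {installed} is older than {REQUIRED_VERSION}; upgrade with: pip install --upgrade idc-index")
+# Create the client used by the queries below
+client = IDCClient()
+# Verify IDC data version (should be "v23")
+print(f"IDC data version: {client.get_idc_version()}")
+# Get collection count and total series
+stats = client.sql_query("""
+    SELECT
+        COUNT(DISTINCT collection_id) as collections,
+        COUNT(DISTINCT analysis_result_id) as analysis_results,
+        COUNT(DISTINCT PatientID) as patients,
+        COUNT(DISTINCT StudyInstanceUID) as studies,
+        COUNT(DISTINCT SeriesInstanceUID) as series,
+        SUM(instanceCount) as instances,
+        SUM(series_size_MB)/1000000 as size_TB  -- 1 TB = 1,000,000 MB
+    FROM index
+""")
+print(stats)
+```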
+## When to Use This Skill
+- Finding publicly available radiology (CT, MR, PET) or pathology (slide microscopy) images
+- Selecting image subsets by cancer type, modality, anatomical site, or other metadata
+- Downloading DICOM data from IDC
+- Checking data licenses before use in research or commercial applications
+- Visualizing medical images in a browser without local DICOM viewer software
+## Quick Navigation
+**Core Sections (inline):**
+- IDC Data Model - Collection and analysis result hierarchy
+- Index Tables - Available tables and joining patterns
+- Installation - Package setup and version verification
+- Core Capabilities - Essential API patterns (query, download, visualize, license, citations, batch)
+- Best Practices - Usage guidelines
+- Troubleshooting - Common issues and solutions
+**Reference Guides (load on demand):**
+| Guide | When to Load |
+|-------|--------------|
+| `index_tables_guide.md` | Complex JOINs, schema discovery, DataFrame access |
+## IDC Data Model
+IDC adds two grouping levels above the standard DICOM hierarchy (Patient → Study → Series → Instance):
+- **collection_id**: Groups patients by disease, modality, or research focus (e.g., `tcga_luad`, `nlst`). A patient belongs to exactly one collection.
+- **analysis_result_id**: Identifies derived objects (segmentations, annotations, radiomics features) across one or more original collections.
+Use `collection_id` to find original imaging data, which may include annotations deposited along with the images; use `analysis_result_id` to find AI-generated or expert annotations.
+**Key identifiers for queries:**
+| Identifier | Scope | Use for |
+|------------|-------|---------|
+| `collection_id` | Dataset grouping | Filtering by project/study |
+| `PatientID` | Patient | Grouping images by patient |
+| `StudyInstanceUID` | DICOM study | Grouping of related series, visualization |
+| `SeriesInstanceUID` | DICOM series | Identifying an individual series, visualization |
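As a minimal sketch of how these identifiers combine in a query, the helper below builds a SQL string against the `index` table and, optionally, runs it with `IDCClient.sql_query`. The helper names `build_series_query` and `run_query` are hypothetical, and the `Modality` column name is an assumption that should be checked against `client.indices_overview`.

```python
from typing import Optional


def build_series_query(collection_id: str, modality: Optional[str] = None, limit: int = 10) -> str:
    """Build SQL listing series (with their parent identifiers) in one collection."""
    def quote(value: str) -> str:
        # Double any single quotes so the value is safe inside a SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    conditions = [f"collection_id = {quote(collection_id)}"]
    if modality is not None:
        conditions.append(f"Modality = {quote(modality)}")  # column name assumed
    return (
        "SELECT collection_id, PatientID, StudyInstanceUID, SeriesInstanceUID "
        "FROM index "
        f"WHERE {' AND '.join(conditions)} "
        f"LIMIT {int(limit)}"
    )


def run_query(sql: str):
    # Imported lazily so the query builder works without idc-index installed.
    from idc_index import IDCClient
    return IDCClient().sql_query(sql)


sql = build_series_query("nlst", modality="CT", limit=5)
print(sql)
# df = run_query(sql)  # requires idc-index; returns a pandas DataFrame
```

Each returned row is one series, so `PatientID` and `StudyInstanceUID` repeat across rows that belong to the same patient or study.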
+## Index Tables
+The `idc-index` package provides multiple metadata index tables, accessible via SQL or as pandas DataFrames.
+**Complete index table documentation:** Use https://idc-index.readthedocs.io/en/latest/indices_reference.html for a quick check of available tables and columns without executing any code.
+**Important:** Use `client.indices_overview` to get current table descriptions and column schemas. This is the authoritative source for available columns and their types — always query it when writing SQL or exploring data structure.
+### Available Tables
+| Table | Row Granularity | Loaded | Description |
+|-------|-----------------|--------|-------------|
+| `index` | 1 row = 1 DICOM series | Auto | Primary metadata for all current IDC data |
+| `prior_versions_index` | 1 row = 1 DICOM series | Auto | Series from previous IDC releases; for downloading deprecated data |
+| `collections_index` | 1 row = 1 collection | fetch_index() | Collection-level metadata and descriptions |
+| `analysis_results_index` | 1 row = 1 analysis result collection | fetch_index() | Metadata about derived datasets (annotations, segmentations) |
+| `clinical_index` | 1 row = 1 clinical data column | fetch_index() | Dictionary mapping clinical table columns to collections |
+| `sm_index` | 1 row = 1 slide microscopy series | fetch_index() | Slide Microscopy (pathology) series metadata |
+| `sm_instance_index` | 1 row = 1 slide microscopy instance | fetch_index() | Instance-level (SOPInstanceUID) metadata for slide microscopy |
+| `seg_index` | 1 row = 1 DICOM Segmentation series | fetch_index() | Segmentation metadata: algorithm, segment count, reference to source image series |
+| `ann_index` | 1 row = 1 DICOM ANN series | fetch_index() | Microscopy Bulk Simple Annotations series metadata; references annotated image series |
+| `ann_group_index` | 1 row = 1 annotation group | fetch_index() | Detailed annotation group metadata: graphic type, annotation count, property codes, algorithm |
+| `contrast_index` | 1 row = 1 series with contrast info | fetch_index() | Contrast agent metadata: agent name, ingredient, administration route (CT, MR, PT, XA, RF) |
+### Joining Tables
+**Key columns are not explicitly labeled; the following is a subset that can be used in joins.**
+| Join Column | Tables | Use Case |
+|-------------|--------|----------|
+| `collection_id` | index, prior_versions_index, collections_index, clinical_index | Link series to collection metadata or clinical data |
+| `SeriesInstanceUID` | index, prior_versions_index, sm_index, sm_instance_index | Link series across tables; connect to slide microscopy details |
+| `StudyInstanceUID` | index, prior_versions_index | Link studies across current and historical data |
+| `PatientID` | index, prior_versions_index | Link patients across current and historical data |
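A join on `collection_id` might look like the sketch below. It assumes `collections_index` has to be loaded with `fetch_index()` first, as the Available Tables listing indicates; the helper names are hypothetical, and only columns named in this document are selected. Any columns taken from `collections_index` should be looked up in `client.indices_overview` before being added.

```python
def collection_join_sql(collection_id: str) -> str:
    """SQL joining series-level rows in `index` to `collections_index`."""
    safe_id = collection_id.replace("'", "''")  # escape quotes for the SQL literal
    return (
        "SELECT i.collection_id, "
        "COUNT(DISTINCT i.PatientID) AS patients, "
        "COUNT(DISTINCT i.SeriesInstanceUID) AS series "
        "FROM index AS i "
        "JOIN collections_index AS c ON i.collection_id = c.collection_id "
        f"WHERE i.collection_id = '{safe_id}' "
        "GROUP BY i.collection_id"
    )


def run_join(collection_id: str):
    # Imported lazily so the SQL builder works without idc-index installed.
    from idc_index import IDCClient
    client = IDCClient()
    client.fetch_index("collections_index")  # not auto-loaded, per the table listing
    return client.sql_query(collection_join_sql(collection_id))


print(collection_join_sql("tcga_luad"))
```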

package/skills/infographics/SKILL.md ADDED

@@ -0,0 +1,102 @@
+---
+name: infographics
+description: "Create professional infographics using Nano Banana Pro AI with smart iterative refinement. Uses Gemini 3 Pro for quality review. Integrates research-lookup and web search for accurate data. Supports 10 infographic types, 8 industry styles, and colorblind-safe palettes."
+---
+# Infographics
+## Overview
+Infographics are visual representations of information, data, or knowledge designed to present complex content quickly and clearly. **This skill uses Nano Banana Pro AI for infographic generation with Gemini 3 Pro quality review and Perplexity Sonar for research.**
+**How it works:**
+- (Optional) **Research phase**: Gather accurate facts and statistics using Perplexity Sonar
+- Describe your infographic in natural language
+- Nano Banana Pro generates publication-quality infographics automatically
+- **Gemini 3 Pro reviews quality** against document-type thresholds
+- **Smart iteration**: Only regenerates if quality is below threshold
+- Professional-ready output in minutes
+- No design skills required
+**Quality Thresholds by Document Type:**
+| Document Type | Threshold | Description |
+## Quick Start
+Generate any infographic by simply describing it:
+```bash
+# Generate a list infographic (default threshold 7.5/10)
+python skills/infographics/scripts/generate_infographic.py \
+  "5 benefits of regular exercise" \
+  -o figures/exercise_benefits.png --type list
+# Generate for marketing (highest threshold: 8.5/10)
+python skills/infographics/scripts/generate_infographic.py \
+  "Product features comparison" \
+  -o figures/product_comparison.png --type comparison --doc-type marketing
+# Generate with corporate style
+python skills/infographics/scripts/generate_infographic.py \
+  "Company milestones 2010-2025" \
+  -o figures/timeline.png --type timeline --style corporate
+# Generate with colorblind-safe palette
+python skills/infographics/scripts/generate_infographic.py \
+  "Heart disease statistics worldwide" \
+  -o figures/health_stats.png --type statistical --palette wong
+# Generate WITH RESEARCH for accurate, up-to-date data
+python skills/infographics/scripts/generate_infographic.py \
+  "Global AI market size and growth projections" \
+  -o figures/ai_market.png --type statistical --research
+```
+**What happens behind the scenes:**
+1. **(Optional) Research**: Perplexity Sonar gathers accurate facts, statistics, and data
+2. **Generation 1**: Nano Banana Pro creates initial infographic following design best practices
+3. **Review 1**: **Gemini 3 Pro** evaluates quality against document-type threshold
+4. **Decision**: If quality >= threshold → **DONE** (no more iterations needed!)
+5. **If below threshold**: Improve the prompt based on the critique and regenerate
+6. **Repeat**: Until quality meets threshold OR max iterations reached
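The loop described in these steps can be sketched as follows. This is not the code in `generate_infographic.py`: the `generate` and `review` callables are hypothetical stand-ins for the generation and review model calls, and the default of three iterations is an assumption. Only the 7.5 default threshold comes from the Quick Start examples.

```python
from typing import Callable, Tuple


def refine_until_threshold(
    prompt: str,
    generate: Callable[[str], str],
    review: Callable[[str], Tuple[float, str]],
    threshold: float = 7.5,
    max_iterations: int = 3,
):
    """Generate, review, and regenerate until the score meets the threshold.

    `generate` returns an artifact (for example an image path) and `review`
    returns (score, critique). Returns (artifact, score, iterations_used)
    for the last attempt made.
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    for iteration in range(1, max_iterations + 1):
        artifact = generate(prompt)
        score, critique = review(artifact)
        if score >= threshold:
            break  # quality is sufficient, so stop early
        # Fold the critique into the prompt used for the next attempt.
        prompt = f"{prompt}\nAddress this critique: {critique}"
    return artifact, score, iteration


# Demo with fake callables: the first draft scores 6.0, the second 8.0.
scores = iter([6.0, 8.0])
calls = []


def fake_generate(p: str) -> str:
    calls.append(p)
    return f"draft-{len(calls)}.png"


def fake_review(artifact: str) -> Tuple[float, str]:
    return next(scores), "increase label contrast"


result = refine_until_threshold("5 benefits of regular exercise", fake_generate, fake_review)
print(result)  # → ('draft-2.png', 8.0, 2)
```

Stopping as soon as the threshold is met is what keeps the number of generation calls low for drafts that are already acceptable.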
+**Smart Iteration Benefits:**
+## When to Use This Skill
+Use the **infographics** skill when:
+- Presenting data or statistics in a visual format
+- Creating timeline visualizations for project milestones or history
+- Explaining processes, workflows, or step-by-step guides
+- Comparing options, products, or concepts side-by-side
+- Summarizing key points in an engaging visual format
+- Creating geographic or map-based data visualizations
+- Building hierarchical or organizational charts
+- Designing social media content or marketing materials
+**Use scientific-schematics instead for:**
+- Technical flowcharts and circuit diagrams
+- Biological pathways and molecular diagrams
+## Research Integration
+### Automatic Data Gathering (`--research`)
+When creating infographics that require accurate, up-to-date data, use the `--research` flag to automatically gather facts and statistics using **Perplexity Sonar Pro**.
+```bash
+# Research and generate statistical infographic
+python skills/infographics/scripts/generate_infographic.py \
+  "Global renewable energy adoption rates by country" \
+  -o figures/renewable_energy.png --type statistical --research
+# Research for timeline infographic
+python skills/infographics/scripts/generate_infographic.py \
+  "History of artificial intelligence breakthroughs" \
+  -o figures/ai_history.png --type timeline --research
+# Research for comparison infographic
+python skills/infographics/scripts/generate_infographic.py \
+  "Electric vehicles vs hydrogen vehicles comparison" \
+  -o figures/ev_hydrogen.png --type comparison --research
+```