@synsci/cli-darwin-x64-baseline 1.1.76 → 1.1.78
This diff shows the contents of publicly available package versions that have been released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.
- package/bin/skills/adaptyv/SKILL.md +114 -0
- package/bin/skills/adaptyv/reference/api_reference.md +308 -0
- package/bin/skills/adaptyv/reference/examples.md +913 -0
- package/bin/skills/adaptyv/reference/experiments.md +360 -0
- package/bin/skills/adaptyv/reference/protein_optimization.md +637 -0
- package/bin/skills/aeon/SKILL.md +374 -0
- package/bin/skills/aeon/references/anomaly_detection.md +154 -0
- package/bin/skills/aeon/references/classification.md +144 -0
- package/bin/skills/aeon/references/clustering.md +123 -0
- package/bin/skills/aeon/references/datasets_benchmarking.md +387 -0
- package/bin/skills/aeon/references/distances.md +256 -0
- package/bin/skills/aeon/references/forecasting.md +140 -0
- package/bin/skills/aeon/references/networks.md +289 -0
- package/bin/skills/aeon/references/regression.md +118 -0
- package/bin/skills/aeon/references/segmentation.md +163 -0
- package/bin/skills/aeon/references/similarity_search.md +187 -0
- package/bin/skills/aeon/references/transformations.md +246 -0
- package/bin/skills/alphafold-database/SKILL.md +513 -0
- package/bin/skills/alphafold-database/references/api_reference.md +423 -0
- package/bin/skills/anndata/SKILL.md +400 -0
- package/bin/skills/anndata/references/best_practices.md +525 -0
- package/bin/skills/anndata/references/concatenation.md +396 -0
- package/bin/skills/anndata/references/data_structure.md +314 -0
- package/bin/skills/anndata/references/io_operations.md +404 -0
- package/bin/skills/anndata/references/manipulation.md +516 -0
- package/bin/skills/arboreto/SKILL.md +243 -0
- package/bin/skills/arboreto/references/algorithms.md +138 -0
- package/bin/skills/arboreto/references/basic_inference.md +151 -0
- package/bin/skills/arboreto/references/distributed_computing.md +242 -0
- package/bin/skills/arboreto/scripts/basic_grn_inference.py +97 -0
- package/bin/skills/astropy/SKILL.md +331 -0
- package/bin/skills/astropy/references/coordinates.md +273 -0
- package/bin/skills/astropy/references/cosmology.md +307 -0
- package/bin/skills/astropy/references/fits.md +396 -0
- package/bin/skills/astropy/references/tables.md +489 -0
- package/bin/skills/astropy/references/time.md +404 -0
- package/bin/skills/astropy/references/units.md +178 -0
- package/bin/skills/astropy/references/wcs_and_other_modules.md +373 -0
- package/bin/skills/benchling-integration/SKILL.md +480 -0
- package/bin/skills/benchling-integration/references/api_endpoints.md +883 -0
- package/bin/skills/benchling-integration/references/authentication.md +379 -0
- package/bin/skills/benchling-integration/references/sdk_reference.md +774 -0
- package/bin/skills/biopython/SKILL.md +443 -0
- package/bin/skills/biopython/references/advanced.md +577 -0
- package/bin/skills/biopython/references/alignment.md +362 -0
- package/bin/skills/biopython/references/blast.md +455 -0
- package/bin/skills/biopython/references/databases.md +484 -0
- package/bin/skills/biopython/references/phylogenetics.md +566 -0
- package/bin/skills/biopython/references/sequence_io.md +285 -0
- package/bin/skills/biopython/references/structure.md +564 -0
- package/bin/skills/biorxiv-database/SKILL.md +483 -0
- package/bin/skills/biorxiv-database/references/api_reference.md +280 -0
- package/bin/skills/biorxiv-database/scripts/biorxiv_search.py +445 -0
- package/bin/skills/bioservices/SKILL.md +361 -0
- package/bin/skills/bioservices/references/identifier_mapping.md +685 -0
- package/bin/skills/bioservices/references/services_reference.md +636 -0
- package/bin/skills/bioservices/references/workflow_patterns.md +811 -0
- package/bin/skills/bioservices/scripts/batch_id_converter.py +347 -0
- package/bin/skills/bioservices/scripts/compound_cross_reference.py +378 -0
- package/bin/skills/bioservices/scripts/pathway_analysis.py +309 -0
- package/bin/skills/bioservices/scripts/protein_analysis_workflow.py +408 -0
- package/bin/skills/brenda-database/SKILL.md +719 -0
- package/bin/skills/brenda-database/references/api_reference.md +537 -0
- package/bin/skills/brenda-database/scripts/brenda_queries.py +844 -0
- package/bin/skills/brenda-database/scripts/brenda_visualization.py +772 -0
- package/bin/skills/brenda-database/scripts/enzyme_pathway_builder.py +1053 -0
- package/bin/skills/cellxgene-census/SKILL.md +511 -0
- package/bin/skills/cellxgene-census/references/census_schema.md +182 -0
- package/bin/skills/cellxgene-census/references/common_patterns.md +351 -0
- package/bin/skills/chembl-database/SKILL.md +389 -0
- package/bin/skills/chembl-database/references/api_reference.md +272 -0
- package/bin/skills/chembl-database/scripts/example_queries.py +278 -0
- package/bin/skills/cirq/SKILL.md +346 -0
- package/bin/skills/cirq/references/building.md +307 -0
- package/bin/skills/cirq/references/experiments.md +572 -0
- package/bin/skills/cirq/references/hardware.md +515 -0
- package/bin/skills/cirq/references/noise.md +515 -0
- package/bin/skills/cirq/references/simulation.md +350 -0
- package/bin/skills/cirq/references/transformation.md +416 -0
- package/bin/skills/clinicaltrials-database/SKILL.md +507 -0
- package/bin/skills/clinicaltrials-database/references/api_reference.md +358 -0
- package/bin/skills/clinicaltrials-database/scripts/query_clinicaltrials.py +215 -0
- package/bin/skills/clinpgx-database/SKILL.md +638 -0
- package/bin/skills/clinpgx-database/references/api_reference.md +757 -0
- package/bin/skills/clinpgx-database/scripts/query_clinpgx.py +518 -0
- package/bin/skills/clinvar-database/SKILL.md +362 -0
- package/bin/skills/clinvar-database/references/api_reference.md +227 -0
- package/bin/skills/clinvar-database/references/clinical_significance.md +218 -0
- package/bin/skills/clinvar-database/references/data_formats.md +358 -0
- package/bin/skills/cobrapy/SKILL.md +463 -0
- package/bin/skills/cobrapy/references/api_quick_reference.md +655 -0
- package/bin/skills/cobrapy/references/workflows.md +593 -0
- package/bin/skills/cosmic-database/SKILL.md +336 -0
- package/bin/skills/cosmic-database/references/cosmic_data_reference.md +220 -0
- package/bin/skills/cosmic-database/scripts/download_cosmic.py +231 -0
- package/bin/skills/dask/SKILL.md +456 -0
- package/bin/skills/dask/references/arrays.md +497 -0
- package/bin/skills/dask/references/bags.md +468 -0
- package/bin/skills/dask/references/best-practices.md +277 -0
- package/bin/skills/dask/references/dataframes.md +368 -0
- package/bin/skills/dask/references/futures.md +541 -0
- package/bin/skills/dask/references/schedulers.md +504 -0
- package/bin/skills/datacommons-client/SKILL.md +255 -0
- package/bin/skills/datacommons-client/references/getting_started.md +417 -0
- package/bin/skills/datacommons-client/references/node.md +250 -0
- package/bin/skills/datacommons-client/references/observation.md +185 -0
- package/bin/skills/datacommons-client/references/resolve.md +246 -0
- package/bin/skills/datamol/SKILL.md +706 -0
- package/bin/skills/datamol/references/conformers_module.md +131 -0
- package/bin/skills/datamol/references/core_api.md +130 -0
- package/bin/skills/datamol/references/descriptors_viz.md +195 -0
- package/bin/skills/datamol/references/fragments_scaffolds.md +174 -0
- package/bin/skills/datamol/references/io_module.md +109 -0
- package/bin/skills/datamol/references/reactions_data.md +218 -0
- package/bin/skills/deepchem/SKILL.md +597 -0
- package/bin/skills/deepchem/references/api_reference.md +303 -0
- package/bin/skills/deepchem/references/workflows.md +491 -0
- package/bin/skills/deepchem/scripts/graph_neural_network.py +338 -0
- package/bin/skills/deepchem/scripts/predict_solubility.py +224 -0
- package/bin/skills/deepchem/scripts/transfer_learning.py +375 -0
- package/bin/skills/deeptools/SKILL.md +531 -0
- package/bin/skills/deeptools/assets/quick_reference.md +58 -0
- package/bin/skills/deeptools/references/effective_genome_sizes.md +116 -0
- package/bin/skills/deeptools/references/normalization_methods.md +410 -0
- package/bin/skills/deeptools/references/tools_reference.md +533 -0
- package/bin/skills/deeptools/references/workflows.md +474 -0
- package/bin/skills/deeptools/scripts/validate_files.py +195 -0
- package/bin/skills/deeptools/scripts/workflow_generator.py +454 -0
- package/bin/skills/denario/SKILL.md +215 -0
- package/bin/skills/denario/references/examples.md +494 -0
- package/bin/skills/denario/references/installation.md +213 -0
- package/bin/skills/denario/references/llm_configuration.md +265 -0
- package/bin/skills/denario/references/research_pipeline.md +471 -0
- package/bin/skills/diffdock/SKILL.md +483 -0
- package/bin/skills/diffdock/assets/batch_template.csv +4 -0
- package/bin/skills/diffdock/assets/custom_inference_config.yaml +90 -0
- package/bin/skills/diffdock/references/confidence_and_limitations.md +182 -0
- package/bin/skills/diffdock/references/parameters_reference.md +163 -0
- package/bin/skills/diffdock/references/workflows_examples.md +392 -0
- package/bin/skills/diffdock/scripts/analyze_results.py +334 -0
- package/bin/skills/diffdock/scripts/prepare_batch_csv.py +254 -0
- package/bin/skills/diffdock/scripts/setup_check.py +278 -0
- package/bin/skills/dnanexus-integration/SKILL.md +383 -0
- package/bin/skills/dnanexus-integration/references/app-development.md +247 -0
- package/bin/skills/dnanexus-integration/references/configuration.md +646 -0
- package/bin/skills/dnanexus-integration/references/data-operations.md +400 -0
- package/bin/skills/dnanexus-integration/references/job-execution.md +412 -0
- package/bin/skills/dnanexus-integration/references/python-sdk.md +523 -0
- package/bin/skills/document-skills/docx/LICENSE.txt +30 -0
- package/bin/skills/document-skills/docx/SKILL.md +233 -0
- package/bin/skills/document-skills/docx/docx-js.md +350 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chart.xsd +1499 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd +146 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd +1085 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd +11 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-main.xsd +3081 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-picture.xsd +23 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd +185 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd +287 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/pml.xsd +1676 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd +28 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd +144 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd +174 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd +25 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd +18 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd +59 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd +56 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd +195 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-math.xsd +582 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd +25 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/sml.xsd +4439 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-main.xsd +570 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd +509 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd +12 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd +108 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd +96 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/wml.xsd +3646 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ISO-IEC29500-4_2016/xml.xsd +116 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ecma/fouth-edition/opc-contentTypes.xsd +42 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ecma/fouth-edition/opc-coreProperties.xsd +50 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ecma/fouth-edition/opc-digSig.xsd +49 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/ecma/fouth-edition/opc-relationships.xsd +33 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/mce/mc.xsd +75 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/microsoft/wml-2010.xsd +560 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/microsoft/wml-2012.xsd +67 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/microsoft/wml-2018.xsd +14 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/microsoft/wml-cex-2018.xsd +20 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/microsoft/wml-cid-2016.xsd +13 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/microsoft/wml-sdtdatahash-2020.xsd +4 -0
- package/bin/skills/document-skills/docx/ooxml/schemas/microsoft/wml-symex-2015.xsd +8 -0
- package/bin/skills/document-skills/docx/ooxml/scripts/pack.py +159 -0
- package/bin/skills/document-skills/docx/ooxml/scripts/unpack.py +29 -0
- package/bin/skills/document-skills/docx/ooxml/scripts/validate.py +69 -0
- package/bin/skills/document-skills/docx/ooxml/scripts/validation/__init__.py +15 -0
- package/bin/skills/document-skills/docx/ooxml/scripts/validation/base.py +951 -0
- package/bin/skills/document-skills/docx/ooxml/scripts/validation/docx.py +274 -0
- package/bin/skills/document-skills/docx/ooxml/scripts/validation/pptx.py +315 -0
- package/bin/skills/document-skills/docx/ooxml/scripts/validation/redlining.py +279 -0
- package/bin/skills/document-skills/docx/ooxml.md +610 -0
- package/bin/skills/document-skills/docx/scripts/__init__.py +1 -0
- package/bin/skills/document-skills/docx/scripts/document.py +1276 -0
- package/bin/skills/document-skills/docx/scripts/templates/comments.xml +3 -0
- package/bin/skills/document-skills/docx/scripts/templates/commentsExtended.xml +3 -0
- package/bin/skills/document-skills/docx/scripts/templates/commentsExtensible.xml +3 -0
- package/bin/skills/document-skills/docx/scripts/templates/commentsIds.xml +3 -0
- package/bin/skills/document-skills/docx/scripts/templates/people.xml +3 -0
- package/bin/skills/document-skills/docx/scripts/utilities.py +374 -0
- package/bin/skills/document-skills/pdf/LICENSE.txt +30 -0
- package/bin/skills/document-skills/pdf/SKILL.md +330 -0
- package/bin/skills/document-skills/pdf/forms.md +205 -0
- package/bin/skills/document-skills/pdf/reference.md +612 -0
- package/bin/skills/document-skills/pdf/scripts/check_bounding_boxes.py +70 -0
- package/bin/skills/document-skills/pdf/scripts/check_bounding_boxes_test.py +226 -0
- package/bin/skills/document-skills/pdf/scripts/check_fillable_fields.py +12 -0
- package/bin/skills/document-skills/pdf/scripts/convert_pdf_to_images.py +35 -0
- package/bin/skills/document-skills/pdf/scripts/create_validation_image.py +41 -0
- package/bin/skills/document-skills/pdf/scripts/extract_form_field_info.py +152 -0
- package/bin/skills/document-skills/pdf/scripts/fill_fillable_fields.py +114 -0
- package/bin/skills/document-skills/pdf/scripts/fill_pdf_form_with_annotations.py +108 -0
- package/bin/skills/document-skills/pptx/LICENSE.txt +30 -0
- package/bin/skills/document-skills/pptx/SKILL.md +520 -0
- package/bin/skills/document-skills/pptx/html2pptx.md +625 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chart.xsd +1499 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd +146 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd +1085 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd +11 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-main.xsd +3081 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-picture.xsd +23 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd +185 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd +287 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/pml.xsd +1676 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd +28 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd +144 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd +174 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd +25 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd +18 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd +59 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd +56 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd +195 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-math.xsd +582 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd +25 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/sml.xsd +4439 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/vml-main.xsd +570 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd +509 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd +12 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd +108 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd +96 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/wml.xsd +3646 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ISO-IEC29500-4_2016/xml.xsd +116 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ecma/fouth-edition/opc-contentTypes.xsd +42 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ecma/fouth-edition/opc-coreProperties.xsd +50 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ecma/fouth-edition/opc-digSig.xsd +49 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/ecma/fouth-edition/opc-relationships.xsd +33 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/mce/mc.xsd +75 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-2010.xsd +560 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-2012.xsd +67 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-2018.xsd +14 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-cex-2018.xsd +20 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-cid-2016.xsd +13 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-sdtdatahash-2020.xsd +4 -0
- package/bin/skills/document-skills/pptx/ooxml/schemas/microsoft/wml-symex-2015.xsd +8 -0
- package/bin/skills/document-skills/pptx/ooxml/scripts/pack.py +159 -0
- package/bin/skills/document-skills/pptx/ooxml/scripts/unpack.py +29 -0
- package/bin/skills/document-skills/pptx/ooxml/scripts/validate.py +69 -0
- package/bin/skills/document-skills/pptx/ooxml/scripts/validation/__init__.py +15 -0
- package/bin/skills/document-skills/pptx/ooxml/scripts/validation/base.py +951 -0
- package/bin/skills/document-skills/pptx/ooxml/scripts/validation/docx.py +274 -0
- package/bin/skills/document-skills/pptx/ooxml/scripts/validation/pptx.py +315 -0
- package/bin/skills/document-skills/pptx/ooxml/scripts/validation/redlining.py +279 -0
- package/bin/skills/document-skills/pptx/ooxml.md +427 -0
- package/bin/skills/document-skills/pptx/scripts/html2pptx.js +979 -0
- package/bin/skills/document-skills/pptx/scripts/inventory.py +1020 -0
- package/bin/skills/document-skills/pptx/scripts/rearrange.py +231 -0
- package/bin/skills/document-skills/pptx/scripts/replace.py +385 -0
- package/bin/skills/document-skills/pptx/scripts/thumbnail.py +450 -0
- package/bin/skills/document-skills/xlsx/LICENSE.txt +30 -0
- package/bin/skills/document-skills/xlsx/SKILL.md +325 -0
- package/bin/skills/document-skills/xlsx/recalc.py +178 -0
- package/bin/skills/drugbank-database/SKILL.md +190 -0
- package/bin/skills/drugbank-database/references/chemical-analysis.md +590 -0
- package/bin/skills/drugbank-database/references/data-access.md +242 -0
- package/bin/skills/drugbank-database/references/drug-queries.md +386 -0
- package/bin/skills/drugbank-database/references/interactions.md +425 -0
- package/bin/skills/drugbank-database/references/targets-pathways.md +518 -0
- package/bin/skills/drugbank-database/scripts/drugbank_helper.py +350 -0
- package/bin/skills/ena-database/SKILL.md +204 -0
- package/bin/skills/ena-database/references/api_reference.md +490 -0
- package/bin/skills/ensembl-database/SKILL.md +311 -0
- package/bin/skills/ensembl-database/references/api_endpoints.md +346 -0
- package/bin/skills/ensembl-database/scripts/ensembl_query.py +427 -0
- package/bin/skills/esm/SKILL.md +306 -0
- package/bin/skills/esm/references/esm-c-api.md +583 -0
- package/bin/skills/esm/references/esm3-api.md +452 -0
- package/bin/skills/esm/references/forge-api.md +657 -0
- package/bin/skills/esm/references/workflows.md +685 -0
- package/bin/skills/etetoolkit/SKILL.md +623 -0
- package/bin/skills/etetoolkit/references/api_reference.md +583 -0
- package/bin/skills/etetoolkit/references/visualization.md +783 -0
- package/bin/skills/etetoolkit/references/workflows.md +774 -0
- package/bin/skills/etetoolkit/scripts/quick_visualize.py +214 -0
- package/bin/skills/etetoolkit/scripts/tree_operations.py +229 -0
- package/bin/skills/exploratory-data-analysis/SKILL.md +446 -0
- package/bin/skills/exploratory-data-analysis/assets/report_template.md +196 -0
- package/bin/skills/exploratory-data-analysis/references/bioinformatics_genomics_formats.md +664 -0
- package/bin/skills/exploratory-data-analysis/references/chemistry_molecular_formats.md +664 -0
- package/bin/skills/exploratory-data-analysis/references/general_scientific_formats.md +518 -0
- package/bin/skills/exploratory-data-analysis/references/microscopy_imaging_formats.md +620 -0
- package/bin/skills/exploratory-data-analysis/references/proteomics_metabolomics_formats.md +517 -0
- package/bin/skills/exploratory-data-analysis/references/spectroscopy_analytical_formats.md +633 -0
- package/bin/skills/exploratory-data-analysis/scripts/eda_analyzer.py +547 -0
- package/bin/skills/fda-database/SKILL.md +518 -0
- package/bin/skills/fda-database/references/animal_veterinary.md +377 -0
- package/bin/skills/fda-database/references/api_basics.md +687 -0
- package/bin/skills/fda-database/references/devices.md +632 -0
- package/bin/skills/fda-database/references/drugs.md +468 -0
- package/bin/skills/fda-database/references/foods.md +374 -0
- package/bin/skills/fda-database/references/other.md +472 -0
- package/bin/skills/fda-database/scripts/fda_examples.py +335 -0
- package/bin/skills/fda-database/scripts/fda_query.py +440 -0
- package/bin/skills/flowio/SKILL.md +608 -0
- package/bin/skills/flowio/references/api_reference.md +372 -0
- package/bin/skills/fluidsim/SKILL.md +349 -0
- package/bin/skills/fluidsim/references/advanced_features.md +398 -0
- package/bin/skills/fluidsim/references/installation.md +68 -0
- package/bin/skills/fluidsim/references/output_analysis.md +283 -0
- package/bin/skills/fluidsim/references/parameters.md +198 -0
- package/bin/skills/fluidsim/references/simulation_workflow.md +172 -0
- package/bin/skills/fluidsim/references/solvers.md +94 -0
- package/bin/skills/fred-economic-data/SKILL.md +433 -0
- package/bin/skills/fred-economic-data/references/api_basics.md +212 -0
- package/bin/skills/fred-economic-data/references/categories.md +442 -0
- package/bin/skills/fred-economic-data/references/geofred.md +588 -0
- package/bin/skills/fred-economic-data/references/releases.md +642 -0
- package/bin/skills/fred-economic-data/references/series.md +584 -0
- package/bin/skills/fred-economic-data/references/sources.md +423 -0
- package/bin/skills/fred-economic-data/references/tags.md +485 -0
- package/bin/skills/fred-economic-data/scripts/fred_examples.py +354 -0
- package/bin/skills/fred-economic-data/scripts/fred_query.py +590 -0
- package/bin/skills/gene-database/SKILL.md +179 -0
- package/bin/skills/gene-database/references/api_reference.md +404 -0
- package/bin/skills/gene-database/references/common_workflows.md +428 -0
- package/bin/skills/gene-database/scripts/batch_gene_lookup.py +298 -0
- package/bin/skills/gene-database/scripts/fetch_gene_data.py +277 -0
- package/bin/skills/gene-database/scripts/query_gene.py +251 -0
- package/bin/skills/geniml/SKILL.md +318 -0
- package/bin/skills/geniml/references/bedspace.md +127 -0
- package/bin/skills/geniml/references/consensus_peaks.md +238 -0
- package/bin/skills/geniml/references/region2vec.md +90 -0
- package/bin/skills/geniml/references/scembed.md +197 -0
- package/bin/skills/geniml/references/utilities.md +385 -0
- package/bin/skills/geo-database/SKILL.md +815 -0
- package/bin/skills/geo-database/references/geo_reference.md +829 -0
- package/bin/skills/geopandas/SKILL.md +251 -0
- package/bin/skills/geopandas/references/crs-management.md +243 -0
- package/bin/skills/geopandas/references/data-io.md +165 -0
- package/bin/skills/geopandas/references/data-structures.md +70 -0
- package/bin/skills/geopandas/references/geometric-operations.md +221 -0
- package/bin/skills/geopandas/references/spatial-analysis.md +184 -0
- package/bin/skills/geopandas/references/visualization.md +243 -0
- package/bin/skills/get-available-resources/SKILL.md +277 -0
- package/bin/skills/get-available-resources/scripts/detect_resources.py +401 -0
- package/bin/skills/gget/SKILL.md +871 -0
- package/bin/skills/gget/references/database_info.md +300 -0
- package/bin/skills/gget/references/module_reference.md +467 -0
- package/bin/skills/gget/references/workflows.md +814 -0
- package/bin/skills/gget/scripts/batch_sequence_analysis.py +191 -0
- package/bin/skills/gget/scripts/enrichment_pipeline.py +235 -0
- package/bin/skills/gget/scripts/gene_analysis.py +161 -0
- package/bin/skills/gtars/SKILL.md +285 -0
- package/bin/skills/gtars/references/cli.md +222 -0
- package/bin/skills/gtars/references/coverage.md +172 -0
- package/bin/skills/gtars/references/overlap.md +156 -0
- package/bin/skills/gtars/references/python-api.md +211 -0
- package/bin/skills/gtars/references/refget.md +147 -0
- package/bin/skills/gtars/references/tokenizers.md +103 -0
- package/bin/skills/gwas-database/SKILL.md +608 -0
- package/bin/skills/gwas-database/references/api_reference.md +793 -0
- package/bin/skills/histolab/SKILL.md +678 -0
- package/bin/skills/histolab/references/filters_preprocessing.md +514 -0
- package/bin/skills/histolab/references/slide_management.md +172 -0
- package/bin/skills/histolab/references/tile_extraction.md +421 -0
- package/bin/skills/histolab/references/tissue_masks.md +251 -0
- package/bin/skills/histolab/references/visualization.md +547 -0
- package/bin/skills/hmdb-database/SKILL.md +196 -0
- package/bin/skills/hmdb-database/references/hmdb_data_fields.md +267 -0
- package/bin/skills/hypogenic/SKILL.md +655 -0
- package/bin/skills/hypogenic/references/config_template.yaml +150 -0
- package/bin/skills/imaging-data-commons/SKILL.md +1182 -0
- package/bin/skills/imaging-data-commons/references/bigquery_guide.md +556 -0
- package/bin/skills/imaging-data-commons/references/cli_guide.md +272 -0
- package/bin/skills/imaging-data-commons/references/cloud_storage_guide.md +333 -0
- package/bin/skills/imaging-data-commons/references/dicomweb_guide.md +399 -0
- package/bin/skills/infographics/SKILL.md +563 -0
- package/bin/skills/infographics/references/color_palettes.md +496 -0
- package/bin/skills/infographics/references/design_principles.md +636 -0
- package/bin/skills/infographics/references/infographic_types.md +907 -0
- package/bin/skills/infographics/scripts/generate_infographic.py +234 -0
- package/bin/skills/infographics/scripts/generate_infographic_ai.py +1290 -0
- package/bin/skills/iso-13485-certification/SKILL.md +680 -0
- package/bin/skills/iso-13485-certification/assets/templates/procedures/CAPA-procedure-template.md +453 -0
- package/bin/skills/iso-13485-certification/assets/templates/procedures/document-control-procedure-template.md +567 -0
- package/bin/skills/iso-13485-certification/assets/templates/quality-manual-template.md +521 -0
- package/bin/skills/iso-13485-certification/references/gap-analysis-checklist.md +568 -0
- package/bin/skills/iso-13485-certification/references/iso-13485-requirements.md +610 -0
- package/bin/skills/iso-13485-certification/references/mandatory-documents.md +606 -0
- package/bin/skills/iso-13485-certification/references/quality-manual-guide.md +688 -0
- package/bin/skills/iso-13485-certification/scripts/gap_analyzer.py +440 -0
- package/bin/skills/kegg-database/SKILL.md +377 -0
- package/bin/skills/kegg-database/references/kegg_reference.md +326 -0
- package/bin/skills/kegg-database/scripts/kegg_api.py +251 -0
- package/bin/skills/labarchive-integration/SKILL.md +268 -0
- package/bin/skills/labarchive-integration/references/api_reference.md +342 -0
- package/bin/skills/labarchive-integration/references/authentication_guide.md +357 -0
- package/bin/skills/labarchive-integration/references/integrations.md +425 -0
- package/bin/skills/labarchive-integration/scripts/entry_operations.py +334 -0
- package/bin/skills/labarchive-integration/scripts/notebook_operations.py +269 -0
- package/bin/skills/labarchive-integration/scripts/setup_config.py +205 -0
- package/bin/skills/lamindb/SKILL.md +390 -0
- package/bin/skills/lamindb/references/annotation-validation.md +513 -0
- package/bin/skills/lamindb/references/core-concepts.md +380 -0
- package/bin/skills/lamindb/references/data-management.md +433 -0
- package/bin/skills/lamindb/references/integrations.md +642 -0
- package/bin/skills/lamindb/references/ontologies.md +497 -0
- package/bin/skills/lamindb/references/setup-deployment.md +733 -0
- package/bin/skills/latchbio-integration/SKILL.md +353 -0
- package/bin/skills/latchbio-integration/references/data-management.md +427 -0
- package/bin/skills/latchbio-integration/references/resource-configuration.md +429 -0
- package/bin/skills/latchbio-integration/references/verified-workflows.md +487 -0
- package/bin/skills/latchbio-integration/references/workflow-creation.md +254 -0
- package/bin/skills/matchms/SKILL.md +203 -0
- package/bin/skills/matchms/references/filtering.md +288 -0
- package/bin/skills/matchms/references/importing_exporting.md +416 -0
- package/bin/skills/matchms/references/similarity.md +380 -0
- package/bin/skills/matchms/references/workflows.md +647 -0
- package/bin/skills/matlab/SKILL.md +376 -0
- package/bin/skills/matlab/references/data-import-export.md +479 -0
- package/bin/skills/matlab/references/executing-scripts.md +444 -0
- package/bin/skills/matlab/references/graphics-visualization.md +579 -0
- package/bin/skills/matlab/references/mathematics.md +553 -0
- package/bin/skills/matlab/references/matrices-arrays.md +349 -0
- package/bin/skills/matlab/references/octave-compatibility.md +544 -0
- package/bin/skills/matlab/references/programming.md +672 -0
- package/bin/skills/matlab/references/python-integration.md +433 -0
- package/bin/skills/matplotlib/SKILL.md +361 -0
- package/bin/skills/matplotlib/references/api_reference.md +412 -0
- package/bin/skills/matplotlib/references/common_issues.md +563 -0
- package/bin/skills/matplotlib/references/plot_types.md +476 -0
- package/bin/skills/matplotlib/references/styling_guide.md +589 -0
- package/bin/skills/matplotlib/scripts/plot_template.py +401 -0
- package/bin/skills/matplotlib/scripts/style_configurator.py +409 -0
- package/bin/skills/medchem/SKILL.md +406 -0
- package/bin/skills/medchem/references/api_guide.md +600 -0
- package/bin/skills/medchem/references/rules_catalog.md +604 -0
- package/bin/skills/medchem/scripts/filter_molecules.py +418 -0
- package/bin/skills/metabolomics-workbench-database/SKILL.md +259 -0
- package/bin/skills/metabolomics-workbench-database/references/api_reference.md +494 -0
- package/bin/skills/modal-research-gpu/SKILL.md +238 -0
- package/bin/skills/molfeat/SKILL.md +511 -0
- package/bin/skills/molfeat/references/api_reference.md +428 -0
- package/bin/skills/molfeat/references/available_featurizers.md +333 -0
- package/bin/skills/molfeat/references/examples.md +723 -0
- package/bin/skills/networkx/SKILL.md +437 -0
- package/bin/skills/networkx/references/algorithms.md +383 -0
- package/bin/skills/networkx/references/generators.md +378 -0
- package/bin/skills/networkx/references/graph-basics.md +283 -0
- package/bin/skills/networkx/references/io.md +441 -0
- package/bin/skills/networkx/references/visualization.md +529 -0
- package/bin/skills/neurokit2/SKILL.md +356 -0
- package/bin/skills/neurokit2/references/bio_module.md +417 -0
- package/bin/skills/neurokit2/references/complexity.md +715 -0
- package/bin/skills/neurokit2/references/ecg_cardiac.md +355 -0
- package/bin/skills/neurokit2/references/eda.md +497 -0
- package/bin/skills/neurokit2/references/eeg.md +506 -0
- package/bin/skills/neurokit2/references/emg.md +408 -0
- package/bin/skills/neurokit2/references/eog.md +407 -0
- package/bin/skills/neurokit2/references/epochs_events.md +471 -0
- package/bin/skills/neurokit2/references/hrv.md +480 -0
- package/bin/skills/neurokit2/references/ppg.md +413 -0
- package/bin/skills/neurokit2/references/rsp.md +510 -0
- package/bin/skills/neurokit2/references/signal_processing.md +648 -0
- package/bin/skills/neuropixels-analysis/SKILL.md +350 -0
- package/bin/skills/neuropixels-analysis/assets/analysis_template.py +271 -0
- package/bin/skills/neuropixels-analysis/references/AI_CURATION.md +345 -0
- package/bin/skills/neuropixels-analysis/references/ANALYSIS.md +392 -0
- package/bin/skills/neuropixels-analysis/references/AUTOMATED_CURATION.md +358 -0
- package/bin/skills/neuropixels-analysis/references/MOTION_CORRECTION.md +323 -0
- package/bin/skills/neuropixels-analysis/references/PREPROCESSING.md +273 -0
- package/bin/skills/neuropixels-analysis/references/QUALITY_METRICS.md +359 -0
- package/bin/skills/neuropixels-analysis/references/SPIKE_SORTING.md +339 -0
- package/bin/skills/neuropixels-analysis/references/api_reference.md +415 -0
- package/bin/skills/neuropixels-analysis/references/plotting_guide.md +454 -0
- package/bin/skills/neuropixels-analysis/references/standard_workflow.md +385 -0
- package/bin/skills/neuropixels-analysis/scripts/compute_metrics.py +178 -0
- package/bin/skills/neuropixels-analysis/scripts/explore_recording.py +168 -0
- package/bin/skills/neuropixels-analysis/scripts/export_to_phy.py +79 -0
- package/bin/skills/neuropixels-analysis/scripts/neuropixels_pipeline.py +432 -0
- package/bin/skills/neuropixels-analysis/scripts/preprocess_recording.py +122 -0
- package/bin/skills/neuropixels-analysis/scripts/run_sorting.py +98 -0
- package/bin/skills/offer-k-dense-web/SKILL.md +21 -0
- package/bin/skills/omero-integration/SKILL.md +251 -0
- package/bin/skills/omero-integration/references/advanced.md +631 -0
- package/bin/skills/omero-integration/references/connection.md +369 -0
- package/bin/skills/omero-integration/references/data_access.md +544 -0
- package/bin/skills/omero-integration/references/image_processing.md +665 -0
- package/bin/skills/omero-integration/references/metadata.md +688 -0
- package/bin/skills/omero-integration/references/rois.md +648 -0
- package/bin/skills/omero-integration/references/scripts.md +637 -0
- package/bin/skills/omero-integration/references/tables.md +532 -0
- package/bin/skills/openalex-database/SKILL.md +494 -0
- package/bin/skills/openalex-database/references/api_guide.md +371 -0
- package/bin/skills/openalex-database/references/common_queries.md +381 -0
- package/bin/skills/openalex-database/scripts/openalex_client.py +337 -0
- package/bin/skills/openalex-database/scripts/query_helpers.py +306 -0
- package/bin/skills/opentargets-database/SKILL.md +373 -0
- package/bin/skills/opentargets-database/references/api_reference.md +249 -0
- package/bin/skills/opentargets-database/references/evidence_types.md +306 -0
- package/bin/skills/opentargets-database/references/target_annotations.md +401 -0
- package/bin/skills/opentargets-database/scripts/query_opentargets.py +403 -0
- package/bin/skills/opentrons-integration/SKILL.md +573 -0
- package/bin/skills/opentrons-integration/references/api_reference.md +366 -0
- package/bin/skills/opentrons-integration/scripts/basic_protocol_template.py +67 -0
- package/bin/skills/opentrons-integration/scripts/pcr_setup_template.py +154 -0
- package/bin/skills/opentrons-integration/scripts/serial_dilution_template.py +96 -0
- package/bin/skills/pathml/SKILL.md +166 -0
- package/bin/skills/pathml/references/data_management.md +742 -0
- package/bin/skills/pathml/references/graphs.md +653 -0
- package/bin/skills/pathml/references/image_loading.md +448 -0
- package/bin/skills/pathml/references/machine_learning.md +725 -0
- package/bin/skills/pathml/references/multiparametric.md +686 -0
- package/bin/skills/pathml/references/preprocessing.md +722 -0
- package/bin/skills/pdb-database/SKILL.md +309 -0
- package/bin/skills/pdb-database/references/api_reference.md +617 -0
- package/bin/skills/pennylane/SKILL.md +226 -0
- package/bin/skills/pennylane/references/advanced_features.md +667 -0
- package/bin/skills/pennylane/references/devices_backends.md +596 -0
- package/bin/skills/pennylane/references/getting_started.md +227 -0
- package/bin/skills/pennylane/references/optimization.md +671 -0
- package/bin/skills/pennylane/references/quantum_chemistry.md +567 -0
- package/bin/skills/pennylane/references/quantum_circuits.md +437 -0
- package/bin/skills/pennylane/references/quantum_ml.md +571 -0
- package/bin/skills/perplexity-search/SKILL.md +448 -0
- package/bin/skills/perplexity-search/assets/.env.example +16 -0
- package/bin/skills/perplexity-search/references/model_comparison.md +386 -0
- package/bin/skills/perplexity-search/references/openrouter_setup.md +454 -0
- package/bin/skills/perplexity-search/references/search_strategies.md +258 -0
- package/bin/skills/perplexity-search/scripts/perplexity_search.py +277 -0
- package/bin/skills/perplexity-search/scripts/setup_env.py +171 -0
- package/bin/skills/plotly/SKILL.md +267 -0
- package/bin/skills/plotly/references/chart-types.md +488 -0
- package/bin/skills/plotly/references/export-interactivity.md +453 -0
- package/bin/skills/plotly/references/graph-objects.md +302 -0
- package/bin/skills/plotly/references/layouts-styling.md +457 -0
- package/bin/skills/plotly/references/plotly-express.md +213 -0
- package/bin/skills/polars/SKILL.md +387 -0
- package/bin/skills/polars/references/best_practices.md +649 -0
- package/bin/skills/polars/references/core_concepts.md +378 -0
- package/bin/skills/polars/references/io_guide.md +557 -0
- package/bin/skills/polars/references/operations.md +602 -0
- package/bin/skills/polars/references/pandas_migration.md +417 -0
- package/bin/skills/polars/references/transformations.md +549 -0
- package/bin/skills/protocolsio-integration/SKILL.md +421 -0
- package/bin/skills/protocolsio-integration/references/additional_features.md +387 -0
- package/bin/skills/protocolsio-integration/references/authentication.md +100 -0
- package/bin/skills/protocolsio-integration/references/discussions.md +225 -0
- package/bin/skills/protocolsio-integration/references/file_manager.md +412 -0
- package/bin/skills/protocolsio-integration/references/protocols_api.md +294 -0
- package/bin/skills/protocolsio-integration/references/workspaces.md +293 -0
- package/bin/skills/pubchem-database/SKILL.md +574 -0
- package/bin/skills/pubchem-database/references/api_reference.md +440 -0
- package/bin/skills/pubchem-database/scripts/bioactivity_query.py +367 -0
- package/bin/skills/pubchem-database/scripts/compound_search.py +297 -0
- package/bin/skills/pubmed-database/SKILL.md +460 -0
- package/bin/skills/pubmed-database/references/api_reference.md +298 -0
- package/bin/skills/pubmed-database/references/common_queries.md +453 -0
- package/bin/skills/pubmed-database/references/search_syntax.md +436 -0
- package/bin/skills/pufferlib/SKILL.md +436 -0
- package/bin/skills/pufferlib/references/environments.md +508 -0
- package/bin/skills/pufferlib/references/integration.md +621 -0
- package/bin/skills/pufferlib/references/policies.md +653 -0
- package/bin/skills/pufferlib/references/training.md +360 -0
- package/bin/skills/pufferlib/references/vectorization.md +557 -0
- package/bin/skills/pufferlib/scripts/env_template.py +340 -0
- package/bin/skills/pufferlib/scripts/train_template.py +239 -0
- package/bin/skills/pydeseq2/SKILL.md +559 -0
- package/bin/skills/pydeseq2/references/api_reference.md +228 -0
- package/bin/skills/pydeseq2/references/workflow_guide.md +582 -0
- package/bin/skills/pydeseq2/scripts/run_deseq2_analysis.py +353 -0
- package/bin/skills/pydicom/SKILL.md +434 -0
- package/bin/skills/pydicom/references/common_tags.md +228 -0
- package/bin/skills/pydicom/references/transfer_syntaxes.md +352 -0
- package/bin/skills/pydicom/scripts/anonymize_dicom.py +137 -0
- package/bin/skills/pydicom/scripts/dicom_to_image.py +172 -0
- package/bin/skills/pydicom/scripts/extract_metadata.py +173 -0
- package/bin/skills/pyhealth/SKILL.md +491 -0
- package/bin/skills/pyhealth/references/datasets.md +178 -0
- package/bin/skills/pyhealth/references/medical_coding.md +284 -0
- package/bin/skills/pyhealth/references/models.md +594 -0
- package/bin/skills/pyhealth/references/preprocessing.md +638 -0
- package/bin/skills/pyhealth/references/tasks.md +379 -0
- package/bin/skills/pyhealth/references/training_evaluation.md +648 -0
- package/bin/skills/pylabrobot/SKILL.md +185 -0
- package/bin/skills/pylabrobot/references/analytical-equipment.md +464 -0
- package/bin/skills/pylabrobot/references/hardware-backends.md +480 -0
- package/bin/skills/pylabrobot/references/liquid-handling.md +403 -0
- package/bin/skills/pylabrobot/references/material-handling.md +620 -0
- package/bin/skills/pylabrobot/references/resources.md +489 -0
- package/bin/skills/pylabrobot/references/visualization.md +532 -0
- package/bin/skills/pymatgen/SKILL.md +691 -0
- package/bin/skills/pymatgen/references/analysis_modules.md +530 -0
- package/bin/skills/pymatgen/references/core_classes.md +318 -0
- package/bin/skills/pymatgen/references/io_formats.md +469 -0
- package/bin/skills/pymatgen/references/materials_project_api.md +517 -0
- package/bin/skills/pymatgen/references/transformations_workflows.md +591 -0
- package/bin/skills/pymatgen/scripts/phase_diagram_generator.py +233 -0
- package/bin/skills/pymatgen/scripts/structure_analyzer.py +266 -0
- package/bin/skills/pymatgen/scripts/structure_converter.py +169 -0
- package/bin/skills/pymc/SKILL.md +572 -0
- package/bin/skills/pymc/assets/hierarchical_model_template.py +333 -0
- package/bin/skills/pymc/assets/linear_regression_template.py +241 -0
- package/bin/skills/pymc/references/distributions.md +320 -0
- package/bin/skills/pymc/references/sampling_inference.md +424 -0
- package/bin/skills/pymc/references/workflows.md +526 -0
- package/bin/skills/pymc/scripts/model_comparison.py +387 -0
- package/bin/skills/pymc/scripts/model_diagnostics.py +350 -0
- package/bin/skills/pymoo/SKILL.md +571 -0
- package/bin/skills/pymoo/references/algorithms.md +180 -0
- package/bin/skills/pymoo/references/constraints_mcdm.md +417 -0
- package/bin/skills/pymoo/references/operators.md +345 -0
- package/bin/skills/pymoo/references/problems.md +265 -0
- package/bin/skills/pymoo/references/visualization.md +353 -0
- package/bin/skills/pymoo/scripts/custom_problem_example.py +181 -0
- package/bin/skills/pymoo/scripts/decision_making_example.py +161 -0
- package/bin/skills/pymoo/scripts/many_objective_example.py +72 -0
- package/bin/skills/pymoo/scripts/multi_objective_example.py +63 -0
- package/bin/skills/pymoo/scripts/single_objective_example.py +59 -0
- package/bin/skills/pyopenms/SKILL.md +217 -0
- package/bin/skills/pyopenms/references/data_structures.md +497 -0
- package/bin/skills/pyopenms/references/feature_detection.md +410 -0
- package/bin/skills/pyopenms/references/file_io.md +349 -0
- package/bin/skills/pyopenms/references/identification.md +422 -0
- package/bin/skills/pyopenms/references/metabolomics.md +482 -0
- package/bin/skills/pyopenms/references/signal_processing.md +433 -0
- package/bin/skills/pysam/SKILL.md +265 -0
- package/bin/skills/pysam/references/alignment_files.md +280 -0
- package/bin/skills/pysam/references/common_workflows.md +520 -0
- package/bin/skills/pysam/references/sequence_files.md +407 -0
- package/bin/skills/pysam/references/variant_files.md +365 -0
- package/bin/skills/pytdc/SKILL.md +460 -0
- package/bin/skills/pytdc/references/datasets.md +246 -0
- package/bin/skills/pytdc/references/oracles.md +400 -0
- package/bin/skills/pytdc/references/utilities.md +684 -0
- package/bin/skills/pytdc/scripts/benchmark_evaluation.py +327 -0
- package/bin/skills/pytdc/scripts/load_and_split_data.py +214 -0
- package/bin/skills/pytdc/scripts/molecular_generation.py +404 -0
- package/bin/skills/qiskit/SKILL.md +275 -0
- package/bin/skills/qiskit/references/algorithms.md +607 -0
- package/bin/skills/qiskit/references/backends.md +433 -0
- package/bin/skills/qiskit/references/circuits.md +197 -0
- package/bin/skills/qiskit/references/patterns.md +533 -0
- package/bin/skills/qiskit/references/primitives.md +277 -0
- package/bin/skills/qiskit/references/setup.md +99 -0
- package/bin/skills/qiskit/references/transpilation.md +286 -0
- package/bin/skills/qiskit/references/visualization.md +415 -0
- package/bin/skills/qutip/SKILL.md +318 -0
- package/bin/skills/qutip/references/advanced.md +555 -0
- package/bin/skills/qutip/references/analysis.md +523 -0
- package/bin/skills/qutip/references/core_concepts.md +293 -0
- package/bin/skills/qutip/references/time_evolution.md +348 -0
- package/bin/skills/qutip/references/visualization.md +431 -0
- package/bin/skills/rdkit/SKILL.md +780 -0
- package/bin/skills/rdkit/references/api_reference.md +432 -0
- package/bin/skills/rdkit/references/descriptors_reference.md +595 -0
- package/bin/skills/rdkit/references/smarts_patterns.md +668 -0
- package/bin/skills/rdkit/scripts/molecular_properties.py +243 -0
- package/bin/skills/rdkit/scripts/similarity_search.py +297 -0
- package/bin/skills/rdkit/scripts/substructure_filter.py +386 -0
- package/bin/skills/reactome-database/SKILL.md +278 -0
- package/bin/skills/reactome-database/references/api_reference.md +465 -0
- package/bin/skills/reactome-database/scripts/reactome_query.py +286 -0
- package/bin/skills/rowan/SKILL.md +427 -0
- package/bin/skills/rowan/references/api_reference.md +413 -0
- package/bin/skills/rowan/references/molecule_handling.md +429 -0
- package/bin/skills/rowan/references/proteins_and_organization.md +499 -0
- package/bin/skills/rowan/references/rdkit_native.md +438 -0
- package/bin/skills/rowan/references/results_interpretation.md +481 -0
- package/bin/skills/rowan/references/workflow_types.md +591 -0
- package/bin/skills/scanpy/SKILL.md +386 -0
- package/bin/skills/scanpy/assets/analysis_template.py +295 -0
- package/bin/skills/scanpy/references/api_reference.md +251 -0
- package/bin/skills/scanpy/references/plotting_guide.md +352 -0
- package/bin/skills/scanpy/references/standard_workflow.md +206 -0
- package/bin/skills/scanpy/scripts/qc_analysis.py +200 -0
- package/bin/skills/scientific-brainstorming/SKILL.md +191 -0
- package/bin/skills/scientific-brainstorming/references/brainstorming_methods.md +326 -0
- package/bin/skills/scientific-visualization/SKILL.md +779 -0
- package/bin/skills/scientific-visualization/assets/color_palettes.py +197 -0
- package/bin/skills/scientific-visualization/assets/nature.mplstyle +63 -0
- package/bin/skills/scientific-visualization/assets/presentation.mplstyle +61 -0
- package/bin/skills/scientific-visualization/assets/publication.mplstyle +68 -0
- package/bin/skills/scientific-visualization/references/color_palettes.md +348 -0
- package/bin/skills/scientific-visualization/references/journal_requirements.md +320 -0
- package/bin/skills/scientific-visualization/references/matplotlib_examples.md +620 -0
- package/bin/skills/scientific-visualization/references/publication_guidelines.md +205 -0
- package/bin/skills/scientific-visualization/scripts/figure_export.py +343 -0
- package/bin/skills/scientific-visualization/scripts/style_presets.py +416 -0
- package/bin/skills/scikit-bio/SKILL.md +437 -0
- package/bin/skills/scikit-bio/references/api_reference.md +749 -0
- package/bin/skills/scikit-learn/SKILL.md +521 -0
- package/bin/skills/scikit-learn/references/model_evaluation.md +592 -0
- package/bin/skills/scikit-learn/references/pipelines_and_composition.md +612 -0
- package/bin/skills/scikit-learn/references/preprocessing.md +606 -0
- package/bin/skills/scikit-learn/references/quick_reference.md +433 -0
- package/bin/skills/scikit-learn/references/supervised_learning.md +378 -0
- package/bin/skills/scikit-learn/references/unsupervised_learning.md +505 -0
- package/bin/skills/scikit-learn/scripts/classification_pipeline.py +257 -0
- package/bin/skills/scikit-learn/scripts/clustering_analysis.py +386 -0
- package/bin/skills/scikit-survival/SKILL.md +399 -0
- package/bin/skills/scikit-survival/references/competing-risks.md +397 -0
- package/bin/skills/scikit-survival/references/cox-models.md +182 -0
- package/bin/skills/scikit-survival/references/data-handling.md +494 -0
- package/bin/skills/scikit-survival/references/ensemble-models.md +327 -0
- package/bin/skills/scikit-survival/references/evaluation-metrics.md +378 -0
- package/bin/skills/scikit-survival/references/svm-models.md +411 -0
- package/bin/skills/scvi-tools/SKILL.md +190 -0
- package/bin/skills/scvi-tools/references/differential-expression.md +581 -0
- package/bin/skills/scvi-tools/references/models-atac-seq.md +321 -0
- package/bin/skills/scvi-tools/references/models-multimodal.md +367 -0
- package/bin/skills/scvi-tools/references/models-scrna-seq.md +330 -0
- package/bin/skills/scvi-tools/references/models-spatial.md +438 -0
- package/bin/skills/scvi-tools/references/models-specialized.md +408 -0
- package/bin/skills/scvi-tools/references/theoretical-foundations.md +438 -0
- package/bin/skills/scvi-tools/references/workflows.md +546 -0
- package/bin/skills/seaborn/SKILL.md +673 -0
- package/bin/skills/seaborn/references/examples.md +822 -0
- package/bin/skills/seaborn/references/function_reference.md +770 -0
- package/bin/skills/seaborn/references/objects_interface.md +964 -0
- package/bin/skills/shap/SKILL.md +566 -0
- package/bin/skills/shap/references/explainers.md +339 -0
- package/bin/skills/shap/references/plots.md +507 -0
- package/bin/skills/shap/references/theory.md +449 -0
- package/bin/skills/shap/references/workflows.md +605 -0
- package/bin/skills/simpy/SKILL.md +429 -0
- package/bin/skills/simpy/references/events.md +374 -0
- package/bin/skills/simpy/references/monitoring.md +475 -0
- package/bin/skills/simpy/references/process-interaction.md +424 -0
- package/bin/skills/simpy/references/real-time.md +395 -0
- package/bin/skills/simpy/references/resources.md +275 -0
- package/bin/skills/simpy/scripts/basic_simulation_template.py +193 -0
- package/bin/skills/simpy/scripts/resource_monitor.py +345 -0
- package/bin/skills/stable-baselines3/SKILL.md +299 -0
- package/bin/skills/stable-baselines3/references/algorithms.md +333 -0
- package/bin/skills/stable-baselines3/references/callbacks.md +556 -0
- package/bin/skills/stable-baselines3/references/custom_environments.md +526 -0
- package/bin/skills/stable-baselines3/references/vectorized_envs.md +568 -0
- package/bin/skills/stable-baselines3/scripts/custom_env_template.py +314 -0
- package/bin/skills/stable-baselines3/scripts/evaluate_agent.py +245 -0
- package/bin/skills/stable-baselines3/scripts/train_rl_agent.py +165 -0
- package/bin/skills/statistical-analysis/SKILL.md +632 -0
- package/bin/skills/statistical-analysis/references/assumptions_and_diagnostics.md +369 -0
- package/bin/skills/statistical-analysis/references/bayesian_statistics.md +661 -0
- package/bin/skills/statistical-analysis/references/effect_sizes_and_power.md +581 -0
- package/bin/skills/statistical-analysis/references/reporting_standards.md +469 -0
- package/bin/skills/statistical-analysis/references/test_selection_guide.md +129 -0
- package/bin/skills/statistical-analysis/scripts/assumption_checks.py +539 -0
- package/bin/skills/statsmodels/SKILL.md +614 -0
- package/bin/skills/statsmodels/references/discrete_choice.md +669 -0
- package/bin/skills/statsmodels/references/glm.md +619 -0
- package/bin/skills/statsmodels/references/linear_models.md +447 -0
- package/bin/skills/statsmodels/references/stats_diagnostics.md +859 -0
- package/bin/skills/statsmodels/references/time_series.md +716 -0
- package/bin/skills/string-database/SKILL.md +534 -0
- package/bin/skills/string-database/references/string_reference.md +455 -0
- package/bin/skills/string-database/scripts/string_api.py +369 -0
- package/bin/skills/sympy/SKILL.md +500 -0
- package/bin/skills/sympy/references/advanced-topics.md +635 -0
- package/bin/skills/sympy/references/code-generation-printing.md +599 -0
- package/bin/skills/sympy/references/core-capabilities.md +348 -0
- package/bin/skills/sympy/references/matrices-linear-algebra.md +526 -0
- package/bin/skills/sympy/references/physics-mechanics.md +592 -0
- package/bin/skills/torch_geometric/SKILL.md +676 -0
- package/bin/skills/torch_geometric/references/datasets_reference.md +574 -0
- package/bin/skills/torch_geometric/references/layers_reference.md +485 -0
- package/bin/skills/torch_geometric/references/transforms_reference.md +679 -0
- package/bin/skills/torch_geometric/scripts/benchmark_model.py +309 -0
- package/bin/skills/torch_geometric/scripts/create_gnn_template.py +529 -0
- package/bin/skills/torch_geometric/scripts/visualize_graph.py +313 -0
- package/bin/skills/torchdrug/SKILL.md +450 -0
- package/bin/skills/torchdrug/references/core_concepts.md +565 -0
- package/bin/skills/torchdrug/references/datasets.md +380 -0
- package/bin/skills/torchdrug/references/knowledge_graphs.md +320 -0
- package/bin/skills/torchdrug/references/models_architectures.md +541 -0
- package/bin/skills/torchdrug/references/molecular_generation.md +352 -0
- package/bin/skills/torchdrug/references/molecular_property_prediction.md +169 -0
- package/bin/skills/torchdrug/references/protein_modeling.md +272 -0
- package/bin/skills/torchdrug/references/retrosynthesis.md +436 -0
- package/bin/skills/transformers/SKILL.md +164 -0
- package/bin/skills/transformers/references/generation.md +467 -0
- package/bin/skills/transformers/references/models.md +361 -0
- package/bin/skills/transformers/references/pipelines.md +335 -0
- package/bin/skills/transformers/references/tokenizers.md +447 -0
- package/bin/skills/transformers/references/training.md +500 -0
- package/bin/skills/umap-learn/SKILL.md +479 -0
- package/bin/skills/umap-learn/references/api_reference.md +532 -0
- package/bin/skills/uniprot-database/SKILL.md +195 -0
- package/bin/skills/uniprot-database/references/api_examples.md +413 -0
- package/bin/skills/uniprot-database/references/api_fields.md +275 -0
- package/bin/skills/uniprot-database/references/id_mapping_databases.md +285 -0
- package/bin/skills/uniprot-database/references/query_syntax.md +256 -0
- package/bin/skills/uniprot-database/scripts/uniprot_client.py +341 -0
- package/bin/skills/uspto-database/SKILL.md +607 -0
- package/bin/skills/uspto-database/references/additional_apis.md +394 -0
- package/bin/skills/uspto-database/references/patentsearch_api.md +266 -0
- package/bin/skills/uspto-database/references/peds_api.md +212 -0
- package/bin/skills/uspto-database/references/trademark_api.md +358 -0
- package/bin/skills/uspto-database/scripts/patent_search.py +290 -0
- package/bin/skills/uspto-database/scripts/peds_client.py +285 -0
- package/bin/skills/uspto-database/scripts/trademark_client.py +311 -0
- package/bin/skills/vaex/SKILL.md +182 -0
- package/bin/skills/vaex/references/core_dataframes.md +367 -0
- package/bin/skills/vaex/references/data_processing.md +555 -0
- package/bin/skills/vaex/references/io_operations.md +703 -0
- package/bin/skills/vaex/references/machine_learning.md +728 -0
- package/bin/skills/vaex/references/performance.md +571 -0
- package/bin/skills/vaex/references/visualization.md +613 -0
- package/bin/skills/zarr-python/SKILL.md +779 -0
- package/bin/skills/zarr-python/references/api_reference.md +515 -0
- package/bin/skills/zinc-database/SKILL.md +404 -0
- package/bin/skills/zinc-database/references/api_reference.md +692 -0
- package/bin/synsc +0 -0
- package/package.json +1 -1
@@ -0,0 +1,380 @@
+# Datasets Reference
+
+## Overview
+
+TorchDrug provides 40+ curated datasets across multiple domains: molecular property prediction, protein modeling, knowledge graph reasoning, and retrosynthesis. All datasets support lazy loading, automatic downloading, and customizable feature extraction.
+
+## Molecular Property Prediction Datasets
+
+### Drug Discovery Classification
+
+| Dataset | Size | Task | Classes | Description |
+|---------|------|------|---------|-------------|
+| **BACE** | 1,513 | Binary | 2 | β-secretase inhibition for Alzheimer's |
+| **BBBP** | 2,039 | Binary | 2 | Blood-brain barrier penetration |
+| **HIV** | 41,127 | Binary | 2 | Inhibition of HIV replication |
+| **ClinTox** | 1,478 | Multi-label | 2 | Clinical trial toxicity |
+| **SIDER** | 1,427 | Multi-label | 27 | Side effects by system organ class |
+| **Tox21** | 7,831 | Multi-label | 12 | Toxicity across 12 targets |
+| **ToxCast** | 8,576 | Multi-label | 617 | High-throughput toxicology |
+| **MUV** | 93,087 | Multi-label | 17 | Unbiased validation for screening |
+
+**Key Features:**
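+- Scaffold splits are the usual choice for realistic evaluation (test molecules come from scaffolds unseen in training)
+- Binary classification metrics: AUROC, AUPRC
+- Multi-label tasks tolerate missing labels (unlabeled entries are skipped when computing loss and metrics)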
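
The handling of missing labels in multi-label datasets can be illustrated with a small, dependency-free sketch. It assumes missing labels are encoded as NaN and simply skips them when averaging a binary cross-entropy loss. The helper name `masked_bce` is hypothetical and is not part of TorchDrug's API.

```python
import math


def masked_bce(logits, labels):
    """Mean binary cross-entropy over labeled entries only.

    `labels` holds 0.0, 1.0, or NaN (missing); NaN entries are skipped.
    Both arguments are lists of per-sample lists, one value per task.
    """
    total, count = 0.0, 0
    for row_logits, row_labels in zip(logits, labels):
        for z, y in zip(row_logits, row_labels):
            if math.isnan(y):
                continue  # unlabeled task for this molecule
            # Numerically stable BCE with logits:
            # max(z, 0) - z*y + log(1 + exp(-|z|))
            total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count if count else 0.0


nan = float("nan")
logits = [[0.0, 2.0, -1.0], [1.5, 0.0, 0.0]]
labels = [[1.0, nan, 0.0], [nan, nan, 1.0]]
loss = masked_bce(logits, labels)  # averages over the 3 labeled entries
```

Averaging over labeled entries only keeps sparsely annotated tasks (common in Tox21 or ToxCast style data) from being diluted by placeholder values.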
+
+**Use Cases:**
+- Drug safety prediction
+- Virtual screening
+- ADMET property prediction
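
The scaffold splits mentioned under Key Features can be sketched as follows. This is a minimal illustration that assumes each molecule's scaffold key (for example a Bemis-Murcko scaffold SMILES) has already been computed elsewhere. The function `scaffold_split` and its parameters are hypothetical names, not TorchDrug's own implementation.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Split sample indices so that no scaffold is shared between subsets.

    `scaffolds[i]` is a precomputed scaffold key for sample i.
    Returns (train, valid, test) lists of indices.
    """
    groups = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)

    # Largest scaffold groups first; ties broken by first index for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffolds)
    train_cap = frac_train * n
    valid_cap = (frac_train + frac_valid) * n

    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cap:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


keys = ["A"] * 6 + ["B"] * 2 + ["C", "D"]
train_idx, valid_idx, test_idx = scaffold_split(keys)
```

Because whole scaffold groups are assigned to one subset, the rarer scaffolds end up in validation and test, which is what makes the evaluation harder than a random split.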
+
+### Drug Discovery Regression
+
+| Dataset | Size | Property | Units | Description |
+|---------|------|----------|-------|-------------|
+| **ESOL** | 1,128 | Solubility | log(mol/L) | Water solubility |
+| **FreeSolv** | 642 | Hydration | kcal/mol | Hydration free energy |
+| **Lipophilicity** | 4,200 | LogD | - | Octanol/water distribution |
+| **SAMPL** | 643 | Solvation | kcal/mol | Solvation free energies |
+
+**Metrics:** MAE, RMSE, R²
+**Use Cases:** ADME optimization, lead optimization
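
For reference, the three regression metrics listed above can be computed as in the sketch below. It uses only the standard library and the textbook definitions. The function name `regression_metrics` is an illustrative assumption and does not mirror any TorchDrug function.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for two equal-length sequences of floats."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when all targets are identical; report NaN in that case.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return mae, rmse, r2


# Example with made-up solubility-style values (not taken from any dataset).
mae, rmse, r2 = regression_metrics([-2.0, -1.0, 0.5], [-1.5, -1.0, 0.0])
```

MAE and RMSE are in the units of the target (for example kcal/mol for FreeSolv), while R² is unitless, so errors are only comparable across datasets through R².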
+
+### Quantum Chemistry
+
+| Dataset | Size | Properties | Description |
+|---------|------|------------|-------------|
+| **QM7** | 7,165 | 1 | Atomization energy |
+| **QM8** | 21,786 | 12 | Electronic spectra, excited states |
+| **QM9** | 133,885 | 12 | Geometric, energetic, electronic, thermodynamic |
+| **PCQM4M** | 3.8M | 1 | Large-scale HOMO-LUMO gap |
+
+**Properties (QM9):**
+- Dipole moment
+- Isotropic polarizability
+- HOMO/LUMO energies
+- Internal energy, enthalpy, free energy
+- Heat capacity
+- Electronic spatial extent
+
+**Use Cases:**
+- Quantum property prediction
+- Method development benchmarking
+- Pre-training molecular models
+
+### Large Molecule Databases
+
+| Dataset | Size | Description | Use Case |
+|---------|------|-------------|----------|
+| **ZINC250k** | 250,000 | Drug-like molecules | Generative model training |
+| **ZINC2M** | 2,000,000 | Drug-like molecules | Large-scale pre-training |
+| **ChEMBL** | Millions | Bioactive molecules | Property prediction, generation |
|
|
73
|
+
|
|
74
|
+
## Protein Datasets
|
|
75
|
+
|
|
76
|
+
### Function Prediction
|
|
77
|
+
|
|
78
|
+
| Dataset | Size | Task | Classes | Description |
|
|
79
|
+
|---------|------|------|---------|-------------|
|
|
80
|
+
| **EnzymeCommission** | 17,562 | Multi-class | 7 levels | EC number classification |
|
|
81
|
+
| **GeneOntology** | 46,796 | Multi-label | 489 | GO term prediction (BP/MF/CC) |
|
|
82
|
+
| **BetaLactamase** | 5,864 | Regression | - | Enzyme activity levels |
|
|
83
|
+
| **Fluorescence** | 54,025 | Regression | - | GFP fluorescence intensity |
|
|
84
|
+
| **Stability** | 53,614 | Regression | - | Thermostability (ΔΔG) |
|
|
85
|
+
|
|
86
|
+
**Features:**
|
|
87
|
+
- Sequence and/or structure input
|
|
88
|
+
- Evolutionary information available
|
|
89
|
+
- Multiple train/test splits
|
|
90
|
+
|
|
91
|
+
**Use Cases:**
|
|
92
|
+
- Protein engineering
|
|
93
|
+
- Function annotation
|
|
94
|
+
- Enzyme design
|
|
95
|
+
|
|
96
|
+
### Localization and Solubility
|
|
97
|
+
|
|
98
|
+
| Dataset | Size | Task | Classes | Description |
|
|
99
|
+
|---------|------|------|---------|-------------|
|
|
100
|
+
| **Solubility** | 62,478 | Binary | 2 | Protein solubility |
|
|
101
|
+
| **BinaryLocalization** | 22,168 | Binary | 2 | Membrane vs soluble |
|
|
102
|
+
| **SubcellularLocalization** | 8,943 | Multi-class | 10 | Subcellular compartment |
|
|
103
|
+
|
|
104
|
+
**Use Cases:**
|
|
105
|
+
- Protein expression optimization
|
|
106
|
+
- Target identification
|
|
107
|
+
- Cell biology
|
|
108
|
+
### Structure Prediction

| Dataset | Size | Task | Description |
|---------|------|------|-------------|
| **Fold** | 16,712 | Multi-class (1,195) | Structural fold recognition |
| **SecondaryStructure** | 8,678 | Sequence labeling | 3-state or 8-state prediction |
| **ProteinNet** | Varied | Contact prediction | Residue-residue contacts |

**Use Cases:**
- Structure prediction pipelines
- Fold recognition
- Contact map generation

### Protein Interactions

| Dataset | Size | Positives | Negatives | Description |
|---------|------|-----------|-----------|-------------|
| **HumanPPI** | 1,412 proteins | 6,584 | - | Human protein interactions |
| **YeastPPI** | 2,018 proteins | 6,451 | - | Yeast protein interactions |
| **PPIAffinity** | 2,156 pairs | - | - | Binding affinity values |

**Use Cases:**
- PPI prediction
- Network biology
- Drug target identification

### Protein-Ligand Binding

| Dataset | Size | Type | Description |
|---------|------|------|-------------|
| **BindingDB** | ~1.5M | Affinity | Comprehensive binding data |
| **PDBBind** | 20,000+ | 3D complexes | Structure-based binding |
| - Refined Set | 5,316 | High quality | Curated crystal structures |
| - Core Set | 285 | Benchmark | Diverse test set |

**Use Cases:**
- Binding affinity prediction
- Structure-based drug design
- Scoring function development

### Large Protein Databases

| Dataset | Size | Description |
|---------|------|-------------|
| **AlphaFoldDB** | 200M+ | Predicted structures for most known proteins |
| **UniProt** | Integration | Sequence and annotation data |

## Knowledge Graph Datasets

### General Knowledge

| Dataset | Entities | Relations | Triples | Domain |
|---------|----------|-----------|---------|--------|
| **FB15k** | 14,951 | 1,345 | 592,213 | Freebase (general knowledge) |
| **FB15k-237** | 14,541 | 237 | 310,116 | Filtered Freebase |
| **WN18** | 40,943 | 18 | 151,442 | WordNet (lexical) |
| **WN18RR** | 40,943 | 11 | 93,003 | Filtered WordNet |

**Relation Types (FB15k-237):**
- `/people/person/nationality`
- `/film/film/genre`
- `/location/location/contains`
- `/business/company/founders`
- Many more...

**Use Cases:**
- Link prediction
- Relation extraction
- Knowledge base completion

### Biomedical Knowledge

| Dataset | Entities | Relations | Triples | Description |
|---------|----------|-----------|---------|-------------|
| **Hetionet** | 45,158 | 24 | 2,250,197 | Integrates 29 biomedical databases |

**Entity Types in Hetionet:**
- Genes (20,945)
- Compounds (1,552)
- Diseases (137)
- Anatomy (400)
- Pathways (1,822)
- Pharmacologic classes
- Side effects
- Symptoms
- Molecular functions
- Biological processes
- Cellular components

**Relation Types:**
- Compound-binds-Gene
- Gene-associates-Disease
- Disease-presents-Symptom
- Compound-treats-Disease
- Compound-causes-Side effect
- Gene-participates-Pathway
- And 18 more...

**Use Cases:**
- Drug repurposing
- Disease mechanism discovery
- Target identification
- Multi-hop reasoning in biomedicine

## Citation Network Datasets

| Dataset | Nodes | Edges | Classes | Description |
|---------|-------|-------|---------|-------------|
| **Cora** | 2,708 | 5,429 | 7 | Machine learning papers |
| **CiteSeer** | 3,327 | 4,732 | 6 | Computer science papers |
| **PubMed** | 19,717 | 44,338 | 3 | Biomedical papers |

**Use Cases:**
- Node classification
- GNN baseline comparisons
- Method development

## Retrosynthesis Datasets

| Dataset | Size | Description |
|---------|------|-------------|
| **USPTO-50k** | 50,017 | Curated patent reactions, single-step |

**Features:**
- Product → Reactants mapping
- Atom mapping for reaction centers
- Canonicalized SMILES
- Balanced across reaction types

**Splits:**
- Train: ~40,000
- Validation: ~5,000
- Test: ~5,000

**Use Cases:**
- Retrosynthesis prediction
- Reaction type classification
- Synthetic route planning

## Dataset Usage Patterns

### Loading Datasets

```python
from torchdrug import datasets, transforms

# Basic loading
dataset = datasets.BBBP("~/molecule-datasets/")

# With transforms
transform = transforms.VirtualNode()
dataset = datasets.BBBP("~/molecule-datasets/", transform=transform)

# Protein dataset
dataset = datasets.EnzymeCommission("~/protein-datasets/")

# Knowledge graph
dataset = datasets.FB15k237("~/kg-datasets/")
```

### Data Splitting

```python
import torch
from torchdrug import data

# Split lengths are absolute sizes, derived here from 80/10/10 fractions
lengths = [int(0.8 * len(dataset)), int(0.1 * len(dataset))]
lengths += [len(dataset) - sum(lengths)]

# Random split
train, valid, test = torch.utils.data.random_split(dataset, lengths)

# Scaffold split (for molecules)
train, valid, test = data.scaffold_split(dataset, lengths)

# Predefined splits (some datasets)
train, valid, test = dataset.split()
```

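To make the idea behind a scaffold split concrete, here is a minimal, library-free sketch. It assumes each molecule already has a scaffold key (in practice something like a Bemis-Murcko scaffold string computed with a cheminformatics toolkit). The function name `group_split` and the toy keys are hypothetical, and this is not TorchDrug's implementation.

```python
from collections import defaultdict


def group_split(keys, fractions=(0.8, 0.1, 0.1)):
    """Split indices so that items sharing a key never straddle two subsets."""
    groups = defaultdict(list)
    for index, key in enumerate(keys):
        groups[key].append(index)
    # Place the largest groups first so they end up in the training set.
    ordered = sorted(groups.values(), key=len, reverse=True)

    total = len(keys)
    train_cap = fractions[0] * total
    valid_cap = (fractions[0] + fractions[1]) * total
    train, valid, test = [], [], []
    for members in ordered:
        if len(train) + len(members) <= train_cap:
            train.extend(members)
        elif len(train) + len(valid) + len(members) <= valid_cap:
            valid.extend(members)
        else:
            test.extend(members)
    return train, valid, test


# Toy scaffold keys: five "benzene" molecules, three "pyridine", one each of two others.
scaffolds = ["benzene"] * 5 + ["pyridine"] * 3 + ["indole", "furan"]
train_idx, valid_idx, test_idx = group_split(scaffolds)
```

Because whole scaffold groups are assigned together, the test set contains only scaffolds the model never saw during training, which is what makes this kind of split a harder and more realistic evaluation than a random one.
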
### Feature Extraction

**Node Features (Molecules):**
- Atom type (one-hot or embedding)
- Formal charge
- Hybridization
- Aromaticity
- Number of hydrogens
- Chirality

**Edge Features (Molecules):**
- Bond type (single, double, triple, aromatic)
- Stereochemistry
- Conjugation
- Ring membership

**Node Features (Proteins):**
- Amino acid type (one-hot)
- Physicochemical properties
- Position in sequence
- Secondary structure
- Solvent accessibility

**Edge Features (Proteins):**
- Edge type (sequential, spatial, contact)
- Distance
- Angles and dihedrals

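As a rough illustration of how the categorical and scalar features listed above can be combined into one vector, the sketch below one-hot encodes an atom or bond type and appends a few scalar descriptors. The vocabularies and function names are hypothetical and far smaller than what a real featurizer uses.

```python
ATOM_TYPES = ["C", "N", "O", "S", "F", "Cl"]  # hypothetical vocabulary
BOND_TYPES = ["single", "double", "triple", "aromatic"]


def one_hot(value, vocabulary):
    """Return a one-hot list with an extra trailing slot for unknown values."""
    vector = [0] * (len(vocabulary) + 1)
    index = vocabulary.index(value) if value in vocabulary else len(vocabulary)
    vector[index] = 1
    return vector


def atom_features(symbol, formal_charge, is_aromatic, num_hydrogens):
    """Concatenate a one-hot atom type with a few scalar descriptors."""
    return one_hot(symbol, ATOM_TYPES) + [formal_charge, int(is_aromatic), num_hydrogens]


def bond_features(bond_type, in_ring):
    """Concatenate a one-hot bond type with a ring-membership flag."""
    return one_hot(bond_type, BOND_TYPES) + [int(in_ring)]


carbon = atom_features("C", 0, True, 1)
unknown = atom_features("Se", 0, False, 0)
aromatic_bond = bond_features("aromatic", True)
```

The extra "unknown" slot keeps the vector length fixed when an atom type outside the vocabulary appears, so downstream layers never see a shape change.
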
## Choosing Datasets

### By Task

**Molecular Property Prediction:**
- Start with BBBP (small) or HIV (medium); both have a clear binary task
- Use QM9 for quantum properties
- ESOL/FreeSolv for regression

**Protein Function:**
- EnzymeCommission (well-defined classes)
- GeneOntology (comprehensive annotations)

**Drug Safety:**
- Tox21 (standard benchmark)
- ClinTox (clinical relevance)

**Structure-Based:**
- PDBBind (protein-ligand)
- ProteinNet (structure prediction)

**Knowledge Graph:**
- FB15k-237 (standard benchmark)
- Hetionet (biomedical applications)

**Generation:**
- ZINC250k (training)
- QM9 (with properties)

**Retrosynthesis:**
- USPTO-50k (the only retrosynthesis dataset listed here)

### By Size and Resources

**Small (<5k, for testing):**
- BACE, BBBP, ESOL, FreeSolv, ClinTox
- Core set of PDBBind

**Medium (5k-100k):**
- HIV, Tox21, MUV
- EnzymeCommission, Fold, GeneOntology
- FB15k-237, WN18RR

**Large (>100k):**
- QM9, PCQM4M
- AlphaFoldDB
- ZINC2M, BindingDB

### By Domain

**Drug Discovery:** BBBP, HIV, Tox21, ESOL, ZINC
**Quantum Chemistry:** QM7, QM8, QM9, PCQM4M
**Protein Engineering:** Fluorescence, Stability, Solubility
**Structural Biology:** Fold, PDBBind, ProteinNet, AlphaFoldDB
**Biomedical:** Hetionet, GeneOntology, EnzymeCommission
**Retrosynthesis:** USPTO-50k

## Best Practices

1. **Start Small**: Test on small datasets before scaling
2. **Scaffold Split**: Use for realistic drug discovery evaluation
3. **Balanced Metrics**: Use AUROC + AUPRC for imbalanced data
4. **Multiple Runs**: Report mean ± std over multiple random seeds
5. **Data Leakage**: With pre-trained models, check that the pre-training data does not overlap with your test set
6. **Domain Knowledge**: Understand what you're predicting
7. **Validation**: Tune on a validation set and report final results on a held-out test set
8. **Preprocessing**: Standardize features, handle missing values

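The "Balanced Metrics" and "Multiple Runs" recommendations can be sketched with the standard library alone. The helper names, labels, scores, and per-seed results below are all made-up illustrations; in practice a metrics library would normally be used instead.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def average_precision(labels, scores):
    """Mean of the precision at the rank of each positive (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return mean(precisions)


# Imbalanced toy data: 2 positives among 8 examples.
labels = [1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
roc = auroc(labels, scores)
ap = average_precision(labels, scores)

# Hypothetical AUROC values from five random seeds, summarized as mean ± std.
seed_results = [0.81, 0.79, 0.83, 0.80, 0.82]
summary = f"{mean(seed_results):.3f} ± {stdev(seed_results):.3f}"
```

On this toy data the average precision is noticeably lower than the AUROC, which is the usual reason to report both when positives are rare.
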
@@ -0,0 +1,320 @@
# Knowledge Graph Reasoning

## Overview

Knowledge graphs represent structured information as entities and relations in a graph format. TorchDrug provides comprehensive support for knowledge graph completion (link prediction) using embedding-based models and neural reasoning approaches.

## Available Datasets

### General Knowledge Graphs

**FB15k (Freebase subset):**
- 14,951 entities
- 1,345 relation types
- 592,213 triples
- General world knowledge from Freebase

**FB15k-237:**
- 14,541 entities
- 237 relation types
- 310,116 triples
- Filtered version removing inverse relations
- More challenging benchmark

**WN18 (WordNet):**
- 40,943 entities (word senses)
- 18 relation types (lexical relations)
- 151,442 triples
- Linguistic knowledge graph

**WN18RR:**
- 40,943 entities
- 11 relation types
- 93,003 triples
- Filtered WordNet removing easy inverse patterns

### Biomedical Knowledge Graphs

**Hetionet:**
- 45,158 entities (genes, compounds, diseases, pathways, etc.)
- 24 relation types (treats, causes, binds, etc.)
- 2,250,197 edges
- Integrates 29 public biomedical databases
- Designed for drug repurposing and disease understanding

## Task: KnowledgeGraphCompletion

The primary task for knowledge graphs is link prediction: given a head entity and a relation, predict the tail entity (or vice versa).

### Task Modes

**Head Prediction:**
- Given (?, relation, tail), predict head entity
- "What can cause Disease X?"

**Tail Prediction:**
- Given (head, relation, ?), predict tail entity
- "What diseases does Gene X cause?"

**Both:**
- Predict both head and tail
- Standard evaluation protocol

### Evaluation Metrics

**Ranking Metrics:**
- **Mean Rank (MR)**: Average rank of the correct entity (lower is better)
- **Mean Reciprocal Rank (MRR)**: Average of 1/rank (higher is better)
- **Hits@K**: Fraction of test queries whose correct entity ranks in the top K predictions
- Typically reported for K=1, 3, 10

**Filtered vs Raw:**
- **Filtered**: Remove other known true triples from ranking
- **Raw**: Rank among all possible entities
- Filtered is standard for evaluation

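The metrics above are simple to compute once each test query has been reduced to the rank of its correct entity. The following is a minimal sketch with hypothetical helper names and toy scores; it is not how TorchDrug computes them internally.

```python
def filtered_rank(scores, target, known_true):
    """Rank of `target` (1 = best) after removing other known-true candidates."""
    target_score = scores[target]
    better = sum(
        1
        for entity, score in scores.items()
        if entity != target and entity not in known_true and score > target_score
    )
    return better + 1


def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Aggregate per-query ranks into MR, MRR and Hits@K."""
    metrics = {
        "MR": sum(ranks) / len(ranks),
        "MRR": sum(1.0 / r for r in ranks) / len(ranks),
    }
    for k in ks:
        metrics[f"Hits@{k}"] = sum(r <= k for r in ranks) / len(ranks)
    return metrics


# Toy scores for one tail-prediction query; higher means more plausible.
scores = {"entity_a": 0.9, "entity_b": 0.8, "entity_c": 0.7, "entity_d": 0.1}
raw = filtered_rank(scores, "entity_c", known_true=set())
filtered = filtered_rank(scores, "entity_c", known_true={"entity_a", "entity_b"})

# Aggregate the ranks from four hypothetical test queries.
metrics = ranking_metrics([1, 2, 5, 20])
```

In the toy query, the raw rank of `entity_c` is 3, but it becomes 1 once the two other known-true answers are filtered out. That is why filtered metrics are preferred: a model is not penalized for ranking other correct answers highly.
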
## Embedding Models

### Translational Models

**TransE (Translation Embedding):**
- Represents relations as translations in embedding space
- h + r ≈ t (head + relation ≈ tail)
- Simple and effective baseline
- Works well for 1-to-1 relations
- Struggles with 1-to-N, N-to-1, and N-to-N relations

**RotatE (Rotation Embedding):**
- Relations as rotations in complex space
- Handles symmetric and inverse relations better than TransE
- Strong results on many benchmarks
- Can model composition patterns

### Semantic Matching Models

**DistMult:**
- Bilinear scoring function
- Handles symmetric relations naturally
- Cannot model asymmetric relations
- Fast and memory efficient

**ComplEx:**
- Complex-valued embeddings
- Models asymmetric and inverse relations
- Better than DistMult for most graphs
- Balances expressiveness and efficiency

**SimplE:**
- Extends DistMult with inverse relations
- Fully expressive (can represent any relation pattern)
- Two embeddings per entity (canonical and inverse)

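The scoring functions behind the translational and semantic matching models can be written in a few lines each. The sketch below uses plain Python lists and the built-in complex type. The function names are illustrative, the L1 distance for TransE is one common choice among several, and none of this mirrors TorchDrug's internal code.

```python
import cmath
import math


def transe_score(h, r, t):
    """Negative L1 distance between h + r and t (higher is better)."""
    return -sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def distmult_score(h, r, t):
    """Trilinear product sum_i h_i * r_i * t_i; symmetric in h and t."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def complex_score(h, r, t):
    """Re(sum_i h_i * r_i * conj(t_i)) over complex-valued embeddings."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


def rotate_score(h, phases, t):
    """Negative distance between h rotated by the relation phases and t."""
    return -sum(abs(hi * cmath.exp(1j * p) - ti) for hi, p, ti in zip(h, phases, t))


# Real-valued toy embeddings where h + r equals t exactly.
h, r, t = [1.0, 2.0], [0.5, -1.0], [1.5, 1.0]
perfect_translation = transe_score(h, r, t)
distmult_forward = distmult_score(h, r, t)
distmult_reverse = distmult_score(t, r, h)

# Complex-valued toy embeddings: swapping head and tail changes the ComplEx score.
hc, rc, tc = [1 + 1j], [1j], [1 - 1j]
complex_forward = complex_score(hc, rc, tc)
complex_reverse = complex_score(tc, rc, hc)

# A 90-degree rotation maps 1 onto i, so the RotatE distance is about zero.
rotation = rotate_score([1 + 0j], [math.pi / 2], [1j])
```

The toy values show the properties listed above: DistMult gives the same score in both directions, so it cannot distinguish (h, r, t) from (t, r, h), while ComplEx assigns the two directions different scores.
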
### Neural Logic Models

**NeuralLP (Neural Logic Programming):**
- Learns logical rules through differentiable operations
- Interprets predictions via learned rules
- Good for sparse knowledge graphs
- Computationally more expensive

**KBGAT (Knowledge Base Graph Attention):**
- Graph attention networks for KG completion
- Learns entity representations from neighborhood
- Handles unseen entities through inductive learning
- Better for incomplete graphs

## Training Workflow

### Basic Pipeline

```python
import torch
from torchdrug import core, datasets, models, tasks

# Load dataset and use its predefined train/valid/test split
dataset = datasets.FB15k237("~/kg-datasets/")
train_set, valid_set, test_set = dataset.split()

# Define model
model = models.RotatE(
    num_entity=dataset.num_entity,
    num_relation=dataset.num_relation,
    embedding_dim=2000,
    max_score=9
)

# Define task
task = tasks.KnowledgeGraphCompletion(
    model,
    num_negative=128,
    adversarial_temperature=2,
    criterion="bce"
)

# Train and evaluate with core.Engine (hyperparameters below are illustrative)
optimizer = torch.optim.Adam(task.parameters(), lr=2e-5)
solver = core.Engine(task, train_set, valid_set, test_set, optimizer, batch_size=1024)
solver.train(num_epoch=200)
solver.evaluate("valid")
```

### Negative Sampling

**Strategies:**
- **Uniform**: Sample entities uniformly at random
- **Self-Adversarial**: Weight samples by current model's scores
- **Type-Constrained**: Sample only valid entity types for relation

**Parameters:**
- `num_negative`: Number of negative samples per positive triple
- `adversarial_temperature`: Temperature for self-adversarial weighting
- The temperature sets how sharply the weighting concentrates on hard (high-scoring) negatives; whether a larger value sharpens or flattens the weights depends on whether the implementation multiplies or divides scores by it, so check before tuning

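A small sketch may help show what uniform sampling and self-adversarial weighting do. It assumes the weights are a softmax over `alpha * score`, with `alpha` acting as a sharpness multiplier. Implementations that divide by a temperature behave the opposite way as the value grows. All names here are hypothetical.

```python
import math
import random


def sample_uniform_negatives(head, relation, tail, entities, num_negative, rng):
    """Corrupt the tail with entities drawn uniformly at random."""
    candidates = [e for e in entities if e != tail]
    return [(head, relation, rng.choice(candidates)) for _ in range(num_negative)]


def self_adversarial_weights(negative_scores, alpha=1.0):
    """Softmax over alpha * score, so higher-scoring (harder) negatives weigh more."""
    shifted = [alpha * s for s in negative_scores]
    peak = max(shifted)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
entities = ["e0", "e1", "e2", "e3", "e4"]
negatives = sample_uniform_negatives("e0", "related_to", "e1", entities, 4, rng)

weights = self_adversarial_weights([2.0, 0.0, -1.0], alpha=1.0)
sharper = self_adversarial_weights([2.0, 0.0, -1.0], alpha=3.0)
uniform = self_adversarial_weights([2.0, 0.0, -1.0], alpha=0.0)
```

With `alpha=0` every negative receives the same weight, which recovers plain uniform weighting. Larger values move more of the loss onto the negatives the model currently finds most plausible.
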
### Loss Functions

**Binary Cross-Entropy (BCE):**
- Treats each triple independently
- Balanced classification between positive and negative

**Margin Loss:**
- Pushes positive scores above negative scores by at least a margin
- `max(0, margin + score_neg - score_pos)`

**Logistic Loss:**
- Smooth version of margin loss
- Better gradient properties

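For reference, the three losses can be written per example as below. This is a sketch under assumptions: the logistic loss is shown in one common pairwise form (softplus of the score difference), the BCE takes a raw score as a logit, and the function names are illustrative rather than any library's API.

```python
import math


def margin_loss(score_pos, score_neg, margin=1.0):
    """Zero once the positive beats the negative by at least `margin`."""
    return max(0.0, margin + score_neg - score_pos)


def logistic_loss(score_pos, score_neg):
    """softplus(score_neg - score_pos): smooth, and never exactly zero."""
    return math.log1p(math.exp(score_neg - score_pos))


def bce_loss(score, label):
    """Binary cross-entropy on a single raw score (logit) with label 0 or 1."""
    probability = 1.0 / (1.0 + math.exp(-score))
    return -(label * math.log(probability) + (1 - label) * math.log(1.0 - probability))


satisfied = margin_loss(3.0, 1.0)   # positive already ahead by more than the margin
violated = margin_loss(1.0, 0.5)    # positive ahead by less than the margin
smooth = logistic_loss(3.0, 1.0)    # small but non-zero
undecided = bce_loss(0.0, 1)        # a logit of 0 means probability 0.5
```

The margin loss stops producing gradient once the margin is satisfied, whereas the logistic form keeps a small gradient everywhere, which is the "better gradient properties" point above.
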
## Model Selection Guide

### By Relation Patterns

**1-to-1 Relations:**
- TransE works well
- Any model will likely succeed

**1-to-N Relations:**
- DistMult, ComplEx, SimplE
- Avoid TransE

**N-to-1 Relations:**
- DistMult, ComplEx, SimplE
- Avoid TransE

**N-to-N Relations:**
- ComplEx, SimplE, RotatE
- Most challenging pattern

**Symmetric Relations:**
- DistMult, ComplEx
- RotatE with proper initialization

**Antisymmetric Relations:**
- ComplEx, SimplE, RotatE
- Avoid DistMult

**Inverse Relations:**
- ComplEx, SimplE, RotatE
- Important for bidirectional reasoning

**Composition:**
- RotatE (best)
- TransE (reasonable)
- Captures multi-hop paths

### By Dataset Characteristics

**Small Graphs (< 50k entities):**
- ComplEx or SimplE
- Lower embedding dimensions (200-500)

**Large Graphs (> 100k entities):**
- DistMult for efficiency
- RotatE for accuracy
- Higher dimensions (500-2000)

**Sparse Graphs:**
- NeuralLP (learns rules from limited data)
- Pre-train embeddings on larger graphs

**Dense, Complete Graphs:**
- Any embedding model works well
- Choose based on relation patterns

**Biomedical/Domain Graphs:**
- Consider type constraints in sampling
- Use domain-specific negative sampling
- Hetionet benefits from relation-specific models

## Advanced Techniques

### Multi-Hop Reasoning

Chain multiple relations to answer complex queries:
- "What drugs treat diseases caused by gene X?"
- Requires path-based or rule-based reasoning
- NeuralLP naturally supports this

### Temporal Knowledge Graphs

Extend to time-varying facts:
- Add temporal information to triples
- Predict future facts
- Requires temporal encoding in models

### Few-Shot Learning

Handle relations with few examples:
- Meta-learning approaches
- Transfer from related relations
- Important for emerging knowledge

### Inductive Learning

Generalize to unseen entities:
- KBGAT and other GNN-based methods
- Use entity features/descriptions
- Critical for evolving knowledge graphs

## Biomedical Applications

### Drug Repurposing

Predict "drug treats disease" links in Hetionet:
1. Train on known drug-disease associations
2. Predict new treatment candidates
3. Filter by mechanism (gene, pathway involvement)
4. Validate predictions experimentally

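Step 2 of the workflow above amounts to scoring every (compound, treats, disease) triple that is not already known and keeping the highest-scoring ones. A minimal sketch follows. The `score_fn` stands in for a trained model, and the entity names and scores are placeholders, not real predictions.

```python
def rank_candidates(score_fn, compound, relation, diseases, known_triples, top_k=3):
    """Score unseen (compound, relation, disease) triples and return the best ones."""
    candidates = [d for d in diseases if (compound, relation, d) not in known_triples]
    scored = [(score_fn(compound, relation, d), d) for d in candidates]
    scored.sort(reverse=True)
    return [(disease, score) for score, disease in scored[:top_k]]


# Placeholder scores standing in for a trained model's output.
toy_scores = {"disease_1": 0.95, "disease_2": 0.40, "disease_3": 0.75, "disease_4": 0.10}


def toy_score_fn(compound, relation, disease):
    return toy_scores[disease]


known = {("compound_A", "treats", "disease_1")}
top = rank_candidates(toy_score_fn, "compound_A", "treats", list(toy_scores), known, top_k=2)
```

Known associations are excluded before ranking so that the shortlist contains only new hypotheses. Mechanism-based filtering and experimental validation (steps 3 and 4) would then be applied to that shortlist.
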
### Disease Gene Discovery

Identify genes associated with diseases:
1. Model gene-disease-pathway networks
2. Predict missing gene-disease links
3. Incorporate protein interactions, expression data
4. Prioritize candidates for validation

### Protein Function Prediction

Link proteins to biological processes:
1. Integrate protein interactions, GO terms
2. Predict missing GO annotations
3. Transfer function from similar proteins

## Common Issues and Solutions

**Issue: Poor performance on specific relation types**
- Solution: Analyze relation patterns, choose appropriate model, or use relation-specific models

**Issue: Overfitting on small graphs**
- Solution: Reduce embedding dimension, increase regularization, or use simpler models

**Issue: Slow training on large graphs**
- Solution: Reduce negative samples, use DistMult for efficiency, or implement mini-batch training

**Issue: Cannot handle new entities**
- Solution: Use inductive models (KBGAT), incorporate entity features, or pre-compute embeddings for new entities based on their neighbors

## Best Practices

1. Start with ComplEx or RotatE for most tasks
2. Use self-adversarial negative sampling
3. Tune embedding dimension (typically 500-2000)
4. Apply regularization to prevent overfitting
5. Use filtered evaluation metrics
6. Analyze performance per relation type
7. Consider relation-specific models for heterogeneous graphs
8. Validate predictions with domain experts