PyPI - polygenic-pgx - Versions diffs - 2.5.0__tar.gz - Mend

polygenic-pgx 2.5.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (53)

polygenic_pgx-2.5.0/LICENSE +1 -0
polygenic_pgx-2.5.0/PKG-INFO +43 -0
polygenic_pgx-2.5.0/README.md +604 -0
polygenic_pgx-2.5.0/polygenic/__init__.py +0 -0
polygenic_pgx-2.5.0/polygenic/__main__.py +12 -0
polygenic_pgx-2.5.0/polygenic/data/__init__.py +0 -0
polygenic_pgx-2.5.0/polygenic/data/csv_accessor.py +143 -0
polygenic_pgx-2.5.0/polygenic/data/data_accessor.py +109 -0
polygenic_pgx-2.5.0/polygenic/data/gwas.py +638 -0
polygenic_pgx-2.5.0/polygenic/data/mobigen_utils.py +70 -0
polygenic_pgx-2.5.0/polygenic/data/polars_frame.py +128 -0
polygenic_pgx-2.5.0/polygenic/data/snp_data.py +23 -0
polygenic_pgx-2.5.0/polygenic/data/vcf_accessor.py +220 -0
polygenic_pgx-2.5.0/polygenic/data/vcf_record.py +213 -0
polygenic_pgx-2.5.0/polygenic/error/__init__.py +0 -0
polygenic_pgx-2.5.0/polygenic/error/polygenic_exception.py +3 -0
polygenic_pgx-2.5.0/polygenic/model/__init__.py +0 -0
polygenic_pgx-2.5.0/polygenic/model/model.py +799 -0
polygenic_pgx-2.5.0/polygenic/model/utils.py +140 -0
polygenic_pgx-2.5.0/polygenic/pgstk.py +212 -0
polygenic_pgx-2.5.0/polygenic/resources/chromsizes/hg38.chrom.sizes +26 -0
polygenic_pgx-2.5.0/polygenic/rsidx/__init__.py +38 -0
polygenic_pgx-2.5.0/polygenic/rsidx/__main__.py +31 -0
polygenic_pgx-2.5.0/polygenic/rsidx/_version.py +520 -0
polygenic_pgx-2.5.0/polygenic/rsidx/cli.py +83 -0
polygenic_pgx-2.5.0/polygenic/rsidx/index.py +69 -0
polygenic_pgx-2.5.0/polygenic/rsidx/search.py +63 -0
polygenic_pgx-2.5.0/polygenic/tools/__init__.py +6 -0
polygenic_pgx-2.5.0/polygenic/tools/data/__init__.py +0 -0
polygenic_pgx-2.5.0/polygenic/tools/data/chromsizes.py +111 -0
polygenic_pgx-2.5.0/polygenic/tools/data/colors.py +49 -0
polygenic_pgx-2.5.0/polygenic/tools/gwasfilecreate.py +22 -0
polygenic_pgx-2.5.0/polygenic/tools/modelbiobankuk.py +196 -0
polygenic_pgx-2.5.0/polygenic/tools/modelgwasfile.py +14 -0
polygenic_pgx-2.5.0/polygenic/tools/modelpgscat.py +164 -0
polygenic_pgx-2.5.0/polygenic/tools/pgscompute.py +262 -0
polygenic_pgx-2.5.0/polygenic/tools/plotmanhattan.py +101 -0
polygenic_pgx-2.5.0/polygenic/tools/tsvtovcf.py +24 -0
polygenic_pgx-2.5.0/polygenic/tools/utils.py +562 -0
polygenic_pgx-2.5.0/polygenic/tools/vcfimpute.py +263 -0
polygenic_pgx-2.5.0/polygenic/tools/vcfindex.py +5 -0
polygenic_pgx-2.5.0/polygenic/tools/vcfstatbaf.py +98 -0
polygenic_pgx-2.5.0/polygenic/tools/vcfstatzygosity.py +75 -0
polygenic_pgx-2.5.0/polygenic/version.py +3 -0
polygenic_pgx-2.5.0/polygenic_pgx.egg-info/PKG-INFO +43 -0
polygenic_pgx-2.5.0/polygenic_pgx.egg-info/SOURCES.txt +51 -0
polygenic_pgx-2.5.0/polygenic_pgx.egg-info/dependency_links.txt +1 -0
polygenic_pgx-2.5.0/polygenic_pgx.egg-info/entry_points.txt +4 -0
polygenic_pgx-2.5.0/polygenic_pgx.egg-info/requires.txt +18 -0
polygenic_pgx-2.5.0/polygenic_pgx.egg-info/top_level.txt +1 -0
polygenic_pgx-2.5.0/pyproject.toml +5 -0
polygenic_pgx-2.5.0/setup.cfg +4 -0
polygenic_pgx-2.5.0/setup.py +74 -0

polygenic_pgx-2.5.0/LICENSE ADDED

@@ -0,0 +1 @@

+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

polygenic_pgx-2.5.0/PKG-INFO ADDED

@@ -0,0 +1,43 @@
+Metadata-Version: 2.4
+Name: polygenic-pgx
+Version: 2.5.0
+Summary: Polygenic score toolkit
+Home-page: https://github.com/marpiech/polygenic
+Author: Marcin Piechota, Wojciech Galan
+Author-email: piechota@intelliseq.com
+License: Intelliseq dual licenses this package. For commercial use, please contact [contact @ intelliseq.com](mailto:contact@intelliseq.com). For non-commercial use, this license permits use of the software only by government agencies, schools, universities, non-profit organizations or individuals on projects that do not receive external funding other than government research grants and contracts. Any other use requires a commercial license. For the full license, please see [LICENSE.md](https://github.com/intelliseq/polygenic/blob/master/LICENSE.md), in this source repository.
+Classifier: Development Status :: 5 - Production/Stable
+Classifier: License :: Free for non-commercial use
+Classifier: Programming Language :: Python :: 3
+Classifier: Operating System :: Unix
+Classifier: Operating System :: MacOS
+Classifier: Topic :: Software Development :: Libraries :: Python Modules
+Requires-Python: >=3.8
+License-File: LICENSE
+Requires-Dist: numpy
+Requires-Dist: scipy
+Requires-Dist: progressbar2
+Requires-Dist: python-utils
+Requires-Dist: pysam
+Requires-Dist: pytabix
+Requires-Dist: pandas
+Requires-Dist: plotly
+Requires-Dist: kaleido
+Requires-Dist: DotMap
+Requires-Dist: pyyaml
+Requires-Dist: tqdm
+Requires-Dist: plotnine
+Requires-Dist: kaleido
+Requires-Dist: polars
+Requires-Dist: importlib-resources
+Requires-Dist: logdecorator
+Requires-Dist: pyarrow
+Dynamic: author
+Dynamic: author-email
+Dynamic: classifier
+Dynamic: home-page
+Dynamic: license
+Dynamic: license-file
+Dynamic: requires-dist
+Dynamic: requires-python
+Dynamic: summary

polygenic_pgx-2.5.0/README.md ADDED

@@ -0,0 +1,604 @@
+# polygenic - the polygenic scores toolkit
+## Basic info
+[![PyPI pyversions](https://img.shields.io/pypi/pyversions/polygenic.svg)](https://pypi.python.org/pypi/polygenic/)
+[![PyPI](https://img.shields.io/pypi/v/polygenic.svg)](https://pypi.python.org/pypi/polygenic)
+![Maintainer](https://img.shields.io/badge/maintainer-marpiech-blue)
+## Downloads
+- pip [![PyPI download month](https://img.shields.io/pypi/dm/polygenic.svg)](https://pypi.python.org/pypi/polygenic/)
+- docker with data [![Docker](https://img.shields.io/docker/pulls/marpiech/polygenictk.svg)](https://hub.docker.com/repository/docker/marpiech/polygenictk)
+- docker without data [![Docker](https://img.shields.io/docker/pulls/intelliseq/polygenic.svg)](https://hub.docker.com/repository/docker/intelliseq/polygenic)
+## Index
+* [Summary](#summary)
+* [Diplotyping Algorithm](#diplotyping-algorithm)
+* [Installation](#installation)
+  * [With pip](#with-pip)
+  * [With conda](#with-conda)
+  * [With docker](#with-docker)
+* [Quick start guide](#quick-start-guide)
+* [Manual](#manual)
+  * [Tools](#tools)
+    * [pgs-compute](#pgs_compute)
+    * [pgs-build](#pgs_build)
+    * [pgs-validate](#pgs_validate)
+    * [vcf-index](#vcf_index)
+    * [vcf-validate](#vcf_validate)
+    * [vcf-stat](#vcf_stat)
+    * [model-biobankuk](#model_biobankuk)
+    * [model-pgscat](#model_pgscat)
+    * [model-gbe](#model_gbe)
+    * [model-pharmvar](#model_pharmvar)
+  * [Docker images](#docker_images)
+  * [Building models](#building_models)
+  * [Example models](#example_models)
+  * [Usecases](#usecases)
+    * [pgx](#pgx)
+* [License](#license)
+* [Updates](#updates)
+## Summary
+Polygenic is a toolkit for a wide range of polygenic score analysis tasks. The most important use cases are computing scores for samples in vcf files, building scores from GWAS results, and fetching scores from repositories.
+## Diplotyping Algorithm
+We begin by reading individual genetic variants (genotypes) from the patient's VCF file, where each genotype carries two alleles — one per chromosome. The system supports phasing using a custom reference panel to resolve which alleles sit on the same chromosome; when phased data is available, the algorithm preserves this linkage information, while for unphased data both chromosomes are treated symmetrically. We define haplotypes as specific combinations of co-inherited variants that form recognized gene versions, such as pharmacogenomic star alleles, where each haplotype definition distinguishes core defining variants (weighted at 1.0) from supportive sub-lineage variants (weighted at 0.05). The algorithm scores every candidate haplotype against the patient's alleles, then selects the best-matching haplotype for the first chromosome — keeping all candidates within a 2% margin of the top score. The matched alleles are then "claimed" by that haplotype, and the remaining unmatched alleles (leftovers) are passed into a second round, where up to 100 candidate haplotypes are re-scored against only those residual alleles to identify the second haplotype. Each first/second haplotype pair is ranked by combined match percentage and filtered by total missing data, producing the final diplotype call — e.g., CYP2D6 \*1/\*4. Every variant in the result carries a source label — direct genotyping, LD proxy, imputation flag from VCF, allele-frequency-based imputation, reference, or missing — enabling granular quality control and full traceability of the diplotype call.
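The two-round selection described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the haplotype names `H_ref`, `H1`, `H2`, the variant ids `v1` to `v3`, the `"variant:allele"` encoding, and the reading of the 2% margin as an absolute gap on a 0..1 score are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple

CORE_WEIGHT = 1.0      # core defining variant (weight stated in the text)
SUPPORT_WEIGHT = 0.05  # supportive sub-lineage variant (weight stated in the text)
MARGIN = 0.02          # assumed: 2% read as an absolute gap on a 0..1 score

# variant id -> (required allele, "core" | "support"); a hypothetical encoding
Definition = Dict[str, Tuple[str, str]]


def match_score(definition: Definition, alleles: Counter) -> float:
    """Weighted fraction of a haplotype's variants still available in `alleles`."""
    total = matched = 0.0
    for variant_id, (allele, kind) in definition.items():
        weight = CORE_WEIGHT if kind == "core" else SUPPORT_WEIGHT
        total += weight
        if alleles[f"{variant_id}:{allele}"] > 0:
            matched += weight
    return matched / total if total else 0.0


def shortlist(haplotypes: Dict[str, Definition], alleles: Counter) -> List[str]:
    """Names of all candidates whose score is within MARGIN of the best one."""
    scores = {name: match_score(d, alleles) for name, d in haplotypes.items()}
    best = max(scores.values())
    return sorted(name for name, score in scores.items() if score >= best - MARGIN)


def claim(definition: Definition, alleles: Counter) -> Counter:
    """Remove one copy of every allele used by the chosen haplotype (leftovers)."""
    claimed = Counter(f"{v}:{a}" for v, (a, _) in definition.items())
    return alleles - claimed


# Hypothetical gene with three variants and three candidate haplotypes.
HAPLOTYPES: Dict[str, Definition] = {
    "H_ref": {"v1": ("G", "core"), "v3": ("C", "core")},
    "H1": {"v1": ("A", "core"), "v2": ("T", "support")},
    "H2": {"v3": ("T", "core")},
}
# Sample genotypes: v1 A/G, v2 T/C, v3 C/C (two copies of v3:C).
sample = Counter({"v1:A": 1, "v1:G": 1, "v2:T": 1, "v2:C": 1, "v3:C": 2})

first_round = shortlist(HAPLOTYPES, sample)
leftovers = claim(HAPLOTYPES["H1"], sample)
second_round = shortlist(HAPLOTYPES, leftovers)
```

In this toy case both `H1` and `H_ref` survive the first round; once `H1` claims its alleles, only `H_ref` still matches the leftovers, giving the pair `H1/H_ref`. The ranking by combined match percentage and the missing-data filter are left out of the sketch.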
+### Missing variant handling (`--ref-fallback`)
+By default, variants called `./.` in the input VCF are treated as **missing** (`source: "missing"`) — they are subtracted from both the numerator and the denominator of the haplotype match score, so they neither support nor penalize any candidate. This matches the approach taken by [PharmCAT's Named Allele Matcher](https://pharmcat.clinpgx.org/methods/NamedAlleleMatcher-101/), which explicitly drops missing positions rather than imputing them: *"If the sample data has missing positions that are required by a named allele definition, the position will be dropped from consideration."*
+Passing `--ref-fallback` restores the legacy behavior of filling `./.` with homozygous reference (`source: "reference"`). This is almost always the **wrong** default for pharmacogenomic panels. The canonical example is **CYP2C19**: the variant rs3758581 at chr10:94842866 has GRCh38 reference `A`, but `G` is the major allele in every population and PharmVar v5 defines every "real" star allele (\*1, \*2, \*17 …) as requiring `G` at this position, reserving CYP2C19\*38 (= old \*1.001) for the minority carrying the reference `A` (see [PharmVar GeneFocus: CYP2C19](https://pmc.ncbi.nlm.nih.gov/articles/PMC7769975/)). If rs3758581 is not on the panel and the code fills it with the reference, every sample drifts toward \*38/\*38. Default off avoids this trap. Enable `--ref-fallback` only when your panel's `./.` genuinely means "0/0 and I chose not to spell it out" — for example when re-running test fixtures authored under the old default.
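The effect of the two modes on a single haplotype score can be sketched like this. It is a simplified, unweighted illustration under stated assumptions: `percent_match` and the variant `panel_snp` (with its alleles) are hypothetical, while the `A` reference and required `G` at rs3758581 are taken from the text above.

```python
from typing import Dict, Optional


def percent_match(required: Dict[str, str],
                  observed: Dict[str, Optional[str]],
                  reference: Dict[str, str],
                  ref_fallback: bool = False) -> float:
    """Percentage of required alleles that are observed.

    A call of None stands for `./.`. Without ref_fallback the position is
    skipped entirely, so it counts in neither numerator nor denominator.
    With ref_fallback it is replaced by the genomic reference allele.
    """
    matched = scored = 0
    for variant_id, allele in required.items():
        call = observed.get(variant_id)
        if call is None:
            if not ref_fallback:
                continue  # missing: dropped from consideration
            call = reference[variant_id]  # legacy: assume reference
        scored += 1
        if call == allele:
            matched += 1
    return 100.0 * matched / scored if scored else 0.0


# A star allele that needs G at rs3758581 plus one variant the panel does assay.
required = {"rs3758581": "G", "panel_snp": "T"}
observed = {"rs3758581": None, "panel_snp": "T"}   # rs3758581 is not on the panel
reference = {"rs3758581": "A", "panel_snp": "C"}

default_score = percent_match(required, observed, reference)
legacy_score = percent_match(required, observed, reference, ref_fallback=True)
```

With the default, the informative variant alone decides the score; with the fallback, the filled-in `A` counts as a mismatch against every haplotype that requires `G`.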
+### Match-confidence gate (`--top-n`)
+After scoring all candidate haplotype pairs, the algorithm returns a `haplotype_id` when **either** of these holds:
+- the best pair's per-chromosome `max_percent_match` is `≥ 50%` (high-confidence call), **or**
+- the total scored-candidate pool size is `≤ top_n` (default **15**) — a small, non-ambiguous candidate space is itself a confidence signal.
+Otherwise `haplotype_id` is `None` (the caller refuses to guess). This is more conservative than [PharmCAT](https://pharmcat.clinpgx.org/methods/NamedAlleleMatcher-101/), which always returns the top-scoring diplotype with deterministic tie-breaking, and roughly comparable to [Aldy 4](https://pmc.ncbi.nlm.nih.gov/articles/PMC9977157/), which reports "no more than three diplotypes" when it cannot fully disambiguate. Tune `--top-n` higher (laxer) or lower (stricter) depending on how much ambiguity is acceptable downstream. Set `--top-n 0` to rely solely on the 50% threshold.
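The gate reduces to a single boolean condition. The sketch below restates the rule from the two bullets; the function name `should_report` is hypothetical, and the pool is assumed to contain at least one scored candidate.

```python
def should_report(max_percent_match: float, pool_size: int, top_n: int = 15) -> bool:
    """Return True when a haplotype_id would be reported.

    Either the best pair matches at 50% or more, or the scored candidate
    pool is small enough (pool_size <= top_n) to count as unambiguous.
    Assumes pool_size >= 1, so top_n=0 disables the second condition.
    """
    return max_percent_match >= 50.0 or pool_size <= top_n


confident = should_report(62.5, pool_size=200)          # high match, large pool
sparse_panel = should_report(30.0, pool_size=8)         # low match, small pool
ambiguous = should_report(30.0, pool_size=40)           # low match, large pool
strict = should_report(30.0, pool_size=8, top_n=0)      # threshold only
```

Only the first two cases yield a call; in the other two the caller would return `None` for `haplotype_id`.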
+### CYP2C19*38 worked example
+On a clinical panel that does **not** assay rs3758581, running `pgs-compute` with the CYP2C19 PharmVar model on four real samples (three carrying non-reference CYP2C19 variants, one all-reference) produces these calls:
+| Sample | Non-ref CYP2C19 variants | Default (no `--ref-fallback`) | Legacy (`--ref-fallback`) |
+|---|---|---|---|
+| Sample A | none | `*1.001/*1.001` (≡ \*1/\*1) | `*1.001/*1.001` |
+| Sample B | rs12769205, rs4244285 (het) | `*2/*1.001` ✓ | `*1.001/*1.001` ✗ |
+| Sample C | rs12248560 (hom) | `*17/*17` ✓ | `*1.001/*1.001` ✗ |
+| Sample D | rs12769205, rs4244285 (het) | `*2/*1.001` ✓ | `*1.001/*1.001` ✗ |
+With the legacy `--ref-fallback` on, the missing rs3758581 becomes `A/A` (genomic reference), which does not match the `G` required by \*1, \*2, \*17 etc., so every sample collapses to the empty-haplotype \*1.001 (equivalent to \*38 in PharmVar v5 terms). With the new default, rs3758581 stays missing, the scoring ignores it, and the panel's actual informative variants drive the call.
+## Installation
+### With pip
+#### Install for user account
+```
+python3 -m pip install --upgrade polygenic
+```
+#### Install globally
+```
+sudo -H python3 -m pip install polygenic
+```
+### With conda
+Run conda image
+```
+docker run -it conda/miniconda3 /bin/bash
+```
+Create python3.8 environment and install polygenic
+```
+yes | conda create --name py38 python=3.8
+eval "$(conda shell.bash hook)"
+conda activate py38
+### should be 3.8
+python --version
+### gcc is missing to build pytabix
+apt -qq update
+apt -y install build-essential tabix
+pip install polygenic
+```
+### With docker
+#### Large image with all data included
+```
+docker run intelliseq/polygenictk:2.1.0 *command*
+```
+#### Thin image with just polygenic package installed
+```
+docker run intelliseq/polygenic:2.1.0 *command*
+```
+## Quick start guide
+```
+mkdir polygenic && cd polygenic # create working directory
+wget https://downloads.intelliseq.com/public/polygenic/gbe-INI78-bone-density.yml # download model
+wget https://downloads.intelliseq.com/public/polygenic/illu_merged-imputed.vcf.gz # download genotypes
+wget https://downloads.intelliseq.com/public/polygenic/illu_merged-imputed.vcf.gz.tbi # download position index
+wget https://downloads.intelliseq.com/public/polygenic/illu_merged-imputed.vcf.gz.idx.db # download rsid index
+docker run -v $(pwd):/data intelliseq/polygenic:latest --vcf /data/illu_merged-imputed.vcf.gz --model /data/gbe-INI78-bone-density.yml --output-directory /data # compute model
+```
+## Manual
+### Tools
+#### pgs-compute
+```
+usage: pgstk [-h] -i VCF [-m MODEL [MODEL ...]] [-p PARAMETERS] [-s SAMPLE_NAME] [-o OUTPUT_DIRECTORY] [-n OUTPUT_NAME_APPENDIX] [-l LOG_FILE] [--af AF] [--af-field AF_FIELD]
+             [-v] [--print]
+pgs-compute computes polygenic scores for genotyped sample in vcf format
+optional arguments:
+  -h, --help            show this help message and exit
+  -i, --vcf VCF         vcf.gz file with genotypes
+  -m, --model MODEL [MODEL ...]
+                        path to .yml model (can be specified multiple times with space as separator)
+  -p, --parameters PARAMETERS
+                        parameters json (to be used in formula models)
+  -s, --sample-name SAMPLE_NAME
+                        sample name in vcf.gz to calculate
+  -o, --output-directory OUTPUT_DIRECTORY
+                        output directory
+  -n, --output-name-appendix OUTPUT_NAME_APPENDIX
+                        appendix for output file names
+  -l, --log-file LOG_FILE
+                        path to log file
+  --af AF               vcf file containing allele freq data
+  --af-field AF_FIELD   name of the INFO field to be used as allele frequency
+  -v, --version         show program's version number and exit
+  --print               Print output to stdout
+```
+### Arguments
+#### Required
+- `--vcf` vcf.gz file with genotypes (tabix index should be available)
+- `--model` path to model file
+#### Optional
+- `--log-file` log file
+- `--output-directory` directory for result jsons
+- `--population` population code
+- `--models_path` path to a directory containing models
+- `--af` an indexed vcf.gz file containing allele freq data
+- `--version` prints version of package
+## Building models in yml
+Index:
+[Model structure](#model_structure)
+[Model types](#model_types)
+[Parameters](#parameters)
+### Model structure
+##### Core structure
+Models have two properties: `model` and `description`. `model` specifies the computation to be performed, and `description` holds additional information to be included in the result.
+```
+model:
+description:
+```
+##### Object keys
+Each object that is not a collection has a set of predefined keys (required or optional) that can be used for computation. For example, the `diplotype_model` object has a required `diplotypes` key.
+```
+diplotype_model:
+  diplotypes:
+```
+The computation is first delegated to the objects specified by the keys, and the results are then aggregated by the top-level object itself.
+##### Collections
+There is a special category of objects that have no predefined keys and act as collections. Each key within a collection becomes an element of that collection. Collections are easy to recognize because their names are plural, like `diplotypes` or `variants`. Each element is defined as a singular object of the collection's type. For example, each key in a `variants` collection becomes an object of the `variant` type.
+```
+      variants:
+        rs7041: {diplotype: C/C}
+        rs4588: {diplotype: T/T}
+```
+##### Variants
+Variants can be identified by rsid. The value of a variant is computed from the information provided: `diplotype` or `effect_allele`.
+Accepted sets of fields are:
+- diplotypes
+    - `diplotype`
+    - `symbol`
+- score
+    - `effect_allele`
+    - `effect_size`
+    - `symbol`
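For the `score` field set, one common way to turn a genotype into a value is additive dosage: count the copies of `effect_allele` and multiply by `effect_size`. The sketch below assumes that convention; it is not confirmed as the package's exact computation, and `variant_score` and `model_score` are hypothetical helper names.

```python
from typing import Dict


def variant_score(genotype: str, spec: Dict) -> float:
    """Effect-allele dosage (0, 1 or 2) times effect size.

    `genotype` is a VCF-style allele pair such as "G/T" or "T|T";
    `spec` is one entry of a `variants` collection.
    """
    alleles = genotype.replace("|", "/").split("/")
    dosage = alleles.count(spec["effect_allele"])
    return dosage * spec["effect_size"]


def model_score(genotypes: Dict[str, str], variants: Dict[str, Dict]) -> float:
    """Sum of per-variant scores over the variants present in `genotypes`."""
    return sum(
        variant_score(genotypes[rsid], spec)
        for rsid, spec in variants.items()
        if rsid in genotypes
    )


# Illustrative variants and genotypes; the numbers are made up for the example.
variants = {
    "rs1": {"effect_allele": "T", "effect_size": 0.5},
    "rs2": {"effect_allele": "A", "effect_size": -0.25},
}
genotypes = {"rs1": "G/T", "rs2": "A|A"}
total = model_score(genotypes, variants)
```

Here `rs1` contributes one copy of `T` and `rs2` two copies of `A`, so the total is `0.5 - 0.5`.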
+### Model types
+Four model types are currently implemented:
+- `score_model`
+- `diplotype_model`
+- `haplotype_model`
+- `formula_model`
+The type of model can be specified at the top of the yml structure or within the `model` field.
+##### Specification of model type at the top of yml structure
+```
+diplotype_model:
+description:
+```
+##### Specification of model type within the `model` field
+```
+model:
+  diplotype_model:
+description:
+```
+### Parameters
+External parameters can be used in a `formula_model` through the `@parameters` keyword.
+Example parameters file in `.json` format:
+```
+{"sex": "F"}
+```
+The path to the file can be provided as an argument to the polygenic tool:
+```
+--parameters /path/to/parameters.json
+```
+Example of use of parameters in the `formula_model`:
+```
+formula_model:
+  formula:
+    value: "@female.score_model.value if @parameters.sex == 'F' else @male.score_model.value"
+  male:
+    score_model:
+      variants:
+        ...
+  female:
+    score_model:
+      variants:
+```
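How an `@`-reference might be resolved can be sketched as below. This is an assumption-laden illustration: the helper names, the nested-dict context, and the substitution strategy are hypothetical, and the only grounded detail is that formulas are Python expressions (the changelog mentions `eval` and the `math` library).

```python
import math
import re
from typing import Any, Dict

# Matches references such as @parameters.sex or @female.score_model.value
REFERENCE = re.compile(r"@([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)")


def resolve(path: str, context: Dict[str, Any]) -> Any:
    """Walk a dotted path through nested dictionaries."""
    value: Any = context
    for key in path.split("."):
        value = value[key]
    return value


def evaluate(formula: str, context: Dict[str, Any]) -> Any:
    """Replace each @reference with a local variable, then evaluate.

    eval() is acceptable only for trusted model files; builtins are removed
    and just `math` is exposed.
    """
    local_values: Dict[str, Any] = {}

    def substitute(match: "re.Match[str]") -> str:
        name = f"_v{len(local_values)}"
        local_values[name] = resolve(match.group(1), context)
        return name

    expression = REFERENCE.sub(substitute, formula)
    return eval(expression, {"__builtins__": {}, "math": math}, local_values)


formula = ("@female.score_model.value if @parameters.sex == 'F' "
           "else @male.score_model.value")
context = {
    "parameters": {"sex": "F"},            # contents of parameters.json
    "female": {"score_model": {"value": 1.25}},   # made-up sub-model results
    "male": {"score_model": {"value": -0.5}},
}
result = evaluate(formula, context)
```

With `{"sex": "F"}` the expression selects the female sub-model's value; changing the parameter to any other value selects the male one.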
+## Example models
+### Example diplotype model
+This example diplotype model is based on [Randolph 2014](https://pubmed.ncbi.nlm.nih.gov/24447085/).
+```
+diplotype_model:
+  diplotypes:
+    1/1:
+      variants:
+        rs7041: {diplotype: C/C}
+        rs4588: {diplotype: T/T}
+    1/1s:
+      variants:
+        rs7041: {diplotype: C/C}
+        rs4588: {diplotype: T/G}
+    1/1f:
+      variants:
+        rs7041: {diplotype: C/A}
+        rs4588: {diplotype: T/G}
+    1/2:
+      variants:
+        rs7041: {diplotype: C/A}
+        rs4588: {diplotype: T/T}
+    1s/1s:
+      variants:
+        rs7041: {diplotype: C/C}
+        rs4588: {diplotype: G/G}
+    1s/1f:
+      variants:
+        rs7041: {diplotype: C/A}
+        rs4588: {diplotype: G/G}
+    1s/2:
+      variants:
+        rs7041: {diplotype: C/A}
+        rs4588: {diplotype: G/T}
+    1f/1f:
+      variants:
+        rs7041: {diplotype: A/A}
+        rs4588: {diplotype: G/G}
+    1f/2:
+      variants:
+        rs7041: {diplotype: A/A}
+        rs4588: {diplotype: G/T}
+    2/2:
+      variants:
+        rs7041: {diplotype: A/A}
+        rs4588: {diplotype: T/T}
+description:
+  pmid: 24447085
+  genes: [GC]
+  result_diplotype_choice:
+    1/1: Moderate
+    1/1s: High
+    1/1f: High
+    1/2: Low
+    1s/1s: Very high
+    1s/1f: Very high
+    1s/2: Moderate
+    1f/1f: Very high
+    1f/2: Moderate
+    2/2: Very low
+```
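A lookup over such a model can be sketched as follows. The comparison here ignores allele order, which is an assumption and not confirmed package behavior; `matching_diplotypes` is a hypothetical helper and the model is a three-entry subset of the example above.

```python
from typing import Dict, List, Tuple


def normalize(genotype: str) -> Tuple[str, ...]:
    """Order-insensitive view of a genotype: "T/G" and "G|T" compare equal."""
    return tuple(sorted(genotype.replace("|", "/").split("/")))


def matching_diplotypes(model: Dict, genotypes: Dict[str, str]) -> List[str]:
    """Names of all diplotypes whose variants all match the sample genotypes."""
    matches = []
    for name, entry in model["diplotypes"].items():
        if all(
            rsid in genotypes
            and normalize(genotypes[rsid]) == normalize(spec["diplotype"])
            for rsid, spec in entry["variants"].items()
        ):
            matches.append(name)
    return matches


# Subset of the README model, written as the dict a YAML loader would produce.
model = {
    "diplotypes": {
        "1/1f": {"variants": {"rs7041": {"diplotype": "C/A"},
                              "rs4588": {"diplotype": "T/G"}}},
        "1s/2": {"variants": {"rs7041": {"diplotype": "C/A"},
                              "rs4588": {"diplotype": "G/T"}}},
        "2/2": {"variants": {"rs7041": {"diplotype": "A/A"},
                             "rs4588": {"diplotype": "T/T"}}},
    }
}
choices = {"1/1f": "High", "1s/2": "Moderate", "2/2": "Very low"}

homozygous = matching_diplotypes(model, {"rs7041": "A/A", "rs4588": "T/T"})
double_het = matching_diplotypes(model, {"rs7041": "A/C", "rs4588": "G/T"})
```

Under order-insensitive comparison, `1/1f` and `1s/2` cannot be told apart, because their definitions differ only in the allele order at rs4588. Separating them would need the allele order (phase) to be taken into account.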
+### Example haplotype model
+Haplotype model can be used for HLA and PGx.
+To define a haplotype model, a list of alleles is required (called `variants` here, for consistency with the other model types). Each allele has an associated list of defining mutations (alternative SNV alleles), each identified by a gnomAD ID and carrying `ref`, `alt` and `effect_allele` properties. One star allele should be empty (containing only reference SNV alleles). The algorithm uses any phasing information present in the VCF.
+```
+haplotype_model:
+  variants:
+    CYP2D6*1.001:
+    CYP2D6*1.002:
+      22-42126963-C-T: {ref: "C", alt: "T", effect_allele: "T"}
+    CYP2D6*1.003:
+      22-42128813-G-A: {ref: "G", alt: "A", effect_allele: "A"}
+    CYP2D6*1.004:
+      22-42128216-G-T: {ref: "G", alt: "T", effect_allele: "T"}
+    CYP2D6*1.005:
+      22-42128922-A-G: {ref: "A", alt: "G", effect_allele: "G"}
+    CYP2D6*1.006:
+      22-42129726-A-C: {ref: "A", alt: "C", effect_allele: "C"}
+      22-42129950-A-C: {ref: "A", alt: "C", effect_allele: "C"}
+      22-42130482-C-A: {ref: "C", alt: "A", effect_allele: "A"}
+```
+For copy-number star alleles (CYP2D6 `*5`/`*1xN`, CYP2C19 `*36`/`*37`) the model gains a
+`copy_number:` block and `structural:` haplotypes — see **[docs/pgx-cnv.md](docs/pgx-cnv.md)**
+for the VCF contract and YAML schema.
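The variant keys in the haplotype model example follow a `chrom-pos-ref-alt` pattern, so they can be split and cross-checked against the `ref`, `alt` and `effect_allele` properties. The sketch below is a hypothetical validation helper, not part of the package.

```python
from typing import Dict, NamedTuple


class VariantKey(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant_id(variant_id: str) -> VariantKey:
    """Split an id such as "22-42126963-C-T" into its four fields."""
    chrom, pos, ref, alt = variant_id.split("-")
    return VariantKey(chrom, int(pos), ref, alt)


def is_consistent(variant_id: str, spec: Dict[str, str]) -> bool:
    """Check that the properties agree with the alleles encoded in the id."""
    key = parse_variant_id(variant_id)
    return (
        key.ref == spec["ref"]
        and key.alt == spec["alt"]
        and spec["effect_allele"] in (key.ref, key.alt)
    )


key = parse_variant_id("22-42126963-C-T")
ok = is_consistent("22-42126963-C-T", {"ref": "C", "alt": "T", "effect_allele": "T"})
bad = is_consistent("22-42126963-C-T", {"ref": "C", "alt": "G", "effect_allele": "G"})
```

A check like this catches copy-and-paste mistakes where the key and the properties of a defining mutation drift apart.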
+### Example score model with categories rescaling
+```
+score_model:
+  variants:
+    rs10012: {effect_allele: G, effect_size: 0.369215857410143}
+    rs1014971: {effect_allele: T, effect_size: 0.075546961392531}
+    rs10936599: {effect_allele: C, effect_size: 0.086359830674748}
+    rs11892031: {effect_allele: C, effect_size: -0.552841968657781}
+    rs1495741: {effect_allele: A, effect_size: 0.05307844348342}
+    rs17674580: {effect_allele: C, effect_size: 0.187520720836463}
+    rs2294008: {effect_allele: T, effect_size: 0.08278537031645}
+    rs798766: {effect_allele: T, effect_size: 0.093421685162235}
+    rs9642880: {effect_allele: G, effect_size: 0.093421685162235}
+  categories:
+    High risk: {from: 1.371624087, to: 2.581880425, scale_from: 2, scale_to: 3}
+    Potential risk: {from: 1.169616034, to: 1.371624087, scale_from: 1, scale_to: 2}
+    Average risk: {from: -0.346748358, to: 1.169616034, scale_from: 0, scale_to: 1}
+    Low risk: {from: -1.657132197, to: -0.346748358, scale_from: -1, scale_to: 0}
+description:
+  about:
+  genes: []
+  result_statement_choice:
+    Average risk: Avg
+    Potential risk: Pot
+    High risk: Hig
+    Low risk: Low
+  science_behind_the_test:
+  test_type: Polygenic Risk Score
+  trait: Breast cancer
+  trait_authors:
+    - taken from the PGS catalog
+  trait_copyright: Intelliseq all rights reserved
+  trait_explained: None
+  trait_heritability: None
+  trait_pgs_id: PGS000001
+  trait_pmids:
+    - 25855707
+  trait_snp_heritability: None
+  trait_title: Breast_Cancer
+  trait_version: 1.0
+  what_you_can_do_choice:
+    Average risk:
+    High risk:
+    Low risk:
+  what_your_result_means_choice:
+    Average risk:
+    High risk:
+    Low risk:
+ ```
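One plausible reading of the `categories` block is a piecewise-linear rescaling: find the interval `[from, to]` that contains the raw score and map it linearly onto `[scale_from, scale_to]`. The sketch below assumes that interpretation; the helper name `rescale` and the handling of boundaries (first matching category wins) are assumptions.

```python
from typing import Dict, Optional, Tuple

# Category boundaries copied from the example model above.
CATEGORIES = {
    "High risk": {"from": 1.371624087, "to": 2.581880425, "scale_from": 2, "scale_to": 3},
    "Potential risk": {"from": 1.169616034, "to": 1.371624087, "scale_from": 1, "scale_to": 2},
    "Average risk": {"from": -0.346748358, "to": 1.169616034, "scale_from": 0, "scale_to": 1},
    "Low risk": {"from": -1.657132197, "to": -0.346748358, "scale_from": -1, "scale_to": 0},
}


def rescale(score: float,
            categories: Dict[str, Dict[str, float]]) -> Tuple[Optional[str], Optional[float]]:
    """Return (category name, rescaled score), or (None, None) if out of range."""
    for name, c in categories.items():
        if c["from"] <= score <= c["to"]:
            span = c["to"] - c["from"]
            fraction = (score - c["from"]) / span if span else 0.0
            scaled = c["scale_from"] + fraction * (c["scale_to"] - c["scale_from"])
            return name, scaled
    return None, None


category, scaled = rescale(0.0, CATEGORIES)
boundary = rescale(1.169616034, CATEGORIES)
outside = rescale(5.0, CATEGORIES)
```

A raw score of `0.0` falls into `Average risk` and lands between 0 and 1 on the rescaled axis; a score outside every interval yields no category.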
+### Example formula model
+```
+formula_model:
+  formula:
+    brownexp: "math.exp(@brown.score_model.value - 2.0769)"
+    redexp: "math.exp(@red.score_model.value - 6.3953)"
+    blackexp: "math.exp(@black.score_model.value - 2.4029)"
+    sumexp: "@brownexp + @redexp + @blackexp"
+    brown_prob: "@brownexp / (1 + @sumexp)"
+    red_prob: "@redexp / (1 + @sumexp)"
+    black_prob: "@blackexp / (1 + @sumexp)"
+    blonde_prob: "1 - (@brown_prob + @red_prob + @black_prob)"
+  brown:
+    score_model:
+      variants:
+        rs796296176: {effect_allele: CA, effect_size: 1.2522}
+        rs11547464: {effect_allele: A, effect_size: -0.61155}
+        rs885479: {effect_allele: T, effect_size: 0.2937}
+        rs1805008: {effect_allele: T, effect_size: -0.50143}
+        rs1805005: {effect_allele: T, effect_size: 0.21172}
+        rs1805006: {effect_allele: A, effect_size: 1.9293}
+        rs1805007: {effect_allele: T, effect_size: -0.32318}
+        rs1805009: {effect_allele: C, effect_size: 0.60861}
+        rs1805009: {effect_allele: A, effect_size: 0.25624}
+        rs2228479: {effect_allele: A, effect_size: -0.054143}
+        rs1110400: {effect_allele: C, effect_size: -0.56315}
+        rs28777: {effect_allele: C, effect_size: 0.52168}
+        rs16891982: {effect_allele: C, effect_size: 0.75284}
+        rs12821256: {effect_allele: G, effect_size: -0.34957}
+        rs4959270: {effect_allele: A, effect_size: -0.19171}
+        rs12203592: {effect_allele: T, effect_size: 1.6475}
+        rs1042602: {effect_allele: T, effect_size: 0.16092}
+        rs1800407: {effect_allele: A, effect_size: -0.19111}
+        rs2402130: {effect_allele: G, effect_size: 0.35821}
+        rs12913832: {effect_allele: T, effect_size: 1.214}
+        rs2378249: {effect_allele: C, effect_size: 0.12669}
+        rs683: {effect_allele: C, effect_size: 0.21172}
+  red:
+    score_model:
+      variants:
+        rs796296176: {effect_allele: CA, effect_size: 25.508}
+        rs11547464: {effect_allele: A, effect_size: 2.5381}
+        rs885479: {effect_allele: T, effect_size: -0.20889}
+        rs1805008: {effect_allele: T, effect_size: 2.801}
+        rs1805005: {effect_allele: T, effect_size: 0.93493}
+        rs1805006: {effect_allele: A, effect_size: 3.65}
+        rs1805007: {effect_allele: T, effect_size: 3.4408}
+        rs1805009: {effect_allele: C, effect_size: 4.5868}
+        rs1805009: {effect_allele: A, effect_size: 22.107}
+        rs2228479: {effect_allele: A, effect_size: 0.62307}
+        rs1110400: {effect_allele: C, effect_size: 1.4453}
+        rs28777: {effect_allele: C, effect_size: 0.70401}
+        rs16891982: {effect_allele: C, effect_size: -0.41869}
+        rs12821256: {effect_allele: G, effect_size: -0.57964}
+        rs4959270: {effect_allele: A, effect_size: 0.24861}
+        rs12203592: {effect_allele: T, effect_size: 0.90233}
+        rs1042602: {effect_allele: T, effect_size: 0.45003}
+        rs1800407: {effect_allele: A, effect_size: -0.27606}
+        rs2402130: {effect_allele: G, effect_size: 0.28313}
+        rs12913832: {effect_allele: T, effect_size: -0.093776}
+        rs2378249: {effect_allele: C, effect_size: 0.76634}
+        rs683: {effect_allele: C, effect_size: -0.053427}
+  black:
+    score_model:
+      variants:
+        rs796296176: {effect_allele: CA, effect_size: 2.732}
+        rs11547464: {effect_allele: A, effect_size: -16.969}
+        rs885479: {effect_allele: T, effect_size: 0.39983}
+        rs1805008: {effect_allele: T, effect_size: -0.86062}
+        rs1805005: {effect_allele: T, effect_size: -0.0029013}
+        rs1805006: {effect_allele: A, effect_size: -16.088}
+        rs1805007: {effect_allele: T, effect_size: -1.3757}
+        rs1805009: {effect_allele: C, effect_size: 0.060631}
+        rs1805009: {effect_allele: A, effect_size: 3.9824}
+        rs2228479: {effect_allele: A, effect_size: 0.17012}
+        rs1110400: {effect_allele: C, effect_size: 0.29143}
+        rs28777: {effect_allele: C, effect_size: 0.82228}
+        rs16891982: {effect_allele: C, effect_size: 1.1617}
+        rs12821256: {effect_allele: G, effect_size: -0.89824}
+        rs4959270: {effect_allele: A, effect_size: -0.36359}
+        rs12203592: {effect_allele: T, effect_size: 1.997}
+        rs1042602: {effect_allele: T, effect_size: 0.065432}
+        rs1800407: {effect_allele: A, effect_size: -0.49601}
+        rs2402130: {effect_allele: G, effect_size: 0.26536}
+        rs12913832: {effect_allele: T, effect_size: 1.9391}
+        rs2378249: {effect_allele: C, effect_size: -0.089509}
+        rs683: {effect_allele: C, effect_size: 0.15796}
+description:
+  name: HirisPlex
+```
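The `formula` block above is a multinomial logistic model with blonde as the reference class. Written out in plain Python, with the offsets copied from the formulas and the sub-model scores supplied by the caller, it looks like this; the function name and the input dictionary are hypothetical.

```python
import math
from typing import Dict

# Offsets subtracted inside math.exp(...) in the formula block above.
OFFSETS = {"brown": 2.0769, "red": 6.3953, "black": 2.4029}


def hair_colour_probabilities(scores: Dict[str, float]) -> Dict[str, float]:
    """Mirror brownexp/redexp/blackexp, sumexp and the four *_prob formulas."""
    exps = {colour: math.exp(scores[colour] - OFFSETS[colour]) for colour in OFFSETS}
    denominator = 1.0 + sum(exps.values())          # 1 + @sumexp
    probs = {colour: value / denominator for colour, value in exps.items()}
    probs["blonde"] = 1.0 - sum(probs.values())     # remaining probability mass
    return probs


# When every score equals its offset, each exponent is 0 and each exp(...) is 1,
# so the four classes share the probability equally.
probs = hair_colour_probabilities(dict(OFFSETS))
```

Because the denominator is `1 + sumexp`, the four probabilities always sum to 1, and larger sub-model scores shift mass away from the blonde reference class.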
+### Description
+### Model keys glossary
+- `model` - generic model that can aggregate results of other model types
+- `diplotype_model`
+    Required keys:
+    - `diplotypes`
+- `description` - all properties to be included in the final results
+### Usecases
+#### PGX
+```
+python3 -m pip install polygenic
+pgstk pgs-compute --vcf [PATH_TO_VCF_GZ] --model cyp2d6-pharmvar.yml --print | jq .haplotype_model.haplotypes.match
+```
+## License
+Proprietary (contact@intelliseq.pl)
+## Updates
+### 2.5.0
+- FEATURE: copy-number (CNV) star-allele calling for CYP2D6 (`*5`, `*1xN`) and CYP2C19 (`*36`/`*37`). `polygenic` consumes copy number resolved by an upstream caller (symbolic `<DEL>`/`<DUP>` ALT + range + `FORMAT/CN` + phase) and emits non-diploid calls (`*5/*5`, `*1/*5`, `*1/*4x2`). It does no depth/ratio math — allele-specific duplication side is read from phase; unphased input yields `*a/*b (CNx)` + `allele_specific_unphased`. See [docs/pgx-cnv.md](docs/pgx-cnv.md).
+- FEATURE: `haplotype_model` gains a `copy_number:` block, `structural:` haplotype definitions (`scope: whole|partial`), and a CYP2D6-only `multiplication:` block.
+- TEST: structural VcfRecord unit tests and CNV deletion/duplication/regression integration tests (`test_vcfrecord_structural.py`, `test_haplotypemodel_cnv.py`).
+### 2.4.0
+- FEATURE: `--ref-fallback` flag (opt-in; default off). `./.` genotypes now stay `source="missing"` by default — matches PharmCAT behavior and fixes spurious CYP2C19*38 calls on panels that don't assay rs3758581.
+- FEATURE: `--top-n` flag (default 15). Haplotype caller returns a call when either the match ≥50% OR the scored candidate pool is ≤top_n, enabling calls on sparse panels without ambiguity risk.
+- BUG: ScoreModel.compute no longer KeyErrors on `adjusted_score` when a variant has no `af` field (model.py:549).
+- BUG: Diplotype QC now correctly counts variant sources (previously miscounted every variant as "missing" due to structural mismatch in compute_qc).
+- BUG: ScoreModel handles models without `args` section (previously AttributeError on `self.get("args").get("prevalence")`).
+- TEST: Added PGx truth-set integration tests against 1000 Genomes phased GRCh38 slices (NA12878, NA18507, NA19240 × {CYP2C19, CYP2C9, CYP2B6, CYP2D6, SLCO1B1}); rebuildable via `scripts/build_pgx_fixtures.py`.
+- TEST: Added PharmVar-named haplotype tests and NA18507 CYP2B6/SLCO1B1 regression tests.
+### 2.3.17
+- BUG: resolved bug with low weight for missing genotypes
+### 2.3.16
+- BUG: resolved bug with weight of genotypes in haplotypes
+### 2.3.15
+- BUG: resolved bug with not enough haplotypes to check
+### 2.3.14
+- BUG: resolved bug with wrong leftover genotypes
+### 2.3.12
+- FEATURE: added gene names to genotypes if available
+### 2.3.11
+- BUG: resolved bug with wrong genotype sources counts
+### 2.3.10
+- BUG: resolved bug with missing genotype sources counts
+### 2.3.9
+- FEATURE: add reference as a genotyping source
+### 2.3.8
+- BUG: resolved bugs inside mobigen wdl task
+### 2.3.7
+- FEATURE: added ldproxy imputation source
+### 2.3.6
+- BUG: resolved bug with missing polars package after installation
+### 2.3.5
+- BUG: resolved bug with 'type' object is not subscriptable running pgstk
+### 2.3.4
+- BUG: resolved bug where model does not provide to or from category fields
+### 2.3.3
+- BUG: resolved bug with missing pyarrow package after installation
+### 2.3.2
+- BUG: renamed jpg to jpeg outputs from vcfstat
+### 2.3.1
+- BUG: resolved bug with missing importlib-resources package after installation
+### 2.3.0
+- FEATURE: added vcf stat tool for zygosities
+- FEATURE: added vcf stat tool for baf computation
+### 2.2.15
+- UPDATE: updated parsing for new version of pan biobankuk
+- DEV: updated numpy version to 1.23.4
+### 2.2.14
+- FEATURE: added module for ldproxy imputing
+- FEATURE: added option for merging output as an array instead of dictionary in pgs-compute
+### 2.2.13
+- BUG: resolved bug with missing score in haplotype model
+- DEV: cleaned up test resources
+### 2.2.12
+- BUG: resolved bug with empty argument in executable
+### 2.2.11
+- BUG: resolved bug with naming of multiple models in one file
+### 2.2.10
+- DOC: improved diplotype model documentation
+### 2.2.9
+- BUG: missing effect allele in diplotype models
+### 2.2.8
+- BUG: imputed source is based on IMP tag in the INFO field or GT:DS in format field
+### 2.2.7
+- BUG: repaired bug with missing math library in eval
+### 2.2.6
+- FEATURE: added qc to model results
+### 2.2.5
+- ENHANCEMENT: libraries updates
+### 2.2.0
+- ENHANCEMENT: better computing of haplotype models. The first haplotype is identified, then the second haplotype is identified from the leftover genotypes
+- ENHANCEMENT: moved argparse from tools to pgstk
+### 2.1.10
+- BUG: resolved bug with wrong plink.clumped path in clumping
+### 2.1.9
+- BUG: resolved bug with missing index in biobankuk model
+### 2.1.8
+- BUG: resolved bug with biobankuk model for codenames with special characters
+### 2.1.7
+- BUG: resolved bug with haplotype model where none of haplotypes matched genotype. Most probable genotype is provided
+### 2.1.6
+- DOC: added docker badges
+- FEATURE: added possibility to output all pgs results in one json file `--merge-outputs`
+- FEATURE: added category to diplotype model
+- FEATURE: added caching in genotyping module
+### 2.1.5
+- BUG: biobankuk model output files now contain only alphanumeric characters
+- BUG: biobankuk model code names with special characters are now being downloaded
+### 2.1.4
+- FEATURE: added model_name and sample_name to description
+### 2.1.3
+- FEATURE: added support for multiple models in pgs-compute
+- FEATURE: added missing variants count to haplotype in haplotype model
+- BUG: id field in haplotype model
+### 2.1.2
+- FEATURE: allow gnomadid for variant in yml models
+- FEATURE: added printing output option in pgs-compute
+### 2.1.1
+- BUG: resolved NoneType bug with empty haplotype
+### 2.1.0
+- FEATURE: haplotype model now works with phased data
+### 2.0.0
+- FEATURE: switched to yaml model definitions
+- FEATURE: implemented formula, score, haplotype and diplotype model types
+- FEATURE: added gene symbols to description
+- DEVOPS: prepared docker image with resources for building models

polygenic_pgx-2.5.0/polygenic/__init__.py ADDED

File without changes

polygenic_pgx-2.5.0/polygenic/__main__.py ADDED

@@ -0,0 +1,12 @@
+# -*- coding: utf-8 -*-
+"""
+Entrypoint module
+"""
+import sys
+from polygenic.pgstk import main
+if __name__ == "__main__":
+    main(sys.argv[1:])

polygenic_pgx-2.5.0/polygenic/data/__init__.py ADDED

File without changes