repare 0.1.4__py3-none-any.whl
- repare/__init__.py +4 -0
- repare/main.py +89 -0
- repare/pedigree.py +1484 -0
- repare/pedigree_reconstructor.py +1186 -0
- repare-0.1.4.dist-info/METADATA +143 -0
- repare-0.1.4.dist-info/RECORD +10 -0
- repare-0.1.4.dist-info/WHEEL +5 -0
- repare-0.1.4.dist-info/entry_points.txt +2 -0
- repare-0.1.4.dist-info/licenses/LICENSE +7 -0
- repare-0.1.4.dist-info/top_level.txt +1 -0
@@ -0,0 +1,143 @@
Metadata-Version: 2.4
Name: repare
Version: 0.1.4
Summary: Reconstruct (ancient) pedigrees from pairwise kinship relations.
Author-email: Edward Huang <edwardhuangc@gmail.com>
License-Expression: MIT
Project-URL: Source, https://github.com/ehuangc/repare
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: matplotlib
Requires-Dist: networkx
Requires-Dist: pandas
Requires-Dist: tqdm
Provides-Extra: benchmark
Requires-Dist: scikit-learn; extra == "benchmark"
Requires-Dist: seaborn; extra == "benchmark"
Provides-Extra: plot
Requires-Dist: pygraphviz; extra == "plot"
Dynamic: license-file

🌲 **repare** is a Python package for (ancient) pedigree reconstruction.

## Installation

### Recommended
```
curl -O https://raw.githubusercontent.com/ehuangc/repare/main/repare-environment.yml
conda env create -f repare-environment.yml
conda activate repare
```
repare uses PyGraphviz to plot reconstructed pedigrees. Since PyGraphviz relies on Graphviz, which cannot be installed using `pip`, we recommend installing repare and its dependencies in a fresh conda environment, as shown above. This conda-based installation method automatically installs Graphviz and ensures PyGraphviz is linked to it.

To install conda, see [this page](https://www.anaconda.com/docs/getting-started/miniconda/install).

### Other
If you don't need to plot reconstructed pedigrees, you can install repare directly with `pip install repare`. If you need to plot reconstructed pedigrees but have your own Graphviz installation, you can install repare and PyGraphviz (but not Graphviz) with `pip install repare[plot]`.

To install PyGraphviz and Graphviz yourself, see [this page](https://pygraphviz.github.io/documentation/stable/install.html).

## Usage

We recommend running repare through its command-line interface.
```
repare -n NODES -r RELATIONS [-o OUTPUT] [-m MAX_CANDIDATE_PEDIGREES] [-e EPSILON] [-s SEED] [-d] [-w] [-v]
```
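
When repare is driven from a script, the synopsis above can be assembled as an argument list. The following is a minimal sketch, not part of repare: the helper name `build_repare_command` and its keyword names are hypothetical, and only the flags shown in the synopsis are used.

```python
from __future__ import annotations


def build_repare_command(
    nodes: str,
    relations: str,
    output: str | None = None,
    max_candidate_pedigrees: int | None = None,
    epsilon: float | None = None,
    seed: int | None = None,
    do_not_plot: bool = False,
    write_alternates: bool = False,
    verbose: bool = False,
) -> list[str]:
    """Assemble an argv list from the flags in the CLI synopsis (hypothetical helper)."""
    cmd = ["repare", "-n", nodes, "-r", relations]
    # Value options are added only when set, so repare's own defaults apply otherwise.
    for flag, value in (
        ("-o", output),
        ("-m", max_candidate_pedigrees),
        ("-e", epsilon),
        ("-s", seed),
    ):
        if value is not None:
            cmd += [flag, str(value)]
    # Boolean switches are appended only when enabled.
    for flag, enabled in (("-d", do_not_plot), ("-w", write_alternates), ("-v", verbose)):
        if enabled:
            cmd.append(flag)
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with file paths.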

> [!NOTE]
> Minimal command:
> ```
> repare -n nodes.csv -r relations.csv
> ```
> For example data inputs, see [examples/nodes.csv](https://github.com/ehuangc/repare/blob/main/examples/nodes.csv) and [examples/relations.csv](https://github.com/ehuangc/repare/blob/main/examples/relations.csv).

### Inputs
**Nodes** (-n) (*required*): Path to a CSV file that contains information about the individuals to be analyzed by repare.

<dl>
<dd>
<details open>
<summary><ins>Nodes CSV file columns</ins></summary>

- **id** *(required)*: ID of individual. Cannot be fully numeric, as numeric IDs are reserved for placeholder nodes.
- **sex** *(required)*: Genetic sex of individual. Value must be "M" or "F".
- **y_haplogroup** *(required)*: Y chromosome haplogroup of individual. Can include "*" as a wildcard expansion character at the end if haplogroup is not fully inferred.
- **mt_haplogroup** *(required)*: Mitochondrial haplogroup of individual. Can include "*" as a wildcard expansion character at the end if haplogroup is not fully inferred.
- **can_have_children** *(optional)*: Whether the individual *can* have offspring (e.g., as indicated by age of death). If provided, value must be "True" or "False". Defaults to "True".
- **can_be_inbred** *(optional)*: Whether the individual *can* have parents related at the 3rd-degree or closer (e.g., as indicated by ROH). If provided, value must be "True" or "False". Defaults to "True".
- **years_before_present** *(optional)*: (Approximate) date of birth of individual, in years before present. If provided, will be used to prune temporally invalid pedigrees. *This column should only be used when backed by strong dating evidence.*
</details>
</dd>
</dl>
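
The column rules above can be checked before running repare. The sketch below is an independent pre-flight check, not repare's own validation: the function name `check_nodes_csv` is hypothetical, an empty optional cell is assumed to mean "not provided", and `str.isdigit` is assumed to be an adequate test for a "fully numeric" ID.

```python
import csv
import io

REQUIRED_COLUMNS = ("id", "sex", "y_haplogroup", "mt_haplogroup")
BOOLEAN_COLUMNS = ("can_have_children", "can_be_inbred")


def check_nodes_csv(text: str) -> list[str]:
    """Return human-readable problems found in the text of a nodes CSV."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = [f"missing column: {name}" for name in REQUIRED_COLUMNS if name not in header]
    if problems:
        return problems
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        node_id = row["id"]
        # Numeric IDs are reserved for placeholder nodes.
        if not node_id or node_id.isdigit():
            problems.append(f"row {row_number}: id must not be empty or fully numeric")
        if row["sex"] not in ("M", "F"):
            problems.append(f'row {row_number}: sex must be "M" or "F"')
        for column in BOOLEAN_COLUMNS:
            # Assumption: a missing or empty cell falls back to the default.
            if row.get(column) not in (None, "", "True", "False"):
                problems.append(f'row {row_number}: {column} must be "True" or "False"')
    return problems
```

For a file on disk, `check_nodes_csv(open("nodes.csv", newline="").read())` returns an empty list when none of these rules is violated. Haplogroup values and `years_before_present` are not inspected here.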

**Relations** (-r) (*required*): Path to a CSV file that contains information about inferred pairwise kinship relations. Methods to infer these kinship relations include [KIN](https://doi.org/10.1186/s13059-023-02847-7) and [READv2](https://doi.org/10.1186/s13059-024-03350-3). All individuals included in this file must be specified in the nodes CSV.

<dl>
<dd>
<details open>
<summary><ins>Relations CSV file columns</ins></summary>

- **id1** *(required)*: ID of individual 1.
- **id2** *(required)*: ID of individual 2.
- **degree** *(required)*: Degree of (inferred) kinship relation between individual 1 and individual 2. Value must be "1", "2", or "3". Higher-degree relatives are considered unrelated.
- **constraints** *(optional)*: Semicolon-delimited list of possible configurations of kinship relation. For example, a parental 1st-degree relation can be constrained with "parent-child;child-parent". Many kinship inference methods will classify 1st-degree relation types, which can be used as relation constraints. Valid constraints: "parent-child", "child-parent", "siblings", "maternal aunt/uncle-nephew/niece", "maternal nephew/niece-aunt/uncle", "paternal aunt/uncle-nephew/niece", "paternal nephew/niece-aunt/uncle", "maternal grandparent-grandchild", "maternal grandchild-grandparent", "paternal grandparent-grandchild", "paternal grandchild-grandparent", "maternal half-siblings", "paternal half-siblings", "double cousins".
- **force_constraints** *(optional)*: Whether the corresponding constraint should be forced. If provided, value must be "True" or "False". If "True", the constraint must be followed. If "False", breaking the constraint counts as one inconsistency. Defaults to "False".
</details>
</dd>
</dl>
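
A single relations row can be checked against the rules above in the same way. This is a sketch under stated assumptions rather than repare's parser: `check_relation` is a hypothetical helper, whitespace around each semicolon-delimited constraint is assumed to be insignificant, and the constraint strings are copied from the list above.

```python
ALLOWED_CONSTRAINTS = {
    "parent-child",
    "child-parent",
    "siblings",
    "maternal aunt/uncle-nephew/niece",
    "maternal nephew/niece-aunt/uncle",
    "paternal aunt/uncle-nephew/niece",
    "paternal nephew/niece-aunt/uncle",
    "maternal grandparent-grandchild",
    "maternal grandchild-grandparent",
    "paternal grandparent-grandchild",
    "paternal grandchild-grandparent",
    "maternal half-siblings",
    "paternal half-siblings",
    "double cousins",
}


def check_relation(row: dict[str, str], known_ids: set[str]) -> list[str]:
    """Return problems for one relations row; known_ids are the IDs in the nodes CSV."""
    problems = []
    for column in ("id1", "id2"):
        if row.get(column) not in known_ids:
            problems.append(f"{column} is not listed in the nodes CSV")
    if row.get("degree") not in ("1", "2", "3"):
        problems.append('degree must be "1", "2", or "3"')
    # Split the semicolon-delimited list and ignore empty entries.
    constraints = row.get("constraints") or ""
    for item in (part.strip() for part in constraints.split(";")):
        if item and item not in ALLOWED_CONSTRAINTS:
            problems.append(f"unknown constraint: {item}")
    if (row.get("force_constraints") or "False") not in ("True", "False"):
        problems.append('force_constraints must be "True" or "False"')
    return problems
```

The sketch does not check that a constraint is plausible for the stated degree (for example, "parent-child" on a 3rd-degree relation). Running `repare --print-allowed-constraints` remains the authoritative source for the allowed strings.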

**Output** (-o) (*optional*): Path to directory for saving repare outputs. Defaults to the current working directory.

**Max Candidate Pedigrees** (-m) (*optional*): Maximum number of candidate pedigrees to keep after each algorithm iteration. Defaults to 1000.

**Epsilon** (-e) (*optional*): Parameter for adapted epsilon-greedy sampling at the end of each algorithm iteration. Defaults to 0.2.

**Seed** (-s) (*optional*): Random seed for reproducibility. Defaults to 42.

**Do Not Plot** (-d) (*flag*): If set, do not plot reconstructed pedigree(s).

**Write Alternate Pedigrees** (-w) (*flag*): If set, write outputs for alternate reconstructed pedigrees. These pedigrees share the same number of inconsistencies and 3rd-degree "tiebreaker" inconsistencies as the primary output pedigree. Also write the kinship relations that are constant across the primary and all alternate pedigrees.

**Verbose** (-v) (*flag*): If set, enable verbose output (INFO-level logging).
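
To give a feel for how the `-m`, `-e`, and `-s` options could interact, here is a generic epsilon-greedy pruning sketch. It is an illustration only: this README does not describe how repare adapts epsilon-greedy sampling, so the function `prune_candidates`, the split into "best" and "random" shares, and the use of an inconsistency count as the score are all assumptions.

```python
import random


def prune_candidates(scored, max_candidates=1000, epsilon=0.2, seed=42):
    """Keep at most max_candidates items from (candidate, inconsistencies) pairs.

    Generic epsilon-greedy pruning, NOT repare's actual procedure:
    a (1 - epsilon) share is the best-scoring candidates, and the
    remaining epsilon share is drawn at random from the rest.
    """
    if len(scored) <= max_candidates:
        return list(scored)
    rng = random.Random(seed)  # a fixed seed makes the random share repeatable
    ranked = sorted(scored, key=lambda pair: pair[1])  # fewer inconsistencies first
    n_explore = int(epsilon * max_candidates)
    n_greedy = max_candidates - n_explore
    kept = ranked[:n_greedy]
    kept += rng.sample(ranked[n_greedy:], n_explore)
    return kept
```

In a scheme like this, a larger epsilon keeps more low-ranked candidates alive between iterations, which trades some greediness for a chance to recover from early mistakes.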

> [!TIP]
> Run `repare --print-allowed-constraints` to print a list of allowed relation constraint strings directly from the CLI.

<p align="center">
<img src="https://raw.githubusercontent.com/ehuangc/repare/main/examples/algorithm_diagram.svg" alt="Reconstruction Process Diagram" />
<br>
<em>Diagram of repare's pedigree reconstruction process</em>
</p>

## Reproducibility
We recommend using [pixi](https://pixi.sh/) to reproduce the results in this repo.
```
git clone https://github.com/ehuangc/repare.git
cd repare
pixi shell
```

Once in the pixi shell, you can run the script(s) corresponding to the results you'd like to reproduce. For example:
```
python benchmarks/published/run_parameter_experiment.py
exit
```
To install pixi, see [this page](https://pixi.sh/latest/installation/).

## Citation
If you use repare in your work, please cite our preprint:

> Huang, E. C., Li, K. A., & Narasimhan, V. M. (2025). Fault-tolerant pedigree reconstruction from pairwise kinship relations. bioRxiv. [https://doi.org/10.1101/2025.08.21.671608](https://doi.org/10.1101/2025.08.21.671608)

```
@article{repare_preprint2025,
doi = {10.1101/2025.08.21.671608},
author = {Huang, Edward C. and Li, Kevin A. and Narasimhan, Vagheesh M.},
title = {Fault-tolerant pedigree reconstruction from pairwise kinship relations},
journal = {bioRxiv},
month = {aug},
year = {2025},
url = {https://doi.org/10.1101/2025.08.21.671608},
}
```
@@ -0,0 +1,10 @@
repare/__init__.py,sha256=esYDcCYXwJUJVVpWQFFpocPwCZzL5xo_Hxihbknfunc,138
repare/main.py,sha256=6RffJ5YwOHp4aOm9u9mZ5wwpk416W4G6xsrIS-umnNw,3396
repare/pedigree.py,sha256=g4FntEa-ceksc5CnXe7JlvzVxgSscUwrW0wrw5k68r8,71514
repare/pedigree_reconstructor.py,sha256=j9jKwAGetehBFz44zIDIlmbdyGxTSloZPG5NzkCdRn4,56261
repare-0.1.4.dist-info/licenses/LICENSE,sha256=E49gc2SMWWehuPkrKV8-UHjiZ9AcHOFk3j8IJqTGTjI,1064
repare-0.1.4.dist-info/METADATA,sha256=5YhQF2ynnWqiy5Cjom7XYx34jO4Fi428uut1yo-BfiU,7880
repare-0.1.4.dist-info/WHEEL,sha256=qELbo2s1Yzl39ZmrAibXA2jjPLUYfnVhUNTlyF1rq0Y,92
repare-0.1.4.dist-info/entry_points.txt,sha256=tWRppCTqmNN8n4hJ_ShCgO8dJFU4PKTQsexMZS-PFHw,44
repare-0.1.4.dist-info/top_level.txt,sha256=MBgnP6OarsEmlqLXjKcPqKFIMIdpwADg5vt6eMPVA0M,7
repare-0.1.4.dist-info/RECORD,,
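
Each RECORD line above has the form `path,hash,size`, where the hash field is the SHA-256 digest of the file encoded as URL-safe base64 without trailing `=` padding. The sketch below shows how one such line could be checked against a file's bytes; the helper names are hypothetical, and splitting on the last two commas is a simplification that assumes the path itself needs no CSV quoting.

```python
import base64
import hashlib


def record_digest(data: bytes) -> str:
    """Return the `sha256=...` field a RECORD entry would carry for data."""
    digest = hashlib.sha256(data).digest()
    # RECORD uses URL-safe base64 with the trailing "=" padding removed.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"sha256={encoded}"


def matches_record_line(line: str, data: bytes) -> bool:
    """Check file contents against one `path,hash,size` RECORD line."""
    _path, expected_hash, expected_size = line.rsplit(",", 2)
    if not expected_hash and not expected_size:
        # RECORD lists itself with empty hash and size fields: nothing to verify.
        return True
    return expected_hash == record_digest(data) and int(expected_size) == len(data)
```

For general use, parsing RECORD with the `csv` module is more robust than `rsplit`, since paths containing commas are quoted.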
@@ -0,0 +1,7 @@
Copyright 2024 (c) Edward Huang

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1 @@
repare